I would like to know how to assemble the human reference genome GRCh37 from the individual chromosome files and the latest patches.
This is Ensembl's FTP site, which lists more than 300 FASTA files:
ftp://ftp.ensembl.org/pub/release-67...o_sapiens/dna/
For the primary assembly, one might simply concatenate chromosomes 1, 2, ..., X, and Y. However, the X and Y chromosomes share the pseudoautosomal region (PAR), as the README points out. I could just leave out the Y chromosome since I work on K562 cells, but otherwise what am I supposed to do? There is a big file called toplevel.fa which appears to have the PAR sequences masked, but does it contain all chromosomes and patches? The README does not say anything about its content.
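To make the naive approach concrete, here is a minimal Python sketch of the concatenation I have in mind. The file-name pattern `chr{name}.fa` and the function name are placeholders of mine, not the actual names on the FTP site, and the sketch does nothing about the PAR or the patches.

```python
from pathlib import Path

# Karyotype order for the primary assembly: 1..22, then X and Y.
CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y"]


def concatenate_fasta(input_dir, output_path, pattern="chr{name}.fa", skip=()):
    """Concatenate per-chromosome FASTA files in karyotype order.

    `pattern` is a hypothetical naming scheme; adjust it to the real files.
    Chromosomes listed in `skip` (e.g. "Y") are left out.
    Returns the list of chromosome names actually written.
    """
    input_dir = Path(input_dir)
    written = []
    with open(output_path, "w") as out:
        for name in CHROMOSOMES:
            if name in skip:
                continue
            src = input_dir / pattern.format(name=name)
            if not src.is_file():
                continue  # file not present; nothing to append
            last = "\n"
            with open(src) as fh:
                for line in fh:  # stream line by line; chromosomes are large
                    out.write(line)
                    last = line
            # Make sure the next ">" header starts on its own line.
            if not last.endswith("\n"):
                out.write("\n")
            written.append(name)
    return written
```

For my K562 case I would call something like `concatenate_fasta("dna", "primary.fa", skip=("Y",))`, where both paths are again just examples.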
There seem to be two kinds of patches: fixes and novel additions. How are these patches correctly incorporated into the primary assembly? Is there a utility to handle this? Or are patches treated as separate entities (e.g. as PATCH_xxx instead of being integrated into the chromosome proper)?
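One way I could check what toplevel.fa actually contains is to list its FASTA header lines and see whether the patches show up as separate records. A rough sketch follows; the helper names are my own, and I am not assuming anything about how Ensembl formats its headers.

```python
import gzip


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_headers(path):
    """Yield each FASTA header line (without the leading '>')."""
    with open_maybe_gzip(path) as fh:
        for line in fh:
            if line.startswith(">"):
                yield line[1:].rstrip("\r\n")
```

Printing the result of `fasta_headers(...)` for the toplevel file should show which chromosomes are present and whether any patch sequences appear as their own entries.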
Thank you for your very kind help.