Dear all,
I downloaded the full reference FASTA genomes for hg19, mm9 and mm10 from GENCODE. They include reference chromosomes, scaffolds, assembly patches and haplotypes.
For a specific purpose, I want to remove haplotypes.
I could not find a description of the nomenclature used for the contig names in the FASTA files.
In the hg19 reference genome I use, there are 297 contigs. I would like to know how many of them are haplotypes, and which ones. The headers look like this, for example:
>chr1 1
>chr2 2
...
>chrM MT
>GL877870.2 HG1007_PATCH
>GL877872.1 HG1032_PATCH
...
>GL383545.1 HSCHR10_1_CTG2 <- is this an "unlocalized sequence"?
>GL383546.1 HSCHR10_1_CTG5
...
>GL000191.1 GL000191.1
>hg19GL000192.1 GL000192.1
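To frame the question, here is a minimal Python sketch of how the headers could be tallied by naming pattern. The labels and matching rules are only my assumptions based on the examples above. They are not a documented GENCODE nomenclature, and none of them is known to identify haplotypes.

```python
from collections import Counter


def classify_header(header):
    """Return a pattern-based label for one FASTA header line.

    The rules are assumptions inferred from the example headers only.
    """
    fields = header.lstrip(">").split()
    if not fields:
        return "other"
    name = fields[0]
    desc = fields[1] if len(fields) > 1 else ""
    if name.startswith("chr"):
        return "chromosome"        # e.g. ">chr1 1", ">chrM MT"
    if desc.endswith("_PATCH"):
        return "patch"             # e.g. "HG1007_PATCH"
    if "_CTG" in desc:
        return "ctg"               # e.g. "HSCHR10_1_CTG2"
    if desc == name:
        return "accession_only"    # e.g. "GL000191.1 GL000191.1"
    return "other"


def tally_headers(lines):
    """Count header lines per label; sequence lines are ignored."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            counts[classify_header(line)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ">chr1 1",
        ">chrM MT",
        ">GL877870.2 HG1007_PATCH",
        ">GL383545.1 HSCHR10_1_CTG2",
        ">GL000191.1 GL000191.1",
    ]
    for label, n in sorted(tally_headers(sample).items()):
        print(label, n)
```

On a real file, the same function could be given an open file handle, for example `tally_headers(open("genome.fa"))`, where `genome.fa` is a placeholder name. What is still missing is which of these groups, if any, correspond to haplotypes.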
Is there a way to get this information?
Thank you in advance for your help.