**GenoMax** · 11-08-2012, 06:25 AM

Unless you have a specific need, why not download the pre-built index files available from the iGenomes site: http://cufflinks.cbcb.umd.edu/igenomes.html

**chenjy** · 11-08-2012, 06:29 AM

I am just wondering how much the alignment results would differ depending on whether those abnormal sequences are included in or excluded from the index.

**GenoMax** · 11-08-2012, 06:51 AM

Calling these sequences "abnormal" may be a bit extreme. They simply have not been confidently assigned a location, even though they are part of the human genome.

UCSC has an explanation here: http://genome.ucsc.edu/FAQ/FAQdownloads#download10

If you are only interested in confidently assigned sequences from the genome build, then you can exclude them from your analysis.
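
To make the distinction concrete, here is a minimal Python sketch that sorts hg19 sequence names into the categories UCSC describes. It assumes the UCSC hg19 naming convention: `chrN_..._random` for sequences whose chromosome is known but whose position is not, `chrUn_...` for sequences whose chromosome is unknown, and `_hap` in the name for alternative haplotypes. The function name `classify_hg19_name` is hypothetical and not part of any tool.

```python
import re


def classify_hg19_name(name: str) -> str:
    """Classify a sequence name, assuming UCSC hg19 naming conventions."""
    if "_hap" in name:
        return "alt_haplotype"   # e.g. chr6_apd_hap1: alternative version of a region
    if name.endswith("_random"):
        return "unlocalized"     # chromosome known, exact position/order unknown
    if name.startswith("chrUn_"):
        return "unplaced"        # chromosome itself unknown
    if re.fullmatch(r"chr(\d+|X|Y|M)", name):
        return "placed"          # assembled chromosome (or chrM)
    return "other"


names = ["chr1", "chr1_gl000191_random", "chrUn_gl000211", "chr6_apd_hap1", "chrX"]
confident = [n for n in names if classify_hg19_name(n) == "placed"]
print(confident)  # ['chr1', 'chrX']
```

Keeping only the "placed" names corresponds to restricting the analysis to confidently assigned sequences.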

**chenjy** · 11-08-2012, 06:57 AM

Yes, they are not actually abnormal sequences. Thanks!

**idonaldson** · 11-09-2012, 07:26 AM

If I were creating a new index for hg19, I would include all sequences apart from the alternative haplotypes (the `*hap*` files). I would exclude those because they are highly redundant with the corresponding regions of the main chromosomes, which makes mapping reads to those regions problematic, particularly if uniquely mapping reads are required.
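
A minimal sketch of this approach, assuming you start from a single multi-FASTA of hg19 and that alternative haplotype records can be recognised by `_hap` in their names, as in UCSC hg19 (e.g. `chr6_apd_hap1`). The function names `drop_haplotypes` and `filter_fasta_file` are hypothetical; the filtered file would then be passed to your index builder (e.g. `bowtie-build` or `bowtie2-build`).

```python
from typing import Iterable, Iterator


def drop_haplotypes(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose name contains '_hap'.

    Assumes UCSC hg19-style names such as 'chr6_apd_hap1'; all other
    records (including *_random and chrUn_* contigs) are kept.
    """
    keep = True
    for line in lines:
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token after '>'.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            keep = "_hap" not in name
        if keep:
            yield line


def filter_fasta_file(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path without the haplotype records."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(drop_haplotypes(src))
```

For example, `filter_fasta_file("hg19.fa", "hg19_nohap.fa")` (file names hypothetical). Streaming line by line keeps memory use flat even though the genome FASTA is several gigabytes.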

In general, my understanding is that all available sequence should be included in the reference (with the caveat above). This is principally because, if your DNA sample produces reads from the unassembled contigs, those reads may be erroneously mapped to the wrong place when the correct sequence is not present in your reference.
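
The mismapping argument can be seen in a toy example. The sketch below uses a brute-force Hamming-distance search (not how real aligners such as Bowtie work; they use an FM-index) and made-up sequences: `chrUn_toy` stands in for an unplaced contig that has a near-identical copy on `chr1`. With the contig in the reference, the read lands on it with zero mismatches; remove the contig and the same read is placed on `chr1` with one mismatch, i.e. at the wrong location.

```python
from typing import Dict, Optional, Tuple


def best_hit(read: str, reference: Dict[str, str]) -> Optional[Tuple[str, int, int]]:
    """Return (sequence name, 0-based offset, mismatches) for the placement of
    `read` with the fewest mismatches; the first such placement wins ties."""
    best: Optional[Tuple[str, int, int]] = None
    for name, seq in reference.items():
        for start in range(len(seq) - len(read) + 1):
            window = seq[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if best is None or mismatches < best[2]:
                best = (name, start, mismatches)
    return best


# Made-up sequences: chr1 carries a copy of the read's locus with one difference.
READ = "ACGTTGCA"
FULL_REFERENCE = {"chr1": "GGGACGTAGCAGGG", "chrUn_toy": "CCACGTTGCACC"}
PRIMARY_ONLY = {"chr1": FULL_REFERENCE["chr1"]}

print(best_hit(READ, FULL_REFERENCE))  # ('chrUn_toy', 2, 0)
print(best_hit(READ, PRIMARY_ONLY))    # ('chr1', 3, 1)
```

A real aligner would also report the second placement as a near-equal alternative, which is one reason unique-mapping filters behave differently depending on what is in the index.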

If anyone disagrees with this summary, I would be very interested in their opinion.

Question about indexing the human genome