  • TopHat --prefilter-multihits parameter

    I am trying to understand TopHat's --prefilter-multihits parameter. According to the documentation:
    When mapping reads on the transcriptome, some repetitive or low complexity reads that would be discarded in the context of the genome may appear to align to the transcript sequences and thus may end up reported as mapped to those genes only. This option directs TopHat to first align the reads to the whole genome in order to determine and exclude such multi-mapped reads (according to the value of the -g/--max-multihits option).
    I ran TopHat 1.4.1 (the last version before 2) and 2.0.9 on the same sequences with only the --GTF parameter. TopHat 2.0.9 mapped more reads, but both versions ended up with about 20% of bases as intronic or intergenic. When I add --prefilter-multihits, TopHat 1.4.1 produces very similar results (~1% fewer mapped reads), which seems very reasonable to me. With TopHat 2.0.9, however, I lose over half the reads. That seems like a lot, but it is possible they are all multi-mapped. More importantly, less than 1% of aligned reads are now intergenic or intronic.

    Two questions:
    1) Why such a huge difference in behavior between the two versions? As far as I can tell, this option was not altered for version 2.
    2) Why does this parameter eliminate essentially all reads outside the transcriptome for TopHat 2.0.9?
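
One way to check whether the reads lost with --prefilter-multihits really are multi-mapped is to tally the NH tag (number of reported alignments per read) in the run made without the option. The following is a minimal sketch, not part of TopHat: it assumes the alignments are available as SAM text with an NH:i tag on each mapped record, and the function names `tally_nh` and `summarize` and the toy records are made up for illustration.

```python
from collections import Counter


def tally_nh(sam_lines):
    """Return a Counter mapping NH value -> number of distinct reads (per mate)."""
    nh_by_read = {}
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        flag = int(fields[1])
        if flag & 0x4:                    # read is unmapped
            continue
        nh = None
        for tag in fields[11:]:           # optional fields start at column 12
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        if nh is None:                    # assumption violated: no NH tag present
            continue
        # Distinguish the two mates of a pair: 0x40 = first, 0x80 = second.
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        nh_by_read[(fields[0], mate)] = nh
    return Counter(nh_by_read.values())


def summarize(counts):
    """Split the tally into (uniquely mapped, multi-mapped) read counts."""
    unique = counts.get(1, 0)
    multi = sum(n for nh, n in counts.items() if nh > 1)
    return unique, multi


# Toy records (hypothetical data) standing in for real alignments.
toy_sam = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:1",
    "r2\t0\tchr1\t200\t3\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:3",
    "r2\t256\tchr2\t300\t3\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:3",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]

counts = tally_nh(toy_sam)
unique, multi = summarize(counts)
print(f"uniquely mapped: {unique}, multi-mapped: {multi}")
```

On real data the same functions could be fed lines produced by `samtools view` on the alignment file. If the fraction of multi-mapped reads is far below the fraction that disappears with --prefilter-multihits, multi-mapping alone would not explain the loss.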
