  • Tophat Paired Ends and trimming - A warning

    Only 0.5% of my reads were correctly paired by Tophat, and a huge number were mapping to a different chromosome.

    I discovered the problem only occurred when I trimmed the reads first. (?)

    I then dug into Bowtie and found that it expects paired-end reads to be in the same order in both input files. So if you remove a read from R1 but not from R2, every pair from that point on in the files will be mismatched.

    As far as I knew, fastq records were matched up by read ID, NOT by position in the file. This requirement was NOT specified on the Bowtie page[1]. There is NO warning when this is happening.

    I haven't seen this discussed before, so a warning: don't do a quality trim with discard on paired-end data, unless you process the two files together and, whenever a read in either file fails, throw out that pair from both files so they remain in sync.
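A minimal Python sketch of that "discard from both files" approach. It is not taken from any trimmer: the function names, the mean-quality rule, the Phred+33 offset and the threshold of 20 are all assumptions for illustration, and it assumes plain 4-line fastq records that are in sync to start with.

```python
def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line-per-record fastq."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def filter_pairs_in_sync(r1, r2, out1, out2, min_mean_q=20.0):
    """Walk both files in lockstep and keep a pair only if BOTH mates pass.

    Because a failing mate removes the record from both outputs,
    the outputs stay in the same order. Returns the number of pairs kept.
    """
    kept = 0
    for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
        if mean_quality(rec1[3]) >= min_mean_q and mean_quality(rec2[3]) >= min_mean_q:
            out1.write("\n".join(rec1) + "\n")
            out2.write("\n".join(rec2) + "\n")
            kept += 1
    return kept
```

The arguments are ordinary open file objects, e.g. `filter_pairs_in_sync(open("R1.fastq"), open("R2.fastq"), open("R1.filtered.fastq", "w"), open("R2.filtered.fastq", "w"))`, where the file names are placeholders.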

    I understand why they do this from an algorithm point of view, but it would be good to make the requirement explicit.

    Or, it would be pretty easy to test for this: in PairedDualPatternSource::nextReadPair(), check that the read IDs are the same before returning true.
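Until something like that exists in the aligner, the same check can be run on the files beforehand. The sketch below is standalone Python, not Bowtie code; the helper names are made up, and it assumes 4-line fastq records whose IDs are either identical or differ only by a trailing /1 or /2.

```python
import re
from itertools import zip_longest

# Assumption: the only difference between mate IDs is a trailing /1 or /2.
_MATE_SUFFIX = re.compile(r"/[12]$")


def pair_key(header):
    """ID token of a fastq header: no leading '@', no description, no /1 or /2."""
    parts = header[1:].split()
    return _MATE_SUFFIX.sub("", parts[0]) if parts else ""


def headers(handle):
    """Yield the header line of each 4-line fastq record."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            yield line.rstrip("\n")


def first_out_of_sync(r1, r2):
    """Return the 0-based index of the first pair whose IDs differ,
    or where one file ends before the other; None if the files agree."""
    for i, (h1, h2) in enumerate(zip_longest(headers(r1), headers(r2))):
        if h1 is None or h2 is None or pair_key(h1) != pair_key(h2):
            return i
    return None
```

Calling `first_out_of_sync(open("R1.fastq"), open("R2.fastq"))` (placeholder file names) before alignment would flag the first mismatched record instead of letting every later pair be silently wrong.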

    [1] Actually, I was looking at the Bowtie 1 page. The Bowtie 2 page says:

    "Pairs are often stored in a pair of files, one file containing the mate 1s and the other containing the mates 2s. The first mate in the file for mate 1 forms a pair with the first mate in the file for mate 2, the second with the second, and so on."

    So I guess it does say that... however it could be more explicit ("first fastq record in the file"), or do the read ID comparison test.
    Last edited by dlawrence; 08-15-2013, 01:46 AM.

  • #2
    If you have Illumina data, you can use Trimmomatic to trim your reads. It removes unpaired reads and puts them in separate files: one file for unpaired R1 reads and another for unpaired R2 reads. See

    • #3
      I also experienced this problem a while back. From my point of view, it is incredibly sloppy of them to have neglected this. I use Trimmomatic and have found it better on all fronts!


      • #4
        Well, the read IDs aren't always identical. They sometimes have /1 or _1 or other such suffixes that differentiate the mates. As a rule, all aligners assume that paired-end reads in fastq files are in sync. One could write an aligner that could deal with out-of-sync reads, but its first step would be to sync them up, which is a silly thing for an aligner to have to do since the original files were in sync to begin with.
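For files that have already been trimmed out of sync, that "sync them up" step can be done once before alignment. A rough Python sketch, under stated assumptions: the function names are hypothetical, mate markers are taken to be a trailing /1, /2, _1 or _2 (which would also strip a genuine _1 or _2 at the end of a read name), records are 4 lines each, and all of R2 is held in memory, so this is only reasonable for modest file sizes.

```python
import re

# Assumption: mates differ only by a trailing /1, /2, _1 or _2 on the ID token.
MATE_SUFFIX = re.compile(r"[/_][12]$")


def read_key(header):
    """ID shared by both mates: first token of the header, mate suffix removed."""
    parts = header[1:].split()
    return MATE_SUFFIX.sub("", parts[0]) if parts else ""


def parse_fastq(lines):
    """Group an iterable of lines into (header, seq, plus, qual) tuples."""
    stripped = [line.rstrip("\n") for line in lines]
    return [tuple(stripped[i:i + 4]) for i in range(0, len(stripped) - 3, 4)]


def resync(r1_lines, r2_lines):
    """Re-pair two fastq files by read ID, keeping the order of R1.

    Returns (paired1, paired2, orphans1, orphans2); the two paired lists
    have equal length and matching order, orphans have no mate.
    """
    r2_by_key = {read_key(rec[0]): rec for rec in parse_fastq(r2_lines)}
    paired1, paired2, orphans1 = [], [], []
    for rec in parse_fastq(r1_lines):
        mate = r2_by_key.pop(read_key(rec[0]), None)
        if mate is None:
            orphans1.append(rec)
        else:
            paired1.append(rec)
            paired2.append(mate)
    orphans2 = list(r2_by_key.values())  # R2 records whose mate is gone
    return paired1, paired2, orphans1, orphans2
```

Writing `paired1` and `paired2` back out as two fastq files gives the aligner in-sync input again; the orphans can be kept aside or aligned as single-end reads.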
