  • Issues using tophat2 to align paired sequences

    I work on software supporting targeted-amplicon sequencing, where the insert size is typically shorter than the read length. One of my tasks was to evaluate tophat2 as an aligner for some of our more difficult cases, but I'm getting a really lousy alignment rate (granted, this is NextSeq data, but even that can't explain percentages this low).

    Is there a way to debug this sort of thing, such as a log I can look at that explains why a given read did not align? I get alignments with bwa, but tophat2 frequently fails to align R2. I suspect my settings are not quite right, but I've varied the edit-distance parameters as well as the mate inner-distance parameters, and nothing seems to help.
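As a starting point for the debugging question, per-mate alignment rates can be tallied directly from the SAM FLAG field. This is a minimal sketch, assuming the alignments are available as SAM text (for example via `samtools view -h`) and that unmapped mates are included; TopHat2 usually writes unmapped reads to a separate `unmapped.bam`, so both files would need to be combined first. The function name `mate_alignment_rates` is hypothetical.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits, as defined in the SAM specification.
UNMAPPED = 0x4
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mate_alignment_rates(sam_lines: Iterable[str]) -> dict[str, float]:
    """Return the fraction of R1 and R2 records that are mapped.

    Secondary and supplementary records are skipped so each mate is counted
    once (assumes one primary record per mate, mapped or not).
    """
    total: Counter = Counter()
    mapped: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & FIRST_IN_PAIR:
            mate = "R1"
        elif flag & SECOND_IN_PAIR:
            mate = "R2"
        else:
            mate = "unpaired"
        total[mate] += 1
        if not flag & UNMAPPED:
            mapped[mate] += 1
    return {m: mapped[m] / total[m] for m in total}
```

A large gap between the R1 and R2 rates would point at something mate-specific (R2 quality, adapter read-through, or pairing constraints) rather than a global setting.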

  • #2
    If your insert size is shorter than read length, you should adapter-trim the reads prior to mapping. Are your reads paired, and do you know the adapter sequence?
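When the insert is shorter than the read, sequencing runs through the insert into the adapter, and those adapter bases will not match the reference. A minimal sketch of 3' trimming with exact matching, assuming a known adapter; the adapter string below is only an example, and real trimmers also tolerate mismatches and handle the R1 and R2 adapters separately. The function name is hypothetical.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Trim a 3' adapter from a read using exact matching only.

    Handles both a full adapter inside the read (short insert) and a
    partial adapter hanging off the read's 3' end.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Full adapter present: the insert ended where the adapter starts.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Partial adapter: longest adapter prefix that is a suffix of the read.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    return read


if __name__ == "__main__":
    example_adapter = "AGATCGGAAGAGC"  # example only; substitute your adapter
    print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", example_adapter))
```

The `min_overlap` cutoff keeps a read from being trimmed just because its last one or two bases happen to match the start of the adapter.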


    • #3
      The reads are paired, and the adapters have already been trimmed off.


      • #4
        Possibly, the error rate in R2 is too high. Since this is NextSeq, the quality scores won't give very useful information about the error rates; you'll have to determine that empirically by mapping. You could try BBMap, which is more error-tolerant than Tophat2 or bwa, and allows you to plot the error-rate histogram across both read1 and read2 (with the "mhist" flag), which may be a useful analytic tool in this case. Posting the base frequency histogram across the reads may also be useful.
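The idea behind an error-rate-by-position plot can be sketched from ordinary SAM records, as a rough stand-in when BBMap's `mhist` output is not at hand. This assumes each mapped record carries an `MD` tag (`samtools calmd` can add one) and, to stay short, skips any alignment whose CIGAR contains operations other than `M` and `D`. Reverse-strand reads are flipped so positions correspond to sequencing cycles. The function name is hypothetical, and this is not how BBMap computes `mhist`.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# MD tokens: a run of matches, deleted reference bases (^ACG), or one mismatch.
MD_RE = re.compile(r"(\d+)|(\^[A-Z]+)|([A-Z])")
REVERSE = 0x10


def mismatch_rate_by_cycle(records):
    """records: iterable of (cigar, md, flag). Returns {cycle: mismatch rate}."""
    mismatches: Counter = Counter()
    coverage: Counter = Counter()
    for cigar, md, flag in records:
        if cigar == "*" or not md:
            continue  # unmapped, or no MD tag to work from
        ops = CIGAR_RE.findall(cigar)
        if any(op not in ("M", "D") for _, op in ops):
            continue  # outside this sketch's simplifying assumption
        read_len = sum(int(n) for n, op in ops if op == "M")
        reverse = bool(flag & REVERSE)
        for cycle in range(read_len):
            coverage[cycle] += 1
        pos = 0  # offset along SEQ as stored in the SAM record
        for matches, _deleted, base in MD_RE.findall(md):
            if matches:
                pos += int(matches)
            elif base:
                # SAM stores reverse-strand reads reverse-complemented,
                # so SEQ offset 0 is the last sequencing cycle.
                cycle = read_len - 1 - pos if reverse else pos
                mismatches[cycle] += 1
                pos += 1
            # Deleted reference bases (^...) consume no read bases.
    return {c: mismatches[c] / coverage[c] for c in sorted(coverage)}
```

Running this separately on R1 and R2 records (split on FLAG bits 0x40 and 0x80) would show whether R2 errors climb toward the end of the read.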

        Also, depending on the read lengths, you could consider error-correcting the data if you want to increase the mapping rates.
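Many k-mer-based error correctors start from the observation that, at reasonable depth, k-mers containing a sequencing error are rare compared with error-free ones. A minimal sketch of that k-mer spectrum idea, which only flags (does not correct) suspicious positions; `k` and `min_count` are hypothetical parameters to tune, and strand is handled by counting canonical k-mers. Dedicated correction tools do considerably more than this.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i : i + k])] += 1
    return counts


def weak_kmer_starts(read: str, counts: Counter, k: int, min_count: int):
    """Start offsets of k-mers in `read` seen fewer than `min_count` times."""
    return [
        i
        for i in range(len(read) - k + 1)
        if counts[canonical(read[i : i + k])] < min_count
    ]
```

A single substitution makes up to k consecutive k-mers weak, so the erroneous base sits where those weak windows overlap.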


        • #5
          I'm not sure why you would bother with tophat2 for amplicon sequencing. It has fixed parameters at various steps that you can't actually change from the command line (this is likely causing some of your difficulties). You'd probably be better off with a standard DNA aligner (BBmap, bwa, bowtie2, etc.).

          Also, I'm curious why you bothered using paired-end reads if the read lengths are normally longer than the insert. That would seem to just be a waste of reads.


          • #6
            Can BBMap handle fusion/split alignments?


            • #7
              It can handle very long deletions, but not arbitrary rearrangements like inter-chromosomal fusions. More precisely, it produces only one SAM line per read, so a read with part on one chromosome and part on another will map to the chromosome contributing the majority of its bases (if the "local" flag is enabled). But a fusion created by, say, skipping 100 kbp within a chromosome will be reported entirely in a single alignment (as Tophat2 would). To find these, set the "maxindel" flag, which defaults to 16000. Sensitivity eventually drops depending on read length; I don't recommend setting it much above 200000 for 100 bp reads.
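Because a long intra-chromosomal skip comes back as a single alignment, candidate events can be pulled out of the CIGAR string by looking for long reference-skipping operations. A minimal sketch; whether an aligner encodes such a skip as a deletion (`D`) or a skipped region (`N`) depends on the aligner and its settings, so both are treated alike here. The 16000 threshold in the usage line just mirrors the `maxindel` default mentioned in the post and is only an example; the function name is hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def long_reference_gaps(pos: int, cigar: str, min_len: int):
    """Return (start, length, op) for D/N runs of at least min_len bases.

    `pos` is the 1-based leftmost mapping position (SAM POS); `start` is
    the 1-based reference coordinate of the first skipped base.
    """
    gaps = []
    ref = pos
    for length_str, op in CIGAR_RE.findall(cigar):
        length = int(length_str)
        if op in ("D", "N") and length >= min_len:
            gaps.append((ref, length, op))
        if op in REF_CONSUMING:
            ref += length
    return gaps


if __name__ == "__main__":
    print(long_reference_gaps(1000, "50M100000N50M", 16000))
```

Soft clips (`S`) and insertions (`I`) are skipped when tracking the reference coordinate, since they do not consume reference bases.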


              • #8
                Originally posted by dpryan
                I'm not sure why you would bother with tophat2 for amplicon sequencing. It has fixed parameters at various steps that you can't actually change from the command line (this is likely causing some of your difficulties). You'd probably be better off with a standard DNA aligner (BBmap, bwa, bowtie2, etc.).

                Also, I'm curious why you bothered using paired-end reads if the read lengths are normally longer than the insert. That would seem to just be a waste of reads.
                We're performing targeted sequencing of an RNA sample, so reads will cross exon-exon boundaries. Furthermore, we're looking for fusions in this data, which is why we were interested in tophat.

                As for using paired-end reads, the read lengths are not always longer than the amplicon. Our chemistry is unique in that the amplicons are not always the same length, but they are anchored at one end (Archer AMP).
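How much the second read adds depends on the insert length relative to the read lengths, which is why variable-length amplicons change the calculation. A small sketch of the arithmetic, assuming adapters are already trimmed and both mates read inward from opposite ends of the insert; `mate_coverage` is a hypothetical helper.

```python
def mate_coverage(insert_len: int, r1_len: int, r2_len: int) -> tuple[int, int]:
    """Return (bases read by both mates, bases read only by R2).

    R1 covers insert positions [0, r1_end); R2 covers [r2_start, insert_len).
    """
    r1_end = min(r1_len, insert_len)
    r2_start = max(0, insert_len - r2_len)
    overlap = max(0, r1_end - r2_start)
    r2_only = (insert_len - r2_start) - overlap
    return overlap, r2_only


if __name__ == "__main__":
    for insert in (100, 250, 400):
        print(insert, mate_coverage(insert, 150, 150))
```

With 150 bp reads, a 100 bp insert gives R2 nothing new, a 250 bp insert gives it 100 new bases, and a 400 bp insert gives it 150; with variable-length amplicons, only the longer ones benefit from R2.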
