  • Studentlost
    Member
    • Oct 2014
    • 28

    Getting much higher coverage with bowtie2 than tophat2

    Hello,


    I ran an analysis on paired end reads through tophat2 using:
    tophat2 -p12 -o <tophat_dir> --no-coverage-search <reference genome> R1.fq R2.fq
    and the results gave 1.2% coverage.

    I ran the same data through bowtie2 using:
    bowtie2 -x <index> -1 <R1.fq> -2 <R2.fq> -S <output.sam>
    and got a 42.75% overall alignment rate.

    Why such a big discrepancy? I tried --coverage-search as well and got the same results.

    I checked the tophat run.log and it's putting this into bowtie:
    bowtie2 -k 20 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 --mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0 -p 12 --sam-no-hd -x

    Any idea what's going on?

    P.S. I've been getting a silent error in my tophat.log
    bam2fastx: /usr/lib64/libz.so.1: no version information available
    and
    fix_map_ordering: /usr/lib64/libz.so.1: no version information available
    I do have libz.so.1.2.3

    The process still runs fine and the bam file output can be used for differential analysis... I just have terrible coverage. Any ideas?
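
To compare the two runs on the same footing, the headline percentage can be pulled out of each tool's summary text. This is a minimal Python sketch, assuming the bowtie2 summary ends with a line like the `42.75% overall alignment rate` quoted above. The `read mapping` alternative in the pattern is an assumption about the wording in TopHat's align_summary.txt, and the function name `overall_rate` is hypothetical.

```python
import re

# Matches the final line of a bowtie2 summary ("42.75% overall alignment rate").
# The "read mapping" alternative is an assumption about TopHat's
# align_summary.txt wording; adjust it if your file differs.
RATE_RE = re.compile(r"(\d+(?:\.\d+)?)% overall (?:alignment|read mapping) rate")


def overall_rate(summary_text):
    """Return the overall rate as a float percentage, or None if absent."""
    match = RATE_RE.search(summary_text)
    return float(match.group(1)) if match else None


# Line quoted in the post above
bowtie2_rate = overall_rate("42.75% overall alignment rate")
print(bowtie2_rate)
```

Note that the two tools may not count reads in the same way, so the two percentages are only a rough comparison.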
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    The libz.so.1 message is a warning, so it is unrelated to the discrepancy you observed (see #7 in http://seqanswers.com/forums/showthread.php?t=39873).


    • Studentlost
      Member
      • Oct 2014
      • 28

      #3
      I'm sorry, but which #7 are you referring to? I didn't see anything relevant in that thread.

      I'm just curious why tophat is missing 40% of the alignment that bowtie2 is finding.


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Since you had noted the silent errors in your post I was only pointing out that they are warnings and are not related to the difference you are seeing.

        Have you tried to run bowtie2 with the same parameters as TopHat? Since bowtie2 in tophat is being run with different parameters it is not surprising that the result there is different.

        You can try BBMap as an alternative to tophat.
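
One way to see exactly which options differ between the two runs is to diff the command lines. This is a minimal Python sketch using the two bowtie2 invocations quoted in the first post. The helper `parse_options` is hypothetical and uses a simple heuristic: any token starting with `-` is an option, and the next token is its value unless that token also starts with `-`.

```python
import shlex


def parse_options(command):
    """Map each option in a command line to its value (True for bare flags)."""
    tokens = shlex.split(command)[1:]  # drop the program name
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-"):
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("-")
            options[token] = tokens[i + 1] if has_value else True
            i += 2 if has_value else 1
        else:
            i += 1  # positional argument; ignored here
    return options


# Both command lines are copied from the first post.
tophat_call = ("bowtie2 -k 20 -D 15 -R 2 -N 0 -L 20 -i S,1,1.25 --gbar 4 "
               "--mp 6,2 --np 1 --rdg 5,3 --rfg 5,3 --score-min C,-14,0 "
               "-p 12 --sam-no-hd -x")
direct_call = "bowtie2 -x <index> -1 <R1.fq> -2 <R2.fq> -S <output.sam>"

tophat_opts = parse_options(tophat_call)
direct_opts = parse_options(direct_call)

# Options TopHat sets explicitly that the direct run left at bowtie2's defaults
only_in_tophat = sorted(set(tophat_opts) - set(direct_opts))
print(only_in_tophat)
```

Each option in the resulting list can then be checked against the bowtie2 manual to see whether TopHat's value matches the default.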


        • Studentlost
          Member
          • Oct 2014
          • 28

          #5
          The interesting thing is that I took a look at bowtie2's defaults, and tophat was pretty much on point with them.
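
One option in the run.log line that is worth checking against the defaults is `--score-min`. The bowtie2 manual lists `L,-0.6,-0.6` as the default in end-to-end mode, a threshold that scales with read length, whereas `C,-14,0` is a constant. The Python sketch below evaluates both for a 251-base read, the length reported later in the thread. The function name `min_score` is hypothetical, and the thread does not confirm that this difference explains the gap.

```python
import math


def min_score(spec, read_length):
    """Evaluate a bowtie2 --score-min function spec for one read length."""
    kind, constant, coefficient = spec.split(",")
    a, b = float(constant), float(coefficient)
    if kind == "C":      # constant: f(x) = a
        return a
    if kind == "L":      # linear: f(x) = a + b * x
        return a + b * read_length
    if kind == "S":      # square root: f(x) = a + b * sqrt(x)
        return a + b * math.sqrt(read_length)
    if kind == "G":      # natural log: f(x) = a + b * ln(x)
        return a + b * math.log(read_length)
    raise ValueError(f"unknown function type: {kind}")


READ_LENGTH = 251  # read length reported later in the thread

tophat_threshold = min_score("C,-14,0", READ_LENGTH)       # from run.log
default_threshold = min_score("L,-0.6,-0.6", READ_LENGTH)  # bowtie2 manual default
print(tophat_threshold, default_threshold)
```

With `--mp 6,2`, a threshold of -14 leaves room for about two full-penalty mismatches regardless of read length, while the linear default allows many more on a 251-base read.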


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            But the options in the bowtie2 command that you ran directly are not the same as the ones TopHat used.

            You were saying that the parameters TopHat used are the bowtie2 defaults. My apologies.
            Last edited by GenoMax; 10-24-2014, 05:58 PM.


            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              Have you done QC with these reads? Have they been trimmed in parallel (R1 and R2)?
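
Whether R1 and R2 are still in sync after trimming can be checked directly, since paired-end aligners match mates by their position in the two files. This is a minimal Python sketch, assuming well-formed four-line FASTQ records with no blank lines. The function names and the example reads are hypothetical.

```python
import itertools


def read_ids(lines):
    """Yield the read ID from each 4-line FASTQ record."""
    for line_number, line in enumerate(lines):
        if line_number % 4 == 0:  # header line of each record
            name = line[1:].split()[0]  # drop '@' and any comment field
            # Strip an old-style /1 or /2 mate suffix if present
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(r1_lines, r2_lines):
    """Return (index, id1, id2) for the first unpaired record, else None."""
    pairs = itertools.zip_longest(read_ids(r1_lines), read_ids(r2_lines))
    for index, (id1, id2) in enumerate(pairs):
        if id1 != id2:
            return index, id1, id2
    return None


# Hypothetical example: the mate of read2 was dropped from R2 during trimming.
r1 = ["@read1/1\n", "ACGT\n", "+\n", "IIII\n",
      "@read2/1\n", "ACGT\n", "+\n", "IIII\n"]
r2 = ["@read1/2\n", "TTGA\n", "+\n", "IIII\n"]
print(first_mismatch(r1, r2))

# With real files (paths are placeholders):
# with open("R1.fq") as r1_file, open("R2.fq") as r2_file:
#     print(first_mismatch(r1_file, r2_file))
```

A result of `None` means every record in R1 has a same-named partner at the same position in R2.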


              • Studentlost
                Member
                • Oct 2014
                • 28

                #8
                Yeah, I did QC both before and after removing adapter contamination with scythe and trimming with sickle.

                This is really puzzling me. There's no reason for tophat to give me different results than bowtie2 would...


                • colindaven
                  Senior Member
                  • Oct 2008
                  • 417

                  #9
                  In my experience, Tophat can be quite sensitive to a few parameters, especially insert size. That wouldn't account for the discrepancy here, however.

                  I would try another aligner - and can highly recommend STAR for speed and accuracy.
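
On the insert size point: TopHat takes the expected inner distance between mates through `-r/--mate-inner-dist`, which is the fragment length minus both read lengths. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, and the 400 bp fragment length is an assumed value, not one reported in this thread.

```python
def mate_inner_distance(mean_fragment_length, read_length):
    """Inner distance between mates: fragment length minus both reads."""
    return mean_fragment_length - 2 * read_length


# Example from the TopHat manual: 300 bp fragments with 50 bp reads
print(mate_inner_distance(300, 50))

# Assumed 400 bp fragments with the 251-base reads mentioned in this thread
print(mate_inner_distance(400, 251))
```

A negative result means the two mates overlap, which is likely with long reads on short fragments.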


                  • Studentlost
                    Member
                    • Oct 2014
                    • 28

                    #10
                    There's no reason Tophat should be failing like this though. Any idea what parameters I can try to change in Tophat to fix the issue?


                    • sdriscoll
                      I like code
                      • Sep 2009
                      • 436

                      #11
                      To cut down on the possible complexity of what is going wrong, try aligning only one of the two mate files with Tophat (i.e. as a single-end alignment) and see if Tophat manages to align more data.

                      Also, how long are your reads and what are you aligning to?
                      /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                      Salk Institute for Biological Studies, La Jolla, CA, USA */


                      • Studentlost
                        Member
                        • Oct 2014
                        • 28

                        #12
                        My reads average around 1,000,000 base pairs. I aligned them to the Mmul_1 Rhesus build from Ensembl. I also tried the rheMac3 build to compare. I tried with and without a transcriptome index and with and without a reference GTF file. Nothing made a difference. This is blowing my mind.

                        I can align the paired ends with bowtie2 and I get ~40-60% per sample, but with tophat I get between 0.5% and 4% per sample.

                        I was told by another lab working on this that they were able to get the alignment I got with bowtie2 using gsnap. It doesn't make any sense to me why Tophat is the only tool doing this. That means it's unreliable for other alignments in my mind and that bothers me a lot.


                        • GenoMax
                          Senior Member
                          • Feb 2008
                          • 7142

                          #13
                          You are not using original reads? You have reads/contigs that average a megabase each?

                          TopHat is designed for reads that are a kb or shorter.


                          • Studentlost
                            Member
                            • Oct 2014
                            • 28

                            #14
                            Wait, I'm sorry. I meant that my total reads for each strand come to 1 MB; each read in each file is 251 bases long.
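
Numbers like these are easy to mix up, so it can help to measure the read count and read lengths straight from the FASTQ file. This is a minimal Python sketch, assuming four-line FASTQ records. The function name and the two example reads are hypothetical.

```python
def read_length_stats(lines):
    """Return (read_count, mean_length, max_length) for FASTQ lines."""
    count = total = longest = 0
    for line_number, line in enumerate(lines):
        if line_number % 4 == 1:  # sequence line of each 4-line record
            length = len(line.strip())
            count += 1
            total += length
            longest = max(longest, length)
    mean = total / count if count else 0.0
    return count, mean, longest


# Hypothetical two-read example
example = ["@read1\n", "ACGT\n", "+\n", "IIII\n",
           "@read2\n", "ACGTAC\n", "+\n", "IIIIII\n"]
print(read_length_stats(example))

# With a real file (path is a placeholder):
# with open("R1.fq") as handle:
#     print(read_length_stats(handle))
```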


                            • sdriscoll
                              I like code
                              • Sep 2009
                              • 436

                              #15
                              Yeah, did I read that right? 1,000,000 base paired-end reads? No wonder bowtie2 returns a different result. Are you able to run the alignments with the original RNA-seq reads whatever they were (i.e. PE 100 or whatever)? Then you'll see Tophat actually function.
