  • adarshjose
    Junior Member
    • Jul 2010
    • 6

    TopHat: paired-end vs. single-end reads

    Hi,

    I was trying to map paired-end Illumina GA IIe 85 bp reads to a reference genome using TopHat. When I mapped both mates together, only a small fraction (< 10%) of the reads mapped to the genome, but > 80% of the reads mapped to the reference when I mapped each mate file separately.

    Mapping each mate file separately:
    tophat -r 200 -o ./tophatr200 Ref/Zm.seq.uniq seqs__filtered_6_1.fastq
    tophat -r 200 -o ./tophatr200 Ref/Zm.seq.uniq seqs__filtered_6_2.fastq

    (> 80% of reads mapped here.)

    Mapping the paired data together:
    tophat -r 200 -o ./tophatr200 Ref/Zm.seq.uniq seqs__filtered_6_1.fastq seqs__filtered_6_2.fastq

    (< 10% of reads mapped here.)

    Has anyone seen this before? Could this have something to do with the -r value? Any suggestions would be greatly appreciated.

    Thanks

    Adarsh Jose
    Iowa State University
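On the -r question: TopHat's -r (--mate-inner-dist) is the expected mean inner distance between mates, i.e. the fragment length minus the length of both reads (the TopHat manual's example is 300 bp fragments with 50 bp ends giving -r 200). A minimal sketch of that arithmetic, assuming both mates are a full 85 bp; the 300 bp library below is a hypothetical value for illustration, not a number from this thread:

```python
def mate_inner_distance(fragment_length: int, read_length: int) -> int:
    """Unsequenced gap between the two mates of one fragment."""
    return fragment_length - 2 * read_length


def implied_fragment_length(inner_distance: int, read_length: int) -> int:
    """Fragment length that a given -r value implicitly assumes."""
    return inner_distance + 2 * read_length


# With 85 bp mates, -r 200 assumes fragments of about 370 bp.
print(implied_fragment_length(200, 85))  # 370

# A hypothetical 300 bp library would instead call for -r of about 130.
print(mate_inner_distance(300, 85))  # 130
```

If the real fragment size differs a lot from what -r implies, the paired run is judged against the wrong expectation, while the single-end runs do not depend on pairing at all.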
  • sphil
    Senior Member
    • Apr 2010
    • 192

    #2
    Hey,

    Probably the distance between your paired ends is too large for TopHat to map them accurately to the source sequence. This could result from a high standard deviation in the fragment sizes from the sample prep of your reads (i.e., too-large clone libraries).
    If you map the reads on their own, they can still be mapped, because the mate-pair information doesn't matter in that case. Try allowing larger gaps (inner distances) in TopHat and review the results.

    I don't know if it really helps, but I guess this could be a reason.


    cheers

    phil
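The effect of a wide fragment-size spread can be illustrated with a toy model: count how many pairs have an inner distance inside a window of expected_inner ± k·std_dev. This is a simplified stand-in for an aligner's pairing constraint, not TopHat's actual pairing logic; the fragment lengths are hypothetical, the window width k is an assumption, and the 20 bp spread mirrors the default TopHat documents for --mate-std-dev.

```python
import statistics


def inner_distances(fragment_lengths, read_length):
    """Inner distance = fragment length minus both mate lengths."""
    return [f - 2 * read_length for f in fragment_lengths]


def fraction_within_window(fragment_lengths, read_length,
                           expected_inner, std_dev, k=2.0):
    """Fraction of pairs whose inner distance lies inside
    expected_inner +/- k * std_dev (toy model, not TopHat's scoring)."""
    lo = expected_inner - k * std_dev
    hi = expected_inner + k * std_dev
    inner = inner_distances(fragment_lengths, read_length)
    inside = sum(1 for d in inner if lo <= d <= hi)
    return inside / len(inner)


# Hypothetical fragment lengths from a library with a wide size spread.
fragments = [300, 350, 370, 390, 450, 520, 600, 700, 850, 1000]
inner = inner_distances(fragments, 85)
print("mean inner distance:", statistics.mean(inner))
print("std dev:", round(statistics.stdev(inner), 1))

# Narrow window (-r 200 with a 20 bp spread) vs. a much wider one.
print(fraction_within_window(fragments, 85, 200, 20))   # 0.3
print(fraction_within_window(fragments, 85, 200, 200))  # 0.8
```

In this toy library only 3 of 10 pairs fit the narrow window, while widening the allowed spread admits 8 of 10, which is the kind of change the suggestion above is aiming at.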


    • arrchi
      Member
      • Mar 2011
      • 46

      #3
      Hi adarshjose,

      Did you solve your problem? I would be very interested in how you solved the discrepancy.

      -a


        • jameslz
          Member
          • Nov 2009
          • 20

          #5
          The reads may be trimmed....
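One way to check whether trimming (or filtering) changed the reads is to tally read lengths and record counts in each mate file; variable lengths suggest trimming, and different record counts between the _1 and _2 files would mean the mates no longer line up. A minimal sketch, assuming plain 4-line FASTQ records with no line wrapping; `length_summary` is a hypothetical helper, not part of TopHat:

```python
from collections import Counter


def length_summary(lines):
    """Count FASTQ records and tally read lengths.

    Assumes plain 4-line records (header, sequence, '+', qualities),
    so every line at index 1, 5, 9, ... is a sequence line.
    """
    lengths = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:
            lengths[len(line.rstrip("\r\n"))] += 1
    return sum(lengths.values()), lengths


# Tiny in-memory example: two records of different lengths.
sample = ["@r1", "ACGTACGT", "+", "IIIIIIII", "@r2", "ACGTA", "+", "IIIII"]
n, lengths = length_summary(sample)
print(n, sorted(lengths))  # 2 [5, 8]
```

For the files in the first post, pass an open handle for seqs__filtered_6_1.fastq and for seqs__filtered_6_2.fastq and compare the two record counts and length tallies.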


          • anurag.gautam
            Member
            • Oct 2010
            • 15

            #6
            Hi,
            I tried mapping ~2 million Illumina reads to the Oryza sativa indica reference genome, with its reference GTF file, using different TopHat versions: 1.1.4, 1.3.0, 1.3.1, 1.3.2, 1.3.3, and the current 1.4.1.
            I used the default options just to check whether the mapping statistics really get affected. I got the following stats:
            Version        Reads used   Reads mapped
            TopHat 1.1.4   2,000,000    2,27,554
            TopHat 1.3.0   2,000,000    2,30,817
            TopHat 1.3.1   2,000,000    2,31,935
            TopHat 1.3.2   2,000,000    4,517
            TopHat 1.3.3   2,000,000    2,31,935
            TopHat 1.4.1   2,000,000    1,37,724

            I would like to know why the number of mapped reads varies between versions even though the data are the same. Second, why do versions 1.3.2 and 1.4.1 give such drastically different mapping stats compared with the other versions? Can anybody shed some light on this?


            • pbluescript
              Senior Member
              • Nov 2009
              • 224

              #7
              Could you fix your comma placement? I don't know how many alignments TopHat gave you. Does 2,27,554 mean 227,554?


              • anurag.gautam
                Member
                • Oct 2010
                • 15

                #8
                Yes, they are the same. With the commas fixed:
                Version        Reads used   Reads mapped
                TopHat 1.1.4   2,000,000    227,554
                TopHat 1.3.0   2,000,000    230,817
                TopHat 1.3.1   2,000,000    231,935
                TopHat 1.3.2   2,000,000    4,517
                TopHat 1.3.3   2,000,000    231,935
                TopHat 1.4.1   2,000,000    137,724
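To put the version comparison on a common scale, this small sketch converts the counts from the table into percentages of the 2,000,000 input reads. It treats each count as reads mapped, as reported in the post; whether TopHat's summary counts reads or alignments is not settled in this thread.

```python
# Reads mapped per TopHat version, as reported in this thread
# (2,000,000 input reads for every run).
reads_used = 2_000_000
reads_mapped = {
    "1.1.4": 227_554,
    "1.3.0": 230_817,
    "1.3.1": 231_935,
    "1.3.2": 4_517,
    "1.3.3": 231_935,
    "1.4.1": 137_724,
}


def mapping_rate(mapped, total):
    """Percentage of input reads reported as mapped."""
    return 100.0 * mapped / total


for version, mapped in reads_mapped.items():
    print(f"TopHat {version}: {mapping_rate(mapped, reads_used):.2f}%")
```

On these numbers, 1.1.4, 1.3.0, 1.3.1 and 1.3.3 all map roughly 11.4 to 11.6% of the reads, 1.4.1 about 6.89%, and 1.3.2 only about 0.23%.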


                • pbluescript
                  Senior Member
                  • Nov 2009
                  • 224

                  #9

                  That's not a lot of mapped reads. Either something went wrong with the library prep, sequencing, or mapping method. How good is the reference genome for Oryza sativa indica?


                  • anurag.gautam
                    Member
                    • Oct 2010
                    • 15

                    #10
                    The Oryza sativa indica reference genome is of good quality and has good coverage. The reads are also of high quality. But the question remains: why the different mapping stats?


                    • zun
                      Member
                      • Oct 2010
                      • 26

                      #11
                      Hello anurag.gautam,

                      I have also used the TopHat series with O. sativa reads since 2010,
                      but I haven't encountered the same situation as yours.
                      The number of mapped reads did vary a little, but not as drastically as in your case... Hmm, I don't know why, sorry.

                      > adarshjose
                      I had the same problem before and realized it was because TopHat discarded mate pairs that mapped to different chromosomes when joining the left/right reads mapped by Bowtie.
                      However, TopHat2 has an option called "--report-discordant-pair-alignment" which allows mates to map to different chromosomes,
                      so you should get a higher mapping rate with TopHat2.
                      Hope this helps.

                      zun
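The effect described above can be sketched with a toy pairing check: if only "concordant" pairs are kept, every pair whose mates land on different chromosomes is dropped, even though each mate aligned on its own. This is a simplified model for a forward/reverse library, not TopHat's code; `MateHit`, `classify_pair`, `max_inner` and the coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MateHit:
    """One mate's alignment (hypothetical record type for illustration)."""
    chrom: str
    start: int      # 0-based leftmost aligned position
    end: int        # end of the alignment, exclusive
    reverse: bool   # True if the mate aligned to the reverse strand


def classify_pair(left: MateHit, right: MateHit, max_inner: int) -> str:
    """Simplified pairing check for a forward/reverse library."""
    if left.chrom != right.chrom:
        return "discordant"          # mates on different chromosomes
    first, second = sorted((left, right), key=lambda h: h.start)
    if first.reverse or not second.reverse:
        return "discordant"          # not in the expected orientation
    if second.start - first.end > max_inner:
        return "discordant"          # mates too far apart
    return "concordant"


pairs = [
    (MateHit("chr1", 1000, 1085, False), MateHit("chr1", 1285, 1370, True)),
    (MateHit("chr1", 1000, 1085, False), MateHit("chr5", 40000, 40085, True)),
]
kept = [p for p in pairs if classify_pair(*p, max_inner=500) == "concordant"]
print(len(kept), "of", len(pairs), "pairs kept")  # 1 of 2 pairs kept
```

Both mates of the second pair align, yet a concordant-only filter reports neither, which is consistent with the gap between the separate and paired mapping rates in the first post.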

