  • TopHat reads kept/discarded during initial conversion

    I am using TopHat to analyze Illumina HiSeq 2000 paired-end read data. I have noticed that during the initial step, TopHat 1 (and 2) "converts the reads" and then sorts the left reads into kept and discarded groups (e.g. 8,000,012 kept, 10,121 discarded) and does the same for the right reads (e.g. 7,804,000 kept, 206,133 discarded). Since the left and right files have different numbers of discarded reads, I'm assuming that "lone" mates are treated as single reads.

    My question is: how does TopHat decide which reads to keep and which to discard, and why? Are there some underlying QC filters?
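A quick check on the example counts in this post, as a small Python sketch (the variable and function names are illustrative only): both mate files add up to the same total of 8,010,133 reads, so the left and right files differ only in how many reads were discarded, not in how many went in.

```python
# Kept/discarded counts quoted in the post above.
left_kept, left_discarded = 8_000_012, 10_121
right_kept, right_discarded = 7_804_000, 206_133


def discard_rate(kept: int, discarded: int) -> float:
    """Return discarded reads as a percentage of all reads in one mate file."""
    return 100.0 * discarded / (kept + discarded)


# Total reads that went into the conversion step for each mate file.
left_total = left_kept + left_discarded
right_total = right_kept + right_discarded

print(f"left:  {left_total:,} reads, {discard_rate(left_kept, left_discarded):.3f}% discarded")
print(f"right: {right_total:,} reads, {discard_rate(right_kept, right_discarded):.3f}% discarded")
```

With these numbers roughly 0.13% of the left reads are discarded versus about 2.6% of the right reads.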

  • #2
    I am also VERY interested in the answer, as I do quite a bit of quality trimming before mapping my reads, and I've noticed that the discarded reads make up about 1-2% of my total read library.



    • #3
      Hey, has anyone found the answer? The same thing is happening to me.

      My TopHat version is v2.0.6. I was previously using the older version and it was working fine.



      • #4
        I'm also using TopHat v2.0.6 and had the same question. I'm assuming it removes reads that don't meet some quality threshold, but I can't find any documentation of this in the manual.



        • #5
          I still haven't figured out why these reads are discarded. Since this step happens before alignment to the genome or GTF annotations, it has to be related to discarding low quality reads. I emailed [email protected] with this thread's link, so hopefully they respond.



          • #6
            TopHat filters out some reads if they are of low complexity or contain too many Ns.
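TopHat's exact criteria are not stated in this thread, so the following is only a sketch of what a complexity-and-N prefilter can look like, not TopHat's implementation. The thresholds `MAX_N_FRACTION` and `MIN_ENTROPY_BITS`, and the use of Shannon entropy of the base composition as the complexity measure, are assumptions chosen for illustration. A pure homopolymer, like several of the poly-A reads posted later in this thread, has zero entropy and is rejected.

```python
import math
from collections import Counter

# HYPOTHETICAL thresholds for illustration only; these are NOT TopHat's
# actual (undocumented) filter settings.
MAX_N_FRACTION = 0.1      # reject reads where more than 10% of bases are N
MIN_ENTROPY_BITS = 1.0    # reject reads whose base composition is too skewed


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the A/C/G/T composition, ignoring N."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def keep_read(seq: str) -> bool:
    """Return True if the read passes both hypothetical filters."""
    if not seq:
        return False
    n_fraction = seq.upper().count("N") / len(seq)
    if n_fraction > MAX_N_FRACTION:
        return False  # too many ambiguous bases
    return shannon_entropy(seq) >= MIN_ENTROPY_BITS  # low complexity check


reads = {
    "poly_a": "A" * 101,                      # homopolymer, entropy 0
    "mostly_n": "N" * 20 + "ACGT" * 20,       # 20% N
    "normal": "ACGTTGCAAGCTTAGCCATGGATCCGTA", # balanced composition
}
for name, seq in reads.items():
    print(name, keep_read(seq))  # poly_a False, mostly_n False, normal True
```

Entropy is only one possible complexity measure; other tools use dust scores or k-mer diversity instead, so the exact reads caught would differ.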



            • #7
              About how many might "too many" be?



              • #8
                Not the answer to your question but...

                I can tell you that the 'discarded' reads end up in unmapped.bam.

                Hopefully future versions of TopHat will allow more user control over, and better documentation of, the quality filtering.
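If the discarded reads do end up in unmapped.bam, one way to count them is to convert the BAM to SAM text with `samtools view` and tally records whose FLAG has the 0x200 bit set ("not passing filters, such as platform/vendor quality controls" in the SAM specification). This sketch assumes TopHat marks its pre-alignment discards with that bit, as the `-f 0x200` output later in this thread suggests; the script name `count_qcfail.py` is hypothetical.

```python
import sys

QC_FAIL = 0x200  # SAM FLAG bit: not passing filters / quality controls
UNMAPPED = 0x4   # SAM FLAG bit: segment unmapped


def tally(sam_lines):
    """Count unmapped records, and how many of them carry the QC-fail bit."""
    unmapped = qc_fail = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip SAM header lines and blank lines
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        if flag & UNMAPPED:
            unmapped += 1
            if flag & QC_FAIL:
                qc_fail += 1
    return unmapped, qc_fail


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:
    #   samtools view unmapped.bam > unmapped.sam
    #   python count_qcfail.py unmapped.sam
    with open(sys.argv[1]) as handle:
        unmapped, qc_fail = tally(handle)
    print(f"unmapped records: {unmapped}")
    print(f"of which QC-fail (0x200): {qc_fail}")
```

Comparing the QC-fail count with the "discarded" totals TopHat prints during conversion would show whether the two actually match for a given run.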



                • #9
                  I checked unmapped.bam from TopHat 2.0.9 with:
                  Code:
                  samtools view -f 0x200 unmapped.bam | head

                  I got:
                  Code:
                  HWI-7001436:48:C2ET1ACXX:5:1108:2968:28222	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDD@DDDDBBBDDDD	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1203:5292:62817	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCCTCGTTACA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBB5&)0((+()+((++	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1312:13946:40878	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1203:5920:62936	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBDDDDDD	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1312:14680:40864	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:2312:9415:35514	581	*	0	255	*	*	0	0	ATTAAAAAAAAAAAACTCCTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHIII<FHCHIIIIIIHDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1312:14593:40904	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCTCTCTTATAAAC	CCCFFFDFHGHHHIJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDDDDDDDDDDDDBDD<9>&&+((((4(+(((((	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1108:4206:28028	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBBDDD<BDDDDDDDDDDDDDDDDDDDDDDD9	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1203:7475:62973	581	*	0	255	*	*	0	0	AGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBDDDDDDDDBDDDDDDDBB@DDDDDDB@DBDB95&	ZT:A:L
                  HWI-7001436:48:C2ET1ACXX:5:1108:4708:28068	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
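Every record above has FLAG 581. Decoding it bit by bit against the SAM specification shows that it combines 0x1 (paired), 0x4 (unmapped), 0x40 (first mate) and 0x200 (fails quality checks), which is why `-f 0x200` selected these reads. A minimal Python sketch of that decoding (the dictionary and function names are illustrative):

```python
# Bit meanings follow the FLAG field definition in the SAM format specification.
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> list:
    """Return descriptions of every bit set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


for description in decode_flag(581):
    print(description)
# read paired
# read unmapped
# first in pair
# read fails platform/vendor quality checks
```

Any other FLAG value from `samtools view` output can be decoded the same way; newer samtools releases also provide a `samtools flags` subcommand for this.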
                  I think it makes sense to remove these reads before alignment.

                  Right??

                  Another question:
                  what is the meaning of "ZT:A:L"?
                  Last edited by harryzs; 10-05-2013, 12:44 AM.

