  • harryzs
    replied
    I checked unmapped.bam from TopHat 2.0.9 for reads with the QC-fail flag (0x200) set:

    samtools view -f 0x200 unmapped.bam | head

    I got:
    Code:
    HWI-7001436:48:C2ET1ACXX:5:1108:2968:28222	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDD@DDDDBBBDDDD	ZT:A:L
    HWI-7001436:48:C2ET1ACXX:5:1203:5292:62817	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCCTCGTTACA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBB5&)0((+()+((++	ZT:A:L
    HWI-7001436:48:C2ET1ACXX:5:1312:13946:40878	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
    HWI-7001436:48:C2ET1ACXX:5:1203:5920:62936	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBDDDDDD	ZT:A:L
    HWI-7001436:48:C2ET1ACXX:5:1312:14680:40864	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
    HWI-7001436:48:C2ET1ACXX:5:2312:9415:35514	581	*	0	255	*	*	0	0	ATTAAAAAAAAAAAACTCCTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHIII<FHCHIIIIIIHDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
    HWI-7001436:48:C2ET1ACXX:5:1312:14593:40904	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCTCTCTTATAAAC	CCCFFFDFHGHHHIJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDDDDDDDDDDDDBDD<9>&&+((((4(+(((((	ZT:A:L
    HWI-7001436:48:C2ET1ACXX:5:1108:4206:28028	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBBDDD<BDDDDDDDDDDDDDDDDDDDDDDD9	ZT:A:L
    HWI-7001436:48:C2ET1ACXX:5:1203:7475:62973	581	*	0	255	*	*	0	0	AGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBBDDDDDDDDBDDDDDDDBB@DDDDDDB@DBDB95&	ZT:A:L
    HWI-7001436:48:C2ET1ACXX:5:1108:4708:28068	581	*	0	255	*	*	0	0	AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA	CCCFFFFFHHHHHJJJHFDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDBDDDDDDDDDDDDDDDDDDDDDDDDD	ZT:A:L
    I think it makes sense to remove these reads before alignment, right?

    Another question: what is the meaning of "ZT:A:L"?
    Last edited by harryzs; 10-05-2013, 12:44 AM.
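
For anyone reading the second column of the output above: a minimal sketch that decodes the FLAG value 581 into its individual bits. The bit meanings are those of the SAM specification; the helper name `decode_flag` is just an illustration and not part of samtools or TopHat.

```python
# SAM FLAG bits as defined in the SAM specification.
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the names of all FLAG bits set in `flag`, lowest bit first."""
    return [name for bit, name in sorted(SAM_FLAG_BITS.items()) if flag & bit]


# 581 = 0x200 + 0x40 + 0x4 + 0x1
for name in decode_flag(581):
    print(name)
```

So each of these records is a paired, unmapped, first-in-pair read with the QC-fail bit set, which is why `-f 0x200` selects them.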

  • NKAkers
    replied
    Not the answer to your question but...

    I can tell you that the 'discarded' reads end up in unmapped.bam.

    Hopefully future versions of TopHat will allow more user control over, and better documentation of, the quality filtering.

  • carmeyeii
    replied
    About how many might "too many" be?

  • Daehwan Kim
    replied
    TopHat filters out some reads if they are of low complexity or include too many Ns.
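
The two criteria named here (low complexity, too many Ns) can be sketched roughly as below. This is only an illustration of the idea, not TopHat's actual code: the function name `looks_filterable`, the single-base-dominance measure of complexity, and both thresholds are assumptions, since the thread does not state what cutoffs TopHat uses.

```python
from collections import Counter


def looks_filterable(seq, max_n_fraction=0.1, max_single_base_fraction=0.9):
    """Hypothetical pre-alignment check; thresholds are illustrative, not TopHat's.

    Flags a read if too large a share of it is N, or if a single base
    dominates the read (a crude stand-in for "low complexity").
    """
    if not seq:
        return True
    seq = seq.upper()
    counts = Counter(seq)
    n_fraction = counts.get("N", 0) / len(seq)
    # Count of the most common non-N base.
    top = max((c for base, c in counts.items() if base != "N"), default=0)
    dominant_fraction = top / len(seq)
    return (n_fraction > max_n_fraction
            or dominant_fraction > max_single_base_fraction)


print(looks_filterable("A" * 101))                # poly-A read
print(looks_filterable("ACGT" * 25))              # mixed-base read
print(looks_filterable("N" * 20 + "ACGT" * 20))   # read with many Ns
```

Under these assumed thresholds, poly-A reads like the ones quoted from unmapped.bam in this thread would trip the check, while a mixed-base read would pass.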

  • kreitinger
    replied
    I still haven't figured out why these reads are discarded. Since this step happens before alignment to the genome or GTF annotations, it has to be related to discarding low quality reads. I emailed [email protected] with this thread's link, so hopefully they respond.

  • asperjelly
    replied
    I'm also using TopHat v2.0.6 and I had this same question. I'm assuming it is removing reads that don't meet some quality threshold, but I can't find any documentation of this in the manual.

  • AsoBioInfo
    replied
    Has anyone found the answer? The same thing happened to me.

    My TopHat version is v2.0.6. I was previously using the older software, and that worked fine.

  • ROaj
    replied
    I am also VERY interested in this question/answer as I do quite a bit of quality trimming prior to mapping my reads and I've noticed the discarded reads being about 1-2% of my total read library.

  • Tophat reads kept/discarded during initial conversion

    I am using TopHat to analyze Illumina HiSeq 2000 paired-end read data. I have noticed that during the initial execution, TopHat 1 (and 2) "converts the reads" and then sorts the left reads into kept and discarded groups (e.g. 8,000,012 kept, 10,121 discarded) and does the same for the right reads (e.g. 7,804,000 kept, 206,133 discarded). Since the numbers of discarded left and right reads differ, I'm assuming that "lone" mates are treated as single reads.

    My question is: how does TopHat decide which reads to keep and which to discard, and why? Are there some underlying QC filters?
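
As a quick sanity check on the example counts quoted in the question, here is a small sketch that totals kept plus discarded reads per mate file and reports the discarded percentage. The variable and function names are only illustrative, and the counts are the example figures from the post, not measured data.

```python
# Example counts quoted in the question (left and right mate files).
left = {"kept": 8_000_012, "discarded": 10_121}
right = {"kept": 7_804_000, "discarded": 206_133}


def summarize(counts):
    """Return (total reads, percent discarded) for one mate file."""
    total = counts["kept"] + counts["discarded"]
    return total, 100.0 * counts["discarded"] / total


left_total, left_pct = summarize(left)
right_total, right_pct = summarize(right)

print(f"left:  {left_total:,} reads, {left_pct:.2f}% discarded")
print(f"right: {right_total:,} reads, {right_pct:.2f}% discarded")
print("totals match:", left_total == right_total)
```

With these figures both files sum to the same total, which would be consistent with the filter being applied to each mate independently rather than to whole pairs.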
