Hi all,
I've been using bam2fastq on my TopHat output and it's been great and runs really quickly, except for the number of reads being discarded. For example, this is what it printed for one of my TopHat output files:
This looks like paired data from lane 239.
Output will be in x_1.fastq and x_2.fastq
60465861 sequences in the BAM file
60465861 sequences exported
WARNING: 6585459 reads could not be matched to a mate and were not exported
That's about 11% of the reads being discarded (6585459 of 60465861), and in other files it's even more: I ran it on another file just now and 17% of the reads were discarded. What I don't understand is why. The PE files that went into TopHat were quality filtered with software that handles paired-end data directly, so both files have the same number of reads, every read has a mate, and both files are in the same order (TopHat freaks out otherwise). If any read didn't have a mate, TopHat would have returned an error. So why is bam2fastq discarding these reads?
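To see whether the mates are actually missing from the BAM itself, something like the following could be used. It is only a minimal sketch: it assumes the BAM has been dumped to SAM text (for example with `samtools view`), that both mates share the same read name, and that the first/second-in-pair flag bits (0x40 and 0x80) are set correctly. The function name `count_unpaired` is just made up for illustration.

```python
from collections import defaultdict


def count_unpaired(sam_lines):
    """Count read names in SAM text for which only one mate is present.

    Returns (number_of_distinct_read_names, number_missing_a_mate).
    Assumes both mates of a pair carry the same QNAME.
    """
    ends_seen = defaultdict(set)  # read name -> set of mate numbers seen
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:  # skip blank or malformed lines
            continue
        qname, flag = fields[0], int(fields[1])
        if flag & 0x40:  # first in pair
            ends_seen[qname].add(1)
        elif flag & 0x80:  # second in pair
            ends_seen[qname].add(2)
    missing = sum(1 for ends in ends_seen.values() if len(ends) < 2)
    return len(ends_seen), missing
```

In a script this could be called as `count_unpaired(sys.stdin)` with the output of `samtools view` piped in. If the second number is close to the count in the bam2fastq warning, the mates really are absent from the file rather than being dropped by bam2fastq.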