Hello
We are analysing SOLiD 5500 RNA-seq reads. I used TopHat2 for the alignment. As a first pass, we ran the analysis on only 2 chromosomes to get a quick result (which is why the number of aligned reads, see below, is very low).
We noticed that some reads go "missing" by the end of the pipeline and we're wondering why.
These are the .info files:
::::::::::::::
left_kept_reads.info
::::::::::::::
min_read_len=50
max_read_len=50
reads_in =25357934
reads_out=25319876
::::::::::::::
right_kept_reads.info
::::::::::::::
min_read_len=35
max_read_len=35
reads_in =25357934
reads_out=25283245
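To see how many reads each side lost between reads_in and reads_out, here is a minimal Python sketch that parses the key=value lines shown above. It assumes that reads_in minus reads_out is the number of reads TopHat's preprocessing did not keep. That reading of the two fields is our assumption, and parse_info is a hypothetical helper, not part of TopHat.

```python
def parse_info(text):
    """Parse key=value lines such as 'reads_in =25357934' into a dict of ints."""
    values = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip separator lines ('::::::') and file-name lines
        key, value = line.split("=", 1)
        values[key.strip()] = int(value.strip())
    return values

# Contents copied from left_kept_reads.info and right_kept_reads.info above
LEFT_INFO = """min_read_len=50
max_read_len=50
reads_in =25357934
reads_out=25319876"""

RIGHT_INFO = """min_read_len=35
max_read_len=35
reads_in =25357934
reads_out=25283245"""

left = parse_info(LEFT_INFO)
right = parse_info(RIGHT_INFO)

# Reads present in reads_in but not in reads_out (assumed: removed by preprocessing)
left_dropped = left["reads_in"] - left["reads_out"]
right_dropped = right["reads_in"] - right["reads_out"]
print(left_dropped, right_dropped)  # 38058 74689
```

So preprocessing alone accounts for only about 113,000 reads. That is far too few to explain the gap.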
And this is the output of samtools flagstat on accepted_hits.bam:
1552361 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
1552361 + 0 mapped (100.00%:-nan%)
1552361 + 0 paired in sequencing
769043 + 0 read1
783318 + 0 read2
549840 + 0 properly paired (35.42%:-nan%)
661184 + 0 with itself and mate mapped
891177 + 0 singletons (57.41%:-nan%)
5522 + 0 with mate mapped to a different chr
5522 + 0 with mate mapped to a different chr (mapQ>=5)
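The flagstat lines can be cross-checked against each other. The minimal Python sketch below checks that read1 + read2 equals the total and that "with itself and mate mapped" + "singletons" equals "mapped", using only the numbers above. The regex assumes the "N + M label" layout shown here and is our own parsing, not a samtools API.

```python
import re

# flagstat output copied from above
FLAGSTAT = """1552361 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
1552361 + 0 mapped (100.00%:-nan%)
1552361 + 0 paired in sequencing
769043 + 0 read1
783318 + 0 read2
549840 + 0 properly paired (35.42%:-nan%)
661184 + 0 with itself and mate mapped
891177 + 0 singletons (57.41%:-nan%)
5522 + 0 with mate mapped to a different chr
5522 + 0 with mate mapped to a different chr (mapQ>=5)"""

def parse_flagstat(text):
    """Map each label to its QC-passed count (the number before ' + ')."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"(\d+) \+ (\d+) (.+)", line)
        if m:
            # drop a trailing percentage group like '(35.42%:-nan%)'
            label = re.sub(r"\s*\([^)]*%[^)]*\)$", "", m.group(3))
            counts[label] = int(m.group(1))
    return counts

counts = parse_flagstat(FLAGSTAT)
total = counts["in total (QC-passed reads + QC-failed reads)"]

print(counts["read1"] + counts["read2"] == total)  # True
print(counts["with itself and mate mapped"] + counts["singletons"] == counts["mapped"])  # True
```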
And 15,435,015 reads are in unmapped.bam.
--> So we have 15,435,015 unmapped + 1,552,361 mapped = 16,987,376 (~17,000,000) reads analysed.
--> We had 25,357,934 + 25,357,934 reads_in = 50,715,868 (~50,000,000) reads available for the analysis.
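Here is the same bookkeeping as a small Python sketch that uses only the numbers quoted above. It compares the accounted-for total against reads_out as well as reads_in, which separates reads dropped before alignment from reads that are unaccounted for afterwards. Treating reads_out as "reads passed on to alignment" is our assumption.

```python
# Numbers copied from the .info files, flagstat and unmapped.bam above
reads_in = {"left": 25357934, "right": 25357934}
reads_out = {"left": 25319876, "right": 25283245}
mapped_records = 1552361   # flagstat "in total" on accepted_hits.bam
unmapped_reads = 15435015  # reads in unmapped.bam

available = sum(reads_in.values())
kept = sum(reads_out.values())
accounted = mapped_records + unmapped_reads

print(f"reads_in total  : {available:,}")             # 50,715,868
print(f"reads_out total : {kept:,}")                  # 50,603,121
print(f"accounted for   : {accounted:,}")             # 16,987,376
print(f"gap vs reads_in : {available - accounted:,}")  # 33,728,492
print(f"gap vs reads_out: {kept - accounted:,}")       # 33,615,745
```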
We're wondering where the other reads are. We expected the number of reads in accepted_hits.bam plus unmapped.bam to add up to reads_in, but that's not the case. If you have any explanation, could you please help us?
Thanks a lot for your time