I have processed several publicly available RNA-seq datasets that I downloaded from NCBI SRA. Most are fine, except these two:
GSM849855: Total RNA extracted from mock infected NIH-3T3 cells; Mus musculus; RNA-Seq
GSM849856: Total RNA extracted from MCMV infected NIH-3T3 cells; Mus musculus; RNA-Seq
I use TopHat to map the RNA-seq data. Normally 4G of raw data produces a BAM file of about 800M, but these two datasets give only 20-30M, even though I always use the same parameters. I ran samtools flagstat on the two BAM files, and the reports are below:
1373321 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
1373321 + 0 mapped (100.00%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
271331 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
271331 + 0 mapped (100.00%:-nan%)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (-nan%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (-nan%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
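A minimal sketch, in Python, of how the two reports above could be reduced to the numbers that matter. As far as I understand, TopHat's accepted_hits.bam normally holds only aligned reads, so "100.00% mapped" is expected and says little by itself; the more telling figure is the mapped count divided by the number of reads in the raw FASTQ. The function names `parse_flagstat` and `mapped_fraction` are made up for this illustration, and the raw read count has to be supplied separately because it is not shown in the reports.

```python
import re

# Matches only the "in total" and "mapped" lines of a flagstat report,
# e.g. "1373321 + 0 mapped (100.00%:-nan%)".
_LINE = re.compile(r"\s*(\d+) \+ (\d+) (in total|mapped) \(")


def parse_flagstat(text):
    """Return one dict per flagstat report found in `text`.

    Several reports may be concatenated; each "in total" line starts a new one.
    Only QC-passed counts are kept.
    """
    reports = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue
        qc_passed = int(match.group(1))
        if match.group(3) == "in total":
            reports.append({"total": qc_passed})
        elif reports:
            reports[-1]["mapped"] = qc_passed
    return reports


def mapped_fraction(mapped, raw_reads):
    """Fraction of raw reads that ended up in the BAM file.

    `raw_reads` is the read count of the input FASTQ (hypothetical input here).
    """
    if raw_reads <= 0:
        raise ValueError("raw_reads must be positive")
    return mapped / raw_reads


if __name__ == "__main__":
    import sys

    # Usage: python flagstat_summary.py < flagstat_reports.txt
    for i, report in enumerate(parse_flagstat(sys.stdin.read()), start=1):
        print(i, report)
```

If the resulting fraction is very low for these two samples but not for the others, that would point at the input reads (length, quality encoding, adapters) rather than at the TopHat parameters.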
What could be wrong with these two datasets? Please help me out!
(The title does not mean that these data were generated in my lab; I downloaded them from NCBI and processed them myself.)
Sorry for the inaccuracy.