We've completed several bacterial RNA-seq experiments but have many unmapped reads left over, and I am not sure why.
Experimental details:
- RNA was isolated using a TRIzol protocol, and 1-2 rounds of MICROBExpress were used to deplete the rRNA
- All samples were checked using an Agilent Bioanalyzer for quality and rRNA depletion
Sequencing details:
- All library prep and sequencing was performed by a core facility. Samples were individually barcoded and sequenced (50bp single-end) on a single SOLiD 4 run
Mapping outcome:
- We're currently using bowtie to map reads in colorspace against a reference. For a pure culture with a complete reference genome, only approx. 20-50% of reads map.
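To double-check the mapped percentage independently of the mapper's own summary, the SAM output can be tallied directly. This is a minimal sketch, assuming an uncompressed SAM file read as text lines; `mapped_fraction` is a hypothetical helper, not part of bowtie. It relies only on the SAM FLAG field, where bit 0x4 marks a read as unmapped.

```python
def mapped_fraction(sam_lines):
    """Return (mapped, total) read counts from an iterable of SAM text lines."""
    mapped = 0
    total = 0
    for line in sam_lines:
        # Header lines start with '@' and carry no alignment record.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the bitwise FLAG
        # Skip secondary (0x100) and supplementary (0x800) records
        # so that each read is counted once.
        if flag & 0x900:
            continue
        total += 1
        if not flag & 0x4:  # bit 0x4 set means "segment unmapped"
            mapped += 1
    return mapped, total

# Hypothetical usage (the file name is an assumption):
# with open("reads.sam") as handle:
#     mapped, total = mapped_fraction(handle)
#     print(f"{mapped}/{total} reads mapped ({100.0 * mapped / total:.1f}%)")
```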
What I have tried:
- Trimming reads (tried removing up to 15 bases) - only 1-3% more mappings
- SAET to correct colorspace - worse mapping
- Other mappers (cufflinks and SHRiMP are the ones I can remember) - no improvement, or lower mapping
- Transcript assembly (Velvet + Oases) - N50 was ridiculously low and most of the "large" contigs didn't match anything in the NCBI nrdb
- Converting to basespace for mapping (I think I tried a script that came with MAQ) - miserable failure
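For reference, the read trimming in the list above can be sketched as follows. This assumes reads in csfasta form (a leading primer base followed by color calls 0-3) and that the trimming was done from the 3' end; the post does not say which end was trimmed. `trim_csfasta_read` is a hypothetical helper, not a function from any of the tools named here.

```python
def trim_csfasta_read(read, n_colors, min_colors=1):
    """Trim n_colors color calls from the 3' end of a read such as 'T0120312'.

    The first character is the primer base and is always kept; at least
    min_colors color calls are kept when the read has that many.
    """
    if n_colors < 0:
        raise ValueError("n_colors must be non-negative")
    primer, colors = read[0], read[1:]
    keep = max(min_colors, len(colors) - n_colors)
    return primer + colors[:keep]
```

Trimming from the 5' end is not a plain slice in colorspace: each color encodes a transition from the previous base, so the leading base would have to be recomputed before the shortened read could be interpreted correctly.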
A side issue:
- The bowtie SAM output prints a basespace sequence even for reads that have no hits to the reference. How does bowtie get this sequence if there is no matching reference? I've examined these unmapped reads and they don't match anything in the NCBI database.
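One possible explanation, which is an assumption and not something confirmed from the bowtie documentation: a colorspace read can be translated to basespace without any reference, because each SOLiD read starts with a known primer base and every color encodes the transition to the next base. The sketch below shows that naive decoding under the standard two-base encoding (A=0, C=1, G=2, T=3, color = XOR of adjacent base codes). `colorspace_to_basespace` is a hypothetical helper.

```python
BASES = "ACGT"  # position in this string is the 2-bit code: A=0, C=1, G=2, T=3


def colorspace_to_basespace(read):
    """Decode a colorspace read such as 'T0120' without using a reference.

    The leading primer base is used only as the starting point and is not
    part of the returned sequence.
    """
    prev = BASES.index(read[0])
    decoded = []
    for color in read[1:]:
        if color not in "0123":
            # A missing call ('.') leaves every downstream base undetermined.
            decoded.append("N" * (len(read) - 1 - len(decoded)))
            break
        prev ^= int(color)  # next base code = previous base code XOR color
        decoded.append(BASES[prev])
    return "".join(decoded)
```

With this scheme a single miscalled color changes every base downstream of it: `T0120` decodes to `TGAA`, while `T0320` decodes to `TAGG`. That may be why naively converted reads, and possibly the sequences printed for unmapped reads, fail to match anything in a database search.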
The only thing I can think of now is that the library preps were bad?
Thanks for any thoughts and input!