Hello,
I have been looking at alignments of Illumina RNA-seq reads from a library that preserved both polyA+ and polyA- transcripts. As expected, the majority (~80%) of the reads appear to come from rRNA (18S, 28S) fragments. When I map these reads to the rRNA sequences (18S is around 1800 bp, 28S is 5500 bp), I get an extremely uneven distribution of reads. At the large scale, there are some relatively large regions with very few reads compared to other regions with many. At the level of exact mapping positions, even in regions with large numbers of reads, the reads are not evenly distributed (or even roughly Poisson distributed); instead, many reads pile up at a specific base position, which might have 10X the number of alignments of a neighboring base.
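To make "not Poisson" and "10X a neighboring base" concrete, here is a minimal sketch of how the pile-up could be quantified. It assumes the read start positions have already been pulled out of the alignment as 0-based integers (SAM POS is 1-based, so subtract 1). The function names, the `fold` and `min_reads` thresholds, and the toy data are all hypothetical choices for illustration, not part of any existing tool.

```python
from collections import Counter


def start_site_counts(starts, ref_len):
    """Count how many reads begin at each reference position (0-based)."""
    tally = Counter(starts)
    return [tally.get(pos, 0) for pos in range(ref_len)]


def dispersion_index(counts):
    """Variance-to-mean ratio of the counts.

    Roughly 1 if read starts are Poisson distributed along the reference;
    much larger than 1 when reads pile up at particular positions.
    """
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        return 0.0
    variance = sum((c - mean) ** 2 for c in counts) / n
    return variance / mean


def pileup_sites(counts, fold=10, min_reads=20):
    """Positions with at least `fold` times the reads of both neighbours.

    The neighbour count is floored at 1 so empty neighbours do not make
    every low-coverage position look like a peak; `min_reads` is an
    arbitrary (assumed) floor on the peak itself.
    """
    sites = []
    for i in range(1, len(counts) - 1):
        neighbour = max(counts[i - 1], counts[i + 1], 1)
        if counts[i] >= min_reads and counts[i] >= fold * neighbour:
            sites.append(i)
    return sites


# Toy example: 50 reads starting at position 5, a handful elsewhere.
toy_starts = [5] * 50 + [4] * 2 + [6] * 3 + [10]
toy_counts = start_site_counts(toy_starts, ref_len=12)
toy_peaks = pileup_sites(toy_counts)
toy_dispersion = dispersion_index(toy_counts)
```

Running this on the 18S and 28S start positions separately would give one number per reference for how far the distribution is from Poisson, plus a list of the specific positions where the sharp peaks sit, which could then be compared against local GC content or known fragmentation sites.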
The overall unevenness I can perhaps understand (degradation?), but the more local drastic peaks and valleys I find more difficult to explain. Some possibilities appear to be sequencing bias (GC bias), or differential PCR amplification. Any ideas from users with more experience than myself would be greatly appreciated.
Also, if anyone is aware of any human sequencing data (publicly available) where the PolyA Minus fraction has been maintained - which I can look at for comparison - this would be very helpful.
Thanks for any ideas.