We are using a slightly modified version of the following protocol to generate strand-specific RNAseq libraries for Arabidopsis thaliana:
It is a dUTP method with a UDG digestion of the 2nd strand before PCR amplification.
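A rough way to connect the UDG step to strandedness numbers is a toy model. This model is our own assumption and is not part of the protocol. A fragment whose dUTP-marked second strand is removed yields reads only in the first-strand orientation. A fragment that keeps both strands is assumed to yield both orientations equally. The function names and the single `udg_efficiency` parameter are hypothetical, and the model ignores every other possible source of wrong-strand reads.

```python
def expected_antisense_fraction(udg_efficiency: float) -> float:
    """Toy model (hypothetical): expected fraction of reads in the
    dUTP / fr-firststrand orientation.

    Assumes a fraction `udg_efficiency` of fragments lose their second strand
    (only first-strand orientation amplified), and the rest keep both strands,
    which are assumed to amplify equally (50/50 orientation).
    """
    if not 0.0 <= udg_efficiency <= 1.0:
        raise ValueError("udg_efficiency must be between 0 and 1")
    return udg_efficiency + (1.0 - udg_efficiency) * 0.5


def implied_udg_efficiency(antisense_fraction: float) -> float:
    """Invert the toy model: removal efficiency implied by an observed fraction."""
    if not 0.5 <= antisense_fraction <= 1.0:
        raise ValueError("fraction must be between 0.5 and 1 for this model")
    return 2.0 * antisense_fraction - 1.0


# Print the expected orientation fraction for a few hypothetical efficiencies.
for eff in (1.0, 0.9, 0.5, 0.0):
    print(f"efficiency {eff:.1f} -> {expected_antisense_fraction(eff):.2f}")
```

Under these assumptions, an observed fraction of about 0.6 corresponds to an implied efficiency of only about 0.2. That result would point toward the second-strand marking and removal steps rather than toward mapping, if the model's assumptions hold.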
We ran a test lane of 24 multiplexed libraries, and everything looked OK (~90% of read pairs mapped with tophat2 using --library-type fr-firststrand). But RSeQC's infer_experiment.py finds that only ~60% of reads are oriented the right way. Here's an example output:
This is PairEnd Data
Fraction of reads failed to determine: 0.0073
Fraction of reads explained by "1++,1--,2+-,2-+": 0.3922
Fraction of reads explained by "1+-,1-+,2++,2--": 0.6004
The statistics are similar for the other libraries.
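To compare libraries consistently, the output above can be summarized with a small parser. This is a sketch that assumes the exact line format shown here. It also assumes the common convention that "1+-,1-+,2++,2--" corresponds to dUTP / fr-firststrand and "1++,1--,2+-,2-+" to fr-secondstrand. The function name `summarize_infer_experiment` is hypothetical. Computing "specificity" over determined reads only, leaving out the "failed to determine" fraction, is our own choice.

```python
import re

# Assumed mapping from RSeQC paired-end pattern strings to TopHat2 library types.
LIBRARY_TYPE = {
    "1+-,1-+,2++,2--": "fr-firststrand",
    "1++,1--,2+-,2-+": "fr-secondstrand",
}

# Matches lines like: Fraction of reads explained by "1++,1--,2+-,2-+": 0.3922
LINE_RE = re.compile(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)')


def summarize_infer_experiment(text: str) -> dict:
    """Parse infer_experiment.py output and summarize strandedness."""
    fractions = {m.group(1): float(m.group(2)) for m in LINE_RE.finditer(text)}
    if set(fractions) != set(LIBRARY_TYPE):
        raise ValueError("expected both paired-end pattern lines")
    total = sum(fractions.values())  # determined reads only
    dominant = max(fractions, key=fractions.get)
    return {
        "dominant_pattern": dominant,
        "library_type": LIBRARY_TYPE[dominant],
        "specificity": fractions[dominant] / total,
    }


example = """This is PairEnd Data
Fraction of reads failed to determine: 0.0073
Fraction of reads explained by "1++,1--,2+-,2-+": 0.3922
Fraction of reads explained by "1+-,1-+,2++,2--": 0.6004
"""

summary = summarize_infer_experiment(example)
print(summary["library_type"], round(summary["specificity"], 3))
# prints: fr-firststrand 0.605
```

For the example output, the dominant pattern matches the fr-firststrand setting used for mapping. However, it explains only about 60% of determined reads, whereas a library that kept its strand specificity would be expected to be skewed much more strongly toward one pattern.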
Is there a likely place where our protocol may have gone wrong and lost the strand specificity?
Our protocol has a couple of modifications:
-we are using SuperScript III instead of SuperScript II
-we are using the inline barcodes from this protocol: Kumar et al. 2012
Thanks.