I have very complex artifacts at levels of up to 95% in my total RNA-seq data.
Some facts:
- NEBNext Ultra II library prep
- 96 samples per plate, multiple plates
- plates show a clear left-to-right, column-wise increase in artifact levels, and each plate is worse than the one before (plates were prepped in sequence)
- the libraries with the highest artifact rates also had the highest library concentrations, and some had to be diluted before they could be quantified and pooled
- library complexity and fragment size are fine - no overrepresentation of specific sequences
- representation within the pool is fine, each sample produces expected read numbers
- no alignment to other species / lab bacteria / vectors…
- no alignment of shorter sub-sequences (36 bp) of the reads either
- very elevated GC content
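
In case anyone wants to check the plate pattern on their own data, this is roughly how the column-wise trend can be summarised. It is only a minimal sketch: the input is an assumed dict mapping well IDs (like "A1" or "H12") to the artifact fraction of that sample, and the function names are made up for illustration, not taken from any pipeline.

```python
from collections import defaultdict
from statistics import mean


def column_means(artifact_by_well):
    """Average artifact fraction per plate column, from well IDs like 'A1' or 'H12'."""
    by_column = defaultdict(list)
    for well, fraction in artifact_by_well.items():
        column = int(well[1:])  # 'A1' -> 1, 'H12' -> 12 (row letter is dropped)
        by_column[column].append(fraction)
    return {col: mean(vals) for col, vals in sorted(by_column.items())}


def column_slope(means):
    """Least-squares slope of mean artifact fraction against column number.

    Needs at least two distinct columns, otherwise the denominator is zero.
    """
    cols = list(means)
    x_bar = mean(cols)
    y_bar = mean(means.values())
    num = sum((c - x_bar) * (means[c] - y_bar) for c in cols)
    den = sum((c - x_bar) ** 2 for c in cols)
    return num / den
```

A clearly positive slope on every plate, with higher column means on later plates, would match the left-to-right pattern described above.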
Here are my thoughts (bioinformatics background, no hands-on lab experience):
The strong pattern on the 96-well plate suggests to me that a machine is the source of the problem. The machine may have gotten worse over the course of the prep (plate 1 is actually fine, the last plate is largely unusable). The content of the library is largely not of transcriptomic origin, and such content can only be introduced during cDNA generation, right?
-> Could this be due to uneven heat distribution in the PCR block, leading to very wild constructs being generated by random priming? It seems very unlikely to me that this would produce the observed level of complexity.
I am very grateful for any pointers!