Hey all,
Recently, our lab has been generating RNA-Seq and ChIP-Seq libraries that contain a very high proportion of reads with a poly(TA) sequence. In some libraries (ranging from 10 to 100 million reads), up to 99% of the reads are TA repeats. Over the past weeks this became such a problem that we have completely shut down our sequencing facility until we find out where these reads come from.
There is no apparent pattern in which samples are contaminated. They come from different species and were prepared with different protocols and different starting amounts. Even among samples processed in parallel, contamination appears to be random.
Some examples of poly(TA) reads after adapter trimming:
@NS500173:204:HHKCTBGXY:1:11101:21054:1060 1:N:0:CAAGAC
TATATNTATATATATATATATATATAAATATAAATATATATAA
+
AAAAA#AEEEEEEEEEAE/A///A<E///E/E///E6E/A/A/
@NS500173:204:HHKCTBGXY:1:11101:10020:1060 1:N:0:CAAGAC
ATATANATATATATATATATATATATATAGATATATATATAGA
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEE/6
@NS500173:204:HHKCTBGXY:1:11101:12697:1060 1:N:0:CAAGAC
ATATANATATATATATATATATATATATATATATATATATATA
+
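To put a consistent number on the contamination across libraries, a quick script along the following lines could be used. It is only a minimal sketch, not a validated QC tool. The metric (fraction of adjacent base pairs that are "TA" or "AT"), the 0.8 cutoff, and the function names are all assumptions chosen for illustration. With this metric, the example reads above score roughly 0.83 to 0.95, so they would be flagged, while ordinary mixed-sequence reads score far lower.

```python
import gzip
import sys
from typing import Iterable, Iterator, TextIO, Tuple


def open_fastq(path: str) -> TextIO:
    # Handle both plain and gzip-compressed FASTQ files in text mode
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_sequences(handle: TextIO) -> Iterator[str]:
    # FASTQ records are 4 lines: header, sequence, '+' separator, qualities
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality string
        yield seq


def ta_fraction(seq: str) -> float:
    """Fraction of adjacent base pairs that are 'TA' or 'AT' (assumed metric)."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    hits = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] in ("TA", "AT"))
    return hits / (len(seq) - 1)


def count_ta_reads(sequences: Iterable[str], threshold: float = 0.8) -> Tuple[int, int]:
    # Returns (number of reads flagged as poly(TA), total number of reads)
    total = flagged = 0
    for seq in sequences:
        total += 1
        if ta_fraction(seq) >= threshold:
            flagged += 1
    return flagged, total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ta_check.py reads.fastq.gz [threshold]
    cutoff = float(sys.argv[2]) if len(sys.argv) > 2 else 0.8
    with open_fastq(sys.argv[1]) as fh:
        n_flagged, n_total = count_ta_reads(read_sequences(fh), cutoff)
    pct = 100.0 * n_flagged / n_total if n_total else 0.0
    print(f"{n_flagged}/{n_total} reads ({pct:.1f}%) flagged as poly(TA)")
```

Running it on R1 and R2 of each library separately would show whether both mates carry the repeat.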
For sample preparation we typically start with 2 ng of cDNA or ChIPed DNA, perform A-tailing and adapter ligation with 28 nM Illumina adapters from BioScientific (adapters 1 to 48), and clean up with 0.8X AMPure beads. The ligated DNA is then amplified with KAPA HiFi HotStart ReadyMix for 10 PCR cycles with annealing at 60 °C. PCR purification is done with a QIAquick MinElute column, and the correct library size (typically 300 bp) is selected with an E-Gel iBase Power System. For quality control, we check the library size on a Bioanalyzer and verify the expected expression/enrichment by qPCR. If these checks pass, we sequence 50 bp paired-end on a NextSeq.
Has anyone seen this problem before, or does anyone have an idea how this could have happened?
Thanks in advance!
Rik