I'm working on a bacterial data set that I was having difficulty assembling.
Illumina, 300 bp paired-end reads, Nextera library prep.
The FastQC per-base-sequence-content chart (attached) shows strong sequence content bias in the first 15-20 positions. Initially, I thought it was adapter contamination and tried a variety of trimming tools (Trimmomatic, others) to remove what I thought were adapters. I then found a blog post (https://www.instapaper.com/read/496731324) suggesting that this is a library problem due to Nextera kits.
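For anyone who wants to check the same thing without the chart, here is a minimal Python sketch of the per-position base composition that the per-base-sequence-content plot summarizes. The function name `per_base_content` and the `max_pos` cutoff are made up for illustration; this is not FastQC's own code, just the same idea applied to a list of read sequences.

```python
from collections import Counter
from typing import Dict, Iterable, List


def per_base_content(seqs: Iterable[str], max_pos: int = 30) -> List[Dict[str, float]]:
    """Return the fraction of A/C/G/T at each of the first max_pos read positions."""
    counts = [Counter() for _ in range(max_pos)]
    for seq in seqs:
        # Only the start of each read matters for the bias described above.
        for i, base in enumerate(seq[:max_pos].upper()):
            counts[i][base] += 1

    fractions = []
    for c in counts:
        total = sum(c.values())  # includes N calls, if any
        fractions.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return fractions
```

With unbiased reads the four fractions should be roughly flat from one position to the next; in my data they swing widely over the first 15-20 positions and then settle.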
After running the data through Trimmomatic, I used only the paired output (ignoring the unpaired reads for the time being) and then manually trimmed off the first 20 positions from the subset of data that was showing the sequence bias. With that, I was finally able to get a reasonable assembly.
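To be concrete about what clipping the first 20 positions means, here is a minimal Python sketch. It assumes plain four-line FASTQ records; `parse_fastq` and `head_crop` are hypothetical helper names for illustration, not part of Trimmomatic or any other tool.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, not needed here
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def head_crop(records: Iterable[Record], n: int = 20) -> Iterator[Record]:
    """Remove the first n bases, and matching quality values, from each read."""
    for header, seq, qual in records:
        # Reads of length <= n would become empty, so they are dropped.
        # For paired files this would break the pairing, so the mate
        # would have to be dropped as well (not handled in this sketch).
        if len(seq) > n:
            yield header, seq[n:], qual[n:]
```

Usage would look like `head_crop(parse_fastq(open("reads_R1.fastq")), n=20)`, where the file name is only an example. The sequence and quality strings are cut at the same offset so they stay the same length.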
Questions:
1) Does the sequence bias in the first 20 bases point to a problem with the library prep? Or is this typical of Nextera libraries and nothing to worry about?
2) For de novo assembly, is it necessary to trim off the first ~20 bases? Is there a recommended tool or process, rather than just arbitrarily clipping the first 20 bases?
3) I noticed that Trimmomatic separates the reads into those that are still paired and those that are not. For de novo assembly, is there any reason NOT to include the unpaired reads?
Thanks in advance