**Brian Bushnell** · 07-16-2014, 12:21 PM

Nextera libraries show highly nonuniform base composition in the first ~20 bp, but that is neither adapter sequence nor sequencing error; it is just a fragmentation-site bias. You don't need to trim it. If you did want to trim it, though, the only way would be to trim the first X bases from every read.
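For anyone who does want to try it, here is a minimal sketch of what "trim the first X bases" means for a single FASTQ record. The function name `trim_leading_bases` and the tuple layout are illustrative assumptions, not part of any particular tool; real trimmers do the same thing on whole files.

```python
def trim_leading_bases(record, n):
    """Remove the first n bases (and their quality scores) from one FASTQ record.

    record is a (header, sequence, quality) tuple of strings (assumed layout).
    """
    header, seq, qual = record
    # Sequence and quality must be cut at the same position to stay in sync.
    return header, seq[n:], qual[n:]


# Hypothetical 30 bp read; drop the first 20 positions.
read = ("@read1", "ACGTACGTACGTACGTACGTTTTTGGGGCC", "I" * 30)
trimmed = trim_leading_bases(read, 20)
print(trimmed[1])  # TTTTGGGGCC
```

Because the bias is positional rather than sequence-specific, a fixed-length cut is the only option; there is no adapter-style motif to search for.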

For assembly, if you use a pair-aware assembler and have sufficient data, it's best to assemble from paired reads. Some assemblers allow you to specify both paired and unpaired reads in the same assembly, in which case you could use both. But if the assembler only allows you to give it paired OR unpaired reads, it's probably best to give it the paired reads only, rather than mixing all the reads together, which would require you to run the data as unpaired. There is no strict answer that will be correct for all assemblers, as they make use of pairing data differently, or possibly not at all.

**dave1** · 07-17-2014, 07:55 AM

Thanks for your help Brian.

Your feedback that it isn't necessary to trim the first 15-20 bases due to fragmentation site bias led me to revisit my QC results.

Another Question: Would you be willing to comment on the quality of the reverse read? Would you consider this a good run? ok run? Do you typically see the large quality range in the first few bases of the reverse read? The lab is tuning its protocols. Does this point to anything that might need to get changed?

Adding this in case it helps others in the future.

Working with Illumina Nextera-prepped, paired-end 300 bp reads.

I have typically been taking a quick glance at the FastQC results. If the results looked good, I didn't bother with trimming/filtering the data before de novo assembly. (I was relying on the assembler to leverage quality score information.)

However, when I tried to assemble the data, the assemblies (using a variety of assemblers) were all terrible (thousands of small contigs). Mapping results looked fine.

I was able to get a good assembly after running the data through Trimmomatic first. As Brian suggested, it is not necessary to trim off the first 15-20 bases due to fragmentation site bias...

**Brian Bushnell** · 07-17-2014, 09:35 AM

I have never worked with 2x300bp data; so far, we only go up to 2x250. So I'm not sure how typical the quality of the last bases on read 2 is, but it certainly looks like it should be trimmed. And overall the quality variability for read 2 seems higher than it should be, but I don't work on the wet-lab side, so I'm not sure what it might indicate.

If you have plenty of data, you might experiment with throwing away reads with average quality below some threshold (or specifically, pairs in which either read is below the threshold), and see if that improves your assembly.
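A minimal sketch of that pair-level filter, assuming Phred+33 quality strings and reads held as (header, sequence, quality) tuples. The names `mean_phred` and `filter_pairs` and the threshold are hypothetical; the right cutoff is something to find by experiment.

```python
def mean_phred(qual, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_pairs(pairs, min_mean_q):
    """Keep only pairs in which BOTH reads reach the mean-quality threshold.

    pairs is an iterable of (read1, read2), each read a
    (header, sequence, quality) tuple.
    """
    kept = []
    for r1, r2 in pairs:
        # Discard the whole pair if either mate falls below the threshold,
        # so the output stays properly paired for the assembler.
        if mean_phred(r1[2]) >= min_mean_q and mean_phred(r2[2]) >= min_mean_q:
            kept.append((r1, r2))
    return kept


# Toy data: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
good = ("@a", "ACGT", "IIII")
bad = ("@b", "ACGT", "####")
print(len(filter_pairs([(good, good), (good, bad)], min_mean_q=20)))  # 1
```

Dropping the pair rather than the single bad read keeps the two files in sync, which matters for pair-aware assemblers.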

**GenoMax** · 07-17-2014, 03:02 PM

Since FastQC groups bases into larger intervals when plotting long reads, it is difficult to see what may be going on with R2. You could turn off the interval grouping on the command line and see if the tail end of R2 truly requires major trimming or throwing away the reads.
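FastQC has a `--nogroup` option for this. As an independent cross-check, a per-base (unbinned) mean quality profile can also be computed directly. This is a rough sketch assuming Phred+33 quality strings; `per_position_mean_quality` is an illustrative name, not a FastQC function.

```python
def per_position_mean_quality(quals, offset=33):
    """Mean Phred score at every read position, with no binning.

    quals is an iterable of quality strings (Phred+33 assumed); reads may
    differ in length, so each position is averaged over the reads that reach it.
    """
    totals = []
    counts = []
    for q in quals:
        for i, c in enumerate(q):
            if i == len(totals):
                # First time a read extends this far: open a new position.
                totals.append(0)
                counts.append(0)
            totals[i] += ord(c) - offset
            counts[i] += 1
    return [t / n for t, n in zip(totals, counts)]


# Toy example: 'I' = Q40, '5' = Q20, '#' = Q2.
print(per_position_mean_quality(["II#", "I5"]))  # [40.0, 30.0, 2.0]
```

Looking at the last few dozen positions of the R2 profile shows exactly where quality collapses, which binned plots can smear out.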

If this is a bacterial genome I would suggest trying SPAdes, if you have not already done so.

**avo** · 07-21-2014, 12:20 AM

In my experience the FastQC quality plots look similar to what we see with TruSeq libraries.
However, I always trim for adapters and quality.
Especially with Nextera, bead-based size selection, and 2x300bp reads, you might end up with some adapter sequences in your read data.

Do you do the trimming on the MiSeq directly or separately afterwards? To get a feel for the adapter contamination I would recommend turning off the adapter trimming function on the MiSeq.
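With untrimmed reads in hand, a quick estimate of adapter contamination could look like the sketch below. It assumes the commonly used Nextera transposase adapter sequence and only counts exact matches to its first 12 bases, so it will undercount reads with errors in the adapter or with only a short adapter fragment at the 3' end. `adapter_fraction` and `min_match` are hypothetical names.

```python
# Assumption: the Nextera transposase adapter sequence commonly used for trimming.
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"


def adapter_fraction(seqs, adapter=NEXTERA_ADAPTER, min_match=12):
    """Fraction of reads containing an exact match to the adapter's first
    min_match bases. Exact matching only; mismatches are not tolerated."""
    seed = adapter[:min_match]
    seqs = list(seqs)
    if not seqs:
        return 0.0
    hits = sum(1 for s in seqs if seed in s)
    return hits / len(seqs)


# Toy reads: one with adapter read-through, one without.
reads = ["ACGTACGT" + NEXTERA_ADAPTER + "GGGG", "ACGTACGTACGTACGT"]
print(adapter_fraction(reads))  # 0.5
```

A dedicated trimmer does this far more sensitively; the point here is only to get a rough number before deciding how aggressively to trim.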

Concerning the first 20 bp, I agree with Brian; it looks the same for the Nextera libraries we have sequenced so far.

Illumina Nextera Pair-End Sequence Content Bias-Require trimming for DeNovo Assembly?
