
**Brian Bushnell** · 02-18-2015, 10:10 PM

A list of all Nextera adapter sequences, complete with bar codes, is packaged with BBTools in fasta format, in the resources directory (nextera.fa.gz).

I recommend using BBDuk rather than Scythe, but of course, I'm biased. Anyway, the adapter sequences will work with any adapter-trimming program.

**cheezemeister** · 02-18-2015, 10:40 PM

Very helpful. Thanks Brian!

Another question. I notice from various reading I have done on this forum and elsewhere that many people do not use the entire adapter/barcode/whatever-contaminant sequence when trimming. Often I see them put just the first 8, 10, 12 bases into their contaminant file. Is there an advantage/disadvantage to using the truncated sequence over the full sequence(s)? Do most trimmers simply look for the specified sequence and trim that and everything after?

EDIT: middle of the night grammar

**Brian Bushnell** · 02-18-2015, 11:50 PM

I am not really sure what most trimmers do, but when looking for a full-sequence match, in the presence of error, you will trim more adapters with an 8bp sequence than a 12bp sequence. Of course, you will also incur more false-positives.

Some people trim as little as 1bp, allowing up to 1 mismatch. That will, of course, shorten all reads by a minimum of 1... and I think that's a bad approach unless a single adapter base is devastating - in which case, I think it's better to reorganize your experiment so that a single adapter base will not be devastating.

BBDuk matches full-length kmers in the middle of the read, and at the very end of the read, when there are fewer than K bases left, it will match kmers from the ends of adapters down to the "mink" setting. So, providing longer adapter sequences is generally advantageous. You can set "mink" to 8bp if you want, which will give similar sensitivity to, but better specificity than, using an 8-bp adapter sequence.
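To make the idea concrete, here is a minimal Python sketch of that two-stage matching. It is not BBDuk's implementation: the function names, the toy adapter string, and the simplification of only checking the adapter's leading bases (rather than every adapter kmer) are assumptions made for illustration. Only the parameter names k, mink and the hamming-distance idea come from the discussion above.

```python
# Toy adapter and insert, for illustration only (not read from nextera.fa.gz).
ADAPTER = "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC"
INSERT = "ACGTTGCA" * 5


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_right(read, adapter, k=23, mink=11, hdist=1):
    """Trim `read` at the first place the adapter appears to start.

    Simplified sketch: only the adapter's leading bases are compared.
    """
    k = min(k, len(adapter))
    head = adapter[:k]

    # Stage 1: full-length kmer anywhere inside the read.
    for pos in range(len(read) - k + 1):
        if hamming(read[pos:pos + k], head) <= hdist:
            return read[:pos]

    # Stage 2: fewer than k bases remain at the read's end, so compare
    # the read's tail against shorter adapter prefixes, down to mink.
    for length in range(min(k - 1, len(read)), max(mink, 1) - 1, -1):
        if hamming(read[-length:], adapter[:length]) <= hdist:
            return read[:-length]

    return read  # no adapter detected


if __name__ == "__main__":
    for n_adapter_bases in (30, 12, 5):
        read = INSERT + ADAPTER[:n_adapter_bases]
        trimmed = trim_right(read, ADAPTER)
        print(n_adapter_bases, len(read), "->", len(trimmed))
```

With these toy inputs, a read carrying 30 or 12 adapter bases is trimmed back to the insert, while a read carrying only 5 adapter bases (below mink) is left alone.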

Generally, though, I recommend 11bp as a minimum for mink - meaning, a match for the last 11bp of a read (with a hamming distance of 0 or 1), and a kmer length of 23 for nonterminal kmers. If you use 8bp adapter sequences and trim wherever you see a match for them, you have a 1/4^8 = 1/2^16 = 1/65536 chance of a spurious match at any given position, even if you require an exact match. That means that for assembly, you will on average not get any contigs longer than 64kbp! Which is terrible.
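The arithmetic behind those numbers can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes uniformly random, independent bases (real genomes are not), and the function name is made up for illustration. For a hamming distance of 1, a kmer of length k has 1 + 3k sequences that count as a match.

```python
def spurious_rate(k, hdist=0):
    """Chance that a random k-base window matches one given kmer.

    Assumes uniform, independent bases. Only hdist 0 or 1 is handled.
    """
    if hdist not in (0, 1):
        raise ValueError("this sketch only handles hdist 0 or 1")
    # Exact match: 1 sequence. One substitution allowed: 1 + 3*k sequences.
    matching_sequences = 1 if hdist == 0 else 1 + 3 * k
    return matching_sequences / 4 ** k


if __name__ == "__main__":
    for k, hdist in [(8, 0), (11, 0), (11, 1), (23, 1)]:
        rate = spurious_rate(k, hdist)
        print(f"k={k:2d} hdist={hdist}: roughly 1 spurious match "
              f"per {1 / rate:,.0f} positions")
```

For k=8 with an exact match this gives one spurious hit per 4^8 = 65,536 positions, which is where the 64kbp figure comes from.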

It depends on what your goal is, though. When looking for super-rare 1/100 rate mutations, trimming adapters as much as possible may be wise, even if you lose data in a biased way.

P.S. I forgot to mention, BBDuk's "tbo" flag will allow you to trim reads with even 1bp of adapter sequence with very little risk of false-positives, by looking for where the reads overlap. It requires paired reads, but it will work even with unknown adapter sequences.
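The reason overlap-based trimming needs no adapter sequence can be shown with a small Python sketch. This is only an illustration of the general idea, not how BBDuk implements "tbo": the function names, the min_insert and max_mismatch parameters, and the toy sequences are all assumptions. When the insert is shorter than the read length, the first `ins` bases of read 1 are the reverse complement of the first `ins` bases of read 2, and everything past that point in either read must be adapter.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_by_overlap(r1, r2, min_insert=20, max_mismatch=0):
    """Trim both reads to the insert length implied by their overlap.

    Tries candidate insert lengths shorter than the read length and
    accepts the longest one where read 1 and the reverse complement of
    read 2 agree. Returns the reads unchanged if none is found.
    """
    n = min(len(r1), len(r2))
    for ins in range(n - 1, min_insert - 1, -1):
        a = r1[:ins]
        b = revcomp(r2[:ins])
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch:
            return r1[:ins], r2[:ins]
    return r1, r2


if __name__ == "__main__":
    # Toy 40-base insert sequenced with 50-base reads, so each read
    # runs 10 bases into (made-up) adapter sequence.
    insert = "ACGTTGCAGGATCCTTAGCAACGTGTCAATGGCCTTAAGC"
    r1 = insert + "CTGTCTCTTA"
    r2 = revcomp(insert) + "CCGAGCCCAC"
    t1, t2 = trim_by_overlap(r1, r2)
    print(len(r1), "->", len(t1), "and", len(r2), "->", len(t2))
```

Because the evidence is agreement across the whole insert rather than a match to a few adapter bases, even a single base of adapter overhang can be removed without the false-positive rate of a very short adapter match.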


Trimming Nextera XT Sequence Data
