**GenoMax** · 05-19-2015, 02:18 PM

Have you scanned this data (with a trimming program) to see how much adapter dimers or read-through it has? Did FastQC indicate this as a possibility?

**cheezemeister** · 05-19-2015, 02:50 PM

> Originally posted by GenoMax:
>
> Have you scanned this data (with a trimming program) to see how much adapter dimers or read-through it has? Did FastQC indicate this as a possibility?

Haven't done that, however adapters are trimmed at source by the MiSeq. I haven't quality-trimmed the data yet since everything I've read says that merging first is the preferred method.

Not sure why I would have read-through on a 460 bp amplicon using a 300 bp read.

I can run FastQC and see.

**GenoMax** · 05-19-2015, 03:04 PM

> Originally posted by cheezemeister:
>
> Haven't done that, however adapters are trimmed at source by the MiSeq. I haven't quality-trimmed the data yet since everything I've read says that merging first is the preferred method.
>
> Not sure why I would have read-through on a 460 bp amplicon using a 300 bp read.
>
> I can run FastQC and see.

Wasn't asking about quality trimming. You certainly want to merge first and then trim (if needed, for quality). Since we don't use onboard MiSeq analysis, I tend to forget that adapters may have already been trimmed. In that case you probably no longer have uniform 300 bp reads: trimmed reads could be short and will overlap more than you expect them to. FastQC will tell you about the size spread.

Give BBMerge a try as well (from BBMap).

**cheezemeister** · 05-19-2015, 03:25 PM

Just selecting a representative file, FastQC reports my sequence length as 35-300 bp, though 70% are 300 bp and pretty much 100% are >280 bp.

Since setting max-overlap to 159 eliminates the error, and increasing it beyond that does not increase the % merged, that seems to jibe with ~100% of reads being 280 bp or longer.
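For reference, the overlap arithmetic being discussed can be sketched in a few lines of Python. This is only an illustration, not what FLASH or BBMerge do internally. The function names are made up, and it assumes the 460 bp figure is the amplicon without adapters and that any trimming shortens reads from the 3' end.

```python
def expected_overlap(read1_len: int, read2_len: int, fragment_len: int) -> int:
    """Bases shared by two paired reads sequenced from opposite ends of one fragment."""
    # A read cannot cover more of the fragment than the fragment's own length.
    covered1 = min(read1_len, fragment_len)
    covered2 = min(read2_len, fragment_len)
    return max(0, covered1 + covered2 - fragment_len)


def read_through(read_len: int, fragment_len: int) -> int:
    """Bases of a read that run past the end of the fragment and into adapter."""
    return max(0, read_len - fragment_len)


print(expected_overlap(300, 300, 460))  # 140
print(read_through(300, 460))           # 0
print(expected_overlap(280, 280, 460))  # 100
```

Under these assumptions a full-length 2 x 300 bp pair on a 460 bp amplicon overlaps by 140 bp with no read-through. Amplicons shorter than the nominal length overlap by more.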

I'll also try BBmerge. Do you happen to know if BBmerge can do batch processing (I've got several thousand samples of data) and output the %merge in a table?

**Brian Bushnell** · 05-19-2015, 04:33 PM

> Originally posted by cheezemeister:
>
> Just selecting a representative file, FastQC reports my sequence length as 35-300 bp, though 70% are 300 bp and pretty much 100% are >280 bp.

To clarify, was the only trimming done adapter-trimming by the machine? There should not really be anything in the 280-299bp range if trimming was done correctly and the library was made correctly. Adapter-trimming is not necessary prior to merging; the position of adapters (if any) is obvious based on the overlap, and a good read-merger will trim them if present. I suggest you turn it off in this case unless you first generate an insert-size histogram and specifically note adapter sequence. If ~30% are getting trimmed to between 280 and 299bp (when it should be 0%), perhaps the algorithm being used is a greedy one that matches even 1 bp. The end result will be inferior merging as the overlap region is unnecessarily reduced.
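As a rough back-of-the-envelope check on the "greedy 1 bp match" idea, the sketch below estimates how often the tail of a read would match the start of an adapter purely by chance. It assumes uniformly random bases and treats each match length as independent, which is only an approximation. It says nothing about the algorithm the MiSeq actually uses, and the function name and parameters are hypothetical.

```python
def chance_trim_fraction(min_match: int = 1, max_match: int = 30) -> float:
    """Approximate fraction of adapter-free reads a trimmer would clip by chance.

    A read is clipped if, for some k >= min_match, its last k bases equal the
    first k bases of the adapter. With random bases that happens with
    probability 0.25 ** k for each k (treated as independent here).
    """
    untouched = 1.0
    for k in range(min_match, max_match + 1):
        untouched *= 1.0 - 0.25 ** k
    return 1.0 - untouched


for min_match in (1, 3, 5):
    print(min_match, round(chance_trim_fraction(min_match), 4))
```

With `min_match=1` this comes out at roughly 0.31, the same ballpark as the ~30% of reads reported as slightly shortened. Requiring a few more matching bases drops it to a percent or two.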

> I'll also try BBmerge. Do you happen to know if BBmerge can do batch processing (I've got several thousand samples of data) and output the %merge in a table?

BBMerge does not have a batch mode; you'd have to script that. It does print the percent merged for each dataset, though, which can be parsed from stderr.
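A minimal sketch of such a wrapper in Python follows. Several things here are assumptions rather than anything stated in this thread: the `_R1`/`_R2` file naming, the output file names, and the exact layout of the `Joined:` line that the regular expression expects on stderr. Check these against your own files and BBMerge version before relying on it.

```python
import re
import subprocess
import sys
from pathlib import Path

# Assumed stderr line, e.g. "Joined:   <tab>800   <tab>80.000%"
JOINED_RE = re.compile(r"^Joined:\s+(\d+)\s+([\d.]+)%", re.MULTILINE)


def parse_joined(stderr_text):
    """Return (joined_pairs, percent_joined), or None if no such line is found."""
    match = JOINED_RE.search(stderr_text)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def merge_one(r1, r2, out_dir):
    """Run bbmerge.sh on one read pair and parse its stderr report."""
    merged = out_dir / ("merged_" + r1.name.replace("_R1", "", 1))
    cmd = ["bbmerge.sh", f"in1={r1}", f"in2={r2}", f"out={merged}"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return parse_joined(result.stderr)


def main(in_dir, out_dir):
    """Merge every *_R1* / *_R2* pair in in_dir and print a tab-separated table."""
    out_dir.mkdir(parents=True, exist_ok=True)
    print("sample\tjoined_pairs\tpercent_joined")
    for r1 in sorted(in_dir.glob("*_R1*.fastq*")):
        r2 = r1.with_name(r1.name.replace("_R1", "_R2", 1))
        if not r2.exists():
            continue  # skip unpaired files
        stats = merge_one(r1, r2, out_dir)
        joined, percent = stats if stats is not None else ("NA", "NA")
        print(f"{r1.name}\t{joined}\t{percent}")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(Path(sys.argv[1]), Path(sys.argv[2]))
```

Saved under a name of your choosing (say `batch_merge.py`), it would be run as `python batch_merge.py raw_reads/ merged/ > merge_stats.tsv`, which gives one row per sample.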

**ronaldrcutler** · 07-15-2016, 11:10 AM

For future FLASH users, this should be noted:

--read-len (-r) has no effect when --max-overlap (-M) is also specified!

--fragment-len-stddev (-s) has no effect when --max-overlap (-M) is also specified!
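The reason, as far as I understand FLASH's help text, is that -r, -f and -s are only convenience inputs used to derive a maximum overlap: the overlap of average-length reads from an average-size fragment, plus 2.5 times the fragment-length standard deviation. An explicit -M simply replaces that derived value. A small Python sketch of the calculation follows. The function name is made up, and truncating to an integer is my assumption rather than documented behaviour.

```python
def implied_max_overlap(read_len: float, fragment_len: float,
                        fragment_len_stddev: float) -> int:
    """Max overlap FLASH would derive from -r, -f and -s when -M is not given."""
    average_overlap = 2 * read_len - fragment_len
    return int(average_overlap + 2.5 * fragment_len_stddev)


# FLASH's documented defaults (-r 100 -f 180 -s 18)
print(implied_max_overlap(100, 180, 18))  # 65
```

So for 2 x 300 bp reads on a 460 bp amplicon you can either pass -r, -f and -s and let FLASH derive the limit, or pass -M directly. Passing both means only -M counts.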

Merging 16S reads with FLASH - parameters?
