
**westerman** · 08-27-2013, 08:31 AM

As Phillip SanMiguel said to me in a private email, which may clarify my post:

So the reads may all have a few bases (1-5) of adapter at their 3' ends. A better way to trim them would be to compare R1 and R2 -- the first base of each should point out the last base of the other. If PANDA had a setting to remove single-stranded sequence from pair merges, that would be good.
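A minimal sketch of the comparison described above, assuming equal-quality reads from the same fragment and no adapter knowledge at all. If the fragment is L bases long and shorter than the reads, R1's first L bases should match the last L bases of the reverse complement of R2; everything past that point in either read is adapter overhang. The function name `contained_overlap`, the `min_overlap` floor and the 10% mismatch tolerance are hypothetical choices for illustration, not settings from PANDA or any other tool.

```python
# Sketch: adapter-free trimming for read pairs whose fragment is shorter than the reads.
# Hypothetical helper; no adapter sequence is used, only R1 vs revcomp(R2).

_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N)."""
    return seq.translate(_COMP)[::-1]


def contained_overlap(r1: str, r2: str, min_overlap: int = 20,
                      max_mismatch_frac: float = 0.1):
    """Return fragment length L if R1[:L] matches the last L bases of revcomp(R2).

    R2's first base then marks where the fragment ends in R1 (and vice versa).
    Returns None if no L >= min_overlap matches within the mismatch tolerance.
    """
    rc2 = revcomp(r2)
    n = min(len(r1), len(rc2))
    # Try the longest candidate first; short overlaps match by chance more easily.
    for length in range(n, min_overlap - 1, -1):
        a = r1[:length]                # R1 with its 3' adapter tail dropped
        b = rc2[len(rc2) - length:]    # same stretch as seen from R2
        mismatches = sum(1 for x, y in zip(a, b) if x != y)
        if mismatches <= max_mismatch_frac * length:
            return length
    return None
```

When the function returns a length `L`, `r1[:L]` is the fragment with the single-stranded overhang removed. It only handles the fully contained case; a pair whose fragment is longer than the reads would normally come back as `None`.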

**GenoMax** · 08-27-2013, 08:38 AM

A pair-wise aligner (that can export a consensus, followed by an appropriate trim) should work right?
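One way to picture the "export a consensus" step: once the overlapping stretch of R1 and reverse-complemented R2 has been aligned, combine them base by base. The rule below (keep agreeing bases, take the higher-quality call on a disagreement, emit N on a tie) is an assumed, simple rule for illustration; real aligners and mergers each use their own scoring. The function name is hypothetical.

```python
def overlap_consensus(r1_seg, r1_qual, rc2_seg, rc2_qual):
    """Combine two aligned, equal-length overlap segments into one sequence.

    r1_seg / r1_qual:   R1 bases and Phred scores over the overlap.
    rc2_seg / rc2_qual: reverse-complemented R2 bases over the same positions,
                        with R2's Phred scores reversed so they line up.
    """
    lengths = {len(r1_seg), len(r1_qual), len(rc2_seg), len(rc2_qual)}
    if len(lengths) != 1:
        raise ValueError("overlap segments and qualities must have equal lengths")
    out = []
    for b1, q1, b2, q2 in zip(r1_seg, r1_qual, rc2_seg, rc2_qual):
        if b1 == b2:
            out.append(b1)      # reads agree
        elif q1 > q2:
            out.append(b1)      # disagree: trust the higher-quality call
        elif q2 > q1:
            out.append(b2)
        else:
            out.append("N")     # equal quality: leave the position ambiguous
    return "".join(out)
```

The trimming step would then keep only the consensus over the overlap, dropping whatever either read contributes outside it.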

**mcnelson.phd** · 08-27-2013, 08:42 AM

Have you tried SeqPrep?

I know I've tried it on Nextera data and by giving it the Nextera adapter sequence it was able to spit out reads with 100% overlap but whose length was < 250bp, which would fit what you're talking about. What I can't say is how it would handle the "adapter" sequences that might hang off the ends if you don't provide it with any sequence to look for.

**westerman** · 08-27-2013, 09:06 AM

@GenoMax: Your idea should work, but doing it for an entire MiSeq run sounds like it would take a long time to process. I was hoping for a quicker, one-stop solution.

@McNelson.phd: No, I haven't tried SeqPrep but from my reading of it -- and your description -- it sounds like it would act the same as Panda and Flash: not good for when there is no prior knowledge of the adapter. I will install it though and give it a spin.

Real data coming off the sequencer later today!

**SNPsaurus** · 08-27-2013, 09:08 AM

I use SeqPrep for exactly that purpose, although I do an extra, careful adapter-stripping step before and after merging to clean up the errors. It did an OK job without the extra step, but I wanted the reads as error-free as possible. By doing that, I can reliably find alleles in the 0.03% range.

I look at the length of the merged reads and trim back if they are a size range where partial adapters would have been present. But your approach would work too, I think.

**GenoMax** · 08-27-2013, 09:10 AM

Rick,

It sounds like you do not want to trim (adapters) before the merge, is that a requirement?

**SNPsaurus** · 08-27-2013, 09:16 AM

This group published along these lines:

Ultra-deep mutant spectrum profiling: improving sequencing accuracy using overlapping read pairs - BMC Genomics

http://www.biomedcentral.com/1471-2164/14/96

Background: High throughput sequencing is beginning to make a transformative impact in the area of viral evolution. Deep sequencing has the potential to reveal the mutant spectrum within a viral sample at high resolution, thus enabling the close examination of viral mutational dynamics both within- and between-hosts. The challenge however, is to accurately model the errors in the sequencing data and differentiate real viral mutations, particularly those that exist at low frequencies, from sequencing errors.

Results: We demonstrate that overlapping read pairs (ORP) -- generated by combining short fragment sequencing libraries and longer sequencing reads -- significantly reduce sequencing error rates and improve rare variant detection accuracy. Using this sequencing protocol and an error model optimized for variant detection, we are able to capture a large number of genetic mutations present within a viral population at ultra-low frequency levels (<0.05%).

Conclusions: Our rare variant detection strategies have important implications beyond viral evolution and can be applied to any basic and clinical research area that requires the identification of rare mutations.

They align the raw reads and analyze that rather than merging. There is a second paper that came out more recently as well, but I can't dredge it up. My lab should have our version out soon, too. Gary Schroth at Illumina said he was pushing long ago to have this the standard output of the Illumina machines as a way to get separation on error rate with other platforms, so it is funny that years later there is a sudden wave of labs all independently coming up with the idea.
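A back-of-the-envelope estimate of why overlapping read pairs help, under assumptions not stated in the thread: each read miscalls a base independently with probability $\varepsilon$, and a miscall is equally likely to be any of the three other bases. A wrong base then survives only if both reads make the same error, and discordant positions can simply be discarded:

```latex
\begin{aligned}
P(\text{both reads wrong, same base}) &= 3\left(\tfrac{\varepsilon}{3}\right)^{2} = \tfrac{\varepsilon^{2}}{3},\\
P(\text{reads disagree}) &= 1 - (1-\varepsilon)^{2} - \tfrac{\varepsilon^{2}}{3} = 2\varepsilon - \tfrac{4}{3}\varepsilon^{2} \approx 2\varepsilon,\\
\varepsilon = 10^{-3} \;\Rightarrow\; \tfrac{\varepsilon^{2}}{3} &\approx 3.3\times 10^{-7}.
\end{aligned}
```

With the illustrative $\varepsilon = 10^{-3}$, the residual error is orders of magnitude below the 0.03% to 0.05% frequencies mentioned above. The independence assumption does not hold for errors introduced before sequencing (e.g. during PCR), which appear in both reads and are not removed this way.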

**GenoMax** · 08-27-2013, 09:19 AM

Longer read lengths have finally made the idea practical.

**mcnelson.phd** · 08-27-2013, 09:38 AM

It's probably too late now if your run is already going, but the new version of Reporter incorporates a read "Stitching" feature that might do exactly what you want. You'll have to manually add the flag to your sample sheet and reprocess your data if you want to try it. Check the full Reporter guide for the actual flag and its associated options.

**westerman** · 08-27-2013, 09:38 AM

Originally posted by GenoMax

Rick,

It sounds like you do not want to trim (adapters) before the merge, is that a requirement?

Not a requirement per se. It is what I will probably end up doing, especially since we know the adapters. However, Phillip and I were wondering if there was an adapter-knowledge-free method.

Indeed, the longer lengths are making for interesting possibilities.

**westerman** · 08-27-2013, 09:45 AM

Originally posted by mcnelson.phd

... but the new version of Reporter incorporates a read "Stitching" feature that might do exactly what you want.

Ah yes, that is an interesting option. Hard to say from scanning the docs if it would be better than Panda/Flash/SeqPrep but since the Reporter can be run off-machine I might give it a try. Thanks for the tip.

**westerman** · 09-03-2013, 08:34 AM

As a followup, it turns out that the samples in question did not (for the most part) look like the 2nd example I gave -- i.e., with the desired fragment fully contained in R1 and R2 with R1 starting inside R2 and vice-versa. Instead most of the reads looked like the 1st example thus we could use normal Panda/Flash methodology on them.

It might still be interesting to develop an 'adapter-knowledge-free' stitching/merging program. But that is a task for another day.

**kmcarr** · 09-03-2013, 10:41 AM

Originally posted by westerman

As a followup, it turns out that the samples in question did not (for the most part) look like the 2nd example I gave -- i.e., with the desired fragment fully contained in R1 and R2 with R1 starting inside R2 and vice-versa. Instead most of the reads looked like the 1st example thus we could use normal Panda/Flash methodology on them.

It might still be interesting to develop an 'adapter-knowledge-free' stitching/merging program. But that is a task for another day.

I'm curious about the 'adapter-knowledge-free' constraint to your problem. If the premise of instance #2 in your original post is that these are sequencing reads in which (read length) > (fragment length) (i.e. they contain adapter sequence at the 3' end), how would you not know what the adapter sequence is? The adapters/sequencing primers for all major kits are pretty much known, are they not?

If you have a priori knowledge of the adapter sequences, then Trimmomatic, using its palindrome trimming mode, handles cases like #2, but not in exactly the way you asked about. It makes no attempt to "merge" the two reads. It simply clips the adapter from read 1 and discards read 2 entirely, as it contains no additional data beyond what is contained in read 1.
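To show why knowing the adapters helps, here is a simplified sketch of read-through detection that uses them. It is not Trimmomatic's implementation or its scoring; it only illustrates the idea. For each candidate insert length L, the bases after position L in each read must look like the start of that read's adapter, and the two L-base inserts must be reverse complements of each other. Because the adapter tails do most of the work, even inserts of a few bases can be detected. Function and parameter names are hypothetical, and the adapter strings are whatever the caller supplies.

```python
# Simplified read-through clipping with KNOWN adapters (illustrative only;
# not Trimmomatic's palindrome-mode algorithm).

_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


def _close(a: str, b: str, max_frac: float) -> bool:
    """True if a and b have equal length and at most max_frac mismatches."""
    if len(a) != len(b):
        return False
    return sum(1 for x, y in zip(a, b) if x != y) <= max_frac * len(a)


def insert_length_known_adapters(r1, r2, adapter1, adapter2,
                                 min_insert=1, max_mismatch_frac=0.1):
    """Return insert length L if both reads run through an L-base insert into
    their adapters; None means no read-through was detected."""
    n = min(len(r1), len(r2))
    for length in range(n, min_insert - 1, -1):
        tail1, tail2 = r1[length:], r2[length:]
        # Sequence after the insert must match the start of each adapter ...
        if not _close(tail1, adapter1[:len(tail1)], max_mismatch_frac):
            continue
        if not _close(tail2, adapter2[:len(tail2)], max_mismatch_frac):
            continue
        # ... and the two inserts must be reverse complements of each other.
        if _close(r1[:length], revcomp(r2[:length]), max_mismatch_frac):
            return length
    return None
```

Mirroring the behaviour described above, a caller would keep `r1[:L]` and drop R2 when a length is returned, and keep both reads untouched when the result is `None`.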

**westerman** · 09-03-2013, 11:20 AM

@kmcarr: I will concede that the constraint is mostly, if not entirely, theoretical since the adapter sequence should be known -- certainly it will be by us service providers, and this information should be passed on to our customers. An 'adapter-knowledge-free' program would only be useful in extremely rare cases or as part of a thought experiment.

I had not considered Trimmomatic's Palindrome mode since I never use that part of Trimmomatic. Thanks for the tip.


Merger/overlapper for fully contained fragment
