**snetmcom** · 07-28-2012, 11:32 AM

I would still argue that accurate 16S sequencing requires long reads that just aren't available on other platforms. There are so many people struggling to make this work with shorter-read platforms. There are a few papers that claim to have accomplished this with PE short reads, but it's not an easy path on the analysis side.

**pmiguel** · 07-29-2012, 08:58 AM

Originally posted by snetmcom

I would still argue that accurate 16S sequencing requires long reads that just aren't available on other platforms. There are so many people struggling to make this work with shorter-read platforms. There are a few papers that claim to have accomplished this with PE short reads, but it's not an easy path on the analysis side.

"Other"? Than 454, you mean?

We have discontinued use of our 454, so I guess I should argue that it can be done on a MiSeq.

With the v2 MiSeq chemistry upgrade to be released soon, 2x250 base reads should make quick work of any amplicon pool with inserts less than 400 bp. Won't using PANDA, or something similar, give you sufficient length sequences to get the job done? I think we have had a good result using v1 MiSeq (2x150 base reads), see below.
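To make the "PANDA, or something similar" idea concrete, here is a minimal sketch of what an overlap merger does with a read pair. It is a toy, not PANDAseq's actual algorithm: the function names, the `min_overlap` and `max_mismatch_frac` parameters, and the accept-the-longest-overlap rule are all assumptions made for illustration, and quality scores are ignored entirely.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(fwd: str, rev: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1):
    """Merge a forward read with its mate; return the merged sequence or None.

    Hypothetical toy merger: the reverse read is reverse-complemented, then
    candidate overlaps are tried from longest to shortest and the first one
    whose mismatch fraction is within the limit is accepted.
    """
    rev_rc = reverse_complement(rev)
    longest = min(len(fwd), len(rev_rc))
    for olen in range(longest, min_overlap - 1, -1):
        tail = fwd[-olen:]      # 3' end of the forward read
        head = rev_rc[:olen]    # start of the reverse-complemented mate
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            # Keep the forward read and append the non-overlapping mate bases.
            return fwd + rev_rc[olen:]
    return None  # no acceptable overlap found


if __name__ == "__main__":
    # Toy 40-base "amplicon" read as 2x25, so the mates overlap by 10 bases.
    amplicon = "ACGTTGCAAGCTTAGGCTAACCGGTATCGATTGCCATGGA"
    read1 = amplicon[:25]
    read2 = reverse_complement(amplicon[15:])
    print(merge_pair(read1, read2) == amplicon)
```

Real mergers also use the per-base quality scores to resolve disagreements in the overlap, which this sketch does not attempt.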

But there are fairly complex technical considerations lab-side as well:

(1) Illumina has done a pretty clunky job of telling us how to make the equivalent version of "fusion primers" for the MiSeq. Possibly because various technical issues specific to Illumina sequencers make this methodology less than ideal for their platform.

(2) Number of bar codes available. This would be an issue for 300 samples. The 454 offers well over 100 official bar codes (MIDs). So you could possibly get by with 3 regions on a 454, reusing those 100 bar codes. Where a "region" might be a region on an 8 gasket PTP. Or you could use one of the commonly available sets of 454 bar code sequences published by non-Roche sources. Of course buying the oligos would cost quite a bit.
Illumina has a couple of "dual" index adapter sequences that could be developed. That gets you out to 96 (8x12). What you really want for this project is 384, (16x24) though. Are the TruSeqHT and NexteraXT indexes compatible? If so you would have 16x24 indexes right there.

(3) Our only attempt, thus far, to do a 16S (v3 loop) run on the MiSeq did appear to work. But this success doesn't really address snetmcom's objection, because analysis is ongoing. But I am pretty sure it will be good.

(4) Balance and diversity. Illumina's throbbing red Achilles heel comes to the foreground in amplicon work. The above-mentioned 16S v3 loop amplicons would, in all likelihood, have failed to produce usable clusters had we not spiked in 50% genomic libraries as "ballast" into the same run. But with ballast, it seems to work fine. This is a little tricky, I think, because to get good demultiplexing we chose to run 2 genomic libraries with "balanced" indexes -- 6 and 12. That meant the investigator was stuck using only 22 indexes because we were camped out on two of the TruSeq 24.

(5) If you don't mind paying more, you could just buy a NexteraXT kit and create "tagmentation" fragmented amplicons. These do appear to avoid the balance and diversity issues. But downstream analysis may be an issue.

(6) Save money on oligo cost using "step-out" PCR. Whatever your "locus specific" primers are, just append the post-index part of an Illumina adapter to it. Then add the index-containing part of the adapter in a "step out" PCR.

For this you synthesize another set of oligos that overlap your locus specific primers just in the TruSeq adapter part. Then you have "factored" the fusion primer into two segments that you combine multiplicatively.

That is, say, you are interested in a single locus. You amplify with your internal locus specific primers, then reamplify your products with TruSeq adapter oligos. There are 24 available from the standard TruSeq set (48 if you want to use the small-RNA set). Instead of needing to synthesize 25 80-mers that will only be usable for this one purpose, you can thus synthesize 26 60-mers. The 24 of those that carry the TruSeq external adapter part can be reused for any other experiment.

Where it really gets powerful, is if you use dual index adapters. There, if you want 96 different indexes, you only need to synthesize 20 (8 for one side, 12 for the other). Just use the amplicon dual index sequences in the Illumina Oligo letter. Then, the obvious extension is to go to 40 adapters, 16 one side, 24 the other. Then you have up to 384 index pairs available. I don't know why Illumina has not already jumped on this obvious application.

But beware, primer-dimers are your enemies here as they are for the 454 amplicons. Particularly pernicious because they can anneal to the full-length products making them impossible to completely remove, even with a gel cut.
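The oligo arithmetic in the step-out scheme above can be checked with a few lines. This is only a counting sketch: the function names are made up for illustration, and it assumes one indexed side for the single-index case and that the dual-index counts exclude the locus-specific primers, as in the numbers quoted above.

```python
def fusion_primer_count(n_indexes: int, n_loci: int = 1) -> int:
    """Full-length fusion primers: one indexed primer per index plus one
    unindexed partner primer, all of them specific to a single locus."""
    return n_loci * (n_indexes + 1)


def step_out_count(n_indexes: int, n_loci: int = 1) -> int:
    """Step-out PCR: two tailed locus-specific primers per locus, plus one
    reusable index-containing adapter oligo per index."""
    return 2 * n_loci + n_indexes


def dual_index_plan(side_a: int, side_b: int) -> tuple:
    """Return (adapter oligos to synthesize, index pairs available)."""
    return side_a + side_b, side_a * side_b


if __name__ == "__main__":
    print(fusion_primer_count(24))   # 25 long locus-specific oligos
    print(step_out_count(24))        # 26 shorter oligos, 24 of them reusable
    print(dual_index_plan(8, 12))    # (20, 96)
    print(dual_index_plan(16, 24))   # (40, 384)
```

The point of the factoring shows up when `n_loci` grows: the fusion count scales with loci times indexes, while the step-out count only adds two primers per extra locus.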

Anyway, by the end of the year I am confident 454 amplicons will seem like a bad dream having phased into complete obsolescence. But as things stand now it is difficult to say 454 amplicons are yet out-moded.

--
Phillip

**vs92** · 07-29-2012, 09:08 AM

Re: snetmcom

I am working towards sophisticated analysis, so I'd like to go with the shortest possible reads and make the analysis challenging - that is not an issue; especially if I can minimize cost and time. Can you please post the papers you have mentioned - that have used shorter reads?

**vs92** · 07-29-2012, 09:33 AM

RE: pmiguel

Hi Phillip,

Thank you so much for the detailed response. I am also leaning towards procuring the MiSeq. I had some follow-up questions to the points you've mentioned, and hope you'd take the time to reply:

1. Do you know when the v2 MiSeq chemistry upgrade will be released? Also, with this, what would be the cost and time required to focus on a 300 or 400 nucleobase region (like V2 or V4 of 16S rDNA)? My goal is to get the sequencing time to 4-6 hours duration, which seems unlikely with 2x250 base reads (isn't the time for running this on MiSeq currently close to 27 hours?).

2. Could you please provide the link to PANDA (or the other similar tools) you have mentioned? I'd like to get the length of the sequences down to 30-40 nucleobases, so that the goal of 4-6 hour run is feasible on MiSeq... any advice on going about this would be very helpful.

3. How much does the v1 MiSeq chemistry reagents cost? Do you know how much more v2 MiSeq chemistry reagents are likely to cost?

4. Instead of using universal primers, is there merit in going with a set of different primers targeted to different loci? Not sure whether MiSeq permits multiplexing such primers, and if so what the practical maximum number of primer sets (max number of loci one can focus on) is.

5. Do you know how I can check for the compatibility of the TruSeqHT and NexteraXT indexes you have mentioned? I'm assuming that the bacteria in the stool samples I'll be analyzing are very similar to those published recently from the stool samples of healthy patients of the Human Microbiome Project. If I can get 16x24 = 384 samples in 1 month, that would be fantastic! By the way, is the number 384 for 1 run or for the entire 1 month period? I thought a single run takes just ~ 1 day, so if one can do 384 different samples in a single run, shouldn't this be the throughput for one day itself? Perhaps I am missing something key here!

Also, I do have extra grant money to get the NexteraXT kit, so I should definitely be able to create "tagmentation" fragmented amplicons -- could you refer me to some literature on how to do this, and why this would add to the complexity of the downstream analysis? Thanks!

6. Very glad to note that your attempt to do the 16S (v3 loop) run on the MiSeq was positive and the analysis is ongoing - good luck on this front! What length of V3 loop did you guys go after? Was it as short as 30-40 nucleobases? Would love to know how you developed this approach.

7. "spiked in 50% genomic libraries as "ballast" into the same run" -- could you please refer me to some link or literature reference that describes why adding ballast into a run would increase the likelihood of producing a usable cluster? Also, why would having the balanced indexes 6 and 12 help with demultiplexing (using 2 of the TruSeq 24)?

8. Thanks for your very helpful advice on Step Out PCR - will definitely do that! Could you give me your email address - I'd love to stay connected and potentially collaborate on my project.

Thanks!

**snetmcom** · 07-29-2012, 09:43 AM

I'm with you, Phillip. 454 is pricey. I just don't see the same level of confidence in any of the Illumina data yet. It seems like it's a stretch, and it's quite the analysis headache. I'm intrigued about 2x250, but I am making zero assumptions until I actually see it. Most 16S projects require a high level of accuracy, and even the 150 MiSeq data isn't that great towards the end. If you make it work, I'll be right behind you.

**MrGuy** · 07-30-2012, 01:42 AM

We also ditched our 454 as the machine was finicky and really expensive to run.

Originally posted by pmiguel

(4) Balance and diversity. Illumina's throbbing red Achilles heel comes to the foreground in amplicon work. The above-mentioned 16S v3 loop amplicons would, in all likelihood, have failed to produce usable clusters had we not spiked in 50% genomic libraries as "ballast" into the same run. But with ballast, it seems to work fine. This is a little tricky, I think, because to get good demultiplexing we chose to run 2 genomic libraries with "balanced" indexes -- 6 and 12. That meant the investigator was stuck using only 22 indexes because we were camped out on two of the TruSeq 24.

... and this is the big reason why we went Ion. The majority of our work is amplicon and targeted (i.e., PCR) applications with the occasional whole genome with reference (<100 Mb) so we can design PCRs. The amplicon applications are more for population diversity that is not possible to resolve with Sanger (i.e., <15% frequency).

Other reasons were:
-run cost vs 454 was dramatically lower
-ability to multiplex outside of what "the company" says
-error profile is predictable -- substitution errors, as seen in SBS chemistries, are unlikely. Homopolymers are easier to work with in this regard and occur in specific locations.
-potential for long reads. >300bp is imminent, but I have yet to see the error profile at the end of those reads... vacation, you know.

**TonyBrooks** · 07-30-2012, 02:00 AM

I'm currently looking into dual indexing for metagenomic runs on MiSeq with regard to the promised 250bp paired-end runs by year end. Q scores will be vastly better with this approach, as the poorer 3' scores will be boosted due to the fact that they will overlap (and hence be sequenced twice) on a 400bp amplicon.
Diversity is still an issue, but we may try some tricks to increase this (maybe design PCRs to both strands, use a few amplicons, custom seq primers to avoid sequencing the primer, lower cluster densities). Our Illumina rep did mention that they were trying to reduce the diversity problem, but wouldn't tell me how they are planning to do this.
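For anyone wanting to check how much of an amplicon actually gets sequenced twice, the overlap arithmetic is simple enough to sketch. The function name is made up, and this assumes both reads are full length and that the amplicon length counts everything between the two sequencing primers.

```python
def paired_end_overlap(read_len: int, amplicon_len: int) -> int:
    """Number of amplicon bases covered by both reads of a pair.

    Hypothetical helper: two reads of read_len starting from opposite ends
    overlap by 2 * read_len - amplicon_len bases, never less than zero and
    never more than the amplicon itself.
    """
    overlap = 2 * read_len - amplicon_len
    return max(0, min(amplicon_len, overlap))


if __name__ == "__main__":
    for read_len in (150, 250):
        ov = paired_end_overlap(read_len, 400)
        print(f"2x{read_len} on a 400 bp amplicon: {ov} bp overlap "
              f"({ov / 400:.0%} of the amplicon read twice)")
```

So 2x250 on a 400bp amplicon gives 100 bases read from both ends, while 2x150 on the same amplicon leaves a gap in the middle rather than an overlap.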

**pmiguel** · 07-30-2012, 03:38 AM