I am planning a large experiment: 300 human stool samples over the next month, to identify the bacterial species present in each sample, using either one 454 sequencer or one MiSeq. I could very much use your insights on doing this effectively. Based on time and cost, should I go with the Illumina MiSeq or the 454, especially if I have to run another 300 samples per month for the next few months? Thanks so much for your insights.
-
I would still argue that accurate 16S sequencing requires long reads that just aren't available on other platforms. So many people are struggling to make this work with shorter-read platforms. A few papers claim to have accomplished it with paired-end short reads, but it's not an easy path on the analysis side.
-
Originally posted by snetmcom: I would still argue that accurate 16S sequencing requires long reads that just aren't available on other platforms. So many people are struggling to make this work with shorter-read platforms. A few papers claim to have accomplished it with paired-end short reads, but it's not an easy path on the analysis side.
We have discontinued use of our 454, so I guess I should argue that it can be done on a MiSeq.
With the v2 MiSeq chemistry upgrade to be released soon, 2x250 base reads should make quick work of any amplicon pool with inserts shorter than 400 bp. Won't merging the overlapping read pairs with PANDAseq, or something similar, give you sequences long enough to get the job done? I think we have had a good result using v1 MiSeq chemistry (2x150 base reads); see below.
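The overlap arithmetic behind this: a pair of reads of length L over an insert of length I shares 2L - I bases, so 2x250 reads over a 400 bp insert overlap by 100 bases, while 2x150 reads over the same insert would not overlap at all. Below is a deliberately simplified merger sketch, not PANDAseq's actual algorithm: it reverse-complements read 2, tries the longest overlap first, and accepts the first one with at most `max_mismatch` mismatches, keeping read 1's base where they disagree. It assumes the insert is at least as long as each read (no read-through into the adapter); the function names, defaults and toy reads are illustrative assumptions.

```python
from typing import Optional

# Complement table for plain A/C/G/T/N reads
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence made of A/C/G/T/N."""
    return seq.translate(COMPLEMENT)[::-1]


def expected_overlap(read_len: int, insert_len: int) -> int:
    """Bases shared by a pair of read_len reads over an insert_len insert."""
    return max(0, 2 * read_len - insert_len)


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch: int = 1) -> Optional[str]:
    """Merge read 1 with read 2 (read 2 as it comes off the sequencer).

    Tries the longest overlap first and accepts the first one with at most
    max_mismatch mismatches; read 1's base is kept where they disagree.
    Returns None if no acceptable overlap of at least min_overlap exists.
    """
    r2_rc = reverse_complement(r2)  # put read 2 on read 1's strand
    longest = min(len(r1), len(r2_rc))
    for olap in range(longest, max(min_overlap, 1) - 1, -1):
        tail = r1[-olap:]      # end of read 1
        head = r2_rc[:olap]    # start of reverse-complemented read 2
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch:
            return r1 + r2_rc[olap:]
    return None
```

Real mergers also use base qualities to choose the base at each mismatch and to score candidate overlaps; this sketch ignores qualities entirely.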
But there are fairly complex technical considerations lab-side as well:
(1) Illumina has done a pretty clunky job of telling us how to make the equivalent of "fusion primers" for the MiSeq, possibly because various technical issues specific to Illumina sequencers make this methodology less than ideal for their platform.
(2) Number of bar codes available. This would be an issue for 300 samples. The 454 offers well over 100 official bar codes (MIDs), so you could possibly get by with 3 regions on a 454, reusing those 100 bar codes, where a "region" might be one region of an 8-gasket PTP. Or you could use one of the commonly available sets of 454 bar code sequences published by non-Roche sources. Of course, buying the oligos would cost quite a bit.
Illumina has a couple of "dual" index adapter sets that could be built on. That gets you out to 96 (8x12). What you really want for this project, though, is 384 (16x24). Are the TruSeqHT and NexteraXT indexes compatible? If so, you would have 16x24 indexes right there.
(3) Our only attempt so far at a 16S (v3 loop) run on the MiSeq did appear to work. But this success doesn't really address snetmcom's objection, because the analysis is still ongoing. I am pretty sure it will turn out well, though.
(4) Balance and diversity. Illumina's throbbing red Achilles heel comes to the foreground in amplicon work. The 16S v3 loop amplicons mentioned above would, in all likelihood, have failed to produce usable clusters had we not spiked 50% genomic libraries into the same run as "ballast". With ballast, it seems to work fine. This is a little tricky, I think, because to get good demultiplexing we chose to run 2 genomic libraries with "balanced" indexes, 6 and 12. That meant the investigator was stuck with only 22 indexes, because we were camped out on two of the 24 TruSeq indexes.
(5) If you don't mind paying more, you could just buy a NexteraXT kit and create "tagmentation" fragmented amplicons. These do appear to avoid the balance and diversity issues. But downstream analysis may be an issue.
(6) Save money on oligos by using "step-out" PCR. Whatever your locus-specific primers are, just append the post-index part of an Illumina adapter to them. Then add the index-containing part of the adapter in a "step-out" PCR.
For this you synthesize another set of oligos that overlap your locus-specific primers only in the TruSeq adapter part. You have then "factored" the fusion primer into two segments that you combine multiplicatively.
Say, for example, you are interested in a single locus. You amplify with your internal locus-specific primers, then reamplify the products with TruSeq adapter oligos: 24 are available from the standard TruSeq set (48 if you want to use the small-RNA set). Instead of synthesizing 25 80-mers that are usable only for this one purpose, you synthesize 26 60-mers, and 24 of them (the TruSeq external adapter part) can be reused for any other experiment.
Where it gets really powerful is with dual-index adapters. There, if you want 96 different indexes, you only need to synthesize 20 (8 for one side, 12 for the other); just use the amplicon dual-index sequences in the Illumina Oligo letter. The obvious extension is to go to 40 adapters, 16 on one side and 24 on the other, which gives you up to 384 index pairs. I don't know why Illumina has not already jumped on this obvious application.
But beware: primer-dimers are your enemies here, as they are for 454 amplicons. They are particularly pernicious because they can anneal to the full-length products, making them impossible to remove completely, even with a gel cut.
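On point (4), the reason a "balanced" pair such as indexes 6 and 12 helps demultiplexing: in the MiSeq's four-channel chemistry, A and C are imaged with the red laser and G and T with the green laser, and Illumina's low-plexity guidance is that each index cycle should show signal in both channels. The sketch below flags index cycles that fail that rule for a given set of index sequences. The sequences in the usage note (GCCAAT for index 6, CTTGTA for index 12) are my recollection of the TruSeq LT list and should be checked against Illumina's adapter sequence document; the function name is hypothetical.

```python
# Four-channel SBS (MiSeq/HiSeq): A and C are imaged in the red channel,
# G and T in the green channel.
RED = frozenset("AC")
GREEN = frozenset("GT")


def unbalanced_cycles(indexes: list[str]) -> list[int]:
    """Return the 1-based index cycles that lack signal in either channel."""
    length = len(indexes[0])
    if any(len(ix) != length for ix in indexes):
        raise ValueError("all indexes must be the same length")
    bad = []
    for cycle in range(length):
        bases = {ix[cycle].upper() for ix in indexes}
        if not (bases & RED) or not (bases & GREEN):
            bad.append(cycle + 1)
    return bad
```

For example, `unbalanced_cycles(["GCCAAT", "CTTGTA"])` returns an empty list, whereas either index on its own fails every cycle, since a single sequence can only ever light one channel per cycle.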
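The oligo savings from "factoring" fusion primers in point (6) can be counted directly. In the single-locus example, direct fusion primers presumably mean one indexed forward oligo per index plus a shared reverse (24 + 1 = 25 long oligos), whereas step-out PCR needs the two locus-specific primers plus 24 reusable adapter oligos (2 + 24 = 26 shorter ones). With dual indexes, the number of adapter oligos grows as a sum (8 + 12 = 20, 16 + 24 = 40) while the number of sample combinations grows as a product (96, 384). The sketch below does that counting and hands out unique (i7, i5) pairs to samples; the function names and the `i7_01`-style labels are placeholders, not Illumina index names.

```python
from itertools import product


def direct_fusion_oligos(n_indexes: int) -> int:
    """One indexed fusion primer per index plus one shared reverse primer."""
    return n_indexes + 1


def step_out_oligos(n_i7: int, n_i5: int = 0, n_locus_primers: int = 2) -> int:
    """Locus-specific primers plus reusable adapter oligos for each index side."""
    return n_locus_primers + n_i7 + n_i5


def assign_dual_indexes(samples: list[str], n_i7: int,
                        n_i5: int) -> dict[str, tuple[str, str]]:
    """Give each sample a unique (i7, i5) pair; labels are placeholders."""
    if len(samples) > n_i7 * n_i5:
        raise ValueError("more samples than index combinations")
    pairs = product([f"i7_{i:02d}" for i in range(1, n_i7 + 1)],
                    [f"i5_{j:02d}" for j in range(1, n_i5 + 1)])
    return dict(zip(samples, pairs))
```

For example, `step_out_oligos(16, 24, n_locus_primers=0)` gives the 40 adapters mentioned above, and `assign_dual_indexes` on 300 sample names with a 16x24 set returns 300 distinct pairs; with an 8x12 set it raises, since only 96 combinations exist.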
Anyway, by the end of the year I am confident 454 amplicons will seem like a bad dream having phased into complete obsolescence. But as things stand now it is difficult to say 454 amplicons are yet out-moded.
--
Phillip
-
Re: snetmcom
I am planning sophisticated analysis anyway, so I'd like to use the shortest possible reads; a challenging analysis is not an issue, especially if it minimizes cost and time. Could you please post the papers you mentioned that used shorter reads?
-
RE: pmiguel
Hi Phillip,
Thank you so much for the detailed response. I am also leaning towards procuring the MiSeq. I had some follow-up questions to the points you've mentioned, and hope you'd take the time to reply:
1. Do you know when the v2 MiSeq chemistry upgrade will be released? Also, with this, what would be the cost and time required to focus on a 300 or 400 nucleobase region (like V2 or V4 of 16S rDNA)? My goal is to get the sequencing time down to 4-6 hours, which seems unlikely with 2x250 base reads (isn't the run time on the MiSeq currently close to 27 hours?).
2. Could you please provide the link to PANDA (or the other similar tools) you have mentioned? I'd like to get the length of the sequences down to 30-40 nucleobases, so that a 4-6 hour run is feasible on the MiSeq. Any advice on going about this would be very helpful.
3. How much do the v1 MiSeq chemistry reagents cost? Do you know how much more the v2 MiSeq chemistry reagents are likely to cost?
4. Instead of using universal primers, is there merit in going with a set of different primers targeted to different loci? Not sure whether MiSeq permits multiplexing such primers, and if so what the practical maximum number of primer sets (max number of loci one can focus on) is.
5. Do you know how I can check for the compatibility of the TruSeqHT and NexteraXT indexes you have mentioned? I'm assuming that the bacteria in the stool samples I'll be analyzing are very similar to those recently published from the stool samples of healthy subjects in the Human Microbiome Project. If I can get 16x24 = 384 samples in 1 month, that would be fantastic! By the way, is the number 384 for 1 run or for the entire 1-month period? I thought a single run takes just ~1 day, so if one can do 384 different samples in a single run, shouldn't this be the throughput for one day itself? Perhaps I am missing something key here!
Also, I do have extra grant money to get the NexteraXT kit, so I should definitely be able to create "tagmentation" fragmented amplicons -- could you refer me to some literature on how to do this, and why this would add to the complexity of the downstream analysis? Thanks!
6. Very glad to note that your attempt to do the 16S (v3 loop) run on the MiSeq was positive and the analysis is ongoing - good luck on this front! What length of V3 loop did you guys go after? Was it as short as 30-40 nucleobases? Would love to know how you developed this approach.
7. "spiked in 50% genomic libraries as "ballast" into the same run": could you please refer me to some link or literature reference that describes why adding ballast to a run would increase the likelihood of producing a usable cluster? Also, why would having balanced indexes 6 and 12 help with demultiplexing (on 2 of the TruSeq 24)?
8. Thanks for your very helpful advice on Step Out PCR - will definitely do that! Could you give me your email address - I'd love to stay connected and potentially collaborate on my project.
Thanks!
-
I'm with you, Phillip. 454 is pricey. I just don't see the same level of confidence in any of the Illumina data yet. It seems like a stretch, and it's quite the analysis headache. I'm intrigued by 2x250, but I am making zero assumptions until I actually see it. Most 16S projects require a high level of accuracy, and even the 150-base MiSeq reads aren't that great toward the end. If you make it work, I'll be right behind you.
-
We also ditched our 454 as the machine was finicky and really expensive to run.
Originally posted by pmiguel: (4) Balance and diversity. Illumina's throbbing red Achilles heel comes to the foreground in amplicon work. The above mentioned 16S v3 loop amplicons would have failed to produce a usable cluster, in all likelihood had we not spiked in 50% genomic libraries as "ballast" into the same run. But with ballast, it seems to work fine. This is a little tricky, I think, because to get good demultiplexing we chose to run 2 genomic libraries with "balanced" indexes -- 6 and 12. That meant the investigator was stuck using only 22 indexes because we were camped out on two of the TruSeq 24.
Other reasons were:
- run cost vs. 454 was dramatically lower
- ability to multiplex beyond what "the company" says
- error profile is predictable: unlike the substitution errors of SBS chemistries, homopolymers are easier to work with in this regard and occur in specific locations.
- potential for long reads. >300 bp is imminent, but I have yet to see the error profile at the end of those reads... vacation, you know.
-
I'm currently looking into dual indexing for metagenomic runs on the MiSeq, with a view to the 250 bp paired-end runs promised by year end. Q scores will be vastly better with this approach, because the poorer 3' ends of the reads will overlap (and hence be sequenced twice) on a 400 bp amplicon.
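Why overlapping 3' ends help: if both reads call the same base and their errors are independent, the combined error probability is roughly the product of the two individual ones. The sketch below computes a posterior for a single overlapping position under stated assumptions: independent errors, errors spread evenly over the three other bases, and a uniform prior over A/C/G/T. Real read mergers (PANDAseq included) use their own, different models, and the function names here are hypothetical.

```python
import math


def phred_to_error(q: float) -> float:
    """Phred quality -> error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Error probability -> Phred quality."""
    return -10 * math.log10(p)


def combine_overlap(b1: str, q1: float, b2: str, q2: float) -> tuple[str, float]:
    """Call one base from two overlapping reads of the same position.

    Assumes independent errors, errors spread evenly over the three other
    bases, and a uniform prior over A/C/G/T.
    """
    e1, e2 = phred_to_error(q1), phred_to_error(q2)
    posterior = {}
    for true_base in "ACGT":
        like1 = (1 - e1) if true_base == b1 else e1 / 3
        like2 = (1 - e2) if true_base == b2 else e2 / 3
        posterior[true_base] = like1 * like2  # unnormalised
    total = sum(posterior.values())
    best = max(posterior, key=posterior.get)
    # Sum the other bases directly rather than computing 1 - p(best),
    # which would lose precision at very high qualities.
    error = sum(v for b, v in posterior.items() if b != best) / total
    return best, error_to_phred(error)
```

Under these assumptions two agreeing Q20 calls combine to roughly Q45, while a Q30 call that disagrees with a Q10 call keeps the Q30 base at roughly Q20. When two disagreeing calls have equal quality the winner is decided by A/C/G/T order, which is arbitrary, so such positions are better flagged as ambiguous.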
Diversity is still an issue, but we may try some tricks to increase this (maybe design PCRs to both strands, use a few amplicons, custom seq primers to avoid sequencing the primer, lower cluster densities). Our Illumina rep did mention that they were trying to reduce the diversity problem, but wouldn't tell me how they are planning to do this.
-
Originally posted by vs92:
Hi Phillip,
Thank you so much for the detailed response. I am also leaning towards procuring the MiSeq. I had some follow-up questions to the points you've mentioned, and hope you'd take the time to reply:
1. Do you know when the v2 MiSeq chemistry upgrade will be released? Also, with this, what would be the cost and time required to focus on a 300 or 400 nucleobase region (like V2 or V4 of 16S rDNA)? My goal is to get the sequencing time down to 4-6 hours, which seems unlikely with 2x250 base reads (isn't the run time on the MiSeq currently close to 27 hours?).
Originally posted by vs92: 2. Could you please provide the link to PANDA (or the other similar tools) you have mentioned? I'd like to get the length of the sequences down to 30-40 nucleobases, so that a 4-6 hour run is feasible on the MiSeq. Any advice on going about this would be very helpful.
Here is the link.
Originally posted by vs92:
3. How much do the v1 MiSeq chemistry reagents cost? Do you know how much more the v2 MiSeq chemistry reagents are likely to cost?
Originally posted by vs92: 4. Instead of using universal primers, is there merit in going with a set of different primers targeted to different loci? Not sure whether MiSeq permits multiplexing such primers, and if so what the practical maximum number of primer sets (max number of loci one can focus on) is.
Obviously, if you want to save time by not sequencing through your locus-specific PCR primers, then using custom sequencing primers would be desirable. But then you are stepping off the Illumina QC/QA path.
Currently you can use custom primers for read 1, the first index read, and read 2. There are 3 extra ports in the reagent cassette for you to add 600 µl of your custom primer at 0.5 µM. The MiSeq runs hotter than the HiSeq, so you want your Tms pretty high, around 65 °C.
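The loading numbers translate into a simple C1V1 = C2V2 dilution: 600 µl at 0.5 µM is 300 pmol of primer, which from a 100 µM stock (an assumed stock concentration) is 3 µl brought up to 600 µl. For the ~65 °C target, the sketch below also includes a rough Tm screen using the basic GC-content formula Tm = 64.9 + 41 × (nGC - 16.4) / N. That formula ignores salt, primer concentration and nearest-neighbour effects, so treat it only as a first filter and confirm candidates with a proper nearest-neighbour calculator; the function names and example primer are hypothetical.

```python
def stock_volume_ul(stock_um: float, final_um: float, final_volume_ul: float) -> float:
    """Volume of stock (uL) to dilute to final_volume_ul at final_um (C1V1 = C2V2)."""
    return final_um * final_volume_ul / stock_um


def basic_tm_c(primer: str) -> float:
    """Rough Tm in deg C from GC content (basic formula, primers > 13 nt).

    Ignores salt, primer concentration and nearest-neighbour effects.
    """
    seq = primer.upper()
    if len(seq) < 14:
        raise ValueError("basic formula is only meant for primers > 13 nt")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(seq)
```

For instance, `stock_volume_ul(100, 0.5, 600)` gives 3.0, and a 20-mer with 50% GC scores about 51.8 °C on the basic formula, well short of the target, which suggests lengthening the primer or raising its GC content.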
Originally posted by vs92: 5. Do you know how I can check for the compatibility of the TruSeqHT and NexteraXT indexes you have mentioned? I'm assuming that the bacteria in the stool samples I'll be analyzing are very similar to those recently published from the stool samples of healthy subjects in the Human Microbiome Project. If I can get 16x24 = 384 samples in 1 month, that would be fantastic! By the way, is the number 384 for 1 run or for the entire 1-month period? I thought a single run takes just ~1 day, so if one can do 384 different samples in a single run, shouldn't this be the throughput for one day itself? Perhaps I am missing something key here!
Originally posted by vs92: Also, I do have extra grant money to get the NexteraXT kit, so I should definitely be able to create "tagmentation" fragmented amplicons -- could you refer me to some literature on how to do this, and why this would add to the complexity of the downstream analysis? Thanks!
By more complicated, I mean that instead of all amplicons starting and ending at known positions you will have fragment libraries so your software pipeline needs to be able to deal with this.
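To make the "known start position" point concrete: in an amplicon library every read should begin with the locus-specific primer, so a pipeline can check for it and trim it, whereas reads from tagmented (Nextera XT) fragments start at arbitrary positions inside the amplicon and mostly fail that check. The sketch below matches a primer containing IUPAC degenerate bases against the start of a read. The primer in the usage note is the 515F sequence as commonly published for the V4 protocol cited later in this thread (GTGCCAGCMGCCGCGGTAA); verify it against the paper before relying on it. The function names and the one-mismatch default are assumptions.

```python
from typing import Optional

# IUPAC nucleotide codes -> the bases each one accepts
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(read: str, primer: str) -> int:
    """Mismatches between a (possibly degenerate) primer and the read start."""
    if len(read) < len(primer):
        return len(primer)  # too short to contain the primer at all
    return sum(base not in IUPAC[code] for base, code in zip(read, primer))


def trim_primer(read: str, primer: str, max_mismatch: int = 1) -> Optional[str]:
    """Strip the primer from the read start, or return None if it is absent."""
    if primer_mismatches(read, primer) > max_mismatch:
        return None
    return read[len(primer):]
```

For example, `trim_primer("GTGCCAGCAGCCGCGGTAA" + "TACGTAG", "GTGCCAGCMGCCGCGGTAA")` returns "TACGTAG" (the M matches A), while a read that does not start with the primer returns None and would need a different, position-agnostic handling path.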
Originally posted by vs92:
6. Very glad to note that your attempt to do the 16S (v3 loop) run on the MiSeq was positive and the analysis is ongoing - good luck on this front! What length of V3 loop did you guys go after? Was it as short as 30-40 nucleobases? Would love to know how you developed this approach.
7. "spiked in 50% genomic libraries as "ballast" into the same run": could you please refer me to some link or literature reference that describes why adding ballast to a run would increase the likelihood of producing a usable cluster?
Basically, Illumina instruments are designed to sequence randomly fragmented genomic libraries. Anything that departs from that ideal of randomness causes issues for their software. However, over the years they have gotten somewhat better at tolerating low diversity/higher bias. But it is always there.
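One way to see the problem described here is to tabulate, for each cycle, the fraction of clusters showing that cycle's most common base. In a pool of amplicons from one locus that fraction sits at or near 1.0 for many cycles; mixing in reads from a randomly fragmented genomic library (the "ballast") pulls it down toward 0.25. The sketch below uses toy sequences; the 50% spike-in mirrors the run described earlier in the thread, and the function names and simulated reads are hypothetical stand-ins, not real data.

```python
import random
from collections import Counter


def dominant_base_fraction(reads: list[str], n_cycles: int) -> list[float]:
    """Per cycle, the fraction of reads carrying that cycle's most common base.

    All reads must be at least n_cycles long.
    """
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads)
        fractions.append(counts.most_common(1)[0][1] / len(reads))
    return fractions


def random_reads(n: int, length: int, seed: int = 0) -> list[str]:
    """Stand-in for reads from a randomly fragmented genomic library."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n)]
```

With 1,000 identical amplicon reads every cycle scores 1.0; adding 1,000 random reads brings each cycle to roughly 0.625, which shows why a 50% spike-in helps but still leaves the run well short of genomic-like diversity.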
Originally posted by vs92: Also, why would having balanced indexes 6 and 12 help with demultiplexing (on 2 of the TruSeq 24)?
8. Thanks for your very helpful advice on Step Out PCR - will definitely do that! Could you give me your email address - I'd love to stay connected and potentially collaborate on my project.
Originally posted by vs92: Thanks!
--
Phillip
-
Hi guys
We have posted a blog post which discusses some of the issues relating to low-diversity amplicons on the MiSeq, and a useful workaround for improving performance:
Hope it is useful
-
16S sequencing on MiSeq
Hi All,
We are doing 16S sequencing on a MiSeq using the protocol in Caporaso et al., 2012 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3400413/). It works well: you simply add your own sequencing and index primers to the reagent cartridge. The sequencing and index primers are listed in the supplementary materials. This protocol targets the V4 region with the 515F/806R primers, which capture both bacterial and archaeal taxa. We are pooling 192 samples per run on our instrument and getting back on average 5,000 reads per sample. The cost per sample is about $12; the same data output cost about 10 times this when we used the 454. We use the 2x150 kit to cover amplicons of about 252 bp. We hope to use the 2x250 reads to look at fungal data in the near future as well.
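These figures allow a quick back-of-envelope plan for the original 300-samples-per-month question. Assuming they carry over unchanged (192 samples per 2x150 run, about 5,000 reads per sample, about $12 per sample), the sketch below gives two runs per month, about 960,000 sample-assigned reads per run, and roughly $3,600 per month. (For the read layout, 2x150 reads over a ~252 bp amplicon overlap by about 2 × 150 - 252 = 48 bases.) Your own yields and costs will differ; the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MonthPlan:
    runs: int            # MiSeq runs needed this month
    reads_per_run: int   # reads assigned to samples in one full run
    monthly_cost: float  # at the quoted per-sample cost


def plan_month(samples: int, samples_per_run: int, reads_per_sample: int,
               cost_per_sample: float) -> MonthPlan:
    """Back-of-envelope plan from per-sample figures."""
    return MonthPlan(
        runs=math.ceil(samples / samples_per_run),
        reads_per_run=samples_per_run * reads_per_sample,
        monthly_cost=samples * cost_per_sample,
    )
```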
Hope this isn't too late to help your project.
Andy