  • miSeq and Exome sequencing

    Hi all,
    We are looking to buy our first NGS machine for our lab, to be used from the beginning of 2012, and I am in charge of recommending a suitable machine for our needs. My lab works on the human genome and cytogenetics.
    We have a limited budget and our idea is to get Illumina's MiSeq (we are also considering the Roche GS Junior System, but we are leaning towards the MiSeq).

    As I come from a computer science background and this is the first time I have ventured into the world of NGS machines (before, I was only involved in the analysis part), I have a few questions and would be grateful if anyone could answer them or point me to something to read:

    1. One of the applications we want to run is exome sequencing. I have read that the MiSeq cannot do whole genome sequencing unless the genome is very small. What about the exome? Can I use the TruSeq Exome Enrichment kit with the DNA sample preparation kit to prepare our libraries and sequence them on the MiSeq?

    2. I read in this forum that the MiSeq can produce 1 Gb of 2x150 bp reads and that for a 50 Mb exome capture that would be 20x coverage. Can someone explain to me how these numbers are calculated? Given a capture library of X Mb and paired-end reads of 150 bp each, how do I know how much coverage I will get? Also, if I want at least an average of 50x coverage at each position of the exome, how do I calculate how much data I need and what read length I should set the MiSeq to produce for the best outcome? Could you direct me to a publication or something else I can read to get these notions and numbers clear in my head? I need to understand them to be able to assess NGS machines.

    3. Can I create libraries of different parts of the exome to be resequenced on the MiSeq so that I get better coverage at each position?

    4. Does anyone know the cost of the MiSeq machine and how much a single run will cost?

    Thank you in advance

  • #2
    The MiSeq is actually a lot more expensive than a HiSeq in terms of per-base cost. I don't know what the cost of the instrument is relative to the HiSeq. If it's similar or you plan to do a lot of sequencing, it might be better to get the HiSeq. The real advantage of the MiSeq is the turnaround time (1 day vs. ~2 weeks).

    In terms of coverage:

    50 Mb exome x 50x coverage = 2.5 Gb of sequence. So if you did 2x150 PE reads, you would need (2.5 x 10^9)/300 = 8.33 million reads. HOWEVER, with exome sequencing you will not get every base pair on target and you will not get completely even coverage of the whole exome. Three reasons for this:

    1. The average human exon I think is 125 bp or so. If you have a 2x150 paired end read, that is 300 bp of sequence for an exon that is only 125 bp. Right there you are losing over half of your data.

    2. The way whole exome capture works means some of your reads will align to random places on the genome. You can probably assume 65% of your reads will align to the exome, maybe more.

    3. For other technical reasons, you won't get even coverage of each exon. So you might need to sequence to an average depth of 70x or so to really get almost all of the bases at least 50x.

    Bottom line: the MiSeq is not going to give you enough reads to analyze an exome without running the same sample multiple times, and it's a lot more expensive per base than the HiSeq, so I would definitely look into this more.
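The arithmetic in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for the example, "reads" are counted the way the post counts them (one 2x150 pair = 300 bp), and only the on-target fraction (point 2) and the higher mean depth (point 3) are modelled. The loss from read pairs that are longer than the exon (point 1) is not included.

```python
def required_bases(target_bp, mean_depth, on_target_fraction=1.0):
    """Total bases to sequence so the target reaches the given mean depth.

    on_target_fraction is the share of sequenced bases that land on the
    target (1.0 means a perfect capture, which never happens in practice).
    """
    return target_bp * mean_depth / on_target_fraction


def required_read_pairs(total_bases, read_len):
    """Number of paired-end reads (pairs) needed to produce total_bases."""
    return total_bases / (2 * read_len)


# Idealised case from the post: 50 Mb exome, 50x, every base on target.
ideal_bases = required_bases(50e6, 50)
ideal_pairs = required_read_pairs(ideal_bases, 150)

# Rougher case using the post's own guesses: 70x mean depth so that most
# bases reach 50x, and 65% of reads on target.
realistic_bases = required_bases(50e6, 70, on_target_fraction=0.65)
realistic_pairs = required_read_pairs(realistic_bases, 150)

print(f"ideal:     {ideal_bases / 1e9:.2f} Gb, {ideal_pairs / 1e6:.2f} M pairs")
print(f"realistic: {realistic_bases / 1e9:.2f} Gb, {realistic_pairs / 1e6:.2f} M pairs")
```

With these assumptions the requirement roughly doubles compared with the idealised 2.5 Gb figure, which is why the bottom line above is that one run is unlikely to be enough.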

    Comment


    • #3
      Originally posted by Heisman View Post
      The MiSeq is actually a lot more expensive than a HiSeq in terms of per-base cost. I don't know what the cost of the instrument is relative to the HiSeq. If it's similar or you plan to do a lot of sequencing, it might be better to get the HiSeq. The real advantage of the MiSeq is that turn around time (1 day vs. ~2 weeks).
      The MiSeq is also a much cheaper piece of equipment. It depends whether you have a large enough capital budget to purchase a HiSeq. Also, the MiSeq is very easy to run, requires no additional equipment (i.e. cBot or analysis server) and has analysis software on board. The MiSeq has been designed for targeted capture and resequencing, 16s metagenomics and small genomes. For anything else, I would recommend forging connections with centres that have HiSeqs.
      GSJuniors are a bit cheaper to buy, but relatively very expensive to run (per base). I would also consider the Ion Torrent, as they are rapidly improving the technology, scaling up read number and increasing read length all the time.
      The Ion Torrent is cheaper than the MiSeq to buy, cheaper to run, and will hopefully hit 10 million 400 bp (modal) reads within a year.


      Originally posted by Heisman View Post
      50 Mb exome x 50x coverage = 2.5 GB of sequence. So if you did 2x150 PE reads, you would need (2.5x(10^9))/300 = 8.33 million reads. HOWEVER, with exome sequencing you will not get every base pair on target and you will not get completely even coverage of the whole exome. Three reasons for this:

      1. The average human exon I think is 125 bp or so. If you have a 2x150 paired end read, that is 300 bp of sequence for an exon that is only 125 bp. Right there you are losing over half of your data.

      2. The way whole exome capture works means some of your reads will align to random places on the genome. You can probably assume 65% of your reads will align to the exome, maybe more.

      3. For other technical reasons, you won't get even coverage of each exon. So you might need to sequence to an average depth of 70x or so to really get almost all of the bases at least 50x.

      Bottom line: the MiSeq is not going to give you enough reads to analyze an exome without running the same sample multiple times, and it's a lot more expensive per base than the HiSeq, so I would definitely look into this more.
      Agreed here. Purely on a cost basis, it's a very inefficient way to sequence an exome. If it were absolutely necessary, there's no reason why it couldn't be done; it's just time consuming and expensive. Remember, you generally need >40X coverage to correctly call SNPs, the Illumina Exome kits enrich 62 Mb, and enrichment only runs at 65-70%. All this means you'd probably need to do at least three runs per exome.

      UK (list) price of the MiSeq is just under £85k. I'm not 100% sure on prices, but I think our rep stated it'd be about £500 per run (I assume that's the 50bp kit).

      Comment


      • #4
        Illumina is heavily discounting the MiSeq reagents for new customers. Your price per run is over twice the discount price.

        Comment


        • #5
          Thought I'd chime in here with some real-world results from our MiSeqs where it may aid this discussion.

          Right out of the box, we're getting 8-9 million 2x150 reads (at cluster densities of ~1100K/mm^2), translating to ~2.5 Gb. List price on the reagents/flow cell is pretty close to $950, I believe (for the 300-cycle kit), but discounted prices are a fair amount lower.

          The GSJr and PGM are not even close to competitive in this space (cost, throughput, or ease of use), unless you have armies of people to keep them fed, AND you absolutely need longer than 150bp reads (GSJr).
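As a rough cross-check of the yield quoted in post #5, the sketch below converts a number of 2x150 clusters into bases and then asks how many such runs an exome would take. The function names are hypothetical, 8.5 million is simply the midpoint of the quoted 8-9 million, and the exome figures (62 Mb target, 40x treated as a mean depth, 65% enrichment) are borrowed from post #3. Uneven coverage and overlapping read pairs are ignored, so the result is a lower bound at best.

```python
import math


def run_yield_bases(read_pairs, read_len):
    """Bases produced by one run of paired-end reads."""
    return read_pairs * 2 * read_len


def runs_needed(target_bp, mean_depth, on_target_fraction, yield_per_run):
    """Whole runs required to reach mean_depth on the target."""
    total_bases = target_bp * mean_depth / on_target_fraction
    return math.ceil(total_bases / yield_per_run)


# Midpoint of the 8-9 million 2x150 reads reported in post #5.
yield_per_run = run_yield_bases(8.5e6, 150)

# Exome assumptions taken from post #3.
n_runs = runs_needed(62e6, 40, 0.65, yield_per_run)

print(f"yield per run: {yield_per_run / 1e9:.2f} Gb")
print(f"runs needed:   {n_runs}")
```

Changing `yield_per_run` to whatever a given kit actually delivers shows how sensitive the run count is to that one number.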

          Comment


          • #6
            Thank you all for your answers and helpful explanations. All very useful information. I will look into Ion Torrent as well, but I like the ease of use that the MiSeq provides, and according to my group we will rarely require whole exome sequencing; instead we will target smaller genomic regions associated with syndromes and diseases for resequencing. I will therefore ask about the precise lengths of the regions of interest, but I believe a single MiSeq run will suffice.
            Also, for MiSeq users: do you happen to know if the Agilent or NimbleGen exome enrichment kits are compatible with the MiSeq, or should I stick with Illumina's TruSeq?
            If you have in mind a helpful publication about library preparation and designing an NGS run, please share. Thanks

            Comment


            • #7
              I know this is an old thread, but for those who come across it looking for pricing estimates: be aware that Illumina's pricing for instruments and reagents varies by a large amount across the world. Often, USA and UK prices are not applicable in other countries. The pricing is not necessarily linked to the exchange rate either. Our list price for the 300b kit is more than 25% higher than your quoted US prices, even though our dollar is at parity (or better).

              Cheers,

              Scott.

              Comment


              • #8
                The MiSeq can now do 2x150 bp at 4.5-5.1 Gb. Does that mean it can now do an exome in one run?

                Comment


                • #9
                  No, see the description above regarding exon size. Our exome libraries usually have 150-200 bp inserts and are run PE-76. Take the yield you get from the MiSeq at the shorter PE length, not PE-150 or PE-250. Those are good for small genomes or targeted studies but not yet for whole exome.

                  I had been told to expect MiSeq V3 to be able to do an exome, but if the rumor mill is to be believed, I wouldn't be surprised to see a 3rd instrument come out that fits the niche between MiSeq and HiSeq to compete with Proton. If you could have a MidSeq that just ran the two-lane rapid flow cell and not the HT mode, like a slimmed-down, Rapid-only HiSeq 1500, that would be the logical Dx instrument over the MiSeq. Whole exome for germline and 200-gene, Foundation Med-sized panels for cancer studies are the most common applications.

                  Comment


                  • #10
                    Hello everyone, sorry to necro an old thread..

                    One thing I don't understand is why the MiSeq can't do exome sequencing. When you say exome sequencing, are you referring to all exomes in the human genome, or to a single exome or two in a certain gene?

                    Are the limitations we are talking about in terms of cost only? Or technical issues? Library preparation issues? What exactly?

                    Let's assume I want to sequence a certain exome in gene (X). Can't I design primers flanking that exome and "resequence" it on the MiSeq? Surely I can?

                    Comment


                    • #11
                      Originally posted by a.obeidat View Post
                      Hello everyone, sorry to necro an old thread..

                      One thing I don't understand that is why Miseq cant do exome sequencing? when you are saying exome sequencing, are you referring to all exomes in the human genome or a single or 2 exomes in a certain gene.

                      Are the limitation we are talking about in terms of costs only? or in terms of technical issues? library preparation issues? what exactly??

                      Lets assume I want to sequence a certain exome in gene (X), cant I design primer flank that exome and "resequence" it via Miseq?? of course I can?
                      You seem to be confusing two related terms - exons and exomes. Exons are the coding regions of genes while exomes are all of the exons present in a genome. Put another way, the exome is the protein-coding portion of the genome.

                      MiSeq doesn't have any problem sequencing exons or exomes. The issue is that a single run doesn't have quite enough coverage for a full human exome. This is because (as stated above), exons are relatively small, and the increased output from MiSeqs has primarily come in the form of longer reads. To get good coverage on a MiSeq, you might have to run two chips instead of one. That's why the HiSeq and now the NextSeq are probably better choices from Illumina. If you prefer Ion Torrent, the Proton P1 would be the way to go.

                      If you're interested, we have summaries of the various sequencing platforms and list out which applications each is best suited for on our NGS Knowledge Bank.
                      AllSeq - The Sequencing Marketplace
                      [email protected]
                      www.AllSeq.com

                      Comment


                      • #12
                        AllSeq,

                        Thanks for the correction, I was half asleep when I wrote that post

                        Regarding the coverage, can't you increase the depth of sequencing and still stay within the limit of the 15 Gb output?

                        According to post #2, even at 150x you would need 7.5 Gb of data, which is 25 million reads if using 2x150 (I'm not sure what the maximum number of reads is for a MiSeq flow cell).

                        If the above is plausible, won't 150x coverage be enough to align your exomes and call your SNPs with confidence?

                        Sorry in advance if I am talking rubbish but I am sort of new to this
                        Last edited by a.obeidat; 07-15-2014, 01:22 PM.

                        Comment


                        • #13
                          It's because the 2X150 reads don't double your coverage in this case. If the insert size were 300b or more, it would be fine. However, human exons are only about 150b long, so the 2X150 read would just read the exact same molecule twice (once from either end). You could use that info to bump up the read quality a bit (by checking each read against the other), but you can't use it to increase the read depth. Most exomes are sequenced to ~100X coverage to look for rare variants (i.e., variants in a small subpopulation of the cells used to prepare the library). Reading the same exact molecule twice doesn't help you look for rare events. I hope that explanation helps a bit.
                          AllSeq - The Sequencing Marketplace
                          [email protected]
                          www.AllSeq.com
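The overlap argument in post #13 can be put into numbers with a small sketch. The function names are hypothetical, and the model is deliberately simple: a fragment of `insert_len` bases is read from both ends, adapters are ignored, and any base read by both mates counts only once towards depth.

```python
def unique_bases_per_pair(insert_len, read_len):
    """Fragment bases covered at least once by a read pair."""
    return min(insert_len, 2 * read_len)


def redundant_fraction(insert_len, read_len):
    """Share of sequenced fragment bases that only re-read the other mate."""
    # Each mate can read at most the whole fragment.
    sequenced = 2 * min(read_len, insert_len)
    return 1 - unique_bases_per_pair(insert_len, read_len) / sequenced


for insert_len, read_len in [(150, 150), (300, 150), (200, 76)]:
    print(
        f"insert {insert_len} bp, 2x{read_len}: "
        f"{unique_bases_per_pair(insert_len, read_len)} unique bp, "
        f"{redundant_fraction(insert_len, read_len):.0%} redundant"
    )
```

Under this model a 150 bp insert read 2x150 yields 150 unique bases for 300 sequenced, whereas a 300 bp insert, or the 150-200 bp inserts run PE-76 described in post #9, loses nothing to overlap.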

                          Comment


                          • #14
                            But longer reads give a better distribution of coverage so you might get to the same % of bases covered 30x with less average coverage. Of course this does not help you find mosaicisms (my definition of a rare variant would be a variant found once in a 1000 or whatever people) as AllSeq already explained.

                            Comment


                            • #15
                              Thanks for this great thread. I am new to NGS, so excuse my ignorance.

                              I have some questions. As indicated above, one of the main problems for exome sequencing is the relatively small exon size (on average 125 bp). But if we used the MiSeq v3 150-cycle kit, that is 2x75 paired end (if I understood that correctly), then we would not lose a lot of data, because we would not be sequencing more than the insert size. Problem one checked, right?

                              Since we are using v3 kits, I don't think the 15 Gb output will be a problem.

                              The second issue from the discussion above is not getting even enough coverage across all exons, and the solution on the MiSeq is to run the sample multiple times to get enough coverage on every exon to call variants with confidence. My question is: what extra do the NextSeq and HiSeq instruments have that gives a better distribution across the exome (assuming no output or read limitations on the MiSeq)? I also read somewhere that on the HiSeq you can run the sample twice on the same flow cell. Is this the reason it's better, or is it something else? I'm not really sure about the NextSeq flow cell configuration, so your input here would be helpful.

                              A possible counter to the above issue (if I understood it correctly): the Illumina technote "Optimizing Coverage for Targeted Resequencing" explains coverage and enrichment and gives an example (page 4; you might have to look at it to follow my logic below):

                              Let's assume I want 20x desired coverage with a mean normalized coverage of 0.2. The required mean coverage is then 20/0.2 = 100x.

                              The total amount of sequencing required is then 62 Mb of targeted bases x 100x mean coverage / 0.65 enrichment efficiency, i.e. (62 x 100)/0.65, which comes to around 9.5 Gb (less than the 15 Gb MiSeq maximum output).

                              Does that make sense, or am I just talking rubbish?

                              If the above is correct, doesn't this save you from running the exome more than once?

                              Thanks in advance
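The two-step calculation in post #15 can be written out as below. This only reproduces the poster's arithmetic. The function names are made up for the example, the 15 Gb ceiling is the figure quoted in the thread rather than a verified specification, and the comparison assumes every sequenced base is usable (it ignores the overlapping-pair issue raised earlier in the thread).

```python
def mean_coverage_required(desired_coverage, mean_normalized_coverage):
    """Mean depth needed so that regions at the given normalized
    coverage still reach the desired depth."""
    return desired_coverage / mean_normalized_coverage


def total_sequencing_bases(target_bp, mean_coverage, enrichment_efficiency):
    """Total bases to sequence given the target size, the mean depth and
    the fraction of bases that fall on target."""
    return target_bp * mean_coverage / enrichment_efficiency


mean_cov = mean_coverage_required(20, 0.2)               # 20 / 0.2
total = total_sequencing_bases(62e6, mean_cov, 0.65)     # (62 Mb x 100) / 0.65

quoted_max_output = 15e9  # 15 Gb, as quoted in the thread (assumption)

print(f"mean coverage required: {mean_cov:.0f}x")
print(f"total sequencing:       {total / 1e9:.2f} Gb")
print(f"fits in one run:        {total <= quoted_max_output}")
```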

                              Comment
