NovaSeq from Illumina


  • #76
    I don't know if you can truly compare efficiencies of the ExAmp chemistry with the other instruments.

    On the HiSeq and NextSeq instruments you are randomly clustering across the flowcell with a good correlation between how much DNA you load and how many clusters are produced.

    On the ExAmp instruments there are only a fixed number of wells in which clusters can be formed. Additionally, you have to deal with the duplicates coming out of those wells and those duplicates that are formed in solution prior to the library going onto the flowcell.

    I think what Illumina is trying to do in ExAmp is saturate the array as practically as possible.

    No argument, though, about the loss of flexibility with the NovaSeq. In its current iteration it's not something useful for an all-comers core lab.


    • #77
      Originally posted by misterc
      Is 150ul of a 1nM library what Illumina recommends for a single S2 flow cell?!?
      Apparently for all of them. And that is the lower end (attached see pg. 16).


      • #78
        Originally posted by pmiguel
        Okay I take your point, but an S2 should produce 3 billion clusters per flowcell, whereas a HiSeq 2500 produces about 1.6 billion with v3 chemistry. So the NovaSeq is about 4x less efficient than the HiSeq 2500 in this regard.

        A NextSeq produces about 0.4 billion clusters per flowcell. So, the relative efficiencies would be:

        (I'm using PF clusters per flowcell / ~number of input amplicon molecules)
        HiSeq2500v3 = 1.6/7 = 23%
        NextSeq = 0.4/1.4 = 29%
        NovaSeqS2 = 3/90 = 3.3%

        So, it absolutely looks like a much lower efficiency of clustering on the NovaSeq. (Anyone know if this is also the case for the HiSeq3000/4000?)
        Re: 3000/4000
        From what I could glean, based on the published specs (which are really vague, perhaps on purpose), the amount of library loaded ranges between 3-9 billion.

        The yield is 0.75 billion to ??? billion (I think those that use these should chime in, it is not clear that the total yields stated are per flow cell or for both flow cells).

        Mind you the % efficiencies (as you've defined) are way better than the MiSeq (0.3-0.4%) and the MiniSeq (1-5%)
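For anyone who wants to rerun the numbers, here is a minimal Python sketch of the efficiency definition quoted above (PF clusters per flowcell divided by input molecules). The function names are mine, not from any Illumina tool, and the only inputs are the figures already posted in this thread. It also checks that 150 ul of a 1 nM library works out to roughly 90 billion molecules.

```python
AVOGADRO = 6.022e23  # molecules per mole


def molecules_loaded(volume_ul, conc_nm):
    """Number of library molecules in volume_ul microlitres at conc_nm nanomolar."""
    moles = (volume_ul * 1e-6) * (conc_nm * 1e-9)
    return moles * AVOGADRO


def clustering_efficiency(pf_clusters, input_molecules):
    """PF clusters divided by input molecules, expressed as a percentage."""
    return 100.0 * pf_clusters / input_molecules


# 150 ul of a 1 nM library, the S2 loading discussed in this thread
novaseq_input = molecules_loaded(150, 1.0)

# (PF clusters, input molecules), both in billions, as quoted in the thread
runs = {
    "HiSeq2500v3": (1.6, 7.0),
    "NextSeq": (0.4, 1.4),
    "NovaSeqS2": (3.0, 90.0),
}
efficiencies = {
    name: clustering_efficiency(clusters, molecules)
    for name, (clusters, molecules) in runs.items()
}

print(f"NovaSeq input: {novaseq_input / 1e9:.1f} billion molecules")
for name, eff in efficiencies.items():
    print(f"{name}: {eff:.1f}%")
```

The same two functions can be reused for any other instrument once you know the loading volume, loading concentration and PF cluster count.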

        That said, how much difference will this make for most runs? If you use the standard HiSeq2500 method, you start with 10ul of a 2nM library pool for denaturation. Since it gets diluted down to 20 pM (at least) you end up with 1 ml for each denaturation you do. One denaturation could be used to cluster all 8 lanes of the flowcell. But how often does that happen?
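The dilution arithmetic in that paragraph is just C1*V1 = C2*V2. A minimal sketch, with a helper name of my own choosing:

```python
def diluted_volume_ul(stock_volume_ul, stock_conc_pm, final_conc_pm):
    """Final volume (ul) after diluting a stock to final_conc_pm, from C1*V1 = C2*V2."""
    return stock_volume_ul * stock_conc_pm / final_conc_pm


# 10 ul of a 2 nM (2000 pM) pool diluted down to 20 pM
final_ul = diluted_volume_ul(10, 2000, 20)
print(f"{final_ul:.0f} ul")  # 1000 ul, i.e. 1 ml
```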

        For us, I can't think of a single case where we have clustered more than 2-3 lanes per denatured sample pool. Usually it is 8 sample pools for 8 lanes.

        There are cases where the amount of library produced is limiting. And the NovaSeq would not be a good choice where this is your critical parameter.

        So in most cases I would say the major limitations of the NovaSeq are being forced from 8 lanes to 1 lane, along with losing the flexibility to run a much smaller flowcell (such as the 2-lane rapid chemistry flow cells).

        Illumina expects you to just buy a NextSeq to deal with the 2nd issue above. That would be okay (for some definitions of "okay") if they hadn't just decided all the NextSeqs should now have the ability to scan their microarrays. But the option is there.

        Then there are the data issues considered in this thread. But I'm pretty sure that is something Illumina can fix (as they had for a period of time with the NextSeq, just after they introduced the v2 version of its chemistry/software) if they focus their attention on it.
        I'm not sure that they can improve the % efficiency...it seems like ~30% is about the best you can recover in reads. This would explain why you need more library to get more reads in the NovaSeq.

        Mind you 30% is not bad...it is an interesting threshold when you think about occupancy in space.

        Cheers, A.


        • #79
          Forgive this really basic question, but what is the cause of the duplicates on patterned flow cells as opposed to the older HiSeq2500 approach? Is this due to the density of the clusters and the likelihood of a library molecule detaching and then re-attaching a short distance away? Also, how is this different from a PCR duplicate? Is there any way to tell other than spatial relatedness (prediction based on XY locale)?
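To make the "spatial relatedness" idea concrete, here is a rough Python sketch of how one might flag duplicates that sit close together on the flow cell. It assumes the standard Illumina FASTQ header layout (instrument:run:flowcell:lane:tile:x:y), treats only exact sequence matches on the same lane and tile as candidates, and uses an arbitrary distance cutoff (max_dist) that I picked purely for illustration. The read names and sequences are made up. It is not how any particular dedupe tool works.

```python
import math
from collections import defaultdict


def parse_header(header):
    """Return (lane, tile, x, y) from an Illumina-style FASTQ header."""
    fields = header.lstrip("@").split(" ")[0].split(":")
    lane, tile, x, y = (int(v) for v in fields[3:7])
    return lane, tile, x, y


def nearby_duplicates(reads, max_dist=2500.0):
    """Return pairs of headers with identical sequence, same lane and tile,
    and XY distance <= max_dist (max_dist is an assumed, illustrative cutoff)."""
    by_seq = defaultdict(list)
    for header, seq in reads:
        by_seq[seq].append(header)

    pairs = []
    for headers in by_seq.values():
        for i in range(len(headers)):
            for j in range(i + 1, len(headers)):
                lane_a, tile_a, xa, ya = parse_header(headers[i])
                lane_b, tile_b, xb, yb = parse_header(headers[j])
                same_tile = (lane_a, tile_a) == (lane_b, tile_b)
                if same_tile and math.hypot(xa - xb, ya - yb) <= max_dist:
                    pairs.append((headers[i], headers[j]))
    return pairs


# Made-up reads: two identical sequences close together, one far away on another tile
reads = [
    ("@A00001:1:HXXXX:1:1101:1000:2000 1:N:0:ACGT", "ACGTACGT"),
    ("@A00001:1:HXXXX:1:1101:1100:2050 1:N:0:ACGT", "ACGTACGT"),
    ("@A00001:1:HXXXX:1:2205:9000:9000 1:N:0:ACGT", "ACGTACGT"),
    ("@A00001:1:HXXXX:1:1101:1005:2003 1:N:0:ACGT", "TTTTGGGG"),
]
pairs = nearby_duplicates(reads)
print(len(pairs))
```

Identical reads that are not flagged as nearby (the third read here) would be the candidates for ordinary PCR duplicates under this simple model.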


          • #80
            @cement_head: See if this blog post helps.


            • #81
              Originally posted by GenoMax
              @cement_head: See if this blog post helps.
              Okay. Thanks - that was really helpful. We're tilting towards ALWAYS doing PE RNA-Seq and using UMIs. Doesn't solve every problem, but I think it reduces a lot of issues.


              • #82
                Originally posted by GenoMax
                @cement_head: See if this blog post helps.
                As usual, GenoMax has the perfectly appropriate link...

                In my latest test, NovaSeq only had a 4-5% duplication rate. That's using our own NovaSeq data rather than external data. Overall not a huge problem, though it's certainly worth removing. I'm not sure why the number is lower than in my previous tests on external data, which indicated >12%; possibly the chemistry got better. (Edit - I should note that this run used lots of libraries from different organisms multiplexed together, which reduces the apparent duplication rate, but makes it more accurate. That should not be relevant to such a huge discrepancy, though.)

                This run was extremely high quality (average 99.6% identity to the reference, or ~Q24) so duplicates were easy to detect. I'm really quite impressed with NovaSeq quality. It's unfortunate that there are only 4 quality scores, but CalcTrueQuality seems to do a good job of recalibrating them to the full range of 0-41, yielding a 0.04 average deviation from the correct quality, down from 1.1 on the raw data. 1.1 is still really good (better than the HiSeq 2500 I compared it to), but having only 4 quality scores makes many operations like trimming and merging less accurate. It's actually very impressive that NovaSeq managed, with 4 quality scores, to get better quality score accuracy than HiSeq 2500. I've drawn a couple of conclusions from this: 1) The HiSeq quality score algorithm is terrible. And 2) NovaSeq is calibrated for successful runs only and cannot produce correct quality scores if there are any anomalies (e.g., if there is a lighting failure producing no signal, it will still output really high quality scores even though all the data is wrong). With our previous unsuccessful run (there was a lighting failure), the average deviation from the correct quality was ~20 (2 orders of magnitude).
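For reference, the identity-to-Q conversion used above is the usual Phred relation, Q = -10 * log10(error rate). A minimal sketch (helper names are mine) that reproduces the ~Q24 figure and shows why a deviation of ~20 Q units corresponds to 2 orders of magnitude in error probability:

```python
import math


def identity_to_phred(identity):
    """Phred-scaled quality for a given fraction of matching bases."""
    error_rate = 1.0 - identity
    return -10.0 * math.log10(error_rate)


def phred_to_error(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10.0)


q = identity_to_phred(0.996)
print(f"99.6% identity is about Q{q:.1f}")

# A shift of 20 Q units changes the implied error probability 100-fold
fold_change = phred_to_error(0) / phred_to_error(20)
print(f"{fold_change:.0f}x")
```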
                Last edited by Brian Bushnell; 07-14-2017, 05:25 AM.


                • #83
                  Slightly off-topic, but related: INDEX swapping on patterned flow cells...

                  https://sequencing.qcfail.com/articl...uddle-samples/


                  • #84
                    I calculated 8000 PPM of index swapping (cross-contamination) for our NovaSeq run with single indexes, and 120 PPM for dual indexes, when allowing zero barcode mismatches.


                    • #85
                      In my latest test, NovaSeq only had a 4-5% duplication rate.
                      The important point is JGI probably made VERY GOOD quality libraries. With patterned FCs, having clean libraries (with just the right sized inserts, zero primers and dimers) is critical to minimizing these issues. Since we are talking about "B"illions of reads, losing some during dedupe should not cause a major loss. 2D barcoding seems essential (perhaps should be made mandatory).


                      • #86
                        Originally posted by Brian Bushnell
                        I calculated 8000 PPM of index swapping (cross-contamination) for our NovaSeq run with single indexes, and 120 PPM for dual indexes, when allowing zero barcode mismatches.
                        What went into that 8000 PPM (0.8%) calculation Brian? I mean, did you just count the number of swaps in a dual unique indexed run?

                        Anyone checked that figure for a HiSeq 2500 run? I know no one is complaining about index hopping on that instrument or a MiSeq, but it would happen at some rate.

                        --
                        Phillip


                        • #87
                          Originally posted by GenoMax
                          The important point is JGI probably made VERY GOOD quality libraries. With patterned FCs, having clean libraries (with just the right sized inserts, zero primers and dimers) is critical to minimizing these issues. Since we are talking about "B"illions of reads, losing some during dedupe should not cause a major loss. 2D barcoding seems essential (perhaps should be made mandatory).
                          From what I'm hearing, the NovaSeq doesn't have the major issues with amplicon lengths that the HiSeq4000 and X do. The NovaSeq is spec'ed to run 550bp no PCR DNA libraries, unlike the HiSeq patterned flowcell instruments.

                          --
                          Phillip


                          • #88
                            Originally posted by Brian Bushnell
                            I calculated 8000 PPM of index swapping (cross-contamination) for our NovaSeq run with single indexes, and 120 PPM for dual indexes, when allowing zero barcode mismatches.
                            What is PPM?


                            • #89
                              Originally posted by cement_head
                              What is PPM?
                              Parts Per Million.

                              --
                              Phillip


                              • #90
                                Originally posted by pmiguel
                                What went into that 8000 PPM (0.8%) calculation Brian? I mean, did you just count the number of swaps in a dual unique indexed run?

                                Anyone checked that figure for a HiSeq 2500 run? I know no one is complaining about index hopping on that instrument or a MiSeq, but it would happen at some rate.

                                --
                                Phillip
                                The 8000 PPM was single-indexed. This was not an ideal test, but there were a few E.coli isolate libraries multiplexed with various other things (a lot of Chlamy, and various bacterial single-cells). Also, some were dual indexed and some were single-indexed, in the same run, and for whatever reason demultiplexing was done with only 6bp of the barcode for the single-indexed libraries rather than all 8 (allowing zero mismatches). So I'm not really sure what the rates would be in an ideal test environment. That said, for the reads that came out as this particular E.coli library, I concatenated all references for everything being sequenced together and ran:

                                Code:
                                seal.sh in=reads.fq stats=stats.txt ambig=toss clearzone=10
                                Everything hitting E.coli was considered correct, and everything hitting anything else was considered contamination. For the dual-indexed test I used a P.heparinus single-cell library with similar methodology.
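As a rough illustration of the arithmetic only: once you have per-reference read counts, the PPM figure is just off-target reads over total assigned reads, times one million. The sketch below assumes the counts have already been pulled out of the stats output into a dict (I am not reproducing the stats file format here), and the counts themselves are made-up numbers chosen so the result lands on 8000 PPM.

```python
def contamination_ppm(hits_by_reference, expected):
    """Reads assigned to anything other than `expected`, in parts per million
    of all assigned reads."""
    total = sum(hits_by_reference.values())
    if total == 0:
        raise ValueError("no assigned reads")
    off_target = total - hits_by_reference.get(expected, 0)
    return 1e6 * off_target / total


# Made-up counts for illustration only (not from the run discussed here)
example_counts = {"E.coli": 992_000, "Chlamy": 7_500, "other": 500}

ppm = contamination_ppm(example_counts, "E.coli")
percent = ppm / 1e4  # 10,000 PPM = 1%
print(f"{ppm:.0f} PPM = {percent:.1f}%")
```

Reads tossed as ambiguous are not in the counts at all, so they affect neither the numerator nor the denominator.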

                                I also tested a HiSeq run of the same E.coli library and calculated a 7 PPM contamination rate, but that's not really credible since I don't know what else was present on the plate in that run, so I don't necessarily have the correct references (though there was definitely some Chlamy present). In the past I've seen various rates of cross contamination in HiSeq 2500 (<1PPM to >1000PPM) and it's actually quite hard to consistently reproduce the same numbers on different runs. The cross contamination comes from various sources, including physical contamination, though I think we've eliminated physical cross contamination in our current processes. NextSeq has generally yielded lower rates of cross contamination compared to HiSeq 2500, so we use that for our multiplexed single cells even though the quality is lower than HiSeq.
                                Last edited by Brian Bushnell; 07-14-2017, 09:34 AM.
