  • GenoMax
    replied
    Originally posted by Brian Bushnell:
    In my latest test, NovaSeq only had a 4-5% duplication rate.
    The important point is that JGI probably made VERY GOOD quality libraries. With patterned FCs, clean libraries (with just the right sized inserts, zero primers and dimers) are critical to minimizing these issues. Since we are talking about "B"illions of reads, losing some during dedupe should not cause a major loss. 2D barcoding seems essential (perhaps it should be made mandatory).


  • Brian Bushnell
    replied
    I calculated 8000 PPM of index swapping (cross-contamination) for our NovaSeq run with single indexes, and 120 PPM for dual indexes, when allowing zero barcode mismatches.
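To put the PPM figures above in more familiar terms, here is a minimal Python sketch that converts them to percentages and to an expected number of misassigned reads. The function names and the 1-billion-read run size are illustrative assumptions, not values from the run described.

```python
def ppm_to_fraction(ppm):
    """Convert parts-per-million to a plain fraction."""
    return ppm / 1_000_000

def expected_misassigned(total_reads, ppm):
    """Expected number of reads assigned to the wrong sample."""
    return total_reads * ppm_to_fraction(ppm)

single_ppm, dual_ppm = 8000, 120      # figures quoted in the post
total_reads = 1_000_000_000           # hypothetical run size (assumption)

for label, ppm in (("single index", single_ppm), ("dual index", dual_ppm)):
    print(f"{label}: {ppm_to_fraction(ppm):.3%} of reads, "
          f"~{expected_misassigned(total_reads, ppm):,.0f} reads per billion")

# Fold reduction in swapping when moving from single to dual indexes
print(f"reduction: ~{single_ppm / dual_ppm:.0f}x")
```

On these numbers, dual indexing cuts the measured swap rate from 0.8% to 0.012% of reads.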


  • cement_head
    replied
    Slightly off-topic, but related: INDEX swapping on patterned flow cells...


  • Brian Bushnell
    replied
    Originally posted by GenoMax:
    @cement_head: See if this blog post helps.
    As usual, GenoMax has the perfectly appropriate link...

    In my latest test, NovaSeq only had a 4-5% duplication rate. That's using our own NovaSeq data rather than external data. Overall not a huge problem, though it's certainly worth removing. I'm not sure why the number is lower than in my previous tests on external data, which indicated >12%; possibly the chemistry got better. (Edit - I should note that this run used lots of libraries from different organisms multiplexed together, which reduces the apparent duplication rate, but makes it more accurate. That should not be relevant to such a huge discrepancy, though.)

    This run was extremely high quality (average 99.6% identity to the reference, or ~Q24) so duplicates were easy to detect. I'm really quite impressed with NovaSeq quality. It's unfortunate that there are only 4 quality scores, but CalcTrueQuality seems to do a good job of recalibrating them to the full range of 0-41, yielding a 0.04 average deviation from the correct quality, down from 1.1 on the raw data. 1.1 is still really good (better than the HiSeq 2500 I compared it to), but having only 4 quality scores makes many operations like trimming and merging less accurate.

    It's actually very impressive that NovaSeq managed, with 4 quality scores, to get better quality score accuracy than HiSeq 2500. I've drawn a couple of conclusions from this: 1) The HiSeq quality score algorithm is terrible. And 2) NovaSeq is calibrated for successful runs only and cannot produce correct quality scores if there are any anomalies (e.g., if there is a lighting failure producing no signal, it will still output really high quality scores even though all the data is wrong). With our previous unsuccessful run (there was a lighting failure), the average deviation from the correct quality was ~20 (2 orders of magnitude).
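For readers checking the ~Q24 figure: Phred quality is defined as Q = -10 * log10(error probability), so 99.6% identity (0.4% error) lands at about Q24, and a deviation of 20 Q units corresponds to a 100-fold difference in error probability. A minimal Python sketch of that conversion (the function names are illustrative and are not part of CalcTrueQuality or any other tool):

```python
import math

def identity_to_phred(identity):
    """Phred-scaled quality implied by a fractional identity (e.g. 0.996)."""
    error_rate = 1.0 - identity
    return -10.0 * math.log10(error_rate)

def phred_to_error(q):
    """Error probability implied by a Phred quality score."""
    return 10.0 ** (-q / 10.0)

# 99.6% identity to the reference
print(f"Q = {identity_to_phred(0.996):.1f}")

# A 20-point gap in Q is a 100-fold gap in error probability
print(f"fold difference = {phred_to_error(20) / phred_to_error(40):.0f}")
```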
    Last edited by Brian Bushnell; 07-14-2017, 05:25 AM.


  • cement_head
    replied
    Originally posted by GenoMax:
    @cement_head: See if this blog post helps.
    Okay. Thanks - that was really helpful. We're tilting towards ALWAYS doing PE RNA-Seq and using UMIs. Doesn't solve every problem, but I think it reduces a lot of issues.


  • GenoMax
    replied
    @cement_head: See if this blog post helps.


  • cement_head
    replied
    Forgive this really basic question, but what causes the duplicates on patterned flow cells, as opposed to the older HiSeq2500 approach? Is this due to the density of the clusters and the likelihood of a library molecule detaching and then re-attaching a short distance away? Also, how is this different from a PCR duplicate? Is there any way to tell other than spatial relatedness (prediction based on XY locale)?
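As a rough illustration of what "spatial relatedness" could mean in practice, the sketch below flags identical reads that sit close together on the same tile, using the lane, tile, x and y fields of Illumina-style read names. This is a toy example under stated assumptions, not how any particular dedupe tool works: the read names are made up, the distance threshold is a hypothetical value, and exact sequence matching ignores sequencing errors.

```python
from collections import defaultdict
from itertools import combinations

def parse_coords(read_name):
    """Return (lane, tile, x, y) from a name like
    @instrument:run:flowcell:lane:tile:x:y <comment>."""
    fields = read_name.lstrip("@").split()[0].split(":")
    lane, tile, x, y = (int(v) for v in fields[3:7])
    return lane, tile, x, y

def nearby_duplicates(reads, max_dist=100):
    """Flag later copies of an identical sequence found within max_dist
    (hypothetical threshold, in flow cell coordinate units) on the same tile."""
    groups = defaultdict(list)
    for name, seq in reads:
        lane, tile, x, y = parse_coords(name)
        groups[(seq, lane, tile)].append((name, x, y))
    flagged = set()
    for members in groups.values():
        for (_, x1, y1), (name2, x2, y2) in combinations(members, 2):
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= max_dist ** 2:
                flagged.add(name2)
    return flagged

# Made-up read names for illustration only
reads = [
    ("@A00001:1:HXXXXXXXX:1:1101:1000:2000 1:N:0:ACGT", "ACGTACGT"),
    ("@A00001:1:HXXXXXXXX:1:1101:1030:2040 1:N:0:ACGT", "ACGTACGT"),  # close by
    ("@A00001:1:HXXXXXXXX:1:1101:9000:9000 1:N:0:ACGT", "ACGTACGT"),  # far away
    ("@A00001:1:HXXXXXXXX:1:1102:1000:2000 1:N:0:ACGT", "ACGTACGT"),  # other tile
]
flagged = nearby_duplicates(reads)
print(flagged)
```

Identical reads that are far apart or on different tiles are not flagged here; those are more plausibly duplicates that arose before clustering, such as PCR duplicates, though position alone cannot prove either origin.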


  • austinso
    replied
    Originally posted by pmiguel:
    Okay I take your point, but an S2 should produce 3 billion clusters per flowcell, whereas a HiSeq 2500 produces about 1.6 billion with v3 chemistry. So the NovaSeq is about 4x less efficient than the HiSeq 2500 in this regard.

    A NextSeq produces about 0.4 billion clusters per flowcell. So, the relative efficiencies would be:

    (I'm using PF clusters per flowcell / ~number of input amplicon molecules)
    HiSeq2500v3 = 1.6/7 = 23%
    NextSeq = 0.4/1.4 = 29%
    NovaSeqS2 = 3/90 = 3.3%

    So, it absolutely looks like a much lower efficiency of clustering on the NovaSeq. (Anyone know if this is also the case for the HiSeq3000/4000?)
    Re: 3000/4000
    From what I could glean, based on the published specs (which are really vague, perhaps on purpose), the amount of library loaded ranges between 3-9 billion.

    The yield is 0.75 billion to ??? billion (I think those who use these should chime in; it is not clear whether the total yields stated are per flow cell or for both flow cells).

    Mind you, the % efficiencies (as you've defined them) are way better than the MiSeq (0.3-0.4%) and the MiniSeq (1-5%).

    That said, how much difference will this make for most runs? If you use the standard HiSeq2500 method, you start with 10ul of a 2nM library pool for denaturation. Since it gets diluted down to 20 pM (at least) you end up with 1 ml for each denaturation you do. One denaturation could be used to cluster all 8 lanes of the flowcell. But how often does that happen?

    For us, I can't think of a single case where we have clustered more than 2-3 lanes per denatured sample pool. Usually it is 8 sample pools for 8 lanes.

    There are cases where the amount of library produced is limiting. And the NovaSeq would not be a good choice where this is your critical parameter.

    So in most cases I would say the major limitations of the NovaSeq are being forced from 8 lanes to 1 lane and losing the flexibility to run a much smaller flowcell (the rapid chemistry 2-lane flow cells).

    Illumina expects you to just buy a NextSeq to deal with the 2nd issue above. That would be okay (for some definitions of "okay") if they hadn't just decided all the NextSeqs should now have the ability to scan their microarrays. But the option is there.

    Then there are the data issues considered in this thread. But I'm pretty sure that is something Illumina can fix (as they had for a period of time with the NextSeq, just after they introduced the v2 version of its chemistry/software) if they focus their attention on it.
    I'm not sure that they can improve the % efficiency...it seems like ~30% is about the best you can recover in reads. This would explain why you need more library to get more reads in the NovaSeq.

    Mind you 30% is not bad...it is an interesting threshold when you think about occupancy in space.

    Cheers, A.


  • austinso
    replied
    Originally posted by misterc:
    Is 150ul of a 1nM library what Illumina recommends for a single S2 flow cell?!?
    Apparently for all of them. And that is the lower end (attached see pg. 16).
    Attached Files


  • GW_OK
    replied
    I don't know if you can truly compare efficiencies of the ExAmp chemistry with the other instruments.

    On the HiSeq and NextSeq instruments you are randomly clustering across the flowcell with a good correlation between how much DNA you load and how many clusters are produced.

    On the ExAmp instruments there are only a fixed number of wells in which clusters can be formed. Additionally, you have to deal with the duplicates coming out of those wells and those duplicates that are formed in solution prior to the library going onto the flowcell.

    I think what Illumina is trying to do with ExAmp is saturate the array as fully as is practical.

    No argument, though, about the loss of flexibility with the NovaSeq. In its current iteration it's not something useful for an all-comers core lab.


  • pmiguel
    replied
    Originally posted by austinso:
    On another note:

    150 uL of a 1 nM library (~90 billion molecules) minimum for loading is a lot of library when you consider you can get by with 1.4 billion for the NextSeq and 7 billion for the HiSeq.

    FWIW...
    Okay I take your point, but an S2 should produce 3 billion clusters per flowcell, whereas a HiSeq 2500 produces about 1.6 billion with v3 chemistry. So the NovaSeq is about 4x less efficient than the HiSeq 2500 in this regard.

    A NextSeq produces about 0.4 billion clusters per flowcell. So, the relative efficiencies would be:

    (I'm using PF clusters per flowcell / ~number of input amplicon molecules)
    HiSeq2500v3 = 1.6/7 = 23%
    NextSeq = 0.4/1.4 = 29%
    NovaSeqS2 = 3/90 = 3.3%

    So, it absolutely looks like a much lower efficiency of clustering on the NovaSeq. (Anyone know if this is also the case for the HiSeq3000/4000?)
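The efficiency figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic as defined in the post (PF clusters per flowcell divided by approximate input molecules, both in billions); the dictionary and function names are illustrative.

```python
# (PF clusters per flowcell, input molecules), both in billions, as quoted above
platforms = {
    "HiSeq2500v3": (1.6, 7.0),
    "NextSeq": (0.4, 1.4),
    "NovaSeqS2": (3.0, 90.0),
}

def clustering_efficiency(pf_clusters, input_molecules):
    """Fraction of loaded molecules that end up as PF clusters."""
    return pf_clusters / input_molecules

efficiencies = {
    name: clustering_efficiency(pf, loaded)
    for name, (pf, loaded) in platforms.items()
}

for name, eff in efficiencies.items():
    print(f"{name}: {eff:.1%}")

# Relative efficiency of the HiSeq 2500 versus the NovaSeq S2 on these inputs
ratio = efficiencies["HiSeq2500v3"] / efficiencies["NovaSeqS2"]
print(f"HiSeq2500v3 / NovaSeqS2: {ratio:.1f}x")
```

With these inputs, the HiSeq 2500 to NovaSeq S2 ratio comes out close to 7.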

    That said, how much difference will this make for most runs? If you use the standard HiSeq2500 method, you start with 10ul of a 2nM library pool for denaturation. Since it gets diluted down to 20 pM (at least) you end up with 1 ml for each denaturation you do. One denaturation could be used to cluster all 8 lanes of the flowcell. But how often does that happen?
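The 1 ml figure follows from the usual dilution relation C1 * V1 = C2 * V2. A minimal sketch of that arithmetic (the function name is illustrative, and it ignores any extra volume added during denaturation):

```python
def diluted_volume_ul(start_volume_ul, start_conc_pm, final_conc_pm):
    """Final volume after diluting to final_conc_pm, from C1*V1 = C2*V2."""
    return start_volume_ul * start_conc_pm / final_conc_pm

# 10 ul of a 2 nM (2000 pM) pool diluted to 20 pM
final_ul = diluted_volume_ul(10, 2000, 20)
print(f"{final_ul:.0f} ul ({final_ul / 1000:.1f} ml)")
```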

    For us, I can't think of a single case where we have clustered more than 2-3 lanes per denatured sample pool. Usually it is 8 sample pools for 8 lanes.

    There are cases where the amount of library produced is limiting. And the NovaSeq would not be a good choice where this is your critical parameter.

    So in most cases I would say the major limitations of the NovaSeq are being forced from 8 lanes to 1 lane and losing the flexibility to run a much smaller flowcell (the rapid chemistry 2-lane flow cells).

    Illumina expects you to just buy a NextSeq to deal with the 2nd issue above. That would be okay (for some definitions of "okay") if they hadn't just decided all the NextSeqs should now have the ability to scan their microarrays. But the option is there.

    Then there are the data issues considered in this thread. But I'm pretty sure that is something Illumina can fix (as they had for a period of time with the NextSeq, just after they introduced the v2 version of its chemistry/software) if they focus their attention on it.

    --
    Phillip


  • misterc
    replied
    Is 150ul of a 1nM library what Illumina recommends for a single S2 flow cell?!?


  • austinso
    replied
    On another note:

    150 uL of a 1 nM library (~90 billion molecules) minimum for loading is a lot of library when you consider you can get by with 1.4 billion for the NextSeq and 7 billion for the HiSeq.

    FWIW...
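The ~90 billion figure can be checked by converting volume and molar concentration to a molecule count with Avogadro's number. A minimal sketch (the function name is illustrative):

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def molecules_loaded(volume_ul, conc_nm):
    """Number of molecules in volume_ul microlitres at conc_nm nanomolar."""
    moles = (volume_ul * 1e-6) * (conc_nm * 1e-9)  # litres * mol/litre
    return moles * AVOGADRO

n = molecules_loaded(150, 1)  # 150 uL of a 1 nM library
print(f"{n / 1e9:.1f} billion molecules")
```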


  • GenoMax
    replied
    Originally posted by misterc:
    Does anyone have even a lane's worth of these new .cbcl files from a NovaSeq? I'd like to test our bioinformatics pipeline with the new bcl2fastq converter v.2.19 that supports NovaSeq.
    Illumina does not appear to have made the input files for NovaSeq data available on BaseSpace. Just the outputs.


  • misterc
    replied
    Does anyone have even a lane's worth of these new .cbcl files from a NovaSeq? I'd like to test our bioinformatics pipeline with the new bcl2fastq converter v.2.19 that supports NovaSeq.
