-
Damn, thanks Brian. I woke up this morning thinking that maybe I should try a NextSeq run instead of HiSeq 2000 for this chapter of my dissertation. It seemed like I might be able to get a slightly better assembly for the money, given the longer PE reads available. Now I don't think so.
-
The graphs I posted in this thread are from one NextSeq, but I have generated similar graphs from multiple libraries run on 3 independent NextSeq machines at 3 different facilities (one being Illumina), and they all look about the same.
-
@Brian: Are these results from one NextSeq or do you have an n of > 1?
-
Yow, that is not good news! I can't help but want to blame it on the two-color chemistry, even though I have no basis to do so.
Except -- I mean it still could be an indel issue -- if indels were more common with the NextSeq. What I would fear about this instrument would be bubbles in the flowcell. Seems like it would be hard to distinguish no signal (bubble) from no signal (G?). Although I have been assured that the two do look different.
Also, bubbles in the flowcell may be a HiSeq-only thing, I don't know that NextSeqs would have any.
--
Phillip
-
The alignments were done by an indel-capable aligner. That's not the problem. In fact, the actual quality scores are calculated separately for bases impacted by mismatches only and for bases impacted by indels or SNPs. Furthermore, the exact same analysis was done for HiSeq, MiSeq, and NextSeq, and NextSeq is the only one with the major quality issues.
Here, let me show you. These graphs were all generated by mapping after adapter-trimming the input reads. This is from a HiSeq2500, which shows low error rates and accurate (generally conservative) quality scores:
And this is from a NextSeq, which shows extremely high error rates and vastly inflated quality scores:
You can plainly see that something is very wrong without any mapping whatsoever, just by looking at the base frequency histogram:
Possibly, the high error rate is driven by the A/T ratio divergence, and thus due to a fundamental base-calling or dye-system issue, but I don't know. At any rate, the base frequency divergence, the inflated Q-scores, and the high error rates have now been seen on 3 different independent NextSeq platforms at 3 different facilities (ours, Illumina's, and one of our collaborators') with unrelated organisms and libraries. I have yet to see a NextSeq run from anywhere that did not exhibit these characteristics, but now that I have 3 independent confirmations, I don't really expect that I will see one.
The way I produced these graphs (starting with interleaved reads, and using BBTools):
bbduk.sh in=reads.fastq.gz out=trimmed.fq.gz ktrim=r k=23 hdist=1 mink=11 tpe tbo minlen=90 ref=truseq.fa.gz,nextera.fa.gz
bbmap.sh maxindel=200 in=trimmed.fq.gz mhist=mhist.txt bhist=bhist.txt qhist=qhist.txt qahist=qahist.txt
I encourage anyone who is unable to share their raw data to do the same, and share the histograms. Ideally, this would be done for the same library sequenced on both a NextSeq and a HiSeq/MiSeq, to eliminate as many variables as possible.
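For anyone who wants to sanity-check a base frequency histogram without BBTools, here is a minimal Python sketch of the underlying idea: tally A/C/G/T/N at each read position and convert the counts to fractions. It is not BBMap's implementation, and the function name `base_frequencies` and the toy reads are made up for illustration; real input would come from a FASTQ parser.

```python
from collections import Counter

def base_frequencies(reads):
    """Per read position, return the fraction of each base among reads covering it."""
    counts = []  # counts[i] tallies the bases seen at position i
    for seq in reads:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    table = []
    for c in counts:
        total = sum(c.values())
        table.append({b: c[b] / total for b in "ACGTN"})
    return table

# Toy reads (hypothetical data, only to show the output shape).
reads = ["ACGT", "ACGA", "TCGT", "AC"]
for pos, freqs in enumerate(base_frequencies(reads)):
    print(pos, " ".join(f"{b}={freqs[b]:.2f}" for b in "ACGTN"))
```

On a well-behaved library the A and T fractions should track each other closely along the read, so a widening gap between them is the kind of divergence described above.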
-
Sorry, no NextSeq data to discuss. But on the issue of quality values vs. empirical error rate -- always seemed to me this would highly depend on the alignment engine and the parameters used. Specifically how gaps (indels) were handled.
A single indel in a read results in nearly all the bases downstream of that indel being scored as "mismatch" unless a gap is introduced into the alignment.
Seems like how gaps are handled could easily explain what Illumina (and I) would call a Q37 base showing up as only Q30 in your analysis. Depending on how you did your alignments...
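The effect of a single indel on an ungapped comparison can be illustrated with a toy Python sketch. The reference string, the deletion position, and the helper name `count_mismatches` are all invented for the example, and this is not how any particular aligner scores reads; it only shows why gap handling matters for an empirical error rate.

```python
def count_mismatches(a, b):
    """Count position-by-position differences over the shared length of a and b."""
    return sum(1 for x, y in zip(a, b) if x != y)

# Hypothetical 32-base reference and a read that lacks one base (index 10).
ref = "ACGTTGCATGACCTGAAGTCCATGGATTACAG"
del_at = 10
read = ref[:del_at] + ref[del_at + 1:]

# Ungapped: everything after the deletion is shifted by one base.
ungapped = count_mismatches(ref, read)

# Gapped: compare the two flanks separately, skipping the deleted reference base.
gapped = (count_mismatches(ref[:del_at], read[:del_at])
          + count_mismatches(ref[del_at + 1:], read[del_at:]))

print(f"ungapped mismatches: {ungapped}")
print(f"gapped mismatches:   {gapped}")
```

With the gap in place, the read matches perfectly apart from the one deletion, while the ungapped comparison reports most of the downstream bases as mismatches.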
--
Phillip
-
Originally posted by GenoMax:
@nucacidhunder: All those appear to be "standard" (gold?) samples.
Brian: If PhiX standard does not look good then that is worrisome.
Our machine self-reports 87% of bases as having quality above 30, and therefore Illumina claims it is in-spec, but the true quality as measured by mapping for the highest-rated bases (claimed Q37) is only Q28, and the majority are much lower. In other words, bases the machine assigns Q37 are wrong 0.16% of the time rather than the claimed 0.02%, so their quality values are inflated by a factor of 8. In reality, 0% of the output is at least Q30, either from our machine or from Illumina's official PhiX data, which I used because they calibrate their machines on PhiX so it should represent the best case scenario.
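The arithmetic behind those percentages follows from the standard Phred definition, Q = -10 * log10(P). A minimal Python sketch, with helper names chosen just for this example:

```python
import math

def phred_to_error(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)

def error_to_phred(p):
    """Phred quality score implied by an error probability."""
    return -10 * math.log10(p)

claimed_q, measured_q = 37, 28  # values quoted in the post
claimed_p = phred_to_error(claimed_q)
measured_p = phred_to_error(measured_q)
inflation = measured_p / claimed_p

print(f"claimed Q{claimed_q}: {claimed_p:.2%} error")
print(f"measured Q{measured_q}: {measured_p:.2%} error")
print(f"error rate is {inflation:.1f}x the claimed rate")
```

A 9-point gap in Q corresponds to a factor of 10^0.9, which is where the roughly 8-fold inflation comes from.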
Does anyone have a different experience?
Last edited by Brian Bushnell; 12-04-2014, 09:30 PM.
-
Originally posted by GenoMax:
@nucacidhunder: All those appear to be "standard" (gold?) samples.
-
I already looked at this one: "NextSeq 500: TruSeq Nano 2x151 (PhiX)"
...and it's just as bad as ours. But thanks for the suggestion; I'll take a look at the others, as they may have used a different machine. Still, I'm kind of hoping for data from e.g. bryanbriney, as his machine seems to be producing data on par with HiSeq machines.
-
The following data are publicly available in BaseSpace:
NextSeq 500: TruSeq PCR Free WGS_RTA2.1.3.0 (NA12878)
NextSeq 500: TruSeq Nano 2x151 (PhiX)
NextSeq 500: RNA-Seq (8plex)
NextSeq 500: TruSight One (CEPH Trio replicates)
-
Do any of you have NextSeq data for something common (PhiX, E. coli, mouse, etc.) that you would be willing to share? Our NextSeq has consistently produced data of far lower quality than our HiSeq/MiSeq machines, and I'm trying to determine whether this is specific to the individual machine or not.
-
I can confirm that the most you can get out of a 75 cycle kit is currently 92 cycles (76|8|8). This means you can't use the dark cycles for sequencing like you can on the MiSeq. You can register the run in BaseSpace but you get an error message when you insert the cartridge and you can't bypass it.
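As a quick check of the cycle arithmetic, here is a small Python sketch. It assumes the limit applies to the total cycle count across reads and indexes, which is how the 92-cycle figure above reads; the helper names and the `MAX_CYCLES_75_KIT` constant are made up for the example and are not instrument settings.

```python
def total_cycles(layout):
    """Sum the cycles in a layout written like "76|8|8"."""
    return sum(int(part) for part in layout.split("|"))

def fits(layout, max_cycles):
    """True if the layout's total stays within the cap."""
    return total_cycles(layout) <= max_cycles

# Assumption: the cap is on total cycles, taken from 76|8|8 in the post.
MAX_CYCLES_75_KIT = 92

for layout in ("76|8|8", "47|6|47"):
    verdict = "fits" if fits(layout, MAX_CYCLES_75_KIT) else "exceeds the cap"
    print(f"{layout}: {total_cycles(layout)} cycles, {verdict}")
```

Under that assumption, the 47|6|47 layout asked about earlier in the thread adds up to 100 cycles and would be rejected.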
-
Originally posted by TonyBrooks:
Does this mean you can't do more than 75 cycles in a 75 cycle kit? I confirmed with tech support that there are 25 additional cycles for dual indexing and we were told we could use those 100 cycles as we wished. We were hoping to do 47|6|47 from the 75 cycle kit.
-
Originally posted by bryanbriney:
As of right now, the NextSeq doesn't support dual indexes or custom indexes. Dual indexing is reportedly in the pipeline, not sure about custom indexes. Also, you can't start a run that exceeds the stated capacity of the reagent kit. MiSeq throws an error if you try, but it can be bypassed; NextSeq won't let you continue.