A first look at Illumina’s new NextSeq 500

Elsie replied

03-03-2015, 05:39 PM
Thank you Brian, that is really helpful, and incredibly prompt. Thank you so much.
Brian Bushnell replied

03-03-2015, 05:38 PM
The only way around that is to assemble it first. If it's a bacterium and you have sufficient coverage, you can get a decent assembly in a few minutes with Velvet. BBMap will not work without an assembly, but it doesn't have to be a good assembly - a quick one with short contigs is fine for this purpose, as long as those contigs are several times longer than the read length.
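
A rough way to check the "several times the read length" condition on a draft assembly before indexing it is sketched below in Python. The function names and the factor of 3 are assumptions made for illustration; neither BBMap nor Velvet defines such a threshold.

```python
def contig_lengths(fasta_lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def fraction_of_bases_in_long_contigs(lengths, read_len, factor=3):
    """Fraction of assembled bases in contigs at least factor * read_len long.

    factor=3 is an arbitrary stand-in for "several times longer".
    """
    total = sum(lengths)
    if total == 0:
        return 0.0
    long_bases = sum(n for n in lengths if n >= factor * read_len)
    return long_bases / total
```

To use it, open the contigs FASTA written by the assembler, pass the file handle to contig_lengths, and pass the result plus your read length to fraction_of_bases_in_long_contigs. A value close to 1 suggests most of the assembly is in contigs long enough for this purpose.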
Elsie replied

03-03-2015, 05:36 PM
Thanks Brian. Unfortunately I do not have a reference for this sequence, so I'm assuming there is no way around this?
Brian Bushnell replied

03-03-2015, 05:35 PM
Hi Elsie,

You have to index the reference first. For example:

bbmap.sh ref=genome.fasta

Wait for that to finish, then map.

-Brian
Elsie replied

03-03-2015, 05:28 PM
Hi Brian,

Thanks so much for this. I am trying to repeat your commands above, using interleaved files, and I get this error. Can you help?
Thanks.

bbmap.sh maxindel=200 in=trimmed.fq.gz mhist=mhist.txt bhist=bhist.txt qhist=qhist.txt qahist=qahist.txt
java -Djava.library.path=/bbmap/jni/ -ea -Xmx43110m -cp /bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 maxindel=200 in=trimmed.fq.gz mhist=mhist.txt bhist=bhist.txt qhist=qhist.txt qahist=qahist.txt
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, maxindel=200, in=trimmed.fq.gz, mhist=mhist.txt, bhist=bhist.txt, qhist=qhist.txt, qahist=qahist.txt]

BBMap version 34.56
Set match histogram output to mhist.txt
Set base content histogram output to bhist.txt
Set quality histogram output to qhist.txt
Set quality accuracy histogram output to qahist.txt
Retaining first best site only for ambiguous mappings.
No output file.
Exception in thread "main" java.lang.RuntimeException: Can't find file ref/genome/1/summary.txt
at fileIO.ReadWrite.getRawInputStream(ReadWrite.java:815)
at fileIO.ReadWrite.getInputStream(ReadWrite.java:780)
at fileIO.TextFile.open(TextFile.java:277)
at fileIO.TextFile.<init>(TextFile.java:94)
at dna.Data.setGenome2(Data.java:839)
at dna.Data.setGenome(Data.java:785)
at align2.BBMap.loadIndex(BBMap.java:302)
at align2.BBMap.main(BBMap.java:32)
kentawan replied

01-27-2015, 07:13 PM
Originally posted by ymc

http://www.illumina.com/systems/next...ncer/kits.html

I find a NextSeq v2 kit here. Is it something new?

I just gave my local distributor a call. He said that this kit will be ready for shipment in February 2015. Pricing will be the same as the v1 kits!

Finally some hope for NextSeq 500 users!
ymc replied

01-21-2015, 11:39 PM
http://www.illumina.com/systems/nextseq-sequencer/kits.html

I find a NextSeq v2 kit here. Is it something new?
Brian Bushnell replied

01-13-2015, 10:31 AM
Originally posted by pmiguel

One possible trivial reason could be whether mismatches between an index read and the index sequence are allowed. HiSeq and MiSeq allow 1 mismatch by default. But we demultiplex off-instrument and allow zero mismatches.

--
Phillip

We are also allowing 0 mismatches in both cases (and typically end up with >20% of reads in the unknown bin, as a result). Right now our 2 leading candidate hypotheses are:

1) NextSeq has much lower cluster density;
2) NextSeq has a different order of {read1, read2, index1, index2, resynthesis} compared to HiSeq/MiSeq.
pmiguel replied

01-13-2015, 10:24 AM
Originally posted by Brian Bushnell

I would certainly not want to use it for low-coverage variant calling!

Incidentally, though, it seems the NextSeq platform may have a silver lining. Though all of the standard data quality metrics are much worse than HiSeq in my testing, it appears to have a drastically lower cross-contamination rate (reads from one library assigned to a different library) for dual-index pooled libraries, to the point that we are considering using NextSeq over HiSeq for projects in which index cross-contamination is more important than error rate. We are still investigating why the rate is lower.

One possible trivial reason could be whether mismatches between an index read and the index sequence are allowed. HiSeq and MiSeq allow 1 mismatch by default. But we demultiplex off-instrument and allow zero mismatches.

--
Phillip
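
To make the mismatch setting concrete, here is a minimal Python sketch of index-read assignment with a configurable mismatch allowance. It is not how any Illumina or off-instrument demultiplexer is implemented; the function names, barcodes, and library names are made up for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcodes, max_mismatches=0):
    """Return the sample whose barcode is within max_mismatches of index_read.

    Returns None (the "unknown" bin) if no barcode is close enough,
    or if two barcodes are equally close.
    """
    best = None
    best_dist = max_mismatches + 1  # anything at or above this is rejected
    ambiguous = False
    for sample, barcode in barcodes.items():
        dist = hamming(index_read, barcode)
        if dist < best_dist:
            best, best_dist, ambiguous = sample, dist, False
        elif dist == best_dist and dist <= max_mismatches:
            ambiguous = True
    return None if ambiguous else best


# Hypothetical barcodes, for illustration only.
barcodes = {"libA": "ACGTAC", "libB": "TTGCAA"}

print(assign_sample("ACGTAT", barcodes, max_mismatches=0))
print(assign_sample("ACGTAT", barcodes, max_mismatches=1))
```

The example read differs from the libA barcode at one position, so it lands in the unknown bin with zero mismatches allowed and is assigned to libA with one allowed. A stricter setting therefore trades a larger unknown bin for fewer reads assigned on the strength of a miscalled index base.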
Brian Bushnell replied

01-13-2015, 10:13 AM
Originally posted by aeonsim

The data quality from the NextSeq is substantially worse than that from a HiSeq, with substantially more errors, to the point where I'm not certain the data is usable for low- to medium-coverage whole-genome variant calling (1-20x).

I would certainly not want to use it for low-coverage variant calling!

Incidentally, though, it seems the NextSeq platform may have a silver lining. Though all of the standard data quality metrics are much worse than HiSeq in my testing, it appears to have a drastically lower cross-contamination rate (reads from one library assigned to a different library) for dual-index pooled libraries, to the point that we are considering using NextSeq over HiSeq for projects in which index cross-contamination is more important than error rate. We are still investigating why the rate is lower.
aeonsim replied

01-13-2015, 04:07 AM
So I've been getting some initial test data back from a NextSeq 500 and I'm really not happy with it compared to the HiSeq.

The data quality from the NextSeq is substantially worse than that from a HiSeq, with substantially more errors, to the point where I'm not certain the data is usable for low- to medium-coverage whole-genome variant calling (1-20x).

Attached are a number of PDFs showing the data compared to HiSeq data from the same facility (same experienced technicians doing all the sequencing, library prep and everything). We resequenced our HiSeq libraries (PCR-free, 550bp insert) to compare like with like, and you can clearly see the difference.

Two of the files show GATK's BQSR before-and-after plots for one of our typical HiSeq libraries (recalQC-randomHiseq.pdf) and a HiSeq library sequenced on the NextSeq (BQSR-NextSeq-Before-After.pdf). The difference is substantial, and while these are not the same library, the HiSeq one is representative of what we usually get.

The other two files show the same library with 4 lanes of NextSeq sequence vs. the same library sequenced on the HiSeq. You'll clearly be able to determine which comes from which machine (Nxt, Nxt, Nxt, Nxt, Hiseq).

Finally, here are some alignment stats from Picard tools for the same library sequenced twice on the NextSeq (two different runs) vs. the stats for the same library from the HiSeq, showing a 1-2% reduction in reads aligned and a ~80% increase in mismatch rate.

Seq         PCT_PF_READS_ALIGNED  PF_MISMATCH_RATE  PF_HQ_ERROR_RATE
NextSeq_R2  0.964464              0.022694          0.021512
NextSeq_R1  0.955108              0.025834          0.024588
HiSeq       0.973545              0.013678          0.013063
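
To put numbers on the comparison, here is a small Python sketch that recomputes the differences from the three rows above. The values are copied from the table; the variable and function names are only for illustration.

```python
# Values copied from the Picard table above.
stats = {
    "NextSeq_R2": {"aligned": 0.964464, "mismatch": 0.022694},
    "NextSeq_R1": {"aligned": 0.955108, "mismatch": 0.025834},
    "HiSeq": {"aligned": 0.973545, "mismatch": 0.013678},
}


def compare_to_baseline(stats, baseline="HiSeq"):
    """For each non-baseline run, return a tuple of:
    - the drop in aligned reads, in percentage points
    - the relative increase in mismatch rate, in percent
    """
    base = stats[baseline]
    result = {}
    for name, row in stats.items():
        if name == baseline:
            continue
        aligned_drop = (base["aligned"] - row["aligned"]) * 100
        mismatch_increase = (row["mismatch"] / base["mismatch"] - 1) * 100
        result[name] = (aligned_drop, mismatch_increase)
    return result


for run, (drop, increase) in compare_to_baseline(stats).items():
    print(f"{run}: {drop:.2f} points fewer reads aligned, "
          f"{increase:.0f}% higher mismatch rate")
```

With these values the two NextSeq runs come out roughly 0.9 and 1.8 percentage points lower in reads aligned, and roughly 66% and 89% higher in mismatch rate than the HiSeq run, which averages to a little under 80%.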

Now, the data isn't entirely unusable for WGS: if you have enough coverage you can still get variant calls out of it. However, they're likely to have a higher false-positive rate, and if you were looking for rare variants I would be very hesitant to use the data (especially for de novo mutations). For other uses this may be fine, but I only have experience with WGS and RNA-seq, so I'll leave that for others to decide.
Attached Files

BQSR-NextSeq-Before-After.pdf (343.9 KB, 145 views)

recalQC-randomHIseq.pdf (245.8 KB, 118 views)

nextSeq-Hiseq-Comp-web.pdf (392.8 KB, 152 views)

next-seq-hiseq-comp2.pdf (614.4 KB, 117 views)
Innovelty replied

12-16-2014, 10:43 AM
Originally posted by GenoMax

You can (get the long reads)

Provided you have access to the right HiSeq 2500. One can now do 2 x 250 PE runs.

One can... provided one has access to the machine, and more than a tiny pilot grant to work with. Sadly, when one works on non-model insects for non-agricultural/biomedical purposes, and one is only a wee third-year, one might only have ~$2500 to spend on the run itself. (Not that one is complaining. One is really super pleased about that.)

Enough of my de-railing, though -- really looking forward to updates from TonyBrooks, because my application is a de novo transcriptome project, primarily interested in the gene expression. Thanks one and all.
Brian Bushnell replied

12-16-2014, 10:31 AM
Originally posted by TonyBrooks

The question is whether the error is systematic or random. Random error can be somewhat compensated for by a decent sequence depth.

Yep, I plan to plot the error rate across a genome and see if I can see some kind of pattern, but I have not had time to do that yet.

I'll attempt to post the QC from the Illumina PhiX we sequenced during training.

That would be great!
TonyBrooks replied

12-16-2014, 07:43 AM
Data quality is definitely inferior to both the MiSeq and HiSeq.
It's quick, though, and perhaps better suited to counting applications such as RNA-Seq and ChIP-Seq than to variant calling.
The question is whether the error is systematic or random. Random error can be somewhat compensated for by a decent sequence depth.

I'll attempt to post the QC from the Illumina PhiX we sequenced during training.
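
On the random-versus-systematic point, here is a back-of-the-envelope Python sketch of why depth helps with random error. It assumes errors are independent between reads and that a miscalled base is equally likely to be any of the three wrong bases. Both are simplifications, and the two error rates are stand-ins roughly in the range of the mismatch rates reported elsewhere in this thread, not measured per-base error rates.

```python
from math import comb


def prob_false_alt(depth, error_rate, min_alt_reads):
    """Probability that at least min_alt_reads out of depth reads show the
    same specific wrong base at one site purely through random error."""
    p = error_rate / 3  # chance that a read shows one particular wrong base
    return sum(
        comb(depth, k) * p**k * (1 - p) ** (depth - k)
        for k in range(min_alt_reads, depth + 1)
    )


for depth in (5, 20, 40):
    # Illustrative rule: wrong base in at least 20% of reads, and at least 2 reads.
    threshold = max(2, -(-depth // 5))  # ceiling of depth / 5
    for rate in (0.013, 0.025):
        prob = prob_false_alt(depth, rate, threshold)
        print(f"depth={depth} rate={rate} min_alt={threshold} p={prob:.2e}")
```

Under these assumptions the chance of random errors piling up into something that looks like a variant falls steeply as depth increases. The independence assumption is exactly what breaks down with systematic error, where the same wrong base recurs at the same site, so extra depth does not help in that case.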
GenoMax replied

12-16-2014, 06:33 AM
You can (get the long reads)

Provided you have access to the right HiSeq 2500. One can now do 2 x 250 PE runs.
