Tech Summary: Illumina's Solexa Sequencing Technology

Joe Petrosino replied

12-21-2011, 09:18 AM
Hi all,

I have ChIP-Seq data from a Solexa run in three formats:
1) Sequence.bam
2) Sequence.txt
3) Export.txt

What are the differences between these formats?
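
For anyone who wants a quick way to tell such files apart, here is a rough sketch, not an official Illumina tool. It assumes, as was typical for pipeline output of this era, that sequence.txt is FASTQ-like text whose records start with '@', that export.txt is tab-delimited text with one read per line, and that a .bam file is compressed binary beginning with the gzip magic bytes 1f 8b. The function names guess_format and guess_file and the tab-count threshold are made up for illustration.

```python
def guess_format(first_bytes: bytes) -> str:
    """Guess a read-file format from the first bytes of the file.

    Heuristic only: the labels are assumptions, not a format validator.
    """
    if first_bytes[:2] == b"\x1f\x8b":
        # BAM is BGZF-compressed, so it starts with the gzip magic number.
        return "bam (compressed binary alignments)"
    first_line = first_bytes.split(b"\n", 1)[0]
    if first_line.startswith(b"@"):
        # FASTQ-style record: '@' header, sequence, '+', qualities.
        return "fastq-like (reads plus qualities)"
    if first_line.count(b"\t") >= 5:
        # Tab-delimited, one read per line, as in export-style files.
        return "tab-delimited export (reads plus alignment columns)"
    return "unknown"


def guess_file(path: str) -> str:
    """Apply the same heuristic to the first 4 KB of a file on disk."""
    with open(path, "rb") as handle:
        return guess_format(handle.read(4096))
```

In short, under these assumptions: the sequence file holds raw reads and qualities, the export file adds per-read alignment columns as text, and the BAM file holds alignments in compressed binary form.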
vaibhavvsk replied

11-17-2011, 03:54 AM
Nice Info.
ngsseq replied

10-31-2011, 11:05 PM
Thanks, ECO, it's useful for me, a beginner.
059 replied

10-08-2011, 11:54 PM
Thanks very much!
Qiuting replied

08-21-2011, 07:52 PM
Originally posted by Aaron Cooper

Nobody ever answered the original question, so here goes:

The fluorescent bases are 3'-O-azido dNTPs with fluorophores linked to the bases. The azide group on the 3' O blocks addition of another nucleotide, and you can use a phosphine (TCEP) to chemically cleave the azide from the 3' O and allow the next nucleotide to be added. I believe TCEP also cleaves the fluorophore from the base.

It's very clever, but unfortunately 3'-O-azido dNTPs are not available commercially. They would be fun to play with.

Hi! I'm a little curious about the function of TCEP. Is it possible to use TCEP to cleave 6-FAM from a nucleotide and at the same time allow the next nucleotide to be added?
Monk replied

07-14-2011, 08:44 AM
Hi,
This is my first post here. Thank you very much, this is very useful.

Originally posted by ECO

Illumina's $600 million acquisition of Solexa in November 2006 gave the company a head start in the next generation sequencing market.

Here I present a brief overview of Solexa's sequencing-by-synthesis chemistry. The sample prep methods differ slightly from those used in ABI's SOLiD system, but the basic goals are the same: generate large numbers of unique "polonies" (polymerase-generated colonies) that can be sequenced simultaneously. These parallel reactions occur on the surface of a "flow cell" (basically a water-tight microscope slide), which provides a large surface area for many thousands of parallel chemical reactions.

Step 1: Sample Preparation

The DNA sample of interest is sheared to appropriate size (average ~800bp) using a compressed air device known as a nebulizer. The ends of the DNA are polished, and two unique adapters are ligated to the fragments. Ligated fragments of the size range of 150-200bp are isolated via gel extraction and amplified using limited cycles of PCR.

Complete detailed protocols for DNA and small RNA library preparation can be found in the documents attached to this post ("dna_libe_prep.pdf" and "rna_libe_small_prep.pdf", respectively). This is a fairly straightforward multi-step molecular biology process; however, there are many pitfalls that can skew results downstream.

Steps 2-6: Cluster Generation by Bridge Amplification

In contrast to the 454 and ABI methods, which use a bead-based emulsion PCR to generate "polonies", Illumina uses a unique "bridged" amplification reaction that occurs on the surface of the flow cell.

The flow cell surface is coated with single-stranded oligonucleotides that correspond to the sequences of the adapters ligated during the sample preparation stage. Single-stranded, adapter-ligated fragments are bound to the surface of the flow cell and exposed to reagents for polymerase-based extension. Priming occurs as the free/distal end of a ligated fragment "bridges" to a complementary oligo on the surface.

Repeated denaturation and extension results in localized amplification of single molecules in millions of unique locations across the flow cell surface. This process occurs in what is referred to as Illumina's "cluster station", an automated flow cell processor.
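
As a back-of-the-envelope illustration of why repeated denaturation and extension turns one bound molecule into a dense cluster, the sketch below assumes that each cycle at best doubles the number of strands at a location. The function name, the per-cycle efficiency value, and the cycle counts are invented for illustration and are not Illumina specifications.

```python
def strands_after(cycles: int, efficiency: float = 1.0, start: int = 1) -> float:
    """Expected strands in one cluster after a number of bridge cycles.

    Each cycle multiplies the strand count by (1 + efficiency), where
    efficiency is the assumed fraction of strands copied per cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start * (1.0 + efficiency) ** cycles


# Compare perfect doubling with an assumed 50% per-cycle efficiency.
for n in (5, 10, 20):
    print(n, strands_after(n), round(strands_after(n, efficiency=0.5)))
```

Even with imperfect efficiency the growth is geometric, which is why a modest number of cycles is enough to produce a cluster bright enough to image.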

Steps 7-12: Sequencing by Synthesis

A flow cell containing millions of unique clusters is now loaded into the 1G sequencer for automated cycles of extension and imaging.

The first cycle of sequencing consists first of the incorporation of a single fluorescent nucleotide, followed by high resolution imaging of the entire flow cell. These images represent the data collected for the first base. Any signal above background identifies the physical location of a cluster (or polony), and the fluorescent emission identifies which of the four bases was incorporated at that position.

This cycle is repeated, one base at a time, generating a series of images each representing a single base extension at a specific cluster. Base calls are derived with an algorithm that identifies the emission color over time. At this time reports of useful Illumina reads range from 26-50 bases.
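
To make the idea of deriving base calls from emission color concrete, here is a toy base caller. It is only a sketch: it assumes each cycle yields four intensities per cluster (one per base) and simply picks the brightest channel above a background threshold. Real pipelines also have to deal with effects such as channel crosstalk and phasing, which are not modeled here. The channel order, threshold, and intensity values are all invented.

```python
CHANNELS = "ACGT"  # assumed channel order, for illustration only


def call_base(intensities, background=100.0):
    """Return the base whose channel is brightest, or 'N' if none exceeds background."""
    if len(intensities) != len(CHANNELS):
        raise ValueError("expected one intensity per channel")
    best = max(range(len(CHANNELS)), key=lambda i: intensities[i])
    return CHANNELS[best] if intensities[best] > background else "N"


def call_read(cycles, background=100.0):
    """Call one base per imaging cycle for a single cluster."""
    return "".join(call_base(c, background) for c in cycles)


# Four imaging cycles for one cluster; the numbers are invented.
cluster = [
    (900.0, 50.0, 40.0, 60.0),
    (30.0, 20.0, 850.0, 45.0),
    (40.0, 35.0, 30.0, 50.0),  # nothing above background
    (25.0, 700.0, 60.0, 55.0),
]
print(call_read(cluster))  # prints AGNC
```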

The use of physical location to identify unique reads is a critical concept for all next generation sequencing systems. The density of the reads and the ability to image them without interfering noise is vital to the throughput of a given instrument. Each platform has its own unique issues that determine this number: 454 is limited by the number of wells in its PicoTiterPlate, Illumina is limited by the fragment length that can effectively "bridge", and all providers are limited by flow cell real estate.

Hopefully that serves as a brief introduction to the technology! If I have made any errors or omissions, please feel free to correct me by posting here!
jinxinhao1988 replied

07-07-2011, 06:18 PM
It is very useful for us, thank you. This is my first post here. I hope that we can share our sequencing experience here.
dongshenglulv replied

05-14-2011, 03:04 AM
Originally posted by Jonathan

When beginning the sequencing, first a cleaving step is carried out,
removing the sequences bound with adapter2 to the cell-surface.

For paired-end sequencing, after having reached the desired cycle-count the synthesized strand is removed, another step of bridge amplification is carried out, followed by a cleaving of sequences bound with adapter1 to the cell surface. Thus leaving the `other'/reverse strand bound to the flow cell for sequencing...

Anything else?

Did you mean that the difference between PE and SE is that PE sequencing runs the 'desired cycle-count' twice? Also, did you mean that the flow cell is not disposable, so we can reuse it for another sample in PE sequencing? I'm new to sequencing, thanks.

P.S. The first base (and the next n bases) imaged comes from the adapter; is it necessary to remove such fragments from the fastq file generated by the GA?
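
To make the quoted paired-end description concrete, here is a toy model of how the two reads of a pair relate to the original fragment. It is a sketch under a simplifying assumption: read 1 covers the first n bases of one strand, and read 2 covers the first n bases of the reverse strand (the reverse complement). Adapters and sequencing primers are left out, and the fragment sequence and function names are invented.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def paired_reads(fragment: str, cycles: int):
    """Simulate read 1 and read 2 of a pair, each `cycles` bases long."""
    read1 = fragment.upper()[:cycles]
    read2 = reverse_complement(fragment)[:cycles]
    return read1, read2


fragment = "ACGTTGCAAGGCTTAACCGG"  # invented 20 bp insert
r1, r2 = paired_reads(fragment, cycles=6)
print(r1, r2)  # prints ACGTTG CCGGTT
```

In this model the same cluster is read twice, once from each end, so the cycle count is indeed run once per read.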
krobison replied

05-12-2011, 05:00 AM
Originally posted by vtosha

What about these articles? Whole genome, transcriptome, exome?
What genome size? How many groups work on poplar sequencing?
Best results for de novo assembly are from 454, of course. But for resequencing, why not use paired-end reads?

Search "novo AND transcriptome AND (illumina OR solexa)" -- that's currently 18 papers to get you started. Probably many more which just don't quite fit the search terms.
krobison replied

05-12-2011, 04:58 AM
Originally posted by Bioinfo

hi all,
Does anyone know of Illumina data downloadable from any published papers?
many thanks

Look in the Sequence Read Archive (SRA) at NCBI (while it still exists) or its European counterpart, the European Nucleotide Archive (ENA); there are huge amounts of data there. There is an R interface that lets you run SQL queries on the SRA, which beats the NCBI interface for queries; I don't know of a similar one for ENA (we'll definitely need one once SRA shuts down!).

(if we ever start a FAQ, these would be obvious items to put there)
krobison replied

05-12-2011, 02:46 AM
This is great!

A couple of questions.

Are the HiSeq numbers per flowcell?

Looking at the two HiSeq columns (1K & 2K), the data per run is 750M reads vs. 1000M (but if per flowcell, why different?)

Looking at reagent cost, they are both at $12K/run

BTW, shouldn't PacBio be more like 0.02M reads & 0.040 Gb for yield, not 2.94M reads & 2.94 Gb yield? (As I've stated publicly, it's hard to really nail those numbers down for PacBio, but these are more likely in the right ballpark.) Run time should probably be more like 0.08, as with the PGM.

For PGM, should you have a column per chip? With the 314, the reads are somewhere in the 100K-200K per run. Also, perhaps it should be separated from the SOLiDs -- the projection you give for Q2 is obviously for the SOLiD family & a separate projection for the PGM (5X the number of reads & resultant increase in yield) could be appropriate.
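
A quick way to sanity-check read-count and yield pairs like the ones above is to compute the mean read length they imply. The sketch below uses only the figures quoted in this post; the function name is made up.

```python
def implied_read_length(reads_millions: float, yield_gb: float) -> float:
    """Mean read length in bases implied by a read count and a yield."""
    if reads_millions <= 0:
        raise ValueError("read count must be positive")
    return (yield_gb * 1e9) / (reads_millions * 1e6)


# Spreadsheet entry for PacBio: 2.94M reads, 2.94 Gb.
print(implied_read_length(2.94, 2.94))
# Suggested correction: 0.02M reads, 0.040 Gb.
print(implied_read_length(0.02, 0.040))
```

The spreadsheet entry implies reads of roughly 1,000 bases, while the suggested figures imply roughly 2,000 bases per read with far fewer reads.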
dongshenglulv replied

05-11-2011, 09:51 PM
That's what I'm looking for. Thank you so much
avilella replied

05-06-2011, 12:55 AM
spreadsheet with updated specs

Hi all,

Illumina has officially announced the updated specs for their HiSeq 2000 and HiSeq 1000 machines, with throughput up to 600 Gb. I've updated a Google spreadsheet I keep with the specs for all the companies that have commercial systems available here:

Next-Generation-Sequencing.v1.10.35 @albertvilella

https://spreadsheets.google.com/ccc?key=0AvaxS3m5rl-9dHdtUGRtaGlsZWNFNWJleDRXaUhQTHc

Please feel free to add more info to the spreadsheet if you have any more details.

Cheers,

Albert.
dadaliliuk replied

05-04-2011, 07:11 PM
I am new to the whole NGS topic and I am going to use Illumina sequencing. Does anyone know how important library preparation is? And is it worth investing in one of the preparation workstations?
grandma replied

04-19-2011, 11:32 AM
Can anyone explain how the RTA1.8 software identifies and locates clusters - is a particular nucleotide, e.g. an A and a C, required to be present in the first 4 or 5 base pairs of sequence? You can tell I'm a real newbie!