
  • Joe Petrosino
    replied
    Hi all,

    I have ChIP-Seq data from Solexa in three formats:
    1) Sequence.bam
    2) Sequence.txt
    3) Export.txt

    What are the differences between these formats?



  • vaibhavvsk
    replied
    Nice Info.



  • ngsseq
    replied
    Thanks, ECO, it's useful for me, a beginner.



  • 059
    replied
    Thanks very much!



  • Qiuting
    replied
    Originally posted by Aaron Cooper:
    Nobody ever answered the original question, so here goes:

    The fluorescent bases are 3'-O-azido dNTPs with fluorophores linked to the bases. The azide group on the 3' O blocks addition of another nucleotide, and you can use a phosphine (TCEP) to chemically cleave the azide from the 3' O and allow the next nucleotide to be added. I believe TCEP also cleaves the fluorophore from the base.

    It's very clever, but unfortunately 3'-O-azido dNTPs are not available commercially. They would be fun to play with.
    Hi! I'm a little curious about the function of TCEP. Is it possible to use TCEP to cleave 6-FAM from a nucleotide and, at the same time, allow the next nucleotide to be added?



  • Monk
    replied
    Hi,
    This is my first post here. Thank you very much, this is very useful.





    Originally posted by ECO:


    Illumina's $600 million acquisition of Solexa in November 2006 gave the company a head start in the next generation sequencing market.

    Here I present a brief overview of Solexa's sequencing-by-synthesis chemistry. The sample prep methods differ slightly from those used in ABI's SOLiD system, but the basic goals are the same: generate large numbers of unique "polonies" (polymerase-generated colonies) that can be sequenced simultaneously. These parallel reactions occur on the surface of a "flow cell" (basically a water-tight microscope slide), which provides a large surface area for many thousands of parallel chemical reactions.

    Step 1: Sample Preparation


    The DNA sample of interest is sheared to appropriate size (average ~800bp) using a compressed air device known as a nebulizer. The ends of the DNA are polished, and two unique adapters are ligated to the fragments. Ligated fragments of the size range of 150-200bp are isolated via gel extraction and amplified using limited cycles of PCR.

    Complete detailed protocols for DNA and small RNA library preparation can be found in the documents attached to this post ("dna_libe_prep.pdf" and "rna_libe_small_prep.pdf", respectively). This is a fairly straightforward multi-step molecular biology process; however, there are many pitfalls that can skew results downstream.

    Steps 2-6: Cluster Generation by Bridge Amplification

    In contrast to the 454 and ABI methods, which use bead-based emulsion PCR to generate "polonies", Illumina uses a unique "bridged" amplification reaction that occurs on the surface of the flow cell.

    The flow cell surface is coated with single-stranded oligonucleotides that correspond to the sequences of the adapters ligated during the sample preparation stage. Single-stranded, adapter-ligated fragments are bound to the surface of the flow cell and exposed to reagents for polymerase-based extension. Priming occurs as the free/distal end of a ligated fragment "bridges" to a complementary oligo on the surface.

    Repeated denaturation and extension results in localized amplification of single molecules in millions of unique locations across the flow cell surface. This process occurs in what is referred to as Illumina's "cluster station", an automated flow cell processor.



    Steps 7-12: Sequencing by Synthesis

    A flow cell containing millions of unique clusters is now loaded into the 1G sequencer for automated cycles of extension and imaging.

    The first cycle of sequencing consists first of the incorporation of a single fluorescent nucleotide, followed by high resolution imaging of the entire flow cell. These images represent the data collected for the first base. Any signal above background identifies the physical location of a cluster (or polony), and the fluorescent emission identifies which of the four bases was incorporated at that position.

    This cycle is repeated, one base at a time, generating a series of images, each representing a single base extension at a specific cluster. Base calls are derived with an algorithm that identifies the emission color at each cluster over successive cycles. At this time, reported useful Illumina read lengths range from 26 to 50 bases.
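As a rough illustration of the per-cycle base calling described above, here is a toy sketch in Python. It is not Illumina's actual algorithm: it assumes one intensity value per channel (A, C, G, T) per cycle for a single cluster and simply calls the brightest channel. The function name `call_bases` and the `min_signal` threshold are made up for this example.

```python
# Toy base caller (illustrative only, not the real Illumina pipeline).
# Each cycle is a tuple of four channel intensities in the order A, C, G, T.
BASES = "ACGT"


def call_bases(intensities, min_signal=0.0):
    """Call one base per cycle for a single cluster.

    The brightest channel wins; if it does not exceed min_signal
    (a hypothetical background threshold), the cycle is called 'N'.
    """
    read = []
    for cycle in intensities:
        if len(cycle) != 4:
            raise ValueError("expected four channel intensities per cycle")
        best = max(range(4), key=lambda i: cycle[i])
        read.append(BASES[best] if cycle[best] > min_signal else "N")
    return "".join(read)


# Four imaging cycles for one cluster (made-up numbers).
example = [
    (0.90, 0.10, 0.05, 0.10),  # brightest channel: A
    (0.10, 0.20, 0.80, 0.10),  # brightest channel: G
    (0.05, 0.05, 0.02, 0.03),  # nothing above the threshold
    (0.10, 0.10, 0.10, 0.70),  # brightest channel: T
]
print(call_bases(example, min_signal=0.1))  # prints AGNT
```

A real base caller also has to deal with cross-talk between dye channels and with clusters drifting out of phase over many cycles, which this sketch ignores.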




    The use of physical location to identify unique reads is a critical concept for all next generation sequencing systems. The density of the reads and the ability to image them without interfering noise are vital to the throughput of a given instrument. Each platform has its own issues that determine this number: 454 is limited by the number of wells in its PicoTiterPlate, Illumina is limited by the fragment length that can effectively "bridge", and all providers are limited by flow cell real estate.

    Hopefully that serves as a brief introduction to the technology! If I have made any errors or omissions, please feel free to correct me by posting here!



  • jinxinhao1988
    replied
    It is very useful for us, thank you. This is my first post here. I hope that we can share our sequencing experience here.



  • dongshenglulv
    replied
    Originally posted by Jonathan:
    When beginning the sequencing, first a cleaving step is carried out,
    removing the sequences bound with adapter2 to the cell-surface.

    For paired-end sequencing, after having reached the desired cycle-count the synthesized strand is removed, another step of bridge amplification is carried out, followed by a cleaving of sequences bound with adapter1 to the cell surface. Thus leaving the `other'/reverse strand bound to the flow cell for sequencing...
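To make the paired-end idea above concrete, here is a minimal Python sketch. It assumes a made-up fragment sequence and treats read 2 as the first `cycles` bases of the reverse strand, which is the usual way paired-end reads relate to the original fragment. The helper names are hypothetical and adapters are ignored.

```python
# Minimal paired-end illustration (hypothetical helpers, adapters ignored).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def paired_end_reads(fragment, cycles):
    """Return (read1, read2) for a fragment sequenced for `cycles` cycles per end.

    read1 starts at one end of the fragment; read2 starts at the other end
    and is read from the opposite (reverse) strand.
    """
    read1 = fragment[:cycles]
    read2 = reverse_complement(fragment)[:cycles]
    return read1, read2


fragment = "ACGTTGCAAGGCTTAACCGG"  # made-up 20 bp insert
r1, r2 = paired_end_reads(fragment, 5)
print(r1, r2)  # prints ACGTT CCGGT
```

So the same cycle count is run twice on the same clusters, once from each end of the fragment.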

    Anything else?
    Did you mean that the difference between PE and SE is that PE sequencing runs the 'desired cycle-count' twice? Also, did you mean that the flow cell is not disposable, so we can reuse it for another sample for PE sequencing? I'm new to sequencing, thanks.

    P.S. The first base (and the next n bases) imaged comes from the adapter; is it necessary to remove such sequences from the FASTQ file generated by the GA?



  • krobison
    replied
    Originally posted by vtosha:
    What about these articles? Whole genome, transcriptome, exome?
    What genome size? How many groups work on poplar sequencing?
    Best results for de novo assembly are from 454, of course. But for resequencing, why not use paired-end reads?
    Search "novo AND transcriptome AND (illumina OR solexa)" -- that's currently 18 papers to get you started. Probably many more which just don't quite fit the search terms.



  • krobison
    replied
    Originally posted by Bioinfo:
    hi all,
    Does anyone know of Illumina data downloadable from any published papers?
    many thanks
    Look in the Sequence Read Archive (SRA) at NCBI (while it still exists) or its European counterpart, the European Nucleotide Archive (ENA); there are huge amounts of data there. There is an R interface that lets you run SQL queries on the SRA, which beats the NCBI interface for queries; I don't know of a similar one for ENA (we'll definitely need one once SRA shuts down!).

    (if we ever start a FAQ, these would be obvious items to put there)



  • krobison
    replied
    This is great!

    A couple of questions.

    Are the HiSeq numbers per flowcell?

    Looking at the two HiSeq columns (1K & 2K), the data per run is 750M reads vs. 1000M (but if per flowcell, why different?)

    Looking at reagent cost, they are both at $12K/run

    BTW, shouldn't PacBio be more like 0.02M reads & 0.040 Gb for yield, not 2.94M reads & 2.94 Gb yield? (As I've stated publicly, it's hard to really nail those numbers down for PacBio, but these are more likely in the right ballpark.) Run time should probably be more like 0.08, as with the PGM.
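One quick way to sanity-check figures like these is to look at the mean read length they imply, since yield is just read count times read length. The Python sketch below does that arithmetic; the function names are made up for illustration and the only inputs are the numbers quoted in this post.

```python
# Back-of-the-envelope check: yield (bases) = number of reads x mean read length.
def mean_read_length_bp(reads_millions, yield_gb):
    """Mean read length (bp) implied by a read count in millions and a yield in Gb."""
    if reads_millions <= 0:
        raise ValueError("read count must be positive")
    return (yield_gb * 1e9) / (reads_millions * 1e6)


def yield_gb(reads_millions, read_length_bp):
    """Yield in Gb for a read count in millions and a mean read length in bp."""
    return reads_millions * 1e6 * read_length_bp / 1e9


# Figures quoted in the post above.
suggested = mean_read_length_bp(0.02, 0.040)   # about 2000 bp per read
spreadsheet = mean_read_length_bp(2.94, 2.94)  # about 1000 bp per read
print(round(suggested), round(spreadsheet))
```

The suggested numbers imply a mean read length of roughly 2 kb, while the spreadsheet numbers imply roughly 1 kb, so the two sets differ mainly in read count (0.02M versus 2.94M) rather than in read length.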

    For PGM, should you have a column per chip? With the 314, the reads are somewhere in the 100K-200K per run. Also, perhaps it should be separated from the SOLiDs -- the projection you give for Q2 is obviously for the SOLiD family & a separate projection for the PGM (5X the number of reads & resultant increase in yield) could be appropriate.



  • dongshenglulv
    replied
    That's what I'm looking for. Thank you so much



  • avilella
    replied
    spreadsheet with updated specs

    Hi all,

    Illumina has officially announced the updated specs for their HiSeq 2000 and HiSeq 1000 machines, with throughput up to 600 Gb. I've updated a Google spreadsheet that I keep with the specs for all the companies that have commercial systems available here:



    Please feel free to add more info to the spreadsheet if you have any more details.

    Cheers,

    Albert.



  • dadaliliuk
    replied
    I am new to the whole NGS topic and I am going to use Illumina sequencing. Does anyone know how important library preparation is? And is it worth investing in one of the preparation workstations?



  • grandma
    replied
    Can anyone explain how the RTA1.8 software identifies and locates clusters - is a particular nucleotide, e.g. an A and a C, required to be present in the first 4 or 5 base pairs of sequence? You can tell I'm a real newbie!

