  • Any data on SOLiD error characteristics?

    I would like to find some information on the distribution of errors in SOLiD data. I plan to use it to simulate a pooled sequencing strategy like the "DNA Sudoku" approach, and to assess how badly SOLiD's error rate hurts the ability to uniquely resolve variants in this scheme.

    If I just assume errors are uniformly distributed along reads with a frequency of 0.03%, I am pretty sure the answer will be "Not much, go for it!" But I suspect that error model is too optimistic, and that some errors correlate with sequence position and context. Ideally, I'd like to find a paper like "Substantial biases in ultra-short read data sets from high-throughput DNA sequencing", but for SOLiD rather than Illumina. Is there such a paper?
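Under the uniform model described above, the per-read consequences follow from treating every base as an independent Bernoulli trial with the same error probability. The Python sketch below computes the expected number of errors per read and the chance that a read is error-free; the function name and the 50-base read length are illustrative assumptions, not taken from the post.

```python
def uniform_error_summary(read_length: int, error_rate: float) -> tuple[float, float]:
    """Expected errors per read and probability that a read is error-free,
    assuming each position errs independently with the same probability."""
    expected_errors = read_length * error_rate
    p_error_free = (1.0 - error_rate) ** read_length
    return expected_errors, p_error_free


# 0.03% per base; the 50-base read length is an assumption for illustration
expected, clean = uniform_error_summary(50, 0.0003)
print(f"expected errors/read: {expected:.4f}, P(error-free read): {clean:.4f}")
```

With these numbers the model predicts only 0.015 errors per read, which is why a uniform 0.03% assumption makes pooling look easy.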

    Another possibility would be a large corpus of public SOLiD data from loci that have been sequenced by other methods, so I could compare against those and find and characterize errors myself.

  • #2
    In reply to throwaway:
    You can take a look at the BAMs here: http://genome.ucla.edu/U87
    They store both the decoded bases and the original color sequences (in the CS tag).

    When you say that the errors will be uniform, do you mean the errors in the two-base encoded (color) sequence? The per-color sequencing error rate can range from 1% at the 5' end of the read to 15% at the 3' end. After alignment, the base error rate is 0.5-1%.
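To contrast with the uniform model, a position-dependent per-color error profile can be simulated. This sketch assumes a linear rise from 1% at the first color to 15% at the last over a 50-color read; the linear shape, the read length and the function names are illustrative assumptions, since the post only gives the range.

```python
import random


def ramp_error_rates(length: int, start: float = 0.01, end: float = 0.15) -> list[float]:
    """Per-position error probabilities rising linearly from `start` (5' end)
    to `end` (3' end). The linear shape is an assumption."""
    if length == 1:
        return [start]
    step = (end - start) / (length - 1)
    return [start + i * step for i in range(length)]


def simulate_error_positions(rates: list[float], n_reads: int, seed: int = 0) -> list[int]:
    """Count simulated errors at each position across n_reads reads,
    drawing each position independently with its own error probability."""
    rng = random.Random(seed)
    counts = [0] * len(rates)
    for _ in range(n_reads):
        for i, p in enumerate(rates):
            if rng.random() < p:
                counts[i] += 1
    return counts


rates = ramp_error_rates(50)
print(f"expected color errors per 50-color read: {sum(rates):.2f}")
counts = simulate_error_positions(rates, n_reads=10_000)
print("errors at first / last position:", counts[0], counts[-1])
```

Under this ramp a 50-color read carries 4.0 expected raw color errors (a mean rate of 8% times 50), concentrated toward the 3' end, compared with 0.015 under a uniform 0.03% model. These are raw color errors; the post puts the base error rate after alignment at 0.5-1%.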

    Note that, unlike in Illumina data, sequencing errors and base differences are not the same thing. Sequencing errors may be identified during alignment and then corrected when the two-base encoded sequence is decoded into bases (usually both steps are done simultaneously). Alternatively, a sequencing error may be decoded as a false SNP, which would produce a base difference. Such is the beauty and power of color space.
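The difference between a sequencing error and a base difference can be seen by encoding sequences into colors. In this Python sketch (helper names are hypothetical; it assumes the same A=0, C=1, G=2, T=3 XOR code), a true SNP changes two adjacent colors, while a single color error, decoded naively, corrupts every base downstream of it.

```python
BASES = "ACGT"


def encode(seq: str) -> list[int]:
    """Color-encode a base sequence: one color per adjacent base pair
    (XOR of the 2-bit base codes)."""
    codes = [BASES.index(b) for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Naively decode colors from a known first base; a single wrong color
    shifts every later base."""
    prev = BASES.index(first_base)
    out = [first_base]
    for c in colors:
        prev ^= c
        out.append(BASES[prev])
    return "".join(out)


def changed_positions(a: list[int], b: list[int]) -> list[int]:
    """Indices where two color sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


ref = "ACGTTGCA"
snp = "ACGATGCA"  # a true SNP: T -> A at base index 3
print("SNP changes colors at:", changed_positions(encode(ref), encode(snp)))

# Instead, a single sequencing error at color index 3
err_colors = encode(ref)
err_colors[3] ^= 1
print("decoded with one color error:", decode("A", err_colors))
```

Here the SNP alters colors 2 and 3, while the read with one color error decodes to ACGTGTAC instead of ACGTTGCA. An aligner working in color space can use this signature to treat an isolated single-color mismatch as a likely sequencing error rather than a variant.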


    • #3
      Here is the per-cycle mismatch rate for a 50 bp fragment run, before and after color-space correction.

      -drd
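A per-cycle mismatch rate like the one described in the last post can be tallied from reads aligned against a trusted reference. This Python sketch assumes gapless alignments already in sequencing orientation, supplied as (read, reference) string pairs; the function name and the input format are hypothetical.

```python
def per_cycle_mismatch_rate(pairs: list[tuple[str, str]]) -> list[float]:
    """Fraction of mismatching positions at each cycle (read position).

    Assumes each pair is (read, reference segment), aligned without gaps and
    in sequencing orientation, so index i corresponds to cycle i + 1.
    """
    if not pairs:
        return []
    length = max(len(r) for r, _ in pairs)
    mismatches = [0] * length
    totals = [0] * length
    for read, ref in pairs:
        for i, (a, b) in enumerate(zip(read, ref)):
            totals[i] += 1
            if a != b:
                mismatches[i] += 1
    return [m / t if t else 0.0 for m, t in zip(mismatches, totals)]


pairs = [("ACGT", "ACGT"), ("ACGA", "ACGT"), ("TCGA", "ACGT")]
print(per_cycle_mismatch_rate(pairs))
```

Running the same tally on colors (CS colors against the color-encoded reference) and on the decoded bases would give before- and after-correction curves comparable to the ones described.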
