  • Longer reads => more errors?

    Hi,

    I am working on transcript quantification, where multi-reads, i.e. reads that can be mapped to multiple locations within the transcriptome or genome, are an important issue because they add ambiguity to transcript counts.

    Although longer reads have a better chance of being mapped uniquely to a genomic location, I am concerned about sequencing errors in longer reads. Specifically, does the number of errors within a read increase linearly with read length, or not?

    For example, if an 80 bp read contains 1 error on average, is it fair to assume that a 160 bp read will contain 2 errors on average, or will it actually contain more?
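    One way to frame this question: if p_i is the probability that base i of a read is miscalled, the expected number of errors N_L in a read of length L is the sum of the per-position probabilities. This sketch treats an error as a miscalled base and ignores indels. The expectation scales linearly with length only when p_i is the same constant p at every position:

```latex
\mathbb{E}[N_L] = \sum_{i=1}^{L} p_i
% constant per-base error rate, p_i = p for all i:
\mathbb{E}[N_L] = L\,p, \qquad p = \tfrac{1}{80} = 0.0125
\;\Longrightarrow\; \mathbb{E}[N_{160}] = 160 \times 0.0125 = 2
```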

    As far as I know, read quality deteriorates from the 5' end to the 3' end, so errors occur more often toward the 3' end. Suppose the low-quality 3' end begins in the middle of an 80 bp read (i.e. at the 41st base from the 5' end). Can I assume that the low-quality end of a 160 bp read will also start in the middle (i.e. at the 81st base from the 5' end), or will it still start at the 41st base?

    Please suggest. Thanks in advance.

    Billy

  • #2
    You need to specify which platform you are working with.

    Because 454, Illumina & SOLiD all get their signal from an ensemble of molecules, dephasing (some molecules in the cluster or bead lagging behind the others because they failed to extend in a cycle) is a problem, and error rates increase along the length of the read. So a read twice as long is indeed expected to have more than twice as many errors, because errors are not evenly distributed across the read: the extra bases come from the later, more error-prone cycles.
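    A toy calculation of why a rising per-base error rate breaks the linear relationship. The linear error model and its parameters (`base_rate`, `slope`) are made up for illustration and are not measured from any platform. Because each base at positions 81 to 160 has a higher assumed error probability than its counterpart at positions 1 to 80, the 160 bp total must exceed twice the 80 bp total.

```python
def error_probability(cycle: int, base_rate: float = 0.005, slope: float = 0.0002) -> float:
    """Hypothetical error probability at 1-based `cycle`.

    Assumed model: the rate grows linearly with cycle number, a crude
    stand-in for accumulating dephasing. Parameters are illustrative only.
    """
    return base_rate + slope * (cycle - 1)


def expected_errors(read_length: int, base_rate: float = 0.005, slope: float = 0.0002) -> float:
    """Expected errors per read = sum of the per-position error probabilities."""
    return sum(
        error_probability(cycle, base_rate, slope)
        for cycle in range(1, read_length + 1)
    )


if __name__ == "__main__":
    for length in (80, 160):
        print(length, round(expected_errors(length), 3))
```

    With these made-up parameters, the 80 bp read is expected to carry about 1.03 errors and the 160 bp read about 3.34, more than three times as many. Setting `slope=0.0` recovers the linear case.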

    The precise relationship depends on the platform. I've seen plots, though I can never find one when I really need one. Ideally, you could estimate this from your dataset.
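    A minimal sketch of estimating this from your own data, assuming a plain four-line-per-record FASTQ with Phred+33 quality encoding (Sanger / Illumina 1.8+; older Illumina pipelines used an offset of 64, so check your files). Each quality character Q is converted to an error probability 10^(-Q/10) and averaged per cycle. Summing the profile over the first L cycles gives the expected errors per read of length L, and the first cycle whose mean error exceeds a threshold gives a rough start of the low-quality 3' end. The function names, the toy records and the Q20 (1%) threshold are illustrative choices, and Phred scores are the base caller's own estimates, so the result is only as good as their calibration.

```python
from typing import Iterable, List, Optional


def per_cycle_error_profile(fastq_lines: Iterable[str], offset: int = 33) -> List[float]:
    """Mean error probability at each cycle, from Phred quality strings.

    Assumes 4 lines per record and Phred+`offset` encoding.
    """
    totals: List[float] = []  # summed error probability at each cycle
    counts: List[int] = []    # number of reads long enough to reach each cycle
    records = iter(fastq_lines)
    for header in records:
        if not header.strip():
            continue  # tolerate trailing blank lines
        next(records)  # sequence line (not needed here)
        next(records)  # '+' separator line
        quality = next(records).rstrip("\r\n")
        for cycle, char in enumerate(quality):
            if cycle == len(totals):  # first read to reach this cycle
                totals.append(0.0)
                counts.append(0)
            q = ord(char) - offset
            totals[cycle] += 10 ** (-q / 10)  # Phred: P(error) = 10^(-Q/10)
            counts[cycle] += 1
    return [t / n for t, n in zip(totals, counts)]


def first_low_quality_cycle(profile: List[float], max_error: float = 0.01) -> Optional[int]:
    """1-based cycle where mean error probability first exceeds `max_error`."""
    for cycle, p in enumerate(profile, start=1):
        if p > max_error:
            return cycle
    return None


# Two hypothetical 8-base records ('I' = Q40, '?' = Q30, '+' = Q10, '#' = Q2).
example = [
    "@read1\n", "ACGTACGT\n", "+\n", "IIIII?+#\n",
    "@read2\n", "ACGTACGT\n", "+\n", "IIII??+#\n",
]

if __name__ == "__main__":
    profile = per_cycle_error_profile(example)
    print("expected errors per 8-base read:", round(sum(profile), 3))
    print("first cycle worse than Q20:", first_low_quality_cycle(profile))
```

    For a real file, pass an open handle (e.g. `open(path)`, or `gzip.open(path, "rt")` for compressed input) and compare `sum(profile[:80])` with `sum(profile[:160])`. On the two toy records above, the first cycle worse than Q20 is cycle 7.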

    • #3
      Thanks Keith. I have been working on a publicly available Illumina data set. I might have other data sets to analyze later, but I don't know their platform yet. It's good to know that the simple linear relationship will not hold in general and that the relationship varies across platforms.
