Longer reads => more errors?

  • Longer reads => more errors?

    Hi,

    I am working on transcript quantification, where multi-reads (reads that can be mapped to multiple locations within a transcript or genome) are an important issue because they add ambiguity to transcript counts.

    Although longer reads have a better chance of mapping uniquely to a genomic location, I am concerned about read errors in longer reads. Specifically, does the number of errors within a read increase linearly with read length, or not?

    For example, if an 80 bp read contains 1 error on average, is it fair to assume that a 160 bp read will contain 2 errors on average, or actually more?

    As far as I know, read quality deteriorates from the 5' end to the 3' end, so errors occur more often at the 3' end. Suppose the low-quality 3' region begins in the middle of an 80 bp read (i.e. at the 41st base from the 5' end). Can I assume that the low-quality region of a 160 bp read also starts in the middle (i.e. at the 81st base from the 5' end), or will it still start at the 41st base of the read?

    Please suggest. Thanks in advance.

    Billy

  • #2
    You need to specify which platform you are working with.

    Because 454, Illumina and SOLiD all get their signal from an ensemble of molecules, dephasing (some molecules lagging behind the others after failing to extend) is a problem, and the error rate increases with position along the read. So a read twice as long is indeed expected to have more than twice as many errors, because errors are not evenly distributed along its length.
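
    To make the "more than twice as many" point concrete, here is a toy calculation in Python. It assumes the per-base error probability rises linearly with cycle number; the linear shape and the values of `base_rate` and `slope` are made up for illustration (chosen so that an 80 bp read has about 1 expected error, as in the question) and are not measurements from any platform.

```python
def expected_errors(read_length, base_rate=0.001, slope=0.0003):
    """Expected number of errors in one read under a toy model.

    Assumption (illustrative only): the error probability at 0-based
    position i is base_rate + slope * i, capped at 1.0.
    """
    return sum(min(1.0, base_rate + slope * i) for i in range(read_length))


e80 = expected_errors(80)
e160 = expected_errors(160)

print(f"80 bp:  {e80:.2f} expected errors")
print(f"160 bp: {e160:.2f} expected errors")
print(f"ratio:  {e160 / e80:.2f}")
```

    Under these made-up numbers the 160 bp read has roughly 4 expected errors against roughly 1 for the 80 bp read. The first 80 bases of the longer read contribute the same amount as the whole shorter read, so all of the excess comes from the later, noisier cycles. Real profiles will have a different shape.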

    The precise relationship depends on the platform. I've seen plots, though I can never find one when I really need one. Ideally, you could estimate this from your dataset.
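
    One rough way to estimate this from a dataset is to average the base-call quality scores per read position. The sketch below assumes a 4-line-per-record FASTQ file with Phred+33 quality encoding (older Illumina pipelines used Phred+64, so check your data), and it converts each quality score Q to an error probability with the standard Phred relation p = 10^(-Q/10). The function name `per_position_error` and the demo records are made up for illustration. Note that this gives the instrument's own error estimate; counting mismatches against a trusted reference after alignment would be a more direct measurement.

```python
from collections import defaultdict


def per_position_error(fastq_lines, phred_offset=33):
    """Mean estimated error probability at each read position.

    Assumes 4 lines per FASTQ record (no wrapped sequences) and that
    quality characters encode Phred scores with the given ASCII offset.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for line_no, line in enumerate(fastq_lines):
        if line_no % 4 != 3:          # only the quality line of each record
            continue
        for pos, ch in enumerate(line.rstrip("\r\n")):
            q = ord(ch) - phred_offset
            sums[pos] += 10 ** (-q / 10)   # Phred score -> error probability
            counts[pos] += 1
    return [sums[p] / counts[p] for p in sorted(counts)]


# Tiny in-memory demo with two made-up records.
demo = [
    "@r1", "ACGT", "+", "IIII",   # Q40 at every position
    "@r2", "ACGT", "+", "I5+!",   # Q40, Q20, Q10, Q0
]
profile = per_position_error(demo)
print(profile)
print("expected errors per read:", sum(profile))
```

    For a real file, pass an open file handle instead of the list, for example `per_position_error(open("reads.fastq"))`, where `reads.fastq` is a placeholder path. Comparing `sum(profile[:80])` with `sum(profile)` on 160 bp reads then shows how far the data depart from the simple linear assumption.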


    • #3
      Thanks, Keith. I have been working on a publicly available Illumina data set. I may have other data sets to analyze later, but I don't yet know which platform they will come from. It's good to know that a simple linear relationship does not hold in general and that the relationship varies across platforms.
