  • Assembly and sequencing errors

    Dear all,

    I have a small question regarding the assembly of transcriptomes or genomes.

    I understand that assembly clearly benefits from error-free reads.
    But let's assume some Illumina-sequenced data. What would be the outcome of the assembly if ...

    ... I had more perfect reads, but overall more substitutions remaining in the other reads?
    ... I had fewer perfect reads than in the above scenario, but also fewer substitutions remaining in the other reads?

    How do assemblers react in both scenarios?

    Assuming a k-mer graph assembly, from what I understand the first scenario favours the general graph structure, possibly speeding up the assembly and creating fewer contigs (or, in general, longer ones)?
    The second scenario might be easier for the assembler to correct, leading to the same results?

    This question really puzzles me, and I'd be happy about your comments/experience. I couldn't find any paper that answers this question directly, but maybe you know one where my answer is hidden?

    Thanks

  • #2
    This depends greatly on the assembler, the specific depth and error rate, what you are assembling (single cell, metagenome, isolate, transcriptome, etc.), repeat content, and more.

    Metagenomes, transcriptomes, single-cells, and highly-repetitive isolates tend to be the most difficult (possibly in that order). The more highly variable your coverage is - whether due to community composition, amplification, gene expression, or repeats - the harder it is to tell the difference between low-coverage genomic sequence and error kmers. Some assemblers are better at this than others.
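
    To make the coverage point concrete, here is a minimal sketch in Python. The sequences, the value of k, and the count threshold are all made up for illustration, and no real assembler works this simply. It counts k-mers in a toy read set containing one substitution error and one genuine sequence that happens to be covered only once:

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer occurring in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

K = 5          # illustrative k-mer size (real assemblers use larger k)
MIN_COUNT = 2  # hypothetical threshold below which a k-mer looks like an error

# Toy data (assumed, not from any real dataset).
high_cov = "ACGTTGCATGGCTAAGCTTC"                   # sequence covered by 6 reads
error_read = high_cov[:10] + "A" + high_cov[11:]    # one G->A substitution
low_cov = "TTAGGCCAATCG"                            # genuine sequence covered once

reads = [high_cov] * 5 + [error_read, low_cov]
counts = count_kmers(reads, K)

# k-mers that fall below the threshold, split by whether they are real.
genuine = set(count_kmers([high_cov, low_cov], K))
weak = {kmer for kmer, c in counts.items() if c < MIN_COUNT}
weak_errors = weak - genuine
weak_genuine = weak & genuine

print("error k-mers below threshold:  ", len(weak_errors))
print("genuine k-mers below threshold:", len(weak_genuine))
```

    By count alone, the k-mers from the single substitution and the k-mers from the low-coverage sequence are indistinguishable, which is the problem that uneven coverage creates.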

    Informatically, the signal-to-noise ratio is more important than raw coverage. However, coverage is discrete so if you have 2X coverage with some errors, you will probably get a better assembly than with 1X coverage and no errors, since that has no overlaps and cannot possibly assemble, even though it has a better SNR.
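
    As a rough back-of-the-envelope check of the 2X versus 1X comparison, the sketch below estimates the expected coverage by error-free k-mers. It assumes independent, uniform per-base errors, and the read length (100), k (31), and error rate (1%) are illustrative values I picked, not recommendations:

```python
def error_free_kmer_coverage(base_coverage, read_length, k, error_rate):
    """Expected coverage by k-mers that contain no sequencing error.

    Assumes every base is wrong independently with probability error_rate.
    """
    # A read of length L yields L - k + 1 k-mers, so k-mer coverage is
    # lower than base coverage by the factor (L - k + 1) / L.
    kmer_coverage = base_coverage * (read_length - k + 1) / read_length
    # A k-mer is error-free only if all k of its bases are correct.
    p_clean = (1.0 - error_rate) ** k
    return kmer_coverage * p_clean

noisy_2x = error_free_kmer_coverage(2.0, read_length=100, k=31, error_rate=0.01)
clean_1x = error_free_kmer_coverage(1.0, read_length=100, k=31, error_rate=0.0)

print(f"2X coverage, 1% errors: {noisy_2x:.2f}")
print(f"1X coverage, no errors: {clean_1x:.2f}")
```

    Under these assumptions the noisier 2X data set still has more error-free k-mer coverage than the perfect 1X one. Raising the error rate or k shrinks that advantage, so the comparison can flip with other parameters.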

    In other words, there are no strict rules about whether it is good to increase coverage at the expense of accepting reads with higher error rates; you can find scenarios with directly contradictory best practices. Only once you decide on a specific sequencing platform, experiment type, organism, assembler, and sequencing volume, is it possible to objectively answer the question.
