
  • Assembly and sequencing errors

    Dear all,

    I have a small question regarding the assembly of transcriptomes or genomes.

    I understand that assembly definitely benefits from error-free reads.
    But let's assume some Illumina-sequenced data. What would be the outcome of the assembly if ...

    ... I had more perfect reads, but overall more substitutions remaining in the other reads?
    ... I had fewer perfect reads than in the above scenario, but also fewer substitutions remaining in the other reads?

    How do assemblers react in both scenarios?

    Assuming a k-mer graph assembly, from what I understand the first scenario favours the general graph structure, possibly speeding up the assembly and creating fewer contigs (or, in general, longer ones)?
    The second scenario might be easier for the assembler to correct, leading to the same results?

    This question really puzzles me, and I'd be happy to hear your comments/experience. I couldn't find any paper that answers this question directly, but maybe you know one where the answer is hidden?


  • #2
    This depends greatly on the assembler, the specific depth and error rate, what you are assembling (single-cell, metagenome, isolate, transcriptome, etc.), the repeat content, and more.

    Metagenomes, transcriptomes, single cells, and highly repetitive isolates tend to be the most difficult (possibly in that order). The more highly variable your coverage is - whether due to community composition, amplification, gene expression, or repeats - the harder it is to tell the difference between low-coverage genomic sequence and error k-mers. Some assemblers are better at this than others.
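
    To make the low-coverage vs. error k-mer problem concrete, here is a minimal Python sketch. The sequences, the k-mer size, and the "count < 2" cutoff are made up for illustration, and reverse complements are ignored; no particular assembler works exactly like this.

```python
from collections import Counter

def kmers(seq, k):
    """Return all overlapping k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def kmer_counts(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

K = 5                        # illustrative k-mer size
HIGH = "ACGTTGCAAGGCTA"      # hypothetical locus, sequenced 5 times
HIGH_ERR = "ACGTTGCTAGGCTA"  # same read with one substitution (A->T at index 7)
LOW = "CCATGATTCGAGTC"       # hypothetical locus, sequenced once, error-free

reads = [HIGH] * 4 + [HIGH_ERR] + [LOW]
counts = kmer_counts(reads, K)

# Naive cleanup rule: treat every k-mer seen fewer than 2 times as an error.
singletons = {km for km, c in counts.items() if c < 2}

# k-mers that exist only because of the substitution
error_kmers = set(kmers(HIGH_ERR, K)) - set(kmers(HIGH, K))

# Singletons that are real sequence and would be discarded by the naive rule
genuine_singletons = singletons - error_kmers

print(len(error_kmers), len(genuine_singletons))
```

    In this toy case the single substitution produces 5 error k-mers, but the same cutoff also discards all 10 genuine k-mers of the low-coverage locus, because by count alone the two groups look identical.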

    Informatically, the signal-to-noise ratio (SNR) is more important than raw coverage. However, coverage is discrete, so if you have 2X coverage with some errors, you will probably get a better assembly than with 1X coverage and no errors, since the latter has no overlaps and cannot possibly assemble, even though it has a better SNR.
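
    A toy Python sketch of that 1X vs. 2X point, under heavy assumptions: the genome string, read length, and minimum overlap are invented, reads are tiled at perfectly regular offsets rather than sampled randomly, and overlaps must match exactly. The one substitution is deliberately placed outside any overlap; an error inside an overlap would need a mismatch-tolerant overlapper, which this sketch does not have.

```python
def tile_reads(genome, read_len, step):
    """Cut reads of read_len from genome at regular offsets (idealized sampling)."""
    return [genome[i:i + read_len]
            for i in range(0, len(genome) - read_len + 1, step)]

def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def linked_pairs(reads, min_len):
    """Number of consecutive read pairs joined by an exact overlap."""
    return sum(1 for a, b in zip(reads, reads[1:]) if overlap(a, b, min_len) > 0)

GENOME = "ACGTTGCAAGGCTACCATGATTCG"  # hypothetical 24 bp sequence
READ_LEN = 8
MIN_OVERLAP = 3

# 1X, error-free: reads sit end to end and share no bases.
reads_1x = tile_reads(GENOME, READ_LEN, step=READ_LEN)

# 2X: reads start every half read length, so neighbours share 4 bases.
reads_2x = tile_reads(GENOME, READ_LEN, step=READ_LEN // 2)
# Introduce one substitution (C->T at index 1 of the first read),
# outside the region that overlaps the next read.
reads_2x[0] = "ATGTTGCA"

links_1x = linked_pairs(reads_1x, MIN_OVERLAP)
links_2x = linked_pairs(reads_2x, MIN_OVERLAP)
print(links_1x, "of", len(reads_1x) - 1)
print(links_2x, "of", len(reads_2x) - 1)
```

    Here the error-free 1X set joins 0 of its 2 neighbouring pairs, while the 2X set joins all 4 despite the error. With real, randomly placed reads the picture is noisier, but the cleaner data set still cannot assemble if the reads do not overlap.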

    In other words, there are no strict rules about whether it is good to increase coverage at the expense of accepting reads with higher error rates; you can find scenarios with directly contradictory best practices. Only once you decide on a specific sequencing platform, experiment type, organism, assembler, and sequencing volume, is it possible to objectively answer the question.

