
  • Definition of Scaftig

    There appears to be a lack of a clear definition of the term "scaftig".
    How do you use this term? Is there a good definition somewhere?

    I think it should be clearly distinguished from the term "contig", and I would define it as
    "All portions of a final assembly consisting of contiguous sequence, with sequences split at every occurrence of gaps of unknown bases (Ns)."

    For example, if my final assembly is
    >C1020304
    ACTGTGATCG
    >scaffold1234
    CGTCGATCnnnnnnCGATCGATnCATGCA

    My scaftigs are
    >scaftig1
    ACTGTGATCG
    >scaftig2
    CGTCGATC
    >scaftig3
    CGATCGAT
    >scaftig4
    CATGCA

    Additionally, you could use the term "scaffold scaftigs" if you wanted to make clear that the left-over contigs are not to be included in the set of scaftigs.
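
    The splitting rule proposed above can be sketched in a few lines of Python. This is only an illustration of my definition, not how any particular assembler does it. The function names `parse_fasta` and `scaftigs` and the `scaffolds_only` option are made up for this example, and the option assumes that any record without an N is a left-over contig rather than a scaffold, which may not hold for every assembly.

```python
import re


def parse_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def scaftigs(records, scaffolds_only=False):
    """Split every sequence at each run of N/n and return the pieces in order."""
    pieces = []
    for _name, seq in records:
        # Assumption (hypothetical rule): a record with no N is a left-over contig.
        is_scaffold = re.search(r"[Nn]", seq) is not None
        if scaffolds_only and not is_scaffold:
            continue
        # A run of one or more Ns counts as a single gap; drop empty pieces
        # produced by leading or trailing Ns.
        pieces.extend(p for p in re.split(r"[Nn]+", seq) if p)
    return pieces


assembly = """>C1020304
ACTGTGATCG
>scaffold1234
CGTCGATCnnnnnnCGATCGATnCATGCA
"""

for i, seq in enumerate(scaftigs(parse_fasta(assembly)), start=1):
    print(f">scaftig{i}\n{seq}")
```

    Run on the example assembly, this prints the four scaftigs listed above. Calling `scaftigs(parse_fasta(assembly), scaffolds_only=True)` would return only the three "scaffold scaftigs", under the stated assumption about what counts as a scaffold.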

    References
    I have found only a few, somewhat inconsistent, definitions. They are listed here in order of publication:
    1. "A scaftig refers to a continuous sequence formed by multiple initial contigs lined up in a scaffold with putative sequence overlaps."
      State of the art de novo assembly of human genomes from massively parallel sequencing data
      Hum Genomics. 2010; 4(4): 271–277. (April 1, 2010)

      Can sequences that are not part of a scaffold be considered scaftigs? This definition implies that they cannot.
    2. "New word of the day: #scaftigs RT @assemblathon 'scaftigs' intra-scaffold gaps between contigs. #gaw"
      Twitter (March 15, 2011)

      This strangely seems to define the gaps rather than the sequence. I include this because there are few definitions to be found.
    3. "scaftigs can be constructed by extracting the contiguous sequences that lack unknown bases (Ns)."
      Mende DR, Waller AS, Sunagawa S, Järvelin AI, Chan MM, et al. (2012) Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data. PLoS ONE 7(2): e31386. doi:10.1371/journal.pone.0031386 (February 23, 2012)

      This is closest to my definition.
    4. "The resulting high-quality reads were assembled into scaftigs using SOAPdenovo 1.05 and genes predicted on scaftigs longer than 500 nt using MetaGeneMark v1.0"
      Country-specific antibiotic use practices impact the human gut resistome
      Genome Res. 2013. 23: 1163-1169 (April 8, 2013)

      Supplementary Materials and Methods
      I would call these contigs, since they are the primary product of an assembly.
    5. ABySS version 1.5.0 introduced a command (May 1, 2014)
      "New command, `scaftigs`. Breaks scaffold sequences at 'N's and produce a scaftigs.fa file."
      http://www.bcgsc.ca/platform/bioinfo...releases/1.5.0


    Other references that I found do not define the term.

  • #2
    If I understand correctly, scaftigs are the contigs (contiguous sequences) from scaffolds, right? (This can be read as the set of original contigs, excluding those that are not included in any scaffold.)
