  • Wallysb01
    Senior Member
    • Feb 2011
    • 286

    create scaffold file from contig file and agp file?

    Does anyone know how to go from a contig file with the scaffolding information in an AGP file to a full scaffolded multifasta file?

    Thanks
  • aajosselin
    Junior Member
    • May 2016
    • 2

    #2
    Hi,
    Did you solve the problem? I have to solve the same one right now.


    • Markiyan
      Senior Member
      • Sep 2010
      • 124

      #3
      If you have contigs with scaffolding info in the AGP file:

      Then you can pad the gaps between the contigs with N's (using the gap lengths given in the AGP file) and write a multifasta file with one entry per scaffold. This can be done with a bit of Perl programming.

      (In my case I use Newbler output, which already includes the scaffold sequences in a multifasta file.)
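
A minimal sketch of the padding approach described above, written in Python rather than Perl. It assumes the AGP file uses the standard tab-delimited layout, that the lines for each scaffold appear in part order, that gap lines have component type N or U, and that contig IDs in the AGP file match the first word of each FASTA header. The function names are illustrative and do not come from any existing tool.

```python
# Complement table for plain bases only; any other character
# (e.g. IUPAC ambiguity codes) is left unchanged by translate().
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines):
    """Parse FASTA lines into {id: sequence}; id is the first word of the header."""
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


def build_scaffolds(agp_lines, contigs):
    """Assemble {scaffold_id: sequence} from AGP lines and a contig dict.

    Assumption: AGP lines for each scaffold are already in part order.
    """
    parts_by_scaffold = {}
    for line in agp_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and AGP header comments
        cols = line.split("\t")
        scaffold_id, component_type = cols[0], cols[4]
        parts = parts_by_scaffold.setdefault(scaffold_id, [])
        if component_type in ("N", "U"):
            # Gap line: column 6 holds the gap length.
            parts.append("N" * int(cols[5]))
        else:
            # Sequence line: columns 6-9 are contig id, begin, end, orientation.
            begin, end = int(cols[6]), int(cols[7])
            piece = contigs[cols[5]][begin - 1:end]  # AGP coordinates are 1-based, inclusive
            if cols[8] == "-":
                piece = reverse_complement(piece)
            parts.append(piece)
    return {name: "".join(parts) for name, parts in parts_by_scaffold.items()}


def write_fasta(scaffolds, handle, width=60):
    """Write scaffolds as multifasta, wrapping sequence lines at `width`."""
    for name, seq in scaffolds.items():
        handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

To use it, open the contig FASTA file and pass the file handle to read_fasta, pass the AGP file handle and the resulting dict to build_scaffolds, then pass the result and an output handle to write_fasta. Orientations other than "-" are treated as forward, which is a simplification.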

      If you have the source read data and would like to try a different scaffolder,

      have a look at the following review of the available scaffolding tools:

      Background: Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics.

      Results: Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behaviour of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remains unidentified when using real data.

      Conclusions: The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity.
