
  • Finishing a de novo assembly of bacteria genome from single-end reads

    I have assembled a ~5 Mb bacterial genome using both Edena and CLC Bio. I have several hundred contigs.

    The lab is willing to do additional sequencing to close some of the gaps: about 50, but not 500. The data are 50 bp single-end Illumina reads (about 200X coverage) from a DNA-Seq library.

    What is the best approach to try and close some of the gaps?

    Are there specific tools to automate this?

    Can the singleton sequences be helpful?

    The genome of a closely related strain has been sequenced (90-95% similar). Using MUMmer, I am able to map the contigs to it, but experiments have shown that there are differences between the strains, and I'm not sure where to place the contigs that differ.
    Also, how do I handle repetitive sequences? Some contigs have much higher coverage than the others (> 2000X).
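A rough way to triage the high-coverage contigs is to treat the median contig coverage as the single-copy depth and estimate each contig's copy number as its coverage divided by that median. The sketch below is only an illustration of that heuristic: the contig names, the coverage values, and the 1.5 cutoff are hypothetical, and the per-contig coverage would have to come from the assembler's own report. High coverage can also indicate a multi-copy plasmid rather than a chromosomal repeat.

```python
from statistics import median


def estimate_copy_numbers(coverage_by_contig):
    """Estimate copy number per contig as coverage / median coverage.

    Assumes most contigs are single-copy, so the median coverage
    approximates the single-copy sequencing depth.
    """
    baseline = median(coverage_by_contig.values())
    return {name: cov / baseline for name, cov in coverage_by_contig.items()}


def flag_repeats(coverage_by_contig, min_copies=1.5):
    """Return sorted names of contigs whose estimated copy number is >= min_copies."""
    copies = estimate_copy_numbers(coverage_by_contig)
    return sorted(name for name, n in copies.items() if n >= min_copies)


# Hypothetical coverage values, for illustration only.
coverage = {
    "contig_1": 198.0,
    "contig_2": 205.0,
    "contig_3": 2100.0,
    "contig_4": 190.0,
    "contig_5": 410.0,
}

print(flag_repeats(coverage))
```

Contigs flagged this way are candidates for collapsed repeats; they usually cannot be placed uniquely and are better set aside when ordering the remaining contigs.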

    Again, are there tools or pipelines that can help reduce the number of contigs by merging some of them? Any tips on how to proceed?


    Thanks so much,
    Tirza
    --
    Tirza Doniger, Ph.D.
    Bioinformatics Unit
    The Mina and Everard Faculty of Life Sciences
    Bar Ilan University

  • #2
    There are lots of threads on this topic on SEQanswers. Also try the de novo forum here; it has a lot of useful information.

    Generally, you'll struggle to scaffold contigs from single-end data. I would also advise trying Velvet for the assembly, though.


    • #3
      Designing PCR primers and doing capillary sequencing to close gaps is quite expensive (in time and money). Assuming you still have DNA from the exact same sample, getting more high-throughput sequencing might actually be the most cost-effective plan. You should probably get paired-end data next time.

      In terms of tools, I've not tried it yet but CONTIGuator seems to do what you need:
      A bacterial genomes finishing tool for structural insights on draft genomes


      • #4
        Originally posted by maubp:

        In terms of tools, I've not tried it yet but CONTIGuator seems to do what you need:
        http://contiguator.sourceforge.net/

        I have an assembly with 18 scaffolds and a reference genome that has an average nucleotide identity of 85% with my genome. I ran CONTIGuator and it determined the order of my scaffolds; 11 of the 18 scaffolds aligned to the reference genome.

        However, I am not sure exactly how the results are calculated or how the ordering is done. How should I interpret the reliability of this order?

        Since my genome is only 85% identical to the reference, the unmapped scaffolds may be parts of my genome that are absent from the reference. How can I determine the placement of those scaffolds? Any experience? Any ideas?
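To make the question concrete, here is a minimal sketch of how reference-based ordering can work in general. It is not CONTIGuator's actual algorithm, which is not described in this thread. It assumes each alignment block is available as a hypothetical tuple `(scaffold, scaffold_length, ref_start, ref_end)`, places each scaffold at the block-length-weighted mean of its reference positions, and reports the aligned fraction as a crude reliability indicator. The 0.2 cutoff and all values are made up for illustration.

```python
from collections import defaultdict


def order_scaffolds(hits, all_scaffolds, min_aligned_fraction=0.2):
    """Order scaffolds along a reference and list the ones left unplaced.

    hits: iterable of (scaffold, scaffold_length, ref_start, ref_end).
    Returns (placed, unplaced), where placed is a list of
    (reference_position, scaffold, aligned_fraction) sorted by position.
    """
    aligned = defaultdict(int)         # total aligned bases per scaffold
    weighted_pos = defaultdict(float)  # sum of block length * block midpoint
    lengths = {}
    for scaffold, length, ref_start, ref_end in hits:
        block = abs(ref_end - ref_start) + 1
        aligned[scaffold] += block
        weighted_pos[scaffold] += block * (ref_start + ref_end) / 2
        lengths[scaffold] = length

    placed = []
    for scaffold, total in aligned.items():
        # Overlapping blocks could push the sum past the length, so cap at 1.
        fraction = min(1.0, total / lengths[scaffold])
        if fraction >= min_aligned_fraction:
            placed.append((weighted_pos[scaffold] / total, scaffold, fraction))
    placed.sort()

    placed_names = {name for _, name, _ in placed}
    unplaced = sorted(set(all_scaffolds) - placed_names)
    return placed, unplaced


# Hypothetical alignment blocks, for illustration only.
hits = [
    ("scaf_A", 1000, 5001, 5800),
    ("scaf_B", 2000, 101, 1900),
    ("scaf_C", 5000, 9001, 9100),
]
scaffolds = ["scaf_A", "scaf_B", "scaf_C", "scaf_D"]

placed, unplaced = order_scaffolds(hits, scaffolds)
for position, name, fraction in placed:
    print(f"{name}\t{position:.0f}\t{fraction:.2f}")
print("unplaced:", unplaced)
```

Under this kind of scheme, a scaffold with a low aligned fraction is placed with less confidence, and unplaced scaffolds carry no positional information from the reference at all, so their location would have to come from other evidence such as paired-end reads or PCR across the gaps.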

        Thanks
