  • mwatson
    Member
    • Aug 2010
    • 13

    Combine de novo and reference assembly

    Hi

    I'd be interested in anyone who can tell me about software that will/can combine de novo assembled contigs with a reference assembly.

    What I have are bacterial genomes and reads of between 36 and 72 bp.

    When I align to the reference, large parts of the genome align perfectly, but then I find gaps. If I do a de novo assembly, I can see that some of the contigs span the gaps, but I am doing this by eye using MUMmer, IGV and a few other bits and bobs.

    It seems to me obvious that someone would have written this, but I can't find anything....

    Mick
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    What assembler are you using for your de novo and reference guided assemblies?

    Have you tried MIRA3?


    • mwatson
      Member
      • Aug 2010
      • 13

      #3
      OK, so I use either SOAPdenovo or Velvet for the de novo stuff, and I currently use Novoalign for the reference assembly.

      The problem I am trying to solve is that from the reference assembly, I have a SAM/BAM file that I can view in IGV, and if I find a gap, I want to know why there is a gap - and many times, one of the de novo contigs will span the gap, indicating that the gap in the reference assembly is due to a real gap in the genome, not just an alignment issue.
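The by-eye check described above (find a gap in the reference alignment, then see whether a de novo contig spans it) can be roughed out in a few lines. This is only a minimal sketch under stated assumptions, not any existing tool: it assumes you already have a per-base depth list for the reference and, for each contig, the reference intervals it aligns to (for example taken from a MUMmer coordinates table). The function names, the `flank` parameter and the 0-based half-open intervals are all hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on the reference


def zero_coverage_gaps(depth: List[int], min_len: int = 1) -> List[Interval]:
    """Return runs of zero read depth that are at least min_len bases long."""
    gaps: List[Interval] = []
    start = None
    for pos, d in enumerate(depth):
        if d == 0:
            if start is None:
                start = pos  # a new zero-depth run begins here
        elif start is not None:
            if pos - start >= min_len:
                gaps.append((start, pos))
            start = None
    # Close a run that reaches the end of the reference.
    if start is not None and len(depth) - start >= min_len:
        gaps.append((start, len(depth)))
    return gaps


def _overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def contigs_spanning(gap: Interval,
                     contig_hits: Dict[str, List[Interval]],
                     flank: int = 20) -> List[str]:
    """Names of contigs with aligned blocks touching both flanks of the gap.

    contig_hits maps a contig name to the reference intervals it aligns to
    (an assumed input, prepared elsewhere).
    """
    gap_start, gap_end = gap
    left = (max(0, gap_start - flank), gap_start)
    right = (gap_end, gap_end + flank)
    spanning = []
    for name, hits in contig_hits.items():
        hits_left = any(_overlaps(h, left) for h in hits)
        hits_right = any(_overlaps(h, right) for h in hits)
        if hits_left and hits_right:
            spanning.append(name)
    return spanning


# Toy data, invented for illustration only.
depth = [3, 3, 0, 0, 0, 2, 2, 0, 1]
hits = {"contig_1": [(0, 2), (5, 9)], "contig_2": [(0, 2)]}
for gap in zero_coverage_gaps(depth, min_len=2):
    print(gap, contigs_spanning(gap, hits, flank=2))
```

A contig that touches both flanks is only a candidate; whether the gap reflects divergence, an insertion or a deletion still needs a look at the alignment itself.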


      • Geneious
        Registered Vendor
        • Jul 2010
        • 22

        #4
        You can also try the 14-day free trial of Geneious Pro.


        • maubp
          Peter (Biopython etc)
          • Jul 2009
          • 1544

          #5
          Originally posted by mwatson:
          OK, so I use either SOAPdenovo or Velvet for the de novo stuff, and I currently use Novoalign for the reference assembly.

          The problem I am trying to solve is that from the reference assembly, I have a SAM/BAM file that I can view in IGV, and if I find a gap, I want to know why there is a gap - and many times, one of the de novo contigs will span the gap, indicating that the gap in the reference assembly is due to a real gap in the genome, not just an alignment issue.
          I don't quite understand what you are describing. Are you saying you have "gaps", meaning areas where nothing maps to the reference, yet there is a de novo contig that could be used to span each "gap"? If so, to me it sounds like a region of divergence between the reference genome and your new genome (not a missing section in the new genome). Perhaps you can try adjusting the settings of your guided assembler? Or try including selected de novo contigs in the guided assembly as well?


          • tplsmith
            Junior Member
            • Aug 2008
            • 7

            #6
            MIRA-based pipeline to view scaffolds

            I'm not sure that what you really want is to "combine" a reference and your de novo assembly. Researchers I have been assisting with the same kind of study have had good success using the MIRA assembler mentioned earlier, then using bambus to generate a .bnk file. Viewing the results in hawkeye provides a lot of information of the kind you appear to be looking for; it is especially useful when you have a lot of paired-end information and want to see how the scaffolds fit together and where problems may lie. It is definitely useful for checking whether a piece of the reference that is "missing" in your data is due to assembly quality issues, because you can get an idea of read depth at the ends of the contigs.

            You can then compare the pertinent scaffolds of the assembly to a reference in a variety of ways. A good one that you can easily edit is in the Geneious package advertised in one of the replies, especially because in that single package you can call a variety of aligners (Geneious, ClustalW2, MUSCLE) to see how the choice affects the results. Instructions for the MIRA/bambus/hawkeye pipeline can be obtained at the MIRA website; some were written here by my colleague, whose contact information is also on the site. This should help you decide whether your strain has sequence not found in the reference or vice versa. There is also a MIRA discussion group where you can ask specific questions if you have problems.


            • Adjuvant
              Member
              • Sep 2010
              • 13

              #7
              Apparently mwatson and I are interested in the same things.

              I'm also doing bacterial sequencing. I used novoalign to align my reads to several reference sequences, extracted the unaligned reads and performed a velvet assembly on those. Blasting the resulting contigs shows quite a few that have sequence corresponding to my reference sequences at the ends of the contigs, but novel sequence in the middle. So in an effort to combine my alignments and my de novo assemblies, I did a pileup of my novoalign alignments, dumped the consensus to fastq, then separated out the quality data to yield several consensus fasta sequences (one for each of the reference genomes).
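For the "extract the unaligned reads" step, one possible approach is sketched below. It is an illustration only, not the poster's actual commands: it assumes the alignments are available as plain-text SAM lines and relies on the SAM FLAG bit 0x4 (read unmapped), with bits 0x40 and 0x80 marking first and second mates. The function name and the `/1`, `/2` name suffixes are hypothetical choices.

```python
from typing import Iterable, Iterator


def unmapped_reads_as_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTQ record per unmapped read found in SAM text lines."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if not flag & 0x4:
            continue  # read is mapped, skip it
        name, seq, qual = fields[0], fields[9], fields[10]
        # Keep mates distinguishable, since both share one QNAME in SAM.
        if flag & 0x40:
            name += "/1"
        elif flag & 0x80:
            name += "/2"
        yield f"@{name}\n{seq}\n+\n{qual}\n"


# Toy SAM records, invented for illustration only.
sam = [
    "@HD\tVN:1.0\n",
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tref\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tGGCC\tHHHH\n",
]
records = list(unmapped_reads_as_fastq(sam))
print("".join(records), end="")
```

This keeps every unmapped read, including reads whose mate did map. Whether to pull in the mapped mates as well before running velvet is a separate decision.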

              Here's where I get stuck: the pileup fills gaps in the alignment with N's. When I look at my alignment in Tablet, however, I can see that not all gaps are equal. Many are clearly spanned by a lot of paired end reads, whereas others have no spanning pairs and so might be the sites where some of my de novo assembled contigs might fit. They would also be sites where I'd first like to start designing outward directed primers for Sanger sequencing.

              My question is: Is there a way to separate my alignment consensus sequences into contigs separated by these unspanned gaps? The way I'm doing it now is scanning through my alignment in Tablet looking for such gaps, then looking for those gaps in my consensus sequence and manually deleting the N's. It seems like there should be a better way.

              Thanks.
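One way the question above could be approached programmatically is sketched here: find each run of N's in the consensus, count the read pairs whose fragment extends past both ends of the run, and cut the consensus only where that count is too low. This is a minimal sketch under stated assumptions, not an existing tool. It assumes the fragment intervals (leftmost to rightmost base of each properly paired read pair, 0-based half-open, in the same coordinates as the consensus) have already been pulled from the alignment, and the names `split_at_unspanned_gaps`, `min_pairs` and `min_gap` are hypothetical.

```python
import re
from typing import List, Tuple


def split_at_unspanned_gaps(consensus: str,
                            fragments: List[Tuple[int, int]],
                            min_pairs: int = 1,
                            min_gap: int = 1) -> List[Tuple[int, str]]:
    """Split a consensus at N-runs that fewer than min_pairs fragments span.

    Returns (start_offset, sequence) pieces. N-runs that are spanned by
    enough read pairs are left in place inside a piece.
    """
    pieces: List[Tuple[int, str]] = []
    piece_start = 0
    for match in re.finditer(r"[Nn]+", consensus):
        gap_start, gap_end = match.span()
        if gap_end - gap_start < min_gap:
            continue  # too short to treat as a gap
        # A fragment spans the gap if it extends past both ends of it.
        support = sum(1 for f_start, f_end in fragments
                      if f_start < gap_start and f_end > gap_end)
        if support < min_pairs:
            if gap_start > piece_start:
                pieces.append((piece_start, consensus[piece_start:gap_start]))
            piece_start = gap_end  # drop the unspanned N's
    if piece_start < len(consensus):
        pieces.append((piece_start, consensus[piece_start:]))
    return pieces


# Toy data, invented for illustration only.
consensus = "ACGTNNNNACGTNNACGT"
fragments = [(1, 11)]  # one read pair spanning the first N-run only
pieces = split_at_unspanned_gaps(consensus, fragments)
for offset, seq in pieces:
    print(f">piece_at_{offset}\n{seq}")
```

The piece boundaries double as a list of places to start designing outward-directed primers, and the offsets make it easy to find the same sites again in a viewer such as Tablet.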
