  • Verifying de novo bacterial genome

    I used velvet to assemble a ~4.7M bacterial genome. I got some seemingly nice results - 4.8M of genome covered, 100 contigs >1k, largest contig 3.7M (!!!).

    So now my question is - how do I verify that this is all good?
    Do I do synteny maps with a nearby bacteria?
    Do I finish connecting the genome as best I can (and how?)
    Do I pray?
    Thanks....

  • #2
    Originally posted by Noa
    I used velvet to assemble a ~4.7M bacterial genome. I got some seemingly nice results - 4.8M of genome covered, 100 contigs >1k, largest contig 3.7M (!!!).
    Nice! (even though 3.7 Mbp sounds too good to be true, using velvet - but I don't know what your read lengths/coverage were...)

    So now my question is - how do I verify that this is all good?
    You can't really 'verify' without an independently-obtained assembly of the same organism; realistically you can only increase your level of confidence in this assembly. Happily, that's all anyone is likely to want from you.

    Do I do synteny maps with a nearby bacteria?
    That's sensible. Mauve is good for this. Bacterial genomes are prone to rearrangement though, and it's not true that a breakdown of synteny implies misassembly; you'd want to look for other indicators of rearrangement, and also to inspect the assembly for indicators of poor quality assembly.

    Do I finish connecting the genome as best I can (and how?)
    You could do that, for example, by designing primers to either side of a 'gap' in the assembly, amplifying up from chromosomal DNA, and sequencing the amplification product. You could do the same for questionable assembly regions, too (use Tablet/some other viewer to inspect the assembly for dips/spikes in coverage and other indicators of misassembly). Depending on what you want from your assembly, this could be unnecessary, or too much effort to be worthwhile.
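As a rough illustration of the coverage inspection mentioned above, here is a minimal sketch that flags windows whose mean read depth is far from the typical value. It assumes you already have a per-base depth list for one contig (for example, the third column of `samtools depth -a` output); the function name, window size and the 0.5x/2x cutoffs are arbitrary choices for illustration, not recommended values.

```python
from statistics import median


def flag_coverage_outliers(depths, window=1000, low=0.5, high=2.0):
    """Return (start, end, mean_depth, label) for unusual windows.

    A window is a "spike" if its mean depth exceeds high * median of all
    window means, and a "dip" if it falls below low * median.
    """
    if not depths:
        return []
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    typical = median(m for _, _, m in means)
    flagged = []
    for start, end, mean in means:
        if mean > high * typical:
            flagged.append((start, end, mean, "spike"))
        elif mean < low * typical:
            flagged.append((start, end, mean, "dip"))
    return flagged


# Toy depths: mostly 30x, one 95x stretch and one 5x stretch.
depths = [30] * 5000 + [95] * 1000 + [30] * 3000 + [5] * 1000
for region in flag_coverage_outliers(depths):
    print(region)
```

Flagged windows are only candidates for a closer look in a viewer such as Tablet; a spike can also be a genuine multi-copy element such as a plasmid.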

    Other approaches you might consider include: BLASTing your assembly against the annotated genes of a fully sequenced, related bacterium, to estimate how much of a comparable gene complement you have recovered; having a quick look at a GC skew plot (window size ≈4 kbp) of your Mauve output to see if you have a 'sensible' assembly, in the sense that GC skew usually shows a characteristic pattern, changing sign either side of the origin of replication (typically positive on the leading strand, negative on the lagging strand); and checking evenness of coverage of your assembled/(re-)mapped reads ('spikes' might indicate collapsed repeats).
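For the GC skew check, a minimal sketch of the windowed calculation might look like the following. It uses the usual definition, skew = (G - C) / (G + C), over non-overlapping windows of the ≈4 kbp size suggested above; the function names are made up for illustration, and the input is assumed to be a single ordered sequence (for example, contigs concatenated in the order suggested by the reference alignment).

```python
def gc_skew(seq, window=4000):
    """Return [(window_start, skew)] with skew = (G - C) / (G + C)."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        result.append((start, skew))
    return result


def cumulative_skew(skews):
    """Running sum of the windowed skew values."""
    total = 0.0
    out = []
    for start, skew in skews:
        total += skew
        out.append((start, total))
    return out


# Toy sequence: a G-rich half followed by a C-rich half.
toy = "GGGC" * 1000 + "CCCG" * 1000
print(gc_skew(toy))
print(cumulative_skew(gc_skew(toy)))
```

In a well-ordered circular bacterial chromosome you would usually hope to see two long runs of opposite sign; many short sign changes in the middle of a replication arm can hint at misordered or inverted contigs, although real rearrangements produce the same picture.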

    Do I pray?
    This is one of the least likely routes to an improved assembly.

    L.



    • #3
      Depends on how much work you want to do

      If you have the time and resources for more experiments, you might evaluate your assembly by one or several of

      - paired-end sequencing and looking for indels. There are a number of techniques that look for aberrant average distances between read pairs, given the expected library insert size, as an indicator of indels. Such an analysis might identify assembly errors (either mis-joined contigs or missing pieces)

      - array CGH to see whether it indicates copy number changes relative to your assembly, which would indicate missing or duplicated segments in your assembly
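To make the paired-end idea concrete, here is a minimal sketch that flags read pairs whose mapped distance is far from the expected library insert size. The input format (a list of `(position, observed_insert_size)` tuples), the function name, the labels and the z-score cutoff of 3 are all assumptions for illustration; real tools work from alignment files and use more careful statistics.

```python
def flag_aberrant_pairs(pairs, expected_mean, expected_sd, z_cutoff=3.0):
    """Return (position, insert_size, label) for outlying read pairs.

    "stretched": mates map much farther apart than the library predicts.
    "compressed": mates map much closer together than the library predicts.
    """
    flagged = []
    for position, size in pairs:
        z = (size - expected_mean) / expected_sd
        if z > z_cutoff:
            flagged.append((position, size, "stretched"))
        elif z < -z_cutoff:
            flagged.append((position, size, "compressed"))
    return flagged


# Toy data for a hypothetical 300 bp library with a 30 bp standard deviation.
pairs = [(100, 300), (5000, 310), (12000, 900), (20000, 120)]
print(flag_aberrant_pairs(pairs, expected_mean=300, expected_sd=30))
```

A single outlying pair is usually noise (chimeric fragment, mismapping); it is clusters of stretched or compressed pairs at the same locus that are worth following up.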

      If you're limited to computational techniques, then synteny is a good idea. I'd also look for coding regions that show substantial differences (especially truncation) to the nearest species for which an annotated genome exists. Such changes may be real, but would be good candidates for resequencing (potentially Sanger) to confirm. Similarly, changes in copy number of genes would be good to confirm.
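One simple way to shortlist possibly truncated coding regions is to compare the length of each predicted gene with its best match in the related annotated genome. The sketch below assumes you have already paired genes up (for example from BLAST hits) and reduced them to lengths; the gene names, lengths and the 0.8 cutoff are hypothetical.

```python
def truncation_candidates(lengths, min_ratio=0.8):
    """Return [(gene, length_ratio)] for genes shorter than expected.

    lengths maps a gene name to (assembly_length, reference_length).
    Results are sorted with the most severely shortened genes first.
    """
    candidates = []
    for gene, (assembly_len, reference_len) in lengths.items():
        ratio = assembly_len / reference_len
        if ratio < min_ratio:
            candidates.append((gene, round(ratio, 2)))
    return sorted(candidates, key=lambda item: item[1])


# Hypothetical gene names and lengths, for illustration only.
lengths = {
    "geneA": (1404, 1404),
    "geneB": (1200, 2415),
    "geneC": (1050, 1062),
}
print(truncation_candidates(lengths))
```

Anything on the shortlist may still be a real biological difference, which is why confirming by resequencing is suggested above.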



      • #4
        Another option is to get an optical or restriction map of the physical genome. I used OpGen's service for making optical maps and had good results. I found a number of misassemblies which I corrected and closed a lot of gaps.
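Comparing contigs with an optical or restriction map comes down to checking whether the fragment sizes predicted from the sequence match the measured ones. A minimal sketch of the prediction step is below; the function name is made up, the cut is placed at the start of the recognition site as a simplification, and the EcoRI site (GAATTC) is used only as an example, since the enzyme would be whichever one the map was made with.

```python
def in_silico_digest(seq, site):
    """Return fragment lengths from cutting a linear sequence at each site.

    Simplification: the cut is placed at the first base of the recognition
    site rather than at the enzyme's true cut position within it.
    """
    seq = seq.upper()
    site = site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Skip zero-length fragments (e.g. a site at the very start).
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


# Toy contig with two EcoRI sites.
contig = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
print(in_silico_digest(contig, "GAATTC"))
```

The ordered list of predicted fragment sizes for each contig can then be lined up against the measured map; a contig whose fragment pattern only matches in two separate pieces is a misassembly candidate.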




        • #5
          Bioinformatically, there are other approaches you could try: amosvalidate and Hawkeye, including the FRC (Feature-Response Curve); see these papers: http://bib.oxfordjournals.org/cgi/co...tract/bbr074v1 and http://dx.plos.org/10.1371/journal.pone.0031002. This should allow you to flag potentially problematic regions.
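As I understand the Feature-Response Curve from those papers, contigs are sorted from longest to shortest and, for each feature threshold, tallied until the running total of suspicious features would exceed the threshold; the genome coverage of the tallied contigs gives one point on the curve. The sketch below follows that reading and is not the amosvalidate implementation; the contig sizes and feature counts are invented.

```python
def feature_response_curve(contigs, genome_size, thresholds):
    """Return [(threshold, coverage_fraction)] for each feature threshold.

    contigs is a list of (length, n_features) tuples, where n_features is
    the number of suspicious features reported for that contig.
    """
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)
    curve = []
    for phi in thresholds:
        features = 0
        covered = 0
        for length, n_features in ordered:
            if features + n_features > phi:
                break  # adding this contig would exceed the threshold
            features += n_features
            covered += length
        curve.append((phi, covered / genome_size))
    return curve


# Invented contigs: (length in bp, number of flagged features).
contigs = [(3_700_000, 12), (500_000, 1), (300_000, 0), (200_000, 4)]
for phi, coverage in feature_response_curve(contigs, 4_700_000, [0, 12, 13, 20]):
    print(phi, round(coverage, 3))
```

A curve that rises steeply means most of the genome is covered by contigs carrying few flagged features, which is what you would hope for when comparing two assemblies of the same data.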



          • #6
            Originally posted by Noa
            I used velvet to assemble a ~4.7M bacterial genome. I got some seemingly nice results - 4.8M of genome covered, 100 contigs >1k, largest contig 3.7M (!!!).

            So now my question is - how do I verify that this is all good?
            Do I do synteny maps with a nearby bacteria?
            Do I finish connecting the genome as best I can (and how?)
            Do I pray?
            Thanks....
            As Bacteria Genomes pointed out (thank you, B.G.!), Whole Genome Mapping (formerly known as "Optical Mapping") by OpGen could certainly help in improving your assembly and reducing those 100 contigs to a potentially significantly lower number. Full disclaimer: I work for OpGen. Feel free to contact me, and I'd be happy to put you in touch with the right people if you wish to discuss further.
