
de novo assembly vs. reference assembly


  • de novo assembly vs. reference assembly


    I would like to know if anyone has experience comparing a local de novo assembly to a reference assembly and measuring which one is better.

    I have mapped genomic Illumina reads to a reference genome. Then, since I'm interested in a 1 Mb region of one of the chromosomes, I used a de novo assembler to assemble the reads that mapped to that 1 Mb region. So now I have about 6000 contigs ranging in size from 500 bp to 30 kb and I would like to:
    1- Visualize their positions relative to the original 1 Mb region
    2- Be able to say whether the local de novo assembly is better (or worse) than simply mapping my reads to the reference assembly.

    Many thanks
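One rough way to put a number on goal 1 is to align the contigs back to the region and measure how much of it they cover. The sketch below assumes you already have the alignment coordinates as half-open `(start, end)` intervals on the reference (for example, parsed from your aligner's output). The names `merge_intervals` and `region_coverage` are hypothetical helpers made up for illustration, not part of any tool mentioned in this thread.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in reference coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def region_coverage(
    intervals: List[Interval], region_start: int, region_end: int
) -> Tuple[float, List[Interval]]:
    """Return (fraction of the region covered by contigs, list of uncovered gaps)."""
    # Keep only the parts of each alignment that fall inside the region.
    clipped = [
        (max(s, region_start), min(e, region_end))
        for s, e in intervals
        if e > region_start and s < region_end
    ]
    merged = merge_intervals(clipped)
    covered = sum(e - s for s, e in merged)

    # Walk along the region and record every stretch with no contig.
    gaps: List[Interval] = []
    cursor = region_start
    for s, e in merged:
        if s > cursor:
            gaps.append((cursor, s))
        cursor = max(cursor, e)
    if cursor < region_end:
        gaps.append((cursor, region_end))

    return covered / (region_end - region_start), gaps


if __name__ == "__main__":
    # Toy alignments on a 500 bp region, purely for illustration.
    fraction, gaps = region_coverage([(0, 100), (50, 200), (300, 400)], 0, 500)
    print(fraction, gaps)
```

The covered fraction and the gap list only say where contigs land, not whether their sequence is correct, so this is a first sanity check rather than a full measure of assembly quality.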

  • #2
    1. E.g. Mauve Contig Mover
    2. What is your definition of 'better' and 'worse'?


    • #3
      This is pretty much what Complete Genomics does. They align to the reference and identify positions where they detect a variant, then do local de novo assembly over the variant. It does seem to increase specificity in particular (by excluding potential false positives that disappear after de novo assembly).

      That said, having compared the two approaches myself, it does not appear to be worth the effort for the relatively long reads you get from an Illumina instrument. Assembly is computationally expensive, and it doesn't really seem to increase sensitivity that much.


      • #4
        Thanks for the replies. I will look into the Mauve tool.

        By 'better', I have in mind a region of the genome that shows many SNPs, indels, etc. when the reads are mapped to the reference assembly. This may indicate a hyperpolymorphic region where the reference genome is very different from the sample DNA being analyzed. In these circumstances I would choose a local de novo assembly.

        Now, the most important question is where to set the threshold for considering a region to have "many" variants. That is almost a rhetorical question, I guess...
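To make the threshold question concrete, one simple option is to count variants in fixed-size windows across the region and flag the windows that reach some cutoff. The sketch below assumes the variant positions have already been extracted from the mapping-based calls as plain integers. The function names, the 1 kb window, and the cutoff of 3 are all arbitrary placeholders chosen for illustration, not recommended values.

```python
from typing import List, Tuple


def variant_density(
    positions: List[int], region_start: int, region_end: int, window: int = 1000
) -> List[int]:
    """Count variants per fixed-size window over [region_start, region_end)."""
    # Ceiling division so a partial window at the end is still counted.
    n_windows = (region_end - region_start + window - 1) // window
    counts = [0] * n_windows
    for pos in positions:
        if region_start <= pos < region_end:
            counts[(pos - region_start) // window] += 1
    return counts


def flag_dense_windows(
    counts: List[int], region_start: int, window: int, threshold: int
) -> List[Tuple[int, int]]:
    """Return (window_start, variant_count) for windows at or above the threshold."""
    return [
        (region_start + i * window, count)
        for i, count in enumerate(counts)
        if count >= threshold
    ]


if __name__ == "__main__":
    # Toy variant positions in a 3 kb region, purely for illustration.
    positions = [5, 10, 1500, 1600, 1700, 2999]
    counts = variant_density(positions, 0, 3000, window=1000)
    print(counts)
    print(flag_dense_windows(counts, 0, 1000, threshold=3))
```

Looking at the distribution of per-window counts across the whole genome, rather than just the 1 Mb region, would probably be a more defensible way to choose the cutoff than picking a number up front.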