  • Best assembler for tetraploid plant genome?

    So far we have collected 32GB of 2x100 reads from a no-PCR library with an average insert size of about 500bp. We also have part of a lane from a 5kb mate-pair library, but that library was badly bottomed out: probably only about 3GB of sequence comes from read pairs with unique endpoints.

    These libraries were constructed from a DNA prep from a single plant. The species is genetically tetraploid with a 1C genome size of around 0.7-1 Gbp.

    Our best result with our usual assembler, ABySS-PE, gave a scaffold N50 of 2.3kb for a summed non-N scaffold length of 875MB.

    A cursory evaluation of the reads mapped back to scaffolds annotated by CEGMA suggests the allelic diversity is high -- maybe at the 1-3% level inside exons.

    What assembler would you use?

    We are planning to go deeper on the no-PCR library as that seems not to be even close to bottoming out. Also we have a 600 cycle 12GB MiSeq run on a pool of 6 or so plants of the same species that we can add to the assembly.

  • #2
    Bringing this topic back up because Phillip and I are still stumped as to how to approach assembling a tetraploid (or higher-ploidy) plant genome. We did more sequencing of the paired-end library and now have about 52 Gbases, or roughly 50x coverage, plus the mate-pairs. For a normal diploid organism that coverage would give us long scaffolds with ABySS; even 25x would tend to give a good assembly. However, even after adding the MiSeq reads Phillip mentioned, we are getting maximum scaffold lengths around 90K and N50s of around 2500 bases, with ~110,000 of ~580,000 [total] scaffolds this length or greater. The total genome size is about 1 Gbase. Yes, I know N50 is not the best statistic; it is the max length that is more troubling, since we can usually achieve longer.
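
For readers checking the arithmetic, here is a minimal Python sketch of the two statistics quoted in this post: average coverage and N50. The function names and the toy scaffold lengths are made up for illustration. The per-haplotype figure rests on an assumption that is not stated in the thread, namely that the ~1 Gbase figure is the collapsed single-copy size and that all four haplotypes would be assembled separately.

```python
def coverage(total_bases, genome_size):
    """Average depth: sequenced bases divided by the target genome size."""
    return total_bases / genome_size


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths given")


# Figures from the post: ~52 Gbases of paired-end reads, ~1 Gbase genome.
depth = coverage(52e9, 1e9)

# ASSUMPTION (not from the thread): four fully distinct haplotypes, so each
# one sees about a quarter of the total depth if they are not collapsed.
per_haplotype = depth / 4

# Toy scaffold lengths, purely illustrative.
toy_scaffolds = [10, 8, 6, 4, 2]
print(depth, per_haplotype, n50(toy_scaffolds))
```

Under that assumption, a depth that looks generous for a collapsed assembly becomes fairly thin per haplotype, which may be part of why the diploid rule of thumb does not carry over.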

    Anyway, from my reading it appears that polyploid assemblies usually fall back to sequencing a diploid progenitor or relative; i.e., they reduce the problem to the normal and achievable diploid assembly. An "orthologous group assembly" was done for wheat; however, our plant doesn't have close relatives, so I am not sure we can take that approach. Any suggestions?

    1) A different assembler than ABySS? I've tried Ray, SPAdes, and MIRA with various degrees of non-success. Hapsembler is taking a long time (30+ days) but might work.

    2) Get different reads? Say PacBio.

    3) Be satisfied with our current results?

    4) Try an orthologous assembly? We have talked about this, at least starting from known genes.

    • #3
      I'm guessing you are dealing with an outcrossing auto-tetraploid. This would mean that at any given locus you could have anywhere from 2 to 4 different alleles. With the allele diversity you are seeing from CEGMA, I think read length is your best (only?) way forward. Technology like PacBio or Moleculo could give you nice long reads with the haplotype phase already determined. This would help tremendously in assembling each of the different homoeologous regions.

      I don't think additional sequencing of your libraries is going to get you very far, since it is the diversity that is confounding the assembler. ABySS-PE is collapsing the homoeologous regions into a haploid 875 MB genome. I have no suggestions on which assembler could help you here.
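
To put a rough number on why diversity at the 1-3% level hurts a k-mer based assembler, the sketch below estimates how often a window of k bases contains no variant site between two alleles. It assumes variants fall independently at a uniform per-base rate, which is a simplification, and the k values are illustrative, not the settings used by anyone in this thread.

```python
def variant_free_fraction(k, divergence):
    """Probability that a window of k bases contains no variant site,
    assuming variants occur independently at rate `divergence` per base."""
    return (1.0 - divergence) ** k


# Divergence levels taken from the 1-3% estimate in the first post;
# k values are hypothetical examples.
for d in (0.01, 0.03):
    for k in (32, 64, 96):
        frac = variant_free_fraction(k, d)
        print(f"divergence={d:.2f} k={k}: {frac:.3f} of k-mers shared")
```

Under this toy model, a large share of k-mers differ between any two alleles, so the assembly graph branches often, and more depth from the same short-read libraries does not remove those branches.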

      This is a tough problem. I look forward to hearing about it when you've figured it out.
      Last edited by tbanks; 05-08-2014, 12:15 PM.

      • #4
        @pmiguel and @westerman - Do you have any suggestions for tetraploid plant genome assembly? I am in the same situation here.
        1. Does error correction before assembly help?
        2. Any good assembler?

        • #5
          I guess you should give Platanus a try:

          It is very easy to use, but unfortunately it did not work very well in our own test either.

          “Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads”

          • #6
            I second giving Platanus a trial, but I was also very disappointed with the results.

            Polyploid assembly is a very, very difficult task. Even if you got 50X PacBio coverage, the PacBio diploid assembler (Falcon, I believe) still seems to be in its infancy.

            Good luck!

            • #7
              We have not been that successful in assembling our tetraploid.

              • #8
                I think that for a successful assembly of a tetraploid genome, biology can help to some degree, depending on whether the species is autogamous or allogamous. For instance, if there is a protocol for doubled haploid plant regeneration for the species, the genome complexity would be reduced to a diploid level in doubled haploid lines. If it is an early maturing plant, the allelic variation can be further reduced by inbreeding those lines.

                • #9
                  Platanus promised much, but I also had no success assembling a highly heterozygous diploid genome with it...

                  Try MaSuRCA (http://www.genome.umd.edu/masurca.html). The hybrid approach might help your N50 contig size.

                  • #10
                    Rick,
                    Did you ever manage to improve your assembly? I am currently faced with sequencing a tetraploid with no known diploid progenitor.
                    Did you try any long-read sequencing? Did you find a better assembler?
