  • De novo assembly - plant genome - read length + amount

    Hi,
    I am looking to do my first de novo assembly:
    Plant, diploid, genome size of just under 1 Gb, no reference, 2 samples. I am happy to start with only a draft Illumina assembly (genome, NOT transcriptome).
    I would guess paired-end (PE), but 50 bp, 100 bp, or 300 bp reads? 30x coverage? Would a MiSeq give sufficient data?
    Any feedback appreciated, thank you.

  • #2
    The longer the reads, the better. The amount of coverage you need likely depends on the ploidy and heterozygosity; a tetraploid organism may need over 4x the coverage of a haploid. But even for a haploid I would suggest aiming for at least 50x, and for an organism that's highly heterozygous, 50x per ploidy. Bear in mind that Illumina coverage is not very even, so there will be many places where the actual coverage is substantially below the average.

    We make most of our fungal assemblies with around 100x fragment library coverage, with 2x150bp HiSeq reads. I don't work much with plants directly, but I understand that our plant group tries to get 2x250bp MiSeq data because plants are typically bigger and more polyploid than fungi. For an optimal assembly you should use both fragment and long-mate-pair libraries, but that's more difficult (lab-wise) and much more expensive, and may not be needed for a decent assembly; it depends on the genome.

    So you MIGHT be able to get all the coverage you need from a single HiSeq lane, but you'll definitely need multiple MiSeq runs; however, MiSeq will give longer reads and thus a better assembly, at a higher cost per base pair.
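As a rough back-of-the-envelope check on the coverage figures above, the amount of sequencing follows from coverage = total sequenced bases / genome size. Here is a minimal Python sketch. The function names are hypothetical, and the per-run yield is a placeholder you would replace with the figure from your own sequencing core, not an instrument specification.

```python
def bases_needed(genome_size_bp: int, target_coverage: int) -> int:
    """Total bases to sequence for a given average coverage."""
    return genome_size_bp * target_coverage


def read_pairs_needed(genome_size_bp: int, target_coverage: int,
                      read_length_bp: int) -> int:
    """Number of read pairs (2 x read_length_bp) for the target coverage."""
    total = bases_needed(genome_size_bp, target_coverage)
    bases_per_pair = 2 * read_length_bp
    # Ceiling division using integers only, to avoid float rounding.
    return -(-total // bases_per_pair)


def runs_needed(genome_size_bp: int, target_coverage: int,
                yield_per_run_bp: int) -> int:
    """Whole runs (or lanes) needed, given an ASSUMED usable yield per run."""
    total = bases_needed(genome_size_bp, target_coverage)
    return -(-total // yield_per_run_bp)


if __name__ == "__main__":
    genome = 1_000_000_000  # ~1 Gb genome, as in the original question
    print(bases_needed(genome, 50))            # bases for 50x
    print(read_pairs_needed(genome, 50, 150))  # 2x150 bp read pairs for 50x
```

For a 1 Gb genome at 50x with 2x150 bp reads this comes to 50 Gb of sequence, or roughly 167 million read pairs. Since coverage is uneven and some reads are lost to trimming, it is worth treating the result as a lower bound.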


    • #3
      Thank you Brian, that is extremely useful, I really appreciate this.


      • #4
        Hi

        It might also be advisable to check (consider) the homozygosity. (Once you have some coverage, you can see this in k-mer plots.) But you can also estimate it from whether this is an inbred line and whether or not the species is self-compatible.
        If in doubt, go for longer reads. There is of course a trade-off between quality and length, but 50 bp is definitely too short.
        In any case, unfortunately, no two plant genomes are exactly the same; one major bugbear is repetitive/transposable elements of different sizes.

        We usually do one or two MiSeq runs (2x300 bp) to get a feel for the genome.

        Best Wishes
        Björn
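
The k-mer plot mentioned above is a histogram of how many distinct k-mers occur once, twice, three times, and so on in the reads. Here is a minimal sketch in Python, assuming the reads are plain strings already in memory. `kmer_spectrum` is a hypothetical name. Real data sets would call for a dedicated k-mer counting tool and canonical (strand-collapsed) k-mers, both of which this sketch skips.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Return a Counter mapping multiplicity -> number of distinct k-mers.

    Assumption: reads fit in memory; k-mers containing N are skipped;
    reverse complements are NOT collapsed (a simplification).
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    # Histogram: how many distinct k-mers were seen 1x, 2x, 3x, ...
    return Counter(counts.values())


if __name__ == "__main__":
    spectrum = kmer_spectrum(["ACGTACGT"], k=4)
    for multiplicity in sorted(spectrum):
        print(multiplicity, spectrum[multiplicity])
```

In a heterozygous diploid the spectrum typically shows a second peak at about half the multiplicity of the main peak, and the relative size of that peak gives a feel for how heterozygous the sample is.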


        • #5
          Thanks Bjorn, helpful comments. I doubt we'll do any pre-sequencing; we will just commit to one type and go for it, but your comments on 300 bp match what I have read in other publications.


          • #6
            Hi Elsie

            If you have your own MiSeq, you can also get slightly longer runs (we picked this up from here), but this is totally unsupported.

            björn


            • #7
              Hi Bjorn, thank you for that. I spoke with the core where we are going to do our sequencing, and we have decided to go for a HiSeq run with 150 bp paired-end reads (1 lane), as the quality on the current MiSeq 300 bp PE runs drops off significantly, so I would lose a lot after trimming.
              Thank you.
