  • Advice on genome sequencing project for Anolis distichus

    Hi folks,

    We are aiming to kick off a genome sequencing project on an anole (lizard) species, Anolis distichus (genome size somewhere between 1.8 and 2.5 Gbp). We've been researching "best practices", and the general consensus I am coming across is that a combined coverage (of short and long inserts) of ~100X, long-insert mate pairs (~10-20 kb+), and SOAP or ALLPATHS as the assembler seem to lead to a decent assembly. However, we've recently come across DISCOVAR de novo and Platanus as potential assembler options as well.

    Obviously, choice of assembler is going to affect our library prep, so I wanted to canvass the community to see if anyone had any updated thoughts on current best genome sequencing practices (most of the posts/genomes I found were initiated in 2014 or before).

    Resources:
    -- A 'relatively' inbred individual for sequencing, and high quality DNA extracted from it
    -- One of its congeners, A. carolinensis has been sequenced
    -- We'll have a transcriptome for the individual we sequence
    -- $5-7k for whole genome sequencing costs

    Our current feeling is that aiming for 2*250bp reads from a ~450bp insert library (Illumina) will allow us to use DISCOVAR de novo. If that, along with the transcriptome, doesn't give us a "pretty enough" assembly (we are looking for a very high quality draft, as we are interested in specific chromosomal regions involved in divergence across the genus), we could then add in mate-pair (and potentially PacBio?) data if needed, and try ALLPATHS.

    Do people have strong thoughts on how they would attack the project with the same resources? We would love to hear from you if so. Full disclosure: also asking this question on Researchgate, so if I get any answers there that I think people here would like to hear I'll make sure to share.

    Cheers!

  • #2
    I like no amp PE libraries. The Illumina PCR-free kit works well.
    Are you planning on doing a 2x250bp RAPID run? We've used 2x250 reads from a MiSeq run on an insect genome and I didn't see much improvement to the assembly from them over the 2x100 base reads we had already.

    --
    Phillip

    • #3
      @Phillip: DiscovarDeNovo requires 250 bp or longer Illumina reads.

      @Zippy: DiscovarDeNovo required between 800 GB and 1 TB of RAM for an assembly I recently tried with ~250M 250bp reads. So keep that in mind.
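
For a sense of scale, here is a rough sketch of how much sequence ~250M reads of 250bp represents, and what fold coverage that would give on the 1.8-2.5 Gbp range quoted in the first post. The function name is made up for illustration, and it assumes "250M" counts individual reads rather than read pairs. The genome actually assembled in this post is not stated, so this is only a what-if for A. distichus.

```python
def fold_coverage(n_reads, read_length_bp, genome_size_bp):
    """Approximate fold coverage = total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


# Assumption: 250 million individual 250bp reads (not pairs).
N_READS = 250_000_000
READ_LEN = 250

print(f"Total bases: {N_READS * READ_LEN / 1e9:.1f} Gbp")
for genome_bp in (1_800_000_000, 2_500_000_000):
    cov = fold_coverage(N_READS, READ_LEN, genome_bp)
    print(f"{genome_bp / 1e9:.1f} Gbp genome: ~{cov:.0f}X")
```

Under those assumptions the data set is about 62.5 Gbp, or roughly 25-35X on a genome in that size range.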

      • #4
        Thanks Phillip and Genomax. So it seems if we went the ALLPATHS route, 2*100bp would be enough, and if we go the Discovar route we need to be aware of the computational demands. Appreciate the input!

        • #5
          ABySS-PE seems really good at animal and fungal genome assemblies. It can handle 250 base reads fine. (ALLPATHS probably could also.)
          Mate-pair libraries really help. Even the cheap-to-make no-gel TruSeq Nextera Mate Pair libraries.

          • #6
            Thanks Phillip - we were planning on trying multiple assemblers anyway, so I'll stick ABySS-PE in the queue. Just in case anyone is interested, the recommendations I got at researchgate were:
            -- 40~60-fold of 2*250bp reads of ~450 bp insert size should be suitable at the first stage
            -- generate mate-pair or PacBio data if the cost can be covered. Then, if the resulting assembly at the first stage isn't very good, those data can be used

            or, alternately:
            -- Start with only Illumina Synthetic Long Reads, and from there sequence either very long insert mate pairs and/or get some PacBio reads.

            Available from: https://www.researchgate.net/post/Ca...olis_distichus [accessed May 6, 2015].
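
To put the 40~60-fold recommendation into read counts, here is a minimal sketch of the arithmetic for 2*250bp reads on a 1.8-2.5 Gbp genome. The function name is hypothetical, and the estimate ignores reads lost to quality trimming, duplicates, or adapter contamination, so real runs would likely need some margin on top.

```python
def read_pairs_needed(genome_size_bp, coverage, read_length_bp):
    """Read pairs needed for a target fold coverage (each pair yields 2 reads)."""
    bases_needed = genome_size_bp * coverage
    bases_per_pair = 2 * read_length_bp
    # Integer ceiling division, so a partial pair rounds up.
    return -(-bases_needed // bases_per_pair)


for genome_bp in (1_800_000_000, 2_500_000_000):
    for cov in (40, 60):
        pairs = read_pairs_needed(genome_bp, cov, 250)
        print(f"{genome_bp / 1e9:.1f} Gbp at {cov}X: "
              f"~{pairs / 1e6:.0f} million read pairs")
```

This works out to roughly 144-300 million read pairs, depending on where the genome size and target coverage land within the quoted ranges.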

            • #7
              Originally posted by Zippy

              or, alternately:
              -- Start with only Illumina Synthetic Long Reads, and from there sequence either very long insert mate pairs and/or get some PacBio reads.
              Great if you have unlimited funds. A synthetic long read "run" will net you about 1 billion bases in 2-10kb "reads" if all goes well. If you could buy enough to get to 10x coverage with these, you might get some great results. You really need a lane of sequence from a HiSeq to generate that 1 billion bases of sequence. So you are looking at the cost of 10 HiSeq lanes.

              --
              Phillip
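
The lane estimate above can be sketched as simple arithmetic, taking the quoted yield of about 1 billion bases of synthetic long reads per HiSeq lane at face value. The function name and default yield are illustrative assumptions. Ten lanes corresponds to 10x on a genome of about 1 Gbp; for the 1.8-2.5 Gbp range given in the first post, the same calculation suggests more lanes.

```python
def lanes_needed(genome_size_bp, coverage, yield_per_lane_bp=1_000_000_000):
    """Lanes needed to reach a target coverage, rounding up to whole lanes.

    Assumption: ~1 Gbp of synthetic long reads per lane, as quoted in the post.
    """
    bases_needed = genome_size_bp * coverage
    return -(-bases_needed // yield_per_lane_bp)  # integer ceiling division


for genome_bp in (1_000_000_000, 1_800_000_000, 2_500_000_000):
    print(f"{genome_bp / 1e9:.1f} Gbp at 10x: {lanes_needed(genome_bp, 10)} lanes")
```

Under these assumptions, 10x coverage needs 10 lanes for a 1 Gbp genome and about 18-25 lanes for 1.8-2.5 Gbp, which only strengthens the point about cost.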

              • #8
                Hi Phillip,

                Yes - I agree that is definitely out of our budget. I thought I would share just in case anyone reading this thread has a lot more money to throw at their genome than we do!

                Cheers!

                • #9
                  SGA will run on a capable Linux workstation... I used it for a ~1.1 Gb fish genome.

                  • #10
                    Thanks lac302

                    • #11
                      I have found that with reptile genomes (particularly Anolis), high-coverage (~80X) 2*100bp Illumina data with a diversity of insert sizes (3 lanes ~200bp, 3 lanes 300bp, and 2 lanes 1kb) can get you decent contigs using ABySS, SOAP, and Platanus, but you will eventually run up against the highly repetitive (and possibly heterozygous) regions, and if your sample was basically a hybrid from separate populations, the result will be very disappointing (even using Platanus, in my experience). Multiple mate-pair libraries of different insert sizes will extend the scaffolds to a decent length that will allow you to annotate most genes, but not to chromosome level.

                      We have been in conversation with several groups about how they are going to tackle megabase-size scaffolds for reptiles, and there does not yet appear to be a consensus outside mammals, for which DISCOVAR is apparently working very well in the Broad's effort to sequence 150 species. One idea has been to merge the DISCOVAR contigs with a single mate-pair library. But again, this has not been optimized for reptiles.

                      The alligator people fortunately had BACs that allowed them to merge many of their Illumina-only scaffolds, followed by RNA-scaffolding of assembled transcripts from the same species. Those extra steps really helped. (I have tried RNA scaffolding of carolinensis transcripts on my de novo anole genomes, with no real improvement.)

                      Since BACs are a thing of the past (although one species I am working on, not an anole, has a few on GenBank, and I can't wait to use them), I think PacBio would do really well to merge Illumina scaffolds. But these are nascent times for reptile genomics, for sure, and not much is known yet.

                      • #12
                        Thanks Marct - it is great to get the reptile-specific feedback. At this stage I think we are going to try and target 2*250bp on a ~450bp insert size library, do an assembly with DISCOVAR, add in the transcriptome/colinearity with carolinensis, and then see where we are at before pursuing optical-mapping/long-range reads/mate pair libraries.
