
  • Wood Frog Genome ver 0.1

    Hello,

    Trying an experiment in genomics - trying to crowd-fund the sequencing of the wood frog genome. The wood frog is a vertebrate animal that can survive freezing - but only winter acclimated animals can do this - summer acclimated cannot. Thus, this ability is under genetic control.

    We're 55% of the way there and need more pledges to achieve the goal. The project will use Illumina's Moleculo technology. If funded, I'm going to involve undergraduates in the annotation project as a special class. So, this is a great way to get some pretty cool science done, as well as get undergrads involved.

    The project link is here: https://experiment.com/projects/unlo...rvive-freezing

  • #2
    Looks like Rana sylvatica is approximately a 6 Gb genome? The experiment.com page says you're doing HiSeq2000 2x100bp sequencing--is that to increase coverage over the Moleculo? A single HiSeq 2x100 lane would probably get you 6x or 7x coverage, yes?

    If you have a method for assembling 6 Gb frog genomes for $3k in molecular costs I'd love to talk to you about it--we've got several that we'd probably do if that was the case.

    Thanks!



    • #3
      Originally posted by atcghelix
      Looks like Rana sylvatica is approximately a 6gb genome? The experiment.com page says you're doing HiSeq2000 2x100bp sequencing--is that to increase coverage over the Moleculo? A single HiSeq 2x100 lane would probably get you 6x or 7x coverage, yes?

      If you have a method for assembling 6gb frog genomes for $3k in molecular costs I'd love to talk to you about it--we've got several that we'd probably do if that was the case.

      Thanks!
      Hi,

      I think we're actually going to wait for the HiSeq3000 to come online - we'll get WAY more data for the same cost. The Moleculo kit uses an application on Illumina's Basespace that does the assembly - so we'll see how that goes. I'm hoping that it will get us to the 10Kb stage of contigs and then we'll have to do the scaffolding.

      Yes, we expect to get about 8x coverage. I'll keep you posted on our progress, if you PM me.

      Regards,
      Andor



      • #4
        This sounds like an interesting de novo project, but I am not sure how far the Moleculo approach will take you given your limited budget. I would consider doing an unamplified 2x250 library in combination with true long reads (PacBio or possibly MinION) for scaffolding. Have a look at the assemblies produced by Discovar de novo (http://www.broadinstitute.org/softwa...og/?page_id=14) and apply for this http://pacb.com/smrtgrant/ or send a PM if you would be interested in a collaboration.



        • #5
          Hi Andor,

          I think there is some misunderstanding as to how "Moleculo" aka "Illumina Synthetic Long Read" libraries work.

          The total number of synthetic long read sequences you would get from a library pool of this type would be about 100,000 long reads. If everything went great the average long read length might be 10,000 base pairs. That works out to about 1 billion bases of sequence, or roughly 0.2x coverage of a 6 billion base genome. Way too low for most purposes.

          I think you would be better off doing a lane of a normal paired end library. That would yield about 40 billion bases of sequence. About 6x coverage. Still too low to get much of an assembly out of, but at 6x you would get some assembly.

          Better yet, you could do a transcriptome project instead.

          --
          Phillip
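
The coverage arithmetic in this post can be checked with a short sketch. The function name is illustrative; the inputs (about 100,000 long reads of about 10,000 bp, about 40 billion bases per paired-end lane, a 6 billion base genome) are the approximate figures quoted in the post, not measured values.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


GENOME_SIZE = 6e9  # approximate wood frog genome size quoted in the thread

# Synthetic long reads: ~100,000 reads averaging ~10,000 bp (figures from the post)
long_read_bases = 100_000 * 10_000
long_read_cov = coverage(long_read_bases, GENOME_SIZE)

# One lane of a standard paired-end library: ~40 billion bases (figure from the post)
lane_cov = coverage(40e9, GENOME_SIZE)

print(f"synthetic long reads: {long_read_cov:.2f}x")
print(f"one paired-end lane:  {lane_cov:.1f}x")
```

With these inputs the long reads come out a little under 0.2x and the paired-end lane between 6x and 7x, consistent with the estimates in the post.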



          • #6
            Originally posted by pmiguel
            Hi Andor,

            I think there is some misunderstanding as to how "Moleculo" aka "Illumina Synthetic Long Read" libraries work.

            The total number of synthetic long read sequences you would get from a library pool of this type would be about 100,000 long reads. If everything went great the average long read length might be 10,000 base pairs. That works out to about 1.1 billion bases of sequence. That works out to 0.2x coverage of a 6 billion base genome. Way too low for most purposes.

            I think you would be better off doing a lane of a normal paired end library. That would yield about 40 billion bases of sequence. About 6x coverage. Still too low to get much of an assembly out of, but at 6x you would get some assembly.

            Better yet, you could do a transcriptome project instead.

            --
            Phillip
            We have a transcriptome done - I think we are now leaning to using Moleculo and a standard library in combination to generate the overall genome. With the advent of the HiSeq3000, I think we'll be able to obtain enough read coverage for the same money as our initial budget called for.

            Regards,
            Andor



            • #7
              Originally posted by Chipper
              This sounds like an interesting de novo project but I am not sure how far the molecule approach will take you given your limited budget. I would consider to do an unamplified 2x250 library in combination with true long reads (PacBio or possibly MinION) for scaffolding. Have a look at the assemblies produced by Discovar de novo (http://www.broadinstitute.org/softwa...og/?page_id=14) and apply for this http://pacb.com/smrtgrant/ or send a pm if you would be interested in a collaboration.
              Thanks for the grants tip - I've submitted something.



              • #8
                Originally posted by cement_head
                We have a transcriptome done - I think we are now leaning to using Moleculo and a standard library in combination to generate the overall genome. With the advent of the HiSeq3000, I think we'll be able to obtain enough read coverage for the same money as our initial budget called for.

                Regards,
                Andor
                I really like the idea of the Illumina Synthetic Long Reads. But because you need to go at least 30X or so to construct them, they end up costing way more than mate pair library data.

                Plus, one goofy factor that Illumina doesn't really focus on: they use weird 8-base indexes (instead of the normal 6+1 base TruSeq indexes) so that they can get 384 good ones. This means that you are unlikely to find a facility willing to run them in a high-output flowcell unless you pay for all 8 lanes.

                Also, I don't know if you read my post in the other thread, but you may be drastically overestimating the cost savings of the HiSeq3000/4000. The cost per base will probably drop by less than 1.5x. I mean, that is still a substantial drop in price, but not 10x...

                --
                Phillip
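
A rough sketch of why the construction depth drives up the cost, assuming "30X" means each synthetic long read is assembled from short reads covering it about 30 times over. The function name and the 1 billion base example are illustrative, not Illumina figures.

```python
def short_read_bases_needed(long_read_bases: float,
                            construction_depth: float = 30.0) -> float:
    """Short-read bases consumed to build a given amount of synthetic long-read sequence.

    construction_depth is the assumed short-read coverage of each long fragment
    (the "at least 30X or so" mentioned in the post).
    """
    if long_read_bases < 0:
        raise ValueError("long_read_bases must be non-negative")
    if construction_depth <= 0:
        raise ValueError("construction_depth must be positive")
    return long_read_bases * construction_depth


# Hypothetical target: 1 billion bases of finished synthetic long reads
needed = short_read_bases_needed(1e9)
print(f"{needed / 1e9:.0f} billion short-read bases")
```

Under that assumption, 1 billion bases of long reads would use up about 30 billion bases of short-read output, which is most of the roughly 40 billion bases quoted earlier in the thread for a paired-end lane.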



                • #9
                  Originally posted by pmiguel
                  I really like the idea of the Illumina Synthetic Long Reads. But because you need to go at least 30X or so to construct them, they end up costing way more than mate pair library data.

                  Plus, one goofy factor that Illumina doesn't really focus on: they use weird 8 base indexes (instead of the normal 6+1 base TruSeq indexes) so that they can get 384 good ones. This means that you are unlikely to find a facility willing to run them in a highoutput flowcell unless you pay for all 8 lanes.

                  Also, I don't know if you read my post in the other thread, but you may be drastically overestimating the cost-savings of the HiSeq3000/4000. It will probably be a less than 1.5x lower cost per base. I mean, that is still a substantial drop in price, but not 10x...

                  --
                  Phillip
                  I thought that the # of reads on a HiSeq3000 was between 10x and 15x that of the HiSeq2500, but at pretty much the same cost. Has there been an update on the chemistry pricing? That said, the V4 chemistry is pretty impressive on its own. I thought one could run the HiSeq3000 a lane at a time? Well, if not, I'll have to wait until a facility gets enough orders. I think the chemistry will be popular enough that it shouldn't be too long a wait.



                  • #10
                    The HiSeq3000/4000 only does up to 2x150 bp. With the "old" 2500 system you can run 2x250 bp in rapid mode (single lane), which will be better for assembly. This should give ~150 Gb but is probably over your budget unless you find someone willing to share a run. Don't forget that according to some, $25000 is affordable for a large de novo genome... (https://www.genomeweb.com/sequencing...-horse-genomes)



                    • #11
                      Originally posted by cement_head
                      I thought that the # of reads on a HiSeq3000 was between 10x to 15x that of the HiSeq2500, but pretty much the same cost. Has there been a update on the chemistry pricing? That said, the V4 chemistry is pretty impressive on its own. I thought one could run the HiSeq3000 a lane at a time? Well, if not, I'll have to wait until a facility gets enough orders. I think the chemistry will be popular enough that it shouldn't be too long a wait.
                      Yes, I know that is what you thought because of your posts in that thread. But you were comparing the number of reads in a single lane (HiSeq2500) to the number of reads in a whole flowcell of 8 lanes (HiSeq3000).

                      You absolutely will not be able to run a HiSeq3000/4000 one lane at a time. You run the whole flow cell at once on a HiSeq.

                      Yeah, you might find a core that does a lot of Synthetic Long Reads. Then you could get your single lane of data back in a reasonable amount of time. I'm not sure if there is such a core, though...

                      --
                      Phillip
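
The lane-versus-flowcell mix-up can be made concrete with a small sketch. It assumes the 10x to 15x figure compares a whole 8-lane HiSeq3000 flowcell with a single HiSeq2500 lane, as described in this post; the function name is illustrative.

```python
LANES_PER_FLOWCELL = 8  # lane count of the flowcell, as stated in the post


def per_lane_ratio(flowcell_vs_lane_ratio: float,
                   lanes: int = LANES_PER_FLOWCELL) -> float:
    """Convert a 'whole flowcell vs. one lane' read ratio into a lane-vs-lane ratio."""
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    return flowcell_vs_lane_ratio / lanes


low = per_lane_ratio(10)   # lower end of the 10x to 15x range from the thread
high = per_lane_ratio(15)  # upper end of the same range
print(f"per-lane gain: between {low}x and {high}x")
```

Compared lane for lane, the gain in reads is between 1x and 2x rather than 10x or more, which is much closer in scale to the "less than 1.5x" cost-per-base estimate given earlier in the thread.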
