  • gprakhar
    Member
    • Aug 2010
    • 78

    Preinstalled Genomic analysis Tools for Cloud Computing

    One challenge in harnessing cloud computing is IT-related, i.e., the installation and testing of bioinformatics tools.
    The situation is compounded by the fact that bioinformatics software developers do not stick to any common platform, language, library, or API.
    It would be time-saving to have all these tools available pre-installed and tested for cloud computing.

    Objectives:
    • Creation of bootable cloud volumes (Amazon: EBS; Google: Disk) with bioinformatics tools installed
    • Periodic tool upgrades and operating system updates when new releases appear
    • Easy scalability options for tools that support scaling up
    • Documentation for usage, security, and scaling up, plus a mailing list



    Remarks:

    The tools are to be tested after installation. The cloud volumes have tools broadly grouped by analysis task, e.g., prokaryotic assembly, prokaryotic annotation, assembly improvement, RNA-seq analysis, and metagenomic pipelines. Some degree of redundancy of tools across volumes is expected. The user's raw data can be stored in an Amazon S3 bucket or on a cloud volume.

    Current Status:
    I have created a bootable cloud volume with tools for prokaryotic assembly (SOAPdenovo, the a5-2014 pipeline, MaSuRCA, SPAdes). I am in the process of creating volumes for prokaryotic annotation tools and assembly improvement tools. Next would be RNA-seq analysis and metagenomic pipelines.
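
The grouping of tools by volume, and the expected redundancy between volumes, could be tracked with a small manifest. The following is only a sketch: the manifest layout, the volume names, and the `redundant_tools` helper are assumptions for illustration, and the contents of the second volume are placeholders, since only the prokaryotic assembly tools are listed in the post.

```python
from collections import defaultdict

# Volume name -> tools installed on it. Only the prokaryotic-assembly list
# comes from the post; the second entry is a hypothetical placeholder.
VOLUME_MANIFEST = {
    "prokaryotic-assembly": ["SOAPdenovo", "a5-2014", "MaSuRCA", "SPAdes"],
    "assembly-improvement": ["SPAdes", "example-scaffolder"],  # hypothetical
}


def redundant_tools(manifest):
    """Return {tool: sorted volume names} for tools present on more than one volume."""
    volumes_by_tool = defaultdict(set)
    for volume, tools in manifest.items():
        for tool in tools:
            volumes_by_tool[tool].add(volume)
    return {
        tool: sorted(volumes)
        for tool, volumes in volumes_by_tool.items()
        if len(volumes) > 1
    }


# List every tool that would need to be upgraded on several volumes at once.
for tool, volumes in redundant_tools(VOLUME_MANIFEST).items():
    print(f"{tool}: {', '.join(volumes)}")
```

A manifest like this would also give the periodic upgrade step a single place to look up which volumes need rebuilding when a tool has a new release.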

    Technical Information:
    OS : Ubuntu 12.04 LTS 64 bit


    Questions:
    Is this effort something that users (the community) would find useful?
  • gringer
    David Eccles (gringer)
    • May 2011
    • 845

    #2
    I strongly recommend people to consider the cost of cloud computing in comparison to a cheap (but high-performance) desktop/server system. Local systems have the benefit of lower latency and much greater storage capacities, as well as being able to know exactly where your data is.


    • Bukowski
      Senior Member
      • Jan 2010
      • 388

      #3
      Reinventing the wheel?


      • gprakhar
        Member
        • Aug 2010
        • 78

        #4
        Originally posted by Bukowski
        Reinventing the wheel?

        http://cloudbiolinux.org/
        I am currently reading through the CloudBioLinux documentation; it should do the job if I can figure out how to control which packages get installed.
        The latest Ubuntu 13.04-based AMI has a 35 GB instance size and includes a huge number of tools.

        Thank you

        --
        prakhar


        • gprakhar
          Member
          • Aug 2010
          • 78

          #5
          Originally posted by gringer
          I strongly recommend people to consider the cost of cloud computing in comparison to a cheap (but high-performance) desktop/server system. Local systems have the benefit of lower latency and much greater storage capacities, as well as being able to know exactly where your data is.
          L. Stein (2010), Genome Biology, provides convincing arguments for moving to the cloud.
          Secondly, in my country, wide-scale adoption of computational analysis has been lagging due to the high costs involved.
          At ~$100 a month for a prokaryotic analysis on small scale on AWS, that works out perfect for us.


          cheers,
          --
          prakhar


          • gringer
            David Eccles (gringer)
            • May 2011
            • 845

            #6
            L. Stein (2010), Genome Biology, provides convincing arguments for moving to the cloud.
            Okay, let me cherry-pick from that article:

            Transferring a 100 gigabyte next-generation sequencing data file across such a link will take about a week in the best case. A 10 gigabit/second connection (1.25 gigabytes/second), which is typical for major universities and some of the larger research institutions, reduces the transfer time to under a day, but only at the cost of hogging much of the institution's bandwidth. Clearly cloud services will not be used for production sequencing any time soon. If cloud computing is to work for genomics, the service providers will have to offer some flexibility in how large datasets get into the system.
            Additionally, the paper was written in 2010. Four years have passed since then, during which time Intel has pushed out quite a few power-efficient processors with large capabilities for parallel processing. Moore's law has continued in computers, but sequencing volumes haven't changed so much in terms of total data sizes (admittedly driven by customers that are content with the produced volumes), allowing the computers to catch up. I don't think the paper is providing ultimate arguments for cloud computing, just that there are some cases where it can be more cost-effective.
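
For anyone wanting to redo the transfer-time arithmetic from the quoted passage for their own connection, here is a rough sketch. It assumes decimal units (1 gigabyte = 8000 megabits) and the nominal line rate; the `efficiency` parameter and the 10 megabit/second example link are assumptions added for illustration and are not figures from the article.

```python
def transfer_time_seconds(size_gigabytes, link_megabits_per_second, efficiency=1.0):
    """Idealised time to move a file over a link.

    efficiency is the fraction of the nominal rate actually achieved
    (1.0 = full line rate, no protocol or disk overhead).
    """
    size_megabits = size_gigabytes * 8000  # decimal units: 1 GB = 8000 Mbit
    return size_megabits / (link_megabits_per_second * efficiency)


# 100 GB over the 10 gigabit/second link mentioned in the quote, at full line rate.
print(transfer_time_seconds(100, 10_000), "seconds")

# The same file over a hypothetical 10 megabit/second link (not from the article).
print(transfer_time_seconds(100, 10) / 3600, "hours")
```

These are best-case numbers; real transfers can be much slower once protocol overhead, shared bandwidth, and disk throughput are taken into account.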

            At ~$100 a month for a prokaryotic analysis on small scale on AWS, that works out perfect for us.
            Well, it's good that you've looked at the options. As I mentioned previously, a $1500 computer (15 months at $100/month, plus a bit more for power) will probably be capable of doing prokaryotic analysis (including genome assembly), and you get the additional benefit of large cheap storage (3TB for $200), as well as the knowledge of precisely where your data is.
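
The break-even reasoning above can be written out as a short calculation. This is only a sketch using the figures from this thread ($1500 computer, $200 drive, $100/month on AWS); the function name and the $10/month power cost are assumptions for illustration, not measured values.

```python
import math


def months_to_break_even(hardware_cost, cloud_cost_per_month,
                         local_running_cost_per_month=0.0):
    """Months of cloud spending needed to equal the cost of local hardware."""
    monthly_saving = cloud_cost_per_month - local_running_cost_per_month
    if monthly_saving <= 0:
        return math.inf  # the local system never pays for itself
    return hardware_cost / monthly_saving


# Computer only, ignoring power.
print(months_to_break_even(1500, 100))

# Computer plus the 3TB drive.
print(months_to_break_even(1500 + 200, 100))

# Computer only, with a hypothetical $10/month power bill.
print(round(months_to_break_even(1500, 100, 10), 1))
```

The comparison leaves out things that are harder to price, such as administration time, backups, and hardware failure.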

            edit: changed drive cost to a more reasonable value
            Last edited by gringer; 03-21-2014, 01:50 PM.


            • biznatch
              Senior Member
              • Nov 2010
              • 124

              #7
              Originally posted by gringer
              Yes, sorry. I was thinking $200, but wrote $400, because I was looking at prices for 4TB at the same time.
              That's ok, and I deleted my comment because I realized I wasn't sure if you were talking about US dollars or something else, and I also thought maybe you were factoring in the extra cost of backing up local files on a second drive. So I figured it was just getting too confusing and decided not to post.
