  • Seeking advice for Amazon Web Services usage

    I have been searching for specific sequences in the 1000 Genomes Project data, using samtools view and BreakSeq, until my university's IT services contacted me because I was using too much bandwidth. The 1000G people then suggested that I use AWS. It looks like a good solution, but I have some doubts, and I would appreciate it if other AWS users could ease my concerns.

    1. I don't understand the language used on the AWS website ("instances", "API", bla, bla, bla). May I assume that if I start an EC2 instance, I will connect to it through ssh as with any remote machine, and be able to install samtools and whatnot?

    2. They claim most of the 1000 Genomes Project data is available in a "bucket", and they mention several ways of accessing it that I don't know about. Will I be able to samtools-view the bam files or read fastq files?

    3. Assuming so, how fast would the data be transferred from that bucket to my EC2 instance? Before, most of the time my pipeline took was spent downloading. I need to know the data transfer speed to estimate the cost.

    4. Almost the first thing AWS asks you for is your credit card number. I don't want to give mine, and there's none available for the lab. Do you know of alternative ways to pay? We have a budget, but it's managed by the University, which requires invoices and so on.

    Thank you.

  • #2
    Hello Lluc,
    Perhaps you've resolved your questions by now, but I'll just post my answer anyway, and hope someone adds or corrects me.
    I have had the same problem, and I haven't found a really "for dummies" page. Up to now, what I have found out is:
    AWS is a service where you rent servers offsite. They rent out virtual servers, which they call "instances", on a service called "EC2". You have complete control of your instance, so it's like having your own server: you get ssh command-line access as well as a web-based control panel. You can rent several instances at a time, and there is a cluster option to rent several instances that work together as a cluster. There are several types of instances, which differ in RAM, number of processors, number of cores per processor and instance disk storage.
    When you launch an instance you start it from an "image" of a server disk, so that you don't have to install everything from scratch. These images are called "AMIs". Amazon provides several pre-made images with different pre-installed operating systems (Debian, Red Hat, Windows, etc.). If you install something new and want it ready the next time you launch an instance, you have to save your own image first.
    For data you will want storage that outlives the instance. Amazon's object storage is called "S3", and the data in it is grouped into containers called "buckets". A bucket can be read from your instance or over the web, and access can be given to third parties using keys.
    The "APIs" are programming interfaces: they let scripts and tools, both native and third party, drive AWS services (for example, starting an instance or reading from S3) instead of clicking through the web control panel. You don't need to use them directly just to ssh into an instance and run samtools. There are also some APIs aimed at science and sequencing.
    So for your question, the transfer would be between the 1000G's bucket and your instance, without going through your local network. The speed can be anything from 1.5 to 10 Mbps, from what I've read, depending on your luck. Once you configure your instance you can use it as your own server.
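For the cost question, a rough back-of-the-envelope sketch in Python may help. It assumes that "Mbps" above means megabits per second and uses decimal gigabytes. The 20 GB file size and the 0.10 per-hour price are made-up placeholders, not real 1000 Genomes or AWS figures, and `transfer_hours` and `instance_cost` are just names chosen for illustration.

```python
def transfer_hours(size_gb: float, rate_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes at rate_mbps megabits per second."""
    if rate_mbps <= 0:
        raise ValueError("rate_mbps must be positive")
    # Assumption: decimal units, so 1 GB = 1000 MB = 8000 megabits.
    megabits = size_gb * 8 * 1000
    seconds = megabits / rate_mbps
    return seconds / 3600


def instance_cost(hours: float, price_per_hour: float) -> float:
    """Cost of keeping one instance running for the given number of hours."""
    return hours * price_per_hour


if __name__ == "__main__":
    file_size_gb = 20.0     # hypothetical BAM file size
    price_per_hour = 0.10   # hypothetical hourly price, not an AWS quote
    for rate in (1.5, 10.0):  # the range of speeds quoted in this post
        hours = transfer_hours(file_size_gb, rate)
        cost = instance_cost(hours, price_per_hour)
        print(f"{rate} Mbps: {hours:.1f} h, about {cost:.2f} in instance time")
```

Plugging in your own file sizes and the current price of the instance type you pick should give at least an order-of-magnitude estimate before you commit to anything.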
    There is no way of avoiding the credit card step; I've asked. In theory, you can use a "Free Tier" level for one year and not have any charges made to your card, but they will not warn you when you go over the limit; they will simply start charging.
    I don't know what sequences you're querying at 1000G, but perhaps it would be best to download them first and do the queries locally. It would be a one-time, huge download that could be done overnight with your IT department's approval.

    Hope this helps, and I hope someone else that is more knowledgeable jumps in.

    • #3
      This is a really old topic, but I came across this post while searching here for threads about AWS EC2 AMIs.
      Just in case there are some other newbies like me, you can try to learn more by exploring this page:
      Informatics for RNA-seq: A web resource for analysis on the cloud. Educational tutorials and working pipelines for RNA-seq analysis including an introduction to: cloud computing, critical file form...

      It is a really good resource that introduces AWS, and I found it easier to understand than the tutorials on AWS itself. You can also explore their lectures on AWS and on RNA-seq analysis using AWS.
      Thanks!
