
  • nhntran
    replied
    This is a really old topic, but I came across this post while searching here for threads about AWS EC2 AMIs.
    In case there are other newbies like me, you can learn more by exploring this page:
    Informatics for RNA-seq: A web resource for analysis on the cloud. Educational tutorials and working pipelines for RNA-seq analysis including an introduction to: cloud computing, critical file form...

    It is a really good resource that introduces AWS, and I found it easier to understand than the tutorials on AWS itself. You can also explore their lectures on AWS and on RNA-seq analysis using AWS.
    Thanks!



  • keo
    replied
    Hello Lluc,
    Perhaps you've resolved your questions by now, but I'll just post my answer anyway, and hope someone adds or corrects me.
    I have had the same problem, and I haven't found a really "for dummies" page. Up to now, what I have found out is:
    AWS is a service where you rent servers offsite. They do it by renting out virtual servers, which they call "instances", on a service called "EC2". You have complete control of your instance, so it's like having your own server. You have ssh command-line access, as well as a web-based control panel. You can rent several instances at a time, and there is a cluster option to rent several instances that work together as a cluster. There are several instance types, which differ in RAM, number of processors, number of cores per processor and instance disk storage.
    You will also need external storage, which they call "S3". When you launch an instance you have to load an "image" of a server (operating system and disk contents) so that you don't have to install everything from scratch. These images are called "AMIs". Amazon provides several pre-made images with different pre-installed operating systems (Debian, Red Hat, Windows, etc.). Once you install something new on your instance, you will have to save that image on the S3 storage in order to have it ready when you start your instance again. The space you use on S3 is grouped into containers called "buckets", which can be accessed at the time of instance creation (or re-creation) or even through the web, using keys that you can give to third parties.
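To make the bucket idea a bit more concrete, here is a minimal Python sketch of how an object in a bucket can be addressed over the web and handed to samtools. The bucket name, object key and region are hypothetical placeholders, and the sketch assumes the bucket owner allows public reads and that an index file sits next to the BAM. It only builds the URL and the command line; it does not contact AWS or run samtools.

```python
from urllib.parse import quote


def s3_https_url(bucket, key):
    """Build the virtual-hosted-style HTTPS URL for an object in an S3 bucket."""
    # quote() keeps "/" as-is and percent-encodes characters such as spaces.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def samtools_view_args(bam_url, region):
    """Return the argument list for 'samtools view' on one region of a remote BAM."""
    # -b asks samtools to write BAM output instead of SAM text.
    return ["samtools", "view", "-b", bam_url, region]


if __name__ == "__main__":
    # Hypothetical bucket and key, for illustration only.
    url = s3_https_url("example-bucket", "data/sample.bam")
    print(" ".join(samtools_view_args(url, "20:100000-200000")))
```

On an instance with samtools installed, the printed command could be run as-is, or the argument list could be passed to `subprocess.run`.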
    There are several applications, both native and third party, that you can access directly from your instance without installing the whole thing. These are the "APIs". A common API is the storefront, which makes your instance use all of Amazon's web store functions on your own domain and products. There are some APIs for science and sequencing.
    So for your question, the transfer would be between the 1000G's bucket and your instance, without going through your local network. The speed can be anything from 1.5 to 10 Mbps, from what I've read, depending on your luck. Once you configure your instance you can use it as your own server.
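As a rough way to turn a transfer speed into time and cost, here is a small Python sketch. It assumes "Mbps" means megabits per second and that 1 GB is 1000 MB. The 10 GB file size and the hourly price are made-up illustration values, not real 1000G sizes or AWS prices.

```python
def transfer_hours(size_gb, rate_mbps):
    """Hours needed to move size_gb gigabytes at rate_mbps megabits per second."""
    if rate_mbps <= 0:
        raise ValueError("rate_mbps must be positive")
    megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB, 1 byte = 8 bits
    return megabits / rate_mbps / 3600  # seconds converted to hours


def instance_cost(hours, price_per_hour):
    """Cost of keeping one instance running for the given number of hours."""
    return hours * price_per_hour


if __name__ == "__main__":
    size_gb = 10.0          # hypothetical file size
    price_per_hour = 0.10   # hypothetical hourly instance price
    for rate in (1.5, 10.0):
        hours = transfer_hours(size_gb, rate)
        cost = instance_cost(hours, price_per_hour)
        print(f"{rate} Mbps: {hours:.1f} h, cost {cost:.2f}")
```

With these numbers, a 10 GB file takes a bit over 2 hours at 10 Mbps and close to 15 hours at 1.5 Mbps, so the low end of that range would dominate the bill for a download-bound pipeline.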
    There is no way of avoiding the credit card step; I've asked. In theory, you can use a "Free Tier" level for one year and not have any charges made to your card, but they will not tell you if you go over the limit; they will simply start charging.
    I don't know what sequences you're querying at 1000G, but perhaps it would be best to download them first and do the queries locally. It would be a one-time huge download that could be done overnight with your IT department's approval.

    Hope this helps, and I hope someone else that is more knowledgeable jumps in.



  • Lluc
    started a topic Seeking advice for Amazon Web Services usage

    Seeking advice for Amazon Web Services usage

    I had been searching for some specific sequences in the 1000 Genomes Project data, using samtools view and BreakSeq, until the IT services at my university contacted me because I was using too much bandwidth. The 1000G people then suggested that I use AWS. It looks like a good solution, but I have some doubts, and I would appreciate it if other AWS users could ease my concerns.

    1. I don't understand the language used in the AWS website ("instances", "API", bla, bla, bla). May I assume that if I start an EC2 instance, I will connect to it through ssh as with any remote machine, and be able to install samtools and what not?

    2. They claim most of the 1000 Genomes Project data is available in a "bucket", and they mention several ways of accessing it that I don't know about. Will I be able to samtools-view the bam files or read fastq files?

    3. Assuming so, how fast would the data be transferred from that bucket to my EC2 instance? Most of the time consumed by my pipeline so far went into downloading. I need to know the data transfer speed to estimate the cost.

    4. Almost the first thing AWS asks you for is your credit card number. I don't want to give mine, and there's none available for the lab. Do you know of alternative ways to pay? We have a budget, but it's managed by the University, which requires invoices and so on.

    Thank you.
