
  • 1000 Genomes in the Amazon cloud?

    These slides (p. 12) claim that the 1000 Genomes BAM files were to be made available in the Amazon cloud in February. That would be hugely useful to me, if so, but I can't seem to find them mentioned in the AWS Public Data Catalog. The claimed sizes for the Ensembl data sets listed there only come to about 250G combined.

    Anyone know whether the BAM files actually were made publicly available, and if so, how I can access them?

  • #2
    Originally posted by throwaway
    These slides (p. 12) claim that the 1000 Genomes BAM files were to be made available in the Amazon cloud in February. That would be hugely useful to me, if so, but I can't seem to find them mentioned in the AWS Public Data Catalog. The claimed sizes for the Ensembl data sets listed there only come to about 250G combined.

    Anyone know whether the BAM files actually were made publicly available, and if so, how I can access them?

    Have you checked http://1000genomes.org/page.php?

    • #3
      Yes. None of the information on that page or the data access page seems pertinent to Amazon storage. Searching for "amazon" or "aws" only turns up a reference to the Ensembl dataset, and doesn't make it clearer how to access the BAM files.

      • #4
        FYI - having the data in the AWS Public Data Catalog/S3 would be neat for people analyzing data on AWS EC2 (their cloud computing infrastructure) because transferring data within a region is free and very fast.

        There is no Data Transfer charge for data transferred between Amazon EC2 and Amazon S3 within the same Region or for data transferred between the Amazon EC2 Northern Virginia Region and the Amazon S3 US Standard Region.
        Last edited by spenthil; 04-27-2010, 01:38 PM.
        --
        Senthil Palanisami

        • #5
          Location of 1000 genomes data on s3

          s3://1000genomes

          • #6
            How does one decrypt this s3 link to actually view/download the data?
            --
            bioinfosm

            • #7
              I would recommend installing S3fox or a similar S3 browser. Since the bucket is public, just type /1000genomes into the location window (every bucket name in S3 is unique).

              Screenshot: http://img.skitch.com/20100622-geb3s...ngw3rrecc1.jpg


              Each individual BAM file is addressable, e.g.



              (added later)

              Also if you use curl or a browser and point to http://1000genomes.s3.amazonaws.com/ you'll get the XML response
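A rough sketch of what to do with that XML response, in case it helps: the first helper maps an s3:// location to the plain HTTP address mentioned above, and the second pulls object keys and sizes out of a bucket listing. The sample document and its key names are made up for illustration (they are not real paths in the 1000genomes bucket), and the function names are my own. It assumes the listing uses the standard S3 `ListBucketResult` format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# XML namespace used by S3 bucket listings.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def s3_uri_to_http(uri):
    """Map s3://bucket/key to http://bucket.s3.amazonaws.com/key (public buckets only)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return f"http://{parsed.netloc}.s3.amazonaws.com/{parsed.path.lstrip('/')}"


def list_keys(xml_text):
    """Return (key, size_in_bytes) for every <Contents> entry in a bucket listing."""
    root = ET.fromstring(xml_text)
    return [
        (entry.findtext(f"{S3_NS}Key"), int(entry.findtext(f"{S3_NS}Size")))
        for entry in root.iter(f"{S3_NS}Contents")
    ]


# Hypothetical listing; the keys below are placeholders, not real bucket contents.
SAMPLE = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>1000genomes</Name>
  <IsTruncated>false</IsTruncated>
  <Contents><Key>example/sample_A.bam</Key><Size>1024</Size></Contents>
  <Contents><Key>example/sample_B.bam</Key><Size>2048</Size></Contents>
</ListBucketResult>"""

if __name__ == "__main__":
    print(s3_uri_to_http("s3://1000genomes"))
    for key, size in list_keys(SAMPLE):
        print(size, s3_uri_to_http(f"s3://1000genomes/{key}"))
```

To run it against the real bucket you would fetch the listing with curl (or `urllib.request`) and pass the response text to `list_keys`. A single response only covers part of a large bucket, so check `IsTruncated` and page through if needed.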
              Last edited by mndoci; 06-22-2010, 10:34 PM. Reason: added XML response

              • #8
                S3fox is great. I also like Bucket Explorer (commercial, but there's a 30-day trial). If the analysis tools you are using are expecting a filesystem, you could create an AMI and try using s3fs or subcloud. Due to the large size of the current dataset, EBS is just not an option as it is for other public datasets which are less than 1 TB.

                One thing to be aware of is that because S3 does not natively understand directories, it is up to the clients to infer the directory structure. Unfortunately some clients differ in this, and so when you mount the bucket using something like s3fs, the directory structure may not appear correctly.
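To illustrate the directory point: a minimal sketch of how a client might infer "folders" from a flat list of keys by splitting on a "/" delimiter. This is one plausible convention, not how any particular tool (S3fox, s3fs, etc.) actually does it, and the key names are hypothetical.

```python
def list_dir(keys, prefix="", delimiter="/"):
    """Return (subdirs, files) that sit directly under `prefix`.

    S3 only stores flat keys; the "directories" here are inferred from
    the delimiter, which is exactly where clients can disagree.
    """
    subdirs, files = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            # The key equals the prefix itself, e.g. a "data/" placeholder
            # object that some tools create to represent an empty folder.
            continue
        head, sep, _ = rest.partition(delimiter)
        if sep:
            subdirs.add(prefix + head + delimiter)  # something deeper exists
        else:
            files.append(key)  # a plain object at this level
    return sorted(subdirs), sorted(files)


# Hypothetical keys for illustration only.
KEYS = ["README", "data/", "data/a.bam", "data/sub/b.bam"]

if __name__ == "__main__":
    print(list_dir(KEYS))           # top level
    print(list_dir(KEYS, "data/"))  # inside data/
```

A client that treats the `data/` placeholder as a regular zero-byte file, or that uses a different marker convention, would show a different tree for the same bucket. That is the kind of mismatch you can hit when mounting.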

                • #9
                  Also, the AWS console now allows access to S3: http://aws.amazon.com/console/#s3

                  Haven't used it myself, but it looks nice enough.
                  --
                  Senthil Palanisami

                  • #10
                    any update on this issue?

                    thanks!

                    • #11
                      Not directly on-topic, but you can get past the 1 TB limit for EBS volumes if you use RAID 0 and stripe data across multiple volumes. There's a nice article by Eric Hammond: http://alestic.com/2009/06/ec2-ebs-raid
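For anyone unfamiliar with striping, here is a toy sketch of the idea behind RAID 0: logical data is cut into fixed-size chunks that are dealt round-robin across the volumes, so usable capacity is roughly the sum of the volumes. The 64 KiB chunk size and the function names are assumptions for illustration; this is not taken from the linked article, and the real work is done by the OS software-RAID layer.

```python
def raid0_locate(offset, n_volumes, chunk_size=64 * 1024):
    """Map a logical byte offset to (volume_index, offset_within_volume)."""
    chunk_index, within_chunk = divmod(offset, chunk_size)
    # Chunks are dealt round-robin: chunk 0 -> volume 0, chunk 1 -> volume 1, ...
    stripe, volume = divmod(chunk_index, n_volumes)
    return volume, stripe * chunk_size + within_chunk


def raid0_capacity(volume_sizes):
    """Usable capacity: every volume contributes as much as the smallest one."""
    return len(volume_sizes) * min(volume_sizes)


if __name__ == "__main__":
    tb = 1024 ** 4
    print(raid0_capacity([tb] * 4) // tb, "TB usable from four 1 TB volumes")
    print(raid0_locate(4 * 64 * 1024 + 10, n_volumes=4))
```

Keep in mind that RAID 0 has no redundancy, so losing any one volume loses the whole array.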
