  • 1000 Genomes in the Amazon cloud?

    These slides (p. 12) claim that the 1000 Genomes BAM files were to be made available in the Amazon cloud in February. That would be hugely useful to me, if so, but I can't seem to find them mentioned in the AWS Public Data Catalog. The claimed sizes for the Ensembl data sets listed there only come to about 250G combined.

    Does anyone know whether the BAM files actually were made publicly available, and if so, how I can access them?

  • #2
    Originally posted by throwaway
    These slides (p. 12) claim that the 1000 Genomes BAM files were to be made available in the Amazon cloud in February. That would be hugely useful to me, if so, but I can't seem to find them mentioned in the AWS Public Data Catalog. The claimed sizes for the Ensembl data sets listed there only come to about 250G combined.

    Does anyone know whether the BAM files actually were made publicly available, and if so, how I can access them?

    Have you checked http://1000genomes.org/page.php?


    • #3
      Yes. None of the information on that page or the data access page seems pertinent to Amazon storage. Searching for "amazon" or "aws" only turns up a reference to the Ensembl dataset, and doesn't make it any clearer how to access the BAM files.


      • #4
        FYI - having the data in the AWS Public Data Catalog/S3 would be neat for people analyzing data on AWS EC2 (their cloud computing infrastructure) because transferring data within a region is free and very fast.

        There is no Data Transfer charge for data transferred between Amazon EC2 and Amazon S3 within the same Region or for data transferred between the Amazon EC2 Northern Virginia Region and the Amazon S3 US Standard Region.
        Last edited by spenthil; 04-27-2010, 01:38 PM.
        --
        Senthil Palanisami


        • #5
          Location of 1000 genomes data on s3

          s3://1000genomes


          • #6
            How does one decrypt this s3 link to actually view/download the data?
            --
            bioinfosm


            • #7
              I would recommend installing S3fox or a similar S3 browser. Since the bucket is public, just type /1000genomes into the location window (every bucket ID in S3 is unique).
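
For anyone who prefers a plain URL to a browser plugin, here is a minimal sketch of how an s3:// location can be rewritten as an HTTP address. It assumes the bucket is public and uses the `<bucket>.s3.amazonaws.com` form of address that appears elsewhere in this thread. The function name `s3_uri_to_http_url` and the object key in the second call are made up for illustration, not real paths in the bucket.

```python
from urllib.parse import urlparse


def s3_uri_to_http_url(s3_uri: str) -> str:
    """Rewrite s3://bucket/key as http://bucket.s3.amazonaws.com/key.

    Assumption: the bucket is public, so no request signing is needed.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {s3_uri!r}")
    # The bucket name is the "host" part; everything after it is the key.
    key = parsed.path.lstrip("/")
    return f"http://{parsed.netloc}.s3.amazonaws.com/{key}"


# Bucket root, as given earlier in the thread.
print(s3_uri_to_http_url("s3://1000genomes"))

# Hypothetical object key, used only to show how the path carries over.
print(s3_uri_to_http_url("s3://1000genomes/some/hypothetical/file.bam"))
```

Nothing is being "decrypted" here: the s3:// form is just a shorthand for bucket name plus object key.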

              Screenshot: http://img.skitch.com/20100622-geb3s...ngw3rrecc1.jpg


              Each individual BAM file is addressable, e.g.



              (added later)

              Also, if you use curl or a browser and point it at http://1000genomes.s3.amazonaws.com/, you'll get the XML response listing the bucket's contents.
              Last edited by mndoci; 06-22-2010, 10:34 PM. Reason: added XML response
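
The XML response mentioned above can also be read programmatically. The following is a rough sketch rather than a tested recipe for this particular bucket: it assumes the response is a standard S3 `ListBucketResult` document and that the bucket still allows anonymous listing. The helper names `parse_listing` and `fetch_listing`, and the keys in `SAMPLE`, are invented for the example. S3 returns listings in pages, and this sketch only reads the first page.

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 bucket listings.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parse_listing(xml_data):
    """Return (key, size_in_bytes) pairs from a ListBucketResult document."""
    root = ET.fromstring(xml_data)
    entries = []
    for item in root.findall("s3:Contents", S3_NS):
        key = item.findtext("s3:Key", namespaces=S3_NS)
        size = int(item.findtext("s3:Size", default="0", namespaces=S3_NS))
        entries.append((key, size))
    return entries


def fetch_listing(bucket_url):
    """Download the first page of a public bucket listing and parse it."""
    with urllib.request.urlopen(bucket_url) as response:
        return parse_listing(response.read())


# Made-up sample so the parser can be tried without network access.
SAMPLE = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <Contents><Key>dir/a.bam</Key><Size>10</Size></Contents>
  <Contents><Key>dir/b.bam</Key><Size>20</Size></Contents>
</ListBucketResult>"""

for key, size in parse_listing(SAMPLE):
    print(key, size)
```

To try it against the real bucket, call `fetch_listing("http://1000genomes.s3.amazonaws.com/")`; whether that still works depends on the bucket's current permissions.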


              • #8
                S3fox is great. I also like Bucket Explorer (commercial, but there's a 30-day trial). If the analysis tools you are using are expecting a filesystem, you could create an AMI and try using s3fs or subcloud. Due to the large size of the current dataset, EBS is just not an option as it is for other public datasets which are less than 1 TB.

                One thing to be aware of is that because S3 does not natively understand directories, it is up to the clients to infer the directory structure. Unfortunately some clients differ in this, and so when you mount the bucket using something like s3fs, the directory structure may not appear correctly.
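
To make the point about directories concrete, here is a small sketch of how a client might infer a directory view from S3's flat list of keys by splitting on "/". This is one plausible convention, not the algorithm used by s3fs or any other specific tool. The function `list_directory` and the example keys are invented for illustration.

```python
def list_directory(keys, prefix=""):
    """Infer (subdirectories, files) directly under `prefix` from flat keys.

    S3 stores only full key names; "directories" are whatever the client
    decides the "/" separators mean.
    """
    subdirs, files = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            # The key equals the prefix itself: a placeholder object that
            # some clients create to represent an empty directory.
            continue
        head, sep, _ = rest.partition("/")
        if sep:
            subdirs.add(head + "/")  # something deeper exists under head/
        else:
            files.append(head)       # a plain object at this level
    return sorted(subdirs), sorted(files)


# Hypothetical keys, not taken from the real bucket.
keys = [
    "data/",
    "data/sampleA/reads.bam",
    "data/sampleA/reads.bam.bai",
    "data/README",
    "index.txt",
]

print(list_directory(keys))
print(list_directory(keys, "data/"))
```

A client that treats the placeholder key `data/` as an ordinary file, or that expects a placeholder to exist before it shows a directory, would present the same bucket differently, which is the kind of mismatch described above.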


                • #9
                  Also, the AWS console now allows access to S3: http://aws.amazon.com/console/#s3

                  Haven't used it myself, but it looks nice enough.
                  --
                  Senthil Palanisami


                  • #10
                    any update on this issue?

                    thanks!


                    • #11
                      not directly on-topic, but you can get past the 1TB limit for ebs volumes if you use raid 0 and stripe data across multiple volumes - there's a nice article by eric hammond - http://alestic.com/2009/06/ec2-ebs-raid
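
As a rough illustration of why striping gets around the per-volume limit, the sketch below shows the usual RAID 0 address arithmetic: consecutive chunks of the logical device are dealt out round-robin across the volumes, so usable capacity is the sum of the volumes. It assumes equal-sized volumes and a 64 KiB chunk size chosen for the example. The function names are invented, and this is not taken from the linked article.

```python
def raid0_locate(offset, num_volumes, chunk_size=64 * 1024):
    """Map a logical byte offset to (volume_index, offset_on_that_volume)."""
    chunk_index, within_chunk = divmod(offset, chunk_size)
    # Chunks are assigned to volumes round-robin; `row` counts full passes.
    row, volume = divmod(chunk_index, num_volumes)
    return volume, row * chunk_size + within_chunk


def raid0_capacity(num_volumes, volume_size):
    """Usable size of a stripe set built from equal-sized volumes."""
    return num_volumes * volume_size


TB = 10 ** 12
# Four volumes of 1 TB each present as a single 4 TB device.
print(raid0_capacity(4, 1 * TB) // TB, "TB")

# The first few chunks land on volumes 0, 1, 2, 3, then wrap back to 0.
for chunk in range(5):
    print(chunk, raid0_locate(chunk * 64 * 1024, num_volumes=4))
```

Keep in mind that RAID 0 has no redundancy, so losing any one volume loses the whole stripe set.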
