  • 1000 Genomes in the Amazon cloud?

    These slides (p. 12) claim that the 1000 Genomes BAM files were to be made available in the Amazon cloud in February. That would be hugely useful to me, if so, but I can't seem to find them mentioned in the AWS Public Data Catalog. The claimed sizes for the Ensembl data sets listed there only come to about 250 GB combined.

    Does anyone know whether the BAM files actually were made publicly available, and if so, how I can access them?

  • #2
    Originally posted by throwaway
    These slides (p. 12) claim that the 1000 Genomes BAM files were to be made available in the Amazon cloud in February. That would be hugely useful to me, if so, but I can't seem to find them mentioned in the AWS Public Data Catalog. The claimed sizes for the Ensembl data sets listed there only come to about 250 GB combined.

    Does anyone know whether the BAM files actually were made publicly available, and if so, how I can access them?

    Have you checked http://1000genomes.org/page.php?

    • #3
      Yes. None of the information on that page or the data access page seems pertinent to Amazon storage. Searching for "amazon" or "aws" only turns up a reference to the Ensembl dataset, and doesn't make it any clearer how to access the BAM files.

      • #4
        FYI - having the data in the AWS Public Data Catalog/S3 would be neat for people analyzing data on AWS EC2 (their cloud computing infrastructure) because transferring data within a region is free and very fast.

        There is no Data Transfer charge for data transferred between Amazon EC2 and Amazon S3 within the same Region or for data transferred between the Amazon EC2 Northern Virginia Region and the Amazon S3 US Standard Region.
        Last edited by spenthil; 04-27-2010, 01:38 PM.
        --
        Senthil Palanisami

        • #5
          Location of 1000 genomes data on s3

          s3://1000genomes

          • #6
            How does one decrypt this s3 link to actually view/download the data?
            --
            bioinfosm

            • #7
              I would recommend installing S3fox or a similar S3 browser. Since the bucket is public, just type /1000genomes into the location window (every bucket name in S3 is globally unique).

              Screenshot: http://img.skitch.com/20100622-geb3s...ngw3rrecc1.jpg


              Each individual BAM file is addressable, e.g.



              (added later)

              Also, if you use curl or a browser and point it at http://1000genomes.s3.amazonaws.com/ you'll get the bucket listing back as an XML response.
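To make the XML route a bit more concrete, here is a minimal Python sketch, not a definitive client. It assumes the bucket still allows anonymous listing over plain HTTP and that the response is a standard S3 `ListBucketResult` document. The function names (`parse_listing`, `object_url`, `fetch_listing`) are made up for illustration; only the bucket URL comes from this thread.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BUCKET_URL = "http://1000genomes.s3.amazonaws.com/"  # URL given in this thread
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"  # namespace of S3 listing XML


def parse_listing(xml_data):
    """Return (key, size_in_bytes) pairs from a ListBucketResult document.

    Accepts either str or bytes, as xml.etree.ElementTree does.
    """
    root = ET.fromstring(xml_data)
    entries = []
    for contents in root.iter(S3_NS + "Contents"):
        key = contents.findtext(S3_NS + "Key")
        size = int(contents.findtext(S3_NS + "Size", default="0"))
        entries.append((key, size))
    return entries


def object_url(key, bucket_url=BUCKET_URL):
    """Build the plain HTTP URL for one object key (slashes are kept as-is)."""
    return bucket_url + urllib.parse.quote(key)


def fetch_listing(bucket_url=BUCKET_URL, prefix=""):
    """Fetch and parse one page of the listing.

    Needs network access. S3 returns at most 1000 keys per request, so a
    full listing would need paging, which this sketch does not do.
    """
    query = urllib.parse.urlencode({"prefix": prefix})
    with urllib.request.urlopen(bucket_url + "?" + query) as response:
        return parse_listing(response.read())
```

Calling `fetch_listing()` returns the first page of keys, and `object_url(key)` on any of them gives a URL you can pass to curl or a browser.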
              Last edited by mndoci; 06-22-2010, 10:34 PM. Reason: added XML response

              • #8
                S3fox is great. I also like Bucket Explorer (commercial, but there's a 30-day trial). If the analysis tools you are using expect a filesystem, you could create an AMI and try using s3fs or subcloud. Due to the large size of the current dataset, EBS is just not an option as it is for other public datasets which are less than 1 TB.

                One thing to be aware of is that because S3 does not natively understand directories, it is up to the clients to infer the directory structure. Unfortunately, clients differ in how they do this, so when you mount the bucket using something like s3fs, the directory structure may not appear correctly.
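As a rough illustration of what "inferring the directory structure" means, the sketch below splits a flat list of keys on a "/" delimiter, in the same spirit as the `prefix` and `delimiter` parameters of the S3 listing API. It is one possible convention rather than what s3fs or any other client actually does; the function name and the example keys are hypothetical.

```python
def list_directory(keys, prefix="", delimiter="/"):
    """Split flat object keys under `prefix` into immediate subdirectories and files."""
    subdirs = set()
    files = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            # The key equals the prefix itself: some tools create such
            # "dir/" marker objects, others do not, which is one source
            # of disagreement between clients.
            continue
        head, sep, _tail = rest.partition(delimiter)
        if sep:
            # Something follows the delimiter, so treat `head` as a directory.
            subdirs.add(prefix + head + delimiter)
        else:
            files.append(key)
    return sorted(subdirs), sorted(files)


# Hypothetical keys, for illustration only.
example_keys = [
    "data/",
    "data/README",
    "data/a/x.bam",
    "data/a/x.bam.bai",
    "data/b/y.bam",
]
subdirs, files = list_directory(example_keys, prefix="data/")
```

A client that expects marker objects for every directory would see "data/" here but find nothing for "data/a/" and "data/b/", which is the kind of mismatch described above.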

                • #9
                  Also, the AWS console now allows access to S3: http://aws.amazon.com/console/#s3

                  Haven't used it myself, but it looks nice enough.
                  --
                  Senthil Palanisami

                  • #10
                    any update on this issue?

                    thanks!

                    • #11
                      not directly on-topic, but you can get past the 1TB limit for ebs volumes if you use raid 0 and stripe data across multiple volumes - there's a nice article by eric hammond - http://alestic.com/2009/06/ec2-ebs-raid
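For anyone unfamiliar with striping, here is a small Python sketch of the RAID 0 idea itself: consecutive chunks of the logical device are dealt out round-robin across the member volumes, so total capacity is the sum of the members. It assumes equal-sized volumes, and the chunk size default is an arbitrary illustrative value. It is not taken from the linked article and says nothing about how to configure EBS or mdadm. Keep in mind that RAID 0 has no redundancy, so losing one volume loses the whole array.

```python
def locate_block(logical_offset, num_volumes, chunk_size=256 * 1024):
    """Map a byte offset on a RAID 0 array to (volume index, byte offset on that volume)."""
    chunk_index, within_chunk = divmod(logical_offset, chunk_size)
    # Chunks are assigned to volumes round-robin; each full round is one stripe.
    stripe_index, volume_index = divmod(chunk_index, num_volumes)
    return volume_index, stripe_index * chunk_size + within_chunk


def array_capacity(num_volumes, volume_size):
    """Usable capacity of a RAID 0 array built from equal-sized volumes."""
    return num_volumes * volume_size
```

With four volumes and a chunk size of 100 bytes, offset 150 lands on the second volume (index 1) at offset 50, and offset 450 wraps around to the first volume at offset 150.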
