  • What do you do with your database?

    Dear All,
    In an earlier thread of mine (Core cluster setup...), westerman suggested I start a new thread asking what people do with their databases. I am indeed curious: what do you do with your database? I suspect that trying to store NGS data in something like a SQL database is a lost cause. So my questions are:

    1) What do you do with your Database?

    2) How do you store your NGS data?

    3) Do you have any troubles with accessing your data on a repeated basis?

    4) What are the biggest bottlenecks you commonly encounter with regard to data management?

    5) Do you have a commercial solution or a home-grown one?

    Thank you for your time; I look forward to your replies.
    Regards
    Quantrix

  • #2
    At the moment we don't use a database. As you say, the files are huge. Storing variants etc. for comparison would matter if you always work on one large project, but here we have many small and medium projects that aren't relevant to compare with each other.
    Also keep in mind that your users might not be trained in database-based analysis, so a good front end will be important.


    • #4
      We had an Oracle DB housing our SRF + fastq files when we were still generating those. It was huge, but worked well and had a FUSE layer that made it transparently visible to users. The DB was in two halves: a large set of partitions holding blobs (actually Oracle "secure files", I think) and a far smaller meta-data component that tracked where things were. It would have worked OK using a filesystem instead of the binary blobs, though; there are pros and cons to each method.
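
      To make the "small meta-data component that tracks where things are" concrete, here is a minimal sketch of the idea. It is not the Oracle schema described above: it assumes SQLite as a stand-in, and the table name `run_files`, its columns, and the helper functions are all hypothetical.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) a small catalog that records where each data file lives.

    Hypothetical schema: one row per (run_id, file_type) pair.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS run_files (
               run_id     TEXT NOT NULL,
               file_type  TEXT NOT NULL,
               location   TEXT NOT NULL,
               size_bytes INTEGER,
               PRIMARY KEY (run_id, file_type)
           )"""
    )
    return conn


def register_file(conn, run_id, file_type, location, size_bytes=None):
    """Record (or update) the location of one file; the bulk data stays on disk."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO run_files VALUES (?, ?, ?, ?)",
            (run_id, file_type, location, size_bytes),
        )


def locate(conn, run_id, file_type):
    """Return the stored location, or None if the file is not in the catalog."""
    row = conn.execute(
        "SELECT location FROM run_files WHERE run_id = ? AND file_type = ?",
        (run_id, file_type),
    ).fetchone()
    return row[0] if row else None
```

      Only the small lookup table lives in the database here; the large files stay wherever they are stored (blobs, a filesystem, or an object store), which is the split between the two halves described in the post.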

      We've since switched both format and DB mechanism for raw data: we now store BAM files in an iRODS system.

      The analysis BAMs & co (i.e. mapped or assembled data, VCF files, etc.) are less clearly divided: they are stored in various project/group directories across a variety of file system types (slow and fast NFS storage, Lustre, etc.).

      The only real bottleneck arises when someone tries to access a single DB layer (like the FUSE layer) from 1000+ cores on our CPU farm. We require that people first copy data to something more scalable, for which we use Lustre.
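
      The "copy first, then fan out" rule can be sketched as a small staging helper. This is only an illustration under assumptions: the function name `stage_to_scratch` and the idea of a per-job scratch directory on the scalable filesystem are hypothetical, and a size comparison is a cheap check, not a checksum.

```python
import shutil
from pathlib import Path


def stage_to_scratch(source, scratch_dir):
    """Copy one file to a scratch directory once, so that many workers can
    read the copy instead of all hitting the single source layer.

    Returns the path of the staged copy.
    """
    source = Path(source)
    scratch_dir = Path(scratch_dir)
    scratch_dir.mkdir(parents=True, exist_ok=True)
    target = scratch_dir / source.name

    # Skip the copy if a same-sized file is already staged (cheap check only).
    if target.exists() and target.stat().st_size == source.stat().st_size:
        return target

    # Copy to a temporary name, then rename, so readers never see a partial file.
    tmp = target.with_name(target.name + ".part")
    shutil.copy2(source, tmp)
    tmp.replace(target)
    return target
```

      A job would call this once before launching its workers and pass the returned path to them, so the single source layer sees one read rather than one per core.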


      • #5
        Hi Mapper and jkbonfield,
        That is very helpful indeed!
        Thanks
        Quantrix
