  • quantrix
    Member
    • Jan 2011
    • 21

    What do you do with your database?

    Dear All,
    In a different thread which I posted earlier (Core cluster setup...), westerman suggested I post a new thread with the question what do you do with your DB. I am indeed curious what do you do with your database? I would believe that trying to store the NGS data in something like a SQL database is a lost enterprise. So my questions are

    1) What do you do with your Database?

    2) How do you store your NGS data?

    3) Do you have any trouble accessing your data repeatedly?

    4) What are the biggest bottlenecks you commonly encounter with regard to data management?

    5) Do you have a commercial solution or a home-grown one?

    Thank you for your time; I look forward to your replies.
    Regards
    Quantrix
  • colindaven
    Senior Member
    • Oct 2008
    • 417

    #2
    At the moment we don't use a database. As you say, the files are huge. Storing variants etc. for comparison would be important if you were always working on one large project, but here we have many small and medium-sized projects that aren't worth comparing with each other.
    Also keep in mind your users might not be trained in database-based analysis, so a good front end will be important.


    • jkbonfield
      Senior Member
      • Jul 2008
      • 146

      #4
      We had an Oracle DB housing our SRF + fastq files when we were still generating those. It was huge, but it worked well and had a FUSE layer to make it transparently visible to users. The DB was in two halves: a large set of partitions holding blobs (actually Oracle "SecureFiles", I think) and a far smaller metadata component that tracked where things were. It would have worked OK using a filesystem instead of the binary blobs, though; there are pros and cons to each method.
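      The "filesystem instead of blobs" variant of that two-part design can be sketched as below: bulk files live on ordinary storage and a small database records only where each one is, plus its size and checksum. This is a minimal illustration assuming SQLite and a plain directory; it is not the Oracle/FUSE system described above, and the class name, table layout and `.dat` naming are all hypothetical.

```python
import hashlib
import os
import sqlite3
import tempfile
from typing import Optional


class MetadataIndex:
    """Hypothetical sketch: payloads go to a filesystem directory,
    while a small SQLite table tracks path, size and MD5 per run."""

    def __init__(self, db_path: str, store_dir: str) -> None:
        self.store_dir = store_dir
        os.makedirs(store_dir, exist_ok=True)
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            " run_id TEXT PRIMARY KEY,"
            " path TEXT NOT NULL,"
            " size_bytes INTEGER NOT NULL,"
            " md5 TEXT NOT NULL)"
        )

    def put(self, run_id: str, data: bytes) -> str:
        """Write the payload to disk, then record its location in the DB."""
        # Assumes run_id is a safe file name; no sanitising is done here.
        path = os.path.join(self.store_dir, f"{run_id}.dat")
        with open(path, "wb") as fh:
            fh.write(data)
        digest = hashlib.md5(data).hexdigest()
        with self.conn:  # commits the INSERT on success
            self.conn.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (run_id, path, len(data), digest),
            )
        return path

    def locate(self, run_id: str) -> Optional[str]:
        """Return the stored path for run_id, or None if it is unknown."""
        row = self.conn.execute(
            "SELECT path FROM files WHERE run_id = ?", (run_id,)
        ).fetchone()
        return row[0] if row else None

    def close(self) -> None:
        self.conn.close()


if __name__ == "__main__":
    # Demo in a throwaway directory; real paths would point at bulk storage.
    with tempfile.TemporaryDirectory() as tmp:
        idx = MetadataIndex(os.path.join(tmp, "meta.db"), os.path.join(tmp, "store"))
        stored = idx.put("run_0001", b"placeholder read data")
        print(idx.locate("run_0001") == stored)  # True
        idx.close()
```

      Keeping the payloads out of the database keeps the metadata side small and easy to back up, at the cost of having to keep the two in sync yourself, which is one of the trade-offs the post alludes to.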

      We've since switched both the format and the DB mechanism for raw data: we store BAM files in an iRODS system.

      The analysis BAMs and co. (i.e. mapped or assembled data, VCF files, etc.) are less clearly organized: they are stored in various project/group directories across a variety of file system types (slow and fast NFS storage, Lustre, etc.).

      The only real bottleneck arises when someone tries to access a single DB layer (such as the FUSE layer) from 1000+ cores on our CPU farm. We require people to copy data to something more scalable first, for which we use Lustre.
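      The "copy first, then compute" rule can be sketched as a small staging step run once before jobs are submitted, so the farm reads the scratch copies instead of hammering the shared layer. This is a minimal sketch using only the Python standard library; `stage_inputs` is a hypothetical helper, the size-based skip check is a simplifying assumption, and nothing here reflects the actual tooling used at jkbonfield's site.

```python
import os
import shutil
import tempfile
from typing import Iterable, List


def stage_inputs(sources: Iterable[str], scratch_dir: str) -> List[str]:
    """Copy each input once to scratch storage and return the staged paths.

    Jobs should then read the staged copies, so the shared (e.g. FUSE)
    layer is read once per file rather than once per job.
    Assumes source basenames are unique.
    """
    os.makedirs(scratch_dir, exist_ok=True)
    staged = []
    for src in sources:
        dst = os.path.join(scratch_dir, os.path.basename(src))
        # Skip the copy if a same-sized file is already staged.
        already = os.path.exists(dst) and os.path.getsize(dst) == os.path.getsize(src)
        if not already:
            shutil.copy2(src, dst)
        staged.append(dst)
    return staged


if __name__ == "__main__":
    # Demo with temporary directories standing in for the shared store and scratch.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "run_0001.dat")
        with open(src, "wb") as fh:
            fh.write(b"placeholder read data")
        staged = stage_inputs([src], os.path.join(tmp, "scratch"))
        print(staged[0].endswith(os.path.join("scratch", "run_0001.dat")))  # True
```

      In practice the source paths would sit under the slow shared mount and `scratch_dir` on the parallel filesystem (both hypothetical here); a checksum comparison would be safer than the size check if files can change in place.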


      • quantrix
        Member
        • Jan 2011
        • 21

        #5
        Hi Mapper and jkbonfield,
        That is very helpful indeed!
        Thanks
        Quantrix
