
  • NGS data directory structures

    After a recent influx of new data, I am beginning to doubt the way I have been storing my data (both raw and processed), so I figured I might as well ask here what people's directory structures look like. Maybe there is some kind of ideal structure that people tend to use? I've taught myself most of what I know of bioinformatics, and while I now feel quite fluent in Python, R and some basic bash scripting, I think I'm lacking quite a bit in the basic things that are taught in computer science courses/programmes.

    This is what the first few levels of my directory structure look like at the moment:
    Code:
    ~/data/rna/fastq/
               bam
               counts
               fpkm
               de_analyses
    ... and each directory there has subdirectories as follows:

    Code:
    .../cell_type_1/<appropriate data files>
        cell_type_2
        cell_type_3
    So, it's kind of cell type-oriented at the moment. I have some scripts (for differential expression, for example) that automatically find the files in this structure for the samples given as a command line argument, such as "de_analysis.R --samples cell_type_1,cell_type_2" (which runs DESeq2 on cell_type_1 vs. cell_type_2).
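
    To make the lookup concrete, here is a minimal Python sketch of how a script could resolve sample names against this layout. It is only an illustration, not my actual de_analysis.R; the function names find_sample_files and parse_samples are hypothetical, and it assumes every sample has its own directory under <root>/<data type>/.

```python
from pathlib import Path


def parse_samples(arg):
    """Split a comma-separated --samples value into a list of names."""
    return [name.strip() for name in arg.split(",") if name.strip()]


def find_sample_files(root, data_type, samples):
    """Map each sample name to the sorted files in root/data_type/sample/.

    Hypothetical helper: assumes the layout <root>/<data_type>/<sample>/<files>.
    """
    found = {}
    for sample in samples:
        sample_dir = Path(root) / data_type / sample
        if not sample_dir.is_dir():
            # Fail early instead of silently analysing fewer samples.
            raise FileNotFoundError(f"no directory for sample: {sample_dir}")
        found[sample] = sorted(p for p in sample_dir.iterdir() if p.is_file())
    return found
```

    Something like find_sample_files("~/data/rna", "counts", parse_samples("cell_type_1,cell_type_2")) would then return one file list per cell type (the "~" would need Path.expanduser() first). The catch is that the sample name doubles as the directory name, which is exactly what gets awkward once the same cell type shows up in several runs.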

    I started doubting this structure now that I have more data, some of which comes from new sequencing runs of the same cell types but under different culture conditions. I could just keep my current structure and name the new data something like "cell_type_1_b", but I don't want to do anything quite yet without hearing what other people tend to do.

    I thought about making it more experiment-centric (i.e. putting dates high up in the hierarchy and appropriate subfolders below), but then I'd have to rewrite a lot of my scripts. If that would be generally better (especially for the future, with even more data) then I'd do it, but it'd be nice to hear other people's thoughts first =P

  • #2
    For whatever it's worth, I generally group everything by project, since I've rarely needed to analyze samples across projects. So it's something like Project/[fastq|bam|counts|DE_analysis]. I've never done subdirectories for groups; I just have a sample table with all of the group information within a project that I can parse in R (or Python) when needed. My scripts for mapping are separate from this, since that involves moving files to/from a cluster. For DE and similar analyses, the scripts often get tweaked for each project anyway, so each project just has its own (normally an Rmd file, in the case of R-based analyses).
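
    As a rough illustration of the sample-table idea, here is a small Python sketch. The column names (sample, group, condition) and the table contents are made up for the example; a real table would be a file in the project directory with whatever columns the project needs.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample table, inlined here so the example is self-contained.
SAMPLE_TABLE = """sample,group,condition
s1,cell_type_1,standard
s2,cell_type_1,modified
s3,cell_type_2,standard
"""


def samples_by(table_text, column):
    """Group sample names by the value found in the given column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(table_text)):
        groups[row[column]].append(row["sample"])
    return dict(groups)


by_group = samples_by(SAMPLE_TABLE, "group")
by_condition = samples_by(SAMPLE_TABLE, "condition")
```

    With the grouping kept in a table like this, a new run of the same cell type under a different condition is just another row, and the directory layout does not have to change.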
