  • lkomo
    Junior Member
    • Apr 2015
    • 4

    nested for loop to concatenate fastq files

    My pooled PE RNA-Seq data was demultiplexed by the sequencing facility, so the data I receive is a directory with sub-directories for each sample that contain the R1 and R2 fastq files for each lane (i.e., the main directory "FISH_RNA_SEQ" has 96 folders, each labeled by sample, like "SpA.Treatment1.Rep1", etc.). If I am in the sub-directory for a particular sample, I can concatenate across lanes and write to a file in a new directory, so that I have two files for each sample (R1/R2), using this for loop:

    for SUFFIX in R1_001.fastq R2_001.fastq
    do
    cat *L001_$SUFFIX *L002_$SUFFIX *L003_$SUFFIX > ../test.cat.DDIG/samplename_cat_$SUFFIX
    done

    However, this requires me to run it manually for each of the 96 samples, going into each sub-directory and typing in the desired output name. Since I will have to repeat this in the future, does anyone have suggestions for how to use a nested for loop (or some other way) to do this automatically from the main directory for all sub-directories, naming the output files by the sub-directory (i.e., sample name)?

    Working from the main directory, I was testing something like:
    for dir in *; do
    (for SUFFIX in R1_001.fastq R2_001.fastq
    do
    cat *L001_$SUFFIX *L002_$SUFFIX *L003_$SUFFIX > ../test.cat.DDIG/test_cat_$SUFFIX
    done)

    But this doesn't seem to work, and it doesn't solve the problem of naming the output files according to the sample names. Any suggestions appreciated!
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Do you just want to use the directory name (e.g., SpA.Treatment1.Rep1) as the prefix, or some variant of whatever the file names are?

    • lkomo
      Junior Member
      • Apr 2015
      • 4

      #3
      Yes, ideally the directory name would be the file name prefix, so for example a 'Sample1.treatment1.Rep1' directory would produce two output files like: 'Sample1.treatment1.Rep1_cat_R1.001.fastq' and 'Sample1.treatment1.Rep1_cat_R2.001.fastq'

      And then the same for all the other directories/samples...
      Thank you!

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        Code:
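        # Loop over each sample sub-directory of the current directory (-printf needs GNU find)
        for dir in `find . -maxdepth 1 -mindepth 1 -type d -printf "%f\n"`
        do
            cd "$dir" || continue
            # Braces are needed: $dir_R1 would be read as a variable named dir_R1
            cat *_L???_R1_*.fastq > "${dir}_R1.fastq"
            cat *_L???_R2_*.fastq > "${dir}_R2.fastq"
            cd ..
        done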
        or something like that.
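A variant of the loop above, as a rough sketch only: it stays in the main directory instead of cd-ing into each sample, and writes to a separate output directory using the "<sample>_cat_R1_001.fastq" style of name asked for in #3. The function name concat_lanes and its two arguments are made up for illustration, and it assumes the lane files are named like *_L001_R1_001.fastq.

```shell
#!/bin/sh
# Sketch: concatenate per-lane FASTQ files for every sample sub-directory.
# concat_lanes, main_dir and out_dir are hypothetical names, not from the thread.
concat_lanes() {
    main_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    for sample_dir in "$main_dir"/*/; do
        [ -d "$sample_dir" ] || continue       # no sub-directories at all
        sample=$(basename "$sample_dir")       # directory name = sample name
        for read in R1 R2; do
            # Lanes are picked up in glob (alphabetical) order: L001, L002, ...
            set -- "$sample_dir"*_L???_"${read}"_001.fastq
            [ -e "$1" ] || continue            # nothing matched, skip this read
            cat "$@" > "$out_dir/${sample}_cat_${read}_001.fastq"
        done
    done
}
```

From the main directory it would be called as something like `concat_lanes . test.cat.DDIG`. Directories without matching lane files (including the output directory itself, if it sits inside the main directory) are skipped.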

        • lkomo
          Junior Member
          • Apr 2015
          • 4

          #5
          Thank you! I think I see what each part does, except the "%f\n". My apologies if it's obvious; I'm still fairly new.

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            That's just the format that the results are printed in: %f is the name of each directory with the leading "./" stripped, and \n puts each name on its own line.
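To see the difference, here is a small throwaway demo comparing find's default output with the "%f\n" format. The directory names are made up, and -printf is a GNU find extension, so this may not work with other versions of find.

```shell
# Demo in a temporary directory; the sample names are invented.
demo=$(mktemp -d)
mkdir "$demo/SpA.Treatment1.Rep1" "$demo/SpB.Treatment1.Rep1"
cd "$demo" || exit 1

# Default output keeps the leading "./" on every path
plain=$(find . -maxdepth 1 -mindepth 1 -type d | sort)
# %f prints only the name; \n ends each one with a newline
names=$(find . -maxdepth 1 -mindepth 1 -type d -printf "%f\n" | sort)

echo "$plain"
# ./SpA.Treatment1.Rep1
# ./SpB.Treatment1.Rep1
echo "$names"
# SpA.Treatment1.Rep1
# SpB.Treatment1.Rep1
```

The bare names are what make it possible to use each one both as the directory to cd into and as the output file prefix.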

            • lkomo
              Junior Member
              • Apr 2015
              • 4

              #7
              great, thank you so much for the help!
