  • lkomo
    Junior Member
    • Apr 2015
    • 4

    nested for loop to concatenate fastq files

    My pooled PE RNA-Seq data was demultiplexed by the sequencing facility, so the data I receive is a directory with sub-directories for each sample that contain the R1 and R2 fastq files for each lane (i.e., the main directory "FISH_RNA_SEQ" has 96 folders, each labeled by sample, like "SpA.Treatment1.Rep1", etc.). If I am in a sub-directory for a particular sample, I can concatenate across lanes and write to a file in a new directory, so that I have two files for each sample (R1/R2), using this for loop:

    for SUFFIX in R1_001.fastq R2_001.fastq
    do
    cat *L001_$SUFFIX *L002_$SUFFIX *L003_$SUFFIX > ../test.cat.DDIG/samplename_cat_$SUFFIX
    done

    However, this requires me to run it manually for each of the 96 samples, going into each sub-directory and typing in the desired output name. Since I will have to repeat this in the future, does anyone have suggestions for how to use a nested for loop (or some other way) to do this automatically from the main directory for all sub-directories, naming the output files by the sub-directory (i.e., sample name)?

    Working from the main directory, I was testing something like:
    for dir in *; do
    (for SUFFIX in R1_001.fastq R2_001.fastq
    do
    cat *L001_$SUFFIX *L002_$SUFFIX *L003_$SUFFIX > ../test.cat.DDIG/test_cat_$SUFFIX
    done)

    But this doesn't seem to work, and it doesn't solve the problem of naming the output files according to the sample names. Any suggestions appreciated!
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Do you just want to use the directory name (e.g., SpA.Treatment1.Rep1) as the prefix, or some variant of whatever the file names are?


    • lkomo
      Junior Member
      • Apr 2015
      • 4

      #3
      Yes, ideally the directory name would be the file name prefix, so for example a 'Sample1.treatment1.Rep1' directory would produce two output files like: 'Sample1.treatment1.Rep1_cat_R1_001.fastq' and 'Sample1.treatment1.Rep1_cat_R2_001.fastq'

      And then the same for all the other directories/samples...
      Thank you!


      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        Code:
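        # Loop over the names of the sub-directories directly under the current directory
        for dir in `find . -maxdepth 1 -mindepth 1 -type d -printf "%f\n"`
        do
            cd "$dir"
            # ${dir} needs the braces: $dir_R1 would be read as a variable named dir_R1
            cat *_L???_R1_*.fastq > "${dir}_R1.fastq"
            cat *_L???_R2_*.fastq > "${dir}_R2.fastq"
            cd ..
        done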
        or something like that.
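
For the naming asked for in #3, a variant of the loop above could stay in the main directory and write everything into a separate output directory. This is only a sketch: the function name concat_lanes and its two arguments are made up for illustration, and it assumes the lane files look like *_L001_R1_001.fastq, *_L002_R1_001.fastq, and so on.

```shell
#!/usr/bin/env bash
# Sketch: concatenate the per-lane FASTQ files of every sample sub-directory.
# concat_lanes, main_dir and out_dir are hypothetical names, not from the thread.
concat_lanes() {
    local main_dir="$1"
    local out_dir="$2"
    local sample_dir sample suffix

    mkdir -p "$out_dir"

    # The trailing slash makes the glob match directories only.
    for sample_dir in "$main_dir"/*/; do
        [ -d "$sample_dir" ] || continue      # no sub-directories at all
        sample=$(basename "$sample_dir")      # e.g. SpA.Treatment1.Rep1

        for suffix in R1_001.fastq R2_001.fastq; do
            # Glob results are sorted, so L001 is written before L002, L003, ...
            cat "$sample_dir"*_L???_"$suffix" > "$out_dir/${sample}_cat_${suffix}"
        done
    done
}
```

It would be called as something like concat_lanes FISH_RNA_SEQ test.cat.DDIG. The output directory should sit outside the main directory, otherwise it gets treated as one more sample on the next run.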


        • lkomo
          Junior Member
          • Apr 2015
          • 4

          #5
          Thank you! I think I see what each part does except the "%f\n". My apologies if it's obvious; I'm still fairly new.


          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
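            That's just the format the results are returned in: %f prints each directory's name without the leading path (SpA.Treatment1.Rep1 rather than ./SpA.Treatment1.Rep1), and \n puts a newline after each one.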
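
A quick way to see what the "%f\n" format changes is to run find with and without it on a throwaway directory. This is a small sketch that assumes GNU find (-printf is a GNU extension and is not available in every find); the demo directory and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Throwaway directory with one sample folder (hypothetical example data).
demo=$(mktemp -d)
mkdir "$demo/SpA.Treatment1.Rep1"

# Default output: each result keeps the path it was found under.
with_path=$(find "$demo" -maxdepth 1 -mindepth 1 -type d)

# With -printf "%f\n": only the last path component, one per line.
name_only=$(find "$demo" -maxdepth 1 -mindepth 1 -type d -printf "%f\n")

echo "default:  $with_path"
echo "%f only:  $name_only"
```

The second form is what makes the loop variable usable directly as a file name prefix.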


            • lkomo
              Junior Member
              • Apr 2015
              • 4

              #7
              great, thank you so much for the help!
