Hi all,
I have many files which I need to quality trim using cutadapt. The sequences are paired end and the filenames are in the following format:
sample#_Lane#_R#.fastq.gz, where # is a number.
e.g.
29_L001_R1.fastq, 29_L001_R2.fastq, 29_L002_R1.fastq, 29_L002_R2.fastq,
30_L003_R1.fastq, 30_L003_R2.fastq, 30_L004_R1.fastq, 30_L004_R2.fastq, etc.
I will use cutadapt to trim these sequences using this command:
cutadapt -a adaptors_to_trim -A adaptors_to_trim -q 20 --minimum-length 5 -o outputfile_R1 -p outputfile_R2 inputfile_R1 inputfile_R2
I would also like to use GNU parallel to run these jobs on as many of my 16 cores as possible, in effect looping over all files and feeding them to cutadapt.
This means that each cutadapt run needs the R1 and R2 files of one sample, with each sample#_Lane# combination kept separate.
So I was thinking of listing all input files, piping that list into GNU parallel, defining R1 and R2 there for each sample# and lane# combination, and passing them to cutadapt. Something like this:
find *_L00*_R*.fastq.gz | parallel DEFINING_TWO_INPUT_FILES_FROM_SAME_SAMPLE_AND_LANE -j +0 cutadapt -a adaptors_to_trim -A adaptors_to_trim -q 20 --minimum-length 5 -o outputfile_R1 -p outputfile_R2 inputfile_R1 inputfile_R2
Is this possible at all?
Hope it makes sense. I will of course explain more if needed.
Thank you very much in advance.
Best,
Toke
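One possible way to do the pairing described above, as a rough sketch rather than a tested pipeline: list only the R1 files and derive the matching R2 name from each one, so that every sample#_Lane# pair gives exactly one cutadapt command. The function name build_cmd and the "trimmed_" output prefix are made up for illustration, adaptors_to_trim is the placeholder from the post, and the sketch assumes the files end in _R1.fastq.gz / _R2.fastq.gz and contain no spaces.

```shell
#!/bin/sh
# Sketch only: print one cutadapt command per R1/R2 pair (a dry run).
# ADAPTER and the "trimmed_" prefix are placeholders, not real values.
ADAPTER="adaptors_to_trim"

# Print the cutadapt command for the pair that the given R1 file belongs to.
build_cmd() {
    r1="$1"
    base="${r1%_R1.fastq.gz}"     # e.g. 29_L001 (sample and lane)
    r2="${base}_R2.fastq.gz"      # same sample and lane, other read
    printf 'cutadapt -a %s -A %s -q 20 --minimum-length 5 -o trimmed_%s -p trimmed_%s %s %s\n' \
        "$ADAPTER" "$ADAPTER" "$r1" "$r2" "$r1" "$r2"
}

# Emit one command line for every R1 file in the current directory.
for f in *_L00*_R1.fastq.gz; do
    [ -e "$f" ] || continue       # skip if the glob matched nothing
    build_cmd "$f"
done
```

If this were saved as, say, make_cmds.sh (a hypothetical name), the printed commands could be checked by eye first and then handed to GNU parallel, which runs each input line as a command when no command is given on its own command line: sh make_cmds.sh | parallel -j +0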