Hi there!
I've tried to find answers to questions that apply to my scenario, but I'm still unsure about the kosher way to approach this. I should say, I'm very new to this side of bioinformatics.
I have data from four samples, with FASTQ files split by read (R1 and R2) and by lane (lane 1 and lane 2).
For example, the fastq files from my first sample:
SF10711_9-1-22_CAGATC_L001_R1_001.fastq.gz SF10711_9-1-22_CAGATC_L002_R1_001.fastq.gz
SF10711_9-1-22_CAGATC_L001_R1_002.fastq.gz SF10711_9-1-22_CAGATC_L002_R1_002.fastq.gz
SF10711_9-1-22_CAGATC_L001_R1_003.fastq.gz SF10711_9-1-22_CAGATC_L002_R1_003.fastq.gz
SF10711_9-1-22_CAGATC_L001_R1_004.fastq.gz SF10711_9-1-22_CAGATC_L002_R1_004.fastq.gz
SF10711_9-1-22_CAGATC_L001_R1_005.fastq.gz SF10711_9-1-22_CAGATC_L002_R1_005.fastq.gz
SF10711_9-1-22_CAGATC_L001_R1_006.fastq.gz SF10711_9-1-22_CAGATC_L002_R1_006.fastq.gz
SF10711_9-1-22_CAGATC_L001_R1_007.fastq.gz SF10711_9-1-22_CAGATC_L002_R1_007.fastq.gz
SF10711_9-1-22_CAGATC_L001_R1_008.fastq.gz SF10711_9-1-22_CAGATC_L002_R1_008.fastq.gz
SF10711_9-1-22_CAGATC_L001_R1_009.fastq.gz SF10711_9-1-22_CAGATC_L002_R1_009.fastq.gz
SF10711_9-1-22_CAGATC_L001_R1_010.fastq.gz SF10711_9-1-22_CAGATC_L002_R1_010.fastq.gz
SF10711_9-1-22_CAGATC_L001_R1_011.fastq.gz SF10711_9-1-22_CAGATC_L002_R1_011.fastq.gz
SF10711_9-1-22_CAGATC_L001_R2_001.fastq.gz SF10711_9-1-22_CAGATC_L002_R2_001.fastq.gz
SF10711_9-1-22_CAGATC_L001_R2_002.fastq.gz SF10711_9-1-22_CAGATC_L002_R2_002.fastq.gz
SF10711_9-1-22_CAGATC_L001_R2_003.fastq.gz SF10711_9-1-22_CAGATC_L002_R2_003.fastq.gz
SF10711_9-1-22_CAGATC_L001_R2_004.fastq.gz SF10711_9-1-22_CAGATC_L002_R2_004.fastq.gz
SF10711_9-1-22_CAGATC_L001_R2_005.fastq.gz SF10711_9-1-22_CAGATC_L002_R2_005.fastq.gz
SF10711_9-1-22_CAGATC_L001_R2_006.fastq.gz SF10711_9-1-22_CAGATC_L002_R2_006.fastq.gz
SF10711_9-1-22_CAGATC_L001_R2_007.fastq.gz SF10711_9-1-22_CAGATC_L002_R2_007.fastq.gz
SF10711_9-1-22_CAGATC_L001_R2_008.fastq.gz SF10711_9-1-22_CAGATC_L002_R2_008.fastq.gz
SF10711_9-1-22_CAGATC_L001_R2_009.fastq.gz SF10711_9-1-22_CAGATC_L002_R2_009.fastq.gz
SF10711_9-1-22_CAGATC_L001_R2_010.fastq.gz SF10711_9-1-22_CAGATC_L002_R2_010.fastq.gz
SF10711_9-1-22_CAGATC_L001_R2_011.fastq.gz SF10711_9-1-22_CAGATC_L002_R2_011.fastq.gz
I was told that, in general, I should concatenate the FASTQ files before trimming and alignment, because that makes the trimming and alignment statistics easier to interpret. My question, then, is how these files should be combined.
That is, should I combine all FASTQ files from a sample (e.g. all of the files above) into a single file, combine the files within a sample by read (R1 and R2), or combine the files within a sample by lane (L001 and L002)?
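To make the three options concrete, here is a rough Python sketch of how I understand the groupings. It only parses file names like the ones from my first sample and counts how many files would end up in each combined file; it does not read or write any sequence data. The regular expression and the `group_files` helper are my own assumptions for illustration, not part of any existing tool.

```python
import re
from collections import defaultdict

# Pattern assumed from my file names:
# <sample>_<index>_L<lane>_R<read>_<chunk>.fastq.gz
NAME_RE = re.compile(
    r"^(?P<sample>.+)_(?P<index>[ACGT]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq\.gz$"
)


def group_files(names, keys):
    """Group file names by the given name fields (hypothetical helper)."""
    groups = defaultdict(list)
    for name in names:
        match = NAME_RE.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        groups[tuple(match.group(k) for k in keys)].append(name)
    # Sort so that lane and chunk order is the same in every group.
    return {key: sorted(files) for key, files in groups.items()}


# Rebuild the 44 file names of the first sample (2 lanes x 2 reads x 11 chunks).
names = [
    f"SF10711_9-1-22_CAGATC_L{lane:03d}_R{read}_{chunk:03d}.fastq.gz"
    for lane in (1, 2)
    for read in (1, 2)
    for chunk in range(1, 12)
]

everything = group_files(names, ["sample"])        # option 1: one file per sample
by_read = group_files(names, ["sample", "read"])   # option 2: one R1 and one R2 file
by_lane = group_files(names, ["sample", "lane"])   # option 3: one file per lane

for label, groups in [("all", everything), ("by read", by_read), ("by lane", by_lane)]:
    print(label, {key: len(files) for key, files in groups.items()})
```

As far as I understand, if the second option is the right one, the R1 and R2 lists would need to be in the same lane and chunk order so that read pairs stay matched, which is why the sketch sorts each group.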
Apologies for the simple questions. Again, I have done some research prior to posting this question, but still feel unsure about how to proceed.
Thanks in advance for any insight!