Hi,
I hope this is the right forum for this question; if not, I apologize.
I am new to RNAseq analysis and am currently working on a differential gene expression analysis of human cell lines (disease vs. control). I am using Trimmomatic to trim the sample data, and I have some questions about the best parameters to use for this specific project.
The sequencing was done on an Illumina NovaSeq machine using paired-end sequencing, with 10 million reads per sample. After running FastQC I can see that the "Illumina Universal Adapter" is overrepresented in most samples, and that the read length is 151bp.
For reference here is the example posted on the Trimmomatic website:
java -jar trimmomatic-0.39.jar PE input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True LEADING:3 TRAILING:3 MINLEN:36
My main questions are:
1) What are the best numeric values to use at the end of the ILLUMINACLIP argument (e.g., ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True)? And do they differ depending on the type of project (e.g., why would they be different for an RNAseq project versus a DNA assembly project)?
2) What would be the best / most common values for the LEADING (remove low quality bases from the beginning), TRAILING (remove low quality bases from the end), and MINLEN (remove reads below a minimum length) arguments?
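To make clear which values I am asking about, here is a small sketch of how I currently read the trimming steps in the website example. The field names are my own labels based on my reading of the Trimmomatic manual, and the helper function illuminaclip is hypothetical (it is not part of Trimmomatic); the values are simply the ones from the example above, not recommendations.

```python
# Sketch only: labels the colon-separated ILLUMINACLIP fields as I understand them.
# The function name and parameter names are my own; the values are copied from
# the example on the Trimmomatic website.
def illuminaclip(adapters, seed_mismatches, palindrome_clip_threshold,
                 simple_clip_threshold, min_adapter_length, keep_both_reads):
    fields = [adapters, seed_mismatches, palindrome_clip_threshold,
              simple_clip_threshold, min_adapter_length, keep_both_reads]
    return "ILLUMINACLIP:" + ":".join(str(f) for f in fields)

steps = [
    illuminaclip("TruSeq3-PE.fa", 2, 30, 10, 2, True),
    "LEADING:3",    # quality threshold for trimming bases at the start of a read
    "TRAILING:3",   # quality threshold for trimming bases at the end of a read
    "MINLEN:36",    # drop reads shorter than this after trimming
]
print(" ".join(steps))
```

If I have any of these fields mislabeled, please correct me.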
I apologize if these are very basic questions. I am not sure what the best practices are for this type of QC, and I am curious how they differ between research projects and what the standard practices are.
Thank you for your help.
Nathan