I am working with two biological replicates of six samples: three sample types (A, B, C), each with and without drug treatment.
Sample type / Biological replicate / Drug treatment
A / 1 / ―
B / 1 / ―
C / 1 / ―
A / 1 / +
B / 1 / +
C / 1 / +
A / 2 / ―
B / 2 / ―
C / 2 / ―
A / 2 / +
B / 2 / +
C / 2 / +
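As a sketch, the design in the table can be encoded in Python so that the replicate pairs used later in steps 3-4 fall out directly. The library labels (such as `A1-`) are hypothetical, not file names from this analysis.

```python
from collections import defaultdict
from itertools import product

# Factors taken from the table above.
SAMPLE_TYPES = ["A", "B", "C"]
REPLICATES = [1, 2]
TREATMENTS = ["-", "+"]

# Hypothetical library labels, e.g. "A1-" = sample type A, replicate 1, no drug.
# This loop order reproduces the row order of the table.
libraries = [f"{s}{r}{t}" for r, t, s in product(REPLICATES, TREATMENTS, SAMPLE_TYPES)]

# Group the two biological replicates of each condition (sample type x treatment).
conditions = defaultdict(list)
for r, t, s in product(REPLICATES, TREATMENTS, SAMPLE_TYPES):
    conditions[(s, t)].append(f"{s}{r}{t}")

if __name__ == "__main__":
    print(f"{len(libraries)} libraries in {len(conditions)} conditions")
    for (s, t), reps in sorted(conditions.items()):
        print(f"{s}({t}): {' & '.join(reps)}")
```

The grouping makes explicit that each condition has only two replicates, which is the unit that steps 3-4 operate on.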
The data were generated by single-end Illumina deep sequencing, and the analysis is being done in Galaxy with its built-in tools.
All raw RNA-seq files were preprocessed by clipping and trimming with standard parameters.
To compare the +/- drug treatment groups, I used the following procedure:
1. NGS: Mapping → Bowtie2
2. NGS: RNA Analysis → Cufflinks
a. SAM or BAM file of aligned RNA-Seq reads: Bowtie2: Aligned Reads
b. A reference annotation and reference sequence data were used
c. Bias correction and multi-read correction were performed
d. Reference sequence data
3. NGS: RNA Analysis → Cuffmerge
a. GTF file produced by Cufflinks: Cufflinks output file for sample of interest (Replicate #1_assembled transcripts)
b. GTF file produced by Cufflinks: 2nd Cufflinks output file for sample of interest (Replicate #2_assembled transcripts)
c. A reference annotation and reference sequence data were used
4. NGS: SAM Tools → Merge BAM Files
5. NGS: RNA Analysis → Cuffcompare (compare assembled transcripts)
a. GTF file produced by Cufflinks: 1 Cuffmerged_Combined transcripts.gtf
b. GTF file produced by Cufflinks: 2 Cuffmerged_Combined transcripts.gtf
c. A reference annotation and reference sequence data were used
6. NGS: RNA Analysis → Cuffdiff (find significant changes in transcript expression, splicing, and promoter use)
a. Transcripts: Cuffcompare_Combined transcripts.gtf
b. Perform replicate analysis: No
c. SAM or BAM file of aligned RNA-Seq reads: merged bam (drug treatment (-))
d. SAM or BAM file of aligned RNA-Seq reads: merged bam (drug treatment (+))
e. A reference annotation and reference sequence data were used
Steps 3 and 4 were run on each pair of replicates (e.g., A1- & A2-; B1- & B2-; etc.). In step 5, the Cuffmerge outputs for the + and - drug treatments were compared (e.g., (A1(-)+A2(-)) vs. (A1(+)+A2(+))), which produced a combined transcripts.gtf file that was used as the input for step 6, Cuffdiff.
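For readers who think in command lines, here is a rough sketch of steps 1-6 for one sample type, built as argument lists for the standalone programs that the Galaxy tools wrap. The file names (`genome.fa`, `annotation.gtf`, `genome_index`, `A1minus.fastq`, ...) and the mapping of Galaxy form fields to flags are assumptions; the Galaxy wrappers may set other options. Nothing is executed; the function only returns the commands.

```python
# Hypothetical reference files; Galaxy supplies these through its forms.
GENOME_FA = "genome.fa"
REF_GTF = "annotation.gtf"
BT2_INDEX = "genome_index"
TAGS = ("minus", "plus")  # hypothetical file-name tags for drug (-) and drug (+)


def assembly_list(sample_type: str, tag: str) -> str:
    """Contents of the text file Cuffmerge reads: one transcripts.gtf path per line."""
    return "".join(f"cl_{sample_type}{r}{tag}/transcripts.gtf\n" for r in (1, 2))


def workflow_commands(sample_type: str) -> list[list[str]]:
    """Argument lists approximating steps 1-6 for one sample type (A, B or C)."""
    cmds: list[list[str]] = []
    merged_gtf, merged_bam = {}, {}
    for tag in TAGS:
        reps = [f"{sample_type}{r}{tag}" for r in (1, 2)]
        for lib in reps:
            # Step 1: Bowtie2 on the trimmed single-end reads, then coordinate-sort.
            cmds.append(["bowtie2", "-x", BT2_INDEX, "-U", f"{lib}.fastq", "-S", f"{lib}.sam"])
            cmds.append(["samtools", "sort", "-o", f"{lib}.bam", f"{lib}.sam"])
            # Step 2: Cufflinks with the annotation as a guide (-g is an assumption;
            # -G would quantify known transcripts only), bias correction (-b) and
            # multi-read correction (-u).
            cmds.append(["cufflinks", "-g", REF_GTF, "-b", GENOME_FA, "-u",
                         "-o", f"cl_{lib}", f"{lib}.bam"])
        # Step 3: Cuffmerge the two replicate assemblies named in assembly_list().
        cmds.append(["cuffmerge", "-g", REF_GTF, "-s", GENOME_FA,
                     "-o", f"cm_{sample_type}{tag}", f"assemblies_{sample_type}{tag}.txt"])
        merged_gtf[tag] = f"cm_{sample_type}{tag}/merged.gtf"
        # Step 4: pool the two replicate BAMs into a single file.
        merged_bam[tag] = f"{sample_type}{tag}_merged.bam"
        cmds.append(["samtools", "merge", merged_bam[tag], *(f"{lib}.bam" for lib in reps)])
    # Step 5: Cuffcompare the (-) and (+) merged assemblies, writing cc_<type>.combined.gtf.
    cmds.append(["cuffcompare", "-r", REF_GTF, "-s", GENOME_FA, "-o", f"cc_{sample_type}",
                 merged_gtf["minus"], merged_gtf["plus"]])
    # Step 6: Cuffdiff without replicate analysis: one pooled BAM per condition.
    cmds.append(["cuffdiff", "-b", GENOME_FA, "-L", "minus,plus", "-o", f"cd_{sample_type}",
                 f"cc_{sample_type}.combined.gtf", merged_bam["minus"], merged_bam["plus"]])
    return cmds


if __name__ == "__main__":
    for cmd in workflow_commands("A"):
        print(" ".join(cmd))
```

Returning argument lists rather than shell strings keeps the sketch inspectable; each list could be passed to `subprocess.run` if one wanted to reproduce the Galaxy history outside Galaxy.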
Does this look like the correct workflow for comparing the two groups (+/- drug treatment)? The Cuffdiff gene differential expression output shows no significant differences between the groups; however, the fold changes, together with visual inspection of the alignments in IGV, suggest there are clear differences. I am concerned that the workflow above may be “averaging out” real differences.
I considered using the replicate analysis option in Cuffdiff; however, I was unsure which transcript (GTF) input file would be correct for an unbiased analysis.
Any suggestions or feedback would be greatly appreciated! Thank you in advance for your time!
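For comparison, a hypothetical sketch of how Cuffdiff's command line receives replicates: each condition is a single argument in which the replicate BAMs are joined by commas, so the individual (unmerged) BAMs are passed instead of the pooled ones. The transcripts GTF is left as a parameter, since choosing that file is the open question here; the `A1minus.bam`-style file names are assumptions.

```python
def cuffdiff_replicate_cmd(transcripts_gtf: str, sample_type: str,
                           genome_fa: str = "genome.fa") -> list[str]:
    """Cuffdiff argument list with two biological replicates per condition."""
    tags = ("minus", "plus")  # hypothetical labels for drug (-) and drug (+)
    # All replicates of one condition go into ONE argument, separated by commas.
    groups = [",".join(f"{sample_type}{r}{tag}.bam" for r in (1, 2)) for tag in tags]
    return ["cuffdiff", "-b", genome_fa, "-L", ",".join(tags),
            "-o", f"cd_rep_{sample_type}", transcripts_gtf, *groups]


if __name__ == "__main__":
    print(" ".join(cuffdiff_replicate_cmd("transcripts.gtf", "A")))
```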