Hi everyone,
I want to share my pipeline for analyzing RNA-seq data without replicates, because I'm not getting the expected results and would like help improving it.
My experimental design consists of a time-series experiment without replicates and with 2 time points: T0 and 6 hours of treatment with the hormone R5020 (R6h).
So for this analysis I have 2 samples that I want to compare.
I followed the TopHat/Cufflinks Nature Protocols paper and found fewer differentially expressed genes than expected at R6h compared to T0.
From previous microarray experiments in our lab we know that 6 h of treatment can yield almost 4000 regulated genes, but with the RNA-seq analysis I found roughly 600.
On the positive side, the genes found with RNA-seq seem to correlate well with the microarray data, but I was expecting to find more of them.
After these results I searched the internet to see what answers people have posted on this, but didn't find a clear one.
I was wondering whether the absence of replicates could dramatically affect the differential expression analysis, so I'd like to ask your opinion on this.
I'm also pasting below the bash code I use to run TopHat/Cufflinks, to check whether I'm doing it right (I hope so).
Finally, I hope this question will be useful to anyone else doing RNA-seq analysis without replicates.
Thanks to everybody who wishes to contribute.
Code:
#!/bin/bash

cuffdiff=/soft/bin/cuffdiff
cufflinks=/soft/bin/cufflinks
cuffmerge=/soft/bin/cuffmerge
genes=~/daniel/tracks/human_genome_19/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf
genome=~/daniel/tracks/human_genome_19/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome
genomeFA=~/daniel/tracks/human_genome_19/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa
mask=~/projects/rna-seq/large/analysis_4/mask.gtf
tophat=/soft/bin/tophat
transcriptome=~/daniel/RNAseq/New_analysis/Guille_samples/mapping-assembly/transcriptome_data/known

## T0 sample analysis
$tophat -p 8 -G $genes --transcriptome-index=$transcriptome -o T0 \
    --no-coverage-search -g 10 -r 98 $genome T0.read1 T0.read2
$cufflinks -p 8 -o T0 -M $mask --no-update-check T0/T0.accepted_hits.bam

## R6h sample analysis
$tophat -p 8 -G $genes --transcriptome-index=$transcriptome -o R6h \
    --no-coverage-search -g 10 -r 98 $genome R6h.read1 R6h.read2
$cufflinks -p 8 -o R6h -M $mask --no-update-check R6h/R6h.accepted_hits.bam

## Merge assemblies
$cuffmerge -g $genes -s $genomeFA -p 8 assemblies.txt

## Start differential expression analysis
$cuffdiff -o diff_out -b $genomeFA -p 8 -L T0,R6h -u merged_asm/merged.gtf \
    T0/T0.accepted_hits.bam R6h/R6h.accepted_hits.bam -T -M $mask

##-------- CummeRbund R analysis ------- ##
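
For reference, here is a minimal sketch of how the number of regulated genes could be counted from the cuffdiff output. It assumes the usual tab-separated gene_exp.diff layout, with the test status in column 7, log2(fold_change) in column 10 and the "significant" flag in column 14. The function names and the two-fold default cutoff are only illustrative and are not part of the pipeline above.

```shell
#!/bin/bash

# Count genes that cuffdiff itself flags as significant (column 14 == "yes").
count_significant() {
    local diff_file=$1
    awk -F'\t' 'NR > 1 && $14 == "yes" { n++ } END { print n + 0 }' "$diff_file"
}

# Count genes whose absolute log2 fold change reaches a cutoff, ignoring the
# significance call. Only tests with status OK and a finite fold change are
# considered. The cutoff is a hypothetical choice (1 = two-fold).
count_by_fold_change() {
    local diff_file=$1 cutoff=${2:-1}
    awk -F'\t' -v c="$cutoff" '
        NR > 1 && $7 == "OK" && $10 != "inf" && $10 != "-inf" {
            fc = ($10 < 0) ? -$10 : $10
            if (fc >= c) n++
        }
        END { print n + 0 }' "$diff_file"
}

# Report both counts if the cuffdiff output directory from the script exists.
if [ -f diff_out/gene_exp.diff ]; then
    echo "significant genes:       $(count_significant diff_out/gene_exp.diff)"
    echo "genes with |log2FC| >= 1: $(count_by_fold_change diff_out/gene_exp.diff 1)"
fi
```

Comparing the two numbers may help show how much of the gap to the microarray result comes from the significance test rather than from the fold changes themselves.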