Say I have 6 samples, call them 1, 2, 3, 4, 5, 6 (they could be 6 lanes or a 6-way multiplexed experiment), and I want to compare 1, 2, 3 vs 4, 5, 6.
If I map the reads using tophat, I assume I can run all six samples in parallel and then cat and sort the relevant SAM files together before I run cufflinks.
So
cat tophat1 tophat2 tophat3 | sort > 123.sam
cat tophat4 tophat5 tophat6 | sort > 456.sam
Then if I run cufflinks on 123.sam and 456.sam I get 123.transcripts.gtf and 456.transcripts.gtf.
So I was wondering what is the next step? Can I do
cuffcompare 123.transcripts.gtf -r geneannotations.gtf
cuffcompare 456.transcripts.gtf -r geneannotations.gtf
(geneannotations.gtf is an Ensembl annotations file)
so I get
123.transcripts.tmap and 456.transcripts.tmap
Then do I just compare the RPKM values of the two data sets, using the Ensembl ID as the key in the same way you would use probe_id in a traditional array experiment?
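To make that last step concrete, this is roughly the join I have in mind, as a shell/awk sketch. I am assuming the .tmap files are tab-delimited with a header row that has ref_id and FPKM columns, and that transcripts with no reference match carry "-" as their ref_id. The function name join_tmap and the optional third argument (for a differently named value column such as RPKM) are my own inventions, and summing the values when several transcripts map to the same reference ID is just a guess at what would be sensible.

```shell
# join_tmap A.tmap B.tmap [VALUE_COLUMN]
# Hypothetical helper: prints ref_id, value in A, value in B (tab-separated)
# for every reference ID that appears in both files.
join_tmap() {
  awk -F'\t' -v OFS='\t' -v valname="${3:-FPKM}" '
    FNR == 1 {                            # header row of each file
      id = 0; val = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "ref_id") id = i        # assumed name of the reference ID column
        if ($i == valname)  val = i       # assumed name of the expression column
      }
      if (!id || !val) {
        print "missing ref_id or " valname " column in " FILENAME > "/dev/stderr"
        bad = 1; exit 1
      }
      next
    }
    $id == "-" { next }                   # assumed placeholder for "no reference match"
    FILENAME == ARGV[1] { a[$id] += $val; next }   # first file: sum per reference ID
    { b[$id] += $val }                             # second file: sum per reference ID
    END {
      if (bad) exit 1
      for (k in a) if (k in b) print k, a[k], b[k]
    }
  ' "$1" "$2" | sort                      # sort for a stable output order
}
```

I would call it as `join_tmap 123.transcripts.tmap 456.transcripts.tmap > 123_vs_456.tsv` and then look at the ratio of the two value columns.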
Do I need to do any further normalization?
Am I missing something!?
Thanks in advance!