Seqanswers Leaderboard Ad

**Cole Trapnell** · 12-08-2009, 09:08 AM

Then if I run cufflinks on 123.sam and 456.sam I get 123.transcripts.gtf and 456.transcripts.gtf.

So I was wondering: what is the next step? Can I do

cuffcompare 123.transcripts.gtf -r geneannotations.gtf
cuffcompare 456.transcripts.gtf -r geneannotations.gtf

(geneannotations.gtf is an Ensembl annotations file)

You probably want:
cuffcompare -r geneannotations.gtf 123.transcripts.gtf

Looking at the out.tracking file will then tell you how the Ensembl transcripts (along with whatever new transcripts you find) compare to each other in the two samples.

Do I need to do any further normalization?

That's a tougher question. RPKM is a measure of relative gene expression designed to allow comparisons between experiments, just as you are asking to do. There are a fair number of people working on ways of correcting and normalizing for various biases in RNA-Seq, but there are no widely accepted means of doing so that I am aware of.
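For reference, RPKM divides the number of reads falling on a feature by the feature's length in kilobases and by the library's total mapped reads in millions; the second division is what lets samples sequenced to different depths be compared at all. The sketch below is only the textbook count-based formula, with illustrative names and made-up numbers. Cufflinks derives its own values from its transcript abundance estimates, and none of the bias corrections mentioned above are included.

```python
def rpkm(reads_in_feature: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of feature per Million mapped reads (textbook formula)."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and mapped-read total must be positive")
    kilobases = feature_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return reads_in_feature / kilobases / millions_of_reads


# Hypothetical numbers: the same 2 kb transcript in a shallow and a deep library.
shallow = rpkm(500, 2_000, 10_000_000)
deep = rpkm(1_000, 2_000, 20_000_000)
print(shallow, deep)
```

Both calls give 25.0: doubling the library size and the read count together leaves RPKM unchanged, which is exactly the property that makes between-sample comparison possible, and also why anything that skews counts within a library is not fixed by it.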

**staylor** · 12-08-2009, 10:08 AM

Originally posted by Cole Trapnell

You probably want:
cuffcompare -r geneannotations.gtf 123.transcripts.gtf

Looking at the out.tracking file will then tell you how the Ensembl transcripts (along with whatever new transcripts you find) compare to each other in the two samples.

Do you mean I need to do

cuffcompare -r geneannotations.gtf 123.transcripts.gtf 456.transcripts.gtf

otherwise how will the tracking file know about the two samples?

That's a tougher question. RPKM is a measure of relative gene expression designed to allow comparisons between experiments, just as you are asking to do. There are a fair number of people working on ways of correcting and normalizing for various biases in RNA-Seq, but there are no widely accepted means of doing so that I am aware of.

Ok. Thanks for your help on this.

**Cole Trapnell** · 12-08-2009, 10:13 AM

Do you mean I need to do

cuffcompare -r geneannotations.gtf 123.transcripts.gtf 456.transcripts.gtf

otherwise how will the tracking file know about the two samples?

Yes, sorry. That's what I get for responding before coffee.
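Because every sample GTF goes into one cuffcompare run, the tracking file gets one per-sample column for each input, labelled q1, q2 and so on (the output posted later in this thread has q1 and q2 columns). The labels are assumed here to follow command-line order. A small Python sketch that builds that single command line and records which label belongs to which file; the function names are illustrative, and only the -r option used in this thread appears.

```python
from typing import Dict, List, Sequence


def cuffcompare_argv(reference_gtf: str, sample_gtfs: Sequence[str]) -> List[str]:
    """Build ONE cuffcompare invocation that covers every sample."""
    if not sample_gtfs:
        raise ValueError("need at least one sample GTF")
    return ["cuffcompare", "-r", reference_gtf, *sample_gtfs]


def query_labels(sample_gtfs: Sequence[str]) -> Dict[str, str]:
    """Map tracking-file column labels (q1, q2, ...) back to the input files,
    assuming the labels follow the order the files were given."""
    return {f"q{i}": path for i, path in enumerate(sample_gtfs, start=1)}


samples = ["123.transcripts.gtf", "456.transcripts.gtf"]
print(" ".join(cuffcompare_argv("geneannotations.gtf", samples)))
print(query_labels(samples))
```

To actually run it, the list can be passed to subprocess.run(argv, check=True) on a machine where cuffcompare is on the PATH; keeping the label map next to the run makes it easier to read q1/q2 in the tracking file later.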

**staylor** · 12-08-2009, 10:17 AM

:-) No problem.

Thanks for your help.

**staylor** · 12-08-2009, 12:22 PM

I did

cuffcompare -r geneannotations.gtf 123.transcripts.gtf 456.transcripts.gtf

and got:

```
- - q1:CUFF.49|CUFF.49.0|100|320.741503|0.000000|0.000000|uniq -
- - q1:CUFF.922|CUFF.922.0|100|2080.321131|0.211081|0.210872|uniq -
- - q1:CUFF.2093|CUFF.2093.0|100|320.213468|0.000000|0.000000|uniq -
- - q1:CUFF.923|CUFF.923.1|100|2498.467902|0.159820|0.159979|uniq -
- - q1:CUFF.2964|CUFF.2964.0|100|136.004645|0.000000|0.000000|uniq -
- - q1:CUFF.3939|CUFF.3939.0|100|2146.036538|0.000000|0.000000|uniq -
- - q1:CUFF.4125|CUFF.4125.0|100|612.020902|0.000000|0.000000|uniq -
- - q1:CUFF.4410|CUFF.4410.0|100|4534.343539|0.000000|0.000000|uniq -
- - q1:CUFF.5384|CUFF.5384.0|100|1515.019712|0.000000|0.000000|uniq -
- - q1:CUFF.5667|CUFF.5667.0|100|1515.969648|0.000000|0.000000|uniq -
- - q1:CUFF.5952|CUFF.5952.0|100|5568.492065|0.000000|0.000000|uniq -
- - q1:CUFF.6333|CUFF.6333.0|100|2299.714905|0.000000|0.000000|uniq -
- - - q2:CUFF.611|CUFF.611.0|100|2209.360572|0.000000|0.000000|uniq
- - q1:CUFF.7536|CUFF.7536.0|100|702.690665|0.000000|0.000000|uniq -
- - q1:CUFF.7629|CUFF.7629.0|100|3756.201011|0.000000|0.000000|uniq -
- - q1:CUFF.9212|CUFF.9212.0|100|417.816071|0.000000|0.000000|uniq -
- - q1:CUFF.9554|CUFF.9554.0|100|9625.347396|0.000000|0.000000|uniq -
- - q1:CUFF.9797|CUFF.9797.0|100|2347.966478|0.000000|0.000000|uniq -
- - q1:CUFF.9965|CUFF.9965.0|100|702.690665|0.000000|0.000000|uniq -
- - q1:CUFF.8952|CUFF.8952.0|100|719.829462|0.000000|0.000000|uniq -
- - q1:CUFF.10691|CUFF.10691.0|100|1806.918854|0.000000|0.000000|uniq -
- - q1:CUFF.10799|CUFF.10799.1|100|3619.236510|0.284840|0.284870|uniq -
- - q1:CUFF.10839|CUFF.10839.0|100|2153.582326|0.000000|0.000000|uniq -
- - q1:CUFF.11538|CUFF.11538.0|100|7683.301808|0.000000|0.000000|uniq -
- - q1:CUFF.10782|CUFF.10782.0|100|89.705191|0.000000|0.000000|uniq -
- - q1:CUFF.10800|CUFF.10800.0|100|3902.596250|0.174811|0.174792|uniq -
- - q1:CUFF.10825|CUFF.10825.0|100|6510.429805|0.000000|0.000000|uniq -
- - q1:CUFF.12645|CUFF.12645.0|100|58.557555|0.000000|0.000000|uniq -
- - q1:CUFF.13320|CUFF.13320.0|100|1702.673535|0.000000|0.000000|uniq -
- - q1:CUFF.12088|CUFF.12088.0|100|6322.264673|0.000000|0.000000|uniq -
- - q1:CUFF.13147|CUFF.13147.0|100|8946.451886|0.000000|0.000000|uniq -
- - q1:CUFF.14256|CUFF.14256.0|100|782.998170|0.000000|0.000000|uniq -
- - q1:CUFF.14910|CUFF.14910.0|100|1953.822826|0.000000|0.000000|uniq -
- - - q2:CUFF.1367|CUFF.1367.0|100|2051.549103|0.000000|0.000000|uniq
- - q1:CUFF.15282|CUFF.15282.0|100|1686.457597|0.000000|0.000000|uniq -
- - q1:CUFF.15535|CUFF.15535.0|100|2326.148410|0.000000|0.000000|uniq -
- - q1:CUFF.16741|CUFF.16741.0|100|1297.275075|0.000000|0.000000|uniq -
- - q1:CUFF.17421|CUFF.17421.0|100|3055.953562|0.000000|0.000000|uniq -
- - q1:CUFF.17710|CUFF.17710.0|100|2845.146363|0.000000|0.000000|uniq -
- - q1:CUFF.19743|CUFF.19743.0|100|133.631524|0.000000|0.000000|uniq -
- - q1:CUFF.19375|CUFF.19375.0|100|2369.916793|0.000000|0.000000|uniq -
- - q1:CUFF.19582|CUFF.19582.0|100|4355.505260|0.000000|0.000000|uniq -
- - q1:CUFF.19789|CUFF.19789.0|100|5320.372181|0.000000|0.000000|uniq -
- - q1:CUFF.22524|CUFF.22524.0|100|647.153215|0.000000|0.000000|uniq -
- - q1:CUFF.22483|CUFF.22483.0|100|21957.950620|0.000000|0.000000|uniq -
- - q1:CUFF.22641|CUFF.22641.0|100|13234.040114|0.000000|0.000000|uniq -
```

Does this mean there are no overlaps with annotated genes? This sample was multiplexed on one lane, so there aren't as many reads as you would normally expect. Could this have something to do with it?
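One way to check this over a whole tracking file, rather than by eye, is to tally rows by how many samples contain a transfrag. A rough Python sketch follows. It assumes the layout of the excerpt above: the last two fields are the q1 and q2 columns ('-' when that sample has no transfrag there) and the fields before them carry reference information. Reading an all-'-' prefix as "no matching annotated transcript" is an interpretation, not something the output itself states, and the function name and counter keys are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def summarize_tracking(lines: Iterable[str], n_samples: int) -> Counter:
    """Tally tracking rows by how many samples contain a transfrag, and count
    rows whose leading (non-sample) columns are all '-'.

    Assumes the last ``n_samples`` whitespace-separated fields are the
    q1..qN columns, each either '-' or 'qJ:...' with '|'-separated subfields.
    """
    summary: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) <= n_samples:
            continue  # blank or malformed: expect at least one leading column
        leading, per_sample = fields[:-n_samples], fields[-n_samples:]
        present = sum(1 for f in per_sample if f != "-")
        summary[f"in_{present}_of_{n_samples}_samples"] += 1
        if all(f == "-" for f in leading):
            summary["leading_columns_all_dash"] += 1
    return summary


excerpt = [
    "- - q1:CUFF.49|CUFF.49.0|100|320.741503|0.000000|0.000000|uniq -",
    "- - - q2:CUFF.611|CUFF.611.0|100|2209.360572|0.000000|0.000000|uniq",
]
print(summarize_tracking(excerpt, n_samples=2))
```

Run over all 46 rows posted above, this would report 46 rows present in only one of the two samples, every one of them with only '-' in the leading columns. A real file would be read line by line, e.g. by passing an open handle on out.tracking.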

**townway** · 12-08-2009, 02:26 PM

According to the Cufflinks documentation, Cufflinks only accepts SAM input that has been sorted with “sort -k 3,3 -k 4,4n *.sam”. But when I take the SAM file generated by BWA and run that sort command, the format changes from

```
NCI-GA2:1:1:4:358#0 147 chr1 53372939 60 35M = 53372816 -158 ATGGGCTGGATGATCCCTGTTCAGGCCTAATCCGC A>BAA>=BBBBBB@>?BBBA@8>BB>BBABB>BCB XT:A:U NM:i:0 SM:i:37 AM:i:23 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:35
NCI-GA2:1:1:4:1683#0 99 chrUn_gl000220 159630 0 35M = 159744 149 CTAGGGCGCGGGCCCGGGTGGAGCCGCCGCAGGTG BB@A=?B<B?:>BBBAA<9BABABBBBB<BA=78A XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:2 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:35
```

to

```
NCI-GA2:1:100:1000:1645#0 133 * 0 0 * * 0 0 TCCTCCTTTTTCACTTGATCCCACCGATGTCTGCC BCBCCCBBCCCBCB@BACABBCCCA>BAAABABBB
NCI-GA2:1:100:1000:1645#0 69 * 0 0 * * 0 0 CAAGTCTGCATGGCTGTTGACATAGGCAGACATCG BCCB=ABBABB@@AA@;>;A75/86=376;:;/9:
NCI-GA2:1:100:1000:391#0 133 * 0 0 * * 0 0 TACCGCGGCTGCTGGCACCAGACTTGCCCAGATCG BAAAAABBAB>5>7998;<1=9:A@@>A@=>;=7=
```

Some parts of the lines are missing, so cufflinks does not recognize them. How can I change the sort parameters to get the correct format for cufflinks?
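The records that end up first after sorting look incomplete, but their FLAG values (133 and 69) both have bit 0x4 set, which the SAM specification defines as "segment unmapped". An unmapped read legitimately has "*" for RNAME, 0 for POS and "*" for CIGAR. Sorting on column 3 then tends to put these "*" records ahead of the chromosome names, because "*" sorts before letters in byte order. A small Python sketch that decodes the FLAG values seen above; the bit values come from the SAM specification, while the short names are informal labels chosen here.

```python
from typing import List

# FLAG bits from the SAM specification; the short names are informal labels.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
}


def decode_flag(flag: int) -> List[str]:
    """Return the names of the FLAG bits set in ``flag``, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# FLAG values taken from the records quoted above.
for flag in (147, 99, 133, 69):
    print(flag, decode_flag(flag))
```

147 and 99 decode to properly paired, aligned reads; 133 and 69 decode to paired reads with the unmapped bit set, which is consistent with complete records for reads that simply did not align.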

**staylor** · 12-08-2009, 03:08 PM

Try filtering for hits only (a simple way is to grep for "chr", if your genome uses that prefix in its FASTA header lines) and then sort.
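The same "filter, then sort" idea can be sketched in Python. Instead of grepping for "chr", this version drops records whose FLAG has bit 0x4 (unmapped) set or whose RNAME is "*", so it does not depend on how the references are named. It keeps "@" header lines at the top and orders the remaining records by RNAME and then numeric POS, mirroring sort -k 3,3 -k 4,4n. The function name is hypothetical, and whether a particular Cufflinks release wants header lines kept is an assumption to check against its documentation. Python compares reference names by code point, which matches sort under LC_ALL=C but may differ from other locales.

```python
from typing import Iterable, List

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def filter_and_sort_sam(lines: Iterable[str]) -> List[str]:
    """Keep header lines first, drop unmapped records, and order the rest by
    reference name, then leftmost position (like `sort -k 3,3 -k 4,4n`)."""
    header: List[str] = []
    records: List[List[str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            header.append(line)
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM alignment line (11 mandatory fields)
        if int(fields[1]) & UNMAPPED or fields[2] == "*":
            continue  # unmapped read: no reference or position to sort on
        records.append(fields)
    records.sort(key=lambda f: (f[2], int(f[3])))
    return header + ["\t".join(f) for f in records]


# Usage (file names are placeholders):
# with open("aln.sam") as src, open("aln.sorted.sam", "w") as dst:
#     dst.write("\n".join(filter_and_sort_sam(src)) + "\n")
```

Sorting POS numerically inside the key plays the role of the "n" in "-k 4,4n"; without it, position 100 would sort after 53372939 as a string.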


Tophat/cufflinks workflow question
