
**Xi Wang** · 02-08-2010, 11:15 PM

Thanks, Cole.

I am not sure I understand how to give GTF annotation to Cuffdiff according to the manual.
First, is matching of tss_id and p_id required? If not, how does the program know which TSS corresponds to a transcript?
Second, if the TSS of a transcript or primary transcript is unknown, the program will skip this transcript and won't look for differences in promoter use, right?
Moreover, is it possible to infer the TSS from RNA-seq data?

Many thanks.

**chapmandu2** · 02-09-2010, 03:10 AM

ABI Solid?

Hi Cole,

Thanks for the new release. It looks really comprehensive, and I look forward to trying it on my Illumina datasets. Do you have any plans to include ABI SOLiD support in TopHat and Cufflinks, especially now that Bowtie supports colourspace?

Many thanks.

**Cole Trapnell** · 02-09-2010, 09:42 AM

> Originally posted by Xi Wang
>
> Thanks, Cole.
>
> I am not sure I understand how to give GTF annotation to Cuffdiff according to the manual.
> First, is matching of tss_id and p_id required? If not, how does the program know which TSS corresponds to a transcript?
> Second, if the TSS of a transcript or primary transcript is unknown, the program will skip this transcript and won't look for differences in promoter use, right?
> Moreover, is it possible to infer the TSS from RNA-seq data?
>
> Many thanks.

Without tss_id and p_id attributes, Cufflinks will simply test for differential expression of transcripts and genes. You can attach these attributes to your own GTF file, but for convenience, cuffcompare now outputs a single file containing the "union" of all the assembled transfrags you give it. So the basic workflow we recommend is:

1) Assemble each sample with cufflinks
2) Run cuffcompare on the sample transfrags all at the same time, providing a reference annotation if you want to classify your transfrags according to known, novel, etc.
3) Give the stdout.combined.gtf to cuffdiff, along with your original SAM alignments from the samples. Cuffdiff will re-estimate the abundances of the transfrags in the GTF using the alignments in each sample, and do the differential expression testing at the same time.
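As a rough sketch of the three steps above, the snippet below only builds the command lines and does not run anything. The `-r` flag and the `stdout.combined.gtf` and `transcripts.gtf` file names come from this thread. The helper name `build_workflow`, the per-sample directory layout, and the positional argument order are assumptions made for illustration, so check them against the manual for your version.

```python
from typing import Dict, List


def build_workflow(sample_sams: Dict[str, str], reference_gtf: str) -> List[List[str]]:
    """Return the command lines for the three steps, without running them."""
    commands: List[List[str]] = []

    # Step 1: assemble each sample separately.
    for sam in sample_sams.values():
        commands.append(["cufflinks", sam])

    # Step 2: compare all sample assemblies at once against the reference.
    # Assumption: each assembly was saved as <sample>/transcripts.gtf.
    assemblies = [f"{name}/transcripts.gtf" for name in sample_sams]
    commands.append(["cuffcompare", "-r", reference_gtf, *assemblies])

    # Step 3: give the combined GTF plus the original SAM files to cuffdiff.
    commands.append(["cuffdiff", "stdout.combined.gtf", *sample_sams.values()])
    return commands


if __name__ == "__main__":
    for cmd in build_workflow({"s1": "s1.sam", "s2": "s2.sam"}, "reference.gtf"):
        print(" ".join(cmd))
```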

Optionally, you may wish to clean up the stdout.combined.gtf before running cuffdiff, to remove partial transfrags that resulted from low depth of sequencing coverage in one of the samples. We like to perform differential testing only on transcripts that are either already known to annotation or that we've assembled in two different samples independently.
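One possible way to do that clean-up is to pick the transcript IDs to keep from the rows of stdout.tracking. This is a minimal sketch, not part of Cufflinks. It assumes the rows are already split on tabs, that column 0 holds the transcript ID, that column 3 holds the class code with `=` meaning a match to the reference, and that a sample column of `-` means the transfrag was not assembled in that sample. The function name and the `first_sample_col` parameter are hypothetical; adjust them to the layout your version writes.

```python
from typing import Iterable, List, Set


def transcripts_to_keep(tracking_rows: Iterable[List[str]],
                        first_sample_col: int = 4,
                        min_samples: int = 2) -> Set[str]:
    """Keep transcripts known to the annotation or assembled in >= min_samples samples."""
    keep: Set[str] = set()
    for row in tracking_rows:
        transcript_id, class_code = row[0], row[3]
        # Count the samples in which this transfrag was actually assembled.
        n_assembled = sum(1 for col in row[first_sample_col:] if col != "-")
        if class_code == "=" or n_assembled >= min_samples:
            keep.add(transcript_id)
    return keep
```

The resulting set can then be used to drop every line of stdout.combined.gtf whose transcript_id is not in it.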

As far as how cuffcompare assigns p_id and tss_id attributes:

* p_id is assigned just using the CDS records in the reference GTF. If there are no CDS records, there will be no p_ids. Similarly, if you run cuffcompare without a reference annotation along with your sample assemblies, there will be no p_id attributes in stdout.combined.gtf
* tss_id is assigned based on where the 5' ends of the transfrags are: two transcripts on the same strand that share bases have the same TSS iff their 5' ends start within 100bp of each other. This threshold is chosen based on our observation that depth of sequencing doesn't always reach to the end of the true transcript on either end. You can change it with the -d option (which I just realized is not listed in the manual - I will update it).
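To make the tss_id rule concrete, here is a small sketch of grouping transcripts by 5' end. It is not cuffcompare's code. The `Transcript` type, the greedy grouping (a transcript joins the first group containing a qualifying partner), the `TSS<n>` naming, and treating "within 100bp" as a distance of at most 100 are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # leftmost base, inclusive
    end: int     # rightmost base, inclusive


def five_prime(t: Transcript) -> int:
    """The 5' end is the leftmost base on '+' and the rightmost base on '-'."""
    return t.start if t.strand == "+" else t.end


def assign_tss_ids(transcripts: List[Transcript], max_dist: int = 100) -> Dict[str, str]:
    by_strand = defaultdict(list)
    for t in transcripts:
        by_strand[(t.chrom, t.strand)].append(t)

    tss_ids: Dict[str, str] = {}
    next_id = 1
    for key in sorted(by_strand):
        groups: List[List[Transcript]] = []
        for t in sorted(by_strand[key], key=five_prime):
            for group in groups:
                # Same TSS: shares bases with a member and 5' ends are close.
                if any(t.start <= o.end and o.start <= t.end
                       and abs(five_prime(t) - five_prime(o)) <= max_dist
                       for o in group):
                    group.append(t)
                    break
            else:
                groups.append([t])
        for group in groups:
            for t in group:
                tss_ids[t.name] = f"TSS{next_id}"
            next_id += 1
    return tss_ids
```

Here `max_dist` plays the role of the threshold that the -d option changes.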

All this is to say that if you're hoping to just use a reference GTF with cuffdiff, you'll need to add those p_id and tss_id attributes yourself. You can do this with cuffcompare too, using a little hack:

cuffcompare -r reference.gtf reference.gtf reference.gtf

This will spit out a version of reference.gtf in stdout.combined.gtf that has the p_id and tss_id attributes attached.
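To check whether the attributes actually made it into the output, something like the following could list the transcripts that lack a given attribute. This is a sketch assuming standard nine-column, tab-separated GTF lines with `key "value";` attributes; the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Set

# Matches attribute pairs of the form: key "value"
_ATTR = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(gtf_line: str) -> Dict[str, str]:
    """Parse the ninth GTF column into a dict of quoted attributes."""
    fields = gtf_line.rstrip("\n").split("\t")
    return dict(_ATTR.findall(fields[8]))


def transcripts_missing(gtf_lines: Iterable[str], attribute: str) -> Set[str]:
    """Return transcript_ids for which no record carries the given attribute."""
    seen: Set[str] = set()
    has_attr: Set[str] = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        attrs = parse_attributes(line)
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        seen.add(tid)
        if attribute in attrs:
            has_attr.add(tid)
    return seen - has_attr
```

For example, `transcripts_missing(open("stdout.combined.gtf"), "p_id")` would give the transcripts that have no p_id.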

**Cole Trapnell** · 02-09-2010, 10:04 PM

> Originally posted by chapmandu2
>
> Hi Cole,
>
> Thanks for the new release. It looks really comprehensive, and I look forward to trying it on my Illumina datasets. Do you have any plans to include ABI SOLiD support in TopHat and Cufflinks, especially now that Bowtie supports colourspace?
>
> Many thanks.

Cufflinks should *in theory* already support Colorspace, since it takes SAM input, and doesn't call expressed SNPs by itself (yet). TopHat will hopefully support Colorspace sometime this spring. I've got a number of other features in TopHat and Cufflinks I need to get to, and I have to finish my thesis and graduate - so I can't give a timeline. However, it's an often requested feature, so I'd like to add support.

**Kasycas** · 02-22-2010, 03:48 AM

Hi Cole,

Thanks for the new release! I've been trying to use cuffdiff as described above. It runs for a while and then terminates as follows:

Importance sampling posterior distribution
isoform TCONS_00000803 has no p_id, no CDS grouping analysis available here
Quantitating samples in locus [ chr1:152014391-152019257 ]
Calculating intial MLE
Tossing likely garbage isoforms
Revising MLE
Importance sampling posterior distribution
Calculating intial MLE
Tossing likely garbage isoforms
Revising MLE
Importance sampling posterior distribution
Calculating intial MLE
Tossing likely garbage isoforms
Revising MLE
Importance sampling posterior distribution
Calculating intial MLE
Tossing likely garbage isoforms
Revising MLE
Importance sampling posterior distribution
isoform TCONS_00002699 has no p_id, no CDS grouping analysis available here
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::domain_error> >'
what(): Error in function boost::math::cdf(const normal_distribution<d>&, d): Random variate x is nan, but must be finite!
Aborted

I don't think the missing p_id is the cause, since the same message appears earlier in the run without terminating the program. However, I've tried using cuffdiff on cuffcompare stdout.combined.gtf files that were derived with UCSC annotation AND Ensembl annotation, and they both terminate after a similar message (isoform TCONS_00002699 has no p_id, no CDS grouping analysis available here).

Would you know why this is happening?

Regards,

Karen

**Kasycas** · 02-22-2010, 09:33 AM

One more thing, on a slightly separate issue. According to the manual, the cuffcompare output file stdout.tracking should contain:

Each of the columns after the fifth have the following format:
qJ:<gene_id>|<transcript_id>|<FMI>|<FPKM>|<conf_lo>|<conf_hi>

However, I have 4 numerical columns after the <FMI>, not three. What does the fourth one relate to?

Example:
q1:ENSG00000188076|ENST00000342878|100|12.188023|11.834710|12.541337|11.044084

Thanks,

Karen

**Cole Trapnell** · 02-22-2010, 10:09 AM

> Originally posted by Kasycas
>
> Hi Cole,
>
> Thanks for the new release! I've been trying to use cuffdiff as described above. It runs for a while and then terminates as follows:
>
> Importance sampling posterior distribution
> isoform TCONS_00000803 has no p_id, no CDS grouping analysis available here
> Quantitating samples in locus [ chr1:152014391-152019257 ]
> Calculating intial MLE
> Tossing likely garbage isoforms
> Revising MLE
> Importance sampling posterior distribution
> Calculating intial MLE
> Tossing likely garbage isoforms
> Revising MLE
> Importance sampling posterior distribution
> Calculating intial MLE
> Tossing likely garbage isoforms
> Revising MLE
> Importance sampling posterior distribution
> Calculating intial MLE
> Tossing likely garbage isoforms
> Revising MLE
> Importance sampling posterior distribution
> isoform TCONS_00002699 has no p_id, no CDS grouping analysis available here
> terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::domain_error> >'
> what(): Error in function boost::math::cdf(const normal_distribution<d>&, d): Random variate x is nan, but must be finite!
> Aborted
>
> I don't think the missing p_id is the cause, since the same message appears earlier in the run without terminating the program. However, I've tried using cuffdiff on cuffcompare stdout.combined.gtf files that were derived with UCSC annotation AND Ensembl annotation, and they both terminate after a similar message (isoform TCONS_00002699 has no p_id, no CDS grouping analysis available here).
>
> Would you know why this is happening?
>
> Regards,
>
> Karen

Another user reported this to me a few days ago, and I fixed it yesterday. It's a divide by zero error in the Jensen-Shannon variance calculation. I'll be releasing a fix in a few days. Please sign up for the mailing list if you haven't already - you'll get an email once I make the release.
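For readers wondering how a divide by zero ends up as the "x is nan" error in the log, here is an illustrative sketch. It is not Cuffdiff's code. It computes a Jensen-Shannon divergence between two isoform abundance distributions and guards the step where a test statistic is divided by the square root of a variance. The z-score form of the statistic and the function names are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: H((p+q)/2) - (H(p) + H(q)) / 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def z_score(statistic: float, variance: float) -> Optional[float]:
    """Return statistic / sqrt(variance), or None when the variance is unusable.

    Without this guard, a zero variance leads to a division by zero, and the
    resulting non-finite value would be passed on to the normal CDF.
    """
    if not math.isfinite(variance) or variance <= 0:
        return None
    return statistic / math.sqrt(variance)
```

A caller would skip the significance test for a locus whenever `z_score` returns `None`.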

**Cole Trapnell** · 02-22-2010, 10:11 AM

> Originally posted by Kasycas
>
> One more thing, on a slightly separate issue. According to the manual, the cuffcompare output file stdout.tracking should contain:
>
> Each of the columns after the fifth have the following format:
> qJ:<gene_id>|<transcript_id>|<FMI>|<FPKM>|<conf_lo>|<conf_hi>
>
> However, I have 4 numerical columns after the <FMI>, not three. What does the fourth one relate to?
>
> Example:
> q1:ENSG00000188076|ENST00000342878|100|12.188023|11.834710|12.541337|11.044084
>
> Thanks,
>
> Karen

The last column is the estimated depth of read coverage for that transfrag. Apologies - I will update the manual.
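With that extra field, a single sample entry from stdout.tracking could be parsed as sketched below. The field order follows the format quoted above plus the coverage value described in this reply. The `TrackingEntry` type and the function name are hypothetical.

```python
from typing import NamedTuple


class TrackingEntry(NamedTuple):
    sample: str
    gene_id: str
    transcript_id: str
    fmi: int
    fpkm: float
    conf_lo: float
    conf_hi: float
    coverage: float  # estimated depth of read coverage for the transfrag


def parse_tracking_entry(entry: str) -> TrackingEntry:
    """Parse 'qJ:<gene_id>|<transcript_id>|<FMI>|<FPKM>|<conf_lo>|<conf_hi>|<cov>'."""
    sample, rest = entry.split(":", 1)
    gene_id, transcript_id, fmi, fpkm, lo, hi, cov = rest.split("|")
    return TrackingEntry(sample, gene_id, transcript_id, int(fmi),
                         float(fpkm), float(lo), float(hi), float(cov))


if __name__ == "__main__":
    example = ("q1:ENSG00000188076|ENST00000342878|100|"
               "12.188023|11.834710|12.541337|11.044084")
    print(parse_tracking_entry(example))
```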

**Lesley** · 02-22-2010, 03:47 PM

> Originally posted by Cole Trapnell
>
> Without tss_id and p_id attributes, Cufflinks will simply test for differential expression of transcripts and genes. You can attach these attributes to your own GTF file, but for convenience, cuffcompare now outputs a single file containing the "union" of all the assembled transfrags you give it. So the basic workflow we recommend is:
>
> 1) Assemble each sample with cufflinks
> 2) Run cuffcompare on the sample transfrags all at the same time, providing a reference annotation if you want to classify your transfrags according to known, novel, etc.
> 3) Give the stdout.combined.gtf to cuffdiff, along with your original SAM alignments from the samples. Cuffdiff will re-estimate the abundances of the transfrags in the GTF using the alignments in each sample, and do the differential expression testing at the same time.
>
> Optionally, you may wish to clean up the stdout.combined.gtf before running cuffdiff, to remove partial transfrags that resulted from low depth of sequencing coverage in one of the samples. We like to perform differential testing only on transcripts that are either already known to annotation or that we've assembled in two different samples independently.
>
> As far as how cuffcompare assigns p_id and tss_id attributes:
>
> * p_id is assigned just using the CDS records in the reference GTF. If there are no CDS records, there will be no p_ids. Similarly, if you run cuffcompare without a reference annotation along with your sample assemblies, there will be no p_id attributes in stdout.combined.gtf
> * tss_id is assigned based on where the 5' ends of the transfrags are: two transcripts on the same strand that share bases have the same TSS iff their 5' ends start within 100bp of each other. This threshold is chosen based on our observation that depth of sequencing doesn't always reach to the end of the true transcript on either end. You can change it with the -d option (which I just realized is not listed in the manual - I will update it).
>
> All this is to say that if you're hoping to just use a reference GTF with cuffdiff, you'll need to add those p_id and tss_id attributes yourself. You can do this with cuffcompare too, using a little hack:
>
> cuffcompare -r reference.gtf reference.gtf reference.gtf
>
> This will spit out a version of reference.gtf in stdout.combined.gtf that has the p_id and tss_id attributes attached.

Thanks for the info on the reference GTF. I downloaded both the FASTA and the GTF from Ensembl and ran into the chr problem. However, now when I run cuffcompare on the reference annotation, I get tss_ids but no p_ids, even though the original GTF has CDS information.

I also got the following warnings when running cuffcompare on the cufflinks output and the fixed GTF file. I guess they have something to do with the cufflinks GTF files, since there are two of them.

Warning: found 26695 transcripts with undetermined strand.
Warning: found 44851 transcripts with undetermined strand.

Cuffcompare then exits.

Any help on moving forward with cufflinks will be greatly appreciated.

Cheers,
Lesley
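One way to see what those warnings are counting is to tally the transcripts in each cufflinks GTF whose strand column is neither `+` nor `-`. This is a sketch under the assumption that "undetermined strand" corresponds to such a value (typically `.`) in the seventh GTF column; the function name is hypothetical.

```python
import re
from typing import Iterable, Set

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def unstranded_transcripts(gtf_lines: Iterable[str]) -> Set[str]:
    """Return transcript_ids of records whose strand column is not '+' or '-'."""
    unstranded: Set[str] = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[6] in ("+", "-"):
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match:
            unstranded.add(match.group(1))
    return unstranded
```

Comparing `len(unstranded_transcripts(...))` for each input file against the numbers in the warnings would show which file each warning refers to.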

**seqfast** · 03-03-2010, 06:58 AM

Error messages

already reported ...

**jebe** · 03-03-2010, 11:32 AM

cuffdiff considers only X, Y, and MT loci

Hi,

I ran tophat using the h_sapiens_37_asm index and converted the chromosome accessions in the accepted_hits.sam file to their corresponding number/letter (1, 2, X, Y, MT). I wanted the chromosome notation to match that of the Ensembl GTF file (Homo_sapiens.GRCh37.56.gtf). Next, I ran cufflinks on each sample using the converted SAM file output by tophat. Then I ran cuffcompare on the transcripts.gtf files from each sample (output by cufflinks) along with the reference GTF above. Finally, I fed the converted SAM files and the combined.gtf file into cuffdiff. Cuffdiff runs without error; however, it only considers loci on the X, Y and MT chromosomes. Has anyone else experienced this?

Thank you in advance for any advice.

**Xi Wang** · 03-03-2010, 05:25 PM

> Originally posted by jebe
>
> Hi,
>
> I ran tophat using the h_sapiens_37_asm index and converted the chromosome accessions in the accepted_hits.sam file to their corresponding number/letter (1, 2, X, Y, MT). I wanted the chromosome notation to match that of the Ensembl GTF file (Homo_sapiens.GRCh37.56.gtf). Next, I ran cufflinks on each sample using the converted SAM file output by tophat. Then I ran cuffcompare on the transcripts.gtf files from each sample (output by cufflinks) along with the reference GTF above. Finally, I fed the converted SAM files and the combined.gtf file into cuffdiff. Cuffdiff runs without error; however, it only considers loci on the X, Y and MT chromosomes. Has anyone else experienced this?
>
> Thank you in advance for any advice.

Did you try converting the chromosome notation in the Ensembl GTF to chr1, chr2, ... chrX, chrY, and chrM? I think converting in this direction is much better.
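A minimal sketch of that conversion, applied to the first column of each GTF line, might look like this. It only handles the numbered chromosomes, X, Y and MT named in this thread. Other sequence names (for example unplaced scaffolds) just get a `chr` prefix, which may not match the names in your alignments, so treat that part as an assumption. The function names are hypothetical.

```python
def ensembl_to_ucsc(name: str) -> str:
    """Map Ensembl-style names (1, 2, X, Y, MT) to chr1, chr2, chrX, chrY, chrM."""
    if name.startswith("chr"):
        return name  # already converted
    if name == "MT":
        return "chrM"
    return "chr" + name


def convert_gtf_line(line: str) -> str:
    """Rewrite the chromosome column of one GTF line; leave comments untouched."""
    if line.startswith("#") or not line.strip():
        return line
    chrom, sep, rest = line.partition("\t")
    return ensembl_to_ucsc(chrom) + sep + rest
```

Running every line of the reference GTF through `convert_gtf_line` before cuffcompare keeps the annotation and the alignments on the same naming scheme.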

**blackgore** · 03-04-2010, 07:28 AM

This may be a naive question, as I'm only about to start using Cufflinks (Bowtie and TopHat seem great, though), but I have not been able to find any documentation about differential expression analysis when groups of samples are involved. My question is: can you specify that certain samples are replicates, to be treated as a group when running differential expression analysis, and if so, how?

**kmcarr** · 03-13-2010, 03:11 PM

> Originally posted by Cole Trapnell
>
> Another user reported this to me a few days ago, and I fixed it yesterday. It's a divide by zero error in the Jensen-Shannon variance calculation. I'll be releasing a fix in a few days. Please sign up for the mailing list if you haven't already - you'll get an email once I make the release.

Cole,

First, thanks for an excellent software stack.

Was the release you are referring to later than 0.8.1? I am using 0.8.1 (the latest available on the web site) and am experiencing this problem. Since 0.8.1 was released on 2/13/2010 and you wrote the above on 2/22/2010, it seems the fix would be in a version later than 0.8.1. I hate to be a pest; I have no doubt you are very busy and dealing with (L)users is the last thing you need, but I'm a little stymied by this bug.

Thanks again.

P.S. Yes, I just subscribed to the mailing list.


Differential expression, splicing, and promoter use with Cufflinks
