**JohnK** · 11-18-2010, 07:22 AM

Originally posted by Cole Trapnell

I just wanted to announce that v0.8.2 of Cufflinks addresses the divide by zero, along with a number of other issues.

Cole, I created a GTF file off of the latest refgene. Is this valid/viable for Cufflinks?

**vschulz** · 11-18-2010, 08:22 AM

Cufflinks questions

Hi,

I have some questions about cufflinks...

- If I run cufflinks with the --quartile-normalization and --reference-seq options, can/should I also run cuffdiff with these options? I expect that it wouldn't somehow normalize and correct twice, so it should be a good idea to do this.

- Is --mask-file superfluous, or a good idea, when using the --quartile-normalization option? Does anyone have an hg19 version of a mask.gtf file, or even a header so I could see what it should look like? It would be great if someone had reference GTF files posted somewhere, since it isn't obvious for beginners how to get these, and there seem to be problems with e.g. UCSC refGene or refFlat GTF files compared to Ensembl files.

Thanks,

Vince

**urchgene** · 11-18-2010, 11:17 AM

I have been encountering problems with this run and have never gotten it to complete.

This is my command (it is a single-fragment SOLiD run):

tophat -C -p 8 -r 100 -I 1200 --library-type fr-secondstrand --microexon-search spruce_est solid0179_20100226_Kuusi_3_ACC_pooli_F3.csfasta

[Thu Nov 18 14:30:32 2010] Beginning TopHat run (v1.1.2)
-----------------------------------------------
[Thu Nov 18 14:30:32 2010] Preparing output location ./tophat_out/
[Thu Nov 18 14:30:32 2010] Checking for Bowtie index files
[Thu Nov 18 14:30:32 2010] Checking for reference FASTA file
Warning: Could not find FASTA file spruce_est.fa
[Thu Nov 18 14:30:32 2010] Reconstituting reference FASTA file from Bowtie index
[Thu Nov 18 14:30:40 2010] Checking for Bowtie
Bowtie version: 0.12.7.0
[Thu Nov 18 14:30:40 2010] Checking for Samtools
Samtools version: 0.1.9.0
[Thu Nov 18 14:30:42 2010] Checking reads
min read length: 50bp, max read length: 50bp
format: fasta
[Thu Nov 18 14:37:57 2010] Mapping reads against spruce_est with Bowtie
[Thu Nov 18 15:02:17 2010] Joining segment hits
Traceback (most recent call last):
File "/v/linux26_x86_64/appl/molbio/tophat/tophat-1.1.2.Linux_x86_64/tophat", line 2201, in ?
sys.exit(main())
File "/v/linux26_x86_64/appl/molbio/tophat/tophat-1.1.2.Linux_x86_64/tophat", line 2160, in main
user_supplied_juncs)
File "/v/linux26_x86_64/appl/molbio/tophat/tophat-1.1.2.Linux_x86_64/tophat", line 1870, in spliced_alignment
segment_len)
File "/v/linux26_x86_64/appl/molbio/tophat/tophat-1.1.2.Linux_x86_64/tophat", line 1593, in split_reads
split_record(read_name, read_seq, read_quals, output_files, offsets, color)
File "/v/linux26_x86_64/appl/molbio/tophat/tophat-1.1.2.Linux_x86_64/tophat", line 1526, in split_record
read_seq_temp = convert_color_to_bp(read_seq)
File "/v/linux26_x86_64/appl/molbio/tophat/tophat-1.1.2.Linux_x86_64/tophat", line 1500, in convert_color_to_bp
base = decode_dic[base+ch]

KeyError: 'CN'

13780.227u 172.063s 1:03:32.11 365.9% 0+0k 0+0io 2pf+0w

What is this error about, please?
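The traceback ends in a dictionary lookup keyed on the previous base plus the next color character, so `KeyError: 'CN'` suggests that a read contains a character other than 0-3 in its color string. The sketch below is an illustrative two-base decoder, not TopHat's actual code; the names `DECODE`, `color_to_bases` and `has_unknown_color` are made up for this example. It reproduces the same kind of failure and shows a simple check that could be used to find such reads.

```python
# Illustrative two-base (color-space) decoder; NOT TopHat's own code.
# Key: previous base + color digit. Value: the decoded next base.
DECODE = {
    "A0": "A", "A1": "C", "A2": "G", "A3": "T",
    "C0": "C", "C1": "A", "C2": "T", "C3": "G",
    "G0": "G", "G1": "T", "G2": "A", "G3": "C",
    "T0": "T", "T1": "G", "T2": "C", "T3": "A",
}


def color_to_bases(read):
    """Decode a csfasta-style read: one primer base followed by colors."""
    base = read[0]
    decoded = []
    for ch in read[1:]:
        # Any character outside 0-3 has no entry and raises KeyError.
        base = DECODE[base + ch]
        decoded.append(base)
    return "".join(decoded)


def has_unknown_color(read):
    """Return True if the color string contains anything other than 0-3."""
    return any(ch not in "0123" for ch in read[1:])
```

Calling `color_to_bases("C0N1")` fails on the key `'CN'`, the same key reported in the traceback above, while `has_unknown_color("C0N1")` flags the read before decoding.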

**Thomas Doktor** · 11-19-2010, 04:56 AM

Have you tried with TopHat version 1.1.4? It is possible the bug you encountered has been fixed in the recent releases.

**DavidMatthewsBristol** · 11-30-2010, 07:45 AM

Cleaning up a combined gtf file

Originally posted by Cole Trapnell

So the basic workflow we recommend is:

1) Assemble each sample with cufflinks
2) Run cuffcompare on the sample transfrags all at the same time, providing a reference annotation if you want to classify your transfrags according to known, novel, etc.
3) Give the stdout.combined.gtf to cuffdiff, along with your original SAM alignments from the samples. Cuffdiff will re-estimate the abundances of the transfrags in the GTF using the alignments in each sample, and do the differential expression testing at the same time.

Optionally, you may wish to clean up the stdout.combined.gtf before running cuffdiff, to remove partial transfrags that resulted from low depth of sequencing coverage in one of the samples. We like to perform differential testing only on transcripts that are either already known to annotation or that we've assembled in two different samples independently.

d.

Hi Cole,

In this post you mention cleaning up the combined gtf file - can you (or anyone else) be more specific on what flags we should filter the file on? When I use a combined gtf file with cuffdiff I end up with the same gene, same co-ordinates etc but different XLOC number reported many times in the gene expression file (currently running it through Galaxy). I assume this is in part because I've not cleaned the combined gtf file.

Cheers
David

**dnusol** · 02-11-2011, 12:16 AM

Hi, has anyone had any luck filtering the combined GTF file to remove partial transfrags? How can these be detected?

D.
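One possible way to apply the rule quoted above (keep transcripts that are known to the annotation or were assembled in two samples independently) is to read the cuffcompare tracking file alongside the combined GTF. The sketch below is only a rough illustration under several assumptions: that the tracking file is tab-separated with the transfrag id in column 1, the class code in column 4 and one column per sample from column 5 onward (`-` meaning absent), that class code `=` marks a match to the reference, and that `transcript_id` in the combined GTF uses the same ids. The function names are hypothetical.

```python
import re

# Matches the transcript_id attribute in a GTF attribute column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def keep_ids_from_tracking(lines, min_samples=2, known_codes=("=",)):
    """Collect transfrag ids that are known or seen in >= min_samples samples.

    Assumed layout per line (tab-separated):
    transfrag_id, locus_id, reference_id, class_code, sample_1, sample_2, ...
    """
    keep = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed or empty lines
        transfrag_id, class_code = fields[0], fields[3]
        per_sample = fields[4:]
        n_present = sum(1 for entry in per_sample if entry.strip() != "-")
        if class_code in known_codes or n_present >= min_samples:
            keep.add(transfrag_id)
    return keep


def filter_gtf(lines, keep):
    """Yield only the GTF lines whose transcript_id is in the keep set."""
    for line in lines:
        match = TRANSCRIPT_ID.search(line)
        if match and match.group(1) in keep:
            yield line
```

In practice the two functions would be fed open file handles for the tracking file and the combined GTF, and the filtered lines written to a new GTF for cuffdiff. Whether this also removes the repeated XLOC entries described earlier in the thread is not something this sketch can promise.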

**Tani** · 03-16-2011, 06:24 PM

Originally posted by Lesley

Thanks for the info on the reference gtf. I downloaded both fasta and gtf from ensembl and ran into the chr problem. However, now when I run the cuffcompare on the reference genome I get tss_ids but no p_ids and the original gtf has CDS information.

I also had the following error when running cuffcompare on cufflinks output and the fixed gtf file that I guess has something to do with the cufflinks gtf files since there are two of them.

Warning: found 26695 transcripts with undetermined strand.
Warning: found 44851 transcripts with undetermined strand.

Cuffcompare then exits.

Any help on moving forward with cufflinks will be greatly appreciated.

Cheers,
Lesley

Hi Lesley,
I am running into the same issue. However, I could get p_ids using the -s option, but I still could not get tss_ids.
I would appreciate any advice.
Thanks

**Tani** · 03-16-2011, 06:36 PM

Originally posted by Cole Trapnell

Without tss_id and p_id attributes, Cufflinks will simply test for differential expression of transcripts and genes. You can attach these attributes to your own GTF file, but for convenience, cuffcompare now outputs a single file containing the "union" of all the assembled transfrags you give it. So the basic workflow we recommend is:

1) Assemble each sample with cufflinks
2) Run cuffcompare on the sample transfrags all at the same time, providing a reference annotation if you want to classify your transfrags according to known, novel, etc.
3) Give the stdout.combined.gtf to cuffdiff, along with your original SAM alignments from the samples. Cuffdiff will re-estimate the abundances of the transfrags in the GTF using the alignments in each sample, and do the differential expression testing at the same time.

Optionally, you may wish to clean up the stdout.combined.gtf before running cuffdiff, to remove partial transfrags that resulted from low depth of sequencing coverage in one of the samples. We like to perform differential testing only on transcripts that are either already known to annotation or that we've assembled in two different samples independently.

As far as how cuffcompare assigns p_id and tss_id attributes:

* p_id is assigned just using the CDS records in the reference GTF. If there are no CDS records, there will be no p_ids. Similarly, if you run cuffcompare without a reference annotation along with your sample assemblies, there will be no p_id attributes in stdout.combined.gtf
* tss_id is assigned based on where the 5' ends of the transfrags are: two transcripts on the same strand that share bases have the same TSS iff their 5' ends start within 100bp of each other. This threshold is chosen based on our observation that depth of sequencing doesn't always reach to the end of the true transcript on either end. You can change it with the -d option (which I just realized is not listed in the manual - I will update it).
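The tss_id rule described above can be sketched roughly as follows. This is not cuffcompare's implementation: the `Transcript` tuple and function names are invented for illustration, "share bases" is simplified to overlapping transcript spans on the same chromosome, and treating a distance of exactly 100bp as "within" is an assumption.

```python
from collections import namedtuple

# Hypothetical minimal transcript record: 1-based inclusive coordinates.
Transcript = namedtuple("Transcript", ["chrom", "start", "end", "strand"])


def five_prime_end(tx):
    """The 5' end is the start on the + strand and the end on the - strand."""
    return tx.start if tx.strand == "+" else tx.end


def share_bases(a, b):
    """Simplification: the transcript spans overlap on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def same_tss(a, b, max_dist=100):
    """Same strand, overlapping, and 5' ends within max_dist bases."""
    if a.strand != b.strand or not share_bases(a, b):
        return False
    return abs(five_prime_end(a) - five_prime_end(b)) <= max_dist
```

For example, two plus-strand transcripts starting at 1000 and 1080 would be grouped under one TSS, while one starting at 1500 would not. Passing a different `max_dist` plays the role of the -d option mentioned in the quote.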

All this is to say that if you're hoping to just use a reference GTF with cuffdiff, you'll need to add those p_id and tss_id attributes yourself. You can do this with cuffcompare too, using a little hack:

cuffcompare -r reference.gtf reference.gtf reference.gtf

This will spit out a version of reference.gtf in stdout.combined.gtf that has the p_id and tss_id attributes attached.

Hi Cole,
I do get the p_ids by using your trick, but I am still not able to get the tss_ids.
Please suggest a way forward.
Cheers
Tani

**vinay052003** · 04-04-2012, 10:09 AM

Hi Cole,
I am aware of the Jensen-Shannon metric for the detection of differential splicing. It is nicely described in your Cufflinks paper, but I am still not clear on how to calculate a p-value for it. What I understood from the supplementary material of the paper is that "asymptotic" values are calculated for the JS metric, but I am not sure exactly how to calculate them. It would be a great help if you could shed some more light on that, since I am trying to implement it and include it in my analysis scripts.
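For reference, the metric itself (the square root of the Jensen-Shannon divergence between two relative-abundance distributions) is easy to compute. The sketch below covers only that part and does not attempt the p-value calculation asked about here; the choice of base-2 logarithms is an assumption, and the function names are illustrative.

```python
import math


def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence between p and q.

    p and q are relative isoform abundances that each sum to 1.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = entropy(m) - (entropy(p) + entropy(q)) / 2
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))
```

With base-2 logarithms the value ranges from 0 for identical distributions to 1 for distributions with no isoforms in common.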

**Kittykat22** · 04-20-2012, 09:31 PM

Divide by 0 error?

Hi everyone,
I tried to run cuffdiff as shown in the most recent paper (Trapnell, 2012). I am running cufflinks 1.1.0. This was my command:

cuffdiff -o mouse_diff_out -b genome.fa -p 8 -L KO,WT -u merged_asm/merged.gtf \
./KO1_thout/accepted_hits.bam,./KO2_thout/accepted_hits.bam,./KO3_thout/accepted_hits.bam \
./WT1_thout/accepted_hits.bam,./WT2_thout/accepted_hits.bam,./WT3_thout/accepted_hits.bam

It has worked this way with another sample set before, but this time it came up with an error (which I believe is a divide by 0 error...).

[15:05:06] Inspecting maps and determining fragment length distributions.
> Map Properties:
> Total Map Mass: 6136.40
> Number of Multi-Reads: 2847 (with 7697 total hits)
> Read Type: 0bp single-end
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
> Map Properties:
> Total Map Mass: 7789.56
> Number of Multi-Reads: 4369 (with 14182 total hits)
> Read Type: 0bp single-end
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
> Map Properties:
> Total Map Mass: 691124.82
> Number of Multi-Reads: 653163 (with 2156382 total hits)
> Read Type: 0bp single-end
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
> Map Properties:
> Total Map Mass: 546.92
> Number of Multi-Reads: 213 (with 629 total hits)
> Read Type: 0bp single-end
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
[15:05:28] Modeling fragment count overdispersion.
> Map Properties:
> Total Map Mass: 6435.42
> Number of Multi-Reads: 4202 (with 12421 total hits)
> Read Type: 0bp single-end
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
> Map Properties:
> Total Map Mass: 328384.74
> Number of Multi-Reads: 190518 (with 592361 total hits)
> Read Type: 0bp single-end
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
[15:05:46] Modeling fragment count overdispersion.
[15:05:46] Calculating initial abundance estimates for bias and multi-read correction.
> Processed 13207 loci. [*************************] 100%
[15:08:30] Learning bias parameters.
[15:08:58] Testing for differential expression and regulation in locus.
> Processing Locus 1:25124320-25886552 [ ] 0%
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::domain_error> >'
what(): Error in function boost::math::pdf(const normal_distribution<d>&, d): Random variate x is nan, but must be finite!
Abort

Does anyone know what causes this?
K

**Dario1984** · 04-22-2012, 04:00 PM

This was fixed in a later release. Use version 1.3.0.

**Kittykat22** · 04-27-2012, 06:15 PM

Thanks for your response! That solved the problem:-)

**g781** · 02-04-2013, 01:32 AM

Hi everyone,

I was running Cufflinks and got the error message below:

Code:

$ cufflinks -p 8 -M ./ref/tb427.mask.gff -g ./ref/tb427.genes.gff -s ./ref/Bowtie2index/tb427.genome -u -o ./KO/PCF_427_WT.th.cl ./KO/PCF_427_WT.th/accepted_hits.bam
You are using Cufflinks v2.0.2, which is the most recent release.
[16:34:33] Loading reference annotation.
[16:34:34] Loading reference annotation.
[16:34:34] Inspecting reads and determining fragment length distribution.
> Processed 1930 loci.                         [*************************] 100%
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::domain_error> >'
  what():  Error in function boost::math::normal_distribution<d>::normal_distribution: Scale parameter is 0, but must be > 0 !

The RNA-seq data are two replicates from T. brucei. The tb427.genome and tb427.genes.gff files were downloaded from TriTrypDB. To speed up the assembly, tb427.mask.gff lists the regions I would like to exclude from assembly, i.e. everything except the CDS and exon regions.

I did some searching. It seems other people have had the same problem, but for them it occurred when running cuffdiff. As far as I know, that has been fixed by Cole.

I have no idea what went wrong in my case. Does anyone know what's going on and what I should do?

Thanks.

**g781** · 02-04-2013, 12:42 PM

I am sorry for confusing everyone.
I solved this problem myself; it was caused by a mistyped parameter.

Code:

cufflinks -p 8 -M ./ref/tb427.mask.gff -g ./ref/tb427.genes.gff -s ./ref/Bowtie2index/tb427.genome -u -o ./KO/PCF_427_WT.th.cl ./KO/PCF_427_WT.th/accepted_hits.bam

changed to

cufflinks -p 8 -M ./ref/tb427.mask.gff -g ./ref/tb427.genes.gff -b ./ref/Bowtie2index/tb427.genome.fa -u -o ./KO/PCF_427_WT.th.cl ./KO/PCF_427_WT.th/accepted_hits.bam

However, I now get the warning message below:

Code:

Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct parameters (--frag-len-mean and --frag-len-std-dev) be provided.
> Map Properties:
>       Normalized Map Mass: 45525182.00
>       Raw Map Mass: 45525182.00
>       Fragment Length Distribution: Truncated Gaussian (default)
>                     Default Mean: 200
>                  Default Std Dev: 80

I ran TopHat on paired-end data, so I was expecting that Cufflinks could estimate the mean and s.d. from the paired-end reads. I don't think I should have to set these two parameters, since the Cufflinks manual says: "Note: Cufflinks now learns the fragment length mean for each SAM file, so using this option is no longer recommended with paired-end reads."

I've read some previous posts. One of the answers is that this may be caused by a wrong annotation. The annotation was downloaded from TriTrypDB and I didn't modify it (I only removed the FASTA section), so I don't expect the annotation to be the cause.

I think I have probably set some parameters wrongly. My TopHat parameters are below:

Code:

tophat -p 8 -G ./ref/Bowtie2index/tb427.genes.gff -o ./KO/PCF_427_WT.th ./ref/Bowtie2index/tb427.genome ./KO/PCF_427_WT1 ./KO/PCF_427_WT2

Could anyone explain this to me a little?
Thanks a lot.

**pengchy** · 04-23-2013, 10:07 PM

Hi all,

According to the description of the cuffdiff output file genes.fpkm_tracking, the value of *_FPKM should be larger than *_conf_lo and smaller than *_conf_hi. But in my results, 12257 of 91991 transcripts in one sample have an FPKM larger than both conf_lo and conf_hi. Is this normal?

Cufflinks version 2.1.1 was used.
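A quick way to reproduce a count like the one above is to scan the tracking file and compare each `*_FPKM` column with its matching `*_conf_lo` and `*_conf_hi` columns. The sketch below assumes a tab-separated file with a header row and column names built from a shared per-sample prefix, as described in the post; the function name is made up, and it only counts the rows without explaining why they occur.

```python
import csv


def count_outside_interval(lines):
    """Count rows per sample where FPKM is not within [conf_lo, conf_hi].

    `lines` is any iterable of text lines (an open file works), with a
    tab-separated header containing <sample>_FPKM, <sample>_conf_lo and
    <sample>_conf_hi columns.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    counts = {}
    for row in reader:
        for col in reader.fieldnames:
            if not col.endswith("_FPKM"):
                continue
            prefix = col[: -len("_FPKM")]
            lo_col, hi_col = prefix + "_conf_lo", prefix + "_conf_hi"
            if lo_col not in row or hi_col not in row:
                continue  # no matching interval columns for this sample
            fpkm = float(row[col])
            lo, hi = float(row[lo_col]), float(row[hi_col])
            if not (lo <= fpkm <= hi):
                counts[prefix] = counts.get(prefix, 0) + 1
    return counts
```

Used as `count_outside_interval(open("genes.fpkm_tracking"))`, it returns a dictionary mapping each sample prefix to the number of offending rows, which makes it easier to pull out a few examples to post alongside the question.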
