
**Cole Trapnell** · 10-20-2009, 01:40 PM

I know you emailed me some files already, but I just want to point out that posting SAM to a forum like this is not all that helpful due to linebreaks, etc. It's much better to host them on the web somewhere, and post a link instead.

I'm pretty sure I know where the problem is in Cufflinks, and I believe it's a simple fix. However, I'm traveling this week and have limited access to email and time to fix bugs, so I may not get to this for a few days. Thanks for your patience.

**marvin.j** · 10-21-2009, 04:39 AM

Originally posted by Cole Trapnell

[...]
As noted above, exons need to be attached to their parent transcripts, but through the transcript_id attribute, not the ID/Parent tree.

Thank you very much for the clarification! I'll try to make the SAM (ca. 2 GB) and GTF available to you. Meanwhile I'll experiment with sanitizing the RefSeq annotations before feeding them into cufflinks... Enjoy your travel.

-Marvin

**marvin.j** · 10-21-2009, 06:51 AM

RefSeq cleanup helped!

Just to let you know: Curating the RefSeq output with a little script resolved the crash reported earlier.

The script adds suffixes to RefSeq transcript IDs which refer to more than one genomic locus. The output is then a GTF with only exons, linked together by their (now unique) transcript_ids and supplemented with a gene_id as well (RefSeq.name2 aka gene name).

If someone is interested I can clean the code up (python) and post it here.
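For readers who want to try the cleanup described above before the original script is posted, here is a minimal sketch. It is not marvin.j's code. It assumes a tab-separated GTF with nine columns, treats each (chromosome, strand) pair as a separate locus, and also splits a transcript when two consecutive exons are further apart than `max_gap`. The function name `uniquify_transcripts`, the `gene_names` mapping (transcript ID to gene name, e.g. taken from the refGene `name2` column), the `max_gap` default, and the `_dupN` suffix are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict

TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')

def uniquify_transcripts(gtf_lines, gene_names=None, max_gap=2_000_000):
    """Return exon-only GTF lines in which every transcript_id maps to one locus.

    gene_names: optional dict transcript_id -> gene name (assumption: built
    separately, e.g. from the refGene table). Falls back to the transcript ID.
    max_gap: exons on the same chromosome and strand that are further apart
    than this are treated as separate loci (a heuristic, not a standard).
    """
    gene_names = gene_names or {}
    exons = defaultdict(list)  # (transcript_id, chrom, strand) -> exon records
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # keep exons only; drop CDS, start_codon, etc.
        match = TRANSCRIPT_RE.search(fields[8])
        if match is None:
            continue
        exons[(match.group(1), fields[0], fields[6])].append(fields)

    # Split each transcript into loci: one per chromosome/strand, and
    # additionally wherever consecutive exons are more than max_gap apart.
    loci = defaultdict(list)  # transcript_id -> list of exon groups
    for key in sorted(exons):
        tid = key[0]
        records = sorted(exons[key], key=lambda f: int(f[3]))
        current = [records[0]]
        for rec in records[1:]:
            if int(rec[3]) - int(current[-1][4]) > max_gap:
                loci[tid].append(current)
                current = [rec]
            else:
                current.append(rec)
        loci[tid].append(current)

    out = []
    for tid, groups in loci.items():
        gene = gene_names.get(tid, tid)
        for n, group in enumerate(groups, start=1):
            # Only add a suffix when the ID really occurs at several loci.
            new_id = tid if len(groups) == 1 else f"{tid}_dup{n}"
            attrs = f'gene_id "{gene}"; transcript_id "{new_id}";'
            for fields in group:
                out.append("\t".join(fields[:8] + [attrs]))
    return out
```

To use it on a file, read the lines with `open("refGene.gtf")` (a placeholder file name), pass them to `uniquify_transcripts`, and write the returned lines to a new GTF, one per line.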

**bioinfosm** · 10-22-2009, 10:50 AM

Aborted message from cufflinks

Originally posted by Cole Trapnell

One thing you could try is to increase the value of the collapse-rounds option from its default of one. Each additional bump should cut the memory use in bundles like this roughly in half (up to a certain point). It carries some risk that Cufflinks will misassemble things if you set it too high, but 2 or 3 should certainly be safe (at least it is in my experience).

Ok, I removed the entire set of reads that were mapping to this particular bundle, but it still crashed. Then I used the -c 2 option as you suggest, but the same Aborted message.

Any comments?

$ ../cufflinks-0.7.0.Linux_x86_64/cufflinks -p 2 -c 2 berg_42R7WAAXX_300164_41.lane1/accepted_hits.sam
...
Processing bundle [ gi|89161220|ref|NC_000024.8|NC_000024:57769933-57769983 ] with 1 non-redundant alignments
Processing bundle [ gi|89161220|ref|NC_000024.8|NC_000024:57770041-57770125 ] with 3 non-redundant alignments
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc
Aborted

**Cole Trapnell** · 10-22-2009, 12:29 PM

Originally posted by bioinfosm

Ok, I removed the entire set of reads that were mapping to this particular bundle, but it still crashed. Then I used the -c 2 option as you suggest, but the same Aborted message.

Any comments?

$ ../cufflinks-0.7.0.Linux_x86_64/cufflinks -p 2 -c 2 berg_42R7WAAXX_300164_41.lane1/accepted_hits.sam
...
Processing bundle [ gi|89161220|ref|NC_000024.8|NC_000024:57769933-57769983 ] with 1 non-redundant alignments
Processing bundle [ gi|89161220|ref|NC_000024.8|NC_000024:57770041-57770125 ] with 3 non-redundant alignments
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc
Aborted

Well this may be good news in a way, because those bundles are tiny, so this could be a simple bug in the SAM parser or something of that ilk, rather than Cufflinks exhausting memory. Can you send me (by email at [email protected]) a snippet of your SAM file that reproduces this crash? I'll throw it in the tracker and get to it this weekend when I get back from my trip.

**RockChalkJayhawk** · 10-26-2009, 08:41 AM

Cufflinks

Originally posted by marvin.j

Just to let you know: Curating the RefSeq output with a little script resolved the crash reported earlier.

The script adds suffixes to RefSeq transcript IDs which refer to more than one genomic locus. The output is then a GTF with only exons, linked together by their (now unique) transcript_ids and supplemented with a gene_id as well (RefSeq.name2 aka gene name).

If someone is interested I can clean the code up (python) and post it here.

I'm interested. Heck, I'll even take the verbose code (It helps me to figure out what's going on)!

**seqfast** · 11-12-2009, 07:29 AM

ditto memory errors. running without GTF file, 16GB memory, ~10M pairs.

Thanks for this awesome package ...

**bioinfosm** · 11-16-2009, 09:38 AM

I managed to run cufflinks and obtain the genes.expr file. But how do I annotate it with gene IDs etc. from this information? The coordinates do not match anything on UCSC.

$ head genes.expr
gene_id bundle_id chr left right bundle_fraction density RPKM
CUFF.1 725391 gi|13626247|ref|NT_025975.2|HsY_26131 350 400 2.65651
CUFF.10 725573 gi|13626247|ref|NT_025975.2|HsY_26131 55703 55835 0.503126
CUFF.13 725579 gi|13626247|ref|NT_025975.2|HsY_26131 56414 56521 1.86204
CUFF.15 725581 gi|13626247|ref|NT_025975.2|HsY_26131 56698 56748 3.98476

**bioinfosm** · 11-16-2009, 11:45 AM

Seems I could use cuffcompare, but I am confused about the reference I am using (from the TopHat website) and which GTF file to download for use in cuffcompare.

**Cole Trapnell** · 11-16-2009, 12:19 PM

Originally posted by bioinfosm

Seems I could use cuffcompare, but I am confused about the reference I am using (from the TopHat website) and which GTF file to download for use in cuffcompare.

We certainly intend for you to use cuffcompare - it was built to do exactly what you want. As for which reference to use, that's up to you. You could try Ensembl first to get a feel for how cuffcompare works and how to parse its output. If the manual is unclear on how to interpret cuffcompare output, please feel free to ask questions here.

One important thing about using cuffcompare is that the chromosome names in whatever reference GTF file you use must match the chromosome names in your Cufflinks output, which of course come from your SAM input.
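The name-matching requirement above can be checked before running cuffcompare. The following is a minimal sketch, not part of Cufflinks: it collects the first column of a reference GTF and the reference names in a SAM file (the `SN:` tag of `@SQ` header lines plus the RNAME column of aligned reads) and reports which names are shared. The function names `gtf_seqnames`, `sam_seqnames`, and `compare_seqnames` are hypothetical.

```python
def gtf_seqnames(lines):
    """Collect the sequence names (column 1) from GTF lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names

def sam_seqnames(lines):
    """Collect reference names from SAM lines.

    Uses the SN: tag of @SQ header lines and the RNAME field (column 3)
    of alignment lines; '*' marks an unaligned read and is skipped.
    """
    names = set()
    for line in lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                for tag in line.rstrip("\n").split("\t")[1:]:
                    if tag.startswith("SN:"):
                        names.add(tag[3:])
            continue
        fields = line.split("\t")
        if len(fields) > 2 and fields[2] != "*":
            names.add(fields[2])
    return names

def compare_seqnames(reference_gtf_lines, sam_lines):
    """Report which sequence names the reference GTF and the SAM share."""
    ref = gtf_seqnames(reference_gtf_lines)
    sam = sam_seqnames(sam_lines)
    return {
        "shared": ref & sam,
        "only_in_reference": ref - sam,
        "only_in_sam": sam - ref,
    }
```

If `shared` comes back empty, for example because the SAM uses names like `gi|89161220|ref|NC_000024.8|NC_000024` while the GTF uses `Y` or `chrY`, one of the two files needs its names rewritten before the comparison can be meaningful.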

**bioinfosm** · 11-20-2009, 03:12 PM

So I have these gene counts / exon counts results from Illumina's Genome Studio tool. They use the refseq annotation to obtain these read counts for the known genes.

I would like to compare these with the data generated by Cufflinks. I was hoping to use Homo_sapiens.NCBI36.52.gtf in cuffcompare with the Cufflinks results for my FASTQ reads, and obtain the respective counts.

Could you help me with obtaining the number of reads mapping to genes/transcripts/exons using tophat-cufflinks-cuffcompare combo?

Thanks.

**sjm** · 11-24-2009, 05:17 PM

Originally posted by marvin.j

Just to let you know: Curating the RefSeq output with a little script resolved the crash reported earlier.

The script adds suffixes to RefSeq transcript IDs which refer to more than one genomic locus. The output is then a GTF with only exons, linked together by their (now unique) transcript_ids and supplemented with a gene_id as well (RefSeq.name2 aka gene name).

If someone is interested I can clean the code up (python) and post it here.

marvin.j - that sounds like a very helpful script for working with UCSC GTF files. If you have time to post the code, I'm sure many of us would be grateful.

**bekkari** · 12-10-2009, 02:16 PM

"Segmentation fault" error with cufflinks

Hi,
Has anyone figured out how to fix the segmentation fault that occurs when running cufflinks? It occurs only when I use an annotation file in GTF format. I am downloading the GTF file from UCSC; it contains lines like those below. I'd really appreciate any help I could get here.
Thanks

chr1 canFam2_refGene exon 16743049 16743195 0.000000 + . gene_id "NM_001002949"; transcript_id "NM_001002949";
chr1 canFam2_refGene start_codon 16743704 16743706 0.000000 + . gene_id "NM_001002949"; transcript_id "NM_001002949";
chr1 canFam2_refGene CDS 16743704 16743859 0.000000 + 0 gene_id "NM_001002949"; transcript_id "NM_001002949";
chr1 canFam2_refGene exon 16743422 16743859 0.000000 + . gene_id "NM_001002949"; transcript_id "NM_001002949";
chr1 canFam2_refGene CDS 16743943 16744269 0.000000 + 0 gene_id "NM_001002949"; transcript_id "NM_001002949";

**Boel** · 02-18-2010, 06:51 AM

Originally posted by marvin.j

Just to let you know: Curating the RefSeq output with a little script resolved the crash reported earlier.

The script adds suffixes to RefSeq transcript IDs which refer to more than one genomic locus. The output is then a GTF with only exons, linked together by their (now unique) transcript_ids and supplemented with a gene_id as well (RefSeq.name2 aka gene name).

If someone is interested I can clean the code up (python) and post it here.

marvin.j, I would also be very grateful if you would post this python script somewhere!

**Bingy** · 03-18-2010, 07:14 AM

script wanted

Originally posted by marvin.j

Just to let you know: Curating the RefSeq output with a little script resolved the crash reported earlier.

The script adds suffixes to RefSeq transcript IDs which refer to more than one genomic locus. The output is then a GTF with only exons, linked together by their (now unique) transcript_ids and supplemented with a gene_id as well (RefSeq.name2 aka gene name).

If someone is interested I can clean the code up (python) and post it here.

Hi marvin.j,
I have the same problem when running cuffcompare with refGene data. Could you send the script to me ([email protected])? Thanks!
