cufflinks funny output, scripture comparison

adarob replied

12-25-2010, 10:55 AM
@rgejman

This issue has been resolved in v0.9.3.
honey replied

12-22-2010, 01:44 PM
Does anyone know how I can plot RNA-seq differential expression results from Tophat, Cufflinks and Cuffdiff for visualization?
rgejman replied

11-15-2010, 07:47 AM
Has anyone resolved this issue? I am still seeing disconnected transcripts in many genes using tophat v1.1.2 and cufflinks 0.9.2. Tophat is run without the -G option, so de novo transcripts are found and I DO see reads connecting junctions that are then not reflected in the cufflinks transcripts.gtf output file.
cur replied

10-08-2010, 12:26 AM
How do you make refFlat_RefSeq.gff for mm9?

Can somebody tell me where to get refFlat_RefSeq.gff for mm9? I have found GFF3 files for each chromosome (reference assembly MGSCv37, mouse build 37.1). Do these correspond to mm9? If so, do you have to add a chromosome column (chr1, chr2, etc.) to each GFF3 file and then merge them all into one file?
Thanks
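
The merge step asked about above could be sketched roughly as follows. This is only an illustration under assumptions: it treats each per-chromosome file as standard 9-column tab-separated GFF3, it overwrites column 1 (the sequence ID) with a UCSC-style name such as chr1, and the function names, the accession `NC_example.1` and the tiny records in the demo are all hypothetical. UCSC describes mm9 as NCBI Build 37, so the coordinates should be compatible, but that is worth verifying against your own files.

```python
import io


def relabel_gff3(lines, chrom):
    """Yield GFF3 feature lines with column 1 (seqid) replaced by `chrom`.

    Blank lines and lines starting with '#' are dropped so that the merged
    file carries a single header written by the caller.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        fields[0] = chrom
        yield "\t".join(fields)


def merge_gff3(sources, out):
    """Write one merged GFF3 to `out` from (chrom, iterable_of_lines) pairs."""
    out.write("##gff-version 3\n")
    for chrom, lines in sources:
        for record in relabel_gff3(lines, chrom):
            out.write(record + "\n")


# Hypothetical two-file demo using in-memory text instead of real files.
chr1_lines = ["##gff-version 3\n",
              "NC_example.1\tRefSeq\texon\t100\t200\t.\t+\t.\tID=e1\n"]
chr2_lines = ["NC_example.2\tRefSeq\texon\t5\t50\t.\t-\t.\tID=e2\n"]
merged = io.StringIO()
merge_gff3([("chr1", chr1_lines), ("chr2", chr2_lines)], merged)
```

With real data, each `iterable_of_lines` can simply be an open file handle, and `out` an output file opened for writing.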
thinkRNA replied

06-30-2010, 05:32 PM
Originally posted by rcorbett View Post

If anyone is interested, Cole is working on a new version (0.8.3), that will improve these results.

Do you know when it will be out? Is it possible for him to release temporary fixes for the critical known bugs that have been reported?
rcorbett replied

06-30-2010, 10:47 AM
If anyone is interested, Cole is working on a new version (0.8.3), that will improve these results.
thinkRNA replied

06-25-2010, 01:42 PM
Originally posted by rcorbett View Post

I'm pretty jealous of your nice results! I have played with cufflinks quite a bit and haven't seen a decent transcript such as that in all of my data.

Is it possible I am not seeing such good results because I am using 50bp reads? I just don't know at this point. Certainly the tophat results show enough junction reads that cufflinks should be expected to put the transcript together correctly (after all, scripture does a fine job).

To show the BAM on UCSC you need to index it with samtools and then, as you suggest, upload it to a publicly viewable site. Then you just point the UCSC browser at your BAM and it works! If you are using Picasa, you can probably (though I'm not sure) host your BAM file on Google somewhere and point UCSC to that.

Unfortunately I have been looking at many genes and they all show exactly the same behaviour.

Can you tell me exactly what version of cufflinks you are using, and on what OS? For extra points I could share a small part of my sam file with you and would love to see if you get the same results on my data.

Trust me, I have had my share of bad luck with these programs. I am now stuck trying to make sense of the output and the tens of files they spit out. I used the 64-bit Linux version and, of course, the latest version of every program, given that this forum is full of bug reports for the older versions. It is bizarre that tophat is reporting those junctions but cufflinks is not connecting them. Email Cole Trapnell and just hope that he will reply.
rcorbett replied

06-25-2010, 12:52 PM
I'm pretty jealous of your nice results! I have played with cufflinks quite a bit and haven't seen a decent transcript such as that in all of my data.

Is it possible I am not seeing such good results because I am using 50bp reads? I just don't know at this point. Certainly the tophat results show enough junction reads that cufflinks should be expected to put the transcript together correctly (after all, scripture does a fine job).

To show the BAM on UCSC you need to index it with samtools and then, as you suggest, upload it to a publicly viewable site. Then you just point the UCSC browser at your BAM and it works! If you are using Picasa, you can probably (though I'm not sure) host your BAM file on Google somewhere and point UCSC to that.

Unfortunately I have been looking at many genes and they all show exactly the same behaviour.

Can you tell me exactly what version of cufflinks you are using, and on what OS? For extra points I could share a small part of my sam file with you and would love to see if you get the same results on my data.
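
The BAM display steps described in this post can be sketched as below. It is a hedged illustration, not the poster's exact procedure: it assumes samtools is on the PATH, that the BAM and the `.bai` index written by `samtools index` end up side by side at the same public URL, and that the browser is given a custom-track line of the `track type=bam ... bigDataUrl=...` form. The function names and the example URL are hypothetical.

```python
import subprocess


def index_bam(bam_path):
    """Run `samtools index`, which writes <bam_path>.bai next to the BAM."""
    subprocess.run(["samtools", "index", bam_path], check=True)


def bam_track_line(name, bam_url):
    """Return a UCSC custom-track line pointing the browser at a remote BAM.

    The index is expected at the same URL with '.bai' appended.
    """
    return f'track type=bam name="{name}" bigDataUrl={bam_url}'


# Hypothetical usage: index locally, upload both files, then paste this line
# into the UCSC custom track form.
# index_bam("s2_tophat/accepted_hits.bam")
track = bam_track_line("s2 tophat", "http://example.org/s2/accepted_hits.bam")
```

Only the track line is pasted into the browser; the BAM itself stays on the hosting server and is read in slices as you navigate.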
thinkRNA replied

06-25-2010, 12:33 PM
So, I looked at this same gene on UCSC along with the junctions from tophat, and fortunately I get the entire transcript connected.

I have 75bp reads sequenced to ~30 million depth.

I ran tophat with these parameters:
tophat -a 10 --coverage-search -p 4 -g 10 -G refFlat_RefSeq.gff -o s2_tophat mm9 ../s2.fastq

and cufflinks without the -G option.

Here is the UCSC image
http://picasaweb.google.com/priyamsi...08504614142194

I have not explored many other genes systematically, but the 5 or so I have looked at so far are connected well.
I don't understand why there are two CUFF ids (CUFF.204951 and CUFF.204952) with such different FPKM and coverage values! Is the only difference between the two that one is 3 bases longer?

~/tophat/S2$ grep "Insr" transcripts.tmap
Insr ENSMUST00000091291 p CUFF.203963 CUFF.203963.1 100 1.082117 0.000000 2.612462 1.685393 89 CUFF.203963.1
Insr ENSMUST00000091291 p CUFF.203965 CUFF.203965.1 100 0.687917 0.000000 1.482256 1.071429 210 CUFF.203965.1
Insr ENSMUST00000139504 c CUFF.203967 CUFF.203967.1 100 2.363397 0.692223 4.034571 3.680982 163 CUFF.203967.1
Insr ENSMUST00000139504 j CUFF.204951 CUFF.204951.1 44 5.694980 1.526815 9.863144 8.869908 9073 CUFF.204952.2
Insr ENSMUST00000139504 j CUFF.204952 CUFF.204952.2 100 13.057124 8.871707 17.242540 20.336418 9076 CUFF.204952.2
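
One way to compare the two long Insr transcripts is to parse the rows and look at their FPKM confidence intervals and lengths side by side. The sketch below is illustrative only: the column names are an assumption taken from the cuffcompare description of .tmap files (ref_gene_id, ref_id, class_code, cuff_gene_id, cuff_id, FMI, FPKM, FPKM_conf_lo, FPKM_conf_hi, cov, len, major_iso_id), and the helper names are hypothetical.

```python
TMAP_COLUMNS = [
    "ref_gene_id", "ref_id", "class_code", "cuff_gene_id", "cuff_id", "FMI",
    "FPKM", "FPKM_conf_lo", "FPKM_conf_hi", "cov", "len", "major_iso_id",
]  # assumed column order, see the note above

FLOAT_COLUMNS = ("FMI", "FPKM", "FPKM_conf_lo", "FPKM_conf_hi", "cov")


def parse_tmap_line(line):
    """Split one whitespace-separated .tmap row into a dict with typed values."""
    fields = line.split()
    if len(fields) != len(TMAP_COLUMNS):
        raise ValueError(f"expected {len(TMAP_COLUMNS)} fields: {line!r}")
    record = dict(zip(TMAP_COLUMNS, fields))
    for key in FLOAT_COLUMNS:
        record[key] = float(record[key])
    record["len"] = int(record["len"])
    return record


def intervals_overlap(a, b):
    """True if the FPKM confidence intervals of two records overlap."""
    return (a["FPKM_conf_lo"] <= b["FPKM_conf_hi"]
            and b["FPKM_conf_lo"] <= a["FPKM_conf_hi"])


# The two class-code 'j' rows copied from the grep output above.
row_51 = parse_tmap_line(
    "Insr ENSMUST00000139504 j CUFF.204951 CUFF.204951.1 44 5.694980 "
    "1.526815 9.863144 8.869908 9073 CUFF.204952.2")
row_52 = parse_tmap_line(
    "Insr ENSMUST00000139504 j CUFF.204952 CUFF.204952.2 100 13.057124 "
    "8.871707 17.242540 20.336418 9076 CUFF.204952.2")

length_difference = row_52["len"] - row_51["len"]
same_range = intervals_overlap(row_51, row_52)
```

For these two rows the lengths differ by 3 and the confidence intervals overlap (8.87 to 9.86), so the point estimates differ by more than the intervals alone would suggest.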

How did you get your BAM file to view on UCSC? Did you just upload your BAM file to an https server? I don't have access to a server, so I doubt I can upload it.

Maybe you should check whether tophat is picking up those junctions for this gene. Given your image, though, I can already see that a lot of your reads are crossing junctions. You should also look systematically to see how many genes exhibit this behavior; you may just be unlucky with this one.
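
A systematic version of that check could look like the sketch below: collect the introns implied by junctions.bed and the introns implied by consecutive exons in transcripts.gtf, then list the junctions that no assembled transcript contains. This is a rough illustration under stated assumptions: junctions.bed is taken to be BED12 with two blocks per junction, so that the intron runs from chromStart + blockSizes[0] to chromStart + blockStarts[1]; the GTF is taken to have `exon` rows with a `transcript_id "..."` attribute; and all function names are hypothetical.

```python
import re
from collections import defaultdict


def junctions_from_bed(lines):
    """Return {(chrom, intron_start, intron_end)} in 0-based half-open coords."""
    junctions = set()
    for line in lines:
        if not line.strip() or line.startswith(("track", "#")):
            continue
        f = line.rstrip("\n").split("\t")
        chrom, start = f[0], int(f[1])
        sizes = [int(x) for x in f[10].rstrip(",").split(",")]
        starts = [int(x) for x in f[11].rstrip(",").split(",")]
        junctions.add((chrom, start + sizes[0], start + starts[1]))
    return junctions


def introns_from_gtf(lines):
    """Return introns between consecutive exons of each transcript in a GTF."""
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if f[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', f[8])
        if match is None:
            continue
        exons[(f[0], match.group(1))].append((int(f[3]), int(f[4])))
    introns = set()
    for (chrom, _transcript), spans in exons.items():
        spans.sort()
        for (_, end1), (start2, _) in zip(spans, spans[1:]):
            # GTF is 1-based inclusive; convert the gap to 0-based half-open.
            introns.add((chrom, end1, start2 - 1))
    return introns


def unassembled_junctions(bed_lines, gtf_lines):
    """Junctions seen by the aligner but absent from every assembled transcript."""
    return junctions_from_bed(bed_lines) - introns_from_gtf(gtf_lines)
```

Counting the returned set per gene region would show whether the disconnection is widespread or limited to a few loci. Strand is ignored here for brevity.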
rcorbett replied

06-25-2010, 07:04 AM
Hi thinkRNA,

To run cufflinks, I used entirely default parameters. I used the pre-compiled 0.8.2 beta version for 64bit linux. I didn't provide a gtf of reference exons because I wanted to test the "de-novo" transcript assembly. I think that cufflinks should do this well according to the paper....

"High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks."

The intronic reads are more or less to be expected. The number of intronic reads varies with the genes, libraries, and disease type that we study. The disconnectivity of the identified transcripts is prevalent throughout my data set, in genes with high and low intronic levels.

To load the cufflinks output into UCSC you can just take your transcripts.gtf output file and load it directly as a custom track.

I would be interested to hear how your data performs with this software.

thanks!
thinkRNA replied

06-24-2010, 03:42 PM
Originally posted by rcorbett View Post

Hi all,

I have 50bp paired illumina reads which I have aligned with tophat (default parameters).
The alignments look reasonable in IGV, or UCSC browser.

I have run scripture on the tophat output, and I get a list of isoforms that look reasonable, if not verbose.
However, when I run cufflinks I get very spotty connectivity.

I am trying to attach a screenshot, which shows at the top, the split alignments of tophat, then the predicted transcripts of scripture, and below that, before the reference gene annotations there is the cufflinks output.

Has anyone else seen cufflinks output that is disconnected like this? Any ideas on how to improve the results?
I have run scripture, and cufflinks on the same file.

(the screenshot attempt didn't work out)
The image I tried to attach has been posted here instead:
http://www.bcgsc.ca/downloads/rnaSeq...f3f_22ffe0.gif

What parameters did you use for running cufflinks/cuffcompare? Could it be that you are filtering out a number of reads based on some parameter, i.e. did you provide a -G file? If yes, your GTF file could be missing exon junctions.

Also, I noticed that a lot of reads are landing in intronic regions; is that to be expected?
Finally, can you please tell me which file you used to get the cuff.23.1 track on UCSC? I would like to see if I get similar dis-connectivity in my data.
rcorbett started a topic cufflinks funny output, scripture comparison

06-23-2010, 08:06 AM
cufflinks funny output, scripture comparison
Hi all,

I have 50bp paired illumina reads which I have aligned with tophat (default parameters).
The alignments look reasonable in IGV, or UCSC browser.

I have run scripture on the tophat output, and I get a list of isoforms that look reasonable, if not verbose.
However, when I run cufflinks I get very spotty connectivity.

I am trying to attach a screenshot, which shows at the top, the split alignments of tophat, then the predicted transcripts of scripture, and below that, before the reference gene annotations there is the cufflinks output.

Has anyone else seen cufflinks output that is disconnected like this? Any ideas on how to improve the results?
I have run scripture, and cufflinks on the same file.

(the screenshot attempt didn't work out)
The image I tried to attach has been posted here instead:

http://www.bcgsc.ca/downloads/rnaSeq_test/hgt_genome_test_4f3f_22ffe0.gif

Attached Files

hgt_genome_test_4f3f_22ffe0.jpg (9.0 KB, 202 views)
Last edited by rcorbett; 06-23-2010, 08:12 AM. Reason: screenshot too small to see
Tags: cufflinks, scripture, tophat
