**honey** · 01-24-2011, 04:42 PM

gene level

For gene-level results, run TopHat with an Ensembl or refFlat GTF file.

**Rachelly** · 02-23-2011, 12:59 AM

Cole's answer

I consulted Cole on this matter and this was his reply:

Actually, you won't see those IDs in the genes.fpkm_tracking (or, IIRC, the tss_group.fpkm_tracking) files, because as far as Cufflinks is concerned, genes and TSS groups are *sets* of transcripts. Each transcript in a gene could have a different nearest reference transcript, so we don't put anything in that field.
However, the way we recommend doing what (I think) you want here is to use the gene_name attribute. If you compare to a reference file that has gene_name attributes, they will get propagated to the stdout.combined.gtf file from cuffcompare. Ensembl has the gene_name attributes already built in (and the values are typically the HUGO names in the case of human), but you could add them to your reference if they're not there already.
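A minimal sketch of the last step in that reply, adding gene_name attributes to a reference GTF that lacks them. The function name `add_gene_name` and the `names` mapping (gene_id to gene name) are hypothetical; where the mapping comes from is not covered in this thread.

```python
import re

# Matches the gene_id attribute in the ninth (attributes) GTF column.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)";')

def add_gene_name(gtf_line, names):
    """Append gene_name to one GTF line if it is missing and a name is known.

    `names` is a hypothetical dict mapping gene_id -> gene name.
    """
    line = gtf_line.rstrip()
    # Leave comments and records that already carry gene_name untouched.
    if line.startswith("#") or "gene_name " in line:
        return line
    match = GENE_ID_RE.search(line)
    if match is None or match.group(1) not in names:
        return line
    if not line.endswith(";"):
        line += ";"
    return f'{line} gene_name "{names[match.group(1)]}";'
```

Applied line by line to the reference, this leaves records with an existing gene_name, or with an unknown gene_id, unchanged.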

**greener** · 03-08-2011, 04:49 PM

Originally posted by Rachelly:

I consulted Cole on this matter and this was his reply:

Hi Rachelly, I seem to be having the same problem. My Cuffdiff output does not contain gene names. Could you post an example of a reference file that worked and the commands you ran that worked? I tried rerunning cuffcompare with an Ensembl annotation that contained gene_name attributes, but that did not seem to work. An excerpt from my Ensembl annotation file:

11 pseudogene exon 86649 87586 . - . gene_id "ENSG00000224777"; transcript_id "ENST00000424047"; exon_number "1"; gene_name "OR4F2P"; transcript_name "OR4F2P-001";
11 protein_coding exon 129060 129388 . - . gene_id "ENSG00000230724"; transcript_id "ENST00000382784"; exon_number "1"; gene_name "AC069287.3"; transcript_name "AC069287.3-201";

**severin** · 03-09-2011, 05:50 AM

Cuffcompare

If you ran Cuffcompare with a reference file, you can extract the significant transcripts from the Cuffdiff output and grep for those lines in your combined GTF file, which should contain your gene IDs. This will tell you which genes are significant.

Requires unix commands cut, awk, grep, | (pipe) and xargs -I
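The same idea can be sketched in Python rather than as a shell pipeline. This is only an illustration under assumptions: it expects the Cuffdiff file to have a header row with `test_id` and `significant` columns, and the combined GTF to carry `transcript_id` and `gene_name` attributes. The function names are hypothetical.

```python
import csv
import re

# Captures key "value"; pairs from the GTF attributes column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')

def significant_ids(diff_lines):
    """Return test_id values from rows whose 'significant' column is 'yes'."""
    reader = csv.DictReader(diff_lines, delimiter="\t")
    return {row["test_id"] for row in reader if row.get("significant") == "yes"}

def gene_names_for(gtf_lines, transcript_ids):
    """Map each requested transcript_id to its gene_name (or gene_id)."""
    names = {}
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        tid = attrs.get("transcript_id")
        if tid in transcript_ids:
            # Fall back to gene_id when no gene_name was propagated.
            names[tid] = attrs.get("gene_name", attrs.get("gene_id", ""))
    return names
```

Both functions accept any iterable of lines, so open file handles can be passed directly.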

**jasonwood** · 03-14-2011, 10:12 PM

I found that I had to use the -s switch in cuffcompare in order for it to propagate my gene names (with gene_name attribute in last column of GTF) all the way through to the final cuffdiff files.

**kareldegendt** · 03-20-2012, 11:02 PM

is genes.gtf the correct annotation file?

Hi all,
I had the same problem, but figured out that I had to run TopHat with the Ensembl "genes.gtf" file, which is what I did.
All works fine until I want to run Cuffmerge.
There I'm getting the following error:

Error: duplicate GFF ID 'ENSMUST00000098282' encountered!
[FAILED]

In another set I was running, I get the same error with a different ENSMUST number.
Any clue what's wrong here? Obviously there are multiple lines with that ID, but why did it go all right with TopHat then?

Thanks!
K.

**kareldegendt** · 03-31-2012, 10:44 PM

Ok, I found the issue. Turns out I was being too "efficient"

I am comparing 2 times 2 datasets, and I was already running the cuffmerge on the second set while the run on the first dataset was still ongoing (wanted to be fast...).
However, I forgot to change the directory name, so both runs saved to the same dir... and ran into problems.
It was all solved when I assigned them different directories...
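One way to guard against that mix-up is to build the commands in a small script that refuses to reuse an output directory. A minimal sketch, assuming cuffmerge's `-o` (output directory) and `-g` (reference GTF) options; the function name and the file names in the usage note are hypothetical.

```python
def cuffmerge_commands(runs, ref_gtf):
    """Build one cuffmerge command per run, refusing shared output directories.

    `runs` is a list of (assembly_list_file, output_dir) pairs.
    """
    out_dirs = [out_dir for _, out_dir in runs]
    if len(set(out_dirs)) != len(out_dirs):
        raise ValueError("each cuffmerge run needs its own output directory")
    return [
        ["cuffmerge", "-o", out_dir, "-g", ref_gtf, assembly_list]
        for assembly_list, out_dir in runs
    ]
```

For example, `cuffmerge_commands([("set1.txt", "merged_set1"), ("set2.txt", "merged_set2")], "genes.gtf")` returns two argument lists that could be passed to `subprocess.run`, while two runs pointing at the same directory raise an error before anything is launched.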

Karel

**billstevens** · 04-12-2012, 12:18 PM

Sorry, I know this is a comparatively basic question, but can someone give me a quick take on the gene IDs? I ran cuffdiff to get the significantly differentially expressed genes. I want to view them in DAVID or Ensembl to check out the actual pathways. I saved all of my 300 or so genes in a txt file, with many genes having more than one unique ID (e.g. B1AKN3,NP_001036147,Q9P2R6,uc001aph.1), and uploaded it to DAVID. However, it could only "ambiguously" match 25 of these genes. What kind of gene IDs are these? There appear to be more than one kind. How do you view your pathways?

**billstevens** · 04-13-2012, 11:54 AM

bump

Sorry, I'm just having trouble working with these gene names. Some are UniProt, some are RefSeq, some are UCSC. How do you guys do it? DAVID has no idea what I'm uploading. What do you guys use? And does it recognize all the gene names?
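A rough way to see which ID types are mixed together is to sort them by pattern. The sketch below is an assumption-laden illustration: the regular expressions are simplified approximations of the RefSeq, UCSC, UniProt, and Ensembl accession formats, and the function names are hypothetical.

```python
import re

# Simplified accession patterns; checked in order, first full match wins.
ID_PATTERNS = [
    ("RefSeq", re.compile(r"(NM|NR|NP|XM|XR|XP)_\d+(\.\d+)?")),
    ("UCSC", re.compile(r"uc\d{3}[a-z]{3}\.\d+")),
    ("UniProt", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}")),
    ("Ensembl", re.compile(r"ENS[A-Z]*[GTP]\d{11}")),
]

def classify(identifier):
    """Return the first ID type whose pattern matches the whole identifier."""
    for name, pattern in ID_PATTERNS:
        if pattern.fullmatch(identifier):
            return name
    return "unknown"

def classify_field(field):
    """Classify every ID in a comma-separated Cuffdiff gene field."""
    return {identifier: classify(identifier) for identifier in field.split(",")}
```

Grouping the IDs this way makes it possible to submit one ID type at a time instead of a mixed list.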

**billstevens** · 04-15-2012, 11:15 AM

Please help...

I'm sorry, I'm just so confused about this. Why is more than one gene listed per entry in promoters.diff, tss_group.diff, or even gene_exp.diff? I just don't get it. It says right there in the Cufflinks manual, and I'm quoting:

"Transcripts with the same gene_id are part of the same gene group, and similarly, those with the same tss_id and p_id are part of the same primary transcript group and CDS group. "

How can one transcription start site be associated with more than one gene?? Likewise with promoters and CDS?

Sincere thanks to anyone that can help me with this!

**billstevens** · 04-17-2012, 08:04 PM

Hey guys,

So I have this plan for analyzing my data using DAVID, and I was hoping someone might say how they do their differential gene expression analysis. From the gene_exp.diff output file, I take the significant genes and then remove the suffix from each gene ID (e.g. uc0012w.1 becomes uc0012w), and then I load this into DAVID. I got rid of the suffixes because DAVID often couldn't find the ID with the suffix but did recognize it without, and I imagine both refer to the same gene. I found that DAVID recognizes all genes that have been reviewed. This seems like a nice and straightforward method for obtaining my network.
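The suffix-stripping step described above could look roughly like this. It is a sketch only: it assumes the suffix is always a trailing `.` followed by digits, and the function names are hypothetical.

```python
import re

# A trailing ".<digits>" is treated as the suffix to drop.
VERSION_RE = re.compile(r"\.\d+$")

def strip_version(identifier):
    """Remove a trailing numeric suffix such as '.1' from an ID."""
    return VERSION_RE.sub("", identifier)

def david_id_list(fields):
    """Flatten comma-separated gene fields into unique, suffix-free IDs."""
    seen = []
    for field in fields:
        for identifier in field.split(","):
            base = strip_version(identifier.strip())
            if base and base not in seen:
                seen.append(base)  # keep first-seen order
    return seen
```

Writing the returned list to a text file, one ID per line, gives something that can be pasted into an upload form.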

Am I totally off-base? Anyone?

CuffDiff output
