**billstevens** · 04-20-2012, 01:36 PM

I'm having a similar issue. I obtained my .gtf file from UCSC at this website:

http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format

My problem is that the resulting output mixes RefSeq, UniProt, and UCSC gene IDs. Can anyone recommend a good way to deal with this?

**sdriscoll** · 04-24-2012, 02:14 PM

how many lines are in the genes.fpkm_tracking file?

**xiangq** · 04-24-2012, 02:17 PM

Hi, sdriscoll,

There are ~52,000 lines in the genes.fpkm_tracking file and
~156,600 lines in the isoform.fpkm_tracking file.

Any idea? Thanks.

**sdriscoll** · 04-24-2012, 02:38 PM

nope. according to my Ensembl GTF reference (downloaded from UCSC) there are 55,796 unique transcript ids with CDS features and 22,948 unique gene names with CDS features. so the number returned from cuffdiff seems random.
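For anyone who wants to reproduce counts like these on their own annotation, here is a minimal Python sketch. It assumes a tab-separated GTF whose ninth column holds `key "value";` attributes, as in the Ensembl lines quoted later in this thread. The function name `count_cds_ids` is made up for illustration and is not part of Cufflinks.

```python
import re

# Matches attribute pairs such as: gene_id "ENSG00000095585";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def count_cds_ids(gtf_lines):
    """Count unique transcript_id and gene_name values on CDS features.

    gtf_lines is any iterable of GTF lines (an open file works).
    Returns (number_of_transcript_ids, number_of_gene_names).
    """
    transcripts, genes = set(), set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue  # only CDS features are counted
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "transcript_id" in attrs:
            transcripts.add(attrs["transcript_id"])
        if "gene_name" in attrs:
            genes.add(attrs["gene_name"])
    return len(transcripts), len(genes)
```

To run it on a real file, something like `with open("annotation.gtf") as fh: print(count_cds_ids(fh))` should do, where `annotation.gtf` is a placeholder path.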

**xiangq** · 04-24-2012, 02:38 PM

Hi guys,

I checked the data and found that cuffdiff seems to output CDS features only under the following condition:

Suppose the p_id of a CDS is p01 and its gene_id is ENSG00000123456.
If the annotation GTF file contains 3 features for gene_id ENSG00000123456 and all of them have a p_id annotation (p01, p02 and p03),
then CDS p01, p02 and p03 will be listed in the cds.FPKM.tracking file;
otherwise none of p01, p02, p03 will be listed.

Why does cuffdiff calculate CDS FPKM like that? Is that right?

Any comments are welcome. Thanks

-xiangq
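One way to check this hypothesis against an annotation is to split genes into those where every transcript carries a p_id and those where only some do. The sketch below is only an illustration of that check, not a description of what cuffdiff does internally. It assumes a tab-separated GTF with `gene_id`, `transcript_id` and optional `p_id` attributes in the ninth column, and the function name `classify_genes_by_p_id` is hypothetical.

```python
import re
from collections import defaultdict

# Matches attribute pairs such as: p_id "P56921";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def classify_genes_by_p_id(gtf_lines):
    """Return (fully_annotated, partially_annotated) sets of gene_ids.

    fully_annotated: every transcript of the gene has a p_id.
    partially_annotated: some, but not all, transcripts have a p_id.
    Genes with no p_id at all appear in neither set.
    """
    all_tx = defaultdict(set)       # gene_id -> all transcript_ids seen
    tx_with_pid = defaultdict(set)  # gene_id -> transcript_ids with a p_id
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        gene, tx = attrs.get("gene_id"), attrs.get("transcript_id")
        if gene is None or tx is None:
            continue
        all_tx[gene].add(tx)
        if "p_id" in attrs:
            tx_with_pid[gene].add(tx)

    full, partial = set(), set()
    for gene, txs in all_tx.items():
        with_pid = tx_with_pid.get(gene, set())
        if with_pid == txs:
            full.add(gene)
        elif with_pid:
            partial.add(gene)
    return full, partial
```

If the hypothesis holds, the genes present in cds.FPKM.tracking should come from the first set and genes in the second set should be missing.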

**xiangq** · 04-24-2012, 02:43 PM

Hi sdriscoll,

Thanks for your quick response.

Did you run cuffdiff on your data? How many CDS features are listed in the cds.FPKM.tracking file?

Thanks.

-xiangq

**sdriscoll** · 04-24-2012, 02:49 PM

did you do as they suggest on their website to modify the annotation you're using to include the necessary information for cuffdiff to work properly? i'm assuming you did since you say your annotation includes p_id values.

maybe...could we see some lines from your GTF file that include some CDS features? I know that cuffdiff only does CDS DE when there are CDS features in the GTF reference.

**xiangq** · 04-24-2012, 03:02 PM

Hi, I used the Ensembl GTF annotation file; not every feature has a p_id annotation. Here are several features from the file:

Code:

chr10    protein_coding    exon    97990550    97990590    .    -    .    exon_number "4"; gene_id "ENSG00000095585"; gene_name "BLNK"; transcript_id "ENST00000467799"; transcript_name "BLNK-004"; tss_id "TSS39268";
chr10    protein_coding    exon    97990550    97990590    .    -    .    exon_number "6"; gene_id "ENSG00000095585"; gene_name "BLNK"; p_id "P56921"; transcript_id "ENST00000393894"; transcript_name "BLNK-202"; tss_id "TSS58878";
chr10    protein_coding    exon    97990550    97990590    .    -    .    exon_number "4"; gene_id "ENSG00000095585"; gene_name "BLNK"; p_id "P32418"; transcript_id "ENST00000371176"; transcript_name "BLNK-002"; tss_id "TSS85036";
chr10    protein_coding    exon    97998644    97998920    .    -    .    exon_number "4"; gene_id "ENSG00000095585"; gene_name "BLNK"; transcript_id "ENST00000495266"; transcript_name "BLNK-003"; tss_id "TSS11211";

I don't know whether the UCSC annotation file will be different from this one.

Thanks.

**sdriscoll** · 04-24-2012, 04:06 PM

did you try this?

Code:

cuffcompare -s /path/to/genome_seqs.fa -CG -r annotation.gtf annotation.gtf

that's what's recommended by the cufflinks people when using GTF files that were NOT generated by cuffcompare. it basically tags up the GTF annotation in a way that cuffdiff prefers. annotation.gtf would be your Ensembl GTF and you'd specify the genome FASTA file you built your bowtie index from.

**xiangq** · 04-24-2012, 06:34 PM

Hi sdriscoll,

thanks for your advice.

Actually, I have tried cuffcompare before; the output GTF looks like this:

Code:

chr1    Cufflinks       exon    850183  850351  .       +       .       gene_id "XLOC_000031"; transcript_id "TCONS_00000075"; exon_number "2"; gene_name "RP11-54O7.2"; oId "ENST00000398216"; nearest_ref "ENST00000398216"; class_code "="; tss_id "TSS53";

chr1    Cufflinks       exon    860260  860328  .       +       .       gene_id "XLOC_000032"; transcript_id "TCONS_00000076"; exon_number "1"; gene_name "SAMD11"; oId "ENST00000420190"; contained_in "TCONS_00000078"; nearest_ref "ENST00000420190"; class_code "="; tss_id "TSS54"; p_id "P5";

The gene_id, transcript_id, tss_id and p_id are replaced by Cufflinks internal IDs, so in the cuffdiff output all feature IDs are also replaced by the Cufflinks internal IDs, which makes the downstream analysis more difficult.

-xiangq

**sdriscoll** · 04-24-2012, 10:24 PM

I guess the only important question is if the Cuffcompare version of the GTF results in more CDS output. You can always swap gene names later with Perl or something. Of course I'm a programmer so that's how I think. At least you'll know if it's the annotation causing the lack of CDS output. Maybe try a GTF from UCSC as well. See what happens.
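Swapping the IDs back does not have to be Perl. Here is a minimal Python sketch that relies on the `oId` attribute visible in the cuffcompare output quoted above. It assumes each `TCONS_*` transcript_id maps to a single `oId` (the first one seen wins) and that the IDs to be relabelled are `TCONS_*` values. The function names `build_id_map` and `restore_ids` are made up for illustration.

```python
import re

# Matches attribute pairs such as: oId "ENST00000398216";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def build_id_map(combined_gtf_lines):
    """Map cuffcompare transcript_id (TCONS_*) to the original ID in oId."""
    id_map = {}
    for line in combined_gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "transcript_id" in attrs and "oId" in attrs:
            # Keep the first oId seen for each internal transcript ID.
            id_map.setdefault(attrs["transcript_id"], attrs["oId"])
    return id_map


def restore_ids(internal_ids, id_map):
    """Replace internal IDs with original ones; unknown IDs pass through."""
    return [id_map.get(i, i) for i in internal_ids]
```

The same idea would extend to gene-level IDs by keying on `gene_id` and storing `gene_name` instead, with the caveat that one XLOC can cover more than one reference gene.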

**xiangq** · 04-25-2012, 06:36 AM

Hi, thanks for the response.

I will try and let you know the results.

**billstevens** · 04-25-2012, 12:21 PM

So I'm very confused about this GTF format.

From the UCSC wiki, it states "At this time, this genePredToGtf command can provide better GTF files than available from the table browser."

http://genomewiki.ucsc.edu/index.php/Genes_in_gtf_or_gff_format

However, I get outputs from cuffdiff with UniProt, RefSeq, UCSC and Ensembl IDs. That's not a huge deal, since I can use DAVID to convert all of them. What I am really confused by is this:

One of my CDS is uc010nxq.1. When I search for this on Google, I get this and this

One of these is on chromosome 15 and the other on chromosome X. Please help, I could not be more confused.

**sdriscoll** · 04-25-2012, 12:39 PM

Be careful with the genome browser when you're searching from Google. What you've got there is one link to uc010nxq.1 in the hg19 build and one to the same ID in the hg18 build. Since the ucXXXxxx.X IDs are always UCSC I'd do this: go to the browser at http://genome.ucsc.edu/cgi-bin/hgGateway, select the correct genome & build, and paste the ID into the empty "gene" box, then click "submit". That will take you to its location on the genome. The transcript corresponding to the one you searched for will be highlighted in the left margin (if it was a UCSC id, that is). You can click on the transcript and you'll be taken to the information page for the gene.


cuffdiff does not output all the CDS in cds.FPKM.tracking file
