-
I'm using Cufflinks 2.2.1 but still seeing duplicate genes in the tracking file. Has the issue ever been fixed?
-
Collapse duplicate FPKMs for a gene
Originally posted by mgogol:
I ended up writing a script to sum the FPKMs for a given gene id, which I think is right...
Here's my (unpolished) code (a perl script and a shell script).
This botches the confidence intervals, by the way.
The format of the Cufflinks output (genes.fpkm_tracking files) is now different from earlier versions. I updated the code written by mgogol and published it on sourceforge.net: https://sourceforge.net/projects/col...?source=navbar . I hope it will facilitate your work.
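For readers who just want to see what the summing step amounts to, here is a minimal Python sketch. It is not mgogol's perl script and not the updated SourceForge code. It assumes a tab-delimited tracking file with a header row that has columns named gene_id and FPKM; those column names and the tiny example table are assumptions for illustration only. Like the script described above, it makes no attempt to combine the confidence intervals.

```python
import csv
import io
from collections import defaultdict


def sum_fpkm_by_gene(handle, gene_col="gene_id", fpkm_col="FPKM"):
    """Return {gene_id: summed FPKM} from a tab-delimited table with a header.

    The column names are assumptions; adjust them to match your file.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(handle, delimiter="\t"):
        totals[row[gene_col]] += float(row[fpkm_col])
    return dict(totals)


# Made-up rows for illustration, not real Cufflinks output.
example = (
    "tracking_id\tgene_id\tFPKM\n"
    "rowA\tGENE1\t10.0\n"
    "rowB\tGENE1\t2.5\n"
    "rowC\tGENE2\t4.0\n"
)
gene_totals = sum_fpkm_by_gene(io.StringIO(example))
print(gene_totals)
```

To run it on a real file, pass an open file handle instead of the io.StringIO object, for example `with open("genes.fpkm_tracking") as fh: sum_fpkm_by_gene(fh)`.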
-
Hi yjlui,
Have you already figured out the meaning of the "test status" values "OK", "LOWDATA", and "FAIL"?
Should I delete those transcripts before downstream analysis and treat them as poorly assembled?
Apart from that, do you have any idea what an FPKM of 0 means?
Does it mean that those transcripts are poorly assembled as well?
Thanks in advance.
-
You should look at the Cuffdiff output files gene.expr and isoform.expr, which are the diff files, and the combined GTF file. To get one FPKM per gene, the usual suggestion is to sum the FPKMs that correspond to the same gene name and the same location. However, as Adam has also suggested, if a gene has more than one location (overlap), it may not be possible to sum those FPKMs. This is an ongoing area of research. I am not convinced that summing the FPKMs of all rows for a gene is a good idea, although several publications, including a recent one, have done so (http://genome.cshlp.org/content/earl...d-4783a31b68c6). My suggestion is that if you are trying to learn RNA-seq, you start with isoform.expr rather than the gene level.
Best.
-
Hi Honey,
Sorry if I am not being clear. This is what I have done so far and I am struggling to make some sense of the information I am getting:
1. I have 2 .bam files (1 control and 1 disease). I am trying to identify gene expression differences.
2. Using galaxy I ran the cufflinks-cuffcompare-cuffdiff workflow.
3. For running cufflinks, I took the .bam files and ran cufflinks with the defaults.
4. I ran cuffcompare (with assembled transcripts file from each of the sample, along with the reference).
5. I fed the output (transcript file) of cuffcompare along with the two original bam files into cuffdiff.
6. I was looking at the output of cuffdiff and am seeing a few things I don't quite understand:
There is more than one row per gene for most of the genes in the output file (I would have thought that differential expression would be reported at the gene level). I read in some other threads on Seqanswers (including this one) that summing the FPKM values of the transcripts will give me the gene-level value (which is fine). What I don't understand is which output file from the workflow I should perform the operation on:
a) The cufflinks output has the FPKM, but no gene annotations.
b) The cuffcompare output has the annotations, but not the FPKM values (unless I'm missing them).
c) The cuffdiff output has both the FPKM and gene annotation values, but the "statistical" analysis is already done.
So should I take the cuffdiff output, edit it and then feed it back into the workflow (and again, at what point)?
This is where my first confusion is coming from.
There is another (possibly related) issue: some of the transcripts in the cuffdiff output have FPKM = 0, so when the diff analysis is run, the fold changes (FC) are ridiculous.
What makes this all the more frustrating is that I am trying to use published data (with a paper that gives a list of genes that are differentially expressed between conditions, analyzed using Galaxy) in a bid to educate myself, and I am going in circles.
As you pointed out in one of my other threads, I have a lot of reading to do. But at the risk of sounding like a nag and unbelievably dense, I have been unsuccessful in finding material that might help me understand these things.
Any help from anybody is greatly appreciated.
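On the FPKM = 0 point in this post: a ratio with zero in the numerator or denominator has no finite log, which is why the fold changes look ridiculous. The sketch below shows one common workaround, adding a small pseudocount to both values before taking the log2 ratio. This is an illustration only, not what Cuffdiff does internally; the function name and the pseudocount of 1.0 are assumptions, and the choice of pseudocount changes the result.

```python
import math


def log2_fold_change(fpkm_a, fpkm_b, pseudocount=1.0):
    """log2 of (fpkm_b + pseudocount) / (fpkm_a + pseudocount).

    The pseudocount (an assumed, arbitrary value) keeps the ratio finite
    when either FPKM is zero.
    """
    return math.log2((fpkm_b + pseudocount) / (fpkm_a + pseudocount))


# Made-up values: a transcript with FPKM 0 in one sample and 3 in the other.
fc_zero_to_three = log2_fold_change(0.0, 3.0)   # log2(4 / 1)
fc_unchanged = log2_fold_change(5.0, 5.0)       # log2(6 / 6)
print(fc_zero_to_three, fc_unchanged)
```

Without the pseudocount, the first case would be log2(3 / 0), which is undefined, so any fold change reported for such a row should be read with caution.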
-
Read
It is not clear what you are asking. However, I agree that FPKM per gene is an area of ongoing research.
-
Sorry, in my previous thread I had asked whether the cuffcompare file needs to be edited. I just looked at a cuffcompare file, it seems to have only annotation information and no FPKM values. So, how (or where) is one supposed to combine the FPKM values from different transcripts for a gene and run cuffdiff?
-
Originally posted by adarob:
The multiple FPKM problem occurs when genes have transcripts that do not overlap with any other transcripts in the gene. For example, this occurs in the ENSG00000125388 gene from ENSEMBL/hg19. We are aware of this issue and will eventually change the behavior, but for now a simple solution is just to sum the FPKMs since the gene FPKMs are just the sum of the transcript FPKMs anyways. The issue should not occur in Cuffdiff.
I would not draw any conclusions about the FPKM of the FAILED genes.
-
If one has to sum the FPKMs for a gene, one has to use either the gene FPKM tracking file or the gene expr file from cuffdiff. Mgogol's perl script uses the FPKM lo, FPKM high and FPKM values, which are only in the tracking file. Is it OK to sum the FPKM values for a gene?
Thanks
-
Originally posted by adarob:
The multiple FPKM problem occurs when genes have transcripts that do not overlap with any other transcripts in the gene. For example, this occurs in the ENSG00000125388 gene from ENSEMBL/hg19. We are aware of this issue and will eventually change the behavior, but for now a simple solution is just to sum the FPKMs since the gene FPKMs are just the sum of the transcript FPKMs anyways. The issue should not occur in Cuffdiff.
I would not draw any conclusions about the FPKM of the FAILED genes.
I ran tophat (1.1.0) without a mouse gtf file, and ran cufflinks (0.9.1) without a mouse gtf file. Then I ran cuffcompare with a mouse gtf file and the two gtf files generated by cufflinks for my two samples. Finally, I ran cuffdiff with compare.combined.gtf and the two accepted_hits.bam files.
However, when I checked gene_exp.diff, I found that the multiple FPKM problem is still there for some genes (see below):
XLOC_000009 Cspp1 chr1:10053629-10189988 q1 q2 OK 44.5012 58.359 0.271096 -2.93789 0.00330457 yes
XLOC_000010 Arfgef1 chr1:10053629-10189988 q1 q2 OK 10.0582 7.68137 -0.269589 4.88261 1.04688e-06 yes
XLOC_000011 Arfgef1 chr1:10053629-10189988 q1 q2 OK 40.66 31.8566 -0.244 17.6406 0 yes
XLOC_000013 Arfgef1 chr1:10053629-10189988 q1 q2 OK 2.7768 40.8059 2.68753 -144.972 0 yes
XLOC_000015 Arfgef1 chr1:10053629-10189988 q1 q2 OK 54.0345 65.0081 0.18489 -12.9339 0 yes
XLOC_000016 Arfgef1 chr1:10053629-10189988 q1 q2 OK 23.4654 43.6672 0.62107 -29.4492 0 yes
XLOC_000031 Tram2 chr1:20986216-20997026 q1 q2 OK 5.8219 2.96147 -0.67594 3.70609 0.000210487 yes
XLOC_000032 Tram2 chr1:20986216-20997026 q1 q2 OK 3.33419 14.9065 1.49757 -29.7646 0 yes
XLOC_000057 Tmem131 chr1:36849038-36996484 q1 q2 OK 37.3723 30.8444 -0.191975 5.03247 4.84195e-07 yes
Did I do something wrong?
I have another question regarding the gene_exp.diff file. As you can see, the first gene, Cspp1, has the same coordinates (chr1:10053629-10189988) as the second gene, Arfgef1. But in my mouse gtf file (from Ensembl), the coordinates for those two genes are:
Cspp1: Chromosome 1: 10,028,299-10,126,849
Arfgef1: Chromosome 1: 10,127,652-10,222,751
Those two genes do not overlap. Why do they have the same coordinates in the gene_exp.diff file?
Thank you very much!
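The coordinates quoted in this post can be checked with a few lines of Python. The sketch below only confirms the arithmetic: the two Ensembl intervals do not overlap each other, while the single locus reported in gene_exp.diff spans part of both. It does not explain why the locus was reported that way. The helper name `overlaps` is just an illustrative choice, and the intervals are treated as closed (both ends included).

```python
def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] <= b[1] and b[0] <= a[1]


# Coordinates copied from the post above (all on chr1).
cspp1 = (10_028_299, 10_126_849)      # Ensembl gtf
arfgef1 = (10_127_652, 10_222_751)    # Ensembl gtf
reported = (10_053_629, 10_189_988)   # locus shown in gene_exp.diff

genes_overlap = overlaps(cspp1, arfgef1)
locus_hits_cspp1 = overlaps(reported, cspp1)
locus_hits_arfgef1 = overlaps(reported, arfgef1)
print(genes_overlap, locus_hits_cspp1, locus_hits_arfgef1)
```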
-
The multiple FPKM problem occurs when genes have transcripts that do not overlap with any other transcripts in the gene. For example, this occurs in the ENSG00000125388 gene from ENSEMBL/hg19. We are aware of this issue and will eventually change the behavior, but for now a simple solution is just to sum the FPKMs since the gene FPKMs are just the sum of the transcript FPKMs anyways. The issue should not occur in Cuffdiff.
I would not draw any conclusions about the FPKM of the FAILED genes.
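One way to act on the advice about FAILED genes is to set those rows aside before doing anything else with the FPKM values. The following Python sketch does that for a tab-delimited table with a header row. The status column name FPKM_status, the value "FAIL" as spelled here, and the example rows are all assumptions for illustration; check the header and status values of your own output, since they differ between Cufflinks versions.

```python
import csv
import io


def split_by_status(handle, status_col="FPKM_status", flagged_values=("FAIL",)):
    """Split tab-delimited rows into (usable, flagged) lists by status value.

    The column name and flagged values are assumptions; adjust as needed.
    """
    usable, flagged = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row[status_col] in flagged_values:
            flagged.append(row)
        else:
            usable.append(row)
    return usable, flagged


# Made-up rows for illustration only.
example = (
    "gene_id\tFPKM\tFPKM_status\n"
    "GENE1\t12.5\tOK\n"
    "GENE2\t0.0\tFAIL\n"
)
usable, flagged = split_by_status(io.StringIO(example))
print([r["gene_id"] for r in usable])
print([r["gene_id"] for r in flagged])
```

Keeping the flagged rows in their own list, rather than dropping them, makes it easy to report which genes were excluded.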
-
Batch ORF finder for Cufflinks-assembled transcripts (mRNA)
Hi,
I have used Cufflinks to assemble transcripts (mRNA) from an RNA-seq experiment.
My purpose is to check the possible lengths of the UTRs of each transcript, so I first need to find the best ORF for each transcript. Is there any tool for finding the best ORFs in batch?
-
Thanks for the prompt reply, Adam! Just emailed you a small dataset built from my SAM file.
-
Does someone have a small example dataset that I can run this on to find the problem?
-
Cufflinks
I was wondering if anyone knows what the status in genes.expr and transcripts.expr (output files of Cufflinks) means? I can't find the meaning in the manual. A possible meaning is "can be one of OK (test successful), NOTEST (not enough alignments for testing), or FAIL, when an ill-conditioned covariance matrix or other numerical exception prevents testing", but this is actually the description of "test status" which is a column in the Cuffdiff output files.
What shall I do with genes (or transcripts) whose status is FAIL? Shall I assume that their FPKM is 0 or take the FPKM of these genes regardless of their status?
Cufflinks v0.9.1b was used in my experiments, but the problem of getting multiple FPKM for some genes still exists. Running Cufflinks without a GTF file seems to solve this problem, but then I don't know how to link the FPKM to the corresponding Ensembl ID. If I provide a GTF file when running Cufflinks, I'll get multiple FPKM and FAIL status for some genes.
What shall I do with genes that have multiple FPKM? Shall I add the FPKM together or choose only the FPKM that matches the start and end position of these genes?
Thank you very much for your time.
Last edited by yjlui; 11-11-2010, 07:53 AM.
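Before deciding whether to add the FPKMs together, it can help to see which genes actually have more than one row. A minimal Python sketch follows, using the gene names from the gene_exp.diff rows posted earlier in this thread as input. The function name is an illustrative choice; in practice the list of names would be read from the gene column of your own output file.

```python
from collections import Counter


def genes_with_multiple_rows(gene_names):
    """Return {gene_name: row_count} for names that appear more than once."""
    counts = Counter(gene_names)
    return {gene: n for gene, n in counts.items() if n > 1}


# Gene names in the order they appear in the gene_exp.diff rows quoted above.
posted_rows = [
    "Cspp1",
    "Arfgef1", "Arfgef1", "Arfgef1", "Arfgef1", "Arfgef1",
    "Tram2", "Tram2",
    "Tmem131",
]
duplicated = genes_with_multiple_rows(posted_rows)
print(duplicated)
```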