After a lot of digging through the forum and the documentation about wrong FPKMs from cufflinks, I checked cds_exp.diff and was surprised that the FPKMs and the gene list there (after filtering out the infinity values, +-1.79E+308) are close to the expected values. Maybe we are misinterpreting how cufflinks splits reads between overlapping features, of which there are many in the GTF file (CDS, exons, stop codons...)?
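For anyone who wants to repeat the infinity filtering, here is a minimal sketch in Python. It assumes a tab-separated table with a header row; the column name `log_fc` and the example rows are made up for illustration and are not taken from a real cds_exp.diff, so substitute the column you actually filter on.

```python
import csv
import io
import math

# Values at or beyond this magnitude are treated as the +-1.79E+308
# infinity placeholder mentioned above.
INF_THRESHOLD = 1.79e308


def is_finite_value(text):
    """Return True if text parses as a number that is not an infinity placeholder."""
    try:
        value = float(text)
    except ValueError:
        return False
    return math.isfinite(value) and abs(value) < INF_THRESHOLD


def drop_infinite_rows(handle, column):
    """Yield rows (as dicts) of a tab-separated table whose `column` is finite."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if is_finite_value(row[column]):
            yield row


# Made-up table; "log_fc" is a hypothetical column name.
example = io.StringIO(
    "gene\tlog_fc\n"
    "geneA\t1.5\n"
    "geneB\t1.79769e+308\n"
    "geneC\t-1.79769e+308\n"
)
kept = [row["gene"] for row in drop_infinite_rows(example, "log_fc")]
print(kept)
```

With a real file, pass an open file handle instead of the `io.StringIO` object.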
-
If you follow this thread, you will see that there is a problem with this approach: cufflinks/cuffmerge produces erroneous .gtf files that contain instances where multiple transcripts are merged into one, despite the lack of any evidence to support such mergers.
-
Originally posted by drdna: "If you follow this thread, you will see that there is a problem with this approach because cufflinks/cuffmerge produces erroneous .gtf files that contain instances where multiple transcripts are merged into one (despite the lack of any evidence to support such mergers)."
I'm curious, though: how are you guys running cufflinks? I'm assuming you are using the -g/--GTF-guide argument? Or does this problem persist even if you give it the -G/--GTF argument, which tells it not to look for novel transcripts and to stick to the supplied GTF file?
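To make the difference between the two modes concrete, here is a small Python sketch that only builds the two command lines (it does not run anything). The file and directory names are placeholders, not anything from this thread.

```python
def cufflinks_command(bam, gtf, out_dir, allow_novel):
    """Build a cufflinks argument list.

    allow_novel=True  uses -g/--GTF-guide: the annotation guides assembly,
                      and novel transcripts can still be reported.
    allow_novel=False uses -G/--GTF: only the supplied annotation is quantified.
    """
    mode_flag = "--GTF-guide" if allow_novel else "--GTF"
    return ["cufflinks", mode_flag, gtf, "-o", out_dir, bam]


# Placeholder file names, for illustration only.
guided = cufflinks_command("accepted_hits.bam", "genes.gtf", "cl_guided", allow_novel=True)
reference_only = cufflinks_command("accepted_hits.bam", "genes.gtf", "cl_ref", allow_novel=False)

print(" ".join(guided))
print(" ".join(reference_only))
```

Either list can be handed to `subprocess.run` if you want to launch the job from a script.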
-
chadn737, yes and yes. I have been running cuffmerge using a reference gtf and the --no-novel-juncs flag.
So if you are using a count-based method of DE analysis, do you align your reads to gene sequences, as opposed to a genome assembly? I'd be interested in hearing a little more about your approach.
-
Originally posted by drdna: "chadn737, Yes and yes. I have been running cuffmerge using a reference gtf and the --no-novel-juncs flag. So if you are using a count-based method of DE analysis, do you align your reads with gene sequences, as opposed to a genome assembly? I'd be interested in hearing a little bit more about your approach."
I've seen both approaches. I've also seen people align to both the CDS and the genome and then integrate the two. The simplest approach really is to just realign to the genome using TopHat or BWA.
After that, use something like htseq-count or BEDTools to count the number of reads mapping to each gene (or exon); those counts then serve as input to any number of count-based DE tools (DESeq, edgeR, baySeq, etc.). Or, if you want to look for differential exon usage, DEXSeq.
This approach is good if your primary interest is differential expression and novel, unannotated transcripts are not what you are after.
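To illustrate the counting step, here is a minimal Python sketch. It is a deliberately simplified stand-in, not the actual algorithm of htseq-count or BEDTools: genes are single intervals rather than sets of exons, strand is ignored, and a read that overlaps more than one gene is skipped as ambiguous. The gene names and coordinates are made up.

```python
from collections import Counter


def count_reads_per_gene(genes, reads):
    """Count reads that overlap exactly one gene.

    genes: {name: (chrom, start, end)}, 1-based inclusive coordinates
    reads: iterable of (chrom, start, end), same coordinate convention
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, r_start, r_end in reads:
        hits = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and r_start <= g_end and r_end >= g_start
        ]
        if len(hits) == 1:  # skip reads with no gene or with several genes
            counts[hits[0]] += 1
    return dict(counts)


# Made-up example: geneA and geneB overlap between positions 450 and 500.
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 450, 900)}
reads = [
    ("chr1", 120, 170),  # geneA only
    ("chr1", 460, 510),  # overlaps both genes, so it is not counted
    ("chr1", 800, 850),  # geneB only
    ("chr2", 120, 170),  # no gene on this chromosome
]
counts = count_reads_per_gene(genes, reads)
print(counts)
```

The resulting per-gene table (one column per sample) is the kind of input the count-based DE tools expect.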
-
Oops, my bad: I'm getting my analyses mixed up. I only tried cuffmerge with the -G option flag. I'll try the -GTF instead. However, I doubt that it will make any difference because, as I mentioned before, there are no reads in the adjoining regions that cuffmerge merges into the true transcripts. One thing I've noticed is that one of the gtfs I'm working with is discontinuous, in the sense that adjacent genes do not occur sequentially in the gtf file. I don't know why; that's just the way the downloaded file was constructed. I'm beginning to suspect that cuffmerge/cufflinks assumes that gtfs always contain genes in sequential order and hiccups at the discontinuities. I plan to test this by reordering the gtfs so that genes appear sequentially. This might also explain why Portah has a problem with the Snord37 gene: it lies inside another gene. I suspect that cufflinks/cuffmerge doesn't allow for this possibility and gets its locus coordinates confused.
Last edited by drdna; 06-12-2012, 05:29 PM.
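A minimal Python sketch of reordering a gtf sequentially, for anyone who wants to run the same test. It relies only on the standard GTF column layout (sequence name in column 1, start in column 4, end in column 5); the example records are made up. Sequence names are sorted as plain strings, so "chr10" comes before "chr2".

```python
def sort_gtf_lines(lines):
    """Return GTF lines ordered by sequence name, then start, then end.

    Comment lines (starting with '#') are kept at the top; blank lines are dropped.
    """
    header = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def sort_key(line):
        fields = line.rstrip("\n").split("\t")
        return (fields[0], int(fields[3]), int(fields[4]))

    return header + sorted(records, key=sort_key)


# Made-up records that are out of order, as in the discontinuous gtf described above.
unsorted_gtf = [
    'chr1\tsrc\texon\t5000\t5500\t.\t+\t.\tgene_id "geneC";\n',
    'chr1\tsrc\texon\t100\t400\t.\t+\t.\tgene_id "geneA";\n',
    'chr1\tsrc\texon\t1000\t1200\t.\t-\t.\tgene_id "geneB";\n',
]
sorted_gtf = sort_gtf_lines(unsorted_gtf)
starts = [int(line.split("\t")[3]) for line in sorted_gtf]
print(starts)
```

Running the same analysis on the original and the reordered file would show whether record order is really what triggers the merging.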
-
A few points I would like to make.
1. I have tried both Cufflinks and Scripture to assemble transcripts from RNA-seq data in Tetraodon. Cufflinks outperforms Scripture in terms of assembly quality.
2. Scripture on average produces more transcripts per locus than Cufflinks. Cufflinks is better at building novel intergenic transcripts.
3. As written in a previous reply, it is better to use HTSeq, BEDTools coverageBed, and the DESeq R package for differential analysis than Cuffdiff, which gives inflated FPKM values for many transcripts.
4. The Rsubread package is a fast, accurate alternative to Cufflinks. (http://www.bioconductor.org/packages.../Rsubread.html)
-
Originally posted by Portah: "I was wrong: the numbers in cds_exp and gene_exp are the same, but the total lists of genes differ; the genes that wrap Snords have disappeared."
Looks like there is no other way than to write my own FPKM counter to check myself and others.
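As a starting point for such a counter, here is a minimal Python sketch of the textbook FPKM definition (fragments per kilobase of feature per million mapped fragments). This is only the plain formula, not Cufflinks' statistical model, so the numbers are useful as a sanity check rather than as an exact reproduction of its output. The input values are made up.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase of feature per million mapped fragments.

    fragments: fragments assigned to the feature
    length_bp: feature length in base pairs
    total_mapped_fragments: all mapped fragments in the sample
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Made-up example: 500 fragments on a 2 kb feature, 20 million mapped fragments.
value = fpkm(500, 2000, 20_000_000)
print(value)
```

Feeding it per-gene counts from a simple overlap counter gives a quick independent check on the values Cuffdiff reports.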