**gringer** · 07-01-2011, 07:35 AM

You'd probably want to read here:

http://cufflinks.cbcb.umd.edu/howitworks.html#hdif

And here:

http://cufflinks.cbcb.umd.edu/faq.html#fpkm

My quick-glance summary from that second FAQ is the following:

Current count-based differential expression tools are poorly suited to differential expression analysis in genomes with alternatively spliced genes. The main reason for this is that when a gene has multiple isoforms, a change in the total number of reads or fragments from that gene doesn't always correspond to a change in expression for that gene. Conversely, a gene's expression may change, but the total number of fragments generated by its isoforms may be very similar. In order to detect changes accurately, it's necessary to estimate how many fragments came from each individual splice variant in each sample. Current count-based tools don't do this (to our knowledge - please send us email if you know of one!). Even if they did, fragments that come from parts of genes that are shared by more than one splice variant can't generally be assigned to a single isoform, so the fragment counts for each isoform are only estimates, and there is some uncertainty in the counts. Isoforms that are very similar will have a great deal of uncertainty surrounding their fragment counts. This uncertainty needs to be accounted for when testing for differential expression. So while you could use Cufflinks to estimate isoform-level counts, you'd be throwing away Cufflinks' uncertainty, and thus have more confidence in the differences you see than you really should. This will probably lead to many false positives in your analysis. Furthermore, we do not normalize simply by the length to calculate FPKM but an effective length, as explained in our publications. Calculating counts from FPKM by multiplying by the length will give incorrect results. We strongly encourage you to consider using Cuffdiff to find differentially expressed genes and transcripts.
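To make the last point of that quote concrete, here is a minimal Python sketch of why multiplying FPKM by the full transcript length does not give back the original fragment count. It is an illustration only, not Cufflinks' implementation: the `effective_length` function uses a common simplification (transcript length minus mean fragment length plus one), and every number below is hypothetical.

```python
# Simplified illustration; NOT Cufflinks' actual code.
# All values and the effective-length formula are assumptions for the example.

def fpkm(fragments, norm_length, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (norm_length * total_mapped_fragments)


def effective_length(transcript_length, mean_fragment_length):
    """Assumed simplification: the number of positions at which a fragment
    of the mean length can start and still fit on the transcript."""
    return max(transcript_length - mean_fragment_length + 1, 1)


transcript_length = 1000      # bp, hypothetical
mean_fragment_length = 200    # bp, hypothetical
total_mapped = 20_000_000     # hypothetical number of mapped fragments
true_fragments = 500          # hypothetical fragments from this transcript

eff_len = effective_length(transcript_length, mean_fragment_length)
value = fpkm(true_fragments, eff_len, total_mapped)

# Naive back-calculation: multiply by the full transcript length.
naive_count = value * transcript_length * total_mapped / 1e9

# Back-calculation with the same effective length used for normalization.
recovered_count = value * eff_len * total_mapped / 1e9

print(f"FPKM: {value:.2f}")
print(f"count via full length:      {naive_count:.1f}")
print(f"count via effective length: {recovered_count:.1f}")
```

With these made-up numbers the naive back-calculation gives about 624 fragments instead of 500, an inflation by the ratio of full length to effective length. Even the "correct" back-calculation only recovers a point estimate and still discards the uncertainty that the quote says Cuffdiff accounts for.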

In other words, if you're using cufflinks, it is also recommended to use cuffdiff. Note that tophat seems to be under somewhat heavy development at the moment. If you're not using the latest versions (cufflinks 1.0.3, tophat 1.3.1), you may be running into bugs, including the memory issues, that have since been fixed.

**mbblack** · 07-01-2011, 07:53 AM

Recently I was running cuffdiff with 6 SOLiD BioScope 1.3 mapped BAM files (3 control and 3 treatment, total of about 40.2Gb with the smallest file being about 5Gb and the largest about 12Gb) and was getting bad_alloc failures too.

I just took a look at our cluster's swap setup and then made a temporary swap big enough to let cuffdiff run. The machine I was using has 24Gb RAM, but had a small swap (not sure why, it shipped from Penguin that way), so I made an empty file of 24Gb and appended that to swap and cuffdiff ran just fine after that (used all the RAM of course, and about 13-14Gb of the swap, so I was overly generous but it worked).

So, you may be able to run cuffdiff by just creating a nice massive temporary swap file for the run.
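For anyone wanting to try the temporary swap approach, the steps can be sketched roughly as below. The post does not say which commands were actually used, so this is an assumption based on the standard Linux tools `dd`, `mkswap`, `swapon` and `swapoff`. The path `/scratch/cuffdiff.swap` is hypothetical, and the helper only builds and prints the commands rather than running them, since they need root and should be reviewed first.

```python
import shlex


def swap_file_commands(path, size_gib):
    """Return argv lists that would create and enable a swap file.

    Assumes a Linux system with dd, chmod, mkswap and swapon available.
    Nothing is executed here; the caller decides whether to run them (as root).
    """
    size_mib = size_gib * 1024
    return [
        # Fill the file with zeros so that it has no holes.
        ["dd", "if=/dev/zero", f"of={path}", "bs=1M", f"count={size_mib}"],
        # Restrict the file so that only its owner can read or write it.
        ["chmod", "600", path],
        # Write the swap signature, then enable the file as swap.
        ["mkswap", path],
        ["swapon", path],
    ]


def swap_teardown_commands(path):
    """Return argv lists that would disable and delete the temporary swap file."""
    return [["swapoff", path], ["rm", path]]


if __name__ == "__main__":
    swap_path = "/scratch/cuffdiff.swap"  # hypothetical location on a local disk
    for argv in swap_file_commands(swap_path, 24):
        print(shlex.join(argv))
    print("# ... run cuffdiff here ...")
    for argv in swap_teardown_commands(swap_path):
        print(shlex.join(argv))
```

The 24 matches the size mentioned in the post; since the run there used only about 13-14Gb of swap, a smaller file may be enough for a comparable data set.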

Can I use FPKM to represent gene expression