
**jwfoley** · 07-24-2014, 07:13 AM

Cuffdiff only does pairwise comparisons (two conditions at a time). For a more complex experimental design you may need more powerful software like DESeq2, which lets you fit an additive model (read count ~ tissue + treatment), an interaction model, etc.
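To make the difference between those two designs concrete, here is a minimal sketch in base R. The sample table (two tissues, two treatments, two replicates each) is invented for illustration and is not from anyone's data in this thread; the formulas are of the kind you would pass as the `design` argument in DESeq2.

```r
# Hypothetical sample table: 2 tissues x 2 treatments x 2 replicates.
samples <- expand.grid(
  replicate = 1:2,
  treatment = c("ctrl", "treat"),
  tissue    = c("root", "leaf")
)
# Put the reference level first for each factor.
samples$treatment <- factor(samples$treatment, levels = c("ctrl", "treat"))
samples$tissue    <- factor(samples$tissue, levels = c("root", "leaf"))

# Additive model: one tissue effect plus one treatment effect.
additive <- model.matrix(~ tissue + treatment, data = samples)

# Interaction model: the treatment effect is allowed to differ by tissue.
with_interaction <- model.matrix(~ tissue * treatment, data = samples)

colnames(additive)
colnames(with_interaction)
```

The interaction design has one extra column compared with the additive one; that extra coefficient is what captures a treatment response that differs between tissues.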

**lucasmiguel** · 07-24-2014, 07:32 AM

Right, but Cuffdiff normalizes differently when it has 4 conditions than when it has 2 conditions, right? I think this is the cause of the different DE gene results.

Thanks for your answer.

**jwfoley** · 07-24-2014, 07:34 AM

I think you're talking about significance testing, not normalization, so of course there are more significant results when you provide more data.

**id0** · 07-24-2014, 10:32 AM

The cufflinks manual discusses the normalization and dispersion estimation methods (all the way at the bottom at http://cufflinks.cbcb.umd.edu/manual.html). There are actually multiple options to choose from.

Cuffdiff works by modeling the variance in fragment counts across replicates as a function of the mean fragment count across replicates. Strictly speaking, it models a quantity called dispersion: the variance present in a group of samples beyond what is expected from a simple Poisson model of RNA-Seq. You can control how Cuffdiff constructs its model of dispersion in locus fragment counts. Each condition that has replicates can receive its own model, or Cuffdiff can use a global model for all conditions. All of these policies are identical to those used by DESeq (Anders and Huber, Genome Biology, 2010).

| Dispersion Method | Description |
| --- | --- |
| pooled | Each replicated condition is used to build a model, then these models are averaged to provide a single global model for all conditions in the experiment. (Default) |
| per-condition | Each replicated condition receives its own model. Only available when all conditions have replicates. |
| blind | All samples are treated as replicates of a single global "condition" and used to build one model. |
| poisson | The Poisson model is used, where the variance in fragment count is predicted to equal the mean across replicates. Not recommended. |

Which method you choose largely depends on whether you expect variability in each group of samples to be similar. For example, if you are comparing two groups, A and B, where A has low cross-replicate variability and B has high variability, it may be best to choose per-condition. However, if the conditions have similar levels of variability, you might stick with the default, which sometimes provides a more robust model, especially in cases where each group has few replicates. Finally, if you only have a single replicate in each condition, you must use blind, which treats all samples in the experiment as replicates of a single condition. This method works well when you expect the samples to have very few differentially expressed genes. If there are many differentially expressed genes, Cuffdiff will construct an overly conservative model and you may not get any significant calls. In this case, you will need more replicates in your experiment.
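As a rough illustration of what "variance beyond Poisson" means, here is a small base R sketch. It is not Cuffdiff's actual fitting procedure (which models variance as a function of the mean across many loci); it is only a crude per-gene method-of-moments estimate under an assumed negative binomial model, and the counts are invented.

```r
# Invented fragment counts for three genes across four replicates.
counts <- rbind(
  geneA = c(10, 12, 9, 11),
  geneB = c(100, 160, 60, 120),
  geneC = c(1000, 1010, 990, 1000)
)

mu <- rowMeans(counts)           # mean count per gene
v  <- apply(counts, 1, var)      # sample variance per gene

# Under a negative binomial model, variance = mu + alpha * mu^2.
# Solving for alpha gives a crude per-gene dispersion estimate;
# estimates below zero (no excess over Poisson) are clamped to 0.
alpha <- pmax((v - mu) / mu^2, 0)

data.frame(mean = mu, variance = v, dispersion = alpha)
```

With only a handful of replicates, per-gene estimates like these are very noisy, which is one reason tools pool information across genes or conditions, as the pooled and blind options above do.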

**anikng** · 08-12-2014, 05:31 AM

Hello all..

I am trying to process an RNA-Seq data set that I received. I followed the method described in the Nature Protocols paper (Trapnell et al., 2012) exactly, and I am now confused at the cuffdiff step.
Could anyone please suggest the command for getting my desired output?

I need Cuffdiff to generate output for each sample (separate FPKM values for each replicate as well).

When I ran cuffdiff as in the line below, I got replicate-merged output. I mean the two replicates of each condition are merged, so in the end there is a single output each for control, treatment 1 and treatment 2.

cuffdiff -o phos -b Syn.fa -p 8 -L c1,2t,4t -u merged_phos/merged.gtf \
./ctrl_rep-1/accepted_hits.bam,./ctrl_rep-2/accepted_hits.bam \
./treat_1_rep-1/accepted_hits.bam,./treat_1_rep-2/accepted_hits.bam \
./treat_2_rep-1/accepted_hits.bam,./treat_2_rep-2/accepted_hits.bam

My samples are as follows,

ctrl_rep-1
ctrl_rep-2

treat_1_rep-1
treat_1_rep-2

treat_2_rep-1
treat_2_rep-2

Thanks
Han

**jwfoley** · 08-12-2014, 06:02 AM

Cuffdiff is very limited in the kinds of comparisons it can do. It doesn't let you see inter-replicate variation like you see inter-group variation. If you want to do a more powerful analysis like that, you need to switch software. I would use featureCounts + DESeq2 for this.

That will also give you better normalizations than FPKM (DESeq2's variance-stabilizing transformation and regularized log) if you want to do more than just significance testing. Here is the inventor of FPKM explaining why you shouldn't use FPKM: https://www.youtube.com/watch?v=5NiFibnbE8o&t=30m38s

**lucasmiguel** · 08-12-2014, 06:11 AM

Hi Han.

Cuffdiff doesn't produce output with FPKM per replicate. It has one output file that shows the FPKM per condition. You only have to parse that file and split it into the samples that you want.

Or, for a quick analysis, you could run Cuffdiff using only:
ctrl_rep-1
ctrl_rep-2

treat_1_rep-1
treat_1_rep-2

to see the difference between the two groups.

I once used Cuffdiff to analyze differential gene expression with all the samples I had (Root_ctrl, Root_treat, Leaf_ctrl and Leaf_treat), and afterwards I ran Cuffdiff using only the leaf data (ctrl and treat).

When I compared the differentially expressed leaf genes between these two analyses, the numbers were different, because the normalization and dispersion estimates change when you remove or add samples.

Lucas

**anikng** · 08-16-2014, 05:56 AM

Thanks for making me aware of the limitations of cuffdiff.
Based on your instructions, I modified my strategy as follows. Kindly tell me whether I am correct or not.

Input the SAM/BAM files to featureCounts. The count table generated as output by featureCounts is then given as input to DESeq2 to analyze the expression of each sample, including the replicates of each condition.

Han,
ROK

**jwfoley** · 08-16-2014, 06:31 AM

Yes, that's the idea. Of course you'll also need a GTF for featureCounts. You can use the transcripts.gtf from Cufflinks, though of course you'll get a lot of unannotated transcripts this way; or you can use a database annotation, which will be missing a lot of transcripts or parts of transcripts.
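For the hand-off from featureCounts to R, a minimal sketch might look like this. It assumes the usual featureCounts table layout (a leading `#` comment line, a header row, six annotation columns, then one count column per BAM file); the gene names, coordinates and counts below are invented, and with a real file you would pass its path to `read.delim` instead of `text =`.

```r
# Invented two-gene example in the layout featureCounts writes:
# one "#" comment line, a header, six annotation columns, then
# one count column per BAM file.
fc_text <- c(
  "# Program:featureCounts; Command: (omitted)",
  "Geneid\tChr\tStart\tEnd\tStrand\tLength\tctrl_rep-1.bam\tctrl_rep-2.bam",
  "gene1\tchr1\t100\t900\t+\t801\t25\t31",
  "gene2\tchr1\t2000\t2600\t-\t601\t0\t4"
)

# Skip the comment line and keep the BAM names exactly as written.
fc <- read.delim(text = fc_text, comment.char = "#", check.names = FALSE)

# Keep only the count columns, with gene IDs as row names.
count_matrix <- as.matrix(fc[, -(1:6)])
rownames(count_matrix) <- fc$Geneid
count_matrix
```

The resulting matrix (genes in rows, samples in columns) is the shape DESeq2 expects as `countData`.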

**anikng** · 08-16-2014, 07:00 AM

Hi jwfoley,

Thank you very much for the quick reply..

Han

**anikng** · 08-18-2014, 11:23 PM

Following the suggestion, I obtained a count matrix from featureCounts. However, I have 2 questions to ask.

1. In the read counting step, only 47% of reads were successfully assigned to the meta-feature "gene". Is that a low value?

2. In the DESeq2 analysis, I have a problem setting up the input for control and treatment because of my lack of knowledge of R. My samples are:

control-1, drought 2 days-1, drought 4 days-1
control-2, drought 2 days-2, drought 4 days-2

I tried to follow the method explained in the manual by Love et al., where I found sample code for importing and setting up the count matrix, as follows:

1.library("pasilla")
2.library("Biobase")
3.data("pasillaGenes")
4.countData <- counts(pasillaGenes)
5.colData <- pData(pasillaGenes)[,c("condition","type")]

6.dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ condition)

7.dds$condition <- factor(dds$condition,levels=c("untreated","treated"))

Since I am using two drought treatments, I think I should modify line 5 and line 7. Can anyone suggest how to set those parameters?

I modified the header of the count matrix to: gene_id untreated1 untreated2 treated1 treated2 treated3 treated4

Thanks,

Han

**jwfoley** · 08-19-2014, 05:30 AM

Lines 1 through 5 are all for importing an example data set. If you want to use your data instead of the example, you don't need any of those.

You need to import your own data, create your own data frame of factors, and set your own model design, then use DESeqDataSetFromMatrix to create a DESeqDataSet object and proceed normally.
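A minimal sketch of those steps for a three-condition experiment like Han's could look like the following. The sample names, condition labels and counts are all invented placeholders, and the DESeq2 step only runs if the package is installed; treat it as one possible setup under those assumptions, not the only valid one.

```r
# Invented counts for 3 genes x 6 samples; replace with your own matrix.
sample_names <- c("control_1", "control_2",
                  "drought2d_1", "drought2d_2",
                  "drought4d_1", "drought4d_2")
count_matrix <- matrix(
  c(10, 12, 30, 28, 55, 60,
    200, 190, 210, 205, 195, 198,
    0, 1, 5, 7, 20, 18),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene1", "gene2", "gene3"), sample_names)
)
storage.mode(count_matrix) <- "integer"

# One row per sample, in the same order as the matrix columns.
# Listing "control" first makes it the reference level.
col_data <- data.frame(
  condition = factor(rep(c("control", "drought_2d", "drought_4d"), each = 2),
                     levels = c("control", "drought_2d", "drought_4d")),
  row.names = sample_names
)

# Build the DESeqDataSet only if DESeq2 is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = count_matrix,
                                        colData = col_data,
                                        design = ~ condition)
  # With a real, full-size count matrix you would then continue with:
  #   dds <- DESeq2::DESeq(dds)
  #   DESeq2::results(dds, contrast = c("condition", "drought_2d", "control"))
}
```

The key point is that a single `condition` factor with three levels replaces the two-level `untreated`/`treated` factor from the example code, so the two drought time points stay distinct instead of being lumped together as "treated".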


Cuffdiff normalization using 2 conditions
