
  • Cuffdiff normalization using 2 conditions

    Hi,

    I have a question. In my project, I have reads from 2 conditions (Control/Infected) for both leaf and root. I am using Cuffdiff to normalize the data and test for differential gene expression, but I noticed something odd.

    I tested my data in two different ways:

    1. In Cuffdiff's parameters I put 4 conditions (LeafControl, LeafInfected, RootControl and RootInfected), each with 3 replicates, and Cuffdiff reports 350 differentially expressed genes for LeafControl vs. LeafInfected.

    2. In Cuffdiff's parameters I put 2 conditions (LeafControl and LeafInfected), each with 3 replicates, and Cuffdiff reports 101 differentially expressed genes.

    Why? I think it is the way it normalizes. In the first run the normalization includes all conditions, i.e. both Leaf and Root replicates, while the second uses only one tissue (Leaf).

    In my opinion, if you want to see the DE within one tissue (Leaf or Root), you have to normalize the read mappings coming from Cufflinks separately.

    Could someone help me and explain why I get different values for the same comparison? Which is the right value, 350 or 101? I don't know if it helps, but the two results have 98 genes in common.

  • #2
    Cuffdiff only does pairwise comparisons (two conditions at a time). For a more complex experimental design you may need to use more powerful software like DESeq2, which will let you fit an additive model (read count ~ tissue + treatment), or an interaction model, etc.
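To make the difference between an additive and an interaction model concrete, here is a minimal sketch of the dummy-coded design matrix each one implies. It is only a conceptual illustration in plain Python: DESeq2 builds this matrix itself from the formula, and the sample sheet and reference levels (leaf, control) are assumptions chosen for the example.

```python
from itertools import product

# Hypothetical sample sheet: 2 tissues x 2 treatments (replicates omitted).
samples = [
    {"tissue": t, "treatment": c}
    for t, c in product(["leaf", "root"], ["control", "infected"])
]

def design_row(sample, interaction=False):
    """Return one row of a dummy-coded design matrix.

    Assumed reference levels: tissue=leaf, treatment=control.
    """
    is_root = int(sample["tissue"] == "root")
    is_infected = int(sample["treatment"] == "infected")
    row = [1, is_root, is_infected]        # intercept, tissue, treatment
    if interaction:
        row.append(is_root * is_infected)  # tissue:treatment term
    return row

additive = [design_row(s) for s in samples]
interaction = [design_row(s, interaction=True) for s in samples]

for s, row in zip(samples, interaction):
    print(s["tissue"], s["treatment"], row)
```

In the additive model the infection effect is forced to be the same in leaf and root; the extra column in the interaction model lets it differ between tissues.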



    • #3
      Right, but Cuffdiff normalizes differently when it has 4 conditions versus 2 conditions, right? I think this is the cause of the different numbers of DE genes.

      Thanks for your answer.



      • #4
        I think you're talking about significance testing, not normalization, so of course there are more significant results when you provide more data.



        • #5
          The Cufflinks manual discusses the normalization and dispersion estimation methods (all the way at the bottom at http://cufflinks.cbcb.umd.edu/manual.html). There are actually multiple options to choose from. Quoting the manual:
          Cuffdiff works by modeling the variance in fragment counts across replicates as a function of the mean fragment count across replicates. Strictly speaking, it models a quantity called dispersion - the variance present in a group of samples beyond what is expected from a simple Poisson model of RNA-Seq. You can control how Cuffdiff constructs its model of dispersion in locus fragment counts. Each condition that has replicates can receive its own model, or Cuffdiff can use a global model for all conditions. All of these policies are identical to those used by DESeq (Anders and Huber, Genome Biology, 2010).

          Dispersion Method   Description
          pooled              Each replicated condition is used to build a model, then these models are averaged to provide a single global model for all conditions in the experiment. (Default)
          per-condition       Each replicated condition receives its own model. Only available when all conditions have replicates.
          blind               All samples are treated as replicates of a single global "condition" and used to build one model.
          poisson             The Poisson model is used, where the variance in fragment count is predicted to equal the mean across replicates. Not recommended.

          Which method you choose largely depends on whether you expect variability in each group of samples to be similar. For example, if you are comparing two groups, A and B, where A has low cross-replicate variability and B has high variability, it may be best to choose per-condition. However, if the conditions have similar levels of variability, you might stick with the default, which sometimes provides a more robust model, especially in cases where each group has few replicates. Finally, if you only have a single replicate in each condition, you must use blind, which treats all samples in the experiment as replicates of a single condition. This method works well when you expect the samples to have very few differentially expressed genes. If there are many differentially expressed genes, Cuffdiff will construct an overly conservative model and you may not get any significant calls. In this case, you will need more replicates in your experiment.
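A toy calculation may help show why the set of conditions matters for the dispersion model. The sketch below uses a simple method-of-moments estimate for a single gene under the negative binomial relation var = mu + alpha * mu^2. This is not Cuffdiff's actual fitting procedure (which models dispersion as a function of the mean across many genes), and all counts are made up.

```python
from statistics import mean, variance

def dispersion(counts):
    """Method-of-moments estimate of alpha in var = mu + alpha * mu**2."""
    mu = mean(counts)
    return max(0.0, (variance(counts) - mu) / mu ** 2)

# Hypothetical normalized counts for ONE gene, three replicates per condition.
counts = {
    "LeafControl":  [100, 130, 70],
    "LeafInfected": [200, 170, 245],
    "RootControl":  [20, 60, 40],
    "RootInfected": [35, 80, 15],
}

def pooled(conditions):
    """'pooled' policy: average the per-condition estimates."""
    return mean(dispersion(counts[c]) for c in conditions)

def blind(conditions):
    """'blind' policy: treat every sample as a replicate of one condition."""
    return dispersion([k for c in conditions for k in counts[c]])

leaf = ["LeafControl", "LeafInfected"]
print("pooled, leaf only:", round(pooled(leaf), 4))
print("pooled, all four :", round(pooled(list(counts)), 4))
print("blind, leaf only :", round(blind(leaf), 4))
```

With the pooled policy, the noisier root replicates in this example pull the shared estimate away from what the leaf samples alone would give, so the same leaf comparison is tested against a different variance model.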



          • #6
            Hello all,

            I am trying to process an RNA-Seq sample that I received. I followed exactly the method described in the Nature Protocols paper (Trapnell et al., 2012) and am now confused at the cuffdiff step.
            So could anyone please suggest the command for getting my desired output?

            I need Cuffdiff to generate output for each sample (separate FPKM values for each replicate as well).


            When I executed cuffdiff as in the line below, I got replicate-merged output. I mean the two replicates are merged, and ultimately there is one output each for control, treatment 1 and treatment 2.

            cuffdiff -o phos -b Syn.fa -p 8 -L c1,2t,4t -u merged_phos/merged.gtf \
                ./ctrl_rep-1/accepted_hits.bam,./ctrl_rep-2/accepted_hits.bam \
                ./treat_1_rep-1/accepted_hits.bam,./treat_1_rep-2/accepted_hits.bam \
                ./treat_2_rep-1/accepted_hits.bam,./treat_2_rep-2/accepted_hits.bam



            My samples are as follows,

            ctrl_rep-1
            ctrl_rep-2

            treat_1_rep-1
            treat_1_rep-2

            treat_2_rep-1
            treat_2_rep-2



            Thanks
            Han
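One hedged pointer on the per-replicate question: as far as I know, Cuffdiff 2.x releases also write long-format per-replicate tables such as genes.read_group_tracking into the output directory (phos/ in the command above). If your version does, a small script can pivot that table into one FPKM column per replicate. The column names and values below are assumptions based on that file's usual layout, so check them against your own file.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-in for genes.read_group_tracking (all values are made up).
EXAMPLE = """\
tracking_id\tcondition\treplicate\traw_frags\tinternal_scaled_frags\texternal_scaled_frags\tFPKM\teffective_length\tstatus
geneA\tc1\t0\t120\t118.2\t118.2\t10.5\t-\tOK
geneA\tc1\t1\t131\t127.9\t127.9\t11.4\t-\tOK
geneA\t2t\t0\t260\t255.0\t255.0\t22.7\t-\tOK
geneA\t2t\t1\t240\t243.1\t243.1\t21.6\t-\tOK
"""

def per_replicate_fpkm(handle):
    """Pivot the long table into {gene: {"condition_replicate": FPKM}}."""
    table = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter="\t"):
        sample = f'{row["condition"]}_{row["replicate"]}'
        table[row["tracking_id"]][sample] = float(row["FPKM"])
    return dict(table)

fpkm = per_replicate_fpkm(io.StringIO(EXAMPLE))
print(fpkm["geneA"])
```

For a real run, replace `io.StringIO(EXAMPLE)` with `open("phos/genes.read_group_tracking")`, assuming that file exists in your output.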



            • #7
              Cuffdiff is very limited in the kinds of comparisons it can do. It doesn't let you see inter-replicate variation like you see inter-group variation. If you want to do a more powerful analysis like that, you need to switch software. I would use featureCounts + DESeq2 for this.

              That will also give you better normalizations than FPKM (DESeq2's variance-stabilizing transformation and regularized log) if you want to do more than just significance testing. Here is the inventor of FPKM explaining why you shouldn't use FPKM: https://www.youtube.com/watch?v=5NiFibnbE8o&t=30m38s
              Last edited by jwfoley; 08-12-2014, 06:05 AM.



              • #8
                Hi Han.

                Cuffdiff doesn't produce output with FPKM per replicate. It has one output file that shows the FPKM per condition. You only have to parse that file and split it into the samples that you want.

                Or, for a quick analysis, you could run Cuffdiff using only:
                ctrl_rep-1
                ctrl_rep-2

                treat_1_rep-1
                treat_1_rep-2

                to see the difference between the two conditions.

                Once, using Cuffdiff, I analyzed differential gene expression using all the samples I had (Root_ctrl, Root_treat, Leaf_ctrl and Leaf_treat), and afterwards I ran Cuffdiff using only the Leaf data (ctrl and treat).

                When I compared the differentially expressed Leaf genes between these two analyses, the numbers were different, because the normalization and the dispersion model change when you remove or add samples.



                Lucas
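The effect Lucas describes can be reproduced on paper with the median-of-ratios (geometric) scaling from DESeq, which, as I understand it, is also Cuffdiff's default library normalization. The sketch below is a simplified stand-alone version with made-up counts, not Cuffdiff's code: each sample is compared against a per-gene geometric mean taken over every sample in the run, so adding the root libraries changes the leaf scaling too.

```python
from math import exp, log
from statistics import median

def size_factors(counts):
    """counts: {sample: [count per gene]} -> {sample: size factor}."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    ratios = {s: [] for s in samples}
    for g in range(n_genes):
        row = [counts[s][g] for s in samples]
        if min(row) == 0:  # genes with a zero count are skipped
            continue
        # Reference value: geometric mean over ALL samples passed in.
        geo_mean = exp(sum(log(k) for k in row) / len(row))
        for s, k in zip(samples, row):
            ratios[s].append(k / geo_mean)
    return {s: median(r) for s, r in ratios.items()}

# Hypothetical counts for 5 genes; root has a different expression profile.
counts = {
    "LeafControl":  [100, 50, 30, 400, 10],
    "LeafInfected": [150, 90, 35, 500, 30],
    "RootControl":  [10, 300, 80, 100, 200],
    "RootInfected": [20, 500, 60, 150, 500],
}

leaf_only = size_factors({s: counts[s] for s in ("LeafControl", "LeafInfected")})
all_four = size_factors(counts)

ratio_two = leaf_only["LeafInfected"] / leaf_only["LeafControl"]
ratio_four = all_four["LeafInfected"] / all_four["LeafControl"]
print("Leaf scaling ratio, 2 conditions:", ratio_two)
print("Leaf scaling ratio, 4 conditions:", ratio_four)
```

The relative scaling of the two leaf libraries differs between the two runs, so the normalized leaf counts that go into the test differ as well.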



                • #9
                  Thanks for making me aware of the limitations of Cuffdiff.
                  Based on your instructions, I modified the strategy as follows. Kindly tell me whether I am correct or not.

                  Input the SAM/BAM files to featureCounts. Then the count table (generated as the output of featureCounts) is given as input to DESeq2 for analyzing the expression of each sample, including the replicates of each condition.

                  Han,
                  ROK



                  • #10
                    Yes, that's the idea. Of course you'll also need a GTF for featureCounts. You can use the transcripts.gtf from Cufflinks, though of course you'll get a lot of unannotated transcripts this way; or you can use a database annotation, which will be missing a lot of transcripts or parts of transcripts.
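The step from featureCounts output to a plain count matrix can be sketched as follows. This assumes the usual featureCounts table layout (a leading comment line, then Geneid, Chr, Start, End, Strand, Length, then one column per BAM file); the sample names and counts are made up.

```python
import csv
import io

# Stand-in for a featureCounts output table (all values are made up).
EXAMPLE = """\
# Program:featureCounts (comment line written at the top of the file)
Geneid\tChr\tStart\tEnd\tStrand\tLength\tctrl_rep-1.bam\tctrl_rep-2.bam\ttreat_1_rep-1.bam
gene1\tchr1\t100\t900\t+\t801\t52\t47\t110
gene2\tchr1\t2000\t2500\t-\t501\t0\t3\t1
"""

ANNOTATION_COLUMNS = ("Chr", "Start", "End", "Strand", "Length")

def read_count_matrix(handle):
    """Return (sample_names, {gene_id: [integer counts per sample]})."""
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    samples = [c for c in reader.fieldnames
               if c != "Geneid" and c not in ANNOTATION_COLUMNS]
    matrix = {row["Geneid"]: [int(row[s]) for s in samples] for row in reader}
    return samples, matrix

samples, matrix = read_count_matrix(io.StringIO(EXAMPLE))
print(samples)
print(matrix)
```

Only the gene IDs and the per-sample integer counts are kept, which is the shape DESeq2 expects for its count matrix.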



                    • #11
                      Hi jwfoley,

                      Thank you very much for the quick reply..

                      Han



                      • #12
                        Following the suggestion, I obtained a count matrix from featureCounts. However, I have 2 questions to ask.

                        1. In the read counting step, only 47% of reads were successfully assigned to the meta-feature "gene". Is that a low value?

                        2. In the DESeq2 analysis, I have trouble setting up the input for control and treatment because of my lack of knowledge of R. My samples are:

                        control-1    drought 2days-1    drought 4 days-1
                        control-2    drought 2days-2    drought 4 days-2

                        I tried to follow the method explained in the manual by Love et al., and I saw sample code for loading and setting up the count matrix as follows:

                        1. library("pasilla")
                        2. library("Biobase")
                        3. data("pasillaGenes")
                        4. countData <- counts(pasillaGenes)
                        5. colData <- pData(pasillaGenes)[,c("condition","type")]

                        6. dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ condition)

                        7. dds$condition <- factor(dds$condition,levels=c("untreated","treated"))

                        Since I am using two drought-treated sample groups, I think I should modify line 5 and line 7. Can anyone suggest how to set those parameters?

                        I modified the header of the count matrix to: gene_id untreated1 untreated2 treated1 treated2 treated3 treated4

                        Thanks,

                        Han
                        Last edited by anikng; 08-18-2014, 11:25 PM.
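On question 1, whether 47% is low probably depends on where the remaining reads went. Assuming featureCounts also wrote its usual .summary file next to the count table, a short script can turn it into fractions per category. The category names and numbers below are illustrative; the exact set of rows varies between featureCounts versions.

```python
import csv
import io

# Stand-in for a featureCounts .summary file (all values are made up).
EXAMPLE = """\
Status\tctrl_rep-1.bam
Assigned\t4700000
Unassigned_Ambiguity\t300000
Unassigned_MultiMapping\t1500000
Unassigned_NoFeatures\t3000000
Unassigned_Unmapped\t500000
"""

def assignment_breakdown(handle):
    """Return {status: fraction of all reads} for the first sample column."""
    rows = [r for r in csv.reader(handle, delimiter="\t") if r][1:]  # skip header
    counts = {status: int(n) for status, n, *_ in rows}
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items()}

breakdown = assignment_breakdown(io.StringIO(EXAMPLE))
for status, fraction in breakdown.items():
    print(f"{status:28s}{fraction:6.1%}")
```

A large no-features share would point at the annotation, whereas a large multi-mapping or unmapped share would point at the alignment step instead.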



                        • #13
                          Lines 1 through 5 are all for importing an example data set. If you want to use your data instead of the example, you don't need any of those.

                          You need to import your own data, create your own data frame of factors, and set your own model design, then use DESeqDataSetFromMatrix to create a DESeqDataSet object and proceed normally.
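As a hedged illustration of the "data frame of factors" step for Han's six samples, the sketch below derives a sample table with a three-level condition column from the count matrix column names and writes it as CSV. The column names are hypothetical; the point is that the design has three conditions (control, 2-day drought, 4-day drought) rather than "untreated"/"treated", and that the rows follow the same order as the count matrix columns.

```python
import csv
import io

# Column order of the count matrix (hypothetical names for the six samples).
count_columns = [
    "control_1", "control_2",
    "drought2d_1", "drought2d_2",
    "drought4d_1", "drought4d_2",
]

def sample_table(columns):
    """Derive one row per sample: sample name, condition, replicate."""
    rows = []
    for name in columns:
        condition, replicate = name.rsplit("_", 1)
        rows.append({"sample": name, "condition": condition,
                     "replicate": replicate})
    return rows

rows = sample_table(count_columns)

# Write the table as CSV text; swap io.StringIO for a real file as needed.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["sample", "condition", "replicate"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

A table like this could then be read into R and passed as colData to DESeqDataSetFromMatrix with design = ~ condition, taking "control" as the reference level.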
