  • zorin
    Junior Member
    • Feb 2014
    • 2

    cuffquant count data as input for DESeq/DEXSeq

    Hi community,

    The latest Cufflinks release (2.2.0) comes with two new tools, cuffquant and cuffnorm. The latter generates expression and count tables, normalized for library size, at the level of transcripts, primary transcripts and genes.

    I was wondering whether these normalized counts can be used with one of the 'count-based' methods like DESeq/DEXSeq/edgeR, circumventing their normalization methods.

    In other words, can I use e.g. the DESeq nbinomTest() function with these cuffnorm-generated data?

    Thanks.
  • id0
    Senior Member
    • Sep 2012
    • 130

    #2
    According to the cuffnorm documentation:
    Cuffnorm will report both FPKM values and normalized, estimates for the number of fragments that originate from each gene, transcript, TSS group, and CDS group. Note that because these counts are already normalized to account for differences in library size, they should not be used with downstream differential expression tools that require raw counts as input.
    So they specifically warn against using cuffnorm counts for tools that require raw counts.

    I actually have a follow-up question. If these are just normalized counts, why can't they be used? When another tool re-normalizes them, wouldn't they come out the same as if they had never been normalized? The initial normalization shouldn't lose any information.


    • gringer
      David Eccles (gringer)
      • May 2011
      • 845

      #3
      No, this is not an appropriate thing to do for either DESeq or edgeR. They assume raw counts are used as input, and these have a particular distribution that is assumed by the programs. The programs use the assumed distribution to estimate biological variation and determine statistical significance. While your normalised count values may be similar (or the same), the probability calculations will likely be off.
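
The distributional point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the counts are simulated as Poisson (real RNA-seq counts are usually overdispersed), and the gene mean of 100 and the size factor of 2 are made-up values. It shows that dividing counts by a size factor changes the variance-to-mean relationship that count-based models rely on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical gene: 10,000 simulated replicates with a true mean of 100 fragments.
raw = rng.poisson(lam=100, size=10_000)

# Assumed size factor: pretend the library was twice as deep as the reference.
size_factor = 2.0
normalised = raw / size_factor


def variance_to_mean(x):
    """Variance-to-mean ratio; about 1 for Poisson-distributed counts."""
    return x.var(ddof=1) / x.mean()


ratio_raw = variance_to_mean(raw)
ratio_normalised = variance_to_mean(normalised)
print(f"raw: {ratio_raw:.2f}  normalised: {ratio_normalised:.2f}")
```

The raw counts give a ratio near 1, while the scaled values give a ratio near 1 divided by the size factor. A tool that assumes count-like noise would therefore misjudge how variable the normalised values are.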


      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        There seems to be a common misconception that tools like DESeq(2) actually store the normalized counts somewhere. They don't, in fact, which is why trying to input normalized counts will lead to no end of problems.
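
To make this concrete, here is a minimal sketch of median-of-ratios size factors, the general idea behind DESeq-style library size estimation. It is not DESeq's actual code, and the toy matrix (four genes, the second sample sequenced twice as deep) is invented for illustration. The size factors are computed from the raw counts and kept alongside them; once the counts have been pre-normalised, the estimated factors collapse to 1 and the depth information is no longer available to the model.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes with a non-zero count in every sample, so the logs are finite.
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    # Per-gene log geometric mean across samples acts as a pseudo-reference.
    log_reference = log_counts.mean(axis=1, keepdims=True)
    # Median over genes of each sample's log ratio to the reference.
    return np.exp(np.median(log_counts - log_reference, axis=0))


# Toy data (assumption): sample 2 is sample 1 sequenced at twice the depth.
base = np.array([[10], [200], [35], [80]])
raw = base * np.array([1, 2])

sf_raw = size_factors(raw)
sf_prenormalised = size_factors(raw / sf_raw)
print(sf_raw, sf_prenormalised)
```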


        • zaki
          Member
          • Dec 2012
          • 15

          #5
          Following up on the original question:

          If we were to use cuffnorm with --library-norm-method parameter specifying classic-fpkm, can the count data be used for DESeq/DESeq(2)?

          classic-fpkm - Library size factor is set to 1 - no scaling applied to FPKM values or fragment counts. (default for Cufflinks)
          Does this mean that library size normalization was not applied, and that the count data can therefore be treated as raw counts?


          • raphael123
            Member
            • Dec 2013
            • 37

            #6
            Do you think there is a way to get the raw counts in a format that DESeq can read?
            Or a way to read the binary file? I can't find one!


            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              What binary file? If you mean the BAM file, just use featureCounts or htseq-count.


              • raphael123
                Member
                • Dec 2013
                • 37

                #8
                No, the raw count table:

                Cuffquant writes a single output file, abundances.cxb, into the output directory. CXB files are binary files, and can be passed to Cuffnorm or Cuffdiff for further processing.
                I would like to analyse the raw counts with DESeq2.


                • dpryan
                  Devon Ryan
                  • Jul 2011
                  • 3478

                  #9
                  I'm sure it's theoretically possible to read the CXB file, but since its format seems to have never been documented, you'd have to go through the source code and reverse-engineer its format. It'd be faster to just ignore it.


                  • raphael123
                    Member
                    • Dec 2013
                    • 37

                    #10
                    Thanks for your answer!
                    So there is no way to get the read counts from the cuff-tools? Maybe I'm missing something here.


                    • dpryan
                      Devon Ryan
                      • Jul 2011
                      • 3478

                      #11
                      Hard to say, there are a lot of undocumented areas of those programs. It's quick enough to just use featureCounts.


                      • raphael123
                        Member
                        • Dec 2013
                        • 37

                        #12
                        Oh! So featureCounts is a tool that builds a count table from a SAM/BAM file?
                        Thank you!


                        • dpryan
                          Devon Ryan
                          • Jul 2011
                          • 3478

                          #13
                          Yes, it's similar to htseq-count, though significantly faster.
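
For anyone who wants the resulting raw counts in a readable form, the sketch below parses a featureCounts-style table. It assumes the usual layout: a leading comment line starting with `#`, then a tab-separated header whose first six columns are Geneid, Chr, Start, End, Strand and Length, followed by one count column per BAM file. It also assumes integer counts. The function name `read_featurecounts` and the example rows are made up for illustration.

```python
import csv
import io


def read_featurecounts(handle):
    """Return (sample_names, {gene_id: [counts...]}) from a featureCounts-style table."""
    # Skip the leading comment line(s) that record the program and command.
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    samples = header[6:]  # columns after Geneid, Chr, Start, End, Strand, Length
    counts = {row[0]: [int(v) for v in row[6:]] for row in rows if row}
    return samples, counts


# Made-up example input, not real featureCounts output.
example = io.StringIO(
    "# hypothetical comment line\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\ta.bam\tb.bam\n"
    "geneA\tchr1\t100\t900\t+\t801\t10\t12\n"
    "geneB\tchr2\t50\t450\t-\t401\t0\t7\n"
)
samples, counts = read_featurecounts(example)
print(samples, counts)
```

The integer matrix built this way is the kind of raw-count input that DESeq2 and edgeR expect.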


                          • gringer
                            David Eccles (gringer)
                            • May 2011
                            • 845

                            #14
                            To repeat myself, you shouldn't be using cufflinks output as input to DESeq2, because DESeq2 expects raw count data and depends on that for its model.

                            If you want to do isoform-level analysis with a DESeq-like workflow, look at DEXSeq, which has its own counting method based on raw counts for exon bins.


                            • shi
                              Wei Shi
                              • Feb 2010
                              • 236

                              #15
                              Another option is to use limma/voom, which accepts fractional counts.
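
As a rough illustration of why fractional counts are not a problem there, the sketch below computes the log2 counts-per-million transform that voom applies, with an offset of 0.5 on the counts and 1 on the library sizes. This covers only the transform; voom additionally fits a mean-variance trend and derives precision weights, which are not shown. The function name `log_cpm` and the toy matrix are assumptions for illustration.

```python
import numpy as np


def log_cpm(counts):
    """log2 counts per million for a genes x samples matrix, with voom-style offsets."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)  # total per sample (column)
    return np.log2((counts + 0.5) / (lib_sizes + 1.0) * 1e6)


# Toy fractional counts (assumption): two genes, two samples.
fractional = np.array([[10.5, 20.0],
                       [89.5, 180.0]])
values = log_cpm(fractional)
print(values)
```

Nothing in the transform requires integers, so expected or fractional fragment counts pass through it unchanged in form.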
