
  • #16
    Originally posted by Marianna85
    With both normalization methods I obtain very different size factors:
    - using DESeq: 0.095 for one library and 10.85 for the other.
    - using edgeR: 0.14 and 7.2, respectively.
    This is because edgeR gives its norm factors relative to the total read count. To get expression values on a scale comparable across samples, you have to divide the counts
    - for DESeq, just by the size factor
    - for edgeR, by the total read counts and by the normalization factors

    Comment


    • #17
      Hi Simon,
      happy to read your answer

      Originally posted by Simon Anders
      This is because edgeR gives its norm factors relative to the total read count. To get expression values on a scale comparable across samples, you have to divide the counts
      - for DESeq, just by the size factor
      - for edgeR, by the total read counts and by the normalization factors
      Just an example to better understand.
      For DESeq
      gene A raw counts: 5 reads in library 1, 70 reads in library 2
      library 1: 36 million reads, size factor: 0.09
      library 2: 64 million reads, size factor: 10
      gene A normalized counts: lib 1 = 5/0.09, lib 2 = 70/10

      For edgeR
      gene A raw counts: 5 reads in library 1, 70 reads in library 2
      library 1: 36 million reads, normalization factor: 0.14
      library 2: 64 million reads, normalization factor: 7.2
      gene A normalized counts: lib 1 = (5/36 million)/0.14, lib 2 = (70/64 million)/7.2

      In this case, with such a large difference in library size, it seems better to normalize with edgeR. Is that right?
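To make the arithmetic in this example concrete, here is a minimal base-R sketch that plugs in the numbers quoted above. It does not call DESeq or edgeR; the variable names are made up for illustration, and the scaling to counts per million is only there for readability. The two results are on different scales, so what can be compared between the methods is the ratio between libraries, not the values themselves.

```r
# Numbers taken from the example in this post; nothing here calls DESeq or edgeR.
raw_counts <- c(lib1 = 5, lib2 = 70)          # gene A raw counts
lib_sizes  <- c(lib1 = 36e6, lib2 = 64e6)     # total reads per library
deseq_sf   <- c(lib1 = 0.09, lib2 = 10)       # DESeq size factors
edger_nf   <- c(lib1 = 0.14, lib2 = 7.2)      # edgeR normalization factors

# DESeq-style: divide the raw counts by the size factor only.
deseq_norm <- raw_counts / deseq_sf

# edgeR-style: divide by the effective library size (library size times
# normalization factor), then scale to counts per million.
edger_cpm <- raw_counts / (lib_sizes * edger_nf) * 1e6

print(round(deseq_norm, 2))
print(round(edger_cpm, 3))

# Ratio of library 1 to library 2 after each normalization.
print(deseq_norm[["lib1"]] / deseq_norm[["lib2"]])
print(edger_cpm[["lib1"]] / edger_cpm[["lib2"]])
```

With these factors, both scalings put gene A several-fold higher in library 1, even though the raw count is 14 times higher in library 2. That reversal comes entirely from the extreme factors.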

      Thanks a lot.
      I really appreciate your answer.

      Marianna

      Comment


      • #18
        I am using both edgeR and DESeq to analyze my own dataset. In general, the fold changes reported by both are close (as they should be, right?).

        If your original libraries have 36 vs. 64 million total reads, your normalization factors from both methods are very weird (at least quite unusual). Please check whether you are analyzing your dataset properly.

        The idea of normalization behind edgeR and DESeq is very similar (but the implementation is different). In practice, I don't see that one method is superior to the other. However, it is definitely a mistake to feed DESeq with the normalization factors from edgeR, and vice versa.

        Comment


        • #19
          Something is going wrong somewhere: 36M and 64M reads should give normalization factors with less than a 2-fold difference. The normalization factors you listed had a 50-fold difference and suggest a much greater difference in read count.
          Last edited by Jeremy; 03-04-2013, 05:50 PM.
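As a rough illustration of the point above, the following base-R sketch simulates two libraries whose depths are in a 36:64 ratio and computes median-of-ratios size factors, the idea behind DESeq's estimateSizeFactors. It is a re-implementation for illustration only, not the package code; the simulated expression levels, the gene count, and the function name `median_of_ratios` are all assumptions.

```r
set.seed(1)
n_genes <- 2000

# Hypothetical per-gene expression levels shared by both libraries.
expr <- rlnorm(n_genes, meanlog = 3, sdlog = 1)

# Two simulated libraries whose depths are in a 36:64 ratio.
counts <- cbind(lib1 = rpois(n_genes, expr * 36),
                lib2 = rpois(n_genes, expr * 64))

# Median-of-ratios size factors: for each sample, take the median over genes
# of the count divided by the gene's geometric mean across samples.
median_of_ratios <- function(counts) {
  log_geo_means <- rowMeans(log(counts))   # -Inf for genes with a zero count
  usable <- is.finite(log_geo_means)       # drop genes with any zero count
  apply(counts, 2, function(col) {
    exp(median(log(col[usable]) - log_geo_means[usable]))
  })
}

size_factors <- median_of_ratios(counts)
print(size_factors)

# Ratio between the two size factors; expected to be close to 64/36.
sf_ratio <- size_factors[["lib2"]] / size_factors[["lib1"]]
print(sf_ratio)
```

When the only difference between the libraries is sequencing depth, the ratio of the two size factors should land near 64/36, or about 1.8. A 50- or 100-fold ratio therefore points to something other than depth.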

          Comment


          • #20
            Originally posted by Jeremy
            Something is going wrong somewhere: 36M and 64M reads should give normalization factors with less than a 2-fold difference. The normalization factors you listed had a 50-fold difference and suggest a much greater difference in read count.
            Hi Jeremy,
            in fact I was surprised to obtain such a difference...
            I have not yet understood where the mistake in the size factor calculation is.

            Comment


            • #21
              Have you already looked at a scatter plot comparing the counts for the two samples? This should clarify what is going on.

              Comment


              • #22
                Originally posted by Simon Anders
                Have you already looked at a scatter plot comparing the counts for the two samples? This should clarify what is going on.
                Simon, do you mean the estimateDispersions?

                Comment


                • #23
                  This is the script I used

                  library( "DESeq" )
                  # Read the raw count table: one row per gene, one column per sample
                  CountTable = read.table( "decEggs.txt", header=TRUE, row.names=1 )
                  head( CountTable )
                  # Describe the two samples
                  decDesign = data.frame( row.names = colnames( CountTable ), condition = c( "stripped", "spawned" ), libType = c( "paired-end", "paired-end" ) )
                  decDesign
                  # Both libraries are paired-end, so this keeps both conditions
                  pairedSamples = decDesign$libType == "paired-end"
                  condition = decDesign$condition[ pairedSamples ]
                  # Build the count data set and estimate the size factors
                  cds = newCountDataSet( CountTable, condition )
                  cds = estimateSizeFactors( cds )
                  sizeFactors( cds )
                  head( counts( cds, normalized=TRUE ) )


                  and the size factors have a 100-fold difference...
                  What should I do?

                  Comment


                  • #24

                    Comment


                    • #25
                      Originally posted by Marianna85
                      Simon, do you mean the estimateDispersions?
                      No, I mean a scatter plot of the reads.

                      Try, e.g.,

                      Code:
                      plot( log10( 1 + counts(cds)[1,] ), log10( 1 + counts(cds)[2,] ), pch="." )
                      to plot the raw, unnormalized read counts of the second sample versus the first on a log scale.

                      Comment


                      • #26
                        Hi Simon,
                        the plot seems empty...
                        may I change the axis scale?

                        Comment


                        • #27
                          This will be hard to debug via the forum. You may need to get some local help.

                          To try one thing: If you simply type "counts(cds)", you get your table of raw counts (or, if you just want the first 100 lines, try "head( counts(cds), 100 )"). Check whether they make sense.
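A few quick checks along these lines can be done in base R. The sketch below uses a made-up toy matrix as a stand-in for the output of counts(cds); the gene names and values are purely hypothetical, and only the column names are borrowed from the script earlier in the thread.

```r
# Toy stand-in for counts(cds); the values are made up for illustration.
raw <- cbind(stripped = c(120, 0, 35, 980, 14),
             spawned  = c(210, 3, 60, 1700, 22))
rownames(raw) <- paste0("gene", 1:5)

# Total reads per sample and the depth ratio between the samples.
lib_sizes <- colSums(raw)
print(lib_sizes)
print(max(lib_sizes) / min(lib_sizes))

# Number of genes with zero counts in each sample.
print(colSums(raw == 0))

# Raw counts should be whole numbers, not already-normalized values.
print(all(raw == round(raw)))

# Quartiles per sample, to spot a column on a very different scale.
print(apply(raw, 2, quantile, probs = c(0.25, 0.5, 0.75)))
```

If the column totals differ by less than 2-fold while the size factors differ by 100-fold, the count table itself (wrong column, wrong file, mostly zeros in one sample) is a likely place to look.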

                          Comment


                          • #28
                            Sorry, I made a typo. It's

                            Code:
                            plot( log10( 1 + counts(cds)[,1] ), log10( 1 + counts(cds)[,2] ), pch="." )

                            Comment


                            • #29
                              Of course! I had indexed the rows of cds, not the columns.
                              So this is the plot...



                              Does anything look strange to you?

                              Comment


                              • #30
                                ".emf"? That's Windows extended metafile, right? Haven't seen this graphics file format in ten years, and frankly, I have no idea how to open it. Could you use something more common, please, maybe png?

                                Comment
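For sharing the plot as a PNG, one option is R's built-in png() graphics device. The sketch below is only an illustration: it uses simulated counts as a stand-in for counts(cds), and the file name and image size are arbitrary choices.

```r
# Simulated counts stand in for counts(cds); the file name is hypothetical.
set.seed(1)
sample1 <- rpois(1000, lambda = 50)
sample2 <- rpois(1000, lambda = 90)

out_file <- file.path(tempdir(), "counts_scatter.png")

# Open a PNG graphics device, draw the scatter plot, then close the device
# so that the file is actually written to disk.
png(out_file, width = 600, height = 600)
plot(log10(1 + sample1), log10(1 + sample2), pch = ".",
     xlab = "log10(1 + counts, sample 1)",
     ylab = "log10(1 + counts, sample 2)")
invisible(dev.off())

print(file.exists(out_file))
```

With a real data set, the two simulated vectors would be replaced by counts(cds)[,1] and counts(cds)[,2], as in the corrected plot command above.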
