  • different gene numbers in edgeR and DESeq

    Hi all,

    I used edgeR and DESeq for an RNA-seq analysis. edgeR gave me 236 genes, while DESeq gave me 49 genes, and all 49 DESeq genes are among the 236 edgeR genes. Is this normal?

    I don't know whether this is caused by the different filtering steps. In edgeR, I used the following filter:
    > keep<-rowSums(cpm(y)>1)>=3
    > y<-y[keep,]

    In DESeq, I used the following filter:
    > rs=rowSums(counts(cds))
    > theta=0.3
    > use=(rs>quantile(rs,probs=theta))
    > table(use)
    > cdsFit=cds[use,]

    Could anyone give me some suggestions? Thank you!

    Best,

    Sadiexiaoyu

  • #2
    It's probably the filtering. Don't just blindly use the 0.3 value for DESeq or 1 cpm for edgeR; you should tailor those thresholds to your particular dataset (the results will probably be more similar then).
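To make the point about thresholds concrete, here is a minimal sketch in base R of how the two filters from the first post behave as their thresholds change. Everything in it is an assumption for illustration: the count matrix is simulated, `cpm_manual` is a hand-rolled stand-in for `cpm()` from edgeR (without normalization factors), and the helper names `filter_cpm` and `filter_quantile` are made up.

```r
# Simulated count matrix: 2000 hypothetical genes x 6 samples (not real data).
set.seed(1)
n_genes <- 2000
mu <- rexp(n_genes, rate = 1 / 50)                 # per-gene mean expression
counts <- matrix(rnbinom(n_genes * 6, mu = rep(mu, 6), size = 5),
                 nrow = n_genes, ncol = 6,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", 1:6)))

# Counts per million computed by hand (library size = column sum).
cpm_manual <- t(t(counts) / colSums(counts)) * 1e6

# Filter A: at least `min_samples` samples above a CPM cutoff.
filter_cpm <- function(cutoff, min_samples = 3) {
  rowSums(cpm_manual > cutoff) >= min_samples
}

# Filter B: total count above the `theta` quantile of all row sums.
filter_quantile <- function(theta) {
  rs <- rowSums(counts)
  rs > quantile(rs, probs = theta)
}

# How many genes survive each filter as the threshold changes?
cpm_cutoffs <- c(0.5, 1, 2, 5)
thetas <- c(0.1, 0.3, 0.4, 0.5)
kept_cpm <- sapply(cpm_cutoffs, function(k) sum(filter_cpm(k)))
kept_quantile <- sapply(thetas, function(th) sum(filter_quantile(th)))
print(data.frame(cpm_cutoffs, kept_cpm, thetas, kept_quantile))

# Cross-tabulate the two filters at the thresholds used in the thread.
print(table(cpm_filter = filter_cpm(1), quantile_filter = filter_quantile(0.3)))
```

The off-diagonal cells of the last table are the genes that one filter keeps and the other drops. On real data, a table like this gives a quick sense of whether the filters alone could explain a difference in hit counts.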



    • #3
      Hi dpryan,
      Thank you for your reply. I filtered my data in DESeq according to this paper (I chose FDR 0.01): http://www.bioconductor.org/packages..._filtering.pdf
      But I do not know how to set the same filtering parameters in both DESeq and edgeR. Is there any code in edgeR that does the same filtering as in DESeq, or vice versa?

      Best,

      Sadiexiaoyu



      • #4
        The easiest approach would be to take the method you used in DESeq and apply it to the edgeR case:

        Code:
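        library(edgeR)
        y <- DGEList(counts = Data, group = condition)
        rs <- rowSums(Data)                      # total count per gene across all samples
        theta <- 0.3                             # fraction of lowest-count genes to drop
        use <- rs > quantile(rs, probs = theta)  # TRUE for genes above the theta quantile
        table(use)                               # how many genes are dropped / kept
        yFilt <- y[use, ]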
        or something along those lines (I haven't tested it, but that's the gist).

        Regarding my earlier comment about blindly applying the threshold, I had assumed that you just used (for example) a theta of 0.3 since that's what the DESeq vignette used. It sounds like you followed the genefilter vignette, so just ignore what I wrote there.



        • #5
          Originally posted by sadiexiaoyu:
          "I filtered my data in DESeq according to this paper (I chose FDR 0.01)"

          Why 0.01? This is an unusually strict value. Can you really not tolerate more than one percent false positives among your hits? (I hope, BTW, that you used the same cut-off for edgeR. Otherwise, a comparison would be rather pointless.)

          BTW, why don't you simply use the same filter for both DESeq and edgeR (or, to keep things simpler, no filter at all)?

          Simon
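As a rough illustration of applying one cut-off to both tools, the sketch below adjusts two sets of p-values with the same Benjamini-Hochberg procedure and counts hits at several thresholds. The p-values are simulated, and `p_toolA` and `p_toolB` are hypothetical stand-ins for the raw p-value columns of the edgeR and DESeq result tables. Nothing here comes from the poster's data.

```r
# Hypothetical raw p-values for the same 1000 genes from two methods (simulated).
set.seed(2)
n <- 1000
is_de <- seq_len(n) <= 100                               # first 100 genes truly changed
p_toolA <- ifelse(is_de, rbeta(n, 0.1, 10), runif(n))
p_toolB <- ifelse(is_de, rbeta(n, 0.3, 10), runif(n))    # somewhat weaker signal
names(p_toolA) <- names(p_toolB) <- paste0("gene", seq_len(n))

# Benjamini-Hochberg adjustment, applied identically to both.
padj_A <- p.adjust(p_toolA, method = "BH")
padj_B <- p.adjust(p_toolB, method = "BH")

# Count hits for both methods at one shared cutoff.
count_hits <- function(cutoff) {
  c(cutoff = cutoff,
    toolA = sum(padj_A < cutoff),
    toolB = sum(padj_B < cutoff))
}
hits <- rbind(count_hits(0.01), count_hits(0.05), count_hits(0.1))
print(hits)
```

Comparing rows of `hits` shows how much of a gap between two methods depends on the chosen FDR threshold rather than on the methods themselves.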



          • #6
            Hi, dpryan,
            Thank you for your help! I tried another way to make edgeR and DESeq filter out about the same number of genes.
            In edgeR, when you run the following,
            > colnames(y)<-targets$Label
            > dim(y)
            [1] 26788 6
            > keep<-rowSums(cpm(y)>1)>=3
            > y<-y[keep,]
            > dim(y)
            [1] 17613 6
            you can see that 26788 - 17613 = 9175 low-count genes were filtered out.
            In DESeq, when you run
            > cds=newCountDataSet(x[,1:6],condition)
            > rs=rowSums(counts(cds))
            > theta=0.34
            > use=(rs>quantile(rs,probs=theta))
            > table(use)
            use
            FALSE  TRUE
             9148 17640
            so 9148 low-count genes were filtered out, which is very close to the 9175 in edgeR (here I used theta=0.34).
            Then I ran DESeq again, but the results were very similar to before (only a few genes were added).
            Then I tried with no filtering at all. I got fewer genes, and all the DESeq genes were still among the edgeR result (more than 200 genes).
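The theta of 0.34 used above can also be derived directly instead of found by trial and error: it is the fraction of rows removed by the first filter. A small sketch follows. The first part uses the numbers reported in this post. The second part uses simulated row sums, with a made-up helper `match_theta`, purely as a check of the idea.

```r
# Numbers reported in the thread.
n_total   <- 26788                # rows before filtering
n_kept    <- 17613                # rows kept by the CPM filter
n_removed <- n_total - n_kept
theta_matched <- n_removed / n_total
print(n_removed)                  # 9175
print(round(theta_matched, 4))    # 0.3425

# General form: given the logical `keep` vector from one filter,
# the matching quantile is the fraction of rows that filter removed.
match_theta <- function(keep) mean(!keep)

# Simulated check (hypothetical data). The stand-in first filter is itself
# a row-sum threshold, so both filters remove the same number of rows here;
# with a CPM-based filter the two counts will generally only be close.
set.seed(3)
rs <- rnbinom(5000, mu = 200, size = 0.5)    # simulated row sums
keep <- rs >= 20                             # stand-in for the first filter
theta <- match_theta(keep)
use <- rs > quantile(rs, probs = theta)
print(c(removed_by_keep = sum(!keep), removed_by_quantile = sum(!use)))
```

Matching the number of removed rows does not guarantee that the same rows are removed, since the two filters rank genes by different criteria.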

            So maybe filtering is not the problem?

            Could it be due to differences in how edgeR and DESeq do the analysis?

            The interesting thing is that the DESeq genes are all a subset of the edgeR genes...

            Best,

            Sadiexiaoyu
            Last edited by sadiexiaoyu; 06-05-2013, 11:36 AM.
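The subset relationship described in this post can be checked explicitly with base R set operations. The gene IDs below are hypothetical placeholders. In practice the two vectors would hold the row names of the significant genes from each tool.

```r
# Hypothetical gene ID vectors standing in for the two hit lists.
edgeR_hits <- c("geneA", "geneB", "geneC", "geneD", "geneE")
DESeq_hits <- c("geneB", "geneD")

is_subset  <- all(DESeq_hits %in% edgeR_hits)   # TRUE if every DESeq hit is also an edgeR hit
shared     <- intersect(edgeR_hits, DESeq_hits)
edgeR_only <- setdiff(edgeR_hits, DESeq_hits)   # candidates for closer inspection
DESeq_only <- setdiff(DESeq_hits, edgeR_hits)

print(is_subset)
print(c(shared = length(shared),
        edgeR_only = length(edgeR_only),
        DESeq_only = length(DESeq_only)))
```

The `edgeR_only` vector is the useful by-product: those are the genes where the two methods disagree.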



            • #7
              Hi, Simon,

              I just replied in #6. I do not know whether my method for making the filters similar between edgeR and DESeq is right.
              As for FDR 0.01, maybe it is too strict, but for edgeR I also chose genes with FDR<0.01.
              I will also try FDR<0.05 later to see how the final results differ between 0.05 and 0.01.

              Best,

              Sadiexiaoyu



              • #8
                Hi Sadie, if the unfiltered data produces that difference, then it must be algorithmic. I've not run into that big of a difference in my datasets, so I can't give you any ready insight regarding why that might happen. It'd be interesting to just visually look at the data (with IGV or similar) to see if the edgeR results seem correct or not.
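Besides a genome browser, a quick numerical look at the genes called by only one tool can help. The sketch below is only an assumption about how one might do that in base R: the counts are simulated, the `edgeR_only` gene names are hypothetical, and the CPM and log2 fold change are computed by hand rather than taken from either package.

```r
# Simulated counts for 10 hypothetical genes in 3 control and 3 treated samples.
set.seed(4)
group <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt"))
counts <- matrix(rnbinom(60, mu = 100, size = 10), nrow = 10,
                 dimnames = list(paste0("gene", 1:10), paste0("s", 1:6)))
edgeR_only <- c("gene2", "gene7")    # hypothetical genes called by one tool only

# Counts per million by hand (library size = column sum).
cpm_manual <- t(t(counts) / colSums(counts)) * 1e6
sub <- cpm_manual[edgeR_only, , drop = FALSE]

# Per-group mean CPM and a simple log2 fold change with a pseudocount of 1.
ctrl_mean <- rowMeans(sub[, group == "ctrl", drop = FALSE])
trt_mean  <- rowMeans(sub[, group == "trt", drop = FALSE])
log2fc <- log2((trt_mean + 1) / (ctrl_mean + 1))

print(round(sub))                                   # per-sample values
print(round(cbind(ctrl_mean, trt_mean, log2fc), 2)) # group summary
```

Genes where one replicate drives the whole difference, or where the fold change is tiny, are the ones worth checking in IGV first.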



                • #9
                  Hi, dpryan,

                  Thank you for your suggestion. I think maybe DESeq is stricter than edgeR, although I do not know exactly why. I will try your suggestion and see what happens. Thanks!

                  Best,

                  Sadiexiaoyu
