  • different gene numbers in edgeR and DESeq

    Hi all,

    I used edgeR and DESeq for an RNA-seq analysis. edgeR gave me 236 genes, while DESeq gave me 49 genes, and all 49 DESeq genes are among the 236 edgeR genes. Is this normal?

    I don't know whether this is due to the different filtering steps. In edgeR, I used the following filter:
    > keep<-rowSums(cpm(y)>1)>=3
    > y<-y[keep,]

    In DESeq, I used the following filter:
    > rs=rowSums(counts(cds))
    > theta=0.3
    > use=(rs>quantile(rs,probs=theta))
    > table(use)
    > cdsFit=cds[use,]

    Could anyone give me some suggestions? Thank you!

    Best,

    Sadiexiaoyu

  • #2
    It's probably the filtering. Don't just blindly use the 0.3 value for DESeq or 1 cpm for edgeR; you should tailor those to your particular dataset (you'll probably get more similar results then).



    • #3
      Originally posted by dpryan View Post
      It's probably the filtering. Don't just blindly use the 0.3 value for DESeq or 1cpm for edgeR, you should tailor that to your particular dataset (you'll probably have more similar results then).
      Hi dpryan,
      Thank you for your reply. I filtered my data in DESeq following this paper (I chose FDR 0.01): http://www.bioconductor.org/packages..._filtering.pdf
      But I do not know how to set the same filtering parameters in both DESeq and edgeR. Is there any edgeR code that does the same filtering as in DESeq, or vice versa?

      Best,

      Sadiexiaoyu



      • #4
        The easiest example would be to take the method you used in DESeq and apply it to the edgeR case:

        Code:
        y <- DGEList(counts=Data, group=condition)
        rs = rowSums(Data)
        theta=0.3
        use=(rs>quantile(rs, probs=theta))
        table(use)
        yFilt=y[use,]
        or something along those lines (I haven't tested it, but that's the gist).

        Regarding my earlier comment about blindly applying the threshold, I had assumed that you just used (for example) a theta of 0.3 since that's what the DESeq vignette used. It sounds like you followed the genefilter vignette, so just ignore what I wrote there.
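
The two filters discussed in this thread can also be compared side by side on a single count matrix. The following is a minimal base-R sketch, not run on the poster's data: the counts are simulated, `counts`, `keep_cpm` and `keep_quantile` are hypothetical names, and CPM is computed by hand from raw library sizes, which should match edgeR's `cpm()` only under the assumption that all normalization factors are 1.

```r
set.seed(1)
n_genes <- 1000
n_samples <- 6

# Simulated count matrix (genes in rows, samples in columns); purely illustrative
mu <- rexp(n_genes, rate = 1 / 50)
counts <- matrix(rnbinom(n_genes * n_samples, mu = rep(mu, n_samples), size = 2),
                 nrow = n_genes, ncol = n_samples)

# Filter 1: at least 3 samples with more than 1 count per million
lib_size <- colSums(counts)
cpm_manual <- t(t(counts) / lib_size) * 1e6
keep_cpm <- rowSums(cpm_manual > 1) >= 3

# Filter 2: row sum above the theta quantile of all row sums
rs <- rowSums(counts)
theta <- 0.3
keep_quantile <- rs > quantile(rs, probs = theta)

# Cross-tabulate which genes each filter keeps
table(cpm_filter = keep_cpm, quantile_filter = keep_quantile)
```

The cross-table shows whether the two filters drop the same genes or just a similar number of them, which is worth knowing before comparing the downstream gene lists.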



        • #5
          Originally posted by sadiexiaoyu View Post
          I filtered my data in DESeq according to this paper (I chose FDR 0.01)
          Why 0.01? This is an unusually strict value. Can you really not tolerate more than one percent false positives among your hits? (I hope, BTW, you used the same cut-off for edgeR. Otherwise, a comparison would be rather pointless.)

          BTW, why don't you simply use the same filter for both DESeq and edgeR (or, to keep things simpler, no filter at all)?

          Simon



          • #6
            Originally posted by dpryan View Post
            The easiest example would be the method you used in DESeq and apply it to the edgeR case:

            Code:
            y <- DGEList(counts=Data, group=condition)
            rs = rowSums(Data)
            theta=0.3
            use=(rs>quantile(rs, probs=theta))
            table(use)
            yFilt=y[use,]
            or something along those lines (I haven't tested it, but that's the gist).

            Regarding my earlier comment about blindly applying the threshold, I had assumed that you just used (for example) a theta of 0.3 since that's what the DESeq vignette used. It sounds like you followed the genefilter vignette, so just ignore what I wrote there
            Hi dpryan,
            Thank you for your help! I tried another approach, making edgeR and DESeq filter out about the same number of genes:
            In edgeR, when you run the following,
            > colnames(y)<-targets$Label
            > dim(y)
            [1] 26788 6
            > keep<-rowSums(cpm(y)>1)>=3
            > y<-y[keep,]
            > dim(y)
            [1] 17613 6
            you can see that the filter removed 26788-17613=9175 low-count genes.
            In DESeq, when you run
            > cds=newCountDataSet(x[,1:6],condition)
            > rs=rowSums(counts(cds))
            > theta=0.34
            > use=(rs>quantile(rs,probs=theta))
            > table(use)
            use
            FALSE TRUE
            9148 17640
            the filter removes 9148 low-count genes, which is very close to the 9175 removed in edgeR (here I used theta=0.34).
            Then I ran DESeq again, but I still got results very similar to before (only a few genes were added).
            I also tried without any filtering: I got fewer genes, and still all the DESeq genes are among the edgeR genes (more than 200).

            So maybe filtering is not the problem?

            Could it be due to differences between the edgeR and DESeq analysis methods?

            The interesting thing is that the DESeq genes are all among the edgeR genes...

            Best,

            Sadiexiaoyu
            Last edited by sadiexiaoyu; 06-05-2013, 11:36 AM.
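
The theta of 0.34 used in this post is the fraction of genes that the edgeR filter removed. A small sketch of that arithmetic, using only the numbers reported above (the variable names are made up for illustration):

```r
n_total <- 26788                        # genes before filtering, from dim(y)
n_kept_cpm <- 17613                     # genes kept by the CPM filter
n_removed_cpm <- n_total - n_kept_cpm   # genes removed by the CPM filter

# Fraction removed, usable as the quantile threshold theta
theta <- n_removed_cpm / n_total
round(theta, 2)

# Counts reported by table(use) for the quantile filter
n_removed_quantile <- 9148
n_kept_quantile <- 17640
n_removed_quantile + n_kept_quantile == n_total
```

Matching the number of removed genes does not guarantee that the same genes are removed. Assuming `keep` and `use` were computed on the same rows in the same order, `table(keep, use)` would show how far the two filters actually agree.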



            • #7
              Originally posted by Simon Anders View Post
              Why 0.01? This is an unusual strict value. You really cannot tolerate more than one percent false positives among your hits? (I hope, BTW, you used the same cut-off for edgeR. Otherwise, a comparison would be rather pointless.)

              BTW, why don't you simply use the same filter for both DESeq and edgeR (or, to keep things simpler, no filter at all)?

              Simon
              Hi, Simon,

              I just replied in #6. I do not know whether my method for making the filters similar between edgeR and DESeq is right.
              As for FDR 0.01, maybe it is too strict... but for edgeR I also chose genes with FDR<0.01.
              I will also try FDR<0.05 later to see how the final results differ between 0.05 and 0.01.

              Best,

              Sadiexiaoyu
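
One way to check the overlap described in this thread is to apply a single FDR cutoff to both result tables and compare the gene sets. The sketch below uses base R only. The p-values and gene IDs are simulated placeholders, `res_a` and `res_b` are hypothetical stand-ins for the two tools' result tables (not their actual output formats), and the second set of p-values is deliberately made more conservative just to mimic the nested pattern reported here.

```r
set.seed(42)
genes <- paste0("gene", 1:500)

# Placeholder raw p-values: 50 genes with signal, 450 without
p_a <- c(rbeta(50, 0.1, 10), runif(450))
# Artificially more conservative p-values for the second "method"
p_b <- pmin(1, p_a * 20)

# Benjamini-Hochberg adjustment, as commonly used for FDR control
res_a <- data.frame(gene = genes, padj = p.adjust(p_a, method = "BH"))
res_b <- data.frame(gene = genes, padj = p.adjust(p_b, method = "BH"))

# Same cutoff for both tables, otherwise the comparison is not meaningful
fdr_cutoff <- 0.01
hits_a <- res_a$gene[res_a$padj < fdr_cutoff]
hits_b <- res_b$gene[res_b$padj < fdr_cutoff]

length(hits_a)
length(hits_b)
all(hits_b %in% hits_a)      # is the second list contained in the first?
setdiff(hits_a, hits_b)      # genes called only by the first method
```

Rerunning with `fdr_cutoff <- 0.05` shows how both lists grow with a looser threshold. This only illustrates the bookkeeping; it says nothing about why the real edgeR and DESeq results differ.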



              • #8
                Hi Sadie, if the unfiltered data produces that difference, then it must be algorithmic. I've not run into that big of a difference in my datasets, so I can't give you any ready insight regarding why that might happen. It'd be interesting to just visually look at the data (with IGV or similar) to see if the edgeR results seem correct or not.



                • #9
                  Originally posted by dpryan View Post
                  Hi Sadie, if the unfiltered data produces that difference, then it must be algorithmic. I've not run into that big of a difference in my datasets, so I can't give you any ready insight regarding why that might happen. It'd be interesting to just visually look at the data (with IGV or similar) to see if the edgeR results seem correct or not.
                  Hi, dpryan,

                  Thank you for your suggestion. I think maybe DESeq is stricter than edgeR, although I do not know exactly why. I will try your suggestion and see what happens. Thanks!

                  Best,

                  Sadiexiaoyu
