
  • different gene numbers in edgeR and DESeq

    Hi all,

    I used edgeR and DESeq for an RNA-seq analysis. edgeR gave me 236 genes, while DESeq gave me 49, and all 49 DESeq genes are among the 236 edgeR genes. Is this normal?

    I don't know whether this is due to the different filtering steps. In edgeR, I used the following filter:
    > keep<-rowSums(cpm(y)>1)>=3
    > y<-y[keep,]

    In DESeq, I used the following filter:
    > rs=rowSums(counts(cds))
    > theta=0.3
    > use=(rs>quantile(rs,probs=theta))
    > table(use)
    > cdsFit=cds[use,]

    Could anyone give me some suggestions? Thank you!

    Best,

    Sadiexiaoyu
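The overlap described in this post can be checked with base R set operations. This is only a minimal sketch: `edger_genes` and `deseq_genes` are hypothetical character vectors of significant gene IDs, filled here with made-up IDs rather than the poster's data.

```r
# Hypothetical gene ID vectors (made-up IDs). In practice these would be the
# IDs of the significant rows in each package's results table.
edger_genes <- paste0("gene", 1:236)
deseq_genes <- paste0("gene", 1:49)

# Genes called by both methods, and genes unique to each
shared     <- intersect(edger_genes, deseq_genes)
edger_only <- setdiff(edger_genes, deseq_genes)
deseq_only <- setdiff(deseq_genes, edger_genes)

overlap <- c(shared     = length(shared),
             edgeR_only = length(edger_only),
             DESeq_only = length(deseq_only))
print(overlap)

# TRUE when the DESeq list is a subset of the edgeR list
is_subset <- all(deseq_genes %in% edger_genes)
print(is_subset)
```

An empty `deseq_only` means one list is nested in the other, which is the pattern reported here.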

  • #2
    It's probably the filtering. Don't just blindly use the 0.3 value for DESeq or 1 CPM for edgeR; tailor the thresholds to your particular dataset (you'll probably get more similar results then).



    • #3
      Hi dpryan,
      Thank you for your reply. I filtered my data in DESeq according to this paper (I chose FDR 0.01): http://www.bioconductor.org/packages..._filtering.pdf
      But I do not know how to set the same filtering parameters in both DESeq and edgeR. Is there code in edgeR that does the same filtering as in DESeq, or vice versa?

      Best,

      Sadiexiaoyu



      • #4
        The easiest approach would be to take the method you used in DESeq and apply it to the edgeR case:

        Code:
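        library(edgeR)
        # Data: matrix of raw counts (genes x samples); condition: factor of sample groups
        y <- DGEList(counts=Data, group=condition)
        rs <- rowSums(Data)                      # total count per gene
        theta <- 0.3                             # fraction of lowest-count genes to drop
        use <- (rs > quantile(rs, probs=theta))  # TRUE for genes above the theta quantile
        table(use)
        yFilt <- y[use,]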
        or something along those lines (I haven't tested it, but that's the gist).

        Regarding my earlier comment about blindly applying the threshold, I had assumed that you just used (for example) a theta of 0.3 since that's what the DESeq vignette used. It sounds like you followed the genefilter vignette, so just ignore what I wrote there.
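To see how the two filters from this thread differ on the same data, they can be cross-tabulated before choosing one mask for both packages. The sketch below uses a made-up toy count matrix and computes CPM by hand from raw column sums. That is an assumption: it only matches edgeR's `cpm()` when no normalization factors have been set.

```r
set.seed(1)
# Toy count matrix, 1000 genes x 6 samples (made-up data, not from the thread)
counts <- matrix(rnbinom(6000, mu = 50, size = 0.5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))

# Filter A: total-count quantile filter (the one used with DESeq above)
rs <- rowSums(counts)
theta <- 0.3
use_quantile <- rs > quantile(rs, probs = theta)

# Filter B: CPM > 1 in at least 3 samples (the one used with edgeR above),
# with CPM computed from raw library sizes only
cpm_manual <- t(t(counts) / colSums(counts)) * 1e6
use_cpm <- rowSums(cpm_manual > 1) >= 3

# Cross-tabulate the two masks to see where they disagree
print(table(quantile = use_quantile, cpm = use_cpm))

# Choose ONE mask and subset the raw matrix once; the same filtered matrix
# can then be handed to both packages
filtered <- counts[use_quantile, , drop = FALSE]
print(dim(filtered))
```

Building both objects from the same `filtered` matrix removes the filter as a source of disagreement between the two gene lists.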



        • #5
          Originally posted by sadiexiaoyu
          I filtered my data in DESeq according to this paper (I chose FDR 0.01)
          Why 0.01? This is an unusually strict value. You really cannot tolerate more than one percent false positives among your hits? (I hope, BTW, you used the same cut-off for edgeR. Otherwise, a comparison would be rather pointless.)

          BTW, why don't you simply use the same filter for both DESeq and edgeR (or, to keep things simpler, no filter at all)?

          Simon
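Applying one FDR cut-off to both methods can be sketched in base R with `p.adjust`. The p-values below are simulated stand-ins, not output from either package, and the names `p_edger` and `p_deseq` are hypothetical. As far as I know, edgeR's `topTags` and DESeq's `nbinomTest` both report Benjamini-Hochberg adjusted values by default, so the same adjustment is used here.

```r
set.seed(42)
# Simulated raw p-values standing in for each package's per-gene p-values
p_edger <- c(runif(50, 0, 1e-4), runif(950))
p_deseq <- c(runif(50, 0, 1e-3), runif(950))

# Benjamini-Hochberg adjustment, identical for both methods
padj_edger <- p.adjust(p_edger, method = "BH")
padj_deseq <- p.adjust(p_deseq, method = "BH")

# Number of hits for each method at a given FDR cut-off
count_hits <- function(cutoff) {
  c(edgeR = sum(padj_edger < cutoff), DESeq = sum(padj_deseq < cutoff))
}

# Compare the strict cut-off with more common ones
cutoffs <- c(0.01, 0.05, 0.1)
hits <- sapply(cutoffs, count_hits)
colnames(hits) <- paste0("FDR<", cutoffs)
print(hits)
```

If the two lists converge as the cut-off is relaxed, the disagreement at 0.01 mostly reflects genes sitting close to the threshold.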



          • #6
            Hi dpryan,
            Thank you for your help! I tried another approach to make edgeR and DESeq filter out about the same number of genes.
            In edgeR, when you run the following,
            > colnames(y)<-targets$Label
            > dim(y)
            [1] 26788 6
            > keep<-rowSums(cpm(y)>1)>=3
            > y<-y[keep,]
            > dim(y)
            [1] 17613 6
            you can see that 26788-17613=9175 low-count genes have been filtered out.
            In DESeq, when you run
            > cds=newCountDataSet(x[,1:6],condition)
            > rs=rowSums(counts(cds))
            > theta=0.34
            > use=(rs>quantile(rs,probs=theta))
            > table(use)
            use
            FALSE TRUE
            9148 17640
            you filter out 9148 low-count genes, which is very close to the 9175 in edgeR (here I used theta=0.34).
            Then I ran DESeq again, but I still got results very similar to before (only a few genes were added).
            I then tried without any filtering: I got fewer genes, and all the DESeq genes are still among the edgeR genes (more than 200).

            So maybe filtering is not the problem?

            Could it be due to differences in the analysis methods of edgeR and DESeq?

            The interesting thing is that the DESeq genes are a subset of the edgeR genes...

            Best,

            Sadiexiaoyu
            Last edited by sadiexiaoyu; 06-05-2013, 11:36 AM.
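The theta value used in this post can be derived from the numbers reported by the CPM filter instead of being tuned by hand. A small worked sketch, using only the gene counts quoted in the post (the variable names are illustrative):

```r
n_total   <- 26788   # genes before filtering, from dim(y)
n_kept    <- 17613   # genes kept by the CPM filter, from dim(y) afterwards
n_removed <- n_total - n_kept

# Fraction of genes removed; using this as theta makes the quantile filter
# drop roughly the same NUMBER of genes as the CPM filter
theta_matched <- n_removed / n_total

print(n_removed)                  # 9175
print(round(theta_matched, 4))    # 0.3425
```

Matching the number of removed genes does not guarantee that the same genes are removed, since one filter ranks genes by total count and the other by per-sample CPM. Applying a single logical mask to both datasets avoids that ambiguity.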



            • #7
              Hi, Simon,

              I just replied in #6. I do not know whether my method for making the filters similar between edgeR and DESeq is right.
              As for FDR 0.01, maybe it is too strict, but for edgeR I also chose genes with FDR<0.01.
              I will also try FDR<0.05 later to see how the final results differ between 0.05 and 0.01.

              Best,

              Sadiexiaoyu



              • #8
                Hi Sadie, if the unfiltered data produces that difference, then it must be algorithmic. I've not run into that big of a difference in my datasets, so I can't give you any ready insight regarding why that might happen. It'd be interesting to just visually look at the data (with IGV or similar) to see if the edgeR results seem correct or not.
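Short of loading alignments into a genome browser, a first look at the edgeR-only genes can be had by tabulating their library-size-scaled counts per group. This is a different and much cruder check than IGV, sketched here on made-up data. `counts`, `condition`, and `edger_only` are hypothetical stand-ins for the real count matrix, sample groups, and the list of genes called only by edgeR.

```r
set.seed(7)
# Made-up counts: 10 genes x 6 samples, two groups of three
counts <- matrix(rnbinom(60, mu = 100, size = 2), nrow = 10,
                 dimnames = list(paste0("gene", 1:10), paste0("s", 1:6)))
condition <- factor(c("A", "A", "A", "B", "B", "B"))

# Hypothetical IDs of genes significant in edgeR but not in DESeq
edger_only <- c("gene2", "gene5", "gene9")

# Scale each sample by its library size (simple CPM-style scaling,
# no normalization factors) so samples are comparable
scaled <- t(t(counts) / colSums(counts)) * 1e6

# Per-sample values and per-group means for the genes of interest
sub <- scaled[edger_only, , drop = FALSE]
group_means <- t(apply(sub, 1, function(v) tapply(v, condition, mean)))

print(round(sub))
print(round(group_means))
```

Genes whose group difference is driven by a single outlying sample are the ones worth checking more closely.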



                • #9
                  Hi, dpryan,

                  Thank you for your suggestion. I think DESeq may be stricter than edgeR, although I do not know exactly why. I will try your suggestion and see what happens. Thanks!

                  Best,

                  Sadiexiaoyu
