HTseq to DeSeq/EdgeR to Heatmap


  • #61
    Originally posted by dpryan
    Have a look at my answer over on biostars to a similar question. That should tell you most of what you want to know (particularly given the included links and replies from others).
    Hi dpryan,
    If I only have one sample in each condition, how do I do the DE analysis?
    I think I need at least 2 replicates in each condition before running the DE statistics with DESeq/edgeR/Cuffdiff. Am I right?

    If I run htseq-count instead of Cufflinks,
    could I use this kind of measure?

    gene1: (a1/s1)-(a2/s2)

    a1 is the count (i.e. the number of mapped reads) of this gene in sample 1
    s1 is the total number of mapped reads in sample 1
    a2 is the count (i.e. the number of mapped reads) of this gene in sample 2
    s2 is the total number of mapped reads in sample 2

    Or could I use the FPKM values calculated by Cufflinks, i.e. FPKM1-FPKM2?
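
A minimal sketch of the arithmetic in the formula above, in base R. The counts and library sizes are made-up numbers, and scaling to counts per million and adding a pseudocount of 1 are assumptions made here for readability, not part of the original question. A difference or ratio like this only gives a ranking; it is not a statistical test.

```r
# Hypothetical values, purely for illustration.
a1 <- 150; s1 <- 20e6   # gene count and total mapped reads, sample 1
a2 <- 60;  s2 <- 15e6   # gene count and total mapped reads, sample 2

cpm1 <- a1 / s1 * 1e6   # library-size-normalized count (per million), sample 1
cpm2 <- a2 / s2 * 1e6   # library-size-normalized count (per million), sample 2

# The difference proposed in the post, on the per-million scale.
diff_cpm <- cpm1 - cpm2

# A log2 ratio is the more common way to express the change;
# the pseudocount of 1 (an assumption) avoids division by zero.
log2_fc <- log2((cpm1 + 1) / (cpm2 + 1))
```

Note that the plain difference depends heavily on how highly expressed the gene is, which is one reason ratios are usually preferred for ranking.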

    • #62
      You don't do a DE analysis without replicates. The best you can do is look at the ranked fold-changes (have a look at the GFold package, which tries to do this in a somewhat more useful way). Personally, I wouldn't waste my time without good reason.

      • #63
        Originally posted by dpryan
        You don't do a DE analysis without replicates. The best you can do is look at the ranked fold-changes (have a look at the GFold package, which tries to do this in a somewhat more useful way). Personally, I wouldn't waste my time without good reason.
        I would recommend GFold as well. I also made a wrapper/GUI for it on iPlant Collaborative. I am going to test it this week or next to see if it works properly there.

        • #64
          Originally posted by Zapages
          I would recommend GFold as well. I also made a wrapper/GUI for it on iPlant Collaborative. I am going to test it this week or next to see if it works properly there.
          Hi dpryan and Zapages, thank you both!

          • #65
            Originally posted by dpryan
            You don't do a DE analysis without replicates. The best you can do is look at the ranked fold-changes (have a look at the GFold package, which tries to do this in a somewhat more useful way). Personally, I wouldn't waste my time without good reason.
            I found that the rankings of DE genes (e.g. the top 1000 up/down-regulated genes) differ between DESeq, edgeR, Cuffdiff, baySeq, ... (of course they differ), and now also GFold, which does not rely on a p-value but on a "generalized fold change".
            How do I make the final decision when ranking significant genes? Each method has its advantages and claims to be the most reliable.

            I remember telling you last time that I was thinking of calculating the overlap of the top 1000 gene lists among 3-5 methods and then using the method with the biggest overlap. Am I right? Or is this measure too crude and naive? Or do you have any other suggestions?
            Last edited by super0925; 04-01-2014, 07:27 AM.

            • #66
              It doesn't matter which one you pick, they'll all be only vaguely meaningful at best...that's the problem with not having replicates.

              • #67
                Originally posted by dpryan
                It doesn't matter which one you pick, they'll all be only vaguely meaningful at best...that's the problem with not having replicates.
                Sorry, I didn't say it clearly. I mean a real dataset with replicates. Of course the rankings of the top 1000 DE genes differ between Cuffdiff/DESeq/edgeR/GFold/limma... So my question is: should I run 3-5 methods in parallel and then compare the overlap? Is the method with the biggest overlap the best one? Cheers.

                Comment


                • #68
                  If you have replicates then there's no point in using GFold, so just exclude it. Aside from that, the best way forward is to make a relatively small list of candidates that aren't shared between the different packages and then see if they validate using other means (qPCR or whatever you prefer). Then you have an idea of which package models your particular dataset better. Try to pick candidates with relatively similar characteristics (i.e., fold-change and adjusted p-value) just to make it a more apples-to-apples comparison.

                  BTW, usually the most significant genes overlap pretty heavily among the various tools. It's normally at the margins of significance where you see differences (unsurprisingly).
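
A rough sketch of how the shared and tool-specific candidates described above could be pulled out in base R. The gene IDs and the two list names are made up for illustration, and the Jaccard index is just one possible way to summarize overlap, not something the posters proposed.

```r
# Hypothetical top-ranked gene lists from two tools (made-up gene IDs).
top_deseq <- c("geneA", "geneB", "geneC", "geneD", "geneE")
top_edger <- c("geneB", "geneC", "geneD", "geneF", "geneG")

# Genes called by both tools.
shared <- intersect(top_deseq, top_edger)

# Genes called by only one tool: these are the candidates to validate
# by other means (e.g. qPCR) when comparing the tools.
only_deseq <- setdiff(top_deseq, top_edger)
only_edger <- setdiff(top_edger, top_deseq)

# Fraction of the combined list that both tools agree on (Jaccard index).
jaccard <- length(shared) / length(union(top_deseq, top_edger))
```

A large overlap on its own says the tools agree, not that either is right, which is why the unshared candidates are the informative ones to validate.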

                  • #69
                    DEseq and Heatmap

                    Can someone please advise what the best script is to create a heatmap with only differentially expressed genes? I've tried the script below and I am not sure this is the correct way to do it.

                    > select <- order(p.adjust( res$pvalue[keep], method="BH" ) < .1 )
                    > heatmap.2( assay(rld) [ select, ], scale="row", cexRow=0.5, cexCol=0.75,
                    + trace="none", dendrogram="column",
                    + col = rev(heat.colors(25)))

                    thank you

                    • #70
                      The general concept of that seems reasonable. My only concern is that you're subsetting "res$pvalue" first; if you haven't done the same with "rld" (or the object you used to make it), then the "select" indices may not be comparable. Otherwise, that seems reasonable.

                      • #71
                        Thank you Ryan. Do you mean I need to transform my data with rld before I select a subset?

                        • #72
                          No, my concern is whether the genes from "res$pvalue[keep]" correspond to those from "assay(rld)" or not, since in one case you're subsetting an object and in the other you're not. Without knowing what you did in the previous steps, it's impossible to say if the indices in "select" are appropriate or not. Presumably you did things such that this makes sense, but this is unclear.
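
The index mismatch described above can be illustrated with toy data in base R. Everything here is a stand-in: `mat` plays the role of assay(rld), `pvals` the role of res$pvalue, and the `keep` filter is hypothetical. The toy vector is named for simplicity; with a real results table the gene names would more likely come from its row names.

```r
# Toy expression matrix with six genes (stand-in for assay(rld)).
mat <- matrix(1:12, nrow = 6,
              dimnames = list(paste0("gene", 1:6), c("s1", "s2")))

# Toy p-values (stand-in for res$pvalue); gene4 has no p-value.
pvals <- c(gene1 = 0.001, gene2 = 0.8, gene3 = 0.02,
           gene4 = NA,    gene5 = 0.5, gene6 = 0.003)

keep <- !is.na(pvals)                      # hypothetical filter, drops gene4
padj <- p.adjust(pvals[keep], method = "BH")

# Positions computed on the filtered vector refer to the filtered vector,
# so using them on the unfiltered matrix picks the wrong rows after gene4.
idx_in_subset <- which(padj < 0.1)
wrong_rows <- rownames(mat)[idx_in_subset]

# Selecting by gene name avoids the problem entirely.
sig_genes <- names(padj)[padj < 0.1]
sub <- mat[sig_genes, , drop = FALSE]
```

Here `wrong_rows` and `sig_genes` disagree on the last gene, because dropping gene4 shifted every later position by one.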

                          • #73
                            Originally posted by dpryan
                            You don't do a DE analysis without replicates. The best you can do is look at the ranked fold-changes (have a look at the GFold package, which tries to do this in a somewhat more useful way). Personally, I wouldn't waste my time without good reason.
                            I have tried GFold on my dataset and it seems to work well. However, my samples are stranded. I mapped them with TopHat using both the stranded setting and the unstranded default, and got separate SAM/BAM files for each. The top 100 genes ranked by GFold are almost the same between the stranded and unstranded runs. Is that normal?

                            • #74
                              That would seem normal to me. If you used a stranded protocol then all of the counts and mappings should be very similar (they'd be identical in a perfect world).

                              • #75
                                Hi Ryan. The script I used does not seem to filter the genes based on padj < 0.1. I followed the DESeq2 vignette, and the only change I introduced is the step that selects genes based on padj.
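
One likely explanation for the missing filter, sketched in base R with made-up adjusted p-values: the script in #69 wraps the comparison in order(), and order() on a logical vector returns a permutation of all positions rather than a subset. Using which() instead is a suggestion made here, not something confirmed by the posters.

```r
# Hypothetical adjusted p-values for five genes.
padj <- c(0.50, 0.01, 0.20, 0.03, NA)

# order() sorts the TRUE/FALSE values and returns every position,
# so indexing with it reorders the rows but filters nothing.
all_positions <- order(padj < 0.1)

# which() returns only the positions that pass the cutoff; NA is dropped.
sig <- which(padj < 0.1)

# If the significant genes should also be sorted by adjusted p-value,
# order() can be applied after the filter.
sig_sorted <- sig[order(padj[sig])]
```

With `sig` in place of `select`, the heatmap call would receive only the rows that pass the cutoff, provided the positions refer to the same gene order as the matrix being plotted.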
