Announcement

Collapse
No announcement yet.

HTseq to DeSeq/EdgeR to Heatmap

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • I was talking about the ensembl ID and the uniprot ID, which both are. Ensembl IDs are easy to deal with, so just convert everything to that by using your annotation file's mappings and then use those to derive whatever other names you need. You can do that in most any language (or just load things into R as a GRanges object, in which case it'll parse things for you and you can just use unique() on a subset of the mcols()).

    Comment


    • Originally posted by dpryan View Post
      I was talking about the ensembl ID and the uniprot ID, which both are. Ensembl IDs are easy to deal with, so just convert everything to that by using your annotation file's mappings and then use those to derive whatever other names you need. You can do that in most any language (or just load things into R as a GRanges object, in which case it'll parse things for you and you can just use unique() on a subset of the mcols()).
      How about my methods in post#105?
      "
      Use Ensembl ID which is uniformed by the genes.gtf (1)"Tophat-htseq-edgeR" or (2)"Tophat-Cuffdiff" but without Cufflinks and Cuffmerge(because Cufflinks+Cuffmerge would generate a merged.gtf whose gene name is mixture with gene name and uniprot ID as I show to you) , after then, I translate them to gene name by R package or online tools (Biomart) .
      "

      And "Tophat-htseq-edgeR/DESeq", the reads counts table is given by Ensembl ID, which I have tried.
      I also have run the "Texudo without Cufflinks and Cuffmerge" and the result could also give you the gene Ensembl ID.
      However, I miss the Cufflinks+Cuffmerge steps.
      Could you please tell me how to convert texudo output to Ensembl ID by using annotation file's mappings? using AWK?
      Last edited by super0925; 05-13-2014, 05:59 AM.

      Comment


      • Originally posted by super0925 View Post
        How about my methods in post#105?
        I suppose as long as you told cufflinks to only perform its quantitation based on the annotation and not do anything else. Otherwise you'd be comparing apples to oranges.

        Could you please tell me how to convert texudo output to Ensembl ID by using annotation file's mappings? using AWK?
        Awk would work if you want. I'd personally use R with GenomicRanges, but there are many ways to skin this proverbial cat.

        Comment


        • Originally posted by dpryan View Post
          I suppose as long as you told cufflinks to only perform its quantitation based on the annotation and not do anything else. Otherwise you'd be comparing apples to oranges.



          Awk would work if you want. I'd personally use R with GenomicRanges, but there are many ways to skin this proverbial cat.

          Hi I have a small question.
          If I have 3 conditions . I thought I could use egdeR (is that GLM model I read from some other posts) to analysis it. Am I right?
          How about DESeq?
          So far I know I could use 3 independence test (C1 vs C2, C2 vs C3, C1 vs C3) to get the DE genes. It is good but requiring for three times.
          Last edited by super0925; 05-14-2014, 06:59 AM.

          Comment


          • Originally posted by super0925 View Post
            Hi I have a small question.
            If I have 3 conditions . I thought I could use egdeR (is that GLM model I read from some other posts) to analysis it. Am I right?
            DESeq (better would be DESeq2) would work fine as well.

            Comment


            • Originally posted by dpryan View Post
              DESeq (better would be DESeq2) would work fine as well.
              But Cuffdiff could only do pairwise groups, am I right?
              So I need to do C1vsC2,C2vsC3,C3vsC1 in cuffdiff one by one, and compare the result with edgeR or DESeq2(What you recommend is better and advantage than DESeq).

              Comment


              • I haven't a clue how cuffdiff works internally, the documentation is simply insufficient there and I have no desire to go through its code.

                I assume you mean to ask what the advantages of DESeq2 are over DESeq. Just have a look at the DESeq2 paper, which lays them out.

                Comment


                • Originally posted by dpryan View Post
                  I haven't a clue how cuffdiff works internally, the documentation is simply insufficient there and I have no desire to go through its code.

                  I assume you mean to ask what the advantages of DESeq2 are over DESeq. Just have a look at the DESeq2 paper, which lays them out.
                  Hi D,
                  Q1:I have seen your post on bioStar. You suggest we remove the low counts by genefilter when using DESeq/edgeR/limma. But DESeq2 automatically has filter. Am I right?
                  Q2:
                  But so many function in genefilter, do I use genefilter function?
                  how about the parameter?


                  their usage:
                  set.seed(-1)
                  f1 <- kOverA(5, 10)
                  flist <- filterfun(f1, allNA)
                  exprA <- matrix(rnorm(1000, 10), ncol = 10) (change it to count matrix)
                  ans <- genefilter(exprA, flist)

                  Is that OK?
                  Last edited by super0925; 05-15-2014, 07:30 AM.

                  Comment


                  • At least the more recent versions, yes.

                    Comment


                    • Originally posted by dpryan View Post
                      At least the more recent versions, yes.
                      But so many function in genefilter,how about the parameter seeting?
                      their usage in Manual:
                      set.seed(-1)
                      f1 <- kOverA(5, 10)
                      flist <- filterfun(f1, allNA)
                      exprA <- matrix(rnorm(1000, 10), ncol = 10)
                      ans <- genefilter(exprA, flist)

                      My commands:
                      set.seed(-1)
                      f1 <- kOverA(5, 10)----Q: This means an expression measure above 10 in at least 5 samples.but how to decide this parameter? And, could I use ttest instead?
                      flist <- filterfun(f1)
                      exprA <- counts
                      ----Two conditions with 3 samples in each group
                      head(counts)
                      C1.R1 C1.R2 C1.R3 C2.R1 C2.R2 C2.R3
                      ENSBTAG00000000003 0 0 0 0 0 0
                      ENSBTAG00000000005 1 0 0 1 0 0
                      ENSBTAG00000000008 2 2 1 0 2 0
                      ans <- genefilter(exprA, flist)
                      counts_filter<-counts[which(ans==1), ]


                      Is this OK?
                      Last edited by super0925; 05-15-2014, 08:00 AM.

                      Comment


                      • You'll want to read the "Diagnostics for independent filtering" PDF, which uses an RNAseq example. I usually find filtered_R() to be the most convenient function in a script (for your first time doing things manually you should go ahead and use filtered_p() and rejection_plot() to get a feel for things.)

                        Comment


                        • Originally posted by dpryan View Post
                          You'll want to read the "Diagnostics for independent filtering" PDF, which uses an RNAseq example. I usually find filtered_R() to be the most convenient function in a script (for your first time doing things manually you should go ahead and use filtered_p() and rejection_plot() to get a feel for things.)
                          I have seen you PDF. It is nice. However, they use DESeq's fitNbinomGLMs function to decide the p-value, (but we could use DESeq2 to instead of DESeq and DESeq2 is atumatically filtered as you said)But how about edgeR?


                          And how about my command in my post #115? Is is too naive?

                          Comment


                          • I wouldn't normally recommend the kOverA method for RNAseq, since it's usually more meaningful to use summed counts or average counts (I think DESeq2 uses the average counts).

                            Using genefilter with edgeR would work the same as in the aforementioned PDF. You perform your tests as normal and then put whatever filtering metric (summed counts, average counts, etc.) and the raw p-values into filtered_p() or filtered_R() and continue in a similar manner.

                            Comment


                            • Originally posted by dpryan View Post
                              DESeq (better would be DESeq2) would work fine as well.
                              Hi D,
                              I have tried what you said the multiple condition comparison in DESeq2.

                              The input count matrix is (3 conditions with 3 samples in per condition)

                              C1.R1 C1.R2 C1.R3 C2.R1 C2.R2 C2.R3 C3.R1 C3.R2 C3.R3
                              ENSBTAG00000000003 0 0 0 0 0 0 0 0 0
                              ENSBTAG00000000005 1 0 0 1 0 0 2 2 2
                              ENSBTAG00000000008 2 2 1 0 2 0 1 1 2

                              Here is my script:

                              library("DESeq2")
                              colData<- data.frame(condition=factor(c(rep("C1",3),rep("C2",3),rep("C3",3))))
                              dds<-DESeqDataSetFromMatrix(countData= output2,colData= colData,design=~condition)
                              dds$condition<-factor(dds$condition,levels=c("C1","C2","C3"))
                              dds<-DESeq(dds)
                              res<-results(dds)
                              resOrdered<-res[order(res$padj),]
                              head(resOrdered)
                              write.csv(as.data.frame(resOrdered),file="C1_C2_C3_results.csv")

                              But the output is only one table with the significant DE genes list ranked by P-value (is it C1vsC2vsC3?), but how do I know the ranking of C1vs C2,C2vsC3 and C1vsC3 which is what I want ?
                              Sorry for this naive question
                              Last edited by super0925; 05-19-2014, 03:02 AM.

                              Comment


                              • Read help(results)

                                Comment

                                Working...
                                X