DEGseq

  • #16
    Hi Xi Wang,

    --------------
    Thanks again for your package; it seems to work fine. Will you be including more info in the output.html? For instance:
    * correlation measures
    * nr of reads included for each
    * number of differentially expressed genes
    Of course we can extract that ourselves from the output_score.txt files, but it would still be nice to have some more info directly.

    --------------
    And why are the header values for the two compared samples called "value1" and "value2"? If you give the sample names via the flag "groupLabel1/2" in the function DEGexp(), it would be nice if they showed up in the output_score.txt files too.

    --------------
    And what does the option "rawCount" exactly do?

    thanks,
    -SvL

    • #17
      Hi Xi,

      I guess I'm asking whether there is/ought to be a correction for gene length because more reads are expected to map to longer genes.

      Something else I was wondering about was the use of multiple examples for the two conditions. If I use multiple columns for expCol1 and expCol2, the number of reads appears to be summed. Is it therefore a bad idea to use, say, 3 columns for expCol1 and 2 for expCol2?

      Adam

      • #18
        Originally posted by svl:
        Thanks again for your package; it seems to work fine. Will you be including more info in the output.html? For instance:
        * correlation measures
        * nr of reads included for each
        * number of differentially expressed genes
        Of course we can extract that ourselves from the output_score.txt files, but it would still be nice to have some more info directly.
        --------------
        And why are the header values for the two compared samples called "value1" and "value2"? If you give the sample names via the flag "groupLabel1/2" in the function DEGexp(), it would be nice if they showed up in the output_score.txt files too.
        Thanks a lot for your suggestions. We will add this information in the next version. I am not sure what "nr" means in the sentence "* nr of reads included for each". Could you please give me more details? Thanks.

        Originally posted by svl:
        And what does the option "rawCount" exactly do?
        The option rawCount is only used when method="MATR" is chosen. If rawCount = TRUE, we adjust the mean of M to the same value for the case-and-control samples and the technical replicates. The difference in the mean of M is caused by the different sequencing depths of the two samples compared. If rawCount = FALSE, we assume that the gene expression levels have already been normalized against sequencing depth (for example, RPKM values), so there is no need to adjust the mean of M.

        I hope this information helps.

        With best wishes,
        Xi
        Last edited by Xi Wang; 11-18-2009, 09:06 AM.
        Xi Wang
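
The effect described above can be illustrated with a small base R sketch. It is not DEGseq's internal code; the counts are made up, and it assumes M is the per-gene log2 ratio of the two samples, as on an MA-plot. When one sample is sequenced twice as deeply and nothing is differentially expressed, the mean of M sits near -1 rather than 0, and removing the depth-induced shift re-centres it.

```r
# Hypothetical counts for five genes; sample 2 was sequenced twice as deeply,
# and no gene is truly differentially expressed.
counts1 <- c(10, 40, 100, 250, 600)
counts2 <- 2 * counts1

# M is the per-gene log2 ratio (assumed definition, as on an MA-plot).
M <- log2(counts1) - log2(counts2)
mean_M <- mean(M)

# Shift expected purely from the difference in sequencing depth.
depth_shift <- log2(sum(counts1) / sum(counts2))

# Removing the depth shift re-centres M around zero.
M_centred <- M - depth_shift
print(c(mean_M = mean_M, depth_shift = depth_shift,
        mean_centred = mean(M_centred)))
```

With already normalized values (the rawCount = FALSE case), the two samples are on the same scale to begin with, so no such shift would need removing.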

        • #19
          Originally posted by adamreid:
          I guess I'm asking whether there is/ought to be a correction for gene length because more reads are expected to map to longer genes.
          It is true that more reads come from longer genes if the copy number of transcripts is the same. However, since the aim is to identify differentially expressed genes, we can use raw read counts. The reason is that each gene is only compared with itself across samples, and the gene length does not change between samples (ignoring alternative splicing). For the methods based on the random sampling model (such as LRT, FET, MARS), we suggest using the raw counts, which better fit the random sampling model.

          Originally posted by adamreid:
          Something else I was wondering about was the use of multiple examples for the two conditions. If I use multiple columns for expCol1 and expCol2, the number of reads appears to be summed. Is it therefore a bad idea to use, say, 3 columns for expCol1 and 2 for expCol2?
          It works.

          Thanks for your questions.
          Xi
          Last edited by Xi Wang; 11-18-2009, 09:19 AM.
          Xi Wang
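
The gene-length argument can be checked with a tiny base R calculation. The gene length and counts below are hypothetical; the point is only that dividing both samples' counts by the same gene length leaves the between-sample ratio for that gene unchanged.

```r
# Hypothetical values: one gene of 2 kb measured in two samples.
gene_length_kb <- 2
count_sample1 <- 120
count_sample2 <- 30

# Fold change from raw counts.
fold_raw <- count_sample1 / count_sample2

# Fold change after dividing each count by the (shared) gene length.
fold_per_kb <- (count_sample1 / gene_length_kb) / (count_sample2 / gene_length_kb)

print(c(fold_raw = fold_raw, fold_per_kb = fold_per_kb))
```

Both ratios are identical because the length factor cancels, which is why a per-gene comparison across samples does not need a length correction.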

          • #20
            Originally posted by Xi Wang:
            Thanks a lot for your suggestions. We will add this information in the next version. I am not sure what "nr" means in the sentence "* nr of reads included for each". Could you please give me more details? Thanks.
            Xi
            I wasn't too clear indeed :P, I meant the number of reads the analysis is based on. I just quickly wrote down some things that came to mind.

            Originally posted by Xi Wang:
            If rawCount = FALSE, we assume that the gene expression levels have already been normalized against sequencing depth (for example, RPKM values).
            Xi
            Right. I have RPKM values (cufflinks output), so do you suggest I'd be better off using method=MATR with rawCount=F instead of method=MARS...? It's not all technical replicates that I compare against each other...

            • #21
              Originally posted by svl:
              I wasn't too clear indeed :P, I meant the number of reads the analysis is based on. I just quickly wrote down some things that came to mind.
              Thanks. It's quite clear this time. We also feel those statistics are quite important in practice.

              Originally posted by svl:
              Right. I have RPKM values (cufflinks output), so do you suggest I'd be better off using method=MATR with rawCount=F instead of method=MARS...? It's not all technical replicates that I compare against each other...
              Sorry that I didn't make myself clear. :-( The rawCount option applies only to method=MATR. For the other methods, there is no need to specify whether the gene expression levels are quantified by raw read counts or not.
              Further, as we recommend using raw read counts as the gene expression level, you can get back the raw read count by multiplying the RPKM by the gene length (in kb) and by the total number of mapped reads (in millions). If you would rather not do this, DEGexp also handles RPKM values well.
              Xi Wang
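
A minimal base R sketch of the back-conversion, going by the usual RPKM definition (reads per kilobase of gene per million mapped reads). Under that definition the gene length alone is not enough; the total number of mapped reads is needed too. The helper name rpkm_to_counts and all numbers are hypothetical, and values recovered this way from a quantification tool's output should be treated as approximate.

```r
# Hypothetical helper: invert RPKM = reads / (length in kb * mapped reads in millions).
rpkm_to_counts <- function(rpkm, gene_length_bp, total_mapped_reads) {
  rpkm * (gene_length_bp / 1e3) * (total_mapped_reads / 1e6)
}

# Made-up example values for two genes in one sample.
rpkm <- c(geneA = 12.5, geneB = 100)
gene_length_bp <- c(geneA = 2000, geneB = 500)
total_mapped_reads <- 4e6

# Round because read counts are integers.
counts <- round(rpkm_to_counts(rpkm, gene_length_bp, total_mapped_reads))
print(counts)
```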

              • #22
                I can't get DEGseq to run my data

                DEGseq looks really nice, but I'm having trouble getting my data file read. Do I just need to substitute:

                >geneExpFile <- system.file("data", "GeneExpExample5000.txt", package = "DEGseq")
                with

                >geneExpFile <- system.file("data", "MyData.txt", package = "DEGseq")
                and then run DEGexp(commands)?

                I'm getting the following error:

                Error in read.table(geneExpFile1, header = header, sep = sep) :
                no lines available in input
                In addition: Warning message:
                In file(file, "rt") :
                file("") only supports open = "w+" and open = "w+b": using the former
                Thanks in advance,
                Nick

                • #23
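                  Originally posted by ngcrawford:
                  DEGseq looks really nice, but I'm having trouble getting my data file read. Do I just need to substitute:

                  >geneExpFile <- system.file("data", "GeneExpExample5000.txt", package = "DEGseq")
                  with

                  >geneExpFile <- system.file("data", "MyData.txt", package = "DEGseq")
                  and then run DEGexp(commands)?

                  I'm getting the following error:

                  Error in read.table(geneExpFile1, header = header, sep = sep) :
                  no lines available in input
                  In addition: Warning message:
                  In file(file, "rt") :
                  file("") only supports open = "w+" and open = "w+b": using the former
                  Thanks in advance,
                  Nick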
                  Nick,

                  You can specify the gene expression file in this way:

                  Suppose the file path is "D:/data/MyData.txt" (on Windows); then:
                  Code:
                  geneExpFile <- "D:/data/MyData.txt"
                  Xi
                  Xi Wang
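
Why the original attempt failed can be seen in base R: system.file() only searches inside an installed package, and for a file that is not shipped there it returns the empty string, which is what read.table() then tries to open. The sketch below uses the stats package and a temporary file as stand-ins (both are assumptions for illustration, not part of the thread), and checks the path before it would be passed on to DEGexp().

```r
# system.file() returns "" when the file is not part of the named package.
missing_path <- system.file("data", "MyData.txt", package = "stats")
print(nchar(missing_path))

# Hypothetical stand-in for the user's own data file, written to a temp dir.
geneExpFile <- file.path(tempdir(), "MyData.txt")
write.table(data.frame(gene = c("g1", "g2"), value1 = c(10, 20)),
            geneExpFile, sep = "\t", quote = FALSE, row.names = FALSE)

# Confirm the path exists and is readable before using it as geneExpFile.
stopifnot(file.exists(geneExpFile))
expr <- read.table(geneExpFile, header = TRUE, sep = "\t")
print(expr)
```

For your own data, a plain path string (absolute, or relative to the working directory shown by getwd()) is all that is needed.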

                  • #24
                    Xi,

                    That worked like a charm. Thanks!

                    - Nick

                    • #25
                      FDR?

                      How do you set it? It's mentioned in the paper, but I can only find ways to adjust the p-value cut-off.

                      Thanks in advance.

                      - Nick

                      • #26
                        Question:

                        I am attempting to analyze samples that do not have the same number of reads. For example, one has 800K and another has 1.3M. With the analysis from DEGseq, it is obvious that the fold changes between samples are due to the difference in total read numbers. With this particular example, would you recommend using a reads/million normalization?

                        Thanks in advance!

                        • #27
                          Originally posted by ngcrawford:
                          How do you set it? It's mentioned in the paper, but I can only find ways to adjust the p-value cut-off.

                          Thanks in advance.

                          - Nick
                          Sorry, I didn't catch your meaning. What does "it" refer to? Thanks.

                          Xi
                          Xi Wang

                          • #28
                            Originally posted by AmyL:
                            Question:

                            I am attempting to analyze samples that do not have the same number of reads. For example, one has 800K and another has 1.3M. With the analysis from DEGseq, it is obvious that the fold changes between samples are due to the difference in total read numbers. With this particular example, would you recommend using a reads/million normalization?

                            Thanks in advance!
                            AmyL,

                            Thanks for your question.
                            If you only care about the fold changes, you can use a normalization such as the one you mentioned. Alternatively, in DEGseq, you can use the option normalMethod="median".

                            Xi
                            Xi Wang
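
A reads-per-million normalization of the kind discussed here can be sketched in base R. The library sizes match the 800K and 1.3M from the question; the gene counts are invented for illustration. The sketch compares log2 fold changes before and after scaling each sample by its total read count.

```r
# Library sizes from the question; per-gene counts are hypothetical.
total_reads <- c(sampleA = 800000, sampleB = 1300000)
counts <- rbind(gene1 = c(sampleA = 80, sampleB = 130),
                gene2 = c(sampleA = 160, sampleB = 130))

# Reads per million: divide each column by its library size in millions.
rpm <- sweep(counts, 2, total_reads / 1e6, "/")

# Log2 fold changes (sampleB over sampleA) before and after normalization.
log2_fc_raw <- log2(counts[, "sampleB"] / counts[, "sampleA"])
log2_fc_rpm <- log2(rpm[, "sampleB"] / rpm[, "sampleA"])
print(round(cbind(log2_fc_raw, log2_fc_rpm), 3))
```

In this toy data gene1 looks up-regulated on raw counts only because sampleB is deeper; after scaling, its log2 fold change is about 0, while gene2 shows a genuine two-fold decrease.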

                            • #29
                              Originally posted by ngcrawford:
                              How do you set it? It's mentioned in the paper, but I can only find ways to adjust the p-value cut-off.

                              Thanks in advance.

                              - Nick
                              Hi ngcrawford and Xi,
                              I think ngcrawford wants to find a way to set the FDR cut-off.
                              The following is an example of how to set it:
                              DEGexp(geneExpFile1=geneExpFile,geneExpFile2=geneExpFile2,thresholdKind=4,qValue=0.001)
                              Please type ?DEGexp for details.
                              ---------------
                              Likun
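
What a q-value cut-off does, as opposed to a p-value cut-off, can be illustrated in base R. This sketch uses p.adjust() with the Benjamini-Hochberg method on made-up p-values; it is an assumption that this resembles the idea behind the qValue threshold, and it is not necessarily the estimator DEGseq applies for a given thresholdKind (see ?DEGexp for that).

```r
# Hypothetical per-gene p-values.
p_values <- c(geneA = 0.001, geneB = 0.01, geneC = 0.02,
              geneD = 0.2, geneE = 0.5)

# Benjamini-Hochberg adjusted p-values (one common FDR procedure).
q_values <- p.adjust(p_values, method = "BH")

# Keep genes whose adjusted value falls below the chosen FDR cut-off.
q_cutoff <- 0.05
significant <- names(q_values)[q_values < q_cutoff]

print(round(q_values, 4))
print(significant)
```

The adjusted values are never smaller than the raw p-values, so an FDR cut-off of 0.05 is stricter than a p-value cut-off of 0.05 on the same data.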

                              • #30
                                Originally posted by Xi Wang:
                                AmyL,

                                Thanks for your question.
                                If you only care about the fold changes, you can use a normalization such as the one you mentioned. Alternatively, in DEGseq, you can use the option normalMethod="median".

                                Xi
                                BTW: for the fold changes, you can normalize as you mentioned or use the option normalMethod="median". But for the methods "LRT", "FET" (Fisher's exact test) and "MARS", raw counts and normalMethod="none" are recommended for your example.
