Announcement

Collapse
No announcement yet.

HTseq-count to DEseq2: Need a little help..

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • HTseq-count to DEseq2: Need a little help..

    Hi!
    This is what Ive done:

    sampleFiles <- list.files(path="/Volumes/timemachine/HTseq_DEseq2",pattern="*.txt")
    sampleCondition <- read.table("/Volumes/timemachine/HTseq_DEseq2/03_SampleCondition.txt", header=TRUE)
    sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = sampleCondition)
    directory <- c("/Volumes/timemachine/HTseq_DEseq2/")
    ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design= ~ condition)


    Im getting an error:

    "Error in validObject(.Object) :
    invalid class "DESeqDataSet" object: all variables in design formula must be columns in colData"
    Last edited by sindrle; 10-18-2013, 03:56 AM.

  • #2
    I have also tried this:

    status <- factor(c(rep("Healthy",26), rep("Diabetic",22)))
    timepoints = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2)
    des <- formula(~timepoints+status)
    ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design= ~ des)

    Same error..

    Comment


    • #3
      What are the dimensions of sampleTable and have you tried "design=des" rather than "design=~des"? For your first post, what were the dimensions of condition?

      Comment


      • #4
        Heres my "sampleTable":
        sampleName fileName Condition
        1 D104.txt D104.txt Diabetic_pre-exercise
        2 D121.txt D121.txt Diabetic_pre-exercise
        3 D153.txt D153.txt Diabetic_pre-exercise
        4 D155.txt D155.txt Diabetic_pre-exercise
        5 D161.txt D161.txt Diabetic_pre-exercise
        6 D162.txt D162.txt Diabetic_pre-exercise
        7 D167.txt D167.txt Diabetic_pre-exercise
        8 D173.txt D173.txt Diabetic_pre-exercise
        9 D176.txt D176.txt Diabetic_pre-exercise
        10 D177.txt D177.txt Diabetic_pre-exercise
        11 D179.txt D179.txt Diabetic_pre-exercise
        12 D204.txt D204.txt Diabetic_post-exercise
        13 D221.txt D221.txt Diabetic_post-exercise
        14 D253.txt D253.txt Diabetic_post-exercise
        15 D255.txt D255.txt Diabetic_post-exercise
        16 D261.txt D261.txt Diabetic_post-exercise
        17 D262.txt D262.txt Diabetic_post-exercise
        18 D267.txt D267.txt Diabetic_post-exercise
        19 D273.txt D273.txt Diabetic_post-exercise
        20 D276.txt D276.txt Diabetic_post-exercise
        21 D277.txt D277.txt Diabetic_post-exercise
        22 D279.txt D279.txt Diabetic_post-exercise
        23 N101.txt N101.txt Healthy_pre-exercise
        24 N108.txt N108.txt Healthy_pre-exercise
        25 N113.txt N113.txt Healthy_pre-exercise
        26 N170.txt N170.txt Healthy_pre-exercise
        27 N171.txt N171.txt Healthy_pre-exercise
        28 N172.txt N172.txt Healthy_pre-exercise
        29 N175.txt N175.txt Healthy_pre-exercise
        30 N181.txt N181.txt Healthy_pre-exercise
        31 N182.txt N182.txt Healthy_pre-exercise
        32 N183.txt N183.txt Healthy_pre-exercise
        33 N186.txt N186.txt Healthy_pre-exercise
        34 N187.txt N187.txt Healthy_pre-exercise
        35 N188.txt N188.txt Healthy_pre-exercise
        36 N201.txt N201.txt Healthy_post-exercise
        37 N208.txt N208.txt Healthy_post-exercise
        38 N213.txt N213.txt Healthy_post-exercise
        39 N270.txt N270.txt Healthy_post-exercise
        40 N271.txt N271.txt Healthy_post-exercise
        41 N272.txt N272.txt Healthy_post-exercise
        42 N275.txt N275.txt Healthy_post-exercise
        43 N281.txt N281.txt Healthy_post-exercise
        44 N282.txt N282.txt Healthy_post-exercise
        45 N283.txt N283.txt Healthy_post-exercise
        46 N286.txt N286.txt Healthy_post-exercise
        47 N287.txt N287.txt Healthy_post-exercise
        48 N288.txt N288.txt Healthy_post-exercise

        Comment


        • #5
          Design = des doen not work:

          ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design= des)
          Error in validObject(.Object) :
          invalid class "DESeqDataSet" object: all variables in design formula must be columns in colData

          Comment


          • #6
            For the first post:

            sampleCondition =read.table("/Volumes/timemachine/HTseq_DEseq2/03_SampleCondition.txt", header=TRUE)
            condition = sampleCondition

            03_SampleCondition.txt:

            Condition
            Diabetic_pre-exercise
            Diabetic_pre-exercise
            Diabetic_pre-exercise
            Diabetic_pre-exercise
            Diabetic_pre-exercise
            Diabetic_pre-exercise
            Diabetic_pre-exercise
            Diabetic_pre-exercise
            Diabetic_pre-exercise
            Diabetic_pre-exercise
            Diabetic_pre-exercise
            Diabetic_post-exercise
            Diabetic_post-exercise
            Diabetic_post-exercise
            Diabetic_post-exercise
            Diabetic_post-exercise
            Diabetic_post-exercise
            Diabetic_post-exercise
            Diabetic_post-exercise
            Diabetic_post-exercise
            Diabetic_post-exercise
            Diabetic_post-exercise
            Healthy_pre-exercise
            Healthy_pre-exercise
            Healthy_pre-exercise
            Healthy_pre-exercise
            Healthy_pre-exercise
            Healthy_pre-exercise
            Healthy_pre-exercise
            Healthy_pre-exercise
            Healthy_pre-exercise
            Healthy_pre-exercise
            Healthy_pre-exercise
            Healthy_pre-exercise
            Healthy_pre-exercise
            Healthy_post-exercise
            Healthy_post-exercise
            Healthy_post-exercise
            Healthy_post-exercise
            Healthy_post-exercise
            Healthy_post-exercise
            Healthy_post-exercise
            Healthy_post-exercise
            Healthy_post-exercise
            Healthy_post-exercise
            Healthy_post-exercise
            Healthy_post-exercise
            Healthy_post-exercise
            Last edited by sindrle; 10-18-2013, 05:57 AM.

            Comment


            • #7
              Ah, I see what the problem is. colData is is set from SampleTable (it's all but the first 2 columns). However, there's just a single column there and you have two factors in your design. Try the following:

              Code:
              sampleFiles <- list.files(path="/Volumes/timemachine/HTseq_DEseq2",pattern="*.txt")
              status <- factor(c(rep("Healthy",26), rep("Diabetic",22)))
              timepoints = as.factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2))
              sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, status=status, timepoints=timepoints)
              directory <- c("/Volumes/timemachine/HTseq_DEseq2/")
              des <- formula(~timepoints+status)
              ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design= des)
              Something along those lines should work (I always use DESeqDataSetFromMatrix).

              Comment


              • #8
                No surprise, that worked like a charm!

                Thanks again, Ill try to run DEseq2 now for diff.exp.

                Comment


                • #9
                  One fast question:

                  What does the log2FoldChange from the DESeq2 results now tell? In light of the design= des?

                  Is it correct to interpret it as:

                  "These genes are significantly changed in diabetic patients from time point one to time point two, with healthy as controls"?

                  Comment


                  • #10
                    With a multifactor design, there's more than a single set of results. Given a DESeqDataSet named "dds" (they use this name in the vignette, so I'll use it here for the sake of consistency), you can type:
                    Code:
                    resultsNames(dds)
                    to get the coefficient names, which will be something like, "Intercept", "timepoints_1_vs_2", and "status_healthy_vs_diabetic", for you. You can the retrieve each results table:
                    Code:
                    statusResults <- results(dds, "status_healthy_vs_diabetic")
                    timepointsResults <- results(dds, "timepoints_1_vs_2")
                    The log2FoldChange column in statusResults would be "The log2() fold change in diabetic vs. control patients when controlling for timepoint". The equivalent in timepointsResults would be "The log2() fold change in timepoint 2 vs 1, controlling for diabetic status". BTW, since your timepoints are before/after treatment, I imagine that you're interested in a possible interaction between status and timepoint/treatment. You could specify that in your design by swapping a "*" for the "+".

                    One more thing to think about is if these samples were drawn from the same subjects at both timepoints. If so, you can model this as a paired design. I don't think there are examples of that in the DESeq2 vignette, but you can probably find an example in the limma user guide (the model setup steps are more or less the same).

                    Comment


                    • #11
                      Very good! Thank you.

                      "One more thing to think about is if these samples were drawn from the same subjects at both timepoints. If so, you can model this as a paired design. I don't think there are examples of that in the DESeq2 vignette, but you can probably find an example in the limma user guide (the model setup steps are more or less the same)."

                      This is true, they are drawn from the same subjects! Ill look into that.

                      Thanks again!

                      Comment


                      • #12
                        Just to check if I got this correctly:

                        These are the 4 results I get:

                        statusResults <- results(dds, "status_Healthy_vs_Diabetic");
                        timepointsResults <- results(dds, "timepoints_2_vs_1");
                        interceptResults <- results(dds, "Intercept");
                        status&treatmentResults <- results(dds, "timepoints2.statusHealthy")

                        # statusResults: "The log2() fold change in diabetic vs. control patients when controlling for timepoint". Kinda like its "no treatment", meaning what are the difference in gene expression between diabetics and healthy? Genes changed due only to treatment are not shown. Assumes that the treatment has the same effect on both groups?

                        # timepointsResults: "The log2() fold change in timepoint 2 vs 1, controlling for diabetic status". Meaning how the treatment effects gene expression regardless of diabetes? Assumes that the treatment has the same effect on both groups?

                        #interceptResults: What is this??

                        # status&treatmentResults: "interaction between status and treatment". Meaning how the treatment affects gene expression differently in healthy or diabetic?

                        I have posted a new thread on how to implement paired data between time point 1 and 2 btw.

                        Thanks!

                        Comment


                        • #13
                          One change I should have mentioned earlier is
                          Code:
                          status <- factor(c(rep("Healthy",26), rep("Diabetic",22)), levels=c("Healthy", "Diabetic")
                          which will make coefficients in the directions you want.

                          interceptResults <- results(dds, "Intercept");
                          Yeah, you can ignore that one. There are a number of ways to parametrize these sorts of models, one of which is without a Intercept. These results would effectively be things significantly higher than zero at the baseline condition (Diabetic at timepoint 1 in the original model, but Healthy at timepoint 1 with the "status" from above).

                          Kinda like its "no treatment", meaning what are the difference in gene expression between diabetics and healthy? Genes changed due only to treatment are not shown. Assumes that the treatment has the same effect on both groups?
                          Yes, this is the difference due simply to being diabetic. I kind of assume that this isn't really that interesting for you from a biological standpoint, other than to show that you found similar changes to everyone else who's looked at gene expression in diabetes.

                          Meaning how the treatment effects gene expression regardless of diabetes? Assumes that the treatment has the same effect on both groups?
                          This is the change you would expect due to treatment, regardless of the persons diabetic status. If there are changes due to a combination of treatment and diabetic status, they generally shouldn't show up here (these would be termed "interaction effects").

                          Meaning how the treatment affects gene expression differently in healthy or diabetic?
                          Yeah. I assume that the point of the treatment is to reverse some aspect of diabetes. You could then think of diabetes increasing expression of some gene which is decreased in those taking the treatment only if they have diabetes. It's not always the case that including interaction terms make sense, but I would be interested in looking for these genes if I were you.

                          I have posted a new thread on how to implement paired data between time point 1 and 2 btw.
                          I noticed that and the post to the bioconductor mailing list. I won't reply there in hope that one of the DESeq authors know of a better way than what I'll present here to deal with that. The simplest way is to just treat each individual as a factor. So, something like:

                          Code:
                          patients <- factor(c(rep(1:13,2), rep(14:24,2)))
                          des <- formula(~patients + timepoints*status)
                          You don't really care about the various patient-specific differences, so don't bother with the results from that.
                          Last edited by dpryan; 10-18-2013, 03:08 PM. Reason: Changed the patients factor so it should be correct, previously, things were incorrectly paired.

                          Comment


                          • #14
                            Wow!
                            That was a great answer, thank you yet again.

                            It was Simon Anders who said I should post via the mailing list, never used it before. Ill post the answer in the thread if I get it, so others can read as well.

                            Comment


                            • #15
                              It runs fine, I changed "healthy" and "diabetic" to fit how it was imported (diabetics first). Does that influence anything?

                              I got this message after the run:

                              > dds <- DESeq(ddsHTSeq)
                              159 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

                              Is that a problem?

                              So my results is now fine? Paired data is handled correctly?
                              Last edited by sindrle; 10-18-2013, 06:07 PM.

                              Comment

                              Working...
                              X