Header Leaderboard Ad

Collapse

HTseq-count to DEseq2: Need a little help..

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • emma009
    replied
    Thanks!

    Thanks for the help - worked once I sorted out my typos...
    Last edited by emma009; 04-01-2016, 02:56 AM. Reason: Resolved

    Leave a comment:


  • nbahlis
    replied
    thank you, I will give this a try and see if the results differ significantly

    Leave a comment:


  • dpryan
    replied
    If the patients were similarly responsive, then accounting for an interaction is probably overkill. Otherwise you just need to change the design to "design~libType*condition".

    Leave a comment:


  • nbahlis
    replied
    thank you for the quick reply. I am attempting to identify genes involved in resistance to treatment "X". RNAseq were done on samples collected from individual patients pre-treatment and at the time of clinical progression (resistant). Therefore I think I need to account for pair:treatment interaction, no? I am not sure how to account for this interaction in my script. Greatly appreciate your advice.

    Leave a comment:


  • dpryan
    replied
    That appears to be correct. That doesn't look for any pair:treatment interaction, but that's likely not of interest (and would really suck up the degrees of freedom).

    Leave a comment:


  • nbahlis
    replied
    Hi Ryan,

    DESeq2 question
    Is the script below the correct way to set up a comparison between paired samples pre- and post treatment?
    thank you
    > sampleFiles <- list.files(path="~/Desktop/Realigned_to_human_g1K_v37/Cuffdiff_IMIDS_Nov/HTSeq/HTseq_gene_counts" , pattern="*.counts")
    > Table3 <- data.frame(
    + row.names = c( "P110", "P124", "P149", "P185", "P189", "P192", "P218", "P227", "P235", "P280", "P308", "P351", "P357", "P367", "P377", "P384", "P426", "P543", "P584", "P590", "P594", "P610" ),
    +
    > sampleFiles <- list.files(path="~/Desktop/Realigned_to_human_g1K_v37/Cuffdiff_IMIDS_Nov/HTSeq/HTseq_gene_counts" , pattern="*.counts")
    > Table3 <- data.frame(
    + sampleName = sampleFiles, fileName = sampleFiles,
    + condition = c( "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "post", "pre", "post", "post", "post", "post", "post", "post", "post", "post", "post" ),
    + libType = c( "pair8", "pair10", "pair9", "pair1", "pair7", "pair11", "pair2", "pair3", "pair4", "pair5", "pair5", "pair6", "pair7", "pair1", "pair3", "pair4", "pair11", "pair2", "pair6", "pair10", "pair9", "pair8" ) )
    Error in data.frame(sampleName = sampleFiles, fileName = sampleFiles, :
    arguments imply differing number of rows: 22, 20
    > Table3 <- data.frame(
    + sampleName = sampleFiles, fileName = sampleFiles,
    + condition = c( "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "post", "pre", "post", "post", "post", "post", "post", "post", "post", "post", "post", "post" ),
    + libType = c( "pair8", "pair10", "pair9", "pair1", "pair7", "pair11", "pair2", "pair3", "pair4", "pair5", "pair5", "pair6", "pair7", "pair1", "pair3", "pair4", "pair11", "pair2", "pair6", "pair10", "pair9", "pair8" ) )
    > directory <- c("~/Desktop/Realigned_to_human_g1K_v37/Cuffdiff_IMIDS_Nov/HTSeq/HTseq_gene_counts/")
    > design <- formula(~ libType + condition)
    > ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable= Table3, directory= directory, design= design)
    > Table3
    sampleName fileName condition libType
    1 P110.counts P110.counts pre pair8
    2 P124.counts P124.counts pre pair10
    3 P149.counts P149.counts pre pair9
    4 P185.counts P185.counts pre pair1
    5 P189.counts P189.counts pre pair7
    6 P192.counts P192.counts pre pair11
    7 P218.counts P218.counts pre pair2
    8 P227.counts P227.counts pre pair3
    9 P235.counts P235.counts pre pair4
    10 P280.counts P280.counts pre pair5
    11 P308.counts P308.counts post pair5
    12 P351.counts P351.counts pre pair6
    13 P357.counts P357.counts post pair7
    14 P367.counts P367.counts post pair1
    15 P377.counts P377.counts post pair3
    16 P384.counts P384.counts post pair4
    17 P426.counts P426.counts post pair11
    18 P543.counts P543.counts post pair2
    19 P584.counts P584.counts post pair6
    20 P590.counts P590.counts post pair10
    21 P594.counts P594.counts post pair9
    22 P610.counts P610.counts post pair8

    Many thanks

    Leave a comment:


  • sindrle
    replied
    Very good!
    Ill start reading up on EdgeR later, the manual there had some nice sections on design.

    Thank you again, have a nice weekend.

    Leave a comment:


  • dpryan
    replied
    Swapping Healthy/Diabetic will just change the sign on the fold changes. With the updated version of "status", Normal/timepoint 1 is the baseline for all of the comparisons, which is how I expect you and others would want to think about the experiment. Previously it was Diabetic/timepoint 1.

    Regarding the error message, just change the "maxit" option to something bigger to see if that goes away. The model is now rather more complicated, so I'm not surprised that it takes more iterations to fit. If you still have things not fitting, then see which rows they are and don't trust the results from them (the other 30,000 or so rows should be fine, however). The results should be from a paired-analysis then.

    Leave a comment:


  • sindrle
    replied
    It runs fine, I changed "healthy" and "diabetic" to fit how it was imported (diabetics first). Does that influence anything?

    I got this message after the run:

    > dds <- DESeq(ddsHTSeq)
    159 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

    Is that a problem?

    So my results is now fine? Paired data is handled correctly?
    Last edited by sindrle; 10-18-2013, 06:07 PM.

    Leave a comment:


  • sindrle
    replied
    Wow!
    That was a great answer, thank you yet again.

    It was Simon Anders who said I should post via the mailing list, never used it before. Ill post the answer in the thread if I get it, so others can read as well.

    Leave a comment:


  • dpryan
    replied
    One change I should have mentioned earlier is
    Code:
    status <- factor(c(rep("Healthy",26), rep("Diabetic",22)), levels=c("Healthy", "Diabetic")
    which will make coefficients in the directions you want.

    interceptResults <- results(dds, "Intercept");
    Yeah, you can ignore that one. There are a number of ways to parametrize these sorts of models, one of which is without a Intercept. These results would effectively be things significantly higher than zero at the baseline condition (Diabetic at timepoint 1 in the original model, but Healthy at timepoint 1 with the "status" from above).

    Kinda like its "no treatment", meaning what are the difference in gene expression between diabetics and healthy? Genes changed due only to treatment are not shown. Assumes that the treatment has the same effect on both groups?
    Yes, this is the difference due simply to being diabetic. I kind of assume that this isn't really that interesting for you from a biological standpoint, other than to show that you found similar changes to everyone else who's looked at gene expression in diabetes.

    Meaning how the treatment effects gene expression regardless of diabetes? Assumes that the treatment has the same effect on both groups?
    This is the change you would expect due to treatment, regardless of the persons diabetic status. If there are changes due to a combination of treatment and diabetic status, they generally shouldn't show up here (these would be termed "interaction effects").

    Meaning how the treatment affects gene expression differently in healthy or diabetic?
    Yeah. I assume that the point of the treatment is to reverse some aspect of diabetes. You could then think of diabetes increasing expression of some gene which is decreased in those taking the treatment only if they have diabetes. It's not always the case that including interaction terms make sense, but I would be interested in looking for these genes if I were you.

    I have posted a new thread on how to implement paired data between time point 1 and 2 btw.
    I noticed that and the post to the bioconductor mailing list. I won't reply there in hope that one of the DESeq authors know of a better way than what I'll present here to deal with that. The simplest way is to just treat each individual as a factor. So, something like:

    Code:
    patients <- factor(c(rep(1:13,2), rep(14:24,2)))
    des <- formula(~patients + timepoints*status)
    You don't really care about the various patient-specific differences, so don't bother with the results from that.
    Last edited by dpryan; 10-18-2013, 03:08 PM. Reason: Changed the patients factor so it should be correct, previously, things were incorrectly paired.

    Leave a comment:


  • sindrle
    replied
    Just to check if I got this correctly:

    These are the 4 results I get:

    statusResults <- results(dds, "status_Healthy_vs_Diabetic");
    timepointsResults <- results(dds, "timepoints_2_vs_1");
    interceptResults <- results(dds, "Intercept");
    status&treatmentResults <- results(dds, "timepoints2.statusHealthy")

    # statusResults: "The log2() fold change in diabetic vs. control patients when controlling for timepoint". Kinda like its "no treatment", meaning what are the difference in gene expression between diabetics and healthy? Genes changed due only to treatment are not shown. Assumes that the treatment has the same effect on both groups?

    # timepointsResults: "The log2() fold change in timepoint 2 vs 1, controlling for diabetic status". Meaning how the treatment effects gene expression regardless of diabetes? Assumes that the treatment has the same effect on both groups?

    #interceptResults: What is this??

    # status&treatmentResults: "interaction between status and treatment". Meaning how the treatment affects gene expression differently in healthy or diabetic?

    I have posted a new thread on how to implement paired data between time point 1 and 2 btw.

    Thanks!

    Leave a comment:


  • sindrle
    replied
    Very good! Thank you.

    "One more thing to think about is if these samples were drawn from the same subjects at both timepoints. If so, you can model this as a paired design. I don't think there are examples of that in the DESeq2 vignette, but you can probably find an example in the limma user guide (the model setup steps are more or less the same)."

    This is true, they are drawn from the same subjects! Ill look into that.

    Thanks again!

    Leave a comment:


  • dpryan
    replied
    With a multifactor design, there's more than a single set of results. Given a DESeqDataSet named "dds" (they use this name in the vignette, so I'll use it here for the sake of consistency), you can type:
    Code:
    resultsNames(dds)
    to get the coefficient names, which will be something like, "Intercept", "timepoints_1_vs_2", and "status_healthy_vs_diabetic", for you. You can the retrieve each results table:
    Code:
    statusResults <- results(dds, "status_healthy_vs_diabetic")
    timepointsResults <- results(dds, "timepoints_1_vs_2")
    The log2FoldChange column in statusResults would be "The log2() fold change in diabetic vs. control patients when controlling for timepoint". The equivalent in timepointsResults would be "The log2() fold change in timepoint 2 vs 1, controlling for diabetic status". BTW, since your timepoints are before/after treatment, I imagine that you're interested in a possible interaction between status and timepoint/treatment. You could specify that in your design by swapping a "*" for the "+".

    One more thing to think about is if these samples were drawn from the same subjects at both timepoints. If so, you can model this as a paired design. I don't think there are examples of that in the DESeq2 vignette, but you can probably find an example in the limma user guide (the model setup steps are more or less the same).

    Leave a comment:


  • sindrle
    replied
    One fast question:

    What does the log2FoldChange from the DESeq2 results now tell? In light of the design= des?

    Is it correct to interpret it as:

    "These genes are significantly changed in diabetic patients from time point one to time point two, with healthy as controls"?

    Leave a comment:

Working...
X