HTseq-count to DEseq2: Need a little help..

emma009 replied

04-01-2016, 02:48 AM
Thanks!

Thanks for the help - worked once I sorted out my typos...

Last edited by emma009; 04-01-2016, 02:56 AM. Reason: Resolved
Leave a comment:
nbahlis replied

11-20-2013, 07:20 AM
thank you, I will give this a try and see if the results differ significantly
Leave a comment:
dpryan replied

11-20-2013, 07:17 AM
If the patients were similarly responsive, then accounting for an interaction is probably overkill. Otherwise you just need to change the design to "design~libType*condition".
Leave a comment:
nbahlis replied

11-20-2013, 07:13 AM
thank you for the quick reply. I am attempting to identify genes involved in resistance to treatment "X". RNAseq were done on samples collected from individual patients pre-treatment and at the time of clinical progression (resistant). Therefore I think I need to account for pair:treatment interaction, no? I am not sure how to account for this interaction in my script. Greatly appreciate your advice.
Leave a comment:
dpryan replied

11-20-2013, 07:03 AM
That appears to be correct. That doesn't look for any pair:treatment interaction, but that's likely not of interest (and would really suck up the degrees of freedom).
Leave a comment:
nbahlis replied

11-20-2013, 06:46 AM
Hi Ryan,

DESeq2 question
Is the script below the correct way to set up a comparison between paired samples pre- and post treatment?
thank you
> sampleFiles <- list.files(path="~/Desktop/Realigned_to_human_g1K_v37/Cuffdiff_IMIDS_Nov/HTSeq/HTseq_gene_counts" , pattern="*.counts")
> Table3 <- data.frame(
+ row.names = c( "P110", "P124", "P149", "P185", "P189", "P192", "P218", "P227", "P235", "P280", "P308", "P351", "P357", "P367", "P377", "P384", "P426", "P543", "P584", "P590", "P594", "P610" ),
+
> sampleFiles <- list.files(path="~/Desktop/Realigned_to_human_g1K_v37/Cuffdiff_IMIDS_Nov/HTSeq/HTseq_gene_counts" , pattern="*.counts")
> Table3 <- data.frame(
+ sampleName = sampleFiles, fileName = sampleFiles,
+ condition = c( "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "post", "pre", "post", "post", "post", "post", "post", "post", "post", "post", "post" ),
+ libType = c( "pair8", "pair10", "pair9", "pair1", "pair7", "pair11", "pair2", "pair3", "pair4", "pair5", "pair5", "pair6", "pair7", "pair1", "pair3", "pair4", "pair11", "pair2", "pair6", "pair10", "pair9", "pair8" ) )
Error in data.frame(sampleName = sampleFiles, fileName = sampleFiles, :
arguments imply differing number of rows: 22, 20
> Table3 <- data.frame(
+ sampleName = sampleFiles, fileName = sampleFiles,
+ condition = c( "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "pre", "post", "pre", "post", "post", "post", "post", "post", "post", "post", "post", "post", "post" ),
+ libType = c( "pair8", "pair10", "pair9", "pair1", "pair7", "pair11", "pair2", "pair3", "pair4", "pair5", "pair5", "pair6", "pair7", "pair1", "pair3", "pair4", "pair11", "pair2", "pair6", "pair10", "pair9", "pair8" ) )
> directory <- c("~/Desktop/Realigned_to_human_g1K_v37/Cuffdiff_IMIDS_Nov/HTSeq/HTseq_gene_counts/")
> design <- formula(~ libType + condition)
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable= Table3, directory= directory, design= design)
> Table3
sampleName fileName condition libType
1 P110.counts P110.counts pre pair8
2 P124.counts P124.counts pre pair10
3 P149.counts P149.counts pre pair9
4 P185.counts P185.counts pre pair1
5 P189.counts P189.counts pre pair7
6 P192.counts P192.counts pre pair11
7 P218.counts P218.counts pre pair2
8 P227.counts P227.counts pre pair3
9 P235.counts P235.counts pre pair4
10 P280.counts P280.counts pre pair5
11 P308.counts P308.counts post pair5
12 P351.counts P351.counts pre pair6
13 P357.counts P357.counts post pair7
14 P367.counts P367.counts post pair1
15 P377.counts P377.counts post pair3
16 P384.counts P384.counts post pair4
17 P426.counts P426.counts post pair11
18 P543.counts P543.counts post pair2
19 P584.counts P584.counts post pair6
20 P590.counts P590.counts post pair10
21 P594.counts P594.counts post pair9
22 P610.counts P610.counts post pair8

Many thanks
Leave a comment:
sindrle replied

10-19-2013, 01:04 AM
Very good!
Ill start reading up on EdgeR later, the manual there had some nice sections on design.

Thank you again, have a nice weekend.
Leave a comment:
dpryan replied

10-18-2013, 11:23 PM
Swapping Healthy/Diabetic will just change the sign on the fold changes. With the updated version of "status", Normal/timepoint 1 is the baseline for all of the comparisons, which is how I expect you and others would want to think about the experiment. Previously it was Diabetic/timepoint 1.

Regarding the error message, just change the "maxit" option to something bigger to see if that goes away. The model is now rather more complicated, so I'm not surprised that it takes more iterations to fit. If you still have things not fitting, then see which rows they are and don't trust the results from them (the other 30,000 or so rows should be fine, however). The results should be from a paired-analysis then.
Leave a comment:
sindrle replied

10-18-2013, 05:58 PM
It runs fine, I changed "healthy" and "diabetic" to fit how it was imported (diabetics first). Does that influence anything?

I got this message after the run:

> dds <- DESeq(ddsHTSeq)
159 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

Is that a problem?

So my results is now fine? Paired data is handled correctly?

Last edited by sindrle; 10-18-2013, 06:07 PM.
Leave a comment:
sindrle replied

10-18-2013, 04:06 PM
Wow!
That was a great answer, thank you yet again.

It was Simon Anders who said I should post via the mailing list, never used it before. Ill post the answer in the thread if I get it, so others can read as well.
Leave a comment:
dpryan replied

10-18-2013, 02:19 PM
One change I should have mentioned earlier is

Code:

status <- factor(c(rep("Healthy",26), rep("Diabetic",22)), levels=c("Healthy", "Diabetic")

which will make coefficients in the directions you want.

interceptResults <- results(dds, "Intercept");

Yeah, you can ignore that one. There are a number of ways to parametrize these sorts of models, one of which is without a Intercept. These results would effectively be things significantly higher than zero at the baseline condition (Diabetic at timepoint 1 in the original model, but Healthy at timepoint 1 with the "status" from above).

Kinda like its "no treatment", meaning what are the difference in gene expression between diabetics and healthy? Genes changed due only to treatment are not shown. Assumes that the treatment has the same effect on both groups?

Yes, this is the difference due simply to being diabetic. I kind of assume that this isn't really that interesting for you from a biological standpoint, other than to show that you found similar changes to everyone else who's looked at gene expression in diabetes.

Meaning how the treatment effects gene expression regardless of diabetes? Assumes that the treatment has the same effect on both groups?

This is the change you would expect due to treatment, regardless of the persons diabetic status. If there are changes due to a combination of treatment and diabetic status, they generally shouldn't show up here (these would be termed "interaction effects").

Meaning how the treatment affects gene expression differently in healthy or diabetic?

Yeah. I assume that the point of the treatment is to reverse some aspect of diabetes. You could then think of diabetes increasing expression of some gene which is decreased in those taking the treatment only if they have diabetes. It's not always the case that including interaction terms make sense, but I would be interested in looking for these genes if I were you.

I have posted a new thread on how to implement paired data between time point 1 and 2 btw.

I noticed that and the post to the bioconductor mailing list. I won't reply there in hope that one of the DESeq authors know of a better way than what I'll present here to deal with that. The simplest way is to just treat each individual as a factor. So, something like:

Code:

patients <- factor(c(rep(1:13,2), rep(14:24,2))) des <- formula(~patients + timepoints*status)

You don't really care about the various patient-specific differences, so don't bother with the results from that.
Last edited by dpryan; 10-18-2013, 03:08 PM. Reason: Changed the patients factor so it should be correct, previously, things were incorrectly paired.
Leave a comment:
sindrle replied

10-18-2013, 12:42 PM
Just to check if I got this correctly:

These are the 4 results I get:

statusResults <- results(dds, "status_Healthy_vs_Diabetic");
timepointsResults <- results(dds, "timepoints_2_vs_1");
interceptResults <- results(dds, "Intercept");
status&treatmentResults <- results(dds, "timepoints2.statusHealthy")

# statusResults: "The log2() fold change in diabetic vs. control patients when controlling for timepoint". Kinda like its "no treatment", meaning what are the difference in gene expression between diabetics and healthy? Genes changed due only to treatment are not shown. Assumes that the treatment has the same effect on both groups?

# timepointsResults: "The log2() fold change in timepoint 2 vs 1, controlling for diabetic status". Meaning how the treatment effects gene expression regardless of diabetes? Assumes that the treatment has the same effect on both groups?

#interceptResults: What is this??

# status&treatmentResults: "interaction between status and treatment". Meaning how the treatment affects gene expression differently in healthy or diabetic?

I have posted a new thread on how to implement paired data between time point 1 and 2 btw.

Thanks!
Leave a comment:
sindrle replied

10-18-2013, 07:30 AM
Very good! Thank you.

"One more thing to think about is if these samples were drawn from the same subjects at both timepoints. If so, you can model this as a paired design. I don't think there are examples of that in the DESeq2 vignette, but you can probably find an example in the limma user guide (the model setup steps are more or less the same)."

This is true, they are drawn from the same subjects! Ill look into that.

Thanks again!
Leave a comment:
dpryan replied

10-18-2013, 07:20 AM
With a multifactor design, there's more than a single set of results. Given a DESeqDataSet named "dds" (they use this name in the vignette, so I'll use it here for the sake of consistency), you can type:

Code:

resultsNames(dds)

to get the coefficient names, which will be something like, "Intercept", "timepoints_1_vs_2", and "status_healthy_vs_diabetic", for you. You can the retrieve each results table:

Code:

statusResults <- results(dds, "status_healthy_vs_diabetic") timepointsResults <- results(dds, "timepoints_1_vs_2")

The log2FoldChange column in statusResults would be "The log2() fold change in diabetic vs. control patients when controlling for timepoint". The equivalent in timepointsResults would be "The log2() fold change in timepoint 2 vs 1, controlling for diabetic status". BTW, since your timepoints are before/after treatment, I imagine that you're interested in a possible interaction between status and timepoint/treatment. You could specify that in your design by swapping a "*" for the "+".

One more thing to think about is if these samples were drawn from the same subjects at both timepoints. If so, you can model this as a paired design. I don't think there are examples of that in the DESeq2 vignette, but you can probably find an example in the limma user guide (the model setup steps are more or less the same).
Leave a comment:
sindrle replied

10-18-2013, 06:51 AM
One fast question:

What does the log2FoldChange from the DESeq2 results now tell? In light of the design= des?

Is it correct to interpret it as:

"These genes are significantly changed in diabetic patients from time point one to time point two, with healthy as controls"?
Leave a comment:

Previous 1 2 template Next

Addressing Off-Target Effects in CRISPR Technologies

by seqadmin

The first FDA-approved CRISPR-based therapy marked the transition of therapeutic gene editing from a dream to reality¹. CRISPR technologies have streamlined gene editing, and CRISPR screens have become an important approach for identifying genes involved in disease processes². This technique introduces targeted mutations across numerous genes, enabling large-scale identification of gene functions, interactions, and pathways³. Identifying the full range...
- Channel: Articles
08-27-2024, 04:44 AM
Selecting and Optimizing mRNA Library Preparations

by seqadmin

Sequencing mRNA provides a snapshot of cellular activity, allowing researchers to study the dynamics of cellular processes, compare gene expression across different tissue types, and gain insights into the mechanisms of complex diseases. “mRNA’s central role in the dogma of molecular biology makes it a logical and relevant focus for transcriptomic studies,” stated Sebastian Aguilar Pierlé, Ph.D., Application Development Lead at Inorevia. “One of the major hurdles for...
- Channel: Articles
08-07-2024, 12:11 PM

Topics	Statistics	Last Post
Study Reveals How Bacteria Defend Against Viral Attacks by seqadmin Started by seqadmin, 08-27-2024, 04:40 AM	0 responses 16 views 0 likes	Last Post by seqadmin 08-27-2024, 04:40 AM
New Single-Molecule Sequencing Platform Introduces Advanced Features for High-Throughput Genomics by seqadmin Started by seqadmin, 08-22-2024, 05:00 AM	0 responses 293 views 0 likes	Last Post by seqadmin 08-22-2024, 05:00 AM
New DNA Code Discovered Revealing Complex Gene Regulation Mechanisms by seqadmin Started by seqadmin, 08-21-2024, 10:49 AM	0 responses 135 views 0 likes	Last Post by seqadmin 08-21-2024, 10:49 AM
Epigenetic Clocks Derived from Retroelements Offer New Insights into Aging by seqadmin Started by seqadmin, 08-19-2024, 05:12 AM	0 responses 124 views 0 likes	Last Post by seqadmin 08-19-2024, 05:12 AM

Seqanswers Leaderboard Ad

Announcement

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News