DESeq2 DataSetFromMatrix question

Sow replied

02-17-2016, 12:41 AM
Thanks Michele,

I will keep your suggestion in mind as well.

Sow
Michael.Ante replied

02-17-2016, 12:32 AM
Hi Sow,

just use:

Code:

x <- RivsRU[, -1]            # remove the first column
rownames(x) <- RivsRU[, 1]   # use the first column as row names

Afterwards you'll just use countData=x.

P.S.: Be careful with similar names for different variables: RivsRU and RIvsRU are easy to confuse and will make your code hard for others to follow.
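
A minimal sketch of the same fix on toy data, assuming a count table whose first column holds the feature IDs. The gene names, the values, and the object names `counts_raw` and `counts` are made up for illustration. It also reorders the count columns to match the rows of the sample table, since DESeq2 expects the two in the same order, and checks that the variable named in the design exists as a column.

```r
# Toy stand-in for the table read from file: IDs sit in the first column
counts_raw <- data.frame(
  FEATURE_ID = c("geneA", "geneB", "geneC"),
  RI2 = c(2, 2, 0), RI3 = c(3, 2, 1), RI1 = c(2, 1, 0),
  RU1 = c(8, 10, 3), RU2 = c(1, 1, 2), RU3 = c(2, 5, 1)
)

# Move the ID column into the row names so only count columns remain
counts <- counts_raw[, -1]
rownames(counts) <- counts_raw[, 1]

samples <- data.frame(
  row.names = c("RI1", "RI2", "RI3", "RU1", "RU2", "RU3"),
  condition = factor(rep(c("RI", "RU"), each = 3))
)

# Put the count columns in the same order as the rows of the sample table
counts <- counts[, rownames(samples)]

# Sanity checks before building the data set
stopifnot(ncol(counts) == nrow(samples))
stopifnot(all(colnames(counts) == rownames(samples)))
stopifnot("condition" %in% colnames(samples))  # name used in design = ~condition
```

With these checks passing, countData = counts, colData = samples and design = ~condition should line up. Note that the call in the earlier post uses ~conditions, which does not match the condition column.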

Cheers,
Michael
Last edited by Michael.Ante; 02-17-2016, 12:46 AM. Reason: Typo
Sow replied

02-16-2016, 06:20 PM
I see that I still have the gene names as a separate column - my bad.
What would be the code to change row names to gene names?
Sorry if this is a simple thing in R, I'm still very new to R!
Sow replied

02-16-2016, 05:52 PM
Hi,
I'm still not able to enter the data. Here's what it shows:

RivsRU <-read.table("RIvsRUdata", header=TRUE, row.names = 1)
> head(RivsRU)
FEATURE_ID RI2 RI3 RI1 RU1 RU2 RU3
1 AAAAAAAAAAAAAAAAA 2 3 2 8 1 2
2 AAAAAAAAAAAAAAAAAA 2 2 1 10 1 5
3 AAAAAAAAAAAAAAAAAAA 0 1 0 3 2 1
4 AAAAAAAAAAAAAAAAAAAA 1 1 0 2 0 0
5 AAAAAAAAAAAAAAAAAAAAA 0 0 1 2 1 2
6 AAAAAAAAAAAAAAAAAAAAAACAAAAA 1 0 0 0 0 0
>samples <- data.frame(row.names=c("RI1","RI2","RI3","RU1","RU2","RU3"), condition=as.factor(c(rep("RI",3),rep("RU",3))))

> RIvsRU <- DESeqDataSetFromMatrix(countData = RivsRU, colData = samples, design=~conditions)

Error in validObject(.Object) :
invalid class “SummarizedExperiment0” object: 'assays' ncol differs from 'colData' nrow
In addition: Warning message:
In sort(rownames(colData)) == sort(colnames(countData)) :
longer object length is not a multiple of shorter object length

Could someone please tell me what I am doing wrong?
Thanks

Last edited by Sow; 02-16-2016, 05:54 PM.
m10001 replied

05-11-2015, 08:38 PM
Maureen,

Many thanks for your detailed summary. This was perfect to get me up and running.

Cheers,
Mark
MDonlin replied

12-09-2014, 09:37 AM
Gong,

Glad the summary helped.

Maureen
gong chen replied

12-09-2014, 07:44 AM
I just want to say thank you, MDonlin. I am new to data analysis and your final summary helped me a lot.
MDonlin replied

11-04-2013, 10:54 AM
Thank you dpryan! That worked quite well.

To summarize what worked:

> bckCountTable <- read.table("bck_counts.txt", header=TRUE, row.names=1)
> head(bckCountTable)
ctl1 ctl2 ctl3 del1 del2 del3
CNAG_00001 0 0 0 0 0 0
CNAG_00002 29 34 27 26 13 21
CNAG_00003 38 26 38 41 38 27
CNAG_00004 63 42 58 58 62 55
CNAG_00005 57 49 49 30 39 40
CNAG_00006 433 398 571 422 353 435

> samples <- data.frame(row.names=c("ctl1","ctl2","ctl3","del1","del2","del3"), condition=as.factor(c(rep("ctl",3),rep("del",3))))
> samples
condition
ctl1 ctl
ctl2 ctl
ctl3 ctl
del1 del
del2 del
del3 del

> bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=samples, design=~condition)
> bckCDS_1 <- DESeq(bckCDS)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

> bck_res <- results(bckCDS_1)

> head(bck_res)
DataFrame with 6 rows and 6 columns
            baseMean log2FoldChange     lfcSE       stat      pvalue       padj
           <numeric>      <numeric> <numeric>  <numeric>   <numeric>  <numeric>
CNAG_00001   0.00000             NA        NA         NA          NA         NA
CNAG_00002  25.38193    -0.50881187 0.2159023 -2.3566760 0.018439326         NA
CNAG_00003  34.46418    -0.08588360 0.2010327 -0.4272120 0.669224911 0.84870693
CNAG_00004  56.07399    -0.05626052 0.1771525 -0.3175824 0.750801714 0.89623448
CNAG_00005  44.69432    -0.52210975 0.1925740 -2.7112160 0.006703695 0.05953184
CNAG_00006 434.90795    -0.35984037 0.1175182 -3.0619982 0.002198648 0.02705483

> write.csv(bck_res,file="bck_results.csv")

I should have filtered out those genes with few or no counts, but I can do that after importing the data into Excel.
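
The low-count filtering can also be done in R before the export. A minimal sketch on a toy table built from the first three rows shown above; the cutoff of 10 total reads and the names `toy_counts`, `min_total` and `filtered` are assumptions for illustration, not something from this thread.

```r
# Toy count table with gene IDs as row names
toy_counts <- data.frame(
  ctl1 = c(0, 29, 38), ctl2 = c(0, 34, 26), ctl3 = c(0, 27, 38),
  del1 = c(0, 26, 41), del2 = c(0, 13, 38), del3 = c(0, 21, 27),
  row.names = c("CNAG_00001", "CNAG_00002", "CNAG_00003")
)

# Keep genes whose total count across all samples reaches the cutoff
min_total <- 10  # assumed cutoff, adjust to the experiment
keep <- rowSums(toy_counts) >= min_total
filtered <- toy_counts[keep, ]

rownames(filtered)
```

The filtered table can then be used as countData in place of the full one.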

Thanks again.

Maureen

dpryan replied

11-04-2013, 09:10 AM

The gene names need to be the row.names, not a column of their own as they were in your example. In what you just wrote, try instead:

Code:

samples <- data.frame(row.names=c("ctl1","ctl2","ctl3","del1","del2","del3"),
    condition=as.factor(c(rep("ctl",3), rep("del",3))))
bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=samples, design=~condition)


MDonlin replied

11-04-2013, 09:06 AM
I appreciate your help, but unless I've missed something, if I remove the gene name IDs from the count data, then I won't be able to figure out what genes are differentially expressed.

The issue really seems to be with how the colData is defined.

I tried defining a samples data frame:

> samples
samples condition
1 ctl1 ctl
2 ctl2 ctl
3 ctl3 ctl
4 del1 del
5 del2 del
6 del3 del

And then using:
> bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=samples$samples, design=samples$condition)

But I still get the same error:
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object: invalid object for slot "colData" in class "SummarizedExperiment": got class "factor", should be or extend class "DataFrame"
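
The message says colData received a factor rather than a data frame. A small base-R sketch of why, using a toy copy of the samples table above: `$` pulls a single column out as a vector or factor, while the data frame itself (or bracket subsetting with drop = FALSE) stays a data frame. The name `col_data` is made up for illustration.

```r
samples <- data.frame(
  samples = c("ctl1", "ctl2", "ctl3", "del1", "del2", "del3"),
  condition = factor(rep(c("ctl", "del"), each = 3))
)

class(samples$condition)                     # a factor, not a data frame
class(samples[, "condition", drop = FALSE])  # still a data frame

# Sample names moved into the row names, condition kept as the only column
col_data <- data.frame(row.names = samples$samples,
                       condition = samples$condition)
class(col_data)
rownames(col_data)
```

A table shaped like `col_data` is what colData expects, and the design is then written as a formula over its column names, e.g. ~condition, rather than as a column extracted with `$`.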
dpryan replied

11-02-2013, 05:35 AM
The original problem is partly that "bckCountTable" contains a column with gene names. Just remove that. Also, ExpDesign only has one column, so you can just:

Code:

bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=ExpDesign, design=~condition)

or something along those lines. The number of rows in colData needs to match the number of samples that you're analysing.

Regarding "type", you won't find it in the count matrix because it just describes whether things were run as single or paired-end. It's only used later in the vignette where they discuss multifactor designs. For simple experiments, ignore it. You can't use the design from the pasilla experiment because it had 7 samples and you have 6.
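
The requirement that colData has one row per sample can be checked before calling DESeqDataSetFromMatrix. A sketch using a hypothetical helper, `check_design_inputs`, which is not part of DESeq2; the toy objects are made up for illustration.

```r
# Hypothetical helper (not a DESeq2 function): report mismatches between
# a count table and its sample table before building the data set
check_design_inputs <- function(count_data, col_data) {
  problems <- character(0)
  if (ncol(count_data) != nrow(col_data)) {
    problems <- c(problems, sprintf(
      "countData has %d columns but colData has %d rows",
      ncol(count_data), nrow(col_data)))
  } else if (!all(colnames(count_data) == rownames(col_data))) {
    problems <- c(problems, "sample names differ or are in a different order")
  }
  problems
}

counts <- matrix(0, nrow = 2, ncol = 6,
                 dimnames = list(c("g1", "g2"),
                                 c("ctl1", "ctl2", "ctl3", "del1", "del2", "del3")))
two_rows <- data.frame(condition = factor(c("ctl", "del")))
six_rows <- data.frame(row.names = colnames(counts),
                       condition = factor(rep(c("ctl", "del"), each = 3)))

check_design_inputs(counts, two_rows)  # one problem reported
check_design_inputs(counts, six_rows)  # character(0)
```

The two-row table mirrors the reference-guide attempt with only "ctl" and "del", which fails because six samples need six rows.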
MDonlin replied

11-01-2013, 10:15 AM
In trying to understand my issue with the DESeqDataSetFromMatrix command in DESeq2, I tried to compare my data to the Pasilla data used in the vignette:

> data("pasillaGenes")
> countData <- counts(pasillaGenes)

> colData <- pData(pasillaGenes)[,c("condition","type")]
> colData
             condition        type
treated1fb     treated single-read
treated2fb     treated  paired-end
treated3fb     treated  paired-end
untreated1fb untreated single-read
untreated2fb untreated single-read
untreated3fb untreated  paired-end
untreated4fb untreated  paired-end

This imports nicely into a DESeqDataSet:
> pasilla_dds <- DESeqDataSetFromMatrix(countData, colData, formula(~condition))

However, I don't understand where the type information comes from. It doesn't seem to be in the countData matrix, because when I look at countData, all I see are 7 columns of data (GeneIDs plus the count columns).

> head(countData)
treated1fb treated2fb treated3fb untreated1fb untreated2fb untreated3fb untreated4fb
FBgn0000003 0 0 1 0 0 0 0
FBgn0000008 78 46 43 47 89 53 27
FBgn0000014 2 0 0 0 0 1 0
FBgn0000015 1 0 1 0 1 1 2
FBgn0000017 3187 1672 1859 2445 4615 2063 1711
FBgn0000018 369 150 176 288 383 135 174

> dim(countData)
[1] 14470 7

My data seems to be quite similar, but when I try using pData to set the colData variable, I get an error.

My data:
> head(bckCountTable)
ctl1 ctl2 ctl3 del1 del2 del3
CNAG_00001 0 0 0 0 0 0
CNAG_00002 29 34 27 26 13 21
CNAG_00003 38 26 38 41 38 27
CNAG_00004 63 42 58 58 62 55
CNAG_00005 57 49 49 30 39 40
CNAG_00006 433 398 571 422 353 435

> dim(bckCountTable)
[1] 6967 6

When I try to set colData using the same commands:
> colData <- pData(bckCountTable)[,c("condition","type")]
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘pData’ for signature ‘"data.frame"’

I'm still not clear how my matrix of counts differs from that generated for the Pasilla genes.

I also tried following the example in the reference guide:
> colData <- data.frame(condition=factor(c("ctl","del")))

> colData
condition
1 ctl
2 del

But when I try to set up the DESeqDataSet, I get an error:
> bck_dds <- DESeqDataSetFromMatrix(bckCountTable, colData, formula(~condition))
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol

Again, any advice or ideas would be most appreciated.

Regards,
Maureen
MDonlin started a topic DESeq2 DataSetFromMatrix question

11-01-2013, 08:03 AM

Hi,

I've got a pretty simple RNAseq experiment (2 conditions, 3 biological replicates) but I'm having trouble getting the count data into DESeq2.

What I've done:

> bckCountTable <- read.table("bck_counts.txt", header=TRUE, row.names=1)

> head(bckCountTable)
GeneID ctl1 ctl2 ctl3 del1 del2 del3
1 CNAG_00001 0 0 0 0 0 0
2 CNAG_00002 29 34 27 26 13 21
3 CNAG_00003 38 26 38 41 38 27
4 CNAG_00004 63 42 58 58 62 55

> ExpDesign <- data.frame(row.names=colnames(bckCountTable), condition = c("ctl","ctl","ctl","del","del","del"))

> bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=ExpDesign$condition, design=~ExpDesign$condition)
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object: invalid object for slot "colData" in class "SummarizedExperiment": got class "factor", should be or extend class "DataFrame"

I guess I don't understand how to define colData.

Any advice would be appreciated.

Regards,
Maureen

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] DESeq2_1.2.2 RcppArmadillo_0.3.920.1 Rcpp_0.10.6 GenomicRanges_1.14.3
[5] XVector_0.2.0 IRanges_1.20.4 BiocGenerics_0.8.0 BiocInstaller_1.12.0

loaded via a namespace (and not attached):
[1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0 DBI_0.2-7 genefilter_1.44.0
[6] grid_3.0.2 lattice_0.20-24 locfit_1.5-9.1 RColorBrewer_1.0-5 RSQLite_0.11.4
[11] splines_3.0.2 stats4_3.0.2 survival_2.37-4 tools_3.0.2 XML_3.95-0.2
[16] xtable_1.7-1