Thanks Michele,
I will keep your suggestion in mind as well.
Sow
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Hi Sow,
just use:
Code:x=RivsRU[,-1] #removing the first column rownames(x)=RivsRU[,1] # adding the first column as rownames
P.S.: be careful using similar names for different variables: RivsRU and RIvsRU will be hard for others to understand your code.
Cheers,
Michael
Leave a comment:
-
I see that I still have the gene names as a separate column - my bad.
What would be the code to change row names to gene names?
Sorry if this is a simple thing in R, I'm still very new to R!
Leave a comment:
-
Hi,
I'm still not able to enter the data. Here's what it shows..
RivsRU <-read.table("RIvsRUdata", header=TRUE, row.names = 1)
> head(RivsRU)
FEATURE_ID RI2 RI3 RI1 RU1 RU2 RU3
1 AAAAAAAAAAAAAAAAA 2 3 2 8 1 2
2 AAAAAAAAAAAAAAAAAA 2 2 1 10 1 5
3 AAAAAAAAAAAAAAAAAAA 0 1 0 3 2 1
4 AAAAAAAAAAAAAAAAAAAA 1 1 0 2 0 0
5 AAAAAAAAAAAAAAAAAAAAA 0 0 1 2 1 2
6 AAAAAAAAAAAAAAAAAAAAAACAAAAA 1 0 0 0 0 0
>samples <- data.frame(row.names=c("RI1","RI2","RI3","RU1","RU2","RU3"), condition=as.factor(c(rep("RI",3),rep("RU",3))))
> RIvsRU <- DESeqDataSetFromMatrix(countData = RivsRU, colData = samples, design=~conditions)
Error in validObject(.Object) :
invalid class “SummarizedExperiment0” object: 'assays' ncol differs from 'colData' nrow
In addition: Warning message:
In sort(rownames(colData)) == sort(colnames(countData)) :
longer object length is not a multiple of shorter object length
Could someone please tell me what am I doing wrong?
ThanksLast edited by Sow; 02-16-2016, 05:54 PM.
Leave a comment:
-
Maureen,
Many thanks for your detailed summary. This was perfect to get me up and running.
Cheers,
Mark
Leave a comment:
-
I just want to say thank you Mdonlin. I am new to data analysis and your final summary helped me a lot.
Leave a comment:
-
Thank you dpryan! That worked quite well.
To summarize what worked:
> bckCountTable <- read.table("bck_counts.txt", header=TRUE, row.names=1)
> head(bckCountTable)
ctl1 ctl2 ctl3 del1 del2 del3
CNAG_00001 0 0 0 0 0 0
CNAG_00002 29 34 27 26 13 21
CNAG_00003 38 26 38 41 38 27
CNAG_00004 63 42 58 58 62 55
CNAG_00005 57 49 49 30 39 40
CNAG_00006 433 398 571 422 353 435
> samples <- data.frame(row.names=c("ctl1","ctl2","ctl3","del1","del2","del3"), condition=as.factor(c(rep("ctl",3),rep("del",3))))
> samples
condition
ctl1 ctl
ctl2 ctl
ctl3 ctl
del1 del
del2 del
del3 del
> bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=samples, design=~condition)
> bckCDS_1 <- DESeq(bckCDS)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> bck_res <- results(bckCDS_1)
> head(bck_res)
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
CNAG_00001 0.00000 NA NA NA NA NA
CNAG_00002 25.38193 -0.50881187 0.2159023 -2.3566760 0.018439326 NA
CNAG_00003 34.46418 -0.08588360 0.2010327 -0.4272120 0.669224911 0.84870693
CNAG_00004 56.07399 -0.05626052 0.1771525 -0.3175824 0.750801714 0.89623448
CNAG_00005 44.69432 -0.52210975 0.1925740 -2.7112160 0.006703695 0.05953184
CNAG_00006 434.90795 -0.35984037 0.1175182 -3.0619982 0.002198648 0.02705483
> write.csv(bck_res,file="bck_results.csv")
I should have filtered out those genes with few or no counts, but I can do that after importing the data into Excel.
Thanks again.
Maureen
Leave a comment:
-
The gene names need to be the row.names, not a column of their own as they were in your example. In what you just wrote, try instead:
Code:samples <- data.frame(row.names=c("ctl1","ctl2","ctl3","del1","del2","del3"), condition=as.factor(c(rep("ctl",3), rep("del",3)))) bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=samples, design=~condition)
Leave a comment:
-
I appreciate your help, but unless I've missed something, if I remove the gene name IDs from the count data, then I won't be able to figure out what genes are differentially expressed.
The issue really seems to be with how the colData is defined.
I tried defining a samples data frame:
> samples
samples condition
1 ctl1 ctl
2 ctl2 ctl
3 ctl3 ctl
4 del1 del
5 del2 del
6 del3 del
And then using:
> bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=samples$samples, design=samples$condition)
But I still get the same error:
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object: invalid object for slot "colData" in class "SummarizedExperiment": got class "factor", should be or extend class "DataFrame"
Leave a comment:
-
The original problem is partly that "bckCountTable" contains a column with gene names. Just remove that. Also, ExpDesign only has one column, so you can just:
Code:bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=ExpDesign, design=~condition)
Regarding "type", you won't find it in the count matrix because it just describes whether things were run as single or paired-end. It's only used latter in the vignette where they discuss multifactor designs. For simple experiments, ignore it. You can't use the design from the pasilla experiment because it had 7 samples and you have 6.
Leave a comment:
-
In trying to understand my issue with the use of the DESeqDataSetfromMatrix command in DESeq2 I tried to compare my data to the Pasilla data used in the vignette:
> data("pasillaGenes")
> countData <- counts(pasillaGenes)
> colData <- pData(pasillaGenes)[,c("condition","type")]
> colData
condition type
treated1fb treated single-read
treated2fb treated paired-end
treated3fb treated paired-end
untreated1fb untreated single-read
untreated2fb untreated single-read
untreated3fb untreated paired-end
untreated4fb untreated paired-end
Which imports nicely into a DEseq Data Set:
> pasilla_dds <- DESeqDataSetFromMatrix(countData, colData, formula(~condition))
However, I don't understand where does the type information come from. It doesn't seem to be in the countData matrix because when I look at countData, all I see are 7 columns of data, (GeneIDs plus the count columns).
> head(countData)
treated1fb treated2fb treated3fb untreated1fb untreated2fb untreated3fb untreated4fb
FBgn0000003 0 0 1 0 0 0 0
FBgn0000008 78 46 43 47 89 53 27
FBgn0000014 2 0 0 0 0 1 0
FBgn0000015 1 0 1 0 1 1 2
FBgn0000017 3187 1672 1859 2445 4615 2063 1711
FBgn0000018 369 150 176 288 383 135 174
> dim(countData)
[1] 14470 7
My data seems to be quite similar, but when I try using pData to set the colData variable, I get an error.
My data:
> head(bckCountTable)
ctl1 ctl2 ctl3 del1 del2 del3
CNAG_00001 0 0 0 0 0 0
CNAG_00002 29 34 27 26 13 21
CNAG_00003 38 26 38 41 38 27
CNAG_00004 63 42 58 58 62 55
CNAG_00005 57 49 49 30 39 40
CNAG_00006 433 398 571 422 353 435
> dim(bckCountTable)
[1] 6967 6
When I try to set colData using the same commands:
> colData <- pData(bckCountTable)[,c("condition","type")]
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘pData’ for signature ‘"data.frame"’
I'm still not clear how my matrix of counts differs from that generated for the Pasilla genes.
I also tried following the example in the reference guide:
> colData <- data.frame(condition=factor(c("ctl","del")))
> colData
condition
1 ctl
2 del
But when I try to set up the DEseqDataSet, I get an error
> bck_dds <- DESeqDataSetFromMatrix(bckCountTable, colData, formula(~condition))
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol
Again, any advice or ideas would be most appreciated.
Regards,
Maureen
Leave a comment:
-
DESeq2 DataSetFromMatrix question
Hi,
I've got a pretty simple RNAseq experiment (2 conditions, 3 biological replicates) but I'm having trouble getting the count data into DESeq2.
What I've done:
> bckCountTable <- read.table("bck_counts.txt", header=TRUE, row.names=1)
> head(bckCountTable)
GeneID ctl1 ctl2 ctl3 del1 del2 del3
1 CNAG_00001 0 0 0 0 0 0
2 CNAG_00002 29 34 27 26 13 21
3 CNAG_00003 38 26 38 41 38 27
4 CNAG_00004 63 42 58 58 62 55
> ExpDesign <- data.frame(row.names=colnames(bckCountTable), condition = c("ctl","ctl","ctl","del","del","del"))
> bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=ExpDesign$condition, design=~ExpDesign$condition)
Error in validObject(.Object) :
invalid class “SummarizedExperiment” object: invalid object for slot "colData" in class "SummarizedExperiment": got class "factor", should be or extend class "DataFrame"
I guess I don't understand how to define colData.
Any advice would be appreciated.
Regards,
Maureen
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.2.2 RcppArmadillo_0.3.920.1 Rcpp_0.10.6 GenomicRanges_1.14.3
[5] XVector_0.2.0 IRanges_1.20.4 BiocGenerics_0.8.0 BiocInstaller_1.12.0
loaded via a namespace (and not attached):
[1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0 DBI_0.2-7 genefilter_1.44.0
[6] grid_3.0.2 lattice_0.20-24 locfit_1.5-9.1 RColorBrewer_1.0-5 RSQLite_0.11.4
[11] splines_3.0.2 stats4_3.0.2 survival_2.37-4 tools_3.0.2 XML_3.95-0.2
[16] xtable_1.7-1Tags: None
Latest Articles
Collapse
-
by seqadmin
Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.
Nobel Prize for MicroRNA Discovery
This week,...-
Channel: Articles
10-07-2024, 08:07 AM -
-
by seqadmin
Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...-
Channel: Articles
09-23-2024, 06:35 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Yesterday, 02:44 PM
|
0 responses
7 views
0 likes
|
Last Post
by seqadmin
Yesterday, 02:44 PM
|
||
Started by seqadmin, 10-11-2024, 06:55 AM
|
0 responses
14 views
0 likes
|
Last Post
by seqadmin
10-11-2024, 06:55 AM
|
||
Started by seqadmin, 10-02-2024, 04:51 AM
|
0 responses
110 views
0 likes
|
Last Post
by seqadmin
10-02-2024, 04:51 AM
|
||
Started by seqadmin, 10-01-2024, 07:10 AM
|
0 responses
117 views
0 likes
|
Last Post
by seqadmin
10-01-2024, 07:10 AM
|
Leave a comment: