Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Sow
    replied
    Thanks Michele,

    I will keep your suggestion in mind as well.

    Sow

    Leave a comment:


  • Michael.Ante
    replied
    Hi Sow,

    just use:
    Code:
    x=RivsRU[,-1] #removing the first column
    rownames(x)=RivsRU[,1] # adding the first column as rownames
    Afterwards you'll just use countData=x.

    P.S.: be careful using similar names for different variables: RivsRU and RIvsRU will be hard for others to understand your code.

    Cheers,
    Michael
    Last edited by Michael.Ante; 02-17-2016, 12:46 AM. Reason: Typo

    Leave a comment:


  • Sow
    replied
    I see that I still have the gene names as a separate column - my bad.
    What would be the code to change row names to gene names?
    Sorry if this is a simple thing in R, I'm still very new to R!

    Leave a comment:


  • Sow
    replied
    Hi,
    I'm still not able to enter the data. Here's what it shows..

    RivsRU <-read.table("RIvsRUdata", header=TRUE, row.names = 1)
    > head(RivsRU)
    FEATURE_ID RI2 RI3 RI1 RU1 RU2 RU3
    1 AAAAAAAAAAAAAAAAA 2 3 2 8 1 2
    2 AAAAAAAAAAAAAAAAAA 2 2 1 10 1 5
    3 AAAAAAAAAAAAAAAAAAA 0 1 0 3 2 1
    4 AAAAAAAAAAAAAAAAAAAA 1 1 0 2 0 0
    5 AAAAAAAAAAAAAAAAAAAAA 0 0 1 2 1 2
    6 AAAAAAAAAAAAAAAAAAAAAACAAAAA 1 0 0 0 0 0
    >samples <- data.frame(row.names=c("RI1","RI2","RI3","RU1","RU2","RU3"), condition=as.factor(c(rep("RI",3),rep("RU",3))))

    > RIvsRU <- DESeqDataSetFromMatrix(countData = RivsRU, colData = samples, design=~conditions)

    Error in validObject(.Object) :
    invalid class “SummarizedExperiment0” object: 'assays' ncol differs from 'colData' nrow
    In addition: Warning message:
    In sort(rownames(colData)) == sort(colnames(countData)) :
    longer object length is not a multiple of shorter object length

    Could someone please tell me what am I doing wrong?
    Thanks
    Last edited by Sow; 02-16-2016, 05:54 PM.

    Leave a comment:


  • m10001
    replied
    Maureen,

    Many thanks for your detailed summary. This was perfect to get me up and running.

    Cheers,
    Mark

    Leave a comment:


  • MDonlin
    replied
    Gong,

    Glad the summary helped.

    Maureen

    Leave a comment:


  • gong chen
    replied
    I just want to say thank you Mdonlin. I am new to data analysis and your final summary helped me a lot.

    Leave a comment:


  • MDonlin
    replied
    Thank you dpryan! That worked quite well.

    To summarize what worked:

    > bckCountTable <- read.table("bck_counts.txt", header=TRUE, row.names=1)
    > head(bckCountTable)
    ctl1 ctl2 ctl3 del1 del2 del3
    CNAG_00001 0 0 0 0 0 0
    CNAG_00002 29 34 27 26 13 21
    CNAG_00003 38 26 38 41 38 27
    CNAG_00004 63 42 58 58 62 55
    CNAG_00005 57 49 49 30 39 40
    CNAG_00006 433 398 571 422 353 435


    > samples <- data.frame(row.names=c("ctl1","ctl2","ctl3","del1","del2","del3"), condition=as.factor(c(rep("ctl",3),rep("del",3))))
    > samples
    condition
    ctl1 ctl
    ctl2 ctl
    ctl3 ctl
    del1 del
    del2 del
    del3 del

    > bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=samples, design=~condition)
    > bckCDS_1 <- DESeq(bckCDS)
    estimating size factors
    estimating dispersions
    gene-wise dispersion estimates
    mean-dispersion relationship
    final dispersion estimates
    fitting model and testing

    > bck_res <- results(bckCDS_1)

    > head(bck_res)
    DataFrame with 6 rows and 6 columns
    baseMean log2FoldChange lfcSE stat pvalue padj
    <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
    CNAG_00001 0.00000 NA NA NA NA NA
    CNAG_00002 25.38193 -0.50881187 0.2159023 -2.3566760 0.018439326 NA
    CNAG_00003 34.46418 -0.08588360 0.2010327 -0.4272120 0.669224911 0.84870693
    CNAG_00004 56.07399 -0.05626052 0.1771525 -0.3175824 0.750801714 0.89623448
    CNAG_00005 44.69432 -0.52210975 0.1925740 -2.7112160 0.006703695 0.05953184
    CNAG_00006 434.90795 -0.35984037 0.1175182 -3.0619982 0.002198648 0.02705483

    > write.csv(bck_res,file="bck_results.csv")

    I should have filtered out those genes with few or no counts, but I can do that after importing the data into Excel.

    Thanks again.

    Maureen

    Leave a comment:


  • dpryan
    replied
    The gene names need to be the row.names, not a column of their own as they were in your example. In what you just wrote, try instead:
    Code:
    samples <- data.frame(row.names=c("ctl1","ctl2","ctl3","del1","del2","del3"),
        condition=as.factor(c(rep("ctl",3), rep("del",3))))
    bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=samples, design=~condition)

    Leave a comment:


  • MDonlin
    replied
    I appreciate your help, but unless I've missed something, if I remove the gene name IDs from the count data, then I won't be able to figure out what genes are differentially expressed.

    The issue really seems to be with how the colData is defined.

    I tried defining a samples data frame:

    > samples
    samples condition
    1 ctl1 ctl
    2 ctl2 ctl
    3 ctl3 ctl
    4 del1 del
    5 del2 del
    6 del3 del

    And then using:
    > bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=samples$samples, design=samples$condition)

    But I still get the same error:
    Error in validObject(.Object) :
    invalid class “SummarizedExperiment” object: invalid object for slot "colData" in class "SummarizedExperiment": got class "factor", should be or extend class "DataFrame"

    Leave a comment:


  • dpryan
    replied
    The original problem is partly that "bckCountTable" contains a column with gene names. Just remove that. Also, ExpDesign only has one column, so you can just:
    Code:
    bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=ExpDesign, design=~condition)
    or something along those lines. The number of rows in colData need to match the number of samples that you're analysing.

    Regarding "type", you won't find it in the count matrix because it just describes whether things were run as single or paired-end. It's only used latter in the vignette where they discuss multifactor designs. For simple experiments, ignore it. You can't use the design from the pasilla experiment because it had 7 samples and you have 6.

    Leave a comment:


  • MDonlin
    replied
    In trying to understand my issue with the use of the DESeqDataSetfromMatrix command in DESeq2 I tried to compare my data to the Pasilla data used in the vignette:

    > data("pasillaGenes")
    > countData <- counts(pasillaGenes)

    > colData <- pData(pasillaGenes)[,c("condition","type")]
    > colData
    condition type
    treated1fb treated single-read
    treated2fb treated paired-end
    treated3fb treated paired-end
    untreated1fb untreated single-read
    untreated2fb untreated single-read
    untreated3fb untreated paired-end
    untreated4fb untreated paired-end

    Which imports nicely into a DEseq Data Set:
    > pasilla_dds <- DESeqDataSetFromMatrix(countData, colData, formula(~condition))

    However, I don't understand where does the type information come from. It doesn't seem to be in the countData matrix because when I look at countData, all I see are 7 columns of data, (GeneIDs plus the count columns).

    > head(countData)
    treated1fb treated2fb treated3fb untreated1fb untreated2fb untreated3fb untreated4fb
    FBgn0000003 0 0 1 0 0 0 0
    FBgn0000008 78 46 43 47 89 53 27
    FBgn0000014 2 0 0 0 0 1 0
    FBgn0000015 1 0 1 0 1 1 2
    FBgn0000017 3187 1672 1859 2445 4615 2063 1711
    FBgn0000018 369 150 176 288 383 135 174

    > dim(countData)
    [1] 14470 7

    My data seems to be quite similar, but when I try using pData to set the colData variable, I get an error.

    My data:
    > head(bckCountTable)
    ctl1 ctl2 ctl3 del1 del2 del3
    CNAG_00001 0 0 0 0 0 0
    CNAG_00002 29 34 27 26 13 21
    CNAG_00003 38 26 38 41 38 27
    CNAG_00004 63 42 58 58 62 55
    CNAG_00005 57 49 49 30 39 40
    CNAG_00006 433 398 571 422 353 435

    > dim(bckCountTable)
    [1] 6967 6

    When I try to set colData using the same commands:
    > colData <- pData(bckCountTable)[,c("condition","type")]
    Error in (function (classes, fdef, mtable) :
    unable to find an inherited method for function ‘pData’ for signature ‘"data.frame"’

    I'm still not clear how my matrix of counts differs from that generated for the Pasilla genes.

    I also tried following the example in the reference guide:
    > colData <- data.frame(condition=factor(c("ctl","del")))

    > colData
    condition
    1 ctl
    2 del

    But when I try to set up the DEseqDataSet, I get an error
    > bck_dds <- DESeqDataSetFromMatrix(bckCountTable, colData, formula(~condition))
    Error in validObject(.Object) :
    invalid class “SummarizedExperiment” object: 'colData' nrow differs from 'assays' ncol

    Again, any advice or ideas would be most appreciated.

    Regards,
    Maureen

    Leave a comment:


  • MDonlin
    started a topic DESeq2 DataSetFromMatrix question

    DESeq2 DataSetFromMatrix question

    Hi,

    I've got a pretty simple RNAseq experiment (2 conditions, 3 biological replicates) but I'm having trouble getting the count data into DESeq2.

    What I've done:

    > bckCountTable <- read.table("bck_counts.txt", header=TRUE, row.names=1)

    > head(bckCountTable)
    GeneID ctl1 ctl2 ctl3 del1 del2 del3
    1 CNAG_00001 0 0 0 0 0 0
    2 CNAG_00002 29 34 27 26 13 21
    3 CNAG_00003 38 26 38 41 38 27
    4 CNAG_00004 63 42 58 58 62 55

    > ExpDesign <- data.frame(row.names=colnames(bckCountTable), condition = c("ctl","ctl","ctl","del","del","del"))

    > bckCDS <- DESeqDataSetFromMatrix(countData = bckCountTable, colData=ExpDesign$condition, design=~ExpDesign$condition)
    Error in validObject(.Object) :
    invalid class “SummarizedExperiment” object: invalid object for slot "colData" in class "SummarizedExperiment": got class "factor", should be or extend class "DataFrame"

    I guess I don't understand how to define colData.

    Any advice would be appreciated.

    Regards,
    Maureen


    > sessionInfo()
    R version 3.0.2 (2013-09-25)
    Platform: x86_64-apple-darwin10.8.0 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] parallel stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] DESeq2_1.2.2 RcppArmadillo_0.3.920.1 Rcpp_0.10.6 GenomicRanges_1.14.3
    [5] XVector_0.2.0 IRanges_1.20.4 BiocGenerics_0.8.0 BiocInstaller_1.12.0

    loaded via a namespace (and not attached):
    [1] annotate_1.40.0 AnnotationDbi_1.24.0 Biobase_2.22.0 DBI_0.2-7 genefilter_1.44.0
    [6] grid_3.0.2 lattice_0.20-24 locfit_1.5-9.1 RColorBrewer_1.0-5 RSQLite_0.11.4
    [11] splines_3.0.2 stats4_3.0.2 survival_2.37-4 tools_3.0.2 XML_3.95-0.2
    [16] xtable_1.7-1

Latest Articles

Collapse

  • seqadmin
    Multiomics Techniques Advancing Disease Research
    by seqadmin


    New and advanced multiomics tools and technologies have opened new avenues of research and markedly enhanced various disciplines such as disease research and precision medicine1. The practice of merging diverse data from various ‘omes increasingly provides a more holistic understanding of biological systems. As Maddison Masaeli, Co-Founder and CEO at Deepcell, aptly noted, “You can't explain biology in its complex form with one modality.”

    A major leap in the field has
    ...
    02-08-2024, 06:33 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 02-23-2024, 04:11 PM
1 response
21 views
0 likes
Last Post kim897
by kim897
 
Started by seqadmin, 02-21-2024, 08:52 AM
0 responses
38 views
0 likes
Last Post seqadmin  
Started by seqadmin, 02-20-2024, 08:57 AM
0 responses
26 views
0 likes
Last Post seqadmin  
Started by seqadmin, 02-14-2024, 09:19 AM
0 responses
57 views
0 likes
Last Post seqadmin  
Working...
X