Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • gamb
    replied
    Data is structured like this:

    Code:
       Animal Chem1 Chem2  Tissue
    1       F   Yes    No tissueA
    2       F   Yes    No tissueB
    3       F   Yes    No tissueC
    4       G    No   Yes tissueA
    5       G    No   Yes tissueB
    6       G    No   Yes tissueC
    7       H    No   Yes tissueA
    8       H    No   Yes tissueB
    9       H    No   Yes tissueC
    10      I    No   Yes tissueA
    11      I    No   Yes tissueB
    12      I    No   Yes tissueC
    13      J   Yes   Yes tissueA
    14      J   Yes   Yes tissueB
    15      J   Yes   Yes tissueC
    16      K   Yes   Yes tissueA
    17      K   Yes   Yes tissueB
    18      K   Yes   Yes tissueC
    19      L   Yes   Yes tissueA
    20      L   Yes   Yes tissueB
    21      L   Yes   Yes tissueC
    22      A    No    No tissueA
    23      A    No    No tissueB
    24      A    No    No tissueC
    25      B    No    No tissueA
    26      B    No    No tissueB
    27      B    No    No tissueC
    28      C    No    No tissueA
    29      C    No    No tissueB
    30      C    No    No tissueC
    31      D   Yes    No tissueA
    32      D   Yes    No tissueB
    33      D   Yes    No tissueC
    34      E   Yes    No tissueA
    35      E   Yes    No tissueB
    36      E   Yes    No tissueC

    Leave a comment:


  • Michael Love
    replied
    Can you show as.data.frame(colData(dds))?

    When I build a data frame as you described the samples, I get a full rank matrix with this design:

    Code:
    > d = data.frame(t=factor(rep(1:3,each=12)),
      c1=factor(rep(rep(1:2,each=6),3)),
      c2=factor(rep(rep(1:2,each=3),6)),
      a=factor(rep(1:3,12)))
    > m = model.matrix(~ a + t*c1*c2, data=d)
    > qr(m)$rank
    [1] 14
    > ncol(m)
    [1] 14

    Leave a comment:


  • gamb
    replied
    Spoke too soon. It appears that either "animal" variable is forcing this error or the tissue*chem1*chem2


    Code:
    dds=phyloseq_to_deseq2(data, ~ animal + tissue*Chem1*Chem2)
    converting counts to integer mode
    [COLOR="Red"]Error in DESeqDataSet(se, design = design, ignoreRank) : 
      the model matrix is not full rank, so the model cannot be fit as specified.
      one or more variables or interaction terms in the design formula
      are linear combinations of the others and must be removed[/COLOR]
    Last edited by gamb; 10-26-2014, 11:45 AM.

    Leave a comment:


  • gamb
    replied
    Thank you! It seems like everything is now working and makes sense!

    Leave a comment:


  • Michael Love
    replied
    hi,

    sorry, I had a typo in my previous posts: I forgot to wrap the names in c(). I've corrected this above.

    To adapt to your level names, where I wrote "chem1", substitute "Chem1_Yes_vs_No", where I wrote "tissueC.chem1" substitute "tissuetissueC.Chem1Yes", etc.

    To add a fixed effect which controls for animal, you should add this variable to the design:

    Code:
    ~ animal + tissue*chem1*chem2
    The "every gene contains at least one zero" can be solved by using an alternate size factor estimation. I just answered this question on the Bioconductor support site this week. If you search there you should be able to find a solution.

    Leave a comment:


  • gamb
    replied
    Code:
    > sessionInfo()
    R version 3.1.1 (2014-07-10)
    Platform: x86_64-apple-darwin13.1.0 (64-bit)
    
    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    
    attached base packages:
     [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
    [10] base     
    
    other attached packages:
     [1] fossil_0.3.7              shapefiles_0.7            foreign_0.8-61           
     [4] maps_2.3-9                sp_1.0-15                 phyloseq_1.10.0          
     [7] gridExtra_0.9.1           plyr_1.8.1                reshape2_1.4             
    [10] ggdendro_0.1-15           gdata_2.13.3              plotrix_3.5-7            
    [13] ellipse_0.3-8             ggplot2_1.0.0             vegan_2.0-10             
    [16] lattice_0.20-29           permute_0.8-3             RColorBrewer_1.0-5       
    [19] biom_0.3.12               ape_3.1-4                 DESeq2_1.6.1             
    [22] RcppArmadillo_0.4.450.1.0 Rcpp_0.11.3               GenomicRanges_1.18.1     
    [25] GenomeInfoDb_1.2.0        IRanges_2.0.0             S4Vectors_0.4.0          
    [28] BiocGenerics_0.12.0      
    
    loaded via a namespace (and not attached):
     [1] acepack_1.3-3.3      ade4_1.6-2           annotate_1.44.0      AnnotationDbi_1.28.0
     [5] base64enc_0.1-2      BatchJobs_1.4        BBmisc_1.7           Biobase_2.26.0      
     [9] BiocParallel_1.0.0   Biostrings_2.34.0    brew_1.0-6           checkmate_1.5.0     
    [13] chron_2.3-45         cluster_1.15.3       codetools_0.2-9      colorspace_1.2-4    
    [17] data.table_1.9.4     DBI_0.3.1            digest_0.6.4         fail_1.2            
    [21] foreach_1.4.2        Formula_1.1-2        genefilter_1.48.1    geneplotter_1.44.0  
    [25] gtable_0.1.2         gtools_3.4.1         Hmisc_3.14-5         igraph_0.7.1        
    [29] iterators_1.0.7      latticeExtra_0.6-26  locfit_1.5-9.1       MASS_7.3-35         
    [33] Matrix_1.1-4         multtest_2.22.0      munsell_0.4.2        nlme_3.1-118        
    [37] nnet_7.3-8           proto_0.3-10         RJSONIO_1.3-0        rpart_4.1-8         
    [41] RSQLite_1.0.0        scales_0.2.4         sendmailR_1.2-1      splines_3.1.1       
    [45] stringr_0.6.2        survival_2.37-7      tools_3.1.1          XML_3.98-1.1        
    [49] xtable_1.7-4         XVector_0.6.0        zlibbioc_1.12.0
    Edit: Closed out of my session and re-ran my code in a new session. Now getting this readout:

    Code:
    > dds=phyloseq_to_deseq2(data, ~tissue*Chem1*Chem2)
    converting counts to integer mode
    > dds=DESeq(dds, test = "Wald", fitType = "local", betaPrior=FALSE)
    estimating size factors
    Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc,  : 
      every gene contains at least one zero, cannot compute log geometric means
    Last edited by gamb; 10-26-2014, 08:52 AM.

    Leave a comment:


  • Michael Love
    replied
    Can you post the output of:

    sessionInfo()

    Leave a comment:


  • gamb
    replied
    Thanks Michael-

    I updated everything and now I have problems!

    Following your code,

    Code:
    results(dds, contrast=c("tissue","C","B"))
    Works for me but this doesn't:

    Code:
    results(dds, contrast=list("chem1","chem1.tissueB"))
    I get the following:

    Code:
    results(dds, contrast=list("Chem1", "Chem1.tissueB"))
    Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues) : 
      all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'
    My code:

    Code:
    dds=phyloseq_to_deseq2(data, ~tissue*Chem1*Chem2)
    dds=DESeq(dds, test = "Wald", fitType = "local", betaPrior=FALSE)
    resultsNames(dds)
    
     [1] "Intercept"                       "tissue_tissueC_vs_tissueA"      
     [3] "tissue_tissueB_vs_tissueA"       "Chem1_Yes_vs_No"                
     [5] "Chem2_Yes_vs_No"                 "tissuetissueC.Chem1Yes"         
     [7] "tissuetissueB.Chem1Yes"          "tissuetissueC.Chem2Yes"         
     [9] "tissuetissueB.Chem2Yes"          "Chem1Yes.Chem2Yes"              
    [11] "tissuetissueC.Chem1Yes.Chem2Yes" "tissuetissueB.Chem1Yes.Chem2Yes"

    Also, the tissues are paired by the source animal (i.e. Chem1+Chem2+ has tissue samples A,B, and C, from animals 1,2 and 3). How do I account for this?

    Leave a comment:


  • Michael Love
    replied
    hi gamb,

    Firstly, as you are just starting to analyze, best to update to Bioconductor 3.0 and DESeq2 version 1.6 which was just released.

    As the analysis is a bit complicated, I'd recommend using DESeq(dds, betaPrior=FALSE) as this simplifies the building of contrasts with interactions.

    The design should be

    Code:
    ~ tissue*chem1*chem2
    ...because you're interested in both the interactions of chem1 and chem2, and in the specific effects for each tissue.

    The list of effects is given by

    Code:
    resultsNames(dds)
    You can build up the contrasts using a list like below. I'll give some examples and see if you can extend to the others. Comparing the tissues alone is simple by specifying the name of the factor and the two levels to compare, e.g.:

    Code:
    results(dds, contrast=c("tissue","C","B"))
    Then some basics about interaction terms: the effect of chem 1 in tissue B will be:

    Code:
    results(dds, contrast=list(c("chem1","chem1.tissueB")))
    ...because you need to add the main chem 1 effect (the effect for tissue A, the base level of tissue) and the extra chem 1 effect for tissue B.

    If you wanted to test if chem 1 has an effect in tissue B different than tissue A you would only test the interaction term "chem1.tissueB", because this is that difference.

    Tissue A is the base level, so the effect of chem 1 in tissue A is simply

    Code:
    results(dds, name="chem1")
    The effect of chem 1 and chem 2 in tissue B is a bit lengthy, you need to add main effects and all relevant interaction terms:

    Code:
    results(dds, contrast=list(c("chem1","chem2","chem1.tissueB","chem2.tissueB","chem1.chem2","chem1.chem2.tissueB")))
    It's simpler for tissue A because this is the base level:

    Code:
    results(dds, contrast=list(c("chem1","chem2","chem1.chem2")))
    And if you want to test only the interaction, answering if the effect of the two chems is different than each separately, you remove some of these effects. For tissue B this contrast would be:

    Code:
    results(dds, contrast=list(c("chem1.chem2","chem1.chem2.tissueB")))
    For tissue A it is again simpler:

    Code:
    results(dds, name="chem1.chem2")
    Note that another way to compare individual groups would be to create a new variable which describes all groups:

    Code:
    dds$group = factor(paste(dds$chem1, dds$chem2, dds$tissue))
    design(dds) = ~ group
    Then use the following to compare to specific groups:

    Code:
    results(dds, contrast=c("group","...","..."))
    Last edited by Michael Love; 10-26-2014, 09:20 AM.

    Leave a comment:


  • gamb
    started a topic DESeq2 model design + contrasts

    DESeq2 model design + contrasts

    I've skimmed through previous threads about contrasts and I'm still getting hung up with my data.

    My data are 16s rDNA count data from a full factorial design with 3 tissue types (A,B,C) that were either exposed to Chemical1, Chemical2, or both. Each condition has three replicates.

    Edit: Each replicate has tissue A, B, C samples from the same animal. So, 3 animals per treatment. How should I account for this?

    For each tissue (just showing A for example)

    TissueA, Chem1+, Chem2+ x 3 replicates
    TissueA, Chem1-, Chem2+ x 3 replicates
    TissueA, Chem1+, Chem2- x 3 replicates
    TissueA, Chem1-, Chem2- x 3 replicates [aka control]

    The questions I want to test are:
    • Which OTUs are different from Tissue A vs B, B vs C, and A vs C?
    • Effects of Chem1 on each Tissue
    • Effects of Chem2 on each Tissue
    • The effects of Chem1 and Chem2 on each tissue


    Is it worth taking the chem1 and chem2 treatments and merge them as one condition? i.e. condition1= chem1+chem2+, condition2=chem1-chem2+ etc for simplicity and setup my model as ~tissue + condition?
    Last edited by gamb; 10-25-2014, 12:04 PM.

Latest Articles

Collapse

  • seqadmin
    Recent Advances in Sequencing Analysis Tools
    by seqadmin


    The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
    Yesterday, 07:48 AM
  • seqadmin
    Essential Discoveries and Tools in Epitranscriptomics
    by seqadmin




    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
    04-22-2024, 07:01 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Today, 06:57 AM
0 responses
9 views
0 likes
Last Post seqadmin  
Started by seqadmin, Yesterday, 07:17 AM
0 responses
14 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-02-2024, 08:06 AM
0 responses
19 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-30-2024, 12:17 PM
0 responses
23 views
0 likes
Last Post seqadmin  
Working...
X