Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DESeq2*: Dispersion in a DESeq analysis with design=~group

    Hello,

    I make a differential expression analysis between treated samples (factors) and untreated (controls) at different states (vegetative, early, inter, late).
    The problem is that I lack replicates.

    -For the vegetative state I have no replicates at all.
    Contrast*:
    EZL1.veg (factor) vs ICL7.veg (control)

    -For the early state I have partially replicates (only 2 for the control)
    Contrast*:
    EZL1.T0 (factor) vs ICL7.T0 & ICL7.T5 (control)

    -For the inter state I have partially replicates (only 2 for the factor)
    Contrast*:
    EZL1.T10 & EZL1.T20 vs ICL7.T20

    -For the late state I have partially replicates (only 2 for the factor)
    Contrast*:
    EZL1.T35 & EZL1.T50 vs ICL7.T35

    My questions are :
    1) How Deseq calculates the dispersion in the case of the vegetative state where I have no replicates at all*? (given that I apply a DESEq analysis with design=~group and then I ask results(ddsEZL1_ICL7, contrast=c("group", "controls0", "factors0")))

    2) How Deseq calculates the dispersion in all other cases where I have partially replicates*?


    Thank you in advance for any ideas*!

    Here is the code*:

    # 0:Vegetative, 1:Early, 2:Inter, 3:Late
    > times=factor(c(1, rep(2,2),rep(3,2),0,rep(1,2), rep(2,1),rep(3,1),0), levels=c(0,1,2,3))
    >
    > strain = factor(c(rep("factors", 6), rep("controls",5)), levels=c("controls","factors"))
    >
    > colData = data.frame( strain= strain, times=times)
    > rownames(colData)=colnames(duo)
    > colData$group <- factor(paste0(colData$strain,colData$times))
    > colData
    strain times group
    EZL1.T0 factors 1 factors1
    EZL1.T10 factors 2 factors2
    EZL1.T20 factors 2 factors2
    EZL1.T35 factors 3 factors3
    EZL1.T50 factors 3 factors3
    EZL1.veg factors 0 factors0
    ICL7.T0 controls 1 controls1
    ICL7.T5 controls 1 controls1
    ICL7.T20 controls 2 controls2
    ICL7.T35 controls 3 controls3
    ICL7.veg controls 0 controls0
    >
    > ddsEZL1_ICL7 <-DESeqDataSetFromMatrix(countData= as.matrix(duo), colData= colData, design=~group)
    > ddsEZL1_ICL7 <-DESeq(ddsEZL1_ICL7)
    estimating size factors
    estimating dispersions
    gene-wise dispersion estimates
    mean-dispersion relationship
    final dispersion estimates
    fitting model and testing
    >
    >
    > resVEG<-results(ddsEZL1_ICL7, contrast=c("group", "controls0", "factors0"))
    > resVEG<- resVEG[!is.na(resVEG$padj) & resVEG$padj <= 0.05,]
    > list_of_differentially_expressed_genes_for_EZL1vsICL7_VEG<-rownames(resVEG)
    > length(list_of_differentially_expressed_genes_for_EZL1vsICL7_VEG)
    [1] 15

    >
    > resEARLY <- results(ddsEZL1_ICL7, contrast=c("group", "controls1", "factors1"))
    > resEARLY <- resEARLY[!is.na(resEARLY$padj) & resEARLY$padj <= 0.05,]
    > list_of_differentially_expressed_genes_for_EZL1vsICL7_EARLY<-rownames(resEARLY)
    > length(list_of_differentially_expressed_genes_for_EZL1vsICL7_EARLY)
    [1] 412
    >
    > resINTER <- results(ddsEZL1_ICL7, contrast=c("group", "controls2", "factors2"))
    > resINTER <- resINTER[!is.na(resINTER$padj) & resINTER$padj <= 0.05,]
    > list_of_differentially_expressed_genes_for_EZL1vsICL7_INTER<-rownames(resINTER)
    > length(list_of_differentially_expressed_genes_for_EZL1vsICL7_INTER)
    [1] 1087
    >
    > resLATE <- results(ddsEZL1_ICL7, contrast= c("group", "controls3", "factors3"))
    > resLATE <- resLATE[!is.na(resLATE$padj) & resLATE$padj <= 0.05,]
    > list_of_differentially_expressed_genes_for_EZL1vsICL7_LATE<-rownames(resLATE)
    > length(list_of_differentially_expressed_genes_for_EZL1vsICL7_LATE)
    [1] 2921

    Note*:
    When I do a simple DESeq just for the vegetative state DESeq ignores the condition labels (factor and control) and estimates the variance by treating all samples (EZL1.veg and ICL7.veg) as if they were replicates of the same
    condition, in order to estimate the dispersions. But this method gives me different results at the end (much more conservative which is normal):

    f <- list("~/EZL1_results/EZL1-veg_mRNA_GCCAAT.TOPHAT.pt_51.pe.sorted.bam_fragments.csv",
    "~/ICL7_results/ICL7-veg_RNA_ATCACG.TOPHAT.pt_51.pe.sorted.bam_fragments.csv")
    ….
    ….

    > str(df_points_EARLY)
    'data.frame': 41533 obs. of 2 variables:
    $ EZL1.veg: int 104 105 44 0 58 84 111 22 0 105 ...
    $ ICL7.veg: int 280 235 106 0 85 446 185 64 1 168 ...

    > strain = factor(c(rep("EZL1",1), rep("ICL7",1)), levels=c("EZL1","ICL7"))
    > colData = data.frame( strain= strain )
    > rownames(colData)=colnames(df_points_EARLY)
    > dds<-DESeqDataSetFromMatrix(countData= as.matrix(df_points_EARLY), colData= colData, design=~strain)
    > dds <-DESeq(dds)
    > resultsNames(dds)
    [1] "Intercept" "strainEZL1" "strainICL7"
    > resEZL1_ICL7 <- results(dds, contrast=list("strainICL7","strainEZL1"))
    > resEZL1_ICL7<- resEZL1_ICL7[!is.na(resEZL1_ICL7$padj) & resEZL1_ICL7$padj <= 0.05,]
    > list_of_differentially_expressed_genes_for_EZL1_ICL7_VFG<-rownames(resEZL1_ICL7)
    > length(list_of_differentially_expressed_genes_for_EZL1_ICL7_VEG)
    [1] 0

Latest Articles

Collapse

  • seqadmin
    Recent Advances in Sequencing Analysis Tools
    by seqadmin


    The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
    05-06-2024, 07:48 AM
  • seqadmin
    Essential Discoveries and Tools in Epitranscriptomics
    by seqadmin




    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
    04-22-2024, 07:01 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Yesterday, 06:57 AM
0 responses
11 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-06-2024, 07:17 AM
0 responses
16 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-02-2024, 08:06 AM
0 responses
19 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-30-2024, 12:17 PM
0 responses
24 views
0 likes
Last Post seqadmin  
Working...
X