  • Analysis of Variance between Biological Samples

    Hi all,

    I have a gene expression data matrix of about 10000 rows that represent genes and 20 columns that represent 20 samples belonging to 2 different groups (10 samples per group).

    I want to show that the variance among the 10 samples of the first group is greater than the variance among the 10 samples of the second group.

    Analysis of Variance (ANOVA) methods may not work here (I am not sure about that) because they give poor p-values when the number of variables (genes) is much larger than the number of observations (samples). Do I need to remove correlated genes, or cluster genes? Is there a better solution for this?

    Thanks in advance.

  • #2
    Originally posted by Fernas
    Hi all,

    I have a gene expression data matrix of about 10000 rows that represent genes and 20 columns that represent 20 samples belonging to 2 different groups (10 samples per group).

    I want to show that the variance among the 10 samples of the first group is greater than the variance among the 10 samples of the second group.
    Hi there,

    Here's my suggestion, quite simple really. First count how many genes have a variance in the 1st group greater than the variance in the 2nd group. Then apply a binomial test of the null hypothesis that the proportion of such genes is 50%. Here is some sample R code:

    Code:
    ## Some test data: 100 genes (rows), 20 samples (columns)
    set.seed(1234)
    ngenes<- 100
    grp1<- 1:10
    grp2<- 11:20
    grp<- cbind(
        matrix(nrow= ngenes, ncol= length(grp1), data= rnorm(ngenes * length(grp1), sd= 1.1)),
        matrix(nrow= ngenes, ncol= length(grp2), data= rnorm(ngenes * length(grp2), sd= 1))
    )
    
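    ## TRUE for genes whose variance in group 1 exceeds the variance in group 2
    grpvar<- apply(grp, 1, function(x) var(x[grp1]) > var(x[grp2]))
    ## table() lists FALSE before TRUE, so the "successes" reported below count the FALSE genes;
    ## with a null probability of 0.5 the two-sided p-value is the same either way
    btest<- binom.test(table(grpvar))
    btest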
    
    	Exact binomial test
    
    data:  table(grpvar)
    number of successes = 39, number of trials = 100, p-value = 0.0352
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval:
     0.2940104 0.4926855
    sample estimates:
    probability of success 
                      0.39
    I would check that there is no particular sample that gives a lot of variation to either group. Do this by repeating the above but leaving out one sample at a time and checking that the output p-values are in the same range as the initial p-value:

    Code:
    jkn<- numeric(ncol(grp))
    for(i in 1:ncol(grp)){
        ## Start from the full groups at every iteration and drop sample i
        ## from whichever group contains it (the other group is unchanged)
        grp1b<- setdiff(grp1, i)
        grp2b<- setdiff(grp2, i)
        grpvar<- apply(grp, 1, function(x) var(x[grp1b]) > var(x[grp2b]))
        xtest<- binom.test(table(grpvar))
        jkn[i]<- xtest$p.value
    }
    ## Dots should align more or less on a straight line
    qqnorm(-log10(c(btest$p.value, jkn)))
    Similarly, I would leave out groups of genes or sample genes to assess whether the initial result is due to a particular set of genes. However, I guess you would expect most of the genes to have the same variance?
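    The gene-level check could be sketched along these lines, assuming random subsets of 80% of the genes drawn 200 times. The helper `subset_pvalue` and the values of `frac` and `nrep` are hypothetical choices for illustration, not part of the original analysis:

```r
## Same simulated data as above: 100 genes (rows), 20 samples (columns)
set.seed(1234)
ngenes <- 100
grp1 <- 1:10
grp2 <- 11:20
grp <- cbind(
    matrix(rnorm(ngenes * length(grp1), sd = 1.1), nrow = ngenes),
    matrix(rnorm(ngenes * length(grp2), sd = 1), nrow = ngenes)
)

## Hypothetical helper: binomial-test p-value computed on a subset of genes
subset_pvalue <- function(mat, rows, g1, g2) {
    higher <- apply(mat[rows, , drop = FALSE], 1,
                    function(x) var(x[g1]) > var(x[g2]))
    binom.test(sum(higher), length(higher))$p.value
}

## Assumed settings: 200 random subsets, each holding 80% of the genes
nrep <- 200
frac <- 0.8
sub_p <- replicate(nrep, {
    rows <- sample(nrow(grp), size = round(frac * nrow(grp)))
    subset_pvalue(grp, rows, grp1, grp2)
})

## A wide spread here would suggest that a subset of genes drives the result
summary(-log10(sub_p))
```

    Because each subset contains fewer genes than the full matrix, the p-values will tend to be somewhat larger than the initial one; what matters is whether they stay in a similar range.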

    Just a thought...

    Dario

    • #3
      Thank you very much indeed dariober for this clear explanation and suggestion.

      I like the idea, and I find it somewhat comparable to a rank sum test. In that test we calculate the variance of each gene in each group, so we have two columns of variances (column 1 contains the variances of all genes in group 1, and column 2 the variances of all genes in group 2). We then apply the rank sum test to check whether both vectors (columns) come from continuous distributions with the same median, against the alternative hypothesis that one differs significantly from the other.
      What do you think? Which of these two methods is more relevant to the question I want to answer?
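      A minimal sketch of the rank sum idea on the simulated data from the previous post, using base R's `wilcox.test`. The paired (signed rank) variant is shown only as an assumption about what might suit gene-matched variances; it is not something proposed above:

```r
## Same simulated data as in the previous post
set.seed(1234)
ngenes <- 100
grp1 <- 1:10
grp2 <- 11:20
grp <- cbind(
    matrix(rnorm(ngenes * length(grp1), sd = 1.1), nrow = ngenes),
    matrix(rnorm(ngenes * length(grp2), sd = 1), nrow = ngenes)
)

## One variance per gene and per group
v1 <- apply(grp[, grp1], 1, var)
v2 <- apply(grp[, grp2], 1, var)

## Rank sum (Mann-Whitney) test: treats the two columns as independent samples
rs <- wilcox.test(v1, v2)

## Signed rank test: pairs the two variances of each gene (an assumed alternative)
sr <- wilcox.test(v1, v2, paired = TRUE)

c(rank_sum = rs$p.value, signed_rank = sr$p.value)
```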

      Regarding your suggestion: do I need to normalize the expression matrix row-wise at the beginning, or will that not change the results?
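      One way to check this empirically, as a sketch on the simulated data from the previous post: centering and scaling a whole row multiplies both group variances of that gene by the same positive factor, so the per-gene comparison should come out the same. This assumes row-wise normalization means centering and scaling each gene across all 20 samples, done here with `scale`:

```r
## Same simulated data as in the previous post
set.seed(1234)
ngenes <- 100
grp1 <- 1:10
grp2 <- 11:20
grp <- cbind(
    matrix(rnorm(ngenes * length(grp1), sd = 1.1), nrow = ngenes),
    matrix(rnorm(ngenes * length(grp2), sd = 1), nrow = ngenes)
)

## Per-gene comparison on the raw matrix
higher_raw <- apply(grp, 1, function(x) var(x[grp1]) > var(x[grp2]))

## Center and scale each row (gene) across all samples;
## scale() works on columns, hence the two transpositions
grp_scaled <- t(scale(t(grp)))
higher_scaled <- apply(grp_scaled, 1, function(x) var(x[grp1]) > var(x[grp2]))

## Do the two versions agree for every gene?
all(higher_raw == higher_scaled)
```

      Normalizations that act on columns (per sample) are a different matter, since they can change the variances of the two groups by different amounts.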

      Thanks again for the informative suggestion.

      • #4
        Hi.
        We have an RNAseq tool we are working on called ALDEx (ANOVA-like Differential Expression) which will infer variance per sample and per group. If you'd like to give it a try it is available as an R package here:

        • #5
          Hi.
          We have an RNAseq tool we are working on called ALDEx (ANOVA-like Differential Expression) which will infer expression and variance per group. If you'd like to give it a try it is available as an R package here:

          https://code.google.com/p/aldex/

          • #6
            Originally posted by Jean
            Hi.
            We have an RNAseq tool we are working on called ALDEx (ANOVA-like Differential Expression) which will infer expression and variance per group. If you'd like to give it a try it is available as an R package here:

            https://code.google.com/p/aldex/
            Thanks Jean for your reply.
            I went quickly through the ALDEx manual. As the purpose is to study the variance between samples within a group (not differentially expressed genes between two groups), I am not sure whether ALDEx's functions can provide such information.
