Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • calculate TMM normalization

    Hello all,

    I would like to calculate TMM normalization for identifying DE genes. I know that edgeR does all the normalization when you run the code but I would like to calculate TMM for one or two genes to manually to better understand the statistic behind it. I found this formula to do so ( below) but I am not sure what the component, factor means?

    (raw.counts)*10^7/(librarysize*factors)

    librarysize=total number of mapped reads.

    I would really appreciate any suggestions and help.

    Thanks !

  • #2
    TMM wouldn't have any meaning (i.e., you couldn't calculate it) for one or two genes. The whole purpose of these sorts of normalization methods is to use larger datasets.

    Comment


    • #3
      Thank you for the quick response and clarification. Would you mind providing some insight into how the calculation is done and what that component factor stands for in the equation.


      Thanks!

      Comment


      • #4
        The algorithm for generating the normalization factor is described in the edgeR paper.

        Comment


        • #5
          Yes I got the function you use to calculate TMM from that paper. It uses R function, calcNormFactors().

          However I was wondering if I could manually calculate it using the formula I mentioned earlier since I do not know how to use R much.

          Thanks!

          Comment


          • #6
            You can just look at the R code, which you'll need to learn anyway if you want to work in bioinformatics.

            calcNormFactors:
            Code:
            function (object, method = c("TMM", "RLE", "upperquartile", "none"), 
                refColumn = NULL, logratioTrim = 0.3, sumTrim = 0.05, doWeighting = TRUE, 
                Acutoff = -1e+10, p = 0.75) 
            {
                if (is(object, "DGEList")) {
                    x <- as.matrix(object$counts)
                    lib.size <- object$samples$lib.size
                }
                else {
                    x <- as.matrix(object)
                    lib.size <- colSums(x)
                }
                method <- match.arg(method)
                allzero <- rowSums(x > 0) == 0
                if (any(allzero)) 
                    x <- x[!allzero, , drop = FALSE]
                if (nrow(x) == 0 || ncol(x) == 1) 
                    method = "none"
                f <- switch(method, TMM = {
                    f75 <- .calcFactorQuantile(data = x, lib.size = lib.size, 
                        p = 0.75)
                    if (is.null(refColumn)) refColumn <- which.min(abs(f75 - 
                        mean(f75)))
                    if (length(refColumn) == 0 | refColumn < 1 | refColumn > 
                        ncol(x)) refColumn <- 1
                    f <- rep(NA, ncol(x))
                    for (i in 1:ncol(x)) f[i] <- .calcFactorWeighted(obs = x[, 
                        i], ref = x[, refColumn], libsize.obs = lib.size[i], 
                        libsize.ref = lib.size[refColumn], logratioTrim = logratioTrim, 
                        sumTrim = sumTrim, doWeighting = doWeighting, Acutoff = Acutoff)
                    f
                }, RLE = .calcFactorRLE(x)/lib.size, upperquartile = .calcFactorQuantile(x, 
                    lib.size, p = p), none = rep(1, ncol(x)))
                f <- f/exp(mean(log(f)))
                if (is(object, "DGEList")) {
                    object$samples$norm.factors <- f
                    return(object)
                }
                else {
                    return(f)
                }
            }
            .calcFactorQuantile
            Code:
            function (data, lib.size, p = 0.75) 
            {
                y <- t(t(data)/lib.size)
                f <- apply(y, 2, function(x) quantile(x, p = p))
            }
            .calcFactorWeighted
            Code:
            function (obs, ref, libsize.obs = NULL, libsize.ref = NULL, logratioTrim = 0.3, 
                sumTrim = 0.05, doWeighting = TRUE, Acutoff = -1e+10) 
            {
                if (all(obs == ref)) 
                    return(1)
                obs <- as.numeric(obs)
                ref <- as.numeric(ref)
                if (is.null(libsize.obs)) 
                    nO <- sum(obs)
                else nO <- libsize.obs
                if (is.null(libsize.ref)) 
                    nR <- sum(ref)
                else nR <- libsize.ref
                logR <- log2((obs/nO)/(ref/nR))
                absE <- (log2(obs/nO) + log2(ref/nR))/2
                v <- (nO - obs)/nO/obs + (nR - ref)/nR/ref
                fin <- is.finite(logR) & is.finite(absE) & (absE > Acutoff)
                logR <- logR[fin]
                absE <- absE[fin]
                v <- v[fin]
                n <- sum(fin)
                loL <- floor(n * logratioTrim) + 1
                hiL <- n + 1 - loL
                loS <- floor(n * sumTrim) + 1
                hiS <- n + 1 - loS
                keep <- (rank(logR) >= loL & rank(logR) <= hiL) & (rank(absE) >= 
                    loS & rank(absE) <= hiS)
                if (doWeighting) 
                    2^(sum(logR[keep]/v[keep], na.rm = TRUE)/sum(1/v[keep], 
                        na.rm = TRUE))
                else 2^(mean(logR[keep], na.rm = TRUE))
            }

            Comment


            • #7
              Thank you very much. Appreciate it!

              Comment


              • #8
                is it possible to calculate TMM and logFC value through tab delimited file.

                geneid file1 file2
                a 32 33
                b 68 410
                c 78 140
                d 31 340
                e 213 470
                f 211 29

                total number of reads
                file1 7538239
                file2 3828901

                total number of reads align
                file1 6583829
                file2 3029328

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Current Approaches to Protein Sequencing
                  by seqadmin


                  Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                  04-04-2024, 04:25 PM
                • seqadmin
                  Strategies for Sequencing Challenging Samples
                  by seqadmin


                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                  03-22-2024, 06:39 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, 04-11-2024, 12:08 PM
                0 responses
                12 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-10-2024, 10:19 PM
                0 responses
                17 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-10-2024, 09:21 AM
                0 responses
                14 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 04-04-2024, 09:00 AM
                0 responses
                43 views
                0 likes
                Last Post seqadmin  
                Working...
                X