  • Values to use for hierarchical clustering of RNA-seq data with outliers

    Hello,

    I am new to this, and have been unable to find questions/advice related to my situation, so I hope someone can provide some insight.

    I have RNA-seq data that I have processed with the following simplified pipeline:

    fastq --> bowtie2 (mapped to reference transcriptome) --> eXpress (outputs count data and FPKM) --> limma (count data from eXpress, weighted voom transformation, which gives normalized log2 counts with associated precision weights) --> DE transcripts (with log2FC, Avg. Expr, P.Value, etc.)

    The data are from a time-course injury experiment with three time points: 0 hr (uninjured), 1 day post-injury, and 2 days post-injury, with 3 replicates per time point. One of the 1-day samples and one of the 2-day samples look like outliers, so I have down-weighted them in limma using the weighted voom transformation. I would really rather not throw away any data, so I kept them.

    I would like to perform hierarchical clustering across the time points. I originally wanted to use the FPKM values from eXpress, but realized that these values are not weighted and not normalized. Because some of the samples might be outliers, I thought this might cause problems in the clustering, so I would like to use normalized, weighted values if possible.

    My questions are:
    1. Which values would be best to use:
    a. mean FPKM from eXpress (non-normalized, non-weighted)
    b. log2FC from limma (normalized, weighted)
    c. Average expression value from limma (normalized, weighted)

    2. If log2FC is suggested, how should I go about clustering, given that I believe I would only have values for the 1-day and 2-day time points?
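On question 2: fold changes relative to 0 hr still give every transcript a three-point profile, because the 0 hr baseline becomes a column of zeros. A minimal base-R sketch, assuming a log2-expression matrix (called `logexpr` here, a hypothetical name; in practice something like the `E` matrix from voom) and a hypothetical sample factor `time`. The values are simulated, not real data, and the group means are unweighted, unlike the log2FC that limma estimates using the voom precision weights.

```r
# Simulated log2 expression matrix: 6 transcripts x 9 samples (toy values)
set.seed(1)
time <- factor(rep(c("0h", "1d", "2d"), each = 3))
logexpr <- matrix(rnorm(6 * 9, mean = 5), nrow = 6,
                  dimnames = list(paste0("tx", 1:6), paste0("s", 1:9)))

# Mean log2 expression per time point (transcripts x time points)
group_means <- sapply(levels(time), function(tp)
  rowMeans(logexpr[, time == tp, drop = FALSE]))

# log2 fold change relative to 0 hr: subtracting the 0 hr column from every
# column leaves the 0 hr profile as all zeros
lfc <- group_means - group_means[, "0h"]

# Cluster transcripts on their fold-change profiles (Euclidean distance)
hc <- hclust(dist(lfc), method = "average")
clusters <- cutree(hc, k = 2)
print(round(lfc, 2))
```

With Euclidean distance the all-zero 0 hr column adds nothing, so clustering on the 1-day and 2-day fold changes alone gives the same tree. Correlation-based distances are a poor fit here, since the Pearson correlation of two-point profiles is always +1 or -1.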

    Thank you,

    Chris

  • #2
    I would use the normalized log2 counts (log2-CPM) from voom(), which come with the associated precision weights.
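A minimal sketch of hierarchical clustering on voom-style log2-CPM values in base R. It assumes a matrix `E` shaped like voom's output (transcripts in rows, samples in columns); the values below are simulated stand-ins, not real data. Samples are clustered on the log2 values directly, which is one way to check whether the suspected outlier replicates separate from their groups. Transcripts are clustered after z-scoring each row, so the distance reflects the shape of the profile rather than overall abundance. Note that `hclust()` itself does not use the voom precision weights.

```r
# Simulated stand-in for voom's log2-CPM matrix: 50 transcripts x 9 samples
set.seed(42)
time <- factor(rep(c("0h", "1d", "2d"), each = 3))
E <- matrix(rnorm(50 * 9, mean = 6), nrow = 50,
            dimnames = list(paste0("tx", 1:50),
                            paste0(time, "_", rep(1:3, times = 3))))

# Cluster samples (columns) on Euclidean distance between log2 profiles
hc_samples <- hclust(dist(t(E)), method = "average")

# z-score each transcript across samples; Euclidean distance between
# z-scored rows is a monotone function of 1 - Pearson correlation
Z <- t(scale(t(E)))
hc_genes <- hclust(dist(Z), method = "average")

# Cut the transcript tree into a fixed number of clusters (4 is arbitrary)
gene_clusters <- cutree(hc_genes, k = 4)
print(table(gene_clusters))
```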



    • #3
      Obtaining log2 counts from the EList object created with voom()

      Thank you for your quick reply. I will use the log2 counts from voom.

      voom() returns an EList object. I have been having trouble converting the EList object, v, to a data frame. Would you happen to know how this should be done? Or is there another way I can obtain the log2 counts from the list?

      Here is my code, from near the beginning up to the voom transformation:
      Code:
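      samples$countf = paste(samples$LibraryName, "count", sep=".")
      samples
      library("edgeR")
      # read the per-sample count files, then keep transcripts with CPM > 1 in at least 3 samples
      counts = readDGE(samples$countf)$counts
      cpms = cpm(counts)
      keep = rowSums(cpms >1) >=3
      counts = counts[keep,]
      colnames(counts) = samples$shortname
      head( counts[,order(samples$condition)], 5)
      # TMM normalization factors
      d = DGEList(counts=counts, group=samples$condition)
      d = calcNormFactors(d)
      # one colour per condition (three time points)
      plotMDS(d, labels=samples$shortname, col=c("darkgreen","blue","red")[factor(samples$condition)])
      library("limma")
      # intercept model: the first column is the baseline level, the others are differences from it
      design = model.matrix(~samples$condition)
      colnames(design) = c("CTRL", "T08", "T16")
      design
      # voom with sample-level quality weights, to down-weight outlying samples
      v = voomWithQualityWeights(d, design=design, normalization="none", plot=TRUE)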
      Then I tried to convert the EList object to a data frame, following a suggestion on Stack Overflow:
      Code:
      vhclust = do.call(rbind.data.frame, v)
      And I received the following error:
      Code:
      Error in (function (..., deparse.level = 1)  : 
        numbers of columns of arguments do not match
      Thanks,

      Chris



      • #4
        I'm not entirely sure, since I've never tried. An EList is a specific type of list containing matrices, so you should be able to use any list method (e.g., names(v)) to get an idea of what it contains. I suspect you want v[["E"]], though.
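A minimal sketch of pulling the log2 values out as a data frame, and of why the `rbind` attempt fails. So that it runs without limma, it uses a plain list (toy values, not real data) mimicking three EList components, `E`, `weights` and `design`; with the real object, `v$E` (equivalently `v[["E"]]`) is the transcripts-by-samples matrix of log2-CPM values.

```r
# Plain list standing in for a voom EList (toy values, not real data)
set.seed(7)
E <- matrix(rnorm(4 * 9, mean = 5), nrow = 4,
            dimnames = list(paste0("tx", 1:4), paste0("s", 1:9)))
time <- factor(rep(c("0h", "1d", "2d"), each = 3))
v <- list(
  E = E,                                                   # log2-CPM, transcripts x samples
  weights = matrix(runif(4 * 9), nrow = 4, dimnames = dimnames(E)),  # same shape as E
  design = model.matrix(~time)                             # 9 samples x 3 coefficients
)

print(names(v))

# Extract the log2 values as a data frame, keeping transcript IDs as a column
expr_df <- data.frame(transcript = rownames(v$E), v$E, check.names = FALSE)

# do.call(rbind.data.frame, v) tries to stack every component as rows;
# E has 9 columns but design has 3, so rbind stops with an error
res <- tryCatch(do.call(rbind.data.frame, v),
                error = function(e) conditionMessage(e))
print(res)
```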
