Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • A fold change heatmap for RNA seq analysis using CuffDiff and cummerbund

    Hi friends,

    I am an undergrad and was asked by a post doc to help analyze an Illumina Hi-seq single end RNA seq dataset. Needless to say I am a complete novice to this type of analysis and a little bit overwhelmed. We are working with S. cerevisiae; one wildtype condition and five test conditions (two biological replicates each, so 12 total FASTQ files). We are not interested in novel gene/splice variants and simply want to explore gene expression profiles of known genes. So I first aligned the reads to an annotated reference genome with TopHat then used CuffDiff using a reference transcriptome along with the BAM files from TopHat.

    My cuffdiff input looked like:

    $ cuffdiff -p 8 -o cuffdiff_out -b genome.fa -u genes.gtf -L WT,ies2,arp8,ino80,ies6,arp5 ./BY4741_1_accepted_hits.bam,./BY4741_2_accepted_hits.bam
    ./ies2_1_accepted_hits.bam,./ies2_2_accepted_hits.bam
    ./arp8_1_accepted_hits.bam,./arp8_2_accepted_hits.bam
    ./ino80_1_accepted_hits.bam,./ino80_2_accepted_hits.bam
    ./ies6_1_accepted_hits.bam,./ies6_2_accepted_hits.bam
    ./arp5_1_accepted_hits.bam,./arp5_2_accepted_hits.bam

    My question is: How do I generate a heatmap showing the fold change in gene expression of each of the 5 mutants compared to WT using cummeRbund?

    With cummerBund I was able to generate a heatmap with
    > csHeatmap(myGenes, cluster="both") where 'myGenes' was a list of the top 50 differentially expressed genes. I have attached the figure.

    However, I would rather have a heatmap where there is no WT (BY4741) column, and instead just shows fold change of the five mutant samples relative to WT. Is there a way to do this? If so, is there a way to specify the color scheme so that down-regulated genes are clearly one color (like blue) and up-regulated genes are clearly another color (like red), and genes that have no change are just black?

    Again, I'm new to this kind of analysis, and completely new to R, but cummeRbund seems like the easiest/most powerful way to do this. Advice?
    Attached Files
    Last edited by devking; 11-08-2012, 10:17 AM.

  • #2
    Hi devking,

    sorry I can't help you with your CuffDiff issue. Have you used BY4741 as a reference genome for your alignments? I am working with the same strain and I would like to use it as a reference genome to align my evolved strains to this one. Do you know where can I find an assembled genome of BY4741 to use as a reference?

    Comment


    • #3
      Dear Devkin,

      The heatmap you posted was done in R using the ggplot2 package.

      As you are already using the tuxedo pipeline, you could take a look at the cummeRbund

      which should enable you to get easy data integration.

      Best

      Comment


      • #4
        Did you try DESeq ? There is an interesting clustering section in the DESeq vignette on bioconductor.

        It's pretty simple to use :

        1. align
        2. Extract read count with htseq-count
        3. Use DESeq to perform DE analysis and plot some results

        Check : http://bioconductor.org/packages/rel...tml/DESeq.html

        Comment


        • #5
          Originally posted by jordiet View Post
          Hi devking,

          sorry I can't help you with your CuffDiff issue. Have you used BY4741 as a reference genome for your alignments? I am working with the same strain and I would like to use it as a reference genome to align my evolved strains to this one. Do you know where can I find an assembled genome of BY4741 to use as a reference?
          Hi Jordiet

          For my analysis I was only interested in quantifying expression according to a known and annotated genome so I used the Ensembl release 69 EF4 genome.

          I figured since the transcripts in yeast are so well-annotated already it seemed like more trouble than it was worth to assemble a new transcriptome using cufflinks/cuffmerge.

          I might be misunderstanding your question, but it seems like you're more interested in doing a full analysis as outlined in the nat protocol paper "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks" in which you'll have your own assembled transcriptome annotation to quantify.

          check the paper: http://www.nature.com/nprot/journal/....2012.016.html

          Best,
          Devin

          Comment


          • #6
            Originally posted by tir_al View Post
            As you are already using the tuxedo pipeline, you could take a look at the cummeRbund
            Hi Tir,

            Thanks for your response! I have been using cummeRbund and it has been very easy to use so far to make heatmaps of gene expression. But I didn't immediately see a way to look at *fold-change* expression between two conditions. (I've never used something like R before so things that should be simple usually aren't for me ). However, on the advice of one of the cummeRbund authors, Loyal, I was able to make log fold-change heat maps by extracting raw FPKM values from fpkmMatrix(), do the log2FC transformation in R, then use heatmap.2() from gplots to create something like the attached file.
            Attached Files

            Comment


            • #7
              Originally posted by NicoBxl View Post
              Did you try DESeq ? There is an interesting clustering section in the DESeq vignette on bioconductor.
              Hi Nico,

              Thanks for the tip. It seems like all of the differential gene expression statistical methods (e.g. cuffdiff, DESeq, edgeR, NOIseq etc) return slightly different sets of significantly differentially expressed genes. E.g. check out:


              So I was actually planning on using DESeq next and comparing the results with my cuffdiff data. I was thinking of just focusing on the genes that tested significant for DE from both methods. Not sure if this is reasonable or worth the time...

              Best,
              Devin

              Comment


              • #8
                Originally posted by devking View Post
                Hi Tir,

                Thanks for your response! I have been using cummeRbund and it has been very easy to use so far to make heatmaps of gene expression. But I didn't immediately see a way to look at *fold-change* expression between two conditions. (I've never used something like R before so things that should be simple usually aren't for me ). However, on the advice of one of the cummeRbund authors, Loyal, I was able to make log fold-change heat maps by extracting raw FPKM values from fpkmMatrix(), do the log2FC transformation in R, then use heatmap.2() from gplots to create something like the attached file.
                Hi Devking,

                I was looking to do the same plot as you did...is it done with DESeq?
                did you use a customized R script?
                Thanks!!!
                Manu

                Comment


                • #9
                  I quantified expression using the 'Tuxedo' protocol (http://www.nature.com/nprot/journal/....2012.016.html) and used a custom R script to generate the heatmap. Using cummeRbund, you can get a matrix of FPKM expression values, and then use heatmap.2() in the R 'gplots' package for the heatmap call. You could also import your expression matrix into Java TreeView which also plots nice heatmaps. Hope this helps!

                  Comment


                  • #10
                    Originally posted by devking View Post
                    I quantified expression using the 'Tuxedo' protocol (http://www.nature.com/nprot/journal/....2012.016.html) and used a custom R script to generate the heatmap. Using cummeRbund, you can get a matrix of FPKM expression values, and then use heatmap.2() in the R 'gplots' package for the heatmap call. You could also import your expression matrix into Java TreeView which also plots nice heatmaps. Hope this helps!
                    Thanks for the answer...I actually have followed the same path, but I have just been able to produce a differential expression heatmap.

                    I'll see what I can do for up and down regulation...unfortunately java is too hardcore informatics for me!!!
                    Thanks
                    Manu

                    Comment


                    • #11
                      Maybe this can help you get started:


                      Code:
                      library(cummeRbund); library(gplots)
                      cuff <- readCufflinks("cuffdiff_out",rebuild=T)
                      db <- fpkmMatrix(genes(cuff))
                      sigGenes <- getSig(cuff,'BY4741','ino80',alpha=0.05,level='genes')
                      db <- db[sigGenes,]	
                      
                      
                      WT <- "BY4741" # Set the column name of the WT sample
                      mut <- c("ino80","arp5","ies6") # Set column names of test conditions you want in the analysis
                      
                      logFC <- function(db,mutants,WT,logBase=2,pseudo=1) {
                      		if (length(WT) !=1 ) {
                      			stop('WT must refer to a single gene/column')
                      			}
                      		if (is.numeric(logBase)==FALSE) {
                      			stop('logBase must be a numeric value')
                      			}
                      		
                      		db <- (db[,mutants]+pseudo)/(db[,WT]+pseudo)
                      		db <- log(db,logBase)
                      		
                      }
                      
                      db <- logFC(db,mutants=mut,WT=WT,logBase=2,pseudo=1) # This does the log transformation
                      db <- as.matrix(db)
                      
                      heatmap.2(
                      		db,
                      		Rowv=TRUE,
                      		Colv=FALSE,
                      		dendrogram="row",
                      		trace="none",
                      		labRow="",
                      		density.info=c("none"),
                      		main="Log(Fold-Change) \nExpression Profiles"
                      	)

                      Comment


                      • #12
                        Thanks Devking!
                        This really helps me out!!!

                        Comment


                        • #13
                          Hi
                          Can you please give me your commands to make csHeatmap using cummeRbund ?
                          I tried making the using cummeRbund and succeeded, however, can not add dandogram and change color which you have done it.
                          May you please write back to me in little detail because I am a beginer.
                          Thank you in advance.
                          Jp.


                          Originally posted by devking View Post
                          Hi Tir,

                          Thanks for your response! I have been using cummeRbund and it has been very easy to use so far to make heatmaps of gene expression. But I didn't immediately see a way to look at *fold-change* expression between two conditions. (I've never used something like R before so things that should be simple usually aren't for me ). However, on the advice of one of the cummeRbund authors, Loyal, I was able to make log fold-change heat maps by extracting raw FPKM values from fpkmMatrix(), do the log2FC transformation in R, then use heatmap.2() from gplots to create something like the attached file.

                          Comment


                          • #14
                            Hi
                            Would you post your commands to make dendrogram in csHeatmap ?
                            I just can not add dendrogram using csHeatmap.
                            You used heatmap.2(.... can read cuffdiff out and make heatmap with dendrogram ?
                            Thank you

                            Originally posted by devking View Post
                            Hi Tir,

                            Thanks for your response! I have been using cummeRbund and it has been very easy to use so far to make heatmaps of gene expression. But I didn't immediately see a way to look at *fold-change* expression between two conditions. (I've never used something like R before so things that should be simple usually aren't for me ). However, on the advice of one of the cummeRbund authors, Loyal, I was able to make log fold-change heat maps by extracting raw FPKM values from fpkmMatrix(), do the log2FC transformation in R, then use heatmap.2() from gplots to create something like the attached file.

                            Comment


                            • #15
                              maybe the program cluster can help u. you just need to change the format of the cuffdiff result.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Best Practices for Single-Cell Sequencing Analysis
                                by seqadmin



                                While isolating and preparing single cells for sequencing was historically the bottleneck, recent technological advancements have shifted the challenge to data analysis. This highlights the rapidly evolving nature of single-cell sequencing. The inherent complexity of single-cell analysis has intensified with the surge in data volume and the incorporation of diverse and more complex datasets. This article explores the challenges in analysis, examines common pitfalls, offers...
                                06-06-2024, 07:15 AM
                              • seqadmin
                                Latest Developments in Precision Medicine
                                by seqadmin



                                Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

                                Somatic Genomics
                                “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
                                05-24-2024, 01:16 PM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Today, 07:24 AM
                              0 responses
                              9 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 08:58 AM
                              0 responses
                              11 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 06-12-2024, 02:20 PM
                              0 responses
                              16 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 06-07-2024, 06:58 AM
                              0 responses
                              184 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X