Seqanswers Leaderboard Ad

**dpryan** · 02-02-2014, 11:23 AM

There are instructions for making heatmaps in the DESeq(2) vignette (and likely the edgeR vignette as well).

In addition, there are the heatmap and heatmap.2 functions in R (the latter is in the gplot package).

**Zapages** · 02-02-2014, 01:07 PM

Thank you, dpryan. Its my first time trying to create a heat map via R for edgeR and DeSeq. Is there any guide that I should follow?

The CummeRbund guide was very helpful when I was following the tuxedo workflow.

I did find this guide for using heat map.2: http://davetang.org/muse/2010/12/06/...eatmap-with-r/

Is there anything specific I should do with my output counts file or edgeR and DeSeq output file from Iplant Collaborative to make it ready for creating a heat map via R. Is there normalization step or something like that?

My original combined tab limited values have something like this

Gene ID | Organ A | Organ A | Organ B | Organ B|
GeneA | 1 | 2 | 2520 | 2414 |

I did see that there was command in EdgeR manual but it was not helpful for a new timer like myself. As it does not really show an output of a heat map.

http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

Sorry for so many amateur questions.

**Zapages** · 02-03-2014, 04:31 PM

Hi guys,

I have been able to figure out how to create a heat maps for the 100 genes or something like that. But I am still struggling on how to create your own heat maps based on the genes of interest.

Some one could kindly help me on this. I would really appreciate it.

So far this has been for DESeq.

This is the source that I have been following: http://cartwrightlab.wikispaces.com/DESeq

EDIT: I just figured it out.

But I have question, did I do it properly? Should I have normalized the data before creating the heat map?

Code:

> rawCounts <- read.delim("file.txt")
> conditions <-as.factor(c("Organ1","Organ1","Organ2","Organ2"))
> countDataSet <- newCountDataSet(rawCounts,conditions)
> countDataSet <- estimateSizeFactors(countDataSet)
> countDataSet <- estimateDispersions(countDataSet)
> colors <- colorRampPalette(c("white","darkblue"))(100)
> heatmap( vsd[select,], col = colors, scale = "none")
> cdsBlind <- estimateDispersions( cds)
> countDataSet -> cds
> cdsBlind <- estimateDispersions( cds)
> vsd <- getVarianceStabilizedData( cdsBlind )
> select <- order(res$pval)[1:100] #for 1st hundred genes
> heatmap( vsd[select,], col = colors, scale = "none")
> Genes_of_Interest <-c("gene1","gene2","gene3") #for genes of interest
> heatmap.2( vsd[Genes_of_Interest,], col=rev(heat.colors(25)),trace="none",scale="none",cexRow=0.7,cexCol=0.7)

To anyone who is new to this, please be aware that you should install gsplot library before creating a heat map.

**dpryan** · 02-04-2014, 01:48 AM

That's likely fine. Lior Pachter had a recent blog post regarding distance metrics in the creation of heatmaps and I addressed a bit about clustering in a recent answer on biostars.

The most important part is that you did a variance stabilising transform, which is good. I'm not sure how informative a heatmap using 3 genes would be, but I'll leave that to you.

**Zapages** · 02-07-2014, 06:48 AM

Thank you for the great help. I really appreciate it. We have other in situ analysis for these genes of interest. The other in situ techniques postulate that these genes should be highly expressed and have certain expression.

Do you think if I do normalization in Deseq and then create a heatmap will that still be good?

Or creating a bar graph with error will be good approach to visualizing the gene expression variance between the two organs of interest.

Thank you.

**dpryan** · 02-07-2014, 07:43 AM

For just a few datapoints you might either just create a table or directly make a graph (dot or box plots would probably suffice). The biggest problem with heatmaps is that it's annoying to try to tell exact levels from the coloring. For large number of genes you wouldn't actually do that, so this isn't a problem, but for just a few genes of a priori biological interest...

**Zapages** · 02-09-2014, 06:12 AM

Thank you dpryan. I was wondering what should the table include?

Should it be based on the normalized values, Binomial Test, or specific statistical test output from EdgeR/DeSeq. What's the preferred statistical test?

Also would the table from gene_exp.diff will be fine from CummeRbund?

As for box plots or histograms in CummeRbund its easy to create them. How should I go about creating them in R? I mean what should I base the box plot or histograms normalized values or specific statistical test values?

Sorry for so many questions. Many Thanks for your great help and advice.

EDIT: Could I use graphpad Prism or Excel to create the bar graphs?

**dpryan** · 02-10-2014, 01:55 AM

Depends on what you want to show. Normalized values might be the simplest, but you could also just put up the log2 fold-change and p-value and be done with it (or you could do both in the same table, since it's small enough).

I can't say I recall the format of gene_exp.diff, so I couldn't say, though I think it gives confidence intervals, which would actually be quite nice.

Histograms use the hist() functions and I think box plots use the boxplot() function. CummeRbund is actually an R package, it doesn't do any of the graphing itself. It very likely uses the ggplot2 package, which makes quite nice graphs (I use this myself). I should note that ggplot2 is more complicated to use, but if you're making a graph for publication then it might be worth it.

Having said that, it'd be simpler to do with graphpad prism if you're unfamiliar with R. I would avoid Excel, it's not a very powerful program and its graphs are atrocious.

**Zapages** · 02-14-2014, 11:20 AM

Thank you dpryan for all the great advice. I really appreciate it.

So something like this should be fine for both DESeq and EdgeR tables:

Gene ID|baseMean|Base Mean Sample 1|Base Mean Sample 2| Normalized value|log2 fold-change | p value | q value |

the gene_exp.diff file has the following information:

Gene ID | locus | sample_1 Name | sample_2 Name| status | value_1 |value_2 log2(fold_change) |test_stat| p_value |q_value | significant (Yes/No)

Yes its true, we are hoping to publish this data.

As for creating bar graphs. Should I just use the normalized values for the genes in Graphpad? I am learning R, any advice on creating these graphs with error bars. Although Cummerbund is fairly easy to use to make these graphs for the tuxedo pipeline. I don't think graph pad will allow me to create bar graphs with error bars unless I do a statistical test. Any statistical test that you would recommend for Graphpad or through DESeq/EdgeR?

Is the binomial test fine for DESeq? Can that be then be used for creating a bar/histogram?

Sorry for so many questions. Thank you again for the great help.

**super0925** · 02-21-2014, 09:53 AM

How Zapages How do you compare your DE result from DEseq/edgeR/cuffdiff? By which statistical variable? I am a rookie in RNA-seq. Thank you!

**sindrle** · 02-21-2014, 12:54 PM

Are you going untargeted? Or have you decided an a priori set of genes?

This is a ok I guess:

edgeR: (You can use the "togTags" command to get these)

Gene ID, logCPM, logFC, test-stat, p-value, (FDR if untargeted)

Cuffdiff:

Gene ID, value_1, value_2 log2(fold_change) |test_stat| p_value, (FDR if untargeted)

Cuffdiff also gives confidence intervals, standard deviation and dispersion estimates. Might be nice if you only look a some genes. Might be overkill if >1000 genes.

**Zapages** · 02-23-2014, 01:28 PM

Originally posted by super0925 View Post

How Zapages How do you compare your DE result from DEseq/edgeR/cuffdiff? By which statistical variable? I am a rookie in RNA-seq. Thank you!

I am currently debating, which statistical tests will be best to use. I am thinking of doing a negative binomial test as I am comparing the expression presence between two organs. Cuffdiff already does a statistical test via its algorithm.

On top of this, I am thinking of doing a normalization of the counts to see how the data appears in normalized data sets. I will be using the Benjamini-Hochberg false discovery rate of 0.01 for EdgeR based on a recent paper . I am going to apply the same principle on DeSEQ as well.

Overall goal is to illustrate that these three genes based on three independent RNA-Seq Analysis tools yield the same results of these certain genes are being highly expressed in Organ A versus Organ B. I am working on other in situ models and analysis to indicate that these genes share a common expression.

Also I created the counts using HTSEQ.

Originally posted by sindrle View Post

Are you going untargeted? Or have you decided an a priori set of genes?

This is a ok I guess:

edgeR: (You can use the "togTags" command to get these)

Gene ID, logCPM, logFC, test-stat, p-value, (FDR if untargeted)

Cuffdiff:

Gene ID, value_1, value_2 log2(fold_change) |test_stat| p_value, (FDR if untargeted)

Cuffdiff also gives confidence intervals, standard deviation and dispersion estimates. Might be nice if you only look a some genes. Might be overkill if >1000 genes.

Thank you, I really appreciate it here. I am actually looking at about 3 to 5 genes. There a few duplicates of certain genes. Is following something like this good idea for recalling the toptags. I think I have been using the iplant's edgeR and it has not provided me with tags information. So I am going to try it on my laptop (16 GB of ram), that should be fine?

Source for EdgeR: http://rnajournal.cshlp.org/content/....DC1/edgeR.pdf

Thank you again for all the great help.

Where exactly the confidence intervals, SD, Dispersion estimates in the Cuffdiff output files?

**sindrle** · 02-23-2014, 04:02 PM

Hi!
Depends on the version you use, but check out the _tracking - files

**super0925** · 02-24-2014, 02:10 AM

Thank you！ I will try your method to compare the performance of EdgeR/DeSeq/Tuxedo

Topics	Statistics	Last Post
New Model Aims to Explain Polygenic Diseases by Connecting Genomic Mutations and Regulatory Networks by seqadmin Started by seqadmin, Yesterday, 05:31 AM	0 responses 10 views 0 likes	Last Post by seqadmin Yesterday, 05:31 AM
Small Blood Stem Cell Subset Linked to Immune System Aging by seqadmin Started by seqadmin, 10-24-2024, 06:58 AM	0 responses 20 views 0 likes	Last Post by seqadmin 10-24-2024, 06:58 AM
New AI Model Designs Synthetic DNA Switches for Targeted Gene Expression in Specific Cell Types by seqadmin Started by seqadmin, 10-23-2024, 08:43 AM	0 responses 48 views 0 likes	Last Post by seqadmin 10-23-2024, 08:43 AM
Microbes in Urban Spaces Adapt to Disinfectants and Scarce Resources by seqadmin Started by seqadmin, 10-17-2024, 07:29 AM	0 responses 58 views 0 likes	Last Post by seqadmin 10-17-2024, 07:29 AM

Seqanswers Leaderboard Ad

Announcement

HTseq to DeSeq/EdgeR to Heatmap

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News