**GenoMax** · 01-09-2014, 06:01 AM

FYI: The documentation link is not working. http://www.cgat.org/~andreas/documen.../cgat/cgat.htm

**sudders** · 01-09-2014, 06:03 AM

Sorry, should now be fixed.

**crazyhottommy** · 01-31-2014, 02:02 PM

Hi, thanks for making this tool.
Could you please share the R script for making the average plot and the heatmap for bam2geneprofile?

http://www.cgat.org/~andreas/documentation/cgat/scripts/bam2geneprofile.html

Tommy

**sudders** · 02-01-2014, 12:04 PM

Hi Tommy,

Thanks for your interest.

Simple average profiles are produced automatically by the script (using matplotlib code rather than R).

You can use R to produce more complex plots (like showing more than one profile on the same plot). A simple example is given on the page for the recipe:

What is the binding profile of NFKB across gene models?

If you wanted to plot more than one line on the plot (for example, the data for the input), you could do something like the following (assuming that you'd already run bam2geneprofile for each sample):

Code:

> profile_chip <- read.csv("nfkb_profile.geneprofile.matrix.tsv.gz", header = T, stringsAsFactors = F, sep = "\t")

> profile_input <- read.csv("input_profile.geneprofile.matrix.tsv.gz", header = T, stringsAsFactors = F, sep="\t")

> plot(profile_chip$bin, profile_chip$counts, cex = 0, xaxt = "none")

> lines(profile_chip$bin, profile_chip$counts, col = "blue")

> lines(profile_input$bin, profile_input$counts, col = "red")

> abline(v = c(1000, 2000), lty = 2)

> mtext("upstream", adj = 0.1)

> mtext("exons", adj = 0.5)

> mtext("downstream", adj = 0.9)
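
One thing to watch with the overlay above: plot() takes its y-axis range from the first profile only, so a taller input profile could be clipped. A minimal sketch of one way around that, using made-up numbers in place of the real bam2geneprofile matrices and a hypothetical helper called shared_ylim (not part of CGAT), might look like this:

```r
# Hypothetical helper (not part of CGAT): common y-range across several
# profiles, so that no line is clipped when they share one set of axes.
# Assumes each data frame has a numeric `counts` column, as in the code above.
shared_ylim <- function(...) {
  range(unlist(lapply(list(...), function(p) p$counts)), na.rm = TRUE)
}

# Synthetic stand-ins for the two profile matrices (values are made up).
profile_chip  <- data.frame(bin = 0:4, counts = c(1, 3, 8, 3, 1))
profile_input <- data.frame(bin = 0:4, counts = c(2, 2, 12, 2, 2))

ylim <- shared_ylim(profile_chip, profile_input)

# Draw empty axes wide enough for both profiles, then add the lines.
plot(profile_chip$bin, profile_chip$counts, type = "n", ylim = ylim,
     xlab = "bin", ylab = "counts")
lines(profile_chip$bin, profile_chip$counts, col = "blue")
lines(profile_input$bin, profile_input$counts, col = "red")
legend("topright", legend = c("ChIP", "input"),
       col = c("blue", "red"), lty = 1)
```

With real data you would replace the two synthetic data frames with the ones read from the .tsv.gz files.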

The data for the heatmaps is produced by bam2peakshape rather than bam2geneprofile.

bam2peakshape doesn't currently produce the plots itself, but the R code to do so isn't difficult. It is also on the recipe linked above, at the bottom of the page. I reproduce it below.

Assuming you've run bam2peakshape with a test and a control BAM and output to the pattern peakshape.%s, you can do the following in R:

Code:

> library( gplots )

> library( RColorBrewer )

> # read the H3K4me3 matrix into R
> me3 <- read.csv( "peakshape.matrix_peak_height.gz", header=TRUE, sep="\t", row.names=1 )

> # convert to matrix
> me3.matrix <- as.matrix( me3 )

> # A proportion of NFkB intervals have no discernable H3K4me3 or H3K4me1 coverage. These are removed before plotting.
> me3.matrix <- me3.matrix[ 4000:14906, ]

> # the remainder are plotted
> cols <- brewer.pal( 9, "Blues" )

> heatmap.2( me3.matrix, col=cols, Rowv=F, Colv=F, labRow="", key=FALSE, labCol="", trace="none", dendrogram="none", breaks=seq(0, 1000, 101) )

> # A second plot can be produced for the H3K4me1 data
> me1 <- read.csv( "peakshape.control_peak_height.gz", header=T, sep="\t", row.names=1 )

> me1.matrix <- as.matrix( me1 )

> me1.matrix <- me1.matrix[ 4000:14906, ]

> cols <- brewer.pal( 9, "Greens" )

> heatmap.2( me1.matrix, col=cols, Rowv=F, Colv=F, labRow="", key=FALSE, labCol="", trace="none", dendrogram="none", breaks=seq(0, 100, 11))
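
The row indices used above are specific to that dataset. If you would rather select rows by their content, one possible approach is to filter on the row totals. The sketch below uses a small made-up matrix, and the threshold min_total is an assumption for illustration, not something taken from the recipe:

```r
# Synthetic stand-in for a bam2peakshape matrix: rows are intervals,
# columns are bins (values are made up for illustration).
m <- matrix(c(0, 0, 0,
              5, 9, 4,
              0, 0, 0,
              2, 7, 3),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("peak", 1:4), c("bin1", "bin2", "bin3")))

# Assumption: "no discernible coverage" means the row total does not
# exceed min_total. The threshold is hypothetical; tune it to your data.
min_total <- 0
keep <- rowSums(m) > min_total

# drop = FALSE keeps the result a matrix even if only one row survives.
covered <- m[keep, , drop = FALSE]
print(rownames(covered))
```

The filtered matrix can then be passed to heatmap.2 in place of the sliced one.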

I hope this helps. Do let me know if I can help further.

Ian
---

**crazyhottommy** · 02-04-2014, 07:34 AM

Hi Ian,

Thank you so much for your kind reply. Meta-gene plots are becoming routine in NGS papers. Your tool is very helpful.

I was using Homer + R for heatmap and HTSeq for meta-gene profile.

Thank you again.

Tommy

---

**crazyhottommy** · 02-04-2014, 08:00 AM

Hi Ian,

I do have another question. For bam2geneprofile, one needs to provide a GTF file.

For bam2peakshape, one needs to provide a BED file (generated by MACS).

If I want to generate an average plot with the ChIP-seq data ( similar to the meta-gene plot, but I am plotting the average on the peak intervals rather than the gene-model), I can still use bam2peakshape, get the matrix (the matrix is used for the heatmap), and calculate the colMeans for each bin, and then plot a line graph.

Or I can use the bam2geneprofile, but I need to convert the MACS bed file to gtf file first.

Am I correct?

Thanks!
Tommy

---

**sudders** · 02-27-2014, 03:02 AM

Hi Tommy,

Sorry for the slow reply, I've been away. In future, if you use the CGAT user group (https://groups.google.com/forum/?fro...gat-user-group), your message will go to more people, so someone will reply to you even if I'm not around.

As to your question:

If I want to generate an average plot with the ChIP-seq data ( similar to the meta-gene plot, but I am plotting the average on the peak intervals rather than the gene-model), I can still use bam2peakshape, get the matrix (the matrix is used for the heatmap), and calculate the colMeans for each bin, and then plot a line graph.

Or I can use the bam2geneprofile, but I need to convert the MACS bed file to gtf file first.

I would recommend using the second of these two methods. The tool bed2gff will do the conversion for you.

Code:

zcat my_bed_file.bed.gz |
  cgat bed2gff -a |
  cgat bam2geneprofile --bamfile=my_bam_file.bam --gtffile=- --method=intervalprofile --reporter=transcript

Add whatever normalisation and output options you want. The - in --gtffile=- tells bam2geneprofile to read the interval file from stdin, and the -a on bed2gff tells it to output GTF.
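
For comparison, the first approach from the question (averaging the bam2peakshape matrix over intervals) can be sketched in R as follows. This is only an illustration on a made-up matrix. In particular, it assumes that the column names are the bin positions, which may not match the real output once read.csv has processed the header:

```r
# Synthetic stand-in for a bam2peakshape matrix: rows are peak intervals,
# columns are bins around the peak (values are made up).
peak_matrix <- matrix(c(1, 4, 1,
                        3, 8, 3,
                        2, 6, 2),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(paste0("peak", 1:3),
                                      c("-100", "0", "100")))

# Average over intervals (rows) to get one value per bin (column).
bin_means <- colMeans(peak_matrix)

# Assumption: the column names encode the bin positions.
bin_pos <- as.numeric(colnames(peak_matrix))

# Line graph of mean coverage per bin.
plot(bin_pos, bin_means, type = "l",
     xlab = "position relative to peak", ylab = "mean coverage")
```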

Let me know if you have any further problems.

Ian
---

**crazyhottommy** · 03-02-2014, 06:20 PM

Hi Ian,

Thank you very much!

Tommy

---

Introducing CGAT: computational genomics analysis toolkit