I'm interested in visualizing some ChIP-seq peaks for specific loci. I've got an image that plots the read depth at each position throughout an interval, and I'm fairly satisfied with it. But is this the right idea when plotting peak data? As I understand it, there's a forward-strand peak and a reverse-strand peak. Should I worry about trying to merge the two peaks when visualizing, or would an image like mine be an acceptable way to visualize peaks?
-
Coverage isn't really meaningful for ChIP-seq since you're counting reads, not bases. It would be clearer to count each read as a hit once at its 5'-most position, strand-specifically. Then you'll see the strand peaks nice and clear.
If it's too bumpy, use kernel smoothing - which is basically what you're doing when you show coverage instead of read counts, except you're using a rectangular kernel that's centered at an arbitrary position in the fragment.
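A minimal sketch of the counting described above, assuming reads are already available as `(start, end, strand)` tuples in 0-based, half-open coordinates. The tuple layout and the name `five_prime_counts` are illustrative assumptions, not part of any particular tool.

```python
from collections import Counter


def five_prime_counts(reads):
    """Count reads by the position of their 5'-most base, per strand.

    `reads` is an iterable of (start, end, strand) tuples in 0-based,
    half-open coordinates (an assumed, BED-like layout).
    """
    counts = {"+": Counter(), "-": Counter()}
    for start, end, strand in reads:
        # A forward read begins at its leftmost base; a reverse read begins
        # at its rightmost base, which is end - 1 in half-open coordinates.
        five_prime = start if strand == "+" else end - 1
        counts[strand][five_prime] += 1
    return counts


# Two forward reads of different lengths and two reverse reads.
reads = [(100, 150, "+"), (100, 136, "+"), (260, 310, "-"), (274, 310, "-")]
counts = five_prime_counts(reads)
```

Each read adds exactly one count regardless of its length, so the 50 nt and 36 nt forward reads both land on position 100.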
-
-
Thank you for the input. Are you suggesting that I make density curves? When I do that, the image changes so that some of the peaks have different proportions than when I examine the reads in something like IGV. I also lose the ability to plot the input data along with the IP data, since the scales are completely different.
If you don't mean density curves, could you tell me what the axes would be? I assume that the x axis would be genome position, but what about the y axis?
-
-
I'm proposing figure 1A here: http://www.biomedcentral.com/1471-2164/14/720/figure/F1
The vertical axis is the number of reads that start at the given position, i.e. the number of reads whose 5'-most base is at that coordinate. I'm not sure if there's a way to do this in IGV, but you can get the 5' ends via the SAMtools API and make a wiggle file, or you could simply use the convert_align tool provided with UniPeak.
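As a rough illustration of the wiggle route, the sketch below turns per-position 5'-end counts for one strand into variableStep wiggle text. It assumes the counts are already in a plain dict keyed by 0-based position; extracting them from a BAM file is not shown, and the function name `to_wiggle` and the track name are hypothetical.

```python
def to_wiggle(chrom, counts, name="five_prime_ends"):
    """Format {0-based position: read count} as variableStep wiggle text."""
    lines = [
        f"track type=wiggle_0 name={name}",
        f"variableStep chrom={chrom}",
    ]
    for pos in sorted(counts):
        # Wiggle positions are 1-based, so shift the 0-based coordinate by one.
        lines.append(f"{pos + 1} {counts[pos]}")
    return "\n".join(lines) + "\n"


wiggle_text = to_wiggle("chr1", {104: 1, 99: 3})
print(wiggle_text, end="")
```

Writing one such track per strand keeps the forward and reverse peaks separate in a genome browser.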
-
-
Originally posted by jwfoley: "Coverage isn't really meaningful for ChIP-seq since you're counting reads, not bases. It would be clearer to count each read as a hit once at its 5'-most position, strand-specifically."
Why is coverage not meaningful for ChIP-seq, why would you only count each read once at its 5' position, and why would you plot strand-specifically (strand specificity isn't really relevant for ChIP-seq like it is for RNA-seq)? I think each read should be extended in the appropriate direction (depending on which strand it aligns to) to its estimated fragment length (usually around 200 bp), and then the coverage at each position plotted. I don't know what the best software is for doing all this; I use my own scripts.
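The read-extension approach proposed in this post could be sketched as follows. This is not the poster's own script; the `(start, end, strand)` tuple layout (0-based, half-open) and the function name are assumptions made for illustration.

```python
def extended_coverage(reads, fragment_length, region_start, region_end):
    """Per-base coverage after extending each read to the fragment length.

    Forward reads are extended rightward from their start; reverse reads
    are extended leftward from their end. Coordinates are 0-based, half-open.
    """
    coverage = [0] * (region_end - region_start)
    for start, end, strand in reads:
        if strand == "+":
            frag_start, frag_end = start, start + fragment_length
        else:
            frag_start, frag_end = end - fragment_length, end
        # Clip the fragment to the plotted region before counting bases.
        for pos in range(max(frag_start, region_start), min(frag_end, region_end)):
            coverage[pos - region_start] += 1
    return coverage


# One forward and one reverse read from opposite ends of the same fragment.
reads = [(100, 150, "+"), (250, 300, "-")]
coverage = extended_coverage(reads, fragment_length=200, region_start=100, region_end=300)
```

With a 200 bp fragment length, both reads extend over the same 100 to 300 interval, so the two strands reinforce each other instead of forming separate peaks.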
-
-
Quote: "Why is coverage not meaningful for ChIP-seq, why would you only count each read once at its 5' position"
As I said, it's because you're counting reads, not bases. A 100 nt read is not twice as much evidence for DNA-protein interaction as a 50 nt read. The read length is arbitrary, and may vary if you're doing quality trimming.
Quote: "and why would you plot strand specifically (strand specificity isn't really relevant for ChIP-seq like it is for RNA-seq)?"
If you have already corrected for the strand shift, then by all means, do sum the strands into one profile - that's indeed the most meaningful view. If not, you'll get your strand-specific peaks shifted in the 5' direction from each other, so when you combine the two strands without shifting you'll get an unnecessarily low peak or even a bimodal one. Even the shift between the peaks has nothing to do with read length; it's determined by insert size. If you happen to have done paired-end sequencing, you can disregard my previous advice and use the known center of each fragment instead of the 5' end of the read (to be perfectly precise, you're counting fragments, not bases or reads); otherwise, there are lots of different ways to estimate the average strand shift and apply a uniform correction with very good resolution. This is explained in more detail in the QuEST paper (see figure 1).
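One simple way to estimate a uniform strand shift is to find the offset at which the forward and reverse 5'-end profiles overlap best, then move each strand half that distance toward the other before summing. The sketch below does this on dense per-position count lists of equal length. It is only one of the many possible approaches, not necessarily the method used in QuEST, and the function names are hypothetical.

```python
def estimate_strand_shift(fwd, rev, max_shift):
    """Return the offset d in 0..max_shift maximizing sum_x fwd[x] * rev[x + d]."""
    best_shift, best_score = 0, -1
    for d in range(max_shift + 1):
        score = sum(f * rev[x + d] for x, f in enumerate(fwd) if x + d < len(rev))
        if score > best_score:
            best_shift, best_score = d, score
    return best_shift


def combine_strands(fwd, rev, shift):
    """Move forward counts right and reverse counts left by shift // 2, then sum.

    Counts shifted outside the profile are dropped. For an odd shift the
    two strands end up one base apart because of the integer division.
    """
    half = shift // 2
    combined = [0] * len(fwd)
    for x, f in enumerate(fwd):
        if x + half < len(combined):
            combined[x + half] += f
    for x, r in enumerate(rev):
        if x - half >= 0:
            combined[x - half] += r
    return combined


# Toy profiles: a forward peak at position 10 and a reverse peak at position 50.
fwd = [0] * 60
rev = [0] * 60
fwd[10] = 5
rev[50] = 5
shift = estimate_strand_shift(fwd, rev, max_shift=59)
combined = combine_strands(fwd, rev, shift)
```

In the toy data the two strand peaks sit 40 positions apart, and after correction they merge into a single peak midway between them rather than a low or bimodal profile.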
It's worth noting that a coverage plot is basically a lazy shortcut to get a roughly similar, but worse, result. You're doing kernel smoothing, except that (1) the kernel function is a rectangle instead of something more mathematically efficient like a bell curve or parabola; (2) the kernel bandwidth is determined by the read length instead of some more meaningful value related to fragment size variation; and (3) the kernels are centered at the middle of reads instead of the middle of fragments (so forward and reverse won't line up correctly unless insert length = read length, which is generally avoided in library construction so that you don't sequence into the adapter). It might be okay for informally browsing your data, but don't use this lazy shortcut to make a figure for publication.
Last edited by jwfoley; 10-22-2014, 02:29 PM.
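To make the kernel comparison concrete, here is a small sketch that smooths a per-position count profile with a Gaussian kernel whose bandwidth is chosen freely rather than dictated by read length. The function names and the three-bandwidth truncation radius are illustrative assumptions.

```python
import math


def gaussian_kernel(bandwidth, radius=None):
    """Discrete Gaussian weights, normalized to sum to 1."""
    if radius is None:
        radius = int(3 * bandwidth)  # truncate at about three bandwidths
    weights = [math.exp(-0.5 * (i / bandwidth) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(counts, kernel):
    """Spread each position's count over its neighbors using the kernel weights."""
    radius = len(kernel) // 2
    smoothed = [0.0] * len(counts)
    for x, c in enumerate(counts):
        if c == 0:
            continue
        for k, w in enumerate(kernel):
            pos = x + k - radius
            if 0 <= pos < len(smoothed):
                smoothed[pos] += c * w
    return smoothed


# Ten reads (or fragment centers) piled up at position 50.
counts = [0] * 101
counts[50] = 10
smoothed = smooth(counts, gaussian_kernel(bandwidth=5))
```

Because the kernel weights sum to 1, the area under the smoothed curve still equals the number of reads away from the profile edges, so the vertical axis stays in units of reads and is not rescaled the way a normalized density curve is. Passing `[1 / w] * w` as the kernel gives the rectangular smoothing that a coverage plot amounts to.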
-
-
Thank you, jwfoley. I've thought about what you've said and it makes a good amount of sense. I'm using ggplot2 in R to make these visualizations. It may not be the best approach, but I'm most familiar with it so it seemed like a good place to start. I think there will be some issues with adjusting the height of the input data if I visualize it with a density kernel as well, but I think I have a good idea about how to adjust the y axis. Thanks again!