**fkrueger** · 08-14-2018, 03:03 AM

Hi Hedi,

q1- Could I say that the number of CpG sites in my RRBS library equals the number of "Total methylated C's in CpG context" plus the number of "Total C to T conversions in CpG context" (around 19 million)? If not, how can I find the total number of CpG sites in the RRBS library?

No, I'm afraid you can’t say that. The numbers reported are the overall numbers of methylation calls performed for the entire run, and have nothing to do with the number of genomic positions covered. If you want to find out how many Cs were covered in your experiment, you can generate a coverage file in which each line corresponds to one covered C position. The number of lines in the file (zcat file.cov.gz | wc -l) is then the number of positions covered in your experiment.
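For anyone who prefers to do the line count from a script, here is a minimal Python sketch of the same idea as the zcat/wc command. It assumes a gzipped Bismark coverage file with one line per covered position; the function name and the file name in the usage note are hypothetical.

```python
import gzip


def count_covered_positions(cov_path):
    """Count the non-empty lines of a gzipped coverage file.

    Assumption: each line describes exactly one covered cytosine position,
    as in a Bismark coverage (.cov.gz) file.
    """
    with gzip.open(cov_path, "rt") as handle:
        return sum(1 for line in handle if line.strip())
```

Calling count_covered_positions("file.cov.gz") should give the same number as the shell one-liner, apart from blank lines, which this sketch skips.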

q2- I downloaded the pig CGI annotation and counted all CpG sites, but the total was only around 2 million, which sounds very low to me. How can I find the actual number of CpG sites in the reference genome?

You could use bam2nuc (part of Bismark) to find out the number of Cs, or CpGs, in the genome. Here is the output for the Sscrofa11.1 build (genome-wide).

Code:

A       717891230
AA      237125812
AC      124343360
AG      171421615
AT      185000140
C       517402066
CA      178358877
CC      136906913
CG      30619972
CT      171516061
G       517706165
GA      147162051
GC      108922386
GG      136983938
GT      124637555
T       719048243
TA      155244114
TC      147229152
TG      178680414
TT      237894187

CGIs are only a small, albeit CG-rich, fraction of the genome, so 2M doesn’t sound too bad.
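To put the two numbers side by side, here is a small arithmetic sketch. It takes the CG dinucleotide count from the bam2nuc output above and the rough "around 2 million" figure from the question; the 2 million is approximate, so the result is only a ballpark.

```python
# CG dinucleotide count taken from the bam2nuc output for Sscrofa11.1
cg_dinucleotides = 30_619_972

# Approximate number of CpGs counted inside annotated CGIs (from the question)
cgi_cpgs = 2_000_000

# Share of all genomic CpGs that fall inside the CGI annotation
fraction = cgi_cpgs / cg_dinucleotides
print(f"CpGs inside CGIs: {fraction:.1%} of all CG dinucleotides")
```

On these inputs, the CGI annotation accounts for roughly 6-7% of the CG dinucleotides in the genome.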

q3- Is there a way to determine the number of CpG sites per chromosome and compare it with the number of CpG sites on each chromosome of the reference genome?

I would suggest you use SeqMonk for this kind of work. Keep in mind, though, that RRBS is only expected to cover ~1-2% of the genome at very specific positions, so the number of CpGs covered per chromosome is almost certainly not something you should be interested in.
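If a per-chromosome tally is still wanted outside SeqMonk, it could be sketched in Python roughly as follows. This is not a SeqMonk feature or a Bismark command; it assumes a gzipped, tab-separated coverage file whose first column is the chromosome name, and the function name is hypothetical.

```python
import gzip
from collections import Counter


def covered_positions_per_chromosome(cov_path):
    """Tally covered positions per chromosome.

    Assumption: tab-separated lines with the chromosome name in column 1,
    one line per covered position.
    """
    counts = Counter()
    with gzip.open(cov_path, "rt") as handle:
        for line in handle:
            chrom = line.rstrip("\n").split("\t")[0]
            if chrom:  # skip blank lines
                counts[chrom] += 1
    return counts
```

The returned Counter maps each chromosome name to its number of covered positions, which could then be set against per-chromosome CpG counts from the reference, bearing in mind the caveat about RRBS coverage above.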

**Hedi86** · 08-15-2018, 02:27 AM

Thank you for your advice and help. In methylKit you can also get coverage using the following command, but I'm wondering whether it is CpG coverage or read coverage? They use both definitions in their tutorial (https://www.bioconductor.org/package...ics_on_samples). Is it different from your suggested way of calculating CpG coverage?

getCoverageStats(my.methRaw[[1]],plot = F,both.strands = FALSE)
read coverage statistics per base
summary:
      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
     10.00     12.00     15.00     28.25     20.00 131376.00

Thanks again.

CpG sites in Bismark