
  • CpG sites in Bismark

    Hello,

    I'm trying to calculate the percentage of covered CpG sites in my RRBS library and compare it with the total number of CpG sites in the reference genome. I got the splitting report from Bismark (see below).

    q1- Can I say that the number of CpG sites in my RRBS library equals Total methylated C's in CpG context + Total C to T conversions in CpG context (around 19 million)? If not, how can I find the total number of CpG sites in the RRBS library?

    q2- I downloaded the pig CGI annotation and counted all CpG sites, but the total was only around 2 million, which sounds very low to me. How can I find the actual number of CpG sites in the reference genome?

    q3- Is there a way to determine the number of covered CpG sites per chromosome and compare it with the number of CpG sites on each chromosome of the reference genome?


    Final Cytosine Methylation Report
    =================================
    Total number of C's analysed: 141645338

    Total methylated C's in CpG context: 7904886
    Total methylated C's in CHG context: 50683
    Total methylated C's in CHH context: 107717

    Total C to T conversions in CpG context: 12298571
    Total C to T conversions in CHG context: 35912924
    Total C to T conversions in CHH context: 85370557

  • #2
    Hi Hedi,

    q1- Can I say that the number of CpG sites in my RRBS library equals Total methylated C's in CpG context + Total C to T conversions in CpG context (around 19 million)? If not, how can I find the total number of CpG sites in the RRBS library?
    No, I'm afraid you can’t say that. The numbers reported are the overall numbers of methylation calls made across the entire run; they say nothing about the number of genomic positions covered. To find out how many Cs were covered in your experiment, you can generate a coverage file, in which each line corresponds to one covered C position. The number of lines in that file (zcat file.cov.gz | wc -l) is then the number of positions covered in your experiment.
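As a minimal sketch of the line-count approach above, the following Python counts covered positions overall and per chromosome. It assumes the usual tab-separated Bismark coverage layout (chromosome, start, end, methylation percentage, count methylated, count unmethylated); the function names are hypothetical and not part of Bismark.

```python
import gzip
from collections import Counter


def count_covered_positions(lines):
    """Count covered positions per chromosome from coverage-file lines.

    Assumes tab-separated columns: chromosome, start, end,
    methylation percentage, count methylated, count unmethylated.
    """
    per_chrom = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chrom = line.split("\t")[0]
        per_chrom[chrom] += 1  # one line == one covered position
    return per_chrom


def count_from_file(path):
    """Open a gzipped coverage file and count positions per chromosome."""
    with gzip.open(path, "rt") as handle:
        return count_covered_positions(handle)


if __name__ == "__main__":
    # Tiny in-memory example (made-up values) instead of a real file.
    demo = [
        "chr1\t100\t100\t50\t5\t5",
        "chr1\t101\t101\t0\t0\t12",
        "chr2\t55\t55\t100\t20\t0",
    ]
    counts = count_covered_positions(demo)
    print(sum(counts.values()), dict(counts))
```

Summing the values returned by `count_from_file("file.cov.gz")` should give the same number as the `wc -l` one-liner, provided the file contains no blank lines.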

    q2- I downloaded the pig CGI annotation and counted all CpG sites, but the total was only around 2 million, which sounds very low to me. How can I find the actual number of CpG sites in the reference genome?
    You could use bam2nuc (part of Bismark) to find out the number of Cs, or CpGs, in the genome. Here is the output for the Sscrofa11.1 build (genome-wide).

    Code:
    A       717891230
    AA      237125812
    AC      124343360
    AG      171421615
    AT      185000140
    C       517402066
    CA      178358877
    CC      136906913
    CG      30619972
    CT      171516061
    G       517706165
    GA      147162051
    GC      108922386
    GG      136983938
    GT      124637555
    T       719048243
    TA      155244114
    TC      147229152
    TG      178680414
    TT      237894187
    The CG row gives about 30.6 million CG dinucleotides genome-wide. CGIs are only a small, albeit CG-rich, fraction of the genome, so 2M doesn’t sound too bad.
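To make concrete what a CG count means, here is a minimal pure-Python sketch that counts CG dinucleotides per sequence in a FASTA file. It is not bam2nuc and is not how Bismark produced the table above; the function names are hypothetical, and it counts only the strand as written in the FASTA.

```python
from collections import Counter


def count_cg_per_sequence(lines):
    """Count 'CG' dinucleotides per FASTA record (case-insensitive).

    Carries the last base of each line forward so that a CG split
    across a line break is still counted exactly once.
    """
    counts = Counter()
    name = None
    prev = ""  # last base of the previous sequence line
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Record ID is the header text up to the first whitespace.
            name = (line[1:].split() or [""])[0]
            counts[name] += 0  # make sure records without CG still appear
            prev = ""
            continue
        if name is None:
            continue  # ignore anything before the first header
        seq = (prev + line).upper()
        counts[name] += seq.count("CG")
        prev = seq[-1]
    return counts


def count_cg_in_fasta(path):
    """Count CG dinucleotides per record in an uncompressed FASTA file."""
    with open(path) as handle:
        return count_cg_per_sequence(handle)


if __name__ == "__main__":
    # Toy sequences; the CG spanning "ACGTC" / "GGCG" is counted once.
    demo = [">chr1 demo", "ACGTC", "GGCG", ">chr2", "TTTT"]
    print(dict(count_cg_per_sequence(demo)))
```

The per-record totals are per-chromosome reference counts of the kind asked about in q3; their sum is a genome-wide CG dinucleotide count.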


    q3- Is there a way to determine the number of covered CpG sites per chromosome and compare it with the number of CpG sites on each chromosome of the reference genome?
    I would suggest you use SeqMonk for this kind of work. Keep in mind, though, that RRBS is only expected to cover ~1-2% of the genome, at very specific positions, so the number of CpGs covered per chromosome is almost certainly not something you should be interested in.
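If a per-chromosome comparison is wanted anyway, the arithmetic is a simple ratio. The sketch below assumes two mappings are already available (hypothetical names): covered cytosine positions per chromosome, and CG dinucleotides per chromosome in the reference. It further assumes the covered positions are strand-specific, so each CG dinucleotide can contribute up to two positions (one cytosine per strand); pass `positions_per_cpg=1` if the two strands were merged beforehand.

```python
def percent_cpg_covered(covered, reference_cg, positions_per_cpg=2):
    """Return {chromosome: percent of CpG cytosine positions covered}.

    covered: covered positions per chromosome (e.g. coverage-file lines).
    reference_cg: CG dinucleotide count per chromosome in the reference.
    positions_per_cpg: 2 for strand-specific positions, 1 if merged.
    """
    result = {}
    for chrom, n_cg in reference_cg.items():
        total = n_cg * positions_per_cpg
        if total == 0:
            continue  # no CpGs on this sequence, nothing to report
        result[chrom] = 100.0 * covered.get(chrom, 0) / total
    return result


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    covered = {"chr1": 30, "chr2": 5}
    reference_cg = {"chr1": 1000, "chr2": 500, "chrM": 0}
    print(percent_cpg_covered(covered, reference_cg))
```

Because RRBS enriches for specific restriction fragments, low percentages here are expected and do not by themselves indicate a problem with the library.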



    • #3
      Thank you for your advice and help. In methylKit, the following command also reports coverage, but I'm wondering whether it is CpG coverage or read coverage; the tutorial uses both terms (https://www.bioconductor.org/package...ics_on_samples). Does it differ from the CpG coverage calculation you suggested?

      getCoverageStats(my.methRaw[[1]],plot = F,both.strands = FALSE)
      read coverage statistics per base
      summary:
           Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
          10.00     12.00     15.00     28.25     20.00 131376.00

      thanks again
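To illustrate the distinction being asked about: the number of covered positions is a single count (how many lines a coverage file has), whereas read coverage per base is a distribution (how many reads sit on each of those positions). The minimal Python sketch below computes both from the same lines. It assumes the six-column Bismark coverage layout (chromosome, start, end, methylation percentage, count methylated, count unmethylated); the function name and the `min_depth` parameter are hypothetical, the latter added to mimic a minimum-coverage filter. A Min. of 10.00 in the summary above would be consistent with such a filter having been applied when the data were read in.

```python
import statistics


def depth_summary(lines, min_depth=1):
    """Summarise read depth per covered position.

    Depth at a position = count methylated + count unmethylated
    (columns 5 and 6 of the assumed coverage layout).
    """
    depths = []
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        depth = int(fields[4]) + int(fields[5])
        if depth >= min_depth:
            depths.append(depth)
    if not depths:
        return {"positions": 0}
    return {
        "positions": len(depths),  # number of covered positions kept
        "min": min(depths),
        "median": statistics.median(depths),
        "mean": statistics.mean(depths),
        "max": max(depths),
    }


if __name__ == "__main__":
    # Made-up lines with depths 10, 12, 20 and 3.
    demo = [
        "chr1\t1\t1\t50\t5\t5",
        "chr1\t2\t2\t0\t0\t12",
        "chr2\t9\t9\t100\t20\t0",
        "chr2\t10\t10\t33.3\t1\t2",
    ]
    print(depth_summary(demo))                # all covered positions
    print(depth_summary(demo, min_depth=10))  # only positions with >= 10 reads
```

With a depth filter in place, the "positions" value will generally be smaller than the raw line count of the coverage file, so the two approaches need not give the same number of covered sites.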
