Header Leaderboard Ad

Collapse

Bismark - A New Tool for Mapping and Analysis of Bisulfite-Seq Data

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Hi Felix,

    I managed to get the script to work by having all the files (input, output) in the same directory as the bismark2Bedgraph script.

    I could not find anything wrong with the file locations/file names that I had used previously but it works now!

    Thanks again for your time and assistance

    Comment


    • Originally posted by fkrueger View Post
      Hi seqfast,

      When you are generating and sequencing PCR amplified regions it is quite common that you see only the top strand (with both the OT and CTOT strands) or the bottom strand (with both OB and CTOB as in your case), depending on which strand you targeted when designing the primers.

      To help getting your head around this it really helps to draw the sequence of an amplicon of interest out on a sheet of paper. You should see that the bisulfite treatment will change one of the two strands so much that the oligos will only amplify one of the two strands, and this PCR product is then usually sequenced from both sides, e.g. the CTOB and OB strands, but this may be different depending on how the library preparation was done.

      So in short: PCR amplicons are not normally directional libraries but you only sequence both versions of either the top or the bottom strand, depending on how the primers were designed. I hope this helps, let me know if I can be of any further assistance. Cheers, Felix

      Hi Felix or any other experts,

      I encountered similar problem which most of the reads from a paired-end library aligned to the bottom strand after running Bismark (with either directional option or with the non-directional option). I would like to know that how we could process the further analysis and how we would calculate the coverage for each site if the reads from bottom strands and from top strands are extremely unbalanced.
      Sorry for the inconvenience caused and hope to get a response from any of you guys.

      Best regards,
      Daisy

      Comment


      • Hi Daisy,

        Were you expecting to see imbalances in your library, i.e. was it some kind of target enrichment or (PCR) amplification library? Depending on what you expect from the library design you should only analyse alignments from the strands you are expecting, so if you only expect OT or OB alignments you should not perform non-directional alignments.

        If your library strategy was designed in a way that you only amplified or pulled down one the two strands prior to preparing the libraries you will indeed only see the methylation of either the top or the bottom strands, but not both. The coverage in that case would simply be the number of reads you see for a region (unless you used UMIs, in which case you could also take uniqueness of fragments into account...).

        Comment


        • Originally posted by fkrueger View Post
          Hi Daisy,

          Were you expecting to see imbalances in your library, i.e. was it some kind of target enrichment or (PCR) amplification library? Depending on what you expect from the library design you should only analyse alignments from the strands you are expecting, so if you only expect OT or OB alignments you should not perform non-directional alignments.

          If your library strategy was designed in a way that you only amplified or pulled down one the two strands prior to preparing the libraries you will indeed only see the methylation of either the top or the bottom strands, but not both. The coverage in that case would simply be the number of reads you see for a region (unless you used UMIs, in which case you could also take uniqueness of fragments into account...).
          Dear Felix,

          Thanks a lot for answering.

          So I think in that case my enrichment sequencing is targeted at one strand, I should only look into that strand. That really solves my problem

          Comment


          • divergent methylation at same CpG dinucleotide

            I am trying to look at differential methylation from low coverage WGBS from 2 samples. My data is not ideal and I need higher coverage/more samples per group but this is what I have now to do "something" with. I did a sliding window analysis of CpG methylation to detect DMRs. I then wanted to see what the original site based methylation rate from Bismark output was. What I am seeing is that for many sites, I think I have the CpG in the context of the forward and reverse strand but they are giving much different methylation values. Being low coverage, I expect that it would not be identical but I am trying to figure out what can be done when the numbers are so wildly different. Is what I am seeing "normal" for low coverage WGBS sequencing (I usually do targeted)? Below is an example of one of my windows and the bismark output of the sites within that window.

            Sample 1
            chr1 224012513 224012513 11 45.45454545 54.54545455
            chr1 224012518 224012518 13 53.84615385 46.15384615
            chr1 224012523 224012523 13 30.76923077 69.23076923
            chr1 224012528 224012528 12 50 50
            chr1 224012533 224012533 11 54.54545455 45.45454545
            chr1 224012539 224012539 10 50 50
            chr1 224012540 224012540 1 100 0
            chr1 224012549 224012549 11 45.45454545 54.54545455
            chr1 224012550 224012550 1 100 0
            chr1 224012554 224012554 11 45.45454545 54.54545455
            chr1 224012555 224012555 1 0 100
            chr1 224012559 224012559 1 0 100
            chr1 224012560 224012560 10 50 50
            chr1 224012561 224012561 1 100 0
            chr1 224012570 224012570 9 55.55555556 44.44444444
            chr1 224012571 224012571 2 100 0
            chr1 224012575 224012575 10 50 50
            chr1 224012576 224012576 3 66.66666667 33.33333333
            chr1 224012641 224012641 8 87.5 12.5
            chr1 224012642 224012642 6 33.33333333 66.66666667
            chr1 224012661 224012661 10 100 0
            chr1 224012662 224012662 9 55.55555556 44.44444444
            chr1 224012675 224012675 12 66.66666667 33.33333333
            chr1 224012676 224012676 11 45.45454545 54.54545455
            chr1 224012686 224012686 8 87.5 12.5
            chr1 224012687 224012687 10 30 70
            chr1 224012691 224012691 7 85.71428571 14.28571429
            chr1 224012692 224012692 10 50 50
            chr1 224012766 224012766 9 77.77777778 22.22222222
            chr1 224012767 224012767 7 85.71428571 14.28571429
            chr1 224012796 224012796 11 100 0
            chr1 224012797 224012797 14 57.14285714 42.85714286
            chr1 224012830 224012830 14 92.85714286 7.142857143
            chr1 224012831 224012831 10 40 60
            chr1 224012850 224012850 19 73.68421053 26.31578947
            chr1 224012851 224012851 9 33.33333333 66.66666667
            chr1 224012865 224012865 19 78.94736842 21.05263158
            chr1 224012866 224012866 5 80 20
            chr1 224012895 224012895 18 61.11111111 38.88888889
            chr1 224012896 224012896 5 60 40
            chr1 224012911 224012911 15 100 0
            chr1 224012912 224012912 7 100 0
            chr1 224012925 224012925 18 44.44444444 55.55555556
            chr1 224012926 224012926 6 66.66666667 33.33333333
            chr1 224013081 224013081 4 75 25
            chr1 224013082 224013082 3 100 0
            chr1 224013130 224013130 12 75 25
            chr1 224013131 224013131 6 66.66666667 33.33333333
            chr1 224013141 224013141 15 100 0
            chr1 224013142 224013142 5 100 0
            chr1 224013161 224013161 16 100 0
            chr1 224013162 224013162 5 60 40
            chr1 224013191 224013191 23 95.65217391 4.347826087
            chr1 224013192 224013192 5 80 20
            chr1 224013216 224013216 21 61.9047619 38.0952381
            chr1 224013217 224013217 4 50 50
            chr1 224013222 224013222 1 100 0
            chr1 224013246 224013246 23 60.86956522 39.13043478
            chr1 224013247 224013247 8 62.5 37.5
            chr1 224013255 224013255 19 57.89473684 42.10526316
            chr1 224013256 224013256 8 75 25
            chr1 224013281 224013281 21 76.19047619 23.80952381
            chr1 224013282 224013282 9 66.66666667 33.33333333
            chr1 224013316 224013316 13 92.30769231 7.692307692
            chr1 224013317 224013317 5 100 0
            chr1 224013331 224013331 10 90 10
            chr1 224013332 224013332 4 100 0
            chr1 224013375 224013375 12 83.33333333 16.66666667
            chr1 224013376 224013376 8 100 0
            chr1 224013405 224013405 14 92.85714286 7.142857143
            chr1 224013406 224013406 9 100 0
            chr1 224013435 224013435 13 92.30769231 7.692307692
            chr1 224013436 224013436 7 100 0

            Sample 2
            chr1 224012513 224012513 11 18.18181818 81.81818182
            chr1 224012518 224012518 12 25 75
            chr1 224012523 224012523 12 0 100
            chr1 224012528 224012528 11 0 100
            chr1 224012533 224012533 12 0 100
            chr1 224012539 224012539 10 10 90
            chr1 224012549 224012549 11 0 100
            chr1 224012554 224012554 11 0 100
            chr1 224012560 224012560 13 7.692307692 92.30769231
            chr1 224012570 224012570 13 7.692307692 92.30769231
            chr1 224012575 224012575 14 7.142857143 92.85714286
            chr1 224012641 224012641 2 50 50
            chr1 224012642 224012642 1 0 100
            chr1 224012661 224012661 1 0 100
            chr1 224012662 224012662 1 0 100
            chr1 224012675 224012675 2 0 100
            chr1 224012676 224012676 1 0 100
            chr1 224012686 224012686 1 0 100
            chr1 224012687 224012687 1 0 100
            chr1 224012691 224012691 1 0 100
            chr1 224012692 224012692 1 100 0
            chr1 224012766 224012766 2 50 50
            chr1 224012796 224012796 2 0 100
            chr1 224012830 224012830 12 0 100
            chr1 224012850 224012850 15 40 60
            chr1 224012865 224012865 14 28.57142857 71.42857143
            chr1 224012895 224012895 12 33.33333333 66.66666667
            chr1 224012911 224012911 5 100 0
            chr1 224012925 224012925 8 37.5 62.5
            chr1 224013161 224013161 2 100 0
            chr1 224013191 224013191 3 66.66666667 33.33333333
            chr1 224013216 224013216 2 0 100
            chr1 224013246 224013246 2 100 0
            chr1 224013281 224013281 2 100 0
            chr1 224013405 224013405 1 100 0
            chr1 224013435 224013435 1 100 0

            Comment


            • divergent CpG dinucleotide methylation

              I am not sure if this is the proper thread to post this on but since I am using bismark output, I thought I would ask here. I have very low coverage WGBS data. Ideally I would have higher coverage and/or more samples per group but right now, I only have 2 samples I am comparing with low coverage. I was trying to see if I could identify any DMR using a sliding window analysis. After I did that calculation, I wanted to see what the individual sites in those regions looked like in terms of per CpG methylation. What I think I am seeing in many cases is the same CpG site but the forward and reverse strand so it appears 1 base apart. However, they are showing pretty different methylation values at what should be the same CpG site. It also shows really different coverage values too. I assume this is some kind of mapping bias but I am not sure. Is what I am seeing "normal" for this type of data or should it be more consistent at the sample CpG site? I am having trouble trusting my window analysis when I see the methylation values jump around so wildly. Attached is a text file with all the sites in 1 region of my 2 samples.
              Attached Files

              Comment


              • Just generally, in order to get reads for both the forward and reverse strands you need to have a fairly good sequencing depth and have a library with a good complexity. While your sample 1 has several CpG dinucleotides covered on both strands, Sample 2 has hardly any covered on more than one strand.

                By just glancing at over the values in your example the values don't seem to be differing all that wildly to be honest. If you for example take the example:

                Code:
                224012575	224012575	10	50	50
                224012576	224012576	3	66.66666667	33.33333333
                Here the 2 positions don't agree perfectly but with 3 reads in total there simply are only a limited number of percentages you can achieve, namely 0, 33, 66 or 100%. There will always be a problem to match numbers perfectly if you have a shallow read depth, but this is why you need to average the values over larger distances to even those limitations out.

                Comment


                • Hi Felix,

                  I am trying to make sens of my M-bias plot.
                  I have 150nt PE sequencing and the M-bias plot shows a gradual decrease in CHH calls for R2 starting from ~75nt.

                  R2 are trimmed and per base quality plot looks fine. What concerns me is that I could not find the same CHH calls when I looked into the methylation call strings (XM:Z). It works for R1; I am able to reproduce the exact same Mbias plot, but for R2, the methylation call strings do not seem to reflect the Mbias plot.

                  Moreover, when processing R2 as SE, the Mbias plot shows a fairly even distribution of CHH calls across the length of the reads (no decrease).

                  Could you help me understand this gradual decrease in PE mode? why does it disappear in SE mode?

                  Thanks for the help!
                  Cheers,

                  Amira

                  Comment


                  • Hi Amira,

                    The reason for this is the overlap detection, and -removal, in paired-end files. What this simply does is to detect when R1 and R2 start overlapping, and as soon as this is the case the methylation extraction from R2 is halted. This mechanisms avoids introducing a coverage bias for parts of the reads that are overlapping. If you would specify the option --include_overlap you will not see this dramatic decrease anymore, but this is not to say that you should be using it. Does this help? Cheers, Felix

                    Comment


                    • Originally posted by fkrueger View Post
                      Hi Amira,

                      The reason for this is the overlap detection, and -removal, in paired-end files. What this simply does is to detect when R1 and R2 start overlapping, and as soon as this is the case the methylation extraction from R2 is halted. This mechanisms avoids introducing a coverage bias for parts of the reads that are overlapping. If you would specify the option --include_overlap you will not see this dramatic decrease anymore, but this is not to say that you should be using it. Does this help? Cheers, Felix
                      Hi Felix,

                      I wasn't aware that the overlapping region of R2 was removed for the M-bias plot too, I thought that R1 and R2 were processed independently.

                      I understand my plots now thanks!

                      Cheers,
                      Amira

                      Comment

                      Working...
                      X