

Bismark Bisulfite Aligner - Now supporting CpG, CHG and CHH context




  • Bismark Bisulfite Aligner - Now supporting CpG, CHG and CHH context

    I would like to announce that we have just released a new version of our Bisulfite-seq alignment and methylation calling tool Bismark. All associated files are available for free from http://www.bioinformatics.bbsrc.ac.uk/projects/.

    As the most noticeable difference, Bismark now further subdivides non-CpG context into CHG and CHH context, which will be especially interesting for researchers working on plant systems. The former characters 'C/c' in the methylation call have been replaced by:

    CHG-context: X / x (methylated / unmethylated)
    CHH-context: H / h (methylated / unmethylated)
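
As a rough illustration of how these call characters can be read, the Python sketch below tallies a methylation call string by context. The X/x and H/h codes are the ones listed above; the Z/z codes for CpG context follow the Bismark documentation rather than this post, and the function name and example string are hypothetical.

```python
from collections import Counter

# Uppercase = methylated, lowercase = unmethylated.
# X/x and H/h come from the announcement; Z/z (CpG) is taken from the
# Bismark documentation and is an assumption as far as this post goes.
CALL_CODES = {
    "Z": ("CpG", "methylated"),
    "z": ("CpG", "unmethylated"),
    "X": ("CHG", "methylated"),
    "x": ("CHG", "unmethylated"),
    "H": ("CHH", "methylated"),
    "h": ("CHH", "unmethylated"),
}


def tally_calls(call_string):
    """Count calls per (context, state); any other character is ignored."""
    counts = Counter()
    for char in call_string:
        if char in CALL_CODES:
            counts[CALL_CODES[char]] += 1
    return counts


if __name__ == "__main__":
    example = "..Z..x.h.H...z.X"  # made-up call string
    for (context, state), n in sorted(tally_calls(example).items()):
        print(context, state, n)
```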

    In addition, I noticed that due to recent changes in the Bowtie source code, Bismark was producing lots of warnings ('best-first memory chunk exhaustion...'), which was also mentioned here on SEQanswers. As suggested by Ben Langmead, the best way to counteract this problem is to increase the memory size for each Bowtie thread, or to mute Bowtie. Thus, Bismark now understands the additional option '--chunkmbs <int>' to change the memory from 64 (default) to any integer (I found that 256 or even 512 got rid of nearly all warnings). These warnings were especially frequent in --best mode or for paired-end alignments. Bismark now also understands the '--quiet' option to suppress memory chunk exhaustion (and other) warnings.

    Some other minor fixes include:

    - FastA files no longer require the file ending ".fa".

    - Fixed an issue so that Bismark will no longer tolerate chromosomes with
    the same name when reading the genome into memory.

    - Fixed an issue with paired-end alignment reports.

    - The methylation extractor will by default distinguish between cytosines in the three contexts CpG, CHG or CHH. If this is not needed, CHG and CHH context can be merged into 'non-CpG' context by specifying '--merge_non_CpG'.

    - Due to the fundamental changes in v0.2.0 (CHG and CHH context methylation calls) the methylation extractor will now require that the Bismark mapping result file was generated with the same version of Bismark.
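
The idea behind '--merge_non_CpG' can be pictured as collapsing the CHG and CHH tallies into a single 'non-CpG' bucket. The Python sketch below only illustrates that bookkeeping, with a hypothetical function name and made-up counts; it is not how the methylation extractor itself is implemented.

```python
def merge_non_cpg(counts_by_context):
    """Collapse every context other than CpG into one 'non-CpG' entry."""
    merged = {}
    for context, n in counts_by_context.items():
        key = context if context == "CpG" else "non-CpG"
        merged[key] = merged.get(key, 0) + n
    return merged


if __name__ == "__main__":
    # Made-up per-context call counts.
    print(merge_non_cpg({"CpG": 10, "CHG": 3, "CHH": 7}))
```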

    If you have any suggestions or comments I would like to hear from you!

    Best wishes,

    Last edited by fkrueger; 09-08-2010, 12:40 PM. Reason: formatting

  • #2
    I tried using the new version on some reads that I had previously aligned using an older version of bismark and ran into problems.

    I started by making fresh Bowtie indexes, for two reasons. I previously had each chromosome in a different file, and the new version of the prep program can handle MFA files (which is great, thanks). Also, my reference has non-ACGTN characters, which I had previously converted but now no longer need to (again, great).

    I then tried running bismark and have errors of this type for every read that gets aligned

    Chromosomal sequence could not be extracted for FC704H7AAXX:5:3:3219:9818#0 Chr2 1818938
    Use of uninitialized value in substr at /usr/local/bin/bismark line 1381, <$__ANONIO__> line 395162.
    substr outside of string at /usr/local/bin/bismark line 1381, <$__ANONIO__> line 395162.
    Use of uninitialized value in transliteration (tr///) at /usr/local/bin/bismark line 1778, <$__ANONIO__> line 395162.
    Use of uninitialized value in substr at /usr/local/bin/bismark line 1392, <$__ANONIO__> line 395162.
    substr outside of string at /usr/local/bin/bismark line 1392, <$__ANONIO__> line 395162.
    Here is a copy of the commands used

    bismark_genome_preparation --verbose /data/bismark/
    with /data/bismark containing a file soft-linked to my reference fasta file. This ran fine with no errors.

    bismark --chunkmbs 512 -q --phred64-quals -n 0 -l 32 /data/bismark/ -1 cobl-5_1_1.fq,cobl-5_2_1.fq -2 cobl-5_1_2.fq,cobl-5_2_2.fq
    I tried without --chunkmbs (because it's the only thing I changed from my previous runs) but get the same errors. The fastq files are the same ones that previously worked fine.

    Any ideas where I'm going wrong?


    • #3
      This problem was caused by the MFA file. I have hotfixed it now and hope it will work fine!


      • #4
        I can confirm that the hotfix works and the new version is now working great.


        • #5
          What kind of data is good for this software? If I have exon sequencing data from the Illumina pipeline, would it make sense to run it with your software?


          • #6
            Originally posted by foxyg
            What kind of data is good for this software? If I have exon sequencing data from the Illumina pipeline, would it make sense to run it with your software?
            The software is for mapping bisulfite treated sequencing data to examine methylation. Exon sequence data requires a completely different mapping approach. I would look into Tophat and Cufflinks or something similar.


            • #7
              A quick overview of Bismark can be found here.


              • #8
                Bismark v0.2.2 has just been released which fixes a bug in the methylation extractor whereby the positions of some cytosines were offset by a few base pairs (this affected some cytosines from reverse-mapped reads in single-end mapping mode). Sorry for any inconvenience caused.


                • #9
                  Originally posted by fkrueger
                  This problem was caused by the MFA file. I have hotfixed it now and hope it will work fine!
                  I am facing the same problem too. Could you please post here what the problem with the FASTA file was and how you fixed it?

                  Many Thanks in advance.


                  • #10
                    Sorry this post is nearly 3 years old and I seem to have forgotten the exact details... What exactly is the problem you are seeing? And which version of Bismark are you using?

                    One thing that springs to my mind about multi fasta files for alignments is that Bismark expects them with alternating header and sequence lines, such as:


                    If sequences are spanning multiple lines it won't work, such as this:
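
For illustration only (made-up sequences, not the examples originally posted): a record wrapped as `>chr1`, `ACGT`, `TTGA` on three lines would need to become `>chr1` followed by the single line `ACGTTTGA`. A minimal Python sketch of that conversion, assuming a plain-text FASTA input and a hypothetical function name, could look like this:

```python
def unwrap_fasta(lines):
    """Yield alternating header and single-line sequence strings.

    Blank lines are skipped; sequence lines appearing before the first
    header are discarded.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header
                yield "".join(chunks)
            header = line
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header
        yield "".join(chunks)


if __name__ == "__main__":
    wrapped = [">chr1", "ACGT", "TTGA", ">chr2", "GG"]  # made-up records
    for out_line in unwrap_fasta(wrapped):
        print(out_line)
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly and the output written line by line to a new file.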



                    • #11
                      Thanks for replying fkrueger,

                      I am using the latest version of Bismark, i.e. v0.7.10, and I am getting an error message exactly like this:
                      Chromosomal sequence could not be extracted for FC704H7AAXX:5:3:3219:9818#0 Chr2 1818938

                      Currently I am using the multiline fasta file. As per your suggestion I will convert the reference to single line fasta.

                      Thanks again....


                      • #12
                        This is actually not an error but a warning message, and it has nothing to do with the type of file you are using. It simply means that a read aligned to the very end of a chromosome, and Bismark could not extract a further 2 bp from the end of the chromosome simply because there are no further 2 bp. It is normally safe to just ignore these warnings.
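
To see why this happens, consider a simplified Python sketch of the lookup: to classify the last cytosine of a read as CpG, CHG or CHH, the two bases downstream of it are needed, so the extracted window is the read length plus 2 bp. The function below is a hypothetical illustration (forward strand only, 1-based start), not Bismark's actual code.

```python
def extract_with_context(chromosome, start, read_length, extra=2):
    """Return the genomic window for a read plus `extra` downstream bases.

    `start` is 1-based. Returns None when the window would run past the
    end of the chromosome, which is the situation behind the warning.
    """
    begin = start - 1
    end = begin + read_length + extra
    if begin < 0 or end > len(chromosome):
        return None
    return chromosome[begin:end]


if __name__ == "__main__":
    chrom = "ACGTACGTAC"  # made-up 10 bp chromosome
    print(extract_with_context(chrom, 1, 4))  # window fits
    print(extract_with_context(chrom, 7, 4))  # would need bases 11-12
```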



                        • #13
                          I tried to process a sorted bedGraph file that contains only CHH contexts using "genome_methylation_bismark2bedGraph_v5.pl". However, when I processed the file, the other contexts like CHG and CG were also included with methylation information and also it was not consistent with the methylation information of the input file. The bedGraph file and the resulting processed one looked like below.
                          The options I used were --CX and --genome_folder. The version of bismark is 0.7.8.
                          Any ideas where I'm going wrong?

                          The part of the bedGraph file that only contains CHH contexts
                          chr1 3003874 3003874 0 0 19
                          chr1 3003875 3003875 0 0 19
                          chr1 3003884 3003884 0 0 21
                          chr1 3003889 3003889 9.52380952380952 2 19
                          chr1 3003892 3003892 0 0 21
                          chr1 3003893 3003893 0 0 21
                          chr1 3003895 3003895 23.8095238095238 5 16
                          chr1 3003896 3003896 0 0 11
                          chr1 3003903 3003903 7.69230769230769 1 12
                          chr1 3003908 3003908 0 0 12
                          chr1 3003910 3003910 0 0 12
                          chr1 3003911 3003911 0 0 9
                          chr1 3003921 3003921 0 0 22
                          chr1 3003922 3003922 4.54545454545455 1 21
                          chr1 3003923 3003923 0 0 22

                          The part of the processed file by genome_methylation_bismark2bedGraph_v5.pl
                          chr1 3003874 + 0 0 CHH CCT
                          chr1 3003875 + 0 19 CHH CTT
                          chr1 3003881 + 0 0 CHG CAG
                          chr1 3003883 - 0 0 CHG CTG
                          chr1 3003884 - 0 0 CHH CCT
                          chr1 3003885 + 0 21 CG CGG
                          chr1 3003886 - 0 0 CG CGC
                          chr1 3003887 - 0 0 CHG CCG
                          chr1 3003889 - 0 0 CHH CTC
                          chr1 3003892 - 0 0 CHH CTT
                          chr1 3003893 - 0 21 CHH CCT
                          chr1 3003895 - 0 0 CHH CAC
                          chr1 3003896 + 5 16 CHH CCC
                          chr1 3003897 + 0 11 CHG CCG
                          chr1 3003898 + 0 0 CG CGG
                          chr1 3003899 - 0 0 CG CGG
                          chr1 3003900 - 0 0 CHG CCG
                          chr1 3003903 - 0 0 CHH CAT
                          chr1 3003905 + 0 0 CHG CTG
                          chr1 3003907 - 0 0 CHG CAG
                          chr1 3003908 - 0 0 CHH CCA
                          chr1 3003910 - 0 0 CHH CTC
                          chr1 3003911 + 0 12 CHH CCT
                          chr1 3003912 + 0 9 CHG CTG
                          chr1 3003914 - 0 0 CHG CAG
                          chr1 3003918 + 0 0 CHG CAG
                          chr1 3003920 - 0 0 CHG CTG
                          chr1 3003921 - 0 0 CHH CCT
                          chr1 3003922 - 0 22 CHH CCC
                          chr1 3003923 - 1 21 CHH CCC
                          Last edited by momokenken; 05-26-2013, 08:11 AM. Reason: some mistakes


                          • #14
                            Hi momokenken,

                            To me the output you linked looks just fine, but you have to note a couple of things:

                            - The bedGraph output is 0-based, whereas the genome-wide cytosine methylation report (the last format) uses 1-based coordinates (as does Bismark itself). Thus, you need to add 1 to all bedGraph coordinates to get to the cytosine report coordinates.
                            - The methylation extractor offers the options CpG-only or all cytosine contexts, i.e. CG, CHG and CHH combined. There is no CHH context-only format unless you filter it out specifically. Thus the full cytosine output contains Cs in many different contexts.
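
The coordinate shift can be checked on rows posted earlier in this thread. The Python sketch below assumes the column layouts seen there (bedGraph: chromosome, start, end, percentage, methylated count, unmethylated count; cytosine report: chromosome, position, strand, methylated count, unmethylated count, context, trinucleotide); the function names are hypothetical.

```python
def bedgraph_to_report_key(line):
    """Parse a bedGraph row and shift its 0-based start to 1-based."""
    chrom, start, _end, _pct, meth, unmeth = line.split()
    return (chrom, int(start) + 1), (int(meth), int(unmeth))


def report_entry(line):
    """Parse a cytosine report row into (key, (meth, unmeth, context))."""
    chrom, pos, _strand, meth, unmeth, context, _trinuc = line.split()
    return (chrom, int(pos)), (int(meth), int(unmeth), context)


# Rows copied from the bedGraph and report excerpts in post #13.
bedgraph_rows = [
    "chr1 3003874 3003874 0 0 19",
    "chr1 3003895 3003895 23.8095238095238 5 16",
]
report_rows = [
    "chr1 3003875 + 0 19 CHH CTT",
    "chr1 3003896 + 5 16 CHH CCC",
]

report = dict(report_entry(row) for row in report_rows)

if __name__ == "__main__":
    for row in bedgraph_rows:
        key, counts = bedgraph_to_report_key(row)
        meth, unmeth, context = report[key]
        print(key, "counts match:", counts == (meth, unmeth), context)
```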

                            Finally, may I ask you to install the latest version (v0.7.12), which offers quite a few new features for the methylation extraction, bedGraph and cytosine report? In addition to being a LOT quicker than older versions, Bismark now comes with the modules bismark2bedGraph and bedGraph2cytosine that replace any older versions of these scripts. Both of them work either from within the methylation extractor or as stand-alone tools. If you have further questions you can also contact me directly via email.

                            Cheers, Felix


                            • #15
                              You may also note that the methylation report also contains cytosines with no coverage at all. Those will never appear in the bedGraph file.

                              You can filter the bedGraph file that contains all the cytosine methylation averages using bedtools (in combination with the report file) plus the grep command.
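
As an alternative to the bedtools and grep route, the same kind of filtering can be sketched in a few lines of Python. This assumes the seven-column cytosine report layout shown earlier in the thread; the function name is hypothetical.

```python
def covered_in_context(report_lines, context="CHH"):
    """Yield report rows in the given context that have at least one read."""
    for line in report_lines:
        fields = line.split()
        if len(fields) != 7:
            continue  # skip rows that do not match the assumed layout
        meth, unmeth = int(fields[3]), int(fields[4])
        if fields[5] == context and meth + unmeth > 0:
            yield line


if __name__ == "__main__":
    # Rows copied from the report excerpt in post #13.
    rows = [
        "chr1 3003874 + 0 0 CHH CCT",
        "chr1 3003875 + 0 19 CHH CTT",
        "chr1 3003881 + 0 0 CHG CAG",
        "chr1 3003885 + 0 21 CG CGG",
    ]
    for kept in covered_in_context(rows):
        print(kept)
```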


