  • Bismark v0.6.beta1: Now supporting gapped Bisulfite-Seq alignments

    We would like to announce that Bismark has received a major overhaul. While the default alignment behaviour of Bismark (using Bowtie 1) has not changed very much (see below), Bismark now also supports gapped alignments using Bowtie 2. In all tests we have performed so far (single or paired-end, directional or non-directional, with various simulated methylation levels or real-life datasets) the Bismark results of Bowtie 1 and Bowtie 2 were very concordant.

    However, as Bowtie 2 is still in beta and subject to change, the current release of Bismark must also be considered a beta version (0.6.beta1).
    Here is an overview of the most prominent changes:

    Running Bismark with Bowtie 1 (default)

    - Default output changed to SAM format

    - The ‘old’ output format is still available via the option ‘--vanilla’

    - Alignment processes were slightly modified to run in --norc/--nofw mode where appropriate, which may result in slightly increased mapping efficiency

    - The former option ‘--directional’ is now the new default mode (‘--non_directional’ will report alignments to all four strands)

    - The default paired-end maximum insert size ('-X') was increased to 500bp (up from 250bp)

    Running Bismark with Bowtie 2 (optional)

    - Alignments are performed in end-to-end mode (similar to Bowtie 1), but do allow gapped alignments with insertions and/or deletions

    - Output format is SAM

    - Since Bowtie 2 requires different indexes for alignments, the Bismark genome preparation now also supports Bowtie 2 bisulfite indexing of a reference genome

    I should like to stress that we don’t think of Bowtie 2 as simply a replacement for Bowtie 1 in Bismark alignments. Rather, as is also stated on the project's page, Bowtie 2 is supposed to work more efficiently for longer reads and allows gapped alignments. For shorter and/or indel-free reads Bowtie 1 may well be faster and more accurate, which is why Bowtie 1 will remain the default alignment mode for Bismark. Indeed, in some of the tests I have run so far Bowtie 1 seemed to have a speed advantage.

    While Bismark seems to work fine in all alignment modes, its methylation_extractor currently works only on the old Bowtie 1 (‘--vanilla’) output and not yet on SAM output files (I am going to work on this in the next couple of days/weeks). This is another reason for calling the current Bismark version 0.6.beta1.

    Compared to Bowtie 1, Bowtie 2 has many ‘new’ parameters, of which the following are currently adjustable:

    -M <int> (reporting the best out of N valid alignments)
    -N <int> (multi-seed mismatches)
    -L <int> (seed length)
    -D <int> (maximum number of seed extension fail tries)
    -R <int> (reseeding of repetitive alignments)
    --score-min <func> (setting minimum alignment score for valid alignments)

    We are still in the process of determining the most sensible set of parameters to generate unique 'best' alignments in a reasonable time (increasing some of the parameters above might make Bismark run dog slow...). I would very much appreciate any comments or input in this regard (and of course also bug reports...).
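
    For anyone experimenting with --score-min, here is a minimal Python sketch of how such a function specification translates into a minimum score for a given read length. It assumes the three-part "type,constant,coefficient" syntax described in the Bowtie 2 manual (types C, L, S and G); the beta releases discussed in this thread may behave differently. The helper name min_alignment_score and the example values are illustrative only and are not Bismark defaults.

```python
import math


def min_alignment_score(func_spec: str, read_length: int) -> float:
    """Evaluate a Bowtie 2 style function spec such as "L,-0.4,-0.6".

    Assumed syntax (from the Bowtie 2 manual): <type>,<constant>,<coefficient>
    with type C (constant), L (linear), S (square root) or G (natural log).
    """
    func_type, constant, coefficient = func_spec.split(",")
    transforms = {
        "C": lambda x: 0.0,   # constant: the coefficient term drops out
        "L": float,           # linear in read length
        "S": math.sqrt,       # square root of read length
        "G": math.log,        # natural log of read length
    }
    transform = transforms[func_type.upper()]
    return float(constant) + float(coefficient) * transform(read_length)


if __name__ == "__main__":
    # Hypothetical example: a linear threshold for 50 bp and 100 bp reads.
    for length in (50, 100):
        print(length, min_alignment_score("L,-0.4,-0.6", length))
```

    A stricter (less negative) threshold leaves fewer valid alignments per read, so it is one of the knobs that trades sensitivity against run time.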

    All files are available from the Bismark project page.


  • #2
    We have just added a parallelization option for Bowtie 2 alignments (-p NTHREADS). This option became feasible because the latest Bowtie 2 release (Version 2.0.0-beta5 - December 15, 2011) added the option --reorder, which reports alignments in the same order as the reads were read in, even if multiple threads are used for alignment.

    This option should help to speed up Bismark alignments. As a word of caution, however, it also requires much higher system resources. E.g. specifying -p 3 will use 4*3 = 12 threads/cores for the alignments plus 1 thread for Bismark itself, and will use > 15GB of memory for a human genome.
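
    The thread arithmetic above can be written down as a small Python sketch, which may help when requesting cores on a cluster. The function and parameter names are made up for illustration; the default of 4 Bowtie 2 instances and 1 Bismark thread is taken from the numbers quoted in this post.

```python
def estimate_threads(p: int, bowtie2_instances: int = 4, bismark_threads: int = 1) -> int:
    """Rough total thread count for a Bismark run started with -p <p>.

    Assumes each Bowtie 2 instance uses p threads and Bismark itself
    needs one more thread, as described in the post.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    return bowtie2_instances * p + bismark_threads


if __name__ == "__main__":
    # -p 3 -> 4*3 = 12 alignment threads plus 1 for Bismark
    print(estimate_threads(3))
```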

    The use of Bowtie 2 for Bismark alignments is still experimental and I would appreciate any input or feedback!

    Bismark v0.6.beta2 is available from the Bismark project page.


    • #3
      Originally posted by fkrueger
      While Bismark seems to work fine in all alignment modes, its methylation_extractor currently works only on the old Bowtie 1 (‘--vanilla’) output and not yet on SAM output files (I am going to work on this in the next couple of days/weeks). This is another reason for calling the current Bismark version 0.6.beta1.
      I aligned my BS reads using v_0.6.beta1 and generated an output SAM file. I am now trying to run the methylation extractor on that file and I am getting an error stating:

      The methylation extractor and Bismark itself need to be of the same version!

      Versions used: methylation extractor: ' v0.6.beta1 '
      Bismark: ' @HD VN:1.0 SO:unsorted '

      I am wondering whether what you wrote in the post quoted above is relevant to my issue. Also, if I upgrade to the most recent version of Bismark, will I have a problem because the alignment was done with another version?


      • #4
        If you used Bismark to generate SAM output you need to run a more recent version of the methylation_extractor, which does now use SAM format as default input file (as of version 0.6.3).
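
        The '@HD VN:1.0 SO:unsorted' in the error message is simply the first header line of the SAM file. As a rough pre-flight check, a few lines of Python can tell whether a file starts with a SAM header before it is handed to an extractor that expects the old format. This is only a sketch and not how the methylation_extractor itself decides; the function name and the non-SAM example line are made up for illustration.

```python
# Header record types defined by the SAM format.
SAM_HEADER_TAGS = {"@HD", "@SQ", "@RG", "@PG", "@CO"}


def looks_like_sam_header(first_line: str) -> bool:
    """Return True if the line starts with a SAM header record type."""
    return first_line[:3] in SAM_HEADER_TAGS


if __name__ == "__main__":
    print(looks_like_sam_header("@HD VN:1.0 SO:unsorted"))   # header line from the error above
    print(looks_like_sam_header("some other first line"))    # hypothetical non-SAM line
```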

        In any case I would recommend downloading the latest version (v0.7.2) and rerunning your alignments since several things have changed since version 0.6.1.



        • #5
          Is it really necessary to rerun my alignments? It took over a week the last time because I have 7 lanes of 100bp hiseq data.
          Last edited by shawpa; 03-19-2012, 05:00 AM.


          • #6
            The alignment and methylation information should still be the same, but there were several changes that might positively affect the outcome of your alignments, such as:

            - Changed Bismark's behavior for "--directional" mode (default) to run only 2 parallel instances of Bowtie 1/2, against the original top (OT) and original bottom (OB) strands, instead of 4 instances against all possible bisulfite strands. This change might result in somewhat faster alignments and higher mapping efficiency. It is still possible to run the 4-strand alignment mode for any combination of input file(s) and choice of aligner by specifying --non_directional.
            - Sequences in FastA format do now receive Phred score qualities of 40 throughout (ASCII 'I') to prevent the SAM to BAM conversion in SAMtools from failing
            - If a genomic sequence could not be extracted it will now also be counted and reported for use with Bowtie 1
            - Changed the XX:Z mismatch field in the SAM output to display mismatching nucleotides of the reference sequence (instead of the read sequence ones)
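
            To illustrate the FastA point in the list above: in the standard Phred+33 encoding, a quality of 40 corresponds to ASCII character 73, which is 'I'. A minimal Python sketch of building such a placeholder quality string (the function names are hypothetical and this is not Bismark's own code):

```python
PHRED_OFFSET = 33  # Phred+33 (Sanger) encoding used in SAM quality strings


def phred_to_ascii(quality: int) -> str:
    """Convert a Phred quality score to its Phred+33 ASCII character."""
    return chr(quality + PHRED_OFFSET)


def constant_quality_string(sequence: str, quality: int = 40) -> str:
    """Build a quality string of the same length as the sequence."""
    return phred_to_ascii(quality) * len(sequence)


if __name__ == "__main__":
    print(phred_to_ascii(40))                  # I
    print(constant_quality_string("ACGTTG"))   # IIIIII
```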

            Since Bismark now runs only 2 alignment instances instead of 4 for directional alignments, you should not only see an increase in mapping efficiency, but it should also be quite a bit quicker than it would be with 4-strand mapping (I did several lanes of 100bp SE HiSeq mapping with ~240M sequences overnight on a single instance). You may want to check the change log on the Bismark page to see if there is anything of relevance for you.


            • #7
              Thanks for your advice. I'll go ahead and download the new one and try again.

