  • Originally posted by fwessely
    I have some questions regarding the directional (strand-specific) vs non-directional protocol.

    1. As far as I understood, the reads from a non-directional library (based on the Cokus protocol) have some sequence tags left (FW reads: TCTGT and RC reads: TCCAT).
    Do I have to remove these tags in all the reads before using Bismark? Or does Bismark handle them internally, perhaps, by using their information to find the correct alignment on the correct strand? I think BS Seeker does exploit this.

    2. What about reads from a non-directional library that do not have such sequence tags. I do not know why some reads (it seems to be the minority) do not have these tags. Are they artefacts?

    3. Is there a way to infer the underlying experimental protocol (directional vs non-directional), if it is not clear from the information of the metadata of the sequencing run?
    One idea would be to scan the reads for these tags. However, I do not know if these tags are always present in the raw data based on non-directional protocols.
    A workaround could be to run Bismark with its current default of having a directional library, then check in the summary report whether there are lots of rejected alignments to the complementary strands indicating a non-directional protocol.
    Should there be approximately equal amounts of alignments to all the four strands in case of a non-directional library?

    Any comments highly appreciated.
    Dear fwessely,

    It is true that reads prepared by the Cokus protocol contain sequence tags at the start; however, there are also other protocols out there which produce non-directional libraries without any tags. Examples can be found in reads from Smallwood et al., 2011, Hansen et al., 2011 or all kinds of target-amplified regions. Bismark does not exploit tags internally, so data from Cokus et al. or Popp et al. need to have the first 5 bp removed before performing alignments.
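
A minimal Python sketch of the 5 bp clipping described above, assuming plain four-line FastQ records. The helpers read_fastq and clip_5prime are hypothetical names used for illustration and are not part of Bismark. The quality string is clipped by the same amount so that sequence and qualities stay in sync.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, plus line, qualities)
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield four-line FastQ records from an iterable of text lines."""
    stripped = (line.rstrip("\n") for line in lines)
    while True:
        header = next(stripped, None)
        if not header:  # end of input (or a trailing blank line)
            return
        seq = next(stripped, "")
        plus = next(stripped, "")
        qual = next(stripped, "")
        yield header, seq, plus, qual


def clip_5prime(record: FastqRecord, n: int = 5) -> FastqRecord:
    """Remove the first n bases (e.g. a 5 bp tag) and their qualities."""
    header, seq, plus, qual = record
    return header, seq[n:], plus, qual[n:]
```

Applied to every record before alignment, this would strip tags such as TCTGT or TCCAT regardless of which one a read carries; whether untagged reads should be clipped as well is a separate decision.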

    If you are not entirely sure about the nature of a library you might want to run just the first 100000 reads or so (using -u 100000). This should finish in under a minute, and you can guess the type of library by the strand alignment ratio. If a library is directional, OT:OB:CTOT:CTOB should have a ratio of roughly 1:1:0:0 (with maybe 1% of alignments going to the complementary strands). A non-directional library produces roughly the same number of alignments for each strand, but due to the way the alignments work for reads that contain either no C at all or only methylated Cs, the ratio for non-directional libraries typically looks like 2:2:1:1.
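
The ratio check above could be sketched in Python roughly as follows. This assumes the four per-strand alignment counts have already been copied by hand from the alignment report into a dictionary. The function name guess_library_type and the 5% cutoff are assumptions made for illustration, not values used by Bismark.

```python
from typing import Dict

STRANDS = ("OT", "OB", "CTOT", "CTOB")


def guess_library_type(counts: Dict[str, int],
                       max_complementary_fraction: float = 0.05) -> str:
    """Guess the library type from per-strand alignment counts.

    A directional library should put almost nothing on CTOT/CTOB
    (roughly 1:1:0:0), whereas a non-directional library typically
    shows something like 2:2:1:1. The cutoff is an arbitrary assumption.
    """
    total = sum(counts[s] for s in STRANDS)
    if total == 0:
        raise ValueError("no alignments to judge from")
    complementary = (counts["CTOT"] + counts["CTOB"]) / total
    if complementary <= max_complementary_fraction:
        return "directional"
    return "non-directional"
```

With counts in a 2:2:1:1 ratio about a third of all alignments fall on the complementary strands, which is far above any plausible cutoff, so the exact threshold matters little in practice.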

    I hope this helps,

    Best wishes,
    Felix



    • New Bismark version plus RRBS User Guide

      We have just released a new version of Bismark (v0.7.0) which changes its default behaviour for directional alignments from four-strand to two-strand alignment mode. This means that directional libraries are only aligned to the OT and OB strands, and the potentially time- and resource-consuming alignments to the complementary strands are not carried out. Another added benefit might be that the mapping efficiency increases marginally since fewer alignments are discarded because they are considered ambiguous. It is nevertheless still possible to run the four-strand alignment mode for any combination of input files (FastA/FastQ), choice of aligner (Bowtie 1/2) or single-end/paired-end mode by specifying “--non_directional”.

      I have received so many emails about RRBS data that I have written up a small RRBS user guide that tries to illustrate several potential pitfalls one can run into. The user guide is now also up on the Bismark project page. The adapter and quality trimming wrapper script trim_galore, which is mentioned in the RRBS guide, also has some functionality to remove artificial cytosine positions from FastQ reads that are a result of the end-repair step during library preparation. I am happy to send out a copy of trim_galore upon request.

      All data is available on the Bismark project page www.bioinformatics.bbsrc.ac.uk/projects/bismark.



      • We have just released Bismark v0.7.1 that fixes a bug caused by space (or tab) characters in the read IDs for Bowtie2 alignments. These are the changed features:

        - Adjusted Bismark so that white spaces or tab characters in the read IDs get replaced with underscores on the fly. This was necessary because some ID checks would fail as Bowtie2 truncates read IDs if it encounters spaces in the read ID (causing errors with the latest RTA version), whereas Bowtie 1 only truncates read IDs if 'tab' characters were found. More information about this can be found in the RELEASE_NOTES.

        - An RRBS QC pack is now available for download which contains a brief guide to RRBS, the Cutadapt-wrapping script trim_galore as well as a validate_paired_end_files script to remove read pairs for which at least one of the reads has been trimmed to too short a length by quality and/or adapter trimming.
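
The read ID adjustment in the first item can be pictured with a short Python sketch. It assumes a one-to-one replacement of each space or tab by an underscore. The function name sanitize_read_id is hypothetical, and the real Bismark code may differ in detail.

```python
def sanitize_read_id(header: str) -> str:
    """Replace every space or tab in a read ID with an underscore.

    This keeps the ID in one piece for aligners that would otherwise
    truncate it at the first whitespace character.
    """
    return header.replace(" ", "_").replace("\t", "_")
```

For example, an ID such as "@HWI-ST:1:1101 1:N:0:GATCAG" would become "@HWI-ST:1:1101_1:N:0:GATCAG", so the part after the space still takes part in later ID checks.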

        All files are available from the Bismark project page.



        • Hey Felix,

          The --quality option doesn't work on trim_galore (but -q does). I think line 322 needs to be changed to 'q|quality=i' => \$quality to fix it.
          Cheers
          Pete



          • Felix, why does trim_galore call cutadapt twice - firstly for quality score trimming and then for adaptor trimming - when cutadapt can do both steps in one pass?



            • Hi Pete,

              Thanks for your comment about the quality option, we have put an updated version onto the homepage for download.

              Cutadapt is called twice because, in --RRBS mode, trim_galore needs to know whether a sequence was trimmed because of poor quality bases or due to adapter contamination. Only sequence reads that had adapter removed (irrespective of whether they have been trimmed for qualities already) have 2 additional bp clipped off to remove the biased cytosine fill-in position.
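
To illustrate why the adapter step has to be distinguishable from the quality step, here is a simplified Python sketch. It only looks for an exact, full-length adapter match, whereas Cutadapt also finds partial and error-containing matches. The function name rrbs_adapter_trim is hypothetical, and any adapter sequence passed in is the caller's choice.

```python
from typing import Tuple


def rrbs_adapter_trim(seq: str, qual: str, adapter: str,
                      rrbs: bool = True) -> Tuple[str, str]:
    """Remove the adapter and, in RRBS mode, 2 further bp.

    The extra 2 bp are clipped only from reads in which an adapter was
    actually found; reads without adapter are returned unchanged, even
    if they were quality-trimmed in an earlier step.
    """
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual  # no adapter: nothing to clip
    seq, qual = seq[:idx], qual[:idx]
    if rrbs:
        # drop the filled-in position next to the adapter junction
        seq, qual = seq[:-2], qual[:-2]
    return seq, qual
```

If both trimming steps were done in a single pass without recording which one shortened the read, this conditional clipping would not be possible.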

              trim_galore was initially designed for RRBS-type libraries, but I can see your point for other types of FastQ files. I suppose for a next version I could have Cutadapt perform both steps (quality and adapter trimming) in one go for non-RRBS type libraries.

              Cheers,
              Felix



              • Originally posted by fkrueger
                Cutadapt is called twice because, in --RRBS mode, trim_galore needs to know whether a sequence was trimmed because of poor quality bases or due to adapter contamination. Only sequence reads that had adapter removed (irrespective of whether they have been trimmed for qualities already) have 2 additional bp clipped off to remove the biased cytosine fill-in position.

                trim_galore was initially designed for RRBS-type libraries, but I can see your point for other types of FastQ files. I suppose for a next version I could have Cutadapt perform both steps (quality and adapter trimming) in one go for non-RRBS type libraries.
                Ah, I'm always forgetting about RRBS-type libraries and their particular quirks! An option to perform the quality and adaptor trimming in one pass for "non-RRBS" libraries would be most welcome. I have a script that basically does just that, but it's a highly non-portable and clumsy shell script and I see myself moving to the much better trim_galore from now on.



                • Cheers for that, I'll try to implement it as soon as I find the time.



                  • We have just put out an update for several tools in the Bismark package. This includes updates and bug fixes for trim_galore, the validate_paired_end_files and bismark result deduplication tools. Here is a list of the changes in more detail:

                    - methylation_extractor: changed the file endings of all files generated by the methylation extractor to '.txt'; this is to avoid confusing these files with SAM formatted Bismark output files

                    - deduplicate_Bismark_alignment_output.pl: Fixed a bug for paired-end deduplication mode in SAM format, which only printed the second read alignment of a pair to the deduplicated file

                    - trim_galore: Updated so that non-RRBS FastQ files are adapter and quality trimmed in a single pass
                    - trim_galore: added an option --fastqc_args "..." to pass extra arguments to FastQC for easier integration into pipelines
                    - trim_galore: Added some more documentation; trim_galore can now be found separately here

                    - validate_paired_end_files: Updated so that one can optionally write out unpaired single-end reads should a read-pair fail to be considered a valid paired-end read pair
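
The pair validation with optional unpaired output could look roughly like this in Python. It is only a sketch of the idea: the function name validate_pairs, the read tuple layout and the default length cutoff of 20 are assumptions and are not taken from the validate_paired_end_files script.

```python
from typing import Iterable, List, Tuple

Read = Tuple[str, str, str]  # (read ID, sequence, qualities)


def validate_pairs(pairs: Iterable[Tuple[Read, Read]],
                   min_length: int = 20,
                   keep_unpaired: bool = False):
    """Split read pairs into valid pairs and (optionally) orphaned reads.

    A pair is valid only if both reads are still at least min_length
    long after trimming. With keep_unpaired=True the surviving mate of
    a failed pair is collected as a single-end read instead of being
    dropped.
    """
    kept: List[Tuple[Read, Read]] = []
    unpaired_1: List[Read] = []
    unpaired_2: List[Read] = []
    for read_1, read_2 in pairs:
        ok_1 = len(read_1[1]) >= min_length
        ok_2 = len(read_2[1]) >= min_length
        if ok_1 and ok_2:
            kept.append((read_1, read_2))
        elif keep_unpaired and ok_1:
            unpaired_1.append(read_1)
        elif keep_unpaired and ok_2:
            unpaired_2.append(read_2)
    return kept, unpaired_1, unpaired_2
```

Keeping both output files of a paired-end run in the same order is the point of this step, since paired-end aligners expect read 1 and read 2 at matching positions.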

                    This link takes you to the Bismark project page.



                    • Can anyone tell me if it is okay to pause Bismark during the alignment process then restart? We need to restart our computing core. If it is helpful to know, we are using platform lava to run.



                      • I am afraid Bismark itself is not geared towards being resumed from a restart (or crash for that matter...), but you might truncate the input file to the position of the last reported alignment, resume from there and merge the outputs afterwards.
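
The truncation idea could be sketched in Python as follows, assuming the reads are processed in input order and the ID of the last reported alignment has been looked up by hand in the partial output. The function name records_after is hypothetical. The last line of an interrupted output file may be incomplete, so it is worth checking before relying on it.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str, str]  # (header, sequence, plus, qualities)


def records_after(records: Iterable[FastqRecord],
                  last_id: str) -> Iterator[FastqRecord]:
    """Yield only the records that follow the read with header last_id.

    If last_id never occurs, nothing is yielded, which is a hint that
    the ID was copied incorrectly.
    """
    seen = False
    for record in records:
        if seen:
            yield record
        elif record[0] == last_id:
            seen = True
```

The remaining records would then be written to a new FastQ file, aligned in a second run, and the two outputs merged afterwards.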



                        • error writing sam to bam

                          I am trying to convert my Bismark SAM output to BAM and it is giving an error. I am using Picard to do this. I am wondering if there is anything different about the SAM file that is produced from bismark since I've never used it before. I've used these SAM to BAM commands before and had no issue from bowtie. Maybe there is something wrong with my SAM file. I noticed from my output that there are multiple lines that say something like "Chromosomal sequence could not be extracted for MWR-PRG-4:710F4CACXX:2:1101:7832:184789_1:N:0:GATCAGMT16469". Not sure where the error lies.

                          The specific error I get is "Exception in thread "main" java.lang.IllegalArgumentException: Negative value (-8112) passed to unsigned writing method." Which means absolutely nothing to me.



                          • Originally posted by shawpa
                            I am trying to convert my Bismark SAM output to BAM and it is giving an error. I am using Picard to do this. I am wondering if there is anything different about the SAM file that is produced from bismark since I've never used it before. I've used these SAM to BAM commands before and had no issue from bowtie. Maybe there is something wrong with my SAM file. I noticed from my output that there are multiple lines that say something like "Chromosomal sequence could not be extracted for MWR-PRG-4:710F4CACXX:2:1101:7832:184789_1:N:0:GATCAGMT16469". Not sure where the error lies.

                            The specific error I get is "Exception in thread "main" java.lang.IllegalArgumentException: Negative value (-8112) passed to unsigned writing method." Which means absolutely nothing to me.
                            Hi shawpa,

                            I have just tested a SAM to BAM conversion of single-end and paired-end Bismark SAM files (with Bowtie 1 and 2) with SAMtools ("samtools view -S -b result_bismark.sam > results_bismark.bam"), and all worked fine. I don't really know why Picard cannot handle it; the exception you quoted doesn't mean anything to me either.

                            I could imagine that it is caused by the somewhat weird FLAG values that are used for Bismark output, but this is just a guess. The "chromosomal sequence could not be extracted" error messages don't have anything to do with this. You only get these messages whenever a sequence aligns to the very end of a chromosome (the MT chromosome especially), because Bismark extracts 2 additional basepairs to determine the sequence context. If these extra bases can't be extracted you'll get a warning.
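
A small Python sketch may help to show why the 2 extra bases matter. It covers the forward-strand case only, and the function names context_window and cytosine_context are hypothetical. A cytosine at the last position of a read needs the two downstream genomic bases before it can be called CpG, CHG or CHH.

```python
from typing import Optional


def context_window(chrom_seq: str, start: int, read_length: int) -> Optional[str]:
    """Return the genomic sequence under a read plus 2 downstream bases.

    Returns None if the 2 extra bases run past the end of the
    chromosome, which corresponds to the warning described above.
    """
    end = start + read_length + 2
    if end > len(chrom_seq):
        return None
    return chrom_seq[start:end]


def cytosine_context(triplet: str) -> str:
    """Classify a cytosine from itself and the 2 following bases."""
    if triplet[1] == "G":
        return "CpG"
    if triplet[2] == "G":
        return "CHG"
    return "CHH"
```

Reads ending within 2 bp of a chromosome end are rare, which fits with the warning showing up mostly for a short sequence such as MT.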



                            • We have just released a new version of Bismark (v0.7.3), which addresses a couple of issues:

                              - Corrected a bug for the TLEN field in paired-end SAM output. This value was occasionally calculated incorrectly if both reads were overlapping almost entirely with a difference of only a single bp between the end of one read and the start of the second read. This did not affect the output of the methylation extractor but merely the display of the read alignment itself
                              - Removed a potential source of crashes with the combination of gzipped input files and the option -u/--qupto
                              - methylation_extractor: Corrected a potential flaw for the 'remove overlap' option for paired-end alignments in --vanilla mode when the first read aligned in a reverse orientation
                              - methylation_extractor: file endings of all files generated by the methylation extractor will be only a single '.txt' if the file was called .txt before

                              Bismark is available here.

                              I would like to mention that we are aware that Bowtie 2 sometimes crashes silently which may in turn result in crashes of Bismark. These crashes also occur when Bowtie 2 is run independently of Bismark and are currently under investigation by the Bowtie 2 developers. We are hoping that these problems will be fixed with the next update of Bowtie 2.



                              • We have just released a new, mainly cosmetic, version of Bismark (v0.7.4).

                                This release adds a new option --temp_dir <dir> that allows one to specify a directory to which the temporary C->T or G->A transcribed files are written (this is needed for a potential implementation of Bismark into Galaxy). In addition, the input files to be aligned may now contain path information, e.g. /home/user/file.fq or ../temp/file.fq, so that one no longer has to call Bismark from within the directory containing the input files.
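
For readers unfamiliar with the temporary files mentioned above, a rough Python sketch of the C->T and G->A transcription and of placing the result in a separate directory is given here. The function names and the file name tag are hypothetical and do not reflect Bismark's actual naming scheme.

```python
import os


def c_to_t(seq: str) -> str:
    """Fully convert a read in silico: every C becomes T."""
    return seq.upper().replace("C", "T")


def g_to_a(seq: str) -> str:
    """Convert the other strand's view of a read: every G becomes A."""
    return seq.upper().replace("G", "A")


def temp_path(input_path: str, temp_dir: str, tag: str) -> str:
    """Build the path of a temporary converted file inside temp_dir.

    Only the file name of the input is kept, so inputs given with path
    information such as ../temp/file.fq still end up in temp_dir.
    """
    return os.path.join(temp_dir, os.path.basename(input_path) + tag)
```

Stripping the directory part of the input path is what makes it possible to call the program from anywhere while still writing all temporary files to one place.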

                                As usual, Bismark is available from www.bioinformatics.babraham.ac.uk/projects/.
