
**fkrueger** · 07-24-2013, 04:50 AM

Fine, I'll have this changed as well then!

**fkrueger** · 07-24-2013, 07:36 AM

We have just released a new version of Bismark (v0.8.2). The changes address several feature requests and bug fixes raised above:

• Bismark: Changed the TLEN values in paired-end SAM format generated by Bowtie 2 whenever one read was completely contained within the other; in such cases both TLEN values will be set to the length of the longer fragment
• Bismark: Changed the output filename for Bowtie 2 files for single-end reads from '...bt2_bismark.sam' to '...bismark_bt2.sam' so that single-end and paired-end file names are more consistent

• Methylation Extractor: Added a new option '--mbias_only'. If this option is specified, only the M-bias plot(s) and their data are written out. The standard methylation report ('--report') is optional. Since this option will not extract any methylation data, neither bedGraph nor cytosine report conversion is allowed
• Methylation Extractor: If a specific output directory and '--cytosine_report' are specified at the same time, the bedGraph2cytosine module will now use the bedGraph file located in the output directory as intended
• Methylation Extractor: Added an additional check for the module GD::Graph::colour; if it can't be found on the system, drawing of the M-bias plot will be skipped

Bismark can be downloaded here: https://www.bioinformatics.babraham....jects/bismark/.

**PeteH** · 07-24-2013, 07:24 PM

Originally posted by fkrueger

Hi Pete,

I had a go at implementing a new function '--mbias_only' which you can find attached. If this option is specified, only a report and the M-bias plot(s) and their data are being written out. It will not extract any methylation data, also the bedGraph and cytosine report conversion are not allowed.

The new version of the extractor (v0.8.1a) is attached; if you find it working alright I will include it into a new release once I am back from ISMB.

Thanks, Felix. My (brief) testing suggests there are no problems except that the M-bias text file isn't written to the directory given by the "--output" flag, but rather to the current working directory. The PNGs are written to the correct output directory.

**PeteH** · 07-24-2013, 07:28 PM

It's probably worth noting in the documentation that, for paired-end data, bismark_methylation_extractor expects the reads to be in queryname order, i.e. the same ordering scheme that Bismark natively outputs when writing SAM/BAM. I ran bismark_methylation_extractor on a coordinate-sorted paired-end BAM file and got some strange results, evidently because bismark_methylation_extractor expects read_2 to immediately follow read_1 for each read-pair in the SAM/BAM, which is not the case for a coordinate-sorted SAM/BAM.

**fkrueger** · 07-25-2013, 12:31 AM

Originally posted by PeteH

Thanks, Felix. My (brief) testing suggests there are no problems except that the M-bias text file isn't written to the directory given by the "--output" flag, but rather to the current working directory. The PNGs are written to the correct output directory.

Hi Pete,

I had also found this flaw, and the v0.8.2 release now writes the text file to the specified output folder correctly. Thanks for testing.

Originally posted by PeteH

It's probably worth noting in the documentation that, for paired-end data, bismark_methylation_extractor expects the reads to be in queryname order, i.e. the same ordering scheme that Bismark natively outputs when writing SAM/BAM. I ran bismark_methylation_extractor on a coordinate-sorted paired-end BAM file and got some strange results, evidently because bismark_methylation_extractor expects read_2 to immediately follow read_1 for each read-pair in the SAM/BAM, which is not the case for a coordinate-sorted SAM/BAM.

This is definitely worth mentioning; I believe there were a few cases where users reported some weird results. Most notably, position-sorted BAM files will look as if non-directional libraries had been used, which can lead to a lot of confusion.

**frozenlyse** · 07-25-2013, 05:59 PM

What kind of weird results did you get? I've always used it on position sorted bam files (had no idea it wanted name sorting), and WGBS data I've got correlates really well with HM450 arrays O_o

I guess bismark_methylation_extractor should error out if the bam file is sorted incorrectly in the future?

**PeteH** · 07-25-2013, 08:00 PM

Originally posted by frozenlyse

What kind of weird results did you get? I've always used it on position sorted bam files (had no idea it wanted name sorting), and WGBS data I've got correlates really well with HM450 arrays O_o

I guess bismark_methylation_extractor should error out if the bam file is sorted incorrectly in the future?

I should emphasise that it is only for paired-end data that the SAM/BAM needs to be sorted by queryname. If you have single-end data then it won't matter whether the SAM/BAM is sorted by queryname or coordinate, so you needn't worry (Felix, please correct me if I'm wrong on this).

The weird results obtained when running bismark_methylation_extractor on a coordinate-sorted paired-end SAM/BAM were described by Felix a few posts above me. Basically, you end up getting methylation calls on the CTOT and CTOB strands when the data are from the 2-stranded protocol (and hence there should only be methylation calls on the OT and OB strands), as well as some other weird issues if the --no_overlap flag is active.

I agree that adding a check regarding the sort order of the SAM/BAM would be a good idea. There are a couple of ways to do this:
(1) Check the SO field in the SAM/BAM header; however, this assumes the SAM/BAM has been sorted by a program that correctly sets this field (e.g. Picard's SortSam).
(2) A more direct way is to check that the read names are identical for read_1 and read_2 for each read-pair that is processed. I'd recommend that this check is included when the -p flag is active in bismark_methylation_extractor.
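
The two checks above can be sketched roughly as follows in Python. This is only an illustration that works on plain SAM text lines, not the code used in bismark_methylation_extractor; the function names and the max_pairs cutoff are made up for the example.

```python
from itertools import islice


def header_sort_order(sam_lines):
    """Return the SO value from the @HD header line, or None if it is absent."""
    for line in sam_lines:
        if not line.startswith("@"):
            break  # header section is over
        if line.startswith("@HD"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def strip_mate_suffix(name):
    """Drop a trailing /1 or /2 so both mates compare equal."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def mates_are_adjacent(sam_lines, max_pairs=100000):
    """Check that records come in consecutive read_1/read_2 pairs with the same name.

    max_pairs is a hypothetical cutoff so that only the start of the file is scanned.
    """
    names = (line.split("\t", 1)[0] for line in sam_lines if not line.startswith("@"))
    names = islice(names, 2 * max_pairs)
    for first in names:
        second = next(names, None)
        if second is None or strip_mate_suffix(first) != strip_mate_suffix(second):
            return False
    return True
```

A file would be rejected if header_sort_order() returns "coordinate" or if mates_are_adjacent() returns False. The read-name check is the more reliable of the two because it does not depend on the sorting program having updated the header.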

**fkrueger** · 07-26-2013, 07:26 AM

We have just released a new version of Bismark (v0.8.3) that implements earlier suggestions raised here on SeqAnswers or via email (many thanks to Pete Hickey for contributing the idea for the FLAG tags, see below). This new version will now die and warn about using positionally sorted paired-end SAM/BAM input files for the methylation extractor, and Bismark is going to use new FLAG values for paired-end SAM/BAM files to allow better visualisation and processing in other software suites.

The changes in detail include:

• Bismark: Changed the FLAG values of paired-end SAM/BAM output files to comply with other downstream applications such as Picard. In addition, reads will no longer have /1 or /2 appended to the read IDs. For the time being, the old FLAG values and read ID tags can still be obtained using the option '--old_flag'. For more information on the change of FLAG tags please see the RELEASE NOTES or type 'bismark --help'

Code:

                         new default                         old_flag
                      ===================              ===================
                      Read 1       Read 2              Read 1       Read 2

             OT:         99          147                  67          131

             OB:         83          163                 115          179

             CTOT:       83          163                 115          179

             CTOB:       99          147                  67          131

• Methylation Extractor: Changed the additional check for the module GD::Graph::colour to an 'eval {require ...}' statement instead of using 'use'. This should now properly skip drawing the M-bias plot if the module is not installed on the system
• Methylation Extractor: Implemented two quick tests for paired-end SAM/BAM files to see if the file had been sorted by chromosomal position prior to using the methylation extractor, because this would cause problems with the strand identity and overlaps, since read 1 and read 2 are expected to follow each other directly in the Bismark alignment file. The first test attempts to find an @SO (for sorted) tag in the SAM header. If this cannot be found, the first 100000 sequences are checked for whether or not their ID is the same. If the file appears to have been sorted, the methylation extractor will bail and ask for an unsorted file instead

Bismark can be downloaded here: https://www.bioinformatics.babraham....jects/bismark/.

**PeteH** · 07-26-2013, 07:52 PM

Thanks for incorporating those changes, Felix. One thing: you've got the CTOT and CTOB flags swapped around in that table. They should be:

Code:

                         new default                         old_flag
                      ===================              ===================
                      Read 1       Read 2              Read 1       Read 2

             OT:         99          147                  67          131

             OB:         83          163                 115          179

             CTOB:       83          163                 115          179

             CTOT:       99          147                  67          131

Here's a screenshot that shows the difference between using the new flag values (labelled bismark v0.8.1 patched) and using the old flag values (labelled bismark v0.8.1).
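
For reference, the FLAG values in the table can be decoded into their SAM bit fields with a few lines of Python. This is a small sketch based on the bit definitions in the SAM specification; decode_flag is a hypothetical helper and not part of Bismark.

```python
# Bit definitions for the paired-end FLAG bits, as given in the SAM specification.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
]


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# New default values first, then the values produced with --old_flag.
for flag in (99, 147, 83, 163, 67, 131, 115, 179):
    print(flag, ", ".join(decode_flag(flag)))
```

With the new values, each read's own strand bit agrees with the mate-strand bit of its partner (99/147 and 83/163). The old values 67/131 set neither strand bit, and 115/179 set both on each read.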

**fkrueger** · 07-26-2013, 10:43 PM

Thanks for noticing; luckily it was only a mix-up in the documentation and not the code. I have still corrected it in the User manual, the Bismark help and the release notes, and put up a new tar file for download.

**PeteH** · 07-28-2013, 11:50 PM

Originally posted by fkrueger

Thanks for noticing; luckily it was only a mix-up in the documentation and not the code. I have still corrected it in the User manual, the Bismark help and the release notes, and put up a new tar file for download.

Thanks for the new tar file. There's some issue with the archive utility on my macbook because it works fine for me on other machines but not on my macbook. Sorry for the confusion.

The description of the --old_flag argument is still not quite right for the old_flag column in the Release Notes or when you run bismark --help.

**fkrueger** · 07-31-2013, 01:07 AM

Thanks Pete, all should be fine now.

**yzizhen** · 08-06-2013, 10:25 AM

Hi,

I have some paired-end RRBS data that suffer from poor quality at the 3' end of the R2 reads (unfortunately about 20 out of 36 cycles are affected). What is the best strategy to deal with it? It seems that with bowtie1, quality scores will be taken into account. Will mismatches at read positions with poor quality be ignored? With bowtie2, --ignore-quals are disabled.
Do you think choosing bowtie1 will make the difference? Or should I simply trim R2 sequences? Would different lengths of R1 and R2 have any unexpected downstream effect?

It appears to me that a better solution is to use bowtie2 "local" mode, with which the reads with poor quality at the 3' end will be trimmed. Will it be difficult to support "local" mode for bismark?

Thank you very much for your help!

**fkrueger** · 08-06-2013, 11:50 AM

Originally posted by yzizhen

Hi,

I have some paired-end RRBS data that suffer from poor quality at the 3' end of the R2 reads (unfortunately about 20 out of 36 cycles are affected). What is the best strategy to deal with it? It seems that with bowtie1, quality scores will be taken into account. Will mismatches at read positions with poor quality be ignored? With bowtie2, --ignore-quals are disabled.
Do you think choosing bowtie1 will make the difference? Or should I simply trim R2 sequences? Would different lengths of R1 and R2 have any unexpected downstream effect?

It appears to me that a better solution is to use bowtie2 "local" mode, with which the reads with poor quality at the 3' end will be trimmed. Will it be difficult to support "local" mode for bismark?

Thank you very much for your help!

If 20 out of 36 cycles are badly affected by poor qualities, there won't be much good sequence left you can make use of... In any case, I would suggest trimming with Trim Galore and the following parameters:

trim_galore --paired --rrbs --trim1 file1.fq file2.fq

This will remove adapters and base calls with qualities lower than Phred 20, and also trim one additional bp off the 3' end to facilitate alignments with Bowtie (1). It also deals with fill-in problems with unmethylated cytosines. For more information please refer to the Trim Galore and RRBS guide on the Babraham Bioinformatics website.

Just to clarify a few points: Bowtie 1 takes qualities into account, but only in that it allows more mismatches in poor-quality sequence than it allows high-confidence mismatches. For example, in its default mode it would allow up to 7 mismatches in the entire read (0-3 within the seed, the rest beyond that), but only 2 good-quality ones (with a Phred score of 30+). Thus, this is certainly the wrong approach for BS-Seq.

The way Bowtie 2 is implemented in Bismark, every mismatch will get a penalty score of -6 (regardless of its quality), so for poor quality sequences it would allow even fewer mismatches than Bowtie 1 and thus be more stringent.

If you quality-trim your data first, the Bowtie 1 mode will allow roughly 2 mismatches (sometimes 3) per read, while the Bowtie 2 mode will only allow 1 for shorter reads, but already 3 by the time you reach a read length of 90 bp. Bowtie 2 is specifically not designed for short read lengths, so I would favour the Trim Galore -> Bowtie 1 model.
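
The Bowtie 2 numbers can be reproduced with a little arithmetic. The sketch below assumes end-to-end alignment with a minimum score function of L,0,-0.2 (assumed here to be the --score_min setting in use; check 'bismark --help' for your version), the fixed mismatch penalty of 6 mentioned above, and no gaps or Ns. The function names are invented for the example.

```python
from fractions import Fraction


def min_alignment_score(read_length, intercept=Fraction(0), slope=Fraction(-2, 10)):
    """Lowest score a valid alignment may have: intercept + slope * read length.

    The defaults correspond to the assumed linear function L,0,-0.2.
    """
    return intercept + slope * read_length


def max_mismatches(read_length, mismatch_penalty=6):
    """Number of mismatches that still fit above the minimum score.

    A perfect end-to-end alignment scores 0 and every mismatch subtracts
    mismatch_penalty, regardless of base quality.
    """
    return int(-min_alignment_score(read_length) // mismatch_penalty)


for length in (36, 50, 90):
    print(length, "bp:", max_mismatches(length), "mismatch(es)")
# 36 bp -> 1, 50 bp -> 1, 90 bp -> 3
```

Under these assumptions a 36 bp read has a minimum score of -7.2, which leaves room for a single mismatch, while a 90 bp read has a minimum score of -18 and can carry three.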

Alternatively, if read 2 is likely to be trimmed so much that you would lose a big portion of your data, either in the trimming process or during mapping because the reads got way too short, you might want to consider forgetting about read 2 entirely, trimming read 1 alone and running alignments in single-end mode.

I hope this helps, best, Felix

**yzizhen** · 08-07-2013, 11:24 AM

Thanks a lot for your quick response and insightful input. I will trim the reads as suggested, remove the reads entirely if they become too short and run bowtie 1.
