Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • fkrueger
    replied
    M-Bias plot and alignment stats pie chart

    We have just released a new version of Bismark (v0.8.0). The main purpose of this release is the introduction of an M-bias plot for the methylation extractor (thanks to K.D. Hansen for the suggestion, see reference below). The M-bias plot allows researchers to see whether there are fundamental technical biases in the methylation calling of their reads.

    Attached are two examples:

    a) the profile of a PBAT-Seq library (read1) which should in theory start with a 'random' 4-mer oligo that was used to bind to bisulfite converted reads. The M-bias (and also base composition profiles of these libraries as judged by FastQC (not shown here)) do show some serious biases primarily within, but not limited to, the first 4bp of the library.

    b) Read 2 of any paired-end BS-Seq library. Due to the end-repair reaction and A-tailing procedure for Illumina libaries, the 3' ends of sonicated fragments are filled in by the Klenow enzyme with unmethylated cytosines. These will then be converted to Ts in the bisulfite treatment, resulting in a very high amount of apparently unmethylated cytosines at the first couple of positions of Read 2. (The problem is somewhat reminiscent of the fill-in problem in RRBS libraries).

    In addition to just the methylation profile, the M-bias plot also shows the actual number of events for an entire read file. This should enable investigators to make an informed decision of whether to remove biased positions in the reads, e.g. by using the option --ignore <int> of the methylation extractor or by hard-trimming read 2 by 2bp prior to running the alignments, or whether it is OK to live with what they observe.

    Other changes in the latest version include:
    Bismark: Added new option '--prefix' to add a prefix to the output filenames. For example, '--prefix test' with 'file.fq' would result in the output file 'test.file.fq_bismark.sam' etc.
    Bismark: Fixed a warning message that occurred when chromosomal sequences could not be extracted in paired-end Bowtie2 mode
    Bismark: will now generate a pie chart with the alignment statistics once a run has finished; this allows to get a quick overview of how many sequences aligned uniquely or sequences that did not align, either due to producing no alignment at all, multiple mapping or because it was impossible to extract the chromosomal sequence

    Methylation Extractor: upon completion, the methylation extractor will now produce an M-bias (methylation bias) plot, which shows the methylation proportion across each possible position in the reads (described in: Hansen et al., Genome Biology, 2012, 13:R83). The data for the M-bias plot will be written into a text file (to generate graphs by alternative means) and drawn into a .png file. The plot also contains the absolute number of methylation calls per position (methylated + unmethylated)

    Bismark is avaialable for download here: https://www.bioinformatics.babraham....jects/bismark/
    Attached Files

    Leave a comment:


  • fkrueger
    replied
    Good news. So apparently the warning messages you saw amounted to a total of 215 (out of 49 million read pairs), indeed something you should be able to live with.

    Bismark has an option '--qupto' to just use the first so many reads; just type bismark --help to see further explanations on it

    So you could run single end alignments against your masked genome for the first 500000 reads with not-so-stringent criteria using the following command:

    bismark --bowtie2 --qupto 500000 --score_min L,0,-0.6 /path/to/genome/ file_val_1.fq

    Leave a comment:


  • oria34
    replied
    Well, apparently errors started to appear also in the alignment run against the standard genome, but a bit later I left the office (see below). Not really concerning if I get a good enough mapping efficiency.... Still running

    Well this is embarrassing…… No idea how "they" masked the genome for the interspersed repeats. I can't even ask till next Monday...Sorry for this also, but I have no idea how to extract the first 500K reads. I am quite newbie in all this

    The reason for aligning to both versions of the genome is just to compare results...well, I am just curious about how the interspersed repeats can determine certain patterns of methylation in certain parts of a couple of particular chromosomes. I just thought it was worth the shot.



    Processed 47700000 sequences so far
    Processed 47800000 sequences so far
    Use of uninitialized value in length at bismark line 2780, <$__ANONIO__> line 95642710.
    Use of uninitialized value in concatenation (.) or string at bismark line 2781, <$__ANONIO__> line 95642710.
    Chromosomal sequence could not be extracted for FCD1LHLACXX:8:2314:2821:99210#TCTTATAT/1 scaffold_1878
    Processed 47900000 sequences so far
    Use of uninitialized value in length at bismark line 2780, <$__ANONIO__> line 95989234.
    Use of uninitialized value in concatenation (.) or string at bismark line 2781, <$__ANONIO__> line 95989234.
    Chromosomal sequence could not be extracted for FCD1LHLACXX:8:2314:12271:44482#TCTTATAT/1 scaffold_232
    Processed 48000000 sequences so far
    Use of uninitialized value in length at bismark line 2780, <$__ANONIO__> line 96116284.
    Use of uninitialized value in concatenation (.) or string at bismark line 2781, <$__ANONIO__> line 96116284.
    Chromosomal sequence could not be extracted for FCD1LHLACXX:8:2315:3124:23585#TCTTATAT/1 scaffold_297
    Processed 48100000 sequences so far
    Use of uninitialized value in length at bismark line 2780, <$__ANONIO__> line 96230734.
    Use of uninitialized value in concatenation (.) or string at bismark line 2781, <$__ANONIO__> line 96230734.
    Chromosomal sequence could not be extracted for FCD1LHLACXX:8:2314:15481:55756#TCTTATAT/1 scaffold_1105
    Processed 48200000 sequences so far
    Processed 48300000 sequences so far
    Use of uninitialized value in length at bismark line 2780, <$__ANONIO__> line 96765254.
    Use of uninitialized value in concatenation (.) or string at bismark line 2781, <$__ANONIO__> line 96765254.
    Chromosomal sequence could not be extracted for FCD1LHLACXX:8:2316:2905:27999#TCTTATAT/1 scaffold_1711
    Processed 48400000 sequences so far
    Processed 48500000 sequences so far
    Processed 48600000 sequences so far
    Processed 48700000 sequences so far
    Processed 48800000 sequences so far
    Processed 48900000 sequences so far


    FYI - Updated info: The alignment to standard genome is now finished. It took around 17h and the mapping efficiency was fairly high....thanks again for this worderful tool! I copy/paste the report:

    Final Alignment report
    ======================
    Sequence pairs analysed in total: 48900263
    Number of paired-end alignments with a unique best hit: 31991336
    Mapping efficiency: 65.4%

    Sequence pairs with no alignments under any condition: 13461118
    Sequence pairs did not map uniquely: 3447809
    Sequence pairs which were discarded because genomic sequence could not be extracted: 215

    Number of sequence pairs with unique best (first) alignment came from the bowtie output:
    CT/GA/CT: 15981506 ((converted) top strand)
    GA/CT/CT: 0 (complementary to (converted) top strand)
    GA/CT/GA: 0 (complementary to (converted) bottom strand)
    CT/GA/GA: 16009615 ((converted) bottom strand)

    Number of alignments to (merely theoretical) complementary strands being rejected in total: 0

    Final Cytosine Methylation Report
    =================================
    Total number of C's analysed: 1298097901

    Total methylated C's in CpG context: 122944840
    Total methylated C's in CHG context: 870088
    Total methylated C's in CHH context: 2311989

    Total C to T conversions in CpG context: 71226620
    Total C to T conversions in CHG context: 301248948
    Total C to T conversions in CHH context: 799495416

    C methylated in CpG context: 63.3%
    C methylated in CHG context: 0.3%
    C methylated in CHH context: 0.3%
    Last edited by oria34; 07-06-2013, 05:19 AM.

    Leave a comment:


  • fkrueger
    replied
    Good to hear that the alignments against the standard genome went well.

    The errors from the masked genome are quite likely to be a result of the masking process itself, do you have an idea about how much sequence has been masked into Ns (presumably)?

    As far as I am aware Bowtie2 supports indexing of Ns in there reference sequence (whereas Bowtie 1 does not), so in theory it should work in the same way. N's in the reads or reference do receive a mapping penalty as well though (albeit not as much as true mismatches), so if you simply have masked too much of your genome this might just exceed the fairly stringent penalty allowance of Bismark's default setting which is governed by the "--score_min" parameter. The longer your reads and the paired-end nature of your library could make it possible that effectively all reads would fall into a 'masked' region.

    To see if it is the degree of masking you could just run a few sample reads (e.g. with the parameter --qupto 500000 for the firt 500K reads), run only Read 1 in single end mode and use --score_min L,0,-0.6. This should finish in a couple of minutes and tell you quickly if the mapping efficiency increases notably.

    Just out of interest, why would you want to align reads to both a masked and unmasked genome in parallel (especially since the normal genome alignments were fine)?

    Leave a comment:


  • oria34
    replied
    Hi again,

    OK, I have some more news. I made a mistake checking online my runs ....the one that is running error-free is the alignment against the standard genome.

    All the errors I´ve reported have been produced in the alignment run against the masked genome. These runs are now over (see report below) and they didn´t work at all.

    To my understanding, Bowtie doesn´t have any type of restrictions in the use of masked genomes, does it? Do you think I could miss something during the genome preparation step?

    Cheers


    Bismark report for: M24_FIS17_trim_galore_1.fq and M24_FIS17_trim_galore_2.fq (version: v0.7.12)
    Bowtie was run against the bisulfite genome of /Bisulfite/BISMARK_REALIGNMENT/MASKED_Version_Genome/ with the specified options: -q --phred64 --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --maxins 500

    Option '--directional' specified: alignments to complementary strands will be ignored (i.e. not performed)
    Final Alignment report
    ======================
    Sequence pairs analysed in total: 62787165
    Number of paired-end alignments with a unique best hit: 1587
    Mapping efficiency: 0.0%
    Sequence pairs with no alignments under any condition: 62785501
    Sequence pairs did not map uniquely: 77
    Sequence pairs which were discarded because genomic sequence could not be extracted: 0

    Number of sequence pairs with unique best (first) alignment came from the bowtie output:
    CT/GA/CT: 557 ((converted) top strand)
    GA/CT/CT: 0 (complementary to (converted) top strand)
    GA/CT/GA: 0 (complementary to (converted) bottom strand)
    CT/GA/GA: 1030 ((converted) bottom strand)

    Number of alignments to (merely theoretical) complementary strands being rejected in total: 0

    Final Cytosine Methylation Report
    =================================
    Total number of C's analysed: 50806

    Total methylated C's in CpG context: 1118
    Total methylated C's in CHG context: 99
    Total methylated C's in CHH context: 172

    Total C to T conversions in CpG context: 2166
    Total C to T conversions in CHG context: 16576
    Total C to T conversions in CHH context: 30675

    C methylated in CpG context: 34.0%
    C methylated in CHG context: 0.6%
    C methylated in CHH context: 0.6%
    Last edited by oria34; 07-06-2013, 12:39 AM.

    Leave a comment:


  • fkrueger
    replied
    Alright, thanks for keeping me posted.

    Leave a comment:


  • oria34
    replied
    Thanks for your quick reply!

    I am not really worried for the errors, I remember reading in this thread that they are due to the Cs located at the very end and, as you said, they are not relevant...Is the outcome what is killing me!

    I have created two different folders for the two different alignments (unmasked and masked), each with copies of bismark´s scripts. Each folder has two subfolders, one per individual (two .fq files each)...I have even duplicated the bowtie2 folder just in case....

    It is true that the two genomes are in different folders but each genome is used at the same time for the alignment of the different individuals (2x)...but this doesn´t seem to affect to the masked version of the genome.

    I´ll post tomorrow how it is going.

    Many thanks!

    Leave a comment:


  • fkrueger
    replied
    Hi oria,

    Interestingly I encountered the same error messages yesterday myself; they are caused by trying to print out the starting position of the read which aligns so close to the edge of a chromosome or contig/scaffold that additional 2bp sequence can't be extracted. These warnings are harmless though and can be ignored safely; I have still fixed this bug yesterday (if you want the new version just send me an email).

    From the output below it seems that you are getting roughly 1 of these warning per 100000 analysed sequences, so this can hardly be the reason for poor overall mapping results. Are you by any chance running the other alignment to the repeat masked genome from the same files in the same folder? This would certainly have the potential to interfere with the results... In any case, if you continue to have problems can you email me further details of your problems? If necessary I could also create an FTP server for you to upload your files in question so I could take a look myself.

    Leave a comment:


  • oria34
    replied
    Hi all

    I am trying to re-align my reads again using a new assembly of the reference genome and I am finding many problems in the alignment step.

    After running genome_preparation I am trying to run bismark for paired end reads trimmed with trim_galore (--phred64 -t --paired filename_2 filename_2). Right after the script creates the two version os the reads (c-t, a-g) it gives back a un ending number of errors. Many of them are related to the impossibility to determine the genomic context (due to Cs located at the end of the contigs) but others are quite new for me.

    The resulting SAM file is fairly small and the number of Cs analyzed (from the report file) are ridiculously low; around 50 000.

    I am using the last version of Bismark with the --bowtie2 option (also --botwtie2 option for the genome preparation step)

    I am running in parallel an alignment with the same .fq files against a masked version of the genome and no errors so far (4 millions reads...)

    Any thoughts?

    Cheers


    PS. some of the lines with multiple errors

    Reading in the sequence files BPM24_FIS17_trim_galore_1.fq and BPM24_FIS17_trim_galore_2.fq
    Processed 100000 sequences so far
    Use of uninitialized value in length at bismark line 2780, <$__ANONIO__> line 203898.
    Use of uninitialized value in concatenation (.) or string at bismark line 2781, <$__ANONIO__> line 203898.
    Chromosomal sequence could not be extracted for FCD1LHLACXX:8:1101:2724:10559#ACCAGACT/1 scaffold_1495
    Use of uninitialized value in length at bismark line 2780, <$__ANONIO__> line 223364.
    Use of uninitialized value in concatenation (.) or string at bismark line 2781, <$__ANONIO__> line 223364.
    Chromosomal sequence could not be extracted for FCD1LHLACXX:8:1101:7055:11458#ACCAGACT/1 scaffold_402
    Processed 200000 sequences so far
    Use of uninitialized value in length at bismark line 2780, <$__ANONIO__> line 592170.
    Use of uninitialized value in concatenation (.) or string at bismark line 2781, <$__ANONIO__> line 592170.
    Chromosomal sequence could not be extracted for FCD1LHLACXX:8:1101:8410:27041#ACCAGACT/1 scaffold_1917
    Processed 300000 sequences so far
    Processed 400000 sequences so far
    Use of uninitialized value in length at bismark line 2780, <$__ANONIO__> line 957754.
    Use of uninitialized value in concatenation (.) or string at bismark line 2781, <$__ANONIO__> line 957754.
    Chromosomal sequence could not be extracted for FCD1LHLACXX:8:1102:14266:27423#ACCAGACT/1 scaffold_1862
    Processed 500000 sequences so far
    Use of uninitialized value in length at bismark line 2780, <$__ANONIO__> line 1095048.
    Use of uninitialized value in concatenation (.) or string at bismark line 2781, <$__ANONIO__> line 1095048.
    Chromosomal sequence could not be extracted for FCD1LHLACXX:8:1102:14067:33341#ACCAGACT/1 scaffold_297
    Use of uninitialized value in length at bismark line 2780, <$__ANONIO__> line 1171846.
    Use of uninitialized value in concatenation (.) or string at bismark line 2781, <$__ANONIO__> line 1171846.
    Chromosomal sequence could not be extracted for FCD1LHLACXX:8:1102:20587:36545#ACCAGACT/1 scaffold_391
    Processed 600000 sequences so far
    Use of uninitialized value in length at bismark line 2780, <$__ANONIO__> line 1323878.
    Use of uninitialized value in concatenation (.) or string at bismark line 2781, <$__ANONIO__> line 1323878.
    Chromosomal sequence could not be extracted for FCD1LHLACXX:8:1101:10776:58374#ACCAGACT/1 scaffold_1028
    Processed 700000 sequences so far
    Processed 800000 sequences so far
    Processed 900000 sequences so far
    Processed 1000000 sequences so far
    Processed 1100000 sequences so far
    Use of uninitialized value in length at bismark line 2780, <$__ANONIO__> line 2259346.
    Use of uninitialized value in concatenation (.) or string at bismark line 2781, <$__ANONIO__> line 2259346.
    Last edited by oria34; 07-05-2013, 10:53 AM.

    Leave a comment:


  • frozenlyse
    replied
    Hi Felix - thanks for the prompt reply, and sorry I forgot to include my library type/bowtie version. I was just making sure this was the intended behaviour before I updated my pipeline to ignore the /1 /2 tags in remove PCR duplicates - thanks

    Leave a comment:


  • fkrueger
    replied
    Hi forzenlyse,

    It seems that you were running paired-end alignments with Bowtie2, correct? Bismark does indeed append /1/1 to read 1, one of which is removed during the alignment step. The second one is then removed at a later step, but to keep the output in a way that allows immediate identification of what was R1 and R2, /1 and /2 are appended just before the SAM/BAM output is created. If you don't want these trailing fragment numbers you can just locate the following two lines in the subroutine "paired_end_SAM_output":

    Code:
      my $id_1 = $id.'/1';
      my $id_2 = $id.'/2';
    and comment them out like so:
    Code:
      # my $id_1 = $id.'/1';
      # my $id_2 = $id.'/2';
    Let me know if this fixes your problem of if I can anything else for you.

    Leave a comment:


  • frozenlyse
    replied
    Originally posted by fkrueger View Post
    Hi pengchy,

    To 1) It is true that Bismark appends segment numbers to the end of read. This is because Bowtie or Bowtie2 tend to delete these tags internally while aligning, and to make it more difficult they don't do it in the same way. To properly keep track of which read is doing what I had to do this change (btw also white spaces or tab characters are being replaced by _ in the read ID.
    Hi fkrueger - I recently upgraded to bismark v0.7.12 and have run into an issue with the read segment numbers

    My input fastq does not have these tags

    Code:
    @HWI-D00119:25:D1WYEACXX:3:1101:1169:2017 1:N:0:
    NGGTGTATTGGGTTAGTGTATTTTTTTGTTGATATTTTTTTTTTGTATTGTGGTTGATTTTTGATTAAGGACGTTTGTTTTTAAGGGTTTTTGGTGGG
    +
    #0<<BFFFFFFFFFIIFFIFFFIIIIIIIIIFIFFIIIIIIIIFFB<BFBBBFBBB<BBFFFF<BFBBBB<BBBFBFFBFFFBBBFF<BFFFBF<BBF
    However *two* tags get added in the read prep step of bismark (eg C_to_T fastq)

    Code:
    @HWI-D00119:25:D1WYEACXX:3:1101:1169:2017_1:N:0:/1/1
    NGGTGTATTGGGTTAGTGTATTTTTTTGTTGATATTTTTTTTTTGTATTGTGGTTGATTTTTGATTAAGGATGTTTGTTTTTAAGGGTTTTTGGTGGG
    +
    #0<<BFFFFFFFFFIIFFIFFFIIIIIIIIIFIFFIIIIIIIIFFB<BFBBBFBBB<BBFFFF<BFBBBB<BBBFBFFBFFFBBBFF<BFFFBF<BBF
    then in the output bam, only one is removed

    Code:
    HWI-D00119:25:D1WYEACXX:3:1101:1169:2017_1:N:0:/1	67	chr6	151954983	255	98M	=	151955045	162	NGGTGTATTGGGTTAGTGTATTTTTTTGTTGATATTTTTTTTTTGTATTGTGGTTGATTTTTGATTAAGGACGTTTGTTTTTAAGGGTTTTTGGTGGG	#0<<BFFFFFFFFFIIFFIFFFIIIIIIIIIFIFFIIIIIIIIFFB<BFBBBFBBB<BBFFFF<BFBBBB<BBBFBFFBFFFBBBFF<BFFFBF<BBF	NM:i:21	XX:Z:A4C1C4C8C3C2C7C3C1C4C5C3CC5CC7C3C1C9CC7	XM:Z:.....h.x....h........h...x..x.......h...h.x....x.....x...hh.....hh.....Z.h...h.h.........hx.......	XR:Z:CT	XG:Z:CT
    This is screwing up picard markduplicates as the read names do not match anymore - I just want to clarify if this is a bug in bismark or should manually be removing tags after alignment or use read_names_regex in markduplicates?

    Thanks

    Leave a comment:


  • fkrueger
    replied
    Originally posted by Ahra View Post
    Bismark methylation extractor version v0.7.11
    HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498192 x
    HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498173 x
    HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 - Vr08 7498096 x
    These files are the primary output of the methylation extractor (described in the manual or if you type --help). The bedGraph will be generated from these files, and ends in *.bedGraph (just be aware that this step can take a long while if you are using --CX). The bedGraph can then be further transformed into a genome-wide cytosine report.

    To see if there is something going wrong on the way could you please send the onscreen dialog via email?

    Leave a comment:


  • Ahra
    replied
    bedGraph usage

    hello. I am a newbie using bismark. I'm not a bioinformation and also i don't know much about NGS alignment, especially how to running alignment programs. But, i could use bismark thanks to kindly well made bismark user guide book.

    i ran bismark and got the report file and sam file. so to get the position of methylC, i ran bismark_methylation_extractor especially used --bedGraph option to see methlylation %.

    According to user guide book,

    A typical command including the optional bedGraph --counts output could look like this:
    bismark_methylation_extractor -s --bedGraph --counts --buffer_size 10G s_1_sequence.txt_bismark.sam


    my data is paired-end so i edited little that one. like this:

    bismark_methylation_extractor -p --no_overlap --comprehensive --CX
    --bedGraph --counts --buffer_size 10G s_1_sequence.txt_bismark.sam


    but, i couldn't get additional information about <chromosome> <start position> <end position> <methylation percentage>

    my result was like thins

    Bismark methylation extractor version v0.7.11
    HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498192 x
    HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498173 x
    HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 - Vr08 7498096 x
    HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 -

    how can i get the methylation percentage information and run further bedGraph2Cytosines process!

    i am looking forward anyone's reply

    Leave a comment:


  • Julien Roux
    replied
    Solved: alignment on one strand only

    I am now facing another problem: when I run bismark of some of my samples, reads end up mapped to only the top or the bottom strand...
    Just a quick message to let users know that my problem is solved and was due to an error when generating the Bowtie index files for one of the strands. All index files were present, but one of them was incomplete, making the alignment impossible.
    Thanks Felix for your help
    Julien

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Recent Advances in Sequencing Analysis Tools
    by seqadmin


    The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
    05-06-2024, 07:48 AM
  • seqadmin
    Essential Discoveries and Tools in Epitranscriptomics
    by seqadmin




    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
    04-22-2024, 07:01 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 05-10-2024, 06:35 AM
0 responses
20 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-09-2024, 02:46 PM
0 responses
26 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-07-2024, 06:57 AM
0 responses
21 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-06-2024, 07:17 AM
0 responses
21 views
0 likes
Last Post seqadmin  
Working...
X