Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • twotwo
    replied
    Originally posted by fkrueger View Post
    A typical command to extract methylation calls and get a coverage file is:

    Code:
    bismark_methylation_extractor --bedGraph --buffer_size 10G file_bismark.bam
    Do you have any problems running this command, and if so can you post details?
    Hi, Thanks fkrueger! I ran your code. The following is the head what I have from the bedgraph file:

    track type=bedGraph
    chr11 110189 110190 100
    chr11 110211 110212 100
    chr11 113464 113465 100
    chr11 113508 113509 100
    chr11 113509 113510 100
    chr11 113524 113525 100
    chr11 113525 113526 100
    chr11 123420 123421 100
    chr11 123449 123450 100
    So my question is: Is there all of the probe having 100% methylation percentage? That is a little weird....
    By the way, I have another question: If I have paired samples, can I compare the methylation percentage with bismark? Thanks!

    Leave a comment:


  • fkrueger
    replied
    A typical command to extract methylation calls and get a coverage file is:

    Code:
    bismark_methylation_extractor --bedGraph --buffer_size 10G file_bismark.bam
    Do you have any problems running this command, and if so can you post details?

    Leave a comment:


  • twotwo
    replied
    Originally posted by Ahra View Post
    hello. I am a newbie using bismark. I'm not a bioinformation and also i don't know much about NGS alignment, especially how to running alignment programs. But, i could use bismark thanks to kindly well made bismark user guide book.

    i ran bismark and got the report file and sam file. so to get the position of methylC, i ran bismark_methylation_extractor especially used --bedGraph option to see methlylation %.

    According to user guide book,

    A typical command including the optional bedGraph --counts output could look like this:
    bismark_methylation_extractor -s --bedGraph --counts --buffer_size 10G s_1_sequence.txt_bismark.sam


    my data is paired-end so i edited little that one. like this:

    bismark_methylation_extractor -p --no_overlap --comprehensive --CX
    --bedGraph --counts --buffer_size 10G s_1_sequence.txt_bismark.sam


    but, i couldn't get additional information about <chromosome> <start position> <end position> <methylation percentage>

    my result was like thins

    Bismark methylation extractor version v0.7.11
    HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498192 x
    HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498173 x
    HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 - Vr08 7498096 x
    HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 -

    how can i get the methylation percentage information and run further bedGraph2Cytosines process!

    i am looking forward anyone's reply
    I have the same question as yours. Did you solve your problem? Then which command should I use the generate the methylation percentage?

    Leave a comment:


  • pig_raffles
    replied
    Excellent, thank you for the very clear explanation

    Leave a comment:


  • fkrueger
    replied
    The score min function is linear (L), the first number is offset (which will be added to the final score which is determined by the number of mismatches, Ns, or insertions and deletions in the read), and the last number is the penalty that is allowed for each base pair in the read. As an example for a 100bp long read:

    L,0,-0.2 would allow an alignment score of 0 + (100 x -0.2) = -20

    A mismatch counts as -6, so this setting would allow up to 3 mismatches (AS = -18). A read with 4 mismatches (-24) or more would be rejected.

    An insertion of 3 bp somewhere in the read would cost -5 for opening a gap, and -2 for each bp of insertion, so 3 * -2 = -6 here. Thus, a 3bp insertion would count as -11, so you could add another mismatch somewhere in the read, or another smallish InDel if you want to stay under the allowed limit of -20.

    If you were to increase score min to L,0,-0.6 would allow an alignment score of 0 + (100 x -0.6) = -60

    This would now allow up to 10 (non-bisulfite) mismatches in the 100bp read, or a several mismatches and indels.

    If your RRBS reads were trimmed well (e.g. with Trim Galore in --rrbs mode) then you should normally see that the mapping efficiency doesn't change a great deal for score min scores of -0.2 to -0.6, but then it might start to increase quite a bit because you allow reads to map to locations where the read most likely doesn't belong. If I had to choose between pretty much the same mapping efficiency for different mapping scenarios I would go for the more stringent one because this reduces the chances of introducing mis-alignments to the results.

    Best wishes, Felix

    Leave a comment:


  • fkrueger
    replied
    Originally posted by pig_raffles View Post
    Hi,

    I have been using Bismark to align RRBS read data to a reference genome. Up until now I have been using the default alignment settings with Bowtie2. However I want to find out if these are optimal

    One way of doing this would be to vary the -L and -N parameters, however I find it quite confusing understanding what these parameters actually do. The default for -L is L,0,-0.2 but I am not sure what varying the two different numeric values would actually achieve (despite having read the Bowtie2 page), for instance what would a more relaxed alignment parameter allowing more mismatches look like?

    Finally, does anyone have any tips on choosing the best alignment results from repeated alignment runs using different parameters?

    Many thanks
    The score min function is linear (L), the first number is offset (which will be added to the final score which is determined by the number of mismatches, Ns, or insertions and deletions in the read), and the last number is the penalty that is allowed for each base pair in the read. As an example for a 100bp long read:

    L,0,-0.2 would allow an alignment score of 0 + (100 x -0.2) = -20

    A mismatch counts as -6, so this setting would allow up to 3 mismatches (AS = -18). A read with 4 mismatches (-24) or more would be rejected.

    An insertion of 3 bp somewhere in the read would cost -5 for opening a gap, and -2 for each bp of insertion, so 3 * -2 = -6 here. Thus, a 3bp insertion would count as -11, so you could add another mismatch somewhere in the read, or another smallish InDel if you want to stay under the allowed limit of -20.

    If you were to increase score min to L,0,-0.6 would allow an alignment score of 0 + (100 x -0.6) = -60

    This would now allow up to 10 (non-bisulfite) mismatches in the 100bp read, or a several mismatches and indels.

    If your RRBS reads were trimmed well (e.g. with Trim Galore in --rrbs mode) then you should normally see that the mapping efficiency doesn't change a great deal for score min scores of -0.2 to -0.6, but then it might start to increase quite a bit because you allow reads to map to locations where the read most likely doesn't belong. If I had to choose between pretty much the same mapping efficiency for different mapping scenarios I would go for the more stringent one because this reduces the chances of introducing mis-alignments to the results.

    Leave a comment:


  • fkrueger
    replied
    The score min function is linear (L), the first number is offset (which will be added to the final score which is determined by the number of mismatches, Ns, or insertions and deletions in the read), and the last number is the penalty that is allowed for each base pair in the read. As an example for a 100bp long read:

    L,0,-0.2 would allow an alignment score of 0 + (100 x -0.2) = -20

    A mismatch counts as -6, so this setting would allow up to 3 mismatches (AS = -18). A read with 4 mismatches (-24) or more would be rejected.

    An insertion of 3 bp somewhere in the read would cost -5 for opening a gap, and -2 for each bp of insertion, so 3 * -2 = -6 here. Thus, a 3bp insertion would count as -11, so you could add another mismatch somewhere in the read, or another smallish InDel if you want to stay under the allowed limit of -20.

    If you were to increase score min to L,0,-0.6 would allow an alignment score of 0 + (100 x -0.6) = -60

    This would now allow up to 10 (non-bisulfite) mismatches in the 100bp read, or a several mismatches and indels.

    If your RRBS reads were trimmed well (e.g. with Trim Galore in --rrbs mode) then you should normally see that the mapping efficiency doesn't change a great deal for score min scores of -0.2 to -0.6, but then it might start to increase quite a bit because you allow reads to map to locations where the read most likely doesn't belong. If I had to choose between pretty much the same mapping efficiency for different mapping scenarios I would go for the more stringent one because this reduces the chances of introducing mis-alignments to the results.

    Leave a comment:


  • pig_raffles
    replied
    Hi,

    I have been using Bismark to align RRBS read data to a reference genome. Up until now I have been using the default alignment settings with Bowtie2. However I want to find out if these are optimal

    One way of doing this would be to vary the -L and -N parameters, however I find it quite confusing understanding what these parameters actually do. The default for -L is L,0,-0.2 but I am not sure what varying the two different numeric values would actually achieve (despite having read the Bowtie2 page), for instance what would a more relaxed alignment parameter allowing more mismatches look like?

    Finally, does anyone have any tips on choosing the best alignment results from repeated alignment runs using different parameters?

    Many thanks

    Leave a comment:


  • Kuckunniwi
    replied
    Originally posted by fkrueger View Post
    The alignment line is truncated, the methylation call is not quite long enough and the read does not contain information about the read or genome conversion which is required by the methylation extractor.
    Thanks for the answer! Actually the file is not truncated (it has a lot more lines while it will end exactly at this line in case of truncated file), most probably I was not able to copy the full line from the
    Code:
    less -S file.sam
    . Also, these files were processed once before. I will check if I accidentally truncated line after copying it and will edit this answer...

    UPD: yes, this .bam file does not contain any additional columns. I really wonder how they were processed before (I have bismark report for these files!) and how columns, not rows, may be truncated... ='(
    Last edited by Kuckunniwi; 04-11-2017, 08:24 AM.

    Leave a comment:


  • fkrueger
    replied
    Originally posted by Kuckunniwi View Post
    Hi,
    27 D00796:121:C9MR4ANXX:1:1103:15888:85556 1:N:0:GATCAG 16 chr10 71747 255 51M * 0 0 CTAAACAACATCACAAAACACTATCTCTATAATTTCTTTTTAAACCAAC CG 3<BBBGCGGG1FG1@F0EFGGGGEG1>1?1:@FGGGG1FC<=FBFFGGGFG NM:i:9 MD:Z:2AAA6CA3A2A2A24A3 XM:Z:Z..x........................x..z..h...h.......hhx..
    The alignment line is truncated, the methylation call is not quite long enough and the read does not contain information about the read or genome conversion which is required by the methylation extractor.

    Files can be truncated if the file was copied and the transfer corrupted the files, or potentially if the alignment process was terminated abruptly. This may happen either by pressing ctrl + C or something similar, or if the Bismark align process runs out of memory or gets killed by your OS.
    Normally it is sufficient to run the alignments again (making sure that you have enough system resources available). If you can see the final Bismark alignment report the file should be ready for the methylation extraction. Best, Felix

    Leave a comment:


  • Kuckunniwi
    replied
    Hi,

    I am trying to launch methylation_extractor and it does not like my files =( It says:

    Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1654, <IN> line 27.
    Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1657, <IN> line 27.
    Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1660, <IN> line 27.
    Use of uninitialized value $read_conversion in string eq at ./bismark_methylation_extractor line 1663, <IN> line 27.
    Use of uninitialized value $read_conversion in concatenation (.) or string at ./bismark_methylation_extractor line 1667, <IN> line 27.
    Use of uninitialized value $genome_conversion in concatenation (.) or string at ./bismark_methylation_extractor line 1667, <IN> line 27.
    Unexpected combination of read and genome conversion: '' / ''
    The line 27 is the first read of sam file:

    Code:
    27 D00796:121:C9MR4ANXX:1:1103:15888:85556 1:N:0:GATCAG    16      chr10   71747   255     51M     *       0       0       CTAAACAACATCACAAAACACTATCTCTATAATTTCTTTTTAAACCAAC     CG     3<BBBGCGGG1FG1@F0EFGGGGEG1>1?1:@FGGGG1FC<=FBFFGGGFG     NM:i:9  MD:Z:2AAA6CA3A2A2A24A3  XM:Z:Z..x........................x..z..h...h.......hhx..
    The methylation extractor worked well for me before (even for the same files), what can cause the problem? (I've tried to play with cmd modes, did not help...)

    The command line: ./bismark_methylation_extractor -s test.sam

    Thanks!
    Last edited by Kuckunniwi; 04-10-2017, 12:59 PM. Reason: added the command line

    Leave a comment:


  • seqfast
    replied
    Thank you Felix,

    I've been told to PCR primers are 'agnostic' to bisulfite conversion, but I believe there will still be some effects and a likelihood of some strand-favoring in the amplification.

    Appreciate your quick reply, and thanks for a very valuable tool!

    -sf

    Leave a comment:


  • fkrueger
    replied
    Originally posted by seqfast View Post
    Hi (hopefully) Felix,

    I'm working with amplicons generated from bisulfite converted DNA. The PCR primers are not methylation specific, so they will amplify both converted and non-converted strands. Standard TruSeq adapters were ligated and sequencing as single-end. As I understand it, this might still be 'directional' due to the TruSeq adapters.

    This is the SE report, which confuses me a little bit, since the bottom strand is seeing more coverage. As well, in amplicon regions many of the converted CpG bases are present in only one strand (either forward or reverse). In some samples there are no reverse reads, in others there are both forward and reverse reads. Thanks for any insights!

    Final Alignment report
    ======================
    Sequences analysed in total: 57545
    Number of alignments with a unique best hit from the different alignments: 36008
    Mapping efficiency: 62.6%
    Sequences with no alignments under any condition: 21427
    Sequences did not map uniquely: 110
    Sequences which were discarded because genomic sequence could not be extracted: 0

    Number of sequences with unique best (first) alignment came from the bowtie output:
    CT/CT: 1667 ((converted) top strand)
    CT/GA: 13421 ((converted) bottom strand)
    GA/CT: 1088 (complementary to (converted) top strand)
    GA/GA: 19832 (complementary to (converted) bottom strand)

    Final Cytosine Methylation Report
    =================================
    Total number of C's analysed: 448283

    Total methylated C's in CpG context: 22378
    Total methylated C's in CHG context: 6624
    Total methylated C's in CHH context: 5983

    Total unmethylated C's in CpG context: 9707
    Total unmethylated C's in CHG context: 221431
    Total unmethylated C's in CHH context: 182160

    C methylated in CpG context: 69.7%
    C methylated in CHG context: 2.9%
    C methylated in CHH context: 3.2%
    Hi seqfast,

    When you are generating and sequencing PCR amplified regions it is quite common that you see only the top strand (with both the OT and CTOT strands) or the bottom strand (with both OB and CTOB as in your case), depending on which strand you targeted when designing the primers.

    To help getting your head around this it really helps to draw the sequence of an amplicon of interest out on a sheet of paper. You should see that the bisulfite treatment will change one of the two strands so much that the oligos will only amplify one of the two strands, and this PCR product is then usually sequenced from both sides, e.g. the CTOB and OB strands, but this may be different depending on how the library preparation was done.

    So in short: PCR amplicons are not normally directional libraries but you only sequence both versions of either the top or the bottom strand, depending on how the primers were designed. I hope this helps, let me know if I can be of any further assistance. Cheers, Felix

    Leave a comment:


  • seqfast
    replied
    Directional/Non-directional amplicons

    Hi (hopefully) Felix,

    I'm working with amplicons generated from bisulfite converted DNA. The PCR primers are not methylation specific, so they will amplify both converted and non-converted strands. Standard TruSeq adapters were ligated and sequencing as single-end. As I understand it, this might still be 'directional' due to the TruSeq adapters.

    This is the SE report, which confuses me a little bit, since the bottom strand is seeing more coverage. As well, in amplicon regions many of the converted CpG bases are present in only one strand (either forward or reverse). In some samples there are no reverse reads, in others there are both forward and reverse reads. Thanks for any insights!

    Final Alignment report
    ======================
    Sequences analysed in total: 57545
    Number of alignments with a unique best hit from the different alignments: 36008
    Mapping efficiency: 62.6%
    Sequences with no alignments under any condition: 21427
    Sequences did not map uniquely: 110
    Sequences which were discarded because genomic sequence could not be extracted: 0

    Number of sequences with unique best (first) alignment came from the bowtie output:
    CT/CT: 1667 ((converted) top strand)
    CT/GA: 13421 ((converted) bottom strand)
    GA/CT: 1088 (complementary to (converted) top strand)
    GA/GA: 19832 (complementary to (converted) bottom strand)

    Final Cytosine Methylation Report
    =================================
    Total number of C's analysed: 448283

    Total methylated C's in CpG context: 22378
    Total methylated C's in CHG context: 6624
    Total methylated C's in CHH context: 5983

    Total unmethylated C's in CpG context: 9707
    Total unmethylated C's in CHG context: 221431
    Total unmethylated C's in CHH context: 182160

    C methylated in CpG context: 69.7%
    C methylated in CHG context: 2.9%
    C methylated in CHH context: 3.2%

    Leave a comment:


  • fkrueger
    replied
    Sorry for this, I have just fixed this typo. Please clone the latest development version here:

    A tool to map bisulfite converted sequence reads and determine cytosine methylation states - FelixKrueger/Bismark


    Cheers, Felix

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Strategies for Sequencing Challenging Samples
    by seqadmin


    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
    03-22-2024, 06:39 AM
  • seqadmin
    Techniques and Challenges in Conservation Genomics
    by seqadmin



    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

    Avian Conservation
    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
    03-08-2024, 10:41 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Yesterday, 06:37 PM
0 responses
10 views
0 likes
Last Post seqadmin  
Started by seqadmin, Yesterday, 06:07 PM
0 responses
9 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-22-2024, 10:03 AM
0 responses
50 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-21-2024, 07:32 AM
0 responses
67 views
0 likes
Last Post seqadmin  
Working...
X