Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • my_bio
    replied
    Originally posted by fkrueger View Post
    I have now written something that generates a cytosine methylation report for a genome of interest and a sorted methylation input file produced by the script "genome_methylation2bedGraph". By default, the output uses 1-based chromosome coordinates (zero-based cords are optional) and reports CpG context only (all cytosines optional). The output considers all Cs on both forward and reverse strands.
    The input file needs to have been generated with the genome_methylation2bedGraph script while using the '--counts' option, or otherwise be exactly in the following format:
    <chromosome> <start position> <end position> <methylation percentage> <count methylated> <count unmethylated>
    I didn’t have time for extensive testing but it seems to work alright.

    The USAGE is: genome_wide_cytosine_report.pl [options] <input> > <output>, --help will show all options.

    The default output is:
    <Chromosome> <Position> <Strand> <Count methylated> <Count non-methylated> <Trinucleotide context>
    Also, there is an option ‘--dinucleotide_context’ to produce an additional column at the end with the di-nucleotide context. As the dinucleotide context is already contained within the tri-nucleotide context field and because it makes the output file considerably larger, this extra field is only optional.

    Some additional comments:
    - If this script turns out to be working nicely and this is useful for users, I could add to the methylation extractor an option to generate a bedGraph files or this output format straight away. I appreciate that it is somewhat tedious to run methylation_extractor > extractor2bedGraph > genome_wide_cytosine_report in succession, but I am sure this could be streamlined

    - You need to be aware that if you are dealing with extremely large datasets that cover almost the entire genome, e.g. 2 billion WGSBS-reads, the whole task becomes very memory intensive (probably up to 20GB or so) even though it is working on a chromosome-by-chromosome basis (large chromosomes may have well over 100 million Cs). This is a limitation of using standard Perl and the overheads used for its data structures, and one would have to look at using different approaches or a different programming language to do the task if this solution fails.

    - Running a binomial test on the methylated positions does not necessarily have anything to do with Bismark or any aligner per se (however I am not sure you want to call differential methylation on a per-cytosine basis instead of per feature or region of interest…). You could possibly even run the binomial test with the R script provided by the BiSS authors.

    Please let me know what your experiences are.
    Best,
    Felix


    Edit: I just realised that you were not interested in a separate dinucleotide context but in a CG/CHG/CHH classification which makes a lot more sense. I shall amend this tomorrow.

    It is very kind of you to provide this script to generates a cytosine methylation report. But it does not seems to work, something error like this:
    perl genome_wide_cytosine_report.pl --help
    Type of arg 1 to keys must be hash (not hash element) at genome_wide_cytosine_report.pl line 119, near "},"
    Type of arg 1 to keys must be hash (not hash element) at genome_wide_cytosine_report.pl line 206, near "},"
    BEGIN not safe after errors--compilation aborted at genome_wide_cytosine_report.pl line 289.

    In my opinion, It is completely necessary to add an option to generate a bedGraph files and cytosine methylation report straight away in methylation extractor .
    traditional dinucleotide context may not suitable for our samples, I need trinucleotide classification. As you said, I will call differential methylation on a per-cytosine basis instead of a region. It is will be more meaningful.

    Leave a comment:


  • fkrueger
    replied
    I have now written something that generates a cytosine methylation report for a genome of interest and a sorted methylation input file produced by the script "genome_methylation2bedGraph". By default, the output uses 1-based chromosome coordinates (zero-based cords are optional) and reports CpG context only (all cytosines optional). The output considers all Cs on both forward and reverse strands.
    The input file needs to have been generated with the genome_methylation2bedGraph script while using the '--counts' option, or otherwise be exactly in the following format:
    <chromosome> <start position> <end position> <methylation percentage> <count methylated> <count unmethylated>
    I didn’t have time for extensive testing but it seems to work alright.

    The USAGE is: genome_wide_cytosine_report.pl [options] <input> > <output>, --help will show all options.

    The output looks like this:
    <chromosome> <position> <strand> <count methylated> <count non-methylated> <C context> <trinucleotide context>

    Some additional comments:
    - If this script turns out to be working nicely and this is useful for users, I could add to the methylation extractor an option to generate a bedGraph files or this output format straight away. I appreciate that it is somewhat tedious to run methylation_extractor > extractor2bedGraph > genome_wide_cytosine_report in succession, but I am sure this could be streamlined

    - You need to be aware that if you are dealing with extremely large datasets that cover almost the entire genome, e.g. 2 billion WGSBS-reads, the whole task becomes very memory intensive (probably up to 20GB or so) even though it is working on a chromosome-by-chromosome basis (large chromosomes may have well over 100 million Cs). This is a limitation of using standard Perl and the overheads used for its data structures, and one would have to look at using different approaches or a different programming language to do the task if this solution fails.

    - Running a binomial test on the methylated positions does not necessarily have anything to do with Bismark or any aligner per se (however I am not sure you want to call differential methylation on a per-cytosine basis instead of per feature or region of interest…). You could possibly even run the binomial test with the R script provided by the BiSS authors.

    Please let me know what your experiences are.
    Best,
    Felix


    Edit: Script amended to also output sequence context.
    Attached Files
    Last edited by fkrueger; 09-21-2012, 01:38 AM. Reason: changed the default output

    Leave a comment:


  • my_bio
    replied
    Originally posted by fkrueger View Post
    This should be fairly straight forward to do. For a not covered CpG, would you envisage to have a record such as this:

    chr1 10 + 0 0 CG CGA
    chr1 11 - 0 0 CG CGT

    ?
    For a not covered CpG, I would like to have a record. That is to say, for different samples, as long as the reference genome is same, their output files will have the same number of lines. This will be more convenient in subsequent analysis, especially in the comparison of different samples. So I think Bismark need to add these functions. By the way, Bismark will be more popular if it can do methylcytosines calling by binomial test. BiSS(http://www.cibiv.at/software/ngm/BiSS/) can do all these works, but run this software need NVIDIA Accelerated Linux Graphics Driver. I look forward to update of Bismark to do more analysis.

    Leave a comment:


  • fkrueger
    replied
    Originally posted by my_bio View Post
    Dear fkrueger,
    genome_methylation_bismark2bedGraph_v4.pl is a useful program to count methylation of cytosine genome-wide. if a cytosine site did not covered one read, information of this cytosine will not appear in output file. So I need a program which can count methylation level of every cytosine site in reference genome, this will be more useful in subsequent analysis. the output file may like this:
    chromosome strand coordinate methylated_reads unmethylated_reads context context detaile_context
    chr1 10 + 4 6 CG CGA
    chr1 11 - 5 6 CG CGT
    chr1 15 + 1 11 CHG CAG
    chr1 17 - 2 10 CHG CTG
    chr1 20 + 0 12 CHH CAA
    Could you provide a program like this ?
    This should be fairly straight forward to do. For a not covered CpG, would you envisage to have a record such as this:

    chr1 10 + 0 0 CG CGA
    chr1 11 - 0 0 CG CGT

    ?

    Leave a comment:


  • drdna
    replied
    Good point my_bio.

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Strategies for Sequencing Challenging Samples
    by seqadmin


    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
    03-22-2024, 06:39 AM
  • seqadmin
    Techniques and Challenges in Conservation Genomics
    by seqadmin



    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

    Avian Conservation
    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
    03-08-2024, 10:41 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Yesterday, 06:37 PM
0 responses
7 views
0 likes
Last Post seqadmin  
Started by seqadmin, Yesterday, 06:07 PM
0 responses
7 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-22-2024, 10:03 AM
0 responses
49 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-21-2024, 07:32 AM
0 responses
66 views
0 likes
Last Post seqadmin  
Working...
X