  • PeteH
    replied
    Hi Felix,

    genome_methylation_bismark2bedGraph_v4.pl breaks when the --split_by_chromosome argument is used and the chromosome name contains special characters. For example, I aligned the test_data.fastq reads against a genome that contains the hg19 human reference genome as well as a contig for the unmethylated cI857 Sam7 Lambda genome, which is often used as a spike-in control in BS-seq experiments.

    The contig name of the lambda phage in the FASTA file is gi|215104|gb|J02459.1|LAMCG, and genome_methylation_bismark2bedGraph_v4.pl doesn't seem to like this. Specifically, I think the '|' characters in the contig name aren't being properly escaped; I've attached the output below.

    As the '|' character is not uncommon in FASTA contig names, would it be possible to fix this in the genome_methylation_bismark2bedGraph_v4.pl script?

    Thanks,
    Pete

    binfbig1 514 % genome_methylation_bismark2bedGraph_v4.pl --counts --s CpG_context_test_data.fastq_bismark.txt > bloop
    Now generating individual files for each chromosome (sorting very large files might fail otherwise...)
    Finished writing out individual chromosome files
    Collecting temporary chromosome file information...
    processing the following input file(s):
    chrchr1.meth_extractor.temp
    chrchr10.meth_extractor.temp
    chrchr11.meth_extractor.temp
    chrchr12.meth_extractor.temp
    chrchr13.meth_extractor.temp
    chrchr14.meth_extractor.temp
    chrchr15.meth_extractor.temp
    chrchr16.meth_extractor.temp
    chrchr17.meth_extractor.temp
    chrchr18.meth_extractor.temp
    chrchr19.meth_extractor.temp
    chrchr2.meth_extractor.temp
    chrchr20.meth_extractor.temp
    chrchr21.meth_extractor.temp
    chrchr22.meth_extractor.temp
    chrchr3.meth_extractor.temp
    chrchr4.meth_extractor.temp
    chrchr5.meth_extractor.temp
    chrchr6.meth_extractor.temp
    chrchr7.meth_extractor.temp
    chrchr8.meth_extractor.temp
    chrchr9.meth_extractor.temp
    chrchrM.meth_extractor.temp
    chrchrUn_gl000220.meth_extractor.temp
    chrchrX.meth_extractor.temp
    chrchrY.meth_extractor.temp
    chrgi|215104|gb|J02459.1|LAMCG.meth_extractor.temp

    Sorting input file chrchr1.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr1.meth_extractor.temp

    Sorting input file chrchr10.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr10.meth_extractor.temp

    Sorting input file chrchr11.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr11.meth_extractor.temp

    Sorting input file chrchr12.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr12.meth_extractor.temp

    Sorting input file chrchr13.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr13.meth_extractor.temp

    Sorting input file chrchr14.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr14.meth_extractor.temp

    Sorting input file chrchr15.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr15.meth_extractor.temp

    Sorting input file chrchr16.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr16.meth_extractor.temp

    Sorting input file chrchr17.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr17.meth_extractor.temp

    Sorting input file chrchr18.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr18.meth_extractor.temp

    Sorting input file chrchr19.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr19.meth_extractor.temp

    Sorting input file chrchr2.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr2.meth_extractor.temp

    Sorting input file chrchr20.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr20.meth_extractor.temp

    Sorting input file chrchr21.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr21.meth_extractor.temp

    Sorting input file chrchr22.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr22.meth_extractor.temp

    Sorting input file chrchr3.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr3.meth_extractor.temp

    Sorting input file chrchr4.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr4.meth_extractor.temp

    Sorting input file chrchr5.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr5.meth_extractor.temp

    Sorting input file chrchr6.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr6.meth_extractor.temp

    Sorting input file chrchr7.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr7.meth_extractor.temp

    Sorting input file chrchr8.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr8.meth_extractor.temp

    Sorting input file chrchr9.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchr9.meth_extractor.temp

    Sorting input file chrchrM.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchrM.meth_extractor.temp

    Sorting input file chrchrUn_gl000220.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchrUn_gl000220.meth_extractor.temp

    Sorting input file chrchrX.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchrX.meth_extractor.temp

    Sorting input file chrchrY.meth_extractor.temp by positions
    Successfully deleted the temporary input file chrchrY.meth_extractor.temp

    Sorting input file chrgi|215104|gb|J02459.1|LAMCG.meth_extractor.temp by positions
    sort: open failed: chrgi: No such file or directory
    sh: LAMCG.meth_extractor.temp: command not found
    sh: gb: command not found
    sh: J02459.1: command not found
    sh: 215104: command not found
    Died at /usr/local/bioinf/bin/genome_methylation_bismark2bedGraph_v4.pl line 162.
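
The errors in the log above are consistent with the temporary file name reaching a shell unquoted, so that each '|' is read as a pipe. The following is a minimal sketch of that failure mode and one way around it. It assumes, without confirmation from the thread, that the script builds the sort command as a single string; the sort flags and the function names are hypothetical and only serve the illustration.

```python
import shlex

# Temporary file name taken from the log above.
temp_file = "chrgi|215104|gb|J02459.1|LAMCG.meth_extractor.temp"

def unsafe_sort_command(path):
    # Hypothetical: the name is pasted into the command string unquoted,
    # so a shell would treat every '|' as a pipe between separate commands.
    return f"sort -k3,3n {path}"

def safe_sort_command(path):
    # shlex.quote wraps the name in single quotes so the shell sees one argument.
    return f"sort -k3,3n {shlex.quote(path)}"

# The separate commands a shell would try to run for the unquoted version.
pipeline_stages = [stage.strip() for stage in unsafe_sort_command(temp_file).split("|")]
print(pipeline_stages)
print(safe_sort_command(temp_file))
```

The split stages line up with the messages in the log: sort is handed only chrgi, and the remaining pieces are each looked up as commands. Passing the arguments as a list to subprocess.run, rather than as one string, would avoid the shell altogether.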

  • shadow19c
    replied
    Ok I sent you the email.
    Thanks

  • fkrueger
    replied
    Can you maybe email me some more details about your experiment to [email protected].

    It would be useful to know how many reads you have in total, whether it is single-end or paired end etc, the Bismark output format, the parameters you used and maybe the exact error message.

    Felix

  • shadow19c
    replied
    I will try the deduplication script on a simple data set.

  • shadow19c
    replied
    Thank you for your prompt answer.
    I tried deduplicate_bismark_alignment_output.pl, but it uses a lot of memory and stopped the server.

    It also wrote a lot of 0s to the file, so I would like to know whether there is an error in the script.

    Thanks
    Last edited by shadow19c; 10-19-2012, 04:40 AM.

  • fkrueger
    replied
    Originally posted by shadow19c
    Hello,
    Thanks for your answer; I have now discovered the Bismark methylation extractor.
    Does it make a difference if I run the deduplication before the methylation extractor?
    I would like to understand better how I can analyse the files afterwards.

    The deduplication only works on the mapping output; thus, you can run the methylation extractor either on the raw mapping output (containing duplicates) or on the deduplicated output (obviously not containing duplicates).
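
To make the idea of deduplicating the mapping output concrete, here is a minimal sketch. It is not the logic of deduplicate_bismark_alignment_output.pl, which this thread does not show; it assumes that a duplicate is a single-end alignment with the same chromosome, start position and strand as one already seen, and the tuple layout is hypothetical.

```python
def deduplicate(alignments):
    """Yield the first alignment seen at each (chromosome, start, strand)."""
    seen = set()
    for read_id, chrom, start, strand in alignments:
        key = (chrom, start, strand)
        if key in seen:
            continue  # a later alignment at an already seen position is dropped
        seen.add(key)
        yield read_id, chrom, start, strand

alignments = [
    ("read1", "chr1", 100, "+"),
    ("read2", "chr1", 100, "+"),  # same position and strand as read1: duplicate
    ("read3", "chr1", 100, "-"),  # same position, other strand: kept
    ("read4", "chr2", 100, "+"),
]
unique = list(deduplicate(alignments))
print(unique)
```

The methylation extractor could then be run on either the original list or the deduplicated one, which is the choice described above. Note that the set of seen positions grows with the number of distinct alignment positions, so memory use in a sketch like this scales with the data.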

    There are lots of ways of looking at and interpreting methylation data afterwards, and it very much depends on what you are confident and familiar with and on the biological questions you would like to answer. I already mentioned that we mainly use SeqMonk for our data analysis, but there are numerous tools that are specifically designed for analysing methylation data, such as methylKit.

    Christoph Bock has very recently published a nice review in Nature Genetics on this topic which is probably a good starting point (Analysing and interpreting DNA methylation data).

  • shadow19c
    replied
    Hello,
    Thanks for your answer; I have now discovered the Bismark methylation extractor.
    Does it make a difference if I run the deduplication before the methylation extractor?
    I would like to understand better how I can analyse the files afterwards.
    Last edited by shadow19c; 10-18-2012, 07:10 AM.

  • fkrueger
    replied
    Originally posted by shadow19c
    Hello, thank you fkrueger for your answer.

    To summarise: to analyse data from BS-seq, I started with a FastQC analysis, and then did the mapping with Bismark using the default parameters for paired-end reads.
    The problem is the next step, the deduplication (I did not see the command line for it), and the downstream analysis (how to handle the coverage: is it better to look at horizontal or vertical coverage? And if it is vertical, how do you do that? Is there a method or script?)

    Thanks

    If you want to deduplicate the alignment output you can download a deduplication script here; just run it with --help to see all options.

    As I said, we personally use SeqMonk for downstream analysis. SeqMonk is a mapped read genome browser which has extensive capabilities to visualize, quantitate and export data. For BS-Seq we mainly first run a sliding window read coverage analysis to exclude regions with too high a read coverage (mainly caused by repetitive reads that are not part of the genome assembly), and then use the "Bisulfite methylation over Feature pipeline" to calculate percentage methylation values for different genomic features of interest. This pipeline allows you to filter on read coverage per position (vertical coverage) as well as on events per feature (horizontal coverage). If you are interested in using SeqMonk, may I refer you to the Standard and Advanced course manuals, which explain a great deal of its functionality.
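
The two coverage filters mentioned above can be sketched in a few lines. This is not SeqMonk's pipeline, only a rough illustration under stated assumptions: methylation calls are given as hypothetical (chromosome, position, is_methylated) tuples, "vertical coverage" is taken to mean the number of calls at one position, and "horizontal coverage" is taken to mean the number of covered positions inside a feature.

```python
from collections import defaultdict

def percent_methylation(calls, min_coverage):
    """Per-position % methylation, keeping positions with enough calls."""
    counts = defaultdict(lambda: [0, 0])  # position -> [methylated, unmethylated]
    for chrom, pos, is_methylated in calls:
        counts[(chrom, pos)][0 if is_methylated else 1] += 1
    result = {}
    for key, (meth, unmeth) in counts.items():
        total = meth + unmeth
        if total >= min_coverage:  # vertical coverage filter
            result[key] = 100.0 * meth / total
    return result

def feature_methylation(per_position, chrom, start, end, min_positions):
    """Mean % methylation over a feature, requiring enough covered positions."""
    values = [v for (c, p), v in per_position.items() if c == chrom and start <= p <= end]
    if len(values) < min_positions:  # horizontal coverage filter
        return None
    return sum(values) / len(values)

calls = [
    ("chr1", 10, True), ("chr1", 10, True), ("chr1", 10, False), ("chr1", 10, True),
    ("chr1", 20, False), ("chr1", 20, False), ("chr1", 20, True), ("chr1", 20, False),
    ("chr1", 30, True),  # only one call: removed by the vertical filter
]
per_position = percent_methylation(calls, min_coverage=4)
print(per_position)
print(feature_methylation(per_position, "chr1", 1, 100, min_positions=2))
```

Raising min_coverage makes each per-position value more reliable at the cost of discarding positions; raising min_positions does the same at the level of whole features.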

  • shadow19c
    replied
    Hello, thank you fkrueger for your answer.

    To summarise: to analyse data from BS-seq, I started with a FastQC analysis, and then did the mapping with Bismark using the default parameters for paired-end reads.
    The problem is the next step, the deduplication (I did not see the command line for it), and the downstream analysis (how to handle the coverage: is it better to look at horizontal or vertical coverage? And if it is vertical, how do you do that? Is there a method or script?)

    Thanks

  • fkrueger
    replied
    Originally posted by shadow19c
    Does the --directional option make a difference for the methylation extractor?

    The methylation extractor does not care how the files were analysed, and will therefore make files for all possible strands. An ' rm *CTO[BT]* ' will get rid of all empty complementary strands if you don't need them.
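
To see which files a pattern like *CTO[BT]* would select before deleting anything, the match can be previewed. The sketch below uses illustrative file names; the real names depend on the input file and the Bismark version, so treat them as assumptions.

```python
from fnmatch import fnmatchcase

# Illustrative file names only; not taken from the thread.
files = [
    "CpG_OT_sample.txt",
    "CpG_OB_sample.txt",
    "CpG_CTOT_sample.txt",
    "CpG_CTOB_sample.txt",
]

# Same pattern as in the rm command above: "CTO" followed by either B or T.
pattern = "*CTO[BT]*"
to_delete = [name for name in files if fnmatchcase(name, pattern)]
to_keep = [name for name in files if not fnmatchcase(name, pattern)]
print(to_delete)
print(to_keep)
```

The pattern selects files by name only, so it would also remove a CTOT or CTOB file that is not empty; it is worth checking file sizes first if the library might not be directional.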

  • shadow19c
    replied
    Does the --directional option make a difference for the methylation extractor?

  • fkrueger
    replied
    Originally posted by shadow19c
    Hello,

    2) I have a question concerning the vertical coverage: how do I do that after the mapping and the filtering?

    Thanks

    We primarily use SeqMonk for downstream analysis, which lets you identify and exclude regions with too high a read coverage from subsequent quantitations.

  • shadow19c
    replied
    Hello,
    thank you for your answer. I did the mapping with the default parameters:
    Bismark report for: /data/a2e/kassam/BS-seq-WT/1.fq and /data/a2e/kassam/BS-seq-WT/2.fq (version: v0.7.7)
    Bowtie was run against the bisulfite genome of /import/gr_a2e/TAIR9/ with the specified options: -q -n 1 -k 2 --best --maxins 500 --chunkmbs 512

    1) Is it normal to get just one SAM file? I only have 1.fq_bismark_pe.sam.

    -------------

    Sorry, I found the answer: yes, it is.

    ------------------------------------------------------


    2) I have a question concerning the vertical coverage: how do I do that after the mapping and the filtering?

    Thanks
    Last edited by shadow19c; 10-14-2012, 11:52 PM.

  • fkrueger
    replied
    I would personally use the defaults to start with (0-500 bp), since the size selection step often does not quite do what you would expect. Only come back and change them if you are trying to track down errors such as low mapping efficiency.

  • shadow19c
    replied
    Hello,
    I have a question concerning the mapping parameters: what are the best settings if each paired-end read is 90 bp long?
    I ask because I have seen -I 150 and -X 300.
