  • fkrueger
    replied
    Originally posted by Dipro View Post
    Hi Felix,

    I am using the latest version of Bismark (v0.13.1) and have encountered some problems with the methylation extractor. First, I noticed that the first line of the coverage2cytosine script is '2#!/usr/bin/perl'. I assume this is a typo, so I changed it to '#!/usr/bin/perl'. Running the following command: bismark_methylation_extractor -s -o --samtools_path --bedGraph --counts --remove_spaces --buffer_size --cytosine_report --genome_folder, I keep getting the following error: 'Failed to read from file /path/to/file_fq.gz_bismark_bt2.bismark.cov: No such file or directory'. I double-checked my folder and could not find a .cov file. I thought the bismark2bedGraph script was supposed to generate a .cov file, so I am unsure what might have gone wrong. Could I please get some help with this issue?

    Thanks.
    Hi Dipro,

    This was indeed a typo which will be fixed in the next release which is actually due out today or tomorrow (and will finally support parallel alignments – so stay tuned!).

    A couple of things about the command you used:

    bismark_methylation_extractor -s -o --samtools_path --bedGraph --counts --remove_spaces --buffer_size --cytosine_report --genome_folder

    'Failed to read from file /path/to/file_fq.gz_bismark_bt2.bismark.cov: No such file or directory'
    Sorry if this is a stupid question, but did you replace '/path/to/file' with a valid path to the file on your system?

    -s: not necessary (will be determined automatically)
    -o /requires/path/to/output/folder
    --samtools_path /requires/path/to/samtools/executable
    --counts: not necessary (used by default)
    --remove_spaces: only use this if really necessary, will otherwise cost time and temporary space
    --buffer_size: requires input, e.g. 10G
    --genome_folder /requires/path/to/genome/folder

    input file is required

    If you are still struggling, could you send me the on-screen text via email? This would make spotting mistakes in the command much easier. Cheers, Felix
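The corrections above can be sketched as a small helper that assembles the call so that every option that takes a value actually receives one. This is only an illustration: the function name `build_extractor_command` and all paths in the usage note are hypothetical placeholders, and the flags are limited to the ones discussed in this post. `-s` and `--counts` are left out because they are determined automatically or used by default, and `--remove_spaces` is left out because it is only needed in special cases.

```python
from typing import List, Optional


def build_extractor_command(
    input_file: str,
    output_dir: str,
    genome_folder: str,
    samtools_path: Optional[str] = None,
    buffer_size: Optional[str] = None,
) -> List[str]:
    """Assemble a bismark_methylation_extractor call as an argument list.

    Hypothetical helper: it only arranges the flags named in the post and
    checks that the required values are not empty.
    """
    required = (
        ("input_file", input_file),
        ("output_dir", output_dir),
        ("genome_folder", genome_folder),
    )
    for name, value in required:
        if not value:
            raise ValueError(f"{name} must not be empty")

    cmd = ["bismark_methylation_extractor", "-o", output_dir]
    if samtools_path:
        # Only needed when samtools is not found automatically.
        cmd += ["--samtools_path", samtools_path]
    cmd.append("--bedGraph")
    if buffer_size:
        # Takes a value such as "10G".
        cmd += ["--buffer_size", buffer_size]
    # --cytosine_report needs the genome folder; the input file goes last.
    cmd += ["--cytosine_report", "--genome_folder", genome_folder, input_file]
    return cmd
```

For example, `" ".join(build_extractor_command("sample_bismark_bt2.bam", "results", "genome_dir", buffer_size="10G"))` gives a command line that can be pasted into a shell once the placeholder names are replaced with real paths.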


  • Dipro
    replied
    Hi Felix,

    I am using the latest version of Bismark (v0.13.1) and have encountered some problems with the methylation extractor. First, I noticed that the first line of the coverage2cytosine script is '2#!/usr/bin/perl'. I assume this is a typo, so I changed it to '#!/usr/bin/perl'. Running the following command: bismark_methylation_extractor -s -o --samtools_path --bedGraph --counts --remove_spaces --buffer_size --cytosine_report --genome_folder, I keep getting the following error: 'Failed to read from file /path/to/file_fq.gz_bismark_bt2.bismark.cov: No such file or directory'. I double-checked my folder and could not find a .cov file. I thought the bismark2bedGraph script was supposed to generate a .cov file, so I am unsure what might have gone wrong. Could I please get some help with this issue?

    Thanks.


  • fkrueger
    replied
    The default mode would pick the first alignment per position, while representative would always pick the most highly amplified and thus most biased one.


  • chxu02
    replied
    That matches my understanding. That's why I chose to run it in --representative mode. I'm not sure whether a random pick would introduce any bias.


  • fkrueger
    replied
    The default command would pick a random one for that location, the representative command would take the most highly amplified one. Both would only leave a single alignment for the position.
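The difference between the two modes can be illustrated with a minimal sketch. This is not the deduplicate_bismark code: the `Alignment` record and the `deduplicate` function are hypothetical, the grouping key (chromosome, start, strand) is a simplifying assumption for single-end reads, "most highly amplified" is modelled as the most frequent methylation call string, and the default mode is modelled as keeping the first alignment seen at each position.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    """Hypothetical, simplified alignment record."""
    chrom: str
    start: int
    strand: str
    meth_call: str  # per-base methylation call string


def deduplicate(alignments: List[Alignment], representative: bool = False) -> List[Alignment]:
    """Keep exactly one alignment per (chrom, start, strand) position."""
    groups: Dict[Tuple[str, int, str], List[Alignment]] = {}
    for aln in alignments:
        # Reads with the same position are duplicates, whatever their methylation state.
        groups.setdefault((aln.chrom, aln.start, aln.strand), []).append(aln)

    kept: List[Alignment] = []
    for group in groups.values():
        if representative:
            # Keep one read carrying the most frequent methylation call.
            top_call, _ = Counter(a.meth_call for a in group).most_common(1)[0]
            kept.append(next(a for a in group if a.meth_call == top_call))
        else:
            # Default in this sketch: keep the first alignment encountered.
            kept.append(group[0])
    return kept
```

Note that the sketch holds all alignments in memory before choosing, so it would not scale to a 40 GB SAM file either; a default-style pass could instead remember only the positions already seen.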


  • chxu02
    replied
    If two fragments have the same ends but distinct methylation state, would they be regarded as duplicates by this command?


  • fkrueger
    replied
    Yes that might be the case since in representative mode it first slurps the entire file into memory.

    In any case you should not be using --representative anyway unless you want to find the most highly represented PCR artefact in your data. Maybe I should simply remove it as an option because people keep getting it wrong. I would suggest rerunning it in the default mode.


  • chxu02
    replied
    Hi Felix,
    Sorry to keep bothering you, but I have hit one more obstacle. I'm running
    Code:
    deduplicate_bismark -p --representative XXX.sam
    I tried both the SAM file (~40 GB) generated by Bismark and the sorted BAM file generated by samtools. In both cases, "skipping SAM header lines" was printed quickly but nothing happened after that, even after running for 24 hours. My PC has 32 GB of RAM and 8 cores. Is that insufficient for running deduplicate_bismark in representative mode?


  • fkrueger
    replied
    Originally posted by chxu02 View Post
    Hi Felix,
    PS: Does OT necessarily mean + (Watson) strand?
    Yes, that's right.


  • fkrueger
    replied
    If you limit the input to CpG context it should be fairly quick, so just run bismark2bedGraph CpG*, followed by coverage2cytosine on the .cov file. You could then either use a clever awk command or a tiny script like the one attached to keep only the positions that were covered.
    Attached Files
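A filter along these lines might look like the following minimal sketch. It is not the attached script. It assumes the genome-wide cytosine report is tab-separated with the columns chromosome, position, strand, count methylated, count unmethylated, C-context and trinucleotide context; the function name `split_covered_by_strand` is hypothetical.

```python
from typing import Dict, Iterable, List


def split_covered_by_strand(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Keep only covered cytosines and sort them into '+' and '-' strand lists.

    Assumed columns: chrom, position, strand, count methylated,
    count unmethylated, C-context, trinucleotide context.
    """
    by_strand: Dict[str, List[str]] = {"+": [], "-": []}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        strand = fields[2]
        methylated, unmethylated = int(fields[3]), int(fields[4])
        # A position counts as covered if at least one read reported a call there.
        if methylated + unmethylated > 0 and strand in by_strand:
            by_strand[strand].append(line)
    return by_strand
```

Because the function accepts any iterable of lines, it can be fed an open file handle directly, so a large report is read line by line rather than loaded at once.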


  • chxu02
    replied
    Hi Felix,
    Is there a way to segregate cytosines into those on the + strand and those on the - strand? The "-/+" in the Bismark methylation extractor output files specifies methylation state rather than strand. I think generating the optional genome-wide cytosine report and then grepping would do the job, but that will generate tons of data... Any ideas?

    PS: Does OT necessarily mean + (Watson) strand?
    Last edited by chxu02; 02-25-2015, 01:09 PM.


  • fkrueger
    replied
    Hi kentawan,
    unless you are interested in looking at events for top and bottom strand separately you can indeed just merge the two outputs. You can then use this output to find DMRs.

    Some tools even go one step further and recommend that you merge the top and bottom strand information of a CpG dinucleotide. If you wanted to do this, you could use the option --merge_CpG of the coverage2cytosine script (version 0.13.1 required; look at the latest release notes here).
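To illustrate what merging the two strands of a CpG means, here is a minimal sketch. It is not how coverage2cytosine implements --merge_CpG: the function name `merge_cpg_strands` and the record layout are hypothetical, and the sketch assumes 1-based positions in which the bottom-strand cytosine of a CpG sits one base after the top-strand cytosine.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (chrom, position, strand, count methylated, count unmethylated)
Record = Tuple[str, int, str, int, int]


def merge_cpg_strands(records: List[Record]) -> List[Tuple[str, int, int, int, int]]:
    """Pool the counts of both strands of each CpG dinucleotide.

    Returns (chrom, start, end, methylated, unmethylated), where start is the
    top-strand C and end is the bottom-strand C (the G on the top strand).
    """
    merged: Dict[Tuple[str, int], List[int]] = {}
    for chrom, pos, strand, meth, unmeth in records:
        # Shift bottom-strand positions back by one so both strands share a key.
        start = pos if strand == "+" else pos - 1
        counts = merged.setdefault((chrom, start), [0, 0])
        counts[0] += meth
        counts[1] += unmeth
    return [
        (chrom, start, start + 1, meth, unmeth)
        for (chrom, start), (meth, unmeth) in sorted(merged.items())
    ]
```

Pooling the counts this way roughly doubles the coverage per CpG, at the cost of losing any strand-specific differences.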

    Regarding the question of whether 56.6% mapping efficiency is 'normal': this is difficult to answer, as it depends very much on what you have done. Among the factors that play a role are: genome used (repeat content), read length, rigorous adapter/quality trimming, single-end vs. paired-end, library strategy (directional, PBAT, enrichment, shotgun), contamination etc. As a guideline, for 100 bp single-end shotgun reads you would probably expect efficiencies of ~70-75% against the mouse and ~80-85% against the human genome (in Bowtie2 mode).


  • kentawan
    replied
    Hi All,

    I have gotten my final methyl_extractor output files based on my bismark output files which have a mapping efficiency of 56.6% (is this normal?).

    Right now I am stuck on the research relevance of OB and OT CpG methylation. If I just want to search for DMRs, can I merge OB and OT together to make a total CpG file?

    many thanks in advance.


  • jeni
    replied
    Thanks....


  • fkrueger
    replied
    I would certainly interpret it that way, yes. You could potentially also look at spike-in controls, but since 0.8% is your lowest value the efficiency must have been >= 99.1%.
