
  • If you limit the input to CpG context it should be fairly quick: just run bismark2bedGraph on the CpG* files, followed by coverage2cytosine on the resulting .cov file. You could then use either a clever awk command or a tiny script like the one attached to keep only the positions that were actually covered.
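
The filtering step mentioned above could look roughly like the following Python sketch. It is not the attached script. It assumes the coverage2cytosine report is tab-separated with the methylated count in column 4 and the unmethylated count in column 5; the example lines are made up for illustration.

```python
import io

def covered_positions(report_lines):
    """Yield only report lines where at least one read covered the cytosine."""
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        # Assumed layout: chrom, position, strand, count methylated, count unmethylated, ...
        methylated, unmethylated = int(fields[3]), int(fields[4])
        if methylated + unmethylated > 0:
            yield line

# Made-up example input standing in for a real report file
example = io.StringIO(
    "chr1\t100\t+\t3\t1\tCG\tCGA\n"
    "chr1\t101\t-\t0\t0\tCG\tCGT\n"
    "chr1\t250\t+\t0\t5\tCG\tCGG\n"
)
kept = list(covered_positions(example))
for line in kept:
    print(line, end="")
```

For a real report, the `io.StringIO` object would be replaced by an open file handle; because the function streams line by line, memory use stays small even for whole-genome reports.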
    Comment


    • Originally posted by chxu02 View Post
      Hi Felix,
      PS: Does OT necessarily mean + (Watson) strand?
      Yes, that's right.

      Comment


      • Hi Felix,
        Sorry to keep bothering you, but I have hit one more obstacle. I'm running
        Code:
        deduplicate_bismark -p --representative XXX.sam
        I tried both the SAM file (~40 GB) generated by Bismark and the sorted BAM generated by samtools. In both cases, "skipping SAM header lines" was printed quickly, but nothing happened after that, even after 24 hours of running. My PC has 32 GB of RAM and 8 cores. Is that insufficient for running deduplicate_bismark in representative mode?

        Comment


        • Yes that might be the case since in representative mode it first slurps the entire file into memory.

          In any case you should not be using --representative anyway unless you want to find the most highly represented PCR artefact in your data. Maybe I should simply remove it as an option because people keep getting it wrong. I would suggest rerunning it in the default mode.

          Comment


          • If two fragments have the same ends but distinct methylation state, would they be regarded as duplicates by this command?

            Comment


            • The default command would pick a random one for that location, the representative command would take the most highly amplified one. Both would only leave a single alignment for the position.

              Comment


              • That matches my understanding, and it is why I chose to run it in --representative mode. I am not sure whether a random pick would introduce any bias.

                Comment


                • The default mode would pick the first alignment per position, while representative would always pick the most highly amplified and thus most biased one.
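
To make the difference between the two modes concrete, here is a small Python sketch. It is only an illustration of the behaviour described in this thread, not the actual deduplicate_bismark code; the tuple layout and the methylation call strings are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical alignments: (chromosome, start, end, strand, methylation_call)
alignments = [
    ("chr1", 100, 150, "+", "Z..z"),
    ("chr1", 100, 150, "+", "z..z"),
    ("chr1", 100, 150, "+", "z..z"),
    ("chr2", 500, 550, "-", "Z..Z"),
]

def dedup_first(records):
    """Default-like behaviour: keep the first alignment seen per position."""
    seen = set()
    kept = []
    for rec in records:
        position = rec[:4]
        if position not in seen:
            seen.add(position)
            kept.append(rec)
    return kept

def dedup_representative(records):
    """Representative-like behaviour: keep the most frequent call per position.

    Every record has to be grouped before any decision can be made, which is
    why this style of deduplication needs far more memory.
    """
    by_position = defaultdict(Counter)
    for rec in records:
        by_position[rec[:4]][rec[4]] += 1
    kept = []
    for position, calls in by_position.items():
        call, _count = calls.most_common(1)[0]
        kept.append(position + (call,))
    return kept

first = dedup_first(alignments)
representative = dedup_representative(alignments)
print(first)
print(representative)
```

In this toy input the first alignment at chr1:100-150 carries the call `Z..z`, while the most amplified call is `z..z`, so the two modes keep different alignments for the same position. Both leave exactly one alignment per position.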

                  Comment


                  • Hi Felix,

                    I am using the latest version of Bismark (v0.13.1) and have encountered some problems with the methylation extractor. First, I noticed that the first line of the coverage2cytosine script is '2#!/usr/bin/perl'. I assume this is a typo, so I changed it to '#!/usr/bin/perl'. Running the following command: bismark_methylation_extractor -s -o --samtools_path --bedGraph --counts --remove_spaces --buffer_size --cytosine_report --genome_folder, I keep getting the following error: 'Failed to read from file /path/to/file_fq.gz_bismark_bt2.bismark.cov: No such file or directory'. I double-checked my folder and could not find a .cov file. I thought the bismark2bedGraph script was supposed to generate a .cov file, so I am unsure what might have gone wrong. Could I please get some help with this issue?

                    Thanks.

                    Comment


                    • Originally posted by Dipro View Post
                      Hi Felix,

                      I am using the latest version of Bismark (v0.13.1) and have encountered some problems with the methylation extractor. First, I noticed that the first line of the coverage2cytosine script is '2#!/usr/bin/perl'. I assume this is a typo, so I changed it to '#!/usr/bin/perl'. Running the following command: bismark_methylation_extractor -s -o --samtools_path --bedGraph --counts --remove_spaces --buffer_size --cytosine_report --genome_folder, I keep getting the following error: 'Failed to read from file /path/to/file_fq.gz_bismark_bt2.bismark.cov: No such file or directory'. I double-checked my folder and could not find a .cov file. I thought the bismark2bedGraph script was supposed to generate a .cov file, so I am unsure what might have gone wrong. Could I please get some help with this issue?

                      Thanks.
                      Hi Dipro,

                      This was indeed a typo which will be fixed in the next release which is actually due out today or tomorrow (and will finally support parallel alignments – so stay tuned!).

                      A couple of things about the command you used:

                      bismark_methylation_extractor -s -o --samtools_path --bedGraph --counts --remove_spaces --buffer_size --cytosine_report --genome_folder

                      'Failed to read from file /path/to/file_fq.gz_bismark_bt2.bismark.cov: No such file or directory'
                      Sorry if this is a stupid question, but did you replace '/path/to/file' with a valid path to the file on your system?

                      -s: not necessary (will be determined automatically)
                      -o /requires/path/to/output/folder
                      --samtools_path /requires/path/to/samtools/executable
                      --counts: not necessary (used by default)
                      --remove_spaces: only use this if really necessary, will otherwise cost time and temporary space
                      --buffer_size: requires input, e.g. 10G
                      --genome_folder /requires/path/to/genome/folder

                      input file is required

                      If you still struggle can you just send me the onscreen-text via email? This would make spotting mistakes in the command much easier. Cheers, Felix
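
Putting the notes above together, a corrected call might be assembled as in the Python sketch below. The flags are the ones discussed in this post; every path and the input file name are hypothetical placeholders that must be replaced with real locations on your system. The options -s, --counts and --remove_spaces are left out, following the advice above.

```python
import shlex

def build_extractor_command(input_file, output_dir, samtools_path,
                            genome_folder, buffer_size="10G"):
    """Return the argument list for a methylation extractor run."""
    return [
        "bismark_methylation_extractor",
        "-o", output_dir,                  # needs an output folder
        "--samtools_path", samtools_path,  # needs the samtools location
        "--bedGraph",
        "--buffer_size", buffer_size,      # needs a value, e.g. 10G
        "--cytosine_report",
        "--genome_folder", genome_folder,  # needs the genome folder
        input_file,                        # the alignment file is required
    ]

cmd = build_extractor_command(
    input_file="sample_bismark_bt2.bam",   # hypothetical file name
    output_dir="/path/to/output",          # placeholder
    samtools_path="/path/to/samtools",     # placeholder
    genome_folder="/path/to/genome",       # placeholder
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` once the placeholders point at real files.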

                      Comment


                      • Bismark finally supporting parallel alignments

                        We would like to announce that a new version of Bismark (v0.14.0) has just been released. This version adds a parallelization switch to the Bismark alignment step, and also changes a couple of other issues detailed below:

                        o Bismark: Finally added parallelization to the Bismark alignment step using the option '--multicore int', which sets the number of parallel instances of Bismark to be run concurrently. At least in this first distribution this is achieved by forking the Bismark alignment step very early on so that each individual Spawn of Bismark (SoB?) processes only every n-th sequence (n being set by --multicore). Once all processes have completed, the individual BAM files, mapping reports, and unmapped or ambiguous FastQ files are merged into single files in very much the same way as they would have been generated running Bismark conventionally with only a single instance.

                        If system resources are plentiful this is a viable option to speed up the alignment process (we observed a near-linear speed increase for up to --multicore 8 tested so far). However, please note that a typical Bismark run will use several cores already (Bismark itself, 2 or 4 threads of Bowtie/Bowtie2, Samtools, gzip etc.) and ~10-16 GB of memory depending on the choice of aligner and genome. WARNING: Bismark Parallel (BP?) is resource hungry! Each value of --multicore specified will effectively lead to a linear increase in compute and memory requirements, so --multicore 4 for e.g. the GRCm38 mouse genome will probably use ~20 cores and eat ~40 GB of RAM, but at the same time reduce the alignment time to ~25-30%. You have been warned...

                        o Bismark: Changed the default output to BAM. SAM output may be requested using the option --sam

                        o Bismark: No longer generates a piechart (.png) with the alignment stats. bismark2report generates a much nicer report anyway

                        o Methylation Extractor: To detect paired-end alignment mode from the @PG header line, white spaces before and after -1 and -2 are now required. In some instances files containing e.g. -1-2 in their filename might previously have been identified as paired-end incorrectly

                        o deduplicate_bismark: To detect paired-end alignment mode from the @PG header line, white spaces before and after -1 and -2 are now required

                        o deduplicate_bismark: Added option --version so that Clusterflow can report a version number

                        o bismark2bedGraph: Fixed path handling for cases where the input files were given with path information and an output directory had been specified as well

                        o coverage2cytosine: Fixed a typo in the shebang which prevented coverage2cytosine from running

                        Even though we have tried out several corner cases this release is still somewhat experimental and we would appreciate any comments! Bismark can be downloaded from the Babraham Bioinformatics website.
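
The "every n-th sequence" scheme and the linear resource scaling described in the first item can be illustrated with a short Python sketch. This is not Bismark's code; the per-instance figures of 5 cores and 10 GB are back-of-the-envelope assumptions derived from the "--multicore 4 uses ~20 cores and ~40 GB" example in the announcement.

```python
def split_every_nth(reads, n_instances):
    """Give instance k the reads at positions k, k + n, k + 2n, ..."""
    return [reads[offset::n_instances] for offset in range(n_instances)]

def estimate_resources(multicore, cores_per_instance=5, ram_gb_per_instance=10):
    """Rough linear estimate of (cores, RAM in GB); per-instance values are assumed."""
    return multicore * cores_per_instance, multicore * ram_gb_per_instance

reads = [f"read_{i}" for i in range(10)]  # stand-in for FastQ records
chunks = split_every_nth(reads, 4)

# Merging here is a simple concatenation of the per-instance results
merged = [read for chunk in chunks for read in chunk]

cores, ram_gb = estimate_resources(4)
print(chunks)
print(cores, ram_gb)
```

Every read lands in exactly one chunk, so the merged result contains the same reads as a single-instance run, although not necessarily in the original order.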

                        Comment


                        • Hello, I am using the latest Bismark release. I tried to run the multicore command

                          bismark --multicore 4 <genome> -n 1 <filename>

                          The result is four split sam.gz files (I haven't installed samtools). After that, I tried to call the methylation extractor

                          bismark_methylation_extractor -p --comprehensive *.sam.gz

                          The *.sam.gz pattern is meant to select all four sam.gz files, but this failed. Which file should I pass to the methylation extractor? Thank you.


                          Note:
                          I noticed that there is also one .bam file. I don't know how it was generated because I haven't installed samtools, but it is only 16 KB in size and contains nothing.
                          I also noticed that the alignment was run as if the data were single-end, whereas my data is paired-end. How can I set up the alignment so that it uses paired-end mode?
                          Last edited by barbarian; 03-15-2015, 05:53 PM.

                          Comment


                          • Hi Barbarian,

                            I am afraid the problem you are seeing is indeed caused by the fact that you don't have Samtools installed. In a bid to get multicore processing working in a reasonable time I assumed that everyone is running Samtools already so it is currently not designed to deal with sam.gz files.

                            Just as a heads-up, the initial release (v0.14.0) also doesn't deal correctly with some corner cases such as the --gzip option for temp files (which is already fixed in the development version), and the -B (basename) option (which is yet to be looked at).

                            So bottom line: if you install Samtools and don't try to run all the corner cases at the same time (--gzip, -B), it should work nicely.

                            Comment


                            • Hi Felix,

                              I'm reporting a bug in v0.14.0. When I used gzipped FASTQ input to run bismark --multicore, Bismark failed at the end to merge the separate files into one. The files were initially named *.fastq.gz_*, but at the end of the run Bismark tried to merge files named *.fastq_*, which of course failed. Hope it helps.

                              Youyou

                              Comment


                              • Have you specified --gzip for this run?

                                Comment
