  • Originally posted by lancelothk
    Hi fkrueger,

    I am currently trying to use Bismark to analyse a huge BS-seq dataset in an HPC environment. I am thinking of splitting the big fastq file into smaller pieces, running Bismark on each of them on a separate node, and then merging the BAM results back together. Do you have any suggestions on how I can merge the Bismark reports? Do you have an existing script to do this?

    Thanks.
    Hi Lanceloth,

    As it stands there is no stand-alone script to merge the mapping reports, but the code for it is essentially all there, since it is used for --multicore runs anyway. The two subroutines 'merge_individual_mapping_reports' and 'read_alignment_report' should contain everything. Let me know if you need help merging these into a stand-alone script.
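    For illustration only, here is a minimal Python sketch of what a stand-alone merger could look like. It is not Bismark's merge_individual_mapping_reports code: it assumes each report line has the form "<label>:\t<value>", sums the integer counts across chunks, and rebuilds the mapping efficiency from the summed totals. The helper names and the label substrings it searches for are hypothetical.

```python
import re
from collections import OrderedDict

# Assumed report layout: one statistic per line as "<label>:\t<value>",
# e.g. "Sequences analysed in total:\t1000". Percentage lines such as
# "Mapping efficiency:\t54.7%" are deliberately not matched.
COUNT_LINE = re.compile(r"^([^:\t][^:]*):\t(\d+)\s*$")


def parse_counts(report_text):
    """Return label -> integer count for one mapping report."""
    counts = OrderedDict()
    for line in report_text.splitlines():
        match = COUNT_LINE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def merge_reports(report_texts):
    """Sum the integer counts of several reports (one per fastq chunk)."""
    merged = OrderedDict()
    for text in report_texts:
        for label, value in parse_counts(text).items():
            merged[label] = merged.get(label, 0) + value
    return merged


def mapping_efficiency(merged):
    """Recompute mapping efficiency (%) from the summed counts.

    Percentages cannot simply be added, so they are rebuilt from totals.
    The label substrings below are assumptions about the report wording.
    """
    total = next(v for k, v in merged.items() if "analysed in total" in k)
    unique = next(v for k, v in merged.items() if "unique best hit" in k)
    return 100.0 * unique / total if total else 0.0
```

    Methylation percentages in the report would need to be rebuilt from the summed C counts in the same way rather than averaged across chunks.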

    Just a short word of warning: when you merge paired-end BAM files with samtools merge, you need to make sure that the merged file is subsequently sorted by read name; otherwise the two reads of a pair are not guaranteed to follow each other line by line. Would the --multicore option perhaps be more feasible?
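    After samtools merge followed by a name sort (samtools sort -n), a quick sanity check is to confirm that each read name occupies two consecutive records. The sketch below is a hypothetical helper, not part of Bismark or samtools; it assumes SAM text input (e.g. from samtools view), exactly one alignment per mate, and that both mates carry the same QNAME.

```python
def mates_are_adjacent(sam_lines):
    """Return True if every read name occupies exactly two consecutive records.

    Assumes paired-end SAM text with one alignment per mate and identical
    QNAMEs for both mates; header lines starting with '@' are skipped.
    """
    names = [line.split("\t", 1)[0] for line in sam_lines
             if line and not line.startswith("@")]
    if len(names) % 2:
        return False  # an odd number of records cannot all be paired
    seen = set()
    for i in range(0, len(names), 2):
        # Mates must share a QNAME, sit next to each other, and appear once.
        if names[i] != names[i + 1] or names[i] in seen:
            return False
        seen.add(names[i])
    return True
```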

    Best, Felix



    • Thank you so much, Felix. I will take a look at the source code.



      • Hi fkrueger,

        I found two minor issues in Bismark v0.14.3.
        First, deduplicate_bismark gives the following errors with the --representative option:
        Failed to close output filehandle: Bad file descriptor
        Failed to close report filehandle: Bad file descriptor

        I found that this is caused by a bug at line 548: the closing } should come after the two close lines, since OUT and REPORT have already been closed in deduplicate_representative().

        Second, the -B/--basename <basename> option can be found in the script, but not in the PDF version of the manual.

        BTW, I have finished extracting the report-merging code into a stand-alone script. The most painful part was the global variables...

        Thanks.



        • Thanks for pointing out these issues. I have updated the manual and removed the superfluous closing statements. They will find their way into a new release which we'll be releasing later today.

          Edit: Just a quick word of warning: the --representative mode is almost certainly not what you want to use, because it will find the most highly amplified, and thus biased, sequence for a given position instead of a random one. I will probably hide it from use in the next release...
          Last edited by fkrueger; 08-19-2015, 01:19 AM.



          • Bismark v0.14.4. New functionality and allele-specific alignment support

            We have just released a new Bismark version (v0.14.4). This brings a few convenience features, adds some options and also fixes some bugs; further details are outlined below.

            It is also worth mentioning that it should now be possible to use Bismark in conjunction with SNPsplit to align Bisulfite-Seq data in an allele-specific fashion against an N-masked genome if both genotypes are known. More information about this may be found on the SNPsplit project page.

            o Bismark: Changed the FLAG values of paired-end alignments to the CTOT or CTOB strands so that reads can be properly displayed in SeqMonk when imported as BAM files. This change affects only paired-end alignments in --pbat or --non_directional mode. In detail, we simply swapped the Read 1 and Read 2 FLAG values round so that reads now look exactly like concordant read pairs aligned to the OT or OB strands. Note that results produced by the methylation extractor, or anything further downstream of it, are not affected by this change
            o Bismark: FastA input files specified with file path information are now handled properly in --multicore runs (this was fixed only for FastQ files in the previous patch)
            o Bismark: Unmapped and ambiguous files (options --unmapped and --ambiguous) are now written out as gzip compressed files by default
            o Bismark: Changed the default mode of operation to --bowtie2. Bowtie (1) alignments may still be chosen using the option --bowtie1

            o Bismark Genome Preparation: The genome indexing in the parent process is now executed via a system() rather than an exec() call, since exec() seemed to lead to interesting faults when run in a pipeline setting
            o Bismark Genome Preparation: Changed the default indexing mode to --bowtie2. Bowtie (1) indexing is still available via the option --bowtie1

            o bismark2bedGraph: The coverage (.cov) and bedGraph (.bedGraph) files are now written out as gzip compressed files by default

            o coverage2cytosine: Added the new option '--gc/--gc_context' to reprocess the genome to find methylation in GpC context. This might be useful for specialist applications where GpC methylases have been employed. The output format is exactly the same as for the normal cytosine report, and only positions covered by at least one read are reported (output file ends in .GpC_report.txt). In addition, this will write out a Bismark coverage file (ending in GpC.cov)

            o deduplicate_bismark: Removed redundant closing statements to get rid of warning messages
            o deduplicate_bismark: The option --representative is no longer displayed in the help text. The option was once useful for determining the PCR bias that had been introduced by over-digestion with bisulfite, and it is almost never what should be used for deduplication (it will be left in and is still functional for the time being, though)
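            Since the .cov files (including the new GpC.cov output) are now gzip-compressed by default, downstream scripts need to open them transparently. A minimal sketch, assuming the tab-separated Bismark coverage layout of chromosome, start, end, methylation percentage, count methylated and count unmethylated; read_coverage is a hypothetical helper name:

```python
import gzip


def read_coverage(path):
    """Yield records from a Bismark-style coverage file, gzipped or not.

    Assumed column layout (tab-separated): chromosome, start, end,
    methylation percentage, count methylated, count unmethylated.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.strip():
                continue  # tolerate trailing blank lines
            chrom, start, end, pct, meth, unmeth = line.rstrip("\n").split("\t")
            yield chrom, int(start), int(end), float(pct), int(meth), int(unmeth)
```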

            Bismark is available from the Babraham Bioinformatics project page.
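            To see what "swapping the Read 1 and Read 2 FLAG values" means at the bit level, the sketch below decodes SAM FLAG values using the bit meanings from the SAM specification. FLAGs 99/147 and 83/163 are the usual values for a properly paired, concordant pair with Read 1 on the forward or reverse strand respectively. This is a generic SAM illustration, not Bismark's code, and it makes no claim about the exact values Bismark writes for CTOT/CTOB alignments.

```python
# Bit meanings from the SAM specification (subset relevant to read pairs).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair (Read 1)",
    0x80: "second in pair (Read 2)",
}


def decode_flag(flag):
    """Return the names of the FLAG bits set in `flag`, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# A typical concordant pair: Read 1 forward / Read 2 reverse (99/147),
# or Read 1 reverse / Read 2 forward (83/163).
for value in (99, 147, 83, 163):
    print(value, decode_flag(value))
```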



            • I found one more bug in deduplicate_bismark, which is also present in v0.14.4.
              Several samtools calls use "samtools" directly instead of $samtools_path, e.g. lines 207 and 269.
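              The fix amounts to routing every samtools call through the one configured path. A minimal Python sketch of that pattern (deduplicate_bismark itself is Perl; samtools_cmd and run_samtools are hypothetical names):

```python
import subprocess


def samtools_cmd(samtools_path, *args):
    """Build a samtools argument list from a single configurable path.

    `samtools_path` stands in for deduplicate_bismark's $samtools_path;
    routing every call through one helper means no call silently falls
    back to whichever "samtools" happens to be first in PATH.
    """
    return [samtools_path, *args]


def run_samtools(samtools_path, *args):
    """Run samtools via the configured path and return its stdout as text."""
    result = subprocess.run(
        samtools_cmd(samtools_path, *args),
        check=True,            # raise if samtools exits non-zero
        capture_output=True,
        text=True,
    )
    return result.stdout
```

              For example, run_samtools("/opt/samtools/bin/samtools", "view", "-H", "input.bam") would print the BAM header using a hypothetical installation path.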



              • Originally posted by lancelothk
                I found one more bug in deduplicate_bismark. It is also in v0.14.4.
                There are several calls of samtools directly use "samtools" instead of using $samtools_path. E.g. line 269, line 207.
                Thanks for spotting that; I've fixed all these calls.



                • error with seedlen > 32

                  I get an error from bowtie2 when I try to define a seed length of 50 in Bismark. I haven't found any mention of this problem elsewhere, nor any mention of seed length limits in the bowtie2 manual. This seems particularly strange given that the recommended 'typical' settings in the Bismark manual use a seed length of 50. Can anyone help me trace the source of this error?

                  Using Bowtie 2 index: /home/tair10/Bisulfite_Genome/CT_conversion/BS_CT

                  Error: -L argument must be <= 32; was50
                  Error: Encountered internal Bowtie 2 exception (#1)
                  Command: /cm/shared/apps/bowtie/2-2.1.0/bowtie2-align --wrapper basic-0 -q -N 1 -L 50 --score-min L,0,-0.2 --ignore-quals --norc -x /home/tair10/Bisulfite_Genome/CT_conversion/BS_CT -U 4AB_trimmed_r1.fastq_C_to_T.fastq
                  bowtie2-align exited with value 1
                  The alignment does seem to work when no seed length is defined. Here is a sample read from the relevant fastq file; you will note the read length is 50 bp, but I don't think this is relevant, since the error says -L must be <= 32.

                  @HWI-D00458:73:C6EBDANXX:1:1101:1728:1972 1:N:0:GCTCTA_A
                  AGCGTGGTTTATTGATTTTTTAGATTTTCGGAATTTGAAGTTAGAGGTGT
                  +
                  CG>EFF<EFECE>D1<111@/FG>CFGGGG///0=1:FGGD1FE1FGEBG
                  P.S. This is my first comment in the forum (though I have been stalking this place for years) so I apologise if it is out of place.
                  Last edited by marcusmchale; 09-01-2015, 10:10 AM.



                  • If you type bowtie2 --help you can find the following text:

                    Code:
                    -L <int>           length of seed substrings; must be >3, <32 (22)
                    Apparently this is not mentioned in the manual, but you need to keep the seed substring length in the range of 3 to 32; the default is 22. I hope this helps, Felix
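                    A cheap way to avoid the cryptic Bowtie 2 exception is to check the seed length before launching the alignment. A minimal sketch; the bounds (greater than 3, at most 32) are taken from the help text and error message quoted in this thread and should be treated as assumptions for other Bowtie 2 versions, and check_bowtie2_seed_length is a hypothetical name:

```python
def check_bowtie2_seed_length(seed_len):
    """Validate a Bowtie 2 -L value before starting an alignment.

    Bounds are assumptions taken from the bowtie2 help text and error
    message quoted in this thread: greater than 3 and at most 32.
    """
    if not 3 < seed_len <= 32:
        raise ValueError(
            f"-L must be greater than 3 and at most 32 for Bowtie 2; got {seed_len}"
        )
    return seed_len
```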



                    • Thanks for the prompt reply. The Bismark manual suggests the following command:

                      bismark -n 1 -l 50 /data/genomes/homo_sapiens/GRCh37/ test_dataset.fastq
                      Which would call bowtie2 to use "-L 50".

                      Is there something I'm missing?

                      Oh, it's because of the differences in alignment strategy between bowtie1/bowtie2. Thanks for the lead!
                      Last edited by marcusmchale; 09-01-2015, 10:52 AM.



                      • Oh, it seems I need to update the manual: we very recently changed the default aligner to Bowtie 2, and the command in the manual still refers to Bowtie 1 (if you use --bowtie1, you can use the command as given in the manual). I'll have this changed soon; thanks for spotting this.

                        If you want to run the test dataset, just leave out all options and try using the defaults. Best, Felix



                        • genome preparation

                          Hi,
                          I am trying to run the Bismark genome preparation but am unable to do so.
                          I have the unzipped Bismark v 14.5 folder on the server, plus the unzipped bowtie-2.2.2.6 folder and the genome files for human GRCh38, with all three folders inside one folder. Do I need to run any installation step for Bismark/Bowtie before I run the genome preparation?

                          I am new to methylation analysis, so it would be great if you could help.

                          Thanks in advance.



                          • Bismark just needs to be extracted, as outlined step by step in the manual (http://www.bioinformatics.babraham.a...User_Guide.pdf). I believe Bowtie 2 also only needs to be unzipped; then either place the bowtie2 executable in your PATH (just google how to do this), or specify its path with --path_to_bowtie in Bismark.

                            All other steps, including the genome preparation, are explained in detail in the manual, this protocol, or this methylation analysis course:
                            Code:
                            bismark_genome_preparation [options] <path_to_genome_folder>
                            Good luck, Felix
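                            Both routes described above (bowtie2 in PATH, or an explicit directory passed via --path_to_bowtie) come down to where the executable is looked up. A minimal Python sketch of that lookup using the standard library's shutil.which; find_bowtie2 is a hypothetical helper and not part of Bismark:

```python
import shutil


def find_bowtie2(path_to_bowtie=None):
    """Locate an executable named bowtie2.

    If a directory is given (the role --path_to_bowtie plays for Bismark),
    search only there; otherwise search the directories listed in PATH.
    Returns the full path, or None if no executable bowtie2 is found.
    """
    if path_to_bowtie is not None:
        return shutil.which("bowtie2", path=path_to_bowtie)
    return shutil.which("bowtie2")
```

                            A None result from find_bowtie2() with no argument is the programmatic equivalent of the shell reporting "command not found" for bowtie2.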



                            • Hi,

                              I am still unable to run the bismark_genome_preparation step.
                              I get the error "Command not found".
                              Any idea? I have been trying since yesterday and am not sure what I am doing wrong.



                              • Originally posted by daanum
                                Hi,

                                I am unable to run the bismark_genome_preparation step yet.
                                I get an error "Command not found'.
                                Any idea? I am trying since yesterday, not sure what am i doing wrong?

                                I admire your perseverance, but you might want to consider working through a basic Linux tutorial; I think you would benefit from it.

                                Here you've got a couple of options:
                                1) either you move to the folder containing the Bismark installation and then run ./bismark_genome_preparation (./ refers to the current directory)
                                2) or you type /path/to/Bismark/bismark_genome_preparation, which should work from anywhere.
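                                The two options differ only in how the script path is written: relative to the current directory, or absolute. A minimal sketch that builds either command line; genome_prep_command and the /opt/Bismark folder used in the example are hypothetical:

```python
import os


def genome_prep_command(genome_folder, bismark_dir=None):
    """Build a bismark_genome_preparation command line.

    Option 1: no bismark_dir, so assume the current directory is the
    Bismark folder and call ./bismark_genome_preparation.
    Option 2: an explicit Bismark folder, made absolute so the command
    works from any working directory.
    """
    if bismark_dir is None:
        script = os.path.join(".", "bismark_genome_preparation")
    else:
        script = os.path.join(os.path.abspath(bismark_dir),
                              "bismark_genome_preparation")
    return [script, genome_folder]
```

                                Passing the result to subprocess.run(..., check=True) would then launch the preparation step, e.g. genome_prep_command("/data/genomes/GRCh38", "/opt/Bismark").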
