
  • It is still not working...I tried putting the bismark file within the folder /media/3TBpt1/bismark_intermediate_results/ too.

    I will now try it with the new bismark version..

    Comment


    • Bismark itself does not have to be in the analysis folder, but you need to start the analysis from within that directory. Specifying relative or absolute paths for the input filenames will cause it to fail.

      cd /media/3TBpt1/bismark_intermediate_results/

      and then

      ./bismark --path_to_bowtie /home/rini/bismark/bowtie-0.12.7 /home/rini/bismark/bowtie-0.12.7/genomes/ -1 1725-SB-5_1_sequence.fastq -2 1725-SB-5_2_sequence.fastq -o /home/rini/bismark/bismark_v0.5.4/

      The same is also true for 0.6.beta1. Hope it'll work now.

      Comment


      • I am within the folder..it does not seem to work!

        Comment


        • Can you please send me the exact command you are using via email to
          [email protected] (including the error message)? Cheers, Felix

          Comment


          • I am having a lot of issues getting the alignment to run. I have tried many different things. I think the problem is that (1) I don't understand which folder I need to be in to execute the command and (2) sometimes it wants me to put </../../> and sometimes just /.../.../

            The following is something I tried. Please help if you could.

            /usr/local/bin/bismark_v0.6.beta1/bismark --path_to_bowtie /usr/local/bin/bowtie /mnt/DATA/Cores/hiseq2000/Homo_sapiens_UCSC_hg19/Homo_sapiens/UCSC/hg19/Sequence/Chromosomes/ -1 /mnt/DATA/Cores/hiseq2000/111123_SN874_0071_AD0F4CACXX/unaligned/Project_kinome_WGBS/Sample_22647_BS/read_1/22647_BS_GATCAG_L002_R1_001.fastq -o /mnt/DATA/Cores/hiseq2000/111123_SN874_0071_AD0F4CACXX/bismark/

            Comment


            • It is really quite simple: the input files which you specify with -1 and -2 must not contain full path information, as this will break the naming of the output files. (In your example you also seem to have forgotten to specify -2.)

              For this to work you need to be in the folder containing the files to be aligned (e.g. cd /mnt/DATA/Cores/hiseq2000/111123_SN874_0071_AD0F4CACXX/unaligned/Project_kinome_WGBS/Sample_22647_BS/read_1/) and then use:

              -1 file1.fastq -2 file2.fastq

              For everything else you should be able to use path information as well.
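
A minimal sketch of the advice above, not part of Bismark itself: a small Python helper that splits full read paths into a working directory and bare file names, so that the aligner can be started from inside the folder that holds the reads. The helper name `build_bismark_call` and the example paths are hypothetical; the options (`--path_to_bowtie`, `-1`, `-2`, `-o`) are the ones already used in this thread.

```python
import os


def build_bismark_call(bismark, bowtie_dir, genome_dir, read1, read2, out_dir):
    """Return (working_dir, argv) with bare file names for -1 and -2.

    Hypothetical helper: it only rearranges options shown in this thread
    and does not run anything itself.
    """
    dir1, name1 = os.path.split(os.path.abspath(read1))
    dir2, name2 = os.path.split(os.path.abspath(read2))
    if dir1 != dir2:
        raise ValueError("both read files must sit in the same folder")
    argv = [
        os.path.abspath(bismark),        # Bismark may live anywhere
        "--path_to_bowtie", bowtie_dir,  # full paths are fine here
        genome_dir,
        "-1", name1,                     # bare file names only
        "-2", name2,
        "-o", out_dir,
    ]
    return dir1, argv


if __name__ == "__main__":
    cwd, argv = build_bismark_call(
        "/usr/local/bin/bismark_v0.6.beta1/bismark",
        "/usr/local/bin/bowtie",
        "/path/to/genome/",            # placeholder genome folder
        "/data/run1/sample_R1.fastq",  # placeholder read files
        "/data/run1/sample_R2.fastq",
        "/data/run1/bismark_out/",
    )
    print("cd", cwd)
    print(" ".join(argv))
```

The returned pair could then be handed to `subprocess.run(argv, cwd=cwd)`, which starts the command from the read folder without changing the directory of the calling script.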

              Hope this helps.

              Comment


              • Two questions:

                1. How does Bismark handle haplotype variation with regards to methylation. In other words, what happens to a methylation call when 10 reads show CpG methylation at a particular site, while 10 reads do not show it at the same site?

                2. In our analysis, the number of C's analyzed in the Final Cytosine Methylation Report is 5X bigger than the total number of base pairs in the genome (all A,C,G,Ts). I think I am misunderstanding the output here. Why is this?

                Comment


                • Originally posted by sbst
                  Two questions:

                  1. How does Bismark handle haplotype variation with regards to methylation. In other words, what happens to a methylation call when 10 reads show CpG methylation at a particular site, while 10 reads do not show it at the same site?

                  2. In our analysis, the number of C's analyzed in the Final Cytosine Methylation Report is 5X bigger than the total number of base pairs in the genome (all A,C,G,Ts). I think I am misunderstanding the output here. Why is this?
                  Hi sbst,

                  1. Bismark itself doesn't perform any sophisticated haplotype analysis. It will simply determine unique best alignments, and then perform its methylation call. For cytosine positions in the genome, and only for these, Bismark determines whether it was methylated (C in the read) or unmethylated (T in the read). Bases other than C or T at the position in question will be ignored.

                  2. The number of Cs analysed in total is simply summing up all cytosine positions for all reads for which a methylation call has been performed. The report is intended to provide a rough idea about the methylation state of the sample analysed and is totally independent of the genome used for the alignments.
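
To illustrate both points, here is a toy Python sketch under strong simplifying assumptions: reads are already aligned at offset 0 of a top-strand reference, and no in-silico bisulfite conversion or strand handling is modelled. It is not how Bismark is implemented; the function name `call_methylation` is hypothetical. It shows the rule described above (C in the read is methylated, T is unmethylated, anything else is ignored) and why the total number of calls, summed over all reads, can exceed the length of the reference.

```python
def call_methylation(reference, read):
    """Return per-read calls at reference cytosines (toy, top strand only)."""
    calls = []
    for pos, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue  # only cytosine positions in the genome are considered
        if read_base == "C":
            calls.append((pos, "methylated"))
        elif read_base == "T":
            calls.append((pos, "unmethylated"))
        # any other base at a cytosine position is ignored
    return calls


if __name__ == "__main__":
    reference = "ACGTCCGA"  # 8 bp toy genome with 3 cytosines
    reads = ["ACGTCCGA", "ATGTTCGA", "AGGTCTGA", "ATGTTTGA"]
    total_calls = sum(len(call_methylation(reference, r)) for r in reads)
    # Every read covering a cytosine adds a call, so the total grows with
    # sequencing depth and is not bounded by the genome size.
    print(total_calls, "calls for a", len(reference), "bp reference")
```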

                  I hope this helps,
                  Felix

                  Comment


                  • Originally posted by fkrueger
                    Hi sbst,

                    1. Bismark itself doesn't perform any sophisticated haplotype analysis. It will simply determine unique best alignments, and then perform its methylation call. For cytosine positions in the genome, and only for these, Bismark determines whether it was methylated (C in the read) or unmethylated (T in the read). Bases other than C or T at the position in question will be ignored.

                    2. The number of Cs analysed in total is simply summing up all cytosine positions for all reads for which a methylation call has been performed. The report is intended to provide a rough idea about the methylation state of the sample analysed and is totally independent of the genome used for the alignments.

                    I hope this helps,
                    Felix
                    Thanks Felix. That's helpful! So at any specific cytosine, if there are 10 calls as methylated and 10 calls as not methylated (from a total of 20 reads), then it will have a methylation status of 50%. If this is correct, then I totally understand now.

                    Comment


                    • That's indeed right, such a position would have an overall methylation rate of 50%. Bismark itself determines the methylation only on a read-by-read basis, so the actual quantitation would be accomplished by your analysis program (or script) of choice.
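
As a rough illustration of that downstream quantitation step, the following Python sketch aggregates read-level calls into a per-position methylation percentage. The input layout (tuples of position and a boolean) is an assumption made for the example and is not Bismark's output format; `methylation_rates` is a hypothetical name.

```python
from collections import defaultdict


def methylation_rates(calls):
    """Aggregate read-level calls into percent methylation per position.

    calls: iterable of (position, is_methylated) tuples, one per read call.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for position, is_methylated in calls:
        total[position] += 1
        if is_methylated:
            methylated[position] += 1
    return {pos: 100.0 * methylated[pos] / total[pos] for pos in total}


if __name__ == "__main__":
    # 10 methylated and 10 unmethylated calls at the same cytosine
    calls = [(1000, True)] * 10 + [(1000, False)] * 10
    print(methylation_rates(calls))  # {1000: 50.0}
```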

                      Comment


                      • New Bismark version 0.6.3

                        We have just released a new version of Bismark (v0.6.3) and updated its documentation extensively to account for the recent changes that arose from implementing Bowtie 2 and changing the default output format to SAM.

                        Main changes:

                        - The methylation extractor does now also work with Bismark SAM output files
                        - Fixed a bug caused when a read was called 0 (zero)
                        - Changed the XX:Z mismatch field in the SAM output to display mismatching nucleotides of the reference sequence (instead of the read sequence ones)

                        More information can be found here or on the Bismark project page.

                        Comment


                        • Hi Felix,

                          Since v0.6.3 now produces SAM files, do you see any reason why I can't use samtools rmdup or Picard to remove alignment duplicates? Would it be better to output to vanilla format and use the old de-duplicate script? With SAM it will be necessary to convert to BAM, run rmdup, then convert back to SAM to run the methylation extractor, so perhaps running vanilla output is the best way to go if you want to remove alignment duplicates.

                          Apologies if this question has been asked already.

                          Larry W.

                          Comment


                          • Originally posted by wilhelml
                            Hi Felix,

                            Since v0.6.3 now produces SAM files, do you see any reason why I can't use samtools rmdup or Picard to remove alignment duplicates? Would it be better to output to vanilla format and use the old de-duplicate script? With SAM it will be necessary to convert to BAM, run rmdup, then convert back to SAM to run the methylation extractor, so perhaps running vanilla output is the best way to go if you want to remove alignment duplicates.

                            Apologies if this question has been asked already.

                            Larry W.
                            Hi Larry,

                            I have adapted the de-duplication script to handle both SAM and vanilla output, so there should be no need to run Bismark in vanilla mode just for this reason. I would imagine that rmdup or Picard could also be used for deduplication; I just didn't want to have an out-of-date version of the deduplication script floating around. I am not quite sure whether they would get confused by the somewhat unusual FLAG values which are used for paired-end BS-Seq files. I have not compared the outputs, but it would certainly be worth a try.

                            Best,
                            Felix

                            Comment


                            • We have just released a new version of Bismark (v0.6.4) to address a few minor issues.

                              The changes include:

                              - Adjusted the options -u and -s so that only the non-skipped part of the input file will be transcribed and analysed. This allows splitting up very large files into smaller chunks to allow parallel processing, e.g. -s 10000000 -u 20000000 would analyse sequences 10000001 to 20000000. The alignment report will be based on this reduced number of reads analysed.
                              - In paired-end mode, the options --unmapped and --ambiguous do now output unaligned or multiply aligned reads, respectively, to their correct output files as intended
                              - Sequences in FastA format do now receive Phred score qualities of 40 throughout (ASCII 'I') to prevent the SAM to BAM conversion in SAMtools from failing
                              - If a genomic sequence could not be extracted it will now also be counted and reported for use with Bowtie 1
                              - Suppressed debugging warning messages that were printed in error for Bowtie 2 alignments (single-end mode only)
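
The chunking described for -s and -u can be sketched in Python as follows. This only computes the option values for each chunk, following the example above in which -s 10000000 -u 20000000 covers sequences 10000001 to 20000000. The function names are hypothetical, and leaving out -s for the first chunk is an assumption made here rather than documented behaviour.

```python
def chunk_options(total_reads, chunk_size):
    """Yield (skip, upto) pairs; each chunk covers reads skip+1 .. upto."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for skip in range(0, total_reads, chunk_size):
        yield skip, min(skip + chunk_size, total_reads)


def options_for_chunk(skip, upto):
    """Build the -s/-u arguments for one chunk (assumed: no -s when skip is 0)."""
    args = []
    if skip > 0:
        args += ["-s", str(skip)]
    args += ["-u", str(upto)]
    return args


if __name__ == "__main__":
    for skip, upto in chunk_options(25_000_000, 10_000_000):
        print(" ".join(options_for_chunk(skip, upto)))
```

Each printed option set would be added to an otherwise identical Bismark command, one command per chunk, so the chunks can be processed in parallel.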

                              Bismark is available here.

                              Comment


                              • I have some questions regarding the directional (strand-specific) vs non-directional protocol.

                                1. As far as I understand, the reads from a non-directional library (based on the Cokus protocol) have some sequence tags left (FW reads: TCTGT and RC reads: TCCAT).
                                Do I have to remove these tags in all the reads before using Bismark? Or does Bismark handle them internally, perhaps, by using their information to find the correct alignment on the correct strand? I think BS Seeker does exploit this.

                                2. What about reads from a non-directional library that do not have such sequence tags. I do not know why some reads (it seems to be the minority) do not have these tags. Are they artefacts?

                                3. Is there a way to infer the underlying experimental protocol (directional vs non-directional), if it is not clear from the information of the metadata of the sequencing run?
                                One idea would be to scan the reads for these tags. However, I do not know if these tags are always present in the raw data based on non-directional protocols.
                                A workaround could be to run Bismark with its current default of having a directional library, then check in the summary report whether there are lots of rejected alignments to the complementary strands indicating a non-directional protocol.
                                Should there be approximately equal amounts of alignments to all the four strands in case of a non-directional library?
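
The tag-scanning idea from question 3 could be tried with a short Python sketch like the one below. It assumes the tags sit at the very start (5' end) of each read, which the post does not state, and it only reports fractions; what fraction would indicate a non-directional library is left open. The function names are hypothetical.

```python
import io


def fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for index, line in enumerate(handle):
        if index % 4 == 1:
            yield line.strip()


def tag_fractions(sequences, fw_tag="TCTGT", rc_tag="TCCAT"):
    """Return the fraction of reads starting with each tag, or with neither."""
    counts = {"fw": 0, "rc": 0, "none": 0}
    total = 0
    for seq in sequences:
        total += 1
        if seq.startswith(fw_tag):    # assumption: tag is at the 5' end
            counts["fw"] += 1
        elif seq.startswith(rc_tag):
            counts["rc"] += 1
        else:
            counts["none"] += 1
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: value / total for key, value in counts.items()}


if __name__ == "__main__":
    # Tiny in-memory FASTQ stand-in; a real file would be opened with open().
    demo = io.StringIO(
        "@r1\nTCTGTAAAA\n+\nIIIIIIIII\n"
        "@r2\nTCCATGGGG\n+\nIIIIIIIII\n"
        "@r3\nACGTACGTA\n+\nIIIIIIIII\n"
        "@r4\nTCTGTCCCC\n+\nIIIIIIIII\n"
    )
    print(tag_fractions(fastq_sequences(demo)))
```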

                                Any comments highly appreciated.

                                Comment
