Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Hello,
    I have a question the option of bismark, concerning the bowtie 2 reporting options --most_valid_alignments.
    If I'm not wrong the option -M is not available now in Bowtie2, so how can you ask to the program to keep only valid aligments only unique aligments?

    Comment


    • Originally posted by shadow19c View Post
      Hello,
      I have a question the option of bismark, concerning the bowtie 2 reporting options --most_valid_alignments.
      If I'm not wrong the option -M is not available now in Bowtie2, so how can you ask to the program to keep only valid aligments only unique aligments?
      Bismark determines if there are any other alignments with the same alignment score. If there are the read is not unique and discarded, otherwise it is kept.

      Comment


      • Dear Felix,
        I am wondering if Bismark can be fed with an input stream in a pipeline.
        For example would that work?
        Code:
        cutadapt -b ACTGCTCG input_file.fastq | bismark -n 1 --solexa1.3-quals --bam indexed_genome/ -
        Or this?
        Code:
        cutadapt -b ACTGCTCG input_file.fastq | bismark -n 1 --solexa1.3-quals --bam indexed_genome/ stdin
        Or do I have to create an intermediate file?
        Code:
        cutadapt -b ACTGCTCG input_file.fastq > temp_file.fastq
        bismark -n 1 --solexa1.3-quals --bam indexed_genome/ temp_file.fastq
        Thanks for your help
        Julien

        Comment


        • Hi Julien,
          I am afraid you would have to go via the temporary trimmed file because Bismark uses the input file to determine output file names etc. You might want to take a quick look at Trim Galore which also uses Cutadapt for trimming with a set of stringent and useful parameters that are ideally suited for bisulfite applications.

          Comment


          • Thanks Felix for your answer.

            I am now facing another problem: when I run bismark of some of my samples, reads end up mapped to only the top or the bottom strand... This is not happening for all samples, but it is happening repeatedly on given samples. Is this a memory issue?

            Here is a an example of report file:
            Code:
            Bismark report for: ./C3K1_trimmed.fq.gz (version: v0.7.12)
            Option '--directional' specified: alignments to complementary strands will be ignored (i.e. not performed!)
            Bowtie was run against the bisulfite genome of panTro3_nonrandom+Lambda_prepared_bismark/ with the specified options: -q --phred64-quals -n 1 -k 2 --best --chunkmbs 512
            
            Final Alignment report
            ======================
            Sequences analysed in total:    50619505
            Number of alignments with a unique best hit from the different alignments:      19828727
            Mapping efficiency:     39.2%
            Sequences with no alignments under any condition:       23787863
            Sequences did not map uniquely: 7002915
            Sequences which were discarded because genomic sequence could not be extracted: 0
            
            Number of sequences with unique best (first) alignment came from the bowtie output:
            CT/CT:  0       ((converted) top strand)
            CT/GA:  19828727        ((converted) bottom strand)
            GA/CT:  0       (complementary to (converted) top strand)
            GA/GA:  0       (complementary to (converted) bottom strand)
            
            Number of alignments to (merely theoretical) complementary strands being rejected in total:     0
            
            Final Cytosine Methylation Report
            =================================
            Total number of C's analysed:   185057365
            
            Total methylated C's in CpG context:     6829013
            Total methylated C's in CHG context:    216805
            Total methylated C's in CHH context:    741531
            
            Total C to T conversions in CpG context:        2672385
            Total C to T conversions in CHG context:        41885198
            Total C to T conversions in CHH context:        132712433
            
            C methylated in CpG context:    71.9%
            C methylated in CHG context:    0.5%
            C methylated in CHH context:    0.6%
            Thanks for your help
            Julien

            Comment


            • Hi Julien,
              This does indeed look like the first instance of Bowtie (OT) is running out of memory... Does your machine have fairly low RAM or are you running many instances of Bismark concurrently? Could you run the analysis on a more powerful machine and see what happens there?

              Comment


              • Hi Felix
                Actually I was running bismark on a cluster and I requested 32g of memory. I have now run it with 48g and I'm still encountering the same issue...
                Do you have any idea of what else could go wrong? There is no apparent error message printed in the report files
                Thanks
                Julien

                Comment


                • Hi fkrueger,

                  When I use the bismark_methylation_extractor, it need to define single-end or paired-end. I have filter the original reads using Trimmomatic, which produced paired-end and single-end simultaneously. Is it possible to merge the two bam files together using samtools and bismark_methylation_extractor later with -p parameter?

                  Thank you.
                  P

                  Comment


                  • @pengchy,
                    It will probably work fine for the singleton reads but this won't guarantee the strandedness of paired-end reads, i.e. read 1 and read 2 would most likely be regarded as coming from OT and and CTOT (instead of OT only) or similarly from OB and CTOB (instead of OB only). In terms of the methylation information it should still be fine though.

                    To be on the safe side I would probably still treat them separately and then merge the results at the level of the final methylation extractor output.

                    Comment


                    • Originally posted by Julien Roux View Post
                      Hi Felix
                      Actually I was running bismark on a cluster and I requested 32g of memory. I have now run it with 48g and I'm still encountering the same issue...
                      Do you have any idea of what else could go wrong? There is no apparent error message printed in the report files
                      Thanks
                      Julien
                      This is somewhat weird... You should not need more than 12GB memory or so, maybe your cluster farms one alignment thread out and can't read from it anymore? Could you try to login to a compute node and run it locally on there (or some other machine for that matter)?

                      If you could send me a few reads (e.g. the first 100K or so reads) via email I could take a quick look here what is going on.

                      Comment


                      • Solved: alignment on one strand only

                        I am now facing another problem: when I run bismark of some of my samples, reads end up mapped to only the top or the bottom strand...
                        Just a quick message to let users know that my problem is solved and was due to an error when generating the Bowtie index files for one of the strands. All index files were present, but one of them was incomplete, making the alignment impossible.
                        Thanks Felix for your help
                        Julien

                        Comment


                        • bedGraph usage

                          hello. I am a newbie using bismark. I'm not a bioinformation and also i don't know much about NGS alignment, especially how to running alignment programs. But, i could use bismark thanks to kindly well made bismark user guide book.

                          i ran bismark and got the report file and sam file. so to get the position of methylC, i ran bismark_methylation_extractor especially used --bedGraph option to see methlylation %.

                          According to user guide book,

                          A typical command including the optional bedGraph --counts output could look like this:
                          bismark_methylation_extractor -s --bedGraph --counts --buffer_size 10G s_1_sequence.txt_bismark.sam


                          my data is paired-end so i edited little that one. like this:

                          bismark_methylation_extractor -p --no_overlap --comprehensive --CX
                          --bedGraph --counts --buffer_size 10G s_1_sequence.txt_bismark.sam


                          but, i couldn't get additional information about <chromosome> <start position> <end position> <methylation percentage>

                          my result was like thins

                          Bismark methylation extractor version v0.7.11
                          HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498192 x
                          HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498173 x
                          HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 - Vr08 7498096 x
                          HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 -

                          how can i get the methylation percentage information and run further bedGraph2Cytosines process!

                          i am looking forward anyone's reply

                          Comment


                          • Originally posted by Ahra View Post
                            Bismark methylation extractor version v0.7.11
                            HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498192 x
                            HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/1 - Vr08 7498173 x
                            HWI-D00111:9926ADACXX:7:1101:1567:2165_1:N:0:CTTGTA/2 - Vr08 7498096 x
                            These files are the primary output of the methylation extractor (described in the manual or if you type --help). The bedGraph will be generated from these files, and ends in *.bedGraph (just be aware that this step can take a long while if you are using --CX). The bedGraph can then be further transformed into a genome-wide cytosine report.

                            To see if there is something going wrong on the way could you please send the onscreen dialog via email?

                            Comment


                            • Originally posted by fkrueger View Post
                              Hi pengchy,

                              To 1) It is true that Bismark appends segment numbers to the end of read. This is because Bowtie or Bowtie2 tend to delete these tags internally while aligning, and to make it more difficult they don't do it in the same way. To properly keep track of which read is doing what I had to do this change (btw also white spaces or tab characters are being replaced by _ in the read ID.
                              Hi fkrueger - I recently upgraded to bismark v0.7.12 and have run into an issue with the read segment numbers

                              My input fastq does not have these tags

                              Code:
                              @HWI-D00119:25:D1WYEACXX:3:1101:1169:2017 1:N:0:
                              NGGTGTATTGGGTTAGTGTATTTTTTTGTTGATATTTTTTTTTTGTATTGTGGTTGATTTTTGATTAAGGACGTTTGTTTTTAAGGGTTTTTGGTGGG
                              +
                              #0<<BFFFFFFFFFIIFFIFFFIIIIIIIIIFIFFIIIIIIIIFFB<BFBBBFBBB<BBFFFF<BFBBBB<BBBFBFFBFFFBBBFF<BFFFBF<BBF
                              However *two* tags get added in the read prep step of bismark (eg C_to_T fastq)

                              Code:
                              @HWI-D00119:25:D1WYEACXX:3:1101:1169:2017_1:N:0:/1/1
                              NGGTGTATTGGGTTAGTGTATTTTTTTGTTGATATTTTTTTTTTGTATTGTGGTTGATTTTTGATTAAGGATGTTTGTTTTTAAGGGTTTTTGGTGGG
                              +
                              #0<<BFFFFFFFFFIIFFIFFFIIIIIIIIIFIFFIIIIIIIIFFB<BFBBBFBBB<BBFFFF<BFBBBB<BBBFBFFBFFFBBBFF<BFFFBF<BBF
                              then in the output bam, only one is removed

                              Code:
                              HWI-D00119:25:D1WYEACXX:3:1101:1169:2017_1:N:0:/1	67	chr6	151954983	255	98M	=	151955045	162	NGGTGTATTGGGTTAGTGTATTTTTTTGTTGATATTTTTTTTTTGTATTGTGGTTGATTTTTGATTAAGGACGTTTGTTTTTAAGGGTTTTTGGTGGG	#0<<BFFFFFFFFFIIFFIFFFIIIIIIIIIFIFFIIIIIIIIFFB<BFBBBFBBB<BBFFFF<BFBBBB<BBBFBFFBFFFBBBFF<BFFFBF<BBF	NM:i:21	XX:Z:A4C1C4C8C3C2C7C3C1C4C5C3CC5CC7C3C1C9CC7	XM:Z:.....h.x....h........h...x..x.......h...h.x....x.....x...hh.....hh.....Z.h...h.h.........hx.......	XR:Z:CT	XG:Z:CT
                              This is screwing up picard markduplicates as the read names do not match anymore - I just want to clarify if this is a bug in bismark or should manually be removing tags after alignment or use read_names_regex in markduplicates?

                              Thanks

                              Comment


                              • Hi forzenlyse,

                                It seems that you were running paired-end alignments with Bowtie2, correct? Bismark does indeed append /1/1 to read 1, one of which is removed during the alignment step. The second one is then removed at a later step, but to keep the output in a way that allows immediate identification of what was R1 and R2, /1 and /2 are appended just before the SAM/BAM output is created. If you don't want these trailing fragment numbers you can just locate the following two lines in the subroutine "paired_end_SAM_output":

                                Code:
                                  my $id_1 = $id.'/1';
                                  my $id_2 = $id.'/2';
                                and comment them out like so:
                                Code:
                                  # my $id_1 = $id.'/1';
                                  # my $id_2 = $id.'/2';
                                Let me know if this fixes your problem of if I can anything else for you.

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Choosing Between NGS and qPCR
                                  by seqadmin



                                  Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
                                  10-18-2024, 07:11 AM
                                • seqadmin
                                  Non-Coding RNA Research and Technologies
                                  by seqadmin




                                  Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

                                  Nobel Prize for MicroRNA Discovery
                                  This week,...
                                  10-07-2024, 08:07 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Yesterday, 05:31 AM
                                0 responses
                                10 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-24-2024, 06:58 AM
                                0 responses
                                20 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-23-2024, 08:43 AM
                                0 responses
                                48 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-17-2024, 07:29 AM
                                0 responses
                                58 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X