Bismark: paired-end low mapping efficiency


  • #31
    Originally posted by fkrueger
    Once you finished the methylation extraction you can simply take the C-context output and simply concatenate it, e.g.

    cat CpG*.txt > CpG_context_merged.txt
    or
    zcat CpG*.txt.gz > CpG_context_merged.txt

    and use this merged output as input for bismark2bedGraph. The only thing that is somewhat inconvenient is that you don't get one nice mapping report / html file because the alignment process was split up into 3 processes...
    Felix, I just wonder what the advantage of this concatenation is. I can pass multiple files (CpG_OT, CpG_OB, CpG_CTOT or CpG_CTOB) directly to bismark2bedGraph as input. This is what I do with the txt files from my trimmed PE reads and unpaired SE reads.

    Comment


    • #32
      It shouldn't make a difference if you use bismark2bedGraph, other than looking a little bit tidier...

      Comment


      • #33
        Hi all,
        Maybe I am late in posting here about the same problem you all had 6 years ago.

        I am trying to analyze some bisulfite sequencing samples with bismark/bowtie2, and I also get really low mapping efficiencies with paired-end samples.
        > bismark --genome /home-gluster/mm10/ --bowtie2 --un --ambig_bam --nucleotide_coverage -q -1 R1C_1_val_1.fq.gz -2 R1C_2_val_2.fq.gz

        These are the alignment rates:
        435128526 (100.00%) were paired; of these:
        434844627 (99.93%) aligned concordantly 0 times
        62609 (0.01%) aligned concordantly exactly 1 time
        221290 (0.05%) aligned concordantly >1 times
        0.07% overall alignment rate
        435128526 reads; of these:
        435128526 (100.00%) were paired; of these:
        434841421 (99.93%) aligned concordantly 0 times
        58642 (0.01%) aligned concordantly exactly 1 time
        228463 (0.05%) aligned concordantly >1 times
        0.07% overall alignment rate
        Processed 435128526 sequences in total

        I quality-trimmed my sequences and removed the adapters using Trim Galore!, and FastQC did not report any specific problem. I then ran Bowtie2, through Bismark, with the default parameters recommended in the protocol.

        After reading this thread I also tried to map the sequences in single-end mode with the default parameters of the Bismark protocol,
        > bismark --genome /home-gluster/mm10/ --bowtie2 --un --ambig_bam --nucleotide_coverage -q R1C_1_val_1.fq.gz

        but the mapping efficiency is low there too:
        435128526 (100.00%) were unpaired; of these:
        434723969 (99.91%) aligned 0 times
        110243 (0.03%) aligned exactly 1 time
        294314 (0.07%) aligned >1 times
        0.09% overall alignment rate
        435128526 reads; of these:
        435128526 (100.00%) were unpaired; of these:
        434728822 (99.91%) aligned 0 times
        118636 (0.03%) aligned exactly 1 time
        281068 (0.06%) aligned >1 times
        0.09% overall alignment rate

        - Maybe the problem is a desynchronization of the paired sequences during the adapter trimming step? But I used the paired-end option when trimming with Trim Galore!

        - Maybe the problem is the reference genome? For the sequencing process the reference genome "gem3.mmusculus.GRCm38_BS" was used, but as I always use the mm10 reference genome from UCSC for other sequencing analyses such as RNA-Seq or ChIP-Seq, I tried mm10 for the bisulfite sequencing analysis with Bismark. Could this be the main cause of the low alignment rate? Some weeks ago I started a thread with this question, but nobody answered: http://seqanswers.com/forums/showthread.php?t=94563 . I asked because I did not know whether using the GRCm38 genome for the bisulfite analysis would be inappropriate if I want to compare the results with RNA-Seq and ChIP-Seq results analyzed with the mm10 reference genome.

        In conclusion, I am new to bisulfite analysis and really lost about how to proceed. I am hoping that somebody here has more experience and can help me.

        Thanks in advance,

        Iraia

        Comment


        • #34
          Hi there,

          The mm10 and GRCm38 genomes should be exactly the same sequence; there are only some minor differences (e.g. chromosomes are called chr1, chr2 and not 1, 2 etc., chrM instead of MT, and so on).

          I have written up a few tips you could try out here: https://github.com/FelixKrueger/Bism...ite-seq-sample

          If you still struggle to find an answer, please feel free to send me a subset of your (gzip compressed) raw sequences, so I can take a look.

          Cheers, Felix
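
The naming difference described above can be checked directly on the FASTA files before building a Bismark index. The following is only a minimal sketch: the helper names `fasta_names` and `to_ucsc_names` are made up for illustration, and the renaming covers only the primary chromosomes (1, 2, ..., X, Y, MT), not unplaced scaffolds.

```shell
# Hypothetical helper: print the sequence names of an uncompressed FASTA file,
# i.e. the text after ">" up to the first whitespace character.
fasta_names() {
    grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//'
}

# Hypothetical helper: rewrite Ensembl/GRCm38-style names read from stdin
# (1, 2, X, Y, MT) into UCSC/mm10-style names (chr1, chr2, chrX, chrY, chrM).
# Any other name (e.g. a scaffold) is passed through unchanged.
to_ucsc_names() {
    sed -e 's/^[0-9][0-9]*$/chr&/' -e 's/^[XY]$/chr&/' -e 's/^MT$/chrM/'
}
```

With placeholder file names, something like `fasta_names GRCm38.fa | to_ucsc_names | sort` could then be compared against `fasta_names mm10.fa | sort` to see which names still differ.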

          Comment


          • #35
            Hi Felix,
            Thanks for answering so fast.
            I have been testing some of the tips from the link you shared in the previous comment, but I can't obtain higher alignment rates.

            First, I changed the trim_galore parameters for trimming the raw data back to the standard ones. The program detects some Illumina adapters; however, I also identified overrepresented sequences such as poly A/G/T, so I trimmed those as well, always with the --paired parameter. After these additional trimming steps, the read length in the FastQC report ranges from 20 to 150 bp. I don't know whether this is correct, or whether I need to fix the length at a specific value (150 bp).

            Next, I tried some of your tips for the alignment step. With the default parameters and paired-end mode the alignment rate was too low (0.09%), as I described in my first comment. Then I tried the alignment in single-end mode. With R1 the alignment rate is 0.09% again, while with R2 it is much higher, 47.04%. Now I am waiting for the paired-end alignment with the --score_min L,0,-0.6 parameter that you recommend, but as the samples are so big, the alignment takes around 10 days to process.

            If it is not too much trouble, I would like to share some fragments of my raw samples with you and hear what you think about them. For that I also need a little help: what command-line steps do I need to follow to cut a subset of the samples so that I can send them to you? I am pretty new to this sequencing world, so I will be very grateful for any kind of help.

            Thanks in advance,

            Iraia
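
The 20-150 bp range reported by FastQC above can also be tabulated directly from a trimmed file; a spread of lengths is what adapter and quality trimming would normally be expected to produce. This is just a rough sketch, assuming a gzipped FASTQ file with the usual four lines per record; `read_length_table` is a made-up helper name.

```shell
# Hypothetical helper: print "length count" pairs for a gzipped FASTQ file,
# sorted by read length. Assumes 4 lines per record (sequence on line 2).
read_length_table() {
    gunzip -c "$1" \
        | awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' \
        | sort -n
}
```

Running it on one of the trimmed files from this thread, e.g. `read_length_table R1C_1_val_1.fq.gz`, would list how many reads remain at each length.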

            Comment


            • #36
              Hi Iraia,

              I don't think you should start aligning the entire file until you know exactly what is the best way to do it. You are absolutely welcome to send me a few reads.

              If you are on Linux or on a Mac, you could use the following 2 commands:

              Code:
              # A FastQ record is 4 lines, so 800000 lines = the first 200,000 reads
              gunzip -c file_R1.fastq.gz | head -800000 | gzip -c - > 200K_R1.fastq.gz
              gunzip -c file_R2.fastq.gz | head -800000 | gzip -c - > 200K_R2.fastq.gz
              This should then fit as an email attachment, you can use this email address.
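
Before sending a subset like this, it may be worth confirming that both files hold the same number of reads and that R1 and R2 are still in the same order, since out-of-sync pairs were one of the suspicions raised earlier in the thread. The following is a minimal sketch under the assumption of standard four-line FASTQ records; the function names `count_reads`, `read_ids` and `pairs_in_sync` are invented for illustration.

```shell
# Hypothetical helper: count the reads in a gzipped FASTQ file.
# Assumes the standard 4-lines-per-record layout (no wrapped sequences).
count_reads() {
    gunzip -c "$1" | awk 'END { print NR / 4 }'
}

# Hypothetical helper: print the read ID of every record, i.e. the first
# whitespace-separated field of each header line with a trailing /1 or /2 removed.
read_ids() {
    gunzip -c "$1" | awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }'
}

# Succeeds (exit status 0) only if both files list the same read IDs
# in the same order. Holds the IDs in memory, so intended for small subsets.
pairs_in_sync() {
    ids1=$(read_ids "$1")
    ids2=$(read_ids "$2")
    [ "$ids1" = "$ids2" ]
}
```

For the two subset files above, `count_reads 200K_R1.fastq.gz` should print 200000 if the source file had at least that many reads, and `pairs_in_sync 200K_R1.fastq.gz 200K_R2.fastq.gz && echo "in sync"` would report whether the mates line up.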

              Comment


              • #37
                Hi Felix,

                Sorry for writing here rather than in a separate thread, but I think I have a related problem. Basically, I also had quite low mapping, 40-50%, after end-to-end alignment. None of the approaches discussed here helped me, except aligning the unaligned reads (preserving the paired-end structure of the files) in local mode. This gave me another 40-50% of reads aligned, i.e. at least 70% of reads aligned in total. The problem comes with the automatic report generated after the Bismark run, which estimates the methylation rates. The methylation rates in the different contexts are comparable, except for "methylation in unknown context", which increases several-fold (e.g. from 1-2% in end-to-end to 6-7% in local). As I work with plants, it is important to assess all methylation contexts precisely, and I would like to ask your advice on interpreting this difference, as it is not clear to me mechanistically where it comes from. Is local alignment a valid approach given this difference in the methylation context? I would be grateful for your help!

                Comment


                • #38
                  Originally posted by antonkermanov
                  Hi Felix,

                  Sorry for writing here rather than in a separate thread, but I think I have a related problem. Basically, I also had quite low mapping, 40-50%, after end-to-end alignment. None of the approaches discussed here helped me, except aligning the unaligned reads (preserving the paired-end structure of the files) in local mode. This gave me another 40-50% of reads aligned, i.e. at least 70% of reads aligned in total. The problem comes with the automatic report generated after the Bismark run, which estimates the methylation rates. The methylation rates in the different contexts are comparable, except for "methylation in unknown context", which increases several-fold (e.g. from 1-2% in end-to-end to 6-7% in local). As I work with plants, it is important to assess all methylation contexts precisely, and I would like to ask your advice on interpreting this difference, as it is not clear to me mechanistically where it comes from. Is local alignment a valid approach given this difference in the methylation context? I would be grateful for your help!
                  Hi Anton,

                  The methylation rate in Unknown context is really only for your information; as these calls cannot be assigned to a specific context, they are simply discarded in the methylation extraction step. I wouldn't read anything into these values, and certainly wouldn't let them stop you from proceeding with downstream analysis.

                  Before using local alignments, we tend to recommend doing a few things summarised here: https://github.com/FelixKrueger/Bism...ite-seq-sample

                  I would also be happy to briefly look at your data to see if special trimming etc could help. If you would like to take me up on this offer, just send me a few reads (raw, untrimmed, FastQ gzipped) via email, and I could run a few tests.

                  Cheers, Felix

                  Comment
