  • chxu02
    Member
    • Jan 2015
    • 18

    #31
    Originally posted by fkrueger
    Once you have finished the methylation extraction, you can simply take the output for a given cytosine context (e.g. CpG) and concatenate it, e.g.

    cat CpG*.txt > merged_CpG_context.txt
    or
    zcat CpG*.txt.gz > merged_CpG_context.txt

    and use this merged output as input for bismark2bedGraph. The only thing that is somewhat inconvenient is that you don't get one nice mapping report / html file because the alignment process was split up into 3 processes...
    Felix, I just wonder what the advantage of this concatenation is. I can already use multiple files (CpG_OT, CpG_OB, CpG_CTOT or CpG_CTOB) as input for bismark2bedGraph. This is what I do with my txt files from trimmed PE reads and unpaired SE reads.


    • fkrueger
      Senior Member
      • Sep 2009
      • 627

      #32
      It shouldn't make a difference if you use bismark2bedGraph, other than looking a little bit tidier...


      • iramai
        Junior Member
        • Mar 2018
        • 6

        #33
        Hi all,
        Maybe I am too late to post here, but I have the same problem you all had six years ago.

        I am trying to analyze some bisulfite sequencing samples with bismark/bowtie2, and I also get really low mapping efficiencies with paired-end samples.
        > bismark --genome /home-gluster/mm10/ --bowtie2 --un --ambig_bam --nucleotide_coverage -q -1 R1C_1_val_1.fq.gz -2 R1C_2_val_2.fq.gz

        These are the alignment rates:
        435128526 (100.00%) were paired; of these:
        434844627 (99.93%) aligned concordantly 0 times
        62609 (0.01%) aligned concordantly exactly 1 time
        221290 (0.05%) aligned concordantly >1 times
        0.07% overall alignment rate
        435128526 reads; of these:
        435128526 (100.00%) were paired; of these:
        434841421 (99.93%) aligned concordantly 0 times
        58642 (0.01%) aligned concordantly exactly 1 time
        228463 (0.05%) aligned concordantly >1 times
        0.07% overall alignment rate
        Processed 435128526 sequences in total
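As a quick arithmetic check on the first Bowtie 2 summary above, the two concordant categories alone reproduce the reported overall rate, which confirms that virtually none of the 435 million pairs aligned:

```latex
\[
\text{overall rate} = \frac{62\,609 + 221\,290}{435\,128\,526}
                    = \frac{283\,899}{435\,128\,526}
                    \approx 0.065\,\% \approx 0.07\,\%
\]
```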

        I quality-trimmed my sequences and removed the adapters with Trim Galore!, and FastQC did not report any specific problem. Then I ran Bowtie 2 through Bismark with the default parameters recommended in the protocol.

        After reading this thread, I also tried to map the sequences in single-end mode with the default parameters of the Bismark protocol:
        > bismark --genome /home-gluster/mm10/ --bowtie2 --un --ambig_bam --nucleotide_coverage -q R1C_1_val_1.fq.gz

        but the mapping efficiency is low there as well:
        435128526 (100.00%) were unpaired; of these:
        434723969 (99.91%) aligned 0 times
        110243 (0.03%) aligned exactly 1 time
        294314 (0.07%) aligned >1 times
        0.09% overall alignment rate
        435128526 reads; of these:
        435128526 (100.00%) were unpaired; of these:
        434728822 (99.91%) aligned 0 times
        118636 (0.03%) aligned exactly 1 time
        281068 (0.06%) aligned >1 times
        0.09% overall alignment rate

        - Could the problem be that the paired sequences became desynchronized during the adapter trimming step? But I used the paired-end option for trimming with Trim Galore!

        - Could the problem be the reference genome? For the sequencing process, the reference genome "gem3.mmusculus.GRCm38_BS" was used, but as I always use the UCSC mm10 reference genome for other sequencing analyses such as RNA-seq or ChIP-seq, I tried mm10 for the bisulfite analysis with Bismark. Could this be the main reason for the low alignment rate? Some weeks ago I started a thread (http://seqanswers.com/forums/showthread.php?t=94563) with this question, but nobody answered. I did not know whether using the GRCm38 genome for the bisulfite analysis would be inappropriate if I want to compare these results with RNA-seq and ChIP-seq results analyzed with the mm10 reference genome.
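The first question above, whether R1 and R2 went out of sync during trimming, can be checked directly by comparing read IDs record by record. A minimal sketch, assuming 4-line gzipped FASTQ files and bash (for process substitution); the helper names `read_ids` and `check_pairs` are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: check whether two gzipped FASTQ files (R1 and R2) still list the
# same reads in the same order, e.g. after paired-end trimming.

# Print one read ID per record: the header line (line 1 of every 4-line record)
# cut at the first space or tab, with a trailing /1 or /2 removed.
read_ids() {
  gunzip -c "$1" | awk 'NR % 4 == 1 { sub(/[ \t].*$/, ""); sub(/\/[12]$/, ""); print }'
}

# Compare the two ID streams; cmp -s prints nothing and only sets the exit status.
check_pairs() {
  if cmp -s <(read_ids "$1") <(read_ids "$2"); then
    echo "in sync"
  else
    echo "out of sync"
  fi
}
```

For example, `check_pairs R1C_1_val_1.fq.gz R1C_2_val_2.fq.gz`; note that it streams both files completely when they match, which takes a while for files of this size.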

        In conclusion, I am new to bisulfite analysis and really lost on how to proceed, so I am hoping that somebody here has more experience and can help me.

        Thanks in advance,

        Iraia


        • fkrueger
          Senior Member
          • Sep 2009
          • 627

          #34
          Hi there,

          The mm10 and GRCm38 genomes should have exactly the same sequence; there are only some minor differences (e.g. chromosomes are called chr1, chr2 rather than 1, 2, etc., chrM instead of MT, and so on).
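Because only the names differ, results produced against GRCm38 should be comparable with mm10-based RNA-seq or ChIP-seq results once the chromosomes are renamed. A minimal sketch, assuming a tab-separated file with the chromosome in the first column (e.g. a bedGraph) and handling only the primary mouse chromosomes; the function name `ensembl_to_ucsc` is hypothetical, and unplaced or random scaffolds, whose names differ in other ways, are left untouched:

```shell
#!/usr/bin/env bash
# Sketch: rename Ensembl/GRCm38-style chromosome names (1..19, X, Y, MT) in the
# first column of a tab-separated file to UCSC/mm10-style names (chr1..chr19,
# chrX, chrY, chrM). All other lines and names are passed through unchanged.
# Reads the files given as arguments, or stdin if none are given.
ensembl_to_ucsc() {
  awk 'BEGIN { FS = OFS = "\t" }
       $1 == "MT"                  { $1 = "chrM" }
       $1 ~ /^([1-9]|1[0-9]|X|Y)$/ { $1 = "chr" $1 }
       { print }' "$@"
}
```

For example, `ensembl_to_ucsc sample_GRCm38.bedGraph > sample_mm10_names.bedGraph` (file names are placeholders).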

          I have written up a few tips you could try out here: https://github.com/FelixKrueger/Bism...ite-seq-sample

          If you still struggle to find an answer, please feel free to send me a subset of your (gzip-compressed) raw sequences so I can take a look.

          Cheers, Felix


          • iramai
            Junior Member
            • Mar 2018
            • 6

            #35
            Hi Felix,
            Thanks for answering so fast.
            I have been testing some of the tips from the link you shared in your previous comment, but I can't obtain higher alignment rates.

            First, I changed the trim_galore parameters for trimming the raw data back to the standard ones. The program seems to detect some Illumina adapters; however, I also identified overrepresented sequences such as poly-A/G/T, so I trimmed those as well, always with the --paired parameter. After these additional trimming steps, the FastQC report shows read lengths between 20 and 150 bp, and I don't know whether this is correct or whether I need to fix the reads to a specific length (150 bp).
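A quick way to see the read-length distribution outside FastQC is sketched below (the function name `read_length_histogram` is hypothetical; it assumes a standard 4-line gzipped FASTQ). Variable read lengths are expected after adapter and quality trimming, and a floor of 20 bp matches Trim Galore's default `--length 20` cutoff, below which reads (with `--paired`, whole pairs) are discarded.

```shell
#!/usr/bin/env bash
# Sketch: print "length count" for every read length in a gzipped FASTQ file,
# sorted by length. The sequence is line 2 of each 4-line FASTQ record.
read_length_histogram() {
  gunzip -c "$1" | awk 'NR % 4 == 2 { n[length($0)]++ }
                        END { for (len in n) print len, n[len] }' | sort -n
}
```

For example, `read_length_histogram R1C_1_val_1.fq.gz` lists how many reads remain at each length.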

            Next, I tried some of your tips for the alignment step. With the default parameters in paired-end mode, the alignment rate was very low (0.09%), as I described in my first comment. Then I tried aligning in single-end mode: with R1 the alignment rate is again 0.09%, while with R2 it is much higher (47.04%). Now I am waiting for the paired-end alignment with the --score_min L,0,-0.6 parameter, as you recommended, but because the samples are so big the alignment takes around 10 days.
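For reference, `--score_min L,a,b` defines the minimum valid alignment score as a linear function of read length x, as in Bowtie 2's `--score-min`. Comparing Bismark's default of L,0,-0.2 with the suggested L,0,-0.6 for a 150 bp read:

```latex
\[
f(x) = a + b\,x, \qquad x = \text{read length}
\]
\[
\texttt{L,0,-0.2}:\; f(150) = 0 + (-0.2)(150) = -30
\qquad
\texttt{L,0,-0.6}:\; f(150) = 0 + (-0.6)(150) = -90
\]
```

Assuming Bowtie 2's default maximum mismatch penalty of 6 (`--mp 6,2`), that is roughly 5 versus 15 high-quality mismatches per read before any gap penalties; shorter trimmed reads get proportionally lower thresholds.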

            If it is not too much trouble, I would like to share some fragments of my raw samples with you and see what you think. For that I also need a little help: what command-line steps do I need to follow to cut out a subset of the samples so I can send it to you? I am pretty new to this sequencing world, so I will be very grateful for any kind of help.

            Thanks in advance,

            Iraia


            • fkrueger
              Senior Member
              • Sep 2009
              • 627

              #36
              Hi Iraia,

              I don't think you should start aligning the entire file until you know exactly what is the best way to do it. You are absolutely welcome to send me a few reads.

              If you are on Linux or on a Mac, you could use the following 2 commands:

              Code:
              gunzip -c file_R1.fastq.gz | head -800000 | gzip -c - > 200K_R1.fastq.gz
              gunzip -c file_R2.fastq.gz | head -800000 | gzip -c - > 200K_R2.fastq.gz
              This should then fit as an email attachment; you can use this email address.
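Each FASTQ record spans four lines, so `head -800000` keeps the first 200,000 reads of each file; because R1 and R2 are cut at the same record, the pairs stay matched as long as the original files were in sync. A small sketch to confirm the record count of the subsets before sending them (the function name `count_reads` is hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: count the records in a gzipped FASTQ file (4 lines per record).
count_reads() {
  gunzip -c "$1" | awk 'END { print NR / 4 }'
}
```

For example, `count_reads 200K_R1.fastq.gz` and `count_reads 200K_R2.fastq.gz` should each report 200000 when the source files hold at least that many reads.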


              • antonkermanov
                Junior Member
                • Aug 2020
                • 1

                #37
                Hi Felix,

                Sorry for writing here rather than in a separate thread, but I think my problem is related. Basically, I also had quite low mapping (40-50%) after end-to-end alignment. None of the approaches discussed here helped me, except aligning the unaligned reads in local mode (preserving the paired-end structure of the files). That gave me another 40-50% of reads aligned, i.e. at least 70% aligned reads in total. The problem comes with the automatic report generated after the Bismark run, with the estimates of methylation rates. The methylation levels in the different contexts are comparable, except for "methylation in unknown context", which increases several-fold (e.g. from 1-2% in end-to-end mode to 6-7% in local mode). As I work with plants, it is important to assess all methylation contexts precisely, and I would like your advice on interpreting this difference, as it is not clear to me mechanistically where it comes from. Is local alignment a valid approach given this difference in the methylation context? I would be grateful for your help!


                • fkrueger
                  Senior Member
                  • Sep 2009
                  • 627

                  #38
                  Originally posted by antonkermanov (see #37 above)
                  Hi Anton,

                  The methylation rate in Unknown context is really only for your information, but as these calls cannot be assigned a specific context they are just discarded in the methylation extraction step. I wouldn't read anything into these values, and certainly not let them stop you proceeding with downstream analysis.
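For background on what lands in that bin: Bismark's methylation call legend describes the unknown context as CN or CHN (H = A, C or T; N = an undetermined base), i.e. cytosines whose downstream bases do not allow a CpG, CHG or CHH assignment. The following is a simplified sketch of that classification under stated assumptions (uppercase input, the two genomic bases 3' of the cytosine on the same strand, hypothetical function name `cytosine_context`); it is not Bismark's code and does not by itself explain why local alignment raises the share of such calls.

```shell
#!/usr/bin/env bash
# Simplified sketch, not Bismark's implementation: classify a cytosine by the
# two uppercase genomic bases that follow it on the same strand.
cytosine_context() {
  case "$1" in
    G*)          echo "CpG" ;;       # C followed by G
    [ACT]G)      echo "CHG" ;;       # C, then A/C/T, then G
    [ACT][ACT])  echo "CHH" ;;       # C, then two non-G bases
    *)           echo "unknown" ;;   # CN, CHN, or too few bases given
  esac
}
```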

                  Before using local alignments, we tend to recommend doing a few things summarised here: https://github.com/FelixKrueger/Bism...ite-seq-sample

                  I would also be happy to briefly look at your data to see whether special trimming etc. could help. If you would like to take me up on this offer, just send me a few reads (raw, untrimmed, gzipped FastQ) via email, and I could run a few tests.

                  Cheers, Felix

