  • mixter
    Member
    • May 2010
    • 22

    Bisulfite sequencing - filtering by min. conversion rate

    Hi,

    I am looking into filtering bisulfite reads by a minimum conversion rate, something high like 95% in both CpG and non-CpG context. I've been working with Bismark and really like it.

    I know that I get conversion rates in the end results, but I would like to also filter individual reads by conversion rate, either before or after methylation calling / mapping.

    Are there any available tools that could do it, or otherwise a suggested way to do this?

    Many thanks!
  • simonandrews
    Simon Andrews
    • May 2009
    • 870

    #2
    It should be fairly simple to do this by parsing the methylation call string in each line of the Bismark output, but I would also ask why you want to do this. I'm aware that some groups have applied this filter in the past, though only ever in non-CpG context; requiring full conversion in CpG context would definitely be a mistake. That was done under the assumption that there is effectively no non-CpG methylation, which increasingly appears not to be the case. By removing highly (or even moderately) methylated reads you run the risk of biasing your results and potentially removing interesting data. If non-conversion of specific reads does happen in your library, it should only be a problem if it's targeted in some way; otherwise a few randomly distributed methylated base calls shouldn't bias your results too much.
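For illustration only, here is a minimal sketch of the parsing mentioned above. It assumes the per-read methylation call string uses Bismark's letter encoding (z/Z for CpG, x/X for CHG, h/H for CHH, lowercase for unmethylated and therefore converted, uppercase for methylated, '.' for positions that are not cytosines) and that, in SAM output, the string is carried in the XM:Z: tag. The function names, the 95% threshold and the choice to keep reads with no informative cytosines are hypothetical, not part of Bismark. In line with the caveat above, the rate is computed in non-CpG context only.

```python
def methylation_call_from_sam(sam_line):
    """Return the methylation call string from the XM:Z: tag of a SAM line, or None."""
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:  # optional tags start at column 12
        if field.startswith("XM:Z:"):
            return field[5:]
    return None


def conversion_rate(call_string, contexts="xh"):
    """Fraction of cytosines in the given contexts that were called unmethylated.

    `contexts` lists the lowercase call letters to consider; the default "xh"
    covers CHG and CHH, i.e. non-CpG only. Returns None when the read has no
    cytosines in those contexts.
    """
    lower = set(contexts.lower())
    upper = set(contexts.upper())
    converted = sum(1 for c in call_string if c in lower)
    unconverted = sum(1 for c in call_string if c in upper)
    total = converted + unconverted
    if total == 0:
        return None
    return converted / total


def passes_non_cpg_filter(call_string, min_rate=0.95):
    """True if the read meets the (assumed) minimum non-CpG conversion rate.

    Reads without any non-CpG cytosines are kept, since they give no evidence
    either way; that choice is an assumption of this sketch.
    """
    rate = conversion_rate(call_string)
    return rate is None or rate >= min_rate
```

A read with call string `..h.x..H..z.Z` has two converted and one unconverted non-CpG cytosine, so its non-CpG rate is 2/3 and it would fail a 95% cutoff; the two CpG calls are ignored.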

    What we do filter for in our analyses is regions which show unusually high coverage. Mismapping of repetitive regions does happen, and can produce odd results, but this type of filtering removes a region of the genome from the analysis, rather than individual reads.
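One possible way to sketch the coverage-based filtering described above is to count reads in fixed-size windows and flag windows whose count is far above the typical value. The post does not say how "unusually high" is defined, so the window size, the use of the median over covered windows, and the fold cutoff below are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import median


def window_counts(read_positions, window=1000):
    """Count reads per (chromosome, window index) from (chrom, start) pairs."""
    counts = Counter()
    for chrom, pos in read_positions:
        counts[(chrom, pos // window)] += 1
    return counts


def high_coverage_windows(counts, fold=10.0):
    """Return windows whose read count exceeds `fold` times the median count.

    The median is taken over windows that contain at least one read; both
    that choice and the default fold value are assumptions of this sketch.
    """
    if not counts:
        return set()
    cutoff = fold * median(counts.values())
    return {win for win, n in counts.items() if n > cutoff}


# Example: three sparse windows and one pile-up of 50 reads on chr1.
reads = [("chr1", p) for p in (10, 1500, 2500)]
reads += [("chr1", 5000 + i) for i in range(50)]
flagged = high_coverage_windows(window_counts(reads))
```

Flagged windows would then be excluded from the analysis as whole regions, which matches the point above that this filter removes parts of the genome rather than individual reads.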


    • mixter
      Member
      • May 2010
      • 22

      #3
      Dear Simon,

      What you wrote here made a lot of sense, and I have since adopted the practice of not filtering reads to correct for incomplete conversion. There are several options I'm exploring instead, including correcting observed methylation levels and filtering for unusual coverage peaks. In future papers I'd like to argue for this approach, and to make the case that it does not make sense to throw away significant amounts of data because of suspected low conversion.

      Are there any studies that are helpful to support the approach of not filtering for highly converted reads? Otherwise, would you perhaps agree with a reference to our correspondence?

      Thanks


      • simonandrews
        Simon Andrews
        • May 2009
        • 870

        #4
        I'm not aware of any studies which have looked in a systematic way at the dynamics of bisulphite conversion, so I'm not sure there's a fixed conclusion one way or the other. We therefore stick with the more conservative approach of not biasing our data by systematically removing parts of it which we have no actual evidence are wrong.

        Removing overrepresented sequences is easily justified since we know that we can't get that many correct sequences from a region - therefore we must be mis-measuring our data in those regions and we can therefore ignore them.
