Hi,
I'm looking for references on the expected and acceptable mapping efficiency (i.e. the % of reads that can be mapped) in different BS-Seq scenarios. I'm asking because of my own experiments, but this should also be of general interest. So far I have seen few definitive statements about this in BS-seq papers.
100% efficiency can't be expected, since bisulfite treatment always degrades the DNA at least a bit. Furthermore, there are differences between genome-wide and RRBS data, e.g. due to the proportion of repeats and ambiguously mapping reads.
One recent publication from the Babraham Institute seems to indicate that 80-90% mapping efficiency can be routinely expected for BS-seq in base space (Fig 2b of "DNA methylome analysis using short bisulfite sequencing data", http://www.nature.com/nmeth/journal/...meth.1828.html). Did I understand that right, or was this based on simulated/ideal data after all?
But I also found a post by Felix Krueger stating that 68% mapping efficiency is already fair for BS-Seq paired-end (http://seqanswers.com/forums/showthr...?t=8140&page=3).
About paired-end: as I understand it, mapping efficiency is usually slightly lower than for single-end because both mates of a pair need to align acceptably. It would also be interesting to know whether there are BS-seq-specific differences in mapping efficiency between single- and paired-end data.
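To be explicit about the numbers I'm comparing, here is a minimal Python sketch of how I'd compute mapping efficiency from read counts. The function names and the counts in the usage lines are hypothetical and purely illustrative, not taken from any aligner or dataset. Whether "mapped" means uniquely aligned reads only is left to the caller. The second function rests on a strong assumption (each mate maps independently with the same probability), which real mates probably don't satisfy, so treat it as a rough illustration of the "both mates must map" argument rather than a prediction.

```python
def mapping_efficiency(mapped_reads: int, total_reads: int) -> float:
    """Percentage of input reads (or read pairs) that were aligned."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must lie between 0 and total_reads")
    return 100.0 * mapped_reads / total_reads


def naive_pair_efficiency(single_end_percent: float) -> float:
    """Rough pair-level estimate, ASSUMING both mates map independently
    with the same single-end efficiency (a simplification)."""
    fraction = single_end_percent / 100.0
    return 100.0 * fraction * fraction


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(mapping_efficiency(mapped_reads=680, total_reads=1000))
    print(naive_pair_efficiency(80.0))
```

Under that independence assumption the pair-level figure is always at or below the single-end one, which would fit the lower paired-end numbers, but I'd like to know what is seen in practice.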