Hello,
I'm working with circular consensus reads from a PCR product that may contain heteroduplexes, because the PCR template was a mix of minor variants. In the final annealing steps, strands amplified from different templates can anneal to each other if they are similar enough. I'm expecting that a small percentage of the CCS reads may come from molecules whose two strands differ, even though the PacBio 'Reads of Insert' method assumes they are identical.
The simplest way I could think of to deal with this is to look at the FASTQ CCS reads and filter them on the Phred scores they contain. Most CCS reads should be very high accuracy, but in a heteroduplex roughly half of the subreads would come from each strand. At positions where the two strands differ, the subreads would disagree about evenly, so the consensus base there should have a high probability of error (i.e., a low Phred score).
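A minimal sketch of that filter, assuming plain 4-line FASTQ records with ASCII offset 33 (the usual Sanger encoding). All function names and the thresholds `min_q=20` and `max_low=0` are hypothetical and chosen only for illustration. The code simply flags reads that contain any base below `min_q`; it does not model PacBio's internal CCS error model, and whether the consensus caller actually assigns low QVs at heteroduplex positions is exactly the open question here.

```python
import io
from typing import Iterator, List, TextIO, Tuple

# Assumption: Sanger/Illumina 1.8+ quality encoding (ASCII offset 33).
PHRED_OFFSET = 33


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) tuples from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:].strip(), seq, qual


def phred_scores(qual: str) -> List[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P_err), so P_err = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def low_quality_positions(qual: str, min_q: int = 20) -> List[int]:
    """Return 0-based positions whose Phred score is below min_q."""
    return [i for i, q in enumerate(phred_scores(qual)) if q < min_q]


def flag_possible_heteroduplexes(
    handle: TextIO, min_q: int = 20, max_low: int = 0
) -> List[Tuple[str, List[int]]]:
    """Hypothetical filter: report reads with more than max_low bases below min_q."""
    flagged = []
    for name, _seq, qual in read_fastq(handle):
        low = low_quality_positions(qual, min_q)
        if len(low) > max_low:
            flagged.append((name, low))
    return flagged


if __name__ == "__main__":
    # Toy input: 'I' decodes to Q40, '#' decodes to Q2.
    fastq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\nII#I\n")
    print(flag_possible_heteroduplexes(fastq))  # [('read2', [2])]
```

Reporting the low-quality positions, not just a pass/fail flag, makes it possible to check whether flagged positions line up with known variant sites between the templates, which would distinguish a heteroduplex from a read that is simply low-pass overall.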
My question is how the PacBio CCS error model works, and whether my strategy would work. I ask because I know there are many components of the PacBio error model that are consolidated into the FASTQ phred score, so it may not be as simple as I think.
Thanks!