Good morning, everyone!
I am using bcftools for genome assembly and for extracting consensus sequences of viral genomes. First, I noticed that bcftools consensus does not take depth into account. At some sites one nucleotide is supported at a depth of 1000X and another at only 10X, yet if the 10X nucleotide matches the reference, bcftools puts the reference base in the consensus.
When I looked more closely, I ran into an even more puzzling issue. At some sites, the BAM file (viewed in IGV or Tablet) shows no reads carrying the reference nucleotide at that position, yet bcftools still inserts the reference nucleotide into the consensus. I traced back through the intermediate files to find where this nucleotide, which is absent from the BAM but present in the consensus, comes from. The reference-matching nucleotide appears to be present already in the mpileup output.
I am sending some example files. Please look at positions 21618 and 21622. I am also sending the reference genome FASTA file and the consensus sequences generated with both bcftools and iVar. iVar seems to resolve the issue.
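To make clear what I expected, here is a minimal Python sketch of a depth-aware consensus call. It assumes the per-position base counts have already been extracted from the BAM. The function names (`call_site`, `depth_aware_consensus`), the thresholds (`min_depth`, `min_fraction`) and the toy counts are all hypothetical and are not taken from my data; this is not meant to describe how bcftools or iVar is implemented.

```python
def call_site(counts, min_depth=10, min_fraction=0.5):
    """Return the majority base at one site, or 'N' if support is too weak.

    counts: dict mapping base -> number of reads showing that base.
    min_depth and min_fraction are illustrative thresholds (assumptions).
    """
    depth = sum(counts.values())
    # No coverage, or coverage below the threshold: do not fall back to the reference.
    if depth == 0 or depth < min_depth:
        return "N"
    base, support = max(counts.items(), key=lambda kv: kv[1])
    # Require the top base to be carried by more than min_fraction of the reads.
    if support / depth <= min_fraction:
        return "N"
    return base


def depth_aware_consensus(pileup, length, min_depth=10, min_fraction=0.5):
    """Build a consensus string for 1-based positions 1..length.

    pileup: dict mapping position -> base-count dict; missing positions count as uncovered.
    """
    return "".join(
        call_site(pileup.get(pos, {}), min_depth, min_fraction)
        for pos in range(1, length + 1)
    )


# Toy input (made-up counts): a well-covered site, a low-depth site,
# a mixed site with a clear majority, and an uncovered site.
toy_pileup = {
    1: {"A": 1000},
    2: {"C": 3},
    3: {"G": 600, "T": 400},
}
consensus = depth_aware_consensus(toy_pileup, length=4)
print(consensus)
```

With a rule like this, a site is called from what the reads show, and a site without enough read support becomes N instead of silently taking the reference base.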
Best regards
PS: I was unable to attach the files