No announcement yet.

Samtools mpileup/bcftools

  • Filter
  • Time
  • Show
Clear All
new posts

  • Samtools mpileup/bcftools

    Hi all,
    I read through samtools manuals several times, but I'm still not clear on how exactly samtools & bcftools decide to call a SNP. I've tried to run through multiple combination of arguments with mpileup (-B, -C, -q, etc) & bcftools, but still ran into the problem below. I even ran bcftools view on the bcf file without the varFilter step, but the problem persists.

    I have 2 samples, an original & an "evolved" cell line. Based on numerous runs, I found that there are many SNPs being called only in the "evolved" cell line but not on the original, making it look like they're "novel" SNP. However, when I view them on IGV, I can see the SNP in the original cell line and there don't seem to be significant differences between the mapping quality or base quality at the SNP position in these 2 samples.
    It's not important to me if reads below a certain mapping quality don't get count, but the trouble is that it seems to be inconsistent. In one sample, SNPs on reads having mapping quality of 0 don't get count, but then they would get counted in the other sample, making it difficult to identify the true novel SNP. Is there anyway to force the SNP count to be more consistent?

  • #2

    I'm also having a similar problem! I have PCR confirmed SNPs that I can see in IGV when I look at the BAM file, however they are not present after varFilter. I'm guessing that there are parameters that I need to change to include them, but there doesn't seem to be any trend in map quality or coverage that I can distinguish ones that are called vs. missed in the vcf output.

    I mapped my PE Illumina reads with BWA and then running samtools mpileup:
    $ samtools mpileup -uf x.fasta Initial-sorted.bam | bcftools/bcftools view -bvcg - > initial.raw.bcf

    Then ran:
    $bcftools view initial.raw.bcf | varFilter -D500 > initial_snps.vcf

    I've tried adding -E switch to the samtools mpileup and get alot more SNPs, however this still does not include the SNPs confirmed by PCR. Also tried increasing the -D value when on the bcftools commands - again increases SNP count, but still not ones I know of.

    Please help! Not sure what else to do....


    • #3
      Try VarScan

      It takes the 'mpileup -f reference' output (piped or in a file) and then calls variants based on statistical analysis. Their paper does a good job at describing the algorithm. Plus the command line parameters are much more straightforward.

      I hope this helps!


      • #4
        Thanks so much! The VarScan results seemed to agree much better with my pcr data!