No announcement yet.

finding the base at a position in a bam

  • Filter
  • Time
  • Show
Clear All
new posts

  • finding the base at a position in a bam

    Hi All,

    I have seen some other threads on this subject, sorry I don't have the links. But none of them seem to resolve the problem. I am trying to do something very simple but no tools for bam files seem to have the facility. I have tried samtools and GATK to no avail.

    All I need is to get the consensus read base at particular positions of the reference in a bam file. I am trying to check the state of SNPs from a particular bacterial SNP typing scheme in my NGS data. Not all of the positions will actually be a SNP, in which cae they have the same base as the reference, and I want that reported, so .vcf is no good to me.

    Greatly appreciate any suggestions. I suppose I can just use samtools to make the vcf and then if a position is not listed then assume it is the reference base. I do find samtools almost impenetrable.. would this line be appropriate?

    samtools mpileup -I -uBf ref.fasta 1_sorted.bam | bcftools view -bcg - > 1.bcf && bcftools view 1.bcf > 1.vcf

    but I don't understand the output vcf file because it seems to have a '.' for the reference base and then the proper A T C or G for the 'alt' base(s).

    Thanks for any help,


  • #2
    I do think the mpileup command is what you want. I am not sure what your pipeline is making, but it sounds like the pileup format (. or , for the reference base with ATCG with alternates).
    You can output a vcf with genotype compared to the reference for each position:
    samtools mpileup -gu -f ref.fasta sorted.bam | bcftools call -c - > all.vcf
    The genotype will be 0/0 for homozygous reference, 0/1 for heterozygous with the alternate listed allele, and 1/1 for homozygous alternate.

    bcftools call -cv would output the variants only.

    You might want to use bcftools consensus to create a modified fasta file with your variants:
    Providing nextRAD genotyping and PacBio sequencing services.


    • #3
      Thanks SNPsaurus, I will try that. But I've now found that the published SNP locations I have been using have mistakes so have to go back a step..



      • #4
        samtools mpileup should also take as a commend line entry a position, or a bed file of positions, so you don't need to generate a line for every base in the genome.