Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • alma
    Junior Member
    • Jun 2011
    • 2

    Problem with samtools/bcftools

    Hi

    I am trying to call SNPs and indels from some bam files and I tried the following commands as given in the samtools/bcftools manual:

    samtools mpileup -uf reference.fa aln1.bam | bcftools view -bvcg - > var.raw.bcf
    bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

    However after running the commands I see that var.raw.bcf and var.flt.vcf files have results from only the first chromsome that is present in the bam file. I also get the following message:

    [mpileup] 1 samples in 1 input files
    <mpileup> Set max per-file depth to 8000
    [afs] 0:16349.158 1:6949.156 2:48149.686

    In my bam file Chr10 appears first followed by Chr11, 12, ...19, 1, 20, 21..., 2 etc. The reads, however, are sorted within the chromosomes. Just to make sure that the problem is not due to such arrangement of the chromsomes, I extracted Chr1, 2 and 3 and created a new bam file and sorted it using following commands.

    samtools view -bh aln1.bam Chr1 Chr2 Chr3 -o Chr1to3.bam
    samtools sort Chr1to3.bam Chr1to3sorted

    Then ran the same command as before for SNP/Indel analysis. But the problem exists.

    Can any of you please suggest what might be the possible problems/solutions?

    Thanks in advance

    ALM
  • Heisman
    Senior Member
    • Dec 2010
    • 534

    #2
    Try running it without chromosome 1 and see what happens. Make sure the headers for all files used are complete and the reference sequence file is complete as well.

    Comment

    • amruta.bn
      Junior Member
      • Jan 2012
      • 6

      #3
      Hi,

      I have generated a VCF file using the command

      samtools faidx input.reference.fa
      samtools index sortedhg19.bam
      samtools mpileup -d 8000 -uf input.reference.fa my.sortedhg19.bam | bcftools view -bvcg -> my.raw.bcf
      bcftools view my.raw.bcf | perl vcfutils.pl varFilter -D8000 > my.flt.vcf

      But when i validated my VCF file using vcf-validator of VCFTools and i am getting the warning message as given below

      The header tag 'reference' not present. (Not required but highly recommended.)
      The header tag 'contig' not present for CHROM=chr1. (Not required but highly recommended.)
      INFO field at chr1:10007 .. Could not validate the float [-nan]

      Can any one please help me to fix this.I think only the third one is critical.

      Regards,
      Amruta

      Comment

      • Heisman
        Senior Member
        • Dec 2010
        • 534

        #4
        Could you post that entire line in the file?

        Comment

        • amruta.bn
          Junior Member
          • Jan 2012
          • 6

          #5
          ##fileformat=VCFv4.1
          ##samtoolsVersion=0.1.18 (r982:295)
          ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
          ##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
          ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
          ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
          ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
          ##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
          ##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
          ##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
          ##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
          ##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
          ##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
          ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
          ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
          ##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
          ##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
          ##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
          ##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
          ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
          ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
          ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
          ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
          ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
          ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
          ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
          #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 29dec/sorted.29dec.hg19.bam.bam
          chr1 10007 . T C 6.98 . DP=127;VDB=-nan;AF1=0.5001;AC1=1;DP4=1,3,0,7;MQ=20;FQ=5.64;PV4=0.36,1,1,1 GT:PL:GQ 0/1:36,0,33:34
          chr1 10186 . T A 6.98 . DP=140;VDB=-nan;AF1=0.4999;AC1=1;DP4=2,8,5,7;MQ=20;FQ=9.53;PV4=0.38,1,1,1 GT:PL:GQ 0/1:36,0,76:37
          chr1 10549 . T A 9.52 . DP=14;VDB=-nan;AF1=0.5;AC1=1;DP4=2,1,1,2;MQ=20;FQ=9.52;PV4=1,1,1,1 GT:PL:GQ 0/1:39,0,39:39
          chr1 12613 . G C 6.02 . DP=6;VDB=-nan;AF1=1;AC1=2;DP4=0,0,0,2;MQ=20;FQ=-33 GT:PL:GQ 1/1:36,6,0:6
          chr1 13275 . G A,C 3.41 . DP=9;VDB=-nan;AF1=1;AC1=2;DP4=0,0,1,2;MQ=20;FQ=-33 GT:PL:GQ 1/1:43,17,11,31,0,28:4
          chr1 13407 . T A 6.02 . DP=10;VDB=-nan;AF1=1;AC1=2;DP4=0,0,0,2;MQ=20;FQ=-33 GT:PL:GQ 1/1:36,6,0:6
          chr1 13411 . C A 5.29 . DP=10;VDB=-nan;AF1=1;AC1=2;DP4=0,0,1,2;MQ=20;FQ=-33 GT:PL:GQ 1/1:35,6,0:6
          chr1 16259 . T A 6.02 . DP=8;VDB=-nan;AF1=1;AC1=2;DP4=0,0,2,0;MQ=20;FQ=-33 GT:PL:GQ 1/1:36,6,0:6
          chr1 17216 . T G 3.64 . DP=6;VDB=-nan;AF1=0.5205;AC1=1;DP4=1,0,1,1;MQ=20;FQ=-16.1;PV4=1,0.03,1,0.44 GT:PL:GQ 0/1:31,0,11:18
          chr1 18712 . T A 3.54 . DP=15;VDB=-nan;AF1=0.4998;AC1=1;DP4=3,1,2,3;MQ=20;FQ=5.35;PV4=0.52,0.042,1,1 GT:PL:GQ 0/1:31,0,45:30
          chr1 20700 . C G 3.64 . DP=9;VDB=-nan;AF1=0.5205;AC1=1;DP4=1,0,1,1;MQ=20;FQ=-16.1;PV4=1,1,1,0.28 GT:PL:GQ 0/1:31,0,11:18
          chr1 20794 . C A 3.64 . DP=17;VDB=-nan;AF1=0.5205;AC1=1;DP4=1,0,1,1;MQ=20;FQ=-16.1;PV4=1,0.056,1,0.2 GT:PL:GQ 0/1:31,0,11:18
          chr1 21113 . T A 3.64 . DP=9;VDB=-nan;AF1=0.5205;AC1=1;DP4=0,1,1,1;MQ=20;FQ=-16.1;PV4=1,0.33,1,1 GT:PL:GQ 0/1:31,0,11:18
          chr1 21130 . C A 7.39 . DP=11;VDB=-nan;AF1=0.5718;AC1=1;DP4=0,1,2,3;MQ=20;FQ=-21;PV4=1,0.023,1,0.44 GT:PL:GQ 0/1:36,0,6:10
          chr1 21186 . G A 6.02 . DP=9;VDB=-nan;AF1=1;AC1=2;DP4=0,0,1,1;MQ=20;FQ=-33 GT:PL:GQ 1/1:36,6,0:6
          chr1 21253 . G A 8.64 . DP=24;VDB=-nan;AF1=0.5;AC1=1;DP4=5,2,3,1;MQ=20;FQ=11.3;PV4=1,0.1,1,0.1 GT:PL:GQ 0/1:38,0,76:40
          chr1 21300 . C A 3.54 . DP=19;VDB=-nan;AF1=0.4998;AC1=1;DP4=4,2,1,2;MQ=20;FQ=5.47;PV4=0.52,0.34,1,1 GT:PL:GQ 0/1:31,0,70:30
          chr1 21415 . C A 5.46 . DP=16;VDB=-nan;AF1=0.4999;AC1=1;DP4=3,2,1,2;MQ=20;FQ=7.77;PV4=1,0.41,1,0.26 GT:PL:GQ 0/1:34,0,55:34
          chr1 21564 . C A 5.29 . DP=4;VDB=-nan;AF1=1;AC1=2;DP4=0,0,2,1;MQ=20;FQ=-33 GT:P



          This is the VCF file.Hope you asked for vcf file .
          Is there any way to format reference fasta sequence?
          i have downloaded individual chromosomes from 1,2...22,X and y and joined into a single file in two orders like 1,11,12,13 ... and 1,2,3,4,.. .But in both cases this error is coming after validation

          Comment

          Latest Articles

          Collapse

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by SEQadmin2, 06-09-2026, 11:58 AM
          0 responses
          14 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-05-2026, 10:09 AM
          0 responses
          26 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-04-2026, 08:59 AM
          0 responses
          36 views
          0 reactions
          Last Post SEQadmin2  
          Started by SEQadmin2, 06-02-2026, 12:03 PM
          0 responses
          60 views
          0 reactions
          Last Post SEQadmin2  
          Working...