Hi,
I'm having a problem calling a variant with NGS which has been confirmed via Sanger sequencing.
I'm using samtools mpileup followed by VarScan.
Bit of background
The lab are doing resequencing of human candidate genes to identify rare mutations.
Target enrichment is performed via HaloPlex, sequencing is done on a MiSeq with 150bp paired end reads.
I am aligning reads with BWA using default PE settings againts the entire human genome
(I have tried the UCSC, 1KG, and MiSeq ref genomes - all the same result)
The problem
In the pileup (generated with or without BAQ computation) there is a ~20bp region 'missing'
Example commands
samtools mpileup -f ref.fa test.bam > test_BAQ.pileup
samtools mpileup -Bf ref.fa test.bam > test_NO_BAQ.pileup
From positions chr1:76740134 to chr1:76740154 are not present in the pileup?
The bases flanking the 'missing' region are marked by ^ (start of read segment) and $ (end of read segment).
However, when I look in IGV there are plenty reads covering the 'missing' region and the variant I know is there is there clear as day! It just doesn't make it into the pileup?!
Any help / advice about what is going on here would be much appreciated
BW
Chris
Example pileup
chr1 76740129 T 51 ................................................... FF:FGBFG?FGFFEGBF6F??GBGE@FB5DF?FGDFBBFGGFD.FF?G?FG
chr1 76740130 G 51 ................................................... GF@FGFDFBFFFFEG?DBFBBGDF2DGFBFGDFGFFFDFGFFFDFF?FBFG
chr1 76740131 G 51 ...$................................................ GG9,GFDFFFGFBEGDB?FBDGFG;EGD,F>BGGBFGFFFGGDFFFBGDFG
chr1 76740132 A 50 .................................................. GG?FFDGBGFFB@G>DFFBFGFG@DGFBG?;GG?GFFFFFGFFFG>F;GF
chr1 76740133 A 50 .................................................. GGFFFFFDBFDD@G>F?FBBFF>@DGF>G??FFBFGBFGBFGFFG?>,FB
chr1 76740134 T 50 .$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$ >>>>>>>>>>>>9>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
chr1 76740154 A 42 ^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^], B=ECAACCAEBA@BEB=;C?CCBCC;>ECAECC==>>
chr1 76740155 T 42 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, EEEGEG=GBGBD@EGEBEE;EDEGGEDEEBGDEBEGBEDDGD
chr1 76740156 A 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,^],^],^],^],^],^],^],^],^], FDDGFGDGEGEG@FGEBEGDGEEGGEGEEFGEGDFGGEDFEF@BBB@BB>9
chr1 76740157 A 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, FBDGFG4GEGDEEFGEDDGEGFFGGEGEEFGEGEFGGDDEGF@;DEDE@DD
chr1 76740158 T 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, FBBGFG;GGG,EEFEEDBGDGFFGGEGBFEEEGEFGGF=DGF**DD9E@BD
chr1 76740159 A 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, FEEGFGBGGGDDEEGE=BG;GEEDEEGFFEDEEEEEGFEEGFED@99BE>9
I'm having a problem calling a variant with NGS which has been confirmed via Sanger sequencing.
I'm using samtools mpileup followed by VarScan.
Bit of background
The lab are doing resequencing of human candidate genes to identify rare mutations.
Target enrichment is performed via HaloPlex, sequencing is done on a MiSeq with 150bp paired end reads.
I am aligning reads with BWA using default PE settings againts the entire human genome
(I have tried the UCSC, 1KG, and MiSeq ref genomes - all the same result)
The problem
In the pileup (generated with or without BAQ computation) there is a ~20bp region 'missing'
Example commands
samtools mpileup -f ref.fa test.bam > test_BAQ.pileup
samtools mpileup -Bf ref.fa test.bam > test_NO_BAQ.pileup
From positions chr1:76740134 to chr1:76740154 are not present in the pileup?
The bases flanking the 'missing' region are marked by ^ (start of read segment) and $ (end of read segment).
However, when I look in IGV there are plenty reads covering the 'missing' region and the variant I know is there is there clear as day! It just doesn't make it into the pileup?!
Any help / advice about what is going on here would be much appreciated
BW
Chris
Example pileup
chr1 76740129 T 51 ................................................... FF:FGBFG?FGFFEGBF6F??GBGE@FB5DF?FGDFBBFGGFD.FF?G?FG
chr1 76740130 G 51 ................................................... GF@FGFDFBFFFFEG?DBFBBGDF2DGFBFGDFGFFFDFGFFFDFF?FBFG
chr1 76740131 G 51 ...$................................................ GG9,GFDFFFGFBEGDB?FBDGFG;EGD,F>BGGBFGFFFGGDFFFBGDFG
chr1 76740132 A 50 .................................................. GG?FFDGBGFFB@G>DFFBFGFG@DGFBG?;GG?GFFFFFGFFFG>F;GF
chr1 76740133 A 50 .................................................. GGFFFFFDBFDD@G>F?FBBFF>@DGF>G??FFBFGBFGBFGFFG?>,FB
chr1 76740134 T 50 .$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$ >>>>>>>>>>>>9>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
chr1 76740154 A 42 ^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^], B=ECAACCAEBA@BEB=;C?CCBCC;>ECAECC==>>
chr1 76740155 T 42 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, EEEGEG=GBGBD@EGEBEE;EDEGGEDEEBGDEBEGBEDDGD
chr1 76740156 A 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,^],^],^],^],^],^],^],^],^], FDDGFGDGEGEG@FGEBEGDGEEGGEGEEFGEGDFGGEDFEF@BBB@BB>9
chr1 76740157 A 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, FBDGFG4GEGDEEFGEDDGEGFFGGEGEEFGEGEFGGDDEGF@;DEDE@DD
chr1 76740158 T 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, FBBGFG;GGG,EEFEEDBGDGFFGGEGBFEEEGEFGGF=DGF**DD9E@BD
chr1 76740159 A 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, FEEGFGBGGGDDEEGE=BG;GEEDEEGFFEDEEEEEGFEEGFED@99BE>9
Comment