  • cboustred
    Member
    • May 2010
    • 11

    Samtools mpileup - start and end segments

    Hi,

    I'm having a problem calling a variant with NGS which has been confirmed via Sanger sequencing.

    I'm using samtools mpileup followed by VarScan.

    Bit of background
    The lab are doing resequencing of human candidate genes to identify rare mutations.

    Target enrichment is performed via HaloPlex; sequencing is done on a MiSeq with 150 bp paired-end reads.

    I am aligning reads with BWA using default PE settings against the entire human genome
    (I have tried the UCSC, 1KG, and MiSeq ref genomes; all give the same result).

    The problem

    In the pileup (generated with or without BAQ computation) there is a ~20 bp region 'missing'.

    Example commands
    samtools mpileup -f ref.fa test.bam > test_BAQ.pileup
    samtools mpileup -Bf ref.fa test.bam > test_NO_BAQ.pileup

    The positions between chr1:76740134 and chr1:76740154 are not present in the pileup.

    The bases flanking the 'missing' region are marked by ^ (start of read segment) and $ (end of read segment).

    However, when I look in IGV there are plenty of reads covering the 'missing' region, and the variant I know is there is clear as day! It just doesn't make it into the pileup.

    Any help / advice about what is going on here would be much appreciated.

    BW

    Chris

    Example pileup
    chr1 76740129 T 51 ................................................... FF:FGBFG?FGFFEGBF6F??GBGE@FB5DF?FGDFBBFGGFD.FF?G?FG
    chr1 76740130 G 51 ................................................... GF@FGFDFBFFFFEG?DBFBBGDF2DGFBFGDFGFFFDFGFFFDFF?FBFG
    chr1 76740131 G 51 ...$................................................ GG9,GFDFFFGFBEGDB?FBDGFG;EGD,F>BGGBFGFFFGGDFFFBGDFG
    chr1 76740132 A 50 .................................................. GG?FFDGBGFFB@G>DFFBFGFG@DGFBG?;GG?GFFFFFGFFFG>F;GF
    chr1 76740133 A 50 .................................................. GGFFFFFDBFDD@G>F?FBBFF>@DGF>G??FFBFGBFGBFGFFG?>,FB
    chr1 76740134 T 50 .$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$.$ >>>>>>>>>>>>9>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    chr1 76740154 A 42 ^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^],^], B=ECAACCAEBA@BEB=;C?CCBCC;>ECAECC==>>
    chr1 76740155 T 42 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, EEEGEG=GBGBD@EGEBEE;EDEGGEDEEBGDEBEGBEDDGD
    chr1 76740156 A 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,^],^],^],^],^],^],^],^],^], FDDGFGDGEGEG@FGEBEGDGEEGGEGEEFGEGDFGGEDFEF@BBB@BB>9
    chr1 76740157 A 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, FBDGFG4GEGDEEFGEDDGEGFFGGEGEEFGEGEFGGDDEGF@;DEDE@DD
    chr1 76740158 T 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, FBBGFG;GGG,EEFEEDBGDGFFGGEGBFEEEGEFGGF=DGF**DD9E@BD
    chr1 76740159 A 51 ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, FEEGFGBGGGDDEEGE=BG;GEEDEEGFFEDEEEEEGFEEGFED@99BE>9
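
The gap described in this post can be located programmatically. The following is a minimal sketch, assuming whitespace-separated pileup lines whose first two columns are chromosome and 1-based position, and assuming (as is the default without `-a`) that mpileup simply omits positions with no reads passing its filters. The function name `find_pileup_gaps` is hypothetical, and the example input is the pileup above truncated to its first four columns.

```python
def find_pileup_gaps(lines):
    """Return (chrom, first_missing, last_missing) for each run of absent positions."""
    gaps = []
    prev_chrom, prev_pos = None, None
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        chrom, pos = fields[0], int(fields[1])
        # A jump of more than one position on the same chromosome is a gap.
        if chrom == prev_chrom and pos > prev_pos + 1:
            gaps.append((chrom, prev_pos + 1, pos - 1))
        prev_chrom, prev_pos = chrom, pos
    return gaps


example = [
    "chr1 76740133 A 50",
    "chr1 76740134 T 50",
    "chr1 76740154 A 42",
    "chr1 76740155 T 42",
]
print(find_pileup_gaps(example))  # [('chr1', 76740135, 76740153)]
```

Running this over a whole pileup file (for example `find_pileup_gaps(open("test_BAQ.pileup"))`) would list every skipped interval, which can then be compared against the coverage seen in IGV.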
  • cboustred
    Member
    • May 2010
    • 11

    #2
    OK, I have found a workaround.

    If I use the mpileup -A flag (count anomalous read pairs), the region is included in the pileup.

    BW

    Chris
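
For context on why `-A` matters: per the samtools documentation, mpileup by default skips reads that are flagged as paired but lack the properly-paired flag, and `-A` keeps them. The sketch below shows that test on the SAM FLAG field. The bit values 0x1 and 0x2 come from the SAM specification; the function name `is_anomalous` and the example flag values are illustrative assumptions, not taken from this dataset.

```python
PAIRED = 0x1       # read is paired in sequencing
PROPER_PAIR = 0x2  # aligner marked both mates as a proper pair


def is_anomalous(flag):
    """True if the read is paired but not flagged as properly paired."""
    return bool(flag & PAIRED) and not (flag & PROPER_PAIR)


# Illustrative FLAG values: 99 and 147 are a typical proper pair,
# 97 and 145 are the same orientations without the proper-pair bit,
# and 0 is an unpaired read.
for flag in (99, 147, 97, 145, 0):
    print(flag, is_anomalous(flag))
```

If most reads over the missing region have flags for which this returns True, that would be consistent with the region reappearing under `-A`. Something like `samtools view -c -f 1 -F 2 test.bam chr1:76740134-76740154` on an indexed BAM should give the same count directly.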


    • dkoboldt
      Member
      • Mar 2009
      • 62

      #3
      Chris,

      I'm glad you came across the solution - it looks like there's a universal issue mapping reads in "proper" pairs at that location. Thanks for using VarScan!


      • MWN
        Junior Member
        • Aug 2011
        • 8

        #4
        Originally posted by cboustred
        OK, I have found a workaround.

        If I use the mpileup -A flag (count anomalous read pairs), the region is included in the pileup.

        BW

        Chris
        How well does HaloPlex perform in your experience? I am thinking about trying it out, but I am worried about the FFPE DNA input requirement.
        Is there any potential to get copy number information using HaloPlex, given the multiple-amplicon design? I understand the reads are not as good as random reads from hybridization capture.


        • rpauly
          Member
          • Apr 2011
          • 32

          #5
          I am having a similar problem with samtools mpileup. If I run:
          samtools mpileup -r chr12:25398277-25398285 Sample1.bam >output1
          then output1 contains:
          chr12 25398279 N 16 G$A$*A$C$G$A$G$CCCCCCCC BB11BACBHHHHGHF0
          chr12 25398280 N 9 *GGGGGGGA 1EGG?AC01
          chr12 25398281 N 9 CCCCCCCCC 1EGGA/EE

          These lines look truncated! Why does mpileup truncate the last two columns?

          Any help would be appreciated.
          ~Rini
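
One way to check whether a pileup line is internally consistent is to compare the read count in column 4 with the number of reads encoded in column 5. In the pileup format the base column also carries markers that are not reads: `^` followed by a mapping-quality character (start of a read segment), `$` (end of a read segment), and length-prefixed indel annotations such as `+2AG`. The sketch below strips those before counting. The function name `count_read_bases` is hypothetical, and this is only a consistency check, not an explanation of the output above.

```python
import re


def count_read_bases(bases):
    """Count the reads represented in a pileup read-bases string."""
    # Drop '^' together with the mapping-quality character after it.
    bases = re.sub(r"\^.", "", bases)
    # Drop end-of-read-segment markers.
    bases = bases.replace("$", "")
    count = 0
    i = 0
    while i < len(bases):
        if bases[i] in "+-":
            # Indel annotation: sign, decimal length, then that many bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
            continue
        count += 1
        i += 1
    return count


print(count_read_bases("G$A$*A$C$G$A$G$CCCCCCCC"))  # 16
print(count_read_bases("*GGGGGGGA"))                # 9
```

For the first line of the output above, this gives 16, which matches both column 4 and the 16 characters in the quality column.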

