  • Chiel
    Junior Member
    • Sep 2008
    • 2

    Samtools variant calling questions

    Hi,

    In our group we are using samtools for variant calling. As a basic guide we use the example given at http://samtools.sourceforge.net/mpileup.shtml. samtools seems to be a nice tool for getting from BAM to a useful variant call format that can be annotated using other resources. Yet we have some difficulty understanding some parts and putting them to proper use.

    Instead of what is shown in the example we want to apply variant calling on a single sample. The first question is if it's safe to use mpileup on a single sample in a similar way as is shown in the example, or should I use normal pileup for this? (And does this still apply BAQ?)

    Then the data is converted to a raw bcf file using bcftools. The second question is if this output contains every possible variant disregarding quality, depth, and the number of variant supporting calls? I assume this is the case and further polishing is done using vcfutils but please correct me if I'm wrong.
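One way to check this empirically, rather than relying on the assumption, is to dump the raw BCF to VCF text and look at the lowest QUAL and DP values that actually appear. Below is a minimal Python sketch of that idea. It assumes standard tab-separated VCF columns (QUAL in column 6, INFO in column 8), a numeric QUAL, and a `DP` key in INFO; the helper names and the two example records are made up for illustration and do not come from samtools.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'DP=12;DP4=3,2,4,3' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag-style keys (e.g. INDEL) get an empty string
    return fields


def summarize_vcf(lines):
    """Return a (QUAL, DP) tuple for every record, skipping header lines."""
    records = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        qual = float(cols[5])            # assumes QUAL is numeric, not '.'
        info = parse_info(cols[7])
        depth = int(info.get("DP", 0))   # assumes DP is present in INFO
        records.append((qual, depth))
    return records


# Made-up records, for illustration only
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t3.5\t.\tDP=2;DP4=1,0,1,0",
    "chr1\t250\t.\tC\tT\t88\t.\tDP=31;DP4=8,7,9,7",
]
records = summarize_vcf(example)
print("lowest QUAL:", min(q for q, _ in records))
print("lowest DP:", min(d for _, d in records))
```

If very low QUAL and DP values show up in the real raw output, that would support the assumption that filtering is left to the varfilter step.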

    Finally, vcfutils' varfilter is applied for filtering. In the example only a depth filter is shown. Besides depth, there are some other thresholds we would like to set. We would like to apply a (base) quality cutoff, a strand-bias filter for reference and variant calls, and include variant-supporting calls.

    A close inspection of the varfilter help shows a couple of possibilities. I'll briefly describe how we think they should be used, or what our difficulties are.
    -Using the -a flag we can set the number of variant supporting calls?
    -The -1 flag seems to be a p-value cutoff for strand bias. Yet I'm unable to find any explanation of what useful values we can use, or of how this behaves in the conditions we are interested in (e.g. both reference and variant calls found on both strands).
    -Then there are the -2, -3, and -4 flags, which set several other p-value cutoffs. Default values are given. However, here too an explanation of how to alter them for different practical conditions would be very welcome.
    -The default value for mapQ bias is 0, why?
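To make the -a and -1 questions concrete, here is a minimal Python sketch of how such thresholds could be applied to the DP4 field (high-quality ref-forward, ref-reverse, alt-forward and alt-reverse base counts). It assumes the strand-bias p-value is a two-sided Fisher exact test on that 2x2 table, and it uses 2 and 0.0001 as the defaults for -a and -1; both values should be checked against the help text of your own vcfutils.pl. The function names are hypothetical, and this is an illustration of the idea, not the code vcfutils actually runs.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def passes_filters(dp4, min_alt=2, min_strand_p=1e-4):
    """Apply an -a style and a -1 style threshold to one DP4 tuple."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = dp4
    if alt_fwd + alt_rev < min_alt:      # too few variant-supporting bases
        return False
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return p >= min_strand_p             # a low p-value suggests strand bias


# Reference and variant bases spread evenly over both strands
print(passes_filters((8, 7, 9, 7)))
# Reference bases only on forward, variant bases only on reverse
print(passes_filters((10, 0, 0, 10)))
```

Under these assumptions, a site where reference and variant bases both appear on both strands in similar proportions gets a p-value near 1 and passes, while a site where the variant is seen on one strand only and the reference on the other gets a very small p-value and is dropped.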

    We couldn't find much information on these issues in the literature or other resources. Nevertheless, some of these settings are crucial in variant calling, and I would expect better descriptions than what we have found so far, especially when a clinical setting comes into play. It would be greatly appreciated if anyone could give some answers. Thanks.
  • hansdd
    Junior Member
    • May 2011
    • 6

    #2
    I have many of the same questions and cannot find answers. Can someone give some guidance or point us towards resources that explain this in more detail?


    • sergiodealencar
      Junior Member
      • Feb 2011
      • 3

      #3
      I would also like to know how to filter strand bias using GATK Unified Genotyper. What is the ideal SB (Strand Bias) threshold value?

      Thanks,
      Sérgio
