  • SNP at every chromosome position

    Hi,

    I have SOLiD sequencing data. After aligning with SHRiMP2, I used samtools mpileup for SNP calling:

    samtools mpileup -C50 -gf hg38.fa -o var.raw.bcf input.bam

    bcftools call -o var.raw.vcf -O v -c var.raw.bcf

    The raw vcf file looks like this:

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SINDHI
    chr1 33953 . T . 56.8087 PASS DP=11;MQSB=0.950952;MQ0F=1;AF1=0;AC1=0;DP4=6,5,0,0;MQ=0;FQ=-59.9998 GT:PL 0/0:0
    chr1 33954 . C . 56.8087 PASS DP=11;MQSB=0.950952;MQ0F=1;AF1=0;AC1=0;DP4=6,5,0,0;MQ=0;FQ=-59.9998 GT:PL 0/0:0
    chr1 33955 . T . 56.4609 PASS DP=10;MQSB=0.952347;MQ0F=1;AF1=0;AC1=0;DP4=5,5,0,0;MQ=0;FQ=-56.997 GT:PL 0/0:0
    chr1 35396 . C . 56.4609 PASS DP=10;MQSB=0.952347;MQ0F=1;AF1=0;AC1=0;DP4=5,5,0,0;MQ=0;FQ=-56.997 GT:PL 0/0:0
    chr1 35397 . C . 61.2368 PASS DP=12;MQSB=0.95494;MQ0F=1;AF1=0;AC1=0;DP4=5,7,0,0;MQ=0;FQ=-62.9905 GT:PL 0/0:0
    chr1 35398 . C . 61.2368 PASS DP=12;MQSB=0.95494;MQ0F=1;AF1=0;AC1=0;DP4=5,7,0,0;MQ=0;FQ=-62.9905 GT:PL 0/0:0
    chr1 35399 . A . 61.2368 PASS DP=12;MQSB=0.95494;MQ0F=1;AF1=0;AC1=0;DP4=5,7,0,0;MQ=0;FQ=-62.9905 GT:PL 0/0:0
    chr1 35400 . A . 61.2368 PASS DP=12;MQSB=0.95494;MQ0F=1;AF1=0;AC1=0;DP4=5,7,0,0;MQ=0;FQ=-62.9905 GT:PL 0/0:0
    chr1 35401 . C . 61.2368 PASS DP=12;MQSB=0.95494;MQ0F=1;AF1=0;AC1=0;DP4=5,7,0,0;MQ=0;FQ=-62.9905 GT:PL 0/0:0

    As you can see, the chromosome positions are consecutive. I know this is a raw variant file, but it contains 333,399,862 variants (almost as many as the number of bases). How can I filter this, given that there are so many false positives? Do I need a separate filtering step, or should I change something at the mpileup or bcftools call stage?

  • #2
    You want bcftools call -cv, which outputs only the variant positions, instead of bcftools call -c, which reports every covered position. You can see that your VCF lists reference bases with no alternate allele (ALT is "."), so every position in your example would be removed.
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
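
    If the all-sites VCF has already been written, the reference-only records can also be dropped afterwards instead of rerunning the caller. The following is a minimal sketch, not part of the original reply: it assumes the file is tab-separated, as the VCF format requires, and that reference-only sites carry "." in the ALT column, as in the excerpt above. The function name keep_variant_sites and the output file name var.variants.vcf are hypothetical.

    ```shell
    # Keep VCF header lines and records whose ALT column (field 5) is not ".".
    # Reads a VCF on stdin and writes the filtered VCF to stdout.
    keep_variant_sites() {
        awk -F'\t' '/^#/ || $5 != "."'
    }

    # Example usage (file names are illustrative):
    #   keep_variant_sites < var.raw.vcf > var.variants.vcf
    ```

    This only removes sites with no alternate allele. It does not judge the quality of the variant calls that remain, so filtering on depth or quality would still be a separate step.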
