  • franka
    replied
    Hi,
    I'm using the latest version of VarScan.
    Does somaticFilter support output in .vcf format?
    I ran it on the SNP and indel .vcf files produced by the somatic command, using this command line:
    java -Xmx6g -jar VarScan.v2.3.6.jar somaticFilter 34-C-S.varScan.output.snp.vcf --min-strands2 2 --min-avg-qual 25 --min-var-freq 0.3 --p-value 0.05 --min-strands2 2 --min-reads2 3 --indel-file 34-C-S.varScan.output.indel.vcf --output-vcf 1 34-C-S.vcf

    The software starts:
    Reading input from /Users/mac2/Documents/trasferimento/Napoli_2013/bam_bgi_pileup/34-C-S.varScan.output.snp.vcf
    1927 cluster SNPs identified
    Reading input from /Users/mac2/Documents/trasferimento/Napoli_2013/bam_bgi_pileup/34-C-S.varScan.output.snp.vcf
    97671 variants in input stream
    395 failed to meet coverage requirement
    969 failed to meet reads2 requirement
    5504 failed to meet varfreq requirement
    3510 failed to meet p-value requirement
    1340 in SNP clusters were removed
    329 were removed near indels
    85624 passed filters

    However, no output file is created.
    Many thanks,
    Francesco


  • dkoboldt
    replied
    Uma,

    Thank you for providing the files, which helped me track down the issue. As I'd suspected, the VCF file for SNVs contained slightly more entries than the native SNV output file.

    This is because we output "indelError" calls (positions where normal shows a SNV but tumor shows an indel or vice-versa) to the VCF for the sake of completeness. Their filter status is "indelError" to indicate that these are likely artifactual calls. We don't output them to the native output format for that reason.

    If I remove "indelError" positions from your SNP VCF and then apply somaticFilter, the results are identical to running it on the native output files. That being said, you might wish to use the filtering results from the unmodified VCF, because SNVs clustering around "indelError" calls should probably be removed. After that, any "indelError" calls that passed somaticFilter can be removed using grep.

    Thank you for your help on this!

    Yours,

    Dan Koboldt
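
As a rough illustration of the last step Dan describes (dropping "indelError" calls after filtering), here is a minimal Python sketch. It assumes a standard tab-delimited VCF in which FILTER is the seventh column; the function name `drop_indel_error` and the sample records are invented for illustration and are not VarScan output.

```python
def drop_indel_error(lines):
    """Yield VCF lines whose FILTER column does not include 'indelError'."""
    for line in lines:
        # Keep meta-information and header lines untouched.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the 7th VCF column; it may hold several ';'-separated codes.
        filters = fields[6].split(";") if len(fields) > 6 else []
        if "indelError" not in filters:
            yield line


# Invented records, for illustration only.
sample = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t.\tPASS\tDP=30\n",
    "chr1\t105\t.\tC\tT\t.\tindelError\tDP=22\n",
]
kept = list(drop_indel_error(sample))
```

A plain `grep -v indelError` does much the same job; the sketch only differs in that it checks the FILTER column rather than the whole line.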


  • stqa8350
    replied
    Hi Dan,

    There is a difference in the results, and I have sent you sample data.

    Many thanks for your help

    Uma


  • stqa8350
    replied
    Just to be sure, I can intersect the native and VCF results by chromosome and position, using the output of the somatic command alone (before somaticFilter).

    Uma
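
The intersection Uma proposes could be sketched in Python roughly as follows. This assumes that both the native output and the VCF are tab-delimited with chromosome and position in the first two columns, that VCF header lines start with "#", and that the native file's header row has a non-numeric second column. The function names and the sample rows are hypothetical.

```python
def load_positions(lines):
    """Collect (chrom, pos) pairs from tab-delimited variant lines."""
    positions = set()
    for line in lines:
        if line.startswith("#"):
            continue  # VCF meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # blank lines and the native header row
        positions.add((fields[0], int(fields[1])))
    return positions


def compare(native_lines, vcf_lines):
    """Split positions into shared, native-only, and VCF-only sets."""
    native = load_positions(native_lines)
    vcf = load_positions(vcf_lines)
    return {
        "shared": native & vcf,
        "native_only": native - vcf,
        "vcf_only": vcf - native,
    }


# Invented rows, for illustration only.
native_rows = ["chrom\tposition\tref\tvar\n", "chr1\t100\tA\tG\n", "chr2\t50\tC\tT\n"]
vcf_rows = ["#CHROM\tPOS\tID\tREF\tALT\n", "chr1\t100\t.\tA\tG\n", "chr1\t105\t.\tC\tT\n"]
result = compare(native_rows, vcf_rows)
```

If the native output were a strict subset of the VCF output, `native_only` would be empty.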


  • stqa8350
    replied
    The difference in results (native output vs. VCF output) occurs after the somaticFilter step. In general I use something like the following commands:

    java -jar ~/tools/VarScan.v2.3.1.jar somaticFilter ./varscan-out/65_varscan13output.snps --min-var-freq 0.5 --indel-file ./varscan-out/65_varscan13output.indel --output ./varscan-out/65_varscan13output.snps.filtered

    java -jar ~/tools/VarScan.v2.3.1.jar somaticFilter ./varscan-out/65_varscan13output.snps --min-var-freq 0.5 --indel-file ./varscan-out/65_varscan13output.indel.vcf --output-vcf > ./varscan-out/65_varscan13output.snps.filtered.vcf

    For the output of the somatic step (java -jar ~/tools/VarScan.v2.3.1.jar somatic), the native and VCF files have about the same number of rows, so I reckon the difference does not arise there.

    Also, if it were a parsing error and the somaticFilter command line above is correct, I would expect the native output to be at least a subset of the VCF output, but instead the native output contains entries not found in the VCF output.

    I can send you 1000 lines of the somaticFilter output.

    Thanks

    Uma


  • dkoboldt
    replied
    That's a curious result, and it could reflect an error in the new VCF parsing code. Would you be able to send me your 4 files (SNP and indel, original and VCF), or at least the first 1,000 lines or so of each? Send them to dkoboldt (at) genome [dot] wustl (dot) edu.

    Thanks,
    Dan Koboldt


  • stqa8350
    started a topic varscan somaticFilterResults

    Hi

    While using VarScan (v2.3.1) I have used default options, but I get different results for the same dataset depending on the input format: running somaticFilter on snp.vcf with indel.vcf as the indel file, versus running it on the native snp file with the native indel file. In theory this is just a file-format difference, and the stats should remain the same. However, I see a difference, mainly in the p-value filter.

    time java -jar ~/tools/VarScan.v2.3.1.jar somaticFilter 65_varscan-out.snp -min-var-freq 0.5 --indel-file 65_varscan-out.indel --output-file 65-filtered
    Window size: 10
    Window SNPs: 3
    Indel margin: 3
    Reading input from 65_varscan-out.snp
    2962 cluster SNPs identified
    Reading input from 65_varscan-out.snp
    88168 variants in input stream
    13612 failed to meet coverage requirement
    5579 failed to meet reads2 requirement
    24230 failed to meet varfreq requirement
    40748 failed to meet p-value requirement
    45 in SNP clusters were removed
    1 were removed near indels
    3953 passed filters

    real 0m3.532s
    user 0m2.268s
    sys 0m0.188s

    time java -jar ~/tools/VarScan.v2.3.1.jar somaticFilter 65_varscan-out.snp.vcf -min-var-freq 0.5 --indel-file 65_varscan-out.indel.vcf --output-file 65-filtered.vcf
    Window size: 10
    Window SNPs: 3
    Indel margin: 3
    Reading input from 65_varscan-out.snp.vcf
    2972 cluster SNPs identified
    Reading input from 65_varscan-out.snp.vcf
    88177 variants in input stream
    13615 failed to meet coverage requirement
    5583 failed to meet reads2 requirement
    24231 failed to meet varfreq requirement
    2862 failed to meet p-value requirement
    367 in SNP clusters were removed
    39 were removed near indels
    41480 passed filters

    real 0m1.506s
    user 0m2.504s
    sys 0m0.272s

    Is there any particular reason for these differences? Please note that in a quick comparison of each *.vcf file with its corresponding native snp or indel file, I found no differences by chromosome and position.

    Many Thanks
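
To see where the two runs diverge, the summary counts printed above can be compared line by line. The Python sketch below is only an illustration: `parse_summary` is a hypothetical helper, and the two log strings are copied from the runs in this post.

```python
import re

# Matches summary lines of the form "<count> <description>".
COUNT_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_summary(log_text):
    """Map each summary description to its integer count."""
    counts = {}
    for line in log_text.splitlines():
        match = COUNT_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


native_log = """
2962 cluster SNPs identified
88168 variants in input stream
13612 failed to meet coverage requirement
5579 failed to meet reads2 requirement
24230 failed to meet varfreq requirement
40748 failed to meet p-value requirement
45 in SNP clusters were removed
1 were removed near indels
3953 passed filters
"""

vcf_log = """
2972 cluster SNPs identified
88177 variants in input stream
13615 failed to meet coverage requirement
5583 failed to meet reads2 requirement
24231 failed to meet varfreq requirement
2862 failed to meet p-value requirement
367 in SNP clusters were removed
39 were removed near indels
41480 passed filters
"""

native = parse_summary(native_log)
vcf = parse_summary(vcf_log)
# Positive values mean the VCF run reported more than the native run.
diff = {key: vcf[key] - native[key] for key in native}
for key, delta in diff.items():
    print(f"{delta:+7d}  {key}")
```

Most counts differ by fewer than ten, while the p-value line differs by tens of thousands, which matches the observation that the discrepancy comes from the p-value filter.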
