
**ernfrid** · 03-16-2012, 09:38 AM

You might try:
SomaticSniper (http://gmt.genome.wustl.edu/somatic-sniper/current/), but it will only report SNVs.

The GATK's somatic indel detector (http://www.broadinstitute.org/gsa/wi...Indel_Detector).

The samtools package's mpileup command plus bcftools in "paired mode" (http://samtools.sourceforge.net/samtools.shtml).

**dkoboldt** · 03-16-2012, 10:34 AM

Jane M,

Thank you for your message. I must respectfully disagree with your statement that VarScan "has important bugs in it." There are dozens of groups using it with great success to detect variants in humans and model organisms, and to call somatic mutations in cancer datasets.

However, I did realize that a few of your questions from this thread were outstanding, and I've done my best to answer them:

http://seqanswers.com/forums/showthread.php?p=67930

I would like to also recommend another tool developed at our institute for somatic mutation calling, SomaticSniper:

http://gmt.genome.wustl.edu/somatic-sniper/

**Jane M** · 03-16-2012, 01:51 PM

Thank you both for your answers.

Dan, thank you for your answers to my questions on the other topic.
I must say that my main question/issue isn't solved: http://seqanswers.com/forums/showthread.php?t=16599.
I'm sure that my "missing reads" have not been filtered out due to low mapping or base quality. And as I specified, some other people have the same problem.

Could the problem come from the model not being suited to all kinds of data? Or from different versions of the JDK or JVM? Or from libraries or machine configuration?

**patternist** · 03-17-2012, 06:50 PM

Hi,

Check out Bambino. It reports both SNVs and indels, and you can then annotate the calls with ANNOVAR.

https://cgwb.nci.nih.gov/goldenPath/bamview/documentation/index.html

**Jane M** · 03-18-2012, 12:02 PM

Thanks patternist, I didn't know about Bambino! I have found a publication (Bambino: a variant detector and alignment viewer for next-generation sequencing data in the SAM/BAM format, from January 2011) but the model is not described.
Do you know where I can find details on what it actually does?

**airtime** · 03-20-2012, 06:21 AM

Hi,

VarScan2 is published; try it out and share your experience.
It should be better than the first version (mentioned in the paper) and should also outperform SomaticSniper.

**Jane M** · 03-20-2012, 08:50 AM

I have tried VarScan2 (2.2.8) in both modes (simple and somatic), and my impressions are at the beginning of this topic and here: http://seqanswers.com/forums/showthread.php?t=16599.

**airtime** · 03-21-2012, 02:12 AM

Hi,

I found this info in the VarScan 2 description:

Base alignment quality (BAQ) computation is turned on by default. BAQ is a phred-like score representing the probability that a read base is mis-aligned; it lowers the base quality score of mismatches that are near indels. This is to help rule out false positive SNP calls due to alignment artifacts near small indels. There have been recent suggestions, however, that BAQ may be too strict and cause real SNPs to be missed. Several users of the VarScan variant caller have reported that its read counts disagree with what is seen in IGV, or somatic mutations were missed when mpileup was used instead of pileup. These issues are almost always due to BAQ’s downgrade of base qualities to 0 or 1. This adjustment can’t be seen in IGV, but it’s below VarScan’s default base quality threshold. You can disable BAQ with the -B parameter, or perform a more sensitive BAQ calculation with -E. I’ve heard that the latter option will be turned on by default in the next version of SAMtools.
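To make the mechanism in the quoted text concrete, here is a minimal Python sketch of how a base-quality cutoff can hide reads whose qualities BAQ has lowered. It is an illustration only, not VarScan's or SAMtools' code: the toy pileup column, the function names, and the cutoff of 15 are all assumptions, and real pileup files encode reference matches as `.` and `,` rather than as letters.

```python
def phred33_to_quals(qual_string):
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in qual_string]


def count_bases_passing(bases, qual_string, min_base_qual=15):
    """Count bases per allele, keeping only bases at or above the cutoff.

    min_base_qual=15 is an assumed threshold chosen for illustration.
    """
    counts = {}
    for base, qual in zip(bases.upper(), phred33_to_quals(qual_string)):
        if qual >= min_base_qual:
            counts[base] = counts.get(base, 0) + 1
    return counts


# Hypothetical pileup column: four reference C reads and four variant T reads.
bases = "CCCCTTTT"
raw_quals = "IIIIIIII"   # every base has Phred 40, as a viewer like IGV would show
baq_quals = 'IIII!!""'   # mismatching T bases lowered to 0, 0, 1, 1

print(count_bases_passing(bases, raw_quals))  # both alleles are counted
print(count_bases_passing(bases, baq_quals))  # the T reads fall below the cutoff
```

With the raw qualities both alleles are counted; with the lowered qualities the variant reads disappear from the count even though they are still present in the alignment.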

I hope it helps.

**Jane M** · 03-22-2012, 12:53 AM

Hi,
Thank you for the info, I hadn't read it.
Well, my case is rather this one: "Several users of the VarScan variant caller have reported that its read counts disagree with what is seen in IGV".
From what I read, I could solve the problem with the -B or -E parameters.
Could you please tell me where you got this info? I am wondering, since Dan Koboldt, who is the VarScan maintainer, didn't suggest this to me a few days ago... Was the solution that you found proposed by the author?

**airtime** · 03-22-2012, 11:06 PM

Hi,

I found it here:

5 Things to Know About SAMtools Mpileup

http://www.massgenomics.org/2012/03/5-things-to-know-about-samtools-mpileup.html

Details on the samtools mpileup command, base alignment quality (BAQ), multi-sample calling, and other features.

It depends on the samtools parameters, which could be why Koboldt didn't spot it.

regards

Air

**Jane M** · 03-23-2012, 02:50 AM

Hi airtime,

Thanks for the link.
Yesterday, I reran samtools with the -B option and then VarScan2, and all the "bugs" (wrong read counts) that I had noticed are now fixed!

So thank you very much for the info! I had been experiencing this issue for 2-3 months and you solved it.

Thanks a lot !

I must admit that I don't yet understand why this option changes the results so much.
For example, at one position, I have:

In IGV: 185 (normal sample, reference), 165 (normal sample, variant), 8 (tumor sample, reference), 359 (tumor sample, variant)
In VarScan2 (without -B option in samtools): 183 (normal sample, reference), 4 (normal sample, variant), 8 (tumor sample, reference), 14 (tumor sample, variant)
In VarScan2 (with -B option in samtools): 184 (normal sample, reference), 164 (normal sample, variant), 8 (tumor sample, reference), 359 (tumor sample, variant)
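The size of the discrepancy is easier to see as variant allele frequencies. The short Python sketch below only does arithmetic on the three sets of counts listed above; the function and variable names are illustrative.

```python
def variant_allele_frequency(ref_reads, var_reads):
    """Fraction of reads supporting the variant allele (0.0 if no coverage)."""
    total = ref_reads + var_reads
    return var_reads / total if total else 0.0


# (reference reads, variant reads) per sample, copied from the counts above.
observations = {
    "IGV": {"normal": (185, 165), "tumor": (8, 359)},
    "VarScan2 without -B": {"normal": (183, 4), "tumor": (8, 14)},
    "VarScan2 with -B": {"normal": (184, 164), "tumor": (8, 359)},
}

for source, samples in observations.items():
    for sample, (ref, var) in samples.items():
        vaf = variant_allele_frequency(ref, var)
        print(f"{source:22s} {sample:6s} VAF = {vaf:.1%}")
```

Without -B the normal sample looks like roughly 2% variant instead of roughly 47%, which is enough to turn a heterozygous site into an apparent somatic call.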

I am much more confident in the results now

Now, I should apologize to Dan Koboldt... The bugs were not in VarScan, sorry!
Dan, you told me that dozens of groups are using VarScan to detect variants. Maybe you could warn them about this issue, because those who are not using the -B or -E option for samtools are probably working with incorrect data.

The last issue that I'm experiencing with VarScan2 is the strand filter. I am running it this way:

java -Xmx10g -jar VarScan.v2.2.8.jar somatic /data/fibros_convertedAB_sorted.pileup /data/296_convertedAB_sorted.pileup --output-snp /data/output_varscan_AB.snp --output-indel /data/output_varscan_AB.indel --min-coverage 10 --min-coverage-normal 10 --min-coverage-tumor 10 --min-var-freq 0.1 --min-freq-for-hom 0.75 --normal-purity 1 --tumor-purity 1 --p-value 0.01 --somatic-p-value 0.01 --strand-filter 1 --min-avg-qual 25 --min-strands2 2 --min-reads2 3

then SomaticFilter:

java -Xmx20g -jar VarScan.v2.2.8.jar somaticFilter /data/output_varscan_AB.snp --min-strands2 2 --min-avg-qual 25 --min-var-freq 0.1 --p-value 0.05 --indel-file /data/output_varscan_AB.indel --output-file /data/output_somaticFilter_varscan_AB.snp

But I get such an output:

chrom position ref var normal_reads1 normal_reads2 normal_var_freq normal_gt tumor_reads1 tumor_reads2 tumor_var_freq tumor_gt somatic_status variant_p_value somatic_p_value tumor_reads1_plus tumor_reads1_minus tumor_reads2_plus tumor_reads2_minus
chr4 114260538 C T 35 40 53,33% Y 0 86 100% T LOH 1.0 9.50234823641282E-15 0 0 0 86

Why has this position not been filtered out by "--strand-filter 1"? To me, there is clearly a strand bias here...
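As a rough illustration of the check being asked for, the Python sketch below parses the output row shown above and flags it when every variant-supporting tumor read sits on one strand. This is a hypothetical post-filter, not VarScan's strand filter; `parse_row` and `variant_on_single_strand` are made-up helper names.

```python
HEADER = (
    "chrom position ref var normal_reads1 normal_reads2 normal_var_freq "
    "normal_gt tumor_reads1 tumor_reads2 tumor_var_freq tumor_gt "
    "somatic_status variant_p_value somatic_p_value tumor_reads1_plus "
    "tumor_reads1_minus tumor_reads2_plus tumor_reads2_minus"
).split()

ROW = (
    "chr4 114260538 C T 35 40 53,33% Y 0 86 100% T LOH 1.0 "
    "9.50234823641282E-15 0 0 0 86"
).split()

STRAND_COLUMNS = (
    "tumor_reads1_plus", "tumor_reads1_minus",
    "tumor_reads2_plus", "tumor_reads2_minus",
)


def parse_row(header, fields):
    """Map column names to values and convert the strand counts to integers."""
    record = dict(zip(header, fields))
    for key in STRAND_COLUMNS:
        record[key] = int(record[key])
    return record


def variant_on_single_strand(record):
    """True when all variant-supporting tumor reads come from one strand."""
    plus = record["tumor_reads2_plus"]
    minus = record["tumor_reads2_minus"]
    return (plus + minus) > 0 and (plus == 0 or minus == 0)


record = parse_row(HEADER, ROW)
print(record["chrom"], record["position"], variant_on_single_strand(record))
```

For the row above, all 86 variant reads are on the minus strand, so this simple check flags the site.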

**dkoboldt** · 03-28-2012, 08:20 AM

Jane,

Thank you for this detailed post, and for following up on this strand question. Your site is homozygous in the tumor (due to LOH) but VarScan's strand filter currently only works on sites that are heterozygous in the tumor.

This is because it compares the strand representation of the reference allele to the strand representation of the variant allele. If no reference alleles are seen in the tumor, that comparison can't be made.

Your comment has me thinking, however, that the strand filtering capabilities in VarScan need some improvement. I'll work on that for the next release.

In the meantime, you might try the filtering strategy we outlined in the VarScan 2 paper, in which you run bam-readcount on all sites and then process the results with the VarScan 2 accessory script fpfilter.pl.

**swbarnes2** · 03-28-2012, 12:53 PM

For the record, I observed the same thing with mpileup and BAQ calculations on a few occasions. Specifically, I observed that some SNPs that were called fine with pileup were vanishing in mpileup, including some which had been verified with Sanger sequencing. When I looked at the pileup files made by mpileup, and compared them to the .sam files, it was clear that mpileup was representing the quality scores of the alternate letters as being almost 0, while in the .sam, the quality scores were high. The older pileup was faithfully carrying over the quality scores in the pileup output file. After a little investigating, I saw that the BAQ calculations, which are on by default in mpileup, were responsible. When I disabled them with -B, the quality scores in the pileup output files matched the quality scores in the .sam files, and the SNPs were callable.

**Jane M** · 03-29-2012, 07:21 AM

Thank you for the explanation Dan. Do you do a FET for the strand filter on sites that are heterozygous in the tumor? I guess that you can take the number of reads supporting the reference (in forward and reverse strands) in tumoral sample as theoretical counts and the number of reads supporting the variant (in forward and reverse strands) in tumoral sample as observed counts...
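One way to read the FET idea is as a two-sided Fisher's exact test on the 2x2 table of strand counts (reference plus/minus versus variant plus/minus) in the tumor sample. The Python sketch below is an assumption about how such a test could look, not a description of what VarScan does; it returns no result when one allele has no reads, which is the situation Dan describes for homozygous sites.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )
    return min(1.0, total)


def strand_bias_p_value(ref_plus, ref_minus, var_plus, var_minus):
    """Compare strand representation of reference and variant reads.

    Returns None when either allele has no reads, because there is
    nothing to compare the other allele against.
    """
    if ref_plus + ref_minus == 0 or var_plus + var_minus == 0:
        return None
    return fisher_exact_two_sided(ref_plus, ref_minus, var_plus, var_minus)


# Hypothetical heterozygous site: reference reads balanced, variant reads not.
print(strand_bias_p_value(20, 22, 0, 30))
# Strand counts from the LOH site earlier in the thread: no reference reads.
print(strand_bias_p_value(0, 0, 0, 86))
```

The second call returns None, which mirrors why a comparison-based strand filter cannot act on a site with no reference reads in the tumor.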

I don't have enough experience yet, but I assume that handling only the sites that are heterozygous in the tumor means filtering about half of the data? Or is it known that in tumors we observe more heterozygous sites than homozygous mutated sites?

I have developed a basic filter to handle the "other half" of the cases, but for now, it's not very good. When do you expect the next release to be ready? Any idea?

I will try the bam-readcount and fpfilter.pl to filter more false positives !
Thanks,
Jane

Detection of somatic mutations in normal & tumour paired NGS data
