How does VarScan process the pileup data?

stvos replied

03-26-2013, 06:00 AM
Varscan results vs Genomeview

Dear all,
I am using VarScan somatic (/VarScan/x86_64/2.3.3).
I am seeing really big differences between the SNP coverage reported in VarScan's SNP output file and the coverage shown for the same .bam file in GenomeView (similar to IGV).
Has anyone found out yet what causes this?
david.tamborero replied

06-22-2012, 06:01 AM
Hi!

I've just landed in this post after a long time without using somatic calling tools.

I am wondering whether you solved this; I have no idea why it happened.

Anyway, I've just downloaded VarScan 2 and noticed that it incorporates a filter that, among other things, seems to remove SNPs that appear close to indels. Could that help you?
bpetersen replied

05-20-2012, 10:34 PM
Am I the only one with this issue? I still can't figure out what the problem is. VarScan worked fine for me on other datasets, but if this isn't fixed I'm going to have to use a different tool...

bpetersen replied

05-15-2012, 03:59 AM

Hello everyone,
I'm also currently experiencing problems with VarScan somatic, but they seem different from what has been described here. I've tried using the -B option for samtools pileup, but I still get completely wrong allele counts in my tumor sample for supposedly somatic mutations with very low p-values.
One example:
Create samtools pileup files with the command:

Code:

samtools pileup -Bf ref.fasta in.bam > out.pileup

run varscan somatic:

Code:

java -Xmx16g -jar /home/sukmb205/software/VarScan.v2.2.11.jar somatic normal.pileup tumor.pileup outfile.var

output varscan:

Code:

chrom	position	ref	var	normal_reads1	normal_reads2	normal_var_freq	normal_gt	tumor_reads1	tumor_reads2	tumor_var_freq	tumor_gt	somatic_status	variant_p_value	somatic_p_value	tumor_reads1_plus	tumor_reads1_minus	tumor_reads2_plus	tumor_reads2_minus
chr5	88171909	A	T	46	0	0%	A	91	95	51.08%	W	Somatic	1.0	6.837592912705086E-13	0	91	0	95

Judging by the p-value, this looks like quite a good somatic mutation, but when I look in IGV, NOT ONE read actually supports the T in the tumor sample. Instead, there is a single read at this position that has been aligned with a 308bp deletion. Could this be the problem?
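The somatic_p_value in that row can be reproduced from the four read counts alone. The sketch below assumes (the thread does not state it) that the value comes from a one-sided Fisher's exact test on the 2x2 table of normal/tumor reference and variant reads. `fisher_one_sided` is a hypothetical helper written for illustration, not VarScan's own code.

```python
from math import comb

def fisher_one_sided(normal_ref, normal_var, tumor_ref, tumor_var):
    """One-sided Fisher's exact test on a 2x2 read-count table.

    Returns the probability of seeing `normal_var` or fewer variant reads
    in the normal sample, given the row and column totals.
    Assumption: this mirrors how the somatic p-value is derived.
    """
    n_normal = normal_ref + normal_var
    total_var = normal_var + tumor_var
    total = n_normal + tumor_ref + tumor_var
    denom = comb(total, n_normal)
    p = 0.0
    for k in range(normal_var + 1):
        # k variant reads fall in the normal sample, the rest in the tumor
        p += comb(total_var, k) * comb(total - total_var, n_normal - k) / denom
    return p

# Counts from the chr5:88171909 row: normal 46/0, tumor 91/95
p_value = fisher_one_sided(46, 0, 91, 95)
```

With the counts above this comes out at roughly 6.8e-13, so the tiny p-value follows directly from the reported counts. If the counts themselves are wrong, the p-value says nothing about the site.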

pileup tumor:

Code:

chr5	88171909	A	63	.,,,,,,,,,,,,,,,,,,,,,,..,,,.$.,.,,,,,,.,,,..,,-308actcctcaaactaccttcccacaaagccatttaagttaaatggtacatttacagactcacctacatgaaggatataacttaaaacatctgcttagacacatacgttctgttcagatataaaaaatgtggcaaaaatttttaaaaatataggaccactatattcttaaaatgtgtgttcttctgtgtgtgtgtgttcattcattcaagagatctttgactgcaattaggtagtcggtcctataaaggcttccttgtgtgacgataatttctaaaagtaaaatgctccagtgaatatttctgctaaataa,,,,,,.,....,,...	/2.*114I,1<(%040":+3/--+?%.-.CE2;/3*0-B8=).0II--7,+,II%I09@*CII

pileup normal:

Code:

chr5	88171909	A	71	...$.,,,,,,,,,,,.,,,,,,,,,.,,,,.,,.,,,,,,..,,,.,.,,,..,,,,,,.,...,....^X.^(.	BE,9=2++H8I31.)89,=I).2II=1I,+/8I51I>*2,7+.1I.I%-*-1I/2A-I/3@:I.5I8II(I

I'm seeing similar things at all the high quality somatic mutations that were called and I've tried using mpileup instead of pileup, with option -B, without -B etc. I just don't know what the problem is!
I would really appreciate some comments on this, thanks!
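One thing that can be checked by hand is how the read-bases column is tokenised. In pileup format, `-308` followed by 308 letters is a single deletion attached to the preceding read, and those letters are not base calls. Below is a minimal sketch of a counter that skips indel sequences; `count_pileup_bases` is a hypothetical helper, not VarScan's parser.

```python
def count_pileup_bases(ref, bases):
    """Count base calls in a pileup read-bases string.

    Returns a dict mapping upper-case base -> count. '.' and ',' count
    towards the reference base; indel sequences (+N.../-N...) are skipped.
    """
    ref = ref.upper()
    counts = {}
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == '^':
            # start of a read: the next character is its mapping quality
            i += 2
        elif c == '$':
            # end of a read
            i += 1
        elif c in '+-':
            # indel: read the length digits, then skip that many letters
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        elif c in '.,':
            counts[ref] = counts.get(ref, 0) + 1
            i += 1
        elif c in '*<>':
            # deletion placeholder or reference skip: not a base call
            i += 1
        else:
            counts[c.upper()] = counts.get(c.upper(), 0) + 1
            i += 1
    return counts

# Small synthetic example: one read carries a 3 bp deletion
example = count_pileup_bases('A', '.,,-3act,,.$^X.')
```

A parser that instead counted every letter would treat the a's and t's inside the deletion as reads, all of them lower case (reverse strand). That would be consistent with tumor_reads1_plus and tumor_reads2_plus both being 0 in the output above, although this is only a guess.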


swbarnes2 replied

04-19-2012, 10:14 AM
Right, it's definitely samtools pileup that is causing the problem. If you compare the .sam file and the two pileups (with and without -B) in regions where SNPs are being missed, you can see the problem. pileup -B will faithfully reproduce the base quality scores found in the .sam file. pileup without -B will sometimes report the quality of those bases as terrible, which causes SNP-calling software to ignore them as too poor in quality.
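To make that comparison concrete, the quality column of the two pileups can be decoded and compared position by position. This is a minimal sketch assuming Phred+33 encoding; the function names and the default threshold of 15 are illustrative choices, not values taken from samtools or VarScan.

```python
def phred_scores(qual_string, offset=33):
    """Decode a pileup quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in qual_string]

def fraction_passing(qual_string, min_qual=15, offset=33):
    """Fraction of bases whose quality is at least `min_qual`."""
    scores = phred_scores(qual_string, offset)
    if not scores:
        return 0.0
    return sum(q >= min_qual for q in scores) / len(scores)

# '!' decodes to 0, '5' to 20 and 'I' to 40
decoded = phred_scores('!5I')
```

Running `fraction_passing` on the same position from a pileup made with -B and one made without it shows how many bases a quality filter would drop in each case.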

BAQ (that is, running without -B) is supposed to reduce the number of false positives. There might be applications where that potential trade-off is worth it, but for what it's worth, I'm skeptical. In my work, I'd much rather sift through false positives than miss real mutations.
SPRA replied

04-19-2012, 06:13 AM
Samtools mpileup with -B or -E flag for VarScan

I just ran VarScan on mpileup data generated with either the -B or the -E flag, and there is indeed quite a difference (see below). Should VarScan always be used on mpileup data generated with the -B flag?

samtools mpileup -d 10000 -S -B -C 50 -P Illumina -f hg19.fa normal.bam > normal.mpileup
samtools mpileup -d 10000 -S -B -C 50 -P Illumina -f hg19.fa tumor.bam > tumor.mpileup
java -jar VarScan.v2.2.10.jar somatic normal.mpileup tumor.mpileup tumor_vs_normal
java -jar VarScan.v2.2.10.jar processSomatic tumor_vs_normal.snp
48198 VarScan calls processed
3168 were Somatic (1229 high confidence)
37882 were Germline
6937 were LOH

OR

samtools mpileup -d 10000 -S -E -C 50 -P Illumina -f hg19.fa normal.bam > normal.baq.mpileup
samtools mpileup -d 10000 -S -E -C 50 -P Illumina -f hg19.fa tumor.bam > tumor.baq.mpileup
java -jar VarScan.v2.2.10.jar somatic normal.baq.mpileup tumor.baq.mpileup tumor_vs_normal.baq
java -jar VarScan.v2.2.10.jar processSomatic tumor_vs_normal.baq.snp
7593 VarScan calls processed
517 were Somatic (191 high confidence)
6406 were Germline
636 were LOH
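Putting the two runs side by side as percentages makes the difference easier to read. The sketch below only re-expresses the processSomatic counts listed above; `call_fractions` is a hypothetical helper.

```python
def call_fractions(total, somatic, germline, loh):
    """Return each category as a percentage of all processed calls."""
    return {
        "somatic": round(100 * somatic / total, 2),
        "germline": round(100 * germline / total, 2),
        "loh": round(100 * loh / total, 2),
    }

with_B = call_fractions(48198, 3168, 37882, 6937)  # run with -B
with_E = call_fractions(7593, 517, 6406, 636)      # run with -E

for label, fractions in (("-B", with_B), ("-E", with_E)):
    print(label, fractions)
```

The total number of calls drops more than sixfold with -E, while the somatic share stays at roughly 6.6% versus 6.8% and the LOH share falls from about 14.4% to about 8.4%.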
Jane M replied

03-27-2012, 12:17 AM
Hello,

For any users who run into the same issue and find this thread: the problem comes from samtools. Add the -B option to disable the BAQ computation (-E, by contrast, keeps BAQ enabled in its extended form).
Jane M replied

03-16-2012, 06:05 AM
Hello,

I just wanted to let you know that there are bugs in VarScan, in both default and somatic modes. Other people are experiencing the same problem as I am.
To conclude, I am giving up on VarScan, but I hope the problem will be addressed, because apart from this issue it seems to be an interesting tool!
Jane M replied

03-02-2012, 05:11 AM
I ran VarScan in the simple mode on the normal sample, using pileup2snp:

java -Xmx20g -jar VarScan.v2.2.8.jar pileup2snp /data/patient1/s_garma-fibros_converted_sorted.pileup --min-coverage 10 --min-avg-qual 25 > /data/patient1/output_varscan_snp_garma.snp

At the position where I have a problem in read counts, I got:

Code:

chr4 186380515 (G) 184 (A) 5

I have the same problem in both modes. It's very strange... It cannot be related to the version of VarScan since I'm using the latest one, v2.2.8...

I attach here the screenshots from IGV at this specific position.
Attached Files

ReadsInNormalSample.png (51.6 KB, 87 views)

ReadsInTumoralSample.png (45.2 KB, 83 views)
Jane M replied

03-02-2012, 12:29 AM
Thank you for your answer Dan,

Here is the output of the pileup file at the position where I have the problem I mentioned:
With VarScan somatic:
- for the normal sample: 183 (reference G) and 4 (variant A)
- for the tumor sample: 8 (reference) and 14 (variant)

With IGV:
- for the normal sample: 185 (reference) and 165 (variant)
- for the tumor sample: 8 (reference) and 359 (variant)

[PCJane patient1]$ grep 186380515 s_garma-fibros_converted_sorted.pileup
chr4 186380515 G 350 .$,$a$,$,$a$a$.Aa,aA.aaa,..,a,.....aa,,,a,,AAaaaA.AA,,Aa,,aa...a,..a,,.AAaaa,a...,,a,,,..A,a,a,.,,.Aaaaa,,,.,Aa.,Aa,,Aa,AAa,aa..a,,a..a,aAa,aA.Aaa.aaAAAA.A,a,aA,A,,,..aa,aaa.A.aaa..aa...A,,,aA.AA..Aa,,aa...a,aaaAA.AaAa,a,AAA.,a,AAA....a..,aa,AA,A.a..aA,aaa.,,aA..AAa,aA.,aA,,AA,,,,,AaAAA,A,,,A.aaa..aa,AAa,,,..A,,,,A.a,a,aa,,....aaa,....,,,,..A..,aa......^~A^~, EC!@C!!B(+@+3J585C@IF7BJIGIJ97FDF9FD337774I3:FF75HH55JIJ5GJJ9HH733555H5GII:J3JIJBH4J3J3JIJJJ53333JIJEJ43IJ43JJ43J334F33EJ4JJ3JI3J333J;3J433J333363J4J3F43J6JGIJJ3:J333?3J333JJ33JGJ4JJJ84J33JJ33GJ33JJJ3J:3346J3343J3J>33JJ*J334IJGJ4JJJ34G33F3J4JJ46J433JDJ33IJ333I33JJ33JJ43JIJJJ43334G3FJJ5F333HF73J564IGJFH5JJGG5H3H4I53IIFFFF454JFFFFHHGHFF7CCF67C@CCCC%E
^C
[PCJane patient1]$

I intend to test Varscan in the "simple mode" today to check if I have this issue too.

When using the CASAVA pipeline on my normal sample, I got:

186380515 chr4 (G)155 (A) 142

Comparing IGV, CASAVA and VarScan on the normal sample, I have:
-for the reference: 185, 155, 183
-for the variant: 165, 142, 4
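Expressed as variant allele frequencies, the disagreement is easier to see. A minimal sketch using only the normal-sample counts listed above; `variant_allele_frequency` is a hypothetical helper.

```python
def variant_allele_frequency(ref_reads, var_reads):
    """Variant allele frequency as a percentage of reads at the site."""
    depth = ref_reads + var_reads
    return 100.0 * var_reads / depth if depth else 0.0

# (reference reads, variant reads) in the normal sample, per tool
normal_counts = {"IGV": (185, 165), "CASAVA": (155, 142), "VarScan": (183, 4)}

for tool, (ref_n, var_n) in normal_counts.items():
    print(f"{tool}: {variant_allele_frequency(ref_n, var_n):.1f}%")
```

IGV and CASAVA both put the variant at about 47% in the normal sample, which is what a heterozygous germline site would look like, while the VarScan counts give about 2%.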

Moreover, I tried JointSNVMix, which allows a direct comparison between the normal and tumor samples. At this specific position, it reports nothing. But at several sites where I see this problem in VarScan, my results (read counts) with JointSNVMix are close to the ones in IGV...

Last edited by Jane M; 03-02-2012, 01:39 AM.
dkoboldt replied

03-01-2012, 08:27 AM
Jane,

It will also help if you provide the pileup output for 1-2 examples of this base dropout - we can quickly look at what's in it to see if VarScan is not counting anything.

Please note that the base quality parsing issue was resolved as of VarScan v2.2.8. If any of you encounter it, please let me know!

Yours,

Dan Koboldt
Jane M replied

02-06-2012, 08:49 AM
Originally posted by david.tamborero

I guess that before running VarScan you converted the BAM file to a pileup file, right?

If so, such 'lost' reads could be explained by the samtools mpileup command. Check its arguments: the mapping_quality filtering works (as far as I remember, and as opposed to the base_quality filtering), so some reads can be removed because of this parameter. The other source of read removal during BAM-to-pileup conversion can be the 'anomalous read pairs': check the '-A' argument of the mpileup command to see whether the read counts are more consistent with the IGV data (IGV opens the BAM file directly).

Exactly, I have converted my bam files into pileup files to use VarScan.

samtools mpileup -f ~/fasta/hg19.fasta /data/patient1/s_garma-296_converted_sorted.bam > /data/patient1/s_garma-296_converted_sorted.pileup

The default arguments are:

-q INT skip alignments with mapQ smaller than INT [0]
-Q INT skip bases with baseQ/BAQ smaller than INT [13]

I checked in IGV the Phred scores of some of the reads at this position and they were good (>30). So the missing reads are probably not due to base quality. I don't know about the mapping quality... How can I check it?
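One way to check is to look at the fifth column (MAPQ) of the SAM records covering the position, for example the lines printed by `samtools view` for that region of an indexed BAM. The sketch below tallies those values from plain SAM text; `mapq_histogram` is a hypothetical helper.

```python
from collections import Counter

def mapq_histogram(sam_lines):
    """Tally mapping qualities (SAM column 5) from alignment lines."""
    hist = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith('@'):
            continue  # skip blank lines and header lines
        fields = line.rstrip('\n').split('\t')
        hist[int(fields[4])] += 1
    return hist

# Three made-up alignment records, for illustration only
example_lines = [
    "@HD\tVN:1.0\tSO:coordinate",
    "read1\t99\tchr4\t186380480\t60\t5M\t=\t186380600\t125\tACGTA\tIIIII",
    "read2\t147\tchr4\t186380490\t60\t5M\t=\t186380300\t-195\tACGTA\tIIIII",
    "read3\t65\tchr4\t186380500\t0\t5M\tchr9\t1000\t0\tACGTA\tIIIII",
]
example_hist = mapq_histogram(example_lines)
```

With the default -q 0 shown above, mapping quality alone should not remove any reads, so a histogram dominated by low values would matter only if a higher -q were used.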

As you suggested, I re-ran samtools mpileup with -A:

samtools mpileup -A -f ~/fasta/hg19.fasta /data/patient1/s_garma-296_converted_sorted.bam > /data/patient1/s_garma-296_converted_sorted.pileup

I still have the same kind of problem: LOH called as somatic mutations because of missing reads.
I am not sure what the -A option does... Does it add some anomalous reads?
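As far as the samtools documentation describes it, -A makes mpileup keep "anomalous read pairs", meaning reads flagged as paired in sequencing but without the properly-paired flag, instead of skipping them. Whether a given read falls in that group can be read from its SAM FLAG field. A minimal sketch; `is_anomalous_pair` is a hypothetical helper.

```python
PAIRED = 0x1        # read is paired in sequencing
PROPER_PAIR = 0x2   # both mates aligned as a proper pair

def is_anomalous_pair(flag):
    """True if a read is paired but not flagged as properly paired."""
    return bool(flag & PAIRED) and not (flag & PROPER_PAIR)

# FLAG 99 is a properly paired first mate; FLAG 65 is paired but not proper
flags = {99: is_anomalous_pair(99), 65: is_anomalous_pair(65)}
```

If most of the reads at the problem position have flags like 65 or 73, -A should bring the counts closer to IGV; if they are properly paired, -A will change little.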

Last edited by Jane M; 02-06-2012, 08:52 AM.
david.tamborero replied

02-02-2012, 09:26 AM
I guess that before running VarScan you converted the BAM file to a pileup file, right?

If so, such 'lost' reads could be explained by the samtools mpileup command. Check its arguments: the mapping_quality filtering works (as far as I remember, and as opposed to the base_quality filtering), so some reads can be removed because of this parameter. The other source of read removal during BAM-to-pileup conversion can be the 'anomalous read pairs': check the '-A' argument of the mpileup command to see whether the read counts are more consistent with the IGV data (IGV opens the BAM file directly).
Jane M replied

02-02-2012, 08:56 AM
I tried VarScan in the somatic mode and I also experienced this problem.
For example, VarScan gives as output the following number of reads:

- for the normal sample: 183 (reference) and 4 (variant)
- for the tumor sample: 8 (reference) and 14 (variant)

This results in a somatic mutation. I checked my bam files in IGV and I found:
- for the normal sample: 185 (reference) and 165 (variant)
- for the tumor sample: 8 (reference) and 359 (variant)

Of course, some reads can be discarded because of low quality, but I doubt that this could be the case for so many reads... This variant looks like LOH rather than a somatic mutation.
And I've got this kind of result several times...

I wonder how this problem affects SNP detection and how wrong my analysis is...

Do you know if this problem will be taken into account?
Jane M replied

01-23-2012, 02:17 AM
Ok, thanks!
