Varscan results vs Genomeview
Dear all,
I am using /VarScan/x86_64/2.3.3
VarScan Somatic
I am seeing very large differences between the SNP coverage reported in the VarScan SNP output file and the coverage shown when I visualize the same .bam file in Genomeview (a viewer similar to IGV).
Has anyone found out yet what causes this?
-
Hi!
I've just landed on this post after a long time without using somatic calling tools.
I am wondering whether you solved this; I have no idea why it happened.
Anyway, I've just downloaded VarScan 2 and noticed that it includes a filter that, among other things, seems to remove SNPs that lie close to indels. Could that help you?
-
Am I the only one with this issue? I still can't figure out what the problem is. VarScan worked fine for me on other datasets, but if this isn't fixed I'm going to have to use a different tool...
-
Hello everyone,
I'm also currently experiencing problems with VarScan somatic, but they seem different from what has been described here. I've tried using the -B option for samtools pileup, but I still get completely wrong allele counts for my tumor sample at supposedly somatic mutations with very low p-values.
One example:
Create samtools pileup files with the command:
Code:samtools pileup -Bf ref.fasta in.bam > out.pileup
Code:java -Xmx16g -jar /home/sukmb205/software/VarScan.v2.2.11.jar somatic normal.pileup tumor.pileup outfile.var
Code:chrom position ref var normal_reads1 normal_reads2 normal_var_freq normal_gt tumor_reads1 tumor_reads2 tumor_var_freq tumor_gt somatic_status variant_p_value somatic_p_value tumor_reads1_plus tumor_reads1_minus tumor_reads2_plus tumor_reads2_minus
chr5 88171909 A T 46 0 0% A 91 95 51.08% W Somatic 1.0 6.837592912705086E-13 0 91 0 95
pileup tumor:
Code:chr5 88171909 A 63 .,,,,,,,,,,,,,,,,,,,,,,..,,,.$.,.,,,,,,.,,,..,,-308actcctcaaactaccttcccacaaagccatttaagttaaatggtacatttacagactcacctacatgaaggatataacttaaaacatctgcttagacacatacgttctgttcagatataaaaaatgtggcaaaaatttttaaaaatataggaccactatattcttaaaatgtgtgttcttctgtgtgtgtgtgttcattcattcaagagatctttgactgcaattaggtagtcggtcctataaaggcttccttgtgtgacgataatttctaaaagtaaaatgctccagtgaatatttctgctaaataa,,,,,,.,....,,... /2.*114I,1<(%040":+3/--+?%.-.CE2;/3*0-B8=).0II--7,+,II%I09@*CII
Code:chr5 88171909 A 71 ...$.,,,,,,,,,,,.,,,,,,,,,.,,,,.,,.,,,,,,..,,,.,.,,,..,,,,,,.,...,....^X.^(. BE,9=2++H8I31.)89,=I).2II=1I,+/8I51I>*2,7+.1I.I%-*-1I/2A-I/3@:I.5I8II(I
I would really appreciate some comments on this, thanks!
-
Right, it's definitely samtools pileup that is causing the problem. If you compare the .sam file and the two pileups (with and without -B) in regions where SNPs are being missed, you can see the problem. pileup -B faithfully reproduces the base quality scores found in the .sam file. pileup without -B sometimes reports the quality of those bases as terrible, which causes SNP-calling software to ignore them as too low quality.
Running without -B (that is, with BAQ enabled) is supposed to reduce the number of false positives. There might be applications where that potential trade-off is worth it, but for what it's worth, I'm skeptical. In my work, I'd much rather sift through false positives than miss real mutations.
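The comparison described above can be done by decoding the quality column of the same pileup line produced with and without -B. What follows is only a minimal sketch, assuming Phred+33 encoded qualities; the function names, the example quality strings and the threshold of 15 are illustrative assumptions, not part of samtools or VarScan.

```python
def phred33(qual):
    """Decode a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in qual]


def count_below(qual, min_qual):
    """Count how many bases have a quality score below min_qual."""
    return sum(1 for q in phred33(qual) if q < min_qual)


# Hypothetical quality strings for the same four reads at one position.
with_b = "IIII"     # original base qualities (Phred 40 each)
without_b = "I!!I"  # two bases lowered to Phred 0, as an adjusted quality might look

# Number of bases a caller with a minimum base quality of 15 would ignore.
print(count_below(with_b, 15), count_below(without_b, 15))
```

If the second number is much larger than the first at the positions where SNPs go missing, the quality adjustment is the likely reason the reads are not counted.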
-
Samtools mpileup with -B or -E flag for Varscan
I just ran VarScan on mpileup data generated with either the -B or the -E flag, and there is indeed quite a difference (see below). Should VarScan always be used on mpileup data generated with the -B flag?
samtools mpileup -d 10000 -S -B -C 50 -P Illumina -f hg19.fa normal.bam > normal.mpileup
samtools mpileup -d 10000 -S -B -C 50 -P Illumina -f hg19.fa tumor.bam > tumor.mpileup
java -jar VarScan.v2.2.10.jar somatic normal.mpileup tumor.mpileup tumor_vs_normal
java -jar VarScan.v2.2.10.jar processSomatic tumor_vs_normal.snp
48198 VarScan calls processed
3168 were Somatic (1229 high confidence)
37882 were Germline
6937 were LOH
OR
samtools mpileup -d 10000 -S -E -C 50 -P Illumina -f hg19.fa normal.bam > normal.baq.mpileup
samtools mpileup -d 10000 -S -E -C 50 -P Illumina -f hg19.fa tumor.bam > tumor.baq.mpileup
java -jar VarScan.v2.2.10.jar somatic normal.baq.mpileup tumor.baq.mpileup tumor_vs_normal.baq
java -jar VarScan.v2.2.10.jar processSomatic tumor_vs_normal.baq.snp
7593 VarScan calls processed
517 were Somatic (191 high confidence)
6406 were Germline
636 were LOH
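To put the two summaries side by side, the counts above can be turned into fold changes and fractions. This is just arithmetic on the numbers reported in this post; the dictionary keys are labels chosen for the sketch.

```python
# Counts copied from the two processSomatic summaries above.
no_baq = {"total": 48198, "somatic": 3168, "high_conf": 1229,
          "germline": 37882, "loh": 6937}   # mpileup run with -B
baq = {"total": 7593, "somatic": 517, "high_conf": 191,
       "germline": 6406, "loh": 636}        # mpileup run with -E

# Fold change between the -B run and the -E run for each category.
fold = {key: no_baq[key] / baq[key] for key in no_baq}

# Fraction of processed calls classified as Somatic in each run.
somatic_fraction = {
    "-B": no_baq["somatic"] / no_baq["total"],
    "-E": baq["somatic"] / baq["total"],
}

for key, value in fold.items():
    print(f"{key}: {value:.2f}x")
for flag, value in somatic_fraction.items():
    print(f"{flag}: {value:.1%} somatic")
```

The -B run yields roughly six times as many calls overall, while the share of calls labelled Somatic stays close (about 6.6% versus 6.8%), so the flag mainly changes how many positions are called at all.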
-
Hello,
If other users are experiencing the same issue and read this thread: the problem comes from samtools. Add the -B option to disable the BAQ computation (-E, by contrast, recomputes BAQ rather than disabling it).
-
Hello,
I just wanted to let you know that there are bugs in VarScan, in both the default and somatic modes. Other people are experiencing the same problem as me.
To conclude, I am giving up on VarScan, but I hope the problem will be addressed because, apart from this issue, it seems to be an interesting tool!
-
I ran VarScan in the simple mode on the normal sample, using pileup2snp:
java -Xmx20g -jar VarScan.v2.2.8.jar pileup2snp /data/patient1/s_garma-fibros_converted_sorted.pileup --min-coverage 10 --min-avg-qual 25 > /data/patient1/output_varscan_snp_garma.snp
Code:chr4 186380515 (G) 184 (A) 5
I attach here the screenshots from IGV at this specific position.
-
Thank you for your answer Dan,
Here is the output of the pileup file at the position where I have the problem I mentioned:
With VarScan somatic:
- for the normal sample: 183 (reference G) and 4 (variant A)
- for the tumor sample: 8 (reference) and 14 (variant)
With IGV:
- for the normal sample: 185 (reference) and 165 (variant)
- for the tumor sample: 8 (reference) and 359 (variant)
[PCJane patient1]$ grep 186380515 s_garma-fibros_converted_sorted.pileup
chr4 186380515 G 350 .$,$a$,$,$a$a$.Aa,aA.aaa,..,a,.....aa,,,a,,AAaaaA.AA,,Aa,,aa...a,..a,,.AAaaa,a...,,a,,,..A,a,a,.,,.Aaaaa,,,.,Aa.,Aa,,Aa,AAa,aa..a,,a..a,aAa,aA.Aaa.aaAAAA.A,a,aA,A,,,..aa,aaa.A.aaa..aa...A,,,aA.AA..Aa,,aa...a,aaaAA.AaAa,a,AAA.,a,AAA....a..,aa,AA,A.a..aA,aaa.,,aA..AAa,aA.,aA,,AA,,,,,AaAAA,A,,,A.aaa..aa,AAa,,,..A,,,,A.a,a,aa,,....aaa,....,,,,..A..,aa......^~A^~, EC!@C!!B(+@+3J585C@IF7BJIGIJ97FDF9FD337774I3:FF75HH55JIJ5GJJ9HH733555H5GII:J3JIJBH4J3J3JIJJJ53333JIJEJ43IJ43JJ43J334F33EJ4JJ3JI3J333J;3J433J333363J4J3F43J6JGIJJ3:J333?3J333JJ33JGJ4JJJ84J33JJ33GJ33JJJ3J:3346J3343J3J>33JJ*J334IJGJ4JJJ34G33F3J4JJ46J433JDJ33IJ333I33JJ33JJ43JIJJJ43334G3FJJ5F333HF73J564IGJFH5JJGG5H3H4I53IIFFFF454JFFFFHHGHFF7CCF67C@CCCC%E
^C
[PCJane patient1]$
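One way to check whether the variant reads are present in the pileup itself is to tally the read-base column of the line above. The following is a minimal sketch of such a tally, assuming the standard samtools pileup encoding ('.' and ',' for reference matches, letters for mismatches, '^' plus one character for a read start, '$' for a read end, and '+N'/'-N' followed by N bases for indels); the function name is hypothetical and this is not how VarScan itself counts.

```python
from collections import Counter


def count_pileup_bases(bases):
    """Tally reference matches and mismatching bases in a pileup base string."""
    counts = Counter()
    i, n = 0, len(bases)
    while i < n:
        ch = bases[i]
        if ch == "^":
            i += 2  # read start marker plus its mapping-quality character
        elif ch == "$":
            i += 1  # read end marker
        elif ch in "+-":
            j = i + 1
            while j < n and bases[j].isdigit():
                j += 1
            if j == i + 1:
                counts["other"] += 1  # malformed indel marker, no length given
                i += 1
            else:
                i = j + int(bases[i + 1:j])  # skip the inserted/deleted sequence
        elif ch in ".,":
            counts["ref"] += 1
            i += 1
        elif ch.upper() in "ACGTN":
            counts[ch.upper()] += 1  # mismatch, both strands pooled
            i += 1
        else:
            counts["other"] += 1  # e.g. '*' deletion placeholder
            i += 1
    return counts


# Small made-up example string, not taken from the data above.
print(count_pileup_bases(".$,aA^~.-2ac,*"))
```

Applied to the base column of the chr4 186380515 line, the "ref" and "A" tallies can be compared directly with the counts VarScan and IGV report; these are raw counts with no quality filtering applied.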
When using the CASAVA pipeline on my normal sample, I got:
186380515 chr4 (G)155 (A) 142
So, for the normal sample, the counts from IGV, CASAVA and VarScan somatic (in that order) are:
- for the reference: 185, 155, 183
- for the variant: 165, 142, 4
Moreover, I tried JointSNVMix, which allows a direct comparison between the normal and tumor samples. At this specific position it reports nothing. But at several sites where I see this problem with VarScan, the read counts from JointSNVMix are close to the ones in IGV...
Last edited by Jane M; 03-02-2012, 01:39 AM.
-
Jane,
It will also help if you provide the pileup output for 1-2 examples of this base dropout - we can quickly look at what's in it to see if VarScan is not counting anything.
Please note that the base quality parsing issue was resolved as of VarScan v2.2.8. If any of you encounter it, please let me know!
Yours,
Dan Koboldt
-
Originally posted by david.tamborero:
I guess that before running VarScan you converted the bam file to a pileup file, right?
If so, such 'lost' reads could be explained by the samtools mpileup command. Check its arguments: the mapping-quality filter works (as far as I remember, and unlike the base-quality filter), so some reads can be removed by this parameter. The other source of read removal during bam-to-pileup conversion can be 'anomalous read pairs': check the '-A' argument of the mpileup command to see whether the read counts become more consistent with the IGV data (IGV opens the bam file directly).
samtools mpileup -f ~/fasta/hg19.fasta /data/patient1/s_garma-296_converted_sorted.bam > /data/patient1/s_garma-296_converted_sorted.pileup
-q INT skip alignments with mapQ smaller than INT [0]
-Q INT skip bases with baseQ/BAQ smaller than INT [13]
As you suggested, I re-ran samtools mpileup with -A:
samtools mpileup -A -f ~/fasta/hg19.fasta /data/patient1/s_garma-296_converted_sorted.bam > /data/patient1/s_garma-296_converted_sorted.pileup
I am not sure what the -A option does... Does it add some anomalous reads?
Last edited by Jane M; 02-06-2012, 08:52 AM.
-
I guess that before running VarScan you converted the bam file to a pileup file, right?
If so, such 'lost' reads could be explained by the samtools mpileup command. Check its arguments: the mapping-quality filter works (as far as I remember, and unlike the base-quality filter), so some reads can be removed by this parameter. The other source of read removal during bam-to-pileup conversion can be 'anomalous read pairs': check the '-A' argument of the mpileup command to see whether the read counts become more consistent with the IGV data (IGV opens the bam file directly).
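To make the filters mentioned above explicit when regenerating the pileup, the command can be assembled with every relevant option spelled out. This is a sketch only: the helper name and file paths are hypothetical, the flags used (-f, -q, -Q, -A, -B) are the samtools mpileup options discussed in this thread, and the defaults of 0 and 13 are the ones shown in the mpileup help text. The command is only built and printed here, not executed.

```python
import shlex


def mpileup_command(reference, bam, min_mapq=0, min_baseq=13,
                    count_anomalous=False, disable_baq=False):
    """Build a samtools mpileup argument list with the read filters made explicit."""
    cmd = ["samtools", "mpileup", "-f", reference,
           "-q", str(min_mapq),    # skip alignments with mapQ below this value
           "-Q", str(min_baseq)]   # skip bases with baseQ/BAQ below this value
    if count_anomalous:
        cmd.append("-A")           # also count anomalous read pairs
    if disable_baq:
        cmd.append("-B")           # disable the BAQ computation
    cmd.append(bam)
    return cmd


# Hypothetical file names, for illustration only.
cmd = mpileup_command("hg19.fasta", "sample.bam",
                      count_anomalous=True, disable_baq=True)
print(shlex.join(cmd))
```

Generating one pileup per combination of options and comparing the depth column at a problem position is a quick way to see which filter removes the reads.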
-
I tried VarScan in the somatic mode and I also experienced this problem.
For example, VarScan gives as output the following number of reads:
- for the normal sample: 183 (reference) and 4 (variant)
- for the tumor sample: 8 (reference) and 14 (variant)
This results in a somatic mutation. I checked my bam files in IGV and I found:
- for the normal sample: 185 (reference) and 165 (variant)
- for the tumor sample: 8 (reference) and 359 (variant)
Of course, some reads can be discarded because of low quality, but I doubt that this could be the case for so many reads... This variant looks like LOH rather than a somatic mutation.
And I've got this kind of result several times...
I wonder how this problem affects SNP detection and how wrong my analysis is...
Do you know if this problem will be taken into account?
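The LOH versus somatic interpretation follows from the variant allele frequencies implied by the two sets of counts in this post. A small sketch of that arithmetic (the function and dictionary layout are illustrative, and this is not VarScan's own classification logic):

```python
def variant_allele_frequency(ref_reads, var_reads):
    """Fraction of reads supporting the variant allele."""
    total = ref_reads + var_reads
    return var_reads / total if total else 0.0


# (reference reads, variant reads) as reported in this post.
counts = {
    ("VarScan", "normal"): (183, 4),
    ("VarScan", "tumor"): (8, 14),
    ("IGV", "normal"): (185, 165),
    ("IGV", "tumor"): (8, 359),
}

for (source, sample), (ref, var) in counts.items():
    vaf = variant_allele_frequency(ref, var)
    print(f"{source} {sample}: {vaf:.1%}")
```

With the IGV counts the variant is at about 47% in the normal sample and about 98% in the tumor, which is the pattern expected for a heterozygous site that lost the reference allele. With the VarScan counts it is about 2% in the normal and about 64% in the tumor, which looks somatic.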