Varscan VCF output bugs(?)

coco90417 replied

02-27-2014, 10:24 AM
Varscan vcf output for indel

Hi,

I am also encountering issues with vcf output of indels. I have got indels that look like this:

1 984171 . CAG AG .
1 1588744 . AGCG GCG .

I checked the genome browser for the context of both mutations (http://genome.ucsc.edu/cgi-bin/hgTra...A984170-984180 and http://genome.ucsc.edu/cgi-bin/hgTra...588740-1588750). It seems that the first one is supposed to be a simple deletion of the first base, and it should look like this:

1 984170 . GC G .

And the second one can be represented either by a block substitution that looks like this:

1 1588743 . AAG AG .

or a deletion (if you align the deletion to the left) that looks like this:

1 1588742 . GA G .

So I do not know whether I did something wrong or whether VarScan uses a different VCF output format for indels.

Please help me. Many many thanks.
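
For anyone who wants to check records like these by hand, here is a minimal Python sketch of the usual trim-and-left-align procedure for a single REF/ALT pair. It is not VarScan's code. The names normalize, base_at and TOY_REFERENCE are made up for illustration, and the toy bases are only the ones implied by the records in this post, not fetched from a genome.

```python
def normalize(pos, ref, alt, base_at):
    """Trim and left-align one biallelic REF/ALT pair.

    base_at(p) is a caller-supplied (hypothetical) lookup that returns the
    reference base at 1-based position p on the same chromosome.
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical; nothing to normalize")
    changed = True
    while changed:
        changed = False
        # Drop a trailing base that both alleles share.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, prepend the preceding reference base.
        if not ref or not alt:
            pos -= 1
            base = base_at(pos)
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference (assumption): only the bases implied by the records in the post.
TOY_REFERENCE = {984170: "G", 984171: "C", 1588742: "G", 1588743: "A", 1588744: "A"}

print(normalize(984171, "CAG", "AG", TOY_REFERENCE.__getitem__))
print(normalize(1588744, "AGCG", "GCG", TOY_REFERENCE.__getitem__))
```

With the toy bases, the two calls return position 984170 with GC/G and position 1588742 with GA/G, which are the left-aligned records proposed in the post. In real use, base_at would read from the reference FASTA rather than a dictionary.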
IsmailM replied

02-11-2014, 10:35 AM
If you are using VarScan mpileup2snp or mpileup2indel, why does the QUAL column not have a number in it?

bw. replied

02-04-2014, 03:26 PM

I'm also seeing the slashes with VarScan v2.3.6
I wrote this script to convert the slashes to commas:

Code:

import sys

if len(sys.argv) < 2:
    sys.exit("Usage: " + sys.argv[0] + "  vcf_filename")

in_fname = sys.argv[1]
# Strip a trailing ".vcf" (if present) before appending the new suffix
out_fname = (in_fname[:-4] if in_fname.endswith(".vcf") else in_fname) + ".fixed.vcf"
print("Writing to: " + out_fname)
with open(in_fname) as inp, open(out_fname, "w") as out:
    for line in inp:
        if line.startswith("#") or not line.strip():
            out.write(line)   # copy header and blank lines unchanged
        else:
            fields = line.split("\t")
            fields[3] = fields[3].replace("/", ",").replace("\\", ",")   # replace any slashes in the REF field with commas
            fields[4] = fields[4].replace("/", ",").replace("\\", ",")   # replace any slashes in the ALT field with commas
            out.write("\t".join(fields))

To use, just copy-paste into a file (let's say script.py) and run:

python script.py file.vcf

Also, this version of the script just removes the vcf records with slashes:

Code:

import sys

if len(sys.argv) < 2:
    sys.exit("Usage: " + sys.argv[0] + "  vcf_filename")

in_fname = sys.argv[1]
# Strip a trailing ".vcf" (if present) before appending the new suffix
out_fname = (in_fname[:-4] if in_fname.endswith(".vcf") else in_fname) + ".fixed.vcf"
print("Writing to: " + out_fname)
with open(in_fname) as inp, open(out_fname, "w") as out:
    for line in inp:
        if line.startswith("#") or not line.strip():
            out.write(line)   # copy header and blank lines unchanged
        else:
            fields = line.split("\t")
            alleles = fields[3] + fields[4]   # REF and ALT together
            if "\\" not in alleles and "/" not in alleles:
                out.write(line)   # keep only records without slashes

Last edited by bw.; 02-05-2014, 02:16 PM. Reason: Turns out slashes also sometimes appear in the REF field, so added checks for that.

rnahar replied

12-23-2013, 08:33 PM
I am also facing the +/- issue in the VarScan indel notation. However, I do not use the VCF output; I prefer the regular tabular output of VarScan. Is there a way to change this indel notation so that it is compatible with ANNOVAR? I use VarScan 2.3.6.
IsmailM replied

07-25-2013, 02:17 PM
solved indel vcf format with awk command

Here is an awk command that can change your indel vcf format into the correct format.

cat Original_VCF | awk 'BEGIN {OFS="\t"} NR <= 24' > FINAL_VCF && cat Original_VCF | awk 'BEGIN {OFS="\t"} NR >= 25 { if (length($4)>length($5)) {$5 = substr($4, 1, 1)}; print }' >> FINAL_VCF

It uses two awk commands because the second command would alter the header if it were run on the whole file. The first awk command copies the header (assumed here to be 24 lines; adjust both line numbers to match your file). The second command then rewrites the indel records from line 25 onward into the correct indel format.
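
The same rule can be sketched in Python without counting header lines, by treating every line that starts with "#" as header. This is only a rough equivalent of the awk command, under the same assumption that whenever REF is longer than ALT the ALT allele should be the first base of REF. The function name fix_deletion_alt is made up, and the example record is the one reported in this thread with its INFO column shortened.

```python
def fix_deletion_alt(line):
    """Apply the awk rule to one VCF line.

    Header lines (starting with '#') are returned unchanged. For data lines,
    if REF is longer than ALT, ALT is replaced by the first base of REF.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) >= 5 and len(fields[3]) > len(fields[4]):
        fields[4] = fields[3][0]
    return "\t".join(fields) + "\n"


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
record = "chr1\t6529182\t.\tTTCC\tTCC\t.\tPASS\tADP=314\n"  # INFO shortened

print(fix_deletion_alt(header), end="")
print(fix_deletion_alt(record), end="")
```

To process a whole file, call fix_deletion_alt on each line read from the input and write the result to the output file.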
eeyun replied

05-15-2013, 10:39 AM
Originally posted by eeyun:

As far as I can tell, it should be ref = TTCC and alt = T

Attachment included here to show the variant in question.
Attached Files

varscan problem.PNG (1.8 KB, 64 views)
eeyun replied

05-15-2013, 10:38 AM
Originally posted by eeyun:

We are having the same problem with 2.3.5

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1
chr1 6529182 . TTCC TCC . PASS ADP=314;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:255:322:314:178:138:43.67%:1.1101E-50:34:31:88:90:70:68

As far as I can tell, it should be ref = TTCC and alt = T
eeyun replied

05-15-2013, 10:32 AM
We are having the same problem with 2.3.5

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1
chr1 6529182 . TTCC TCC . PASS ADP=314;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:255:322:314:178:138:43.67%:1.1101E-50:34:31:88:90:70:68
sophiespo replied

05-15-2013, 02:55 AM
Originally posted by NestorNotabilis:

To add to Olivia's comment, I'm also using 2.3.4 and still getting the +/- issue when using mpileup2snp.

I am as well when using the somatic/processSomatic functions.

ANNOVAR doesn't like this. Can anyone help?
NestorNotabilis replied

02-05-2013, 03:38 AM
Originally posted by tommivat:

Olivia, +/- issue is fixed in the latest version 2.3.4.

To add to Olivia's comment, I'm also using 2.3.4 and still getting the +/- issue when using mpileup2snp.
tommivat replied

02-01-2013, 07:17 AM
Originally posted by oliviajm:

Is it fixed in all VarScan tools?
Which one do you use?

I use VarScan.v2.3.4.jar mpileup2indel and I still get some "C +AAAG" or "G -AT" in my vcf output file.

That explains it. I use somatic for tumor-normal pairs.

Tommi
oliviajm replied

02-01-2013, 07:11 AM
Is it fixed in all VarScan tools?

Which one do you use?

I use VarScan.v2.3.4.jar mpileup2indel and I still get some "C +AAAG" or "G -AT" in my vcf output file.

Olivia
tommivat replied

02-01-2013, 05:19 AM
Olivia, +/- issue is fixed in the latest version 2.3.4.

However, the way variant alleles are coded is still unconventional: the VCF format separates alleles with a comma, whereas VarScan uses a slash. I hope this can be fixed in future releases:

Code:

A/C -> A,C
ACG/CG -> ACG,CG

br,
Tommi
oliviajm replied

02-01-2013, 05:00 AM
Hello all,

I realised that the missing QUAL field was added in one of the recent versions of VarScan. Because I use VarScan in a pipeline, I had not updated it recently, to avoid compatibility problems.

But after a few tests, it seems that insertions and deletions are still coded with + and - in the REF and ALT columns, which does not match the VCF specification. For example, I think an insertion of a T after a C should be written as C in the REF field and CT in the ALT field (and not as +T).

Regards,

Olivia
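
A small Python sketch of that conversion, for anyone who needs to rewrite the +/- notation themselves. It is not part of VarScan; the function name plus_minus_to_vcf is made up, and it assumes the reported position and reference base are the anchor base immediately before the inserted or deleted sequence, so the position stays unchanged.

```python
def plus_minus_to_vcf(ref, alt):
    """Convert a '+SEQ' / '-SEQ' indel allele into a VCF-style REF/ALT pair.

    Assumes `ref` is the anchor base immediately preceding the indel.
    Alleles without a leading '+' or '-' are returned unchanged.
    """
    if alt.startswith("+"):
        # Insertion: ALT is the anchor base followed by the inserted bases.
        return ref, ref + alt[1:]
    if alt.startswith("-"):
        # Deletion: REF is the anchor base followed by the deleted bases.
        return ref + alt[1:], ref
    return ref, alt


for ref, alt in [("C", "+T"), ("C", "+AAAG"), ("G", "-AT"), ("A", "C")]:
    print(ref, alt, "->", *plus_minus_to_vcf(ref, alt))
```

The insertion case follows the C/CT example in the post. The deletion case applies the same convention in the other direction, so "G -AT" becomes REF GAT and ALT G.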
tommivat replied

01-31-2013, 10:01 AM
Hello Dan and others,

First, thanks for the great piece of software! It would spare us some work if somaticFilter supported .vcf files as well. I don't know whether that would be tedious to implement.

Another thing I wanted to ask, not related to VCF, concerns false-positive filtering (fpfilter.pl). I'm using bam-readcount to produce input for the script, but even if I do it chromosome by chromosome, the files are too big (>50 GB) and my computer (with 8 GB of memory) just gets jammed when running fpfilter.pl. Is there a way to modify the script to support piping? And please tell me if it already does.

br,
Tommi