varscan-annotation pipeline?

  • varscan-annotation pipeline?

    How can I connect VarScan output to annotation tools?

    Is there a tool that can annotate VarScan's output file directly?

    Or do I have to convert the output file to VCF format?

  • #2
    Hello,

    The latest version of VarScan (v2.2.11, just posted) includes a VCF output option for somatic mutations.

    This option was already available for multi-sample germline variant calling (mpileup2snp, mpileup2cns, mpileup2indel commands).

    Just set --output-vcf to 1.
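
A minimal sketch of how the option might be passed, written as a Python helper that only builds the argument list and does not run anything. The jar location (`VarScan.jar`), the input file names and the output basename are placeholders, not values from this thread; only the subcommand names and `--output-vcf 1` come from the post above.

```python
import shlex


def varscan_command(subcommand, inputs, jar="VarScan.jar", output_vcf=True):
    """Build the argument list for a VarScan call (nothing is executed).

    jar and inputs are placeholder paths; adjust them to your setup.
    """
    cmd = ["java", "-jar", jar, subcommand, *inputs]
    if output_vcf:
        # The option named in the post: set --output-vcf to 1.
        cmd += ["--output-vcf", "1"]
    return cmd


# Germline calling on a (hypothetical) mpileup file.
germline = varscan_command("mpileup2snp", ["sample.mpileup"])
# Somatic calling on (hypothetical) normal/tumor pileups and an output basename.
somatic = varscan_command("somatic", ["normal.pileup", "tumor.pileup", "output_basename"])

print(shlex.join(germline))
print(shlex.join(somatic))
```

The lists can be handed to `subprocess.run` once the paths point at real files.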

    Yours,

    Dan Koboldt



    • #3
      Can I ask whether the VCF produced by VarScan is valid, though? I used the latest version and tried to annotate it with ANNOVAR (via its conversion Perl script), but I get an error.

      NOTICE: for SNPs, column 6 and beyond MAY BE heterozygosity status, quality score, read depth, RMS mapping quality, quality by depth, if these information can be recognized automatically
      NOTICE: for indels, column 6 and beyond MAY BE heterozygosity status, quality score, read depth, read count supporting indel call, RMS mapping quality, if these information can be recognized automatically

      Similarly, running vcf-stats from vcftools also gives an error:

      Different number of columns at chr1:12198 (expected 10, got 9)
      Error not recoverable, exiting.


      Here is the head of my VarScan VCF file:

      ##fileformat=VCFv4.0
      ##source=VarScan2
      ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
      ##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
      ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
      ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
      ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
      #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1
      chr1 12198 G C . PASS DP=107 GT:GQ:DP 1/1:35:107
      chr1 12266 G A . PASS DP=53 GT:GQ:DP 0/1:4:53
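
The column-count complaint from vcf-stats can be reproduced with a short check. This is only a sketch: it assumes the real file is tab-separated (the listing above shows spaces, presumably collapsed by the forum), and the function name `find_column_mismatches` is made up for illustration.

```python
def find_column_mismatches(lines):
    """Return (line_number, expected, got) for every data line whose
    tab-separated field count differs from the #CHROM header line."""
    expected = None
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text or text.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = text.split("\t")
        if text.startswith("#CHROM"):
            expected = len(fields)
        elif expected is not None and len(fields) != expected:
            problems.append((number, expected, len(fields)))
    return problems


# First data row from the post, re-typed with tabs (an assumption).
example = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample1\n",
    "chr1\t12198\tG\tC\t.\tPASS\tDP=107\tGT:GQ:DP\t1/1:35:107\n",
]
print(find_column_mismatches(example))  # [(3, 10, 9)]
```

The result matches the vcf-stats message: the header declares 10 columns and the data row has 9.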


      Regards,

      Mark



      • #4
        What about the somatic option?

        I couldn't find the VCF output option for that command...



        • #5
          Originally posted by dkrtndhkd
          What about the somatic option?

          I couldn't find the VCF output option for that command...

          Hi dkrtndhkd,

          You can also set --output-vcf to 1 for somatic.

          Cheers,

          Fernando



          • #6
            Originally posted by mark.dunning
            Can I ask whether the VCF produced by VarScan is valid, though? I used the latest version and tried to annotate it with ANNOVAR (via its conversion Perl script), but I get an error.

            Hi Mark,

            I ran into a similar problem with another tool when I gave it a VCF file produced by VarScan mpileup2indel. It seems that in VCF files produced by VarScan the QUAL column is empty, so when another tool opens the file the number of columns is wrong and the data no longer line up with the column names ("PASS" belongs in the "FILTER" column, but here it ends up under "QUAL").
            So you need to add a column filled with a dot under the "QUAL" name.
            In my case, I used this command:
            awk '{ if ($1 ~ "^#") { print $0} else { sub("",".\t",$6); print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10"\t"$11"\t"$12} }' VarScanfile.vcf > outputFile.vcf
            and it solved the problem.
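
The same repair can be sketched in Python without hard-coding the number of columns to print. This is an illustration under assumptions, not a replacement for the awk command above: it assumes tab-separated input, that QUAL is the sixth column, and that a data line is broken only when it has exactly one field fewer than the `#CHROM` header (or an empty QUAL field). The function name `add_missing_qual` and the sample row are hypothetical.

```python
def add_missing_qual(lines, qual_index=5):
    """Yield VCF lines with "." placed in the QUAL column (0-based index 5)
    when a data line lacks that field or carries it as an empty string."""
    expected = None
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith("#CHROM"):
            expected = len(text.split("\t"))
        if not text or text.startswith("#") or expected is None:
            yield text  # header, meta or blank line: pass through unchanged
            continue
        fields = text.split("\t")
        if len(fields) == expected - 1:
            fields.insert(qual_index, ".")   # QUAL field missing entirely
        elif len(fields) == expected and fields[qual_index] == "":
            fields[qual_index] = "."         # QUAL present but empty
        yield "\t".join(fields)


# Hypothetical row with the QUAL field missing (9 fields, header has 10).
sample = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample1\n",
    "chr1\t100\t.\tG\tC\tPASS\tDP=107\tGT:GQ:DP\t1/1:35:107\n",
]
fixed = list(add_missing_qual(sample))
print(fixed[1])
```

Lines that already have the expected number of non-empty fields are left as they are, so running it on a valid file should be harmless.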

            I hope this helps.

            Olivia

            EDIT: just found this: http://seqanswers.com/forums/showthread.php?t=20000
            Last edited by oliviajm; 06-08-2012, 12:00 AM.
