  • varscan-annotation pipeline?

    How can I connect VarScan output to annotation tools?

    Is there a tool that can annotate VarScan's output file directly, or do I have to convert the output to VCF format first?

  • #2
    Hello,

    The latest version of VarScan (v2.2.11, just posted) includes a VCF output option for somatic mutations.

    This option was already available for multi-sample germline variant calling (mpileup2snp, mpileup2cns, mpileup2indel commands).

    Just set --output-vcf to 1.

    Yours,

    Dan Koboldt



    • #3
      Can I ask whether the VCF produced by VarScan is valid, though? I used the latest version and tried to annotate it with ANNOVAR (via its conversion Perl script), but I get an error:

      NOTICE: for SNPs, column 6 and beyond MAY BE heterozygosity status, quality score, read depth, RMS mapping quality, quality by depth, if these information can be recognized automatically
      NOTICE: for indels, column 6 and beyond MAY BE heterozygosity status, quality score, read depth, read count supporting indel call, RMS mapping quality, if these information can be recognized automatically

      Similarly, vcf-stats from vcftools also gives an error:

      Different number of columns at chr1:12198 (expected 10, got 9)
      Error not recoverable, exiting.


      Here is the head of my varscan vcf file

      ##fileformat=VCFv4.0
      ##source=VarScan2
      ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
      ##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
      ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
      ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
      ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
      #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1
      chr1 12198 G C . PASS DP=107 GT:GQ:DP 1/1:35:107
      chr1 12266 G A . PASS DP=53 GT:GQ:DP 0/1:4:53


      Regards,

      Mark
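The vcf-stats message above comes down to a column count: the #CHROM line declares 10 tab-separated columns, while the record at chr1:12198 has only 9. A minimal sketch for listing such records before handing the file to an annotation tool, assuming the file is tab-delimited; `check_vcf_columns` is a hypothetical helper name, not part of VarScan or vcftools:

```shell
# check_vcf_columns: hypothetical helper, not part of VarScan or vcftools.
# Reports every data line whose tab-separated field count differs from
# the field count of the #CHROM header line.
check_vcf_columns() {
  awk -F'\t' '
    /^##/     { next }                 # skip meta-information lines
    /^#CHROM/ { expected = NF; next }  # remember the header column count
    NF != expected {
      printf "line %d: expected %d, got %d\n", NR, expected, NF
    }
  ' "$@"
}
```

Usage would be along the lines of `check_vcf_columns VarScanfile.vcf`; no output means every record matches the header.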



      • #4
        What about the somatic option?

        I couldn't find the option for VCF output...



        • #5
          Hi dkrtndhkd,

          You can also set --output-vcf to 1 for somatic.

          Cheers,

          Fernando



          • #6
            Hi Mark,

            I ran into a similar problem with another program when I gave it a VCF file produced by VarScan mpileup2indel. It seems that in VCF files produced by VarScan the QUAL column is empty. So when another tool opens the file, the number of columns is wrong and the values no longer line up with the column names ("PASS" belongs in the FILTER column, but here it ends up under QUAL).
            So you need to add a column filled with a dot under the QUAL heading.
            In my case, I used the command:
            awk '{ if ($1 ~ "^#") { print $0} else { sub("",".\t",$6); print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10"\t"$11"\t"$12} }' VarScanfile.vcf > outputFile.vcf
            and it solved the problem.

            Hope it will help you.

            Olivia

            EDIT : just found this : http://seqanswers.com/forums/showthread.php?t=20000
            Last edited by oliviajm; 06-08-2012, 12:00 AM.
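A tab-aware variant of the same idea can be sketched as follows. It is not Olivia's command: it splits on tabs only and keeps however many sample columns the file has. It assumes that QUAL (column 6) is the field that is empty or absent, as described above; the file head posted earlier in the thread has a dot right after ALT, so it is worth confirming which column is actually missing before applying it. `add_missing_qual` is a hypothetical helper name.

```shell
# add_missing_qual: hypothetical helper, a sketch of a tab-aware QUAL fix.
# Assumption: QUAL (column 6) is the field that is empty or absent.
add_missing_qual() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^##/     { print; next }                 # meta-information lines pass through
    /^#CHROM/ { expected = NF; print; next }  # remember the header column count
    NF >= 6 && $6 == "" { $6 = "." }          # empty QUAL field: fill in the missing-value dot
    NF == expected - 1 {                      # QUAL absent: shift columns 6..NF right by one
      for (i = NF + 1; i > 6; i--) $i = $(i - 1)
      $6 = "."
    }
    { print }
  ' "$@"
}
```

Usage would be along the lines of `add_missing_qual VarScanfile.vcf > outputFile.vcf`. Records that already have the same number of fields as the header and a non-empty QUAL are printed unchanged.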
