**Jose Blanca** · 06-16-2010, 10:17 PM

We've done similar things in my lab. We haven't dealt with very large datasets, just a couple of Illumina and 454 data sets together. You're welcome to take a look at our documentation. We also ran a small tutorial session on the topic.
I hope it can serve as inspiration.

**Rao** · 06-16-2010, 11:01 PM

You can try VarScan. It takes pileup as input.

**Michael.James.Clark** · 06-18-2010, 01:09 PM

Thanks for the feedback.

I'm going to try VarScan because I've already done the variant calling and have the pileup files.

Does anyone have suggestions for annotation and filtering programs to use downstream of VarScan?

For example, going from the list of variants to coding consequences (marking whether and how variants affect coding sequences), and parsing by type of variant (indels vs SNVs) or coverage/quality?
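As a rough illustration of the parsing-by-type step, here is a minimal Python sketch that splits a variant list into SNVs and indels and applies a coverage cutoff. The record layout (chrom, pos, ref, alt, depth) and the names `Variant`, `variant_type`, and `split_by_type` are assumptions made up for the example, not the output format of any particular caller.

```python
from collections import namedtuple

# Hypothetical record layout, for illustration only.
Variant = namedtuple("Variant", "chrom pos ref alt depth")


def variant_type(ref, alt):
    """Classify by allele length: single-base swaps are SNVs, unequal lengths are indels."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-base substitutions


def split_by_type(variants, min_depth=8):
    """Group variants by type, dropping those with depth below min_depth."""
    groups = {"SNV": [], "indel": [], "other": []}
    for v in variants:
        if v.depth >= min_depth:
            groups[variant_type(v.ref, v.alt)].append(v)
    return groups


# Made-up calls, not real data.
calls = [
    Variant("chr21", 100, "A", "G", 30),
    Variant("chr21", 200, "AT", "A", 12),
    Variant("chr21", 300, "C", "T", 3),
]
groups = split_by_type(calls)
print({name: len(members) for name, members in groups.items()})
```

Coding-consequence annotation needs a gene model and is not covered by this sketch; it only shows the type and coverage split.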

I'm also having trouble getting VarScan to work:

I used samtools 0.1.7-5 (r528) to generate the pileup with the options -c -a -f hg18.fa -r 0.0000007.
When I tried running one of the "pileup2" commands in VarScan, this happened:

java -jar /home/mclark/varScan/VarScan.v2.2.jar pileup2indel chr21.pileup
Min coverage: 8
Min reads2: 2
Min var freq: 0.01
Min avg qual: 15
P-value thresh: 0.99
Reading input from chr21.pileup
Chrom Position Ref Var Reads1 Reads2 VarFreq Strands1 Strands2 Qual1 Qual2 Pvalue
Parsing Exception on line:
chr21 9719766 N A 68 0 59 3 ^Z.^~,^~, `2/
For input string: "A"

Any ideas what's going on and how I can get around it?

I'm also wondering what options are available for each VarScan command. I don't see a list on the site (and if it's in the code, I'm afraid I may not be savvy enough to figure that out myself, so assistance is appreciated). For example, can I adjust "min avg qual" and similar settings? Thanks.

**epigen** · 06-21-2010, 02:34 AM

The VarScan manual site says that it cannot process pileup created with the -c option:

"Do NOT use the -c parameter. It generates consensus format, which is different from pileup format. The next release of VarScan will recognize both formats. Note, to save disk space and file I/O, you can redirect pileup output directly to VarScan with a "pipe" command. For example:

samtools pileup -f reference.fasta myData.bam | java -jar VarScan.v2.1.jar pileup2snp"

The -c stands for consensus, and it looks as if the parsing exception was caused by that consensus "A". So you should run pileup without -c to use it with VarScan. Or wait for the promised next release, or for someone to do a clever hack to the code ...
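If re-running pileup is inconvenient, one possible workaround is to strip the consensus columns from the existing file. The Python sketch below assumes that a `-c` SNP line has 10 tab-separated columns (chrom, position, reference base, consensus base, consensus quality, SNP quality, mapping quality, depth, read bases, base qualities), which matches the failing line quoted earlier in the thread. Indel lines (reference `*`) use a different layout and are simply skipped here. The function name `consensus_to_basic` is made up for the example.

```python
def consensus_to_basic(line):
    """Convert one `pileup -c` line to 6-column pileup, or return None to skip it."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 6:
        return "\t".join(fields)  # already basic pileup
    if len(fields) != 10 or fields[2] == "*":
        return None  # indel line or unrecognised layout
    chrom, pos, ref, _cons, _cons_q, _snp_q, _map_q, depth, bases, quals = fields
    return "\t".join([chrom, pos, ref, depth, bases, quals])


# The line from the parsing exception, with tabs restored between columns.
failing = "chr21\t9719766\tN\tA\t68\t0\t59\t3\t^Z.^~,^~,\t`2/"
print(consensus_to_basic(failing))
```

Whether VarScan accepts the converted lines was not tested in this thread; regenerating the pileup without -c remains the documented route.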

**Rao** · 06-21-2010, 06:58 AM

Try samtools pileup -vcf
It gives only variants.

**Michael.James.Clark** · 06-21-2010, 09:22 AM

Great, thanks guys. I think last week I was only seeing the "Documentation" not the "Manual" from the site. The Manual describes just what I wanted to know.

Rao, the -c option's consensus output appears to be the issue. I can still potentially use -v to output only variants, though.

**Michael.James.Clark** · 06-21-2010, 10:37 PM

Alright, that worked and I got output. It looks believable to me, but I've encountered some issues.

For one thing, I can't get the "filter" command to report anything. No matter what settings I use, it reports 0 variants passing the filter, yet when I dig into the variant file I can find variants that should pass. Has anyone gotten it to work?
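One way to check by hand which rows ought to pass, independent of VarScan's own filter, is a small script over the output table. This Python sketch uses the column header VarScan printed earlier in the thread and three of the thresholds from that run log (min coverage 8, min reads2 2, min var freq 0.01). It assumes coverage means Reads1 + Reads2 and that VarFreq is written as a percentage string such as "25%"; both are assumptions, and the example rows are invented.

```python
HEADER = ("Chrom Position Ref Var Reads1 Reads2 VarFreq "
          "Strands1 Strands2 Qual1 Qual2 Pvalue").split()


def parse_row(line):
    """Turn one tab-separated output row into a dict keyed by column name."""
    return dict(zip(HEADER, line.rstrip("\n").split("\t")))


def passes(row, min_coverage=8, min_reads2=2, min_var_freq=0.01):
    """Apply coverage, reads2, and variant-frequency thresholds to one parsed row."""
    reads1, reads2 = int(row["Reads1"]), int(row["Reads2"])
    # Assumption: VarFreq is a percentage string such as "25%".
    var_freq = float(row["VarFreq"].rstrip("%")) / 100.0
    return (reads1 + reads2 >= min_coverage
            and reads2 >= min_reads2
            and var_freq >= min_var_freq)


# Made-up rows for illustration, not real VarScan output.
rows = [
    "chr21\t1000\tA\tG\t15\t5\t25%\t2\t2\t30\t28\t0.01",
    "chr21\t2000\tC\tT\t4\t1\t20%\t1\t1\t30\t25\t0.4",
]
kept = [r for r in map(parse_row, rows) if passes(r)]
print(len(kept), "of", len(rows), "rows pass")
```

If a script like this keeps rows that the "filter" command rejects, comparing the two sets of thresholds is a reasonable next step.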

I also tried the somatic command, and it looks like it worked, but some of the output puzzles me. Example output:

Min coverage: 8x for Normal, 6x for Tumor
Min reads2: 2
Min strands2: 1
Min var freq: 0.2
Min freq for hom: 0.75
Min avg qual: 15
P-value thresh: 0.99
Somatic p-value: 0.05
127671560 shared positions
122884470 had sufficient coverage for comparison
121991210 were called Reference
12445 were mixed SNP-indel calls and filtered
176060 were called Germline
8887 were called LOH
685647 were called Somatic
10221 were called Unknown
0 were called Variant

I'm thrown by the "0 were called Variant". Anyone know what that means?
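A quick arithmetic check on the summary above: the call categories, including the empty "Variant" one, add up exactly to the number of positions with sufficient coverage, so no positions are unaccounted for. The short Python snippet below only verifies that sum; it does not explain what VarScan means by "Variant", which this thread does not state.

```python
# Counts copied from the VarScan somatic summary in this post.
sufficient_coverage = 122884470
calls = {
    "Reference": 121991210,
    "mixed SNP-indel (filtered)": 12445,
    "Germline": 176060,
    "LOH": 8887,
    "Somatic": 685647,
    "Unknown": 10221,
    "Variant": 0,
}

total = sum(calls.values())
print(total, total == sufficient_coverage)  # prints: 122884470 True
```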

**wuhoucdc** · 08-24-2010, 07:41 PM

Originally posted by Michael.James.Clark:

Hi all,
I'm looking for suggestions of variant annotation tools for large data sets.
For example, I've called variants using Samtools pileup and now I want to go from a huge list of variants to a list of annotations and a simple method for filtering them.
Any thoughts on things I might try?

Hi,

You could try SVA from Duke (http://people.genome.duke.edu/~dg48/sva/index.php).

I think this big guy can satisfy your request if you have a big computer.

Wu

**nilshomer** · 08-24-2010, 08:33 PM

My only concern is that I have heard SVA hard-codes dbSNP 127 or something (can anyone confirm? N=1). Even so, it is a great piece of software!

**krawitz** · 07-12-2012, 10:26 PM

You might want to try www.gene-talk.de

**jfb** · 12-28-2012, 11:35 AM

Is there an update to this post recommending tools for variant annotation and analysis? I'm trying to use R's VariantAnnotation package but the learning curve is frustrating me and I'm not sure it's worth the effort...

**brofallon** · 12-31-2012, 02:12 PM

I believe the two most commonly used tools are ANNOVAR and SnpEff. ANNOVAR handles many types of annotations and is built for filtering. SnpEff produces some nice HTML files for your web-viewing enjoyment in addition to text files.

**krawitz** · 01-02-2013, 04:46 AM

I agree that ANNOVAR and SnpEff seem to be the most widely used tools for variant annotation. For variant analysis there are, for example, Ingenuity (commercial), Annotate-it and www.gene-talk.de. We use GeneTalk at the Institute for Medical Genetics at the Charité in Berlin and collaborate with its R&D. The platform seems to be fairly widely used now. We currently have about one hundred single exomes analyzed per day by about 500 unique users. The annotation is based on ANNOVAR. The filtering and interpretation tools are co-developed by us, but it is generally a project open to any kind of collaboration. We just added a new filter for compound heterozygous variants, so if this is something you are interested in, just try it out.

**trackavinash** · 01-02-2013, 08:49 PM

I agree that ANNOVAR and SnpEff are widely used. I had a chance to use SeattleSeq Annotation recently when I had to calculate some Grantham scores: http://snp.gs.washington.edu/SeattleSeqAnnotation137/ You could check it out if you'd prefer a web interface for submitting jobs. Galaxy does a bit of annotation as well (I've used Galaxy to obtain PhyloP scores).

Variant Annotation Tools
