**raonyguimaraes** · 10-11-2011, 05:30 AM

All right,

Here is my pipeline, some metrics and logs ...

Even after using a BED file from "SeqCap EZ Human Exome Library v2.0" I'm still getting 350191 variants ... Hope someone can help me get to the promised 20k variants.

I'm planning to switch to GATK 1.2 in the next few weeks ...

**Heisman** · 10-11-2011, 05:45 AM

Originally posted by raonyguimaraes:

All right,

Here is my pipeline, some metrics and logs ...

Even after using a BED file from "SeqCap EZ Human Exome Library v2.0" I'm still getting 350191 variants ... Hope someone can help me get to the promised 20k variants.

I'm planning to switch to GATK 1.2 in the next few weeks ...

Are you sure you are filtering out mutations that are called outside the exome? If you call mutations across the whole genome you will get many more than 20k with a capture protocol, because of non-specific hybridization.

**raonyguimaraes** · 10-11-2011, 05:52 AM

For the UnifiedGenotyper step I'm using the following parameters:

# # #Standard Raw VCF
java -Xmx15g -jar $GATK_DIR/GenomeAnalysisTK.jar -T UnifiedGenotyper \
-l INFO \
-I $OUT_DIR/exome.real.dedup.recal.bam \
-R $REFERENCE \
-B:intervals,BED $EXON_CAPTURE_FILE \
-B:dbsnp,VCF $DBSNP \
-glm BOTH \
-stand_call_conf 50.0 \
-stand_emit_conf 20.0 \
-dcov 300 \
-A AlleleBalance \
-A DepthOfCoverage \
-A FisherStrand \
-o $OUT_DIR/exome.raw.vcf \
-log $LOG_DIR/UnifiedGenotyper.log \
-nt 4

The company where this was done guarantees 30X coverage ... (http://www.otogenetics.com/human_exome_page.htm)

I know this number should drop after the Variant Recalibrator ... I just want to know how many variants people are getting at this step.

By filtering out mutations, do you mean using the BED file to call only in the target regions? If so, yes!

**Heisman** · 10-11-2011, 05:54 AM

Yeah, I haven't used GATK before so I can't really say, but that seemed like the most logical thing to me (we see hundreds of thousands of mutations before filtering down to just the CCDS regions).

**raonyguimaraes** · 10-11-2011, 05:55 AM

You are right, without using this bed file I was getting something like 2 million variants ...
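For anyone who wants to double-check how many calls in a raw VCF really fall inside the capture targets, here is a minimal sketch in plain awk. It assumes a tab-separated BED file (0-based, half-open intervals) and an uncompressed VCF that use the same chromosome names. The function name `count_in_targets` and the file names are placeholders, not part of GATK.

```shell
# Count VCF records whose POS lies inside any BED interval.
# BED is 0-based half-open and VCF POS is 1-based, so a position p
# is inside an interval when start < p <= end.
count_in_targets() {
    bed="$1"; vcf="$2"
    awk -F'\t' '
        NR == FNR {                          # first file: BED intervals
            if ($0 ~ /^(#|track|browser)/) next
            n[$1]++
            lo[$1, n[$1]] = $2 + 0
            hi[$1, n[$1]] = $3 + 0
            next
        }
        /^#/ { next }                        # skip VCF header lines
        {
            pos = $2 + 0
            # Linear scan over this chromosome: simple, but slow on big files
            for (i = 1; i <= n[$1]; i++)
                if (pos > lo[$1, i] && pos <= hi[$1, i]) { hits++; break }
        }
        END { print hits + 0 }
    ' "$bed" "$vcf"
}

# Example (placeholder file names):
# count_in_targets targets.bed exome.raw.vcf
```

The scan is linear per variant, so this is only meant as a sanity check on a subset or a single chromosome. For full exomes an interval tool such as BEDTools is a better fit.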

**ulz_peter** · 10-12-2011, 11:39 PM

I just adapted the manual to fit it in the Wiki How to section:

Any changes, recommendations, complaints, etc. welcome:

http://seqanswers.com/wiki/How-to/exome_analysis

**ulz_peter** · 10-12-2011, 11:51 PM

Originally posted by raonyguimaraes:

For the UnifiedGenotyper step I'm using the following parameters:

# # #Standard Raw VCF
java -Xmx15g -jar $GATK_DIR/GenomeAnalysisTK.jar -T UnifiedGenotyper \
-l INFO \
-I $OUT_DIR/exome.real.dedup.recal.bam \
-R $REFERENCE \
-B:intervals,BED $EXON_CAPTURE_FILE \
-B:dbsnp,VCF $DBSNP \
-glm BOTH \
-stand_call_conf 50.0 \
-stand_emit_conf 20.0 \
-dcov 300 \
-A AlleleBalance \
-A DepthOfCoverage \
-A FisherStrand \
-o $OUT_DIR/exome.raw.vcf \
-log $LOG_DIR/UnifiedGenotyper.log \
-nt 4

The company where this was done guarantees 30X coverage ... (http://www.otogenetics.com/human_exome_page.htm)

I know this number should drop after the Variant Recalibrator ... I just want to know how many variants people are getting at this step.

By filtering out mutations, do you mean using the BED file to call only in the target regions? If so, yes!

I use stand_emit_conf 10.0 and we usually get ~60k SNPs
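The numbers in this thread are hard to compare directly: the command above emits at 20 and calls at 50 with `-glm BOTH` (SNPs and indels), while the ~60k figure is SNPs only at an emit threshold of 10. A rough sketch for tallying SNPs and indels at a chosen QUAL cutoff is below. It assumes an uncompressed, tab-separated VCF; `count_by_qual` is a hypothetical helper name, and the "SNP" test (REF and every ALT allele one base long) is a simplification.

```shell
# Print how many SNP and indel records have QUAL >= min_qual.
count_by_qual() {
    vcf="$1"; min_qual="$2"
    awk -F'\t' -v min="$min_qual" '
        /^#/ { next }                              # skip header lines
        $6 == "." || $6 + 0 < min + 0 { next }     # missing or low QUAL
        {
            # Treat as SNP only if REF and all ALT alleles are single bases
            is_snp = (length($4) == 1)
            k = split($5, alt, ",")
            for (i = 1; i <= k; i++)
                if (length(alt[i]) != 1) is_snp = 0
            if (is_snp) snps++; else indels++
        }
        END { printf "SNPs=%d indels=%d\n", snps, indels }
    ' "$vcf"
}

# Example (placeholder file name):
# count_by_qual exome.raw.vcf 50
# count_by_qual exome.raw.vcf 10
```

Running it with both cutoffs on the same file shows how much of the difference comes from the threshold alone. As far as I know, UnifiedGenotyper also flags calls between the emit and call thresholds as LowQual in the FILTER column, so counting on that column is another option.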

**mirabilia** · 10-13-2011, 12:49 AM

hi folks,
thanks for sharing your expertise...it's a great help for a quite newbie like me.
I'm wondering whether this analysis pipeline is also suitable for prokaryotic data or needs some adjustments. If it does, could you suggest some references?

thx!

**ulz_peter** · 10-13-2011, 12:59 AM

I haven't tried it with prokaryotic samples, but it should actually work (bwa, picard and samtools definitely work with prokaryotic data, not too sure about the GATK though...)

You need to adjust it though: for example, index your own reference sequences. The analysis also depends on what sequence variation you expect (this pipeline works for diploid genomes only, though you might use some parts of it for other purposes).
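One place where the diploid assumption becomes visible is the genotype column of the resulting VCF: as far as I know, the GATK 1.x UnifiedGenotyper always reports two alleles per call. The sketch below tallies heterozygous and homozygous calls in a single-sample VCF; for a haploid organism a large share of heterozygous calls would be a hint that the diploid model is producing artifacts. `tally_genotypes` is a hypothetical helper name, and the code assumes an uncompressed VCF with GT as the first FORMAT field and the sample in column 10.

```shell
# Tally genotype classes for the first sample of a VCF.
tally_genotypes() {
    awk -F'\t' '
        /^#/ { next }                    # skip header lines
        $9 !~ /^GT/ { next }             # need GT as first FORMAT field
        {
            split($10, field, ":")
            k = split(field[1], allele, "[/|]")
            if (k != 2 || allele[1] == "." || allele[2] == ".") { other++; next }
            if (allele[1] != allele[2]) het++
            else if (allele[1] == "0") homref++
            else homalt++
        }
        END {
            printf "het=%d hom_alt=%d hom_ref=%d other=%d\n", het, homalt, homref, other
        }
    ' "$1"
}

# Example (placeholder file name):
# tally_genotypes sample.raw.vcf
```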

Hope that helps.

**hanifk** · 10-13-2011, 01:42 AM

thanks very much

**mirabilia** · 10-13-2011, 03:53 AM

Originally posted by ulz_peter:

I haven't tried it with prokaryotic samples, but it should actually work (bwa, picard and samtools definitely work with prokaryotic data, not too sure about the GATK though...)

You need to adjust it though: for example, index your own reference sequences. The analysis also depends on what sequence variation you expect (this pipeline works for diploid genomes only, though you might use some parts of it for other purposes).

Hope that helps.

Thanks a lot, ulz_peter!
Could you please clarify which steps of your pipeline are specific to diploid genomes, so that I can customize it for my purposes?

**Michael.James.Clark** · 10-13-2011, 10:21 AM

Very cool! I was planning to put together a little Google Site going through how I analyze exome-seq that's very similar to this. Now I'm not sure I should bother!

**Orr Shomroni** · 10-13-2011, 01:29 PM

Thank you guys

Thank you ulz_peter and raonyguimaraes. I'm going to start doing NGS quite soon, and as a newbie I find this pipeline quite helpful. It also seems very similar to what my instructor recommended (BWA, SamTools/Varscan, and Annovar). She also said something about using SIFT and PolyPhen to predict the effect of a mutation on gene function (a continuous score that is benign below a certain threshold, and destructive above it). Is anyone familiar with those techniques?

**Michael.James.Clark** · 10-13-2011, 01:30 PM

Originally posted by Orr Shomroni:

Thank you ulz_peter and raonyguimaraes. I'm going to start doing NGS quite soon, and as a newbie I find this pipeline also seems very similar to what my instructor recommended (BWA, SamTools/Varscan, and Annovar). She also said something about using SIFT and PolyPhen to predict the effect of a mutation on gene function (a continuous score that is benign below a certain threshold, and destructive above it). Does anyone know what I'm talking about?

Annovar can annotate with SIFT and Polyphen now.
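One thing to watch when reading those annotations: the two scores run in opposite directions. SIFT scores at or below 0.05 are usually read as damaging (low is bad), while for PolyPhen-2 higher scores mean more likely damaging. A small sketch of that logic follows; `classify_scores` is a hypothetical helper, and the PolyPhen cutoff of 0.85 is only an illustrative default, since the appropriate cutoff depends on the PolyPhen-2 model and version, so check the documentation of the annotation you actually use.

```shell
# Classify one variant from its SIFT and PolyPhen-2 scores.
# Assumed cutoffs: SIFT <= 0.05 is damaging; PolyPhen-2 >= pp_cut is
# damaging (pp_cut defaults to an illustrative 0.85).
classify_scores() {
    sift="$1"; polyphen="$2"; pp_cut="${3:-0.85}"
    awk -v s="$sift" -v p="$polyphen" -v c="$pp_cut" 'BEGIN {
        sift_call = (s + 0 <= 0.05)   ? "damaging" : "tolerated"
        pp_call   = (p + 0 >= c + 0)  ? "damaging" : "benign"
        print "SIFT=" sift_call " PolyPhen=" pp_call
    }'
}

# Example:
# classify_scores 0.01 0.95
# classify_scores 0.40 0.10
```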

**raonyguimaraes** · 10-13-2011, 04:43 PM

I think we should all give a try to VAAST as well

http://www.yandell-lab.org/software/vaast.html

A probabilistic disease-gene finder for personal genomes.
Yandell M, Huff C, Hu H, Singleton M, Moore B, Xing J, Jorde LB, Reese MG.
Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah and School of Medicine, Salt Lake City, UT 84112, USA.

VAAST (the Variant Annotation, Analysis & Search Tool) is a probabilistic search tool for identifying damaged genes and their disease-causing variants in personal genome sequences. VAAST builds on existing amino acid substitution (AAS) and aggregative approaches to variant prioritization, combining elements of both into a single unified likelihood framework that allows users to identify damaged genes and deleterious variants with greater accuracy, and in an easy-to-use fashion. VAAST can score both coding and noncoding variants, evaluating the cumulative impact of both types of variants simultaneously. VAAST can identify rare variants causing rare genetic diseases, and it can also use both rare and common variants to identify genes responsible for common diseases. VAAST thus has a much greater scope of use than any existing methodology. Here we demonstrate its ability to identify damaged genes using small cohorts (n = 3) of unrelated individuals, wherein no two share the same deleterious variants, and for common, multigenic diseases using as few as 150 cases.
