**raonyguimaraes** · 10-11-2011, 05:30 AM

All right,

Here is my pipeline, some metrics and logs ...

Even after using a BED file from "SeqCap EZ Human Exome Library v2.0", I'm still getting 350191 variants ... I hope someone can help me get to the promised 20k variants.

I'm planning to switch to GATK 1.2 in the next few weeks ...

**Heisman** · 10-11-2011, 05:45 AM

Are you sure you are filtering out mutations that are not called in the exome? If you call mutations in the whole genome you will get many more than 20k if you are doing a capture protocol due to non-specific hybridization.

**raonyguimaraes** · 10-11-2011, 05:52 AM

For the UnifiedGenotyper I'm using the following parameters:

# Standard raw VCF
java -Xmx15g -jar $GATK_DIR/GenomeAnalysisTK.jar -T UnifiedGenotyper \
-l INFO \
-I $OUT_DIR/exome.real.dedup.recal.bam \
-R $REFERENCE \
-B:intervals,BED $EXON_CAPTURE_FILE \
-B:dbsnp,VCF $DBSNP \
-glm BOTH \
-stand_call_conf 50.0 \
-stand_emit_conf 20.0 \
-dcov 300 \
-A AlleleBalance \
-A DepthOfCoverage \
-A FisherStrand \
-o $OUT_DIR/exome.raw.vcf \
-log $LOG_DIR/UnifiedGenotyper.log \
-nt 4

The company where this was done guarantees 30X coverage ... (http://www.otogenetics.com/human_exome_page.htm)

I know this number should drop after the VariantRecalibrator ... I just want to know how many variants people are getting at this step.

By filtering out mutations, do you mean using the BED file to call only at the target regions? If so, yes!

**Heisman** · 10-11-2011, 05:54 AM

Yeah, I haven't used GATK before, so I can't really say, but that seemed like the most logical thing to me (we see hundreds of thousands of mutations prior to filtering down to just the CCDS regions).

**raonyguimaraes** · 10-11-2011, 05:55 AM

You are right; without this BED file I was getting something like 2 million variants ...
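
One way to sanity-check how many raw calls actually fall inside the capture targets is to count VCF records against the BED intervals after calling. The following is only a minimal sketch, not part of the pipeline posted above: the function name `count_on_target` and the file names in the usage line are made up for illustration, and it assumes tab-separated input with matching chromosome names in both files. It also accounts for BED intervals being 0-based and half-open while VCF positions are 1-based.

```shell
# count_on_target BED VCF
# Hypothetical helper: prints the number of VCF records whose POS lies
# inside at least one BED interval. Assumes tab-separated files and
# identical chromosome naming (e.g. "chr1" in both).
count_on_target() {
    awk -F'\t' '
        # First file (BED): store every interval per chromosome.
        NR == FNR {
            if ($0 ~ /^(track|browser|#)/) next
            n[$1]++
            lo[$1, n[$1]] = $2 + 0   # 0-based start, inclusive
            hi[$1, n[$1]] = $3 + 0   # 0-based end, exclusive
            next
        }
        # Second file (VCF): skip header lines.
        /^#/ { next }
        {
            pos = $2 + 0
            # A BED interval [lo, hi) covers 1-based positions lo+1 .. hi.
            for (i = 1; i <= n[$1]; i++) {
                if (pos > lo[$1, i] && pos <= hi[$1, i]) { hits++; break }
            }
        }
        END { print hits + 0 }
    ' "$1" "$2"
}
```

Usage would look like `count_on_target targets.bed exome.raw.vcf`. The linear scan over intervals is slow on a full exome, so for real data a dedicated interval tool such as `bedtools intersect` is probably the better choice; the sketch is only meant to show the coordinate logic.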

**ulz_peter** · 10-12-2011, 11:39 PM

I just adapted the manual to fit the Wiki How-to section.

Any changes, recommendations, complaints, etc. are welcome:

http://seqanswers.com/wiki/How-to/exome_analysis

**ulz_peter** · 10-12-2011, 11:51 PM

Originally posted by raonyguimaraes:

"I just want to know how many variants people are getting at this step."

I use -stand_emit_conf 10.0, and we usually get ~60k SNPs.
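
Since the command quoted earlier in the thread uses `-glm BOTH`, the raw VCF holds SNPs and indels together, so totals are only comparable to an SNP count once the two are split. A rough sketch of such a split is below; `count_variant_types` is a hypothetical helper name, and the rule it applies (a record is an SNP when REF and every ALT allele are single bases) is a simplification rather than anything GATK itself reports.

```shell
# count_variant_types VCF
# Hypothetical helper: prints "snps=N other=M" for one VCF file.
# A record counts as an SNP when REF and every ALT allele are single bases;
# everything else (indels, symbolic alleles, ...) is counted as "other".
count_variant_types() {
    awk -F'\t' '
        /^#/ { next }                      # skip header lines
        {
            nalt = split($5, alt, ",")     # ALT may list several alleles
            snp = ($4 ~ /^[ACGTN]$/)
            for (i = 1; i <= nalt; i++)
                if (alt[i] !~ /^[ACGTN]$/) snp = 0
            if (snp) snps++; else other++
        }
        END { printf "snps=%d other=%d\n", snps, other }
    ' "$1"
}
```

Usage would look like `count_variant_types exome.raw.vcf`.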

**mirabilia** · 10-13-2011, 12:49 AM

Hi folks,
Thanks for sharing your expertise... it's a great help for a newbie like me.
I'm wondering whether this analysis pipeline is also suitable for prokaryotic data or needs some adjustments. If it does, could you suggest some references?

Thanks!

**ulz_peter** · 10-13-2011, 12:59 AM

I haven't tried it with prokaryotic samples, but it should actually work (bwa, picard and samtools definitely work with prokaryotic data; I'm not too sure about the GATK, though...)

You need to adjust it, though: for example, index your own reference sequences. The analysis also depends on what sequence variation you'd expect (this pipeline works for diploid genomes only, though you might use some parts of it for different purposes).
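
As a rough illustration of "index your own reference sequences", the sketch below prints the indexing commands for a custom FASTA, and runs them only when `RUN=1` is set. The helper names `run` and `index_reference` and the file name `ecoli.fa` are made up for this example, and it assumes a samtools version that provides the `dict` subcommand; older setups would build the `.dict` file with Picard instead.

```shell
# run CMD...: print the command, or execute it when RUN=1 (hypothetical helper).
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# index_reference FASTA: build the index files the aligner and callers expect.
index_reference() {
    fasta=$1
    dict="${fasta%.*}.dict"                 # e.g. ecoli.fa -> ecoli.dict
    run bwa index "$fasta"                  # BWA alignment index
    run samtools faidx "$fasta"             # writes the .fai index next to the FASTA
    run samtools dict -o "$dict" "$fasta"   # sequence dictionary (assumes samtools dict is available)
}
```

Calling `index_reference ecoli.fa` prints the three commands as a dry run; `RUN=1 index_reference ecoli.fa` would execute them, provided bwa and samtools are on the PATH.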

Hope that helps.

**hanifk** · 10-13-2011, 01:42 AM

thanks very much

**mirabilia** · 10-13-2011, 03:53 AM

Originally posted by ulz_peter:

"... this pipeline works for diploid genomes only ..."

Thanks a lot, ulz_peter!
Could you please clarify which steps of your pipeline are specific to diploid genomes, so that I can customize it for my purposes?

**Michael.James.Clark** · 10-13-2011, 10:21 AM

Very cool! I was planning to put together a little Google Site going through how I analyze exome-seq that's very similar to this. Now I'm not sure I should bother!

**Orr Shomroni** · 10-13-2011, 01:29 PM

Thank you guys

Thank you ulz_peter and raonyguimaraes. I'm starting to do NGS quite soon, and as a newbie I find this pipeline quite helpful. It also seems very similar to what my instructor recommended (BWA, SAMtools/VarScan, and ANNOVAR). She also said something about using SIFT and PolyPhen to predict the effect of a mutation on gene function (a continuous score that is benign below a certain threshold and damaging above it). Is anyone familiar with those techniques?

**Michael.James.Clark** · 10-13-2011, 01:30 PM

ANNOVAR can annotate with SIFT and PolyPhen now.
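
For anyone turning such scores into labels, note that the two tools run in opposite directions: low SIFT scores indicate a damaging change, while high PolyPhen-2 scores do. A minimal sketch is below. The helper name `classify_scores` and the two-column input layout are made up for illustration, and the cutoffs (SIFT 0.05; PolyPhen-2 HumDiv 0.453 and 0.957) are the commonly quoted defaults, taken here as an assumption that should be checked against the documentation of the versions in use.

```shell
# classify_scores: reads "SIFT<TAB>PolyPhen2" score pairs on stdin and
# prints one label pair per line. Hypothetical helper; the cutoffs are
# commonly quoted defaults and are an assumption, not a fixed standard.
classify_scores() {
    awk -F'\t' '
        {
            # SIFT: LOW scores are the damaging ones.
            sift = ($1 + 0 <= 0.05) ? "deleterious" : "tolerated"
            # PolyPhen-2 (HumDiv): HIGH scores are the damaging ones.
            if ($2 + 0 >= 0.957)      pp2 = "probably_damaging"
            else if ($2 + 0 >= 0.453) pp2 = "possibly_damaging"
            else                      pp2 = "benign"
            print sift "\t" pp2
        }'
}
```

Usage would look like `printf '0.01\t0.99\n' | classify_scores`.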

**raonyguimaraes** · 10-13-2011, 04:43 PM

I think we should all give VAAST a try as well.

Yandell Lab - Variant Annotation, Analysis and Search Tool

http://www.yandell-lab.org/software/vaast.html

A probabilistic disease-gene finder for personal genomes.
Yandell M, Huff C, Hu H, Singleton M, Moore B, Xing J, Jorde LB, Reese MG.
Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah and School of Medicine, Salt Lake City, UT 84112, USA.

VAAST (the Variant Annotation, Analysis & Search Tool) is a probabilistic search tool for identifying damaged genes and their disease-causing variants in personal genome sequences. VAAST builds on existing amino acid substitution (AAS) and aggregative approaches to variant prioritization, combining elements of both into a single unified likelihood framework that allows users to identify damaged genes and deleterious variants with greater accuracy, and in an easy-to-use fashion. VAAST can score both coding and noncoding variants, evaluating the cumulative impact of both types of variants simultaneously. VAAST can identify rare variants causing rare genetic diseases, and it can also use both rare and common variants to identify genes responsible for common diseases. VAAST thus has a much greater scope of use than any existing methodology. Here we demonstrate its ability to identify damaged genes using small cohorts (n = 3) of unrelated individuals, wherein no two share the same deleterious variants, and for common, multigenic diseases using as few as 150 cases.
