  • raonyguimaraes
    Member
    • Jun 2010
    • 38

    #16
    All right,

    Here is my pipeline, some metrics and logs ...

    Even after using a BED file from the "SeqCap EZ Human Exome Library v2.0", I'm still getting 350,191 variants ... I hope someone can help me get down to the promised 20k variants.

    I'm planning to switch to GATK 1.2 in the next few weeks ...

    • Heisman
      Senior Member
      • Dec 2010
      • 534

      #17
      Are you sure you are filtering out mutations that were called outside the exome? With a capture protocol, non-specific hybridization puts reads all over the genome, so if you call mutations genome-wide you will get many more than 20k.


      • raonyguimaraes
        Member
        • Jun 2010
        • 38

        #18
        For the UnifiedGenotyper I'm using the following parameters:

        # Standard raw VCF
        java -Xmx15g -jar $GATK_DIR/GenomeAnalysisTK.jar -T UnifiedGenotyper \
        -l INFO \
        -I $OUT_DIR/exome.real.dedup.recal.bam \
        -R $REFERENCE \
        -B:intervals,BED $EXON_CAPTURE_FILE \
        -B:dbsnp,VCF $DBSNP \
        -glm BOTH \
        -stand_call_conf 50.0 \
        -stand_emit_conf 20.0 \
        -dcov 300 \
        -A AlleleBalance \
        -A DepthOfCoverage \
        -A FisherStrand \
        -o $OUT_DIR/exome.raw.vcf \
        -log $LOG_DIR/UnifiedGenotyper.log \
        -nt 4

        The company where this was done guarantees 30X coverage ... (http://www.otogenetics.com/human_exome_page.htm)

        I know this number should drop after the Variant Recalibrator ... I just want to know how many variants people are getting at this step.

        By filtering out mutations, do you mean using the BED file to call only in the target regions? If so, yes!


        • Heisman
          Senior Member
          • Dec 2010
          • 534

          #19
          Yeah, I haven't used GATK before, so I can't really say, but that seemed like the most logical explanation to me. We see hundreds of thousands of mutations before filtering down to just the CCDS regions.


          • raonyguimaraes
            Member
            • Jun 2010
            • 38

            #20
            You are right: without this BED file I was getting something like 2 million variants ...
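
For anyone who wants to check the effect of the target file after calling rather than during it, here is a minimal Python sketch of restricting a VCF to BED intervals. The function names (`load_bed`, `in_targets`, `filter_vcf`) are made up for illustration and are not part of GATK or any other tool mentioned here. It assumes a plain-text VCF, a BED file with 0-based half-open coordinates, and chromosome names that match between the two files.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: (starts, ends)} with overlapping intervals merged."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))

    targets = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlapping or touching interval: extend the previous one.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        targets[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return targets


def in_targets(targets, chrom, pos):
    """True if the 1-based VCF position lies inside a target interval."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    zero_based = pos - 1  # BED is 0-based, VCF is 1-based
    i = bisect.bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < ends[i]


def filter_vcf(vcf_lines, targets):
    """Yield header lines unchanged and only those records that fall on target."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if in_targets(targets, chrom, int(pos)):
            yield line
```

Counting the non-header lines before and after `filter_vcf` gives a quick sanity check on how many calls sit off target. On real data a dedicated interval tool is probably the better choice; this is only meant to make the coordinate logic explicit.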


            • ulz_peter
              Senior Member
              • Feb 2010
              • 219

              #21
              I just adapted the manual to fit the Wiki How-To section.

              Any changes, recommendations, complaints, etc. are welcome.


              • ulz_peter
                Senior Member
                • Feb 2010
                • 219

                #22
                I use -stand_emit_conf 10.0, and we usually get ~60k SNPs.
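
Since raw counts depend heavily on the confidence cutoff, it can help to compare numbers at the same QUAL threshold. Below is a small Python sketch that tallies how many records in a VCF reach each of several QUAL cutoffs. The function name `count_by_qual` is hypothetical, and the sketch assumes an uncompressed VCF where column 6 holds the phred-scaled QUAL. As far as I understand the UnifiedGenotyper options, sites between the emit and call thresholds are still written out but flagged LowQual, so the emit threshold sets the floor of what appears in the raw file.

```python
def count_by_qual(vcf_lines, thresholds):
    """Count variant records whose QUAL is at least each threshold."""
    counts = {t: 0 for t in thresholds}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta and header lines
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]
        if qual == ".":
            continue  # QUAL missing for this record
        q = float(qual)
        for t in thresholds:
            if q >= t:
                counts[t] += 1
    return counts
```

For example, `count_by_qual(open("exome.raw.vcf"), (10, 20, 50))` would report how many records clear each cutoff, which makes counts from runs with different `-stand_emit_conf` settings easier to compare.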


                • mirabilia
                  Junior Member
                  • Nov 2010
                  • 3

                  #23
                  Hi folks,
                  Thanks for sharing your expertise... it's a great help for a newbie like me.
                  I'm wondering whether this analysis pipeline is also suitable for prokaryotic data or whether it needs some adjustments. If it does, could you suggest some references?

                  Thanks!


                  • ulz_peter
                    Senior Member
                    • Feb 2010
                    • 219

                    #24
                    I haven't tried it with prokaryotic samples, but it should work (bwa, picard and samtools definitely work with prokaryotic data; I'm not too sure about the GATK, though...)

                    You will need to adjust it, though. For example, you have to index your own reference sequences, and the analysis depends on what kind of sequence variation you expect. This pipeline works for diploid genomes only, though you might use some parts of it for different purposes.
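
One way to see where the diploid assumption bites is to look at the genotypes the caller emits. The Python sketch below tallies genotype classes from the GT field of a VCF. The function name `genotype_counts` is made up for illustration, and it assumes the calls came from a diploid genotype model (which is my understanding of the genotyper in this pipeline, not something confirmed above). In a haploid sample, a large share of heterozygous calls would point to mapping artifacts or a mixed population rather than real heterozygosity.

```python
def genotype_counts(vcf_lines, sample_index=0):
    """Tally hom-ref, het, hom-alt and missing genotypes for one sample."""
    counts = {"hom_ref": 0, "het": 0, "hom_alt": 0, "missing": 0}
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no genotype columns on this record
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        sample = fields[9 + sample_index].split(":")
        gt = sample[fmt.index("GT")]
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            counts["missing"] += 1
        elif len(set(alleles)) > 1:
            counts["het"] += 1
        elif alleles[0] == "0":
            counts["hom_ref"] += 1
        else:
            counts["hom_alt"] += 1
    return counts
```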

                    Hope that helps.


                    • hanifk
                      Member
                      • Oct 2010
                      • 18

                      #25
                      Thanks very much!


                      • mirabilia
                        Junior Member
                        • Nov 2010
                        • 3

                        #26
                        Thanks a lot, ulz_peter!
                        Could you please clarify which steps of your pipeline are specific to diploid genomes, so that I can customize it for my purposes?


                        • Michael.James.Clark
                          Senior Member
                          • Apr 2009
                          • 207

                          #27
                          Very cool! I was planning to put together a little Google Site going through how I analyze exome-seq that's very similar to this. Now I'm not sure I should bother!
                          Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
                          Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
                          Projects: U87MG whole genome sequence [Website] [Paper]


                          • Orr Shomroni
                            Member
                            • Oct 2011
                            • 26

                            #28
                            Thank you guys

                            Thank you, ulz_peter and raonyguimaraes. I'm going to start doing NGS quite soon, and as a newbie I find this pipeline quite helpful. It also seems very similar to what my instructor recommended (BWA, SAMtools/VarScan, and ANNOVAR). She also said something about using SIFT and PolyPhen to predict the effect of a mutation on gene function (a continuous score that is benign below a certain threshold and destructive above it). Is anyone familiar with those techniques?
                            "Though it may seem that all's been said and done, originality still lives on" - some unoriginal guy who had nothing better to write as his signature


                            • Michael.James.Clark
                              Senior Member
                              • Apr 2009
                              • 207

                              #29
                              ANNOVAR can annotate with SIFT and PolyPhen now.
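
On the "benign below a threshold, destructive above it" description: the two scores run in opposite directions. Low SIFT scores are the ones predicted to be damaging, whereas high PolyPhen scores are. The Python sketch below makes that explicit. The function name `classify` and both default cutoffs are assumptions for illustration (0.05 for SIFT is the commonly quoted value; PolyPhen cutoffs differ between versions and models), so check the documentation of whatever annotation source you use.

```python
def classify(sift=None, polyphen=None, sift_cutoff=0.05, polyphen_cutoff=0.85):
    """Label a variant from its SIFT and/or PolyPhen score.

    The cutoffs are illustrative defaults, not authoritative values.
    """
    calls = {}
    if sift is not None:
        # SIFT: scores at or below the cutoff are predicted damaging.
        calls["sift"] = "damaging" if sift <= sift_cutoff else "tolerated"
    if polyphen is not None:
        # PolyPhen: scores at or above the cutoff are predicted damaging.
        calls["polyphen"] = (
            "damaging" if polyphen >= polyphen_cutoff else "benign_or_uncertain"
        )
    return calls
```

A variant with a SIFT score of 0.01 and a PolyPhen score of 0.99 would be labelled damaging by both, even though one number is near the bottom of its range and the other near the top.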


                              • raonyguimaraes
                                Member
                                • Jun 2010
                                • 38

                                #30
                                I think we should all give VAAST a try as well.



                                A probabilistic disease-gene finder for personal genomes.
                                Yandell M, Huff C, Hu H, Singleton M, Moore B, Xing J, Jorde LB, Reese MG.
                                Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah and School of Medicine, Salt Lake City, UT 84112, USA.

                                VAAST (the Variant Annotation, Analysis & Search Tool) is a probabilistic search tool for identifying damaged genes and their disease-causing variants in personal genome sequences. VAAST builds on existing amino acid substitution (AAS) and aggregative approaches to variant prioritization, combining elements of both into a single unified likelihood framework that allows users to identify damaged genes and deleterious variants with greater accuracy, and in an easy-to-use fashion. VAAST can score both coding and noncoding variants, evaluating the cumulative impact of both types of variants simultaneously. VAAST can identify rare variants causing rare genetic diseases, and it can also use both rare and common variants to identify genes responsible for common diseases. VAAST thus has a much greater scope of use than any existing methodology. Here we demonstrate its ability to identify damaged genes using small cohorts (n = 3) of unrelated individuals, wherein no two share the same deleterious variants, and for common, multigenic diseases using as few as 150 cases.
