
  • #16
    All right,

    Here is my pipeline, some metrics and logs ...

    Even after using a BED file from "SeqCap EZ Human Exome Library v2.0", I'm still getting 350191 variants ... I hope someone can help me get to the promised 20k variants.

    I'm planning to switch to GATK 1.2 in the next few weeks ...

    Comment


    • #17
      Originally posted by raonyguimaraes
      All right,

      Here is my pipeline, some metrics and logs ...

      Even after using a BED file from "SeqCap EZ Human Exome Library v2.0", I'm still getting 350191 variants ... I hope someone can help me get to the promised 20k variants.

      I'm planning to switch to GATK 1.2 in the next few weeks ...
      Are you sure you are filtering out mutations called outside the exome? If you call mutations across the whole genome with a capture protocol, you will get many more than 20k because of non-specific hybridization.
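
The on-target filtering described here can be sketched in a few lines of Python. This is only an illustration, not part of anyone's pipeline in this thread: the function names (`load_bed`, `on_target`, `filter_vcf`) are hypothetical, and it assumes a plain-text VCF and a BED file whose chromosome names match the VCF. BED intervals are 0-based and half-open, while VCF positions are 1-based, so the position is shifted before the lookup.

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: (starts, ends)} with overlapping intervals merged."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def on_target(targets, chrom, pos):
    """Return True if a 1-based VCF position lies inside a target interval."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    pos0 = pos - 1  # convert to the 0-based coordinate used by BED
    i = bisect_right(starts, pos0) - 1
    return i >= 0 and pos0 < ends[i]


def filter_vcf(vcf_lines, targets):
    """Yield header lines unchanged and only those records that are on target."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if on_target(targets, chrom, int(pos)):
            yield line
```

Counting the records that survive `filter_vcf` gives a quick check of how many calls actually fall inside the capture targets.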

      Comment


      • #18
        For the UnifiedGenotyper I'm using the following parameters:

        # Standard raw VCF
        java -Xmx15g -jar $GATK_DIR/GenomeAnalysisTK.jar -T UnifiedGenotyper \
        -l INFO \
        -I $OUT_DIR/exome.real.dedup.recal.bam \
        -R $REFERENCE \
        -B:intervals,BED $EXON_CAPTURE_FILE \
        -B:dbsnp,VCF $DBSNP \
        -glm BOTH \
        -stand_call_conf 50.0 \
        -stand_emit_conf 20.0 \
        -dcov 300 \
        -A AlleleBalance \
        -A DepthOfCoverage \
        -A FisherStrand \
        -o $OUT_DIR/exome.raw.vcf \
        -log $LOG_DIR/UnifiedGenotyper.log \
        -nt 4

        The company where this was done guarantees 30X coverage ... (http://www.otogenetics.com/human_exome_page.htm)

        I know this number should drop after the VariantRecalibrator ... I just want to know how many variants people are getting at this step.

        By filtering out mutations, do you mean using the BED file to call only in the target regions? If so, yes!

        Comment


        • #19
          Yeah, I haven't used GATK before, so I can't really say, but that seemed like the most logical explanation to me (we see hundreds of thousands of mutations before filtering to just the CCDS regions).

          Comment


          • #20
            You are right, without using this bed file I was getting something like 2 million variants ...

            Comment


            • #21
              I just adapted the manual to fit the Wiki How-To section.

              Any changes, recommendations, complaints, etc. are welcome.

              Comment


              • #22
                Originally posted by raonyguimaraes
                For the UnifiedGenotyper I'm using the following parameters:

                # Standard raw VCF
                java -Xmx15g -jar $GATK_DIR/GenomeAnalysisTK.jar -T UnifiedGenotyper \
                -l INFO \
                -I $OUT_DIR/exome.real.dedup.recal.bam \
                -R $REFERENCE \
                -B:intervals,BED $EXON_CAPTURE_FILE \
                -B:dbsnp,VCF $DBSNP \
                -glm BOTH \
                -stand_call_conf 50.0 \
                -stand_emit_conf 20.0 \
                -dcov 300 \
                -A AlleleBalance \
                -A DepthOfCoverage \
                -A FisherStrand \
                -o $OUT_DIR/exome.raw.vcf \
                -log $LOG_DIR/UnifiedGenotyper.log \
                -nt 4

                The company where this was done guarantees 30X coverage ... (http://www.otogenetics.com/human_exome_page.htm)

                I know this number should drop after the VariantRecalibrator ... I just want to know how many variants people are getting at this step.

                By filtering out mutations, do you mean using the BED file to call only in the target regions? If so, yes!
                I use -stand_emit_conf 10.0, and we usually get ~60k SNPs.
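
To compare numbers like these across different confidence settings, it can help to count how many records in a raw VCF clear each QUAL cutoff. The sketch below is a rough illustration under stated assumptions: `count_by_qual` is a hypothetical helper, it reads QUAL from the sixth tab-separated column of a plain-text VCF, and it skips records whose QUAL is missing (`.`). It does not reproduce how the UnifiedGenotyper applies its emit and call thresholds internally.

```python
def count_by_qual(vcf_lines, thresholds):
    """Return {threshold: number of records with QUAL >= threshold}."""
    counts = {t: 0 for t in thresholds}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # VCF columns: CHROM POS ID REF ALT QUAL ...
        if qual == ".":
            continue  # missing quality, cannot be compared
        q = float(qual)
        for t in thresholds:
            if q >= t:
                counts[t] += 1
    return counts


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t101\t.\tA\tG\t12.5\t.\t.",
        "chr1\t202\t.\tC\tT\t25\t.\t.",
        "chr1\t303\t.\tG\tA\t60\t.\t.",
    ]
    print(count_by_qual(example, (10.0, 20.0, 50.0)))
```

Running it on two people's raw VCFs with the same list of cutoffs makes variant counts comparable even when the callers were run with different `-stand_emit_conf` values.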

                Comment


                • #23
                  Hi folks,
                  Thanks for sharing your expertise; it's a great help for a relative newbie like me.
                  I'm wondering whether this analysis pipeline is also suitable for prokaryotic genomes or needs some adjustments. If it does, could you suggest some references?

                  Thanks!

                  Comment


                  • #24
                    I haven't tried it with prokaryotic samples, but it should work (bwa, Picard and samtools definitely work with prokaryotic data; I'm not too sure about the GATK, though...)

                    You will need to adjust it, though: for example, index your own reference sequences. The analysis also depends on what sequence variation you expect (this pipeline works for diploid genomes only, though you might use some parts of it for other purposes).

                    Hope that helps.

                    Comment


                    • #25
                      thanks very much

                      Comment


                      • #26
                        Originally posted by ulz_peter
                        I haven't tried it with prokaryotic samples, but it should work (bwa, Picard and samtools definitely work with prokaryotic data; I'm not too sure about the GATK, though...)

                        You will need to adjust it, though: for example, index your own reference sequences. The analysis also depends on what sequence variation you expect (this pipeline works for diploid genomes only, though you might use some parts of it for other purposes).

                        Hope that helps.
                        Thanks a lot, ulz_peter!
                        Could you please clarify which steps of your pipeline are specific to diploid genomes, so that I can customize it for my purposes?

                        Comment


                        • #27
                          Very cool! I was planning to put together a little Google Site going through how I analyze exome-seq that's very similar to this. Now I'm not sure I should bother!
                          Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
                          Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
                          Projects: U87MG whole genome sequence [Website] [Paper]

                          Comment


                          • #28
                            Thank you guys

                            Thank you, ulz_peter and raonyguimaraes. I'm starting NGS work quite soon, and as a newbie I find this pipeline very helpful. It also seems very similar to what my instructor recommended (BWA, SAMtools/VarScan, and ANNOVAR). She also mentioned using SIFT and PolyPhen to predict the effect of a mutation on gene function (a continuous score that is benign below a certain threshold and destructive above it). Is anyone familiar with those techniques?
                            "Though it may seem that all's been said and done, originality still lives on" - some unoriginal guy who had nothing better to write as his signature

                            Comment


                            • #29
                              Originally posted by Orr Shomroni
                              Thank you, ulz_peter and raonyguimaraes. I'm starting NGS work quite soon, and as a newbie I find that this pipeline also seems very similar to what my instructor recommended (BWA, SAMtools/VarScan, and ANNOVAR). She also mentioned using SIFT and PolyPhen to predict the effect of a mutation on gene function (a continuous score that is benign below a certain threshold and destructive above it). Does anyone know what I'm talking about?
                              ANNOVAR can annotate with SIFT and PolyPhen now.
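
As a rough illustration of the score thresholds mentioned above, here is a minimal Python sketch. The function names are hypothetical, and the default cutoffs are commonly quoted values (0.05 for SIFT; 0.453 and 0.957 for PolyPhen-2 HumDiv) that should be treated as assumptions and checked against the version of each tool and annotation database you use. Note that the two scores run in opposite directions: low SIFT scores are predicted deleterious, while high PolyPhen-2 scores are predicted damaging.

```python
def classify_sift(score, cutoff=0.05):
    """Label a SIFT score; scores at or below the cutoff are predicted deleterious."""
    return "deleterious" if score <= cutoff else "tolerated"


def classify_polyphen(score, possibly=0.453, probably=0.957):
    """Label a PolyPhen-2 score; higher scores are predicted more damaging.

    The default cutoffs are assumed values, not taken from this thread.
    """
    if score >= probably:
        return "probably damaging"
    if score >= possibly:
        return "possibly damaging"
    return "benign"


if __name__ == "__main__":
    # Hypothetical (sift, polyphen) score pairs for three variants.
    for sift, pp2 in [(0.01, 0.99), (0.20, 0.60), (0.80, 0.10)]:
        print(sift, classify_sift(sift), pp2, classify_polyphen(pp2))
```

Keeping the cutoffs as parameters makes it easy to swap in whatever thresholds your annotation source documents.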

                              Comment


                              • #30
                                I think we should all give VAAST a try as well:

                                A probabilistic disease-gene finder for personal genomes.
                                Yandell M, Huff C, Hu H, Singleton M, Moore B, Xing J, Jorde LB, Reese MG.
                                Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah and School of Medicine, Salt Lake City, UT 84112, USA.

                                VAAST (the Variant Annotation, Analysis & Search Tool) is a probabilistic search tool for identifying damaged genes and their disease-causing variants in personal genome sequences. VAAST builds on existing amino acid substitution (AAS) and aggregative approaches to variant prioritization, combining elements of both into a single unified likelihood framework that allows users to identify damaged genes and deleterious variants with greater accuracy, and in an easy-to-use fashion. VAAST can score both coding and noncoding variants, evaluating the cumulative impact of both types of variants simultaneously. VAAST can identify rare variants causing rare genetic diseases, and it can also use both rare and common variants to identify genes responsible for common diseases. VAAST thus has a much greater scope of use than any existing methodology. Here we demonstrate its ability to identify damaged genes using small cohorts (n = 3) of unrelated individuals, wherein no two share the same deleterious variants, and for common, multigenic diseases using as few as 150 cases.

                                Comment
