Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    We have a lot of comparisons of DINDEL, complete genomes, samtools, and other advanced indels caller, for deep whole genomes, 1000x sample low-pass whole genomes (1000g) and multi-sample exomes. Guillermo del Angel, who is writing the indel caller [plus a recalibrator and filtering program] as well as developing evaluation models, just submitted our 1000G calls (like 5 minutes ago). Over the next few weeks we'll be pushing a lot of our slide decks up to our public dropbox (have a look at the GSA wiki for the link) and there'll be a treasure-trove of analyses, evaluation material, etc. soon. All of these tools -- including our own -- are doing an ok job at indel calling, but there's still a long way to do before indels are as well handled as SNPs.

    Glad everyone is enjoying the GATK! If you want to see some crazy fun software engineering -- the slide archive has a presentation on automated distributed parallelism in the GATK, which is live in the codebase. I'd be very interested to hear if people start using this. Also, the engine as of yesterday is doing AWS S3 distributed logging so we should soon be aware of bugs as soon as they occur, anywhere in the world.

    Comment


    • #17
      Mark, thanks a lot for your updates here (and for your previous help on your GetSatisfaction page). Your project is of course quite valuable to the rest of us all. I'm looking forward to those slides.

      For everyone's information, I did just run the DINDEL function through Unified Genotyper on an exome dataset.
      It came out with about 10% the number of indels as it previously found SNPs. This is many fewer than the Dindel program using default settings. However, I generally expect to find 10% the number of indels as SNPs based on past research (see 1001 publications about the topic!), so I'm actually pleased with the number (because I found about 36k SNVs, so 3-4k indels seems reasonable).

      Since we've got you here, I wanted to ask about multi-threading in GATK. I've tried using the "nt" parameter unsuccessfully at many of the steps (set to 8 threads) even though the --help says it can use nt. Do any of the tools actually use that function?

      Also, at the end I'm trying to annotate overlap with Refseq.
      There's a page on the wiki about using the VariantAnnotator tool with RefSeq (here: http://www.broadinstitute.org/gsa/wiki/index.php/RefSeq). I tried this and failed.
      Then I found the pages about using Refseq with the GenomicAnnotator tool (http://www.broadinstitute.org/gsa/wi...nomicAnnotator). I guess all I'm asking is whether that VariantAnnotator page is basically inaccurate and if GenomicAnnotator is the default tool for annotating overlaps that way now?

      Thanks again.
      Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
      Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
      Projects: U87MG whole genome sequence [Website] [Paper]

      Comment


      • #18
        Good to hear that UG indel calling looks like it's working! In exomes I'd expect to find actually slightly less indels than 10% so some additional filtering is going to be necessary.

        Most LocusWalker-based tools in the GATK support the -nt option. This includes CountCovariates, UG, and VariantEval. Unfortunately, indel realigner and table recalibrator don't, as they are read walkers. We don't have a particularly good parallelization approach for read walkers yet, but it's something we'd like to do.

        Comment


        • #19
          I justed wanted to know of standard values of indel calling.
          I recently tried to analyze 2 solid exome sequencing datasets. While getting ~50K Snps I only got 11 indels (for both of them) using standard values:

          Code:
          java -jar GenomeAnalysisTK.jar \
           -glm DINDEL \
           -R resources/Homo_sapiens_assembly18.fasta \
           -T UnifiedGenotyper \
           -I sample1.bam \
           -D resources/dbsnp_129_hg18.rod \
           -o snps.raw.vcf \
           -stand_call_conf 50.0 \
           -stand_emit_conf 10.0 \
           -dcov 50
          Am I doing something wrong (I forgot to mention I am using GATK-1.0.5083)

          Comment


          • #20
            Originally posted by mdepristo View Post
            Good to hear that UG indel calling looks like it's working! In exomes I'd expect to find actually slightly less indels than 10% so some additional filtering is going to be necessary.

            Most LocusWalker-based tools in the GATK support the -nt option. This includes CountCovariates, UG, and VariantEval. Unfortunately, indel realigner and table recalibrator don't, as they are read walkers. We don't have a particularly good parallelization approach for read walkers yet, but it's something we'd like to do.
            That's good, actually. I ended up with >3200 indels, but <37,000 SNVs, so that does seem to match reasonably well.

            I'll try throwing the -nt option in for subsequent runs.

            Originally posted by ulz_peter View Post
            I justed wanted to know of standard values of indel calling.
            I recently tried to analyze 2 solid exome sequencing datasets. While getting ~50K Snps I only got 11 indels (for both of them) using standard values:

            Code:
            java -jar GenomeAnalysisTK.jar \
             -glm DINDEL \
             -R resources/Homo_sapiens_assembly18.fasta \
             -T UnifiedGenotyper \
             -I sample1.bam \
             -D resources/dbsnp_129_hg18.rod \
             -o snps.raw.vcf \
             -stand_call_conf 50.0 \
             -stand_emit_conf 10.0 \
             -dcov 50
            Am I doing something wrong (I forgot to mention I am using GATK-1.0.5083)
            For one thing, I think you can skip using -dcov 50.

            What did you use for alignment?
            Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
            Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
            Projects: U87MG whole genome sequence [Website] [Paper]

            Comment


            • #21
              Originally posted by Michael.James.Clark View Post



              For one thing, I think you can skip using -dcov 50.

              What did you use for alignment?
              Actually we got BAM files from the sequencing center, as this is solid data I guess it was the Bioscope pipeline.

              Comment


              • #22
                Originally posted by ulz_peter View Post
                Actually we got BAM files from the sequencing center, as this is solid data I guess it was the Bioscope pipeline.
                I don't know Bioscope, I'm afraid.

                I asked because seeing so few indels sounds like maybe it was aligned with an ungapped aligner.

                Did you do all the other standard GATK steps? IndelRealignment/Recalibration/etc?
                Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
                Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
                Projects: U87MG whole genome sequence [Website] [Paper]

                Comment


                • #23
                  DINDEL in gatk does not work

                  Hello all,

                  Using -glm DINDEL in gatk UnifiedGenotyper does not work for me, I get error message like
                  ERROR MESSAGE: SAM/BAM file SAMFileReader{/bamfolder/demo1.bam} is malformed: Adjacent I/D events in read 2:71:17399:4521

                  Actually UnifiedGenotyper works well on the bam files without -glm option.

                  Any helps?

                  Thanks

                  Comment


                  • #24
                    Originally posted by bair View Post
                    Hello all,

                    Using -glm DINDEL in gatk UnifiedGenotyper does not work for me, I get error message like
                    ERROR MESSAGE: SAM/BAM file SAMFileReader{/bamfolder/demo1.bam} is malformed: Adjacent I/D events in read 2:71:17399:4521

                    Actually UnifiedGenotyper works well on the bam files without -glm option.

                    Any helps?

                    Thanks
                    Hi bair,

                    Don't know what to say about that error. If you see a "malformed" error, running Picard FixMateInformation.jar might help.

                    Also, always include which version of the programs you're using etc. With GATK it's pretty much always best to use the latest version through the SVN.
                    Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
                    Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
                    Projects: U87MG whole genome sequence [Website] [Paper]

                    Comment


                    • #25
                      Originally posted by Michael.James.Clark View Post
                      Hi bair,

                      Don't know what to say about that error. If you see a "malformed" error, running Picard FixMateInformation.jar might help.

                      Also, always include which version of the programs you're using etc. With GATK it's pretty much always best to use the latest version through the SVN.
                      Thanks your Michael, I'm using gatk 2011-01-26 buildup version, since gatk works well on the bam files for snp calling, it might be the -glm DINDEL problem. will take you suggestion to try latest svn version.

                      Comment


                      • #26
                        I am trying to run -glm DINDEL with
                        -A DepthOfCoverage
                        -A AlleleBalance
                        and try to get the allele balance info to the output, but they don't seem to be there.

                        So I am interested in finding out how many reads are supporting each indel and how many are not supporting. Is there a way to output this in GATK -glm DINDEL?

                        Comment


                        • #27
                          Originally posted by bair View Post
                          Using -glm DINDEL in gatk UnifiedGenotyper does not work for me, I get error message like
                          ERROR MESSAGE: SAM/BAM file SAMFileReader{/bamfolder/demo1.bam} is malformed: Adjacent I/D events in read 2:71:17399:4521
                          I'm seeing the same error for a BAM file I am trying to call indels on using UnifiedGenotyper from GATK v1.0.5336. When I look at the problem read, there is indeed an insertion immediately adjacent to a deletion: 21I1D33M. The BAM file validates perfectly using Picard's ValidateSamFile (version 1.35).

                          I used BWA to align reads and GATK to do local re-alignment and recalibration. This read has the same alignment in both the original BWA BAM file and the re-calibrated BAM file. I did not run Picard's FixMateInformation, as I thought the mate pair information was now fixed on the fly (see http://www.broadinstitute.org/gsa/wi..._around_indels). Running FixMateInformation does not change the cigar string for the problem read in any case.

                          My command line was:

                          Code:
                          java -Xmx4g -jar ~/bin/GenomeAnalysisTK-1.0.5336/GenomeAnalysisTK.jar \
                          -l INFO -R ../hg19_index/hg19.fasta -T UnifiedGenotyper \
                          -D ../gatk_resources/dbsnp_131_hg19.rod -I bam/test.recal.bam \
                          -glm DINDEL -o test.indels.raw.vcf -stand_call_conf 30 -stand_emit_conf 10
                          Here are the SAM records for the problem read and its mate pair:

                          Code:
                          IL30_4321:7:54:11975:13960	165	chrM	1	0	*	=	1	0	AAGAGTGCTACTCTCCTCGCTCCGGGCCCATAACACTTGGGGGTAGCTAAAGTG	:>A?B=BBB=@BBBBBBB;BBBB;CCBBB@A=@AA@CBCDDCC=>CBB=??C?B	RG:Z:IL30_4321_7	MQ:i:0	OQ:Z:FFFFFCFFFFFFFFFFFFFFFFFFDFFFFFFFFFDCFFFDCFA>DFDCBBACDD	XQ:i:257
                          IL30_4321:7:54:11975:13960	89	chrM	1	0	21I1D33M	=	1	0	TTAAATAAGACATCACGATGGATCACAGGTCTATCACCCTATTAACCACTCACG	<>?@>@?BB?C>@C?<B>BBB>BC=CABABB@>?C=CCB@>C@?=CC<BBC=;<	RG:Z:IL30_4321_7	NM:i:0	OQ:Z:=DGIIEGIEDIDAIDGIIGGIIEIGIIFIEIIIFIAIIIIIIIIIIIEGIIIII	UQ:i:0
                          I did not have any trouble running the UnifiedGenotyper on this file for SNP calling.

                          Comment


                          • #28
                            is malformed: Adjacent I/D events in read

                            Originally posted by ameynert View Post
                            I'm seeing the same error for a BAM file I am trying to call indels on using UnifiedGenotyper from GATK v1.0.5336. When I look at the problem read, there is indeed an insertion immediately adjacent to a deletion: 21I1D33M. The BAM file validates perfectly using Picard's ValidateSamFile (version 1.35).

                            I used BWA to align reads and GATK to do local re-alignment and recalibration. This read has the same alignment in both the original BWA BAM file and the re-calibrated BAM file. I did not run Picard's FixMateInformation, as I thought the mate pair information was now fixed on the fly (see http://www.broadinstitute.org/gsa/wi..._around_indels). Running FixMateInformation does not change the cigar string for the problem read in any case.

                            My command line was:

                            Code:
                            java -Xmx4g -jar ~/bin/GenomeAnalysisTK-1.0.5336/GenomeAnalysisTK.jar \
                            -l INFO -R ../hg19_index/hg19.fasta -T UnifiedGenotyper \
                            -D ../gatk_resources/dbsnp_131_hg19.rod -I bam/test.recal.bam \
                            -glm DINDEL -o test.indels.raw.vcf -stand_call_conf 30 -stand_emit_conf 10
                            Here are the SAM records for the problem read and its mate pair:

                            Code:
                            IL30_4321:7:54:11975:13960	165	chrM	1	0	*	=	1	0	AAGAGTGCTACTCTCCTCGCTCCGGGCCCATAACACTTGGGGGTAGCTAAAGTG	:>A?B=BBB=@BBBBBBB;BBBB;CCBBB@A=@AA@CBCDDCC=>CBB=??C?B	RG:Z:IL30_4321_7	MQ:i:0	OQ:Z:FFFFFCFFFFFFFFFFFFFFFFFFDFFFFFFFFFDCFFFDCFA>DFDCBBACDD	XQ:i:257
                            IL30_4321:7:54:11975:13960	89	chrM	1	0	21I1D33M	=	1	0	TTAAATAAGACATCACGATGGATCACAGGTCTATCACCCTATTAACCACTCACG	<>?@>@?BB?C>@C?<B>BBB>BC=CABABB@>?C=CCB@>C@?=CC<BBC=;<	RG:Z:IL30_4321_7	NM:i:0	OQ:Z:=DGIIEGIEDIDAIDGIIGGIIEIGIIFIEIIIFIAIIIIIIIIIIIEGIIIII	UQ:i:0
                            I did not have any trouble running the UnifiedGenotyper on this file for SNP calling.
                            I have the exact same problem using stampy for alignments. Are you using stampy to generate alignments? Perhaps this is unexpected behaviour coming from an atypical aligner?

                            Comment


                            • #29
                              We have done some alignments with Stampy and seen the exact same problem of adjacent insertions and deletions. Also, sometimes Stampy seems to align the reads with weird indels, and they do not seem to be fixed even with GATK IndelRealigner (whereas alignments from BWA seem to be aligned fine with GATK IndelRealigner).

                              Comment


                              • #30
                                Originally posted by siwright View Post
                                I have the exact same problem using stampy for alignments. Are you using stampy to generate alignments? Perhaps this is unexpected behaviour coming from an atypical aligner?
                                Yes, I was using Stampy 0.1.12 to generate the alignments. The problem is fixed in Stampy 0.1.13 as far as I can tell. I've generally been pretty happy with the indels generated from a Stampy > realign > score recal > unifiedgenotyper pipeline.

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Understanding Genetic Influence on Infectious Disease
                                  by seqadmin




                                  During the COVID-19 pandemic, scientists observed that while some individuals experienced severe illness when infected with SARS-CoV-2, others were barely affected. These disparities left researchers and clinicians wondering what causes the wide variations in response to viral infections and what role genetics plays.

                                  Jean-Laurent Casanova, M.D., Ph.D., Professor at Rockefeller University, is a leading expert in this crossover between genetics and infectious...
                                  09-09-2024, 10:59 AM
                                • seqadmin
                                  Addressing Off-Target Effects in CRISPR Technologies
                                  by seqadmin






                                  The first FDA-approved CRISPR-based therapy marked the transition of therapeutic gene editing from a dream to reality1. CRISPR technologies have streamlined gene editing, and CRISPR screens have become an important approach for identifying genes involved in disease processes2. This technique introduces targeted mutations across numerous genes, enabling large-scale identification of gene functions, interactions, and pathways3. Identifying the full range...
                                  08-27-2024, 04:44 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Today, 06:25 AM
                                0 responses
                                13 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, Yesterday, 01:02 PM
                                0 responses
                                12 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 09-18-2024, 06:39 AM
                                0 responses
                                14 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 09-11-2024, 02:44 PM
                                0 responses
                                14 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X