  • Hena
    Member
    • Nov 2009
    • 19

    Using dindel

    Hi all,

    I'm currently trying out dindel v0.12 for finding indels. However, I hit a snag and can find little help for it.

    I'm running the stage two command to realign windows (the second command of phase 2). The example in the manual gives this command:
    dindel --analysis indels --doDiploid --bamFile sample.bam --ref ref.fa --inputVarFile sample.realign_windows.2.txt --libFile sample.dindel_output.libraries.txt --outputFile sample.dindel_stage2_output_windows.2

    However, running the above with the correct file names doesn't work. It gives the error "Error parsing input options." and prints the usage. So what option(s) should be added to make that stage work?

    I also noticed that the first command of phase 2 needs inputVarFile rather than the varFile given in the manual.
  • krobison
    Senior Member
    • Nov 2007
    • 734

    #2
    Kees (the author) has been quite generous about helping me past similar problems.

    Just replace each $-prefixed item with the correct filename (this is pulled from some Perl code). I think the main problem you've hit is the --inputVarFile vs. --varFile inconsistency in the code.
    Code:
    dindel --analysis indels --doDiploid --bamFile $bamFile --ref $refFasta --varFile $windowsFile --outputFile $outputFile


    • lh3
      Senior Member
      • Feb 2008
      • 686

      #3
      I think there are a couple of typos in the online documentation. The following shows how I run dindel.

      Code:
      ./dindel_x86-64  --ref chr20.fa --outputFile 1 --bamFile aln.bam --analysis getCIGARindels
      python makeWindows.py --inputVarFile 1.variants.txt --windowFilePrefix 2 --numWindowsPerFile 20000
      ./dindel_x86-64 --analysis indels --doDiploid --bamFile aln.bam --ref chr20.fa --varFile 2.1.txt --libFile 1.libraries.txt --outputFile 3 > 3.out 2> 3.err
      echo 3.glf.txt > 3.list
      python mergeOutput.py -t diploid -i 3.list -o 4.vcf -r chr20.fa
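When makeWindows.py writes several window files, the stage-2 call above has to be repeated once per file. A minimal Python sketch of that loop follows. It only assembles the command line shown in this thread; the helper name `stage2_command`, the window file `2.2.txt`, and the output prefixes `3.1`/`3.2` are assumptions for illustration, not part of Dindel.

```python
import shlex

def stage2_command(window_file, out_prefix, bam="aln.bam", ref="chr20.fa",
                   lib="1.libraries.txt", exe="./dindel_x86-64"):
    """Assemble the stage-2 argument list using only flags shown in this thread."""
    return [
        exe, "--analysis", "indels", "--doDiploid",
        "--bamFile", bam, "--ref", ref,
        "--varFile", window_file,   # --varFile, not --inputVarFile, at this stage
        "--libFile", lib,
        "--outputFile", out_prefix,
    ]

# Hypothetical window files; use whatever makeWindows.py actually wrote.
for i, window in enumerate(["2.1.txt", "2.2.txt"], start=1):
    print(shlex.join(stage2_command(window, f"3.{i}")))
```

To actually run each command rather than print it, pass the list to `subprocess.run(cmd, check=True)`.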


      • Lee Sam
        Member
        • Oct 2008
        • 57

        #4
        Thanks, this is really helpful. I'm working with dindel too and I was just today wondering about these.


        • Michael.James.Clark
          Senior Member
          • Apr 2009
          • 207

          #5
          Question regarding the --doEM option:
          I have a family of five individuals (two parents, three children), so I assume there are four haplotypes in the data set. Is there a way to set it for this (if it would make a difference)?
          Am I better off extracting each individual from the pooled BAM file and running them individually with --doDiploid instead?
          Thanks.
          Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
          Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
          Projects: U87MG whole genome sequence [Website] [Paper]


          • Hena
            Member
            • Nov 2009
            • 19

            #6
            Thanks for the answers, lh3 and krobison. I got it running now.


            • Michael.James.Clark
              Senior Member
              • Apr 2009
              • 207

              #7
              I used Dindel after GATK realignment/recalibration.
              It seems like this is redundant.
              Is it just as good, or better, to run Dindel in a separate pipeline directly from the original alignments?

              Another query: do people generally filter out calls that end up with the fr0/q20/hp10/wv flags in the FILTER field?
              Last edited by Michael.James.Clark; 10-26-2010, 06:41 PM.


              • keesa
                Junior Member
                • Oct 2010
                • 2

                #8
                In general I would advise not to use variants with quality scores below 10 for single diploid samples. The fr0 filter in the 0.12 version of Dindel does reduce the number of false positives on real data, but you will also lose some sensitivity.

                It is true that running Dindel on BAMs realigned by the GATK will not result in too many new calls if you have high-depth diploid data.
                The main advantage of running Dindel currently would be for calling the genotypes: here the GATK realigned BAMs might result in undercalls as reads matching the reference are not realigned even though they may support the alternative haplotype with the indel just as well as the reference haplotype.
                Also, Dindel has a dedicated sequencing error model for homopolymer runs, which should result in more accurate calls in those contexts.
                The Broad are currently implementing the Dindel algorithm in the GATK, but I don't know exactly when it will be released (later this year I expect).

                The new version of Dindel has a script that lets you select only the indels that were seen twice or more (whatever number you prefer). If you apply this to indels extracted from the realigned BAM you will be able to significantly reduce compute time.

                Kees (Disclosure: I am the author of Dindel if it wasn't clear already).

                PS I put a new version of Dindel on the website today.
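The quality and FILTER advice above can be sketched as a small VCF filter. This is an illustrative Python sketch, not part of Dindel; the function names and the choice to reject every record whose FILTER is not PASS are assumptions, and the threshold of 10 is taken from the post above.

```python
def keep_record(vcf_line, min_qual=10.0, require_pass=True):
    """Return True if a VCF data line passes the quality and FILTER checks."""
    fields = vcf_line.split()    # splits on tabs or spaces
    qual = float(fields[5])      # QUAL is the 6th VCF column
    filt = fields[6]             # FILTER is the 7th VCF column
    if qual < min_qual:
        return False
    if require_pass and filt not in ("PASS", "."):
        return False
    return True

def filter_vcf(lines, **kwargs):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if not line.strip():
            continue             # skip blank lines
        if line.startswith("#") or keep_record(line, **kwargs):
            yield line
```

Setting `require_pass=False` keeps fr0-flagged calls and applies only the quality threshold, which trades some false positives for sensitivity.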


                • drio
                  Senior Member
                  • Oct 2008
                  • 323

                  #9
                  Originally posted by Michael.James.Clark
                  I used Dindel after GATK realignment/recalibration.
                  It seems like this is redundant.
                  But it helps when you want to inspect the alignments by eye to understand why your SNP caller made a call.
                  -drd


                  • lshen
                    Member
                    • Jan 2008
                    • 30

                    #10
                    Thanks for the update. It is a great tool that I was using to re-run several data sets.

                    In v1.01 the --numWindowsPerFile option is not working.

                    I see discrepancies between QUAL and the last column (GQ) in the VCF output:
                    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S3
                    chr13 8769 . C CA 897 PASS DP=150;NF=14;NR=13;NRS=16;NFS=13;HP=1 GT:GQ 1/1:90
                    chr13 8910 . AT A 289 PASS DP=127;NF=6;NR=6;NRS=11;NFS=10;HP=2 GT:GQ 0/1:289
                    chr13 8985 . ACT A 272 PASS DP=109;NF=13;NR=0;NRS=26;NFS=0;HP=1 GT:GQ 1/1:3

                    Can you output total read counts in the VCF output? Can you generate the glf file list automatically as part of your makeWindows.py?
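Until the scripts do this themselves, the list file that mergeOutput.py takes with -i can be written with a few lines of Python. This sketch assumes each stage-2 run with `--outputFile PREFIX` produced `PREFIX.glf.txt`, as in lh3's example where prefix 3 gives 3.glf.txt; the function name `write_glf_list` is hypothetical.

```python
from pathlib import Path

def write_glf_list(prefixes, list_path):
    """Write one '<prefix>.glf.txt' name per line and return the names written."""
    names = [f"{prefix}.glf.txt" for prefix in prefixes]
    Path(list_path).write_text("".join(name + "\n" for name in names))
    return names
```

For example, `write_glf_list(["3"], "3.list")` reproduces the `echo 3.glf.txt > 3.list` step from earlier in the thread.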


                    • lshen
                      Member
                      • Jan 2008
                      • 30

                      #11
                      Can anyone give feedback on the output? Did I make a mistake in the run (single sample as diploid, with default settings)?

                      How can NRS+NFS = 32 with DP=81 and the genotype be 1/1? It should be heterozygous.

                      chr7 3304476 . AC A 1272 PASS DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3 GT:GQ 1/1:93


                      Below is more from the VCF4 output

                      ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total number of reads in haplotype window">
                      ##INFO=<ID=HP,Number=1,Type=Integer,Description="Reference homopolymer tract length">
                      ##INFO=<ID=NF,Number=1,Type=Integer,Description="Number of reads covering non-ref variant on forward strand">
                      ##INFO=<ID=NR,Number=1,Type=Integer,Description="Number of reads covering non-ref variant on reverse strand">
                      ##INFO=<ID=NFS,Number=1,Type=Integer,Description="Number of reads covering non-ref variant site on forward strand">
                      ##INFO=<ID=NRS,Number=1,Type=Integer,Description="Number of reads covering non-ref variant site on reverse strand">
                      ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
                      ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">
                      ##ALT=<ID=DEL,Description="Deletion">
                      ##FILTER=<ID=q5,Description="Quality below 5">
                      ##FILTER=<ID=hp10,Description="Reference homopolymer length was longer than 10">
                      ##FILTER=<ID=fr0,Description="Non-ref allele is not covered by at least one read on both strands">
                      ##FILTER=<ID=wv,Description="Other indel in window had higher likelihood">
                      #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 2044B
                      chr7 3304476 . AC A 1272 PASS DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3 GT:GQ 1/1:93
                      chr7 3311292 . G GAGA 12 PASS DP=113;NF=0;NR=0;NRS=11;NFS=36;HP=2 GT:GQ 0/1:12

                      chr3 135275377 . C CCGCTCTTCCGAT 36 PASS DP=40;NF=0;NR=0;NRS=0;NFS=0;HP=2 GT:GQ 0/1:36
                      chr3 135278476 . T TAGATCGGAAGA 3 q5 DP=130;NF=0;NR=0;NRS=0;NFS=0;HP=2 GT:GQ 0/1:3
                      chr3 135281981 . C CGCTCTTCCGATCT 15 PASS DP=42;NF=0;NR=0;NRS=1;NFS=0;HP=3 GT:GQ 0/1:15
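To check records like these in bulk, the INFO column can be split into a dictionary and the strand counts summed. A minimal Python sketch, assuming (from the header descriptions above) that NF+NR counts reads carrying the non-ref variant and NFS+NRS counts reads covering the variant site; the function names are hypothetical.

```python
def parse_info(info):
    """Turn 'DP=81;NF=20;...' into {'DP': 81, 'NF': 20, ...} (integer values only)."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = int(value)
    return out

def variant_support(info):
    """Return (reads with the variant, reads covering the site), summed over strands."""
    d = parse_info(info)
    return d["NF"] + d["NR"], d["NFS"] + d["NRS"]

# The chr7:3304476 record from the post above.
with_variant, at_site = variant_support("DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3")
print(with_variant, at_site)  # 28 32
```

Under that reading, 28 reads carry the deletion against 32 covering the site, while DP=81 counts every read in the haplotype window. That may be why the call is 1/1, but it is a guess from the header text and is not confirmed anywhere in this thread.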


                      • Jaap
                        Junior Member
                        • Oct 2010
                        • 3

                        #12
                        Dindel on paired-end data

                        Hi all,

                        Since we want to compare samples sequenced at Sanger to our own samples, we figured we needed to use the same analysis programs. Sanger informed me they have used Dindel for indels, so I wanted to use that too. The only thing is that Dindel takes only one BAM file as input. Since I have paired-end reads, I'm confused.
                        Do I need to merge these files with Samtools? And how does Dindel then know which reads are pairs?

                        Kind regards
                        Jaap


                        • krobison
                          Senior Member
                          • Nov 2007
                          • 734

                          #13
                          What aligner are you using? Most aligners will take paired end data & use that in the alignment process as well as generate the proper pairing information.

                          Does dindel consider the pairing information? It could certainly have a potential value, but I'm not sure it relies on it.


                          • Jaap
                            Junior Member
                            • Oct 2010
                            • 3

                            #14
                            I'm using BWA for alignment.
                            Do I understand correctly that the paired-end info is in the BWA generated BAM files? And I should merge them before I use Dindel?

                            Kind regards
                            Jaap


                            • drio
                              Senior Member
                              • Oct 2008
                              • 323

                              #15
                              Originally posted by Jaap
                              I'm using BWA for alignment.
                              Do I understand correctly that the paired-end info is in the BWA generated BAM files? And I should merge them before I use Dindel?
                              If you used sampe when processing your alignments, your BAM will already contain alignments from both ends (pairs).
                              Dindel will process them accordingly, following the BAM standard.
                              -drd
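Whether a BAM already carries pairing information can be read from the FLAG field of each record (the second column of `samtools view` output). A minimal Python sketch decoding the pairing bits defined in the SAM specification; the function name `pairing_info` is hypothetical and this is not something Dindel provides.

```python
# Pairing-related FLAG bits from the SAM specification.
PAIRED, PROPER_PAIR, FIRST_IN_PAIR, SECOND_IN_PAIR = 0x1, 0x2, 0x40, 0x80

def pairing_info(flag):
    """Decode the pairing-related bits of a SAM/BAM FLAG value."""
    return {
        "paired": bool(flag & PAIRED),
        "proper_pair": bool(flag & PROPER_PAIR),
        "first_in_pair": bool(flag & FIRST_IN_PAIR),
        "second_in_pair": bool(flag & SECOND_IN_PAIR),
    }

print(pairing_info(99))   # a typical first-in-pair read
print(pairing_info(0))    # an unpaired (single-end) read
```

If the records in a BAM have the paired bit set and both first-in-pair and second-in-pair reads are present, both ends are already in the one file and no merging is needed.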

