
**krobison** · 10-25-2010, 04:55 AM

Kees (the author) has been quite generous about helping me past similar problems.

Just replace each $-prefixed item with the correct filename (this is pulled from some Perl code). I think the main problem you've hit is the --inputVarFile vs. --varFile inconsistency in the code.

Code:

dindel --analysis indels --doDiploid --bamFile $bamFile --ref $refFasta --varFile $windowsFile --outputFile $outputFile
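A minimal Python sketch of the substitution krobison describes, with the BAM path treated as a placeholder as well. Python's `string.Template` happens to use the same `$name` syntax as the Perl-style line above, so the command can be filled in and split into an argument list. The function name `indels_command` is illustrative only; nothing here is part of Dindel.

```python
import shlex
from string import Template

# The command line from the post; $name items are the placeholders to replace.
# Dindel's indels step takes --varFile (makeWindows.py is the one that takes --inputVarFile).
INDELS_CMD = Template(
    "dindel --analysis indels --doDiploid --bamFile $bamFile "
    "--ref $refFasta --varFile $windowsFile --outputFile $outputFile"
)


def indels_command(bam_file, ref_fasta, windows_file, output_file):
    """Fill in the placeholders and return an argv list (nothing is executed)."""
    line = INDELS_CMD.substitute(
        bamFile=shlex.quote(bam_file),      # quote so paths with spaces survive split()
        refFasta=shlex.quote(ref_fasta),
        windowsFile=shlex.quote(windows_file),
        outputFile=shlex.quote(output_file),
    )
    return shlex.split(line)
```

For example, `indels_command("aln.bam", "chr20.fa", "2.1.txt", "3")` yields the same arguments lh3 uses for the indels step (minus `--libFile`), ready to hand to `subprocess.run`.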

**lh3** · 10-25-2010, 05:36 AM

I think there are a couple of typos in the online documentation. The following shows how I run dindel.

Code:

./dindel_x86-64  --ref chr20.fa --outputFile 1 --bamFile aln.bam --analysis getCIGARindels
python makeWindows.py --inputVarFile 1.variants.txt --windowFilePrefix 2 --numWindowsPerFile 20000
./dindel_x86-64 --analysis indels --doDiploid --bamFile aln.bam --ref chr20.fa --varFile 2.1.txt --libFile 1.libraries.txt --outputFile 3 > 3.out 2> 3.err
echo 3.glf.txt > 3.list
python mergeOutput.py -t diploid -i 3.list -o 4.vcf -r chr20.fa
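lh3's steps can also be scripted. The sketch below is a hypothetical wrapper, not something shipped with Dindel, and it uses only the flags and file names from the post. With `dry_run=True` it just returns the argument lists; otherwise it runs each step with `subprocess.run(..., check=True)`, so a failing step stops the pipeline.

```python
import subprocess

DINDEL = "./dindel_x86-64"  # binary name as used in lh3's post


def dindel_steps(bam, ref, vcf_out="4.vcf"):
    """Return the four external commands from the post as argv lists."""
    return [
        # Extract candidate indels from the CIGAR strings in the BAM.
        [DINDEL, "--ref", ref, "--outputFile", "1", "--bamFile", bam,
         "--analysis", "getCIGARindels"],
        # Group candidates into windows (--inputVarFile belongs to this script).
        ["python", "makeWindows.py", "--inputVarFile", "1.variants.txt",
         "--windowFilePrefix", "2", "--numWindowsPerFile", "20000"],
        # Call indels in the first window file (--varFile here).
        [DINDEL, "--analysis", "indels", "--doDiploid", "--bamFile", bam,
         "--ref", ref, "--varFile", "2.1.txt", "--libFile", "1.libraries.txt",
         "--outputFile", "3"],
        # Merge the GLF files listed in 3.list into a VCF.
        ["python", "mergeOutput.py", "-t", "diploid", "-i", "3.list",
         "-o", vcf_out, "-r", ref],
    ]


def run_pipeline(bam, ref, dry_run=True):
    steps = dindel_steps(bam, ref)
    if dry_run:
        return steps
    subprocess.run(steps[0], check=True)
    subprocess.run(steps[1], check=True)
    # The post redirects stdout/stderr of the indels step to 3.out and 3.err.
    with open("3.out", "w") as out, open("3.err", "w") as err:
        subprocess.run(steps[2], check=True, stdout=out, stderr=err)
    # Same as: echo 3.glf.txt > 3.list
    with open("3.list", "w") as fh:
        fh.write("3.glf.txt\n")
    subprocess.run(steps[3], check=True)
    return steps
```

Keeping each step as a list (rather than one shell string) avoids quoting problems and makes the `--inputVarFile` / `--varFile` difference easy to see.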

**Lee Sam** · 10-25-2010, 11:51 AM

Thanks, this is really helpful. I'm working with Dindel too and was wondering about these just today.

**Michael.James.Clark** · 10-25-2010, 01:04 PM

Question regarding the --doEM option:
I have a family of five individuals (two parents, three children), so I assume there are four haplotypes in the data set. Is there a way to specify that (if it would make a difference)?
Or am I better off extracting each individual from the pooled BAM file and running them individually with --doDiploid instead?
Thanks.

**Hena** · 10-25-2010, 09:51 PM

Thanks for the answers, lh3 and krobison. I got it running now.

**Michael.James.Clark** · 10-26-2010, 11:19 AM

I ran Dindel after GATK realignment/recalibration.
It seems like this is redundant.
Is it just as good (or better) to run Dindel in a separate pipeline directly on the original alignments?

Another question: do people generally just filter out calls that end up with the fr0/q20/hp10/wv flags in the FILTER field?

**keesa** · 10-28-2010, 03:43 AM

In general, I would advise against using variants with quality scores below 10 for single diploid samples. The fr0 filter in version 0.12 of Dindel does reduce the number of false positives on real data, but you will also lose some sensitivity.

It is true that running Dindel on BAMs realigned by the GATK will not yield many new calls if you have high-depth diploid data.
The main advantage of running Dindel currently would be in calling the genotypes: here the GATK-realigned BAMs might result in undercalls, because reads matching the reference are not realigned even though they may support the alternative haplotype with the indel just as well as the reference haplotype.
Also, Dindel has a dedicated sequencing error model for homopolymer runs, which should give more accurate calls in those contexts.
The Broad are currently implementing the Dindel algorithm in the GATK, but I don't know exactly when it will be released (later this year, I expect).

The new version of Dindel has a script that lets you select only the indels that were seen twice or more (or whatever threshold you prefer). If you apply this to indels extracted from the realigned BAM, you can significantly reduce compute time.

Kees (Disclosure: I am the author of Dindel if it wasn't clear already).

PS I put a new version of Dindel on the website today.

http://sites.google.com/site/keesalbers/soft/dindel
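Kees's rule of thumb (ignore calls with QUAL below 10 for a single diploid sample) and the earlier FILTER question can be applied with a few lines of Python. This is a minimal sketch, not a Dindel script: `min_qual=10` follows Kees's advice, and `require_pass` is a hypothetical switch for also discarding anything flagged fr0, hp10, wv, etc., which, at least for fr0, Kees notes costs some sensitivity.

```python
def keep_call(line, min_qual=10.0, require_pass=False):
    """Return True if a VCF data line meets the QUAL (and optional FILTER) cut."""
    fields = line.split()  # VCF is tab-delimited; split() also copes with pasted spaces
    qual, filt = fields[5], fields[6]
    if qual == "." or float(qual) < min_qual:
        return False
    # FILTER is "PASS" or a ';'-separated list of failed filters (e.g. fr0, hp10, wv).
    return filt == "PASS" or not require_pass


def filter_vcf(lines, min_qual=10.0, require_pass=False):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#") or keep_call(line, min_qual, require_pass):
            yield line
```

Header lines are passed through so the result is still a valid VCF when written back out.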

**drio** · 10-28-2010, 04:33 AM

Originally posted by Michael.James.Clark

I used Dindel after GATK realignment/recalibration.
It seems like this is redundant.

But it helps when you want to look at the alignments by eye to understand why your SNP caller made a call.

**lshen** · 10-28-2010, 08:25 AM

Thanks for the update. It is a great tool; I have been using it to re-run several data sets.

In v1.01, the --numWindowsPerFile option is not working.

I see discrepancies between QUAL and the genotype quality (GQ) in the last column of the VCF output:

Code:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S3
chr13 8769 . C CA 897 PASS DP=150;NF=14;NR=13;NRS=16;NFS=13;HP=1 GT:GQ 1/1:90
chr13 8910 . AT A 289 PASS DP=127;NF=6;NR=6;NRS=11;NFS=10;HP=2 GT:GQ 0/1:289
chr13 8985 . ACT A 272 PASS DP=109;NF=13;NR=0;NRS=26;NFS=0;HP=1 GT:GQ 1/1:3
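To see how widespread the QUAL/GQ difference is across a whole file, a short sketch can list every record where the two disagree. It assumes a single sample column whose FORMAT includes GQ, as in the lines above, and it only reports mismatches. In VCF, QUAL is a site-level quality for the ALT assertion and GQ is a per-sample genotype quality, so they are not required to be equal; the sketch does not say which value to trust.

```python
def qual_and_gq(line):
    """Return (CHROM, POS, QUAL, GQ) for a single-sample VCF data line."""
    fields = line.split()
    fmt_keys = fields[8].split(":")       # e.g. ["GT", "GQ"]
    sample = fields[9].split(":")         # e.g. ["1/1", "90"]
    gq = float(dict(zip(fmt_keys, sample))["GQ"])
    return fields[0], int(fields[1]), float(fields[5]), gq


def mismatched(lines):
    """List (CHROM, POS, QUAL, GQ) tuples where QUAL and GQ differ."""
    out = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, qual, gq = qual_and_gq(line)
        if qual != gq:
            out.append((chrom, pos, qual, gq))
    return out
```

On the three records above, the 8769 and 8985 sites are reported and the 8910 site (289 vs. 289) is not.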

Could you output total read counts in the VCF? And could makeWindows.py generate the GLF file list automatically as well?
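Until something like that exists, the list file can be built by hand. The sketch below rests on a naming scheme it cannot confirm: that each window file `2.N.txt` written by makeWindows.py was run through `--analysis indels` with `--outputFile 3.N`, and that Dindel then writes `3.N.glf.txt` (by analogy with `--outputFile 3` producing `3.glf.txt` in lh3's post). The prefixes `2` and `3` and both function names are hypothetical.

```python
import re


def glf_names(filenames, window_prefix="2", output_prefix="3"):
    """Map window files <window_prefix>.<N>.txt to <output_prefix>.<N>.glf.txt.

    Assumed naming scheme (see above); results are sorted by N numerically,
    so 2.10.txt comes after 2.2.txt.
    """
    pattern = re.compile(r"^" + re.escape(window_prefix) + r"\.(\d+)\.txt$")
    numbers = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return [f"{output_prefix}.{n}.glf.txt" for n in sorted(numbers)]


def write_glf_list(names, list_path="3.list"):
    """Write one GLF name per line (assumed format, as echo produced for one file)."""
    with open(list_path, "w") as fh:
        fh.write("".join(name + "\n" for name in names))
```

Usage would be along the lines of `write_glf_list(glf_names(os.listdir(".")))`, run in the directory holding the window files, before calling mergeOutput.py with `-i 3.list`.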

**lshen** · 11-01-2010, 09:17 AM

Can anyone give feedback on the output? Did I make a mistake in the run (single sample, diploid, default settings)?

How can NRS+NFS = 32 with DP=81 and the genotype be 1/1? It should be heterozygous.

chr7 3304476 . AC A 1272 PASS DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3 GT:GQ 1/1:93
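To compare those counts directly, the INFO string can be split into its fields. Per the ##INFO header lines in this VCF, NF/NR count reads covering the non-ref variant on each strand and NFS/NRS count reads covering the non-ref variant site. The sketch only sums them next to DP; it does not reproduce how Dindel's model arrives at a genotype.

```python
def parse_info(info):
    """Split 'DP=81;NF=20;...' into a dict; keys without '=' become True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = int(value) if sep else True  # Dindel's INFO values here are integers
    return fields


def read_support(info):
    """Sum strand-specific counts so they can be compared with DP."""
    f = parse_info(info)
    return {
        "DP": f["DP"],                         # reads in the haplotype window
        "variant_reads": f["NF"] + f["NR"],    # reads covering the non-ref variant
        "site_reads": f["NFS"] + f["NRS"],     # reads covering the non-ref variant site
    }
```

For the chr7 3304476 record this gives 28 variant reads and 32 site reads out of DP=81.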

Below is more of the VCF4 output:

Code:

##INFO=<ID=DP,Number=1,Type=Integer,Description="Total number of reads in haplotype window">
##INFO=<ID=HP,Number=1,Type=Integer,Description="Reference homopolymer tract length">
##INFO=<ID=NF,Number=1,Type=Integer,Description="Number of reads covering non-ref variant on forward strand">
##INFO=<ID=NR,Number=1,Type=Integer,Description="Number of reads covering non-ref variant on reverse strand">
##INFO=<ID=NFS,Number=1,Type=Integer,Description="Number of reads covering non-ref variant site on forward strand">
##INFO=<ID=NRS,Number=1,Type=Integer,Description="Number of reads covering non-ref variant site on reverse strand">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">
##ALT=<ID=DEL,Description="Deletion">
##FILTER=<ID=q5,Description="Quality below 5">
##FILTER=<ID=hp10,Description="Reference homopolymer length was longer than 10">
##FILTER=<ID=fr0,Description="Non-ref allele is not covered by at least one read on both strands">
##FILTER=<ID=wv,Description="Other indel in window had higher likelihood">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 2044B
chr7 3304476 . AC A 1272 PASS DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3 GT:GQ 1/1:93
chr7 3311292 . G GAGA 12 PASS DP=113;NF=0;NR=0;NRS=11;NFS=36;HP=2 GT:GQ 0/1:12

chr3 135275377 . C CCGCTCTTCCGAT 36 PASS DP=40;NF=0;NR=0;NRS=0;NFS=0;HP=2 GT:GQ 0/1:36
chr3 135278476 . T TAGATCGGAAGA 3 q5 DP=130;NF=0;NR=0;NRS=0;NFS=0;HP=2 GT:GQ 0/1:3
chr3 135281981 . C CGCTCTTCCGATCT 15 PASS DP=42;NF=0;NR=0;NRS=1;NFS=0;HP=3 GT:GQ 0/1:15
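The ##FILTER and ##INFO lines in this output also answer the earlier question about what fr0, hp10, wv and q5 mean. A small sketch can read those definitions from the header and attach them to a record's FILTER value. The regular expression only handles the simple `ID=...,Description="..."` form seen here, not every construct the VCF specification allows, and the function names are illustrative.

```python
import re

# Matches e.g. ##FILTER=<ID=q5,Description="Quality below 5">
META = re.compile(r'^##(INFO|FORMAT|FILTER|ALT)=<ID=([^,>]+),.*?Description="([^"]*)"')


def header_descriptions(lines):
    """Return {(section, ID): description} from VCF meta-information lines."""
    found = {}
    for line in lines:
        m = META.match(line)
        if m:
            found[(m.group(1), m.group(2))] = m.group(3)
    return found


def explain_filter(filter_field, descriptions):
    """Translate a FILTER value such as 'q5' or 'fr0;hp10' into text."""
    if filter_field == "PASS":
        return ["PASS"]
    return [descriptions.get(("FILTER", code), "undefined filter: " + code)
            for code in filter_field.split(";")]
```

Reading the descriptions from the file itself, rather than hard-coding them, keeps the output correct if a later Dindel version changes or adds filters.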

**Jaap** · 11-05-2010, 07:15 AM

Dindel on paired-end data

Hi all,

Since we want to compare samples sequenced at the Sanger Institute with our own samples, we figured we needed the same analysis programs. Sanger told me they used Dindel for indels, so I wanted to use it too. The only thing is that Dindel takes only one BAM file as input, and since I have paired-end reads, I'm confused.
Do I need to merge these files with Samtools? And how does Dindel then know which reads are the pairs?

Kind regards
Jaap

**krobison** · 11-05-2010, 07:35 AM

What aligner are you using? Most aligners take paired-end data, use it in the alignment process, and generate the proper pairing information.

Does Dindel consider the pairing information? It could certainly be valuable, but I'm not sure Dindel relies on it.

**Jaap** · 11-05-2010, 08:00 AM

I'm using BWA for alignment.
Do I understand correctly that the paired-end info is in the BWA-generated BAM files? And should I merge them before using Dindel?

Kind regards
Jaap

**drio** · 11-05-2010, 08:24 AM

If you used bwa sampe to process your alignments, your
BAM will already contain alignments from both ends (pairs),
and Dindel will process them accordingly, following the BAM standard.
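One way to confirm that a BAM really holds paired alignments is to look at the FLAG column (column 2) of `samtools view` output. The bit values below come from the SAM specification (0x1 read paired, 0x2 mapped in proper pair, 0x40 first in pair, 0x80 second in pair); the helper names are illustrative. This only checks the BAM; it says nothing about what Dindel does with the pairing.

```python
PAIRED = 0x1          # template has multiple segments (read is paired)
PROPER_PAIR = 0x2     # each segment properly aligned according to the aligner
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def describe_flag(flag):
    """Decode the pairing-related bits of a SAM FLAG value."""
    return {
        "paired": bool(flag & PAIRED),
        "proper_pair": bool(flag & PROPER_PAIR),
        "read1": bool(flag & FIRST_IN_PAIR),
        "read2": bool(flag & SECOND_IN_PAIR),
    }


def count_paired(sam_lines):
    """Return (paired, total) over SAM alignment lines, skipping '@' headers."""
    paired = total = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        flag = int(line.split("\t")[1])
        total += 1
        if flag & PAIRED:
            paired += 1
    return paired, total
```

For example, feeding `sys.stdin` from `samtools view aln.bam | head -n 10000` into `count_paired` should show the paired bit set on the records from a sampe run; flag 99, for instance, decodes to paired, proper pair, first in pair.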
