

Dindel giving error for every candidate indel





    I'm running Dindel 1.01 on BAM files produced by BWA and Stampy, looking for small indels. All the steps run to completion, but every candidate indel ends in an error, so the final VCF file is completely empty. The errors are the following (output of cat *.glf.txt | grep "chr" | cut -d " " -f 1 | sort | uniq -c):

         14 error_above_read_count_threshold
          3 error_Cannot_find_reference_sequence.
        598 error_too_few_reads
    These proportions are typical of all my runs: "error_too_few_reads" dominates, although I doubt there really are too few reads. My coverage is ~25x, and from samtools pileup and manual inspection I know that plenty of indels are well covered.
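    For anyone who wants the same tally without the shell pipeline, here is a minimal Python sketch that mirrors grep "chr" | cut -d " " -f 1 | sort | uniq -c. It assumes nothing about the .glf.txt column layout beyond what that pipeline assumes: any line containing "chr" is counted, and its first space-separated field is taken as the error code. The function names are hypothetical.

```python
import glob
from collections import Counter


def count_error_codes(lines):
    """Mirror of: grep "chr" | cut -d " " -f 1 | sort | uniq -c."""
    counts = Counter()
    for line in lines:
        if "chr" in line:  # grep "chr": plain substring match anywhere on the line
            # cut -d " " -f 1: first space-delimited field (whole line if no space)
            counts[line.rstrip("\n").split(" ")[0]] += 1
    return counts


def count_in_files(pattern="*.glf.txt"):
    """Accumulate counts over every file matching the glob pattern."""
    counts = Counter()
    for path in glob.glob(pattern):
        with open(path) as fh:
            counts.update(count_error_codes(fh))
    return counts


if __name__ == "__main__":
    # sorted() plays the role of `sort`; the count column mimics `uniq -c`
    for code, n in sorted(count_in_files().items()):
        print(f"{n:8d} {code}")
```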

    When I run the "getCIGARindels" step, I get some alarming messages printed to stdout:
    Parsing indels from CIGAR strings...
    Wrote indels in CIGARS for target chr1 to file candidate_dindels
    Wrote indels in CIGARS for target chr2 to file candidate_dindels
    error: faidx error, len==0
    start: -24 end: 
    error: faidx error, len==0
    start: -11 end:
    I get about 20 of these "faidx error" messages in a typical run (though not for every chromosome). I can't find anything wrong with my reference genome file or its index file (indexed with samtools faidx; the .fai is appended at the end of this post). I don't know whether these errors are related to the ones above.

    My commands (running on 64bit Ubuntu):
    dindel-1.01-linux-64bit --analysis getCIGARindels --bamFile input.bam --outputFile candidate_dindels --ref ref_genome.fasta
    python makeWindows.py --inputVarFile candidate_dindels.variants.txt --windowFilePrefix realign_windows --numWindowsPerFile 2000
    dindel-1.01-linux-64bit --analysis indels --bamFile input.bam --ref ref_genome.fasta --libFile candidate_dindels.libraries.txt --varFile $infile --outputFile $outfile
    My fasta.fai file:
    chr1	230208	6	100	101
    chr2	813178	232523	100	101
    chr3	316617	1053839	100	101
    chr4	1531919	1373629	100	101
    chr5	576869	2920874	100	101
    chr6	270148	3503518	100	101
    chr7	1090947	3776374	100	101
    chr8	562643	4878237	100	101
    chr9	439885	5446513	100	101
    chr10	745741	5890804	100	101
    chr11	666454	6644010	100	101
    chr12	1078175	7317136	100	101
    chr13	924429	8406100	100	101
    chr14	784334	9339781	100	101
    chr15	1091289	10131966	100	101
    chr16	948062	11234175	100	101
    chr17	85779	12191725	100	101
    chr18	6318	12278369	100	101
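    One way to sanity-check an index like this: the .fai columns are name, length in bases, byte offset of the first base, bases per line and bytes per line. Below is a minimal Python sketch (helper names are hypothetical) that checks each offset against the previous entry, assuming every header line is exactly ">name" plus a single "\n" and Unix line endings throughout. Under those assumptions, hand arithmetic finds every consecutive pair in the listing above consistent (e.g. 6 + 230208 bases of chr1 + 2303 newlines + 6 header bytes for chr2 = 232523). The byte_offset helper also shows that a negative start such as "start: -24" cannot map to any position in the file; that this is what triggers Dindel's message is an assumption, not something confirmed in this thread.

```python
import math
import sys


def parse_fai(lines):
    """Parse .fai rows into (name, length, offset, linebases, linewidth)."""
    entries = []
    for line in lines:
        if not line.strip():
            continue
        name, length, offset, linebases, linewidth = line.rstrip("\n").split("\t")[:5]
        entries.append((name, int(length), int(offset), int(linebases), int(linewidth)))
    return entries


def check_offsets(entries):
    """Return names whose successor's offset does not match the expected value.

    Assumes each header is exactly '>' + name + '\\n' and that every sequence
    line, including the last one, ends with (linewidth - linebases) newline bytes.
    """
    bad = []
    for cur, nxt in zip(entries, entries[1:]):
        name, length, offset, linebases, linewidth = cur
        n_lines = math.ceil(length / linebases)
        end = offset + length + n_lines * (linewidth - linebases)
        if nxt[2] != end + len(">" + nxt[0] + "\n"):
            bad.append(name)
    return bad


def byte_offset(entry, pos):
    """Map a 0-based position to its byte offset in the FASTA file."""
    name, length, offset, linebases, linewidth = entry
    if not 0 <= pos < length:
        raise ValueError(f"{name}:{pos} is outside 0..{length - 1}")
    return offset + (pos // linebases) * linewidth + pos % linebases


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:  # e.g. ref_genome.fasta.fai
        print(check_offsets(parse_fai(fh)) or "all offsets consistent")
```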

  • #2
    I have exactly the same problem.
    Did you ever figure out what the problem was?


    • #3
      Yes, in fact. Kees Albers has confirmed that this is caused by omitting the --doDiploid (or --doPooled) flag. I left --doDiploid out because I ran Dindel on haploid samples, but that does not work: you need one of the two "--do*" flags. Could it be that you are not specifying this flag either?

      Also, the faidx errors are apparently nothing to worry about.
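      For reference, a minimal Python sketch of the indels step with the missing flag added. The binary, BAM, reference and library file names are the ones from the original post; the window file pattern realign_windows.N.txt and the output names are assumptions, so adjust them to whatever makeWindows.py actually wrote. The script only prints the commands rather than running them.

```python
import glob
import re
import shlex

DINDEL = "dindel-1.01-linux-64bit"  # binary name as used in the original post


def build_indels_command(var_file, out_file, mode="--doDiploid",
                         bam="input.bam", ref="ref_genome.fasta",
                         lib="candidate_dindels.libraries.txt"):
    """Argument list for the '--analysis indels' step, including a --do* flag."""
    if mode not in ("--doDiploid", "--doPooled"):
        raise ValueError("the indels step needs --doDiploid or --doPooled")
    return [DINDEL, "--analysis", "indels", mode,
            "--bamFile", bam, "--ref", ref, "--libFile", lib,
            "--varFile", var_file, "--outputFile", out_file]


def window_index(path):
    """Numeric index from a name like realign_windows.12.txt (assumed naming)."""
    m = re.search(r"\.(\d+)\.txt$", path)
    return int(m.group(1)) if m else -1


def window_commands(prefix="realign_windows", mode="--doDiploid"):
    """One command per window file, in numeric rather than lexical order."""
    cmds = []
    for path in sorted(glob.glob(prefix + ".*.txt"), key=window_index):
        out = f"dindel_stage2_output.{window_index(path)}"  # hypothetical output name
        cmds.append(build_indels_command(path, out, mode=mode))
    return cmds


if __name__ == "__main__":
    for cmd in window_commands():
        print(shlex.join(cmd))  # shlex.join requires Python 3.8+
```

Sorting on the numeric index keeps realign_windows.10.txt after realign_windows.9.txt, which a plain lexical sort would not.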


      • #4
        Interesting, thanks for that. Now why doesn't Dindel seem to have a mode for haploid samples?


        • #5
          Originally posted by colindaven
          Interesting. Thanks for that. Now why is it Dindel doesn't seem to have a mode for haploid samples ?
          Well, it is common for NGS software not to support haploid samples. Earlier versions of Dindel had a "force homozygous" option, but it is no longer there.

          However, what I do is run Dindel in pooled mode and then use the "makeGenotypeLikelihoodFilePooled.py" script that comes with the program. This script prints a file with the likelihoods for homozygous ref/ref, heterozygous ref/alt and homozygous alt/alt. Taking only the two homozygous likelihoods from this file (and ignoring the heterozygous one) lets me force homozygosity, though I then have to compute a measure of confidence myself, e.g. a likelihood ratio or a posterior probability.
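          For the confidence measure, a minimal Python sketch that turns the two homozygous log-likelihoods into a posterior probability for the alt allele and a log-likelihood ratio. It assumes a two-state haploid model with a user-chosen prior (0.5 by default) and natural-log likelihoods unless log_base says otherwise. The log base that makeGenotypeLikelihoodFilePooled.py writes is not stated in this thread, so check it before use; the function name is hypothetical.

```python
import math


def haploid_call(loglik_ref, loglik_alt, prior_alt=0.5, log_base=math.e):
    """Posterior P(alt) and log-likelihood ratio (natural log) for a haploid site.

    loglik_ref / loglik_alt: log-likelihoods of the ref/ref and alt/alt genotypes,
    in base `log_base`; the heterozygous likelihood is simply not used.
    """
    if not 0.0 < prior_alt < 1.0:
        raise ValueError("prior_alt must be strictly between 0 and 1")
    scale = math.log(log_base)  # converts base-b logs to natural logs
    ll_ref, ll_alt = loglik_ref * scale, loglik_alt * scale
    log_post_ref = math.log(1.0 - prior_alt) + ll_ref
    log_post_alt = math.log(prior_alt) + ll_alt
    # log-sum-exp keeps the normalisation stable for very negative log-likelihoods
    m = max(log_post_ref, log_post_alt)
    log_z = m + math.log(math.exp(log_post_ref - m) + math.exp(log_post_alt - m))
    return math.exp(log_post_alt - log_z), ll_alt - ll_ref


if __name__ == "__main__":
    p_alt, llr = haploid_call(-120.4, -112.1)  # made-up values for illustration
    print(f"P(alt) = {p_alt:.6f}, ln LR = {llr:.2f}")
```

Normalising in log space matters here because raw genotype likelihoods for deep sites can underflow to zero if exponentiated directly.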

