No announcement yet.
  • Filter
  • Time
  • Show
Clear All
new posts

  • pietro2580
    Thanks for the suggestion!
    Tried to re-run the whole pipeline, starting from the BWA alignment (including the faidx indexing of the reference sequence), but no luck.
    Moreover, mpileup - bcf - varfilter is able to predict around 81% (1891 out of 2321) of the present indels correctly, and for around 15% it's able to predict the correct "region" (+- 5 nt), but does not report the correct indel.

    Seems that for these 15% indels, there are other indels reporterd in the neighborhood of the "reported" indel in the total mpileup-output (before running varFilter), and one of these other indels (not reported) seem to be the true one.
    A few lines of the complete mpileup-output is given below, the reported deletion is -GT (fourth line) while the correct one is -TT (last line). The wrong one might be selected because it has more reads supporting it (40 + 32 versus 38 + 30).

    NC_000913       84533   .       G       .       209     .       DP=109;VDB=0.0392;AF1=0;AC1=0;DP4=52,55,0,2;MQ=48;FQ=-282;PV4=0.5,1,0.075,1     PL      0
    NC_000913       84533   .       GCT     G       54.5    .       INDEL;DP=109;VDB=0.0808;AF1=1;AC1=2;DP4=1,5,21,14;MQ=57;FQ=-63.5;PV4=0.08,0.27,1,1      GT:PL:GQ        1/1:95,29,0:55
    NC_000913       84534   .       C       .       209     .       DP=104;VDB=0.0318;AF1=0;AC1=0;DP4=50,52,1,1;MQ=48;FQ=-282;PV4=1,1,0.081,0.076   PL      0
    NC_000913       84534   .       CTGT    CT      214     .       INDEL;DP=104;VDB=0.0777;AF1=1;AC1=2;DP4=0,0,40,32;MQ=52;FQ=-252 GT:PL:GQ        1/1:255,217,0:99
    NC_000913       84535   .       T       .       207     .       DP=96;VDB=0.0413;AF1=0;AC1=0;DP4=48,47,0,1;MQ=47;FQ=-282;PV4=1,0,0.17,0.15      PL      0
    NC_000913       84536   .       G       .       205     .       DP=96;VDB=0.0639;AF1=0;AC1=0;DP4=47,45,0,0;MQ=47;FQ=-282        PL      0
    NC_000913       84536   .       GTT     G       214     .       INDEL;DP=96;VDB=0.0777;AF1=1;AC1=2;DP4=0,0,38,30;MQ=52;FQ=-240  GT:PL:GQ        1/1:255,205,0:99

    Leave a comment:

  • swbarnes2
    On possibility; I'm wondering if all your software is on the same page with regard to what the reference sequence actually is.

    You know that samtools mpileup uses an index for your fasta, and if you changed your fasta, and didn't remake this index, that might explain why the software is confused about what the sequence exactly is there.

    So, try remaking that index with samtools faidx, then rerun samtools mpileup.

    Leave a comment:

  • SAMtools mpileup prediction - few nucleotides off

    Hi all,

    I am using samtools - mpileup (version 0.1.18) to predict SNP and indels in bacterial genomes. When applying this pipeline to artificially generated Illumina data (using dwgsim), it seems to me that mpilup is able to predict the region where indels appear, but is reporting in some cases the wrong indel. In total around 2000 indels are reported, and for ~ 300 of them, a behavior like below is observed.

    E.g. on position 3.211.144, dwgsim artificially introduced a deletion of 3 nucleotides (-GAG) in the reference sequence.

    However, mpileup is giving on position 3211140 as ref TGCCG and as alternative TG, which would result in a deletion of CCG:
    ref_id 3211140 . TGCCG TG 214 . INDEL;DP=97;VDB=0.0710;AF1=1;AC1=2;DP4=0,0,24,28;MQ=39;FQ=-192 GT:PL:GQ 1/1:255,157,0:99

    However, when looking at the alignment of the genes in the BAM file (with tview), this does not seem to be correct (see attached).

    It seems to me that samtools is predicting the indel correctly, but is rather making an error in reporting, by reporting an indel with the same size a few nucleotides upstream of the real indel location. Or am I missing something?

    Thank you in advance!

    Attached Files

Latest Articles


  • seqadmin
    Multiomics Techniques Advancing Disease Research
    by seqadmin

    New and advanced multiomics tools and technologies have opened new avenues of research and markedly enhanced various disciplines such as disease research and precision medicine1. The practice of merging diverse data from various ‘omes increasingly provides a more holistic understanding of biological systems. As Maddison Masaeli, Co-Founder and CEO at Deepcell, aptly noted, “You can't explain biology in its complex form with one modality.”

    A major leap in the field has
    02-08-2024, 06:33 AM





Topics Statistics Last Post
Started by seqadmin, 02-21-2024, 08:52 AM
0 responses
Last Post seqadmin  
Started by seqadmin, 02-20-2024, 08:57 AM
0 responses
Last Post seqadmin  
Started by seqadmin, 02-14-2024, 09:19 AM
0 responses
Last Post seqadmin  
Started by seqadmin, 02-12-2024, 03:37 PM
0 responses
Last Post seqadmin