Hello everyone! It's my first time posting here.
In the lab I'm at, we work mostly with bacteria (a bit of yeast here and there, but rarely), and I got involved in a project working with Illumina RNA-seq data that consists of a decent amount of E. coli RNA sequences.
Currently I'm trying to look for indels in these data, and it took me shamefully long to notice that VarScan, which I was using for variant calling, in fact reports only one alternate allele per position and somehow just loses the others. I only noticed because the reads supporting the alternate allele and those supporting the reference allele do not add up to the total coverage at many positions. I wasn't that surprised, as most tools are optimized (or in fact created solely) for analyzing human data, which is quite different from bacterial data, so one has to be careful when sticking bacteria where they don't belong.
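To show what I mean by the counts not adding up, here is a rough Python sketch of how one could tally every allele at a position straight from the read-bases column of a samtools mpileup line. This is my own illustration, not VarScan's logic and not the code from any existing tool. The function name `tally_pileup_bases` and the example base string are made up, and I'm assuming the standard mpileup text layout (`.`/`,` for reference matches, `+2AT`/`-1C` for indels, `^` plus one character for a read start, `$` for a read end).

```python
from collections import Counter


def tally_pileup_bases(ref, bases):
    """Count the alleles seen in one mpileup read-bases string.

    Keys starting with '+' or '-' are indels. They are reported in
    addition to the base of the read that carries them, so they must
    not be added to the coverage a second time.
    """
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # '^' and the mapping-quality character after it
        elif c == "$":
            i += 1  # end-of-read marker, carries no base
        elif c in "+-":
            # Indel: sign, length in digits, then that many bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j])
            counts[c + bases[j:j + length].upper()] += 1
            i = j + length
        elif c in ".,":
            counts[ref.upper()] += 1  # match on forward/reverse strand
            i += 1
        else:
            counts[c.upper()] += 1  # mismatch, '*' placeholder, etc.
            i += 1
    return counts


# Hypothetical position with reference base A and 7 reads.
counts = tally_pileup_bases("A", "..+2AT,,-1CT.+1G^F.")
indels = {k: v for k, v in counts.items() if k[0] in "+-"}
depth = sum(v for k, v in counts.items() if k[0] not in "+-")
print(dict(counts), indels, depth)
```

In this made-up example three different indel alleles (`+AT`, `-C`, `+G`) sit at the same position, which is exactly the situation where a caller that keeps only one alternate allele would leave reads unaccounted for.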
In my troubles I happened upon this site: http://www.oliverelliott.org/article...t_mpileup2vcf/ , where a man of knowledge was troubled by the same problem and wrote a program in C++ that takes its input from samtools mpileup and turns it into a .vcf file from which I can sort out indels.
Now, I'm not all that good at reading C++, so I wouldn't notice any mistakes and would have to rely on user feedback, but lo and behold, there is none!
Has anyone ever seen or used this piece of software, or does anyone know of another tool that could help me with my bacterial data?
Bear in mind that I got my BSc in biology just last spring and couldn't write even a line of Python for the life of me at the time, so you could say I'm quite wet behind the ears.