I have some fairly basic questions about software tools to use on my data, which is from a population of yeast evolved experimentally over about 2-3 weeks. I'm asking this because most of the tools I've been finding discuss applications that sound very different from mine (diploid, especially human, data from natural populations that have evolved for a very long time away from the reference sequence), so I would appreciate any guidance from those more knowledgeable.
My data consists of reads (76 bp) from a novel strain of yeast evolved over a fairly short time under stressed conditions, so presumably many variants are low-frequency (due to the short time) and driven by selection (due to the stress). I've already aligned them using Bowtie 2. Coverage is in the neighborhood of 100x in most places.
Let's say I am just interested in identifying variants (rather than inferring more detailed things like allele frequencies): basically I would like a list of where these variants are, how many reads they appear on, some statistical criterion to evaluate their quality, etc. I have tried SAMtools and FreeBayes for this, but they involve a lot of technical details (allele frequency spectra, genotyping, Bayesian analysis, neutral evolution priors) that I don't yet understand, and I don't know whether they are even relevant for this level of analysis, which I believe to be pretty simple. Ideally, I would prefer something that just scans the alignment, reports every position where reads don't match the reference, and scores each mismatch using the read's mapping quality and the base quality. I would prefer not to impose any prior model (e.g., neutral evolution), but from the documentation I've read, I can't tell whether that is relevant here or not.
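To make the request concrete, here is a rough Python sketch of the naive approach I have in mind. It is only an illustration under simplifying assumptions, not how SAMtools or FreeBayes work: the `Read` and `Candidate` records and the `call_mismatches` and `binom_upper_tail` functions are hypothetical names, reads are assumed to be already parsed out of the BAM file (e.g. with a BAM parser such as pysam), and alignments are assumed ungapped (no indels, 0-based start positions). Reads below a mapping-quality cutoff and bases below a base-quality cutoff are dropped, and each remaining mismatch gets a crude binomial p-value against the mean Phred error rate at that position.

```python
import math
from collections import Counter, defaultdict
from typing import List, NamedTuple


class Read(NamedTuple):
    """Hypothetical simplified aligned read: ungapped, 0-based start on the reference."""
    start: int
    seq: str
    quals: List[int]  # Phred base qualities, one per base
    mapq: int


class Candidate(NamedTuple):
    pos: int
    ref: str
    alt: str
    alt_count: int
    depth: int
    p_value: float


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_mismatches(reference: str, reads: List[Read],
                    min_mapq: int = 20, min_baseq: int = 20) -> List[Candidate]:
    counts = defaultdict(Counter)  # position -> Counter of observed bases
    err_sum = defaultdict(float)   # position -> sum of per-base error probabilities
    for read in reads:
        if read.mapq < min_mapq:   # skip poorly mapped reads entirely
            continue
        for offset, (base, q) in enumerate(zip(read.seq, read.quals)):
            pos = read.start + offset
            if q < min_baseq or pos >= len(reference):
                continue           # skip low-quality bases and overhangs
            counts[pos][base] += 1
            err_sum[pos] += 10 ** (-q / 10)  # Phred score -> error probability
    candidates = []
    for pos in sorted(counts):
        depth = sum(counts[pos].values())
        ref_base = reference[pos]
        mean_err = err_sum[pos] / depth
        for alt, n_alt in counts[pos].items():
            if alt == ref_base:
                continue
            # Assumption: an error is equally likely to produce any of the 3 other bases.
            p = binom_upper_tail(n_alt, depth, mean_err / 3)
            candidates.append(Candidate(pos, ref_base, alt, n_alt, depth, p))
    return candidates
```

With ten good reads over `ACGTACGTAC`, three of which carry a `G` at position 4, plus one read with MAPQ 5 and one mismatch with base quality 10, only the position-4 `G` should come back, with a very small p-value. The binomial score ignores mapping quality beyond the cutoff, strand bias, and indels, which is part of what I suspect the more elaborate models in SAMtools and FreeBayes are accounting for.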
Again, thanks for help anyone can provide!