I am new to SEQanswers and have already searched for answers to some of my questions, but forgive me if there are existing posts that address them.
I am using Illumina sequencing for a resequencing project to explore the genetic diversity of an RNA virus population (RNA viruses have on average about one mutation per genome per replication cycle, so there are a lot of SNPs). I am trying to get up to speed on analysis programs and am learning basic Python as well. I would like to minimize the number of programs I use, but I realize I may need several to achieve my analytic goals, which are:
1. Align reads to my reference genome (~7 kb) and generate data/histograms of per-base coverage
2. Identify unique reads
3. Identify single nucleotide polymorphisms and their approximate frequency
4. Determine whether each polymorphism is synonymous or nonsynonymous (i.e., whether it changes the amino acid), based on the known viral reference sequence (many tools I have found are built for humans, mice, yeast, etc.). If I need to write a script myself, that's fine, but does anyone have suggestions on how to incorporate the coding frame into short-read analysis?
5. Get frequency counts of each type of polymorphism (NS vs. S, charge changes, stop codons) at the per-codon or per-base level
6. Map polymorphisms to the reference genome and, if possible, localize them to a given protein or protein subdomain
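For #1 to #3, once reads are aligned, per-base coverage and SNP frequencies both fall out of a pileup: at each reference position, count how many reads show A, C, G or T. Below is a minimal plain-Python sketch, assuming reads have already been aligned without gaps and are given as hypothetical (start, sequence) pairs; it interprets "unique reads" as distinct read sequences, which may not be exactly what you mean. The function names are made up for illustration, and the threshold defaults are arbitrary placeholders; any real cutoff has to sit above the sequencing error rate of your run.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def pileup(reference, aligned_reads):
    """Count A/C/G/T observed at every reference position.

    aligned_reads: iterable of (start, sequence) pairs with a 0-based start.
    Assumes ungapped alignments (no indels, no soft clipping); real SAM/BAM
    records would need their CIGAR strings applied first.
    """
    counts = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if 0 <= pos < len(reference) and base in VALID_BASES:
                counts[pos][base] += 1
    return counts


def coverage(counts):
    """Per-base depth: number of A/C/G/T calls at each position."""
    return [sum(c.values()) for c in counts]


def call_variants(reference, counts, min_depth=10, min_freq=0.01):
    """List (pos, ref_base, alt_base, alt_freq) for non-reference bases.

    min_depth and min_freq are placeholder thresholds, not recommendations.
    """
    variants = []
    for pos, c in enumerate(counts):
        depth = sum(c.values())
        if depth < min_depth:
            continue
        ref_base = reference[pos].upper()
        for base, n in sorted(c.items()):
            freq = n / depth
            if base != ref_base and freq >= min_freq:
                variants.append((pos, ref_base, base, freq))
    return variants


def unique_reads(aligned_reads):
    """Count how many times each distinct read sequence occurs."""
    return Counter(seq.upper() for _, seq in aligned_reads)


if __name__ == "__main__":
    # Toy data for illustration only.
    reference = "ATGGCTAAA"
    reads = [(0, "ATGGCT"), (0, "ATGGCT"), (0, "ATGGTT"), (3, "GCTAAA"), (3, "GTTAAA")]
    counts = pileup(reference, reads)
    print(coverage(counts))  # → [3, 3, 3, 5, 5, 5, 2, 2, 2]
    print(call_variants(reference, counts, min_depth=1, min_freq=0.1))  # → [(4, 'C', 'T', 0.4)]
    print(unique_reads(reads).most_common())
```

In practice the aligned reads would come from an aligner's SAM/BAM output, and tools such as samtools mpileup or the pysam library can produce per-position base counts directly, handling indels and base qualities that this sketch ignores. The coverage list can then be plotted as a histogram with any plotting library.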
Any help with any of these is much appreciated. I anticipate my biggest hurdles will be #4 and #5.
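For #4 and #5, one common approach is to take the reading frame from the reference rather than from the reads: once SNPs are called against the reference, each SNP's codon is looked up in the annotated ORF, and the reference and mutant codons are translated. A minimal Python sketch follows, assuming the standard genetic code, an ORF on the positive strand with known 0-based coordinates (the `cds_start`/`cds_end` values would come from your reference annotation), and that each SNV is classified independently against the reference codon; two SNPs in the same codon on the same read would need haplotype-aware handling that this ignores. The charge grouping (K/R positive, D/E negative, H treated as neutral) is a simplification, and `classify_snv` and `tally` are hypothetical helper names, not part of any existing tool.

```python
from collections import Counter

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Simplified side-chain charge; H is treated as uncharged (an assumption).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def classify_snv(reference, cds_start, cds_end, pos, alt):
    """Classify one single-nucleotide change at 0-based genome position pos.

    cds_start/cds_end: 0-based, end-exclusive bounds of an ORF read on the
    positive strand. Returns None outside the ORF or for ambiguous codons.
    """
    if not cds_start <= pos < cds_end:
        return None
    codon_index, codon_pos = divmod(pos - cds_start, 3)
    codon_start = cds_start + 3 * codon_index
    ref_codon = reference[codon_start:codon_start + 3].upper().replace("U", "T")
    if ref_codon not in CODON_TABLE:
        return None  # partial codon at the ORF end, or an N in the reference
    alt_codon = ref_codon[:codon_pos] + alt.upper().replace("U", "T") + ref_codon[codon_pos + 1:]
    if alt_codon not in CODON_TABLE:
        return None  # ambiguous alternate base
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "stop_gained"
    elif ref_aa == "*":
        kind = "stop_lost"
    else:
        kind = "nonsynonymous"
    return {
        "codon": codon_index + 1,    # 1-based amino acid position in the ORF
        "codon_pos": codon_pos + 1,  # 1st, 2nd or 3rd base of the codon
        "ref_codon": ref_codon,
        "alt_codon": alt_codon,
        "ref_aa": ref_aa,
        "alt_aa": alt_aa,
        "type": kind,
        "charge_change": CHARGE.get(alt_aa, 0) - CHARGE.get(ref_aa, 0),
    }


def tally(classified):
    """Summarise classified variants overall and per codon."""
    by_type = Counter(c["type"] for c in classified)
    by_codon = Counter((c["codon"], c["type"]) for c in classified)
    charge_changing = sum(1 for c in classified if c["charge_change"] != 0)
    return by_type, by_codon, charge_changing


if __name__ == "__main__":
    # Toy ORF: ATG GCT GAA TGG TAA (Met-Ala-Glu-Trp-stop).
    reference = "ATGGCTGAATGGTAA"
    snvs = [(5, "C"), (6, "A"), (11, "A")]  # hypothetical variant calls
    results = [classify_snv(reference, 0, len(reference), p, b) for p, b in snvs]
    results = [r for r in results if r is not None]
    for r in results:
        print(r["codon"], r["ref_aa"], "->", r["alt_aa"], r["type"])
    print(tally(results))
```

The `codon` field gives the amino acid position within the ORF, so localizing a variant to a protein or domain (#6) reduces to comparing that number against the domain boundaries in your reference annotation. Weighting each variant by the frequency from the pileup, instead of counting it once, would give frequency-weighted tallies.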
Thanks!