I recently published an open-source toolkit for high-throughput next-generation sequence analysis of combinatorial selections. The name, a portmanteau of "FASTA format" and "aptamer", reflects how we use the FASTA format: output files stay compatible with downstream tools, while the important metrics (sequence frequency, rank, cluster identity, etc.) are kept in the description line.
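To make the description-line idea concrete, here is a minimal Python sketch of counting, normalizing, and ranking unique sequences and then storing those metrics in a FASTA header. It is not the FASTAptamer source code. The function name `count_and_rank`, the `>rank-reads-RPM` header layout, the reads-per-million normalization, and the tie handling are assumptions made for illustration; the user's guide describes the layout the toolkit actually writes.

```python
from collections import Counter


def count_and_rank(reads):
    """Collapse reads into unique sequences with FASTA-style headers.

    Hypothetical helper: the header layout ">rank-reads-RPM" is an
    assumption for this sketch, not a statement of the toolkit's format.
    """
    counts = Counter(reads)
    total = sum(counts.values())

    # Most abundant first; break ties alphabetically so output is stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

    records = []
    rank = 0
    previous_count = None
    for position, (sequence, n_reads) in enumerate(ordered, start=1):
        # Sequences with equal read counts share a rank (an assumption).
        if n_reads != previous_count:
            rank = position
            previous_count = n_reads
        rpm = n_reads / total * 1_000_000  # reads per million
        records.append((f">{rank}-{n_reads}-{rpm:.2f}", sequence))
    return records


if __name__ == "__main__":
    example_reads = ["ACGT", "ACGT", "ACGT", "GGCC", "TTAA"]
    for header, sequence in count_and_rank(example_reads):
        print(header)
        print(sequence)
```

Because each record is still an ordinary FASTA entry, any tool that reads FASTA can consume the output, and the metrics travel with the sequence.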
Summary
FASTAptamer is an easy-to-use and universally compatible toolkit designed for bench scientists to address the primary sequence analysis needs from high-throughput sequencing of combinatorial selection populations. FASTAptamer performs the simple tasks of counting, normalizing, ranking and sorting the abundance of each unique sequence in a population, comparing sequence distributions for two populations, clustering sequences into sequence families based on Levenshtein edit distance, calculating fold-enrichment for all of the sequences present across populations, and searching degenerately for nucleotide sequence motifs. While FASTAptamer was originally developed for analysis of high-throughput sequencing data from aptamer selections, it offers broad utility for those working on ribozyme or deoxyribozyme (DNAzyme) selections, surface display (phage or yeast display, mRNA display, antibody fragment, etc.) selections, in vivo SELEX, protein mutagenesis selection (deep mutational scanning), directed evolution, or any biocombinatorial selection that results in a DNA-encoded library for sequencing.
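As an illustration of clustering by Levenshtein edit distance, the following Python sketch computes the distance with the standard dynamic-programming recurrence and then groups sequences greedily around seeds. It is a simplified sketch, not the toolkit's implementation: the function names `levenshtein` and `cluster`, the `max_distance` parameter, and the greedy strategy of seeding each cluster with the most abundant unassigned sequence are all assumptions.

```python
def levenshtein(a, b):
    """Return the Levenshtein edit distance between strings a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def cluster(ranked_sequences, max_distance):
    """Greedily group sequences into families.

    ranked_sequences is assumed to be sorted most abundant first. The
    first unassigned sequence seeds a cluster, and every remaining
    sequence within max_distance edits of that seed joins it.
    """
    remaining = list(ranked_sequences)
    clusters = []
    while remaining:
        seed = remaining[0]
        members = [seed]
        unassigned = []
        for sequence in remaining[1:]:
            if levenshtein(seed, sequence) <= max_distance:
                members.append(sequence)
            else:
                unassigned.append(sequence)
        clusters.append(members)
        remaining = unassigned
    return clusters


if __name__ == "__main__":
    ranked = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "ACGAACGT", "TTTTGGGC"]
    for number, family in enumerate(cluster(ranked, max_distance=1), start=1):
        print(number, family)
```

Comparing each candidate only against the seed keeps the sketch short, but every pass still scans all unassigned sequences, so large populations would need a faster approach.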
Check out our publication here: Molecular Therapy Nucleic Acids (2015) 4, e230; doi:10.1038/mtna.2015.4
FASTAptamer software, sample data and a user’s guide are available for download at http://burkelab.missouri.edu/fastaptamer.html.
We have made our tools available on the Galaxy ToolShed (under the "combinatorial selections" category).
We also welcome development of our toolkit over on GitHub!