SeqMan NGen
Update: I visit this site quite a bit to understand the tools available and where this technology is taking us, but I haven't actually posted in some time. Here's an update if you are interested in DNAStar development.
SeqMan Genome Assembler was an in-house name used during development of the assembly program. SeqMan NGen is the name going forward, as it really is an engine that produces these assemblies for end users, who need no special hardware such as a 64-bit operating system or lots of RAM for the subsequent assembly analysis. Normal computers do the end-user job. Assemblies include siRNA targeting, ChIP-Seq, mRNA alignment to genomic templates, etc., so "Genome Assembler" was a limiting name for the program.
The last few posts dealt with strategies that let sequence aligners compensate for underreporting SNPs in areas heavy with mutation. This gets to the heart of the difference between aligners like MAQ and ELAND and actual contig assemblers that produce .ace files or their equivalent. An aligner throws one read against the template, records where it sticks, and then proceeds to the next read. The big problem is that if there are more than two differences between a read and the reference, the read is thrown out. The output is a big text file.
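To make the read-by-read behavior concrete, here is a minimal Python sketch of an aligner with a mismatch cap. It is a toy illustration only, not how MAQ or ELAND are actually implemented (they use indexing and quality-aware scoring); the function names and the `max_mismatches` parameter are made up for this example.

```python
def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def align_reads(reference, reads, max_mismatches=2):
    """Place each read independently at its best ungapped position.

    Hypothetical toy aligner: a read whose best position still has more
    than max_mismatches differences is discarded, not placed.
    """
    placed, discarded = [], []
    for read in reads:
        best = None  # (position, mismatches) of the best hit so far
        for pos in range(len(reference) - len(read) + 1):
            d = hamming(read, reference[pos:pos + len(read)])
            if d <= max_mismatches and (best is None or d < best[1]):
                best = (pos, d)
        if best is None:
            discarded.append(read)  # too divergent: the read is lost
        else:
            placed.append((read, best[0], best[1]))
    return placed, discarded


reference = "AAAACCCCGGGGTTTT"
reads = ["CCCCGGGG", "CCCAGGGG", "CTCAGAGG"]  # 0, 1 and 3 differences
placed, discarded = align_reads(reference, reads)
print(placed)
print(discarded)
```

Note that each read is judged on its own against the reference. The third read is dropped even if many other reads from the same strain carry the same three differences, which is exactly the underreporting problem in heavily mutated regions.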
NGen performs several passes during the assembly process (and quickly). The first pass does something resembling what aligners do, in that it takes care of the easy reads. In subsequent passes the assembly is completely de novo. All reads are incorporated in the context of the existing reads of the experimental strain. There is no limit to the number of differences between the reference sequence and any given read of the experimental strain, as it is a de novo assembly that disregards the template entirely. No reads are thrown out. There could be eight true SNP differences between your strain and your reference strain within a 35 bp span, for instance, and those SNPs will be reported and can be visually confirmed in the alignment view.
The end user can also filter out false SNPs based on quality score, percentage of reads carrying the SNP at each locus, depth of coverage, and known vs. novel SNPs, using the normal SeqMan interface. SNP reporting also includes the resulting silent or non-silent amino acid changes at specific aa positions at the protein level. The end user actually has a fairly easy job of discerning those SNPs that matter. The strategy for dealing with large indels or transpositions is exquisite, and you will just have to contact us for that, as it is beyond the scope of a board post.
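For anyone who prefers to see the filtering criteria spelled out, here is a rough Python sketch of the same idea applied to a list of SNP calls. This is not SeqMan code or its file format; the `SnpCall` fields, the `filter_snps` function, and the default thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Hypothetical record for one candidate SNP (field names are assumed)."""
    position: int
    depth: int           # reads covering the locus
    alt_count: int       # reads carrying the variant base
    mean_quality: float  # mean base quality of the variant bases
    known: bool          # True if already present in a SNP database


def filter_snps(calls, min_quality=20.0, min_alt_fraction=0.25,
                min_depth=10, novel_only=False):
    """Keep calls that pass the quality, read-fraction and depth thresholds."""
    kept = []
    for c in calls:
        if c.depth == 0 or c.depth < min_depth:
            continue  # not enough coverage to trust the call
        if c.mean_quality < min_quality:
            continue  # variant bases are low quality
        if c.alt_count / c.depth < min_alt_fraction:
            continue  # too few reads support the variant
        if novel_only and c.known:
            continue  # caller asked for novel SNPs only
        kept.append(c)
    return kept


calls = [
    SnpCall(position=101, depth=40, alt_count=38, mean_quality=32.0, known=True),
    SnpCall(position=250, depth=6, alt_count=6, mean_quality=35.0, known=False),
    SnpCall(position=317, depth=50, alt_count=4, mean_quality=30.0, known=False),
    SnpCall(position=480, depth=30, alt_count=15, mean_quality=28.0, known=False),
]
print([c.position for c in filter_snps(calls)])
print([c.position for c in filter_snps(calls, novel_only=True)])
```

The thresholds here are placeholders; sensible values depend on the platform, the coverage, and whether the sample is haploid or diploid.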
Aligners like MAQ are actually very effective if one uses a reference sequence that is "the answer", but that is not necessarily the case in many projects. We are actually introducing a MAQ-like aligner in a couple of weeks for next-gen RNA-Seq comparative gene expression analysis, and the results feed directly into the tools traditionally used for microarray analysis like scatter plots and heat maps. Of course, RNA-Seq is orders of magnitude more sensitive and accurate than microarray.
For sequence assembly, nothing beats an actual assembly rather than a read-by-read alignment text file. Due to computer limitations, aligners that throw reads one at a time at a reference sequence are a necessary evil right now for higher eukaryotes, but that will change soon, and end users will then be able to visualize actual assemblies at any position along the genome.