
**Pseudonym** · 04-15-2012, 04:54 PM

You're almost correct. The names are indeed based on the algorithm, but it's not quite as simple as the inchworm number, then the chrysalis number, then the butterfly number.

This is the basic algorithm of Trinity, assuming that you've already built a de Bruijn assembly graph (which is identical to a (k+1)-mer count):

1. Eagerly extract contigs from the de Bruijn graph. These contigs may or may not have any relationship to "real" transcripts; they're just whatever long contiguous paths happen to be found in the graph.
2. Find reads which justify clustering/joining ("welding" in Trinity terminology) these contigs together. A set of contigs which are believed to belong together is called a "component".
3. Align reads to components. For each read, decide which component it's most likely to belong to.
4. For each component, treat the reads which map to that component as a separate assembly problem. This involves constructing a new (smaller) de Bruijn graph from only those reads which belong to that component.
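To make the "(k+1)-mer count" remark and the read-assignment step a bit more concrete, here is a rough Python sketch. It is not Trinity's code: the function names are made up for illustration, and the "most shared k-mers wins" rule in `assign_read` is a toy stand-in for whatever Chrysalis actually does when it assigns reads to components.

```python
from collections import Counter


def kplus1_mer_counts(reads, k):
    """Count every (k+1)-mer in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k):
            counts[read[i:i + k + 1]] += 1
    return counts


def de_bruijn_edges(counts):
    """Read each (k+1)-mer as a weighted edge: prefix k-mer -> suffix k-mer."""
    return {(kmer[:-1], kmer[1:]): n for kmer, n in counts.items()}


def assign_read(read, components, k):
    """Toy heuristic (an assumption, not Trinity's method): choose the
    component whose contigs share the most k-mers with the read.
    Returns None if no component shares any k-mer."""
    read_kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
    best, best_shared = None, 0
    for name, contigs in components.items():
        comp_kmers = {c[i:i + k] for c in contigs for i in range(len(c) - k + 1)}
        shared = len(read_kmers & comp_kmers)
        if shared > best_shared:
            best, best_shared = name, shared
    return best
```

With `k = 3`, the reads `ACGTA` and `CGTAC` give three distinct 4-mers, and each 4-mer is one edge between two 3-mer nodes. That is the sense in which the graph and the (k+1)-mer count carry the same information.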

The output of Inchworm is the set of contigs. The output of Chrysalis is the set of components, plus the reads which are called as belonging to those components. The output of Butterfly is the called transcripts for each component.

When Butterfly rebuilds a graph for each component and does cleanup, it sometimes finds that the resulting graph is disconnected. This is usually because the initial contigs discovered by Inchworm were not "real" contigs. Each connected component in the rebuilt graph is called a "subcomponent".
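Splitting a rebuilt graph into subcomponents is just the usual connected-components computation. A minimal sketch, assuming the graph is given as a node list plus an edge list and that edge direction is ignored when deciding connectivity (the function name and data layout are mine, not Butterfly's):

```python
def subcomponents(nodes, edges):
    """Return connected components (edge direction ignored) as sorted lists."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    seen = set()
    result = []
    for start in sorted(adj):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack = [start]
        seen.add(start)
        comp = []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result.append(sorted(comp))
    return result
```

If cleanup removes the only edge joining two parts of a component's graph, this returns two lists instead of one, which corresponds to one component ending up with two subcomponents.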

So the "comp" is the component, "c" is the subcomponent, and "seq" is the extracted sequence from the subcomponent.
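If you want to pull those three numbers out of a name programmatically, something like the following should work. It assumes identifiers shaped like `comp<N>_c<N>_seq<N>`, optionally with a leading `>` and trailing header fields; check that against your own Trinity output, since the exact layout is an assumption here and the helper names are hypothetical.

```python
import re
from typing import NamedTuple


class TrinityName(NamedTuple):
    component: int
    subcomponent: int
    sequence: int


# Assumed layout: comp<N>_c<N>_seq<N>, optionally preceded by '>' and
# followed by whitespace plus further header fields.
_NAME = re.compile(r">?comp(\d+)_c(\d+)_seq(\d+)(?:\s|$)")


def parse_trinity_name(name):
    """Split a transcript name into (component, subcomponent, sequence)."""
    m = _NAME.match(name.strip())
    if m is None:
        raise ValueError(f"not a comp/c/seq style name: {name!r}")
    return TrinityName(*(int(g) for g in m.groups()))
```

Transcripts can then be grouped by `(component, subcomponent)` to collect all sequences extracted from the same subcomponent.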

Trinity does not reason in terms of genes, loci and alternative splicing events. It solves a graph problem, though of course the heuristics are tuned to the needs of biology. So while it's highly likely that all the isoforms of a given gene belong to the same subcomponent, you shouldn't assume that a subcomponent is a gene.

**nareshvasani** · 12-11-2013, 08:49 AM

Nice explanation, Pseudonym.



Trinity transcript naming
