I'm having some fun with it. It seems to me that I need very detailed knowledge of the transcriptome in the context of a particular sequencing run. For example, we know there are families of genes at different loci that are 50 or 60% similar, which to a biologist sounds fairly separable. To an aligner with 50bp reads, however, those features could share a lot of data when one or the other is expressed. Since most mappers assign equally good hits randomly, that's going to be messy.
So you need to know how much data can be shared between which genes at a specific sequencing type and read length.
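One rough way to put a number on that, just as a sketch: take every read-length window from one gene and ask what fraction also occurs in the other. The function names here (`kmers`, `shared_fraction`) are made up for illustration, the sequences are toy strings, and it only counts exact matches on one strand, so it will understate what a real aligner that tolerates mismatches would treat as ambiguous.

```python
def kmers(seq, k):
    """Return the set of all distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(seq_a, seq_b, read_len):
    """Fraction of distinct read_len-mers in seq_a that also occur exactly in seq_b.

    Assumption: exact, same-strand matches only; no mismatches or indels.
    """
    a = kmers(seq_a, read_len)
    if not a:
        # seq_a is shorter than the read length, so no reads can come from it
        return 0.0
    return len(a & kmers(seq_b, read_len)) / len(a)


# Toy example: two short "genes" that differ only at the 3' end.
gene_a = "ACGTACGTAA"
gene_b = "ACGTACGTCC"
print(shared_fraction(gene_a, gene_b, 4))  # 0.8
print(shared_fraction(gene_a, gene_b, 8))  # fewer shared windows at longer read length
```

Running the same comparison at 36, 50 and 100bp over the real transcript sequences would show how quickly the ambiguity drops as reads get longer.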