I've had good success with TopHat and Cufflinks for analysing RNA-Seq data and computing expression changes between samples - thanks to all the authors! I'm pretty new to this game (<6 months' experience with NGS, aligners and UNIX), but these have been helpful programs.
I have an unresolved issue, though: how do these programs assign sequence reads/bundles that map to identical regions of the genome (the mouse genome in this case)? This question may apply to all similar alignment software. As an example, the following transcripts (Ensembl nomenclature):
ENSMUST00000022634 (chromosome 14) - the 'authentic' gene transcript
ENSMUST00000101577 (chromosome 12) - a pseudogene, whose transcript may or may not be translated
are ~1 kb transcripts that are virtually identical (just a few mismatched nucleotides between them). The cufflinks transcripts.gtf, .tmap and .tracking files show that different RPKM values are being generated for each of these two transcripts, and the CUFF.xxx bundles assigned to them are numbered differently.
I'm curious whether identical sequence reads/bundles are being randomly assigned to either the chr14 or the chr12 transcript, or whether they are being assigned to both loci.
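One way I could probably check this empirically is to look at the NH tag in the alignments, which the SAM specification defines as the number of reported alignments for a read. Below is a rough sketch only, not a description of how TopHat or Cufflinks work internally. It assumes the aligner writes NH tags, that the reference uses the names "chr12" and "chr14" (other builds may use "12" and "14"), and the function names are made up for illustration.

```python
from collections import Counter


def parse_nh(sam_line):
    """Return (read_name, chrom, nh) for one SAM alignment line.

    Returns None for header lines, malformed lines and unmapped reads.
    nh is None when the alignment carries no NH tag.
    """
    if sam_line.startswith("@"):
        return None  # header line
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return None  # not a complete alignment record
    flag = int(fields[1])
    if flag & 0x4:
        return None  # read is unmapped
    nh = None
    for tag in fields[11:]:
        if tag.startswith("NH:i:"):
            nh = int(tag[5:])
            break
    return fields[0], fields[2], nh


def multimap_summary(sam_lines, loci=("chr12", "chr14")):
    """Count alignments per chromosome, split into unique (NH == 1),
    multi (NH > 1) and unknown (no NH tag).

    The default chromosome names are an assumption about the reference.
    """
    counts = Counter()
    for line in sam_lines:
        rec = parse_nh(line)
        if rec is None:
            continue
        _, chrom, nh = rec
        if chrom not in loci:
            continue
        if nh is None:
            kind = "unknown"
        elif nh == 1:
            kind = "unique"
        else:
            kind = "multi"
        counts[(chrom, kind)] += 1
    return counts


# Tiny hand-made example: r1 aligns to both loci, r2 only to chr14.
example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr14\t100\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "r1\t256\tchr12\t200\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2",
    "r2\t0\tchr14\t150\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
summary = multimap_summary(example)
```

On real data the lines could come from the text output of `samtools view` run on the TopHat alignment file. If many alignments at the two loci turn out to be "multi", that would suggest the reads are being reported at both places rather than at one chosen at random, though how Cufflinks then weights them is a separate question.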
Thanks for any insight.