Hi everyone,
My apologies if this question has been asked before, but my searches on the forum came up with nothing.
I got sequencing results back from mouse RNA (75 bp paired-end, 50 million reads) and ran them through a Tophat-Cufflinks-Cuffmerge-Cuffdiff pipeline. This results in a list of differentially transcribed genes. So far, so good.
However, some genes seem to have been 'combined' somewhere along the pipeline; that is, there are multiple gene symbols on one line but only one FPKM value, chromosomal location, etc. Below are two examples:
XLOC_002706 XLOC_002706 Neurod4,Vmn2r84,Vmn2r85,Vmn2r86,Vmn2r87 chr10:130268058-130542669 SC4 SC2 OK 889.112 524.056 -0.762645 -256.757 5,00E-05 0.00061898 yes
XLOC_003158 XLOC_003158 2410006H16Rik,Snord49a,Snord49b chr11:62601222-62670908 SC4 SC2 OK 327.662 373.866 0.19031 0.453688 0.3605 0.742145 no
As you can see, this occurs both when the change is significant and when it is not.
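For anyone who wants to check their own output for the same thing, here is a minimal sketch of how such rows could be picked out. It assumes whitespace-separated columns in the order shown in the two examples above (locus ID first, gene symbols third, region fourth, significance flag last). The function names `parse_row` and `merged_loci` are made up for illustration and are not part of any Cufflinks tool.

```python
# Sketch only: flags rows whose gene column lists more than one symbol.
# Column positions are an assumption based on the two example rows above.

def parse_row(line):
    """Split one result row into the fields needed to spot merged loci."""
    fields = line.split()
    return {
        "locus_id": fields[0],
        "genes": fields[2].split(","),   # comma-separated gene symbols
        "region": fields[3],
        "significant": fields[-1] == "yes",
    }

def merged_loci(lines):
    """Return only the rows that carry more than one gene symbol."""
    rows = [parse_row(line) for line in lines if line.strip()]
    return [row for row in rows if len(row["genes"]) > 1]

example_rows = [
    "XLOC_002706 XLOC_002706 Neurod4,Vmn2r84,Vmn2r85,Vmn2r86,Vmn2r87 "
    "chr10:130268058-130542669 SC4 SC2 OK 889.112 524.056 -0.762645 "
    "-256.757 5,00E-05 0.00061898 yes",
    "XLOC_003158 XLOC_003158 2410006H16Rik,Snord49a,Snord49b "
    "chr11:62601222-62670908 SC4 SC2 OK 327.662 373.866 0.19031 "
    "0.453688 0.3605 0.742145 no",
]

for row in merged_loci(example_rows):
    print(row["locus_id"], len(row["genes"]), row["region"], row["significant"])
```

This only counts the symbols per locus; it does not say why the genes were grouped together.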
Has anyone had this problem before? If so, what is the best solution? Should the reads be trimmed to avoid them overlapping multiple genes and if so, how much trimming is recommended?
Many thanks for your input.