While working with the UCSC and RefSeq non-coding data, extracted as instructed, I ran into a strange problem.
First, I collected non-coding transcripts from the RefSeq and UCSC annotations using the filters "NR_" and "cdsStart=cdsEnd" in the UCSC Genome Browser, and coding transcripts using the filters "NM_" and "cdsStart!=cdsEnd".
I would expect that a transcript cannot be both protein-coding and non-coding, but many transcripts appear in both the coding and the non-coding annotations.
For example, I found that NR_02784 and NM_013796 refer to the same transcript.
The same overlap also occurs between the UCSC and RefSeq annotations.
For example, the UCSC lncRNA 'uc009ura.2' is exactly the same as the RefSeq protein-coding transcript NM_027327.
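To pin down how many transcripts are affected, here is a minimal sketch of how such cases could be flagged. It assumes the annotations were exported as genePred-style rows with the usual `name`, `chrom`, `strand`, `cdsStart`, `cdsEnd`, `exonStarts` and `exonEnds` columns, and it treats two transcripts as "the same" only when their exon structure is identical, which may be stricter or looser than what is needed. The accessions and coordinates in `rows` are made up for illustration, and `find_conflicts` is a hypothetical helper, not part of any UCSC tool.

```python
from collections import defaultdict


def is_coding(row):
    """Treat a transcript as coding when its CDS has nonzero length."""
    return row["cdsStart"] != row["cdsEnd"]


def structure_key(row):
    """Transcripts sharing this key have an identical exon structure."""
    return (row["chrom"], row["strand"], row["exonStarts"], row["exonEnds"])


def find_conflicts(rows):
    """Return (coding_names, noncoding_names) for every exon structure
    that is annotated both as coding and as non-coding."""
    groups = defaultdict(list)
    for row in rows:
        groups[structure_key(row)].append(row)

    conflicts = []
    for members in groups.values():
        coding = sorted(r["name"] for r in members if is_coding(r))
        noncoding = sorted(r["name"] for r in members if not is_coding(r))
        if coding and noncoding:
            conflicts.append((coding, noncoding))
    return conflicts


# Made-up example rows (hypothetical accessions and coordinates).
rows = [
    {"name": "NM_EXAMPLE1", "chrom": "chr1", "strand": "+",
     "cdsStart": 150, "cdsEnd": 450,
     "exonStarts": "100,300,", "exonEnds": "200,500,"},
    {"name": "NR_EXAMPLE1", "chrom": "chr1", "strand": "+",
     "cdsStart": 500, "cdsEnd": 500,
     "exonStarts": "100,300,", "exonEnds": "200,500,"},
    {"name": "NR_EXAMPLE2", "chrom": "chr2", "strand": "-",
     "cdsStart": 900, "cdsEnd": 900,
     "exonStarts": "700,", "exonEnds": "900,"},
]

for coding, noncoding in find_conflicts(rows):
    print("coding:", coding, "| non-coding:", noncoding)
```

Grouping by exon structure rather than by name is a deliberate choice here, since the conflicting entries carry different accessions (NM_ versus NR_ or uc IDs) and would never match by name.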
How can this happen?
Does this mean I somehow collected the data in the wrong way, or is it a problem with the annotations themselves?
In the latter case, what can I do about these transcripts? Do I have to exclude all of them from my analysis?
Thanks for your time and help.
HJ.