I am working on a pipeline that takes annotations from a tool through several steps and ultimately produces a file in GFF3 format. The last step is a conversion from a GenBank flat file to GFF3.
For the final conversion I am having some difficulty understanding exactly what the relational terms in the Sequence Ontology (SO) database mean.
Specifically, I am trying to understand the is_a, part_of, and member_of relations.
For example, from a given CDS annotation I want to produce gene, intron, and CDS features.
I would like to encode them all using as few intermediate terms (e.g. mRNA, transcript) as possible.
According to the SO (http://www.sequenceontology.org/brow...erm/SO:0000188) an intron can be related to a gene as follows:
an intron is_a primary_transcript_region which is part_of a primary_transcript which is_a transcript which is_a gene_member_region which is a member_of a gene
According to the SO (http://www.sequenceontology.org/brow...erm/SO:0000316) a CDS can be related to a gene as follows:
a CDS is_a mRNA_region which is part_of an mRNA which is_a mature_transcript which is_a transcript which is_a gene_member_region which is a member_of a gene
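To check my own reading of the two chains, I wrote them down as a small lookup table and walked each one up to gene. This is only a sketch: the table holds just the relations quoted above, with a single parent per term, whereas the real ontology may give a term several parents. The names RELATIONS, path_to_gene, and terms_on_path are my own and are not part of any SO tooling.

```python
# Relations copied from the two chains quoted above: child -> (relation, parent).
# Assumption: one parent per term, which simplifies the real ontology.
RELATIONS = {
    "intron": ("is_a", "primary_transcript_region"),
    "primary_transcript_region": ("part_of", "primary_transcript"),
    "primary_transcript": ("is_a", "transcript"),
    "CDS": ("is_a", "mRNA_region"),
    "mRNA_region": ("part_of", "mRNA"),
    "mRNA": ("is_a", "mature_transcript"),
    "mature_transcript": ("is_a", "transcript"),
    "transcript": ("is_a", "gene_member_region"),
    "gene_member_region": ("member_of", "gene"),
}


def path_to_gene(term):
    """Follow the recorded relations upward until no parent is left."""
    steps = []
    while term in RELATIONS:
        relation, parent = RELATIONS[term]
        steps.append((term, relation, parent))
        term = parent
    return steps


def terms_on_path(term):
    """Return the term itself plus every ancestor reached from it."""
    return [term] + [parent for _, _, parent in path_to_gene(term)]


intron_terms = terms_on_path("intron")
cds_terms = terms_on_path("CDS")

# Terms that appear on both chains, in the order the intron chain reaches them.
shared = [t for t in intron_terms if t in cds_terms]

for child, relation, parent in path_to_gene("intron"):
    print(f"{child} {relation} {parent}")
print("shared:", shared)
print("mRNA on the intron chain:", "mRNA" in intron_terms)
```

With only these relations, the two chains first meet at transcript, and mRNA never appears on the intron chain, which is what prompts my question.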
So what it looks like to me at this point is that an mRNA is synonymous with a transcript (related by two is_a steps).
So even though mRNA isn't listed in the relationships to an intron, can I still encode an intron as a child of an mRNA?
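To make the question concrete, here is a minimal Python sketch that prints the layout I have in mind. Everything in it is made up for illustration: the sequence name chr1, the source "example", the coordinates, the IDs, and the helper gff3_line. I do not know whether Parent=mrna1 on the intron line is acceptable; that is exactly what I am asking.

```python
def gff3_line(seqid, ftype, start, end, strand, attrs, phase="."):
    """Build one tab-separated GFF3 feature line (nine columns)."""
    attr_text = ";".join(f"{key}={value}" for key, value in attrs.items())
    columns = [seqid, "example", ftype, str(start), str(end), ".", strand, phase, attr_text]
    return "\t".join(columns)


# Hypothetical gene with a two-segment CDS and one intron between the segments.
features = [
    gff3_line("chr1", "gene", 100, 899, "+", {"ID": "gene1"}),
    gff3_line("chr1", "mRNA", 100, 899, "+", {"ID": "mrna1", "Parent": "gene1"}),
    gff3_line("chr1", "CDS", 100, 399, "+", {"ID": "cds1", "Parent": "mrna1"}, phase="0"),
    # The line in question: an intron whose parent is the mRNA.
    gff3_line("chr1", "intron", 400, 599, "+", {"ID": "intron1", "Parent": "mrna1"}),
    gff3_line("chr1", "CDS", 600, 899, "+", {"ID": "cds1", "Parent": "mrna1"}, phase="0"),
]

print("##gff-version 3")
for line in features:
    print(line)
```

Both CDS segments share one ID because they belong to the same coding sequence, and each segment is 300 bases long, so the second segment also starts at phase 0.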
Am I misunderstanding these relationships?