I have a GFF file of genes that exist in multiple isoforms, often with several mRNA versions per gene. I want to assemble the exons for each gene so that the result represents the longest isoform. I have tried bedtools and gffread, but I only get a separate sequence for each feature in the GFF file. Any tips on how to do this?
Here's an example of my file:
Code:
##gff-version 3
###
scis396	noncoding	gene	286737	287339	.	-	.	ID=scign021128;Name=scign021128
scis396	noncoding	mRNA	286737	287339	3015	-	.	ID=scitn021128.1;Parent=scign021128;Name=scitn021128.1
scis396	noncoding	exon	286737	287339	.	-	.	Parent=scitn021128.1
###
scis673	noncoding	gene	85677	115116	.	+	.	ID=scign002358;Name=scign002358
scis673	noncoding	mRNA	113016	115116	6254	+	.	ID=scitn002358.1;Parent=scign002358;Name=scitn002358.1
scis673	noncoding	exon	113016	113049	.	+	.	Parent=scitn002358.1
scis673	noncoding	exon	113444	114538	.	+	.	Parent=scitn002358.1
scis673	noncoding	exon	114973	115116	.	+	.	Parent=scitn002358.1
scis673	noncoding	mRNA	85677	115099	3835	+	.	ID=scitn002358.2;Parent=scign002358;Name=scitn002358.2
scis673	noncoding	exon	85677	85697	.	+	.	Parent=scitn002358.2
scis673	noncoding	exon	113896	114538	.	+	.	Parent=scitn002358.2
scis673	noncoding	exon	114973	115099	.	+	.	Parent=scitn002358.2
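To make the goal concrete, here is a rough sketch of the selection step in Python. It is not a tested pipeline, and it rests on several assumptions: the file is tab-separated GFF3, every exon lists its mRNA in `Parent`, every mRNA lists its gene in `Parent`, and "longest isoform" means the mRNA with the largest summed exon length (not the largest genomic span). The function names `parse_attributes` and `longest_isoforms` are made up for this illustration.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'}."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)


def longest_isoforms(gff_lines):
    """Return {gene_id: (summed_exon_length, mrna_id, exons)} for the longest mRNA.

    Assumption: 'longest' is measured as the sum of exon lengths.
    Each exon is a tuple (seqid, start, end, strand), sorted by start.
    """
    tx_gene = {}                  # mRNA ID -> gene ID
    tx_exons = defaultdict(list)  # mRNA ID -> list of exon tuples

    for line in gff_lines:
        # Skip blank lines, the ##gff-version header, and ### separators.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
        attr = parse_attributes(attrs)
        if ftype == "mRNA":
            tx_gene[attr["ID"]] = attr["Parent"]
        elif ftype == "exon":
            # GFF3 allows an exon to list several parents, separated by commas.
            for parent in attr["Parent"].split(","):
                tx_exons[parent].append((seqid, int(start), int(end), strand))

    best = {}
    for tx, gene in tx_gene.items():
        exons = sorted(tx_exons.get(tx, []), key=lambda e: e[1])
        # GFF coordinates are 1-based and inclusive, hence the +1.
        length = sum(e[2] - e[1] + 1 for e in exons)
        if gene not in best or length > best[gene][0]:
            best[gene] = (length, tx, exons)
    return best
```

The returned mRNA IDs could then be used to filter the GFF down to one mRNA per gene before extracting spliced sequences at the transcript level. For genes on the minus strand, the concatenated exon sequence would still need to be reverse-complemented.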