Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • amias
    Junior Member
    • Sep 2014
    • 3

    converting genomic coordinates to transcript coordinates

    Hello,
    I need to convert the genomic coordinates for UTR and CDS features in the arabidopsis gff3 gene annotations to transcript coordinates. Can anyone suggest a way to do this using awk or other languages? I have tried to first calculate the length of the features for each ID, so that I can calculate the feature start and end positions for each transcript based on the length. However I am not sure how to write the code for the second part. I have searched for related threads but none of the solution seemed to provide what I want. Is there already a package or script for this kind of conversion? Any suggestion is most welcome!

    Here is a sample of my input data:
    Code:
    more Athaliana_167_TAIR10.gene.gff3 
    ##gff-version 3
    ##annot-version TAIR10
    Chr1    phytozomev10    gene    3631    5899    .       +       .       ID=AT1G01010.TAIR10;Name=AT1G01010
    Chr1    phytozomev10    mRNA    3631    5899    .       +       .       ID=AT1G01010.1.TAIR10;Name=AT1G01010.1;pacid=19656964;longest=1;Parent=AT1G01010.TAIR10
    Chr1    phytozomev10    five_prime_UTR  3631    3759    .       +       .       ID=AT1G01010.1.TAIR10.five_prime_UTR.1;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    CDS     3760    3913    .       +       0       ID=AT1G01010.1.TAIR10.CDS.1;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    CDS     3996    4276    .       +       2       ID=AT1G01010.1.TAIR10.CDS.2;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    CDS     4486    4605    .       +       0       ID=AT1G01010.1.TAIR10.CDS.3;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    CDS     4706    5095    .       +       0       ID=AT1G01010.1.TAIR10.CDS.4;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    CDS     5174    5326    .       +       0       ID=AT1G01010.1.TAIR10.CDS.5;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    CDS     5439    5630    .       +       0       ID=AT1G01010.1.TAIR10.CDS.6;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    three_prime_UTR 5631    5899    .       +       .       ID=AT1G01010.1.TAIR10.three_prime_UTR.1;Parent=AT1G01010.1.TAIR10;pacid=19656964
    Chr1    phytozomev10    gene    5928    8737    .       -       .       ID=AT1G01020.TAIR10;Name=AT1G01020
    Chr1    phytozomev10    mRNA    5928    8737    .       -       .       ID=AT1G01020.1.TAIR10;Name=AT1G01020.1;pacid=19655142;longest=1;Parent=AT1G01020.TAIR10
    Chr1    phytozomev10    CDS     8571    8666    .       -       0       ID=AT1G01020.1.TAIR10.CDS.1;Parent=AT1G01020.1.TAIR10;pacid=19655142
    Here is my working code based on the thread here for calculating the UTR and CDS feature length for each transcript:

    Code:
    awk '$3!="gene" && $3!="mRNA"' <Athaliana_167_TAIR10.gene.gff3|awk  '$9~/Parent/{sub(/.*Parent=/,"",$9);sub(/.TAIR10;.*/,"",$9);ID=$9;L[ID]=$5-$4+1}{for(i in L){print $0,L[i]}}'

Latest Articles

Collapse

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by SEQadmin2, 06-09-2026, 11:58 AM
0 responses
24 views
0 reactions
Last Post SEQadmin2  
Started by SEQadmin2, 06-05-2026, 10:09 AM
0 responses
29 views
0 reactions
Last Post SEQadmin2  
Started by SEQadmin2, 06-04-2026, 08:59 AM
0 responses
39 views
0 reactions
Last Post SEQadmin2  
Started by SEQadmin2, 06-02-2026, 12:03 PM
0 responses
62 views
0 reactions
Last Post SEQadmin2  
Working...