  • JonB
    Member
    • Jan 2010
    • 85

    Extract gene sequences from gff3 file and reference fasta

    Hi,

    I have a gff3 file and I want to extract the gene sequences, not including introns. Several genes have many isoforms, but I only want the spliced sequence (i.e. all the exons joined together). Does anyone know of a tool that does this? I tried gffread from the tophat package, but I could not get it to output only the spliced sequence.

    Sample of my gff3 file:
    Code:
    ##gff-version 3
    ###
    scis2053        noncoding       gene    27485   28677   .       -       .       ID=scign013105;Name=scign013105
    scis2053        noncoding       mRNA    27485   28677   5921    -       .       ID=scitn013105.1;Parent=scign013105;Name=scitn013105.1
    scis2053        noncoding       exon    27485   28677   .       -       .       Parent=scitn013105.1
    ###
    scis673 noncoding       gene    85677   115116  .       +       .       ID=scign002358;Name=scign002358
    scis673 noncoding       mRNA    113016  115116  6254    +       .       ID=scitn002358.1;Parent=scign002358;Name=scitn002358.1
    scis673 noncoding       exon    113016  113049  .       +       .       Parent=scitn002358.1
    scis673 noncoding       exon    113444  114538  .       +       .       Parent=scitn002358.1
    scis673 noncoding       exon    114973  115116  .       +       .       Parent=scitn002358.1
    scis673 noncoding       mRNA    85677   115099  3835    +       .       ID=scitn002358.2;Parent=scign002358;Name=scitn002358.2
    scis673 noncoding       exon    85677   85697   .       +       .       Parent=scitn002358.2
    scis673 noncoding       exon    113896  114538  .       +       .       Parent=scitn002358.2
    scis673 noncoding       exon    114973  115099  .       +       .       Parent=scitn002358.2
  • Delphine
    Junior Member
    • Oct 2010
    • 2

    #2
    Hi,

    You can use getfasta from the BEDTools suite (http://bedtools.readthedocs.org/en/l.../getfasta.html). It needs your gff3 file and the FASTA file of your reference genome.
    Usage: bedtools getfasta [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF> -fo <output FASTA>
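
As far as I know, getfasta run directly on a GFF writes one record per feature line, so the exons of a transcript would still need to be joined afterwards. Here is a minimal Python sketch of that joining step, not a tested pipeline. It assumes the gff3 is tab-delimited, that exon rows carry a `Parent=` attribute as in the sample above, and that the reference fits in memory. The function names `read_fasta` and `spliced_transcripts` are made up for this example.

```python
from collections import defaultdict

# Translation table used to reverse-complement minus-strand transcripts.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence ID.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def spliced_transcripts(gff_lines, genome):
    """Return {transcript_id: exons joined in transcript orientation}."""
    exons = defaultdict(list)
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, ##gff-version and ### separators
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # An exon may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    result = {}
    for transcript_id, parts in exons.items():
        parts.sort(key=lambda p: p[1])  # order exons by genomic start
        # GFF coordinates are 1-based and inclusive.
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in parts)
        if parts[0][3] == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]
        result[transcript_id] = seq
    return result
```

To use it, pass open file handles (or lists of lines) for the reference and the annotation, then write each entry of the returned dict as a FASTA record. This gives one sequence per mRNA, so a gene with several isoforms yields several records; choosing a single one per gene (the longest, for example) is a separate decision.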
