
gtf file for arabidopsis


  • gtf file for arabidopsis

    Does anyone know where I can find a gtf file for arabidopsis or a program that I can use to create one?

    Thanks!

  • #2
    A GTF file of what?
    If it is for gene annotation, maybe try TAIR or TIGR.



    • #3
      Yes, a GTF for gene annotation. I checked TAIR, TIGR, EMBL, etc., and so far have been unable to locate a GTF file. I can only find GFF files for Arabidopsis. It seems EMBL has GTF files for everything except plants. I looked at GBrowse and the UCSC genome browser and I don't see a way to export as a GTF file. I have spent a lot of time on Google and haven't found anything useful.



      • #4
        Why isn't gff ok? Are you looking for a specific field like "transcript_id:" or so?



        • #5
          I'm trying to get the AB whole transcriptome pipeline working and it requires the transcript_id and gene_id fields in the gtf file. I tried the gff and it didn't work. I don't have the programming skills to create a perl script so I was hoping I could download a gtf file or find an application that could create one.



          • #6
            Could you show me a few lines of the GFF file(s) you have, just in case the information is available and easy to convert?



            • #7
              The gff files look like this. I think the major challenge in converting gff to gtf is counting the exons for each transcript.
              --
              Chr1 TAIR9 CDS 3760 3913 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 3996 4276 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 3996 4276 . + 2 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 4486 4605 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 4486 4605 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 4706 5095 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 4706 5095 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 5174 5326 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 5174 5326 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
              Chr1 TAIR9 exon 5439 5899 . + . Parent=AT1G01010.1
              Chr1 TAIR9 CDS 5439 5630 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;

              The GTF file has to be in this format:

              supercont1.1 protein_coding CDS 2191663 2191958 . - 1 gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "4"; protein_id "AAEL000037-PA";
              supercont1.1 protein_coding exon 2191201 2191600 . - . gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "5";
              supercont1.1 protein_coding CDS 2191299 2191600 . - 2 gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "5"; protein_id "AAEL000037-PA";
              supercont1.1 protein_coding stop_codon 2191296 2191298 . - 0 gene_id "AAEL000037"; transcript_id "AAEL000037-RA"; exon_number "5";
              supercont1.1 protein_coding exon 2207362 2207580 . - . gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "1";
              supercont1.1 protein_coding CDS 2207362 2207580 . - 0 gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "1"; protein_id "AAEL000086-PA";
              supercont1.1 protein_coding start_codon 2207578 2207580 . - 0 gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "1";
              supercont1.1 protein_coding exon 2207263 2207299 . - . gene_id "AAEL000086"; transcript_id "AAEL000086-RA"; exon_number "2";



              • #8
                I would recommend taking a look at the Python GFF parsers developed by Brad Chapman. They can be downloaded from GitHub (http://github.com/chapmanb/bcbb/tree/master/gff/). Those scripts can convert between different GFF versions. More information about the scripts can be found in some of his blog posts (http://bcbio.wordpress.com).



                • #9
                  Thanks Andreas. I'll have a look at the GFF parser.



                  • #10
                    gtf for arabidopsis

                    Hi,

                    Did you solve your GTF problem? You can reuse the first 8 fields of the GFF as they are; only the last (ninth) field needs to be changed to include the gene_id, transcript_id, and exon number.
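
As a rough illustration of the ninth-field rewrite described above, here is a minimal Python sketch. It rests on several assumptions that are not confirmed anywhere in this thread: the real GFF file is tab-separated, the ninth field starts with Parent=<transcript ID> (as in the TAIR9 lines quoted in #7), and the gene ID is the transcript ID without its ".N" suffix (AT1G01010.1 becomes AT1G01010). The function name tair_gff_to_gtf_line is made up for this example, and the exon number has to be supplied by the caller.

```python
import re


def tair_gff_to_gtf_line(gff_line, exon_number=None):
    """Rewrite one TAIR-style GFF exon/CDS line as a GTF-style line.

    Assumptions (not verified against the full TAIR file):
    - fields are tab-separated and there are exactly nine of them;
    - the ninth field contains Parent=<transcript_id>[,...];
    - the gene ID is the transcript ID minus its trailing ".N".
    """
    fields = gff_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, found {len(fields)}")

    match = re.search(r"Parent=([^,;]+)", fields[8])
    if match is None:
        raise ValueError("no Parent= attribute in the ninth field")

    transcript_id = match.group(1)
    gene_id = transcript_id.rsplit(".", 1)[0]  # assumed TAIR naming convention

    attributes = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    if exon_number is not None:
        attributes += f' exon_number "{exon_number}";'

    # The first eight fields are copied over unchanged.
    return "\t".join(fields[:8] + [attributes])
```

Fed the second exon line from #7 together with exon_number=2, this should give the same eight leading fields followed by gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "2";. Whether a given pipeline accepts TAIR9 in the source column is something to check separately.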



                    • #11
                      We were able to get what we needed. Thanks!



                      • #12
                        Quick question:
                        How did you deal with transcripts that have different stop codons?



                        • #13
                          I'm not sure I understand what you're asking. Are you referring to splice variants? If so, we wrote a Perl script that reads the file line by line and counts the exons for each transcript. I can send it to you if it would help. In its current form it only works for the GFF file from TAIR.
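
The Perl script itself is not posted in this thread, so the following is only a Python sketch of the general idea of counting exons per transcript, not a port of that script. It assumes tab-separated TAIR-style lines where each exon has a single Parent=<transcript ID>, and it numbers exons from the 5' end of the transcript (descending coordinates on the minus strand), which is what the example GTF lines in #7 appear to do. The name number_exons is hypothetical.

```python
import re
from collections import defaultdict

PARENT_RE = re.compile(r"Parent=([^,;]+)")


def number_exons(gff_lines):
    """Return {(transcript_id, start, end): exon_number} for all exon lines.

    Each transcript is numbered independently, so splice variants of the
    same gene each get their own 1..N numbering.
    """
    exons = defaultdict(list)  # transcript_id -> [(start, end, strand), ...]
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # only exon features are counted
        match = PARENT_RE.search(fields[8])
        if match:
            exons[match.group(1)].append(
                (int(fields[3]), int(fields[4]), fields[6])
            )

    numbering = {}
    for transcript_id, items in exons.items():
        # Assumption: on the minus strand exon 1 has the highest coordinates.
        on_minus_strand = items[0][2] == "-"
        ordered = sorted(items, reverse=on_minus_strand)
        for number, (start, end, _strand) in enumerate(ordered, start=1):
            numbering[(transcript_id, start, end)] = number
    return numbering
```

A CDS line could then take the number of the exon whose coordinates contain it. Because the file is read in full before numbering, the exon lines do not have to be sorted in the input.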



                          • #14
                            Sure that would be great. We are having issues with our gtf file... The file format you refer to seems a bit different from the description on the cufflinks site (http://mblab.wustl.edu/GTF22.html). Did you validate your gtf file? We get errors when we do. Thanks for your help!
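
Before running a full validator, a quick per-line sanity check can catch the most basic problems. The Python sketch below only tests a few things the GTF2.2 description asks for (nine tab-separated fields, integer start <= end, and gene_id and transcript_id attributes in the ninth field). It is an illustration under those assumptions, not a replacement for the validator, and the name check_gtf_line is made up.

```python
import re


def check_gtf_line(line):
    """Return a list of problems found in one GTF line (empty if none).

    Only a handful of basic checks are made; passing them does not mean
    the line is valid GTF.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated fields, found {len(fields)}"]

    problems = []
    if not (fields[3].isdigit() and fields[4].isdigit()):
        problems.append("start and end must be integers")
    elif int(fields[3]) > int(fields[4]):
        problems.append("start is greater than end")

    # Both attributes are expected in the form: key "value";
    for key in ("gene_id", "transcript_id"):
        if not re.search(rf'\b{key} "[^"]*";', fields[8]):
            problems.append(f"missing {key} attribute")
    return problems
```

Running this over an unconverted GFF line such as the ones in #7 should report both attributes as missing, which matches the symptom described earlier in the thread of the pipeline rejecting the GFF.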



                            • #15
                              Dear all,

                              I am facing a similar problem. I badly need a GTF file for the TIGR rice genome v6.0 but have not been able to get one yet. I want to use this GTF file as a refgene list to load into the Broad Institute's IGV browser.
                              Any help would be appreciated.

                              regards,
                              Saha
