  • TAIR "GFF3" files are not GFF3?

    Does anyone have an easy fix for the TAIR (Arabidopsis thaliana) "GFF3" files? Every GFF3 validator I can find online reports that they are riddled with formatting errors. I've tried the .gff files from several TAIR genome versions, e.g. TAIR10_GFF3_genes.gff, and none of them work.

  • #2
    What are you trying to use them with? I have not had any problems. TAIR does use some unconventional naming, but the formats work fine.

    • #3
      Trying to use them with a Perl-based pipeline, "GENE-counter". I'll keep looking for a fix, but it's a little disappointing that the TAIR sequence names don't even match between the .gff and genome.fa...

      Thanks for the reply

      • #4
        The only sequence names that don't match, as far as I am aware, are those of the mitochondrion and chloroplast. In the gff these are named "Mitochondria" and "Chloroplast", while in the genome.fa they are listed as "ChrM" and "ChrC" respectively (or the other way around, it's been a while since I looked at it). Chr1-5 should be identical.

        Given that Arabidopsis only has the 5 chromosomes plus mitochondria and chloroplast, the easiest solution is to just go into the genome.fa and change the names of the chromosomes to match the gff. It takes all of 10 seconds and usually solves most discrepancies.
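For anyone who would rather script the check and the rename than edit genome.fa by hand, here is a minimal Python sketch. The function names are made up for illustration, and the name mapping is an assumption taken from the recollection above ("ChrM"/"ChrC" in the FASTA, "Mitochondria"/"Chloroplast" in the gff). Compare the two sets of IDs first and flip the mapping if your files turn out to be the other way around.

```python
def fasta_ids(lines):
    """Collect sequence IDs (the first token after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gff_seqids(lines):
    """Collect the column-1 sequence IDs from GFF feature lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section, if present, is not feature data
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        ids.add(line.split("\t", 1)[0])
    return ids


def rename_fasta_headers(lines, name_map):
    """Yield FASTA lines with each header's sequence ID replaced via name_map.

    IDs missing from name_map are left unchanged, and any description text
    after the ID is kept.
    """
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].rstrip("\n").split(None, 1)
            if tokens:
                tokens[0] = name_map.get(tokens[0], tokens[0])
            yield ">" + " ".join(tokens) + "\n"
        else:
            yield line
```

With both files opened, `gff_seqids(gff_file) - fasta_ids(fasta_file)` lists the names the gff uses that the FASTA lacks. Writing the output of `rename_fasta_headers(fasta_file, {"ChrM": "Mitochondria", "ChrC": "Chloroplast"})` to a new file then gives a genome.fa that matches, assuming the mismatch is in that direction.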
        Last edited by chadn737; 04-20-2013, 11:31 AM.

        • #5
          Yes, I've had to change those as you describe. I've managed to work around some of the GENE-counter issues, which I think might have been the bigger problem. Thanks again for the input!

          • #6
            There are a few gene names containing semicolons in the TAIR GFF files which confuse a lot of parsers. Removing these might help.
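To find the offending records before deciding how to fix them, something like the following Python sketch may help. It is a heuristic, not a validator: it flags any feature line that does not have nine tab-separated columns, or whose attribute column (column 9), once split on ";", contains a piece without an "=". That catches both a stray semicolon inside a value and a trailing semicolon at the end of the line. The function name is hypothetical.

```python
def suspicious_attribute_lines(lines):
    """Return (line_number, text) pairs for GFF lines that look malformed.

    A line is flagged if it does not have nine tab-separated columns, or if
    splitting column 9 on ';' leaves a piece that is not of the form key=value.
    """
    flagged = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            flagged.append((number, line.rstrip("\n")))
            continue
        # A ';' inside a value leaves a piece with no '=';
        # a trailing ';' leaves an empty final piece.
        if any("=" not in piece for piece in columns[8].split(";")):
            flagged.append((number, columns[8]))
    return flagged
```

Running it over the lines of the gff file gives the line numbers to inspect; whether to delete, escape, or rewrite the semicolons depends on what the downstream parser expects.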

            • #7
              I'll look into that, thanks.

              • #8
                We discuss how to fix at least one set of problems with the TAIR version of GFF files here:



                If you don't need to convert TAIR GFF to BED, you could just pipe the result of the GNU awk statement to a GFF file, e.g.:

                $ awk '{gsub(/;$/,"");print}' TAIR9_GFF3_genes.gff > TAIR9_GFF3_genes.fixed.gff
