  • dingkai0564
    Member
    • Nov 2010
    • 10

    Asking questions about GTF files

    We are using Bowtie to generate the BAM file. We intend to use TopHat, Cufflinks, and HTSeq to count the short reads, but we cannot find any GTF file for our species. Can we still use these programs? We can only find GFF3 files; is it possible to generate a GTF file ourselves from the GFF3 files?
  • natstreet
    Member
    • Nov 2009
    • 83

    #2
    I found a great script for converting GFF3 to GTF and another for converting Cufflinks GTF to GFF3. Both have saved me much hassle when taking data from, and getting data into, GBrowse. By default the gff3togtf script creates gene_id entries in the attributes column, but Cufflinks will only work with gene_name. I've left the script in its original form here, so you should either change the script or post-process the GTF file it produces, e.g. with a sed command.

    I've attached them both and as soon as our server with my notes stored on it is back up again I will edit this reply to link to the originals to make sure credit is given to the right people.
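The kind of conversion described in this post can be sketched in a few lines of Python. This is not the attached script, only a minimal illustration that assumes a simple gene -> mRNA -> exon hierarchy linked through the GFF3 `ID` and `Parent` attributes. The function names and the `add_gene_name` switch are hypothetical; the switch adds a `gene_name` entry next to `gene_id` for tools that look for that key.

```python
def parse_gff3_attributes(column9):
    """Turn 'ID=x;Parent=y' into {'ID': 'x', 'Parent': 'y'}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_gtf(gff3_lines, add_gene_name=False):
    """Convert exon/CDS lines of a simple GFF3 hierarchy to GTF lines.

    Assumption: transcripts have type 'mRNA' or 'transcript' and point to
    their gene through 'Parent'; exons/CDS point to their transcript(s).
    """
    rows = [line.rstrip("\n").split("\t") for line in gff3_lines
            if line.strip() and not line.startswith("#")]
    rows = [row for row in rows if len(row) == 9]

    # First pass: remember which gene each transcript belongs to.
    transcript_to_gene = {}
    for row in rows:
        attrs = parse_gff3_attributes(row[8])
        if row[2] in ("mRNA", "transcript") and "ID" in attrs:
            transcript_to_gene[attrs["ID"]] = attrs.get("Parent", attrs["ID"])

    # Second pass: write one GTF line per exon/CDS and parent transcript.
    gtf_lines = []
    for row in rows:
        if row[2] not in ("exon", "CDS"):
            continue
        attrs = parse_gff3_attributes(row[8])
        for transcript_id in attrs.get("Parent", "").split(","):
            if not transcript_id:
                continue
            # Fall back to the transcript ID if no gene parent is known.
            gene_id = transcript_to_gene.get(transcript_id, transcript_id)
            gtf_attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
            if add_gene_name:
                gtf_attrs += f' gene_name "{gene_id}";'
            gtf_lines.append("\t".join(row[:8] + [gtf_attrs]))
    return gtf_lines
```

Real GFF3 files often deviate from this layout (other transcript types, missing gene features, URL-escaped values), so the output should be checked by eye before it is passed to Cufflinks or htseq-count.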

    • Simon Anders
      Senior Member
      • Feb 2010
      • 995

      #3
      Be sure to read the man page of htseq-count. There are options for specifying the name of the gene ID attribute in your GFF file (Ensembl's standard is "gene_id", but as 'natstreet' just said, you also see 'gene_name', 'ID' or whatever).
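Before choosing a value for the ID attribute option (`-i`/`--idattr` in htseq-count, with `-t`/`--type` selecting the feature type), it can help to see which attribute keys the annotation file actually contains. The following is a minimal sketch, not part of HTSeq; the function name `attribute_keys` is hypothetical, and it assumes tab-separated nine-column GFF3 or GTF lines.

```python
import re
from collections import Counter


def attribute_keys(annotation_lines, feature_type="exon"):
    """Count the attribute keys used in column 9 for one feature type."""
    counts = Counter()
    for line in annotation_lines:
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != feature_type:
            continue
        for field in columns[8].split(";"):
            field = field.strip()
            if not field:
                continue
            # GFF3 writes key=value, GTF writes key "value";
            # the key is whatever precedes the first '=' or whitespace.
            key = re.split(r"[=\s]", field, maxsplit=1)[0]
            counts[key] += 1
    return counts
```

A key that appears on every line of the chosen feature type and takes a different value per gene is a reasonable candidate for the ID attribute.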


      • dingkai0564
        Member
        • Nov 2010
        • 10

        #4
        About HTSeq

        Thanks for your advice. It seems that I can get HTSeq running; however, the only output I get is:

        50972 GFF lines processed.
        100000 reads processed.
        200000 reads processed.
        300000 reads processed.
        400000 reads processed.
        500000 reads processed.
        600000 reads processed.
        700000 reads processed.
        727886 reads processed.
        13101 229869
        no_feature 498017
        ambiguous 0
        too low aQual 0
        not aligned 4460065

        But I cannot get the counts for each feature. Could you tell me what I should do to get the number of short reads for each gene or each exon?

        Thanks!
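The output quoted in this post mixes progress messages with the count table. Below is a minimal sketch for separating the two, assuming each table line is a name followed by a whitespace-separated integer, as shown above. The function name is hypothetical, and the summary labels are copied from the post; other HTSeq versions may label them differently.

```python
# Summary labels exactly as they appear in the quoted output.
SUMMARY_NAMES = {"no_feature", "ambiguous", "too low aQual", "not aligned"}


def split_htseq_output(lines):
    """Return (per-feature counts, summary counters) as two dicts."""
    feature_counts, summary = {}, {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and progress messages such as
        # '100000 reads processed.'
        if not line or line.endswith("processed."):
            continue
        parts = line.rsplit(None, 1)  # the name may itself contain spaces
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        name, count = parts[0], int(parts[1])
        if name in SUMMARY_NAMES:
            summary[name] = count
        else:
            feature_counts[name] = count
    return feature_counts, summary
```

Applied to the lines quoted in the post, the per-feature table holds a single entry, `13101`. That may mean every counted GFF line carried the same value for the chosen ID attribute, so the attribute setting is worth checking.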


        • dingkai0564
          Member
          • Nov 2010
          • 10

          #5
          Thank you all! I solved the problem.


          • carmeyeii
            Senior Member
            • Mar 2011
            • 137

            #6
            So you can supply TopHat with a GTF file of annotated transcripts using the --GTF option. Reads are then mapped to those transcripts first, followed by the whole genome, with or without novel junction discovery in this second stage. As I understand it, this is how it works from TopHat 1.4 onward.
            I'm curious to know how it was before 1.4. I think you could already give TopHat a GTF file, but it used it second. Am I right? If so, what is the difference between using the GTF file first and using it second, after the genome?
