  • asking questions about gtf files

    We are using Bowtie to generate the BAM file. We intend to use TopHat, Cufflinks, and HTSeq to count the short reads, but we cannot find any GTF file for our species. Can we still use these tools? We can only find GFF3 files; is it possible to generate a GTF file ourselves from the GFF3 files?

  • #2
    I found a great script for converting gff3 to gtf, and another for converting cufflinks gtf to gff3. Both have saved me much hassle when using data from, and getting data into, GBrowse. By default the gff3togtf script creates gene_id entries in the attributes column, but cufflinks will only work with gene_name. I've left the script in its original form here, so you should either change the script or post-process the gtf file it produces, e.g. with a sed command.

    I've attached them both and as soon as our server with my notes stored on it is back up again I will edit this reply to link to the originals to make sure credit is given to the right people.
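The post-processing step mentioned above could also be done in Python instead of sed. This is only a minimal sketch, not the attached script: `add_gene_name` is a hypothetical helper, it assumes a tab-separated GTF with quoted attribute values, and it keeps `gene_id` while adding a matching `gene_name` rather than renaming the key. Whether cufflinks really needs `gene_name` is the claim made in the post above, not something the sketch checks.

```python
import re

# Matches the GTF-style attribute: gene_id "some_id"
GENE_ID_RE = re.compile(r'gene_id "([^"]*)"')


def add_gene_name(gtf_line: str) -> str:
    """Append gene_name "<gene_id value>" to a GTF line that lacks gene_name."""
    # Leave comments, blank lines and malformed lines untouched.
    if gtf_line.startswith("#") or not gtf_line.strip():
        return gtf_line
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return gtf_line

    attrs = fields[8]
    match = GENE_ID_RE.search(attrs)
    if match and "gene_name " not in attrs:
        attrs = attrs.rstrip()
        if not attrs.endswith(";"):
            attrs += ";"
        attrs += f' gene_name "{match.group(1)}";'
        fields[8] = attrs

    newline = "\n" if gtf_line.endswith("\n") else ""
    return "\t".join(fields) + newline


if __name__ == "__main__":
    example = 'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
    print(add_gene_name(example), end="")
```

Applying the function to every line of the converted file and writing the result to a new file leaves the original gtf untouched, which makes it easy to compare the two.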


    • #3
      Be sure to read the man page of htseq-count. There are options to tell how the gene ID attribute is called in your GFF file (Ensembl's standard is "gene_id", but as 'natstreet' just said, you also see 'gene_name', 'ID' or whatever).
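To decide which attribute name to pass to htseq-count (as far as I know the relevant options are `-i`/`--idattr` for the attribute and `-t`/`--type` for the feature type), it can help to list which attribute keys actually occur in the annotation file. The following is a rough sketch under stated assumptions: `attribute_keys` is a hypothetical helper, the file is tab-separated with nine columns, and attribute values contain no semicolons.

```python
import re
from collections import Counter


def attribute_keys(lines, feature_type="exon"):
    """Count attribute keys in column 9 for lines of the given feature type."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        for item in fields[8].split(";"):
            item = item.strip()
            if not item:
                continue
            # GFF3 writes key=value, GTF writes key "value";
            # splitting on the first '=' or whitespace handles both.
            key = re.split(r"[=\s]", item, maxsplit=1)[0]
            counts[key] += 1
    return counts


if __name__ == "__main__":
    demo = [
        'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
        "chr1\tsrc\texon\t200\t300\t.\t+\t.\tID=e2;Parent=t1\n",
    ]
    for key, n in attribute_keys(demo).most_common():
        print(key, n)
```

A key that appears on every line of the chosen feature type is a reasonable candidate for the ID attribute; a key that is missing from some lines is likely to cause errors or lost counts.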



      • #4
        About HTSeq

        Originally posted by Simon Anders
        Be sure to read the man page of htseq-count. There are options to tell how the gene ID attribute is called in your GFF file (Ensembl's standard is "gene_id", but as 'natstreet' just said, you also see 'gene_name', 'ID' or whatever).
        Thanks for your advice. I was able to get HTSeq running; however, this is the only output I get:

        50972 GFF lines processed.
        100000 reads processed.
        200000 reads processed.
        300000 reads processed.
        400000 reads processed.
        500000 reads processed.
        600000 reads processed.
        700000 reads processed.
        727886 reads processed.
        13101 229869
        no_feature 498017
        ambiguous 0
        too low aQual 0
        not aligned 4460065

        But I cannot find the counts for each feature. Could you tell me what I should do to get the number of short reads for each gene or each exon?

        Thanks!
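For reading output like the listing above, here is a small sketch that separates per-feature rows from the progress messages and the summary counters. It rests on assumptions: `parse_counts` is a hypothetical helper, the summary names are taken from the listing in this post (newer htseq-count versions, as far as I know, prefix them with `__`), and every data row ends in an integer count. In the listing, `13101` is the only per-feature row, which may mean that the chosen ID attribute has the same value on every annotation line.

```python
# Summary counter names as they appear in the output quoted in this thread.
SPECIAL = {"no_feature", "ambiguous", "too low aQual", "not aligned"}


def parse_counts(lines):
    """Split htseq-count style output into (feature_counts, summary_counts)."""
    features, special = {}, {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and progress messages such as "100000 reads processed."
        if not line or line.endswith("processed."):
            continue
        parts = line.rsplit(None, 1)  # the name may contain spaces
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        name, count = parts[0].strip(), int(parts[1])
        # Assumption: newer versions mark summary rows with a "__" prefix.
        if name in SPECIAL or name.startswith("__"):
            special[name] = count
        else:
            features[name] = count
    return features, special


if __name__ == "__main__":
    sample = """50972 GFF lines processed.
727886 reads processed.
13101 229869
no_feature 498017
ambiguous 0
too low aQual 0
not aligned 4460065"""
    features, special = parse_counts(sample.splitlines())
    print(len(features), "feature rows;", special)
```

If the feature dictionary holds only one or a handful of entries for a whole genome annotation, the ID attribute passed to htseq-count is the first thing worth checking.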



        • #5
          Thank you all! I solved the problem.



          • #6
            So you can supply TopHat with a GTF file of annotated transcripts using the --GTF option. Reads are then mapped to those transcripts first and to the whole genome second, with or without novel junction discovery in that second stage. As I understand it, this is the behavior from TopHat 1.4 onward.
            I'm curious to know how it worked before 1.4. I think you could already give TopHat a GTF file, but it was used second. Am I right? If so, what is the difference between using the GTF file first and using it second, after the genome?
