We are using Bowtie to generate the BAM file. We intend to use TopHat, Cufflinks, and HTSeq to count the short reads, but we cannot find any GTF file for our species. Can we still use this software? We can only find GFF3 files; is it possible to generate a GTF file ourselves from the GFF3 files?
-
I found a great script for converting GFF3 to GTF, and also one for converting Cufflinks GTF to GFF3; both have saved me much hassle when using data from, and getting data into, GBrowse. By default the gff3togtf script creates gene_id entries in the attributes column, but Cufflinks will only work with gene_name. I've left the script in its original form here, but you should either change the script or post-process the GTF file it produces, e.g. with a sed command.
I've attached them both, and as soon as our server with my notes stored on it is back up again I will edit this reply to link to the originals to make sure credit is given to the right people.
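For anyone who would rather not use sed, the post-processing step mentioned above could be sketched in Python roughly as follows. This is only an illustration, not the attached script: the function name `rename_gtf_attribute` is made up here, and it assumes a tab-separated, nine-column GTF file whose attributes look like `gene_id "g1";`.

```python
import re

def rename_gtf_attribute(line, old_key="gene_id", new_key="gene_name"):
    """Rename one attribute key in the 9th column of a single GTF line.

    Hypothetical helper for illustration; comment lines, blank lines and
    lines without exactly nine tab-separated fields are returned unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line
    # Match the key only where an attribute starts: at the beginning of
    # the column or right after a ';' separator, followed by a quoted value.
    pattern = r"(^|;\s*)" + re.escape(old_key) + r"(\s+\")"
    fields[8] = re.sub(
        pattern,
        lambda m: m.group(1) + new_key + m.group(2),
        fields[8],
    )
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

To convert a whole file, read it line by line, pass each line through the function and write the result to a new file. Swap the two key arguments if your downstream tool expects the opposite naming.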
-
About HTSeq:
Originally posted by Simon Anders: Be sure to read the man page of htseq-count. There are options to tell it what the gene ID attribute is called in your GFF file (Ensembl's standard is "gene_id", but as 'natstreet' just said, you also see 'gene_name', 'ID' or whatever).
Thanks for your advice. It seems that I can get HTSeq running; however, I only get the following output:
50972 GFF lines processed.
100000 reads processed.
200000 reads processed.
300000 reads processed.
400000 reads processed.
500000 reads processed.
600000 reads processed.
700000 reads processed.
727886 reads processed.
13101 229869
no_feature 498017
ambiguous 0
too low aQual 0
not aligned 4460065
But I cannot get the counts for each feature. Could you tell me what I should do to get the number of short reads for each gene or each exon?
Thanks!
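One way to find out which attribute name and feature type to give htseq-count is to list what the annotation file actually contains. Below is a minimal sketch, assuming a tab-separated, nine-column GFF3 or GTF file; the function name `attribute_keys_by_type` is hypothetical and not part of HTSeq.

```python
import re
from collections import defaultdict

def attribute_keys_by_type(gff_lines):
    """Map each feature type (column 3) to the attribute keys seen in column 9.

    Hypothetical helper for illustration. Handles GFF3-style `key=value`
    and GTF-style `key "value"` attributes.
    """
    keys = defaultdict(set)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skip lines that are not standard feature records
        feature_type, attributes = fields[2], fields[8]
        for item in attributes.split(";"):
            # The key is everything up to the first '=' or whitespace.
            match = re.match(r"[^\s=]+", item.strip())
            if match:
                keys[feature_type].add(match.group(0))
    return dict(keys)
```

Calling it as `attribute_keys_by_type(open("annotation.gff3"))` (the file name is a placeholder) shows, for example, whether exon lines carry `Parent`, `ID` or `gene_id`. The matching names can then be passed to htseq-count through its `-t` (feature type) and `-i` (ID attribute) options. The per-feature table is written to standard output, so redirect it to a file to keep it separate from the progress messages.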
-
So you can supply TopHat with a GTF file of annotated transcripts using the --GTF option. Reads are then mapped to those transcripts first, followed by the whole genome, with or without novel junction discovery in this second stage. As I understand it, this applies from TopHat 1.4 onwards.
I'm curious to know how it was before 1.4. I think you could already give TopHat a GTF file, but it used it second. Am I right? If so, what is the difference between using the GTF file first and using it second, after the genome?