**para_seq** · 09-15-2009, 01:24 PM

I also have a question about using ERANGE. Where can I get the 'knownGene.txt' file mentioned in the README.build.rds file? I need to see its field definitions so I can format my gene annotation file for building the RDS database (-RNA option). It seems 'knownGene.txt' is not in GFF format. Thank you.

**kmcarr** · 09-16-2009, 05:36 AM

The 'knownGene.txt' file is one of the standard files used for the UCSC Genome browser. You can download the version for the human hg18 assembly from their FTP site here:

ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database

You are correct that it is not GFF format. It is their own format directly related to the structure of the corresponding database table. You can find out about the format of the file from the UCSC Table browser (http://genome.ucsc.edu/cgi-bin/hgTables?command=start). Select "Mammal", "Human", "Mar. 2006" from the "clade", "genome", "assembly" pop-up menus, then select "Genes and Gene Prediction Tracks" and "UCSC Genes" from the "group" and "track" pop-ups respectively. Finally select "knownGene" from the "table" menu and then click the "describe table schema" button. This will show a description of each column of data in the table (or txt file).
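For readers who want to work with the file directly, here is a minimal Python sketch of reading one row. It assumes the 12 tab-separated columns that the schema page lists for the hg18 knownGene table (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, proteinID, alignID); check the schema for your own assembly before relying on it. The function name and the example row are made up for illustration.

```python
from typing import List, NamedTuple


class KnownGene(NamedTuple):
    name: str
    chrom: str
    strand: str
    tx_start: int          # UCSC starts are 0-based; ends are exclusive
    tx_end: int
    cds_start: int
    cds_end: int
    exon_starts: List[int]
    exon_ends: List[int]
    protein_id: str
    align_id: str


def _int_list(text: str) -> List[int]:
    # exonStarts/exonEnds are comma-separated with a trailing comma
    return [int(x) for x in text.split(",") if x]


def parse_known_gene_line(line: str) -> KnownGene:
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    exon_count = int(fields[7])
    exon_starts = _int_list(fields[8])
    exon_ends = _int_list(fields[9])
    if not (len(exon_starts) == len(exon_ends) == exon_count):
        raise ValueError("exonCount does not match the exon lists")
    return KnownGene(
        name=fields[0], chrom=fields[1], strand=fields[2],
        tx_start=int(fields[3]), tx_end=int(fields[4]),
        cds_start=int(fields[5]), cds_end=int(fields[6]),
        exon_starts=exon_starts, exon_ends=exon_ends,
        protein_id=fields[10], align_id=fields[11],
    )


# Invented row, only to show the layout (not a real UCSC record).
EXAMPLE_ROW = "example.1\tchr1\t+\t1000\t5000\t1200\t4800\t2\t1000,3000,\t2000,5000,\tP00000\texample.1\n"
gene = parse_known_gene_line(EXAMPLE_ROW)
```

Keeping the exon lists as integers makes it easy to compare them against your own annotation when you format it to match.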

**para_seq** · 09-16-2009, 09:03 AM

Thank you, kmcarr. That helps a lot. I found the table schema of this file.

**Kasycas** · 11-24-2009, 04:17 AM

spike sequences

Hi,

I am also wondering about the OP's question on spike sequences. They seem to be mentioned only in the ERANGE paper and the online help files, with no explanation of what they actually are! Any ideas?

Thanks,

Kasycas

**manducasexta** · 11-25-2009, 12:15 PM

Hi All -

I might be missing the point of Kasycas and Zhuzhu's questions, but I'll give it a shot anyway:

If you added spiked-in standards to your sample, then you should be able to look up their sequences in any reference source. Only you can know what your spikes are. Mortazavi et al. used in vitro synthesized RNA from Arabidopsis and phage.

elisa*_* · 04-16-2010, 09:50 AM

Hi all and kmcarr,

Do you know how to convert a GFF file to knownGene format? I am working with a species that is not on the UCSC Genome Browser, but I do have a GFF file for gene annotations.

**kmcarr** · 04-16-2010, 11:24 AM

I created a knownGeneTable for TAIR8 starting with the GFF3 file from TAIR. Now whether you consider my method easy or hard depends on whether you are familiar/comfortable with BioPerl, specifically the Bio::DB::SeqFeature module. This is the new preferred back end for GBrowse so I already had many of the components in place. I'll give you the outline but if it all sounds Greek to you then I'm afraid I'll be no help at all.

- Install the latest versions of BioPerl and MySQL.

- Create an empty MySQL database to hold your annotation.

- Load the database with the annotations from your GFF file using the bp_seqfeature_load.pl script (installed as part of BioPerl).

- Use the attached script (changing the -dsn and -user parameters as needed) to query the newly created DB and output the knownGenesTable file.

I know this seems like the long way around to get the file you want but as I said, once you have the database created it is useful for many projects.
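For anyone who does not want to set up BioPerl and MySQL, the same idea can be sketched in plain Python. This is not the attached script and it does not use the database route described above; it is a rough alternative that assumes a simple GFF3 file in which `exon` and `CDS` features point at their transcript through a `Parent=` attribute. It writes the 12 UCSC knownGene columns, leaves proteinID empty, and reuses the transcript ID as alignID. Whether ERANGE needs anything beyond this is something to check against its README. All function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split the GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_known_gene_rows(gff_lines: Iterable[str]) -> List[str]:
    location: Dict[str, Tuple[str, str]] = {}   # transcript -> (chrom, strand)
    exons = defaultdict(list)                   # transcript -> [(start, end)]
    cds = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9 or cols[2] not in ("exon", "CDS"):
            continue
        chrom, strand = cols[0], cols[6]
        # GFF3 is 1-based inclusive; knownGene is 0-based, end-exclusive.
        span = (int(cols[3]) - 1, int(cols[4]))
        parents = parse_attributes(cols[8]).get("Parent", "")
        for parent in filter(None, parents.split(",")):
            location[parent] = (chrom, strand)
            (exons if cols[2] == "exon" else cds)[parent].append(span)

    rows = []
    for tid in sorted(location):
        ex = sorted(exons[tid])
        if not ex:
            continue  # transcript with CDS lines but no exon lines: skipped
        chrom, strand = location[tid]
        tx_start, tx_end = ex[0][0], max(end for _, end in ex)
        if cds[tid]:
            cds_start = min(start for start, _ in cds[tid])
            cds_end = max(end for _, end in cds[tid])
        else:
            # Assumption: mark non-coding transcripts with an empty CDS.
            cds_start = cds_end = tx_start
        starts = "".join(f"{s}," for s, _ in ex)
        ends = "".join(f"{e}," for _, e in ex)
        rows.append("\t".join([tid, chrom, strand, str(tx_start), str(tx_end),
                               str(cds_start), str(cds_end), str(len(ex)),
                               starts, ends, "", tid]))
    return rows


# Tiny invented GFF3 fragment, for illustration only.
EXAMPLE_GFF = """##gff-version 3
Chr1\tTAIR8\tmRNA\t101\t500\t.\t+\t.\tID=tx1;Parent=gene1
Chr1\tTAIR8\texon\t101\t200\t.\t+\t.\tParent=tx1
Chr1\tTAIR8\texon\t301\t500\t.\t+\t.\tParent=tx1
Chr1\tTAIR8\tCDS\t151\t200\t.\t+\t0\tParent=tx1
Chr1\tTAIR8\tCDS\t301\t400\t.\t+\t2\tParent=tx1
"""
rows = gff3_to_known_gene_rows(EXAMPLE_GFF.splitlines())
```

The sketch ignores URL-escaped attribute values and files that describe transcripts without explicit exon lines, so treat its output as a starting point and compare a few rows against the schema by hand.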

Attached file: bp_makeKnownGenesTable.pl (974 bytes)

elisa*_* · 04-16-2010, 11:44 AM

kmcarr,
Thank you very much for the info. I will try this.

**Pankaj** · 05-12-2010, 12:04 PM

Where to download annotation file for human genome

Hello,

I have the sequence coordinates of ChIP-seq peaks. I am trying to map them to the nearest genes. However, I'm having difficulty finding an annotation file to download. I tried UCSC and also RefSeq, but I can't find .gff files. There are so many files and I don't know which to download. I downloaded some of them but they don't have gene names. I just want a file that contains the transcription start site, strand (+/-) and gene name for the human genome. Can anybody please help me with this?

Thanks
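One possible way to approach this with the knownGene table discussed earlier in the thread is sketched below in Python. It assumes you have already reduced the annotation to (name, chrom, strand, txStart, txEnd) tuples with UCSC-style coordinates (0-based start, exclusive end); the function names and the example genes are invented. The `name` column in knownGene holds UCSC IDs, not gene symbols; as far as I know the kgXref table in the same download directory maps those IDs to symbols, but verify that against its schema.

```python
from typing import Iterable, Optional, Tuple

Gene = Tuple[str, str, str, int, int]  # name, chrom, strand, txStart, txEnd


def transcription_start_site(strand: str, tx_start: int, tx_end: int) -> int:
    # On the plus strand transcription begins at txStart; on the minus
    # strand it begins at the last base of the interval (txEnd - 1).
    return tx_start if strand == "+" else tx_end - 1


def nearest_gene(peak_chrom: str, peak_pos: int,
                 genes: Iterable[Gene]) -> Optional[Tuple[str, int, str]]:
    """Return (name, distance to TSS, strand) of the closest gene, or None."""
    best = None
    for name, chrom, strand, tx_start, tx_end in genes:
        if chrom != peak_chrom:
            continue
        distance = abs(peak_pos - transcription_start_site(strand, tx_start, tx_end))
        if best is None or distance < best[1]:
            best = (name, distance, strand)
    return best


# Invented genes, for illustration only.
GENES = [
    ("geneA", "chr1", "+", 1000, 5000),
    ("geneB", "chr1", "-", 2000, 9000),
    ("geneC", "chr2", "+", 100, 200),
]
hit = nearest_gene("chr1", 8500, GENES)
```

This linear scan is fine for a few thousand peaks; for larger sets you would probably sort the start sites per chromosome and use a binary search instead.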


Build expanded genome for ERANGE
