C Linux command line use of gene_info.gz

kayve

Junior Member

Join Date: Jul 2012

Posts: 1


07-12-2012, 05:07 PM

Hi.

I am working with Affy microarray data in R and Bioconductor for a course. For my project I am reanalyzing the U133A and U133B data published by the group behind this GEO record:

GEO Accession viewer

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE362

NCBI's Gene Expression Omnibus (GEO) is a public archive and resource for gene expression data.

I have a clear idea of how to partition the data, based on my thesis work, which was written in C and uses the UniProt text flat file downloadable from this site:

UniProt

http://www.uniprot.org/downloads

My program can be downloaded from:

http://kayve.net/promog

A tarball (extract with tar xvfz) is also available at http://kayve.net/promog.tgz, but I have since tweaked promog.c and the Makefile. The tarball also contains write-ups, PowerPoint slides, various runs, and a lot of other material, including an old uniprot_sprot.dat file.

I was attracted by this post:

entrez ID conversion - SEQanswers

http://seqanswers.com/forums/showthread.php?t=9390

Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

specifically, the posting of gene_info.gz. I am not sure this is the data I need, and even if it is, I do not fully understand it. I was also investigating the bioDBnet tools, but I am not sure they have what I want.


http://biodbnet.abcc.ncifcrf.gov/tools/

I have found web-based tools and tutorials that suggest cut and paste or XML, but for me the path of least resistance is to write some C code to do what I want, since the partitioning algorithm I devised is not trivial. I partition all 432,660 proteins in the file, so I won't be cutting and pasting. I'm not really interested in running a lot of low-performance object-oriented code; I just want to understand the data so I can modify my C programs.

Does that gene_info.gz file contain the data needed to convert the Affy array IDs mentioned above to UniProt flat file accession IDs, so that I can dovetail my partition results with the software I have written myself, or do I need to look at other data? What are the column headers, so that I can understand the data well enough to perform this task?
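For anyone taking the same C route, here is a minimal sketch of the kind of reader this would need. It assumes (not confirmed in this post) that the decompressed gene_info file is tab-delimited and that its first line is a header naming the columns. The function names split_tabs and print_header_columns are made up for illustration and are not part of promog.

```c
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 64

/* Split a tab-delimited line in place. Unlike strtok(), this keeps empty
   fields, so column positions stay aligned. Returns the number of fields.
   If the line has more than max_fields fields, the last slot holds the
   unsplit remainder. */
int split_tabs(char *line, char *fields[], int max_fields)
{
    int n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';   /* strip the trailing newline */
    while (n < max_fields) {
        fields[n++] = p;
        p = strchr(p, '\t');
        if (p == NULL)
            break;
        *p++ = '\0';                      /* terminate field, step past tab */
    }
    return n;
}

/* Print the numbered column names found on the first line of the stream.
   ASSUMPTION: the first line is a tab-delimited header.
   Returns the number of columns, or -1 if the stream is empty. */
int print_header_columns(FILE *in, FILE *out)
{
    static char line[65536];
    char *fields[MAX_FIELDS];
    int i, n;

    if (fgets(line, sizeof line, in) == NULL)
        return -1;
    n = split_tabs(line, fields, MAX_FIELDS);
    for (i = 0; i < n; i++)
        fprintf(out, "%2d  %s\n", i + 1, fields[i]);
    return n;
}
```

Calling print_header_columns(stdin, stdout) from a small main() and piping in the output of zcat gene_info.gz would list the column names without unpacking the file to disk. The same split_tabs routine could then be reused on each data line to pull out whichever columns turn out to matter.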

*----------------------------------------------------------*
Kayven Riese, MSCS,
MS (Physiology and Biophysics)
(415) 902 5513 cellular
http://kayve.net
Webmaster http://ChessYoga.org
*----------------------------------------------------------*
Tags: uniprot affy microarray
