Hi.
I am working with Affymetrix microarray data in R and Bioconductor for a course, and for my project I am reanalyzing the U133A and U133B data published by the group on this website:
I have a clear idea about partitioning the data based on my thesis work which was written in C and utilizes the UniProt text flatfile downloadable from this site:
My program can be downloaded from:
I also have a tarball (extract with tar xvfz) at http://kayve.net/promog.tgz, but I have since tweaked promog.c and the Makefile. The tarball also contains write-ups, PowerPoint slides, output from various runs, and an old uniprot_sprot.dat file.
I was attracted by this post:
specifically, the posting of gene_info.gz. I am not sure this is the data I need, and even if it is, I do not fully understand it. I was also investigating the bioDBnet tools, but I am not sure they have what I want.
I have found web-based tools and tutorials that suggest cut-and-paste and XML, but for me the path of least resistance is to write some C code, since the partitioning algorithm I devised is not trivial. I partition all 432,660 proteins in the file, so I won't be cutting and pasting. I'm not really interested in running a lot of low-performance object-oriented code; I just want to understand the data so I can modify my C programs.
Does that gene_info.gz file contain data for converting the Affy arrays I mentioned above to UniProt flat-file accession IDs, so that I can dovetail my partition results with the software I have written myself, or do I need to look at other data? What are the column headers, so that I can understand the data well enough to perform this task?
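To make concrete what I mean by modifying my C programs, here is a minimal sketch (not taken from promog.c) of the kind of line parsing I have in mind. It assumes the flat file lists accessions on lines beginning with "AC   " and carries Entrez Gene cross-references on lines of the form "DR   GeneID; <number>; -."; the names parse_ac_line, parse_geneid_line and MAX_ACC_LEN are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ACC_LEN 16  /* assumption: accessions are short (6 or 10 chars) */

/* Hypothetical helper: copy the accessions found on one "AC" line into out[].
 * Returns the number stored (at most max_out), or 0 if this is not an AC line. */
static int parse_ac_line(const char *line, char out[][MAX_ACC_LEN], int max_out)
{
    assert(line != NULL && out != NULL);
    if (strncmp(line, "AC   ", 5) != 0)
        return 0;

    int n = 0;
    const char *p = line + 5;
    while (*p != '\0' && n < max_out) {
        /* skip the separators between accessions */
        while (*p == ' ' || *p == ';' || *p == '\n' || *p == '\r')
            p++;
        if (*p == '\0')
            break;
        size_t len = strcspn(p, "; \r\n");
        if (len < MAX_ACC_LEN) {       /* silently skip over-long tokens */
            memcpy(out[n], p, len);
            out[n][len] = '\0';
            n++;
        }
        p += len;
    }
    return n;
}

/* Hypothetical helper: return the numeric GeneID from a "DR   GeneID; N; -."
 * line, or -1 if the line is some other kind of line. */
static long parse_geneid_line(const char *line)
{
    long id;
    assert(line != NULL);
    if (sscanf(line, "DR   GeneID; %ld;", &id) == 1)
        return id;
    return -1;
}
```

With something like this I could collect (accession, GeneID) pairs while reading uniprot_sprot.dat with fgets, and then join on whichever column of gene_info.gz holds the gene ID, if that is in fact what the file contains.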