I have a list of Affy probe sets and their corresponding gene symbols. Does anyone know how I can convert them to the genomic coordinates of the genes so I can intersect them with my ChIP-seq peaks?
-
Originally posted by ETHANol:
I have a list of Affy probe sets and their corresponding gene symbols. Does anyone know how I can convert them to the genomic coordinates of the genes so I can intersect them with my ChIP-seq peaks?
best regards
T.
-
Okay, I have the annotation file from Affymetrix (a CSV file), and it has the probe sets and some genomic coordinates. I have two problems:
1) How can I match the probe sets in the data from the microarray experiment with the probe sets and genomic coordinates in the annotation file?
2) The genomic coordinates are not for the whole gene. How do I get the whole gene (I eventually want promoters)? I think I should be able to do this with Galaxy.
--------------
Ethan
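For question 1, one common approach in R is to join the two tables on the probe set ID column. The following is only a minimal sketch: it assumes both files load as data frames sharing a probe ID column, and the column names (`probe_id`, `logFC`, `chr`, `start`, `end`) and all values are made up for illustration. The real Affymetrix CSV headers will differ.

```r
# Toy stand-ins for the two files (made-up values); in practice these
# would be loaded with read.csv() / read.table() from the real files.
expr <- data.frame(
  probe_id = c("202763_at", "209310_s_at", "207500_at"),
  logFC    = c(1.8, -0.7, 2.3),
  stringsAsFactors = FALSE
)
annot <- data.frame(
  probe_id = c("207500_at", "202763_at", "999999_at"),
  chr      = c("chrA", "chrB", "chrC"),
  start    = c(100, 500, 900),
  end      = c(200, 600, 1000),
  stringsAsFactors = FALSE
)

# Inner join on the shared probe ID column: keeps only probes that
# appear in both tables, regardless of row order in either file.
matched <- merge(expr, annot, by = "probe_id")

# Probes from the experiment that have no entry in the annotation file.
unmatched <- setdiff(expr$probe_id, annot$probe_id)

print(matched)
print(unmatched)
```

Using `merge(..., all.x = TRUE)` instead would keep unannotated probes with `NA` coordinates, which can be easier to audit.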
-
Sorry, I did not understand that you were looking for the coordinates of genes.
There are certainly many possibilities, and Galaxy, which I have never used, might be an option.
If you are a bit familiar with R/Bioconductor, there is a package called 'biomaRt', which is a rather easy tool for retrieving genomic coordinates for almost any feature, and you can query it directly with Affy probe identifiers.
Promoter coordinates are probably trickier because there is no robust promoter annotation, but you could start with a simple approach: define the region 1-2 kb upstream of the TSS as the promoter (this depends very much on the organism) and map your ChIP-seq reads to those regions.
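The "upstream of the TSS" idea has to take strand into account: on the forward strand the TSS is the gene start, on the reverse strand it is the gene end. A minimal sketch in base R, assuming a gene table with Ensembl-style columns (`start_position`, `end_position`, `strand` coded as 1 / -1); the helper name `promoter_window`, the gene IDs, and the coordinates are hypothetical:

```r
# Hypothetical helper: strand-aware promoter window upstream of the TSS.
# Assumes strand is coded 1 (forward) or -1 (reverse), as Ensembl does.
promoter_window <- function(genes, upstream = 2000) {
  plus <- genes$strand == 1
  # TSS is the gene start on the forward strand, the gene end on the reverse.
  tss <- ifelse(plus, genes$start_position, genes$end_position)
  prom_start <- ifelse(plus, tss - upstream, tss + 1)
  prom_end   <- ifelse(plus, tss - 1,        tss + upstream)
  data.frame(
    ensembl_gene_id = genes$ensembl_gene_id,
    chromosome_name = genes$chromosome_name,
    promoter_start  = pmax(prom_start, 1),  # do not run off the chromosome start
    promoter_end    = prom_end,
    strand          = genes$strand,
    stringsAsFactors = FALSE
  )
}

# Made-up genes: one on each strand.
genes <- data.frame(
  ensembl_gene_id = c("geneA", "geneB"),
  chromosome_name = c("1", "2"),
  strand          = c(1, -1),
  start_position  = c(5000, 10000),
  end_position    = c(8000, 12000),
  stringsAsFactors = FALSE
)

promoters <- promoter_window(genes, upstream = 2000)
print(promoters)
```

The window is not clipped at the chromosome end here; that would need chromosome lengths, which this sketch does not have.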
-
R, I was afraid it would come to that. I'm trying to get going with R, but it's difficult and will take some time. BiomaRt looks like the tool I need.
Can someone help me with this? The user guide has instructions for getting the genomic coordinates of the genes for Affy probes. I have a question about the following:
In this line specific probes are assigned to 'affyids':
affyids = c("202763_at", "209310_s_at", "207500_at")
and 'affyids' is the value used to query the data base.
Okay, that's great, but I want to use a list of about 1000 Affy probes that I have as a text file ('mytextfile.txt') in the same directory that I started R in.
Is there a command line (or two...) that I can use to assign my list of Affy probes to 'affyids'?
Sorry if my R-speak is not so good... but I'm a molecular biologist and new to the computer universe.
Thanks a billion to anyone that can help.
--------------
Ethan
-
If you don't want to deal with R/Bioconductor, I guess you could use the web interface of Ensembl/Biomart (http://www.biomart.org/). It's quite intuitive to use.
Nevertheless, it would probably be useful to get to grips with R. See if this bit of code helps:
Code:
## Get the genome coordinates of the genes tagged by a set of
## Affymetrix probe IDs.
## Assuming your file of Affy probe IDs is a single column of
## probe identifiers, one probe per line, no header.
## E.g. something like this:
# Ssc.25128.1.S1_at
# Ssc.6614.1.S1_at
# Ssc.24115.1.A1_at
# Ssc.15874.1.S1_at
# Ssc.30896.2.S1_at

library(biomaRt)

## Read the probe IDs and turn the single column into a plain vector
affyids <- read.table(file = 'mytextfile.txt', header = FALSE)
affyids <- as.vector(affyids[, 1])

mart <- useDataset(dataset = "sscrofa_gene_ensembl",  ## Change dataset here
                   useMart("ensembl"))

probe2gene <- getBM(
    attributes = c('affy_porcine', 'ensembl_gene_id', 'chromosome_name',
                   'strand', 'start_position', 'end_position'),
    filters = 'affy_porcine',  ## Change as appropriate
    values = affyids,
    mart = mart)
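Once gene or promoter coordinates are in a table, the intersection with ChIP-seq peaks asked about at the top of the thread can be sketched in base R. This is only an illustration under stated assumptions: both tables are made up, the column names (`id`, `chr`, `start`, `end`) are hypothetical, and chromosome names are assumed to use the same style in both tables. For real data sets the Bioconductor package GenomicRanges is the more usual tool, since the all-pairs join here can get large.

```r
# Made-up regions (e.g. promoters) and made-up ChIP-seq peaks.
regions <- data.frame(
  id    = c("geneA", "geneB", "geneC"),
  chr   = c("1", "1", "2"),
  start = c(3000, 20000, 500),
  end   = c(4999, 22000, 2500),
  stringsAsFactors = FALSE
)
peaks <- data.frame(
  chr   = c("1", "2", "2"),
  start = c(4500, 2400, 9000),
  end   = c(5200, 2600, 9300),
  stringsAsFactors = FALSE
)

# Pair every region with every peak on the same chromosome...
pairs <- merge(regions, peaks, by = "chr", suffixes = c(".region", ".peak"))

# ...then keep pairs whose closed intervals overlap: two intervals
# overlap when each one starts no later than the other one ends.
hits <- pairs[pairs$start.region <= pairs$end.peak &
              pairs$start.peak   <= pairs$end.region, ]

print(hits)
```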