**maubp** · 11-29-2012, 07:48 AM

Do you know any scripting/programming language? Both BioPerl and Biopython (and likely other libraries too) could assist you with their EMBL parsers - although in this case you could do this without a full parser.
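As a rough illustration of the "without a full parser" route, here is a minimal sketch in base R. It assumes the file follows the UniProt/EMBL-style flat-file layout (two-letter line codes such as `AC` and `DR`, records terminated by `//`), and that the goal is one line per record listing its GO cross-references. The sample records, accessions, and the helper name `extract_go` are made up for the example; with a real file you would replace the `lines` vector with `readLines()` on your file.

```r
# Made-up two-record sample in UniProt/EMBL-style flat-file layout (not real data).
lines <- c(
  "ID   EXAMPLE1_TEST   Reviewed;   100 AA.",
  "AC   X00001;",
  "GN   Name=geneA;",
  "DR   GO; GO:0000001; C:example component; IEA:Example.",
  "DR   GO; GO:0000002; F:example function; IEA:Example.",
  "//",
  "ID   EXAMPLE2_TEST   Reviewed;   200 AA.",
  "AC   X00002; X00003;",
  "GN   Name=geneB;",
  "DR   GO; GO:0000003; P:example process; IEA:Example.",
  "//"
)

# Give every line a record number; a record ends at its "//" line.
is_end <- lines == "//"
record_id <- cumsum(c(0, head(is_end, -1))) + 1

# Hypothetical helper: turn one record into "accession,GO; id; name;,GO; id; name;"
extract_go <- function(rec) {
  # primary accession = first entry on the first AC line
  ac_line <- grep("^AC   ", rec, value = TRUE)[1]
  acc <- sub(";.*$", "", sub("^AC   ", "", ac_line))
  # GO cross-references = DR lines whose database field is GO
  go <- grep("^DR   GO; ", rec, value = TRUE)
  fields <- strsplit(sub("^DR   GO; ", "", go), "; ", fixed = TRUE)
  go_id <- vapply(fields, function(f) f[1], character(1))
  go_name <- vapply(fields, function(f) f[2], character(1))
  paste0(acc, ",", paste(sprintf("GO; %s; %s;", go_id, go_name), collapse = ","))
}

out <- vapply(split(lines, record_id), extract_go, character(1), USE.NAMES = FALSE)
print(out)
```

Line-prefix matching like this is fragile compared with the BioPerl/Biopython parsers (it ignores wrapped lines, for instance), so treat it as a starting point only.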

**jiaco** · 11-30-2012, 06:02 AM

biomaRt (R/bioconductor): http://www.bioconductor.org/packages...l/biomaRt.html

Code:

library( biomaRt )

uniprot = useMart( "unimart" );
uniprot = useDataset( "uniprot", uniprot );

# inspect these two tables to see what you can search on (filters) and what you can retrieve (attributes)

filters = listFilters( uniprot );
attributes = listAttributes( uniprot )

useFilter = c( "accession" );
useAttributes = c( "accession", "gene_name", "go_id", "go_name" );

query = "P41932";
df = getBM( mart=uniprot, values=c(query), filters=useFilter, attributes=useAttributes )

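# build one line: the gene name followed by every GO id/name pair
nrows = nrow( df );
s = sprintf( "%s", df[1,2] );
# seq_len() avoids the 1:0 trap when the query returns no rows
for( i in seq_len( nrows ) ) {
        s = sprintf( "%s,GO; %s; %s;", s, df[i,3], df[i,4] );
}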

If you have a text file full of accessions and want the output with one gene per line:

Code:

query = read.table( "queryfile.txt" );
# assume 1st column is accession

query = as.character( query[,1] );

mdf = getBM( mart=uniprot, values=query, filters=useFilter, attributes=useAttributes )

uniqueAccs = unique( sort( as.character( mdf[,1] ) ) );
outvec = vector( mode="character", length=0 );
for( acc in uniqueAccs ) {
        df = mdf[ mdf[,1] == acc, ];
        nrows = nrow( df );
        s = sprintf( "%s", df[1,2] );
        for( i in seq_len( nrows ) ) {
                s = sprintf( "%s,GO; %s; %s;", s, df[i,3], df[i,4] );
        }
        outvec = c( outvec, s );
}
write.table( outvec, "myoutfile.txt", quote=FALSE, row.names=FALSE, col.names=FALSE );

(The second code snippet depends on the preamble from the first: the library call, the mart setup, and the useFilter/useAttributes definitions.)
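For what it's worth, the per-accession loop can probably be written without explicit loops. The sketch below uses a made-up data frame as a stand-in for the `getBM()` result (so it runs without a network connection); the column names and values are assumptions for illustration, and the grouping via `tapply` is just one way to do it, not something biomaRt requires.

```r
# Made-up stand-in for the getBM() result (not real annotation data).
mdf <- data.frame(
  accession = c("X00001", "X00001", "X00002"),
  gene_name = c("geneA", "geneA", "geneB"),
  go_id     = c("GO:0000001", "GO:0000002", "GO:0000003"),
  go_name   = c("example component", "example function", "example process"),
  stringsAsFactors = FALSE
)

# One "GO; id; name;" chunk per row, then join the chunks within each accession.
chunk <- sprintf("GO; %s; %s;", mdf$go_id, mdf$go_name)
go_part <- tapply(chunk, mdf$accession, paste, collapse = ",")

# First gene name seen for each accession, in the same (sorted) accession order.
gene <- tapply(mdf$gene_name, mdf$accession, function(g) g[1])

outvec <- unname(paste(gene[names(go_part)], go_part, sep = ","))
print(outvec)
```

The resulting `outvec` has the same one-line-per-accession layout as the loop version and can be passed to the same `write.table()` call.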

EDIT: I realize this does not answer your question directly, but it will get the job done without any need to download EMBL files.

Extracting information from EMBL flat file

Comment

Comment

Latest Articles

ad_right_rmr

News