gene feature converter

**jason_ARGONAUTE** · 10-17-2013, 08:20 AM

Hi guys,
I have a GO annotation file for my differentially expressed genes. It looks like this:

FBgn00001 GO:0016301 [Name:****(annotation)]
FBgn00002 GO:0016301 [Name:****(annotation)]
FBgn00003 GO:0016301 [Name:****(annotation)]
FBgn00004 GO:0003700 [Name:****(annotation)]
FBgn00004 GO:0009651 [Name:****(annotation)]
FBgn00004 GO:0006355 [Name:****(annotation)]
FBgn00005 GO:0009556 [Name:****(annotation)]
FBgn00005 GO:0005515 [Name:****(annotation)]
FBgn00005 GO:0080019 [Name:****(annotation)]
FBgn00005 GO:0016563 [Name:****(annotation)]
FBgn00005 GO:0016627 [Name:****(annotation)]
FBgn00006 GO:0003700 [Name:****(annotation)]
FBgn00006 GO:0010018 [Name:****(annotation)]

Now I want to use WEGO, so I need to convert it to one line per gene, like this:

FBgn00001 GO:0016301
FBgn00002 GO:0016301
FBgn00003 GO:0016301
FBgn00004 GO:0003700 GO:0009651 GO:0006355
FBgn00005 GO:0009556 GO:0005515 GO:0080019 GO:0016563 GO:0016627
FBgn00006 GO:0003700 GO:0010018

I think this could be solved with a Perl script, but I am not able to write one myself since I am a beginner. Can someone help me out? A simple Perl script is good enough for me^^

Both files are tab-delimited.

**Ciaran** · 10-18-2013, 07:39 AM

This might help

sed 's/\[.*\]//g' genes_file

**crazyhottommy** · 10-18-2013, 10:24 AM

This python script should work

import csv
reader = csv.reader(open("GO.txt","r"), delimiter="\t")
new={}
for row in reader:
if row[0] not in new.keys():
new[row[0]] = [row[1]]
else:
new[row[0]].append(row[1])

with open("wego.txt","w") as f:
for key, value in sorted(new.items()):
f.write(key+"\t"+"\t".join(value)+"\n")

**crazyhottommy** · 10-18-2013, 10:27 AM

I don't know why the indentation is messed up....

**dpryan** · 10-18-2013, 01:42 PM

Originally posted by crazyhottommy View Post

I don't know why the indentation is messed up....

You need to use the "["CODE"]" tags (remove the quotes). Alternatively, go to the advanced mode and click on the hash symbol (#) in the toolbar.

Code:

I'm in a code block
    and I can be indented to not muck up python

**crazyhottommy** · 10-19-2013, 04:55 AM

Test...

Code:

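import csv

# Collect the GO IDs (column 2) for each gene ID (column 1).
new = {}
with open("GO.txt", "r") as infile:
    reader = csv.reader(infile, delimiter="\t")
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        if row[0] not in new:
            new[row[0]] = [row[1]]
        else:
            new[row[0]].append(row[1])

# Write one line per gene: gene ID, then all of its GO IDs, tab-separated.
with open("wego.txt", "w") as f:
    for key, value in sorted(new.items()):
        f.write(key + "\t" + "\t".join(value) + "\n")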

**crazyhottommy** · 10-21-2013, 11:23 AM

This one line awk can do the trick...

awk '{ if (a[$1]) a[$1]=a[$1]"\t"$2; else a[$1]=$2;} END { for (i in a) print i, a[i]}' OFS="\t" input.txt
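
One caveat with the one-liner: awk's `for (i in a)` visits keys in an unspecified order, so the genes may not come out in the order they appear in the input. As a rough sketch of the same grouping idea in Python that keeps first-seen order (the function name `group_go_terms` is made up for illustration, and it assumes the gene ID and GO ID are the first two tab-separated fields):

```python
def group_go_terms(lines):
    """Collect GO IDs per gene, keeping genes in first-seen order."""
    grouped = {}  # plain dicts keep insertion order in Python 3.7+
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        gene, go_id = fields[0], fields[1]
        grouped.setdefault(gene, []).append(go_id)
    # One output line per gene: gene ID followed by its GO IDs, tab-separated.
    return ["\t".join([gene] + go_ids) for gene, go_ids in grouped.items()]


if __name__ == "__main__":
    sample = [
        "FBgn00004\tGO:0003700\t[Name:x]\n",
        "FBgn00004\tGO:0009651\t[Name:x]\n",
        "FBgn00001\tGO:0016301\t[Name:x]\n",
    ]
    for out_line in group_go_terms(sample):
        print(out_line)
```

To run it on a real file, pass an open file handle instead of the sample list, since iterating a file yields its lines.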

**jason_ARGONAUTE** · 11-04-2013, 09:15 PM

I like the simplicity, thank you!

**jason_ARGONAUTE** · 11-04-2013, 09:18 PM

Originally posted by Ciaran View Post

This might help

sed 's/\[.*\]//g' genes_file

I like the simplicity of Linux commands, thank you!

**jason_ARGONAUTE** · 11-04-2013, 09:19 PM

Originally posted by crazyhottommy View Post

This one line awk can do the trick...

awk '{ if (a[$1]) a[$1]=a[$1]"\t"$2; else a[$1]=$2;} END { for (i in a) print i, a[i]}' OFS="\t" input.txt

I'm new to awk, but thanks anyway^^

**jason_ARGONAUTE** · 11-04-2013, 09:23 PM

Many people have told me to learn Python instead of Perl, maybe I'll learn Python someday^^

**gringer** · 11-05-2013, 03:21 AM

Originally posted by jason_ARGONAUTE View Post

many people told me to learn Python instead of Perl, maybe i'll learn python someday^^

A Perl version? Okay, here's something that might work:

Code:

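perl -ane '
  # assumes all rows for a gene are adjacent in the input
  if ($gn ne $F[0]) {
    # new gene: end the previous line (if any), then print the new gene ID
    print(($gn ? "\n" : "") . $F[0]);
  }
  print "\t" . $F[1];
  $gn = $F[0];
  END {
    print "\n";
  }' GO.txt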

[the input delimiter can be changed with the -F option, e.g. -F '/\t/']
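
Note that the one-liner works line by line and only compares each row with the previous gene ID, so it relies on all rows for a gene being adjacent in the input (as they are in the example file). For comparison, here is a minimal Python sketch of the same streaming idea using `itertools.groupby`; the function name `merge_adjacent` is hypothetical, and like `perl -a` it splits each line on whitespace:

```python
from itertools import groupby


def merge_adjacent(lines):
    """Yield one tab-joined line per run of consecutive rows sharing a gene ID."""
    rows = (line.split() for line in lines)
    rows = (row for row in rows if len(row) >= 2)  # drop blank or short lines
    # groupby only merges neighbouring rows, so unsorted input gives repeated genes.
    for gene, group in groupby(rows, key=lambda row: row[0]):
        yield "\t".join([gene] + [row[1] for row in group])


if __name__ == "__main__":
    sample = [
        "FBgn00005 GO:0009556 [Name:x]",
        "FBgn00005 GO:0005515 [Name:x]",
        "FBgn00006 GO:0003700 [Name:x]",
    ]
    for out_line in merge_adjacent(sample):
        print(out_line)
```

If the file is not already grouped by gene, sorting it on the first column beforehand (or using a dictionary-based script) avoids split entries.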
