**jason_ARGONAUTE** · 10-17-2013, 08:20 AM

Originally posted by jason_ARGONAUTE

Hi guys,
I have a GO file for my differentially expressed genes. It looks like this:

FBgn00001 GO:0016301 [Name:****(annotation)]
FBgn00002 GO:0016301 [Name:****(annotation)]
FBgn00003 GO:0016301 [Name:****(annotation)]
FBgn00004 GO:0003700 [Name:****(annotation)]
FBgn00004 GO:0009651 [Name:****(annotation)]
FBgn00004 GO:0006355 [Name:****(annotation)]
FBgn00005 GO:0009556 [Name:****(annotation)]
FBgn00005 GO:0005515 [Name:****(annotation)]
FBgn00005 GO:0080019 [Name:****(annotation)]
FBgn00005 GO:0016563 [Name:****(annotation)]
FBgn00005 GO:0016627 [Name:****(annotation)]
FBgn00006 GO:0003700 [Name:****(annotation)]
FBgn00006 GO:0010018 [Name:****(annotation)]

Now I want to use WEGO, so I need to convert it to this format:

FBgn00001 GO:0016301
FBgn00002 GO:0016301
FBgn00003 GO:0016301
FBgn00004 GO:0003700 GO:0009651 GO:0006355
FBgn00005 GO:0009556 GO:0005515 GO:0080019 GO:0016563 GO:0016627
FBgn00006 GO:0003700 GO:0010018

I think this could be solved using a Perl script, but I am not able to write one since I am a beginner. Can someone help me out? A simple Perl script is good enough for me ^^

Both files are tab-delimited.

**Ciaran** · 10-18-2013, 07:39 AM

This might help

sed 's/\[.*\]//g' genes_file
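
For readers who prefer Python, the same substitution can be sketched as below. This is only a rough equivalent of the sed command: it drops the bracketed annotation (and the trailing tab it leaves behind) but does not yet merge the GO terms per gene. The helper name `strip_annotation` and the sample line are made up for illustration.

```python
import re

# Same pattern as the sed command: from a "[" through the last "]" on the line.
ANNOTATION = re.compile(r"\[.*\]")

def strip_annotation(line):
    """Remove the bracketed annotation and any whitespace left behind."""
    return ANNOTATION.sub("", line).rstrip()

# Hypothetical sample line in the three-column input format.
sample = "FBgn00004\tGO:0003700\t[Name:****(annotation)]"
print(strip_annotation(sample))
```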

**crazyhottommy** · 10-18-2013, 10:24 AM

This python script should work

import csv
reader = csv.reader(open("GO.txt","r"), delimiter="\t")
new={}
for row in reader:
if row[0] not in new.keys():
new[row[0]] = [row[1]]
else:
new[row[0]].append(row[1])

with open("wego.txt","w") as f:
for key, value in sorted(new.items()):
f.write(key+"\t"+"\t".join(value)+"\n")

**crazyhottommy** · 10-18-2013, 10:27 AM

I don't know why the indentation is messed up....

Originally posted by crazyhottommy

This python script should work

import csv
reader = csv.reader(open("GO.txt","r"), delimiter="\t")
new={}
for row in reader:
if row[0] not in new.keys():
new[row[0]] = [row[1]]
else:
new[row[0]].append(row[1])

with open("wego.txt","w") as f:
for key, value in sorted(new.items()):
f.write(key+"\t"+"\t".join(value)+"\n")

**dpryan** · 10-18-2013, 01:42 PM

Originally posted by crazyhottommy

I don't know why the indentation is messed up....

You need to use the "["CODE"]" tags (remove the quotes). If you go to the advanced mode, then click on the hash tag in the toolbar.

Code:

I'm in a code block
    and I can be indented to not muck up python

**crazyhottommy** · 10-19-2013, 04:55 AM

[/CODE][/CODE]

Originally posted by dpryan

You need to use the "["CODE"]" tags (remove the quotes). If you go to the advanced mode, then click on the hash tag in the toolbar.

Code:

I'm in a code block
    and I can be indented to not muck up python

Test...

Code:

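import csv

# Collect the GO terms for each gene ID (column 1 = gene ID, column 2 = GO term).
new = {}
with open("GO.txt", "r") as infile:
    for row in csv.reader(infile, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        new.setdefault(row[0], []).append(row[1])

# Write one tab-delimited line per gene: the ID followed by all of its GO terms.
with open("wego.txt", "w") as f:
    for key, value in sorted(new.items()):
        f.write(key + "\t" + "\t".join(value) + "\n")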

**crazyhottommy** · 10-21-2013, 11:23 AM

This one line awk can do the trick...

awk '{ if (a[$1]) a[$1]=a[$1]"\t"$2; else a[$1]=$2;} END { for (i in a) print i, a[i]}' OFS="\t" input.txt
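
For anyone new to awk, here is a rough Python sketch of what the one-liner does: it keeps one entry per gene ID and appends each GO term to it. One difference worth noting is that awk's `for (i in a)` visits keys in an unspecified order, whereas this sketch keeps genes in the order they first appear. The function name `group_go_terms` and the sample rows are assumptions for illustration only.

```python
from collections import OrderedDict

def group_go_terms(lines):
    """Map each gene ID to its GO terms, keeping first-seen order."""
    groups = OrderedDict()
    for line in lines:
        fields = line.split()  # split on whitespace, like awk's default
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        groups.setdefault(fields[0], []).append(fields[1])
    return groups

# Hypothetical rows in the same layout as the input file.
sample = [
    "FBgn00004\tGO:0003700\t[Name:****(annotation)]",
    "FBgn00004\tGO:0009651\t[Name:****(annotation)]",
    "FBgn00006\tGO:0003700\t[Name:****(annotation)]",
]
for gene, terms in group_go_terms(sample).items():
    print(gene + "\t" + "\t".join(terms))
```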

**jason_ARGONAUTE** · 11-04-2013, 09:15 PM

I like the simplicity, thank you!

**jason_ARGONAUTE** · 11-04-2013, 09:18 PM

Originally posted by Ciaran

This might help

sed 's/\[.*\]//g' genes_file

I like the simplicity of Linux commands, thank you!

**jason_ARGONAUTE** · 11-04-2013, 09:19 PM

Originally posted by crazyhottommy

This one line awk can do the trick...

awk '{ if (a[$1]) a[$1]=a[$1]"\t"$2; else a[$1]=$2;} END { for (i in a) print i, a[i]}' OFS="\t" input.txt

I'm new to the awk command, but thanks anyway ^^

**jason_ARGONAUTE** · 11-04-2013, 09:23 PM

Originally posted by crazyhottommy

[/CODE][/CODE]

Test...

Code:

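import csv

# Collect the GO terms for each gene ID (column 1 = gene ID, column 2 = GO term).
new = {}
with open("GO.txt", "r") as infile:
    for row in csv.reader(infile, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        new.setdefault(row[0], []).append(row[1])

# Write one tab-delimited line per gene: the ID followed by all of its GO terms.
with open("wego.txt", "w") as f:
    for key, value in sorted(new.items()):
        f.write(key + "\t" + "\t".join(value) + "\n")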

Many people have told me to learn Python instead of Perl. Maybe I'll learn Python someday ^^

**gringer** · 11-05-2013, 03:21 AM

Originally posted by jason_ARGONAUTE

Many people have told me to learn Python instead of Perl. Maybe I'll learn Python someday ^^

A Perl version? Okay, here's something that might work:

Code:

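perl -ane '
  # start a new output line whenever the gene ID changes
  if($gn ne $F[0]){
    print(($gn ? "\n" : "") . $F[0]);
  }
  # append the GO term from the current row, tab-separated
  print "\t" . $F[1];
  $gn = $F[0];
  END {
    print "\n";
  }'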

[The input delimiter can be changed with the -F option, e.g. -F '/\t/']
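
The Perl one-liner works line by line and only compares each gene ID with the previous one, so it relies on all rows for a gene being adjacent in the input (as they are in the example file). A minimal Python sketch of the same streaming idea, using `itertools.groupby`, might look like this; the function name `merge_consecutive` and the sample rows are assumptions for illustration, and every non-blank line is assumed to have at least two columns.

```python
from itertools import groupby

def merge_consecutive(lines):
    """Yield one tab-delimited line per run of consecutive rows sharing a gene ID."""
    # Split on whitespace and ignore blank lines.
    rows = (line.split() for line in lines if line.strip())
    for gene, group in groupby(rows, key=lambda row: row[0]):
        yield "\t".join([gene] + [row[1] for row in group])

# Hypothetical rows, already grouped by gene as in the example input.
sample = [
    "FBgn00003\tGO:0016301\t[Name:****(annotation)]",
    "FBgn00004\tGO:0003700\t[Name:****(annotation)]",
    "FBgn00004\tGO:0009651\t[Name:****(annotation)]",
]
for merged in merge_consecutive(sample):
    print(merged)
```

If the input is not grouped by gene, sorting it first (or collecting everything in a dictionary, as in the Python script earlier in the thread) avoids a gene being split across several output lines.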

gene feature converter