Unconfigured Ad

**dpryan** · 02-19-2015, 12:38 AM

Cross-posted on biostars.

**AlexReynolds** · 02-19-2015, 01:18 AM

You could probably modify this script for your input sets:

Code:

#!/usr/bin/env python

input_one = '''
BIEC2-99962 HOR_233 G_G
BIEC2-9997 HOR_233 A_G
BIEC2-999748 HOR_233 C_C
BIEC2-999848 HOR_233 G_G
BIEC2-99989 HOR_233 A_A
'''.strip().split()

input_two = '''
BIEC2-9997 HOR_250 A_A
BIEC2-999748 HOR_250 C_C
BIEC2-99989 HOR_250 A_C
'''.strip().split()

input_three = '''
BIEC2-9997 HOR_615 A_G
BIEC2-999748 HOR_615 A_C
BIEC2-999848 HOR_615 A_G
BIEC2-99989 HOR_615 A_C
'''.strip().split()

one_list = input_one[::3]
one_dict = dict(zip(one_list, input_one[2::3]))
two_dict = dict(zip(input_two[::3], input_two[2::3]))
three_dict = dict(zip(input_three[::3], input_three[2::3]))

print '\n'.join([' '.join([k, one_dict[k], two_dict.get(k, 'NA'), three_dict.get(k, 'NA')]) for k in one_list])

The output looks like:

Code:

$ ./join_test.py
BIEC2-99962 G_G NA NA
BIEC2-9997 A_G A_A A_G
BIEC2-999748 C_C C_C A_C
BIEC2-999848 G_G NA A_G
BIEC2-99989 A_A A_C A_C

This seems to match your expected output.

If you want to understand how the script works, use some print statements for each variable before the list comprehension, and then break the list comprehension down into smaller pieces.

Ultimately, you would replace lists input_one, input_two and input_three with the results from reading in your input files with open() and readlines() methods.

Remember to strip() and split() so that each element of the list is separated from the others, regardless of whether the delimiter is a space or newline — use print to investigate one of the sample input lists, if this requirement isn't clear.

I'd second that awk is not really the ideal tool for this job, and I use it a great deal.

Topics	Statistics	Last Post
UC San Diego Bioengineers Map Gene Function in Human Stem Cells by SEQadmin2 Started by SEQadmin2, 07-13-2026, 10:26 AM	0 responses 27 views 0 reactions	Last Post by SEQadmin2 07-13-2026, 10:26 AM
New Analysis Splits Leukemia Into 16 Epigenomic Subgroups by SEQadmin2 Started by SEQadmin2, 07-09-2026, 10:04 AM	0 responses 37 views 0 reactions	Last Post by SEQadmin2 07-09-2026, 10:04 AM
Genome-Wide CRISPR Screen Uncovers Unlikely Psoriasis Target by SEQadmin2 Started by SEQadmin2, 07-08-2026, 10:08 AM	0 responses 24 views 0 reactions	Last Post by SEQadmin2 07-08-2026, 10:08 AM
Engineered Protein Motor Takes Its First Steps Along DNA Track by SEQadmin2 Started by SEQadmin2, 07-07-2026, 11:05 AM	0 responses 34 views 0 reactions	Last Post by SEQadmin2 07-07-2026, 11:05 AM

Unconfigured Ad

Using AWK to perform VLOOKUP-like command

Comment

Comment

Latest Articles

ad_right_rmr

News