Greetings, I am trying to produce a sample matrix with samples as columns and gene IDs as rows, where each cell holds the raw hit count for that gene in that sample. For COG categories I have been able to build these tables quite easily in Excel, but for other, more complex gene-ontology databases I have been running into problems. What I currently have is a table that looks like this:
>count_pFAM_samplelist
count pFAM sample
1 2 1A1D_ACIAC/10-323 1
2 3 1A1D_AGRRK/9-322 1
3 1 1A1D_BURCC/10-323 1
4 1 1A1D_CUPNH/9-323 1
5 1 1A1D_METNO/9-322 1
6 2 1A1D_METPP/10-323 1
7 3 1A1D_METS4/9-322 1
8 1 1A1D_PSES0/11-323 1
9 1 1A1D_PSEUD/10-323 1
10 2 1A1D_RHIRD/10-322 1
11 2 14312_ARATH/10-245 2
12 1 1433_EIMTE/9-256 2
13 1 1433_SPIOL/1-198 2
14 1 1A1D_ACIAC/10-323 2
15 4 1A1D_AGRRK/9-322 2
16 1 1A1D_CUPNH/9-323 2
17 2 1A1D_METNO/9-322 2
18 6 1A1D_METPP/10-323 2
19 1 1A1D_METS4/9-322 2
20 2 1A1D_PSEUD/10-323 2
What I would like to do is transform this into a matrix with "sample" as columns, pFAM as rows, and the count in each cell. Any suggestions would be appreciated. Thanks,
-Tony
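The transformation being asked for is a long-to-wide pivot (a cross-tabulation): each (count, pFAM, sample) row becomes one cell of a pFAM x sample matrix. Below is a minimal sketch in plain Python, assuming the rows have already been read into (count, pFAM, sample) tuples. The function names `pivot_counts` and `to_tsv` are hypothetical helpers for illustration. It also assumes two choices the post does not specify: pFAM/sample pairs that never appear are filled with 0, and a repeated pair has its counts summed. The output is tab-separated so it can be opened in Excel.

```python
import csv
import io
from collections import defaultdict

# (count, pFAM, sample) rows copied from the count_pFAM_samplelist table above
records = [
    (2, "1A1D_ACIAC/10-323", "1"), (3, "1A1D_AGRRK/9-322", "1"),
    (1, "1A1D_BURCC/10-323", "1"), (1, "1A1D_CUPNH/9-323", "1"),
    (1, "1A1D_METNO/9-322", "1"), (2, "1A1D_METPP/10-323", "1"),
    (3, "1A1D_METS4/9-322", "1"), (1, "1A1D_PSES0/11-323", "1"),
    (1, "1A1D_PSEUD/10-323", "1"), (2, "1A1D_RHIRD/10-322", "1"),
    (2, "14312_ARATH/10-245", "2"), (1, "1433_EIMTE/9-256", "2"),
    (1, "1433_SPIOL/1-198", "2"), (1, "1A1D_ACIAC/10-323", "2"),
    (4, "1A1D_AGRRK/9-322", "2"), (1, "1A1D_CUPNH/9-323", "2"),
    (2, "1A1D_METNO/9-322", "2"), (6, "1A1D_METPP/10-323", "2"),
    (1, "1A1D_METS4/9-322", "2"), (2, "1A1D_PSEUD/10-323", "2"),
]


def pivot_counts(rows):
    """Turn (count, pfam, sample) rows into a pFAM x sample count matrix.

    Returns (pfams, samples, matrix), where matrix[i][j] is the count for
    pfams[i] in samples[j]. Assumption: missing pairs become 0 and
    repeated (pfam, sample) pairs are summed.
    """
    totals = defaultdict(int)
    for count, pfam, sample in rows:
        totals[(pfam, sample)] += count
    pfams = sorted({pfam for _, pfam, _ in rows})
    # Keep samples in order of first appearance, so "10" does not sort before "2"
    samples = list(dict.fromkeys(sample for _, _, sample in rows))
    matrix = [[totals.get((p, s), 0) for s in samples] for p in pfams]
    return pfams, samples, matrix


def to_tsv(pfams, samples, matrix):
    """Render the matrix as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["pFAM"] + samples)
    for pfam, row in zip(pfams, matrix):
        writer.writerow([pfam] + row)
    return buf.getvalue()


pfams, samples, matrix = pivot_counts(records)
print(to_tsv(pfams, samples, matrix))
```

If the data is already in R (the printout looks like an R data frame), base R's `xtabs(count ~ pFAM + sample, data = count_pFAM_samplelist)` builds the same cross-tabulation. It sums counts per pFAM/sample pair and fills absent pairs with 0.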