**zillur** · 10-03-2016, 12:36 PM

Does anybody have any idea? Please, I would appreciate your help.

Best Regards
Zillur

**d_emms** · 10-04-2016, 02:20 AM

Hi

It looks like you need to set your PERL5LIB environment variable so that it points to the directory containing your OrthoMCL Perl files. Something like this:
export PERL5LIB=/path/to/orthomcl

One suggestion though: have you tried OrthoFinder? It's far easier to run, since it requires only a single command. It's also a lot more accurate than OrthoMCL:

https://github.com/davidemms/OrthoFinder/releases

David

**zillur** · 10-04-2016, 09:43 AM

Thank you very much for your suggestions. Yes, I have tried OrthoFinder and it gave me outputs. I wanted to run OrthoMCL to compare, but maybe it's not necessary now. Do you have any suggestions for how I can process the outputs to get a gene presence/absence matrix?

Thank you again.

Best Regards
Zillur

**d_emms** · 10-05-2016, 02:05 AM

The file Orthogroups.csv is effectively a presence/absence matrix: The rows are orthogroups and the columns are species so if there are any genes listed in the i,j-th cell then the ith orthogroup is present in the jth species.

All the best
David

**zillur** · 10-05-2016, 08:21 AM

Thank you very much for your comment. I want a matrix like:

Code:

        genome1  genome2  genome3
gene1   1        0        0
gene2   0        0        0
gene3   1        1        1
gene4   0        0        1

How can I do this?

Best Regards
Zillur

**d_emms** · 10-06-2016, 02:02 AM

You'd just need to replace empty cells with 0 and cells containing text with 1.

All the best
David

**zillur** · 10-06-2016, 09:54 AM

Thank you very much for your reply.

You'd just need to replace empty cells with 0 and cells with text in with 1.

That is exactly what I want to do. But how can I make this replacement?

Thanks for your suggestions.
Best Regards
Zillur

**d_emms** · 10-07-2016, 07:01 AM

Here is a Python script that will do it for you:

Code:

import sys
import csv

if len(sys.argv) != 2:
    print("Usage: python presence_absence.py Orthogroups.csv")
    sys.exit()

inFN = sys.argv[1]
outFN = inFN + ".01_matrix.csv"
with open(inFN, 'rb') as infile, open(outFN, 'wb') as outfile:
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t")
    writer.writerow(reader.next())
    for line in reader:
        writer.writerow(line[:1] + [0 if "" == cell else 1 for cell in line[1:]])

All the best
David

**zillur** · 10-07-2016, 08:43 AM

Thank you very much for your script. I tried to run it, but got this error:

Code:

[zillur@genomics Results_Sep26]$ python matrix_convert_binary.py Orthogroups.csv
Traceback (most recent call last):
  File "matrix_convert_binary.py", line 14, in <module>
    writer.writerow(reader.next())
AttributeError: '_csv.reader' object has no attribute 'next'

My system is:

Code:

[zillur@genomics Results_Sep26]$ python
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.

I am not sure what I need to modify. Any idea?
Thanks again.

Best Regards
Zillur

**d_emms** · 10-10-2016, 07:15 AM

It was written for Python 2. Below is a version that will work with both Python 2 and 3:

Code:

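import sys
import csv

# Expect exactly one argument: the path to Orthogroups.csv
if len(sys.argv) != 2:
    print("Usage: python presence_absence.py Orthogroups.csv")
    sys.exit()

inFN = sys.argv[1]
outFN = inFN + ".01_matrix.csv"
with open(inFN, 'r') as infile, open(outFN, 'w') as outfile:
    # Orthogroups.csv is tab-delimited despite its .csv extension
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t")
    # Copy the header row (species names) unchanged
    writer.writerow(next(reader))
    for line in reader:
        # Keep the orthogroup ID; empty cell -> 0, any gene listed -> 1
        writer.writerow(line[:1] + [0 if "" == cell else 1 for cell in line[1:]])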
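
If you want to check the logic before running it on the real file, here is a minimal sketch (Python 3 only) that applies the same rule to a few rows held in memory. The function name `to_presence_absence`, the species names and the gene names are all made up for illustration; they are not part of OrthoFinder.

```python
import csv
import io


def to_presence_absence(tsv_text):
    """Convert tab-delimited orthogroup text to 0/1 rows (header kept as-is)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = [next(reader)]  # header row with the species names
    for line in reader:
        # First column is the orthogroup ID; empty cell -> 0, anything else -> 1
        rows.append(line[:1] + [0 if cell == "" else 1 for cell in line[1:]])
    return rows


# Hypothetical input: two species columns, gene names are invented
sample = "\tspeciesA\tspeciesB\nOG0000000\tgeneA1, geneA2\t\nOG0000001\t\tgeneB1\n"
result = to_presence_absence(sample)
for row in result:
    print(row)
```

Note that a cell listing several genes still becomes a single 1, because the matrix records presence only, not gene counts.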

**zillur** · 10-10-2016, 10:30 AM

Thank you very much for your valuable suggestions. The code converted the matrix into a binary matrix perfectly. But the problem is that I can't load the new CSV file into R properly. The file itself looks like this:

Code:

[zillur@genomics Results_Sep26]$ head Orthogroups.csv.01_matrix.csv 
	PlasmoDB-28_PbergheiANKA_AnnotatedProteins.fasta	PlasmoDB-28_Pchabaudichabaudi_AnnotatedProteins.fasta	PlasmoDB-28_Pfalciparum3D7_AnnotatedProteins.fasta	PlasmoDB-28_Pgallinaceum8A_AnnotatedProteins.fasta	PlasmoDB-28_PknowlesiH_AnnotatedProteins.fasta	PlasmoDB-28_PvivaxSal1_AnnotatedProteins.fasta	PlasmoDB-28_PyoeliiyoeliiYM_AnnotatedProteins.fasta
OG0000000	1	1	0	0	0	0	1
OG0000001	1	1	1	0	1	1	1
OG0000002	0	0	0	0	0	1	0
OG0000003	0	0	0	0	1	1	0
OG0000004	1	1	0	0	0	0	1
OG0000005	0	0	0	0	1	0	0
OG0000006	1	1	0	0	0	0	1
OG0000007	1	1	1	0	1	1	1
OG0000008	0	0	1	0	0	0	0

But when I load the CSV in R, it looks like this:

Code:

> data = read.csv("Orthogroups.csv.01_matrix.csv", sep=",")
> head(data)
  PlasmoDB.28_PbergheiANKA_AnnotatedProteins.fasta.PlasmoDB.28_Pchabaudichabaudi_AnnotatedProteins.fasta.PlasmoDB.28_Pfalciparum3D7_AnnotatedProteins.fasta.PlasmoDB.28_Pgallinaceum8A_AnnotatedProteins.fasta.PlasmoDB.28_PknowlesiH_AnnotatedProteins.fast ...
1                                                                                                                                                                                                                                 OG0000000\t1\t1\t0\t0\t0\t0\t1
2                                                                                                                                                                                                                                 OG0000001\t1\t1\t1\t0\t1\t1\t1
3                                                                                                                                                                                                                                 OG0000002\t0\t0\t0\t0\t0\t1\t0
4                                                                                                                                                                                                                                 OG0000003\t0\t0\t0\t0\t1\t1\t0
5                                                                                                                                                                                                                                 OG0000004\t1\t1\t0\t0\t0\t0\t1
6                                                                                                                                                                                                                                 OG0000005\t0\t0\t0\t0\t1\t0\t0

What should I do now?
Thanks again for your help and comment.

Best Regards
Zillur

**d_emms** · 10-14-2016, 07:10 AM

It's a tab-delimited file, so the separator needs to be a tab rather than a comma. Try this instead:
data = read.csv("Orthogroups.csv.01_matrix.csv", sep="\t")
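
To see why the separator matters, here is a small Python 3 sketch of the same effect using the standard `csv` module. The helper `parse` and the sample text are hypothetical; the point is only that a tab-delimited line contains no commas, so a comma-based parser returns each line as one single field.

```python
import csv
import io

# Two lines in the same tab-delimited layout as the matrix file (made-up values)
text = "\tspeciesA\tspeciesB\nOG0000000\t1\t0\n"


def parse(text, delimiter):
    """Parse delimited text into a list of rows using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


comma_rows = parse(text, ",")   # wrong delimiter: tabs stay inside the field
tab_rows = parse(text, "\t")    # right delimiter: one field per column

print(len(comma_rows[1]), comma_rows[1])
print(len(tab_rows[1]), tab_rows[1])
```

This mirrors what R showed above: with sep="," every row ends up in a single column with the tabs still embedded in it.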

**zillur** · 10-14-2016, 11:37 AM

Thank you very much. Got it.

Best Regards
Zillur

Orthomcl running problem
