  • zillur
    Senior Member
    • Sep 2014
    • 106

    Orthomcl running problem

    Hi there,
    I am trying to run OrthoMCL on my Linux workstation and am running into this problem:
    Code:
    [root@genomics bin]# ./orthomcl-pipeline -i /home/zillur/Desktop/zillur/phd/orthomcl -o /home/zillur/Desktop/zillur/phd/orthomcl/output -m /usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example --nocompliant
    Warning: directory "/home/zillur/Desktop/zillur/phd/orthomcl/output" already exists, are you sure you want to store data here [Y]? y
    Starting OrthoMCL pipeline on: Mon Sep 26 20:11:08 2016
    Git commit: unknown
    
    =Stage 1: Validate Files =
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PbergheiANKA_AnnotatedProteins.fasta ... 5076 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pchabaudichabaudi_AnnotatedProteins.fasta ... 5217 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pfalciparum3D7_AnnotatedProteins.fasta ... 5542 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pgallinaceum8A_AnnotatedProteins.fasta ... 3 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PknowlesiH_AnnotatedProteins.fasta ... 5323 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PvivaxSal1_AnnotatedProteins.fasta ... 5586 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PyoeliiyoeliiYM_AnnotatedProteins.fasta ... 5709 sequences
    Validated 7 files
    Stage 1 took 0.02 minutes 
    
    =Stage 2: Validate Database=
    Stage 2 took 0.00 minutes 
    
    
    =Stage 3: Load OrthoMCL Database Schema=
    /usr/bin/orthomclInstallSchema "/usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example" "/home/zillur/Desktop/zillur/phd/orthomcl/output/log/orthomclSchema.log" 1>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log 2>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log
    Error executing command: /usr/bin/orthomclInstallSchema "/usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example" "/home/zillur/Desktop/zillur/phd/orthomcl/output/log/orthomclSchema.log" 1>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log 2>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log. See logs /home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log and /home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log
    The log is as follows:
    Code:
    [zillur@genomics log]$ more 3.loadschema.stderr.log 
    Can't locate OrthoMCLEngine/Main/Base.pm in @INC (@INC contains: /usr/bin/../lib/perl /root/perl5/lib/perl5/x86_64-linux-thread-multi /root/perl5/lib/perl5 /home/zillur/perl5/lib/perl5/x86_64-linux-thread-multi /home/zillur/perl5/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/bin/orthomclInstallSchema line 6.
    BEGIN failed--compilation aborted at /usr/bin/orthomclInstallSchema line 6.
    Any suggestions, please?

    Best Regards
    Zillur
  • zillur
    Senior Member
    • Sep 2014
    • 106

    #2
    Does anybody have any idea? I would appreciate your help.

    Best Regards
    Zillur


    • d_emms
      Junior Member
      • Oct 2016
      • 6

      #3
      Hi

      It looks like you need to set your PERL5LIB environment variable so that it points to where your OrthoMCL Perl files are, i.e. the directory that contains the OrthoMCLEngine folder named in the error. Something like this:
      export PERL5LIB=/path/to/orthomcl
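      As a rough sketch of how to find the right directory: Perl looks for OrthoMCLEngine/Main/Base.pm underneath each directory in @INC, so PERL5LIB needs to name the directory that holds the OrthoMCLEngine folder. The helper below (find_perl_lib_dirs is a made-up name, not part of OrthoMCL) walks a directory tree and prints a candidate export line for each match. The search root is taken from the paths in the first post and is only an assumption about where the software was unpacked.

```python
import os


def find_perl_lib_dirs(root, module_path=os.path.join("OrthoMCLEngine", "Main", "Base.pm")):
    """Return every directory under `root` that directly contains `module_path`.

    Hypothetical helper for illustration; it is not part of OrthoMCL.
    """
    hits = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        if os.path.isfile(os.path.join(dirpath, module_path)):
            hits.append(dirpath)
    return sorted(hits)


# Assumed search root (taken from the paths shown in the first post).
# If the directory does not exist, os.walk yields nothing and no line is printed.
for lib_dir in find_perl_lib_dirs("/usr/local/Tools/orthomcl"):
    print("export PERL5LIB=" + lib_dir)
```

      If a line is printed, running it in the shell before starting the pipeline should let Perl find the module; if nothing is printed, the Perl library files are probably somewhere else.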

      One suggestion though: have you tried OrthoFinder? It's far easier to run, requiring just a single command. It's also a lot more accurate than OrthoMCL:
      Phylogenetic orthology inference for comparative genomics - davidemms/OrthoFinder


      David


      • zillur
        Senior Member
        • Sep 2014
        • 106

        #4
        Thank you very much for your suggestions. Yes, I have tried OrthoFinder and it gave me outputs. I wanted to run OrthoMCL for comparison, but maybe that's not necessary now. Do you have any suggestions for how I can process the outputs to get a gene presence/absence matrix?

        Thank you again.

        Best Regards
        Zillur


        • d_emms
          Junior Member
          • Oct 2016
          • 6

          #5
          The file Orthogroups.csv is effectively a presence/absence matrix: The rows are orthogroups and the columns are species so if there are any genes listed in the i,j-th cell then the ith orthogroup is present in the jth species.
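          To make that reading of the file concrete, here is a minimal sketch that lists, for each orthogroup, the species whose cell is non-empty. It assumes the file is tab-delimited with one header row of species names and the orthogroup ID in the first column; the function name, species names and gene IDs in the example are all invented for illustration.

```python
import csv
import io


def species_per_orthogroup(handle):
    """Map each orthogroup ID to the list of species with at least one gene listed."""
    reader = csv.reader(handle, delimiter="\t")
    species = next(reader)[1:]  # header row: first cell is the (empty) ID column
    present = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # A species counts as present when its cell holds any non-whitespace text
        present[row[0]] = [sp for sp, cell in zip(species, row[1:]) if cell.strip()]
    return present


# Tiny made-up example in the assumed layout
example = (
    "\tspeciesA\tspeciesB\tspeciesC\n"
    "OG0000000\tgeneA1, geneA2\t\tgeneC1\n"
    "OG0000001\t\tgeneB1\t\n"
)
present = species_per_orthogroup(io.StringIO(example))
print(present)
```

          For a real run, the in-memory example would be replaced by an open file handle for Orthogroups.csv.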

          All the best
          David


          • zillur
            Senior Member
            • Sep 2014
            • 106

            #6
            Thank you very much for your comment. I want a matrix like:

            Code:
                    genome1  genome2  genome3
            gene1   1        0        0
            gene2   0        0        0
            gene3   1        1        1
            gene4   0        0        1
            How can I do this?

            Best Regards
            Zillur


            • d_emms
              Junior Member
              • Oct 2016
              • 6

              #7
              You'd just need to replace empty cells with 0 and cells that contain text with 1.

              All the best
              David


              • zillur
                Senior Member
                • Sep 2014
                • 106

                #8
                Thank you very much for your reply.
                "You'd just need to replace empty cells with 0 and cells with text in with 1."
                That is exactly what I want to do. But how can I make this replacement?

                Thanks for your suggestions.
                Best Regards
                Zillur


                • d_emms
                  Junior Member
                  • Oct 2016
                  • 6

                  #9
                  This is a Python script that will do it for you:


                  Code:
                  import sys
                  import csv
                  
                  if len(sys.argv) != 2:
                      print("Usage: python presence_absence.py Orthogroups.csv")
                      sys.exit()
                  
                  inFN = sys.argv[1]
                  outFN = inFN + ".01_matrix.csv"
                  with open(inFN, 'rb') as infile, open(outFN, 'wb') as outfile:
                      reader = csv.reader(infile, delimiter="\t")
                      writer = csv.writer(outfile, delimiter="\t")
                      writer.writerow(reader.next())
                      for line in reader:
                          writer.writerow(line[:1] + [0 if "" == cell else 1 for cell in line[1:]])

                  All the best
                  David


                  • zillur
                    Senior Member
                    • Sep 2014
                    • 106

                    #10
                    Thank you very much for your script. I tried to run it, but got this error:
                    Code:
                    [zillur@genomics Results_Sep26]$ python matrix_convert_binary.py Orthogroups.csv
                    Traceback (most recent call last):
                      File "matrix_convert_binary.py", line 14, in <module>
                        writer.writerow(reader.next())
                    AttributeError: '_csv.reader' object has no attribute 'next'
                    My system is:
                    Code:
                    [zillur@genomics Results_Sep26]$ python
                    Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06) 
                    [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
                    Type "help", "copyright", "credits" or "license" for more information.
                    I am not sure what I need to modify. Any idea?
                    Thanks again.

                    Best Regards
                    Zillur


                    • d_emms
                      Junior Member
                      • Oct 2016
                      • 6

                      #11
                      It was written for Python 2. Below is a version that works with both Python 2 and 3:

                      Code:
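                      import sys
                      import csv
                      
                      # Expect exactly one argument: the path to the tab-delimited Orthogroups.csv file
                      if len(sys.argv) != 2:
                          print("Usage: python presence_absence.py Orthogroups.csv")
                          sys.exit()
                      
                      inFN = sys.argv[1]
                      outFN = inFN + ".01_matrix.csv"  # the output is written next to the input file
                      with open(inFN, 'r') as infile, open(outFN, 'w') as outfile:
                          reader = csv.reader(infile, delimiter="\t")
                          writer = csv.writer(outfile, delimiter="\t")
                          # Copy the header row (species names) unchanged
                          writer.writerow(next(reader))
                          for line in reader:
                              # Keep the orthogroup ID; write 0 for an empty cell and 1 otherwise
                              writer.writerow(line[:1] + [0 if "" == cell else 1 for cell in line[1:]])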


                      • zillur
                        Senior Member
                        • Sep 2014
                        • 106

                        #12
                        Thank you very much for your valuable suggestions. The code converted the matrix into a binary matrix perfectly. The problem is that I can't load the new CSV file into R properly. The file looks like this:
                        Code:
                        [zillur@genomics Results_Sep26]$ head Orthogroups.csv.01_matrix.csv 
                        	PlasmoDB-28_PbergheiANKA_AnnotatedProteins.fasta	PlasmoDB-28_Pchabaudichabaudi_AnnotatedProteins.fasta	PlasmoDB-28_Pfalciparum3D7_AnnotatedProteins.fasta	PlasmoDB-28_Pgallinaceum8A_AnnotatedProteins.fasta	PlasmoDB-28_PknowlesiH_AnnotatedProteins.fasta	PlasmoDB-28_PvivaxSal1_AnnotatedProteins.fasta	PlasmoDB-28_PyoeliiyoeliiYM_AnnotatedProteins.fasta
                        OG0000000	1	1	0	0	0	0	1
                        OG0000001	1	1	1	0	1	1	1
                        OG0000002	0	0	0	0	0	1	0
                        OG0000003	0	0	0	0	1	1	0
                        OG0000004	1	1	0	0	0	0	1
                        OG0000005	0	0	0	0	1	0	0
                        OG0000006	1	1	0	0	0	0	1
                        OG0000007	1	1	1	0	1	1	1
                        OG0000008	0	0	1	0	0	0	0

                        But when I load the CSV file in R, it looks like this:

                        Code:
                        > data = read.csv("Orthogroups.csv.01_matrix.csv", sep=",")
                        > head(data)
                          PlasmoDB.28_PbergheiANKA_AnnotatedProteins.fasta.PlasmoDB.28_Pchabaudichabaudi_AnnotatedProteins.fasta.PlasmoDB.28_Pfalciparum3D7_AnnotatedProteins.fasta.PlasmoDB.28_Pgallinaceum8A_AnnotatedProteins.fasta.PlasmoDB.28_PknowlesiH_AnnotatedProteins.fast ...
                        1                                                                                                                                                                                                                                 OG0000000\t1\t1\t0\t0\t0\t0\t1
                        2                                                                                                                                                                                                                                 OG0000001\t1\t1\t1\t0\t1\t1\t1
                        3                                                                                                                                                                                                                                 OG0000002\t0\t0\t0\t0\t0\t1\t0
                        4                                                                                                                                                                                                                                 OG0000003\t0\t0\t0\t0\t1\t1\t0
                        5                                                                                                                                                                                                                                 OG0000004\t1\t1\t0\t0\t0\t0\t1
                        6                                                                                                                                                                                                                                 OG0000005\t0\t0\t0\t0\t1\t0\t0
                        What should I do now?
                        Thanks again for your help and comment.

                        Best Regards
                        Zillur


                        • d_emms
                          Junior Member
                          • Oct 2016
                          • 6

                          #13
                          It's a tab-delimited file, so try this instead:
                          data = read.csv("Orthogroups.csv.01_matrix.csv", sep="\t")
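                          The effect of the separator can be reproduced outside R. The sketch below uses Python's csv module (not R's read.csv) on one row copied from the output above: with a comma as the delimiter the whole row stays a single field with the tabs still inside it, while with a tab it splits into the ID and seven values. The helper name split_fields is made up for this illustration.

```python
import csv
import io

# One data row copied from the matrix shown earlier in the thread
line = "OG0000000\t1\t1\t0\t0\t0\t0\t1\n"


def split_fields(text, delimiter):
    """Parse a single line of delimited text and return its fields."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


comma_fields = split_fields(line, ",")   # wrong separator: nothing to split on
tab_fields = split_fields(line, "\t")    # correct separator for this file

print(len(comma_fields), comma_fields)
print(len(tab_fields), tab_fields)
```

                          This matches what R displayed: a single column whose values contain literal \t characters.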


                          • zillur
                            Senior Member
                            • Sep 2014
                            • 106

                            #14
                            Thank you very much. Got it.

                            Best Regards
                            Zillur
