  • zillur
    Senior Member
    • Sep 2014
    • 106

    Orthomcl running problem

    Hi there,
    I am trying to run OrthoMCL on my Linux workstation and have run into this problem:
    Code:
    [root@genomics bin]# ./orthomcl-pipeline -i /home/zillur/Desktop/zillur/phd/orthomcl -o /home/zillur/Desktop/zillur/phd/orthomcl/output -m /usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example --nocompliant
    Warning: directory "/home/zillur/Desktop/zillur/phd/orthomcl/output" already exists, are you sure you want to store data here [Y]? y
    Starting OrthoMCL pipeline on: Mon Sep 26 20:11:08 2016
    Git commit: unknown
    
    =Stage 1: Validate Files =
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PbergheiANKA_AnnotatedProteins.fasta ... 5076 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pchabaudichabaudi_AnnotatedProteins.fasta ... 5217 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pfalciparum3D7_AnnotatedProteins.fasta ... 5542 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pgallinaceum8A_AnnotatedProteins.fasta ... 3 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PknowlesiH_AnnotatedProteins.fasta ... 5323 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PvivaxSal1_AnnotatedProteins.fasta ... 5586 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PyoeliiyoeliiYM_AnnotatedProteins.fasta ... 5709 sequences
    Validated 7 files
    Stage 1 took 0.02 minutes 
    
    =Stage 2: Validate Database=
    Stage 2 took 0.00 minutes 
    
    
    =Stage 3: Load OrthoMCL Database Schema=
    /usr/bin/orthomclInstallSchema "/usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example" "/home/zillur/Desktop/zillur/phd/orthomcl/output/log/orthomclSchema.log" 1>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log 2>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log
    Error executing command: /usr/bin/orthomclInstallSchema "/usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example" "/home/zillur/Desktop/zillur/phd/orthomcl/output/log/orthomclSchema.log" 1>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log 2>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log. See logs /home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log and /home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log
    The log is as follows:
    Code:
    [zillur@genomics log]$ more 3.loadschema.stderr.log 
    Can't locate OrthoMCLEngine/Main/Base.pm in @INC (@INC contains: /usr/bin/../lib/perl /root/perl5/lib/perl5/x86_64-linux-thread-multi /root/perl5/lib/perl5 /h
    ome/zillur/perl5/lib/perl5/x86_64-linux-thread-multi /home/zillur/perl5/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /
    usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/bin/orthomclInstallSchema line 6.
    BEGIN failed--compilation aborted at /usr/bin/orthomclInstallSchema line 6.
    Any suggestions, please?

    Best Regards
    Zillur
  • zillur
    Senior Member
    • Sep 2014
    • 106

    #2
    Does anybody have any idea? I would appreciate your help.

    Best Regards
    Zillur

    Comment

    • d_emms
      Junior Member
      • Oct 2016
      • 6

      #3
      Hi

      It looks like you need to set your PERL5LIB environment variable so that it points to where your OrthoMCL Perl modules are, i.e. the directory that contains the OrthoMCLEngine folder named in the error. Something like this:
      export PERL5LIB=/path/to/orthomcl
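
To work out what the path should be: the error says Perl cannot find OrthoMCLEngine/Main/Base.pm in any @INC directory, so PERL5LIB has to name the directory that directly contains the OrthoMCLEngine folder. Below is a minimal Python sketch for locating that directory, assuming the module sits somewhere under a search root you pick. The function name find_perl_lib_dirs and the search root are illustrative only and are not part of OrthoMCL.

```python
import os


def find_perl_lib_dirs(search_root, module_relpath="OrthoMCLEngine/Main/Base.pm"):
    """Return directories under search_root that contain module_relpath.

    Perl looks up 'OrthoMCLEngine/Main/Base.pm' relative to each @INC entry,
    so every directory returned here is a candidate value for PERL5LIB.
    (Hypothetical helper, not an OrthoMCL tool.)
    """
    parts = module_relpath.split("/")
    top_dir = parts[0]
    hits = []
    for dirpath, dirnames, _filenames in os.walk(search_root):
        # Only check directories that have an 'OrthoMCLEngine' subfolder
        if top_dir in dirnames and os.path.isfile(os.path.join(dirpath, *parts)):
            hits.append(dirpath)
    return sorted(hits)
```

For example, calling find_perl_lib_dirs("/usr/local/Tools/orthomcl") would list any candidate directories under that install location (if the module is there at all); one of them can then be used in the export line.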

      One suggestion, though: have you tried OrthoFinder? It's far easier to run, as it requires just a single command. It's also a lot more accurate than OrthoMCL:
      Phylogenetic orthology inference for comparative genomics - davidemms/OrthoFinder


      David

      Comment

      • zillur
        Senior Member
        • Sep 2014
        • 106

        #4
        Thank you very much for your suggestions. Yes, I have tried OrthoFinder and it gave me outputs. I wanted to run OrthoMCL for comparison, but maybe that's not necessary now. Do you have any suggestions for how I can process the outputs to get a gene presence/absence matrix?

        Thank you again.

        Best Regards
        Zillur

        Comment

        • d_emms
          Junior Member
          • Oct 2016
          • 6

          #5
          The file Orthogroups.csv is effectively a presence/absence matrix: The rows are orthogroups and the columns are species so if there are any genes listed in the i,j-th cell then the ith orthogroup is present in the jth species.
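
As a rough illustration of that rule, here is a small Python sketch that reads tab-separated text laid out like Orthogroups.csv and reports, for each orthogroup, whether each species has any genes listed. The species names, gene IDs and the function name orthogroup_presence are made up for the example; only the "non-empty cell means present" rule comes from the description above.

```python
import csv
import io


def orthogroup_presence(tsv_text):
    """Map each orthogroup ID to {species: True/False} from tab-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    species = header[1:]  # first header cell belongs to the orthogroup ID column
    presence = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        cells = row[1:]
        cells = cells + [""] * (len(species) - len(cells))  # pad short rows
        # A species is "present" when its cell lists at least one gene
        presence[row[0]] = {sp: cell.strip() != "" for sp, cell in zip(species, cells)}
    return presence


# Made-up sample data: rows are orthogroups, columns are species
sample = (
    "\tspeciesA\tspeciesB\tspeciesC\n"
    "OG0000000\tgeneA1, geneA2\t\tgeneC1\n"
    "OG0000001\t\tgeneB7\t\n"
)
table = orthogroup_presence(sample)
print(table["OG0000000"]["speciesB"])  # False: the cell is empty
print(table["OG0000001"]["speciesB"])  # True: the cell lists a gene
```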

          All the best
          David

          Comment

          • zillur
            Senior Member
            • Sep 2014
            • 106

            #6
            Thank you very much for your comment. I want a matrix like:

            Code:
                          genome1	genome2 genome3
            gene1  	 1     	 0     	 0
            gene2  	 0     	 0     	 0
            gene3  	 1     	 1     	 1
            gene4  	 0     	 0     	 1
            How can I do this?

            Best Regards
            Zillur

            Comment

            • d_emms
              Junior Member
              • Oct 2016
              • 6

              #7
              You'd just need to replace empty cells with 0 and cells with text in with 1.

              All the best
              David

              Comment

              • zillur
                Senior Member
                • Sep 2014
                • 106

                #8
                Thank you very much for your reply.
                "You'd just need to replace empty cells with 0 and cells with text in with 1."
                That is exactly what I want to do. But how can I do the replacement?

                Thanks for your suggestions.
                Best Regards
                Zillur

                Comment

                • d_emms
                  Junior Member
                  • Oct 2016
                  • 6

                  #9
                  This is a python script that will do it for you:


                  Code:
                  import sys
                  import csv
                  
                  if len(sys.argv) != 2:
                      print("Usage: python presence_absence.py Orthogroups.csv")
                      sys.exit()
                  
                  inFN = sys.argv[1]
                  outFN = inFN + ".01_matrix.csv"
                  with open(inFN, 'rb') as infile, open(outFN, 'wb') as outfile:
                      reader = csv.reader(infile, delimiter="\t")
                      writer = csv.writer(outfile, delimiter="\t")
                      writer.writerow(reader.next())
                      for line in reader:
                          writer.writerow(line[:1] + [0 if "" == cell else 1 for cell in line[1:]])

                  All the best
                  David

                  Comment

                  • zillur
                    Senior Member
                    • Sep 2014
                    • 106

                    #10
                    Thank you very much for your script. I tried to run it, but got this error:
                    Code:
                    [zillur@genomics Results_Sep26]$ python matrix_convert_binary.py Orthogroups.csv
                    Traceback (most recent call last):
                      File "matrix_convert_binary.py", line 14, in <module>
                        writer.writerow(reader.next())
                    AttributeError: '_csv.reader' object has no attribute 'next'
                    My system is:
                    Code:
                    [zillur@genomics Results_Sep26]$ python
                    Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06) 
                    [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
                    Type "help", "copyright", "credits" or "license" for more information.
                    I am not sure what I need to modify. Any idea?
                    Thanks again.

                    Best Regards
                    Zillur

                    Comment

                    • d_emms
                      Junior Member
                      • Oct 2016
                      • 6

                      #11
                      It was written for Python 2. Below is a version that will work with both Python 2 and 3:

                      Code:
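                      import sys
                      import csv
                      
                      # Expect exactly one argument: the path to Orthogroups.csv
                      if len(sys.argv) != 2:
                          print("Usage: python presence_absence.py Orthogroups.csv")
                          sys.exit()
                      
                      inFN = sys.argv[1]
                      outFN = inFN + ".01_matrix.csv"  # written next to the input file
                      with open(inFN, 'r') as infile, open(outFN, 'w') as outfile:
                          # Orthogroups.csv is tab-delimited despite the .csv extension
                          reader = csv.reader(infile, delimiter="\t")
                          writer = csv.writer(outfile, delimiter="\t")
                          # Copy the header row (species names) unchanged
                          writer.writerow(next(reader))
                          for line in reader:
                              # Keep the orthogroup ID; empty cell -> 0, anything else -> 1
                              writer.writerow(line[:1] + [0 if "" == cell else 1 for cell in line[1:]])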

                      Comment

                      • zillur
                        Senior Member
                        • Sep 2014
                        • 106

                        #12
                        Thank you very much for your valuable suggestions. The code converted the matrix into a binary matrix perfectly. The problem is that I can't load the new CSV file into R properly. The file looks like this:
                        Code:
                        [zillur@genomics Results_Sep26]$ head Orthogroups.csv.01_matrix.csv 
                        	PlasmoDB-28_PbergheiANKA_AnnotatedProteins.fasta	PlasmoDB-28_Pchabaudichabaudi_AnnotatedProteins.fasta	PlasmoDB-28_Pfalciparum3D7_AnnotatedProteins.fasta	PlasmoDB-28_Pgallinaceum8A_AnnotatedProteins.fasta	PlasmoDB-28_PknowlesiH_AnnotatedProteins.fasta	PlasmoDB-28_PvivaxSal1_AnnotatedProteins.fasta	PlasmoDB-28_PyoeliiyoeliiYM_AnnotatedProteins.fasta
                        OG0000000	1	1	0	0	0	0	1
                        OG0000001	1	1	1	0	1	1	1
                        OG0000002	0	0	0	0	0	1	0
                        OG0000003	0	0	0	0	1	1	0
                        OG0000004	1	1	0	0	0	0	1
                        OG0000005	0	0	0	0	1	0	0
                        OG0000006	1	1	0	0	0	0	1
                        OG0000007	1	1	1	0	1	1	1
                        OG0000008	0	0	1	0	0	0	0

                        But when I load the csv in R, it looks like:

                        Code:
                        > data = read.csv("Orthogroups.csv.01_matrix.csv", sep=",")
                        > head(data)
                          PlasmoDB.28_PbergheiANKA_AnnotatedProteins.fasta.PlasmoDB.28_Pchabaudichabaudi_AnnotatedProteins.fasta.PlasmoDB.28_Pfalciparum3D7_AnnotatedProteins.fasta.PlasmoDB.28_Pgallinaceum8A_AnnotatedProteins.fasta.PlasmoDB.28_PknowlesiH_AnnotatedProteins.fast ...
                        1                                                                                                                                                                                                                                 OG0000000\t1\t1\t0\t0\t0\t0\t1
                        2                                                                                                                                                                                                                                 OG0000001\t1\t1\t1\t0\t1\t1\t1
                        3                                                                                                                                                                                                                                 OG0000002\t0\t0\t0\t0\t0\t1\t0
                        4                                                                                                                                                                                                                                 OG0000003\t0\t0\t0\t0\t1\t1\t0
                        5                                                                                                                                                                                                                                 OG0000004\t1\t1\t0\t0\t0\t0\t1
                        6                                                                                                                                                                                                                                 OG0000005\t0\t0\t0\t0\t1\t0\t0
                        What should I do now?
                        Thanks again for your help and comment.

                        Best Regards
                        Zillur

                        Comment

                        • d_emms
                          Junior Member
                          • Oct 2016
                          • 6

                          #13
                          It's a tab-delimited file, so set the separator to a tab instead of a comma:
                          data = read.csv("Orthogroups.csv.01_matrix.csv", sep="\t")
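
The same effect can be reproduced outside R. The short Python sketch below parses one line of the matrix (copied from the head output earlier in the thread) with both separators: with a comma the whole line stays in a single field, which matches the one-column data frame shown above, while with a tab it splits into the ID plus seven values. It only illustrates the delimiter mismatch and is not a replacement for the read.csv call.

```python
import csv
import io

# One row of the 0/1 matrix, tab-separated as written by the script
line = "OG0000000\t1\t1\t0\t0\t0\t0\t1\n"

# Wrong separator: no commas in the line, so everything lands in one field
as_comma = next(csv.reader(io.StringIO(line), delimiter=","))

# Right separator: the orthogroup ID plus one value per species
as_tab = next(csv.reader(io.StringIO(line), delimiter="\t"))

print(len(as_comma))  # 1
print(len(as_tab))    # 8
```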

                          Comment

                          • zillur
                            Senior Member
                            • Sep 2014
                            • 106

                            #14
                            Thank you very much. Got it.

                            Best Regards
                            Zillur

                            Comment
