  • Orthomcl running problem

    Hi there,
    I am trying to run OrthoMCL on my Linux workstation, but I am running into this problem:
    Code:
    [root@genomics bin]# ./orthomcl-pipeline -i /home/zillur/Desktop/zillur/phd/orthomcl -o /home/zillur/Desktop/zillur/phd/orthomcl/output -m /usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example --nocompliant
    Warning: directory "/home/zillur/Desktop/zillur/phd/orthomcl/output" already exists, are you sure you want to store data here [Y]? y
    Starting OrthoMCL pipeline on: Mon Sep 26 20:11:08 2016
    Git commit: unknown
    
    =Stage 1: Validate Files =
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PbergheiANKA_AnnotatedProteins.fasta ... 5076 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pchabaudichabaudi_AnnotatedProteins.fasta ... 5217 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pfalciparum3D7_AnnotatedProteins.fasta ... 5542 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pgallinaceum8A_AnnotatedProteins.fasta ... 3 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PknowlesiH_AnnotatedProteins.fasta ... 5323 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PvivaxSal1_AnnotatedProteins.fasta ... 5586 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PyoeliiyoeliiYM_AnnotatedProteins.fasta ... 5709 sequences
    Validated 7 files
    Stage 1 took 0.02 minutes 
    
    =Stage 2: Validate Database=
    Stage 2 took 0.00 minutes 
    
    
    =Stage 3: Load OrthoMCL Database Schema=
    /usr/bin/orthomclInstallSchema "/usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example" "/home/zillur/Desktop/zillur/phd/orthomcl/output/log/orthomclSchema.log" 1>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log 2>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log
    Error executing command: /usr/bin/orthomclInstallSchema "/usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example" "/home/zillur/Desktop/zillur/phd/orthomcl/output/log/orthomclSchema.log" 1>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log 2>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log. See logs /home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log and /home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log
    The log is as follows:
    Code:
    [zillur@genomics log]$ more 3.loadschema.stderr.log 
    Can't locate OrthoMCLEngine/Main/Base.pm in @INC (@INC contains: /usr/bin/../lib/perl /root/perl5/lib/perl5/x86_64-linux-thread-multi /root/perl5/lib/perl5 /h
    ome/zillur/perl5/lib/perl5/x86_64-linux-thread-multi /home/zillur/perl5/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /
    usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/bin/orthomclInstallSchema line 6.
    BEGIN failed--compilation aborted at /usr/bin/orthomclInstallSchema line 6.
    Any suggestions, please?

    Best Regards
    Zillur

  • #2
    Does anybody have any idea? I would appreciate any help.

    Best Regards
    Zillur


    • #3
      Hi

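      It looks like you need to set your PERL5LIB environment variable so that it points to the directory containing your OrthoMCL Perl modules, i.e. the directory that holds the OrthoMCLEngine folder (Perl could not find OrthoMCLEngine/Main/Base.pm in any of the @INC paths). Something like this:
      export PERL5LIB=/path/to/orthomcl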
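If you are not sure which directory that is, a small Python sketch like the one below can search an install tree for the missing module and print a matching export line. It assumes the module lives at OrthoMCLEngine/Main/Base.pm relative to the library directory (which is what the error message suggests); the function name and the search root are hypothetical, so adjust them to your setup.

```python
import os

def find_perl_lib_dir(search_root, module_relpath="OrthoMCLEngine/Main/Base.pm"):
    """Return the first directory under search_root that contains module_relpath, or None.

    search_root is a hypothetical location, e.g. wherever OrthoMCL was unpacked.
    """
    parts = module_relpath.split("/")
    for dirpath, _dirnames, _filenames in os.walk(search_root):
        # The directory we want is the one from which the module path resolves
        if os.path.isfile(os.path.join(dirpath, *parts)):
            return dirpath
    return None

def export_line(lib_dir):
    """Build the shell line that puts lib_dir on Perl's module search path."""
    return "export PERL5LIB=" + os.path.abspath(lib_dir)

# Example (the path is an assumption, not taken from the thread):
# lib_dir = find_perl_lib_dir("/usr/local/Tools/orthomcl")
# if lib_dir is not None:
#     print(export_line(lib_dir))
```

Run the printed export line in the same shell session before starting the pipeline again.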

      One suggestion, though: have you tried OrthoFinder? It's far easier to run, since it requires just a single command. It's also a lot more accurate than OrthoMCL:
      davidemms/OrthoFinder on GitHub (Phylogenetic orthology inference for comparative genomics)


      David


      • #4
        Thank you very much for your suggestions. Yes, I have tried OrthoFinder and it gave me outputs. I wanted to run OrthoMCL for comparison, but maybe that's not necessary now. Do you have any suggestions for how I can process the outputs to get a gene presence/absence matrix?

        Thank you again.

        Best Regards
        Zillur


        • #5
          The file Orthogroups.csv is effectively a presence/absence matrix: the rows are orthogroups and the columns are species, so if any genes are listed in the (i, j)-th cell, then the i-th orthogroup is present in the j-th species.
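To make that concrete, here is a minimal Python 3 sketch that reads such a table and answers "is orthogroup X present in species Y?". It assumes the file is tab-delimited with an empty first header cell; the helper name and the tiny example table (species and gene names) are made up for illustration.

```python
import csv
import io

def load_presence(tsv_text):
    """Map each orthogroup ID to the set of species that have at least one gene in it."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    species = next(reader)[1:]  # header row: first cell is empty, the rest are species
    presence = {}
    for row in reader:
        # A non-empty cell means the orthogroup is present in that species
        presence[row[0]] = {sp for sp, cell in zip(species, row[1:]) if cell.strip()}
    return presence

# Hypothetical example data, not real OrthoFinder output
example = (
    "\tspeciesA\tspeciesB\tspeciesC\n"
    "OG0000000\tgeneA1, geneA2\t\tgeneC1\n"
    "OG0000001\t\tgeneB1\t\n"
)
presence = load_presence(example)
print("speciesA" in presence["OG0000000"])  # True
print("speciesB" in presence["OG0000000"])  # False
```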

          All the best
          David


          • #6
            Thank you very much for your comment. I want a matrix like:

            Code:
                     genome1  genome2  genome3
            gene1    1        0        0
            gene2    0        0        0
            gene3    1        1        1
            gene4    0        0        1
            How can I do this?

            Best Regards
            Zillur


            • #7
              You'd just need to replace empty cells with 0 and cells that contain text with 1.

              All the best
              David


              • #8
                Thank you very much for your reply.
                "You'd just need to replace empty cells with 0 and cells with text in with 1."
                That is exactly what I want to do. But how can I replace them?

                Thanks for your suggestions.
                Best Regards
                Zillur


                • #9
                  Here is a Python script that will do it for you:


                  Code:
                  import sys
                  import csv
                  
                  if len(sys.argv) != 2:
                      print("Usage: python presence_absence.py Orthogroups.csv")
                      sys.exit()
                  
                  inFN = sys.argv[1]
                  outFN = inFN + ".01_matrix.csv"
                  with open(inFN, 'rb') as infile, open(outFN, 'wb') as outfile:
                      reader = csv.reader(infile, delimiter="\t")
                      writer = csv.writer(outfile, delimiter="\t")
                      writer.writerow(reader.next())
                      for line in reader:
                          writer.writerow(line[:1] + [0 if "" == cell else 1 for cell in line[1:]])

                  All the best
                  David


                  • #10
                    Thank you very much for your script. I tried to run it, but got this error:
                    Code:
                    [zillur@genomics Results_Sep26]$ python matrix_convert_binary.py Orthogroups.csv
                    Traceback (most recent call last):
                      File "matrix_convert_binary.py", line 14, in <module>
                        writer.writerow(reader.next())
                    AttributeError: '_csv.reader' object has no attribute 'next'
                    My system is:
                    Code:
                    [zillur@genomics Results_Sep26]$ python
                    Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06) 
                    [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
                    Type "help", "copyright", "credits" or "license" for more information.
                    I am not sure what I need to modify. Any idea?
                    Thanks again.

                    Best Regards
                    Zillur


                    • #11
                      It was written for Python 2. Below is a version that will work with both Python 2 and 3:

                      Code:
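                      import sys
                      import csv
                      
                      # Expect exactly one argument: the tab-delimited Orthogroups.csv file
                      if len(sys.argv) != 2:
                          print("Usage: python presence_absence.py Orthogroups.csv")
                          sys.exit()
                      
                      inFN = sys.argv[1]
                      # The result is written next to the input file
                      outFN = inFN + ".01_matrix.csv"
                      with open(inFN, 'r') as infile, open(outFN, 'w') as outfile:
                          reader = csv.reader(infile, delimiter="\t")
                          writer = csv.writer(outfile, delimiter="\t")
                          # Copy the header row (species names) unchanged
                          writer.writerow(next(reader))
                          for line in reader:
                              # Keep the orthogroup ID; empty cell -> 0, non-empty cell -> 1
                              writer.writerow(line[:1] + [0 if "" == cell else 1 for cell in line[1:]])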
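If you only need Python 3, a possible variant is to wrap the same logic in a function and open both files with newline="", which is what the csv module documentation recommends. This is a sketch rather than a replacement for the script above: the function name is made up, and forcing "\n" line endings is my own choice, not something OrthoFinder or R requires.

```python
import csv

def write_presence_absence(in_path, out_path):
    """Write a 0/1 copy of a tab-delimited orthogroup table; return the number of data rows."""
    n_rows = 0
    # newline="" lets the csv module handle line endings itself (Python 3 only)
    with open(in_path, newline="") as infile, open(out_path, "w", newline="") as outfile:
        reader = csv.reader(infile, delimiter="\t")
        writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
        writer.writerow(next(reader))  # copy the header row unchanged
        for row in reader:
            # Keep the orthogroup ID; empty cell -> 0, non-empty cell -> 1
            writer.writerow(row[:1] + [0 if cell == "" else 1 for cell in row[1:]])
            n_rows += 1
    return n_rows

# Example call (file names as used in this thread):
# write_presence_absence("Orthogroups.csv", "Orthogroups.csv.01_matrix.csv")
```

Returning the row count gives a quick sanity check that every orthogroup made it into the output.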


                      • #12
                        Thank you very much for your valuable suggestions. The code converted the matrix into a binary matrix perfectly. The problem is that I can't load the new CSV file into R. The file itself looks like this:
                        Code:
                        [zillur@genomics Results_Sep26]$ head Orthogroups.csv.01_matrix.csv 
                        	PlasmoDB-28_PbergheiANKA_AnnotatedProteins.fasta	PlasmoDB-28_Pchabaudichabaudi_AnnotatedProteins.fasta	PlasmoDB-28_Pfalciparum3D7_AnnotatedProteins.fasta	PlasmoDB-28_Pgallinaceum8A_AnnotatedProteins.fasta	PlasmoDB-28_PknowlesiH_AnnotatedProteins.fasta	PlasmoDB-28_PvivaxSal1_AnnotatedProteins.fasta	PlasmoDB-28_PyoeliiyoeliiYM_AnnotatedProteins.fasta
                        OG0000000	1	1	0	0	0	0	1
                        OG0000001	1	1	1	0	1	1	1
                        OG0000002	0	0	0	0	0	1	0
                        OG0000003	0	0	0	0	1	1	0
                        OG0000004	1	1	0	0	0	0	1
                        OG0000005	0	0	0	0	1	0	0
                        OG0000006	1	1	0	0	0	0	1
                        OG0000007	1	1	1	0	1	1	1
                        OG0000008	0	0	1	0	0	0	0

                        But when I load the csv in R, it looks like:

                        Code:
                        > data = read.csv("Orthogroups.csv.01_matrix.csv", sep=",")
                        > head(data)
                          PlasmoDB.28_PbergheiANKA_AnnotatedProteins.fasta.PlasmoDB.28_Pchabaudichabaudi_AnnotatedProteins.fasta.PlasmoDB.28_Pfalciparum3D7_AnnotatedProteins.fasta.PlasmoDB.28_Pgallinaceum8A_AnnotatedProteins.fasta.PlasmoDB.28_PknowlesiH_AnnotatedProteins.fast ...
                        1                                                                                                                                                                                                                                 OG0000000\t1\t1\t0\t0\t0\t0\t1
                        2                                                                                                                                                                                                                                 OG0000001\t1\t1\t1\t0\t1\t1\t1
                        3                                                                                                                                                                                                                                 OG0000002\t0\t0\t0\t0\t0\t1\t0
                        4                                                                                                                                                                                                                                 OG0000003\t0\t0\t0\t0\t1\t1\t0
                        5                                                                                                                                                                                                                                 OG0000004\t1\t1\t0\t0\t0\t0\t1
                        6                                                                                                                                                                                                                                 OG0000005\t0\t0\t0\t0\t1\t0\t0
                        What should I do now?
                        Thanks again for your help and comment.

                        Best Regards
                        Zillur


                        • #13
                          It's a tab-delimited file, so try this instead:
                          data = read.csv("Orthogroups.csv.01_matrix.csv", sep="\t")
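The same effect is easy to reproduce outside R. The Python 3 sketch below parses one line of the matrix twice, once with a comma and once with a tab as the separator; with the wrong separator the whole line ends up in a single field, which is what the R output above shows.

```python
import csv
import io

# One data line copied from the 0/1 matrix in this thread
line = "OG0000000\t1\t1\t0\t0\t0\t0\t1\n"

# Wrong separator: no commas in the line, so everything stays in one field
as_comma = next(csv.reader(io.StringIO(line), delimiter=","))
# Right separator: the orthogroup ID plus seven 0/1 columns
as_tab = next(csv.reader(io.StringIO(line), delimiter="\t"))

print(len(as_comma))  # 1
print(len(as_tab))    # 8
```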


                          • #14
                            Thank you very much. Got it.

                            Best Regards
                            Zillur
