Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Orthomcl running problem

    Hi there,
    I was trying to run orthomcl in my linux workstation. I am facing this problem:
    Code:
    [root@genomics bin]# ./orthomcl-pipeline -i /home/zillur/Desktop/zillur/phd/orthomcl -o /home/zillur/Desktop/zillur/phd/orthomcl/output -m /usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example --nocompliant
    Warning: directory "/home/zillur/Desktop/zillur/phd/orthomcl/output" already exists, are you sure you want to store data here [Y]? y
    Starting OrthoMCL pipeline on: Mon Sep 26 20:11:08 2016
    Git commit: unknown
    
    =Stage 1: Validate Files =
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PbergheiANKA_AnnotatedProteins.fasta ... 5076 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pchabaudichabaudi_AnnotatedProteins.fasta ... 5217 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pfalciparum3D7_AnnotatedProteins.fasta ... 5542 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_Pgallinaceum8A_AnnotatedProteins.fasta ... 3 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PknowlesiH_AnnotatedProteins.fasta ... 5323 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PvivaxSal1_AnnotatedProteins.fasta ... 5586 sequences
    Odd number of elements in hash assignment at /root/perl5/lib/perl5/Bio/SeqIO.pm line 378.
    Validating PlasmoDB-28_PyoeliiyoeliiYM_AnnotatedProteins.fasta ... 5709 sequences
    Validated 7 files
    Stage 1 took 0.02 minutes 
    
    =Stage 2: Validate Database=
    Stage 2 took 0.00 minutes 
    
    
    =Stage 3: Load OrthoMCL Database Schema=
    /usr/bin/orthomclInstallSchema "/usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example" "/home/zillur/Desktop/zillur/phd/orthomcl/output/log/orthomclSchema.log" 1>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log 2>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log
    Error executing command: /usr/bin/orthomclInstallSchema "/usr/local/Tools/orthomcl/orthomcl-pipeline-master/etc/orthomcl.config.example" "/home/zillur/Desktop/zillur/phd/orthomcl/output/log/orthomclSchema.log" 1>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log 2>/home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log. See logs /home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stdout.log and /home/zillur/Desktop/zillur/phd/orthomcl/output/log/3.loadschema.stderr.log
    The log is as follows:
    Code:
    [zillur@genomics log]$ more 3.loadschema.stderr.log 
    Can't locate OrthoMCLEngine/Main/Base.pm in @INC (@INC contains: /usr/bin/../lib/perl /root/perl5/lib/perl5/x86_64-linux-thread-multi /root/perl5/lib/perl5 /h
    ome/zillur/perl5/lib/perl5/x86_64-linux-thread-multi /home/zillur/perl5/lib/perl5 /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /
    usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/bin/orthomclInstallSchema line 6.
    BEGIN failed--compilation aborted at /usr/bin/orthomclInstallSchema line 6.
    Any suggestions please.

    Best Regards
    Zillur

  • #2
    Is there anybody has any idea? Please. I appreciate your helps.

    Best Regards
    Zillur

    Comment


    • #3
      Hi

      It looks like you need to set your PERL5LIB environment variable so that it points to where your orthomcl perl files are. Something like this:
      export PERL5LIB=/path/to/orthomcl.

      One suggestion though, have you tried OrthoFinder? It's far easier to run, it just requires a single command. It's also a lot more accurate than OrthoMCL:
      Phylogenetic orthology inference for comparative genomics - davidemms/OrthoFinder


      David

      Comment


      • #4
        Thank you very much for your suggestions. Yeah. I have tried orthofinder and it gave me outputs. I wanted to run orthomcl to compare, maybe its not necessary now. Do you have any suggestions how can I process the outputs to get a gene presence/absence matrix?

        Thank you again.

        Best Regards
        Zillur

        Comment


        • #5
          The file Orthogroups.csv is effectively a presence/absence matrix: The rows are orthogroups and the columns are species so if there are any genes listed in the i,j-th cell then the ith orthogroup is present in the jth species.

          All the best
          David

          Comment


          • #6
            Thank you very much for your comment. I want a matrix like:

            Code:
                          genome1	genome2 genome3
            gene1  	 1     	 0     	 0
            gene2  	 0     	 0     	 0
            gene3  	 1     	 1     	 1
            gene4  	 0     	 0     	 1
            How can I do this?

            Best Regards
            Zillur

            Comment


            • #7
              You'd just need to replace empty cells with 0 and cells with text in with 1.

              All the best
              David

              Comment


              • #8
                Thank you very much for your reply.
                You'd just need to replace empty cells with 0 and cells with text in with 1.
                Exactly I want to do this. But how can replace this?

                Thanks for your suggestions.
                Best Regards
                Zillur

                Comment


                • #9
                  This is a python script that will do it for you:


                  Code:
                  import sys
                  import csv
                  
                  if len(sys.argv) != 2:
                      print("Usage: python presence_absence.py Orthogroups.csv")
                      sys.exit()
                  
                  inFN = sys.argv[1]
                  outFN = inFN + ".01_matrix.csv"
                  with open(inFN, 'rb') as infile, open(outFN, 'wb') as outfile:
                      reader = csv.reader(infile, delimiter="\t")
                      writer = csv.writer(outfile, delimiter="\t")
                      writer.writerow(reader.next())
                      for line in reader:
                          writer.writerow(line[:1] + [0 if "" == cell else 1 for cell in line[1:]])

                  All the best
                  David

                  Comment


                  • #10
                    Thank you very much for your script. I was trying to run, but:
                    Code:
                    [zillur@genomics Results_Sep26]$ python matrix_convert_binary.py Orthogroups.csv
                    Traceback (most recent call last):
                      File "matrix_convert_binary.py", line 14, in <module>
                        writer.writerow(reader.next())
                    AttributeError: '_csv.reader' object has no attribute 'next'
                    My system is:
                    Code:
                    [zillur@genomics Results_Sep26]$ python
                    Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:53:06) 
                    [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
                    Type "help", "copyright", "credits" or "license" for more information.
                    I am not sure what I need to modify. Any idea?
                    Thanks again.

                    Best Regards
                    Zillur

                    Comment


                    • #11
                      It was written for python 2, below is a version which will work with both python 2 and 3:

                      Code:
                      import sys
                      import csv
                      
                      if len(sys.argv) != 2:
                          print("Usage: python presence_absence.py Orthogroups.csv")
                          sys.exit()
                      
                      inFN = sys.argv[1]
                      outFN = inFN + ".01_matrix.csv"
                      with open(inFN, 'r') as infile, open(outFN, 'w') as outfile:
                          reader = csv.reader(infile, delimiter="\t")
                          writer = csv.writer(outfile, delimiter="\t")
                          writer.writerow(next(reader))
                          for line in reader:
                              writer.writerow(line[:1] + [0 if "" == cell else 1 for cell in line[1:]])

                      Comment


                      • #12
                        Thank you very much for your valuable suggestions. The code perfectly converted the matrix into a binary matrix. But the problem is I can't load the new csv file in R as it is:
                        Code:
                        [zillur@genomics Results_Sep26]$ head Orthogroups.csv.01_matrix.csv 
                        	PlasmoDB-28_PbergheiANKA_AnnotatedProteins.fasta	PlasmoDB-28_Pchabaudichabaudi_AnnotatedProteins.fasta	PlasmoDB-28_Pfalciparum3D7_AnnotatedProteins.fasta	PlasmoDB-28_Pgallinaceum8A_AnnotatedProteins.fasta	PlasmoDB-28_PknowlesiH_AnnotatedProteins.fasta	PlasmoDB-28_PvivaxSal1_AnnotatedProteins.fasta	PlasmoDB-28_PyoeliiyoeliiYM_AnnotatedProteins.fasta
                        OG0000000	1	1	0	0	0	0	1
                        OG0000001	1	1	1	0	1	1	1
                        OG0000002	0	0	0	0	0	1	0
                        OG0000003	0	0	0	0	1	1	0
                        OG0000004	1	1	0	0	0	0	1
                        OG0000005	0	0	0	0	1	0	0
                        OG0000006	1	1	0	0	0	0	1
                        OG0000007	1	1	1	0	1	1	1
                        OG0000008	0	0	1	0	0	0	0

                        But when I load the csv in R, it looks like:

                        Code:
                        > data = read.csv("Orthogroups.csv.01_matrix.csv", sep=",")
                        > head(data)
                          PlasmoDB.28_PbergheiANKA_AnnotatedProteins.fasta.PlasmoDB.28_Pchabaudichabaudi_AnnotatedProteins.fasta.PlasmoDB.28_Pfalciparum3D7_AnnotatedProteins.fasta.PlasmoDB.28_Pgallinaceum8A_AnnotatedProteins.fasta.PlasmoDB.28_PknowlesiH_AnnotatedProteins.fast ...
                        1                                                                                                                                                                                                                                 OG0000000\t1\t1\t0\t0\t0\t0\t1
                        2                                                                                                                                                                                                                                 OG0000001\t1\t1\t1\t0\t1\t1\t1
                        3                                                                                                                                                                                                                                 OG0000002\t0\t0\t0\t0\t0\t1\t0
                        4                                                                                                                                                                                                                                 OG0000003\t0\t0\t0\t0\t1\t1\t0
                        5                                                                                                                                                                                                                                 OG0000004\t1\t1\t0\t0\t0\t0\t1
                        6                                                                                                                                                                                                                                 OG0000005\t0\t0\t0\t0\t1\t0\t0
                        What should I do now?
                        Thanks again for your help and comment.

                        Best Regards
                        Zillur

                        Comment


                        • #13
                          It's a tab-delimited file, try this instead:
                          data = read.csv("Orthogroups.csv.01_matrix.csv", sep="\t")

                          Comment


                          • #14
                            Thank you very much. Got it.

                            Best Regards
                            Zillur

                            Comment

                            Latest Articles

                            Collapse

                            • seqadmin
                              Recent Advances in Sequencing Analysis Tools
                              by seqadmin


                              The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                              05-06-2024, 07:48 AM
                            • seqadmin
                              Essential Discoveries and Tools in Epitranscriptomics
                              by seqadmin




                              The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                              04-22-2024, 07:01 AM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by seqadmin, 05-14-2024, 07:03 AM
                            0 responses
                            26 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 05-10-2024, 06:35 AM
                            0 responses
                            45 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 05-09-2024, 02:46 PM
                            0 responses
                            59 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 05-07-2024, 06:57 AM
                            0 responses
                            46 views
                            0 likes
                            Last Post seqadmin  
                            Working...
                            X