  • reema
    Member
    • Feb 2014
    • 27

    CEGMA - FATAL ERROR when running local map 6400

    Hello Everyone,

    I am trying to get the CEGMA score for our new transcriptome assembly, but when I run CEGMA on my assembled transcripts I get the following errors:

    "RUNNING: local_map -n local -f -h /sw/opt/CEGMA_v2.5/data/hmm_profiles -i KOG genome.chunks.fa 2>output.cegma.errors
    FATAL ERROR when running local map 6400: "No such file or directory"

    AND

    "genewise: error while loading shared libraries: libglib-1.2.so.0: cannot open shared object file: No such file or directory
    Can't run genewise -splice_gtag -quiet -gff -pretty -alb -hmmer /sw/opt/CEGMA_v2.5/data/hmm_profiles/KOG0002.hmm genomic12139.fa >genewise12139 "

    I am running it after adding every dependency to the path. libglib-1.2.so.0 is also on the path, and I even tried running with libglib-2.0.so.0 [as mentioned in https://gist.github.com/robsyme/1153173]. Here are the commands I use to set the paths:

    source .bashrc
    export PATH=$PATH:/sw/opt/geneid/bin/
    export PATH=$PATH:/sw/opt/blast+/bin/
    export CEGMA=/sw/opt/CEGMA_v2.5
    export PERL5LIB=/sw/opt/CEGMA_v2.5/lib/
    export PATH=$PATH:/sw/opt/wise2.4.1/src/bin/
    export WISECONFIGDIR=/sw/opt/wise2.4.1/wisecfg
    export LD_LIBRARY_PATH=/usr/lib64:${LD_LIBRARY_PATH}
    export PATH=$PATH:/sw/opt/CEGMA_v2.5/bin/

    I have tried many times with small changes, but still no luck. Has anyone come across this kind of problem? Any suggestions would be very helpful.

    Many Thanks,
    Reema Singh
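The genewise message quoted above comes from the dynamic loader, which cannot find libglib-1.2.so.0 at run time. One way to see which libraries a binary fails to resolve is to filter the output of `ldd`. The following is a minimal sketch, not part of CEGMA: the function names `report_missing_libs` and `check_binary` are made up for illustration, and it assumes a Linux system where `ldd` marks unresolved libraries with "not found".

```shell
#!/bin/sh
# report_missing_libs (hypothetical name): reads `ldd` output on stdin and
# prints the name of every library the loader reports as "not found".
report_missing_libs() {
    awk '/not found/ { print $1 }'
}

# check_binary (hypothetical name): looks a program up on PATH and passes
# its `ldd` output through the filter above.
check_binary() {
    bin_path=$(command -v "$1") || { echo "$1: not on PATH" >&2; return 1; }
    ldd "$bin_path" | report_missing_libs
}

# Example call, assuming genewise is on PATH:
#   check_binary genewise
```

On a cluster it may be worth running the same check inside a submitted job as well as on the login node, since the execution host can have a different set of installed libraries and may not inherit the same LD_LIBRARY_PATH.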
  • peromhc
    Senior Member
    • Sep 2009
    • 108

    #2
    I made an amazon machine image for CEGMA - search CEGMA on AWS and you should find it. It is ami-18935a70. It's preconfigured, just upload your data and run.

    • kbradnam
      Member
      • May 2011
      • 54

      #3
      Genewise is usually the problem step in most errant runs of CEGMA. As well as the Amazon instance that peromhc mentioned there are many other ways of running CEGMA where you don't have to install it yourself (including a VM). Check the CEGMA FAQ:

      • reema
        Member
        • Feb 2014
        • 27

        #4
        Hello peromhc and kbradnam,

        Thank you very much for your replies. Sorry for the late response; I wanted to run CEGMA on my assembly before replying. Here is the update:

        1) CEGMA now works fine on our cluster. The problem was a missing library on the execution host, which our IT people have since fixed.

        2) I would like to ask one more question: what is the best CEGMA score? I looked at http://korflab.ucdavis.edu/Datasets/...faq.html#link4 but couldn't relate it to our results. Our assemblies contain:

        a) Assembly 1 = 202(complete) and 229(partial)
        b) Assembly 2 = 226(complete) and 239(partial)

        Do we have a good score?

        Any explanation/suggestion would be very helpful.

        Thanks,
        Reema,

        • kbradnam
          Member
          • May 2011
          • 54

          #5
          202 is better than 201 but not as good as 203. It's all relative. CEGMA is most useful in this regard only if you have made multiple assemblies from the same input data. This allows you to assess the relative performance of different assemblers and/or assembly parameters.

          I have previously reported on variation in many different runs of CEGMA: http://figshare.com/articles/Variati...etrics/1011961

          • reema
            Member
            • Feb 2014
            • 27

            #6
            Hello kbradnam

            Thanks for sharing the link. I have one more quick question.

            The partial score is higher than the complete score for both assemblies (generated from two different samples). As far as I understand from http://korflab.ucdavis.edu/Datasets/cegma/, the partial count will be higher because it also includes the complete set. So my question is: if the partial score is higher than the complete score, does this indicate that the assembly is fragmented?
            Also, should the partial score be lower than the complete score in an ideal situation?

            Thanks,
            Reema,
            Last edited by reema; 09-12-2014, 06:52 AM.

            • kbradnam
              Member
              • May 2011
              • 54

              #7
              Originally posted by reema:
              If partial score is higher than complete score than is this indicates that assembly is fragmented?
              Also should partial score lower than complete score in ideal situation?
              Remember, these are not scores per se. 'Complete' and 'partial' refer to the number of full-length, or full-length *and* partial length core genes detected by the CEGMA pipeline.

              Our ideal (fantasy) result, for the purpose of qualifying the completeness of the gene space, is to have 248 complete proteins present. This would also give a partial figure of 248, as this category really counts complete + partial matches together.

              Note that even if CEGMA says something is 'complete' there is still the possibility that parts of the protein are missing. You have to decide on some artificial cut-off, as expecting 100% of the sequence to be present is a) unrealistic and b) not possible because you may not know what 100% means in a newly sequenced species (e.g. in that species there may have been a 3 bp insertion leading to 1 extra amino acid).


              So from CEGMA's point of view, 'complete' means about 70% present (I say 'about' because this is based on alignments to 6 different profile HMMs, which may each vary in length).

              What if you don't have 248 core genes 'completely' present? The next thing is to look at the partial results: how close to 248 are they? If you have 200 (complete) and 240 (complete + partial), then this at least suggests that most of the core gene set is present in your assembly, but some genes may be split across contigs or missing from the assembly. Remember, CEGMA only looks for genes that are inside individual contigs or scaffolds. You could have an assembly that splits every gene across contigs, which might lead to a 'complete' result of zero and a partial result of 248.
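The arithmetic behind that reading can be sketched in a few lines of shell. This is only an illustration, not something CEGMA provides: the function name `cegma_breakdown` is made up, and it assumes the 248 core genes mentioned above as the denominator. It splits the two reported figures into genes found as complete, genes found only as partial matches, and genes not detected at all.

```shell
#!/bin/sh
# cegma_breakdown COMPLETE PARTIAL (hypothetical helper)
# COMPLETE = number of core genes reported as complete
# PARTIAL  = number reported as partial (which already includes the complete ones)
cegma_breakdown() {
    awk -v c="$1" -v p="$2" -v total=248 'BEGIN {
        printf "complete: %d (%.1f%%)\n", c, 100 * c / total
        printf "partial only: %d (%.1f%%)\n", p - c, 100 * (p - c) / total
        printf "not detected: %d (%.1f%%)\n", total - p, 100 * (total - p) / total
    }'
}

# The 200 / 240 example from the paragraph above.
cegma_breakdown 200 240
```

For the 200/240 example this gives 80.6% complete, 16.1% found only as partial matches, and 3.2% not detected. A large "partial only" share is the figure that hints at genes being split across contigs.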

              From looking at the results of many different runs of CEGMA, it is common to see something like 90–95% of core genes present in the 'complete' category, and another 1–5% present as partial genes.

              On their own, the 'complete' and 'partial' figures are not that useful. But when you compare results from multiple genome assemblies (all using the same input data), then you might be able to say something about the differences.

              Update: just looking through some old CEGMA results, I also found one case where the results were 157/223. This is more unusual, suggesting that a relatively large number (27%) of the core genes were present as fragments. This might simply reflect lots of short contigs/scaffolds in the assembly. In contrast to this, one of the best results that I have seen is 245/248. It is rare to see all core genes present, even when you allow for partial matches.
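To put several results side by side, the same counts can be turned into one row per assembly. The sketch below is illustrative only: `cegma_compare` and the labels `fragmented` and `best_seen` are invented names, and 248 is again assumed as the total. Each input line holds a label, the complete count, and the partial count; each output line holds the label, the percentage complete, and the percentage present only as fragments, separated by tabs.

```shell
#!/bin/sh
# cegma_compare (hypothetical helper): reads "label complete partial" lines
# on stdin and prints "label <tab> %complete <tab> %fragments-only".
cegma_compare() {
    awk -v total=248 '
        NF == 3 {
            printf "%s\t%.1f\t%.1f\n", $1, 100 * $2 / total, 100 * ($3 - $2) / total
        }'
}

# The two results mentioned in the update above: 157/223 and 245/248.
printf '%s\n' 'fragmented 157 223' 'best_seen 245 248' | cegma_compare
```

For 157/223 the fragment share works out as (223 - 157) / 248, about 26.6%, which matches the roughly 27% quoted above; for 245/248 it is about 1.2%. As noted earlier in the thread, such a comparison says most when the assemblies were built from the same input data.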
              Last edited by kbradnam; 09-12-2014, 03:32 PM.

              • reema
                Member
                • Feb 2014
                • 27

                #8
                Thanks, kbradnam, for explaining this so clearly and nicely. I understand now.

                Many Thanks,
                Reema Singh
