  • CEGMA with custom set of proteins

    Hi there,

    I am trying to use CEGMA with a custom set of alignments that I maintain instead of the default KOGs. When running in custom mode, one has to provide HMM profiles created with HMMER and "choose a cutoff for each profile". Example from the CEGMA readme:

    cegma --genome sample.dna --prot_num 4 --protein ORTH.fa \
    --hmm_prefix ORTH --hmm_profiles hmm_profiles/ \
    --cutoff_file profiles_cutoff.tbl

    I have 2 questions:

    1. Does anyone know what "the cutoff for the HMMER alignments" refers to, and how the default profiles_cutoff.tbl file was generated?

    2. Do I understand correctly that all alignments need to have the same number of sequences in order to satisfy the --prot_num argument?

    Any tips would be greatly appreciated, many many thanks.

    Fabien

  • #2
    I'd like to revive this thread because it was never answered...and I have the same questions ;>

    • #3
      I'll revive this again since neither of you got any answers. I came across this thread while looking for help with a different CEGMA question.

      The cutoff values are calculated as described in their paper. For each gene, they took its alignment (of 6 taxa) and created every sub-alignment of n-1 taxa (5 taxa here). They built an HMM from each n-1 alignment, then ran hmmsearch with that HMM against the sequence of the taxon that was left out. They averaged the scores of those searches and divided the average by 2 to get the cutoff value.
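
A minimal bash sketch of the two mechanical parts of that procedure, assuming each gene alignment is an aligned FASTA file. The function names and output file names are hypothetical (not part of CEGMA), and the HMMER 2.3.2 build/calibrate/search steps in between are only outlined in comments, since the exact input formats and output parsing should be checked against the HMMER 2 user guide.

```shell
# Hypothetical helpers sketching the leave-one-out cutoff procedure.
# Assumption: each gene alignment is an aligned FASTA file.

# split_leave_one_out ALIGNMENT OUTDIR
# For an alignment of n sequences, writes for each i = 1..n:
#   OUTDIR/minus_<i>.fa    the alignment with sequence i removed
#   OUTDIR/heldout_<i>.fa  sequence i alone, with gap characters stripped
split_leave_one_out() {
  local aln=$1 outdir=$2
  mkdir -p "$outdir"
  awk -v out="$outdir" '
    /^>/ { n++; rec[n] = $0 "\n"; raw[n] = $0 "\n"; next }
    {
      rec[n] = rec[n] $0 "\n"          # aligned record, kept as-is
      s = $0; gsub(/[-.]/, "", s)      # ungapped copy for searching
      raw[n] = raw[n] s "\n"
    }
    END {
      for (i = 1; i <= n; i++) {
        minus = out "/minus_" i ".fa"
        for (j = 1; j <= n; j++) if (j != i) printf "%s", rec[j] > minus
        close(minus)
        held = out "/heldout_" i ".fa"
        printf "%s", raw[i] > held
        close(held)
      }
    }' "$aln"
}

# Between the two steps, for each i (HMMER 2.3.2; check its user guide
# for the alignment formats hmmbuild accepts):
#   hmmbuild minus_<i>.hmm minus_<i>.fa
#   hmmcalibrate minus_<i>.hmm
#   hmmsearch minus_<i>.hmm heldout_<i>.fa   -> note the hit's score

# cutoff_from_scores
# Reads one score per line on stdin and prints (mean score) / 2.
cutoff_from_scores() {
  awk '{ sum += $1; n++ } END { if (n == 0) exit 1; printf "%.2f\n", sum / n / 2 }'
}
```

For example, `split_leave_one_out gene1.fa loo/` produces the n sub-alignments and held-out sequences; piping the n resulting scores into `cutoff_from_scores` gives one cutoff for that gene.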

      I created a shell script to generate these scores, if anyone is interested.

      The --prot_num argument does mean that all the alignments in a run need to have the same number of sequences. I wrote a shell script to bin the genes by their number of sequences; another script then runs cegma once per bin with the matching --prot_num.
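
A sketch of the binning step, assuming one FASTA file per gene with a `.fa` extension. The function name and the `prot_num_<k>` directory layout are hypothetical; the number after `prot_num_` is the value you would pass to --prot_num when running cegma on that bin.

```shell
# bin_by_prot_num SRC_DIR DEST_DIR
# Copies each FASTA file in SRC_DIR into DEST_DIR/prot_num_<k>/, where k is
# the number of '>' header lines (sequences) in that file.
bin_by_prot_num() {
  local src=$1 dest=$2 f k
  for f in "$src"/*.fa; do
    [ -e "$f" ] || continue            # glob matched nothing
    k=$(grep -c '^>' "$f")             # number of sequences in this gene
    mkdir -p "$dest/prot_num_$k"
    cp "$f" "$dest/prot_num_$k/"
  done
}
```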

      Another issue that comes up is that the HMMs in the --hmm_profiles directory need to be built with HMMER 2.3.2 and calibrated with hmmcalibrate. If you use an HMM built with HMMER 3, you get an error.
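
A quick pre-flight check can catch this before CEGMA does. This sketch relies on a HMMER 2 profile's first line starting with `HMMER2.0` (HMMER 3 files start with `HMMER3`), and it assumes that hmmcalibrate records its fit on a line starting with `EVD`; treat that second point as an assumption to confirm against the HMMER 2.3.2 user guide. The function name is hypothetical.

```shell
# check_hmm2_calibrated FILE
# Returns 0 if FILE looks like a calibrated HMMER 2 profile, else prints
# a reason to stderr and returns 1.
check_hmm2_calibrated() {
  local f=$1 first
  first=$(head -n 1 "$f")
  case $first in
    HMMER2.0*) ;;                                   # HMMER 2 save file
    HMMER3*) echo "$f: HMMER 3 format; rebuild with HMMER 2.3.2" >&2; return 1 ;;
    *) echo "$f: not a recognised HMMER profile" >&2; return 1 ;;
  esac
  # Assumption: calibrated HMMER 2 profiles contain a line starting with EVD.
  if ! grep -q '^EVD' "$f"; then
    echo "$f: no EVD line; run hmmcalibrate" >&2
    return 1
  fi
}
```

Usage: `for f in hmm_profiles/*; do check_hmm2_calibrated "$f"; done`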
