CEGMA with custom set of proteins

  • CEGMA with custom set of proteins

    Hi there,

    I am trying to use CEGMA with a custom set of alignments that I maintain, instead of the default KOGs. When running in custom mode, one has to provide HMM profiles created with HMMER and "choose a cutoff for each profile". Example from the CEGMA readme:

    cegma --genome sample.dna --prot_num 4 --protein ORTH.fa \
    --hmm_prefix ORTH --hmm_profiles hmm_profiles/ \
    --cutoff_file profiles_cutoff.tbl

    I have 2 questions:

    1. Does anyone know what "the cutoff for the HMMER alignments" refers to, and how the default profiles_cutoff.tbl file was generated?

    2. Do I understand correctly that all alignments need to have the same number of sequences in order to satisfy the --prot_num argument?

    Any tips would be greatly appreciated, many many thanks.

    Fabien

  • #2
    I'd like to revive this thread because it was never answered...and I have the same questions ;>

    • #3
      I'll revive this again since neither of you got any answers. I came across this thread when looking for help with a different CEGMA question.

      The cutoff values are calculated as described in the CEGMA paper. For each gene, the authors took the alignment (of 6 taxa) and created every sub-alignment of n-1 taxa. They built an HMM from each n-1 alignment, then ran hmmsearch with that HMM against the sequence of the taxon that had been left out. The cutoff value for the gene is the average of those search scores divided by 2.
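
As a rough illustration of the arithmetic described above, here is a minimal Python sketch. It assumes the leave-one-out hmmsearch scores have already been collected; the HMMER steps themselves are not shown. The function names and the example scores are hypothetical and are not part of CEGMA.

```python
from statistics import mean


def leave_one_out(seq_ids):
    """Yield (held_out, remaining) pairs, one per n-1 sub-alignment."""
    for i, held_out in enumerate(seq_ids):
        yield held_out, seq_ids[:i] + seq_ids[i + 1:]


def cutoff_from_scores(scores):
    """Return half the mean of the leave-one-out hmmsearch scores."""
    if not scores:
        raise ValueError("need at least one hmmsearch score")
    return mean(scores) / 2.0


# Hypothetical scores for one gene with 3 taxa (made-up numbers):
# each score comes from searching the held-out sequence with the HMM
# built from the other n-1 sequences.
example_scores = [100.0, 200.0, 300.0]
print(cutoff_from_scores(example_scores))
```

In a real pipeline, each `remaining` list would be written out as a sub-alignment and turned into an HMM, and the score for `held_out` would be parsed from the hmmsearch output.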

      I created a shell script to generate these scores, if anyone is interested.

      The prot_num argument does mean that all the alignments need to have the same number of sequences. I wrote a shell script to bin the genes by their number of sequences, and another script that runs cegma once per bin with the matching --prot_num value.
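
The binning step could look something like the following Python sketch. It is not the shell script mentioned above; it assumes each gene's alignment is available as FASTA text, and the function names are made up for illustration.

```python
from collections import defaultdict


def count_fasta_records(fasta_text):
    """Count sequences by counting FASTA header lines (those starting with '>')."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def bin_by_prot_num(alignments):
    """Group gene names by the number of sequences in their alignment.

    alignments: dict mapping gene name -> FASTA text (assumed format).
    Returns a dict mapping sequence count -> sorted list of gene names.
    """
    bins = defaultdict(list)
    for gene, text in alignments.items():
        bins[count_fasta_records(text)].append(gene)
    return {n: sorted(genes) for n, genes in sorted(bins.items())}


# Toy input: two genes with 2 sequences each, one gene with 1 sequence.
toy = {
    "geneA": ">sp1\nMKV\n>sp2\nMKI\n",
    "geneB": ">sp1\nMAL\n",
    "geneC": ">sp1\nMTT\n>sp3\nMTS\n",
}
for prot_num, genes in bin_by_prot_num(toy).items():
    # Each bin would correspond to one cegma run with --prot_num set to prot_num.
    print(prot_num, genes)
```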

      Another issue that comes up is that the HMMs used in hmm_profiles need to be generated by HMMER 2.3.2 and calibrated using hmmcalibrate. If you use an HMM generated by HMMER 3, you get an error.
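
A quick sanity check on profile files before running cegma might look like the sketch below. It relies on two assumptions about the ASCII HMM save format that are worth verifying against your own files: that the first line starts with "HMMER2" or "HMMER3", and that hmmcalibrate adds a line beginning with "EVD" to HMMER 2 profiles. The function name is hypothetical.

```python
def describe_hmm(text):
    """Return (major_version, calibrated) for the text of an ASCII HMM file.

    Assumptions: the header line starts with 'HMMER2' or 'HMMER3', and a
    calibrated HMMER 2 profile contains a line starting with 'EVD'.
    """
    lines = text.splitlines()
    first = lines[0] if lines else ""
    if first.startswith("HMMER2"):
        major = 2
    elif first.startswith("HMMER3"):
        major = 3
    else:
        major = None
    # Calibration is only checked for HMMER 2 profiles.
    calibrated = major == 2 and any(line.startswith("EVD") for line in lines)
    return major, calibrated


# Toy file contents, not real profiles.
print(describe_hmm("HMMER2.0  [2.3.2]\nNAME  ORTH1\nEVD   -10.0 0.7\n//\n"))
print(describe_hmm("HMMER3/f [3.1b2]\nNAME  ORTH1\n//\n"))
```

A profile that reports major version 3, or version 2 without calibration, would be a candidate for rebuilding with HMMER 2.3.2 and rerunning hmmcalibrate.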
