  • I need help with gene clustering, please.

    I have 46 genomes from different strains of a bacterium, along with their protein sequences and the nucleotide sequences that encode those proteins.

    I am trying to cluster the genes so that I can make a graph in which the x-axis shows genomes 1-46 and the y-axis shows the number of gene clusters.

    How do I go about doing this? I have looked into USEARCH and CD-HIT, but I am very confused about how to use them for this.

  • #2
    Have you looked at Mauve?

    • #3
      I will check it out, thank you.

      I am just very confused about how to go about clustering, or what exactly its purpose is.

      Do I cluster the first genome and then, when I cluster the second genome, is the result cumulative, meaning its clusters are added to those of the genome(s) before it?

      • #4
        What exactly are you trying to analyze? Are these genomes related (closely or not so much)? Are you looking to see if there are re-arrangements (insertions/deletions/inversions/duplications) that can be identified across these genomes?

        • #5
          Originally posted by GenoMax
          What exactly are you trying to analyze? Are these genomes related (closely or not so much)? Are you looking to see if there are re-arrangements (insertions/deletions/inversions/duplications) that can be identified across these genomes?
          I am trying to generate a curve. The x-axis will be genomes 1-46. The y-axis will be the number of gene clusters.

          The genomes are all from the same species, just different strains.

          Later down the road, we will look for foreign genes and then use a metagenomics database to see where they came from.

          • #6
            Can you define what a gene cluster is? How do you quantify what a gene cluster is? What does the curve represent?

            My interpretation is that you want to concatenate all the protein sequences into a single file (and the nucleotide sequences into a separate file, or translate them to their protein equivalents), then run CD-HIT to cluster similar sequences. You will end up with clusters.
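
The concatenation step suggested above could look something like the sketch below. It assumes one protein FASTA file per genome and takes the genome name from the file name; the function name `tag_and_concatenate` and the `genome|protein` ID convention are made up for illustration, not something CD-HIT requires. Tagging each header keeps track of which genome every sequence came from once everything is in one file.

```python
from pathlib import Path
from typing import Iterable


def tag_and_concatenate(fasta_paths: Iterable[str], out_path: str) -> int:
    """Merge FASTA files into out_path, prefixing each record ID with its genome.

    The genome name is the input file name without its extension
    (an assumption of this sketch). Returns the number of records written.
    """
    n_records = 0
    with open(out_path, "w") as out:
        for path in fasta_paths:
            genome = Path(path).stem
            with open(path) as handle:
                for raw in handle:
                    line = raw.rstrip("\n")
                    if not line:
                        continue  # skip blank lines
                    if line.startswith(">"):
                        # ">prot1 desc" becomes ">genome01|prot1 desc"
                        out.write(f">{genome}|{line[1:]}\n")
                        n_records += 1
                    else:
                        out.write(line + "\n")
    return n_records
```

The combined file can then go to CD-HIT in a single run, for example `cd-hit -i all_proteins.faa -o clusters -c 0.9 -d 0`. The file names and the 0.9 identity threshold are placeholders only; `-d 0` asks CD-HIT to keep the sequence ID up to the first space in its `.clstr` output instead of cutting it short.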

            • #7
              Originally posted by bio_boris
              Can you define what a gene cluster is? How do you quantify what a gene cluster is? What does the curve represent?

              My interpretation is you want to concatenate all the protein sequences into a single file, and nucleotide sequences into a different file, or get their protein equivalents. Then run cdhit to cluster similar sequences. You will end up with clusters.
              By gene cluster I mean a group of genes whose nucleotide or protein sequences are closely related.

              For example, we take the protein or nucleotide gene sequences of the 1st genome and cluster the genes. Then the 2nd genome is clustered taking the 1st into consideration. Then the 3rd genome takes the 1st and 2nd into consideration, and so on.

              So on the graph the 1st genome will have about 2,000 gene clusters, and by the 46th one there might be 20,000 clusters.
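
The cumulative count described above does not strictly need one clustering run per genome. As a rough sketch, if all genomes are clustered together once and you know which clusters each genome has a member in, the running total can be computed by walking through the genomes in order. The function name and the toy cluster labels below are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def cumulative_cluster_counts(
    genome_order: Sequence[str],
    clusters_by_genome: Dict[str, Set[str]],
) -> List[int]:
    """Return, for each genome in order, how many distinct clusters
    have been seen in that genome and all genomes before it."""
    seen: Set[str] = set()
    counts: List[int] = []
    for genome in genome_order:
        # Only clusters not seen in earlier genomes increase the total.
        seen |= clusters_by_genome.get(genome, set())
        counts.append(len(seen))
    return counts


# Toy example with three genomes (made-up cluster labels).
toy = {
    "g1": {"c1", "c2", "c3"},
    "g2": {"c2", "c3", "c4"},
    "g3": {"c1", "c4", "c5", "c6"},
}
print(cumulative_cluster_counts(["g1", "g2", "g3"], toy))  # [3, 4, 6]
```

The resulting list is the y-axis of the curve and the genome order is the x-axis. Note that the shape depends on the order in which genomes are added, while the final value does not.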

              • #8
                So you take a cluster, and iteratively add more clusters? Why not just cluster them all at once?

                So what does the curve represent?

                • #9
                  Originally posted by bio_boris
                  So you take a cluster, and iteratively add more clusters? Why not just cluster them all at once?

                  So what does the curve represent?

                  Yes to your first question.

                  I'm not sure why I'm not supposed to cluster them all at once. I think the count I reach at the last (46th) genome will be the total number of gene clusters.

                  We predict that the curve will rise steeply at first and then taper off. I'm not exactly sure what the curve represents either.
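
If everything is clustered in one CD-HIT run, the total number of clusters and the per-genome membership can be read from the `.clstr` output file. The sketch below is only one way to do it: it assumes sequence IDs were written as `genome|protein` before clustering (a made-up convention for this example) and that IDs were not truncated in the `.clstr` file. The function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Member lines in a .clstr file look like: "0<TAB>120aa, >some_id... *"
_MEMBER = re.compile(r">(\S+?)\.\.\.")


def clusters_by_genome_from_clstr(clstr_text: str) -> Dict[str, Set[str]]:
    """Map each genome to the set of cluster names it has a member in."""
    clusters: Dict[str, Set[str]] = defaultdict(set)
    current = None
    for line in clstr_text.splitlines():
        if line.startswith(">Cluster"):
            current = line[1:].strip()  # e.g. "Cluster 0"
        elif current is not None:
            match = _MEMBER.search(line)
            if match:
                # Assumed ID layout: "genome|protein"
                genome = match.group(1).split("|", 1)[0]
                clusters[genome].add(current)
    return dict(clusters)


example = (
    ">Cluster 0\n"
    "0\t120aa, >genome01|a... *\n"
    "1\t118aa, >genome02|x... at 95.00%\n"
    ">Cluster 1\n"
    "0\t80aa, >genome02|y... *\n"
)
membership = clusters_by_genome_from_clstr(example)
total_clusters = len(set().union(*membership.values()))
print(total_clusters)  # 2
```

The total is the same number the cumulative curve ends on at the last genome, whatever order the genomes are added in.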

                  • #10
                    Bump, please help.
