  • Clustering annotated sequences based on their GO terms

    Dear all,

    I have a set of 10000 sequences from an RNAseq experiment annotated with GO terms, I would like to cluster the sequences in biological meaningful groups using the GO terms information for each sequence. Is there any software to do that?

    Pau

    Thank you!

  • #2
    Hi,
    Out of curiosity, are these (10000) sequences already clustered (de novo clustering), or do they represent only the sequences with GO terms from your assembly?

    I had written down exactly this in my to-do list. I'd bookmarked this page to explore in the future. I have not used it, but it might help:


    The grouping algorithm is based on the hypothesis that similar annotations should have similar gene members.

    HTH



    • #3
      Hi Apexy,

      No, I have not yet clustered the sequences. I have just annotated the sequences obtained from the assembly using Blast2go. Now I would like to go one step further and cluster the sequences based on their GO terms in order to obtain groups of genes involved in similar functions.
      Thanks, I have already had a look at the DAVID website. I think it could be a good option, but the website only accepts 3000 sequences at a time and I would like to cluster all the sequences at the same time. I will keep on searching for alternative websites.

      Thank you for your answer!!

      Pau



      • #4
        You could probably write some small bash script. What kind of separators are you using in your headers? Which field is GO? Are there line-breaks in your sequences?
        savetherhino.org



        • #5
          Hi,
          It's better that you have not done any clustering on them before annotation, since de novo clustering sometimes assigns different transcripts from paralogous genes to the same cluster, and for species with extensive gene duplications it can be a potential nightmare. Are these functional labels from annotation transfer with BLAST, with INTERPRO, or both in Blast2go? Can I also know what fraction of the entire assembly these sequences (10,000) represent, and what database Blast2go was set to if you used BLAST?

          @rhinoceros, a cluster should be defined by the degree of overlap in the GO terms shared by sequences. This will certainly introduce a new challenge: what threshold of shared GO terms is required to put sequences in one cluster? Do you mean using cat, cut, sort and grep in a loop to write a clustering algorithm?

          Thanks,
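
To make the overlap idea concrete, here is a minimal sketch, not a tested tool. It assumes the annotations are already in memory as a dictionary mapping each sequence id to its set of GO ids. The Jaccard index as the overlap measure, the 0.5 threshold, single-linkage grouping, and the names `jaccard` and `cluster_by_go` are all illustrative assumptions, not something prescribed in this thread.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Fraction of GO terms shared between two annotation sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_go(annotations: dict, threshold: float = 0.5) -> list:
    """Single-linkage clustering: link two sequences when their Jaccard
    similarity reaches the threshold, then return connected components."""
    parent = {seq: seq for seq in annotations}

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s1, s2 in combinations(annotations, 2):
        if jaccard(annotations[s1], annotations[s2]) >= threshold:
            parent[find(s1)] = find(s2)

    groups = {}
    for seq in annotations:
        groups.setdefault(find(seq), set()).add(seq)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical toy data: sequence id -> set of GO ids
toy = {
    "seq1": {"GO:0006412", "GO:0003735"},
    "seq2": {"GO:0006412", "GO:0003735", "GO:0005840"},
    "seq3": {"GO:0006810"},
    "seq4": {"GO:0006810", "GO:0016020"},
}
clusters = cluster_by_go(toy, threshold=0.5)
print(clusters)
```

Raising the threshold splits clusters and lowering it merges them, which is exactly the open question about where to draw the line. Comparing all pairs of 10000 sequences means about 50 million comparisons, so a pure-Python loop like this would be slow but should still run locally.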



          • #6
            Originally posted by Apexy
            Hi,
            @rhinoceros, a cluster should be defined by the degree of overlap in the GO terms shared by sequences. This will certainly introduce a new challenge: what threshold of shared GO terms is required to put sequences in one cluster? Do you mean using cat, cut, sort and grep in a loop to write a clustering algorithm?
            I thought the aim was to sort sequences so that in file Z there would be all the sequences that had GO X in their header. It's not really clustering at all but sorting. But anyway, maybe I misunderstood OP.
            Last edited by rhinoceros; 04-29-2013, 02:55 AM.
            savetherhino.org
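
A rough sketch of that sorting idea follows. The header layout (`>seq_id|GO:xxxx,GO:yyyy`), the separators, and the function name `index_fasta_by_go` are hypothetical, since the actual header format was never stated in this thread.

```python
from collections import defaultdict


def index_fasta_by_go(lines, sep="|", go_sep=","):
    """Map each GO id to the list of sequence ids whose header carries it.

    Assumed (hypothetical) header layout: >seq_id|GO:xxxx,GO:yyyy
    """
    index = defaultdict(list)
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence lines are ignored, only headers matter
        fields = line[1:].strip().split(sep)
        seq_id = fields[0]
        go_field = fields[1] if len(fields) > 1 else ""
        for go in filter(None, go_field.split(go_sep)):
            index[go].append(seq_id)
    return dict(index)


# Hypothetical FASTA lines used only for illustration
fasta = [
    ">seq1|GO:0006412,GO:0003735",
    "ATGGCC",
    ">seq2|GO:0006412",
    "ATGTTT",
]
by_go = index_fasta_by_go(fasta)
print(by_go)
```

A sequence carrying several GO terms ends up under each of them, so the result is an index of sequences per term rather than a partition into clusters.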



            • #7
              Originally posted by rhinoceros
              I thought the aim was to sort sequences so that in file Z there would be all the sequences that had GO X in their header. It's not really clustering at all but sorting. But anyway, maybe I misunderstood OP.
              This would have been an appealing solution if each sequence had only one GO term.



              • #8
                Hi Apexy and rhinoceros, thank you for your information. Yes, Apexy is right in the sense that each sequence has more than one GO term, and this makes the process more complex. The annotations come from GO terms, motifs (Interproscan) and enzyme codes. All of them came from the first 10 best hits of a blastX search against the nr database from NCBI with a threshold of 10e-6.
                From 16000 sequences I got significant blast hits for 14000 sequences. Then for these sequences I performed the different annotation steps and I got around 10000 annotated. Now, as you say, I want to cluster these 10000 sequences using the information coming from the annotations. I tried DAVID and BABELOMICS, but they have some limitations on the number of sequences they can run at a time. I was wondering if there could be any program based on R or UNIX to do that locally...
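
As a possible first step towards doing this locally, here is a small sketch that gathers the GO terms of each sequence from an exported annotation table. It assumes the annotations can be written out as a tab-separated file with the sequence id in column 1 and the annotation id in column 2. That layout and the name `read_annotations` are assumptions for illustration, so check them against the real export before relying on this.

```python
import csv
import io
from collections import defaultdict


def read_annotations(handle):
    """Collect the GO ids of each sequence from a tab-separated table.

    Assumed layout (hypothetical): sequence id in column 1, annotation id
    in column 2; rows whose id does not start with "GO:" are skipped.
    """
    annotations = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2 and row[1].startswith("GO:"):
            annotations[row[0]].add(row[1])
    return dict(annotations)


# Hypothetical example rows, including an enzyme code that is ignored
table = io.StringIO(
    "seq1\tGO:0006412\ttranslation\n"
    "seq1\tGO:0003735\tstructural constituent of ribosome\n"
    "seq2\tEC:3.6.1.3\n"
    "seq2\tGO:0006810\ttransport\n"
)
annotations = read_annotations(table)
print(annotations)
```

With a real file, the `io.StringIO` object would be replaced by `open(path, newline="")`. The resulting dictionary of sequence id to GO set has no size limit other than memory, so all 10000 sequences can be handled in one pass.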
