  • Clustering annotated sequences based on their GO terms

    Dear all,

    I have a set of 10,000 sequences from an RNA-seq experiment, annotated with GO terms. I would like to cluster the sequences into biologically meaningful groups using the GO term information for each sequence. Is there any software to do that?

    Pau

    Thank you!

  • #2
    Hi,
    Out of curiosity, are these 10,000 sequences already clustered (de novo clustering), or do they represent only the sequences with GO terms from your assembly?

    I had written down exactly this in my to-do list, and I bookmarked this page to explore in the future. I have not used it, but it might help:


    The grouping algorithm is based on the hypothesis that similar annotations should have similar gene members.
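One way to make that hypothesis concrete is to score how far two annotation terms agree on their gene members. The sketch below uses Cohen's kappa for this; it is only an illustration under that assumption, not necessarily the exact algorithm of the linked tool, and the function and variable names are hypothetical.

```python
def term_kappa(members_a, members_b, all_genes):
    """Cohen's kappa for two annotation terms, each given as a set of gene IDs.

    Assumes both member sets are subsets of all_genes (the gene universe).
    Values near 1 mean the terms annotate nearly the same genes.
    """
    n = len(all_genes)
    both = len(members_a & members_b)
    only_a = len(members_a - members_b)
    only_b = len(members_b - members_a)
    neither = n - both - only_a - only_b

    observed = (both + neither) / n
    # Agreement expected by chance from the marginal frequencies.
    expected = ((both + only_a) * (both + only_b)
                + (neither + only_b) * (neither + only_a)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case (kappa undefined); 0.0 is a convention chosen here
    return (observed - expected) / (1.0 - expected)


# Toy data, invented for illustration.
genes = {"g1", "g2", "g3", "g4", "g5", "g6"}
term_x = {"g1", "g2", "g3"}
term_y = {"g1", "g2", "g3"}
term_z = {"g4", "g5", "g6"}
print(term_kappa(term_x, term_y, genes))
print(term_kappa(term_x, term_z, genes))
```

Terms with a high kappa could then be merged into one annotation group; the cutoff is a free parameter.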

    HTH



    • #3
      Hi Apexy,

      No, I have not yet clustered the sequences. I have just annotated the sequences obtained from the assembly using Blast2GO. Now I would like to go one step further and cluster the sequences based on their GO terms, in order to obtain groups of genes involved in similar functions.
      Thanks, I have already had a look at the DAVID website. I think it could be a good option, but the website only accepts 3000 sequences at a time and I would like to cluster all the sequences at the same time. I will keep on searching for alternative websites.

      Thank you for your answer!!

      Pau



      • #4
        You could probably write some small bash script. What kind of separators are you using in your headers? Which field is GO? Are there line-breaks in your sequences?
        savetherhino.org



        • #5
          Hi,
          It is better that you have not done any clustering on them before annotation, since de novo clustering sometimes assigns different transcripts from paralogous genes to the same cluster, and for species with extensive gene duplication this can be a potential nightmare. Do these functional labels come from annotation transfer with BLAST, with InterPro, or both in Blast2GO? Can I also ask what fraction of the entire assembly these 10,000 sequences represent, and which database Blast2GO was set to if you used BLAST?

          @rhinoceros, a cluster should be defined by the degree of overlap in the GO terms shared by sequences. This certainly introduces a new challenge: what threshold of shared GO terms is required to put sequences in one cluster? Do you mean using cat, cut, sort and grep in a loop to write a clustering algorithm?
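A minimal sketch of overlap-based clustering, assuming the overlap is measured as the Jaccard index of two sequences' GO term sets and that any pair at or above a threshold is linked (single linkage). The 0.5 threshold, the function names and the toy annotations are assumptions for illustration only.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared GO terms divided by all GO terms seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_go_overlap(seq_to_gos, threshold=0.5):
    """Link every pair whose Jaccard overlap meets the threshold and
    return the connected groups as a list of sets of sequence IDs."""
    parent = {s: s for s in seq_to_gos}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for s1, s2 in combinations(seq_to_gos, 2):
        if jaccard(seq_to_gos[s1], seq_to_gos[s2]) >= threshold:
            parent[find(s1)] = find(s2)

    clusters = {}
    for s in seq_to_gos:
        clusters.setdefault(find(s), set()).add(s)
    return list(clusters.values())


# Toy annotations, invented for illustration.
annotations = {
    "seq1": {"GO:0006412", "GO:0003735"},
    "seq2": {"GO:0006412", "GO:0003735", "GO:0005840"},
    "seq3": {"GO:0016301"},
}
print(cluster_by_go_overlap(annotations, threshold=0.5))
```

With 10,000 sequences this compares roughly 50 million pairs, which is feasible but slow in pure Python. The threshold question stays open: a lower value merges more sequences, and single linkage can chain loosely related ones together.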

          Thanks,



          • #6
            Originally posted by Apexy
            Hi,
            @rhinoceros, a cluster should be defined by the degree of overlap in the GO terms shared by sequences. This certainly introduces a new challenge: what threshold of shared GO terms is required to put sequences in one cluster? Do you mean using cat, cut, sort and grep in a loop to write a clustering algorithm?
            I thought the aim was to sort sequences so that in file Z there would be all the sequences that had GO X in their header. It's not really clustering at all but sorting. But anyway, maybe I misunderstood OP.
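The sorting idea could look roughly like this, assuming the GO IDs appear somewhere in each FASTA header in the usual `GO:` plus seven digits form. The real header layout is not given in the thread, so the regular expression, function names and toy records are assumptions.

```python
import re
from collections import defaultdict

# Matches GO identifiers regardless of the separator used in the header.
GO_ID = re.compile(r"GO:\d{7}")


def read_fasta(lines):
    """Yield (header, sequence) pairs, joining sequences split over lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif line and header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def group_by_go(records):
    """Map each GO ID to the list of records whose header mentions it."""
    groups = defaultdict(list)
    for header, seq in records:
        for go in sorted(set(GO_ID.findall(header))):
            groups[go].append((header, seq))
    return dict(groups)


# Toy FASTA lines, invented for illustration.
fasta = [
    ">seq1 GO:0006412;GO:0003735",
    "ATGC",
    "GGTA",
    ">seq2 GO:0006412",
    "TTAA",
]
for go, recs in sorted(group_by_go(read_fasta(fasta)).items()):
    print(go, [h.split()[0] for h, _ in recs])
```

A sequence with several GO terms lands in several groups, so this sorts records per term rather than partitioning them into clusters.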
            Last edited by rhinoceros; 04-29-2013, 02:55 AM.
            savetherhino.org



            • #7
              Originally posted by rhinoceros
              I thought the aim was to sort sequences so that in file Z there would be all the sequences that had GO X in their header. It's not really clustering at all but sorting. But anyway, maybe I misunderstood OP.
              This would have been an appealing solution if each sequence had only one GO term.



              • #8
                Hi Apexy and rhinoceros, thank you for your information. Yes, Apexy is right in the sense that each sequence has more than one GO term, and this makes the process more complex. The annotations consist of GO terms, motifs (InterProScan) and enzyme codes. All of them came from the first 10 best hits of a blastx search against the nr database from NCBI with a threshold of 10e-6.
                From 16,000 sequences I got significant BLAST hits for 14,000. For these sequences I then performed the different annotation steps and got around 10,000 annotated. Now, as you say, I want to cluster these 10,000 sequences using the information coming from the annotations. I tried DAVID and Babelomics, but they limit the number of sequences that can be run at a time. I was wondering if there is any program based on R or UNIX to do that locally...
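As a possible starting point for working locally, the annotations can be loaded into one set of GO terms per sequence. This sketch assumes a tab-separated export with the sequence ID in the first column and one annotation ID (`GO:...` or `EC:...`) in the second; that layout, the function name and the sample rows are assumptions, so adjust them to the actual file.

```python
import csv
from collections import defaultdict


def load_go_annotations(lines):
    """Build {sequence ID: set of GO IDs} from tab-separated annotation lines.

    Rows with fewer than two columns are skipped, and non-GO annotations
    such as enzyme codes are ignored.
    """
    seq_to_gos = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue
        seq_id, term = row[0].strip(), row[1].strip()
        if term.startswith("GO:"):
            seq_to_gos[seq_id].add(term)
    return dict(seq_to_gos)


# Toy rows, invented for illustration; a real run would pass an open file.
rows = [
    "seq1\tGO:0006412\ttranslation",
    "seq1\tEC:3.6.1.3",
    "seq2\tGO:0016301",
]
print(load_go_annotations(rows))
```

Because the whole table is held in memory, all 10,000 sequences can be handled in one pass, without the batch limits of the web tools.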
