  • Rachel
    Junior Member
    • Dec 2008
    • 9

    RNA-seq assembly

    Hi

    I have RNA-seq data (Illumina platform) without a reference sequence, so the only option I have is a de novo assembly, followed by gene prediction or megablast to identify the content of my mRNA.

    However, if the gene content is unknown, is there any software available to identify the unknown genes, or any pipeline that I can use?

    Say you have hypothetical proteins: how can I determine that something is a hypothetical protein, and what its function is (any software)?

    Thanks
  • schmima
    Member
    • Apr 2010
    • 56

    #2
    To annotate an assembly, http://www.blast2go.org/ may help you.

    • Rachel
      Junior Member
      • Dec 2008
      • 9

      #3
      Thanks, I appreciate your reply.

      If I am not mistaken, blast2go can only annotate genes that are already in the database. If the genes or hypothetical proteins are not in the database, what do I need to do to predict a new or novel gene? Thanks

      • schmima
        Member
        • Apr 2010
        • 56

        #4
        Just to make sure that I got it right: you have an assembled transcriptome and you want to annotate it (?). I guess that for this you will always have to rely on other databases. I don't know of anything that would be able to tell you ab initio what kind of sequence would produce what kind of protein.

        In other words: you have to rely on existing knowledge. However, there's quite a lot around. For example, blast2go does more or less the following:
        1. Uses BLAST (blastx in the case of transcripts) to search for similar sequences that are described at least somewhere (some may have experimental evidence, others are based only on predictions). In this step you will find not only the transcripts that are identical to known ones, but also cases with partial similarity.
        2. Annotation via GO, InterProScan, KEGG etc. (InterPro runs, I think, only on the ones which have a GO annotation; I did not finish it due to the rather slow processing.)
        3. Some statistics.

        Using blast2go you will be able to annotate quite a few of your transcripts. Nonetheless, you will definitely have others which are not similar to any of the known ones (to be exact: they may be similar to a certain extent, but less than the threshold you chose for blastx).
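
To make the threshold point concrete, here is a minimal sketch (not what blast2go does internally) of how one might split an assembly into "has a blastx hit" and "no hit" sets. It assumes blastx was run with the standard 12-column tabular output (`-outfmt 6`); the function name `split_by_blast_hits`, the 1e-5 cutoff, and the example rows are made up for illustration.

```python
import csv
import io

def split_by_blast_hits(all_ids, blast_tab, max_evalue=1e-5):
    """Return (annotated, unannotated) for a list of transcript IDs.

    blast_tab is any iterable of lines in BLAST tabular format
    (qseqid, sseqid, pident, length, mismatch, gapopen,
     qstart, qend, sstart, send, evalue, bitscore).
    """
    best = {}  # transcript ID -> (evalue, subject ID) of its best passing hit
    for row in csv.reader(blast_tab, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        qseqid, sseqid, evalue = row[0], row[1], float(row[10])
        # Keep only hits under the chosen threshold, remembering the best one.
        if evalue <= max_evalue and (qseqid not in best or evalue < best[qseqid][0]):
            best[qseqid] = (evalue, sseqid)
    annotated = {t: best[t][1] for t in all_ids if t in best}
    unannotated = [t for t in all_ids if t not in best]
    return annotated, unannotated

# Hypothetical example: tx1 has a strong hit, tx2 only a weak one, tx3 none.
example = io.StringIO(
    "tx1\tsp|P12345\t88.0\t200\t24\t0\t1\t600\t1\t200\t1e-50\t300\n"
    "tx2\tsp|Q99999\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t25\n"
)
annotated, unannotated = split_by_blast_hits(["tx1", "tx2", "tx3"], example)
print(annotated)
print(unannotated)
```

Note that tx2 lands in the unannotated set even though it has a hit: it is "similar to a certain extent", just weaker than the chosen cutoff. Loosening `max_evalue` moves such transcripts across the line, at the cost of less reliable annotations.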

        Now, if I got it right, you would like to do something with the remaining, unannotated transcripts (?). Hm, I'm not really an expert on this. But I guess that "gene prediction" is not really what you need (these programs annotate a genome sequence, with the help of the transcripts you provide from your assembly, but you don't have a genome sequence...). Well, there may be some programs which check transcripts directly; it would be nice to know if you find something.

        Another possibility would be to search for protein domains (InterProScan etc., but this time on the sequences which were left out by blast2go). However, as far as I know, you need protein sequences to do so. That means you need to translate your transcripts into proteins (if the library is not strand-specific: six translations, three frames from each strand). Just keep in mind:
        1. The domain scanners are again based on "similarity to known things".
        2. Translating transcripts into proteins can be quite error-prone (imagine you had some intronic reads, e.g. from unspliced pre-mRNA or antisense transcripts: they will be incorporated into your transcript, and during in silico translation they will mix up your protein sequence quite badly).
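
For reference, the six-frame translation mentioned above can be sketched in a few lines of plain Python. This is only an illustration under some assumptions: it uses the standard genetic code, expects plain A/C/G/T input, writes stop codons as `*` and any codon it cannot resolve (e.g. containing N) as `X`. The function names are made up for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate complete codons from position 0; a trailing partial codon is dropped."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def six_frame_translations(seq):
    """Return a dict such as {'+1': ..., '+2': ..., '+3': ..., '-1': ..., ...}."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames

frames = six_frame_translations("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

For a strand-specific library only the three frames of the sense strand would be needed. The sketch does nothing about the intron problem described in point 2: a retained intron simply shows up as a frameshift or a premature `*` in the output.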


        In summary:
        I don't know of a "good" way of dealing with unknown transcripts which are not similar to anything known [well, there are some, but not on the computer: you would have to go to the bench].

        • Rachel
          Junior Member
          • Dec 2008
          • 9

          #5
          Hi

          I really appreciate you going through my questions in detail ^_^ That is exactly what I want to know: how to deal with the unknown transcripts.

          Well, I have not done anything on the project yet. But I am wondering what I should do if I find something different from what is in the known databases...

          Please share if you have any additional info ^_^ Have a nice day ahead!

          • schmima
            Member
            • Apr 2010
            • 56

            #6
            It was a pleasure.

            "Well, I have not done anything on the project yet. But I am wondering what I should do if I find something different from what is in the known databases..."
            I guess if it is totally different you'll have a hard time. Well, in principle you could translate into protein and try some crazy stuff, maybe via the structure... but I think this is anything but easy...

            Well, if you have just a few of them (or can filter down to a few based on whatever criteria):
            1. Back to the lab: try to get/confirm the transcript (meaning: clone and sequence it the old way).
            2. Still in the lab: use other methods to characterize it...
            3. some years later: either , , , , , or ...

            Have a nice day, and in case you find a solution, let me know.

            All the best

            • Rachel
              Junior Member
              • Dec 2008
              • 9

              #7
              Wow, that seems very challenging, with a lot of work to be done if that happens!
              I will see what else I can do with it...

              Anyway, I much appreciate the sharing. Thanks!

              • eskirton
                Junior Member
                • Dec 2009
                • 1

                #8
                Try hmmscan vs Pfam

                Maybe try a BLAST-based annotation first (as recommended above), and for your remaining (and low-confidence) transcripts, try a more sensitive HMM-based annotation.

                First identify the coding regions and translate them (e.g. using prodigal or similar), then run hmmscan vs Pfam. Novel proteins will likely have conserved domains, so even if they don't have "full-length" hits to known proteins, the domains themselves are informative.
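
As a rough sketch of the last step, the per-sequence table that hmmscan writes with its `--tblout` option can be summarized per translated transcript with a small parser. This assumes the usual HMMER3 layout (whitespace-separated columns, `#` comment lines, target name in column 1, query name in column 3, full-sequence E-value in column 5). The function name, the 1e-3 cutoff, the sequence IDs and the numbers in the example rows are invented for illustration and are not real results.

```python
from collections import defaultdict

def domains_per_query(tblout_lines, max_evalue=1e-3):
    """Group Pfam hits by query sequence from hmmscan --tblout lines."""
    hits = defaultdict(list)
    for line in tblout_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/footer comments and blank lines
        fields = line.split()
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= max_evalue:
            hits[query].append((target, evalue))
    # Best (lowest E-value) domain family first for each query.
    return {q: sorted(h, key=lambda hit: hit[1]) for q, h in hits.items()}

# Hypothetical rows in tblout layout (values are made up).
example_rows = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "Pkinase PF00069.28 tx7_orf1 - 2.1e-40 137.2 0.0 2.9e-40 136.8 0.0 1.1 1 0 0 1 1 1 1 Protein kinase domain",
    "zf-C2H2 PF00096.29 tx9_orf2 - 0.5 10.1 0.3 0.9 9.2 0.3 1.0 1 0 0 1 1 0 0 Zinc finger, C2H2 type",
]
domain_hits = domains_per_query(example_rows)
print(domain_hits)
```

Transcripts that appear in this dictionary but had no blastx hit are the interesting ones here: no full-length match to a known protein, but a recognizable domain.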

                • schmima
                  Member
                  • Apr 2010
                  • 56

                  #9
                  By the way: besides the protein-similarity searches via blastx and domain scanners (I forgot to note that blast2go only tries to annotate protein-coding transcripts, as GO terms are only associated with proteins), I would also search for similarities at the nucleotide level (normal blast/blat; I don't know of any software that wraps everything, and if anyone does, it would be interesting to hear). I believe you will be able to annotate some of the transcripts that had no protein (or protein-domain) similarity (some of them could also be rather interesting biologically).

                  All the best (writing on the phone is tricky, sorry for mistakes...)
