  • How to evaluate MIRA assembly

    Could anybody help me evaluate my transcriptome?
    I have a tick (Amblyomma americanum) transcriptome, sequenced by 454 pyrosequencing and assembled with MIRA and CAP3.
    I would like to evaluate it against a gold standard such as a genome. Unfortunately, there is no genome sequence for Amblyomma americanum; however, there are transcriptomes available from NCBI. Do you think I can evaluate my transcriptome by comparing it with these earlier transcriptomes? Or could you suggest another way to evaluate it, and how to do it?
    Thank you very much in advance.

  • #2
    Hi,



    this might provide some ideas.



    • #3
      There is now a paper out from the authors of the presentation above: Assessing De Novo transcriptome assembly metrics for consistency and utility.

      Our lab will shortly publish software to analyse quality of de-novo assembled transcriptomes using some of these metrics and some we have developed - it should be on Github by the end of September.



      • #4
        Hi Blahah,

        Thanks for the link to the paper!
        I am looking forward to the release of your program. Please keep us updated.



        • #5
          I ran across QUAST recently. <strike>It may be helpful in this case</strike>: http://bioinf.spbau.ru/quast

          Edit: I will leave the link in here if people find this thread through a search. Not useful for transcriptomes as pointed out below by Blahah404.
          Last edited by GenoMax; 09-10-2013, 10:32 AM.



          • #6
            QUAST is for genomes - basically none of those metrics are going to be meaningful for transcriptomes.



            • #7
              Thanks all,

              I found one way to evaluate my transcriptome, using core eukaryotic genes (CEGs).
              Here is the link: http://korflab.ucdavis.edu/datasets/..._completeness/
              If anyone is interested, see the address above.



              • #8
                CEGs are a nice dataset, but bear in mind that they are expected in the *genome*, not necessarily the transcriptome. If they aren't expressed in your tissue of interest, they won't be present in your reads. At most, CEGs give you ~450 genes, which makes them a not very sensitive metric. I recommend using reciprocal best BLAST against the closest relative that does have a sequenced genome as a similar, but more sensitive, metric.



                • #9
                  Thanks Blahah404,

                  I am now thinking of using the genome of a close relative (Ixodes scapularis, the blacklegged tick) to evaluate my transcriptome. You mentioned reciprocal blast. Do you mean using Tophat or Cufflinks? Actually, I am a beginner in bioinformatics, so I don't know much about the analysis. Could you give me more details?



                  • #10
                    Originally posted by kp5091
                    I am now thinking of using the genome of a close relative (Ixodes scapularis, the blacklegged tick) to evaluate my transcriptome.
                    This is a good idea.

                    You mentioned reciprocal blast. Do you mean using Tophat or Cufflinks?
                    No, not Tophat or Cufflinks. BLAST is a tool that aligns a query sequence against a database of reference sequences and scores the alignments so you can see which sequence in the database the query is most similar to. Read more on Wikipedia and try it out on the NCBI webserver.

                    Reciprocal best blast refers to a strategy where you have two datasets, A and B. You BLAST A against B (i.e. B is the database), and then BLAST B against A (A is the database). You can be more confident that two sequences are homologous if the best hits are reciprocal than if the hit only goes one way. So for example, if the best scoring hit for gene 1 in dataset A is gene 50 in dataset B, and the best scoring hit for gene 50 in dataset B is gene 1 in dataset A, that's a reciprocal best blast hit. If the best scoring hit for gene 50 was another gene in dataset A, that wouldn't be a reciprocal best blast hit.

                    By using reciprocal best blast for a de-novo assembled transcriptome against a related genome, you can count up all the reciprocal best blast hits and use that as a metric.
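
To make the idea concrete, here is a minimal Python sketch of the counting step. It is only an illustration, not the software mentioned in this thread: the function names and the toy gene IDs and scores are made up, and it assumes the BLAST hits have already been reduced to (query, subject, bit score) tuples.

```python
def best_hits(rows):
    """Return {query: subject}, keeping the highest-scoring hit per query.

    rows is an iterable of (query_id, subject_id, bitscore) tuples.
    """
    best = {}
    for query, subject, score in rows:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a_id, b_id) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted(
        (a, b) for a, b in best_ab.items() if best_ba.get(b) == a
    )


# Toy data (made up): hits of A against B, and of B against A.
a_vs_b = [("geneA1", "geneB50", 310.0), ("geneA1", "geneB7", 120.0),
          ("geneA2", "geneB50", 95.0)]
b_vs_a = [("geneB50", "geneA1", 305.0), ("geneB50", "geneA2", 90.0),
          ("geneB7", "geneA1", 118.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)       # [('geneA1', 'geneB50')]
print(len(pairs))  # 1
```

Here geneA2's best hit is geneB50, but geneB50's best hit is geneA1, so only the geneA1/geneB50 pair counts. The number of pairs is the metric described above.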

                    This and many more metrics are built into some software I'm working on. I'll try and put up an alpha version this weekend so you can try it if you're interested.



                    • #11
                      Blahah404,

                      Thanks for your prompt reply.

                      I now understand what reciprocal best blast is, but I have a question about it. After running reciprocal best blast, I expect to get a lot of data; how should I handle it all? Does your software already handle that? If yes, I would be happy to try it.



                      • #12
                        Originally posted by kp5091
                        After running reciprocal best blast, I expect to get a lot of data; how should I handle it all? Does your software already handle that? If yes, I would be happy to try it.
                        Most people write their own scripts... but yes, my software will handle that. I'll drop a message in here when I've got it online.

                        If you're going to be working in bioinformatics you might want to think about learning a programming language so you can manipulate data easily. Ruby, R, Python are all good.
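
As a rough idea of what such a script might look like, here is a small Python sketch that reads BLAST tabular output and keeps the best hit per query. It assumes the default 12-column tabular layout (query ID in column 1, subject ID in column 2, bit score in column 12). The function name, sequence IDs, and numbers are invented for the example.

```python
import csv
import io


def read_best_hits(handle):
    """Read BLAST tabular output and keep the top-bitscore hit per query.

    Assumes the default 12-column tabular layout, where column 1 is the
    query ID, column 2 the subject ID and column 12 the bit score.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Made-up example text; in practice, pass an open file of your own results.
example = io.StringIO(
    "contig1\tprotA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t350\n"
    "contig1\tprotB\t80.0\t150\t30\t0\t1\t150\t1\t150\t1e-30\t120\n"
    "contig2\tprotB\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t150\n"
)
best = read_best_hits(example)
print(best["contig1"])  # ('protA', 350.0)
print(len(best))        # 2 queries had at least one hit
```

Running this once in each direction gives the two best-hit tables that a reciprocal comparison needs.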



                        • #13
                          Blahah404,

                          Actually, I am a biologist, so I know nothing about programming languages.



                          • #14
                            I'm a biologist too - data analysis is important though, right? It's not too hard to learn a bit of programming. See for example the Codecademy Ruby course (easy and free).



                            • #15
                              Right,
                              Data analysis is so important. I will try to adapt to the new world of programming languages such as Python(?), even though I am all thumbs on a computer.
