  • RockChalkJayhawk
    Senior Member
    • Mar 2009
    • 192

    RNA-Seq Analysis Challenge

    Dear SEQanswers Community,

    RNA-Seq is arguably the most complex next-gen data analysis we face. Unlike genome-based sequencing, RNA-seq yields many different dimensions of data. Tools and algorithms are being released in the literature quickly, and at times it can be difficult to keep up with them, although most of the packages relate to genome-based sequencing.

    I would like to put together a challenge to the bioinformatics community to find the most accurate method for mRNA-Seq analysis, modeled on the SEQanswers ChIP-Seq Challenge that many of us participated in.

    There should be several categories, including:
    - Transcript Assembly
    - Transcript Quantitation
    - Gene Quantitation
    - Differential Expression Testing

    Since each pipeline will use the same dataset, it will be possible to compare sensitivity, accuracy, precision, FDR, etc.
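Once each pipeline's calls can be compared to a known answer, the metrics listed above reduce to counting true and false positives. Here is a minimal sketch, assuming a simulated truth set of differentially expressed genes; the function name `score_calls` and the gene IDs are hypothetical, and this is an illustration rather than an official scoring scheme for the challenge.

```python
from __future__ import annotations


def score_calls(predicted: set[str], truth: set[str],
                universe: set[str]) -> dict[str, float]:
    """Compare a pipeline's calls against a known truth set.

    predicted: features (e.g. genes) the pipeline called differentially expressed
    truth:     features simulated as truly differentially expressed
    universe:  every feature that was tested; needed to count true negatives
    """
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(universe - (predicted | truth))

    def ratio(num: int, den: int) -> float:
        # Define 0/0 as 0.0 so empty call sets do not raise.
        return num / den if den else 0.0

    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),    # positive predictive value
        "fdr": ratio(fp, tp + fp),          # observed false discovery proportion
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


if __name__ == "__main__":
    # Hypothetical example: 10 genes tested, 4 truly DE, pipeline calls 4.
    universe = {f"gene{i}" for i in range(10)}
    truth = {"gene0", "gene1", "gene2", "gene3"}
    predicted = {"gene0", "gene1", "gene2", "gene7"}
    print(score_calls(predicted, truth, universe))
```

The same function works for any category that produces a set of calls (for example, assembled transcripts matched to reference transcripts), as long as "match" is defined consistently for every pipeline.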

    It would be great if we could get industry to support some awards in these categories.

    There are, of course, several difficulties associated with this challenge, including its design specifics. As of now, I am thinking of setting up simulated human datasets (50 million 2 x 36 bp or 2 x 75 bp paired-end reads), but I would like input from others on what they feel is most important in assessing analysis quality.

    If you are interested in participating in this project, or have ideas/opinions on how to best design this challenge, please respond in this forum.

    Best,

    Steven Hart
    University of Kansas Medical Center
  • Richard Finney
    Senior Member
    • Feb 2009
    • 701

    #2
    Best is the enemy of good enough.

    Blat or homebrew_model or bowtietuxedocuffwhatever probably does the minimal that people need.

    To make definitive statements about which is better, you need to compare the results to the truth. Can you come up with a "truth set" to judge against? I think a synthetic set of input reads for the big test might have flaws.


    • RockChalkJayhawk
      Senior Member
      • Mar 2009
      • 192

      #3
      Richard,
      Yes, it is possible to create a synthetic "truth dataset". I would love to apply these tools to real datasets, but as you pointed out, there is no way of knowing the truth there. With known datasets, however, one can objectively assess performance metrics.

      Of course, no technique is perfect. But it would be advantageous for us to gauge how well our tools perform. How else will we learn about, or address, their weaknesses to make better programs? Obviously this is a huge problem that will take many of us to figure out, but we need to start somewhere if we ever want to move forward.
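To make the idea of a synthetic truth dataset concrete, here is a toy simulator that draws fixed-length reads from transcripts with known abundances and records where every read came from, so a quantifier's output can be checked against the returned true counts. It is a minimal sketch under strong simplifying assumptions (no sequencing errors, single-end reads, uniform coverage); `simulate_reads` and the transcript names are hypothetical, and real simulators such as the Flux Simulator model far more.

```python
from __future__ import annotations

import random
from collections import Counter


def simulate_reads(transcripts: dict[str, str], abundance: dict[str, float],
                   n_reads: int, read_len: int, seed: int = 0
                   ) -> tuple[list[tuple[str, str, int, str]], Counter[str]]:
    """Return (reads, true_counts).

    Each read is (read_id, transcript_id, start, sequence). A transcript is
    drawn with probability proportional to abundance * number of valid start
    positions, so at equal abundance longer transcripts yield more reads.
    Transcripts shorter than read_len are skipped; missing abundances count as 0.
    """
    rng = random.Random(seed)
    ids = [t for t, seq in transcripts.items() if len(seq) >= read_len]
    weights = [abundance.get(t, 0.0) * (len(transcripts[t]) - read_len + 1)
               for t in ids]
    reads: list[tuple[str, str, int, str]] = []
    true_counts: Counter[str] = Counter()
    for i, tid in enumerate(rng.choices(ids, weights=weights, k=n_reads)):
        seq = transcripts[tid]
        start = rng.randrange(len(seq) - read_len + 1)  # uniform start position
        reads.append((f"read{i}", tid, start, seq[start:start + read_len]))
        true_counts[tid] += 1
    return reads, true_counts


if __name__ == "__main__":
    # Hypothetical random transcripts; txC is present but not expressed.
    rng = random.Random(42)
    tx = {name: "".join(rng.choice("ACGT") for _ in range(length))
          for name, length in [("txA", 500), ("txB", 1500), ("txC", 800)]}
    reads, truth = simulate_reads(tx, {"txA": 10.0, "txB": 1.0, "txC": 0.0},
                                  n_reads=1000, read_len=36)
    print(truth)  # the known answer a quantifier should recover
```

Because the origin of every read is kept, the same output can score alignment (correct position), quantification (counts per transcript), and, with two simulated conditions, differential expression.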


      • steven
        Senior Member
        • Aug 2009
        • 269

        #4
        I think this already exists, or at least a similar challenge does: look for RGASP (RNAseq Genome Annotation Assessment Project).


        • RockChalkJayhawk
          Senior Member
          • Mar 2009
          • 192

          #5
          Thank you, steven. I was not aware of this project!


          • steven
            Senior Member
            • Aug 2009
            • 269

            #6
            You are welcome, Steven!


            • RockChalkJayhawk
              Senior Member
              • Mar 2009
              • 192

              #7
              Any idea on what the initial results look like or when the data will be published?


              • steven
                Senior Member
                • Aug 2009
                • 269

                #8
                I heard that two as-yet-unpublished tools were exceptional:
                - GEM: an incredibly fast and accurate read aligner, from Paolo Ribeca.
                - The Flux Simulator/Flux Capacitor: an impressive RNA-seq analysis package for (alternative) transcript quantification, from Micha Sammeth.
                Disclaimer: both are friends of mine.


                • RockChalkJayhawk
                  Senior Member
                  • Mar 2009
                  • 192

                  #9
                  I have used FluxSimulator in the past. It is really great!

                  However, I am trying to find some performance metrics for each of these tools, much like the RGASP project you sent me is doing.

                  Unfortunately, most users are blindly using these tools because they do the "minimal that people need". Some, like Cufflinks/Cuffdiff, do so much extra that people assume they must be the best tools. I am more interested in finding out the strengths and weaknesses of each, rather than accepting the results on blind faith.

                  For example, running TopHat and/or Cufflinks with or without a reference GTF yields different transcript builds. Moreover, the differential expression statistics in Cuffdiff leave me confused (because they are so complex). I can get a lot of "differential expression" between biological replicates (as high as 30% of the genes), which shouldn't happen, and in fact does not happen at the gene level when I count reads and use other programs like DESeq (no genes DE). However, to my knowledge, there are no transcript-level quantification tools that report estimated read counts. With so many tools now available, it is a good idea to start thinking about how we can gauge the performance of each one.

                  Again, this seems to be what the RGASP project is aiming for, and I look forward to their results.
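The replicate check described above can be approximated from gene-level read counts. This sketch borrows only the median-of-ratios size-factor idea used by DESeq, then reports the fraction of genes whose normalized fold change between two replicates exceeds a cutoff. It is not DESeq's negative binomial test; the function names, the 2-fold cutoff, the pseudocount, and the counts are assumptions for illustration.

```python
from __future__ import annotations

import math
from statistics import median


def size_factors(counts: dict[str, list[int]]) -> list[float]:
    """DESeq-style median-of-ratios size factor for each sample (column).

    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero. Raises StatisticsError if no gene is usable.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios: list[list[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def fraction_changed(counts: dict[str, list[int]], cutoff_log2: float = 1.0,
                     pseudo: float = 0.5) -> float:
    """Fraction of genes with |log2(rep2 / rep1)| >= cutoff_log2 after
    size-factor normalization. Expects exactly two samples per gene."""
    s1, s2 = size_factors(counts)
    changed = 0
    for c1, c2 in counts.values():
        lfc = math.log2((c2 / s2 + pseudo) / (c1 / s1 + pseudo))
        if abs(lfc) >= cutoff_log2:
            changed += 1
    return changed / len(counts)


if __name__ == "__main__":
    # Hypothetical counts: replicate 2 was sequenced about twice as deeply.
    counts = {"g1": [100, 200], "g2": [50, 100], "g3": [80, 160], "g4": [10, 160]}
    print(size_factors(counts))
    print(fraction_changed(counts))  # only g4 passes the 2-fold cutoff
```

A large fraction of "changed" genes between true replicates is a warning sign; a small fraction is not proof of correctness, only a sanity check.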


                  • marcora
                    Member
                    • Jan 2010
                    • 52

                    #10
                    Has anybody succeeded in generating a synthetic "truth dataset" for RNA-seq? I am comparing Cuffdiff to DESeq and getting very different results. Which one should I pick? I can't answer this question until the dataset mentioned above is available!
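When two tools disagree, a first step is to measure how much they overlap. This minimal sketch compares two sets of gene IDs (for example, genes called significant by Cuffdiff and by DESeq) and reports the genes unique to each plus the Jaccard index; the function name and inputs are hypothetical. Agreement between tools does not say which one is right, which is why a truth set matters.

```python
from __future__ import annotations


def compare_calls(a: set[str], b: set[str]) -> dict[str, object]:
    """Overlap between two sets of DE calls made on the same data."""
    shared = a & b
    union = a | b
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "shared": sorted(shared),
        # Jaccard index = |A intersect B| / |A union B|; two empty sets agree fully.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    cuffdiff_calls = {"g1", "g2", "g3"}          # hypothetical
    deseq_calls = {"g2", "g3", "g4", "g5"}       # hypothetical
    print(compare_calls(cuffdiff_calls, deseq_calls))
```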


                    • urchgene
                      Member
                      • Oct 2010
                      • 14

                      #11
                      Hi everyone. I am trying to do paired-end mapping using SHRiMP, but it requires the + orientation and - orientation reads of each pair to follow each other in the same file. Unfortunately, my reads are split: the ---> direction is in one file and the <--- direction is in another. Does anyone know of a script I can use to merge these reads into one file so that the two mates of each pair follow each other? (Just a newbie, please help.)
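One way to merge the two files is to read one record from each in lockstep and write them out alternately. This is a minimal sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines), mates listed in the same order in both files, and read names that match apart from an optional /1 or /2 suffix. The script name and file names are placeholders, and the exact layout SHRiMP expects should be checked against its documentation.

```python
from __future__ import annotations

import sys
from itertools import zip_longest
from typing import Iterator, TextIO


def fastq_records(handle: TextIO) -> Iterator[list[str]]:
    """Yield FASTQ records as lists of 4 newline-terminated lines."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return  # clean end of file
        if any(not line for line in record):
            raise ValueError(f"truncated FASTQ record: {record[0]!r}")
        if not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {record[0]!r}")
        # Make sure the last line of the file still ends with a newline.
        yield [line if line.endswith("\n") else line + "\n" for line in record]


def pair_name(header: str) -> str:
    """Read name without the leading '@' and any trailing /1 or /2 suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def interleave(fwd: TextIO, rev: TextIO, out: TextIO) -> int:
    """Write mate1, mate2, mate1, mate2, ...; return the number of pairs."""
    pairs = 0
    for r1, r2 in zip_longest(fastq_records(fwd), fastq_records(rev)):
        if r1 is None or r2 is None:
            raise ValueError("input files contain different numbers of reads")
        if pair_name(r1[0]) != pair_name(r2[0]):
            raise ValueError(f"mates out of order: {r1[0].strip()} vs {r2[0].strip()}")
        out.writelines(r1)
        out.writelines(r2)
        pairs += 1
    return pairs


if __name__ == "__main__":
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        n = interleave(f1, f2, sys.stdout)
    print(f"wrote {n} pairs", file=sys.stderr)
```

Saved as, say, `interleave.py`, it could be run as `python interleave.py reads_1.fastq reads_2.fastq > interleaved.fastq`; it stops with an error if the files have different numbers of reads or the mate names do not match.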


                      • Jayu
                        Member
                        • Mar 2011
                        • 14

                        #12
                        Can anyone tell me the pipeline for RNA-seq analysis?


                        • Apexy
                          Member
                          • Apr 2011
                          • 62

                          #13
                          Hi Jayu,
                          There is no single pipeline or tool for this, and the strategy to take will depend on whether a reference is available. Rather than suggest a particular one without this information and without having tested everything that is available, I suggest you read the following paper to get a feel for these approaches: Jeffrey A. Martin & Zhong Wang. Next-generation transcriptome assembly. Nature Reviews Genetics 12, 671-682 (October 2011). doi:10.1038/nrg3068


                          • Jayu
                            Member
                            • Mar 2011
                            • 14

                            #14
                            Thank you, but this paper is not freely available. Is there any other source, or any other paper?


                            • Apexy
                              Member
                              • Apr 2011
                              • 62

                              #15
                              Hello,

                              Sorry about that. I think this one is free: www.genome.org/cgi/doi/10.1101/gr.131383.111. I have no idea how to send you the paper, and I wonder whether it would be acceptable to do that here, given that it is not free.

                              HTH
