  • RNA-Seq Analysis Challenge

    Dear SEQanswers Community,

    RNA-Seq is arguably the most complex next-gen data analysis we face. Unlike genome-based sequencing, RNA-seq yields many different dimensions of data. Tools and algorithms are being released in the literature so quickly that it can be difficult to keep up, even though most of the packages are still geared toward genome-based sequencing.

    I would like to put together a challenge to the bioinformatics community to find the most accurate method for mRNA-Seq analysis, modeled on the SEQanswers ChIP-Seq Challenge that many of us participated in.

    There should be several categories including:
    Transcript Assembly
    Transcript Quantitation
    Gene Quantitation
    and Differential Expression Testing

    Since each pipeline will use the same dataset, it will be possible to compare sensitivity, accuracy, precision, FDR, etc.
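As a rough illustration of how such a comparison might be scored, the sketch below computes sensitivity, precision, and FDR for a set of calls (for example, genes called differentially expressed) against a known truth set. The function name `score_calls` and the gene identifiers are hypothetical and not part of any existing tool.

```python
def score_calls(truth, called):
    """Compare a set of called items against a known truth set.

    Both arguments are iterables of identifiers (hypothetical gene IDs here).
    """
    truth, called = set(truth), set(called)
    tp = len(truth & called)   # true positives: called and in the truth set
    fp = len(called - truth)   # false positives: called but not in the truth set
    fn = len(truth - called)   # false negatives: in the truth set but missed
    sensitivity = tp / (tp + fn) if truth else float("nan")
    precision = tp / (tp + fp) if called else float("nan")
    fdr = fp / (tp + fp) if called else float("nan")
    return {"sensitivity": sensitivity, "precision": precision, "fdr": fdr}


# Hypothetical example: four truly DE genes, and a pipeline that calls four genes.
truth_set = {"geneA", "geneB", "geneC", "geneD"}
pipeline_calls = {"geneA", "geneB", "geneC", "geneX"}
metrics = score_calls(truth_set, pipeline_calls)
print(metrics)
```

Because every pipeline would be scored against the same truth set, the resulting numbers are directly comparable across tools.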

    It would be great if we could get Industry to support some awards in these categories.

    There are of course several difficulties associated with this Challenge, including its design specifics. As of now, I am thinking of setting up simulated human datasets (50 million 2 x 36 bp or 2 x 75 bp paired-end reads), but I would like input from others on what they feel is most important in assessing analysis quality.

    If you are interested in participating in this project, or have ideas/opinions on how to best design this challenge, please respond in this forum.

    Best,

    Steven Hart
    University of Kansas Medical Center

  • #2
    Best is the enemy of good enough.

    Blat or homebrew_model or bowtietuxedocuffwhatever probably does the minimal that people need.

    To make definitive statements about which is better, you need to compare the results to truth. Can you come up with a "truth set" to judge against? I think a synthetic set of input reads for the big test might have flaws.


    • #3
      Richard,
      Yes, it is possible to create a synthetic "truth dataset". I would love to apply these methods to real datasets as well, but as you pointed out, there is no way of knowing the truth there. With datasets where the truth is known, however, one can objectively assess performance metrics.

      Of course, no technique is perfect. But it would be advantageous for us to gauge how well our methods perform. How else will we learn about or address their weaknesses to make better programs? Obviously this is a huge problem that will take many of us to figure out, but we need to start somewhere if we ever want to move forward.


      • #4
        I think that this already exists. Or at least a similar challenge: look for RGASP (RNAseq Genome Annotation Assessment Project).


        • #5
          Thank you steven, I was not aware of this project!


          • #6
            You are welcome, Steven!


            • #7
              Any idea on what the initial results look like or when the data will be published?


              • #8
                I heard that two as-yet-unpublished tools were exceptional:
                - GEM: an incredibly fast and accurate read aligner, from Paolo Ribeca.
                - The Flux Simulator/Flux Capacitor: an impressive RNA-seq analysis package for (alternative) transcript quantification, from Micha Sammeth.
                Disclaimer: both are friends of mine


                • #9
                  I have used FluxSimulator in the past. It is really great!

                  However, I am trying to find some performance metrics for each of these tools, much like the RGASP project you sent me is doing.

                  Unfortunately, most users are blindly using these tools because they do the "minimal that people need". Some, like Cufflinks/Cuffdiff, do so much extra that it is tempting to assume they must be the best tools. I am more interested in finding out the strengths and weaknesses of each, rather than accepting the results through blind faith.

                  For example, using TopHat and/or Cufflinks with or without a reference GTF yields different transcript builds. Moreover, the differential statistics in Cuffdiff leave me confused (because they are so complex). I can get a lot of "differential expression" between biological replicates (as high as 30% of the genes), which shouldn't happen, and indeed does not happen at the gene level when I count the number of reads and use other programs like DESeq (no genes DE). However, there are (to my knowledge) no transcript-level quantification tools that report estimated read counts. With so many tools out there, it is a good idea to start thinking about how we can gauge the performance of each one.

                  Again, this seems to be what the RGASP project is aiming for, and I look forward to their results.


                  • #10
                    Has anybody been successful in generating a synthetic "truth dataset" for RNA-seq? I am comparing Cuffdiff to DESeq and I am getting very different results. Which one should I pick? I can't answer this question until the dataset mentioned above is available!
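Until a truth set is available, one way to at least quantify the disagreement is to measure how much two result lists overlap. The sketch below is a minimal illustration, assuming each tool's significant genes have already been exported as a plain list of gene identifiers; the function name `compare_de_lists` and the gene IDs are hypothetical. It cannot say which tool is right, only how far apart the two are.

```python
def compare_de_lists(list_a, list_b):
    """Summarise the overlap between two lists of significant gene IDs."""
    a, b = set(list_a), set(list_b)
    shared = a & b
    union = a | b
    # Jaccard index: shared calls divided by all calls made by either tool.
    jaccard = len(shared) / len(union) if union else float("nan")
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": jaccard,
    }


# Hypothetical significant-gene lists exported from two tools.
cuffdiff_genes = ["gene1", "gene2", "gene3", "gene4"]
deseq_genes = ["gene3", "gene4", "gene5"]
result = compare_de_lists(cuffdiff_genes, deseq_genes)
print(result)
```

Genes that only one tool reports are natural candidates for closer inspection, for example by looking at their raw read counts.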


                    • #11
                      Hi everyone, I am trying to do paired-end mapping with SHRiMP, but it requires the + orientation and - orientation reads of each pair to follow each other in the same file. Unfortunately, my reads are split so that the forward (--->) direction is in one file and the reverse (<---) direction is in another. Does anyone know of a script I can use to combine these reads into one file so that the two mates of each pair follow each other? (I'm just a newbie, so please bear with me.)
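One possible way to do this is sketched below in Python: it reads two FASTQ files in parallel and writes each forward read immediately followed by its mate. This assumes both files are uncompressed, four-line-per-record FASTQ with the reads in the same order. The function names are made up for this example, and whether the interleaved output is exactly what SHRiMP expects should be checked against its documentation.

```python
import itertools


def read_fastq(handle):
    """Yield FASTQ records as lists of four lines (header, sequence, '+', quality)."""
    while True:
        record = [line.rstrip("\r\n") for line in itertools.islice(handle, 4)]
        if not any(record):  # end of file, or only blank lines left
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(forward_path, reverse_path, out_path):
    """Write each forward read followed by its mate; return the number of pairs."""
    pairs = 0
    with open(forward_path) as fwd, open(reverse_path) as rev, \
            open(out_path, "w") as out:
        for r1, r2 in itertools.zip_longest(read_fastq(fwd), read_fastq(rev)):
            if r1 is None or r2 is None:
                raise ValueError("input files contain different numbers of reads")
            out.write("\n".join(r1 + r2) + "\n")
            pairs += 1
    return pairs
```

It could be called as `interleave("reads_1.fastq", "reads_2.fastq", "interleaved.fastq")`, where the three file names are placeholders. The script pairs reads purely by position, so it will raise an error if the two files differ in length but will not notice if they are merely in a different order.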


                      • #12
                        Can anyone tell me the pipeline for RNAseq analysis?


                        • #13
                          Hi Jayu,
                          There is no single pipeline or tool for this, and the strategy to take will depend on the availability of a reference. To avoid suggesting a particular one without this information, and without having tested everything that is available, I suggest you read the following paper to get a feel for these approaches: Jeffrey A. Martin & Zhong Wang. Next-generation transcriptome assembly. Nature Reviews Genetics 12, 671-682 (October 2011) | doi:10.1038/nrg3068


                          • #14
                            Thank you, but this paper is not freely available. Is there any other source or any other paper?


                            • #15
                              Hello,

                              Sorry about that. I think this one is free: www.genome.org/cgi/doi/10.1101/gr.131383.111. I have no idea how to send the first paper to you, and I wonder whether it would be acceptable to do that here, given that it is not free.

                              HTH
