  • JueFish
    Member
    • May 2010
    • 42

    Mapping Human RNA Seq: Transcriptome vs. Genome

    Would anyone like to share their opinions on the relative merits and pitfalls of using the human transcriptome vs. the human genome as a reference for mapping some SOLiD RNA-Seq runs? I am guessing that this comes down to the relative quality of the transcriptome sequence vs. the genome sequence (in other words, how complete the transcriptome build is relative to the genome build) and the role of splice-prediction algorithms (e.g. TopHat) and their effects on read mapping. Any thoughts? To be honest, I don't know much about how "complete" the human transcriptome is supposed to be (number of tissues, life stages, etc.). I'm just looking for the "best" way to do this. I could run both, but thought I'd start from first principles and go from there, as these BAM files are huge and a pain to store.

    Thanks
  • Derek-C
    Junior Member
    • Nov 2012
    • 7

    #2
    Sorry to bump an old question, but I'm also wondering about this at the moment and I can't seem to find an answer anywhere.

    What are the merits of using the human transcriptome vs human genome for RNA-Seq mapping?

    Comment

    • kopi-o
      Senior Member
      • Feb 2008
      • 319

      #3
      Transcriptome:
      + better specificity, easier to resolve isoforms, need less seq depth (probably)
      - restricted to known transcripts

      Genome:
      + can find new things
      - need to sequence more to do accurate isoform assignment, will miss more known splice junctions

      Comment

      • NGSfan
        Senior Member
        • Apr 2009
        • 181

        #4
        Originally posted by Derek-C
        Sorry to bump an old question, but I'm also wondering about this at the moment and I can't seem to find an answer anywhere.

        What are the merits of using the human transcriptome vs human genome for RNA-Seq mapping?

        I am of the opinion that it is better to align to the genome. With STAR it can be done very quickly.

        The question is, do you believe the transcriptome annotation is really complete? We know from the ENCODE project that something like 80% of the genome is transcribed. If you only align reads to the transcriptome, you could be forcing some reads to align to known transcripts, some of which could have been better placed on an unannotated region of the genome, thus reducing ambiguity.

        Keep in mind that hardly any genome is really complete... in fact, you should align not only to the chromosomes, but to all available random contigs and "decoy" sequences. So if genomes are never really complete - how can we expect the transcriptome to be anything close to complete?

        The only advantage to transcriptome alignment is speed and memory savings... but I think with STAR this is not so much an issue anymore.

        Comment

        • NGSfan
          Senior Member
          • Apr 2009
          • 181

          #5
          Originally posted by kopi-o
          Transcriptome:
          + better specificity, easier to resolve isoforms, need less seq depth (probably)
          - restricted to known transcripts

          Genome:
          + can find new things
          - need to sequence more to do accurate isoform assignment, will miss more known splice junctions

          If you input a GTF file into STAR you can have it index the known splice junctions for you...
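As a rough sketch of the GTF suggestion above, the snippet below only assembles a STAR index-building command and prints it; it does not run STAR. The file names (`genome.fa`, `annotation.gtf`, `star_index`) and the helper name `star_index_command` are placeholders chosen for illustration, and the overhang follows the STAR manual's usual guidance of read length minus 1.

```python
from typing import List


def star_index_command(genome_dir: str, fasta: str, gtf: str,
                       read_length: int, threads: int = 8) -> List[str]:
    """Build (but do not run) a STAR genomeGenerate command that
    includes annotated splice junctions from a GTF file."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        # Annotated junctions are inserted into the index from this GTF.
        "--sjdbGTFfile", gtf,
        # Commonly set to read length minus 1.
        "--sjdbOverhang", str(read_length - 1),
    ]


# Placeholder paths; substitute your own reference and annotation.
cmd = star_index_command("star_index", "genome.fa", "annotation.gtf",
                         read_length=100)
print(" ".join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.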

          Comment

          • timydaley
            Member
            • Jun 2010
            • 26

            #6
            One problem about mapping to the transcriptome is that you can mistake transcription of paralogous genes, see Schrider et al.'s PLoS One paper critiquing Cheung's Science paper on RNA editing. Since ~70% of the human genome is transcribed, you may miss a lot of information mapping to the transcriptome.
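The paralog problem described above can be illustrated with a toy example. This is only a sketch with made-up sequences and a naive ungapped mismatch search, not how any real aligner works: a read that originates from an unannotated paralog is placed on the annotated gene (with a mismatch) when only the transcriptome is available, but finds its perfect match when the genome is the reference.

```python
def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read: str, references: dict):
    """Return (name, offset, mismatches) for the best ungapped placement."""
    best = None
    for name, seq in references.items():
        for i in range(len(seq) - len(read) + 1):
            mm = mismatches(read, seq[i:i + len(read)])
            if best is None or mm < best[2]:
                best = (name, i, mm)
    return best


# Hypothetical sequences: the paralog differs from geneA by a single base.
gene_a = "ACGTTGCAAGGCTTAC"
paralog = "ACGTTGCATGGCTTAC"

transcriptome = {"geneA": gene_a}  # the paralog is not annotated
genome = {"geneA_locus": gene_a, "unannotated_paralog": paralog}

read = "TTGCATGGCT"  # drawn from the paralog

print(best_hit(read, transcriptome))  # forced onto geneA with one mismatch
print(best_hit(read, genome))         # perfect match at the paralog
```

Against the transcriptome alone, the single mismatch could be mistaken for a variant or an editing site in geneA.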

            Comment

            • rskr
              Senior Member
              • Oct 2010
              • 249

              #7
              Not to bump an old thread, but this still seems to be an open question. I think Cufflinks, for example, can use both the transcriptome annotation and the genome to resolve certain problems with pseudogenes and homologous genes, which seems like it should be the better approach. That said, I am partial to mapping to the transcriptome, at least for differential expression. "Is there evidence for a transcript that hasn't been seen before?" seems like a different question, and one that can be verified with lab work. There is also a theory that transcripts should be assembled before mapping, which should remove most of the dominant allele bias, though I don't think the assemblers are quite up to it yet.

              Comment

              • sudhan
                Junior Member
                • Jan 2018
                • 1

                #8
                So finally, is it good or bad to use transcriptome references for a differential gene expression study?

                Comment

                • rskr
                  Senior Member
                  • Oct 2010
                  • 249

                  #9
                  I think now you can do both at the same time. HISAT2 builds suffix indexes with annotations built in, so whichever mapping best explains the data is chosen.

                  Comment

                  • sdriscoll
                    I like code
                    • Sep 2009
                    • 436

                    #10
                    I kinda take issue with both approaches. With alignment to the genome I always miss some alignments, because aligning RNA-Seq reads to the genome is relatively difficult. STAR misses some alignments that GSNAP picks up and, on occasion, even Bowtie2 picks up alignments STAR misses (not spliced ones, of course). Furthermore, when I take reads that failed to align to the genome and map them directly to the transcriptome, many of those reads align. This is true even at low error rates.

                    If I go the other way and map to the transcriptome first, I run some risk of mapping reads to genes that would be more ambiguously mapped to the genome. I have no idea how much of a problem that is, in part because I'm not confident in any aligner's ability to find all possible alignments of a read to the genome. With some data I may map to the genome first, throw out reads with MAPQ==0, and then take the remaining aligned and unaligned reads and map them to the transcriptome.

                    In the end, the probabilistic transcriptome methods (RSEM, eXpress, Kallisto, Salmon) have been shown to produce more accurate gene expression estimates than the genome approaches (Cufflinks, StringTie, etc.). Whether accurate expression is necessary to detect accurate differential expression is up for debate. I'd guess it's not as big of a deal. However, when it comes to publication we like to report TPM expression for genes, since it's the closest thing to a standard that we have in RNA-Seq. To get accurate TPM you have to use some type of probabilistic isoform-level expression estimation, and it's the direct-to-transcriptome methods that seem to work the best.
                    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                    Salk Institute for Biological Studies, La Jolla, CA, USA */
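For readers unfamiliar with TPM as mentioned in the post above, here is a minimal sketch of the normalization step only. It assumes per-transcript read counts and effective lengths are already available; the transcript names and numbers are invented, and tools such as RSEM or Salmon estimate the counts probabilistically before a step like this.

```python
def tpm(counts: dict, lengths: dict) -> dict:
    """Convert read counts to transcripts per million (TPM).

    counts:  estimated reads assigned to each transcript
    lengths: effective transcript length in bases
    """
    # Reads per base for each transcript.
    rates = {t: counts[t] / lengths[t] for t in counts}
    total = sum(rates.values())
    # Scale so that the values sum to one million.
    return {t: 1e6 * r / total for t, r in rates.items()}


# Made-up example values.
counts = {"tx1": 100, "tx2": 100, "tx3": 300}
lengths = {"tx1": 1000, "tx2": 2000, "tx3": 3000}

values = tpm(counts, lengths)
for name, value in values.items():
    print(name, round(value))
```

Here tx1 and tx3 get the same TPM despite different counts because tx3 is three times longer. Gene-level TPM is typically obtained by summing the TPM of a gene's isoforms.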

                    Comment
