  • golharam
    Member
    • Dec 2009
    • 55

    Converting reads mapped from transcriptome back to genome

    Hi - I'm trying to mimic the pipeline of a paper that mapped RNA-Seq reads to the transcriptome, then converted the mapped coordinates to their genomic coordinates.

    Does anyone have an easy way of performing this? I emailed the author but never got a response.
  • john_mu
    Member
    • May 2010
    • 88

    #2
    What do you mean by mapped coordinates? I feel that they are the same thing...
    SpliceMap: De novo detection of splice junctions from RNA-seq


    • golharam
      Member
      • Dec 2009
      • 55

      #3
      I have a FASTA file of transcripts. If a read maps to a transcript, I need to convert its coordinates on the transcript to coordinates on the genome. This shouldn't be too hard as long as I have the name and genomic location of the transcript and the position where the read maps on the transcript.

      I can determine the genomic coordinates based on the annotation of the transcript. I was hoping someone already had a program to do this.
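The conversion described above can be sketched as a walk along the transcript's exons. This is only a minimal illustration, not the paper's method: it assumes the annotation gives each exon as a 1-based inclusive `(start, end)` genomic interval plus a strand, and that transcript positions are 1-based from the 5' end. The function name `transcript_to_genome` is made up for this example.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # assumed: 1-based, inclusive genomic (start, end)


def transcript_to_genome(pos: int, exons: List[Exon], strand: str) -> int:
    """Map a 1-based transcript position to a 1-based genomic position."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if pos < 1:
        raise ValueError("transcript positions are 1-based")
    # Put exons in transcript (5' to 3') order: ascending genomic order on
    # the plus strand, descending on the minus strand.
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = pos
    for start, end in ordered:
        length = end - start + 1
        if remaining <= length:
            # The position falls inside this exon.
            if strand == "+":
                return start + remaining - 1
            return end - remaining + 1
        # Skip this exon and keep walking along the transcript.
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")
```

For a hypothetical two-exon transcript at 100-199 and 300-399, transcript position 101 is the first base of the second exon on the plus strand (genomic 300) and the last base of the first exon on the minus strand (genomic 199).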


      • Jon_Keats
        Senior Member
        • Mar 2010
        • 279

        #4
        I'm working on the same issue; I guess you are talking about the Berger et al. paper? Converting each read to genome coordinates seems relatively easy with a relational database for the reads that map within a single exon, so creating a modified SAM file with genome coordinates is straightforward. But if that is all you want, you can just align to the genome. So the ____ issue is the split reads that cross exon-exon boundaries: how do you split them, and how do you avoid double counting these split reads if you use them for an expression estimate?
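One possible way to handle the exon-exon case, sketched under simplifying assumptions: the read aligns to the transcript without indels, the transcript is on the plus strand, and exons are 1-based inclusive `(start, end)` genomic intervals. The read is split into one genomic block per exon it touches, and the blocks are joined into a single SAM CIGAR string using the `N` (skipped region) operation for each intron. Because the read stays in one SAM record, it is counted once rather than once per block. The helper names below are hypothetical.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # assumed: 1-based, inclusive genomic (start, end)


def read_to_genome_blocks(t_start: int, read_len: int,
                          exons: List[Block]) -> List[Block]:
    """Split a read aligned at transcript position t_start into genomic blocks."""
    t_end = t_start + read_len - 1
    blocks = []
    offset = 0  # transcript bases consumed by earlier exons
    for start, end in sorted(exons):
        length = end - start + 1
        exon_t_start, exon_t_end = offset + 1, offset + length
        # Overlap between the read and this exon, in transcript coordinates.
        lo, hi = max(t_start, exon_t_start), min(t_end, exon_t_end)
        if lo <= hi:
            blocks.append((start + lo - exon_t_start, start + hi - exon_t_start))
        offset += length
    if sum(e - s + 1 for s, e in blocks) != read_len:
        raise ValueError("read does not lie entirely within the transcript")
    return blocks


def blocks_to_cigar(blocks: List[Block]) -> str:
    """Join genomic blocks into a CIGAR string, with N for each intron gap."""
    parts = []
    prev_end = None
    for start, end in blocks:
        if prev_end is not None:
            parts.append(f"{start - prev_end - 1}N")
        parts.append(f"{end - start + 1}M")
        prev_end = end
    return "".join(parts)
```

With hypothetical exons at 100-199 and 300-399, a 20 bp read starting at transcript position 91 yields the blocks (190, 199) and (300, 309); the SAM record would then start at position 190 with CIGAR `10M100N10M`.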


        • kmcarr
          Senior Member
          • May 2008
          • 1181

          #5
          If you are into BioPerl there is a module, Bio::Coordinate::GeneMapper, which is designed to do transformations between coordinate systems like this.

          Caveats:
          - The documentation for this module is sparse.
          - The module appears to contain a couple of bugs.
          - You really have to grok the BioPerl object model.


          • xinchen
            Junior Member
            • May 2010
            • 6

            #6
            If you're using Ensembl transcripts, I think Ensembl somewhere stores the set of exons that go into making up each transcript, with corresponding genomic coordinates for exons, so you can probably just write a program to match the numbers there for every transcript.

            Otherwise, you can always do your own alignment with a cDNA alignment program like sim4 or splign
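A rough sketch of the exon lookup described above for Ensembl transcripts, assuming the annotation is available as a GTF file (nine tab-separated columns, 1-based inclusive coordinates, and a `transcript_id "..."` entry in the attribute column). The function name and the shape of the returned table are choices made for this example, not anything Ensembl provides.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_by_transcript(gtf_lines: Iterable[str]) -> Dict[str, dict]:
    """Collect chromosome, strand and sorted exon intervals per transcript."""
    table = defaultdict(lambda: {"chrom": None, "strand": None, "exons": []})
    for line in gtf_lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only well-formed exon records.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        entry = table[match.group(1)]
        entry["chrom"] = fields[0]
        entry["strand"] = fields[6]
        entry["exons"].append((int(fields[3]), int(fields[4])))
    for entry in table.values():
        entry["exons"].sort()  # ascending genomic order
    return dict(table)
```

Passing an open file handle (for example `exons_by_transcript(open(path))`, where `path` points at your own GTF) gives a per-transcript exon list that a coordinate conversion routine can walk.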


            • thinkRNA
              Member
              • Jan 2010
              • 94

              #7
              Originally posted by Jon_Keats
              I'm working on the same issue; I guess you are talking about the Berger et al. paper? Converting each read to genome coordinates seems relatively easy with a relational database for the reads that map within a single exon, so creating a modified SAM file with genome coordinates is straightforward. But if that is all you want, you can just align to the genome. So the ____ issue is the split reads that cross exon-exon boundaries: how do you split them, and how do you avoid double counting these split reads if you use them for an expression estimate?
              What is the title of this paper? Mapping the reads to the "transcriptome" is a very interesting methodology, and I am wondering why they need to convert back to the genome.


              • golharam
                Member
                • Dec 2009
                • 55

                #8
                @thinkRNA - The paper is "Integrative analysis of the melanoma transcriptome". I've emailed Mike Berger 3 times with no response. I'm a bit annoyed.

                I'll probably just write my own perl script to do the conversion.


                • mrawlins
                  Member
                  • Apr 2010
                  • 63

                  #9
                  I'm not sure I would trust a transcriptome file, since the inaccuracies in the transcriptome annotation will propagate. The bioinformatics currently available cannot give a perfect transcriptome annotation, and the bias introduced by imperfect annotations may skew your experimental results.

                  If you have any capability to do the junction mapping and alternative splicing analysis yourself (i.e., mapping to the genome, not the transcriptome), I would go that route. If that's not an option, be sure your analysis includes a discussion of how the results are skewed by the inaccuracies of the transcriptome annotation.


                  • genomicist
                    Member
                    • Jan 2011
                    • 12

                    #10
                    Hi golharam! Have you had any success in solving your question, i.e. mapping transcript alignments back to genome coordinates?


                    • golharam
                      Member
                      • Dec 2009
                      • 55

                      #11
                      I never managed to reproduce the results in the paper. But I do see translocations in other NGS datasets. I used BWA to map the reads to the ENTIRE genome.

                      After some discussion here, I'm not convinced mapping to just the known transcriptome is the best approach as novel transcripts may be missed.

                      As far as mapping transcript coordinates to genomic coordinates, I wrote a Perl script that uses BioPerl to do this.


                      • mgogol
                        Senior Member
                        • Mar 2008
                        • 197

                        #12
                        Want to share your script? : ) I'm about to write the same thing. Maybe.


                        • rskr
                          Senior Member
                          • Oct 2010
                          • 249

                          #13
                          I think it is a good approach. There are fewer pseudogenes in the transcriptome, so the alignments are more accurate. Not to mention that splice boundaries are iffy at best with short reads.
