  • Converting reads mapped from transcriptome back to genome

    Hi - I'm trying to reproduce the pipeline of a paper that mapped RNA-Seq reads to the transcriptome, then converted the mapped transcript coordinates to genomic coordinates.

    Does anyone have an easy way of performing this? I emailed the author but never got a response.

  • #2
    What do you mean by mapped coordinates? I feel that they are the same thing...


    • #3
      I have a FASTA file of transcripts. If a read maps to a transcript, I need to convert its coordinates on the transcript to coordinates on the genome. This shouldn't be too hard as long as I have the name and location of the transcript and where the read maps on the transcript.

      I can determine the genomic coordinates based on the annotation of the transcript. I was hoping someone already had a program to do this.
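The conversion described in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: exons are given as 1-based inclusive (start, end) genomic intervals taken from the annotation, the transcript sequence is the exons concatenated in transcription order, and the function name `transcript_to_genome` is made up for this example.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def transcript_to_genome(pos: int, exons: List[Exon], strand: str) -> int:
    """Map a 1-based transcript position to a 1-based genomic position."""
    if pos < 1:
        raise ValueError("transcript positions are 1-based")
    # Walk exons in transcription order: ascending on '+', descending on '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = pos
    for start, end in ordered:
        length = end - start + 1
        if remaining <= length:
            # The position falls inside this exon.
            if strand == "+":
                return start + remaining - 1
            return end - remaining + 1
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")


# Hypothetical two-exon transcript on the forward strand.
exons = [(100, 199), (300, 399)]
print(transcript_to_genome(101, exons, "+"))
```

With these made-up exons, transcript position 101 is the first base of the second exon, so it lands on genomic position 300 on the forward strand.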


      • #4
        I'm working on the same issue. I guess you are talking about the Berger et al. paper? Assigning each read to the genome seems relatively easy with a relational database for the reads that map within a single exon, so creating a modified SAM file with genome coordinates is relatively easy. But if this is all you want, you can just align to the genome. So the ____ issue is the split reads that cross exon-exon boundaries. How do you split them, and then how do you avoid double counting these split reads if you use them for an expression estimate?
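One possible way to handle the split reads raised here is to keep each read as a single record whose CIGAR string uses the SAM `N` operator for the skipped intron, so the read is still counted once. The sketch below is an assumption-laden illustration, not the method used in the paper: the function names are invented, exons are 1-based inclusive genomic intervals separated by at least one intronic base, and the read is assumed to align to the transcript without indels or clipping.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based inclusive coordinates


def read_to_genomic_blocks(t_start: int, t_end: int,
                           exons: List[Interval], strand: str) -> List[Interval]:
    """Project the transcript interval [t_start, t_end] onto the genome.

    Returns the covered genomic blocks in ascending genomic order.
    """
    if t_start < 1 or t_end < t_start:
        raise ValueError("invalid transcript interval")
    ordered = sorted(exons, reverse=(strand == "-"))
    blocks = []
    offset = 0  # transcript bases consumed by earlier exons
    for g_start, g_end in ordered:
        length = g_end - g_start + 1
        # Portion of the read overlapping this exon, in transcript coordinates.
        lo = max(t_start, offset + 1)
        hi = min(t_end, offset + length)
        if lo <= hi:
            if strand == "+":
                blocks.append((g_start + lo - offset - 1, g_start + hi - offset - 1))
            else:
                blocks.append((g_end - (hi - offset) + 1, g_end - (lo - offset) + 1))
        offset += length
    if t_end > offset:
        raise ValueError("read extends past the end of the transcript")
    return sorted(blocks)


def blocks_to_cigar(blocks: List[Interval]) -> str:
    """Build a CIGAR string: M for aligned blocks, N for skipped introns."""
    parts = []
    for i, (start, end) in enumerate(blocks):
        if i > 0:
            parts.append(f"{start - blocks[i - 1][1] - 1}N")
        parts.append(f"{end - start + 1}M")
    return "".join(parts)


# Hypothetical 20 bp read spanning the junction of a two-exon transcript.
blocks = read_to_genomic_blocks(91, 110, [(100, 199), (300, 399)], "+")
print(blocks[0][0], blocks_to_cigar(blocks))
```

The leftmost block start becomes the SAM POS field. For transcripts on the reverse strand, the read sequence and strand flag would also need to be flipped, which this sketch does not do.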


        • #5
          If you are into BioPerl there is a module, Bio::Coordinate::GeneMapper, which is designed to do transformations between coordinate systems like this.

          Caveats:
          - The documentation for this module is sparse.
          - The module appears to contain a couple of bugs.
          - You really have to grok the BioPerl object model.


          • #6
            If you're using Ensembl transcripts, I think Ensembl somewhere stores the set of exons that go into making up each transcript, with corresponding genomic coordinates for exons, so you can probably just write a program to match the numbers there for every transcript.

            Otherwise, you can always do your own alignment of the transcripts to the genome with a cDNA alignment program like sim4 or Splign.
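The exon lookup suggested in this post can be approximated from a GTF annotation file, where each `exon` record carries genomic start and end coordinates plus a `transcript_id` attribute. The following is a rough sketch assuming a standard nine-column, tab-separated GTF; the function name `exons_by_transcript` and the identifiers in the sample lines are hypothetical.

```python
import re
from typing import Dict, Iterable

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_by_transcript(gtf_lines: Iterable[str]) -> Dict[str, dict]:
    """Collect exon intervals, chromosome and strand for every transcript."""
    transcripts: Dict[str, dict] = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records define the transcript structure
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        entry = transcripts.setdefault(
            match.group(1),
            {"chrom": fields[0], "strand": fields[6], "exons": []},
        )
        # GTF coordinates are 1-based and inclusive.
        entry["exons"].append((int(fields[3]), int(fields[4])))
    for entry in transcripts.values():
        entry["exons"].sort()
    return transcripts


# Made-up records for a single transcript "TX1".
sample = [
    'chr1\tsrc\texon\t300\t399\t.\t+\t.\tgene_id "G1"; transcript_id "TX1";\n',
    'chr1\tsrc\texon\t100\t199\t.\t+\t.\tgene_id "G1"; transcript_id "TX1";\n',
]
print(exons_by_transcript(sample)["TX1"]["exons"])
```

In practice the lines would come from an open file handle on the annotation. The resulting exon lists are what a transcript-to-genome conversion needs as input.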


            • #7
              Originally posted by Jon_Keats
              I'm working on the same issue, guess you are talking about the Berger et al. paper? Setting each read to the genome seems relatively easy with a relational database for the reads that map to an exon, so creating a modified SAM file with genome coordinates is relatively easy. But if this is all you want you can just align to genome. So the ____ issue is the split reads that cross exon-exon boundaries. How to split them and then how not to double count these split reads if you use it for an expression estimate?
              What is the title of this paper? Mapping the reads to the "transcriptome" is a very interesting methodology, and I am wondering why they need to convert back to the genome.


              • #8
                @thinkRNA - The paper is "Integrative analysis of the melanoma transcriptome". I've emailed Mike Berger 3 times with no response. I'm a bit annoyed.

                I'll probably just write my own perl script to do the conversion.


                • #9
                  I'm not sure I would trust a transcriptome file, since the inaccuracies in the transcriptome annotation will propagate. The bioinformatics currently available cannot give a perfect transcriptome annotation, and the bias introduced by imperfect annotations may skew your experimental results.

                  If you have any capability to do the junction mapping and alternative splicing analysis yourself (i.e., mapping to the genome, not the transcriptome), I would go that route. If that's not an option, be sure your analysis includes a discussion of how the results are skewed by the inaccuracies of the transcriptome annotation.


                  • #10
                    Hi golharam! Have you had any success in solving your question, i.e. mapping transcript alignments back to genome coordinates?


                    • #11
                      I never managed to reproduce the results in the paper. But I do see translocations in other NGS datasets. I used BWA to map the reads to the ENTIRE genome.

                      After some discussion here, I'm not convinced mapping to just the known transcriptome is the best approach as novel transcripts may be missed.

                      As far as mapping transcript coordinates to genomic coordinates, I wrote a Perl script that uses BioPerl to do this.


                      • #12
                        Want to share your script? : ) I'm about to write the same thing. Maybe.


                        • #13
                          I think mapping to the transcriptome is a good approach. There are fewer pseudogenes in the transcriptome, so the alignments are more accurate. Not to mention that splice boundaries are iffy at best with short reads.
