  • lzu
    Junior Member
    • Nov 2013
    • 9

    How to extract assembled transcript sequence from RNA-seq data instead of ref genome?

    Hi All,

    I have RNA-seq data from a 'subspecies' or 'variety' of grape. A grape reference genome is available.

    I want to get a transcript FASTA file for this grape 'variety' after mapping its RNA-seq reads to the grape reference genome with the TopHat and Cufflinks pipeline.

    Cufflinks only produced a 'transcript coordinate' file (the positions of the transcripts in the grape reference genome). But I need the assembled transcript sequences from this grape 'variety' RNA-seq data, not from the reference grape genome, because there are some evolutionary differences between my sample and the reference genome that I want to analyse later.

    So how do I extract a transcript FASTA file from my sample's RNA-seq data, instead of from the reference genome, after running the TopHat and Cufflinks pipeline?

    Thanks for your help and suggestion!

    lzu
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Cufflinks outputs a GTF assembly that annotates each of the loci it calls. If you want a multi-FASTA file of that, just use gtf_to_fasta (it probably comes with TopHat; if not, you can google around for it).
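For readers without gtf_to_fasta at hand, the extraction step can be sketched in a few lines of Python. This is a minimal sketch, not gtf_to_fasta itself: it assumes the genome is already loaded into a dict keyed by chromosome name and that exon lines carry a `transcript_id` attribute as in Cufflinks output; the function names, toy sequence and GTF lines are made up. Every base comes from the genome you pass in, so with the reference FASTA the output is reference sequence.

```python
# Sketch of GTF -> multi-FASTA extraction, similar in spirit to gtf_to_fasta.
# Assumptions: the genome is already loaded as {chrom: sequence}, and exon
# lines carry a transcript_id attribute, as in Cufflinks' transcripts.gtf.
import re
from typing import Dict, Iterable, Iterator, List, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def parse_gtf_exons(lines: Iterable[str]) -> Dict[str, Tuple[str, str, List[Tuple[int, int]]]]:
    """Group exon records as transcript_id -> (chrom, strand, [(start, end), ...])."""
    transcripts: Dict[str, Tuple[str, str, List[Tuple[int, int]]]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', f[8])
        if match is None:
            continue
        entry = transcripts.setdefault(match.group(1), (f[0], f[6], []))
        entry[2].append((int(f[3]), int(f[4])))
    return transcripts


def transcript_sequences(lines: Iterable[str], genome: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield (transcript_id, spliced sequence); bases come from `genome`."""
    for tid, (chrom, strand, exons) in parse_gtf_exons(lines).items():
        # GTF is 1-based and inclusive, so exon (s, e) is genome[chrom][s-1:e].
        seq = "".join(genome[chrom][s - 1:e] for s, e in sorted(exons))
        yield tid, reverse_complement(seq) if strand == "-" else seq


# Toy example (made-up sequence and GTF lines).
genome = {"chr1": "AAACCCGGGTTTAAACCC"}
gtf = [
    'chr1\tCufflinks\texon\t1\t3\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\texon\t7\t9\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'chr1\tCufflinks\texon\t4\t6\t.\t-\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1";',
]
results = dict(transcript_sequences(gtf, genome))
for tid, seq in results.items():
    print(f">{tid}\n{seq}")
```

Exons are sorted by genomic position before splicing and the whole transcript is reverse-complemented afterwards, which gives the same result as complementing each minus-strand exon in transcript order.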

    • sphil
      Senior Member
      • Apr 2010
      • 192

      #3
      Originally posted by dpryan
      Cufflinks outputs a GTF assembly that annotates each of the loci that it calls. If you want a multi-fasta file of that, just use gtf_to_fasta (that probably comes with tophat, but if not you can google around for it).
      ----
      Be aware of where Cufflinks gets the sequences from. It most likely takes the reference FASTA you provide and just extracts the sequences according to the coordinates given in the GTF, which means you end up with your 'non-variety' grape sequences.

      • lzu
        Junior Member
        • Nov 2013
        • 9

        #4
        Then I will end up with a FASTA file of transcript sequences extracted from the reference genome, which is not what I want. I want the transcript sequences from my sample (a grape 'variety').

        • lzu
          Junior Member
          • Nov 2013
          • 9

          #5
          You are right, you've got my point. I still don't know how to extract sequences from the grape 'variety' RNA-seq data. Maybe it is hard; should I assemble the RNA-seq data de novo using Trinity?

          • sphil
            Senior Member
            • Apr 2010
            • 192

            #6
            Originally posted by lzu
            You are right, you've got my point. I still don't know how to extract sequences from the grape 'variety' RNA-seq data. Maybe it is hard, or should I assemble RNA-seq de novo by using Trinity?
            ----
            Yep, maybe that's the better way of doing it: assemble the transcripts de novo and map those transcripts to the reference genome.

            • lzu
              Junior Member
              • Nov 2013
              • 9

              #7
              Originally posted by sphil
              Yep, maybe that's the better way of doing it. Assemble the transcripts denovo an map those transcript to the reference genome.
              ----
              Do you know of any papers that first assemble RNA-seq data de novo and then map the transcripts to a reference genome?

              • sphil
                Senior Member
                • Apr 2010
                • 192

                #8
                Sorry, I can't think of one off the top of my head, but the 'normal' mapping procedure after de novo assembly of transcripts should do the job pretty well. Just account for the diversity between your strains when you choose the mapping parameters: use loose mapping criteria after your assembly and it should be fine. If that doesn't give you the desired results, what I normally do is BLAST the transcripts against an in-house database, which is even looser than what most mappers allow. Also, if the transcripts get too long, this should be the way to go.
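The BLAST fallback mentioned above ends with a table of hits that still has to be reduced to one placement per transcript. A minimal Python sketch, assuming BLAST tabular output (`-outfmt 6`, the standard 12 columns); the identity and e-value cutoffs, the transcript IDs and the scores are hypothetical placeholders, not values from this thread.

```python
# Sketch: keep the best BLAST hit per assembled transcript under loose cutoffs.
# Assumptions: BLAST tabular output (-outfmt 6, standard 12 columns); the
# identity and e-value thresholds are illustrative, not values from the thread.
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def best_hits(lines: Iterable[str], min_pident: float = 85.0,
              max_evalue: float = 1e-10) -> Dict[str, Hit]:
    best: Dict[str, Hit] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen qstart qend
        #          sstart send evalue bitscore
        hit = Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11]))
        if hit.pident < min_pident or hit.evalue > max_evalue:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


# Toy BLAST output (made-up transcript IDs and scores).
blast_tab = [
    "asm_tx1\tchr5\t97.50\t1200\t30\t0\t1\t1200\t50001\t51200\t0.0\t2100",
    "asm_tx1\tchr7\t88.00\t400\t48\t0\t1\t400\t900\t1299\t1e-80\t300",
    "asm_tx2\tchr2\t72.00\t300\t84\t0\t1\t300\t10\t309\t1e-20\t150",
]
hits = best_hits(blast_tab)
for query, hit in hits.items():
    print(query, hit.subject, hit.pident)
```

Ranking by bit score rather than identity favours long, consistent alignments; lowering `min_pident` is the knob that makes the search "looser" for a divergent variety.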

                Hope that helps.

                FWIW, below are some papers on assembly and mapping that might be helpful anyway.

                There you go:
                Garber et al.
                Trinity used to assemble transcripts
                Oases assembler

                • lzu
                  Junior Member
                  • Nov 2013
                  • 9

                  #9
                  Thanks for the suggestion. I have read some papers that use a model reference genome to predict alternative splicing diversity in subspecies or species 'varieties' from RNA-seq data. Their results might contain errors if some exons or introns have actually been lost from those subspecies/variety genomes because of genetic diversity among different groups/populations...

                  • Jeremy
                    Senior Member
                    • Nov 2009
                    • 190

                    #10
                    It would probably not be too difficult to get a list of variants between your sample and the reference, substitute the variant bases into the reference genome, and then use gtf_to_fasta to get the variant transcripts. I have done something similar in R using the seqinr package.
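The approach just described can be sketched in Python (it was done in R with seqinr; this is not that code). One wrinkle: indels shift downstream coordinates, so a genome edited in place no longer lines up with the original GTF. This sketch sidesteps that by applying variants exon by exon during extraction instead of rewriting the whole genome first. It assumes VCF-style variants (1-based POS, REF, ALT) that are already filtered and non-overlapping; variants spanning an exon boundary are skipped, and the function names and data are made up.

```python
# Sketch: extract a transcript from reference exons while substituting the
# sample's variants. Assumptions: VCF-style variants (1-based POS, REF, ALT),
# already filtered and non-overlapping; a variant is used only if it lies
# entirely inside one exon (boundary-spanning ones are skipped).
from typing import Dict, List, Sequence, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def apply_variants(seq: str, start: int, variants: Sequence[Tuple[int, str, str]]) -> str:
    """Apply (pos, ref, alt) edits to seq, which starts at 1-based position `start`."""
    end = start + len(seq) - 1
    pieces: List[str] = []
    cursor = start  # first genomic position not yet copied into pieces
    for pos, ref, alt in sorted(variants):
        if pos < cursor or pos + len(ref) - 1 > end:
            continue  # overlaps an earlier edit or runs past this segment
        offset = pos - start
        if seq[offset:offset + len(ref)].upper() != ref.upper():
            raise ValueError(f"REF {ref} does not match the reference at position {pos}")
        pieces.append(seq[cursor - start:offset])  # unchanged bases before the variant
        pieces.append(alt)  # sample allele replaces the REF bases
        cursor = pos + len(ref)
    pieces.append(seq[cursor - start:])
    return "".join(pieces)


def variant_transcript(genome: Dict[str, str], chrom: str, strand: str,
                       exons: Sequence[Tuple[int, int]], variants: Sequence[Variant]) -> str:
    """Splice exons in genomic order, editing each exon with the variants it contains."""
    parts = []
    for s, e in sorted(exons):
        in_exon = [(p, r, a) for c, p, r, a in variants
                   if c == chrom and s <= p and p + len(r) - 1 <= e]
        parts.append(apply_variants(genome[chrom][s - 1:e], s, in_exon))
    seq = "".join(parts)
    return seq.translate(COMPLEMENT)[::-1] if strand == "-" else seq


# Toy example (made-up data): a SNP in exon 1, an insertion in exon 2,
# and an intronic SNP that is ignored.
genome = {"chr1": "AAACCCGGGTTTAAACCC"}
variants = [("chr1", 2, "A", "T"), ("chr1", 8, "G", "GTT"), ("chr1", 5, "C", "G")]
plus = variant_transcript(genome, "chr1", "+", [(1, 3), (7, 9)], variants)
minus = variant_transcript(genome, "chr1", "-", [(1, 3), (7, 9)], variants)
print(plus, minus)
```

The REF check raises on a mismatch because a disagreement there usually means the VCF and the FASTA use different coordinates or assemblies, which is better caught early than silently written into the transcripts.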

                    • sindrle
                      Senior Member
                      • Aug 2013
                      • 266

                      #11
                      Let's say you have called indels and SNPs with GATK. Would that work, or could you please share some more details?

                      I have never done this before.
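With GATK calls, the first step is pulling the sample's alleles out of the VCF. The following is a minimal Python sketch under stated assumptions: a single-sample VCF, heterozygous calls collapsed to their first non-reference allele (no haplotype phasing), symbolic alleles skipped, and a record that overlaps an earlier kept record dropped. The function name and the toy records are made up. Dedicated tools such as `bcftools consensus` also apply VCF calls to a reference FASTA.

```python
# Sketch: read a single-sample GATK VCF into (chrom, pos, ref, alt) tuples.
# Assumptions: one sample column; a heterozygous call is collapsed to its
# first non-reference allele (no phasing); symbolic alleles are skipped; a
# record overlapping an earlier kept record is dropped.
import re
from typing import Iterable, List, Tuple

Variant = Tuple[str, int, str, str]


def read_vcf_variants(lines: Iterable[str], require_pass: bool = True) -> List[Variant]:
    calls: List[Variant] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # meta-information and header lines
        f = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, filt = f[0], int(f[1]), f[3], f[4].split(","), f[6]
        if require_pass and filt not in ("PASS", "."):
            continue  # record failed a filter
        sample = dict(zip(f[8].split(":"), f[9].split(":")))
        # Non-reference allele indices from GT, e.g. "0/1" -> ["1"].
        indices = [a for a in re.split(r"[/|]", sample.get("GT", ".")) if a not in ("0", ".")]
        if not indices:
            continue  # homozygous reference or missing call
        alt = alts[int(indices[0]) - 1]
        if alt.startswith("<") or alt == "*":
            continue  # symbolic allele or spanning deletion
        calls.append((chrom, pos, ref, alt))

    calls.sort(key=lambda v: (v[0], v[1]))
    kept: List[Variant] = []
    for v in calls:
        if kept and kept[-1][0] == v[0] and v[1] <= kept[-1][1] + len(kept[-1][2]) - 1:
            continue  # overlaps the previously kept record
        kept.append(v)
    return kept


# Toy VCF (made-up records).
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "chr1\t2\t.\tA\tT\t50\tPASS\t.\tGT:DP\t1/1:20",
    "chr1\t8\t.\tG\tGTT\t40\tPASS\t.\tGT:DP\t0/1:15",
    "chr1\t12\t.\tT\tC\t10\tLowQual\t.\tGT:DP\t1/1:3",
    "chr1\t14\t.\tA\tG\t60\tPASS\t.\tGT:DP\t0/0:25",
]
variants = read_vcf_variants(vcf)
print(variants)
```

Dropping overlapping records keeps later sequence edits unambiguous; a real pipeline would decide explicitly which of two overlapping calls should win.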
