  • How to obtain full length RNA transcript sequence

    Hi everyone,
    I'm new to this kind of task, so please be patient!
    I'm trying to create a BLAST database from the ENCODE RNA-seq data.
    I've downloaded both the FASTQ reads and the BAM/BAI files.
    I need the FASTA sequences of all the full-length transcripts: is it possible to extract them from the BAM file?
    Alternatively, should I try a de novo assembly using Trinity?
    Thanks a lot.
    Regards,

    Davide

  • #2
    I thought this task would be easy, or at least possible, since I have the reads aligned to the reference genome (Homo sapiens).
    Can anyone help?
    Thanks



    • #3
      Which BAM files are you talking about? ENCODE has many.

      Why do you want to make your BLAST data base from RNA-Seq reads rather than simply from, say, the cDNA FASTA file from Ensembl?



      • #4
        Originally posted by Simon Anders View Post
        Which BAM files are you talking about? ENCODE has many.

        Why do you want to make your BLAST data base from RNA-Seq reads rather than simply from, say, the cDNA FASTA file from Ensembl?
        I'm talking about the BAM file of the human total RNA extract from the CSHL Long RNA-seq data.

        I don't use Ensembl data because the cDNA FASTA from Ensembl does not contain all the transcripts (I guess), but only "known, novel and pseudogenes", as stated on their website.

        Moreover, I will probably repeat this task using RNA-seq data from cells under particular conditions.

        Thanks a lot.

        Davide



        • #5
          What you want to do is called reference-based (as opposed to: de-novo) transcript assembly. A tool commonly used for this purpose is cufflinks:

          Roberts, Pimentel, Trapnell, and Pachter:
          Identification of novel transcripts in annotated genomes using RNA-Seq
          Bioinformatics (2011) 27 (17): 2325-2329.
          doi:10.1093/bioinformatics/btr355

          However, before doing this yourself, you may want to check whether the ENCODE people have already done this analysis. It seems obvious that they would do this.

          I still wonder what you would need a database of all transcripts for. Instead of blasting against it, you can always blast against the genome.



          • #6
            Originally posted by Simon Anders View Post
            What you want to do is called reference-based (as opposed to: de-novo) transcript assembly. A tool commonly used for this purpose is cufflinks:

            Roberts, Pimentel, Trapnell, and Pachter:
            Identification of novel transcripts in annotated genomes using RNA-Seq
            Bioinformatics (2011) 27 (17): 2325-2329.
            doi:10.1093/bioinformatics/btr355

            However, before doing this yourself, you may want to check whether the ENCODE people have already done this analysis. It seems obvious that they would do this.

            I still wonder what you would need a database of all transcripts for. Instead of blasting against it, you can always blast against the genome.
            Thank you.
            I've looked at Cufflinks, just the manual, but I did not find a FASTA file of the transcripts among its outputs. Cufflinks instead produces a GTF file as output (which does not contain the transcript sequences). I'll take a closer look at the program.
            Just yesterday I also read this interesting article:
            "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks", Nature Protocols

            If I BLAST against the genome, I lose information that is in the RNA sequence but not in the genome (e.g. sequences from transposable elements that are not integrated in the genome...).

            Thanks again.



            • #7
              You use the GTF file to produce the cDNA FASTA file from the reference FASTA file. This is a simple exercise in script programming.
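
A minimal sketch of the first half of that exercise, assuming a Cufflinks-style GTF in which each exon record carries a `transcript_id` attribute. The function name `exons_by_transcript` is made up for illustration; it only collects exon coordinates per transcript and does not touch the reference FASTA yet.

```python
from collections import defaultdict


def exons_by_transcript(gtf_lines):
    """Group GTF exon records by transcript_id (hypothetical helper).

    Returns {transcript_id: [(chrom, start, end, strand), ...]} with the
    exons of each transcript sorted by genomic start. Coordinates are kept
    1-based and inclusive, as in the GTF file itself.
    """
    transcripts = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features contribute sequence
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # Column 9 looks like: gene_id "G1"; transcript_id "T1"; ...
        attrs = {}
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key, _, value = item.partition(" ")
                attrs[key] = value.strip().strip('"')
        transcript_id = attrs.get("transcript_id")
        if transcript_id is None:
            continue  # assumption: records without a transcript_id are ignored
        transcripts[transcript_id].append((chrom, start, end, strand))
    for exons in transcripts.values():
        exons.sort(key=lambda exon: exon[1])
    return dict(transcripts)
```

With a real file this would be called as `exons_by_transcript(open("transcripts.gtf"))`, where the file name is a placeholder. As far as I know, the gffread utility distributed with Cufflinks can also write the spliced sequences directly (`gffread -w transcripts.fa -g genome.fa transcripts.gtf`), which avoids scripting altogether.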



              • #8
                I've taken a look at the GTF specs.
                Yes, it is.
                Thanks



                • #9
                  Hi, I saw this thread.
                  Could I get the logic for a program that creates transcripts from the genome file?
                  How does it differ based on orientation? I mean reads with positive and negative orientation.
                  Thank you.
                  Deepak



                  • #10
                    Refer to my earlier post on getting FASTA from a Cufflinks GTF.
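
On the orientation question, a rough sketch of the usual logic, not taken from the linked post: join the exon sequences in genomic order, then reverse-complement the result if the transcript lies on the minus strand. The names `reverse_complement` and `transcript_sequence` are hypothetical, the genome is assumed to be already loaded as a dict of chromosome name to sequence string, and a strand of `.` (unstranded) is treated like `+` here.

```python
# Translation table for complementing DNA, keeping case and N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def transcript_sequence(genome, exons):
    """Build the spliced sequence of one transcript (illustrative only).

    genome: dict mapping chromosome name -> sequence string.
    exons:  list of (chrom, start, end, strand) tuples with 1-based,
            inclusive coordinates, as they appear in a GTF file.
    """
    exons = sorted(exons, key=lambda exon: exon[1])
    # GTF is 1-based inclusive; Python slices are 0-based, end-exclusive.
    spliced = "".join(genome[chrom][start - 1:end]
                      for chrom, start, end, _strand in exons)
    if exons[0][3] == "-":
        # Minus-strand transcripts read the opposite strand, 5' to 3'.
        spliced = reverse_complement(spliced)
    return spliced
```

For example, with `genome = {"chr1": "ACGTTTGGCCAT"}` and exons at 1-3 and 10-12, the plus-strand transcript is the two exons joined left to right, and the minus-strand transcript is the reverse complement of that same string.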
