  • RNA biotypes

    Hello:

    This is my first post.

    I am a genetics student at UConn. I have Illumina RNA-Seq data and my PI wants me to determine the RNA distribution (tRNA%, rRNA%, snoRNA%). I already determined the miRNA content since we used miRDeep software.

    I am thinking about Bowtie-ing our sequencing data to a reference genome, and then somehow checking it against this Ref Gene annotation file I got from UCSC. I am not sure how to interface the Bowtie output against the annotation file, however.

    I am thinking about just writing a Perl script and using a loop to do this.

    What do people think? Have others done this and does this seem like the right way to go about things?

    Jim

  • #2
    Sounds reasonable to me. There are probably existing overlap scripts available. Eval might be worth trying.
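For illustration, here is a minimal sketch of the kind of overlap script discussed here, written in Python rather than Perl. It assumes the alignments and the annotation have already been parsed into (chrom, start, end) and (chrom, start, end, biotype) tuples with half-open coordinates. The function names and the toy data are hypothetical, and the linear scan per read is only practical for small inputs. For real data an existing tool such as bedtools intersect is probably a better choice.

```python
from collections import Counter


def count_reads_by_biotype(reads, features):
    """Assign each read to the first annotated feature it overlaps.

    reads:    iterable of (chrom, start, end), half-open coordinates
    features: iterable of (chrom, start, end, biotype), half-open coordinates
    Reads that overlap nothing are counted as 'unannotated'.
    """
    # Group features by chromosome so each read is only compared
    # against features on its own chromosome.
    by_chrom = {}
    for chrom, start, end, biotype in features:
        by_chrom.setdefault(chrom, []).append((start, end, biotype))

    counts = Counter()
    for chrom, rstart, rend in reads:
        label = "unannotated"
        for fstart, fend, biotype in by_chrom.get(chrom, []):
            # Two half-open intervals overlap when each starts before the other ends.
            if rstart < fend and fstart < rend:
                label = biotype
                break
        counts[label] += 1
    return counts


def percentages(counts):
    """Convert raw counts into percentages of all reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Toy data (hypothetical coordinates, for illustration only).
features = [
    ("chr1", 100, 200, "tRNA"),
    ("chr1", 500, 700, "rRNA"),
    ("chr2", 50, 120, "snoRNA"),
]
reads = [
    ("chr1", 110, 140),
    ("chr1", 190, 220),
    ("chr1", 600, 630),
    ("chr2", 10, 40),
    ("chr2", 100, 130),
]

counts = count_reads_by_biotype(reads, features)
for label, pct in sorted(percentages(counts).items()):
    print(f"{label}\t{pct:.1f}%")
```

Assigning a read to the first overlapping feature is a simplification; reads that overlap several features would need an explicit tie-breaking rule.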


    • #3
      Could try Tophat (bowtie backend but allows spliced reads to align). Then either count reads in your regions of interest or run Cufflinks with a GTF and use the output FPKM values to get the proportions of each RNA species.


      • #4
        I am working on Tophat. Do you know if Tophat maps normal reads inside exons as well as mapping exon-exon junction reads?

        Then it looks like I will have to write a script to compare the Tophat output against a UCSC annotation file of RNA types (otherwise known as Ref Gene).


        • #5
          Tophat will map both reads that sit exclusively inside exons and those that cross exon-exon junctions. If you provide a GTF file to Cufflinks, the program that generates expression estimates from Tophat output, you can easily estimate the abundance of each RNA biotype, as the RNA types are encoded in the GTF file. If you look at my intro thread (http://seqanswers.com/forums/showthr...?t=4589&page=3 ) you should see the workflow that will get you most of the way to your desired result.
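As a rough sketch of the last step described above, per-gene FPKM values can be summed by biotype and divided by the grand total. This assumes the FPKM values and the gene-to-biotype mapping have already been read into plain dictionaries (from the Cufflinks output and the GTF respectively); the function name, gene IDs and numbers below are hypothetical. Note that FPKM is normalised by transcript length, so these fractions are not the same thing as the fraction of reads per biotype.

```python
from collections import defaultdict


def biotype_fractions(fpkm_by_gene, biotype_by_gene):
    """Sum FPKM per biotype and return each biotype's share of the total.

    fpkm_by_gene:    dict of gene_id -> FPKM (float)
    biotype_by_gene: dict of gene_id -> biotype label
    Genes missing from the biotype mapping are grouped under 'unknown'.
    """
    totals = defaultdict(float)
    for gene, fpkm in fpkm_by_gene.items():
        totals[biotype_by_gene.get(gene, "unknown")] += fpkm

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {biotype: value / grand_total for biotype, value in totals.items()}


# Hypothetical values, for illustration only.
fpkm = {"g1": 30.0, "g2": 10.0, "g3": 50.0, "g4": 10.0}
biotype = {"g1": "rRNA", "g2": "tRNA", "g3": "rRNA"}  # g4 has no annotation

fractions = biotype_fractions(fpkm, biotype)
for label in sorted(fractions):
    print(f"{label}\t{100 * fractions[label]:.1f}%")
```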


          • #6
            I used paraffin-archived samples for sequencing, and all of my RNA is in very tiny bits, < 30 bp. Can you suggest an -r setting that would be appropriate for Tophat?


            • #7
              Did you do paired-end sequencing? I'd assume not if you only have 30 bp inserts, in which case the -r option does not apply, since it is not used for single-end reads. Moreover, with a 30 bp read length Tophat will most likely not give you any benefit over mapping the data directly to the genome with bowtie or bwa. Few exons are smaller than 30 bp, and the exon-junction finding function splits reads into 25 bp segments by default, so you only have one segment anyway unless you set that value to 12-15 bp, at which point things will align everywhere.
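The segment arithmetic above can be checked with a couple of lines. This is only a back-of-the-envelope sketch using integer division; the function name is hypothetical, and Tophat's actual handling of leftover bases at the end of a read may differ.

```python
def full_segments(read_length, segment_length=25):
    """Number of complete segments of segment_length that fit in a read."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return read_length // segment_length


# A 30 bp read at the default 25 bp segment length, and at 15 bp.
for seg_len in (25, 15):
    print(f"30 bp read, {seg_len} bp segments: {full_segments(30, seg_len)} full segment(s)")
```

With only one full segment per read there is nothing to split across a junction, which is why mapping directly with bowtie or bwa gives much the same result here.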


              • #8
                That's what I was thinking, but it said I am required to input an -r value. I guess I can just set it to 0.


                • #9
                  If these are single-end reads (i.e. not paired-end), the -r option is not needed; it is only needed for paired-end reads. I guess the manual should be more specific, with "There is no default, and this parameter is required for paired end runs but not single end runs"

                  Versus the current

                  -r

                  This is the expected (mean) inner distance between mate pairs. For example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. There is no default, and this parameter is required for paired end runs.
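The manual's example works out as fragment size minus the two read lengths. A small sketch of that arithmetic, assuming both mates have the same length (the function name is hypothetical):

```python
def mate_inner_distance(fragment_length, read_length):
    """Expected inner distance between mates: fragment minus both reads.

    Assumes both mates have the same read_length. The result can be
    negative when the two reads overlap within a short fragment.
    """
    return fragment_length - 2 * read_length


# The example from the manual text: 300 bp fragments, 50 bp reads.
print(mate_inner_distance(300, 50))
```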
