  • jkozubek
    Member
    • Mar 2011
    • 18

    RNA biotypes

    Hello:

    This is my first post.

    I am a genetics student at UConn. I have Illumina RNA-Seq data and my PI wants me to determine the RNA distribution (tRNA%, rRNA%, snoRNA%). I already determined the miRNA content since we used miRDeep software.

    I am thinking about aligning our sequencing data to a reference genome with Bowtie, and then checking the alignments against the RefGene annotation file I got from UCSC. I am not sure how to compare the Bowtie output with the annotation file, however.

    I am thinking about just writing a Perl script and using a loop to do this.

    What do people think? Have others done this and does this seem like the right way to go about things?

    Jim
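
A minimal sketch of the overlap-and-count idea described in this post, written in Python rather than Perl. Everything here is an assumption for illustration: the function name `biotype_percentages`, the tuple layouts, the half-open 0-based coordinates, and the toy intervals are all hypothetical, and real Bowtie output and UCSC annotation files would need to be parsed into these tuples first. Each read is assigned to the first annotated feature it overlaps.

```python
from collections import Counter, defaultdict


def biotype_percentages(reads, annotations):
    """Return {biotype: percent of reads} for reads overlapping annotated features.

    reads:       iterable of (chrom, start, end)
    annotations: iterable of (chrom, start, end, biotype)
    Coordinates are assumed to be 0-based and half-open.
    """
    # Group features by chromosome so each read is only compared to its own chromosome.
    features_by_chrom = defaultdict(list)
    for chrom, start, end, biotype in annotations:
        features_by_chrom[chrom].append((start, end, biotype))

    counts = Counter()
    for chrom, read_start, read_end in reads:
        label = "unannotated"
        for feat_start, feat_end, biotype in features_by_chrom.get(chrom, []):
            # Two half-open intervals overlap when each starts before the other ends.
            if read_start < feat_end and feat_start < read_end:
                label = biotype
                break
        counts[label] += 1

    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical toy data, not taken from any real annotation.
annotations = [
    ("chr1", 100, 200, "tRNA"),
    ("chr1", 500, 700, "rRNA"),
    ("chr2", 50, 120, "snoRNA"),
]
reads = [
    ("chr1", 110, 140),
    ("chr1", 190, 220),
    ("chr1", 600, 630),
    ("chr2", 10, 40),
    ("chr2", 100, 130),
]
percentages = biotype_percentages(reads, annotations)
for label, pct in sorted(percentages.items()):
    print(f"{label}\t{pct:.1f}%")
```

The nested loop is fine for a toy example but scales poorly; for a full genome annotation, sorting features and using a binary search or an interval tree would be the usual next step.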
  • ppgardne
    Member
    • Oct 2010
    • 13

    #2
    Sounds reasonable to me. There are probably existing overlap scripts available. Eval might be worth trying.


    • Jon_Keats
      Senior Member
      • Mar 2010
      • 279

      #3
      Could try Tophat (bowtie backend but allows spliced reads to align). Then either count reads in your regions of interest or run Cufflinks with a GTF and use the output FPKM values to get the proportions of each RNA species.


      • jkozubek
        Member
        • Mar 2011
        • 18

        #4
        I am working on Tophat. Do you know if Tophat maps ordinary reads that fall entirely inside exons as well as reads that span exon-exon junctions?

        Then it looks like I will have to write a script to compare the Tophat output against a UCSC annotation file of RNA types (otherwise known as RefGene).


        • Jon_Keats
          Senior Member
          • Mar 2010
          • 279

          #5
          Tophat will map both reads that sit exclusively inside exons and those that cross exon-exon junctions. If you provide a GTF file to Cufflinks, the program that generates expression estimates from Tophat output, you can easily estimate the abundance of each RNA biotype, as the RNA types are encoded in the GTF file. If you look at my intro thread (http://seqanswers.com/forums/showthr...?t=4589&page=3 ) you should see the workflow that will get you most of the way to your desired result.
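
A rough sketch of the last step suggested here, summing per-gene FPKM values by RNA biotype. It rests on several assumptions: that the GTF attribute column carries a `gene_biotype` key (as in Ensembl-style GTFs, which may not hold for a UCSC RefGene export), that per-gene FPKM values have already been read from the Cufflinks output into a dictionary, and that the gene IDs and numbers below are made up. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs of the form: key "value";
GTF_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_attributes(field):
    """Turn a GTF attribute column into a {key: value} dictionary."""
    return dict(GTF_ATTRIBUTE.findall(field))


def biotype_fractions(gtf_attribute_fields, fpkm_by_gene):
    """Return {biotype: fraction of total FPKM}; genes without a biotype go to 'unknown'."""
    gene_biotype = {}
    for field in gtf_attribute_fields:
        attrs = parse_gtf_attributes(field)
        # 'gene_biotype' is an assumed attribute name; adjust to the GTF in use.
        if "gene_id" in attrs and "gene_biotype" in attrs:
            gene_biotype[attrs["gene_id"]] = attrs["gene_biotype"]

    totals = defaultdict(float)
    for gene_id, fpkm in fpkm_by_gene.items():
        totals[gene_biotype.get(gene_id, "unknown")] += fpkm

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {biotype: value / grand_total for biotype, value in totals.items()}


# Hypothetical attribute columns and FPKM values for illustration only.
attribute_fields = [
    'gene_id "G1"; gene_biotype "rRNA";',
    'gene_id "G2"; gene_biotype "tRNA";',
    'gene_id "G3"; gene_biotype "snoRNA";',
]
fpkm_by_gene = {"G1": 50.0, "G2": 30.0, "G3": 10.0, "G4": 10.0}

fractions = biotype_fractions(attribute_fields, fpkm_by_gene)
for biotype, fraction in sorted(fractions.items()):
    print(f"{biotype}\t{fraction:.2f}")
```

Note that FPKM is normalised by transcript length, so these fractions are shares of estimated abundance, not shares of raw read counts; counting reads per region would give a different answer.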


          • jkozubek
            Member
            • Mar 2011
            • 18

            #6
            I used paraffin-archived samples for sequencing and all of my RNA is in very tiny bits, < 30 bp. Can you suggest an -r setting that would be appropriate for Tophat?


            • Jon_Keats
              Senior Member
              • Mar 2010
              • 279

              #7
              Did you do paired-end sequencing? I'd assume not if you only have 30bp inserts, in which case the -r option does not apply, since it is only meaningful for paired-end reads. Moreover, with a 30bp read length Tophat will most likely not give you any benefit over mapping the data directly to the genome with bowtie or bwa, as few exons are smaller than 30bp. The exon-junction finding function also splits reads into 25bp segments by default, so you only have one segment anyway unless you set that value to 12-15bp, at which point things will align everywhere.
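
The segment arithmetic in this reply can be checked with a few lines of Python. This is a simplification that counts only whole segments (read length divided by segment length, rounded down); how Tophat actually handles the leftover bases is not modelled here, and the function name `whole_segments` is hypothetical.

```python
def whole_segments(read_length, segment_length):
    """Number of complete segments of segment_length that fit in a read."""
    if read_length <= 0 or segment_length <= 0:
        raise ValueError("lengths must be positive")
    return read_length // segment_length


# A 30bp read with the default 25bp segments versus the shorter settings mentioned above.
for segment_length in (25, 15, 12):
    print(segment_length, whole_segments(30, segment_length))
```

With one segment per read there is nothing left over to anchor on the far side of a junction, which is why spliced alignment adds little for reads this short.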


              • jkozubek
                Member
                • Mar 2011
                • 18

                #8
                That's what I was thinking, but it said I am required to input an -r value. I guess I can just set it to 0.


                • Jon_Keats
                  Senior Member
                  • Mar 2010
                  • 279

                  #9
                  If these are single-end reads (i.e. not paired-end), the -r option is not needed; it is only needed for paired-end reads. I guess the manual should be more specific, with something like "There is no default, and this parameter is required for paired end runs but not single end runs"

                  Versus the current

                  -r

                  This is the expected (mean) inner distance between mate pairs. For, example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. There is no default, and this parameter is required for paired end runs.
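
The manual's example works out as fragment length minus the two read lengths. A tiny Python sketch of that arithmetic, assuming both mates have the same length (the function name `inner_distance` is hypothetical):

```python
def inner_distance(fragment_length, read_length):
    """Expected inner distance between mates: fragment minus both read ends."""
    return fragment_length - 2 * read_length


# The manual's example: 300bp fragments sequenced with 50bp from each end.
print(inner_distance(300, 50))  # 200
```

This only applies to paired-end runs; for single-end reads there is no mate pair and hence no inner distance to supply.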
