  • ccard28
    Member
    • Jan 2012
    • 20

    New to RNA-Seq: Help obtaining sequencing summary needed.

    We have just recently submitted our first RNA-Seq sample in the lab, and we were directed to use Galaxy to align the reads to the bovine genome and obtain our transcript profile. I have little experience working with this kind of data and am trying to figure out which tool to use to get a basic summary of how our reads align to the reference genome. The goal is to have raw numbers for total sequenced fragments, uniquely mapped fragments, fragments mapped to annotated exons, fragments mapped to annotated genes (exons + introns), etc. Can anyone point me in the right direction? Thank you in advance for the help; I'm trying to get started in the RNA-Seq world.
  • TonyBrooks
    Senior Member
    • Jun 2009
    • 303

    #2
    Galaxy is a community-driven web-based analysis platform for life science research.


    There is a tutorial on this.

    • turnersd
      Senior Member
      • May 2011
      • 115

      #3
      This tutorial is fantastic, but I'm not sure the tutorial answers the OP's question, which is more geared toward post-alignment analytics. I've often wondered what else I should look at besides running samtools flagstat or EstimateLibraryComplexity.jar (picard). (I'm not even really sure how to interpret the output from EstimateLibraryComplexity.)
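      For the first few numbers the original post asks about (total, mapped and uniquely mapped reads), here is a minimal Python sketch of the kind of counting samtools flagstat does. It reads SAM text lines, skips secondary (0x100) and supplementary (0x800) records so each read is counted once, and treats flag 0x4 as unmapped. It assumes the aligner writes an NH:i tag (TopHat does), so NH:i:1 is taken to mean "uniquely mapped". The function name and toy records are hypothetical.

```python
from collections import Counter

# SAM FLAG bits, as defined in the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize_sam(lines):
    """Count total, mapped and uniquely mapped reads in SAM text lines.

    Each read is counted once, via its primary record. Assumption: the
    aligner writes an NH:i tag (number of reported hits), so NH:i:1 is
    treated as "uniquely mapped".
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # not the primary record for this read
        counts["total"] += 1
        if flag & UNMAPPED:
            continue
        counts["mapped"] += 1
        # optional TAG:TYPE:VALUE fields start at column 12 (index 11)
        nh = next((int(tag[5:]) for tag in fields[11:] if tag.startswith("NH:i:")), None)
        if nh == 1:
            counts["unique"] += 1
    return counts


# Toy records (hypothetical): r2 has one primary and one secondary hit.
records = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t200\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    "r2\t256\tchr2\t300\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(dict(summarize_sam(records)))  # {'total': 3, 'mapped': 2, 'unique': 1}
```

      On real data you would feed it the text output of samtools view for your BAM. For paired-end libraries each mate is counted separately, so these are read counts, not fragment counts.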

      • ccard28
        Member
        • Jan 2012
        • 20

        #4
        I am definitely looking more for post-alignment information. I want to know how many of my reads map to exons vs. introns, how many align to annotated vs. novel junctions, and, more generally, how my reads are distributed across different features of the genome. This may be a very basic task, but my inexperience with this kind of data is proving troublesome. In the past I have aligned our data and done some Cufflinks work on which transcripts are expressed at various levels based on FPKM. However, I need to backtrack and get these general statistics on what my reads align to in the genome (exons, introns, annotated junctions, novel junctions, annotated exons, annotated genes, etc.).
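        For the annotated vs. novel junction question, one way to think about it: every N operation in a spliced read's CIGAR string is an intron, so its genomic start and end define a junction that can be looked up in a set of annotated introns (which you would build from consecutive exons of each annotated transcript; not shown). A minimal Python sketch, assuming 1-based SAM positions and a ready-made annotated set; the function names and toy coordinates are hypothetical.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def junctions(chrom, pos, cigar):
    """Return (chrom, intron_start, intron_end) for every N in a CIGAR string.

    pos is the 1-based leftmost position from the SAM record; the returned
    intron coordinates are 1-based and inclusive.
    """
    found = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            found.append((chrom, ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return found


def classify_junctions(alignments, annotated):
    """Tally junction observations (one per N operation) as annotated or novel."""
    counts = Counter()
    for chrom, pos, cigar in alignments:
        for junction in junctions(chrom, pos, cigar):
            counts["annotated" if junction in annotated else "novel"] += 1
    return counts


# Hypothetical annotation: one known intron covering chr1:151-250.
annotated = {("chr1", 151, 250)}
alignments = [
    ("chr1", 101, "50M100N50M"),  # spans the annotated intron
    ("chr1", 101, "50M80N50M"),   # spans an intron that is not annotated
    ("chr1", 500, "100M"),        # unspliced, contributes no junction
]
print(dict(classify_junctions(alignments, annotated)))  # {'annotated': 1, 'novel': 1}
```

        A read with two N operations contributes two observations, so these are junction counts rather than read counts.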

        • kopi-o
          Senior Member
          • Feb 2008
          • 319

          #5
          ever-seq is pretty good for this kind of thing

          • arvid
            Senior Member
            • Jul 2011
            • 156

            #6
            You can use intersectBed from BEDtools to check and count alignments overlapping various genomic features (anything described by GFF or BED). You could set overlap thresholds to exclude reads that overlap a feature only minimally. This is not really feasible for splice junctions, however.
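            To illustrate the overlap-threshold idea, here is a minimal Python sketch that is conceptually similar to intersectBed with a minimum overlap fraction (its -f option, which is expressed as a fraction of the A interval, here the read). It is not BEDtools itself: it uses 0-based half-open BED-style coordinates, a naive scan instead of an indexed search, and hypothetical feature labels. A read overlapping two feature types is counted under both, which is a design choice you may want to change.

```python
from collections import Counter


def overlap(a_start, a_end, b_start, b_end):
    """Length of overlap between two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def count_by_feature(reads, features, min_fraction=0.0):
    """Count reads overlapping each feature type.

    reads: iterable of (chrom, start, end) in 0-based half-open coordinates.
    features: list of (chrom, start, end, feature_type).
    A read counts toward a feature type if at least min_fraction of the read
    length overlaps a feature of that type; reads hitting nothing are
    counted as 'unassigned'.
    """
    counts = Counter()
    for chrom, start, end in reads:
        length = end - start
        hit_types = set()
        for f_chrom, f_start, f_end, f_type in features:
            if f_chrom != chrom:
                continue
            ov = overlap(start, end, f_start, f_end)
            if ov > 0 and ov >= min_fraction * length:
                hit_types.add(f_type)
        if not hit_types:
            counts["unassigned"] += 1
        for f_type in sorted(hit_types):  # sorted for deterministic output
            counts[f_type] += 1
    return counts


features = [("chr1", 100, 200, "exon"), ("chr1", 200, 1000, "intron")]
reads = [("chr1", 150, 190), ("chr1", 190, 240), ("chr5", 0, 50)]
print(dict(count_by_feature(reads, features)))
# {'exon': 2, 'intron': 1, 'unassigned': 1}
print(dict(count_by_feature(reads, features, min_fraction=0.5)))
# {'exon': 1, 'intron': 1, 'unassigned': 1}
```

            The second read overlaps the exon by only 10 of its 50 bases, so raising the threshold to 0.5 reassigns it to the intron alone.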

            • mmreich
              Junior Member
              • Jan 2010
              • 1

              #7
              RNASeQC for post-alignment metrics

              There is a GATK-based package called RNASeQC that provides a large number of post-alignment metrics along the lines of what you are looking for, including total, unique, duplicate reads, mapped reads and mapped unique reads, rRNA reads, strand specificity, GC bias, correlation to a reference sequence, and many coverage metrics.

              It is available as a standalone piece of software and as a module on the GenePattern public server at http://genepattern.broadinstitute.org. More information is at https://confluence.broadinstitute.or...Tools/RNA-SeQC .

              Michael

              • aparna
                Member
                • Feb 2009
                • 17

                #8
                Hi Michael,
                Have you used RNAseQC? I am getting an error - wondering if you know how to fix it.

                Thanks,

                • Nomijill
                  Member
                  • Sep 2009
                  • 25

                  #9
                  At CLC bio we have several tutorials for RNASeq analysis. They are a great way for a beginner to understand the various steps required, and you will also get detailed reports of your mappings. This will require that you download a trial version of the software, but it is free for at least two weeks, and you can also analyze your RNASeq samples. Let me know if you need more guidance.

                  • swaraj
                    Member
                    • Feb 2012
                    • 50

                    #10
                    I have found Picard tools very useful for generating post-alignment statistics:

                    1. Get read mapping statistics (BamIndexStats works from the BAM index, so sorted.bam must be indexed first)
                    java -Xmx10000m -jar picard-tools-1.58/BamIndexStats.jar I=sorted.bam > sorted.stats

                    2. Get quality-score statistics for the aligned reads
                    java -Xmx10000m -jar picard-tools-1.58/QualityScoreDistribution.jar I=sorted.bam O=qualstats CHART=qualstats.pdf

                    3. Get RNA-seq metrics for the mapped reads
                    java -Xmx10000m -jar picard-tools-1.58/CollectRnaSeqMetrics.jar STRAND_SPECIFICITY=NONE REF_FLAT=annotation.refflat CHART_OUTPUT=graph.pdf INPUT=sorted.bam OUTPUT=RNA_seq.stats
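                    CollectRnaSeqMetrics writes its numbers to the OUTPUT file (RNA_seq.stats in the command above) as a small table: a "## METRICS CLASS" line, then a tab-separated header row and a row of values. Columns such as PCT_CODING_BASES, PCT_UTR_BASES, PCT_INTRONIC_BASES and PCT_INTERGENIC_BASES speak directly to the exon vs. intron question, but note they are fractions of aligned bases rather than read counts. A minimal Python sketch for pulling that first metrics row into a dict, assuming that layout; the function name is hypothetical.

```python
def read_picard_metrics(text):
    """Return the first metrics row of a Picard metrics file as a dict.

    Assumes the usual layout: a '## METRICS CLASS' line, followed by a
    tab-separated header row and then a tab-separated row of values.
    Values are returned as strings; convert them as needed.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no '## METRICS CLASS' section found")
```

                    For example, after running step 3, read_picard_metrics(open("RNA_seq.stats").read())["PCT_INTRONIC_BASES"] would return the intronic fraction as a string.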

                    Getting a genomic refFlat file for all genes of an organism can be tricky, so you can build one with these simple commands:

                    a) Download a GTF annotation file for all genes of the organism.
                    b) Convert the GTF into refFlat format:

                    gtfToGenePred -genePredExt annotation.gtf tmp

                    awk 'BEGIN{FS="\t"};{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' tmp > annotation.refflat

                    The gtfToGenePred software is available at http://hgdownload.cse.ucsc.edu/admin/exe/
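                    The awk step only reorders columns: in genePredExt output, column 12 holds the gene name (name2), and columns 1 to 10 hold the transcript name, chromosome, strand, transcript start/end, CDS start/end, exon count, exon starts and exon ends, which refFlat expects in that order after the gene name. The same conversion as a minimal Python sketch, for anyone who prefers it to awk; the function name is hypothetical, and it assumes tab-separated genePredExt input such as the tmp file written by gtfToGenePred -genePredExt.

```python
def genepred_ext_to_refflat(lines):
    """Convert genePredExt lines to refFlat lines.

    refFlat = gene name (genePredExt column 12) followed by columns 1-10.
    Blank lines are skipped.
    """
    out = []
    for line in lines:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        out.append("\t".join([cols[11]] + cols[:10]))
    return out
```

                    To write the file, iterate over open("tmp") and write each returned line plus a newline to annotation.refflat.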

                    I hope this is helpful.

                    • October
                      Junior Member
                      • May 2012
                      • 2

                      #11
                      RNAseQC

                      RNAseQC seems to be a nice piece of software, but I haven't seen much conversation about it on here yet.

                      So far, I have been able to get the read metrics on a single sample, but I have been unable to get the 'text-delimited description of samples and their bams' list into a format that it recognizes. Has anyone else run into this problem, or does anyone know what the .list file should look like?

                      Thanks so much, I'm a big fan of all the people on here who take the time to answer noob questions.

                      • aparna
                        Member
                        • Feb 2009
                        • 17

                        #12
                        This is what it should look like:

                        Sample ID Bam File Notes
                        Y903GFAZ Y903GFAZ.bam Y903GFAZ

                        Aparna
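                        If you need this file for many samples, here is a minimal Python sketch that writes it with the same three columns as the example above. It assumes the columns must be tab-separated (check this against the RNA-SeQC documentation for your version); the function name is hypothetical.

```python
import io


def write_sample_list(samples, handle):
    """Write a sample list: a header line, then one line per sample.

    samples: iterable of (sample_id, bam_path, notes) tuples.
    Assumption: RNA-SeQC expects these three columns separated by tabs.
    """
    handle.write("Sample ID\tBam File\tNotes\n")
    for sample_id, bam_path, notes in samples:
        handle.write(f"{sample_id}\t{bam_path}\t{notes}\n")


# Reproduce the single-sample example from the post in memory.
buffer = io.StringIO()
write_sample_list([("Y903GFAZ", "Y903GFAZ.bam", "Y903GFAZ")], buffer)
print(buffer.getvalue(), end="")
```

                        To write a real file, pass an open text handle instead of the StringIO buffer, e.g. one opened with open("samples.list", "w") (the file name is only an example).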

                        • October
                          Junior Member
                          • May 2012
                          • 2

                          #13
                          Thank you Aparna, that format works perfectly!

                          Have you been able to get the coverage output? I am running the newest version (v1.1.6) but have been unable to generate anything other than the read metric files.
