We recently submitted our first RNA-Seq sample from the lab and were directed to use Galaxy to align it to the bovine genome and obtain a transcript profile. I have little experience working with this kind of data and am trying to figure out which tool gives a basic summary of read alignment to the reference genome. The goal is to have raw numbers for total sequenced fragments, uniquely mapped fragments, fragments mapped to annotated exons, fragments mapped to annotated genes (exons + introns), etc. Can anyone point me in the right direction? Thank you in advance for the help; I am just getting started in the RNA-Seq world.
This tutorial is fantastic, but I'm not sure the tutorial answers the OP's question, which is more geared toward post-alignment analytics. I've often wondered what else I should look at besides running samtools flagstat or EstimateLibraryComplexity.jar (picard). (I'm not even really sure how to interpret the output from EstimateLibraryComplexity.)
I am definitely looking more for post-alignment information. I just want to find out how many of my reads map to exons vs. introns and how many align to annotated vs. novel junctions: a general overview of how many reads align to the different kinds of features in the genome. This may be a very basic task, but my lack of experience with the data is proving troublesome. I have aligned our data in the past and then used Cufflinks to see which transcripts are expressed at various levels based on FPKM. However, I need to backtrack and get general statistics on what my reads align to in the genome (exons, introns, annotated junctions, novel junctions, annotated exons, annotated genes, etc.).
You can use intersectBed from BEDtools to check and count alignments overlapping various genomic features (anything described by GFF or BED) - you could set thresholds for overlap to reduce minimally overlapping reads... Not really feasible for splice junctions, however...
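As a rough sketch of the intersectBed approach, something like the following counts alignments that overlap a feature file. It assumes bedtools and samtools are installed and on your PATH; the function name count_overlaps, the file names, and the 0.5 default threshold are placeholders I made up for illustration, not anything from the thread.

```shell
#!/usr/bin/env bash
# Sketch only: count alignments in a BAM that overlap a set of features.
# Assumes bedtools and samtools are on PATH. Names below are hypothetical.

# Usage: count_overlaps BAM FEATURES [MIN_FRACTION]
count_overlaps() {
    local bam="$1" features="$2" min_frac="${3:-0.5}"
    # -abam  : take BAM as the A input (output is BAM as well)
    # -u     : report each alignment once if it overlaps any feature
    # -f     : minimum overlap, as a fraction of the alignment
    # -split : treat the blocks of a spliced alignment as separate intervals
    bedtools intersect -abam "$bam" -b "$features" -u -f "$min_frac" -split |
        samtools view -c -
}

# Example calls (hypothetical file names):
#   count_overlaps sorted.bam exons.bed
#   count_overlaps sorted.bam introns.bed 0.9
```

Note that this counts alignment records rather than distinct reads, so a multi-mapped read is counted once per reported alignment. Any GFF or BED file of exons, introns, or whole genes can be used as the feature file.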
RNA-SeQC for post-alignment metrics
There is a GATK-based package called RNA-SeQC that provides a large number of post-alignment metrics along the lines of what you are looking for, including total, unique, and duplicate reads, mapped reads and mapped unique reads, rRNA reads, strand specificity, GC bias, correlation to a reference sequence, and many coverage metrics.
It is available as a standalone piece of software and as a module on the GenePattern public server at http://genepattern.broadinstitute.org. More information is at https://confluence.broadinstitute.or...Tools/RNA-SeQC .
Michael
At CLC bio we have several tutorials for RNASeq analysis. They are a great way for a beginner to understand the various steps required, and you will also get detailed reports of your mappings. This will require that you download a trial version of the software, but it is free for at least two weeks, and you can also analyze your RNASeq samples. Let me know if you need more guidance.
I have found Picard tools very useful for generating post-alignment statistics.
1. Get read mapping statistics
java -Xmx10000m -jar picard-tools-1.58/BamIndexStats.jar I=sorted.bam > sorted.stats
2. Get the quality score distribution of the aligned reads
java -Xmx10000m -jar picard-tools-1.58/QualityScoreDistribution.jar I=sorted.bam O=qualstats CHART=qualstats.pdf
3. Get RNA-Seq mapped read metrics
java -Xmx10000m -jar picard-tools-1.58/CollectRnaSeqMetrics.jar STRAND_SPECIFICITY=NONE REF_FLAT=annotation.refflat CHART_OUTPUT=graph.pdf INPUT=sorted.bam OUTPUT=RNA_seq.stats
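The CollectRnaSeqMetrics output from step 3 is a tab-separated text file with one wide header row and one row of values, which is hard to read by eye. A small helper like the one below prints it as one metric per line. This is only a sketch: the function name show_picard_metrics is made up, and it assumes the usual Picard metrics layout (comment lines starting with #, then a header row, then a value row, then a histogram section).

```shell
#!/usr/bin/env bash
# Sketch only: print the first metrics row of a Picard metrics file
# as "NAME<TAB>VALUE" lines. The function name is hypothetical.

# Usage: show_picard_metrics METRICS_FILE
show_picard_metrics() {
    awk -F '\t' '
        /^#/ || $0 == "" { next }   # skip comment lines and blank lines
        !have_header {              # first remaining line: column names
            n = split($0, name, "\t")
            have_header = 1
            next
        }
        {                           # second remaining line: the values
            split($0, value, "\t")
            for (i = 1; i <= n; i++) printf "%s\t%s\n", name[i], value[i]
            exit                    # stop before the histogram section
        }
    ' "$1"
}

# Example call (file name from step 3):
#   show_picard_metrics RNA_seq.stats
```

In the Picard versions I am aware of, the columns include CODING_BASES, UTR_BASES, INTRONIC_BASES, and INTERGENIC_BASES along with matching PCT_ columns, which is close to the exon/intron breakdown asked about at the top of the thread. Keep in mind these are counts of bases, not of reads.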
Getting a genomic refFlat file covering all genes of an organism can be tricky, but one can be built from a GTF annotation in two steps:
a) Download the GTF annotation file for all genes of the organism
b) Convert the GTF into refFlat
gtfToGenePred -genePredExt annotation.gtf tmp
awk 'BEGIN{FS="\t"};{print $12"\t"$1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}' tmp > annotation.refflat
The gtfToGenePred software is available at http://hgdownload.cse.ucsc.edu/admin/exe/
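For anyone wondering what the awk step does: gtfToGenePred -genePredExt writes the gene name in column 12, while refFlat expects the gene name first, followed by the transcript name, chromosome, strand, and the coordinate columns (columns 1 to 10 of the genePred output). Here is the same reordering wrapped in a function with a basic sanity check. The function name genepred_to_refflat and the check are my own additions, so treat this as a sketch.

```shell
#!/usr/bin/env bash
# Sketch only: reorder genePredExt columns into refFlat order.
# The function name and the NF check are hypothetical additions.

# Usage: genepred_to_refflat GENEPRED_EXT_FILE > annotation.refflat
genepred_to_refflat() {
    awk 'BEGIN { FS = OFS = "\t" }
         NF < 12 { bad++; next }   # no gene name column, so skip the line
         { print $12, $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 }
         END { if (bad) printf "skipped %d short line(s)\n", bad > "/dev/stderr" }' "$1"
}

# Example calls (file names from the commands above):
#   gtfToGenePred -genePredExt annotation.gtf tmp
#   genepred_to_refflat tmp > annotation.refflat
```

If the function reports skipped lines, the input was probably produced without -genePredExt and so lacks the gene name column.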
I hope this is helpful.
RNA-SeQC
RNA-SeQC seems to be a nice piece of software, but I haven't seen much conversation about it on here yet.
So far, I have been able to get the read metrics on a single sample but have been unable to get the 'text-delimited description of samples and their bams' list into the correct format so that it can be recognized. Has anyone else run into this problem and/or know what the format of the .list file should look like?
Thanks so much, I'm a big fan of all the people on here who take the time to answer noob questions.