What you can explore with non-coding RNA data

Novogene


05-28-2024, 11:30 AM

Many of the non-coding RNAs (ncRNAs) produced by eukaryotic cells have been shown to play crucial roles in biological processes. Three major classes of ncRNA are long non-coding RNAs (lncRNAs), small non-coding RNAs (sRNAs), and circular RNAs (circRNAs). ncRNAs are generally not translated into proteins; instead, they regulate gene expression at the transcriptional and post-transcriptional levels. RNA sequencing with next-generation sequencing (NGS) technology reveals the presence, quantity, and function of ncRNAs in a biological sample at a specific time. ncRNAs also have functions in epigenetics and can serve as reliable biomarkers for diagnosis.

The bioinformatics analysis of ncRNA sequencing involves the following steps.
Quality control

Mapping

Annotation

Quantitative analysis (expression)

Functional analysis

Qualitative analysis (characterization)

Fig 1. The bioinformatics analysis workflow of ncRNA sequencing

1. Quality control

First, raw data are generated and stored in FASTQ format (a text-based format for storing nucleotide sequences together with their quality scores). Each FASTQ record has four lines: the sequence ID, the read bases, a separator, and the per-base quality scores. Next comes the data filtering step, in which the fastp program takes raw reads as input and produces clean reads. At the end of this step, three metrics are obtained: the error rate, the base content, and the proportion of raw reads retained as clean reads.
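To make the four-line FASTQ layout and the idea of "clean reads" concrete, here is a minimal Python sketch. It is only an illustration, not the filtering tool's actual logic: the function names, the Phred+33 offset, and the thresholds (at most 10% N bases, at most 50% of bases with quality <= 5) are assumptions chosen for the example.

```python
from typing import Iterator, List, Tuple


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, bases, quality_string) from four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, bases, separator, quals = (s.strip() for s in lines[i:i + 4])
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        yield header[1:], bases, quals


def phred_scores(quals: str, offset: int = 33) -> List[int]:
    """Decode a quality string into Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in quals]


def mean_error_rate(quals: str) -> float:
    """Average per-base error probability, using P = 10^(-Q/10)."""
    scores = phred_scores(quals)
    return sum(10 ** (-q / 10) for q in scores) / len(scores)


def is_clean(bases: str, quals: str,
             max_n_fraction: float = 0.1,       # assumed threshold
             low_q: int = 5,                    # assumed threshold
             max_low_q_fraction: float = 0.5) -> bool:
    """Keep a non-empty read unless it has too many N or low-quality bases."""
    n_fraction = bases.upper().count("N") / len(bases)
    scores = phred_scores(quals)
    low_q_fraction = sum(q <= low_q for q in scores) / len(scores)
    return n_fraction <= max_n_fraction and low_q_fraction <= max_low_q_fraction


# Two made-up reads: one high quality, one with many N bases.
FASTQ_TEXT = """@read1
ACGTACGT
+
IIIIIIII
@read2
NNNNACGT
+
!!!!IIII
"""

records = list(parse_fastq(FASTQ_TEXT.strip().splitlines()))
clean = [r for r in records if is_clean(r[1], r[2])]
clean_fraction = len(clean) / len(records)
print(f"{len(clean)} of {len(records)} reads kept ({clean_fraction:.0%})")
```

The three numbers reported at the end of quality control map onto this sketch as mean_error_rate (error rate), simple per-base counts over the reads (base content), and clean_fraction (proportion of raw reads kept).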

2. Mapping

Clean reads are then mapped to the reference genome with HISAT2 (Hierarchical Indexing for Spliced Alignment of Transcripts). HISAT2 indexes the reference genome with a graph-based method and builds on the Bowtie2 implementation for alignment, which gives fast, sensitive, and accurate alignment. The alignments are stored as BAM files, the binary form of SAM (Sequence Alignment Map) files. The BAM files can then be viewed in the Integrative Genomics Viewer (IGV) and compared with the reference genome to see where the reads differ from it.
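As a rough illustration of what the alignment output contains, the sketch below reads SAM text (the plain-text counterpart of BAM) and counts mapped and unmapped reads from the FLAG column. The example records and the function name are made up; the flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM format itself. Real BAM files would normally be read with a dedicated library rather than parsed by hand.

```python
from typing import Dict

UNMAPPED = 0x4         # SAM flag bit: read is unmapped
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def mapping_summary(sam_text: str) -> Dict[str, int]:
    """Count primary mapped and unmapped records in SAM-formatted text."""
    counts = {"mapped": 0, "unmapped": 0}
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # column 2 is the bitwise FLAG
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        else:
            counts["mapped"] += 1
    return counts


def sam_record(name: str, flag: int, chrom: str, pos: int, cigar: str) -> str:
    """Build a minimal 11-column SAM record (hypothetical example data)."""
    return "\t".join([name, str(flag), chrom, str(pos), "60", cigar,
                      "*", "0", "0", "ACGTACGT", "IIIIIIII"])


SAM_TEXT = "\n".join([
    "@HD\tVN:1.6\tSO:unsorted",
    sam_record("read1", 0, "chr1", 100, "8M"),    # primary, mapped
    sam_record("read2", 4, "*", 0, "*"),          # unmapped
    sam_record("read1", 256, "chr2", 500, "8M"),  # secondary, ignored
])

summary = mapping_summary(SAM_TEXT)
mapping_rate = summary["mapped"] / (summary["mapped"] + summary["unmapped"])
print(summary, f"mapping rate = {mapping_rate:.0%}")
```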

3. Annotation

Once the BAM files are available, transcripts are assembled and annotated with StringTie. Annotation means identifying functional elements along the sequence of a genome. StringTie uses a network flow algorithm, together with an optional de novo assembly step, to assemble the aligned reads into known or novel transcript models guided by known gene annotations. The inputs are the BAM files plus a reference annotation file, and the output is a GTF file describing the assembled transcripts. The assembled transcripts are then merged to remove duplicate or redundant transcripts. Finally, a series of filters (exon number, transcript length, coding potential, etc.) is applied to identify and predict the ncRNA types.
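The exon number and transcript length filters mentioned above can be sketched in a few lines of Python. This is a simplified illustration, not Novogene's pipeline: the cutoffs (at least 2 exons, at least 200 nt of exonic sequence) are assumed values commonly used for lncRNA candidates, the GTF records are invented, and the coding potential filter is left out because it requires dedicated tools.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_by_transcript(gtf_text: str) -> Dict[str, List[Tuple[int, int]]]:
    """Collect (start, end) of every exon feature, grouped by transcript_id."""
    exons = defaultdict(list)
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            exons[match.group(1)].append((int(fields[3]), int(fields[4])))
    return dict(exons)


def candidate_transcripts(exons: Dict[str, List[Tuple[int, int]]],
                          min_exons: int = 2,      # assumed cutoff
                          min_length: int = 200    # assumed cutoff, in nt
                          ) -> List[str]:
    """Return transcript IDs passing the exon number and length filters."""
    keep = []
    for transcript_id, intervals in exons.items():
        # GTF coordinates are 1-based and inclusive, hence the +1.
        length = sum(end - start + 1 for start, end in intervals)
        if len(intervals) >= min_exons and length >= min_length:
            keep.append(transcript_id)
    return sorted(keep)


def gtf_exon(transcript_id: str, start: int, end: int) -> str:
    """Build one exon line of a GTF file (hypothetical example data)."""
    attributes = f'gene_id "G1"; transcript_id "{transcript_id}";'
    return "\t".join(["chr1", "StringTie", "exon", str(start), str(end),
                      ".", "+", ".", attributes])


GTF_TEXT = "\n".join([
    gtf_exon("T1", 100, 249), gtf_exon("T1", 400, 549),  # 2 exons, 300 nt
    gtf_exon("T2", 1000, 1999),                          # 1 exon, 1000 nt
    gtf_exon("T3", 10, 59), gtf_exon("T3", 100, 149),    # 2 exons, 100 nt
])

exons = exons_by_transcript(GTF_TEXT)
candidates = candidate_transcripts(exons)
print(candidates)
```

Here only T1 passes both filters: T2 fails the exon number filter and T3 fails the length filter.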

4. Quantitative analysis

The simplest way to quantify ncRNAs and coding genes is to count the number of reads that map to each transcript. However, two factors need to be taken into account. First, the read count for a transcript depends on the total number of reads sequenced for the sample (sequencing depth). Second, the read count also depends on the length of the gene or transcript. A normalization step is therefore essential to make expression values comparable between and within samples.
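The post does not name a specific normalization, so the following sketch assumes two widely used ones, FPKM and TPM, to show how both sequencing depth and transcript length enter the calculation. The transcript names, counts, and lengths are made up for the example.

```python
from typing import Dict


def fpkm(counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts.values())
    return {t: c * 1e9 / (lengths[t] * total) for t, c in counts.items()}


def tpm(counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {t: c / lengths[t] for t, c in counts.items()}
    scale = sum(rate.values())
    return {t: r * 1e6 / scale for t, r in rate.items()}


# Hypothetical sample: read counts are proportional to transcript length,
# so all three transcripts are in fact equally abundant.
counts = {"lnc_A": 100, "lnc_B": 200, "gene_C": 700}
lengths = {"lnc_A": 1000, "lnc_B": 2000, "gene_C": 7000}  # in nucleotides

fpkm_values = fpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
for name in counts:
    print(name, counts[name], round(fpkm_values[name]), round(tpm_values[name]))
```

Raw counts differ sevenfold between lnc_A and gene_C, but after normalization all three transcripts get the same value, which is what the length correction is meant to achieve. TPM values always sum to one million within a sample, which makes them somewhat easier to compare across samples than FPKM.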

5. Functional analysis

In functional analysis, biological meaning is assigned to a set of genes by testing whether any known biological functions, interactions, or pathways are enriched in that set. This is done with clusterProfiler, a software package that implements methods to analyze and visualize functional profiles of genomic coordinates, genes, and gene clusters. Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) are the most frequently used databases for functional analysis. In addition to enrichment analysis, ncRNA target prediction can also be performed.
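To show what "enrichment" means numerically, here is a minimal sketch of an over-representation test based on the hypergeometric distribution, a common choice for this kind of analysis. It is written in plain Python for illustration and is not clusterProfiler's code; the gene numbers are invented.

```python
from math import comb


def enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background
    K: background genes annotated to the term (e.g. one GO term or pathway)
    n: genes in the list of interest (e.g. targets of differential ncRNAs)
    k: genes in the list that are annotated to the term
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical numbers: 20 background genes, 5 of them in the pathway;
# the gene list has 4 genes, 3 of which fall in the pathway.
p_value = enrichment_pvalue(k=3, n=4, K=5, N=20)
print(f"p = {p_value:.4f}")
```

By chance only about one of the four listed genes would be expected in the pathway (4 x 5 / 20), so observing three gives a small p-value. In practice many terms are tested at once, so the p-values also need a multiple-testing correction.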

6. Qualitative analysis

In this step, variant discovery and alternative splicing (AS) analysis are performed with GATK and rMATS, respectively. GATK takes BAM files as input, identifies positions where the aligned reads differ from the reference genome, and writes them to a variant call format (VCF) file. rMATS, on the other hand, is designed to detect differential AS in replicated RNA-seq data.

How Novogene Can Help

Fig 2. Novogene’s non-coding RNA sequencing services

Novogene has accumulated extensive experience in non-coding RNA library preparation, sequencing, and bioinformatics analysis across numerous species. For lncRNA-seq and circRNA-seq, it prepares rRNA-depleted, strand-specific (directional) libraries by default and sequences them with a PE150 strategy. Globin mRNA removal and exosomal RNA options are also available. To avoid bacterial contamination, a dual rRNA depletion strategy is adopted. For sRNA-seq, rRNA removal and directional libraries are not needed because of the small size of sRNAs, and an SE50 strategy is adopted. Novogene also provides sequencing-only services for premade libraries. Novogene draws on its deep scientific knowledge, first-class customer service, and unsurpassed data quality to help clients realize their research goals in the rapidly evolving world of genomics. To get in touch with Novogene and request more information or a quote, please go here.

Last edited by Novogene; 05-28-2024, 11:45 AM.
Tags: circrna seq, lncrna, ngs, non-coding rnas, small rna sequencing
