
  • What you can explore with non-coding RNA data

    Many of the non-coding RNAs (ncRNAs) produced by eukaryotic cells have crucial roles in biological processes. The three main classes are long non-coding RNAs (lncRNAs), small non-coding RNAs (sRNAs), and circular RNAs (circRNAs). ncRNAs are not translated into proteins; instead, they take part in the transcriptional and post-transcriptional regulation of gene expression. RNA sequencing with next-generation sequencing technology reveals which ncRNAs are present in a biological sample at a specific time, in what quantity, and with what likely function. ncRNAs also have functions in epigenetics and can serve as reliable biomarkers for diagnosis.

    The bioinformatics analysis of ncRNA sequencing involves the following steps.
    • Quality control
    • Mapping
    • Annotation
    • Quantitative analysis (expression)
    • Functional analysis
    • Qualitative analysis (characterization)
    Fig 1. The bioinformatics analysis workflow of ncRNA sequencing

    1. Quality control

    Raw data is first generated and stored in FASTQ format, a text-based format for storing nucleotide sequences together with their quality scores. Each FASTQ record has four lines: the sequence ID, the read bases, a separator, and the quality scores. The data filtering step then uses the fastp program to take in raw reads and produce clean reads. At the end of this step, three things are obtained: the error rate, the base content, and the proportion of raw reads retained as clean reads.
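
    As a rough illustration of the three QC outputs described above, the sketch below parses four-line FASTQ records and computes a mean error rate, GC content, and the fraction of reads kept. It is not the filtering program's own logic: the Phred+33 encoding, the `min_mean_q` threshold, and all function names are assumptions made for this example.

```python
from collections import Counter

def parse_fastq(text):
    """Yield (read_id, bases, quals) from FASTQ text with four lines per record."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, bases, sep, quals = lines[i:i + 4]
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        yield header[1:], bases, quals

def phred(q_char, offset=33):
    """Convert one quality character to a Phred score (Phred+33 assumed)."""
    return ord(q_char) - offset

def qc_summary(records, min_mean_q=30):
    """Mean per-base error rate, GC content, and fraction of 'clean' reads.

    A read counts as clean when its mean Phred score is >= min_mean_q;
    this cutoff is a hypothetical choice, not a published default.
    """
    base_counts = Counter()
    err_sum, n_bases, n_reads, n_clean = 0.0, 0, 0, 0
    for _, bases, quals in records:
        scores = [phred(c) for c in quals]
        base_counts.update(bases)
        err_sum += sum(10 ** (-q / 10) for q in scores)  # P(error) = 10^(-Q/10)
        n_bases += len(bases)
        n_reads += 1
        if sum(scores) / len(scores) >= min_mean_q:
            n_clean += 1
    return {
        "mean_error_rate": err_sum / n_bases,
        "gc_content": (base_counts["G"] + base_counts["C"]) / n_bases,
        "clean_fraction": n_clean / n_reads,
    }

example_fastq = """@read1
GATTACA
+
IIIIIII
@read2
GGCC
+
5555
"""
summary = qc_summary(parse_fastq(example_fastq))
print(summary)
```

    In the example, `I` encodes Q40 and `5` encodes Q20, so only the first read passes the assumed Q30 cutoff.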

    2. Mapping

    The clean reads are mapped with a program called HISAT2 (Hierarchical Indexing for Spliced Alignment of Transcripts). HISAT2 indexes the reference genome with a graph-based method and builds on the Bowtie2 implementation for alignment, which yields accurate results with fast and sensitive alignment. The output is a BAM file, the binary form of a SAM (Sequence Alignment Map) file. The BAM files can then be viewed in the Integrative Genomics Viewer (IGV) and compared with the reference genome to see where they differ.
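
    To make the SAM/BAM output more concrete, here is a minimal sketch that reads SAM text (for instance, a BAM file converted with `samtools view -h`) and reports a mapping rate from the FLAG field. The flag bits are those defined in the SAM specification; the function name and the example records are invented for illustration.

```python
# FLAG bits from the SAM specification
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800

def mapping_summary(sam_text):
    """Return (total_reads, mapped_reads, mapping_rate) for primary records."""
    total = mapped = 0
    for line in sam_text.splitlines():
        if not line.strip() or line.startswith("@"):  # skip header lines
            continue
        flag = int(line.split("\t")[1])  # column 2 is the bitwise FLAG
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    rate = mapped / total if total else 0.0
    return total, mapped, rate

def sam_record(name, flag, chrom, pos, cigar, seq, qual):
    """Build one tab-separated SAM line (hypothetical example data)."""
    return "\t".join([name, str(flag), chrom, str(pos), "60", cigar,
                      "*", "0", "0", seq, qual])

example_sam = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    sam_record("r1", 0, "chr1", 10, "7M", "GATTACA", "IIIIIII"),    # mapped, forward
    sam_record("r2", 16, "chr1", 50, "4M", "GGCC", "5555"),         # mapped, reverse
    sam_record("r3", 4, "*", 0, "*", "ACGT", "IIII"),               # unmapped
    sam_record("r1", 256, "chr1", 300, "7M", "GATTACA", "IIIIIII"), # secondary
])
total, mapped, rate = mapping_summary(example_sam)
print(total, mapped, round(rate, 3))
```

    Secondary and supplementary records are skipped so that a multi-mapped read is not counted twice.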

    3. Annotation

    Once the BAM files are available, transcripts are assembled and annotated with a software package named StringTie. Annotation means identifying functional elements along the sequence of a genome. StringTie uses a network flow algorithm, together with an optional de novo assembly step, to assemble reads into known or novel gene models based on known gene annotations. The inputs are the BAM files plus a reference annotation file, and the output is a GTF file describing the transcripts assembled from the aligned reads. The assembled transcripts are then merged to remove duplicate or redundant transcripts. Finally, a series of filters (exon number filter, transcript length filter, coding potential filter, etc.) is applied to identify and predict the ncRNA types.
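
    The exon number and transcript length filters mentioned above can be sketched on GTF exon records as follows. The thresholds (at least 2 exons, at least 200 nt) are assumptions based on a common lncRNA convention, not values stated by the pipeline, and the coding potential filter is left out because it needs a dedicated tool.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def exon_stats(gtf_text):
    """Count exons and sum exon lengths per transcript from GTF text."""
    stats = defaultdict(lambda: {"exons": 0, "length": 0})
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        start, end = int(fields[3]), int(fields[4])  # GTF is 1-based, inclusive
        entry = stats[match.group(1)]
        entry["exons"] += 1
        entry["length"] += end - start + 1
    return stats

def candidate_lncrnas(stats, min_exons=2, min_length=200):
    """Keep transcripts passing the (assumed) exon-number and length cutoffs."""
    return sorted(
        tid for tid, s in stats.items()
        if s["exons"] >= min_exons and s["length"] >= min_length
    )

def gtf_exon(tid, start, end):
    """Build one GTF exon line (hypothetical example data)."""
    attrs = f'gene_id "G1"; transcript_id "{tid}";'
    return "\t".join(["chr1", "StringTie", "exon", str(start), str(end),
                      ".", "+", ".", attrs])

example_gtf = "\n".join([
    gtf_exon("T1", 100, 250), gtf_exon("T1", 400, 600),  # 2 exons, 352 nt
    gtf_exon("T2", 1000, 1500),                          # 1 exon, 501 nt
    gtf_exon("T3", 1, 50), gtf_exon("T3", 100, 180),     # 2 exons, 131 nt
])
stats = exon_stats(example_gtf)
candidates = candidate_lncrnas(stats)
print(candidates)
```

    Only `T1` passes both filters here: `T2` has a single exon and `T3` is too short.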

    4. Quantitative analysis

    The simplest approach to quantifying ncRNAs and coding genes is to count the number of reads that map to each transcript. However, two factors need to be taken into consideration. First, read counts depend on the total number of reads sequenced for each sample (the sequencing depth). Second, read counts also depend on gene/transcript length. A normalization step is therefore essential to make the data comparable between and within samples.
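
    Two widely used normalizations that correct for both factors are FPKM and TPM. The text does not say which measure the pipeline reports, so the sketch below is only an illustration; the counts, lengths, and function names are made up for the example.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM_i = count_i * 1e9 / (length_i * total_count)
    """
    total = sum(counts.values())
    return {t: counts[t] * 1e9 / (lengths[t] * total) for t in counts}

def tpm(counts, lengths):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = {t: counts[t] / lengths[t] for t in counts}
    rate_sum = sum(rates.values())
    return {t: rates[t] / rate_sum * 1e6 for t in rates}

# Hypothetical sample: equal read counts, but B is twice as long as A,
# so A is expressed at twice the level of B after normalization.
counts = {"A": 100, "B": 100}
lengths = {"A": 1000, "B": 2000}  # transcript lengths in bases

fpkm_values = fpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
print(fpkm_values)
print(tpm_values)
```

    TPM values always sum to one million within a sample, which makes proportions easier to compare between samples than FPKM.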

    5. Functional analysis

    In functional analysis, biological meaning is assigned to a set of genes by testing whether any known biological functions, interactions, or pathways are enriched in it. The R package clusterProfiler is used for this; it implements methods to analyze and visualize functional profiles of genomic coordinates, genes, and gene clusters. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are the most frequently used databases for functional analysis. Aside from enrichment analysis, ncRNA target prediction can also be done.
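
    Enrichment of a term such as a GO category is commonly assessed with an over-representation test based on the hypergeometric distribution. The sketch below shows that calculation in plain Python; it is not clusterProfiler's code, and the gene numbers are invented for the example.

```python
from math import comb

def enrichment_pvalue(n_background, n_term, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_background: genes in the background set (N)
    n_term:       background genes annotated to the term (K)
    n_selected:   genes in the list of interest (n)
    n_overlap:    genes of interest annotated to the term (k)
    """
    denom = comb(n_background, n_selected)
    upper = min(n_term, n_selected)
    numer = sum(
        comb(n_term, i) * comb(n_background - n_term, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return numer / denom

# Hypothetical numbers: 20 background genes, 5 annotated to the term,
# 4 genes of interest, 3 of which carry the annotation.
p = enrichment_pvalue(n_background=20, n_term=5, n_selected=4, n_overlap=3)
print(round(p, 4))
```

    In practice many terms are tested at once, so the resulting p-values need a multiple-testing correction before they are interpreted.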

    6. Qualitative analysis

    In this step, variant discovery and alternative splicing (AS) analysis are performed with GATK and rMATS, respectively. GATK identifies where the aligned reads differ from the reference genome and writes the differences to a variant call format (VCF) file: BAM files go in and VCF files come out. rMATS, on the other hand, is designed to detect differential AS in replicated RNA-seq data.

    How Novogene Can Help
    Fig 2. Novogene’s non-coding RNA sequencing services

    Novogene has accumulated extensive experience in non-coding RNA library preparation, sequencing, and bioinformatics analysis across numerous species. For lncRNA-seq and circRNA-seq, it prepares rRNA-depleted, strand-specific (directional) libraries by default and sequences them with a PE150 strategy. Globin mRNA removal and exosome RNA are available as optional features. To avoid bacterial contamination, a dual rRNA depletion strategy is adopted. For sRNA-seq, rRNA removal and directional libraries are not needed because of the small size of these RNAs, and the SE50 strategy is adopted. Novogene also provides sequencing-only services for premade libraries. Novogene utilizes its deep scientific knowledge, first-class customer service, and unsurpassed data quality to help clients realize their research goals in the rapidly evolving world of genomics. To get in touch with Novogene and request more information or a quote, please go here.
    Last edited by Novogene; 05-28-2024, 11:45 AM.
