From Algorithms to Assemblies: An Interview with Sequencing Analysis Experts—Part 3





    This is part three of our Q&A article series with our expert sequencing analysis providers. We’re asking them important questions to learn how they handle different aspects of the analysis process.

    In this segment of our series, we ask our participants about the complex processes of transcript quantification and differential expression analysis.

    Review the first installment to see how they handle the quality control process, or the second installment to see how they evaluate assemblies and alignments.

    How do you perform transcript quantification and differential expression analysis?

    Simon Valentine, Chief Commercial Officer, Basepair

    The most commonly used workflow on the Basepair platform provides QC, alignment, and quantification of RNA-seq data. Reads are first aligned to the genome with STAR, and featureCounts is then used to measure expression at both the gene and transcript level. We can go from raw reads to RPKM values in about 15 minutes for a typical human dataset, with the ability to process hundreds of samples in parallel. Individual samples can then be combined in downstream differential expression analyses using DESeq2. We also provide workflows for measuring de novo transcript isoform abundance and differential expression using Cufflinks and Cuffdiff.
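
    For readers unfamiliar with RPKM (reads per kilobase of transcript per million mapped reads), the normalization can be sketched in a few lines of Python. This is a minimal illustration, not Basepair's implementation; the function name `rpkm` and the example counts and gene lengths are hypothetical.

```python
import numpy as np

def rpkm(counts, lengths_bp):
    """Reads per kilobase of feature length per million mapped reads."""
    counts = np.asarray(counts, dtype=float)
    lengths_bp = np.asarray(lengths_bp, dtype=float)
    total_reads = counts.sum()
    if total_reads == 0:
        # No mapped reads at all: report zero expression everywhere.
        return np.zeros_like(counts)
    # counts / (length in kb) / (total reads in millions)
    return counts * 1e9 / (lengths_bp * total_reads)

# Hypothetical read counts and gene lengths (in base pairs) for three genes.
counts = [500, 1000, 250]
lengths = [1000, 4000, 500]
values = rpkm(counts, lengths)
print(values)
```

    In this toy example the first and third genes have the same reads-per-base ratio and therefore the same RPKM, even though their raw counts differ by a factor of two.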

    QIAGEN Digital Insights Team
    The tools available in QIAGEN CLC Genomics Workbench Premium are compatible both with the Expression Tracks created by the RNA-Seq Analysis tool and the tables created by the miRNA quantification tool. Two tools are available in the Workbench for calculating differential expression. The Differential Expression in Two Groups tool performs a statistical differential expression test for a set of Expression Tracks and a set of control tracks. The Differential Expression for RNA-Seq tool performs a statistical differential expression test for a set of Expression Tracks with associated metadata. Both tools use multi-factorial statistics based on a negative binomial Generalized Linear Model (GLM).

    Regarding the number of replicates, the Differential Expression for RNA-Seq tool is capable of running without replicates, but this is not recommended and the results should be treated with caution. You want to have as many biological replicates as possible, typically at least three. Replication is important because it allows the ‘within group’ variation to be accurately estimated for a gene. Without replication, the Differential Expression for RNA-Seq tool assumes that genes with similar average expression levels have similar variability.

    When considering technical vs. biological replicates, Auer and Doerge, 2010 illustrate the importance of biological replicates with the example of an alien visiting Earth. The alien wishes to know if men are taller than women. It abducts one man and one woman and measures their heights several times (i.e., performs several technical replicates). However, without biological replicates, the alien would erroneously conclude that women are taller than men if this was the case in the two abducted individuals.
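
    The alien example can be made concrete with a small simulation. The sketch below is only an illustration: the population means, the between-person spread, and the measurement noise are assumed values chosen for the demonstration, and the function name `wrong_conclusion_rate` is hypothetical. It estimates how often the alien would conclude that women are taller than men when it measures one pair many times versus ten pairs once each.

```python
import numpy as np

def wrong_conclusion_rate(n_people, n_measurements, trials, rng,
                          mean_men=175.0, mean_women=162.0,
                          sd_between=7.0, sd_measure=0.5):
    """Fraction of simulated studies in which the women's mean exceeds the men's.

    All default heights and standard deviations (in cm) are assumptions.
    """
    # True heights of the sampled individuals (biological variation).
    men = rng.normal(mean_men, sd_between, size=(trials, n_people))
    women = rng.normal(mean_women, sd_between, size=(trials, n_people))
    # Repeated measurements of each individual (technical variation).
    shape = (trials, n_people, n_measurements)
    men_obs = men[..., None] + rng.normal(0.0, sd_measure, size=shape)
    women_obs = women[..., None] + rng.normal(0.0, sd_measure, size=shape)
    # Average over individuals and measurements within each simulated study.
    men_mean = men_obs.mean(axis=(1, 2))
    women_mean = women_obs.mean(axis=(1, 2))
    return float(np.mean(women_mean > men_mean))

rng = np.random.default_rng(1)
# One man and one woman, each measured ten times (technical replicates only).
technical_only = wrong_conclusion_rate(1, 10, 20_000, rng)
# Ten men and ten women, each measured once (biological replicates).
biological = wrong_conclusion_rate(10, 1, 20_000, rng)
print(technical_only, biological)
```

    Under these assumptions, repeating the measurement on the same two individuals barely changes the error rate, because the dominant source of variation is between people rather than between measurements.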

    The use of the GLM formalism allows us to fit curves to expression values without assuming that the error on the values is normally distributed. Similarly to edgeR and DESeq2, we assume that the read counts follow a Negative Binomial distribution, as explained in McCarthy et al., 2012. The Negative Binomial distribution can be understood as a ‘Gamma-Poisson’ mixture distribution, i.e., the distribution resulting from a mixture of Poisson distributions, where the Poisson parameter λ is itself Gamma-distributed. In an RNA-Seq context, this Gamma distribution is controlled by the dispersion parameter, such that the Negative Binomial distribution reduces to a Poisson distribution when the dispersion is zero.
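
    The Gamma-Poisson construction described above can be checked numerically. The following is a minimal sketch, not the Workbench's implementation; the function name `sample_gamma_poisson` and the example mean and dispersion are assumptions. It uses the common parameterization in which the variance equals the mean plus the dispersion times the squared mean, so a dispersion of zero recovers the Poisson case.

```python
import numpy as np

def sample_gamma_poisson(mean, dispersion, size, rng):
    """Draw counts from a Poisson whose rate is Gamma-distributed."""
    if dispersion == 0:
        # Zero dispersion: the Gamma collapses to a point mass at `mean`.
        return rng.poisson(mean, size=size)
    shape = 1.0 / dispersion      # Gamma shape k
    scale = mean * dispersion     # Gamma scale, so that k * scale == mean
    rates = rng.gamma(shape, scale, size=size)
    return rng.poisson(rates)

rng = np.random.default_rng(0)
mean, dispersion = 100.0, 0.2   # assumed example values
draws = sample_gamma_poisson(mean, dispersion, 200_000, rng)
poisson_draws = sample_gamma_poisson(mean, 0.0, 200_000, rng)

# Negative Binomial variance under this parameterization.
expected_var = mean + dispersion * mean**2
print(draws.mean(), draws.var(), expected_var)
print(poisson_draws.mean(), poisson_draws.var())
```

    The sample variance of the mixture should land close to `expected_var`, while the zero-dispersion draws should have a variance close to their mean, as expected for a Poisson distribution.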

    Mike Lelivelt, VP of Software Product Management and Marketing, Illumina

    DRAGEN offers a complete secondary analysis solution for RNA-seq data. The DRAGEN RNA pipeline leverages a graph-capable mapper and a Multigenome (graph) Reference that improve mapping accuracy, especially in difficult-to-map regions. Transcript expression quantification by DRAGEN shows high concordance with widely adopted algorithms, e.g., Salmon and Kallisto, with a significant runtime improvement thanks to hardware (FPGA) acceleration. The individual transcript quantifications can then be aggregated into sample groups for differential expression analysis by the DESeq2 module. Fusion gene detection and RNA small variant calling can also be optionally enabled, allowing users to maximize the findings from their RNA-seq data.

    DRAGEN comes with lossless ORA Compression, which reduces RNA-seq FASTQ data files by >70%. This allows users to easily manage their datasets and reduce data storage costs, especially for high-throughput users.
    The DRAGEN RNA pipeline meets customers’ needs where they arise: onboard selected Illumina sequencers (NextSeq™ 1000 and 2000, NovaSeq™ X series) and on Illumina cloud platforms (BaseSpace™ Sequence Hub, Illumina Connected Analytics). The onboard or cloud analysis can be configured at sequencing run planning, enabling seamless and ultra-efficient end-to-end workflows.

    Richard Moir, Director of Product and Technology, Geneious

    Geneious offers several options for mapping or assembly depending on the data used, including SPAdes, STAR and our own Geneious algorithm. Geneious calculates raw counts along with basic stats such as TPM while accounting for reads that are mapped to multiple locations.
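
    One common convention for handling multi-mapped reads, shown here purely as an illustration, is to split each read evenly across the genes it maps to before computing TPM. The interview does not say which rule Geneious applies, so the even split, the function names `fractional_counts` and `tpm`, and the example data are all assumptions.

```python
from collections import defaultdict

def fractional_counts(read_hits):
    """Split each read evenly across the genes it maps to (assumed rule)."""
    counts = defaultdict(float)
    for genes in read_hits:
        if not genes:
            continue  # unmapped read
        weight = 1.0 / len(genes)
        for gene in genes:
            counts[gene] += weight
    return dict(counts)

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = {g: c / (lengths_bp[g] / 1000.0) for g, c in counts.items()}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in rates}
    return {g: r * 1e6 / total for g, r in rates.items()}

# Hypothetical data: each inner list holds the genes one read mapped to.
read_hits = [["A"], ["A", "B"], ["B"], ["B"], []]
lengths = {"A": 1500, "B": 2500}   # gene lengths in base pairs

counts = fractional_counts(read_hits)
values = tpm(counts, lengths)
print(counts, values)
```

    Unlike RPKM, TPM values always sum to one million within a sample, which makes proportions easier to compare across samples.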

    Differential gene expression can then be analyzed using DESeq2 without the need for configuring an R environment. Geneious downloads and configures all the necessary files for you. Results are presented in multiple integrated views that can be used in concert for identifying genes of interest. These include an interactive table view, volcano plots and PCA plots, as well as heatmap coloring of genes in the interactive genome viewer.

    MGI (Complete Genomics)
    Dr. Ni Ming, Senior Vice-President, MGI

    a) Our strategy in MegaBOLT is to help clients with analyses that require substantial computing time and resources. The pipeline that we are using is heavily based on GATK best practices and is well-recognized, publicly available software. The output of our RNA-seq pipeline is the standard gene expression matrix for each sample. Other downstream analyses of gene expression are also included, e.g. heatmaps, cluster analysis, etc.

    b) In terms of differential expression analysis, many software packages are available, and the choice depends heavily on the customer’s project design, e.g. selection of control/treatment pairs, grouping of different samples, etc. The customer can use the standard expression matrix as input for most of the tools, e.g. DEGseq [1], DESeq2 [2], edgeR [3], EBSeq [4], NOISeq [5] and PoissonDis [6].

    1. Wang L, Feng Z, Wang X, Wang X, Zhang X. DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics. 2010;26(1):136-138. doi:10.1093/bioinformatics/btp612.
    2. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi:10.1186/s13059-014-0550-8.
    3. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139-140. doi:10.1093/bioinformatics/btp616.
    4. Leng N, Dawson JA, Thomson JA, et al. EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics. 2013;29(8):1035-1043. doi:10.1093/bioinformatics/btt087.
    5. Tarazona S, García-Alcalde F, Dopazo J, Ferrer A, Conesa A. Differential expression in RNA-seq: a matter of depth. Genome Res. 2011;21(12):2213-2223. doi:10.1101/gr.124321.111.
    6. Audic S, Claverie JM. The significance of digital gene expression profiles. Genome Res. 1997;7(10):986-995. doi:10.1101/gr.7.10.986.

    Follow up with the fourth, fifth, and sixth (final) installments of our Q&A series.