  • differential expression analysis in non-model species - best practice?

    Hi All,

    I've trawled the forums but have not found a complete discussion around this question: For RNA-seq DE analysis in non-model species, where a de novo transcriptome is the only mapping reference available, what's the most legitimate approach for DE testing? Transcript-level or 'gene'-level analysis?

    This is my understanding: in most cases, the non-model species community uses the Trinity pipeline to assemble a reference transcriptome de novo (typically from the same reads used for downstream DE analysis), then uses RSEM for alignment-based abundance estimation to generate the count tables for DE analysis in the software of your choice. The success of DE analysis hinges on the accuracy of the count data used as input.
    There's a choice between counts for Trinity transcripts (i.e., contigs in the de novo assembly, theoretically equivalent to isoforms; RSEM.isoforms.results) and counts at the level of Trinity 'components', which are a proxy for genes (RSEM.genes.results). Compared to mapping against a genome, there are obvious inaccuracies in assembling genes and isoforms de novo, but it's what we have.

    A transcript-level analysis is preferable biologically but tricky in practice.
    * I'm aware that transcript-level analysis in the popular edgeR and DESeq2 packages violates key assumptions of these programs. Many people go ahead anyway and publish such results.
    * DEXSeq is recommended for exon-level analysis, but appears to require mapping to a genome.
    * Alternatively, the 'gene'-level counts from RSEM can be used in e.g. DESeq2, although this brings its own issues because Trinity components are only a proxy for genes. Is this nevertheless the most legitimate approach for counts derived from de novo transcriptome mapping?
    * I've recently read about the alignment-free, k-mer based approach of kallisto, with downstream DE analysis in sleuth, which is suitable at the transcript level. Is this new approach perhaps the best yet for non-model species?
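To make the 'gene'-level option concrete, here is a minimal sketch of collapsing transcript-level counts to their parent Trinity 'gene' by summing. RSEM already writes this table itself (RSEM.genes.results), so this is only an illustration of what the aggregation amounts to. It assumes transcript IDs end in an isoform suffix of the form `_i<N>`, as in recent Trinity releases; older releases name contigs differently. The function names and the example counts are hypothetical.

```python
import re

# Assumption: Trinity-style IDs where the gene ID is followed by "_i<N>".
_ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def gene_id(transcript_id):
    """Strip the trailing isoform suffix to get the parent 'gene' ID."""
    return _ISOFORM_SUFFIX.sub("", transcript_id)


def sum_to_gene_level(transcript_counts):
    """Sum per-sample counts over all transcripts that share a gene ID.

    transcript_counts maps transcript ID -> list of per-sample counts.
    """
    gene_counts = {}
    for tid, counts in transcript_counts.items():
        gid = gene_id(tid)
        if gid not in gene_counts:
            gene_counts[gid] = [0.0] * len(counts)
        gene_counts[gid] = [a + b for a, b in zip(gene_counts[gid], counts)]
    return gene_counts


# Invented toy counts for two samples (not real data).
example = {
    "TRINITY_DN10_c0_g1_i1": [10.0, 12.0],
    "TRINITY_DN10_c0_g1_i2": [5.0, 3.0],
    "TRINITY_DN11_c0_g1_i1": [7.0, 9.0],
}
gene_level = sum_to_gene_level(example)
```

The summed table is what would then be rounded and passed to a count-based tool; whether the Trinity grouping reflects real genes is a separate question that the sum cannot answer.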

    Like most, I'm relatively new to RNA-seq and am not a biostatistician. I realise there are issues with all of the above options, but I'm hoping some of the program developers and those with statistical minds can share some advice on what might be the most legitimate approach for non-model species.

    Many thanks.

  • #2
    Differential expression analysis at the gene level is always more reliable, regardless of the organism.

    More often than not, there is no reliable method of determining which isoform a read belongs to when isoforms overlap. Less importantly, the counts are lower for the individual isoforms than for the genes.

    I like computing the coefficient of variation between replicates for isoforms vs genes to illustrate the tremendous gap in reliability in the results.
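A rough sketch of that coefficient-of-variation comparison, using only the Python standard library. It assumes the counts have already been normalised for library size and that each row holds replicates of a single condition. The helper names and the toy numbers are invented for illustration and are not taken from any real dataset.

```python
from statistics import mean, median, stdev


def cv(values):
    """Coefficient of variation (sample SD / mean) across replicates."""
    m = mean(values)
    return stdev(values) / m if m > 0 else float("nan")


def median_cv(count_table):
    """Median CV over all features that have a non-zero mean."""
    return median(cv(row) for row in count_table.values() if mean(row) > 0)


# Toy normalised counts for three replicates (invented numbers).
# The gene totals are stable, but the split between isoforms is not.
isoform_counts = {
    "g1_i1": [60, 30, 45],
    "g1_i2": [40, 70, 55],
    "g2_i1": [20, 35, 10],
    "g2_i2": [180, 165, 190],
}
gene_counts = {
    "g1": [100, 100, 100],
    "g2": [200, 200, 200],
}

isoform_median_cv = median_cv(isoform_counts)
gene_median_cv = median_cv(gene_counts)
```

On real data the gene-level CVs would not be exactly zero; the point of the comparison is how far apart the two distributions sit.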

    Given the biological relevance of differential expression at the isoform level, researchers will often request isoform-level results, but end up using the gene-level analysis after seeing how unreliable the isoform-level results are. There may be individual cases where the isoform-level analysis gives clear results, but this is generally not the case, especially at locations with many overlapping isoforms or low coverage.

    • #3
      Kallisto can quantify transcripts against a de novo assembled transcriptome, and sleuth can then use those estimates for transcript-level differential expression. Kallisto takes into account similarities in transcript sequences when assigning reads, and has a stupidly fast bootstrapping mode for calculating a confidence interval for isoform proportions.
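To illustrate the general idea of a bootstrap confidence interval for an isoform proportion, here is a deliberately simplified sketch. It is not kallisto's algorithm: as far as I understand, kallisto reruns its quantification on resampled reads, whereas this toy version just resamples reads that are assumed to be already assigned to isoforms. The function name, parameters, and read counts are all hypothetical.

```python
import random


def bootstrap_proportion_ci(isoform_reads, target, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the share of reads on one isoform.

    isoform_reads maps isoform name -> integer read count (assumed to be
    already assigned; real tools must handle ambiguous reads).
    """
    rng = random.Random(seed)
    names = list(isoform_reads)
    weights = [isoform_reads[name] for name in names]
    total = sum(weights)

    proportions = []
    for _ in range(n_boot):
        # Resample the same number of reads with replacement.
        draw = rng.choices(names, weights=weights, k=total)
        proportions.append(draw.count(target) / total)

    proportions.sort()
    lower = proportions[int((alpha / 2) * n_boot)]
    upper = proportions[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented toy counts: 150 of 500 reads on iso1, so the point estimate is 0.3.
reads = {"iso1": 150, "iso2": 350}
low, high = bootstrap_proportion_ci(reads, "iso1")
```

A wide interval for an isoform is a hint that its estimate is dominated by assignment noise, which is the information sleuth is designed to use downstream.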
