  • differential expression analysis in non-model species - best practice?

    Hi All,

    I've trawled the forums but have not found a complete discussion around this question: For RNA-seq DE analysis in non-model species, where a de novo transcriptome is the only mapping reference available, what's the most legitimate approach for DE testing? Transcript-level or 'gene'-level analysis?

    This is my understanding: in most cases, the non-model species community uses Trinity pipelines to assemble a reference transcriptome de novo (typically from the same reads used for downstream DE analysis), then uses RSEM for alignment-based abundance estimation to generate the count tables for downstream DE analysis in whatever software you choose. Obviously, the success of DE analysis hinges on the accuracy of the count data used as input.
    There's a choice between using counts for Trinity transcripts (i.e., contigs in the de novo assembly, theoretically equivalent to isoforms) (RSEM.isoforms.results), or counts at the level of Trinity 'components', which are a proxy for genes (RSEM.genes.results). (Compared to mapping against a genome, there are obvious inaccuracies with assembling genes and isoforms de novo, but it's what we have.)
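
To make the two levels concrete, here is a minimal sketch of how transcript-level counts roll up to 'gene' (component) level by summing over the isoforms of each component. The IDs, the counts, and the helper name `sum_counts_by_gene` are made up for illustration; in practice RSEM already writes the gene-level table itself, so this only shows the relationship between the two tables.

```python
from collections import defaultdict


def sum_counts_by_gene(transcript_counts, transcript_to_gene):
    """Sum transcript-level counts into gene-level counts.

    transcript_counts: dict of transcript ID -> count for one sample
    transcript_to_gene: dict of transcript ID -> gene (component) ID
    """
    gene_counts = defaultdict(float)
    for transcript_id, count in transcript_counts.items():
        gene_counts[transcript_to_gene[transcript_id]] += count
    return dict(gene_counts)


# Hypothetical IDs and counts, for illustration only.
transcript_counts = {
    "comp1_c0_seq1": 120.0,
    "comp1_c0_seq2": 30.0,
    "comp2_c0_seq1": 55.0,
}
transcript_to_gene = {
    "comp1_c0_seq1": "comp1_c0",
    "comp1_c0_seq2": "comp1_c0",
    "comp2_c0_seq1": "comp2_c0",
}

gene_counts = sum_counts_by_gene(transcript_counts, transcript_to_gene)
print(gene_counts)
```

Any misassignment of reads between `comp1_c0_seq1` and `comp1_c0_seq2` cancels out in the `comp1_c0` total, which is one reason the gene-level table tends to be more stable.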

    Obviously, a transcript-level analysis is preferred biologically but tricky in practice.
    *I'm aware that transcript-level analysis in the popular tools edgeR and DESeq2 violates key assumptions of these programs. Many people go ahead anyway, and publish such results.
    *DEXSeq is recommended for exon-level analysis, but appears to require mapping to a genome.
    *Alternatively, the 'gene'-level counts from RSEM can be used in e.g. DESeq2, although this brings its own issues because the Trinity components are only a proxy for genes. Is this nevertheless the most legitimate approach for counts derived from de novo transcriptome mapping?
    *I've recently read about the alignment-free, k-mer based approach of kallisto, with downstream DE analysis in sleuth, which is suitable at the transcript level. Is this new approach perhaps the best yet for non-model species?

    Like most, I'm relatively new to RNA-seq and am not a biostatistician. I realise there are issues with all of the above options, but I'm hoping some of the program developers and those with statistical minds can share some advice on what might be the most legitimate approach for non-model species.

    Many thanks.

  • #2
    Differential expression analysis at the gene level is always more reliable, regardless of the organism.

    More often than not, there is no reliable method of determining which isoform a read belongs to when isoforms overlap. Less importantly, the counts are lower for the individual isoforms than for the genes.

    I like computing the coefficient of variation between replicates for isoforms vs genes to illustrate the tremendous gap in reliability in the results.
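
A rough sketch of that comparison, assuming counts that have already been normalized across replicates. The numbers and the helper name `coefficient_of_variation` are invented for illustration and are not taken from any real dataset:

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (NaN if the mean is 0)."""
    m = mean(values)
    if m == 0:
        return float("nan")
    return stdev(values) / m


# Hypothetical normalized counts for two isoforms of one gene
# across three replicates of the same condition.
isoform_counts = {
    "iso1": [10.0, 40.0, 25.0],
    "iso2": [50.0, 22.0, 36.0],
}

# Gene-level counts are the per-replicate sums over the isoforms.
gene_counts = [sum(rep) for rep in zip(*isoform_counts.values())]

cv_isoforms = {
    name: coefficient_of_variation(counts)
    for name, counts in isoform_counts.items()
}
cv_gene = coefficient_of_variation(gene_counts)

print(cv_isoforms)
print(cv_gene)
```

In this toy case the reads shift back and forth between the two isoforms from replicate to replicate, so each isoform has a large CV while the gene total barely moves. Real data will be less clean, but the same calculation over all genes gives the comparison described above.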

    Given the biological relevance of differential expression at the isoform level, researchers will often request isoform-level results, but will end up using the gene-level analysis after seeing how unreliable the isoform-level results are. There may be individual cases where the isoform-level analysis gives clear results, but this is generally not the case, especially at loci with many overlapping isoforms or low coverage.

    • #3
      Kallisto can do transcript-level quantification for differential expression using a de novo assembled transcriptome. It takes similarities between transcript sequences into account when counting, and has a stupidly fast bootstrapping mode for calculating a confidence interval for isoform proportions.
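
As a generic illustration of what a bootstrap confidence interval for an isoform proportion looks like, here is a simple percentile interval over bootstrap abundance estimates. This is not kallisto's or sleuth's internal method; the function names, the input layout, and the numbers are assumptions made up for the example.

```python
import math


def isoform_proportions(bootstraps, isoform):
    """Proportion of one isoform within its gene, for each bootstrap replicate.

    bootstraps: list of dicts, each mapping isoform ID -> estimated abundance
    for all isoforms of a single gene in one bootstrap replicate.
    """
    proportions = []
    for estimates in bootstraps:
        total = sum(estimates.values())
        if total > 0:  # skip replicates where the gene has no signal
            proportions.append(estimates[isoform] / total)
    return proportions


def percentile_interval(samples, lower=0.025, upper=0.975):
    """Conservative percentile interval: round outward to observed values."""
    ordered = sorted(samples)
    n = len(ordered)
    lo = ordered[math.floor(lower * (n - 1))]
    hi = ordered[math.ceil(upper * (n - 1))]
    return lo, hi


# Hypothetical bootstrap estimates for a two-isoform gene.
bootstraps = [
    {"iso1": 30.0, "iso2": 70.0},
    {"iso1": 40.0, "iso2": 60.0},
    {"iso1": 35.0, "iso2": 65.0},
    {"iso1": 25.0, "iso2": 75.0},
    {"iso1": 45.0, "iso2": 55.0},
]

props = isoform_proportions(bootstraps, "iso1")
interval = percentile_interval(props)
print(interval)
```

With only five replicates the interval is just the observed minimum and maximum; with a realistic number of bootstrap replicates it narrows to the central 95% of the estimates. A wide interval is a sign that the reads cannot be assigned confidently to that isoform.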
