  • ezattara
    Junior Member
    • Apr 2013
    • 4

    Comparative expression analysis across two species

    Hello everyone!

    I have the following situation/design that I could use some help figuring out the best way to analyze:

    I have two closely related species (non-model, no reference genome) that differ in their response to injury, and I want to compare gene expression along a 4-point time series and across both species. The basic experiment is to sample total RNA for RNA-seq at four timepoints in each of the two species: a baseline time t0 and 3 consecutive time points, t1 to t3.

    My conceptual pipeline so far is as follows:
    -Sequence total RNA using Illumina HiSeq 100PE
    -Pool and clean up reads for time points t0 to t3 for each species.
    -Assemble a transcriptome for each species using Trinity.

    Now comes the question. I can analyze differential gene expression for each species along timepoints using RSEM/edgeR. But how would it be best to compare the same timepoints (say, t0 or t2) across species? It seems to me that I should somehow assemble a "consensus transcriptome". Any ideas on how to do it? My thought so far is to run a reciprocal BLAST search of both transcriptomes and use it to build a "join table", then use that table to compare the results of the RSEM analyses made against the species-specific transcriptomes; however, I am not sure whether FPKM values obtained against different references are comparable.

    I appreciate any thoughts or ideas on this!

    Cheers!

    -Ed-
  • sdriscoll
    I like code
    • Sep 2009
    • 436

    #2
    I think this is not a straightforward issue, and I'd have to think about it a bit more. For starters, however, I should warn you that when you pool RNA-seq data from more than one species, especially if they share many genes, tools like Trinity will in fact generate chimeric isoforms that merge parts of the two species' versions of those genes. I've done this myself by combining simulated RNA-seq reads from rat and human. Trinity happily created rat-human chimeric isoforms for commonly expressed genes. That's not helpful at all.

    Maybe a good starting point would be to generate two master assemblies (one for each species) and then attempt to identify homologous genes between the two so that you could align those for differential expression analysis. When you get to that point you will have to rely on some kind of length-normalized values (like FPKM), because it's unlikely that the common genes between the species will have the same lengths, which would throw off raw-count-based DE tools. There are no rules for this analysis as far as I know... but you could take a crack at it like this.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */
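
To make the length-normalization point concrete, here is a minimal sketch of the textbook FPKM formula. The fragment counts, transcript lengths, and library size are hypothetical numbers chosen for illustration, and tools such as RSEM use an effective length rather than the raw transcript length, so treat this as an illustration only.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical homologous gene: 1,000 bp in species A, 2,000 bp in species B.
# Species B yields twice as many fragments purely because the transcript
# is twice as long; the library size is assumed equal for both species.
fpkm_a = fpkm(fragments=500, length_bp=1000, total_mapped_fragments=10_000_000)
fpkm_b = fpkm(fragments=1000, length_bp=2000, total_mapped_fragments=10_000_000)

print(fpkm_a, fpkm_b)
```

The raw counts differ two-fold, but the length-normalized values agree, which is why raw-count-based DE tools are hard to apply directly across two assemblies.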

    • ezattara
      Junior Member
      • Apr 2013
      • 4

      #3
      Thanks for your reply.

      You are correct that pooling reads from both species is not a good idea. I have separate assemblies for each species, and have used RSEM/edgeR to get expression levels for each independently. If I do a reciprocal BLAST between both assemblies, and then use that table to join the species-specific results standardized to FPKM, that should allow me to compare them side by side, right?

      I tried running RSEM to map reads of one species against the transcriptome of the other, hoping that since they are very close, I would be able to map reasonably well, but that failed miserably, as the Bowtie mapping strategy only recognized almost perfect matches, so most of the reads were not mapped. So now I know that doesn't work.

      Ok, I will give it a try and let you know how it worked.

      Cheers!
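
The reciprocal-BLAST "join table" idea could be sketched roughly as follows. This assumes each BLAST run was saved in the default 12-column tabular format (`-outfmt 6`), where column 1 is the query, column 2 the subject, and column 12 the bit score. The function names and contig IDs are hypothetical, and real data would probably also need e-value and coverage filters.

```python
import csv


def best_hits(path):
    """Return {query: subject}, keeping the highest-bit-score hit per query.

    Assumes BLAST tabular output with the default 12 columns.
    """
    best = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 12:  # skip blank or malformed lines
                continue
            query, subject, bitscore = row[0], row[1], float(row[11])
            if query not in best or bitscore > best[query][1]:
                best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    return sorted((a, b) for a, b in a_vs_b.items() if b_vs_a.get(b) == a)


# Hypothetical best-hit tables, as best_hits() would return them.
a_vs_b = {"spA_c1": "spB_c7", "spA_c2": "spB_c9"}
b_vs_a = {"spB_c7": "spA_c1", "spB_c9": "spA_c5"}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Only pairs that agree in both directions end up in the join table; contigs with a one-directional hit are dropped, which is conservative but avoids joining paralogs by accident.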

      • sdriscoll
        I like code
        • Sep 2009
        • 436

        #4
        BLAST is what I was thinking would be a good starting point. Keep in mind that Trinity does bundle transcripts into genes based on shared information. It may be best to try to match up the species at the gene level, which may mean keeping track of multiple transcripts per species that match up as a group. Then you can sum the FPKM values per sample per gene and make the comparisons. I have read that TPM normalization may be an even better metric for this type of comparison. It is discussed super briefly in the RSEM paper.
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */
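
A rough sketch of the gene-level roll-up and the TPM conversion mentioned above, for a single sample. The transcript and gene IDs are hypothetical, and the conversion relies on the identity TPM_i = FPKM_i / sum(FPKM) * 1e6 within one sample.

```python
from collections import defaultdict


def gene_level(transcript_fpkm, transcript_to_gene):
    """Sum transcript-level FPKM values into per-gene totals for one sample."""
    totals = defaultdict(float)
    for transcript, value in transcript_fpkm.items():
        totals[transcript_to_gene[transcript]] += value
    return dict(totals)


def fpkm_to_tpm(fpkm):
    """Rescale FPKM values from one sample so that they sum to one million."""
    total = sum(fpkm.values())
    return {name: value / total * 1e6 for name, value in fpkm.items()}


# Hypothetical transcript-level FPKM values for one sample of one species.
transcript_fpkm = {"g1_i1": 10.0, "g1_i2": 30.0, "g2_i1": 60.0}
transcript_to_gene = {"g1_i1": "g1", "g1_i2": "g1", "g2_i1": "g2"}

genes = gene_level(transcript_fpkm, transcript_to_gene)
tpm = fpkm_to_tpm(genes)
print(genes)
print(tpm)
```

Because TPM values always sum to the same total in every sample, they are somewhat easier to set side by side than FPKM, although they are still relative to each species' own assembly.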

        • danwiththeplan
          Member
          • Sep 2011
          • 72

          #5
          This is definitely a hard one. I can see all sorts of problems comparing expression levels between genes that are not exactly the same. Yes, you could get a number for gene 1 and a number for gene 2, compare them, and draw a plot, but even a single bp difference between the genes could totally change the structure/function of the protein, and then you're comparing things that should not really be compared.

          I had a similar issue with comparing two different cultivars.

          Sdriscoll's suggestion is pretty solid. Differential expression between genes that are 100% identical in the coding sequence would be a solid start. Differential expression between genes that are similar but non-identical might require you to show that the differences don't make much difference to the structure/function of the protein.

          • mbblack
            Senior Member
            • Aug 2009
            • 245

            #6
            If the intent is to compare response to injury in species 1 at time t1, and response to injury in species 2 at time t1, then you already have what you need.

            I'm assuming you have independent, non-injury controls for both species experiments? So you have differential expression for every time point in species 1 and differential expression for every time point in species 2. You could simply take those species-specific gene lists and use the estimated species-specific fold changes and a simple RankProduct analysis of homologous genes to look at relative injury response between species. Or compute z-scores for each species and compare those (or use any one of several other non-parametric approaches to compare two independent lists of things).

            I'd not want to even try to directly compare relative expression estimates between the two species as I see little value in that. At least not if the intent is to classify genomic injury response between two species. In that case, the comparison of interest is in the species specific response, and how it differs between them. The actual difference in relative expression between the two species for any particular gene doesn't really tell you anything of value about injury response between them. Their response to injury is defined by their species specific response of treatment relative to their respective species specific controls.
            Last edited by mbblack; 10-30-2014, 07:00 AM.
            Michael Black, Ph.D.
            ScitoVation LLC. RTP, N.C.
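
The z-score variant of this idea might look roughly like the sketch below. It assumes you already have a log2 fold change (injured vs. control) per gene for each species at one time point, plus a list of homologous gene pairs. All gene IDs and values are hypothetical, and a proper RankProduct analysis with permutation-based significance is not shown.

```python
import statistics


def zscores(log2fc):
    """Standardize log2 fold changes within one species."""
    mean = statistics.mean(log2fc.values())
    sd = statistics.stdev(log2fc.values())
    return {gene: (value - mean) / sd for gene, value in log2fc.items()}


def compare_responses(fc_a, fc_b, homologs):
    """Difference in standardized response (A minus B) per homologous pair."""
    z_a, z_b = zscores(fc_a), zscores(fc_b)
    return {
        (a, b): z_a[a] - z_b[b]
        for a, b in homologs
        if a in z_a and b in z_b
    }


# Hypothetical log2 fold changes at one time point, each species vs. its own control.
fc_a = {"a1": 2.0, "a2": 0.0, "a3": -2.0}
fc_b = {"b1": 0.0, "b2": 2.0, "b3": -2.0}
homologs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]

delta = compare_responses(fc_a, fc_b, homologs)
print(delta)
```

Each species is standardized against its own distribution of responses, so nothing here compares FPKM values across the two assemblies; pairs with a large positive or negative difference are candidates for a species-specific response.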

            • sdriscoll
              I like code
              • Sep 2009
              • 436

              #7
              Excellent point. Solve the problem with good experimental design (i.e., use controls for each species at each time point). This is perfect. Then you can quantify each species' injury response at each time point and compare those overall quantifications without ever having to make direct comparisons between genes in the two species.
              /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
              Salk Institute for Biological Studies, La Jolla, CA, USA */
