  • tirohia
    Member
    • Nov 2011
    • 47

    Differential expression between samples from different experiments

    I have some RNA-Seq data from a time course experiment that is a mix of (mostly) plant and bacterial reads. A senior scientist (I'm a PhD student) has asked me to do a differential expression analysis between the bacterial reads at a random time point in my time course and reads from another experiment in which the bacteria were grown in culture.

    I can see how the difference between in planta and in vitro gene expression could be interesting. I've been racking my brains trying to think how you would design an experiment to look at this - no luck. I'm moderately confident that comparing data from different experiments is not the way to do it though.

    It's the same strain of bacteria, though from different preparations. The data from the other experiment are RPKM values (and possibly raw counts per gene) from reads aligned against the same reference genome. I'm already aware that using RPKM values for downstream analysis is a bad idea. I can get the lengths of the genes and (I think) work backwards from RPKM. That is going to result in two libraries of massively different sizes: a lot of the real estate in the time course experiment is taken up by plant RNA, whereas in the in vitro experiment it's all bacterial RNA. I'm dubious about simply adjusting for library size, since the difference in size means any variation in the time course data is going to be magnified horrifically.
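The back-calculation from RPKM mentioned above can be sketched as follows, assuming the standard definition RPKM = count * 10^9 / (gene length in bp * total mapped reads). Note that it needs the total number of mapped reads in the supplied library as well as the gene lengths. The function name and all numbers are made up for illustration, and Python is used only to show the arithmetic.

```python
def rpkm_to_count(rpkm, gene_length_bp, total_mapped_reads):
    """Invert RPKM = count * 1e9 / (gene_length_bp * total_mapped_reads)."""
    return rpkm * gene_length_bp * total_mapped_reads / 1e9


# Hypothetical gene: 1,500 bp long, RPKM of 40, library of 20 million mapped reads.
approx_count = rpkm_to_count(40.0, 1500, 20_000_000)
print(round(approx_count))  # prints 1200
```

The recovered values are only approximate if the RPKMs were rounded, and they depend on the supplier having used the same definition of "total mapped reads", which is one reason the raw counts (or the reads themselves) are preferable.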

    Ah, also: my data have 3 replicates at each time point, while the supplied data have only a single replicate. DESeq can handle 1 replicate if it must, but this just screams dangerous to me.

    Are my objections running along the correct lines (magnification of variation, starting with different bacterial preparations)? Is there anything else that I should add? Is there any way to make this a good thing (nothing occurs to me), even as an incredibly tentative exploratory process? Anything anyone could add would be grand.



    Cheers
    Ben.
  • sarvidsson
    Senior Member
    • Jan 2015
    • 137

    #2
    My first advice is for you to talk to the senior scientist about it and explain your fears to him if you didn't do so already. He should (hopefully) know the experimental setup and the species involved better than anyone here.

    Your objections are valid - it will be difficult to draw conclusions from such a comparison. A couple of things:
    - Forget the RPKMs for such a comparison, you need at least the raw counts - or much better the actual reads, so that you are sure that the data is aligned and counted in the same way.
    - How many bacterial reads do you have per sample from your time course? Tens of thousands, hundreds of thousands or millions? With tens of thousands, forget it. With hundreds of thousands or millions, a comparison might be possible.
    - Make sure you get rid of rRNA sequences/counts before you put anything into DESeq, or it will end in disaster (since the amount of rRNA that slipped through will most probably be different for the different sample/library preps)
    - DESeq/DESeq2 will handle the library size differences (I've seen it work OK with up to two orders of magnitude of difference in read counts), don't try to adjust the raw counts for library size yourself!
    Last edited by sarvidsson; 02-25-2015, 01:09 AM.


    • nucacidhunter
      Jafar Jabbari
      • Jan 2013
      • 1250

      #3
      I think you might be able to do the following with your data:

      1- Identify bacterial genes that are expressed only in culture and genes that are expressed only on the host. There would be one set of genes that is turned on when the bacteria invade and grow on the host, and another set that is turned off.

      2- Identify some genes that behave like housekeeping genes and normalise gene expression relative to them (similar to qPCR).
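A minimal sketch of the second suggestion, assuming a set of reference ("housekeeping-like") genes has already been chosen by some other means. Each gene's count is divided by the geometric mean of the reference genes' counts in the same sample. The gene names, counts, and function name are hypothetical.

```python
import math


def normalise_to_reference(counts, reference_genes):
    """Divide each gene's count by the geometric mean of the reference genes' counts."""
    ref = [counts[g] for g in reference_genes]
    if any(c <= 0 for c in ref):
        # A zero count makes the geometric mean zero, so such a gene cannot be a reference.
        raise ValueError("reference genes need non-zero counts in every sample")
    scale = math.exp(sum(math.log(c) for c in ref) / len(ref))
    return {gene: c / scale for gene, c in counts.items()}


# Hypothetical sample: two reference genes and one gene of interest.
sample = {"refA": 100, "refB": 400, "geneX": 50}
relative = normalise_to_reference(sample, ["refA", "refB"])
print(relative["geneX"])  # roughly 0.25, since the geometric mean of 100 and 400 is 200
```

With only tens of thousands of bacterial reads per sample, the reference genes themselves may have low, noisy counts, so the resulting ratios should be read with care.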


      • fanli
        Senior Member
        • Jul 2014
        • 197

        #4
        You can also explicitly model the fact that you have data from two different experiments by specifying your design matrix for DESeq/DESeq2:
        Code:
        design(dds) <- formula(~experiment+time_point)
        But like sarvidsson said, you'll have difficulty drawing much from the data due to the lack of replicates in the culture data. I think differences in rRNA counts are okay, due to DESeq2's median ratio method for library size estimation.
        "Given a matrix or data frame of count data, this function estimates the size factors as follows: Each column is divided by the geometric means of the rows. The median (or, if requested, another location estimator) of these ratios (skipping the genes with a geometric mean of zero) is used as the size factor for this column."
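The quoted procedure can be illustrated with a small sketch. This is a plain Python re-implementation of the median-of-ratios idea for illustration, not DESeq2's own code; the sample names and counts are invented, and the last gene stands in for an rRNA-like contaminant that is much more abundant in one library.

```python
import math
from statistics import median


def size_factors(counts):
    """counts maps sample name -> list of per-gene counts (same gene order in every sample)."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_geo_means = []
    for i in range(n_genes):
        row = [counts[s][i] for s in samples]
        if all(c > 0 for c in row):
            log_geo_means.append(sum(math.log(c) for c in row) / len(row))
        else:
            log_geo_means.append(None)  # geometric mean is zero: gene is skipped
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][i]) - lg
            for i, lg in enumerate(log_geo_means)
            if lg is not None
        ]
        # median() raises StatisticsError if no gene has non-zero counts in all samples.
        factors[s] = math.exp(median(log_ratios))
    return factors


# Hypothetical counts: three ordinary genes plus one rRNA-like outlier (last entry).
toy = {
    "in_planta": [10, 20, 30, 5000],
    "culture": [100, 200, 300, 1000],
}
for name, factor in size_factors(toy).items():
    print(name, f"{factor:.3f}")  # in_planta 0.316, culture 3.162
```

In this toy case the outlier gene does not move the size factors, because the median ignores it. The caveat is that genes with a zero count in any sample are skipped, so with very shallow libraries the median may rest on only a handful of genes.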


        • tirohia
          Member
          • Nov 2011
          • 47

          #5
          Sarvidsson - bacterial reads from the time course samples, yeah, I've got tens of thousands. The plant reads have taken up most of the space. When I designed the experiment all the biologists thought the bacteria would have multiplied a lot faster and we thought there might have been a reasonable signal from the bacterial reads, so I kept them in rather than just selecting for the plant RNA. Now, I'm still using it, but being tentative about what I say about it.
          In the supplied cultured data - it's tens of millions - so yeah, we're talking about several orders of magnitude difference. If it was only a couple of orders, I would probably be a little more sanguine about this.

          Thanks everyone for the responses. A little more comfortable writing this email now.


          • sarvidsson
            Senior Member
            • Jan 2015
            • 137

            #6
            Originally posted by tirohia
            Sarvidsson - bacterial reads from the time course samples, yeah, I've got tens of thousands. The plant reads have taken up most of the space. When I designed the experiment all the biologists thought the bacteria would have multiplied a lot faster and we thought there might have been a reasonable signal from the bacterial reads, so I kept them in rather than just selecting for the plant RNA. Now, I'm still using it, but being tentative about what I say about it.
            In the supplied cultured data - it's tens of millions - so yeah, we're talking about several orders of magnitude difference. If it was only a couple of orders, I would probably be a little more sanguine about this.
            Then you are rather limited in what you can do - looking at the data qualitatively as nucacidhunter suggested might be possible, but be careful in determining "culture only" expressed genes with low numbers from the host samples.
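To make the caution about "culture only" genes concrete, here is a rough sketch under a simple Poisson sampling assumption: if a gene were expressed at the same relative level in both conditions, how likely is it to show zero reads in the shallow host sample anyway? The depths and counts are hypothetical, chosen to match the orders of magnitude mentioned in this thread.

```python
import math


def prob_zero_reads(culture_count, culture_depth, host_depth):
    """P(0 reads in the host sample) if the gene's read fraction equals that in culture."""
    expected = culture_count / culture_depth * host_depth
    return math.exp(-expected)  # Poisson probability of observing zero


# Hypothetical gene with 100 reads out of 20 million in culture,
# and 30,000 bacterial reads in the in planta sample.
p = prob_zero_reads(100, 20_000_000, 30_000)
print(f"{p:.2f}")  # about 0.86
```

Under these assumptions, a gene with a respectable 100 reads in culture would be absent from the host sample most of the time purely because of depth, so a zero count there says little about whether the gene is switched off.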
