  • Differential expression between samples from different experiments

    I have some RNA-Seq data from a time course experiment that is a mix of (mostly) plant and bacterial reads. A senior scientist (I'm a PhD student) has approached me asking me to do a differential expression analysis between the bacterial reads at a random time point in my time course and some reads from an experiment where the bacteria were grown in culture.

    I can see how the difference between in planta and in vitro gene expression could be interesting. I've been racking my brains trying to think how you would design an experiment to look at this - no luck. I'm moderately confident that comparing data from different experiments is not the way to do it though.

    It's the same strain of bacteria, though different preparations. The data from the other experiment are RPKM values (and possibly raw counts per gene) from reads aligned against the same reference genome. I'm already aware that using RPKM values for downstream analysis is a bad idea. I can get the lengths of the genes and (I think) work backwards from RPKM to counts. That is going to result in two libraries of massively different sizes: a lot of the real estate in the time course experiment is taken up by plant RNA, whereas in the in vitro experiment it's all bacterial RNA. I'm dubious about simply adjusting for library size, because the difference in size means any variation in the time course data is going to be magnified horrifically.
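A minimal sketch of that back-calculation, assuming the usual definition RPKM = count * 1e9 / (gene length in bp * total mapped reads). It only works if the total mapped read count used for the original RPKM calculation is known. The gene names, lengths and library size below are made-up illustration values, and `rpkm_to_counts` is a hypothetical helper.

```r
# Recover approximate raw counts from RPKM values.
# Assumes: RPKM = count * 1e9 / (gene_length_bp * total_mapped_reads)
rpkm_to_counts <- function(rpkm, gene_length_bp, total_mapped_reads) {
  round(rpkm * gene_length_bp * total_mapped_reads / 1e9)
}

# Hypothetical values for three genes (not taken from the thread).
rpkm           <- c(geneA = 50, geneB = 2.5, geneC = 0)
gene_length_bp <- c(geneA = 1000, geneB = 2000, geneC = 1500)
total_mapped   <- 2e7  # assumed total mapped reads in the culture library

counts <- rpkm_to_counts(rpkm, gene_length_bp, total_mapped)
print(counts)
```

The recovered values are only as good as the RPKM precision and the library size that goes in, so real raw counts (or the reads themselves) remain preferable where they can be obtained.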

    Ah, also: in my data I have 3 replicates at each time point, whereas the supplied data has only a single replicate. DESeq can handle 1 replicate if it must, but this just screams dangerous to me.

    Are my objections running along the correct lines (magnification of variation, starting with different bacterial preparations)? Is there anything else that I should add? Is there any way to make this a good thing (nothing occurs to me), even as an incredibly tentative exploratory process? Anything anyone could add would be grand.



    Cheers
    Ben.

  • #2
    My first advice is for you to talk to the senior scientist about it and explain your fears to him if you didn't do so already. He should (hopefully) know the experimental setup and the species involved better than anyone here.

    Your objections are valid - it will be difficult to draw conclusions from such a comparison. A couple of things:
    - Forget the RPKMs for such a comparison, you need at least the raw counts - or much better the actual reads, so that you are sure that the data is aligned and counted in the same way.
    - How many bacterial reads do you have per sample from your time course? Tens of thousands, hundreds of thousands or millions? With tens of thousands, forget it. With hundreds of thousands or millions, a comparison might be possible.
    - Make sure you get rid of rRNA sequences/counts before you put anything into DESeq, or it will end in disaster (since the amount of rRNA that slipped through will most probably be different for the different sample/library preps)
    - DESeq/DESeq2 will handle the library size differences (I've seen it work OK with up to two orders of magnitude of difference in read counts), don't try to adjust the raw counts for library size yourself!
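As a rough illustration of the rRNA point, the sketch below drops rRNA rows from a raw count matrix and leaves the remaining counts unscaled, ready to hand to DESeq/DESeq2. The matrix and gene IDs are invented for the example. In practice the rRNA IDs would come from the genome annotation.

```r
# Hypothetical raw count matrix: rows are genes, columns are samples.
counts <- matrix(
  c(120, 15, 90000, 30,     # in_planta column
    100, 20, 20000, 25),    # culture column
  nrow = 4,
  dimnames = list(c("geneA", "geneB", "rrsA_16S", "geneC"),
                  c("in_planta", "culture"))
)

# Assumed list of rRNA gene IDs (made up here; take them from the annotation).
rrna_ids <- c("rrsA_16S", "rrlA_23S")

# Keep only non-rRNA rows; the counts themselves are left unscaled.
filtered <- counts[!(rownames(counts) %in% rrna_ids), , drop = FALSE]
print(filtered)
```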
    Last edited by sarvidsson; 02-25-2015, 01:09 AM.


    • #3
      I think you might be able to do following with your data:

      1- Identify bacterial genes that are expressed only in culture and those expressed only on the host. There would be one set of genes that is switched on when the bacteria invade and grow on the host, and another set that is turned off.

      2- Identify some genes that behave like housekeeping genes and normalise gene expression relative to them (similar to qPCR).
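One way the housekeeping-gene idea in point 2 could look, as a sketch only: each sample is scaled by the geometric mean of its reference-gene counts, much like a qPCR reference-gene calculation. The count values are invented, and `rpoD` and `gyrB` are used purely as placeholder reference genes. Whether they are stable in this organism and these conditions is an assumption that would need checking.

```r
# Hypothetical counts: rows are genes, columns are samples.
counts <- matrix(
  c(10, 40, 5, 20,            # in_planta column
    1000, 4000, 2000, 500),   # culture column
  nrow = 4,
  dimnames = list(c("rpoD", "gyrB", "geneX", "geneY"),
                  c("in_planta", "culture"))
)
ref_genes <- c("rpoD", "gyrB")  # assumed reference genes (placeholders)

# Per-sample scaling factor: geometric mean of the reference-gene counts.
# A zero count in any reference gene would make this factor zero.
ref_scale <- exp(colMeans(log(counts[ref_genes, , drop = FALSE])))

# Express every gene relative to the reference genes in its own sample.
relative <- sweep(counts, 2, ref_scale, "/")
print(round(relative, 3))
```

With only a handful of reads on each reference gene in the in planta samples, the scaling factor itself will be noisy, so the resulting ratios are best read qualitatively.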


      • #4
        You can also explicitly model the fact that you have data from two different experiments by specifying your design matrix for DESeq/DESeq2:
        Code:
        design(dds) <- formula(~experiment+time_point)
        But like sarvidsson said, you'll have difficulty drawing much from the data due to the lack of replicates in the culture data. I think differences in rRNA counts are okay, due to DESeq2's median ratio method for library size estimation.
        "Given a matrix or data frame of count data, this function estimates the size factors as follows: Each column is divided by the geometric means of the rows. The median (or, if requested, another location estimator) of these ratios (skipping the genes with a geometric mean of zero) is used as the size factor for this column."
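The procedure quoted above can be sketched in base R as follows. This is a re-implementation for illustration, not DESeq2's own code, and `estimate_size_factors` is a hypothetical name. The toy matrix is invented: the second sample is exactly 10 times deeper than the first for ordinary genes, with one rRNA-like row that is 500 times higher.

```r
# Median-of-ratios size factors, following the quoted description.
estimate_size_factors <- function(counts) {
  # Log of the row-wise geometric mean; -Inf if any count in the row is zero.
  log_geo_means <- rowMeans(log(counts))
  usable <- is.finite(log_geo_means)  # skip genes with a geometric mean of zero
  apply(counts, 2, function(sample_counts) {
    exp(median(log(sample_counts[usable]) - log_geo_means[usable]))
  })
}

# Hypothetical counts: rows are genes, columns are samples.
counts <- matrix(
  c(10, 20, 30, 40, 1000,        # sample1
    100, 200, 300, 400, 500000), # sample2
  nrow = 5,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD", "rRNA"),
                  c("sample1", "sample2"))
)

sf <- estimate_size_factors(counts)
print(sf)
print(sf["sample2"] / sf["sample1"])          # ratio of size factors
print(sum(counts[, 2]) / sum(counts[, 1]))    # ratio of total counts, for contrast
```

In this toy case the size-factor ratio follows the ordinary genes, while the ratio of total counts is dominated by the rRNA-like row. That robustness depends on most genes not changing, which may not hold for an in planta versus in vitro comparison.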


        • #5
          Sarvidsson - bacterial reads from the time course samples, yeah, I've got tens of thousands. The plant reads have taken up most of the space. When I designed the experiment all the biologists thought the bacteria would have multiplied a lot faster and we thought there might have been a reasonable signal from the bacterial reads, so I kept them in rather than just selecting for the plant RNA. Now, I'm still using it, but being tentative about what I say about it.
          In the supplied cultured data - it's tens of millions - so yeah, we're talking about several orders of magnitude difference. If it was only a couple of orders, I would probably be a little more sanguine about this.

          Thanks everyone for the responses. A little more comfortable writing this email now.


          • #6
            Then you are rather limited in what you can do. Looking at the data qualitatively, as nucacidhunter suggested, might be possible, but be careful about calling genes "culture only" when the host samples contain so few bacterial reads.
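To put a number on that caution, here is a small sketch assuming simple Poisson sampling of reads. It gives the chance of seeing zero reads for a gene that really is expressed. The gene's share of the bacterial reads and both library sizes are illustrative values, chosen only to match the "tens of thousands" and "tens of millions" mentioned above. `p_zero` is a hypothetical helper.

```r
# Probability of observing zero reads for a gene under Poisson sampling,
# given the gene's share of bacterial reads and the number of bacterial reads.
p_zero <- function(gene_fraction, library_size) {
  exp(-gene_fraction * library_size)
}

gene_fraction <- 1e-5   # hypothetical: 10 reads per million bacterial reads

p_in_planta <- p_zero(gene_fraction, 5e4)  # assumed 50,000 bacterial reads
p_culture   <- p_zero(gene_fraction, 2e7)  # assumed 20 million reads

print(p_in_planta)  # about 0.61
print(p_culture)    # effectively zero
```

Under these assumptions a gene with the same relative expression in both conditions would still show zero reads in most in planta samples, so absence there is weak evidence of "culture only" expression.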
