**sarvidsson** · 02-25-2015, 01:06 AM

My first advice is to talk to the senior scientist about it and explain your concerns to him, if you haven't done so already. He should (hopefully) know the experimental setup and the species involved better than anyone here.

Your objections are valid: it will be difficult to draw conclusions from such a comparison. A couple of things:
- Forget the RPKMs for such a comparison. You need at least the raw counts, or much better the actual reads, so that you can be sure the data are aligned and counted in the same way.
- How many bacterial reads do you have per sample from your time course? Tens of thousands, hundreds of thousands or millions? With tens of thousands, forget it. With hundreds of thousands or millions, a comparison might be possible.
- Make sure you get rid of rRNA sequences/counts before you put anything into DESeq, or it will end in disaster (the amount of rRNA that slipped through will most probably differ between the sample/library preps).
- DESeq/DESeq2 will handle the library size differences (I've seen it work OK with up to two orders of magnitude of difference in read counts). Don't try to adjust the raw counts for library size yourself!
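
To make the rRNA point concrete, here is a minimal sketch on a toy count table. It is plain Python rather than anything from DESeq, and every gene ID, count and the `rrna_ids` set is invented for illustration; in practice the rRNA features would come from your annotation, or the rRNA reads would be filtered out before counting.

```python
# Toy raw counts: gene ID -> [culture sample, host time-course sample].
# All IDs and numbers are invented for illustration.
raw_counts = {
    "rRNA_16S": [52000, 310],
    "rRNA_23S": [61000, 450],
    "gyrB": [120, 4],
    "rpoD": [300, 9],
}

# Assumption: rRNA features are identified from the genome annotation.
rrna_ids = {"rRNA_16S", "rRNA_23S"}


def rrna_fraction(counts, rrna, sample):
    """Fraction of one sample's reads that fall on rRNA features."""
    total = sum(row[sample] for row in counts.values())
    on_rrna = sum(row[sample] for gene, row in counts.items() if gene in rrna)
    return on_rrna / total


def drop_rrna(counts, rrna):
    """Return the table without rRNA rows; the counts stay raw integers."""
    return {gene: row for gene, row in counts.items() if gene not in rrna}


for sample, name in enumerate(["culture", "host"]):
    fraction = rrna_fraction(raw_counts, rrna_ids, sample)
    print(name, "rRNA fraction:", round(fraction, 3))

filtered = drop_rrna(raw_counts, rrna_ids)
print(sorted(filtered))
```

Note that the filtered table still holds raw, unscaled counts; nothing is divided by library size here, and that step is left to DESeq/DESeq2.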

**nucacidhunter** · 02-25-2015, 02:54 AM

I think you might be able to do the following with your data:

1- Identify bacterial genes that are expressed only in culture and genes that are expressed only on the host. There would be one set of genes that are switched on when the bacteria invade and grow on the host, and another set that are turned off.

2- Identify some genes that behave like housekeeping genes and normalise gene expression relative to them (similar to qPCR).
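
A rough sketch of both ideas, in plain Python. The `min_reads` cutoff, the function names and the example numbers are all assumptions made up for illustration, and the choice of reference genes would have to be justified biologically.

```python
import math


def classify(culture_reads, host_reads, min_reads=10):
    """Label a gene by where it reaches at least `min_reads` reads.

    `min_reads` is an arbitrary illustrative cutoff, not a recommendation.
    """
    in_culture = culture_reads >= min_reads
    in_host = host_reads >= min_reads
    if in_culture and in_host:
        return "both"
    if in_culture:
        return "culture only"
    if in_host:
        return "host only"
    return "undetected"


def relative_expression(gene_reads, reference_reads):
    """Gene count divided by the geometric mean of the reference gene counts.

    Mirrors the qPCR idea of normalising to housekeeping genes.
    All reference counts must be positive.
    """
    log_mean = sum(math.log(r) for r in reference_reads) / len(reference_reads)
    return gene_reads / math.exp(log_mean)


print(classify(250, 0))
print(classify(0, 35))
# A gene with 120 reads, relative to two hypothetical reference genes.
print(relative_expression(120, [40, 90]))
```

The relative value only means something within one sample, so it would be computed separately for each culture and host sample and compared afterwards.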

**fanli** · 02-25-2015, 07:44 AM

You can also explicitly model the fact that you have data from two different experiments by specifying your design matrix for DESeq/DESeq2:

Code:

# dds is an existing DESeqDataSet; experiment and time_point are columns of colData(dds)
design(dds) <- ~ experiment + time_point

But like sarvidsson said, you'll have difficulty drawing much from the data due to the lack of replicates in the culture data. I think differences in rRNA counts are okay, due to DESeq2's median ratio method for library size estimation.

Given a matrix or data frame of count data, this function estimates the size factors as follows:
Each column is divided by the geometric means of the rows. The median (or, if requested, another location estimator) of these ratios (skipping the genes with a geometric mean of zero) is used as the size factor for this column.

**tirohia** · 02-25-2015, 04:20 PM

Sarvidsson - bacterial reads from the time course samples, yeah, I've got tens of thousands. The plant reads have taken up most of the space. When I designed the experiment all the biologists thought the bacteria would have multiplied a lot faster and we thought there might have been a reasonable signal from the bacterial reads, so I kept them in rather than just selecting for the plant RNA. Now, I'm still using it, but being tentative about what I say about it.
In the supplied cultured data - it's tens of millions - so yeah, we're talking about several orders of magnitude difference. If it was only a couple of orders, I would probably be a little more sanguine about this.

Thanks everyone for the responses. A little more comfortable writing this email now.

**sarvidsson** · 02-26-2015, 12:11 AM

Originally posted by tirohia:

Sarvidsson - bacterial reads from the time course samples, yeah, I've got tens of thousands. The plant reads have taken up most of the space. When I designed the experiment all the biologists thought the bacteria would have multiplied a lot faster and we thought there might have been a reasonable signal from the bacterial reads, so I kept them in rather than just selecting for the plant RNA. Now, I'm still using it, but being tentative about what I say about it.
In the supplied cultured data - it's tens of millions - so yeah, we're talking about several orders of magnitude difference. If it was only a couple of orders, I would probably be a little more sanguine about this.

Then you are rather limited in what you can do. Looking at the data qualitatively, as nucacidhunter suggested, might be possible, but be careful about calling genes "culture only" when the host samples contain so few bacterial reads.
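
A back-of-the-envelope sketch of why the low numbers matter, assuming simple Poisson sampling with no overdispersion (real data will be noisier). The depths below are hypothetical values picked to match "tens of millions" and "tens of thousands"; the function name is made up.

```python
import math


def prob_zero_reads(culture_reads, culture_depth, host_depth):
    """Chance of seeing zero reads for a gene in the host sample,
    if it were expressed at the same relative level as in culture.

    Assumes Poisson sampling; depths are total bacterial reads.
    """
    expected = culture_reads / culture_depth * host_depth
    return math.exp(-expected)


# Hypothetical depths: 20 million bacterial reads in culture,
# 20 thousand bacterial reads in a host time-course sample.
for culture_reads in (50, 500, 5000):
    p = prob_zero_reads(culture_reads, 20_000_000, 20_000)
    print(culture_reads, "reads in culture -> P(zero in host) =", round(p, 3))
```

Under these assumptions a gene with 500 reads in culture is expected to give only half a read in the host sample, so seeing none there is more likely than not even if its expression is unchanged.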

Differential expression between samples from different experiments
