Hi everyone,
This is a beefy one... sort of.
I've set up a big RNA-seq experiment where I'm comparing pooled mouse samples. I clipped off a bit of tissue and extracted the RNA, then for each time point I pooled together a few individuals, since the tissue I'm using is very limited and I can't get much RNA from each one. There are 4 individuals per pool. After pooling, I ran Ribo-Zero to get rid of the rRNA, and during the process I spiked the samples with the ERCC ExFold spike-in mix (0.5 µl, a dilution amount that seemed appropriate for my experiment).
This is the set up:
Mouse E11.5 Control (4 individuals pooled into the same tube)
Mouse E11.5 TEST (4 individuals pooled into the same tube)
Mouse E12.5 Control (individuals...etc)
Mouse E12.5 TEST... etc etc
All the way up to
Mouse E17.5 Control (4 individuals pooled)
Mouse E17.5 TEST (4 individuals pooled)
Each pool was sequenced on the Illumina HiSeq using v3 chemistry, and I have the data. My problem is analyzing the pools for differential expression and using the ERCC spike-ins for normalization.
So just to clear a couple of things up first:
--The point of this experiment is not to generate an end-all serial transcriptome data set for the tissue I'm studying. We were willing to spend the money to do this as an exploratory experiment to highlight specific genes that we would follow up on later. So it's just exploratory and not for publishing, necessarily.
--We are aware of the alternative approaches to this experiment, but decided that, based on our goals and our budget, this would be the best approach.
Okay - so considering all of these details, I was hoping I might get some feedback on the following questions:
1. Was it necessary for us to use the ERCC ExFold spike-ins for this experimental setup? We went back and forth about this a little bit, and decided it would be best to use them. But I wanted to get a feel from the community on this. I know the ERCC spikes are supposed to help control for platform variation, but since we multiplexed all of the pools during the run (across several lanes), does this even matter?
2. How on earth do I actually normalize the data using the ERCC spike-ins? I mean step by step. I have run Cuffdiff, and it seems to have its own normalization method when performing the analysis, which did produce some very interesting results... but surely it doesn't take the ERCC spike-ins into account automatically? I've also come across forum threads where people reference random functions with no context, like "loess.normalization()". What on Earth is that supposed to mean? Sounds like Excel! haha I haven't been able to find a single how-to or tutorial on how to actually run the ERCC normalization. Maybe I'm not looking in the right place? I'm not hugely familiar with the bioinformatics skills necessary for doing this, and there is no guidance or expertise on this at the institution/dept. I'm in. But we also don't want to outsource. Can anyone give me a step-by-step or a link to a guide for normalizing my RNA-seq data using the ERCC spike-ins? I don't have an intuitive knowledge of which programs I am supposed to use, and I don't know what some random function is supposed to represent or where I'm supposed to implement it... but I do have the skills to learn how to use the tools with a little guidance.
Thanks so much for any help and please let me know if you need any more information!
Cheers!
Paul