Does anyone know if there is a technical issue during either mRNA library prep or data handling that could cause two libraries prepared from the same cell population to look radically different? We have made multiple libraries from the same sample and the results appear quite discouraging. We're not sure if we're doing something wrong with analysis, or whether there was a problem with sample prep. Anyone know of any common pitfalls that could explain our problem?
-
I did some QC runs using MAQC UHR and Human Brain with three replicates for TruSeq Whole Transcriptome, and the libraries had an R-squared value of 0.98 or higher when I compared their FPKM as generated by Cufflinks. Granted, I did not do the mRNA selection step.
I have also made libraries from the same experimental sample using different methods for ribosomal reduction, and when comparing the non-ribosomal FPKM I also get a very high correlation of >0.9.
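For reference, here is a minimal sketch of how a replicate R-squared like the ones above can be computed from two FPKM tables. The log2(FPKM + 1) transform and the pseudocount are assumptions; the posts do not say whether R-squared was computed on raw or log-scaled values, and that choice alone can change the number substantially.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


def replicate_r_squared(fpkm_a, fpkm_b, pseudocount=1.0):
    """R^2 of log2(FPKM + pseudocount) over transcripts present in both tables.

    fpkm_a and fpkm_b map transcript ID -> FPKM (assumed already parsed,
    e.g. from Cufflinks output).
    """
    shared = sorted(set(fpkm_a) & set(fpkm_b))
    xs = [math.log2(fpkm_a[t] + pseudocount) for t in shared]
    ys = [math.log2(fpkm_b[t] + pseudocount) for t in shared]
    return r_squared(xs, ys)
```

Restricting to transcripts present in both tables leaves out exactly the one-library-only points that sit on a scatter plot's axes, so an R-squared computed this way can look better than the plot does.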
Were they prepared at different times? What kind of RNA was it, and how does the Bioanalyzer trace look?
-
Hey mnkyboy, thanks for the superfast reply! Here are the details.
Cells: sorted human naive T cells, approximately 15 million in one tube. Cells were aliquoted into 5 tubes: one (1) tube of 1x10^7, two (2) tubes of 2x10^6, and two (2) tubes of 2x10^5.
Extractions: cells were pelleted and lysed in RNAzol RT immediately after aliquoting, then stored at -80 °C until total RNA extraction. RNA extraction was done at the same time, with the same tubes of reagents, on all 5 tubes.
Library prep: TruSeq RNA Sample Prep Kit A; all libraries were prepared together in a single 96-well plate using the high-throughput protocol (with a few minor mods).
Library QC: completed, purified libraries were run on the Bioanalyzer and showed the appropriate size peak plus a large peak that I took to represent the "bubble form" Illumina describes. Libraries were quantified by KAPA qPCR with flowcell primers and SYBR Green reporter.
Clustering: cBot, using the TruSeq PE Cluster Kit v2 - HiSeq.
I did not run the starting RNA on the BioA before library prep. The cells were handled as immaculately as possible, so I figured that no matter what RIN the BioA gave me, I would not be able to improve on it and just needed to go forward. I have some RNA saved back that I can run on the BioA now, but I would be shocked if differential degradation were the problem.
Any ideas? We're especially wondering about trivial informatics issues that can lead to false differences...
Thanks!
Eli
-
That is definitely a head scratcher. How long were your reads? We have found for RNA-seq that if we go over 75 bases we start hitting adapter and our mapping goes awry. Did you multiplex? Did anything stand out across the lanes in your QC? We generally multiplex and spread samples across the flow cell to reduce lane-to-lane variation.
The only other thing I think could be an issue is if something odd happened during the poly-A selection. One way to check this is to see whether reads map to known non-polyadenylated non-coding RNAs and whether that differs across the samples.
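The poly-A check suggested above boils down to one number per library: the share of reads landing on RNAs that should not survive poly-A selection. A minimal sketch, assuming gene-level read counts are already available as a dict; the gene list (RN7SK, RMRP, RPPH1, which are RNA polymerase III transcripts without poly-A tails) is only an illustrative assumption and should be replaced with whatever non-polyadenylated set you trust.

```python
def non_polya_fraction(counts, non_polya_genes):
    """Fraction of a library's gene-level read counts that fall in non-polyadenylated genes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for gene, n in counts.items() if gene in non_polya_genes)
    return hits / total


# Example gene list only (RNA polymerase III transcripts, which lack poly-A tails);
# substitute a curated non-polyadenylated set for real use.
NON_POLYA_EXAMPLES = {"RN7SK", "RMRP", "RPPH1"}
```

Computing this for each "identical" library and comparing the values is the point: a markedly higher fraction in one library would suggest less efficient poly-A selection there.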
-
Originally posted by mnkyboy: That is definitely a head scratcher. How long were your reads? We have found for RNA-seq that if we go over 75 bases we start hitting adapter and our mapping goes awry.
-
Originally posted by chadn737: This is exactly the problem I had with the TruSeq libraries, and I wonder if this is the problem now. We had 100 bp reads and I was only getting ~60% to map. When I BLASTed random reads, the last 25 or so bp often had no match at all and turned out to be adapter sequence. I have heard of other people having this problem even with correct size selection.
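One way to confirm the read-through described above is to look for adapter sequence at the 3' end of reads. Below is a minimal exact-match sketch; `AGATCGGAAGAGC` is the adapter prefix commonly used for TruSeq read-through trimming, but check it against your kit's documentation. Dedicated trimmers such as cutadapt also tolerate sequencing errors in the adapter, which this sketch does not.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Trim adapter sequence from the 3' end of a read using exact matching.

    First looks for the full adapter anywhere in the read; otherwise looks for
    the longest read suffix (at least min_overlap bases) that equals an adapter prefix.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]  # cut at the first full adapter occurrence
    max_k = min(len(adapter) - 1, len(read))
    for k in range(max_k, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]  # partial adapter running off the end of the read
    return read


TRUSEQ_ADAPTER_PREFIX = "AGATCGGAAGAGC"  # assumption: verify against the kit's adapter sequences
```

Counting how many reads get trimmed, and by how much, gives a quick read on whether adapter read-through is common enough to explain a low mapping rate. Short overlaps (3 to 5 bases) will occasionally match by chance, so raising `min_overlap` trades sensitivity for fewer false trims.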
-
Originally posted by chadn737: When you say they look radically different, what do you mean? Is this before alignment or after alignment?
They looked vastly different.
In the first image I uploaded, I had used the wrong GTF file (ucsc_all_known_mRNA, which contains multiple entry names for the same transcript) for the Cufflinks analysis, and that caused much of the disparity. The R^2 value was only about 0.60.
After realizing my error, I downloaded the RefSeq GTF file from the UCSC Genome Browser. Using it in Cufflinks, we obtained the second image. The R^2 value for that one is much better, at about 0.90, but it probably should be a bit higher.
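A GTF with several names for the same transcript can be caught before running Cufflinks: reads compatible with several identical isoforms have to be split among them, which can make per-transcript estimates unstable between libraries. A minimal sketch, assuming a standard tab-separated GTF with a `transcript_id "..."` attribute; `redundant_transcript_names` is a hypothetical helper name.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def redundant_transcript_names(gtf_lines):
    """Return groups of transcript_ids that share an identical exon structure."""
    exons = defaultdict(list)  # transcript_id -> [(chrom, strand, start, end), ...]
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        exons[match.group(1)].append(
            (fields[0], fields[6], int(fields[3]), int(fields[4])))
    # Transcripts with the same sorted exon list are the same structure under different names.
    by_structure = defaultdict(list)
    for tid, parts in exons.items():
        by_structure[tuple(sorted(parts))].append(tid)
    return [sorted(ids) for ids in by_structure.values() if len(ids) > 1]
```

Passing an open GTF file handle works, since a file iterates line by line. A long list of groups from one annotation and an empty or short one from another would point to the kind of redundancy described above.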
Sam
-
As Sam (sdarko) writes, changing the GTF improved the correlation between duplicate libraries, but we would expect the true correlation to be even better. First, if you look at the right-hand plot from his post, a good number of points are stacked along the axes, meaning those transcripts were detected in only one of the two libraries. Second, among transcripts detected in both libraries, the agreement between libraries is not especially close, particularly in the middle and lower abundance ranges.
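To put numbers on the points stacked along the axes, one can list which transcripts are detected in only one library and then look at how abundant they are in the library where they do appear. A minimal sketch, assuming FPKM tables as dicts keyed by transcript ID; the detection threshold of 0 is an assumption, and a small positive cutoff may be more sensible.

```python
def classify_detection(fpkm_a, fpkm_b, threshold=0.0):
    """Split transcript IDs by whether their FPKM exceeds threshold in A only, B only, or both."""
    only_a, only_b, both = [], [], []
    for tid in sorted(set(fpkm_a) | set(fpkm_b)):
        in_a = fpkm_a.get(tid, 0.0) > threshold
        in_b = fpkm_b.get(tid, 0.0) > threshold
        if in_a and in_b:
            both.append(tid)
        elif in_a:
            only_a.append(tid)
        elif in_b:
            only_b.append(tid)
        # transcripts below threshold in both libraries are dropped
    return {"only_a": only_a, "only_b": only_b, "both": both}
```

If the one-library-only transcripts are mostly at very low FPKM, sampling depth is a plausible explanation; if many are moderately abundant, an annotation or isoform-assignment problem looks more likely.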
-
How deep was your sequencing? I almost always find a large number of genes with 1 or 2 reads mapping that may be in one sample but not in the other. Still, even 0.9 seems a bit low for technical replicates. We only do biological replicates, and there we usually get an r2 of around 0.96 to 0.97.
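The point about genes with 1 or 2 reads can be made quantitative under a simple assumption: if reads are sampled like a Poisson process (a reasonable first approximation for technical replicates, ignoring PCR and library-prep overdispersion), a gene expected to receive λ reads gets none with probability e^(-λ). The sketch below computes the chance that such a gene shows up in exactly one of two libraries; the Poisson model is an assumption, not something measured in this thread.

```python
import math


def prob_zero_reads(expected_reads):
    """P(no reads) for a gene under Poisson sampling with mean expected_reads."""
    return math.exp(-expected_reads)


def prob_seen_in_one_only(expected_a, expected_b):
    """P(a gene has >= 1 read in exactly one of two independently sampled libraries)."""
    za = prob_zero_reads(expected_a)
    zb = prob_zero_reads(expected_b)
    return za * (1 - zb) + zb * (1 - za)
```

With an expected 1 read per library this comes out at about 0.47, so some scatter of low-count genes along the axes is expected even from perfect duplicates. Passing a smaller `expected_b` (for example, `expected_a` divided by the read-depth ratio between libraries) shows how unequal depth makes the shallower library lose more genes.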
-
Originally posted by chadn737: How deep was your sequencing? I almost always find a large number of genes with 1 or 2 reads mapping that may be in one sample but not in the other. Still, even 0.9 seems a bit low for technical replicates. We only do biological replicates, and there we usually get an r2 of around 0.96 to 0.97.
So we have more than 4x as many reads aligning for one library as for the other.
Sam
-
Originally posted by sdarko: I think that one issue may be that one "identical" library has ~4 million reads (~83% aligning to the genome), while the other "identical" library has ~1 million reads (~71% aligning to the genome).
So we have more than 4x as many reads aligning for one library as for the other.
Sam
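For reference, the aligned-read counts implied by those figures (approximate, since the totals quoted are themselves rounded):

$$
\begin{aligned}
N_{\text{aligned},1} &\approx 4\times10^{6}\times 0.83 \approx 3.3\times10^{6}\\
N_{\text{aligned},2} &\approx 1\times10^{6}\times 0.71 \approx 7.1\times10^{5}\\
N_{\text{aligned},1}/N_{\text{aligned},2} &\approx \frac{3.32\times10^{6}}{7.1\times10^{5}} \approx 4.7
\end{aligned}
$$

which is consistent with the "more than 4x" figure.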
-
Originally posted by Heisman: That can be a big. Since you're a bioinformatician who is presumably much better at programming than I am, could you take random samples of 1M reads from the total 4M, align them, and see how the R^2 looks? How much coverage did you get overall?
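The subsampling suggested above needs nothing more than a single pass over the FASTQ. A minimal sketch using reservoir sampling (Algorithm R), assuming unwrapped 4-line FASTQ records; `fastq_records` and `reservoir_sample` are illustrative names, not part of any particular tool.

```python
import random
from itertools import islice


def fastq_records(lines):
    """Yield 4-line FASTQ records from an iterable of lines (no sequence line wrapping)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield record


def reservoir_sample(records, k, seed=0):
    """Uniformly sample k records in one pass; returns all records if there are <= k."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            j = rng.randint(0, i)  # inclusive bounds: keep record with probability k/(i+1)
            if j < k:
                sample[j] = record
    return sample
```

For example, feed `fastq_records` an open handle for the 4M-read library with `k=1_000_000`, write the sampled records out, and align them with the same settings as the 1M-read library; repeating with a few seeds shows how much R^2 moves from subsampling alone. Because the random draws depend only on record position and the seed, running it over the R1 and R2 files of a pair with the same seed should keep mates together, provided both files list the same reads in the same order.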
-
We noticed that many of the species "unique" to one of the two duplicates appear to be ubiquitously expressed genes mapping to loci that encompass several possible transcripts, so there is no way they should have been unique to one of the starting RNA samples. Perhaps a single species is being called one thing in one duplicate library and something else in the other? Either that, or PCR is so chaotic that it completely loses large numbers of moderately abundant species in a somewhat random fashion, but I feel the field would be aware of that if it were the case.
Last edited by eab; 07-13-2011, 09:00 AM.
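One way to test the idea that a single species is being called one thing in one library and something else in the other is to compare at the gene level instead of the transcript level. The sketch below sums isoform FPKM per gene; the transcript-to-gene mapping is an assumed input (for example, built from the gene_id and transcript_id attributes of the GTF). Cufflinks also writes its own gene-level estimates to genes.fpkm_tracking, which may be the simpler thing to compare.

```python
from collections import defaultdict


def gene_level(transcript_fpkm, transcript_to_gene):
    """Sum transcript FPKM per gene; unmapped transcript IDs are kept as their own 'gene'."""
    genes = defaultdict(float)
    for tid, value in transcript_fpkm.items():
        genes[transcript_to_gene.get(tid, tid)] += value
    return dict(genes)
```

If two libraries assign the same reads to different isoforms of one gene, the transcript-level tables disagree but the gene-level sums agree; if the "unique" species persist at the gene level, isoform assignment is probably not the explanation.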