  • Intra-sample variability, Illumina TruSeq mRNA

    Does anyone know if there is a technical issue during either mRNA library prep or data handling that could cause two libraries prepared from the same cell population to look radically different? We have made multiple libraries from the same sample and the results appear quite discouraging. We're not sure if we're doing something wrong with analysis, or whether there was a problem with sample prep. Anyone know of any common pitfalls that could explain our problem?
    Last edited by eab; 07-12-2011, 07:53 AM.

  • #2
    I did some QC runs using MAQC UHR and Human Brain with three replicates for TruSeq Whole Transcriptome, and the libraries had an R-squared value of 0.98 or higher when I compared their FPKM as generated by Cufflinks. Granted, I did not do the mRNA selection step.

    I have also made libraries from the same experimental sample using different methods for ribosomal reduction and when comparing the non-ribosomal FPKM I also get a very high correlation of >0.9.

    Were they prepared at different times? What kind of RNA, and how does the Bioanalyzer trace look?
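For readers who want to run the same kind of replicate comparison, here is a minimal sketch of an R-squared calculation on FPKM values. The post does not say whether the values were log-transformed, so the log2(FPKM + 1) transform is an assumption, as are the function names and the made-up example values.

```python
import math

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return (sxy * sxy) / (sxx * syy)

def log_fpkm_r_squared(fpkm_a, fpkm_b, pseudocount=1.0):
    """R^2 on log2(FPKM + pseudocount) for genes present in both dicts.

    The log transform and pseudocount are assumptions, not something
    stated in the thread.
    """
    shared = sorted(set(fpkm_a) & set(fpkm_b))
    xa = [math.log2(fpkm_a[g] + pseudocount) for g in shared]
    xb = [math.log2(fpkm_b[g] + pseudocount) for g in shared]
    return r_squared(xa, xb)

# Made-up FPKM values, for illustration only.
rep1 = {"geneA": 10.0, "geneB": 200.0, "geneC": 0.0, "geneD": 55.0}
rep2 = {"geneA": 12.0, "geneB": 180.0, "geneC": 0.5, "geneD": 60.0}
print(round(log_fpkm_r_squared(rep1, rep2), 3))
```

On untransformed FPKM a handful of very highly expressed genes can dominate the statistic, which is why a log scale is often preferred for this kind of plot.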



    • #3
      Hey mnkyboy, thanks for the superfast reply! Here are details.

      Cells: sorted human naive T cells, approximately 15 million in one tube. Cells aliquoted into 5 tubes, including one (1) tube of 1x10^7, two (2) tubes of 2x10^6, and two (2) tubes of 2x10^5.

      Extractions: cells pelleted and lysed in RNAzol RT immediately after aliquoting, then stored at -80 until total RNA extraction. RNA extraction done at same time with same tubes of reagents on all 5 tubes.

      Library prep: TruSeq RNA sample prep kit A, all libraries prepared together in a single 96-well plate using high-throughput protocol (with a few minor mods).

      Library QC: completed, purified libraries run on the Bioanalyzer and showed an appropriate size peak + a large peak that I took to represent the "bubble form" Illumina describes. Libraries quantified by Kapa qPCR with flowcell primers and SYBR Green reporter.

      Clustering: cBot using the TruSeq PE Cluster Kit v2 - HiSeq.

      I did not run the starting RNA on the BioA before library prep. The cells were handled as immaculately as was possible, so I figured that no matter what the BioA gave me for an RIN, I would not be able to improve on it and I needed to just go forward. I have some RNA saved back that I can run now on the BioA, but I would be shocked if differential degradation were the problem.

      Any ideas? We're wondering especially about trivial informatics sorts of things that can lead to false differences.....

      Thanks!
      Eli



      • #4
        When you say they look radically different, what do you mean? Is this before alignment or after alignment?



        • #5
          That is definitely a head scratcher. How long were your reads? We have found for RNA-seq that if we go over 75 bases we start hitting adapter and our mapping goes awry. Did you multiplex? Was there anything that stuck out across the lanes in your QC? We generally multiplex and spread across the flow cell to reduce any lane variation.

          The only other thing that I think could be an issue is if something odd happened during the poly-A selection. One way to check this is to see if you map to any known non-polyadenylated non-coding RNA and whether there are differences across the samples.
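One way to turn the poly-A check suggested above into a number is to compute, per library, the share of mapped reads that land on transcripts known to lack a poly-A tail. This is only a sketch: the gene names in `NON_POLYA_GENES` are illustrative placeholders to be replaced with a set suited to your annotation, and the counts are made up.

```python
# Illustrative placeholder set; replace with non-polyadenylated
# non-coding RNAs appropriate for your annotation.
NON_POLYA_GENES = {"RN7SK", "RN7SL1", "RMRP", "RPPH1"}

def fraction_in_gene_set(counts, gene_set):
    """Fraction of all counted reads that fall on genes in gene_set."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hits = sum(c for gene, c in counts.items() if gene in gene_set)
    return hits / total

# Made-up per-gene read counts for two libraries from the same sample.
library_counts = {
    "lib1": {"ACTB": 9000, "GAPDH": 7000, "RN7SK": 40, "RMRP": 10},
    "lib2": {"ACTB": 2000, "GAPDH": 1500, "RN7SK": 900, "RMRP": 300},
}

for name, counts in library_counts.items():
    frac = fraction_in_gene_set(counts, NON_POLYA_GENES)
    print(f"{name}: {frac:.2%} of reads on non-poly-A transcripts")
```

A large difference in this fraction between libraries made from the same RNA would point toward the selection step rather than the informatics.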



          • #6
            Originally posted by mnkyboy
            That is definitely a head scratcher. How long were your reads? We have found for RNA-seq that if we go over 75 bases we start hitting adapter and our mapping goes awry.
            This is exactly the problem I had with the TruSeq libraries and I wonder if this is the problem now. We had 100 bp reads and I was only getting ~60% to map. When I BLASTed random reads, the last 25 or so bp often had no match at all and turned out to be adapter sequence. I have heard of other people also having this problem with correct size selection.
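A quick way to see whether reads run into adapter at the 3' end is to search for the start of the adapter sequence and clip from there. The sketch below is a simplified, exact-match illustration, not a replacement for a dedicated trimming tool. The 13-base prefix `AGATCGGAAGAGC` is the commonly cited start of the Illumina TruSeq adapters and is an assumption here, since the thread does not name the sequence; the `min_overlap` threshold is also arbitrary.

```python
# Assumed adapter prefix (common start of Illumina TruSeq adapters).
ADAPTER_PREFIX = "AGATCGGAAGAGC"

def trim_adapter(read, adapter=ADAPTER_PREFIX, min_overlap=5):
    """Clip a read at the first exact adapter match.

    Also clips a partial adapter hanging off the 3' end when at least
    min_overlap bases match. No mismatches are tolerated.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Longest partial match first, so the maximum amount is clipped.
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read

def adapter_fraction(reads, adapter=ADAPTER_PREFIX, min_overlap=5):
    """Fraction of reads that were shortened by trim_adapter."""
    reads = list(reads)
    if not reads:
        return 0.0
    clipped = sum(1 for r in reads if len(trim_adapter(r, adapter, min_overlap)) < len(r))
    return clipped / len(reads)

# Made-up reads: one clean, one with full adapter, one with a partial tail.
reads = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTAC" + ADAPTER_PREFIX + "TTTT",
    "ACGTACGTAC" + "AGATCGG",
]
print([trim_adapter(r) for r in reads])
print(adapter_fraction(reads))
```

If a large share of 100 bp reads is clipped, the inserts are shorter than the read length, which would fit the low mapping rate described above.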



            • #7
              Originally posted by chadn737
              This is exactly the problem I had with the TruSeq libraries and I wonder if this is the problem now. We had 100 bp reads and I was only getting ~60% to map. When I BLASTed random reads, the last 25 or so bp often had no match at all and turned out to be adapter sequence. I have heard of other people also having this problem with correct size selection.
              Yeah our standard WT or mRNA-seq is now 2x75 bp and then 2x50 if we do FFPE.



              • #8
                Originally posted by chadn737
                When you say they look radically different, what do you mean? Is this before alignment or after alignment?
                I'm the bioinformatician working on this.

                They looked vastly different.

                In the first image I uploaded, I had used the wrong GTF file for the Cufflinks analysis (ucsc_all_known_mRNA, which contains multiple entry names for the same transcript), and that was the cause of much of the disparity. The R^2 value was only 0.60 or so.

                After realizing my error, I grabbed the RefSeq GTF file from the UCSC Genome Browser. After using it in Cufflinks, we obtained the second image. The R^2 value for that one is much better, at 0.90 or so, but it probably should be a bit better still.

                Sam
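The annotation problem described in this post (several entry names for the same transcript) can be checked for directly. The sketch below groups exon records by `transcript_id` and reports IDs that share an identical exon structure. It is a hypothetical check, not how the issue was found in the thread, and it assumes a standard 9-column tab-separated GTF with `transcript_id "..."` in the attribute column. The example records are invented.

```python
import re
from collections import defaultdict

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def find_duplicate_structures(gtf_lines):
    """Return groups of transcript IDs whose exon coordinates are identical."""
    exons = defaultdict(set)  # transcript_id -> {(chrom, strand, start, end)}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        exons[match.group(1)].add(
            (fields[0], fields[6], int(fields[3]), int(fields[4]))
        )
    by_structure = defaultdict(list)
    for transcript_id, exon_set in exons.items():
        by_structure[tuple(sorted(exon_set))].append(transcript_id)
    return sorted(sorted(ids) for ids in by_structure.values() if len(ids) > 1)

def _exon(chrom, start, end, strand, tid):
    """Build one invented GTF exon line for the demo below."""
    attrs = f'gene_id "{tid}"; transcript_id "{tid}";'
    return "\t".join([chrom, "demo", "exon", str(start), str(end), ".", strand, ".", attrs])

example_gtf = [
    _exon("chr1", 100, 200, "+", "mRNA_1"),
    _exon("chr1", 300, 400, "+", "mRNA_1"),
    _exon("chr1", 100, 200, "+", "mRNA_2"),  # same structure, different name
    _exon("chr1", 300, 400, "+", "mRNA_2"),
    _exon("chr2", 500, 900, "-", "mRNA_3"),
]
print(find_duplicate_structures(example_gtf))
```

Many groups in the output would suggest that reads from one transcript can be split across several names, which could inflate apparent differences between libraries.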
                Attached Files



                • #9
                  As Sam (sdarko) writes, a change in the gtf improved the correlation between duplicate libraries, but we hope the actual correlation is even better. First off, if you look at the right-hand plot from his post, there are a good number of reads stacked up along the axes, meaning that they occurred in only one of the two libraries. Second, of the reads that occurred in both libraries, correlation between libraries is not so close, especially at the middle and lower ranges of abundance.



                  • #10
                    How deep was your sequencing? I almost always find a large number of genes with 1 or 2 reads mapping that may be in one sample but not in the other. Still, even 0.9 seems a bit low for technical replicates. We only do biological replicates, and there we usually see an r2 of around .96 - .97.



                    • #11
                      Originally posted by chadn737
                      How deep was your sequencing? I almost always find a large number of genes with 1 or 2 reads mapping that may be in one sample but not in the other. Still, even 0.9 seems a bit low for technical replicates. We only do biological replicates, and there we usually see an r2 of around .96 - .97.
                      I think that one issue may be that in one "identical" library we have ~ 4 million reads (with ~83% aligning to genome) while in the other "identical" library we have ~1 million reads (with ~71% aligning to genome).

                      So we have greater than 4x the reads aligning for one library versus the other.

                      Sam
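The depth difference quoted in this post can be worked through from the approximate figures given (about 4 million reads at 83% aligned versus about 1 million at 71% aligned). With those rounded inputs the ratio of aligned reads comes out a little above 4.5x. The function name is just for illustration.

```python
def aligned_reads(total_reads, aligned_fraction):
    """Number of reads that aligned, given a total and an aligned fraction."""
    return total_reads * aligned_fraction

# Approximate figures quoted in the post.
deep = aligned_reads(4_000_000, 0.83)
shallow = aligned_reads(1_000_000, 0.71)

print(f"{deep:,.0f} vs {shallow:,.0f} aligned reads, ratio {deep / shallow:.2f}x")
```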



                      • #12
                        Originally posted by sdarko
                        I think that one issue may be that in one "identical" library we have ~ 4 million reads (with ~83% aligning to genome) while in the other "identical" library we have ~1 million reads (with ~71% aligning to genome).

                        So we have greater than 4x the reads aligning for one library versus the other.

                        Sam
                        That can be a big deal. Since you're a bioinformatician who is presumably much better at programming than I am, can you take random samples of 1M reads from the total 4M, align them, and see how the R^2 looks? How much coverage did you get overall?
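The random subsampling suggested here can be sketched with reservoir sampling, which picks k records uniformly at random in a single pass without loading the whole file. This is one possible approach, not necessarily what was done in the thread. The function names are hypothetical, and the demo uses invented in-memory records rather than a real FASTQ file.

```python
import random

def read_fastq(lines):
    """Yield FASTQ records as 4-tuples (header, sequence, plus, quality).

    A trailing incomplete record is silently dropped.
    """
    it = iter(lines)
    for record in zip(it, it, it, it):
        yield record

def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly at random (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return sample

# Invented in-memory FASTQ with 100 reads, for illustration only.
fake_lines = []
for n in range(100):
    fake_lines += [f"@read{n}\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]

subset = reservoir_sample(read_fastq(fake_lines), k=10, seed=42)
print(len(subset), subset[0][0].strip())
```

For paired-end data, running the same function with the same seed on both mate files (which must contain the same number of records in the same order) selects matching positions, so pairs stay together. With a real file, `read_fastq(open(path))` would replace the in-memory list.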



                        • #13
                          Originally posted by Heisman
                          That can be a big deal. Since you're a bioinformatician who is presumably much better at programming than I am, can you take random samples of 1M reads from the total 4M, align them, and see how the R^2 looks? How much coverage did you get overall?
                          Taking a random subset is on the agenda for today. Will let you know.



                          • #14
                            We noticed that many of the species "unique" to 1/2 duplicates appear to be ubiquitously-expressed genes mapping to loci encompassing several possible transcripts. So there is no way they should have been unique to one of the starting RNA samples. Perhaps a single species is being called one thing from one duplicate library, and something else from the other? Either that, or PCR is so chaotic that it completely loses large numbers of moderately-abundant species in a somewhat random fashion? I feel like the field would be aware of that if it were the case, though.
                            Last edited by eab; 07-13-2011, 09:00 AM.



                            • #15
                              Originally posted by sdarko
                              I think that one issue may be that in one "identical" library we have ~ 4 million reads (with ~83% aligning to genome) while in the other "identical" library we have ~1 million reads (with ~71% aligning to genome).

                              So we have greater than 4x the reads aligning for one library versus the other.

                              Sam
                              Yeah, that's not very deep, so I would expect a lot more singletons. If you set an arbitrary cutoff and filter out the singletons, I wonder if your r2 will increase.
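The cutoff idea above can be sketched as a small classification step: for a chosen minimum read count, split genes into those detected in both libraries and those detected in only one. The threshold is arbitrary, as the post says, and the function name and counts below are invented for illustration. The correlation can then be recomputed on the shared genes only.

```python
def split_by_detection(counts_a, counts_b, min_reads=1):
    """Classify genes by whether each library has at least min_reads reads.

    Returns three sorted lists: detected in both, only in A, only in B.
    Genes below the threshold in both libraries are dropped.
    """
    shared, only_a, only_b = [], [], []
    for gene in sorted(set(counts_a) | set(counts_b)):
        in_a = counts_a.get(gene, 0) >= min_reads
        in_b = counts_b.get(gene, 0) >= min_reads
        if in_a and in_b:
            shared.append(gene)
        elif in_a:
            only_a.append(gene)
        elif in_b:
            only_b.append(gene)
    return shared, only_a, only_b

# Made-up read counts for a deeper and a shallower library.
lib_a = {"g1": 120, "g2": 2, "g3": 0, "g4": 40}
lib_b = {"g1": 30, "g2": 0, "g3": 1, "g4": 9}

for cutoff in (1, 5):
    shared, only_a, only_b = split_by_detection(lib_a, lib_b, min_reads=cutoff)
    print(f"cutoff {cutoff}: shared={shared} only_a={only_a} only_b={only_b}")
```

If most library-specific genes disappear at a modest cutoff, the points stacked along the axes are likely a sampling effect of shallow depth rather than a real difference between libraries.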

