  • What are the "molecular indices" in the NEXTflex qRNA-Seq kit?

    Hi

    I just saw the new NEXTflex RNA-Seq kit. It has an interesting "molecular index" that labels each dsDNA molecule.



    When I took a closer look, I could see that this "index" is most likely a pool of barcoded adapters (as in Craig et al. 2004, Nat Methods). I think the adapters' 3' end also matches the TruSeq adapter. Unlike the TruSeq adapter, one additional barcode could be placed before the T overhang. NEXTflex claims to have 9,216 barcodes in each adapter. That is 4x4x4x4x4x3x3. They could synthesize and anneal 9,216 separate adapters, but I think it is more reasonable to introduce a 7-nt random barcode during oligo synthesis, just before the T overhang. To make the adapter, they must be able to anneal this oligo to a complementary one. I assume they could use one (or a few) complementary oligos containing deoxyinosine.
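The 9,216 figure can be checked by enumeration. This is a minimal sketch of silin's guess only, assuming a hypothetical 7-nt degenerate barcode written in IUPAC notation as `NNNNNBB` (N = any base, B = C/G/T); the real adapter design is not given here.

```python
from itertools import product

# IUPAC degenerate codes used in this hypothetical design:
# N = A/C/G/T (4 options), B = C/G/T (3 options).
IUPAC = {"N": "ACGT", "B": "CGT"}

def expand_degenerate(pattern: str) -> list[str]:
    """Return every concrete sequence encoded by an IUPAC degenerate pattern."""
    choices = [IUPAC[code] for code in pattern]
    return ["".join(bases) for bases in product(*choices)]

# Hypothetical 7-nt barcode: five fully random positions, two 3-base positions.
barcodes = expand_degenerate("NNNNNBB")
print(len(barcodes))            # 4**5 * 3**2 = 9216
print(4**5 * 3**2 == 96 * 96)   # a 96 x 96 adapter pairing gives the same number
```

The same total therefore fits either a degenerate 7-mer or a pairing of 96 distinct adapters, so the count alone cannot tell the two designs apart.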

    Does anyone have more information? I am just guessing and could be completely wrong.

    Cheers
    silin

  • #2
    I've attached the white paper where they describe the kit. They've commercialized the technique described in two of the PNAS papers in its references.

    Basically, they have a pool of 96 adapters, each carrying an additional 8-mer barcode downstream of the read 1 and read 2 sequencing primer binding sites. Adapters from the pool are added stochastically to each end of a dsDNA fragment, so there are 96 x 96 = 9216 possible combinations of "molecular barcodes." The barcode combination plus the sequence of the insert lets you tell whether a read is a PCR duplicate (generated during enrichment PCR) or a true duplicate. A PCR duplicate will have the same insert and the same molecular barcodes; a true duplicate will have the same insert with different molecular barcodes.

    The white paper explains it pretty well.
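The rule in #2 can be sketched in Python. The `ReadPair` record and the example barcodes below are hypothetical; each read pair is keyed by its mapped insert coordinates plus the two 8-mer labels, and any read whose full key has already been seen is counted as a PCR duplicate. Exact label matching is a simplification: a real pipeline would probably need to tolerate sequencing errors in the labels.

```python
from typing import NamedTuple

class ReadPair(NamedTuple):
    """Hypothetical minimal record for one mapped read pair."""
    chrom: str
    start: int    # leftmost mapped position of the insert
    end: int      # rightmost mapped position of the insert
    strand: str
    label1: str   # 8-mer label from the adapter on the read 1 side
    label2: str   # 8-mer label from the adapter on the read 2 side

def classify(pairs: list[ReadPair]) -> dict[str, int]:
    """Count reads, distinct inserts, distinct molecules and PCR duplicates."""
    inserts = {(p.chrom, p.start, p.end, p.strand) for p in pairs}
    molecules = set(pairs)  # same insert AND same label pair = same molecule
    return {
        "reads": len(pairs),
        "distinct_inserts": len(inserts),
        "distinct_molecules": len(molecules),
        # every read beyond the first copy of a molecule is a PCR duplicate
        "pcr_duplicates": len(pairs) - len(molecules),
    }

pairs = [
    ReadPair("chr1", 100, 350, "+", "ACGTACGT", "TTGGCCAA"),
    ReadPair("chr1", 100, 350, "+", "ACGTACGT", "TTGGCCAA"),  # PCR duplicate
    ReadPair("chr1", 100, 350, "+", "GGGTACCC", "TTGGCCAA"),  # same insert, new molecule
]
print(classify(pairs))
# {'reads': 3, 'distinct_inserts': 1, 'distinct_molecules': 2, 'pcr_duplicates': 1}
```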

    • #3
      The thing I'm not seeing in any of their literature is actual data on the rate of PCR duplication detected using their technique. With these libraries you can directly measure the rate of PCR duplicate reads, so why haven't they reported how many they find? If PCR duplication is a significant source of error in RNA-Seq data sets, why not present data showing that? On the other hand, if PCR duplication is generally not a significant factor, then there is no reason to use their kit.

      Hmm...did I just answer my own question?
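Once reads are collapsed to molecules (insert plus label pair), the rate kmcarr asks about is a simple ratio. A minimal sketch, assuming you already have both counts for a library; the input numbers in the example are invented for illustration.

```python
def pcr_duplicate_rate(total_reads: int, distinct_molecules: int) -> float:
    """Fraction of reads that are extra copies of an already-seen molecule."""
    if total_reads == 0:
        return 0.0
    if not 0 <= distinct_molecules <= total_reads:
        raise ValueError("distinct_molecules must be between 0 and total_reads")
    return 1.0 - distinct_molecules / total_reads

# Hypothetical library: 1,000,000 reads collapsing to 800,000 distinct molecules.
print(pcr_duplicate_rate(1_000_000, 800_000))
```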


      • #4
        Hi Kmcarr,

        In the experiments described in the NEXTflex qRNA-Seq Kit white paper, our goal was to compare unique fragments identified by unique start and stop sites versus by molecular indexing. Figure 5 illustrates the under-representation of highly expressed ERCC controls and selected mRNAs when start and stop sites are used as indicators of unique fragments. When molecular indices are used (blue line), the under-representation of the highly expressed genes becomes clear. We're happy to share the individual ERCC RNAs and mRNAs studied if anyone is interested. In this study we didn't focus on PCR duplicates; in fact, we purposely performed the experiment with as few PCR cycles as possible to demonstrate the differences in copy number. We are working on data that describe the benefit of this technology for identifying PCR duplicates in ChIP-Seq and other applications where PCR cycle numbers are typically higher than they should be.

        Cheers,
        Dawn
        Last edited by Bioo Scientific; 04-06-2015, 09:51 AM. Reason: updating URL
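Why start/stop-based counting under-represents highly expressed transcripts can be shown with a standard occupancy formula. Assuming, as a simplification, that a transcript offers P equally likely (start, stop) pairs and that N molecules are drawn independently, the expected number of distinct pairs observed is P(1 - (1 - 1/P)^N), which levels off at P however large N gets. The same formula with P = 9,216 estimates how many distinct label pairs to expect among molecules sharing identical coordinates. The transcript size used below is hypothetical.

```python
def expected_distinct(categories: int, draws: int) -> float:
    """Expected number of distinct categories seen after `draws` uniform draws.

    Each draw picks one of `categories` equally likely outcomes, so a given
    category is missed with probability (1 - 1/categories) ** draws.
    """
    if categories <= 0:
        raise ValueError("categories must be positive")
    return categories * (1.0 - (1.0 - 1.0 / categories) ** draws)

# Hypothetical transcript offering 2,000 distinct (start, stop) pairs.
for molecules in (100, 1_000, 10_000, 100_000):
    positional = expected_distinct(2_000, molecules)
    print(f"{molecules:>7} molecules -> ~{positional:,.0f} distinct start/stop pairs")
```

Under these assumptions, a start/stop count cannot exceed P, whereas the label pair lets molecules that share coordinates still be told apart (up to occasional label collisions).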


        • #5
          Originally posted by Bioo Scientific (#4)
          As described, the benefit of this kit is that it gives the researcher a sensitive method to distinguish whether fragments with the same start-end positions arose from distinct cDNAs (the "8 reads, 8 fragments" scenario of Figure 2) or from PCR duplication (the "8 reads, 4 fragments" scenario).

          In the paragraph at the bottom of page 4 (below Figure 3) it states (emphasis mine),
          Therefore, when multiple reads mapping to the same transcript are encountered, it is not possible to determine whether sequenced reads originate from the same or different cDNA molecule. As a remedy to this re-sampling problem, many researchers evaluate whether or not each read has the same start and stop mapping coordinates. Reads with identical start and stop positions are usually assumed to be clonal duplicates derived from the same parent molecule.
          The problem of distinguishing "reads originate from the same or different cDNA molecule" IS the issue of PCR duplication. This whole study focuses on a protocol to distinguish fragments arising from PCR duplication from fragments arising from distinct but identical cDNAs prior to amplification. The paper then makes the assertion that reads with identical start-end coordinates are "usually assumed to be clonal (i.e. PCR) duplicates". That is a reasonable assumption to make for genomic DNA sequencing but not for RNA-Seq, and it's an assumption I never make for RNA-Seq; in fact I assume just the opposite. I never perform any duplicate removal if I am doing an RNA-Seq experiment involving counting reads.

          Both panels in Figure 5 need a third line showing the "Total reads" for each of the ERCC controls or mRNA species. If the "Total reads" curve is not significantly different from the "Molecular indexing" curve, then using this protocol for RNA-Seq doesn't add much.
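The three curves kmcarr asks for can be tabulated from the same mapped data. A minimal sketch, assuming a hypothetical input of `(transcript_id, start, end, label_pair)` tuples: total reads, distinct start/stop pairs, and distinct molecules (start/stop plus labels) are counted per transcript.

```python
from collections import defaultdict

def per_transcript_counts(reads):
    """Return {transcript: (total_reads, distinct_start_stop, distinct_molecules)}.

    `reads` is an iterable of (transcript_id, start, end, label_pair) tuples,
    a hypothetical layout chosen for illustration.
    """
    total = defaultdict(int)
    positions = defaultdict(set)
    molecules = defaultdict(set)
    for transcript, start, end, labels in reads:
        total[transcript] += 1
        positions[transcript].add((start, end))
        molecules[transcript].add((start, end, labels))
    return {t: (total[t], len(positions[t]), len(molecules[t])) for t in total}

reads = [
    ("tx1", 10, 210, ("AAAACCCC", "GGGGTTTT")),
    ("tx1", 10, 210, ("AAAACCCC", "GGGGTTTT")),  # PCR copy
    ("tx1", 10, 210, ("CCCCAAAA", "GGGGTTTT")),  # distinct molecule, same ends
    ("tx1", 55, 260, ("AAAACCCC", "TTTTGGGG")),
]
print(per_transcript_counts(reads))  # {'tx1': (4, 2, 3)}
```

Plotting the first and third columns against each other would show directly whether molecular indexing changes the counts relative to total reads.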


          • #6
            Hello,

            I have a question regarding the molecular indices and demultiplexing them (maybe I'm missing something; let me know if I am).

            If one constructs an RNA-Seq library using this kit, one should be able to count each individual read based on the stochastically attached molecular indices. However, when aligning the reads to the genome or another reference, does one need to demultiplex the molecular indices first? Or will they not interfere with the alignment to the reference?

            Thanks,
            CH


            • #7
              Hi CH,

              Here is the analysis workflow we recommend:

              1) quality control
              2) sample demultiplexing
              3) remove 5' stochastic label and 3' adapter sequence (if any)
              4) map to the hg19 RefSeq gene library and filter for reads with MAPQ > 30
              5) add back stochastic labels to mapped reads
              6) count the number of reads and the number of unique stochastic labels
              7) summary/plot

              We can send you a whitepaper that contains additional bioinformatics analysis information if you would like us to.

              Regards,
              Bioo Scientific
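Step 3 of the workflow can be handled by moving each read's 5' label into its read name before alignment, so the label survives into the BAM file without causing mismatches. This is a sketch of one common convention (UMI-tools' `extract` command appends the UMI to the read name in a similar way), not of Bioo's dqRNASeq script; the 8-nt label length comes from this thread, and 3' adapter trimming is left out.

```python
LABEL_LEN = 8  # 8-nt molecular label at the 5' end of each read (per this thread)

def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from FASTQ text lines."""
    it = iter(lines)
    for header in it:
        seq, plus, qual = next(it), next(it), next(it)
        yield header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n")

def trim_label(record, label_len=LABEL_LEN):
    """Remove the 5' label from sequence and quality; append it to the read name."""
    header, seq, plus, qual = record
    label = seq[:label_len]
    name, _, rest = header.partition(" ")  # keep any comment after the first space
    new_header = f"{name}_{label}" + (f" {rest}" if rest else "")
    return new_header, seq[label_len:], plus, qual[label_len:]

fastq = [
    "@read1 1:N:0:ATCACG\n",
    "ACGTACGTTTGACCTAGGAC\n",
    "+\n",
    "IIIIIIIIHHHHHHHHHHHH\n",
]
for rec in read_fastq(fastq):
    print("\n".join(trim_label(rec)))
```

Because aligners generally keep the read name up to the first whitespace, the label then travels with each alignment and can be read back later without a separate lookup.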


              • #8
                Originally posted by Bioo Scientific (#7)
                Hi,

                Do you recommend doing sequence trimming and QC BEFORE building a count table of unique reads?

                Another question: it seems to me that an improvement over the above method might be to simply build a count table of unique reads PRIOR to mapping reads to the reference. In other words, if one can identify the unique reads based on the sequence ends and the molecular indices, why not just map those reads and use the RAW count data as the quantitative data? What am I missing? The reason I suggest this approach is that it would save the bother of tracking all the reads and then re-adding the stochastic labels.

                Thanks,
                Andor
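Andor's collapse-before-mapping idea can be sketched as follows, under the hypothetical assumption of single-end reads whose first 8 nt are the label. Identical sequences are counted once, the label is moved into the read name so it does not cause mismatches during alignment, and the copy number is kept in the name (the `uniq{i}_{label}_x{n}` naming is invented for illustration). Note that exact matching treats a read with a single sequencing error as a new molecule, which would inflate the unique count.

```python
from collections import Counter

LABEL_LEN = 8  # hypothetical: first 8 nt of each read are the molecular label

def collapse_exact(sequences):
    """Count identical read sequences (label still attached) before mapping."""
    return Counter(sequences)

def collapsed_fasta(counts, label_len=LABEL_LEN):
    """One FASTA record per distinct sequence; label and copy number go in the name."""
    records = []
    for i, (seq, n) in enumerate(counts.most_common(), start=1):
        label, insert = seq[:label_len], seq[label_len:]
        records.append(f">uniq{i}_{label}_x{n}\n{insert}")
    return "\n".join(records)

reads = [
    "ACGTACGTTTGACCTAGGAC",
    "ACGTACGTTTGACCTAGGAC",  # exact copy -> same putative molecule
    "ACGTACGTTTGACCTAGGAA",  # one mismatch (sequencing error?) -> counted as new
    "GGGTACCCTTGACCTAGGAC",  # same insert, different label -> new molecule
]
print(collapsed_fasta(collapse_exact(reads)))
```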


                • #9
                  re-adding stochastic labels

                  Can I ask for specifics about how, exactly, you re-add the stochastic labels (which I assume are the same as the 8bp molecular index at the start) to "mapped reads" (i.e. a BAM file? Sorted how?)?
                  Or do you mean simply using the previous FASTQ files without the molecular index removed? If you have a specific script to do this, is it possible to share it? I am happy to develop one myself but no sense re-doing something that's been done already.


                  • #10
                    Hi dan,
                    You can find a script for analysis of qRNA-Seq data here under the Resources tab. Please email us at [email protected] if you have any questions.


                    • #11
                      thanks for the response...

                      Thanks for the response, but this does not answer my questions, and neither does any of the information in the qRNA-Analysis.pdf white paper. Some of the terminology in the description of the dqRNASeq script is quite unclear, and I'm still unsure about several things. I'd prefer to have these questions answered on SeqAnswers so that other people dealing with the same problem can see the solution. Specifically, I'd like to know:
                      • Is BWA the required mapping tool, or is Bowtie2 going to work too (a mapping is a mapping, right?)
                      • Are BAM files to be used with the dqRNASeq meant to be sorted in any particular way?
                      • Where and when are you supposed to remove the 8bp molecular index from input FASTQ files, and when you say "stochastic label" is this the same thing as "molecular index"?
                      • Where and when are you supposed to use FASTQ files with the molecular index still attached and/or with the molecular index removed? (I'm assuming that the mapping must be done without the molecular index, as it would result in mismatches if it was still attached.)
                      • When you mention "add back stochastic labels to mapped reads" what exactly do you mean (add back to the mapped reads in the BAM file? add back to the input FASTQ files? use input FASTQ files that haven't had the molecular index removed?)

                      If this script is not being maintained that's fine, but please let me know so that I can develop my own. I think the technology of molecular indexing is very useful and I'm keen to get resources into the public domain for easy analysis.


                      • #12
                        I understand and agree with your point about wanting to keep this discussion public. I will consult with my colleagues and post answers to your questions on Monday. I can tell you now that molecular index and stochastic label (STL) are used interchangeably, my apologies for the inconsistency.

                        We are working on an updated script for analysis of molecular indexing data, but we would value your ideas and input.


                        • #13
                          Much appreciated. I think it's a very exciting technology. Easy access to well-documented analysis tools would probably increase uptake.


                          • #14
                            Answers to your questions, as promised:

                            -Bowtie2 or any other aligner that produces a bam file is fine.
                            -Sorting is not necessary.
                            -Stochastic labels should be removed from the FASTQ prior to alignment (Bowtie also has an option to trim bases as part of the alignment command). The terms stochastic label and molecular index are used interchangeably.
                            -FASTQ files with the molecular indexes removed should be used for aligning. FASTQ files with the indexes still present should be input into the dqRNASeq script.
                            -Adding back of stochastic labels is performed by the dqRNASeq script in order to identify true PCR duplicates. Both start/stop site information and stochastic label information are required for proper PCR duplicate removal, which is why both mapped data and molecular index data are necessary.
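The "add back" step described in #14 amounts to a join on read name. A minimal sketch, assuming single-end reads, an 8-nt label, and alignments already extracted from the BAM as hypothetical `(read_name, chrom, pos, strand)` tuples with `pos` taken as the read's 5' mapping position; this is not the dqRNASeq implementation. Because the lookup uses a dictionary and a set, the BAM does not need to be sorted.

```python
def labels_from_fastq(lines, label_len=8):
    """Map read name -> 5' label, using the FASTQ that still contains the labels."""
    labels = {}
    for i in range(0, len(lines), 4):
        header = lines[i].rstrip("\n")
        seq = lines[i + 1].rstrip("\n")
        name = header[1:].split()[0]  # drop '@' and any comment, as BAM read names do
        labels[name] = seq[:label_len]
    return labels

def dedup_mapped(alignments, labels):
    """Return names of reads kept after collapsing on (chrom, pos, strand, label).

    `alignments` is an iterable of (read_name, chrom, pos, strand) tuples, for
    example pulled from a BAM file. Input order does not matter.
    """
    seen = set()
    kept = []
    for name, chrom, pos, strand in alignments:
        key = (chrom, pos, strand, labels[name])
        if key not in seen:  # first read observed for this molecule
            seen.add(key)
            kept.append(name)
    return kept

untrimmed_fastq = [
    "@r1 1:N:0\n", "ACGTACGTTTGACCTAGGAC\n", "+\n", "IIIIIIIIIIIIIIIIIIII\n",
    "@r2 1:N:0\n", "ACGTACGTTTGACCTAGGAC\n", "+\n", "IIIIIIIIIIIIIIIIIIII\n",
    "@r3 1:N:0\n", "GGGTACCCTTGACCTAGGAC\n", "+\n", "IIIIIIIIIIIIIIIIIIII\n",
]
alignments = [("r1", "chr1", 100, "+"), ("r2", "chr1", 100, "+"), ("r3", "chr1", 100, "+")]
labels = labels_from_fastq(untrimmed_fastq)
print(dedup_mapped(alignments, labels))  # ['r1', 'r3']
```

Here r2 shares both position and label with r1 and is dropped as a PCR duplicate, while r3 maps to the same position but carries a different label and is kept.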


                            • #15
                              Originally posted by danwiththeplan (#11)
                              We (my group) have almost finished a tool for use with this (molecular indices, aka STL) approach. We are using a different approach and will shortly have this available; currently we are in beta testing. The tool will be a plug-in for the CLC Genomics Workbench and will be an easy-to-use, GUI-based tool. Our plug-in will include a tutorial, help files, and example files. We think our approach will be much easier for most biologists and bioinformaticians than the current script. PM me for more details.
                              Last edited by cement_head; 10-26-2015, 04:08 AM. Reason: clarity, spelling
