Please explain the concept of "spike-in controls" for NGS analysis

    As a bioinformatician, I am having trouble grasping the concept and the biology behind using "spike-in controls" in various kinds of next-generation sequencing experiments.

    From what I have read, I understand that they are used as a quality control measure, but I am struggling to understand how.

    If someone could explain this concept in fairly simple language, it would be greatly appreciated.

    Thank you so much.

  • #2
    This should be simple enough:

    Even though this refers to microarrays, the principle is the same for NGS.


    • #3
      This looks relevant too:
      Synthetic spike-in standards for RNA-seq experiments.


      • #4
        Your confusion about how to use spike-in controls as a QC tool is understandable. Most users seem to be in the dark about how to use the data.

        The first step is to understand what was spiked in, when, and how. In the basic case, the spike-in consists of a pool of synthetic transcripts that cover a range of concentrations. For example, there may be 1 molecule of A, 2 of B, 32 of E, and so on. You would hope and expect the read counts in an RNA-seq experiment to mirror those concentrations; if they do not, you know that something went very wrong in your experiment.

        Using more complex spike-in mixtures, you can extract more interesting information. The ExFold pools, two mixes that contain the same transcripts at known ratios, let you estimate a measure of confidence in ratio detection, so you can finally give an answer (albeit a heavily qualified one) to the question "where should I set my FPKM cutoff?" These and other analyses that can be performed with spike-in data will be detailed in an upcoming (est. 1-3 months) "ERCC dashboard" paper and automated in an R package.
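        A rough Python sketch of the ratio idea follows. It assumes two samples, one spiked with Mix 1 and one with Mix 2, with counts already normalized for library size, and it assumes ExFold-style subgroups whose expected Mix 1/Mix 2 ratios are 4, 1, 2/3 and 1/2. The spike-in names, counts, and the 0.5 log2 tolerance are hypothetical. This is a deliberately simplified stand-in, not the statistical method of the ERCC dashboard package: it reports the lowest mean count above which every spike-in ratio is recovered within tolerance, which you could read as a crude abundance cutoff (in normalized-count units, not FPKM).

```python
import math

# Assumed ExFold-style design: expected Mix1/Mix2 ratio for each subgroup.
EXPECTED_RATIO = {"A": 4.0, "B": 1.0, "C": 2 / 3, "D": 0.5}

# Hypothetical spike-ins: (name, subgroup, normalized count in Mix1 sample, in Mix2 sample).
SPIKES = [
    ("x1", "A", 4, 3),
    ("x2", "D", 2, 7),
    ("x3", "B", 20, 11),
    ("x4", "C", 30, 52),
    ("x5", "A", 400, 101),
    ("x6", "B", 250, 240),
    ("x7", "D", 300, 610),
    ("x8", "C", 1000, 1480),
]


def ratio_errors(spikes, expected):
    """Return (mean count, |observed - expected| in log2 units), sorted by mean count."""
    out = []
    for _name, group, c1, c2 in spikes:
        if c1 == 0 or c2 == 0:
            continue  # a ratio is undefined without reads in both samples
        observed = math.log2(c1 / c2)
        target = math.log2(expected[group])
        out.append(((c1 + c2) / 2, abs(observed - target)))
    return sorted(out)


def abundance_cutoff(errors, tol=0.5):
    """Smallest mean count above which every spike-in ratio is within tol (log2).

    Walks from the most abundant spike-in downwards and stops at the first
    ratio that misses its expected value; returns None if even the most
    abundant spike-in fails.
    """
    cutoff = None
    for mean_count, err in reversed(errors):
        if err > tol:
            break
        cutoff = mean_count
    return cutoff


print(abundance_cutoff(ratio_errors(SPIKES, EXPECTED_RATIO)))
```

        Low-abundance spike-ins give noisy ratios because a handful of reads can swing them a long way, which is why the cutoff emerges from the abundant end downwards. A real analysis would use many more spike-ins and a proper statistical model rather than a single hard tolerance.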