  • Interpreting the SAV files?

    Hello everyone,

    I am entering the world of genomics service providers and I love it. My group relies a lot on the SAV files to check the data, followed by the classic QC pipeline that includes demux, FastQC, MultiQC and FastQ Screen. There are a few bits and bobs of the SAV data that I don't fully understand, and I was hoping that someone here could help.

    1) The top and bottom surfaces of a flow cell. I know that flow cells do not get imaged from both the top and the bottom, as they sit on a dark support. Therefore, I am assuming the image can only come from the top. Also, the sample library and the reagents are only pumped through the flow cell. I do not understand the purpose of the intensity image for the "bottom" surface of the flow cell. What exactly do they mean by top and bottom? Why would we be interested in having both?

    2) For example, in a MiSeq run, the summary shows data for Read 1 (ok), Read 2 (?), Read 3 (?) and Read 4 (ok). I understand that Reads 1 and 4 hold the metrics for the forward and reverse reads in paired-end sequencing, but what are Read 2 and Read 3?

    Thanks to whoever will reply,

    G.

  • #2
    Hello GSeq94,

    I'd be happy to help as best as I can.

    1) Illumina flow cells have two inner surfaces in each lane: the top and the bottom of the channel that the library and reagents are pumped through. Both surfaces are coated with the oligonucleotides that the library fragments bind to, so clusters form on both of them. The sequencing process then images those clusters to determine their sequences.
    "Top" and "bottom" in SAV refer to these two inner surfaces, not to the outside faces of the flow cell. The instrument still images from one side only: it focuses through the glass on one surface, images it, and then refocuses on the other. SAV reports the intensities and the other metrics for each surface separately.

    Long story short, imaging both surfaces increases the number of clusters, and therefore the yield, that you get from the flow cell. Comparing the two surfaces in SAV is also a quick way to spot problems, such as poor focus or bubbles, that affect only one of them.
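SAV separates its per-surface metrics by tile number. As a small sketch, assuming the four-digit tile naming used on the MiSeq (first digit is the surface, 1 for top and 2 for bottom, followed by the swath and a two-digit tile; the function name is made up for illustration), the surface can be read off a tile ID like this:

```python
def tile_surface(tile: int) -> str:
    """Return 'top' or 'bottom' for a four-digit tile number.

    Assumed layout: <surface><swath><two-digit tile>, e.g. 1101 or 2114.
    Other instruments use different tile naming, so this is only a sketch.
    """
    digits = str(tile)
    if len(digits) != 4:
        raise ValueError(f"expected a four-digit tile number, got {tile}")
    if digits[0] == "1":
        return "top"
    if digits[0] == "2":
        return "bottom"
    raise ValueError(f"unknown surface digit in tile {tile}")


if __name__ == "__main__":
    for tile in (1101, 1114, 2101, 2114):
        print(tile, tile_surface(tile))
```

Grouping tile-level metrics by this value is one way to compare the two surfaces outside of SAV.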

    2) In the context of MiSeq sequencing, Read 1 and Read 4 correspond to the forward and reverse reads, respectively, in a paired-end sequencing setup. These reads capture the sequences from the two ends of the DNA fragments. They are typically used for assembling the full sequence and mapping the reads to a reference genome.

    Read 2 and Read 3, on the other hand, are the index reads of a dual-indexed run. This is not specific to the MiSeq: SAV simply numbers the reads in the order in which they are sequenced, so the two index reads land between the forward and reverse reads. Index sequences, also known as barcodes, are short DNA sequences added to the library molecules during preparation. These barcodes allow for multiplexing, where multiple samples are pooled and sequenced together in a single run, with each sample identified by its unique barcode (or barcode pair). Read 2 is the Index 1 (i7) read and Read 3 is the Index 2 (i5) read.
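If you want to confirm which reads are index reads for a given run, the run folder's RunInfo.xml lists each read with its cycle count and an IsIndexedRead flag. Here is a minimal sketch that labels the reads; the XML snippet is an illustrative example of a 2 x 151 dual-indexed run, not taken from a real run, and the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

# Illustrative <Reads> section in the style of RunInfo.xml (values are made up).
RUNINFO_SNIPPET = """
<Reads>
  <Read Number="1" NumCycles="151" IsIndexedRead="N" />
  <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
  <Read Number="3" NumCycles="8" IsIndexedRead="Y" />
  <Read Number="4" NumCycles="151" IsIndexedRead="N" />
</Reads>
"""


def describe_reads(xml_text: str) -> list[tuple[int, int, str]]:
    """Return (read number, cycles, role) for every <Read> element."""
    rows = []
    n_index = 0
    n_sequencing = 0
    for read in ET.fromstring(xml_text.strip()).iter("Read"):
        number = int(read.get("Number"))
        cycles = int(read.get("NumCycles"))
        if read.get("IsIndexedRead") == "Y":
            n_index += 1
            role = f"index read {n_index}"
        else:
            n_sequencing += 1
            role = f"sequencing read {n_sequencing}"
        rows.append((number, cycles, role))
    return rows


if __name__ == "__main__":
    for number, cycles, role in describe_reads(RUNINFO_SNIPPET):
        print(f"Read {number}: {cycles} cycles, {role}")
```

With this layout, SAV's Read 2 and Read 3 are the two short index reads, and Read 4 is the second sequencing (reverse) read.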


    I hope this helps!



    • #3
      Dear GenomicSeq, thanks so much for your help.

      I now understand the flow cell and the imaging process a lot better and also shared your response with my colleagues!

      I do have, however, a few doubts about question 2). Do the metrics for Read 2 and Read 3 cover only the indexes? For example, how many times the software has sequenced a certain barcode? Why is this information important to me? What would I do with it?

      Thank you so much!!



      • #4
        Hi @GSeq94,

        Just wanted to throw my 2 cents in:

        1.) I think both surfaces are imaged to improve the yield of the flow cell depending on the kit type.
        2.) Indexed Sequencing Overview for Illumina Systems may be a helpful reference.
        • Indexed reads only provide data to decode your multiplexed samples, so yes, only metrics for the indices.
        • Index length is typically 8 cycles, forward and reverse for dual indexing, 16 cycles of indexing total in that case.
        • This is important for multiplexing, which lets you run multiple samples in a single run. That makes efficient, cost-effective use of the system's yield, matched to the amount of coverage your workflow needs.
          • Ex: System/kit spec is ~360 Gb of yield, and huWGS needs around 120 Gb of yield for decent coverage. If I ran 1 sample on that kit, I would be "wasting" 2/3 of the yield on extra coverage. So instead, you run 3 samples with different indices on them to split that total yield, 360/3 = 120 Gb for each sample, happy day $.
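The arithmetic in that example can be written down as a quick sketch. The function names are hypothetical, and it assumes the pooled samples are perfectly balanced, which real pools rarely are:

```python
def samples_per_run(run_yield_gb: float, per_sample_gb: float) -> int:
    """How many samples fit in one run if each needs per_sample_gb of yield."""
    return int(run_yield_gb // per_sample_gb)


def yield_per_sample(run_yield_gb: float, n_samples: int) -> float:
    """Yield each sample receives, assuming an evenly balanced pool."""
    return run_yield_gb / n_samples


def excess_fraction(run_yield_gb: float, per_sample_gb: float, n_samples: int) -> float:
    """Fraction of the run's yield beyond what the samples actually need."""
    return 1 - (per_sample_gb * n_samples) / run_yield_gb


if __name__ == "__main__":
    run_yield, needed = 360, 120  # Gb, numbers from the example above
    print(samples_per_run(run_yield, needed))
    print(yield_per_sample(run_yield, 3))
    print(round(excess_fraction(run_yield, needed, 1), 2))
```

In practice you would leave some headroom, since uneven pooling and reads lost at demultiplexing reduce what each sample actually gets.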
        Happy sequencing!



        • #5
          Hi SeqBuddy, thanks for the additional explanation!

          Happy sequencing to you too!
