
  • First MiSeq: Some questions

    We are half-way through our first MiSeq run (NexteraXT v3 2x300bp). Sample is 16 plex of genomic DNA.

    Some questions about the BaseSpace reporter:

    1. Is it normal for the "intensity" to increase with cycle number? (from 400 to 800 for C and T bases)
    2. Is it normal for the % base composition to drift over the run? (from 20% C to 28% C; yet from 20% G to 16% G).

    What could cause the latter?

    Median Q-score seems OK (I think!).

    I've also copied the run summary so far for further info (cluster density = 1400, 1.5% mapped PhiX, etc)

    Here are some screen grabs.

    Cheers!





  • #2
    Base composition can drift, particularly if you have high adapter contamination due to short insert sizes. Light output is scheduled to increase at some point to compensate for the gradually reduced signal intensity of the clusters. I think for HiSeq that happens in a sudden jump near the middle; not sure about MiSeq.
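For anyone who wants to check the drift outside BaseSpace, here is a minimal sketch of a per-cycle base composition tally. The function name `per_cycle_composition` and the toy reads are made up for illustration; real reads would come from the run's FASTQ files.

```python
from collections import Counter

def per_cycle_composition(reads):
    """Return one dict per cycle mapping base -> fraction of reads with that base."""
    max_len = max((len(r) for r in reads), default=0)
    counts = [Counter() for _ in range(max_len)]
    for read in reads:
        for cycle, base in enumerate(read.upper()):
            counts[cycle][base] += 1
    fractions = []
    for c in counts:
        total = sum(c.values())
        # Only report bases that were actually observed at this cycle
        fractions.append({b: c[b] / total for b in "ACGTN" if c[b]})
    return fractions

# Toy reads (hypothetical), four cycles long
toy_reads = ["ACGT", "ACGA", "TCGA", "ACTA"]
composition = per_cycle_composition(toy_reads)
for cycle, fracs in enumerate(composition, start=1):
    print(cycle, fracs)
```

A steady rise in one base at late cycles, as described in the first post, would show up as a trend in these per-cycle fractions.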



    • #3
      Thanks Brian. Since the % base drifts from cycle 150 onwards, does this suggest we have some quite short inserts then? Wouldn't Q-score be expected to tank pretty rapidly after that though? (get to ends of molecules)

      The intensity is increasing gradually during the whole run (see attached fig2)

      This is the opposite of what is seen in a PhiX 2x151 run that I was browsing.



      • #4
        Originally posted by M4TTN
        Thanks Brian. Since the % base drifts from cycle 150 onwards, does this suggest we have some quite short inserts then? Wouldn't Q-score be expected to tank pretty rapidly after that though? (get to ends of molecules)

        The intensity is increasing gradually during the whole run (see attached fig2)

        This is the opposite of what is seen in a PhiX 2x151 run that I was browsing.

        That PhiX 2x151 was undoubtedly run using a MiSeq v2 kit whereas you are using a v3 kit. The intensity ramp-up profile is different between v2 and v3. Your intensity plots look normal for a v3 run.

        You are correct that the skewed base composition means you have a lot of short inserts (<< 300 bp). Quality scores will not degrade, though, until the read has gone all the way through the Illumina adapter (65-70 bp past the end of the insert) and runs out of template.
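A rough way to see the read-through described above is to look for the start of the adapter inside each read: the position where it begins is the insert length. The sketch below assumes exact matching only (no sequencing errors) and uses `CTGTCTCTTATACACATCT`, which as far as I know is the Nextera adapter trimming sequence; check it against Illumina's adapter sequences document before relying on it. The function name and `min_match` parameter are made up for illustration.

```python
# Assumed Nextera adapter trimming sequence; verify against Illumina's documentation
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"

def insert_length_from_read(read, adapter=NEXTERA_ADAPTER, min_match=10):
    """Return the inferred insert length if the read runs into the adapter, else None."""
    # Case 1: the whole adapter is present somewhere in the read
    pos = read.find(adapter)
    if pos != -1:
        return pos
    # Case 2: the read ends partway through the adapter
    for k in range(len(adapter) - 1, min_match - 1, -1):
        if read.endswith(adapter[:k]):
            return len(read) - k
    return None

# Toy reads (hypothetical)
full_hit = "ACGTACGTAC" + NEXTERA_ADAPTER + "GGGG"
partial_hit = "ACGT" * 5 + NEXTERA_ADAPTER[:12]
no_hit = "ACGT" * 10

print(insert_length_from_read(full_hit))
print(insert_length_from_read(partial_hit))
print(insert_length_from_read(no_hit))
```

A dedicated trimmer handles mismatches and quality properly; this is only meant to show why short inserts skew the base composition at late cycles.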

        Now my 2¢: The Nextera XT kit is horrible if you want to use the libraries for longer reads like a MiSeq PE300. There is very poor control of size distribution of library molecules using this method and you end up with a set of libraries with very divergent size distributions; some very short and some very long.



        • #5
          @kmcarr: All good to know. We'll see how our final read distribution looks. We don't especially care about long reads. Priorities are a simple workflow for sample prep and "enough" coverage for multiplexed samples. Time will tell whether it is more cost effective to multiplex on 2x75 or 2x300.

          In terms of the goal, the resulting reads will need to be aligned and SNP-called compared to a reference. In principle we will have >30x coverage of each sample. But if some reads are much shorter, this might take a hit.
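To put a number on how short inserts could eat into coverage, here is a back-of-the-envelope sketch. It assumes adapters are trimmed and that a read pair can never contribute more unique bases than the insert itself. The function names and all numbers are hypothetical, not taken from this run.

```python
def usable_bases_per_pair(insert_len, read_len):
    """Unique genomic bases contributed by one read pair.

    If the insert is shorter than 2 * read_len the two reads overlap,
    so the pair covers only insert_len unique bases.
    """
    return min(insert_len, 2 * read_len)

def mean_coverage(insert_lengths, read_len, genome_size):
    """Mean depth from unique fragment bases, ignoring mapping losses."""
    total = sum(usable_bases_per_pair(i, read_len) for i in insert_lengths)
    return total / genome_size

# Hypothetical example: two fragments sequenced 2x300 against a 750 bp target
print(usable_bases_per_pair(150, 300))           # short insert: reads fully overlap
print(usable_bases_per_pair(800, 300))           # long insert: no overlap
print(mean_coverage([150, 800], 300, 750))
```

With a 150 bp insert a 2x300 pair yields a quarter of the nominal 600 bases, which is where the coverage "hit" would come from.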



          • #6
            Originally posted by M4TTN
            @kmcarr: All good to know. We'll see how our final read distribution looks. We don't especially care about long reads. Priorities are a simple workflow for sample prep and "enough" coverage for multiplexed samples. Time will tell whether it is more cost effective to multiplex on 2x75 or 2x300.

            In terms of the goal, the resulting reads will need to be aligned and SNP-called compared to a reference. In principle we will have >30x coverage of each sample. But if some reads are much shorter, this might take a hit.

            Extensively or completely overlapping reads will definitely eat into your true depth of coverage. You also want to make certain that your variant caller is not counting overlapping bases from the same read pair twice in its assessment of SNPs. You could merge overlapping pairs into a single, long pseudo-read prior to mapping, which corrects the coverage so that it reflects unique library fragments, but if you go that route you must be certain that the merging program deals intelligently with mismatched bases in the overlap region.
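To make "deals intelligently with mismatched bases" concrete, here is a minimal sketch of one possible merging rule: where the two reads disagree, keep the base with the higher quality and lower its score. This is an illustration only, not how any particular merging program works. It assumes adapter-trimmed reads, integer Phred qualities, and no indels; `merge_pair`, `min_overlap`, and `max_mismatch_frac` are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(r1, q1, r2, q2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair into one sequence; return (seq, quals) or None."""
    r2rc = reverse_complement(r2)
    q2rc = q2[::-1]
    # Try the longest overlap first: suffix of R1 against prefix of reverse-complemented R2
    for ov in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        start = len(r1) - ov
        mismatches = sum(r1[start + i] != r2rc[i] for i in range(ov))
        if mismatches > max_mismatch_frac * ov:
            continue
        seq = list(r1[:start])
        qual = list(q1[:start])
        for i in range(ov):
            b1, b2 = r1[start + i], r2rc[i]
            p1, p2 = q1[start + i], q2rc[i]
            if b1 == b2:
                seq.append(b1)
                qual.append(max(p1, p2))
            elif p1 >= p2:
                # Disagreement: keep the better-supported base, penalise its quality
                seq.append(b1)
                qual.append(p1 - p2)
            else:
                seq.append(b2)
                qual.append(p2 - p1)
        seq.extend(r2rc[ov:])
        qual.extend(q2rc[ov:])
        return "".join(seq), qual
    return None

# Toy 26 bp fragment read 2x18 (hypothetical), with one low-quality error in R1
fragment = "ACGTTGCAAGGCTTAACCGGTATCGA"
r1 = fragment[:18]
r2 = reverse_complement(fragment[-18:])
r1_err = r1[:12] + "G" + r1[13:]
q1_err = [30] * 18
q1_err[12] = 5

clean = merge_pair(r1, [30] * 18, r2, [30] * 18)
fixed = merge_pair(r1_err, q1_err, r2, [30] * 18)
print(clean[0] == fragment, fixed[0] == fragment, fixed[1][12])
```

After merging, each fragment contributes one observation per position, which is the coverage correction described above.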



            • #7
              @kmcarr Perhaps I should clarify. Each barcoded sample should have a single genotype. We will be calling SNPs against a reference genome. Thus, I don't *think* overlapping reads matter...

              Separately: I assume that the wobbly %base distribution at the start of each read is due to the MiSeq reading the tagmentation inserts. However, I cannot find any mention by Illumina that this is the case (and that, with NexteraXT, one actually sacrifices 19 bp of high quality read at the beginning).

              Does anyone know if it is possible to use a custom primer to circumvent this issue?



              • #8
                Originally posted by M4TTN
                @kmcarr Perhaps I should clarify. Each barcoded sample should have a single genotype. We will be calling SNPs against a reference genome. Thus, I don't *think* overlapping reads matter...

                Aligning overlapping reads to a reference does matter: if a variant is observed in the region where R1 and R2 overlap, it does not represent two independent observations of the variant from two different fragments, but duplicate observations of the variant on the same fragment. These two situations must be modeled differently by SNP calling software.
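The distinction can be sketched as counting support per fragment rather than per read. This is a toy illustration, not how any specific SNP caller models it; the `(fragment_id, position, base)` tuples and the function name are hypothetical.

```python
from collections import defaultdict

def count_fragment_support(observations):
    """Count each (position, base) once per fragment, however many reads saw it."""
    seen = set()
    support = defaultdict(int)
    for fragment_id, position, base in observations:
        key = (fragment_id, position, base)
        if key in seen:
            # Same fragment already counted at this site (e.g. R1 and R2 overlap)
            continue
        seen.add(key)
        support[(position, base)] += 1
    return dict(support)

# Fragment f1 is seen twice at position 100 because its R1 and R2 overlap there
obs = [("f1", 100, "T"), ("f1", 100, "T"), ("f2", 100, "T"), ("f3", 100, "C")]
support = count_fragment_support(obs)
print(support)
```

A naive per-read count would report three observations of T at position 100; counting per fragment gives two.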

                Separately: I assume that the wobbly %base distribution at the start of each read is due to the MiSeq reading the tagmentation inserts. However, I cannot find any mention by Illumina that this is the case (and that, with NexteraXT, one actually sacrifices 19 bp of high quality read at the beginning).

                Does anyone know if it is possible to use a custom primer to circumvent this issue?

                The non-random base calls are not adapter. They reflect the base composition preference of the Nextera tagmentase enzyme. There was a paper a while back carefully examining biases in various fragmentation strategies:

                Adey, A., Morrison, H. G., Asan, Xun, X., Kitzman, J. O., Turner, E. H., et al. (2010). Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome biology, 11(12), R119. doi:10.1186/gb-2010-11-12-r119



                • #9
                  Thanks kmcarr. So, in principle, these regions will map like any other. We'll find out soon.

                  Regarding your first point: I still think I am not being clear. We are not looking for SNPs within each barcoded sample. Each barcoded sample is a different unique haplotype of homogeneous sequence, which will map against a reference. Each barcoded sample will therefore have numerous SNPs relative to the reference (and to each other), but there should be no SNPs present within the reads of any given barcoded sample.

                  Does that make sense?



                  • #10
                    The run finished. Here is the completed Median Q-score plot. Read 2 (read4) seems quite a lot worse than Read 1. Is that normal?



                    • #11
                      A cluster density of 1400 is definitely pushing the limits for v3 chemistry. I think the drop you are seeing on read 2 (if these are plain genomic DNA samples) is likely due to overloading.

                      As has been discussed on the forum before, you can push the limits of cluster density to a surprising extent (beyond Illumina's published specifications, which are ~850 for PhiX for v3), but when you cross a certain limit the drop in quality is precipitous (akin to what you are seeing on read 2). You probably want to stay around 1100 for the best results.



                      • #12
                        When you have a chance please post the FastQC plots for this run.



                        • #13
                          @GenoMax: sorry for only just now seeing your post. What are FastQC plots?



                          • #14
                            quality control plots of your data using the FastQC software, see



                            • #15
                              Is the Median Q-score (generated by MiSeq/basespace) similar enough?



                              I don't have the FASTQ to hand.
                              Last edited by M4TTN; 02-03-2015, 06:32 AM.
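Once the FASTQ is to hand, a per-cycle median quality comparable in spirit to the BaseSpace plot can be sketched in a few lines. This is not what FastQC itself does internally; it assumes four-line FASTQ records with Phred+33 encoding, and the function name and toy records are made up for illustration.

```python
from statistics import median

def per_cycle_median_quality(fastq_lines, offset=33):
    """Return the median Phred score at each cycle from 4-line FASTQ records."""
    per_cycle = []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:
            # Only the fourth line of each record holds the quality string
            continue
        for cycle, ch in enumerate(line.rstrip("\n")):
            if cycle == len(per_cycle):
                per_cycle.append([])
            per_cycle[cycle].append(ord(ch) - offset)
    return [median(scores) for scores in per_cycle]

# Three toy records (hypothetical); 'I' = Q40, '?' = Q30, '5' = Q20, '#' = Q2
toy_fastq = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "ACGT", "+", "II5#",
    "@r3", "ACGT", "+", "I?5#",
]
medians = per_cycle_median_quality(toy_fastq)
print(medians)
```

For a real file, pass an open file handle instead of the list; a sharp drop in the medians toward the end of read 2 would match the plot posted above.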
