  • Illumina MiSeq 16S run quality

    Hi SEQ-users,

    We are currently following the Illumina Demonstrated Protocol for 16S sequencing on the MiSeq (24-96 samples) for stool and saliva samples. The run metrics are inconsistent from run to run (e.g. %Q30 ranges from 64 to 85%, and cluster density from 421 to 1328 K/mm2).

    I was just wondering for those who use the same protocol,

    1. What does your 16S MiSeq sequencing run look like? (in terms of %Q30, cluster density, % aligned etc).
    Have you set specific run metrics for run acceptance?
    (I have attached the run summary of our recent run, let me know your thoughts.)

    2. What is the minimum number of sequences you process per sample?
    Apart from the Illumina document, do you know any publication that recommends a certain number of reads per sample?

    3. We are using USEARCH for quality filtering before assembly. However, very few R2 reads pass the filter. Would you recommend other quality filtering tools?

    Any answers or suggestions would be greatly appreciated.

    Thank you in advance.
    Attached Files

  • #2
    Are you spiking in phiX and if so at what concentration? Are these same libraries being run repeatedly or different sample pools? What method are you using for estimation of concentration? Do you expect the reads to overlap and are you using any software to do the read merge before you quality trim?

    • #3
      Originally posted by GenoMax View Post
      Are you spiking in phiX and if so at what concentration? Are these same libraries being run repeatedly or different sample pools? What method are you using for estimation of concentration? Do you expect the reads to overlap and are you using any software to do the read merge before you quality trim?
      Yes. For the attached run summary, I spiked-in 10% of 20pM PhiX and loaded 3.5 pM library. We use KAPA qPCR to quantify the libraries before the normalization and pooling steps. We have run the same samples twice; for some of them this is the third time. We sequenced the V3-V4 region (a stretch of around 466 bp) using 2x300 PE reads (MiSeq v3 kit). We expect the reads to overlap at least 40 bp.

      I am not really an expert in bioinformatics, but the way our bioinformatician set up our pipeline is to:
      1. first validate the DNA sequences from the MiSeq using USEARCH, by quality filtering R1 and R2 separately. Reads that pass the quality filtering go on to the next step, which is...
      2. bacterial classification by cleaning, clustering, taxonomic assignment, and building of the abundance matrix using UPARSE.

      R2 rarely pass the quality filtering as we usually get only about 4000 reads per sample that pass.. Is this too low?

      Are we doing it differently from all the others? Is there a better way?

      • #4
        Our 16S runs using the V4 region generally have Q30 ~ 85-95% and a density of ~1000K/mm2 with no PhiX spike. I'll check on how much we load.

        No specific run metrics for acceptance; we use the standard Illumina "does it pass spec" criteria. For 16S, you really don't need many reads per sample, as you will rarefy later in the analysis anyway. We aim for 100k reads per sample just to make sure most/all of them will have enough to be included, but the saturation curves generally plateau very quickly (even as low as 4-6k reads). Look at the HMP papers.
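For anyone unfamiliar with rarefying, here is a minimal sketch of the idea: every sample is randomly subsampled (without replacement) to the same depth, and samples with fewer reads than that depth are dropped. The function name, the OTU labels, and the counts below are made up for illustration; real pipelines use their own implementations.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts to a fixed depth, without replacement.

    counts: dict mapping OTU name -> read count (hypothetical example data).
    Returns None when the sample has fewer reads than `depth`.
    """
    total = sum(counts.values())
    if total < depth:
        return None  # sample would be excluded from the analysis
    # Expand to one label per read, then draw `depth` reads at random.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    rarefied = {}
    for otu in picked:
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied

# Hypothetical sample with 10,000 reads, rarefied to 4,000 reads.
sample = {"OTU_1": 6000, "OTU_2": 3000, "OTU_3": 1000}
rarefied = rarefy(sample, depth=4000)
print(rarefied)
```

This is why extra depth beyond the rarefaction level does not change the downstream comparison: everything above the chosen depth is thrown away.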

        We use the fastq-join utility to join reads (it's a quality score aware joiner, so low Q score pairs will be discarded). Is it possible that you're setting the quality filter too strict?
        Attached Files

        • #5
          Originally posted by MiSeqUserLUX View Post
          R2 rarely pass the quality filtering as we usually get only about 4000 reads per sample that pass.. Is this too low?
          That is odd. Have you run FastQC on these? Can you post Q-score plots for read 1 and 2? Do you know what Q-score cut-off your informatics people are using?

          • #6
            We expect the reads to overlap at least 40 bp.
            This might be part of your problem. I'm not a bioinformatics person, but I usually see a much larger overlap recommended (except by Illumina). The ends of Read 1 and (especially) Read 2 are much lower in quality than the start. If you only have a small overlap, you are trying to stick together two bad quality sections of your reads, which causes problems.

            I spiked-in 10% of 20pM PhiX and loaded 3.5 pM library
            I'm confused by this also. I spike 10% of 12.5 pM PhiX into a 9.5 pM library and see runs similar to the one you linked (cluster density ~900K, 10% phiX aligned). You're loading a lot more phiX and a lot less library, and only seeing 15% align.
            Last edited by microgirl123; 05-11-2015, 10:12 AM.
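To make the comparison of the two loading schemes concrete, here is a back-of-the-envelope sketch. It assumes "10%" means PhiX makes up 10% of the final loaded volume, and that the fraction of PhiX molecules roughly predicts the % aligned; both are assumptions, and real runs also depend on quantification accuracy and clustering efficiency. The function name is hypothetical.

```python
def phix_molar_fraction(phix_pm, library_pm, phix_volume_fraction):
    """Fraction of molecules in the loaded pool that are PhiX.

    Assumes the pool is made by mixing PhiX and library by volume
    (an interpretation of the "10%" quoted in the thread).
    """
    phix = phix_pm * phix_volume_fraction
    library = library_pm * (1.0 - phix_volume_fraction)
    return phix / (phix + library)

# 10% of 20 pM PhiX with a 3.5 pM library
run_a = phix_molar_fraction(20.0, 3.5, 0.10)
# 10% of 12.5 pM PhiX with a 9.5 pM library
run_b = phix_molar_fraction(12.5, 9.5, 0.10)

print(f"{run_a:.1%}")  # 38.8%
print(f"{run_b:.1%}")  # 12.8%
```

Under these assumptions the first scheme would be expected to give far more than 15% PhiX, which is the mismatch pointed out above, while the second lands near the ~10% aligned that was observed.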

            • #7
              If you only have a small overlap, you are trying to stick together two bad quality sections of your reads, which causes problems.
              We perform 2x300 bp PE reads for a 466 bp amplicon (we have an overlap of around 140 bp before the quality filtering). After quality filtering, we expect the reads to overlap by at least 40 bp. Then we will perform merging and classification. In this case, is 40 bp overlap after quality filtering still too small? Or is it enough?

              Do you know what Q-score cut-off your informatics people are using?
              Is it possible that you're setting the quality filter too strict?
              The read quality selection criteria that our bioinformatician has set are as follows:
              • Expected error over the whole read sequence < 1
              • Q score of each nucleotide in the read > 3
              • Length > 250 bp (to have an overlap > 40bp after quality filtering)
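A quick sanity check of the overlap arithmetic in this thread, as a sketch. It assumes the 466 bp figure is the full length spanned by the two reads and that filtering only shortens reads from the 3' end; the function name is hypothetical.

```python
def overlap_bp(amplicon_len, r1_len, r2_len):
    """Overlap between the two reads of a pair spanning an amplicon.

    A negative value would mean the reads do not meet at all.
    """
    return r1_len + r2_len - amplicon_len

AMPLICON = 466  # bp, as quoted in the thread

untrimmed = overlap_bp(AMPLICON, 300, 300)
just_over_cutoff = overlap_bp(AMPLICON, 251, 251)
min_combined_for_40 = AMPLICON + 40

print(untrimmed)            # 134
print(just_over_cutoff)     # 36
print(min_combined_for_40)  # 506
```

Under those assumptions, two untrimmed 300 bp reads overlap by 134 bp, reads just over the 250 bp cutoff overlap by only about 36 bp, and a 40 bp overlap needs the two reads to total at least 506 bp (for example 253 bp each).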

              Is this too strict or just right? As per our bioinformatician, the Q score and expected error values are the ones recommended by the UPARSE developers and in the UPARSE publication.
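For reference, the expected number of errors in a read is the sum of the per-base error probabilities, where a Phred score Q corresponds to an error probability of 10^(-Q/10). Below is a minimal sketch of the three criteria listed above applied to a single read. It assumes Phred+33 encoded quality strings, the function names are hypothetical, and this is not the USEARCH code itself.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer Q scores."""
    return [ord(c) - offset for c in quality_string]

def expected_errors(quality_string, offset=33):
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores(quality_string, offset))

def passes_filter(quality_string, max_ee=1.0, min_q=3, min_len=250):
    """Apply the thread's criteria: EE < 1, every Q > 3, length > 250."""
    scores = phred_scores(quality_string)
    return (
        len(scores) > min_len
        and all(q > min_q for q in scores)
        and expected_errors(quality_string) < max_ee
    )

# Made-up quality strings: '?' is Q30, '5' is Q20, 'D' is Q35, '0' is Q15.
all_q30 = "?" * 300
all_q20 = "5" * 300
good_start_bad_tail = "D" * 250 + "0" * 50

for name, qual in [("Q30", all_q30), ("Q20", all_q20), ("tail", good_start_bad_tail)]:
    print(name, f"{expected_errors(qual):.2f}", passes_filter(qual))
```

The third example shows why an untrimmed 300 bp read can fail an expected-error threshold of 1 even when most of it is high quality: 50 bases at Q15 alone contribute about 1.6 expected errors.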

              An example of our quality filtering result is attached.

              I have also just run FastQC on one of the samples. Attached are the results.

              Any thoughts?
              Thank you in advance.
              Attached Files

              • #8
                I think you meant to say Q30 (not 3) since your data does not seem to have any reads below Q5.

                If that is indeed Q30 (and above), then it seems to be a very stringent filter. Since the reads are expected to overlap, perhaps the merging should be done first, with the Q-score used as the criterion for keeping the base with the higher quality (if the merge is not perfect). Look into BBMerge (http://seqanswers.com/forums/showthread.php?t=43906) or FLASH (http://ccb.jhu.edu/software/FLASH/) as options.
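To illustrate the idea of merging first and letting the Q-score decide at disagreeing positions, here is a toy sketch. It is not how BBMerge or FLASH work internally; the function names, the overlap search, and the thresholds are simplifications chosen for illustration, and the sequences are made up.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(r1_seq, r1_qual, r2_seq, r2_qual,
               min_overlap=10, max_mismatch_frac=0.1):
    """Merge R1 with the reverse complement of R2.

    Qualities are lists of integer Phred scores. Tries the longest
    overlap first and returns (sequence, qualities), or None if no
    acceptable overlap is found.
    """
    r2_rc = reverse_complement(r2_seq)
    r2_q = r2_qual[::-1]
    for ov in range(min(len(r1_seq), len(r2_rc)), min_overlap - 1, -1):
        a, b = r1_seq[-ov:], r2_rc[:ov]
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches > max_mismatch_frac * ov:
            continue
        start = len(r1_seq) - ov
        seq = list(r1_seq[:start])
        qual = list(r1_qual[:start])
        for i in range(ov):
            q1, q2 = r1_qual[start + i], r2_q[i]
            # Keep whichever base call has the higher quality.
            if q1 >= q2:
                seq.append(a[i])
                qual.append(q1)
            else:
                seq.append(b[i])
                qual.append(q2)
        seq.extend(r2_rc[ov:])
        qual.extend(r2_q[ov:])
        return "".join(seq), qual
    return None

# Made-up 30 bp fragment read as two 20 bp reads (10 bp overlap).
fragment = "ACGTTGCAAGGCTTAACCGGTATCGATCCA"
r1 = "ACGTTGCAAGGCTTATCCGG"            # position 15 miscalled (A -> T)
r1_q = [35] * 20
r1_q[15] = 5                            # ...with a low quality score
r2 = reverse_complement(fragment[10:])
r2_q = [35] * 20

merged = merge_pair(r1, r1_q, r2, r2_q)
print(merged[0] == fragment)
```

In this example the low-quality miscall in R1 is replaced by the high-quality base from R2, so the merged read matches the original fragment. Filtering R1 and R2 separately before merging gives up this kind of rescue in the overlap.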

                • #9
                  Thank you GenoMax and thank you all for your replies.
                  I had a close look at our pipeline and indeed something needs to be fixed (the UPARSE workflow recommends merging paired reads before read quality filtering). So you're right, merging needs to be done first.

                  For some reason, I don't know why, our bioinformatician set up the pipeline this way:

                  STEP 1. Read quality filtering of R1 and R2 separately (this is where a lot of our reads are discarded, and the bioinformatician tells me that the MiSeq data are not usable).

                  IF the sequences pass STEP 1,
                  then step 2 is carried out...

                  STEP 2. Start again from the raw reads: merging of paired reads, read quality filtering... assembly...

                  I believe starting from step 2 would be sufficient.

                  • #10
                    Hello Fanli, I like the results of your 16S V4 MiSeq run. How much did you load? And what is the size of your library?

                    Thanks.

                    • #11
                      We load 8.0 pM library using the 515F/806R primers detailed here:


                      Edit: 1.8pM was for NextSeq runs
                      Last edited by fanli; 07-30-2015, 06:44 AM.

                      • #12
                        Fanli, thank you for the information. Two more questions: did you use the MiSeq v2 kit for this 16S V4 run? And why no PhiX spike-in (how did you decide on no PhiX: is it mentioned in a protocol, or did you test it)? I want to change my protocol, but I would like to know why. Thank you very much.

                        • #13
                          Yes, these numbers are for v2 kits. We've found that there's little difference with a small PhiX spike on our particular MiSeq, but I don't really see the harm in doing something in the 5% range. You generally aren't going to be constrained for sequencing depth with 16S anyways.

                          • #14
                            Thank you, Fanli.

                            • #15
                              Not meaning to hijack the thread, but can anyone explain why we see such low 1st cycle intensities with 16S libraries? I see this in both V3-V4 and V4-only libraries, perhaps due to low diversity? If I look at the run summary from targeted reseq or PhiX, the 1st cycle intensities are comparable and normally in the 300-400 range, but 16S runs are usually <50. Thanks for any insight.
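One way to put a number on "low diversity" is the Shannon entropy of the base composition at a given cycle: 0 bits when every read has the same base, 2 bits when A, C, G and T are equally common. The sketch below only measures diversity; it does not show that diversity is the cause of the low intensities. The function name and the reads are made up for illustration.

```python
from collections import Counter
from math import log2

def cycle_entropy(reads, cycle):
    """Shannon entropy (bits) of the base composition at one cycle (0-based)."""
    bases = Counter(r[cycle] for r in reads if len(r) > cycle)
    total = sum(bases.values())
    return sum((n / total) * log2(total / n) for n in bases.values())

# Amplicon-like reads: every read starts with the same primer bases.
amplicon_reads = ["GTGCCA", "GTGCCT", "GTGCCG", "GTGCCC"]
# Balanced reads: all four bases equally represented at the first cycle.
balanced_reads = ["ACGTAC", "CGTACG", "GTACGT", "TACGTA"]

print(cycle_entropy(amplicon_reads, 0))  # 0.0
print(cycle_entropy(balanced_reads, 0))  # 2.0
```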