  • Poor seq quality due to low diversity sample

    Hi,

    I am analyzing some sets of HiSeq data, and the sequencing quality turned out quite bad. I have attached the "per base sequence quality" and "per tile sequence quality" plots for one of those sets, generated with FastQC.

    I contacted the service provider, and they say it's due to my sample having low diversity, especially at the beginning. (I have also attached the sequence content plot.)
    Based on some searching and reading of Illumina tech notes, I understand that diversity in the first several bases is quite important for the system to "calibrate" correctly and make good-quality base calls for later bases.
    My first question: is this roughly a correct interpretation? And is there any way to "post-process" the raw(er) data to correct or improve the reads?

    Second, what I still don't understand is why it affects the per tile sequence quality. How does low diversity at the initial bases have anything to do with spatial variation in sequence quality?

    What do you guys think?
    What should I argue when replying to my service provider? Should I ask for a re-run?

    Any note will be greatly appreciated!
    Thanks.
    Last edited by invu; 04-22-2020, 11:06 AM.

  • #2
    Yikes, this is a really low diversity sample. Do you know how much PhiX (if any) was added to this sample? Did you not tell the sequencing provider that these were low diversity? If you did not, it would be hard to make a case for them to re-sequence this sample for free. You may have to pay for a re-run with a significant percentage of PhiX (10-20% or more) if you want improved Q-scores.

    It is possible that, in spite of the bad Q-scores, your sequence may still be usable. Have you looked at that?
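
One rough way to check whether the reads are still usable is to count how many pass a quality cutoff. The following is a minimal Python sketch, not a substitute for a proper QC tool. It assumes Phred+33 encoded FASTQ with exactly four lines per record; the function names and the mean-quality cutoff of 30 are illustrative assumptions, not values from this thread.

```python
def mean_phred(quality_line, offset=33):
    """Mean Phred score of one FASTQ quality string (Phred+33 assumed)."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores)


def usable_fraction(fastq_lines, min_mean_q=30.0):
    """Fraction of reads whose mean quality is at least min_mean_q.

    fastq_lines is any iterable of lines (an open file works), with
    strictly four lines per record: header, sequence, '+', quality.
    """
    total = 0
    passed = 0
    for index, line in enumerate(fastq_lines):
        if index % 4 != 3:          # only the 4th line of a record holds qualities
            continue
        quality = line.rstrip("\r\n")
        if not quality:             # skip empty quality strings
            continue
        total += 1
        if mean_phred(quality) >= min_mean_q:
            passed += 1
    return passed / total if total else 0.0


# Tiny made-up example: one high-quality read ('I' = Q40), one poor read ('#' = Q2).
demo = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "####"]
print(usable_fraction(demo))
```

With real data the same function can be given an open file handle. Mean quality is a crude filter; checking how many reads match the expected custom sequences would be a more direct test of usability.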


    • #3
      Thanks for your reply, GenoMax!
      The sample is a custom set of sequences with well-defined regions (hence the low-diversity regions). I had declined the PhiX spike-in to obtain as many valid reads as possible without sacrificing any to PhiX. I hadn't told them about the diversity because I had no idea about this kind of issue before. That said, my old results for similar samples (which did have a few degenerate bases at the beginning) didn't have this problem, or at least weren't as bad as this. Will I really need PhiX if I repeat something like this? Which way will I lose more data: a 10-20% loss to PhiX, or a less well-defined loss to poor-quality reads like this?

      I am looking at the data, and a big portion of the reads do seem valid and usable. But ideally I'd need more reads, and more importantly, if this issue caused extra base call errors even among the reads that look okay, that's a separate problem, which is quite hard to detect just by looking at those reads.

      Do you happen to know whether someone looking at the rawer data (e.g., the imaging data, if it is preserved; sorry, I'm not really familiar with the details of the sequencing machines) could correct or improve the base calls even now? Or is everything done in real time by the machine, so that nothing can be done to improve this?
      Also, do you know whether low diversity would also cause the tile-dependent quality loss shown in my plot? (This is something I am having a hard time understanding, and something I'm trying to argue about.)
      Last edited by invu; 04-22-2020, 02:33 PM.


      • #4
        You really should have asked for PhiX to be added. Consider that this run could have failed completely if it had been a bit overloaded, leaving you with no data. Raw image data is generally not stored nowadays, so there is not much you can do with it afterwards. If you need more data, consider sequencing an additional lane rather than taking a chance like this.


        • #5
          Ha, I see. Lesson learned. Thanks for your help, GenoMax!


          • #6
            If these are amplicon libraries and you want to minimize the amount of PhiX, you can add "stagger" or "offset" nucleotides between the Illumina sequencing primer region (like the Nextera or TruSeq tail) and your locus-specific primer in order to create base diversity. These stagger nucleotides can also be added to restriction-digest adapters to increase base diversity.

            I always add staggers to my amplicon primers and sequence multiple amplicons per run to increase diversity, but I still always add 5-12% PhiX just to be sure.
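
To illustrate the stagger idea, here is a small Python sketch that builds a set of staggered primers and shows how each one shifts the bases seen in the first cycles. All sequences are made-up placeholders (not the real Nextera or TruSeq tails or any real locus primer), and the function names and the choice of 0 to 3 stagger bases are assumptions for illustration only. It also assumes the read starts at the first base after the sequencing primer tail.

```python
def staggered_primers(tail, locus_primer, staggers):
    """Return one full primer per stagger: tail + stagger + locus-specific primer."""
    return [tail + stagger + locus_primer for stagger in staggers]


def read_start(stagger, locus_primer, n_cycles=8):
    """Bases expected in the first n_cycles of the read.

    Assumes the read begins right after the sequencing primer tail,
    so it starts with the stagger followed by the locus-specific primer.
    """
    return (stagger + locus_primer)[:n_cycles]


# Placeholder sequences for illustration only.
TAIL = "ACGTACGTAC"               # stand-in for the sequencing primer tail
LOCUS_PRIMER = "GGATTACCGTTGCA"   # stand-in for a locus-specific primer
STAGGERS = ["", "T", "GA", "CTG"]  # 0 to 3 offset bases

primers = staggered_primers(TAIL, LOCUS_PRIMER, STAGGERS)
for stagger, primer in zip(STAGGERS, primers):
    start = read_start(stagger, LOCUS_PRIMER)
    print(f"{len(stagger)}-base stagger: {primer}  read starts with {start}")
```

Because each stagger shifts the locus primer by a different number of cycles, the pooled reads no longer show the same base at the same cycle.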


            • #7
              Thanks, ATϟGC, that's a good suggestion.
              Looking back, the adapter-primers I used for my older runs, when I didn't have this issue, did have some degenerate bases in between (included for a different purpose), and I think that was key in preventing this issue.

              Still, adding a minimal portion of PhiX is a good suggestion, too.
              Thanks!!


              • #8
                I'd have to agree with GenoMax: it's super important to consult with the sequencing center about the library composition and ask them what they recommend. You probably should have had a 10% PhiX spike-in added. HiSeqs are terrible at dynamic calibration; MiSeqs are better (to a point).


                • #9
                  I see. Next time I will consider PhiX spike-in. Thanks, cement_head!


                  • #10
                    I agree that it would be best to discuss these issues with your sequencing provider.

                    If you do choose to use staggered bases, I recommend making an alignment to check base diversity in the first 12-20 bases of Read 1. This alignment should be made with respect to the Illumina sequencing primer. For my amplicon libraries, this means I anchor it on the left at the Nextera Read 1 sequence. You then only need to consider the base diversity of your staggered and/or unstaggered primers or adapters (I use a mix of both in my round 1 PCR reactions). I do this in Microsoft Excel so that I can calculate and optimize the base diversity of all the amplicons that will be pooled in my run.
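
The same per-cycle check can be scripted instead of done in a spreadsheet. Below is a minimal Python sketch under a few assumptions: every sequence in the pool is already anchored at the first base after the sequencing primer, all amplicons are equally represented, and a cycle is flagged when one base exceeds 50% of the pool. The 50% cutoff, the function names, and the example sequences are illustrative assumptions, not recommended values.

```python
from collections import Counter


def per_cycle_composition(read_starts, n_cycles=12):
    """For each of the first n_cycles, return the fraction of A, C, G and T in the pool."""
    composition = []
    for cycle in range(n_cycles):
        # Collect the base each pooled sequence shows at this cycle.
        bases = [seq[cycle] for seq in read_starts if len(seq) > cycle]
        counts = Counter(bases)
        total = len(bases)
        composition.append(
            {base: (counts[base] / total if total else 0.0) for base in "ACGT"}
        )
    return composition


def low_diversity_cycles(composition, max_fraction=0.5):
    """Return 1-based cycle numbers where a single base exceeds max_fraction."""
    return [
        number
        for number, fractions in enumerate(composition, start=1)
        if max(fractions.values()) > max_fraction
    ]


# Made-up pool: one placeholder locus primer with 0, 1, 2 and 3 stagger bases.
staggered_pool = ["GGATTACC", "TGGATTAC", "GAGGATTA", "CTGGGATT"]
unstaggered_pool = ["GGATTACC"] * 4

print(low_diversity_cycles(per_cycle_composition(staggered_pool, n_cycles=8)))
print(low_diversity_cycles(per_cycle_composition(unstaggered_pool, n_cycles=8)))
```

In this toy pool the staggered design leaves only one flagged cycle, while the unstaggered design is flagged at every cycle. Unequal pooling ratios could be handled by repeating sequences in proportion to their expected share of the run.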

                    Adding stagger bases has the potential to introduce biases in your libraries due to secondary structures or other priming phenomena. If you use the same mix of staggers for all samples, the bias should in theory be the same.

                    I have only sequenced amplicons on MiSeq and NovaSeq, and 5-12% PhiX has been enough for me on those platforms, so I cannot comment on HiSeq.
