
  • Disappearing Clusters - What is going on???

    We recently submitted a ChIP-seq library (avg fragment size=700bp) for sequencing with our core facility (HiSeq 2000), and a weird problem came up - half the clusters disappeared after the first cycle!

    Here is how the core described the problem to us:

    "The first cycle was good but starting from the second cycle, the cluster number drastically went down (e.g. the first cycle has 60 million clusters but the second cycle only has 30 million clusters). It is almost as if starting from the first cycle, the fluorescent labeled nucleotides were only incorporating to half of the clusters, and the other half stayed dark because they are not base pairing with the nucleotides. However, other lanes were fine. And I know the clusters are still there because all of them showed up again in index sequencing."

    Because we thought there was a problem with the machine we re-ran the samples but we got exactly the same issue.

    Does anyone have any idea why this might be happening? Could it have something to do with the larger-than-usual fragment size?

    Any suggestions for troubleshooting this problem?

  • #2
    As a member of a core that runs a HiSeq, I have to say the description your core is giving doesn't make sense.

    The HiSeq doesn't really count clusters until cycle 4. I mean it gives a rough estimate of the number of clusters in the "First Base Report (FBR)", after cycle one. But there is no "second base report". So it isn't clear to me how your core is assessing the number of clusters as being "60 million" in the first cycle and "30 million" in the second cycle.

    The only thing I can think of is that someone in your core is monitoring the scan thumbnails and noticed a difference in the apparent density between cycle 1 and cycle 2. But because the images are broken up into 4 channels (one for each base), it would not be trivial to determine that 1/2 of your clusters had "gone dark" during cycle 2. So what may be happening is that the base composition differs between cycles 1 and 2. That is, they may be looking at the apparent cluster density of cycle 1 in channel "A" and then seeing that only about 1/2 of the clusters that were "A" in cycle 1 are "A" in cycle 2. This is fairly common and not a cause for concern.

    The fragment size would not cause the issue you describe. We have sequenced libraries on the HiSeq with an average insert size of over 1kb.

    You didn't mention what your actual yield was after the run completed. What was it?

    --
    Phillip



    • #3
      I suppose a first base report could give a high count and then the actual cluster count be lower, especially for an over-clustered lane. But I don't know how you would get a count of the clusters from the index read as there's no counting done there.

      I'd like to see %base and Q30 graphs as well as the summary table.



      • #4
        Thanks for your comments - I will contact the core and try to get a more detailed description of the problems they observed during the run.

        In the end, the reads that we did get back for our 4 libraries (~20-30mil reads per library) were all of very poor quality. When processing the reads before mapping, about 80% of the reads were filtered out due to low quality. And sadly, the remaining reads didn't map well to the genome.

        The experiment will obviously have to be repeated, as the data we got out doesn't seem too reliable. I'm just not sure whether the problems are stemming from the original samples, the library prep, or the sequencing run... or a combination.

        Back to the drawing board.



        • #5
          This sounds like a bad set of library/samples. What cutoff was used that eliminated 80% of the reads (was the trimming purely done on quality threshold or presence of adapter)?



          • #6
            The majority of the reads were lost when I filtered by quality - reads where 90% of cycles didn't have a quality score better than 20 were discarded. Pretty abysmal dataset.
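A minimal sketch of this kind of filter, assuming it means "keep a read only if at least 90% of its cycles have a quality score of 20 or higher" and assuming Phred+33 encoded quality strings (older pipelines may use Phred+64). The function names and toy reads are hypothetical, not taken from the tool actually used in this thread.

```python
def phred33_to_scores(qual_string):
    """Convert a Phred+33 encoded quality string to integer scores (assumed encoding)."""
    return [ord(c) - 33 for c in qual_string]


def passes_quality_filter(quals, min_q=20, min_fraction=0.9):
    """Return True if at least min_fraction of cycles have quality >= min_q."""
    if not quals:
        return False
    good = sum(1 for q in quals if q >= min_q)
    return good / len(quals) >= min_fraction


# Toy 36-cycle reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
reads = {
    "good": "I" * 36,                   # every cycle is Q40
    "borderline": "I" * 32 + "#" * 4,   # 32 of 36 cycles pass, just under 90%
    "bad": "#" * 36,                    # every cycle is Q2
}

kept = [name for name, qual in reads.items()
        if passes_quality_filter(phred33_to_scores(qual))]
print(kept)
```

With a threshold like this, a read with only a few poor cycles near the end is already discarded, so a run with generally depressed quality scores can lose most of its reads.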

            I didn't do the ChIP or Library prep, I've just been dealing with trying to process and make sense of the sequencing data. Any thoughts on what may have caused such poor quality scores?

            I'd like to be able to give some advice to the poor postdoc who has to repeat this experiment, so another failure could be avoided!



            • #7
              Can you post FastQC plots? ChIP-seq libraries are going to be strange (compared to WGS) and as a result FastQC will likely flag several modules with a warning/fail.



              • #8
                Also would be good to see information about the libraries -- posting Bioanalyzer traces or the like.

                Who did the titration of the library and what concentration was it clustered at?

                --
                Phillip



                • #9
                  Here is the FastQC report from one of the samples before any filtering or adapter clipping. The reports for the other samples were very similar. Magnificently bad, right?

                  I'm working on getting the Bioanalyzer traces from the person that prepared the libraries. I'll definitely post when I get them.

                  Thanks for your help everyone!
                  ~Megan
                  Attached Files



                  • #10
                    Yes, the Q-scores are bad across the read, but the more worrisome observation is the inability to align the reads that survived the trimming back to the genome. Did you scan for and remove adapters in addition to quality trimming? What fraction successfully mapped?

                    Was the run these samples were on fine otherwise (i.e. there were no technical issues, other samples look great)?

                    Phillip should be able to assess the Bioanalyzer traces once you post them.



                    • #11
                      75% of your reads were "A" at cycle 1. Any idea why this is?
                      I'm guessing this is the source of your core's contention that 1/2 of your clusters were "disappearing" after the first cycle. They were apparently looking at the "A" channel and didn't realize that the clusters were "disappearing" into the other channels.

                      If your core's version of HiSeq Control Software is not fairly recent, that very high amount of "A" for the first base may be causing problems.
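The per-channel effect can be illustrated with a small sketch that tallies base composition per cycle, similar in spirit to the per-base sequence content plot in FastQC. The function name and the four toy reads are hypothetical; real data would be parsed from the FASTQ file.

```python
from collections import Counter


def base_fractions_by_cycle(reads, n_cycles):
    """Return, for each cycle, the fraction of reads calling A, C, G and T."""
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(read[cycle] for read in reads if len(read) > cycle)
        total = sum(counts.values())
        if total == 0:
            fractions.append({})
            continue
        fractions.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return fractions


# Toy reads chosen so that 75% start with "A", as in the posted report.
reads = ["ACGT", "AGGT", "ATCA", "CAGT"]
fractions = base_fractions_by_cycle(reads, 2)

for cycle, frac in enumerate(fractions, start=1):
    print(f"cycle {cycle}: A fraction = {frac['A']:.2f}")
```

In this toy case the "A" channel shows three times as many clusters at cycle 1 as at cycle 2, even though no cluster has gone dark; the clusters simply light up in other channels.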

                      --
                      Phillip



                      • #12
                        Also, even with the new "high bias compatible" HCS software, you have to watch your cluster density. Even though the cluster density might seem okay, you have to consider that during the first cycle 75% of the clusters are getting jammed into a single channel. So they are effectively 3x higher density than if the clusters were spread evenly.
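The 3x figure follows from comparing the share of clusters in the crowded channel with the 25% expected for a balanced library. A minimal sketch of that arithmetic, where the nominal density of 700 K/mm² is a made-up illustrative value and not a number from this run:

```python
def effective_density_factor(channel_fraction, balanced_fraction=0.25):
    """How much more crowded one channel is than under an even base split."""
    return channel_fraction / balanced_fraction


def per_channel_load(nominal_density, channel_fraction):
    """Density of clusters that light up in a single channel in one cycle."""
    return nominal_density * channel_fraction


factor = effective_density_factor(0.75)     # 75% of clusters are "A" at cycle 1
biased_load = per_channel_load(700, 0.75)   # hypothetical 700 K/mm^2 lane
balanced_load = per_channel_load(700, 0.25)

print(factor, biased_load, balanced_load)
```

So a lane whose overall density looks acceptable can still present the instrument with a single channel that is as crowded as a much denser balanced lane would be.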

                        Might be good to ask the core what cluster density they saw for that lane and what the pass filter percent (PF%) was.

                        Were your 4 libraries the only ones in that lane? If so, the number of reads (~100M) seems a little low for a high-output flowcell. Normally you would expect 180-240M reads. It could have been under-clustered. But it might also have been over-clustered.

                        If a lane is sufficiently over-clustered the number of clusters counted by the software is actually less (sometimes dramatically less) than the actual number of clusters on the flowcell. Alas, there isn't a good metric to determine this is the case other than actually looking at some of the tile thumbnails. Well, FWHM is sort of a measure of average cluster diameter so it might be diagnostic for this issue.

                        --
                        Phillip



                        • #13
                          I hadn't noticed the "A" bias at cycle 1 - thank you for pointing that out, Phillip! Since that is the report from the "Input" sample, I really can't think of a reason why that would happen. I just looked at the reports from the other 3 samples (ChIP replicates), and in each case ~50% of the reads were "A" at cycle 1. This seems suspicious. I think you're correct that this may explain what the core saw as "disappearing" clusters.

                          As for mapping: after filtering/clipping, 91% of the Input reads mapped back to mm9, but only 10-15% of the reads from the ChIP replicates mapped. (Which means that only 2-5% of the total reads were mapped for the ChIP samples.)

                          I will check with the core regarding cluster density and PF%. If the lane was in fact over-clustered, could that cause the overall low quality scores for the reads?



                          • #14
                            Originally posted by MeganH:
                            "I will check with the core regarding cluster density and PF%. If the lane was in fact over-clustered, could that cause the overall low quality scores for the reads?"

                            Yes. And if there are high-bias cycles in your sample, the effect is generally worse. That is, the instrument can tolerate higher cluster densities from non-biased libraries.

                            --
                            Phillip
