duplicate reads in ChIPSeq

  • duplicate reads in ChIPSeq

    Hello community,

    we have a problem concerning an Illumina-sequenced ChIPSeq experiment.
    After mapping and viewing the reads in the UCSC Genome Browser, we were surprised to find that 99% of the reads map to a small number of unique locations. The corresponding reads share the same start and end coordinates, and there are no additional clusters of reads surrounding each location within the range of the original fragment length.

    Does anyone have an idea? I would very much appreciate your assistance.

    tec

  • #2
    I'm not really clear on what you're saying here. Do you find that 99% of your reads are the same sequence, with exactly the same start and end positions? If that's the case I'd suspect that you may have just ended up sequencing a primer rather than your library. Sometimes these primer sequences can map to a reference genome and give a false impression that you're seeing a real genomic sequence.

    Alternatively, are you saying that you have many clusters (if so, how many?), but that in each one you see just a single read duplicated many times, with no other overlapping reads? In this case I'd suspect a problem with your library preparation, probably in one of the PCR steps. This is assuming that your library was prepared using random fragmentation (sonication or similar). If your library was generated by restriction digestion then this is what you'd expect to see.

    Have you checked the mapping efficiency of your sequence (i.e. what proportion of clusters could be mapped to your reference)? This might give a clue as to what's gone wrong.
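To tell the two scenarios above apart, it can help to count how many reads share each mapped position. The following is a minimal sketch, assuming the alignments have already been parsed into `(chrom, start, strand)` tuples; the function name `duplication_summary` and the toy data are hypothetical and not part of any mapper's output format.

```python
from collections import Counter


def duplication_summary(alignments):
    """Summarise positional duplication.

    alignments: iterable of (chrom, start, strand) tuples, one per mapped read.
    Reads with identical tuples are treated as duplicates of each other.
    """
    counts = Counter(alignments)
    total = sum(counts.values())
    unique = len(counts)
    return {
        "total_reads": total,
        "unique_positions": unique,
        # Fraction of reads that are extra copies of an already-seen position
        "duplicate_fraction": (total - unique) / total if total else 0.0,
        # The most heavily duplicated positions, for a quick look in a browser
        "top_positions": counts.most_common(3),
    }


# Toy input (made up for illustration): 9 reads at 3 distinct positions
reads = [("chr1", 100, "+")] * 5 + [("chr1", 250, "-")] * 3 + [("chr2", 40, "+")]
summary = duplication_summary(reads)
print(summary)
```

If nearly all reads pile onto one or a handful of positions, a primer or adapter artefact is plausible; thousands of positions, each with one heavily duplicated read, points more towards a low-complexity library.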


    • #3
      duplicate reads in ChIPSeq

      Hello simonandrews,

      -> Alternatively are you saying that you have many clusters (if so, how many?), but that in each one you see just a single read duplicated many times, with no other overlapping reads?

      That's exactly what I see. I work with the human genome and can detect clusters on every chromosome. Mapping ~5 million single reads with seqmap yields ~15,000 unique locations of single reads; all the other reads fall on these locations (duplicates). The mapping efficiency is ~65%, as expected.
      (Mapping with ELAND gives the same proportion.)

      The library was prepared using random fragmentation (sonication) and the initial fragment length is ~200-400 bp.

      I have no idea what's gone wrong. What could have happened during the library preparation?

      Thanks! tec
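As a rough check, the figures quoted in this post can be turned into a duplication rate. This is only back-of-the-envelope arithmetic: it assumes the ~15,000 unique locations are counted among the mapped reads and treats the approximate values from the post as exact.

```python
# Approximate figures from the post above (all marked "~" there)
total_reads = 5_000_000
mapping_efficiency = 0.65
unique_locations = 15_000

# Assumption: the unique locations are counted among the mapped reads only
mapped_reads = total_reads * mapping_efficiency
unique_fraction = unique_locations / mapped_reads
duplicate_fraction = 1 - unique_fraction
mean_copies_per_location = mapped_reads / unique_locations

print(f"mapped reads:             {mapped_reads:,.0f}")
print(f"duplicate fraction:       {duplicate_fraction:.1%}")
print(f"mean copies per location: {mean_copies_per_location:.0f}")
```

Under these assumptions more than 99% of the mapped reads are duplicates, with each location seen roughly two hundred times on average, which matches the "99%" in the original question.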


      • #4
        My immediate thought would be that you could have had a step in your library prep where you lost virtually all of your input material, and that a subsequent PCR step dramatically amplified what was left and produced a large number of duplicated reads.


        • #5
          OK, but how could this happen? (A virtual loss of all the material?)

          The library was prepared using the standard Illumina protocol and kit.
          We sequenced another ChIPSeq experiment and there was no such problem.

          Thanks! tec


          • #6
            duplicate reads in ChIPSeq !?

            Hello all,

            the problem with duplicate reads still keeps me busy.
            We therefore performed a TOPO cloning and resequencing check of the library.
            Surprisingly, over 75% of the clones were unique, which doesn't agree with the sequencing run!

            Does anyone have an idea?

            Thanks! tec


            • #7
              That's just a sampling issue.

              Say there are only 1000 unique molecules in the library:

              If you TOPO/Sanger sequence 100 clones, only a few will look like duplicates.

              But if you next-gen sequence 10,000 reads, most will look like duplicates.

              Make another library with more DNA input, less PCR...
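The sampling argument above can be checked with a short calculation. This is a sketch under a simplifying assumption that is not stated in the post: every unique molecule is equally represented after PCR, and reads are drawn independently (with replacement). Real libraries are usually skewed, which would raise the duplicate rate at any depth. The function names are illustrative only.

```python
def expected_unique(library_size, reads):
    """Expected number of distinct molecules seen when drawing `reads` reads
    uniformly at random (with replacement) from `library_size` molecules."""
    return library_size * (1 - (1 - 1 / library_size) ** reads)


def expected_duplicate_fraction(library_size, reads):
    """Expected fraction of reads that repeat an already-seen molecule."""
    return 1 - expected_unique(library_size, reads) / reads


library_size = 1000  # unique molecules in the library, as in the post

sanger = expected_duplicate_fraction(library_size, 100)      # 100 clones
nextgen = expected_duplicate_fraction(library_size, 10_000)  # 10,000 reads

print(f"100 clones:   {sanger:.1%} duplicates")
print(f"10,000 reads: {nextgen:.1%} duplicates")
```

With these numbers roughly 5% of 100 clones would be duplicates, against about 90% of 10,000 reads, so the same library can look diverse in a small clone check and highly duplicated on the sequencer.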


              • #8
                Originally posted by dvh:
                That's just a sampling issue.

                Say there are only 1000 unique molecules in the library:

                If you TOPO/Sanger sequence 100 clones, only a few will look like duplicates.

                But if you next-gen sequence 10,000 reads, most will look like duplicates.

                Make another library with more DNA input, less PCR...
                I agree!
                But given that another library showed exactly the same distribution in the TOPO/Sanger sequencing, and its Illumina sequencing gave nice results, I am confused.
                Is it possible that something went wrong during preparation of the flow cell (e.g. cluster generation) that could have led to this result?
