Ion Torrent data quality impressions?


  • Ion Torrent data quality impressions?

    Hi.
    We have an Ion Torrent PGM on which we have sequenced a number of genomes. Looking at the TMAP alignments, we definitely see indel-type sequencing errors around homopolymer regions in the reference. The homopolymer runs don't have to be longer than two consecutive bases, either.

    Some quick analysis showed me that the positions of the indels in the reads aren't completely random, and do stack up in the alignments, which could cause false positives in the indel calling (as well as potentially interfering with true variants that are in the region).
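
    One way to check whether indels cluster at homopolymers is to locate every run of identical bases in the reference and test whether an indel position falls inside or right next to one. The following is only a minimal sketch: the function names, the 0-based coordinates, and the default minimum run length of 2 are illustrative assumptions, not part of TMAP or any Ion Torrent tool.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=2):
    """Return (start, end, base) for each run of identical bases.

    Coordinates are 0-based with an exclusive end. min_len=2 is an
    assumption matching the observation that two-base runs already show errors.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, pos + length, base))
        pos += length
    return runs


def indel_in_homopolymer(seq, indel_pos, min_len=2):
    """True if a 0-based reference position is inside or adjacent to a run."""
    return any(
        start - 1 <= indel_pos <= end
        for start, end, _ in homopolymer_runs(seq, min_len)
    )


if __name__ == "__main__":
    reference = "GACGAACG"  # made-up sequence for illustration
    print(homopolymer_runs(reference))
    print(indel_in_homopolymer(reference, 1))
    print(indel_in_homopolymer(reference, 5))
```

    Tallying how many indel positions pass this test, compared with positions outside any run, gives a rough picture of how non-random the errors are.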

    Downloading example E. coli data from the Life website, I see the same sorts of errors.

    To my dismay when I google this I find all sorts of technical and sponsored reports from Illumina et al, pointing out the errors in Ion Torrent data. Furthermore, I see the reports getting fired back from Life discounting all the analysis in the first paper, and so it continues.

    My question:
    Can anyone who has sequenced and analyzed data on the PGM objectively comment on the rate of indels in the reads? I would like to hear if other people have seen what I've seen, or even better if they know of a magic fix.

    thanks!

  • #2
    (In reply to rcorbett's original post.)
    I've posted on the Ion Community site recently about some very non-random errors; generally a C or T deletion in a CCTT type motif. Link is http://lifetech-it.hosted.jivesoftwa...sage/6220#6220 .
    I had waited to see what information I got from Life Tech about it before coming to this site. They suggested it was a problematic sequence. Would be happy to provide more details or discuss it offline.

    Hilary Morrison

    • #3
      Here's a recording of Dr. Niall Lennon from the Broad on their experiences with semiconductor sequencing.

      http://www.youtube.com/watch?v=N2nbbBo0zT0

      • #4
        error model available for indels?

        We are looking into amplicon sequencing for variant detection, some of the genes have several repeat regions and can generate a lot of false positive heterozygous indels.

        Is there a model describing how the Ion Torrent generates read errors in this area? With such a model we could adapt our filtering strategy to reduce the false positive rate (although we want to be sure not to miss a true positive).
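
        In the absence of a published error model, one simple heuristic is to demand more evidence for an indel the longer the homopolymer it sits in, and to require support on both strands. The sketch below is not an Ion Torrent error model. The `IndelCall` fields and every threshold are hypothetical values chosen for illustration and would need tuning against control samples.

```python
from dataclasses import dataclass


@dataclass
class IndelCall:
    pos: int          # reference position of the candidate indel
    hp_len: int       # homopolymer length at the site (1 = no run)
    fwd_support: int  # forward-strand reads supporting the indel
    rev_support: int  # reverse-strand reads supporting the indel
    depth: int        # total reads covering the site


def passes_filter(call, base_min_frac=0.2, hp_penalty=0.05,
                  max_required=0.4, min_per_strand=2):
    """Heuristic filter; all thresholds are illustrative assumptions.

    The required supporting fraction grows with homopolymer length but is
    capped at max_required so a heterozygous indel near 50% can still pass.
    """
    if call.depth == 0:
        return False
    frac = (call.fwd_support + call.rev_support) / call.depth
    required = min(max_required,
                   base_min_frac + hp_penalty * max(call.hp_len - 1, 0))
    both_strands = (call.fwd_support >= min_per_strand
                    and call.rev_support >= min_per_strand)
    return both_strands and frac >= required


if __name__ == "__main__":
    candidates = [
        IndelCall(pos=100, hp_len=1, fwd_support=20, rev_support=20, depth=100),
        IndelCall(pos=200, hp_len=5, fwd_support=15, rev_support=15, depth=100),
        IndelCall(pos=300, hp_len=1, fwd_support=40, rev_support=0, depth=100),
    ]
    for c in candidates:
        print(c.pos, passes_filter(c))
```

        The cap on the required fraction is the design choice that protects true heterozygous calls. Raising it lowers false positives in repeat regions at the cost of sensitivity.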

        • #5
          (In reply to IonTorrent's post #3.)
          I don't need the advertisement.

          • #6
            Hilary,
            I tried to follow your link to the IT community website, but I get an error message:
            "It appears you're not allowed to view what you requested"
            (I am registered, but not as an IT customer).
            David

            • #7
              Video from Broad: I just lost 10 minutes watching it, and it has no information on indel read errors.

              • #8
                (In reply to david2's post #6.)
                This is what I posted there:

                "I've just finished looking through a set of reads from control templates (16S tag sequencing using fusion primers) and see a very interesting (and sad) error pattern. In this image, the top sequence is the most abundant *incorrect* read; the bottom (blue) is the correct read. The number of each is at the left. E. coli tag results: less than 4% perfect reads. We have 43 controls including K12; the percent correct varied from 0% up to 82%. It seems to have happened on both runs, same day; one was a 314 and the other a 316. Some more investigation to do."

                They told us it was a difficult sequence and to try the enzyme in the 200 nt sequencing kit.

                Attached file: IT_ErrorPattern.jpg
                Last edited by HMorrison; 03-02-2012, 06:14 AM. Reason: added text

                • #9
                  Thanks Hilary for the details, interesting case indeed.

                  • #10
                    Unfortunately, this is an inherent problem of the 454, Ion Torrent and probably the Proton chemistry. It is well documented. This is from the NEJM article on the sequencing of the German E. coli outbreak:
                    "We also performed sequencing on the Illumina HiSeq platform in accordance with the manufacturer's instructions. An initial single-end run was used to correct errors in the Ion Torrent sequence, principally in homopolymeric tracts."

                    http://www.nejm.org/doi/full/10.1056...featured_home&

                    • #11
                      CCTT calling error

                      Wow, this would be a very big issue if it proved to be a reproducible error for the PGM. But given how many CCTTs there are in genomes (occurring once in every 256 bp in totally random sequence), one would imagine this would have been identified much earlier in-house by LT. Looks like it may have something to do with the specific context within which a CCTT lies?
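
                      The 256 bp figure follows from assuming four equally likely, independent bases: a specific 4-mer then occurs at any given position with probability 1/4^4 = 1/256. The snippet below is a small sketch of that arithmetic plus an overlapping motif counter for comparing the expectation with a real sequence. The function names and the toy sequence are made up for illustration.

```python
def expected_spacing(motif):
    """Expected bases between occurrences of a motif on one strand,
    assuming uniform, independent base frequencies (an idealisation)."""
    return 4 ** len(motif)


def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    seq = seq.upper()
    motif = motif.upper()
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


if __name__ == "__main__":
    print(expected_spacing("CCTT"))             # 4**4
    toy = "ACCTTGCCTTCC"                        # made-up sequence
    print(count_motif(toy, "CCTT"))
    # Reads can come from either strand, so the reverse complement (AAGG)
    # is worth counting as well when scanning a reference.
    print(count_motif(toy, "AAGG"))
```

                      Real genomes have skewed base composition, so the observed count for a given reference can differ noticeably from the uniform expectation.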

                      Thanks Hilary for the very intriguing observation. Have any other PGM users seen this?
                      Last edited by ngseq; 03-17-2012, 04:33 PM.

                      • #12
                        Non-random error reduced with new enzyme

                        I would love to know what the two different enzymes are, but whatever enzyme is included in the 200 nt PGM sequencing kit has almost eliminated the problem I first reported. Errors are now mainly in what I would consider true homopolymer runs (i.e. more than two of the same base). We are much more likely to continue using the system for pyrotag-like (ph-tag?) sequencing.

                        • #13
                          FWIW, there's a parallel discussion about this at the Ion Torrent community here.

                          http://lifetech-it.hosted.jivesoftwa.../2299?tstart=0

                          As HMorrison mentioned, the 200 nt kit largely eliminates this issue.
                          @bioinformer
                          http://www.linkedin.com/in/jonathanjacobs

                          • #14
                            Parallel discussions

                            I know; I posted it there too. LifeTech doesn't seem to like me using SeqAnswers exclusively.

                            • #15
                              Thanks! Looks like we should try to stick to the 200 nt kits.
