  • iamh2o
    Junior Member
    • Mar 2009
    • 6

    Parser For Illumina InterOp Binary Metric Files

    Hi All-

    I'm trying to locate code that will extract the various helpful QC metrics locked in the Illumina binary InterOp files.

    Specifically:
    ControlMetrics.bin ExtractionMetrics.bin QMetrics.bin
    ControlMetricsOut.bin ExtractionMetricsOut.bin QMetricsOut.bin
    CorrectedIntMetrics.bin IndexMetrics.bin TileMetrics.bin
    CorrectedIntMetricsOut.bin IndexMetricsOut.bin TileMetricsOut.bin

    I've taken a look at the picard libraries, but they seem to skip the metrics files.

    Any help would be appreciated!

    John
  • Rhizosis
    Member
    • Mar 2012
    • 41

    #2
    Hi John,

    I am currently working on this. Have you made any progress in the meantime?
    What language are you writing the parser in?

    Regards,

    Bernd

    Comment

    • SeqOps
      Junior Member
      • Dec 2012
      • 2

      #3
      Interop Parser

      Either of you have any luck with this? Looking for the same thing myself.

      Comment

      • Rhizosis
        Member
        • Mar 2012
        • 41

        #4
        I have managed to build a fully functional client-server Illumina metrics parser which interprets data written by HiSeqs and MiSeqs in real time.

        However, I don't know yet if I am allowed to release it to the public. I need to talk to my supervisor about this first.

        Most likely it will be integrated with the GNomEx LIMS system in Q1 2013.
        Let me know if you need pointers.

        Comment

        • ECO
          --Site Admin--
          • Oct 2007
          • 1360

          #5
          I am also interested in this. It blows my mind that ILMN (or someone) hasn't released this code yet.

          Comment

          • Rhizosis
            Member
            • Mar 2012
            • 41

            #6
            I have talked to my supervisor and I've been given the green light to release this software under the GPLv3 license. I need to put it through some beta tests first, but it shouldn't be too long.

            In about a week or two I will create a separate thread for this in the bioinformatics section for feedback, support and bug reports.

            Comment

            • behavin
              Member
              • Oct 2011
              • 11

              #7
              This would be quite helpful for us as well; I'd love to be able to parse all the metrics without resorting to SAV.

              Comment

              • Rhizosis
                Member
                • Mar 2012
                • 41

                #8
                Hi everybody.

                As promised, here is the initial release of Metrix.
                For feedback, ideas, bug reports and future communication please see the Metrix thread.

                I hope this helps.

                Bernd

                Comment

                • iamh2o
                  Junior Member
                  • Mar 2009
                  • 6

                  #9
                  Thanks Bernd-

                  I'll reply with our experience, and contribute back to the project where I can.

                  j

                  Comment

                  • ECO
                    --Site Admin--
                    • Oct 2007
                    • 1360

                    #10
                    I was so annoyed at not being able to get cluster density easily that I wrote the code below to get it:

                    Code:
                    import pandas as pd
                    from bitstring import ConstBitStream


                    class TileMetrics:
                        def __init__(self, filename):
                            self.f = filename

                            # load the whole file into a read-only bit stream
                            with open(self.f, 'rb') as fh:
                                a = ConstBitStream(bytes=fh.read())
                            self.filever = a.read('uintle:8')  # version number == 2
                            self.recordlen = a.read('uintle:8')  # length of each record in bytes == 10 (for TileMetrics)
                            a.pos = 16  # start of the records, just past the 2-byte header

                            # set up data
                            self.data = {'lane': [], 'tile': [], 'met': [], 'value': []}

                            # read records per specs in technote_rta_theory_operations.pdf from ILMN
                            nrecords = (a.len - 16) // (self.recordlen * 8)  # record length in bits == 80
                            for _ in range(nrecords):
                                self.data['lane'].append(a.read('uintle:16'))  # lane number
                                self.data['tile'].append(a.read('uintle:16'))  # tile number
                                self.data['met'].append(a.read('uintle:16'))  # metric code
                                self.data['value'].append(a.read('floatle:32'))  # metric value

                            # turn the columns into a DataFrame
                            self.df = pd.DataFrame(self.data)

                            # pull out cluster density (code 100) and PF cluster density (code 101)
                            self.cdens = self.df[self.df.met == 100].reset_index(drop=True)
                            self.pfcdens = self.df[self.df.met == 101].reset_index(drop=True)


                    if __name__ == '__main__':
                        tm = TileMetrics('TileMetricsOut.bin')
                        print('###############')
                        print('filename %s' % tm.f)
                        print('###############')
                        print('average clusterdensity == %.2f' % tm.cdens.value.mean())
                        print('average perc pf clusters == %.2f' % (100 * tm.pfcdens.value.mean() / tm.cdens.value.mean()))
                    This is easily extensible to the other metrics files, and I'm working on it (slowly). Metrix looks awesome but is beyond my skills to implement efficiently.
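
For anyone who would rather avoid the bitstring and pandas dependencies, the same record layout described in the code above (a 1-byte version, a 1-byte record length, then little-endian records of lane, tile, metric code and a 32-bit float value) can be read with the standard library alone. This is only a minimal sketch under that assumption; the function names `parse_tile_metrics` and `mean_by_lane` are made up for illustration, and the demo bytes are synthetic values, not real run data.

```python
import struct
from collections import defaultdict

HEADER = struct.Struct('<BB')    # file version, record length in bytes
RECORD = struct.Struct('<HHHf')  # lane, tile, metric code, metric value


def parse_tile_metrics(raw):
    """Return (version, records) from the raw bytes of a tile metrics file."""
    version, record_len = HEADER.unpack_from(raw, 0)
    if record_len != RECORD.size:
        raise ValueError('unexpected record length: %d' % record_len)
    body = raw[HEADER.size:]
    # drop any trailing partial record so iter_unpack sees whole records only
    usable = len(body) - len(body) % RECORD.size
    return version, list(RECORD.iter_unpack(body[:usable]))


def mean_by_lane(records, metric_code):
    """Average the values of one metric code for each lane."""
    values = defaultdict(list)
    for lane, tile, code, value in records:
        if code == metric_code:
            values[lane].append(value)
    return {lane: sum(v) / len(v) for lane, v in values.items()}


# synthetic input: a header plus three made-up records on lane 1
demo = (HEADER.pack(2, 10)
        + RECORD.pack(1, 1101, 100, 800.0)
        + RECORD.pack(1, 1102, 100, 900.0)
        + RECORD.pack(1, 1101, 101, 700.0))
version, records = parse_tile_metrics(demo)
print(version, mean_by_lane(records, 100))
```

For a real run you would pass in the bytes read from TileMetricsOut.bin (opened in 'rb' mode) instead of the synthetic `demo` value.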

                    Comment

                    • earonesty
                      Member
                      • Mar 2011
                      • 52

                      #11
                      Module Bio::IlluminaSAV at CPAN is available for these files. Version 1.0, release 7013 works well (prev release was missing files). You can install using:

                      sudo cpanm -i Bio::IlluminaSAV
                      Last edited by earonesty; 05-23-2013, 05:23 PM. Reason: Add install info

                      Comment

                      • mchen1
                        Member
                        • May 2013
                        • 10

                        #12
                        InterOp parsers in R and perl

                        Hi,

                        I'm from Illumina and over the past few months, I've built a package in R and some scripts in perl to accurately parse the binary InterOp files.

                        These are unsupported, which means tech support will not be able to help you with them, but we've tested these internally and I've received approval from my manager to share them with those that ask. So please PM me if these scripts would be useful for you.

                        Cheers,
                        mchen1

                        Comment

                        • earonesty
                          Member
                          • Mar 2011
                          • 52

                          #13
                          Originally posted by earonesty:
                          Module Bio::IlluminaSAV at CPAN is available for these files. Version 1.0, release 7013 works well (prev release was missing files). You can install using:

                          sudo cpanm -i Bio::IlluminaSAV
                          The latest version passes all the relevant CPAN test reports for MiSeq and GAII data. The failed reports are on test systems that can't handle the volume of data. HiSeq passes as well, but is too large to upload to CPAN at all.

                          Comment

                          • ophir
                            Junior Member
                            • Nov 2013
                            • 1

                            #14
                            A new tool for the job is illuminate - https://bitbucket.org/invitae/illuminate
                            Installation is pretty straightforward; I was able to get it up and running in a few minutes.

                            Comment

                            • ploverso-pgdx
                              Junior Member
                              • Feb 2016
                              • 3

                              #15
                              Illuminate is not up to date: it only supports InterOp files up through v5. There is a package called savR in Bioconductor which supports up through v6, for RTA versions 2.7 and up (for the HiSeq 4k, etc).
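
Since parser support depends on the file version, it can help to check the version before handing a file to any tool. Below is a rough sketch, assuming the version is stored in the first byte as in the TileMetrics code earlier in this thread; whether every InterOp file follows that layout is an assumption here, so check it against the Illumina documentation. The `SUPPORTED_VERSIONS` table and the function names are hypothetical and should be filled in for whatever parser you actually use.

```python
# hypothetical table: file name -> format versions your parser can handle
SUPPORTED_VERSIONS = {
    'TileMetricsOut.bin': {2},
}


def interop_version(raw):
    """Return the format version, assumed to be the first byte of the file."""
    if not raw:
        raise ValueError('empty file')
    return raw[0]


def is_supported(name, raw, supported=SUPPORTED_VERSIONS):
    """True if this file's version is listed for the given file name."""
    return interop_version(raw) in supported.get(name, set())


# synthetic header bytes, not taken from a real run
print(is_supported('TileMetricsOut.bin', bytes([2, 10])))
```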


                              Comment
