I am going to close this thread based on the note in the last post by @mchen1.
No new posts can be added to this thread.
Closing out this thread...
Illumina have now made available open-source libraries for parsing and extracting information from the InterOp files. Please see this thread: http://seqanswers.com/forums/showthread.php?t=66342
The libraries will be updated along with new releases of RTA software, and are backwards-compatible back to the GAs.
The parsers in this thread were only compatible with RTA versions 2.7 and below. Because of the availability of these open source libraries, I am retiring these parsers.
Cheers,
Menzies
-
Archana91, please either send me your email address, or turn on private messaging on SEQanswers. You sent me a note to ask for the scripts, but you have disabled private messaging and thus I have no way of contacting you.
For all other readers, it is best to send me an email via SEQanswers. You may also PM me, but make sure you have private messaging turned on. Or you can include your own email address in your message. If you don't do one of these, then I am unable to reply to you.
Thanks,
mchen1
-
jwater, you sent me a PM asking for the InterOp parsers, but you have set your account to reject PMs, and you left no contact information or way for me to reply. I can't help you if I can't reach you.
-
Originally posted by earonesty:
"Global dump" is kind of ambiguous. What format do you want?
Bio::IlluminaSAV can be used to make a "dump" by using JSON or YAML or whatever, and then dumping each metric to a file.
When I wrote my post, I didn't know exactly what kind of data (and in what format) I could obtain from these parsers.
I wanted to convert the InterOp files into non-binary files to get direct access to the data.
After half a day of work, I understand the InterOp files and Bio::IlluminaSAV better, and I have written some code.
I still have more work to do, but I am getting data, and I plan to store it (maybe filtered) in XML files.
I'm still thinking about how to manage the data (keep all of it or only the useful part), depending on the next steps of my future quality control pipeline.
Thank you for your comment.
-
"Global dump" is kind of ambiguous. What format do you want?
Bio::IlluminaSAV can be used to make a "dump" by using JSON or YAML or whatever, and then dumping each metric to a file.
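As an illustration of the "one file per metric" dump described above, here is a minimal Python sketch. It does not use Bio::IlluminaSAV. The `dump_metrics` helper, the metric names, and the record fields are all hypothetical stand-ins for whatever a real parser returns.

```python
import json
import os
import tempfile


def dump_metrics(metrics, out_dir):
    """Write each metric's records to <out_dir>/<metric name>.json."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name, records in metrics.items():
        path = os.path.join(out_dir, name + ".json")
        with open(path, "w") as handle:
            json.dump(records, handle, indent=2, sort_keys=True)
        paths.append(path)
    return sorted(paths)


# Hypothetical, hand-written records standing in for parsed InterOp metrics.
example_metrics = {
    "tile": [{"lane": 1, "tile": 1101, "code": 100, "value": 850.0}],
    "quality": [{"lane": 1, "tile": 1101, "cycle": 1, "q30_fraction": 0.95}],
}

out_dir = tempfile.mkdtemp()
written = dump_metrics(example_metrics, out_dir)
print([os.path.basename(p) for p in written])
```

Swapping `json.dump` for a YAML or XML writer would change only the serialization step. The per-metric loop stays the same.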
-
Hi Mchen,
Are you still there?
I'm new here, and it seems new members cannot send PMs.
So I am trying to contact you by replying to this old thread.
Your post and the comments on it about using InterOp files directly are very interesting and promising.
I'd like to try your parsers in R and Perl.
I briefly tested Bio::IlluminaSAV and Illuminate, but I would prefer a global dump of the InterOp data to integrate into my QC pipeline.
I hope you can help me and contact me (by PM or in this thread).
Regards
-
Erik, I've emailed the packages to your email address.
Regarding #4, it's nice that you are able to speed up data extraction by not parsing all the data. Many times this type of curation summarizes run quality well. The packages I send out have the goal of simply providing all of the data. This leaves it up to the user to decide on what numbers to input into their LIMS.
Thanks for your post.
-
Originally posted by mchen1:
Hi earonesty,
The CPAN module looks to be nice, but I think our scripts meet different needs. There is certainly room for both. The IlluminaSAV module appears to parse the InterOp data into perl arrays. This would seem ideal for someone working in perl who may want to access the InterOp data for further manipulation. On a separate note (perhaps a feature request?), there is a new InterOp file called IndexMetricsOut.bin that describes index metrics and does not appear to be in the IlluminaSAV perl module documentation. If you need the binary format for this file in order to update your perl module, shoot me a PM or email with your email address, and I can send you our latest documentation.
The scripts I've written here are designed to simply convert InterOp data into flat files, and do so as efficiently as possible. I designed them for those wanting to parse the InterOp data for entry into a LIMS system, for example. Since most LIMS are custom-built, flat files seem to be the most universally accepted format for the data. The perl code is also sent without module packaging so that users can see how I parse the binary files in case they want to integrate the code into their own perl work. The perl code also comes without dependencies on other modules so it works out of the box with any modern perl installation (I personally dislike having to install perl modules, especially in the context of group IT policies). In any case, the goal of the two packages is the same, but it would seem our design parameters differ.
Hopefully this discussion can illuminate how the packages differ in case users are deciding between the two.
M
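To make the "binary InterOp file to flat file" idea in the quoted post concrete, here is a minimal Python sketch. It is not mchen1's Perl code. It assumes the version 2 layout of TileMetricsOut.bin (a one-byte version, a one-byte record length, then 10-byte little-endian records of lane, tile, metric code, and value), which should be checked against Illumina's own format documentation. The function names and the sample bytes are made up for illustration.

```python
import csv
import io
import struct

# Assumed layout (TileMetricsOut.bin, version 2); verify before relying on it.
HEADER = struct.Struct("<BB")    # file version, record length in bytes
RECORD = struct.Struct("<HHHf")  # lane, tile, metric code, metric value


def tile_metrics_to_rows(data):
    """Decode the header, then every fixed-size record that follows it."""
    version, record_length = HEADER.unpack_from(data, 0)
    if version != 2 or record_length != RECORD.size:
        raise ValueError(
            "unsupported layout: version %d, record length %d"
            % (version, record_length)
        )
    rows = []
    last_start = len(data) - record_length
    for offset in range(HEADER.size, last_start + 1, record_length):
        rows.append(RECORD.unpack_from(data, offset))
    return rows


def rows_to_tsv(rows):
    """Render the decoded records as a tab-separated flat file."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["lane", "tile", "code", "value"])
    writer.writerows(rows)
    return out.getvalue()


# Synthetic bytes built in memory; not data from a real run.
fake_file = (
    HEADER.pack(2, RECORD.size)
    + RECORD.pack(1, 1101, 100, 850.0)
    + RECORD.pack(1, 1101, 101, 800.0)
)
rows = tile_metrics_to_rows(fake_file)
print(rows_to_tsv(rows))
```

Like the scripts described in the quote, the sketch depends only on the standard library, so it runs without installing extra modules.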
2. I would also like to try out your code (same email)
3. The LibXML reader is for parsing the RunInfo.xml into a perl hash. Other than that the module is core. Somehow I thought it would be better just to do that right.
4. Extraction is fast because usually our apps don't need all the data... many programs are just looking for maximum values, etc. (Our LIMS only gets quantile scores per cycle for example.)
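Point 4 can be sketched as a single streaming pass that keeps only a running maximum and never stores the full record set. This is an illustration in Python, not code from Bio::IlluminaSAV. The 10-byte record layout (lane, tile, metric code, value), the `max_value_per_lane` helper, and the sample records are assumptions made for the example.

```python
import struct

# Assumed record layout: lane, tile, metric code, value (little-endian).
RECORD = struct.Struct("<HHHf")


def max_value_per_lane(payload, wanted_code):
    """Stream over the records and keep only the largest value per lane."""
    best = {}
    for lane, _tile, code, value in RECORD.iter_unpack(payload):
        if code != wanted_code:
            continue  # skip metrics the caller did not ask for
        if lane not in best or value > best[lane]:
            best[lane] = value
    return best


# Synthetic record payload (file header already stripped); not real run data.
payload = b"".join(
    RECORD.pack(*record)
    for record in [
        (1, 1101, 100, 850.0),
        (1, 1102, 100, 900.0),
        (2, 1101, 100, 700.0),
        (2, 1101, 101, 650.0),
    ]
)
print(max_value_per_lane(payload, wanted_code=100))
```

The trade-off matches the one discussed in this thread: a summary like this is cheap to compute, while a full dump leaves the choice of numbers to the user.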
-
Reposting for a colleague of mine at InVitae who wrote an open source python parser for exactly this.
#######
Greetings all,
I work at InVitae and we just publicly released a library called Illuminate.
The purpose of Illuminate is to emulate the stats you see when you load a run data folder within Illumina SAV, providing programmatic access to these metrics for whatever purposes you may have -- data storage, analysis, automated machine monitoring, and so on.
This is completely free, open source software (MIT License) written in Python with the intent to be used, tested, and improved upon by the bioinformatics community.
Features:
Simple command-line tool you can use to quickly inspect a run.
Built to be easily integrated into other code.
Easily extensible even if you think you are "not much of a programmer".
Results standardized to pandas DataFrame objects (so if you know how to work in R, you can probably get up to speed quickly with this)
Here's an example of the smallest python script you could get away with using this tool.
Code:
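import illuminate

# Load the InterOp data for one run folder
myDataset = illuminate.InteropDataset('path/to/rundata/')

# print() with a single argument works under both Python 2 and Python 3
print(myDataset.meta)
print(myDataset.IndexMetrics())
print(myDataset.TileMetrics())
print(myDataset.QualityMetrics())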
And here's an example of how you would use the command-line reporter to do the same thing:
Code:
python illuminate --meta --index --tile --quality /path/to/rundata
You can even have illuminate open up in an interactive iPython shell, where the dataset will be loaded up into an InteropDataset object for you:
Code:
python illuminate -i /path/to/rundata
Not all of the metrics objects are fully fleshed out yet, although all of the binary parsers are "feature complete" in that you can produce a data dictionary and a DataFrame from them.
I'm hoping that some of you fine folks can pipe up and let me know what might be useful to you -- or better, submit contributions, bug reports, and so on that will help Illuminate become as full-featured as it needs to be.
This library has been in our production pipeline for several months now, reporting on cluster density, quality, and yield so we can keep tabs on sequencing run quality in an automated fashion.
If you use it, or you have questions about it, please comment here and let me know!
Cheers,
Naomi
-
I would like to try the package for parsing InterOp files as well. Did I read correctly that there is an R version?
-
Highly Recommended
I recently had the opportunity to use mchen1's script package and I highly recommend it. It was very straightforward and extremely easy to use. It is well documented, too.
I wanted to parse the InterOp folders for 50+ Illumina HiSeq runs and gather the statistics to report our general performance over time, and I was able to do so quickly and accurately.
Thanks for sharing the code!
-
Hi MChen,
Thank you for your previous PM regarding your parsing scripts. The IndexMetricsOut.bin statistics sound interesting to me as well. Would you be so kind as to send the latest Theory of Operation (I assume) documentation to the same email address?
Many thanks in advance,
B