Well, it has finally arrived! Ours is one of only 23 so far on the Google map, excluding those at BGI of course.
Our HiSeq has just completed its validation run and passed with flying colours. I have attached a couple of status pages for you to take a look at. The interface to this kind of data is much nicer than before, and there is an offline tool we can use, so the data can be reviewed at any time.
The validation run was a pair of PE flowcells with PhiX at 36bp read lengths. We obtained 51.5 and 56.0Gb of data from the two flowcells, which extrapolates to roughly 300Gb from a run at PE100bp. I was pleasantly surprised to get such a hike over the 200Gb that the instrument is sold as achieving, and from the very first run. I hope our next flowcells are anywhere near as good. The 50+Gb figures are for >Q30 data only; if you take the complete yield into account, we actually got just over 60Gb from one flowcell.
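As a rough check on that extrapolation, here is a minimal sketch that scales the PE36 yields linearly by read length. It assumes the same number of clusters passing filter and the same fraction of >Q30 bases at 100bp as at 36bp, which is optimistic because quality usually drops in later cycles. The function name is hypothetical. Straight linear scaling of the two >Q30 figures gives just under 300Gb.

```python
# >Q30 yields (Gb) from the two PE36 PhiX validation flowcells
flowcell_yields_gb = [51.5, 56.0]

def extrapolate_yield(yield_gb: float, current_read_len: int, target_read_len: int) -> float:
    """Scale yield linearly with read length (hypothetical helper).

    Assumes cluster numbers and the >Q30 fraction stay the same at the
    longer read length. Both reads of a pair scale by the same factor,
    so the per-read length ratio is all that is needed.
    """
    return yield_gb * target_read_len / current_read_len

run_total_pe36 = sum(flowcell_yields_gb)
run_total_pe100 = extrapolate_yield(run_total_pe36, 36, 100)

print(f"PE36 run total: {run_total_pe36:.1f} Gb")
print(f"Extrapolated PE100 run total: {run_total_pe100:.1f} Gb")
```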
There were some spatial effects on both flowcells. The top surface appeared marginally worse than the bottom, and there was a gradient visible on both, more pronounced on A than on B. This is something to watch out for later on, but given the volume of quality data it is more a point of interest than anything to worry about.
The speed of data generation, combined with two flowcells per instrument, is going to make the HiSeq an indispensable part of daily research in many places. Comparing it with SOLiD4 and the data volumes Life Technologies are talking about shows that we have entered another new era in Genomics research, one where even the smallest labs can realistically sequence whole Human genomes without any external collaboration.
Unfortunately, one thing that has not changed is the problem of dealing with masses of data. This is where collaboration is a real boon and can make or break a project. Few labs in the world have the ability to generate samples and analyse them to completion. Tools such as Galaxy, along with many commercial offerings, are becoming easier to access, but this is obviously the area where we will see a lot more development.
What next...
So what is going to come out of this instrument and others like it? Is the Wheat genome going to have a comprehensive draft by the end of 2010 for instance?
And where is the HiSeq 2000 going to go? Well, I am hoping the clue is in the name and Terabases are possible. If Illumina can get to 200bp PE reads and 2 billion clusters per run, neither of which sounds impossible to me, we would be touching 1Tb. It is going to take a bit more work to get to a HiSeq 2Tb, though.
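The back-of-envelope sum behind "touching 1Tb" can be sketched as below. It assumes 2 billion clusters passing filter per run across both flowcells, two 200bp reads per cluster, and that every base is counted, not just >Q30 bases. The helper name is hypothetical. Under those assumptions the run would yield about 800Gb, which is close to, but still short of, a terabase.

```python
def run_yield_gb(clusters: float, read_len: int, paired: bool = True) -> float:
    """Total bases per run in Gb (hypothetical helper).

    Assumes every cluster passes filter and every base is counted.
    """
    reads_per_cluster = 2 if paired else 1
    return clusters * reads_per_cluster * read_len / 1e9

# Hypothetical future run: 2 billion clusters, 2 x 200bp reads
projected_gb = run_yield_gb(2e9, 200)
print(f"Projected yield: {projected_gb:.0f} Gb")
```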
For us, our first Human genome on the HiSeq has started as a PE100bp run, and I am hoping we get over 100Gb from this run as well. I will post an update once it finishes.