
Fancy a peek inside Sanger's Illumina GA Pipeline?




    So it has been brought to my attention that the Sanger has a publicly accessible "stats" page containing quite a few statistics about their Illumina short-read pipeline. The stats give a very interesting look into the daily operations of perhaps the highest-throughput genome center in the world (...if I had a nickel for every PM I'll get correcting me!).

    Screenshot of the public page, containing a dropdown menu for different stats:


    I have reproduced all the available data below, unchanged (except for blanking out someone's email address), as of this evening. I'm hesitant to post the URL only because I don't want to cause an undue ruckus or cost anyone their job. I know these big genome centers are fiercely competitive...

    With that, enjoy.

    Eighty percent of their 28 Genome Analyzers translates to roughly 22 of them running all the time!


    I just scanned through Google Analytics and realized that Sanger sends a fair amount of traffic here; it appears they have a link to a popular thread on their intranet. Greetings, Sanger folk!

  • #2
    Graphs

    I see our graphs are getting around.

    A couple of things aren't clear from them as shown.

    The yields are PF (passing filter) yields, i.e. from non-overlapping clusters. Typically this is half of all the clusters on a dense chip. Some people quote yields as total bases instead.
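
    To make the PF versus total-bases distinction concrete, here is a minimal sketch. It assumes yield is simply clusters times read length, and the cluster count, 36-cycle read length and 0.5 pass-filter fraction are made-up illustrative numbers, not figures from Sanger's graphs; the helper name is hypothetical.

```python
def pf_and_total_yield(total_clusters: int, pf_fraction: float, read_length: int) -> tuple:
    """Return (pf_bases, total_bases) for one lane.

    Assumes yield = clusters x read length, with pf_fraction being the
    share of clusters passing filter (roughly half on a dense chip).
    All inputs are illustrative, not Sanger's figures.
    """
    pf_clusters = round(total_clusters * pf_fraction)
    pf_bases = pf_clusters * read_length
    total_bases = total_clusters * read_length
    return pf_bases, total_bases


# Hypothetical lane: 10 million raw clusters, 36-cycle reads, half passing filter.
pf_bases, total_bases = pf_and_total_yield(10_000_000, 0.5, 36)
print(f"PF yield: {pf_bases / 1e6:.0f} Mb, total bases: {total_bases / 1e6:.0f} Mb")
# prints: PF yield: 180 Mb, total bases: 360 Mb
```

    Quoting total bases for the same lane would double the headline number, which is why it matters which one a center reports.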

    The numbers are per run. For paired-end runs, which are about 90% of them, two runs need to be summed to give the yield per flowcell.
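
    A minimal sketch of that summing step, assuming each run is logged as a (flowcell ID, per-run PF yield) pair and that a paired-end flowcell appears as two runs (first and second read). The flowcell IDs, yields and helper name are hypothetical.

```python
from collections import defaultdict

# Hypothetical run log: (flowcell_id, per-run PF yield in bases).
# A paired-end flowcell shows up as two runs, a single-end one as one run.
runs = [
    ("FC001", 1_000_000_000),  # paired end, first read
    ("FC001", 900_000_000),    # paired end, second read
    ("FC002", 1_100_000_000),  # single end
]


def yield_per_flowcell(run_records):
    """Sum per-run yields so that each flowcell gets a single total."""
    totals = defaultdict(int)
    for flowcell_id, run_yield in run_records:
        totals[flowcell_id] += run_yield
    return dict(totals)


print(yield_per_flowcell(runs))
# prints: {'FC001': 1900000000, 'FC002': 1100000000}
```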

    Error rates are estimated from control lanes and are very often an average of the first- and second-read rates for a flowcell with two runs. Second reads often have worse data quality than first reads (this is being fixed in collaboration with Illumina). The early data clearly come from a very small number of runs with highly variable success rates, hence the mountains. Error bars are not on these graphs, but they would be very broad for the early data and very narrow for later data.
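
    The two points above (averaging read rates per flowcell, and error bars narrowing as runs accumulate) can be sketched as follows. This assumes the per-flowcell figure is an unweighted mean of the two reads and that the error bars would be standard errors of the mean; the post says neither, and the rates and helper names are hypothetical.

```python
import statistics
from typing import List, Optional


def flowcell_error_rate(read1_error: float, read2_error: Optional[float] = None) -> float:
    """Error rate reported for one flowcell (assumed unweighted mean).

    Paired-end flowcell (two runs): mean of first- and second-read
    control-lane error rates. Single-end: just the first-read rate.
    """
    if read2_error is None:
        return read1_error
    return (read1_error + read2_error) / 2


def standard_error(rates: List[float]) -> float:
    """Standard error of the mean; needs at least two rates."""
    return statistics.stdev(rates) / len(rates) ** 0.5


# Hypothetical per-flowcell error rates with the same spread:
# a couple of early runs versus many later runs.
early = [0.01, 0.03]
later = [0.01, 0.03] * 10
print(standard_error(early) > standard_error(later))  # True: more runs, narrower error bars
```

    In the real data the later runs are also less variable, which would narrow the bars even further.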

    Some of the graphs are under development.

    c.



    • #3
      http://www.genomeweb.com/issues/news/146798-1.html

      In fact, this article is a little off: the 300 gigabases already submitted is bigger than GenBank.



      • #4
        A new page is coming from Roger...



        • #5
          http://www.sanger.ac.uk/Teams/Team117/



          • #6
            I think Sanger will hit 1 terabase (PF) by the end of June.

            http://www.sanger.ac.uk/cgi-bin/team...cum_yield_time
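
            The post doesn't say how the date was estimated; one simple way to project a cumulative-yield curve forward is a straight-line least-squares fit, sketched below. The helper name is hypothetical and the data points are synthetic, perfectly linear numbers chosen for illustration, not Sanger's cumulative yields.

```python
import math
from datetime import date


def projected_crossing(points, target):
    """Estimate the first date a cumulative total reaches `target`.

    points: list of (date, cumulative_bases) pairs. Fits an ordinary
    least-squares straight line (bases per day) and solves for the day
    the line reaches `target`. Only meaningful if growth is roughly linear.
    """
    xs = [d.toordinal() for d, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # bases per day
    if slope <= 0:
        raise ValueError("cumulative yield is not growing")
    intercept = mean_y - slope * mean_x
    return date.fromordinal(math.ceil((target - intercept) / slope))


# Synthetic, perfectly linear data (NOT Sanger's numbers): +100 Gb every 10 days.
history = [
    (date(2008, 5, 1), 700e9),
    (date(2008, 5, 11), 800e9),
    (date(2008, 5, 21), 900e9),
]
print(projected_crossing(history, 1e12))  # 2008-05-31
```

            A real cumulative curve bends upward as more instruments come online, so a linear fit like this tends to give a conservative (late) date.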



            • #7
              ta daaaaa

              http://www.sanger.ac.uk/Info/Press/2008/080702.shtml

