  • Data retention

    Hi Folks -

    How about a survey to see what data people retain from sequencing experiments? We have completed ~10 runs on an Illumina GA I and have stored the complete raw data (images, raw intensities, etc.; the complete run folder) on external 1 TB disks. I don't think everyone stores images long term. We may not be able to keep it up forever.

    Anyone know if it is possible to retain images using GA II?

  • #2
    It's possible to retain images on GA2s, of course, but they are a lot bigger. In general it's reasonable to retain the raw intensity and noise files, since reprocessing those with new basecallers may be of interest.

    If you look at the data retained by the NCBI in the Short Read Archive, they are currently storing raw intensity and noise files, processed intensities, the 4 quality scores and the fastq files (the SRFs also have the settings used to generate the data, I believe). I think the Short Read Archive only contains PF (purity-filtered) data at the moment.

    Personally I think that's overkill. I'd store the raw intensities (PF and non-PF), the 4 quality scores and a basecall. Bear in mind that it is possible to regenerate everything from a complete set of raw intensities.



    • #3
      Storing the "raw data" as DNA in the freezer is likely going to be a more cost-effective option...

      What would you expect or hope to be able to achieve by reanalysing the images?



      • #4
        Originally posted by Chipper:
        What would you expect or hope to be able to achieve by reanalysing the images?
        My thought was that future improvements in the image segmentation algorithm or intensity extraction could give you different results later on. For example, improvements might be better able to discern individual clusters when density is very high.

        In our microarray experiments, we always save raw images rather than just intensities, since we see variation whenever you do gridding or intensity extraction.

        You're right - saving sample to re-run later is a logical approach. Deleting what we see as raw data may simply be a mental hurdle to get over.



        • #5
          My group has archived all bzip2'd image files to 2 TB USB disks. Currently they are £200 from Western Digital. That seemed cheap in comparison to reagents and lab time.

          Partly, we also thought that better base-calling algorithms than Bustard are already in development, e.g. Alta-Cyclic and others. Can someone perhaps start a thread on this?

          dvh
          Last edited by dvh; 09-16-2008, 02:23 PM.
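
For anyone wanting to do the same kind of bzip2 archiving, here is a minimal sketch in Python. It is not the script used in the post above. The function name `compress_images`, the `*.tif` pattern and the `remove_original` flag are assumptions made for illustration, so adjust them to match your own run folder layout.

```python
import bz2
import shutil
from pathlib import Path


def compress_images(run_dir, pattern="*.tif", remove_original=False):
    """Compress every file under run_dir that matches pattern to <name>.bz2.

    Returns the list of .bz2 paths written during this call.
    """
    written = []
    for src in sorted(Path(run_dir).rglob(pattern)):
        dst = src.with_name(src.name + ".bz2")
        if dst.exists():
            continue  # already archived on an earlier pass
        # Stream the copy so large images are never held fully in memory.
        with open(src, "rb") as fin, bz2.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        if remove_original:
            src.unlink()
        written.append(dst)
    return written
```

Calling `compress_images("/path/to/run_folder")` leaves the originals in place by default, so the compressed copies can be verified before anything is deleted.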



          • #6
            Originally posted by Chipper:
            What would you expect or hope to be able to achieve by reanalysing the images?
            Well, if I understand correctly, the new pipeline that's on the way is supposed to increase the data output by 15-30% through image analysis improvements alone. There's a lot of room for improvement in that area, apparently.

            That being said, we're keeping the images on our server only until we have no more room, then they'll be deleted as space is required. Individuals can keep the raw data on external disks if they want it.

            Scott.
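
A "keep images until the disk fills up" policy like the one described above could be sketched roughly as follows. This is a hypothetical illustration, not the script used on that server. The names `prune_oldest`, `min_free_bytes` and `free_now` are made up, and the sketch assumes one sub-folder per run directly under `image_root`, with folder modification time standing in for run age.

```python
import shutil
from pathlib import Path


def folder_size(path):
    """Total size in bytes of all regular files below path."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def prune_oldest(image_root, min_free_bytes, free_now=None, dry_run=True):
    """Remove the oldest run folders until min_free_bytes would be free.

    free_now overrides the measured free space (useful for planning or tests).
    With dry_run=True nothing is deleted. Returns the folders that were
    (or would be) removed, oldest first.
    """
    root = Path(image_root)
    free = shutil.disk_usage(root).free if free_now is None else free_now
    runs = sorted((p for p in root.iterdir() if p.is_dir()),
                  key=lambda p: p.stat().st_mtime)
    removed = []
    for run in runs:
        if free >= min_free_bytes:
            break
        free += folder_size(run)  # space we expect to reclaim from this run
        removed.append(run)
        if not dry_run:
            shutil.rmtree(run)
    return removed
```

The default is a dry run, so the list of folders can be reviewed (and copied to external disks by whoever wants them) before passing `dry_run=False`.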



            • #7
              Originally posted by dvh:
              My group has archived all bzip2'd image files to 2 TB USB disks. Currently they are £200 from Western Digital. That seemed cheap in comparison to reagents and lab time.

              Partly, we also thought that better base-calling algorithms than Bustard are already in development, e.g. Alta-Cyclic and others. Can someone perhaps start a thread on this?

              dvh
              With GA2 and read lengths approaching 100 bp paired-end, storing images probably becomes infeasible. Storing the raw intensities, I think, gives you the biggest bang for your storage buck. All the new basecallers work from raw intensities, not images.

              It's true that there's a lot to be gained back from improved image analysis, perhaps 10 or 20%, but it's all a trade-off.
