  • Nick
    Member
    • Jun 2009
    • 16

    SRF metadata

    What information is in the SRF metadata other than read lengths and pairing info? Broad question, I know, but does srf_info print it all out? Is the information in the trace name in a certain format?
    i.e. in
    trace_name: IL20_1065:1:1: + 22:920 ... 499:175 x17333

    Do those numbers always have a defined meaning, or is it just a string whose parts only mean something for the study and file I am looking at?
  • jkbonfield
    Senior Member
    • Jul 2008
    • 146

    #2
    The trace name doesn't have any explicit meaning defined in SRF, but typically names are generated automatically so that they are unique. Your Illumina example consists of machine name and run number, followed by lane, tile, and x/y coordinates, in that order.
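As a rough illustration of that naming convention, here is a minimal Python sketch that splits a name such as `IL22_4100:4:1:0:193` into its parts. The `<machine>_<run>:<lane>:<tile>:<x>:<y>` layout is only a convention of this particular pipeline (SRF does not define it), and the function and field names are made up for the example.

```python
from typing import NamedTuple


class IlluminaReadName(NamedTuple):
    """Fields of a conventionally formatted Illumina trace name (hypothetical type)."""
    machine: str
    run: int
    lane: int
    tile: int
    x: int
    y: int


def parse_illumina_trace_name(name: str) -> IlluminaReadName:
    """Split '<machine>_<run>:<lane>:<tile>:<x>:<y>' into typed fields.

    Assumption: the name follows the convention described above. SRF itself
    gives trace names no meaning, so other files may not parse; in that case
    a ValueError is raised.
    """
    prefix, lane, tile, x, y = name.strip().split(":")
    machine, run = prefix.rsplit("_", 1)
    return IlluminaReadName(machine, int(run), int(lane), int(tile), int(x), int(y))


if __name__ == "__main__":
    print(parse_illumina_trace_name("IL22_4100:4:1:0:193"))
```

Names that do not have exactly five colon-separated fields fail loudly rather than being guessed at, which seems the safer choice given that the format is not guaranteed.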

    SRF does in theory also have an XML section for run metadata. The intention was that this would be the SRF equivalent of the TraceInfo.xml that went alongside the old tarballs in capillary trace submissions to NCBI et al. However, NCBI's new SRA finally ended up with something like 5 separate XML schemas, with no overall hierarchy that would let them be embedded as a single XML object in the SRF file. I think that for other practical reasons, too, people wanted to submit their metadata separately (e.g. before the bulk of the data gets uploaded).

    James

    • jkbonfield
      Senior Member
      • Jul 2008
      • 146

      #3
      One thing I forgot to mention - there are also machine/run specific data files that get added to the SRF file; often many times. (We really needed a 3-layer system rather than 2-layer so we could add data common to an entire run.)

      SRF is really just a container for ZTR trace files, in much the same way that tar and zip are containers for various formats. The ZTR format allows for various types of data, called chunks. These can be things like sequence, base qualities, trace peaks, as well as more nebulous things like "TEXT".

      The illumina2srf program embeds various XML config files for the instrument run in the TEXT chunks. There's no direct SRF tool that dumps these (except, I guess, the srf2illumina reverse conversion). You can, however, extract a single sequence in ZTR format and then dump that. Using io_lib commands:

      jkb$ srf_list /fuse/mpsafs/runs/4100/4100_4.srf|head -4
      IL22_4100:4:1:0:193
      IL22_4100:4:1:0:467
      IL22_4100:4:1:0:585
      IL22_4100:4:1:0:612

      jkb$ srf_extract_linear /fuse/mpsafs/runs/4100/4100_4.srf IL22_4100:4:1:0:193 | get_comment

      (Edited for brevity)
      PROGRAM_ID=illumina2srf v2.0.0r72
      I2S_CMDLINE=/software/solexa/bin/illumina2srf -I -b -filter-bad-reads -bustard-dir ...
      ILLUMINA_GA_IPAR_NCLUSTERS=232975
      ILLUMINA_GA_MATRIX_FWD=# Auto-generated frequency response matrix
      > A
      > C
      > G
      > T
      1.41 0.05 -0.00 -0.00
      0.79 0.73 0.01 0.01
      -0.00 0.00 1.17 0.00
      -0.00 -0.00 0.65 0.87

      ILLUMINA_GA_MATRIX_FWD_FILENAME=Matrix/s_4_02_matrix.txt
      ILLUMINA_GA_MATRIX_REV=# Auto-generated frequency response matrix
      > A
      > C
      > G
      > T
      1.35 0.01 0.00 0.00
      0.66 0.62 0.01 0.01
      -0.00 0.00 1.21 0.00
      -0.00 -0.00 0.72 0.99

      ILLUMINA_GA_MATRIX_REV_FILENAME=Matrix/s_4_78_matrix.txt
      ILLUMINA_GA_PHASING_FWD=<Parameters>
      <Phasing>0.006000</Phasing>
      <Prephasing>0.002800</Prephasing>
      </Parameters>

      ILLUMINA_GA_PHASING_FWD_FILENAME=Phasing/s_4_01_phasing.xml
      ILLUMINA_GA_PHASING_REV=<Parameters>
      <Phasing>0.005900</Phasing>
      <Prephasing>0.002100</Prephasing>
      </Parameters>

      ILLUMINA_GA_PHASING_REV_FILENAME=Phasing/s_4_77_phasing.xml
      ILLUMINA_GA_BUSTARD_CONFIG=<?xml version="1.0"?>
      <BaseCallAnalysis>
      <Run Name="Bustard1.5.1_01-12-2009_RTA">
      <BaseCallParameters>
      <ChastityThreshold>0.600000</ChastityThreshold>
      <Matrix Path="">
      ...

      ILLUMINA_GA_BUSTARD_SUMMARY=<?xml version="1.0" ?>
      <?xml-stylesheet type="text/xsl"
      href="BustardSummary.xsl" ?>

      <BustardSummary>
      ...

      ILLUMINA_GA_PIPELINE_VERSION=1.5.1
      ILLUMINA_GA_RAW_DATA_COMPRESSION=none
      ILLUMINA_GA_REBASECALL=1
      ILLUMINA_GA_RUN_FOLDER=091123_IL22_4100
      ILLUMINA_GA_FIRECREST_FOLDER=Intensities
      ILLUMINA_GA_BUSTARD_FOLDER=Bustard1.5.1_01-12-2009_RTA
      ILLUMINA_GA_FIRECREST_CONFIG=<?xml version="1.0" encoding="utf-8"?>
      <ImageAnalysis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <Run Name="Intensities">
      <Cycles First="1" Last="152" Number="152" />
      <ImageParameters>
      ...

      etc
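The dump above is a run of `KEY=value` entries in which some values (the matrices and XML configs) spill over several lines. A minimal Python sketch for turning such text into a dictionary follows. It assumes the layout seen in the listing, namely that a new entry starts on a line beginning with an upper-case identifier followed by `=`; that rule is inferred from the sample, not from any documented format, and the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Assumption: keys look like PROGRAM_ID or ILLUMINA_GA_PHASING_FWD.
KEY_LINE = re.compile(r"^([A-Z][A-Z0-9_]*)=(.*)$")


def parse_comment_dump(text: str) -> Dict[str, str]:
    """Collect KEY=value entries; lines without a key continue the previous value."""
    fields: Dict[str, str] = {}
    key: Optional[str] = None
    for line in text.splitlines():
        match = KEY_LINE.match(line)
        if match:
            key = match.group(1)
            fields[key] = match.group(2)
        elif key is not None:
            fields[key] += "\n" + line.rstrip()
    # Drop the blank separator lines that trail multi-line values.
    return {k: v.strip() for k, v in fields.items()}


SAMPLE = """PROGRAM_ID=illumina2srf v2.0.0r72
ILLUMINA_GA_PHASING_FWD=<Parameters>
<Phasing>0.006000</Phasing>
<Prephasing>0.002800</Prephasing>
</Parameters>

ILLUMINA_GA_PIPELINE_VERSION=1.5.1
"""

if __name__ == "__main__":
    fields = parse_comment_dump(SAMPLE)
    print(sorted(fields))
    # The small XML values can then be read with the standard library.
    phasing = ET.fromstring(fields["ILLUMINA_GA_PHASING_FWD"])
    print(phasing.findtext("Phasing"), phasing.findtext("Prephasing"))
```

One caveat: a continuation line that itself happens to start with an upper-case word and `=` would be mistaken for a new key, so the result is worth checking against the raw text.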

      You could also save the output of srf_extract_linear or srf_extract_hash to a file and run trace_dump on it to get the full data, including bases, qualities, etc.

      I believe most (all?) of the TEXT segment of the ZTRs is lost on import to SRA, though. It's certainly arguable how useful all the XML config files for the instrument runs are (although they're *tiny* compared to the actual data), but I think the matrix files are perhaps of use to researchers, as they explain a lot of the manipulation that took place on the data.
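For anyone who wants the matrix values as numbers, here is a small Python sketch that reads a block in the shape shown in the dump above: a `#` comment line, four `> BASE` label lines, then four rows of four floats. The function name is made up, and the sketch makes no claim about which axis corresponds to bases and which to channels, since the dump does not say.

```python
from typing import List, Tuple


def parse_matrix_text(text: str) -> Tuple[List[str], List[List[float]]]:
    """Return (labels, rows) from a matrix block as embedded in the TEXT chunk.

    Assumption: '#' lines are comments, '>' lines carry labels, and every
    other non-blank line is a whitespace-separated row of floats.
    """
    labels: List[str] = []
    rows: List[List[float]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            labels.append(line[1:].strip())
        else:
            rows.append([float(value) for value in line.split()])
    return labels, rows


MATRIX_FWD = """# Auto-generated frequency response matrix
> A
> C
> G
> T
1.41 0.05 -0.00 -0.00
0.79 0.73 0.01 0.01
-0.00 0.00 1.17 0.00
-0.00 -0.00 0.65 0.87
"""

if __name__ == "__main__":
    labels, rows = parse_matrix_text(MATRIX_FWD)
    print(labels)
    for row in rows:
        print(row)
```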
