
  • #16
    Assuming you have the images saved...

    You probably should have just removed the first cycle from image analysis and basecalling by using the command-line arguments. There isn't really a need to change file locations and folder names.

    • #17
      Originally posted by sramshey:
      Hello-

      I have a question regarding the use of the script illumina2srf. We recently had a HiSeq run in which the first cycle did not contain any data (clogged fluidics?). Illumina technical support advised us that we could improve the overall quality of our data for the lane in question by removing the first cycle. This involved removing the data folder in <run folder>/Data/Intensities/<lane>/C1.1, renaming the folders for all of the subsequent cycles, editing the config.xml in the Intensities folder to reflect the changes, and then repeating the entire procedure for the control lane as well. Following these steps we were able to generate fastq files, but when we attempt to run illumina2srf to generate our srf files we encounter an error indicating that cycle 1 is missing from our renumbered tiles:

      /house/sdm/prod/illumina/staging/hiseq05/110224_HISEQ05_0066_B816YKABXX_1606/Data/Intensities/Bustard1.8.0_25-04-2011_sdm/../../../Config/FlowCellId.xml:
      No such file or directory
      Processing sequence files
      /house/sdm/prod/illumina/staging/hiseq05/110224_HISEQ05_0066_B816YKABXX_1606/Data/Intensities/Bustard1.8.0_25-04-2011_sdm/s_3_1_0001_qseq.txt
      /house/sdm/prod/illumina/staging/hiseq05/110224_HISEQ05_0066_B816YKABXX_1606/Data/Intensities/Bustard1.8.0_25-04-2011_sdm/s_3_2_0001_qseq.txt
      Error: Missing cycle 1 for lane 3 tile 1 from CIF files.

      I don't know how illumina2srf knows about cycles - perhaps they are encoded in the cif files? Is there a way that we can (easily) fool illumina2srf and force it to process the lane in a similar way to how we generated our fastqs?

      Thanks in advance!

      Yes, illumina2srf reads the cycle number from the header of each .cif file, so changing the directory structure alone won't fool it. You could try the following Perl script to fix the headers. Pass it the list of .cif files to modify on the command line.

      Code:
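      #!/usr/bin/perl
      
      use strict;
      use warnings;
      
      foreach my $file (@ARGV) {
          # Open .cif file read-write, in binary mode
          open(my $f, '+<', $file) or die "Couldn't open $file for update: $!\n";
          binmode($f);
          my $data;
          # Read the 13-byte header; read() returns undef on error, 0 at EOF
          my $got = read($f, $data, 13);
          die "Couldn't read $file: $!\n" unless defined $got;
          die "$file: header is shorter than 13 bytes\n" unless $got == 13;
          die "$file: does not start with 'CIF'\n" unless substr($data, 0, 3) eq 'CIF';
          # Subtract 1 from cycle number (16-bit little-endian at offset 5)
          my $cycle = unpack('v', substr($data, 5, 2));
          die "$file: cycle number is 0, can't subtract 1\n" if $cycle < 1;
          substr($data, 5, 2) = pack('v', $cycle - 1);
          # Write header back out
          seek($f, 0, 0) or die "Couldn't rewind $file: $!\n";
          # Use "or" here: "print $f $data || die ..." parses as
          # print $f ($data || die ...), so a failed write would go unnoticed
          print {$f} $data or die "Couldn't write to $file: $!\n";
          close($f) or die "Error writing to $file: $!\n";
      }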
      An example of what it does:

      Code:
      $ hexdump -C -n 16 s_1_43.cif
      00000000  43 49 46 01 02 19 00 01  00 81 3d 05 00 2a 00 dc  |CIF.......=..*..|
      $ ./cif_fix.pl s_1_43.cif
      $ hexdump -C -n 16 s_1_43.cif
      00000000  43 49 46 01 02 18 00 01  00 81 3d 05 00 2a 00 dc  |CIF.......=..*..|
      Note that this updates the .cif files in place, so I would strongly recommend backing them up before attempting to run it. Also, there's no guarantee that illumina2srf will work even after doing this. It would depend on whether it finds any other inconsistencies in the data.
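
If you would rather check the cycle numbers without hexdump, something like the following read-only sketch might help. It only decodes the 13-byte header and prints it, and never writes to the file. The cycle offset (bytes 5-6, little-endian) is the one the script above uses. The other field names (version, bytes per value, number of cycles, number of clusters) are my reading of the header layout and should be treated as assumptions, as should the helper name parse_cif_header.

```perl
#!/usr/bin/perl
# Read-only check: print the header fields of each .cif file named on the
# command line. Nothing is modified.

use strict;
use warnings;

# Decode a 13-byte CIF header into a hash reference.
# Only the cycle field (16-bit little-endian at offset 5) is taken from the
# fix-up script; the remaining field names are assumptions about the layout.
sub parse_cif_header {
    my ($header) = @_;
    die "Header is shorter than 13 bytes\n" unless length($header) >= 13;
    my ($magic, $version, $bytes_per_value, $first_cycle, $n_cycles, $n_clusters)
        = unpack('a3 C C v v V', $header);
    die "Header does not start with 'CIF'\n" unless $magic eq 'CIF';
    return {
        version         => $version,
        bytes_per_value => $bytes_per_value,
        first_cycle     => $first_cycle,
        n_cycles        => $n_cycles,
        n_clusters      => $n_clusters,
    };
}

foreach my $file (@ARGV) {
    # Open read-only so the file cannot be changed by accident
    open(my $f, '<', $file) or die "Couldn't open $file: $!\n";
    binmode($f);
    my $header;
    my $got = read($f, $header, 13);
    die "Couldn't read header from $file\n" unless defined $got && $got == 13;
    close($f);

    my $h = parse_cif_header($header);
    printf "%s: cycle %d, %d cycle(s) in file, %d clusters\n",
        $file, $h->{first_cycle}, $h->{n_cycles}, $h->{n_clusters};
}
```

Run it on the same list of .cif files before and after the fix. For the header bytes in the hexdump example above, the cycle field decodes to 25 before the fix (0x19) and 24 after it (0x18).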

      If you can live without the intensity data then an easier solution would be to not use the -b or -r options. Illumina2srf will then ignore the .cif files and will generate a considerably smaller .srf file.
