**rmdavies** · 02-04-2010, 09:05 AM

I have made a new version (2.1.0) of illumina2srf, which is now available on sourceforge. Highlights include:

The -N/-n command line arguments are back, by popular demand. These allow you to change the read name format.
New options -pos/-no_pos have been added. Selecting -pos will make illumina2srf store spot positions as metadata on the BASE chunks.
Byte swapping code has been added to the .cif file reader for big-endian platforms. This means sparc and powerpc owners will now be able to build srf files from RTA runs correctly.
A new program called srf_split_by_tag has been included in the package. This is for use with data that has been tagged with sequence barcodes. It will split a single srf file into a set of files where each output file contains the reads for a single tag.

The new version can be obtained from the project download page.

Rob.

**kmcarr** · 08-09-2010, 11:10 AM

Trouble using srf_split_by_tag

I tried using srf_split_by_tag for the first time today and have encountered an error. I get the following error:

Code:

srf_split_by_tag: srf_split_by_tag.c:118: outfiles_open: Assertion `added != 0' failed.
Abort

This occurs whether I use the defaults (i.e. no arguments other than the SRF file name) or explicitly specify "-u unindexed" and "-d myDir".

index_decoder was first run on the qseq files, followed by illumina2srf. I can extract fastq files from this SRF and everything appears to have been created properly. index_decoder and illumina2srf were from v 2.1.0 of the sequenceread package. I have tested v 2.1.0 and 2.1.1 of srf_split_by_tag with the same result. srf_split_by_tag 2.1.0, as well as the illumina2srf used to build the SRF file, was built against io_lib 1.12.1. srf_split_by_tag 2.1.1 was built against io_lib 1.12.4.

Has anyone else experienced this problem using srf_split_by_tag? Any and all help appreciated.

---------------------------------
Never mind, solution found.

It turns out that the program did not like the '/' I had put in the read name. What tipped me off to this possibility was the following comment in the source code:

Code:

/* Crikey, someone put a / in a tag name.  We need to replace it. */

Who says code comments aren't useful?

**dawe** · 08-09-2010, 11:18 AM

I haven't used srf_split_by_tag so far, but I've realized today that srf2fastq messes up my sequence qualities, attempting to convert them from the stored scale to a mixed sanger/solexa one...

d

**kmcarr** · 08-09-2010, 11:44 AM

By design and default srf2fastq outputs Phred style q-scores using the Sanger scale (Phred+33). Could you describe what you mean by "a mixed sanger/solexa one"?

**dawe** · 08-09-2010, 12:32 PM

Originally posted by kmcarr

By design and default srf2fastq outputs Phred style q-scores using the Sanger scale (Phred+33). Could you describe what you mean by "a mixed sanger/solexa one"?

Mmm... I've been too quick... can you confirm srf2fastq converts automagically from phred64 to phred33 (or from solexa to phred33)?

d

**kmcarr** · 08-09-2010, 01:15 PM

Originally posted by dawe

Mmm... I've been too quick... can you confirm srf2fastq converts automagically from phred64 to phred33 (or from solexa to phred33)?

d

Well, it's really a combination of illumina2srf and srf2fastq.

When illumina2srf creates the SRF file from the *_qseq.txt files, it stores Phred-style q-scores as integers from 0-40; that is, it subtracts 64 from the ASCII value of each quality character in the *_qseq.txt files before storing it in the SRF.

srf2fastq reads the integer based q-score from the SRF file and prints the corresponding ASCII character, but first off-setting from the character '!' (ASCII = 33).

The combination of these two transformations creates the appearance of magic.
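As a rough illustration of the two steps described above, here is a minimal Python sketch. It is not the actual illumina2srf or srf2fastq code: the function names are made up for this example, and it assumes the qseq quality string is Phred+64 encoded.

```python
def qseq_to_stored(qual_string, offset=64):
    """Mimic the illumina2srf step: quality characters -> integer scores.

    Hypothetical helper; assumes Phred+64 characters as found in qseq files.
    """
    return [ord(ch) - offset for ch in qual_string]


def stored_to_fastq(scores, offset=33):
    """Mimic the srf2fastq step: integer scores -> Sanger (Phred+33) characters.

    Hypothetical helper, not the real srf2fastq implementation.
    """
    return "".join(chr(q + offset) for q in scores)


qseq_quals = "hhhgfB@"                  # made-up Phred+64 quality string
stored = qseq_to_stored(qseq_quals)     # integers, as held in the SRF
fastq_quals = stored_to_fastq(stored)   # what ends up in the fastq file

print(stored)       # [40, 40, 40, 39, 38, 2, 0]
print(fastq_quals)  # IIIHG#!
```

The net effect is a shift of 31 in the ASCII value of each character, which is why the output looks like a direct phred64-to-phred33 conversion.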

**dawe** · 08-09-2010, 08:35 PM

Originally posted by kmcarr

Well, it's really a combination of illumina2srf and srf2fastq.

When illumina2srf creates the SRF file from the *_qseq.txt files, it stores Phred-style q-scores as integers from 0-40; that is, it subtracts 64 from the ASCII value of each quality character in the *_qseq.txt files before storing it in the SRF.

srf2fastq reads the integer based q-score from the SRF file and prints the corresponding ASCII character, but first off-setting from the character '!' (ASCII = 33).

Got it! I missed the first step; I didn't know illumina2srf stores phred values (and not q-values).
Thanks.
d

**jkbonfield** · 08-10-2010, 12:14 AM

Illumina certainly have a lot to answer for with the myriad of quality encodings. The sole reason they had +64 was because they were using log-odds (which isn't a bad idea by any means - I rather liked them) and so could get negative values.

Switching to Phred was I guess a business decision to go with the flow, but using phred scale +64 was a total disaster!

For what it's worth, SRF can store either phred or log-odds encodings, but internally it doesn't store these as ASCII. Instead it holds the data in a binary form representing the actual value. This may come from ASCII phred-33, phred-64 or logodds-64 depending on the input. It has a meta-data field to indicate the scale (phred vs logodds).

These days though they seem to be generating purely phred, even for secondary scores which then end up all the same as phred can't cope with that... should have stuck with logodds. Arggh
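For anyone wanting to compare the two scales mentioned above, here is a small Python sketch of the standard relationship between them: phred is -10*log10(p) and log-odds is -10*log10(p/(1-p)), where p is the error probability. The function names are my own and this is not code taken from io_lib.

```python
import math


def logodds_to_phred(q_logodds):
    """Convert a log-odds (old Solexa-style) score to the phred scale."""
    return 10.0 * math.log10(10.0 ** (q_logodds / 10.0) + 1.0)


def phred_to_logodds(q_phred):
    """Convert a phred score to the log-odds scale.

    Undefined for q_phred == 0 (error probability 1), so reject that case.
    """
    if q_phred <= 0:
        raise ValueError("phred score must be greater than 0")
    return 10.0 * math.log10(10.0 ** (q_phred / 10.0) - 1.0)


# Negative log-odds values map to small positive phred values,
# while high scores are nearly identical on both scales.
for q in (-5, 0, 10, 40):
    print(q, round(logodds_to_phred(q), 3))
```

The negative log-odds values are the ones that needed the +64 offset in the first place, since they would otherwise fall below the printable ASCII range.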

**rmdavies** · 08-20-2010, 05:14 AM

Originally posted by kmcarr

I tried using srf_split_by_tag for the first time today and have encountered an error. I get the following error:

Code:

srf_split_by_tag: srf_split_by_tag.c:118: outfiles_open: Assertion `added != 0' failed.
Abort

This was indeed a bug, caused by the use of the wrong variable name in a loop. It has now been fixed, and I have uploaded a new version (2.1.2) of the package which includes the correction.

I have also added a couple of new options to srf_split_by_tag. The -s option can be used to change the separator in the output file names, so:

Code:

srf_split_by_tag 2956_3.srf

will produce output files named:

Code:

2956_3_1.srf
2956_3_2.srf
...etc.

assuming that the tags were imaginatively named 1,2,3 and so on, whereas:

Code:

srf_split_by_tag -s '#' 2956_3.srf

will produce files named:

Code:

2956_3#1.srf
2956_3#2.srf
...etc.

The -e option takes a comma-separated list of tag names. If it is present, then only tags which appear in the list will be split into their own files. Any others will be treated as if they are unindexed.

This can be useful in the case where some of the tags in the list passed to index_decoder were not actually used in the sequencing experiment. When this happens, you often find that a small number of reads match the unused tags due to random base calling errors happening to match the tag sequence. Normally srf_split_by_tag would put these reads in their own files. By using the -e option, they can instead be put in with all the other tags that can't be decoded. For example, if 2956_3.srf contained tags 1 to 6, but only 1 to 3 were for real samples,

Code:

srf_split_by_tag -u 0 -e 1,2,3 2956_3.srf

will produce the following files:

Code:

2956_3_0.srf
2956_3_1.srf
2956_3_2.srf
2956_3_3.srf

2956_3_1.srf, 2956_3_2.srf and 2956_3_3.srf will contain the reads for tags 1, 2 and 3 respectively. 2956_3_0.srf will contain all of the reads where the tag could not be decoded along with any that matched the unwanted tags 4, 5 and 6.
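The naming rules described above can be summarised in a short Python sketch. This only models the behaviour as explained in this post; it is not taken from the srf_split_by_tag source, and the function name and parameters (output_name, wanted, unindexed, sep) are hypothetical stand-ins for the -e, -u and -s options.

```python
import os


def output_name(srf_path, tag, wanted, unindexed, sep="_"):
    """Return the output file name a read with the given tag would go to.

    tag       -- decoded tag name, or None if the tag could not be decoded
    wanted    -- set of tag names given via -e, or None if -e was not used
    unindexed -- name used for undecoded/unwanted reads (the -u value)
    sep       -- separator between base name and tag (the -s value)
    """
    base, ext = os.path.splitext(os.path.basename(srf_path))
    # Undecoded tags, and tags missing from the -e list, count as unindexed.
    if tag is None or (wanted is not None and tag not in wanted):
        tag = unindexed
    return f"{base}{sep}{tag}{ext}"


# Tags 1 to 6 were passed to index_decoder, but only 1 to 3 are real samples.
tags_seen = ["1", "2", "3", "4", "5", "6", None]
names = sorted({output_name("2956_3.srf", t, {"1", "2", "3"}, "0")
                for t in tags_seen})
print(names)
```

With these inputs the sketch yields the same four file names as the listing above, with tags 4, 5 and 6 folded into the "0" file.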

As usual, the new version can be obtained from the sourceforge downloads area.

**Awesome** · 12-20-2010, 12:18 PM

This program is a lifesaver.
However, I get this error repeatedly when I try to extract .srf files generated by your program:

Zero or greater than one CNF chunks found.

Another error occurs repeatedly when using Illumina's GA 1.5.1 srf2illumina:

WARNING: can't find expected pos information in read

Any idea what causes this?
Also, which program is best to extract these v2 .srf files? Illumina's srf2illumina? or the staden io_lib one? or what?

I found this: http://seqanswers.com/forums/showthread.php?t=4101
Which says use the "-c" command, which none of my srf2illumina executables possess.

Any thoughts?

**rmdavies** · 12-21-2010, 10:03 AM

The -c option would be for srf2fastq in the io_lib package. In fact, the latest version of srf2fastq can work out which quality values are present by itself, so it should work correctly without the -c now.

Unfortunately srf2illumina has been unsupported for a while now, so it has fallen a bit behind the times. It should be possible to tweak the Illumina version so that it works with the newer files, but I expect the output would be for a very old version of the Illumina pipeline. Fixing that would take much more effort.

What do you want to use srf2illumina for? If you just need fastq files, then srf2fastq is a much better way to go.

**Awesome** · 12-21-2010, 02:04 PM

I need to be able to store and retrieve base calls, quality scores, intensities, and noises. That is why I'm concerned with srf2illumina.

**rmdavies** · 12-22-2010, 07:14 AM

You can get the intensity and noise values out using srf_dump_all, for example:

srf_dump_all -c int -t solexa myfile.srf

will dump out all of the intensity data. The format is from an ancient version of the Illumina pipeline, i.e. lane, tile, x, y (derived from the read name) followed by the intensity data in groups of four numbers (for A, C, G and T). The groups are separated by tabs. You can change the -c parameter to get different data types (nse for noise, sig2 for processed intensities), if they are present.

It isn't an ideal solution, but it does give a fairly easy way of getting at the data.
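If you want to post-process that output, something along these lines might be a starting point. It is only a sketch under assumptions: it takes the layout to be tab-separated lane, tile, x and y fields followed by tab-separated groups, with the four numbers inside each group separated by whitespace. Check this against your own srf_dump_all output before relying on it. The function name and the sample line are made up.

```python
def parse_intensity_line(line):
    """Parse one line of intensity output into a dictionary.

    Assumed layout (verify against real output):
    lane<TAB>tile<TAB>x<TAB>y<TAB>A C G T<TAB>A C G T ...
    """
    fields = line.rstrip("\n").split("\t")
    lane, tile = int(fields[0]), int(fields[1])
    x, y = float(fields[2]), float(fields[3])

    cycles = []
    for group in fields[4:]:
        values = [float(v) for v in group.split()]
        if len(values) != 4:
            raise ValueError(f"expected 4 values per cycle, got {group!r}")
        # One group per cycle, in A, C, G, T order.
        cycles.append(dict(zip("ACGT", values)))

    return {"lane": lane, "tile": tile, "x": x, "y": y, "cycles": cycles}


# Made-up example line with two cycles of data.
sample = "3\t1\t100\t200\t10.5 -2.0 3.0 0.0\t1.0 2.0 3.0 4.0"
record = parse_intensity_line(sample)
print(record["lane"], record["tile"], len(record["cycles"]))
print(record["cycles"][0]["A"])
```

The same function should work for the nse and sig2 outputs if they follow the same layout, since only the meaning of the numbers changes.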

**sramshey** · 04-28-2011, 10:33 AM

Running illumina2srf after removing cycle(s)

Hello-

I have a question regarding the use of the script illumina2srf. We recently had a HiSeq run in which the first cycle did not contain any data (clogged fluidics?). Illumina technical support advised us that we could improve the overall quality of our data for the lane in question by removing the first cycle. This involved removing the data folder in <run folder>/Data/Intensities/<lane>/C1.1, renaming the folders for all of the subsequent cycles, editing the config.xml in the Intensities folder to reflect the changes, and then repeating the entire procedure for the control lane as well. Following these steps we were able to generate fastq files, but when we attempt to run illumina2srf to generate our srf files we encounter an error indicating that cycle 1 is missing from our renumbered tiles:

/house/sdm/prod/illumina/staging/hiseq05/110224_HISEQ05_0066_B816YKABXX_1606/Data/Intensities/Bustard1.8.0_25-04-2011_sdm/../../../Config/FlowCellId.xml:
No such file or directory
Processing sequence files
/house/sdm/prod/illumina/staging/hiseq05/110224_HISEQ05_0066_B816YKABXX_1606/Data/Intensities/Bustard1.8.0_25-04-2011_sdm/s_3_1_0001_qseq.txt
/house/sdm/prod/illumina/staging/hiseq05/110224_HISEQ05_0066_B816YKABXX_1606/Data/Intensities/Bustard1.8.0_25-04-2011_sdm/s_3_2_0001_qseq.txt
Error: Missing cycle 1 for lane 3 tile 1 from CIF files.

I don't know how illumina2srf knows about cycles - perhaps they are encoded in the cif files? Is there a way that we can (easily) fool illumina2srf and force it to process the lane in a similar way to how we generated our fastqs?

Thanks in advance!

New illumina2srf available on sourceforge
