**GenoMax** · 10-11-2011, 05:03 AM

If you are going to use the new version of CASAVA (v.1.8.x) then the fastq conversion and de-multiplexing are done with a single command starting with the BCL files. You will need access to the entire flowcell folder for this to work.

Minimally the process will be something like this:

configureBclToFastq.pl --input-dir provide_location_to_Basecalls_dir --sample-sheet Location_of_SampleSheet.csv

You appear to have access to the illumina software so you should be able to download the relevant manuals in PDF format. Since there are many options for the above command that could be relevant in your specific case it would be best to refer to the CASAVA manual for detailed help.

PS: "qseq" files are no longer produced by the new version of CASAVA. You will get "fastq" format sequence files with Sanger encoding for the quality calls. By default, all sequences (including those that would fail the quality filter) are included in these files. Look for other threads on this forum for discussions of this issue.
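
Because failed-filter reads can end up in the fastq files, it may help to drop them afterwards. The following is a minimal Python sketch, not part of CASAVA. It assumes the CASAVA 1.8 style header, where the text after the space reads `<read>:<is_filtered>:<control>:<index>` and `Y` marks a read that failed the filter. The instrument and flowcell names in the example are made up.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, quality


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group an iterable of FASTQ lines into 4-line records."""
    block: List[str] = []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) == 4:
            yield (block[0], block[1], block[2], block[3])
            block = []


def passed_filter(header: str) -> bool:
    """Return True unless the header carries the 'Y' (filtered) flag.

    Assumes a header of the form '@<id> <read>:<is_filtered>:<control>:<index>'.
    Headers without that second part are kept, since no flag is available.
    """
    parts = header.split(" ", 1)
    if len(parts) < 2:
        return True
    flags = parts[1].split(":")
    return len(flags) < 2 or flags[1] != "Y"


def keep_passing(lines: Iterable[str]) -> List[Record]:
    """Return only the records whose reads passed the quality filter."""
    return [rec for rec in read_fastq(lines) if passed_filter(rec[0])]


# Two made-up records: the first passed the filter, the second did not.
example = [
    "@HWI-ST0001:1:FC0001:1:1101:1000:2000 1:N:0:ATCACG",
    "ACGT", "+", "IIII",
    "@HWI-ST0001:1:FC0001:1:1101:1001:2001 1:Y:0:ATCACG",
    "NNNN", "+", "####",
]
kept = keep_passing(example)
print(len(kept), "of 2 reads kept")
```

For a real file, pass an open file handle to `keep_passing` (or iterate over `read_fastq` to avoid holding everything in memory).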

**giampe** · 10-11-2011, 06:13 AM

Hi Genomax,
thanks for your quick reply, but according to the CASAVA 1.7 user guide (Rev A, PDF), the .bcl converter is not included in CASAVA.
Where can we find the configureBclToFastq.pl script? Is it in the CASAVA package?
We have setupBclToQseq.py in Off-Line Basecaller v1.9, but we are not able to create its input files with the bustard.py script as the user guide describes.

Thanks a lot!

**GenoMax** · 10-11-2011, 07:20 AM

It sounds like you are going to stick with CASAVA v.1.7 for this processing (instead of v.1.8.x, which was the info I had provided before, so please ignore that info).

In that case, this will be a two-step process. In the first step you will convert the BCL files to qseq files. This will be followed by the actual de-multiplexing.

The following assumes that you have the entire flowcell folder available; otherwise this will not work.

While in the "Basecalls" directory you can issue the following command to do step 1 of the process (bcl to qseq conversion).

setupBclToQseq.py -b . -o . -P .clocs --in-place

Run "make/distmake" to actually run the bcl conversion in the Basecalls directory after executing setupBclToQseq.py command.

After this conversion is complete, you can do step 2 (de-multiplexing). You will need to provide a "SampleSheet.csv" file that has the info about tags you have used. It would be best to refer to the manual for the exact format of this file. Remember not to use any spaces (and/or special characters) in sample names. The actual command to do the de-multiplexing is below:

demultiplex.pl --input-dir /Path_to/Basecalls_directory --sample-sheet /path_to/SampleSheet.csv --alignment-config /path_to/config.template.txt --qseq-mask "Replace_with_correct_qseq_mask_code"

You can omit the --qseq-mask option, and the command will determine this information automatically.

A "Demultiplexed" directory will be created in the "Basecalls" directory after running the demultiplex.pl command. You will need to change to "Demultiplexed" directory and execute the "make/distmake" equivalent commands to complete the demultiplexing process.

The *qseq* files will be distributed in "bins" labelled 001 .. 0xx, depending on the number of indexes in your samples. At the end of the demultiplexing process you will find a SamplesDirectories.csv file in the "Demultiplexed" directory that provides a "key" to where your samples are located in the "bin" directories.

Note: Each of these processes could take several hours to complete (depending on how many clusters you had in the lanes), so you will need to be patient. You can use multiple CPUs by providing the appropriate switch to make (or to the SGE/distmake process).
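
For readers who want to see what a qseq record contains, here is a small Python sketch that turns one qseq line into a fastq record. This is not the CASAVA route described above (GERALD remains the supported way to get sequence files); it is only an illustration. It assumes the usual 11 tab-separated qseq fields, Phred+64 quality characters, and "." for an uncalled base. The read name layout and the example values are my own choices.

```python
def qseq_passed_filter(line: str) -> bool:
    """The last qseq field is 1 when the read passed the filter, 0 otherwise."""
    return line.rstrip("\n").split("\t")[-1] == "1"


def qseq_to_fastq(line: str) -> str:
    """Convert one qseq line to a 4-line FASTQ record with Sanger qualities."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    machine, run, lane, tile, x, y, index, read, seq, qual, _flag = fields

    # Read name layout is an illustrative choice, not mandated by CASAVA.
    name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read}"

    # qseq writes '.' for an uncalled base; FASTQ convention is 'N'.
    seq = seq.replace(".", "N")

    # Shift Phred+64 characters down to Phred+33 (Sanger), never below '!'.
    sanger = "".join(chr(max(33, ord(c) - 31)) for c in qual)

    return f"@{name}\n{seq}\n+\n{sanger}\n"


# A made-up qseq line for illustration.
example_line = "\t".join(
    ["HWUSI-EAS100R", "6", "1", "1", "1000", "2000",
     "ATCACG", "1", "AC.T", "hhBh", "1"]
)
print(qseq_to_fastq(example_line), end="")
```

Reads whose last field is 0 failed the filter; `qseq_passed_filter` can be used to skip them before conversion.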

**giampe** · 10-12-2011, 06:54 AM

Hi Genomax,
thanks for your helpful suggestions. Sorry, but we are biologists without strong informatics skills, so we have attached a PDF file showing the structure of our Linux server. Could you take a look at this file and check whether the software and data folders are in the correct locations?

Following your suggestion, we have launched the command like this:

[serlab-carso:bin]# ./setupBclToQseq.py -b /srv/illumina/Runs/111006_H112_0131_AB0B0VABXX/Data/Intensities/BaseCalls/ -o --in-place -P .clocs
INFO:setupBclToQseq:setupBclToQseq.py version 1.9.0
INFO:setupBclToQseq:Creating output directory /root/OLB_1.9/OLB-1.9.0/bin/--in-place
INFO:setupBclToQseq:Configuring /root/OLB_1.9/OLB-1.9.0/share/makefiles/bclToQseq/Makefile to /root/OLB_1.9/OLB-1.9.0/bin/--in-place/Makefile
INFO:setupBclToQseq:Creating the 'Makefile.config'
INFO:setupBclToQseq:Output directory successfully initialized. Type 'make' in /root/OLB_1.9/OLB-1.9.0/bin/--in-place to start the conversion

and we obtained qseq.txt files, as you can see in the PDF file.
But now the second step, demultiplexing, doesn't work! Why? Do you have an explanation?
Sorry, I realize we are making too many requests, but at the moment you are the only person giving us help!

Attached Files

helpgenomax (2).pdf (478.6 KB, 49 views)

**GenoMax** · 10-12-2011, 10:48 AM

I am glad that at least part 1 has worked correctly.

Based on the error you attached it appears that your samplesheet file may not be formatted correctly.

Is it in "comma separated value (csv)" format? If you are creating this file on a Windows machine and then moving it to your server, use the "dos2unix" utility on your Unix server to convert it from DOS format to Unix format.

Make sure you have no spaces or special characters (things like $, #, @) anywhere in the samplesheet file. Replacing spaces with "_" (underscore) works well.
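
The two checks above (DOS line endings, spaces or special characters) can be sketched in Python as follows. This is only a rough pre-flight check under my own assumptions: the set of "safe" characters (letters, digits, underscore, hyphen, dot) is a guess rather than something taken from the CASAVA manual, and the column names and values in the example are hypothetical.

```python
import re
from typing import List

# Characters assumed to be safe in a sample sheet field (an assumption,
# not a rule quoted from the CASAVA manual).
SAFE_FIELD = re.compile(r"[A-Za-z0-9_.-]*")


def check_sample_sheet(raw: bytes) -> List[str]:
    """Return a list of human-readable problems found in the sample sheet."""
    problems: List[str] = []
    if b"\r" in raw:
        problems.append("DOS line endings found; run dos2unix on the file")
    text = raw.decode("ascii", errors="replace")
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, field in enumerate(line.split(","), start=1):
            if not SAFE_FIELD.fullmatch(field):
                problems.append(
                    f"line {line_no}, column {col_no}: unsafe value {field!r}"
                )
    return problems


def sanitize(name: str) -> str:
    """Replace every character outside the safe set with an underscore."""
    return re.sub(r"[^A-Za-z0-9_.-]", "_", name.strip())


# Hypothetical sample sheet saved on Windows, with a space and a '#'.
raw_sheet = b"FCID,Lane,SampleID\r\nFC0001,1,my sample#1\r\n"
for problem in check_sample_sheet(raw_sheet):
    print(problem)
print(sanitize("my sample#1"))
```

An empty list from `check_sample_sheet` only means these two checks passed; the manual remains the reference for the required columns.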

**giampe** · 10-13-2011, 05:50 AM

Dear GenoMax,
thanks for your suggestion; the problem with the demultiplexing command was indeed in the sample sheet.csv.
We have now obtained output directories from the demultiplex.pl command in the Demultiplexed folder, such as the 001 directory shown in the PDF file, but we don't understand which directory corresponds to which of our sample libraries (our sample sheet.csv is attached), and the files still seem to be in qseq.txt format rather than fastq format.
How do we get a single fastq.txt file (4 lines per sequence) for each of our samples?

Sorry for so many requests!

**GenoMax** · 10-13-2011, 07:31 AM

Giampe,

There should be a SamplesDirectories.csv file created in the "Demultiplexed" directory after the demultiplexing step completion that will tell you which "bin" (001, 002 etc) each sample was put in. Look for that info in the last column.
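
As a rough illustration of reading that key, the Python sketch below maps each sample to the value in the last column. I am assuming the file has a header row and a column holding the sample name; the column names and rows in the example are hypothetical, so check them against your own SamplesDirectories.csv.

```python
import csv
import io
from typing import Dict


def sample_to_bin(csv_text: str, sample_column: str) -> Dict[str, str]:
    """Map each sample name to the bin given in the last column of its row.

    Assumes the first row is a header containing `sample_column`.
    If a sample appears on several rows, the last row wins.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, body = rows[0], rows[1:]
    sample_idx = header.index(sample_column)
    return {row[sample_idx]: row[-1] for row in body}


# Hypothetical file content; real column names may differ.
example_csv = (
    "FCID,Lane,SampleID,Index,Directory\n"
    "FC0001,1,liver_A,ATCACG,001\n"
    "FC0001,1,liver_B,CGATGT,002\n"
)
bins = sample_to_bin(example_csv, "SampleID")
print(bins)
```

With a real file, read its text with `open(path).read()` and pass the name of whichever column holds your sample IDs.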

You will need to run at least a "sequence"-only analysis to get the sequence files. This is specified in the "config.template.txt" file. Again, check the manual or send an example of the file you used.

There should be a "GERALD_*" directory in each of the bins (001, 002 etc). That directory will contain final sequence files. Unfortunately they will be called s_*_sequence.txt, so you will need to appropriately rename them (we rename with sample name/tag info) before you copy them out of each bin/GERALD* dir.
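
The renaming step can be planned with a short Python sketch like the one below. It only builds a list of (old path, new path) pairs and does not move anything. The naming scheme (sample name prefixed to the original file name) and the example bin-to-sample mapping are my own assumptions, not what we use verbatim.

```python
import glob
import os
from typing import Dict, List, Tuple


def plan_renames(
    demux_dir: str, bin_to_sample: Dict[str, str]
) -> List[Tuple[str, str]]:
    """List (source, destination) pairs for the s_*_sequence.txt files.

    Looks in <demux_dir>/<bin>/GERALD_*/ for each bin and prefixes the
    sample name to the file name, keeping the file in the same directory.
    """
    plan: List[Tuple[str, str]] = []
    for bin_name, sample in sorted(bin_to_sample.items()):
        pattern = os.path.join(
            demux_dir, bin_name, "GERALD_*", "s_*_sequence.txt"
        )
        for src in sorted(glob.glob(pattern)):
            base = os.path.basename(src)  # e.g. s_1_sequence.txt
            dst = os.path.join(os.path.dirname(src), f"{sample}_{base}")
            plan.append((src, dst))
    return plan


if __name__ == "__main__":
    # Hypothetical mapping; build the real one from SamplesDirectories.csv.
    for src, dst in plan_renames("Demultiplexed", {"001": "liver_A"}):
        print(src, "->", dst)
```

After reviewing the printed plan, each pair can be applied with `os.rename(src, dst)`.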

**giampe** · 10-13-2011, 10:08 AM

Hi GenoMax,
OK, we have found a SamplesDirectories.csv file in the "Demultiplexed" directory, where we can see six 00_ directories, each with several qseq.txt files. Some of these files are empty (0 KB), and we noticed that some qseq.txt files in the same directory have the same lane number and the same barcode. Is there more than one file for each sample?
How do we run the "sequence"-only analysis to get the sequence files? We don't see the "config.template.txt" file or the GERALD_ directory. Where are they?

Thank you again !

**GenoMax** · 10-13-2011, 11:20 AM

Here is the relevant bit of info I had originally included with the command line for demultiplex.pl. You have to provide the configuration file for creating the final sequence files.

--alignment-config /path_to/config.template.txt

This configuration file is for GERALD where you will specify that you want a sequence only analysis (ANALYSIS sequence). You will find exact information about how to format this file in the manual (page 23 of CASAVA v.1.7 manual).

Please re-run the demultiplex.pl step with this command line option (providing the config file) to get the actual sequence files. You will need to specify an additional option for your "make" command as follows: "make -j no_of_cpu ALIGN=yes" (this is required to get GERALD to run).
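
To avoid path mistakes, the two command lines can be assembled with a small Python helper such as this sketch. It uses only the switches already mentioned in this thread and prints the commands without running them. The function names and example paths are hypothetical.

```python
import os
import shlex
from typing import List


def demultiplex_command(
    basecalls_dir: str, sample_sheet: str, alignment_config: str
) -> List[str]:
    """Build the demultiplex.pl argument list with absolute paths."""
    return [
        "demultiplex.pl",
        "--input-dir", os.path.abspath(basecalls_dir),
        "--sample-sheet", os.path.abspath(sample_sheet),
        "--alignment-config", os.path.abspath(alignment_config),
    ]


def make_command(cpus: int) -> List[str]:
    """Build the make call that also triggers the GERALD step."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return ["make", "-j", str(cpus), "ALIGN=yes"]


def as_shell(args: List[str]) -> str:
    """Quote the arguments so the line can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical paths, for illustration only.
print(as_shell(demultiplex_command(
    "BaseCalls", "SampleSheet.csv", "config.template.txt")))
print(as_shell(make_command(4)))
```

Run the first line from anywhere, then change to the "Demultiplexed" directory before running the make line.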

**giampe** · 10-14-2011, 03:37 AM

Hi Genomax,
we are frustrated! After providing a config.template.txt to the demultiplexing command we still haven't obtained the expected result; moreover, it now returns a different error message. There must be something wrong in our config.template.txt file. We are sending you our samplesheet file; could you write a config.template.txt file for us? We have read page 24 of the CASAVA manual, but the explanation of the formatting is confusing to us. We want to perform the ANALYSIS sequence for all samples.
Another question: in which folder should we put the config.template.txt file?

Sorry, and thanks for your help; we hope for a quick reply!

P.S. you can also send information to my email address: [email protected]
or skype account: giampe79

Attached Files

sample_sheet.txt (1.9 KB, 15 views)

**GenoMax** · 10-20-2011, 08:26 AM

I am sorry I did not see your last message till just now. Let me have a look and I will respond.

Note: See the response below. I will attach a config.txt file to it soon.

**GenoMax** · 10-20-2011, 08:45 AM

Try using the attached samplesheet file. I have already converted it into Unix format. I had to "gzip" it, so you will need to unzip it before using it.

Both files can be in any location. Just provide the full path to the respective files for corresponding command line switches (if not present in the local directory) when you run the demultiplex.pl command.

convert base call files (*.bcl) into files (*_qseq.txt)