  • Convert base call files (*.bcl) into qseq files (*_qseq.txt)

    We have a set of data files from a multiplexed sequencing run on a HiScan SQ machine, and we now need to obtain a FASTQ file in text format for each of our samples. Could someone describe a step-by-step command-line process for doing this? This is our first experience with this kind of data.

    So far we have installed the OLB software and launched the command as described in the Off-Line Basecaller v1.9 user guide (Nov. 2010):
    ./bustard.py --CIF /srv/illumina/Runs/111006_H112_0131_AB0B0VABXX/Data/Intensities/ --make --with-qseq

    At this point we don't know whether the output directories contain the right files in the right format.

    Thanks for your time and help.

  • #2
    If you are going to use the new version of CASAVA (v1.8.x), then FASTQ conversion and demultiplexing are done with a single command starting from the BCL files. You will need access to the entire flowcell folder for this to work.

    Minimally the process will be something like this:

    configureBclToFastq.pl --input-dir provide_location_to_Basecalls_dir --sample-sheet Location_of_SampleSheet.csv

    You appear to have access to the Illumina software, so you should be able to download the relevant manuals in PDF format. Because the command above has many options that could be relevant to your specific case, it is best to refer to the CASAVA manual for detailed help.

    PS: "qseq" files are no longer produced by the new version of CASAVA. You will get "fastq" format sequence files with Sanger encoding for quality scores. By default, all sequences (including those that fail the quality filter) are included in these files. Look for other threads on this forum for discussions of this issue.
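If you want to drop the reads that failed the filter from CASAVA 1.8 output, a minimal Python sketch follows. It assumes the CASAVA 1.8 read header layout "@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<is_filtered>:<control>:<index>", where "Y" in the is_filtered field marks a read that failed the filter. The function names are hypothetical; check the header format against your own files before relying on it.

```python
import gzip
import sys
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group an iterable of lines into 4-line FASTQ records."""
    it = iter(lines)
    while True:
        rec = [line.rstrip("\n") for line in islice(it, 4)]
        if not rec:
            return
        if len(rec) < 4:
            raise ValueError(f"truncated FASTQ record: {rec[0]}")
        yield rec


def passes_filter(header: str) -> bool:
    """False only when the (assumed) CASAVA 1.8 flag field reads <read>:Y:..."""
    _, _, flags = header.partition(" ")
    fields = flags.split(":")
    return not (len(fields) > 1 and fields[1] == "Y")


def filter_fastq(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield only the records whose header says they passed the filter."""
    for rec in fastq_records(lines):
        if passes_filter(rec[0]):
            yield rec


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as src:
        for rec in filter_fastq(src):
            sys.stdout.write("\n".join(rec) + "\n")
```

Headers without a space-separated flag field are kept unchanged, so the script is harmless on files that do not follow the assumed layout.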
    Last edited by GenoMax; 10-11-2011, 05:07 AM.


    • #3
      Hi Genomax,
      thanks for your quick reply. According to the CASAVA 1.7 user guide (Rev A, PDF), the .bcl converter is not included in CASAVA.
      Where can we find the configureBclToFastq.pl script? Is it in the CASAVA package?
      We have setupBclToQseq.py in Off-Line Basecaller v1.9, but we are not able to create its input files with the bustard.py script as the user guide describes.

      Thanks a lot!


      • #4
        It sounds like you are going to stick with CASAVA v1.7 for this processing rather than v1.8.x, which my earlier reply assumed, so please ignore that information.

        In that case, this will be a two-step process. In the first step you convert the BCL files to qseq files; this is followed by the actual demultiplexing.

        The following assumes that you have the entire flowcell folder available; otherwise this will not work.

        While in the "Basecalls" directory, you can issue the following command to do step 1 of the process (BCL to qseq conversion).

        setupBclToQseq.py -b . -o . -P .clocs --in-place

        After executing the setupBclToQseq.py command, run "make" (or "distmake") in the Basecalls directory to actually perform the BCL conversion.
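This step can also be scripted. The Python sketch below only assembles the flags shown above (-b, -o, -P .clocs, --in-place) and then runs make; the function name and the use of absolute paths are assumptions for illustration. Keeping each flag and its value as separate list items guards against the mistake visible in post #5, where "-o --in-place" made the script create an output directory literally named "--in-place".

```python
import os
import shlex
import subprocess
import sys
from typing import List


def setup_bcl_to_qseq_cmd(basecalls_dir: str, output_dir: str,
                          in_place: bool = True) -> List[str]:
    """Build the setupBclToQseq.py argument list (hypothetical helper).

    Every flag is followed by its own value, so "-o" can never
    swallow the next flag as its output directory.
    """
    cmd = ["setupBclToQseq.py",
           "-b", os.path.abspath(basecalls_dir),
           "-o", os.path.abspath(output_dir),
           "-P", ".clocs"]
    if in_place:
        cmd.append("--in-place")
    return cmd


if __name__ == "__main__" and len(sys.argv) > 1:
    basecalls = sys.argv[1]  # path to the BaseCalls directory
    cmd = setup_bcl_to_qseq_cmd(basecalls, basecalls)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
    # The conversion itself is done by make in the BaseCalls directory.
    subprocess.run(["make"], cwd=basecalls, check=True)
```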

        After this conversion is complete, you can do step 2 (demultiplexing). You will need to provide a "SampleSheet.csv" file that contains the information about the tags you used. It is best to refer to the manual for the exact format of this file. Remember not to use any spaces (or special characters) in sample names. The actual command to do the demultiplexing is:

        demultiplex.pl --input-dir /Path_to/Basecalls_directory --sample-sheet /path_to/SampleSheet.csv --alignment-config /path_to/config.template.txt --qseq-mask "Replace_with_correct_qseq_mask_code"

        You can omit --qseq-mask, in which case the command will determine this information automatically.
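A small helper can assemble this command with absolute paths, so it works regardless of the directory you start from. This is a hypothetical Python sketch that uses only the switches shown above; --qseq-mask is added only when a mask is supplied, mirroring the note that demultiplex.pl can work it out by itself.

```python
import os
import shlex
import sys
from typing import List, Optional


def demultiplex_cmd(basecalls_dir: str, sample_sheet: str,
                    alignment_config: Optional[str] = None,
                    qseq_mask: Optional[str] = None) -> List[str]:
    """Build a demultiplex.pl argument list (hypothetical helper)."""
    cmd = ["demultiplex.pl",
           "--input-dir", os.path.abspath(basecalls_dir),
           "--sample-sheet", os.path.abspath(sample_sheet)]
    if alignment_config is not None:
        cmd += ["--alignment-config", os.path.abspath(alignment_config)]
    if qseq_mask is not None:  # leave out to let demultiplex.pl decide
        cmd += ["--qseq-mask", qseq_mask]
    return cmd


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: python demux_cmd.py BASECALLS_DIR SAMPLESHEET [CONFIG]
    print(shlex.join(demultiplex_cmd(*sys.argv[1:4])))
```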

        A "Demultiplexed" directory will be created in the "Basecalls" directory after you run the demultiplex.pl command. You will need to change to the "Demultiplexed" directory and run "make" (or the equivalent "distmake" command) to complete the demultiplexing process.

        The *qseq* files will be distributed in "bins" labelled 001 to 0xx, depending on the number of indexes in your samples. At the end of the demultiplexing process, a SamplesDirectories.csv file is created in the "Demultiplexed" directory; it provides a "key" to where your samples are located in the "bin" directories.

        Note: Each of these processes could take several hours to complete (depending on how many clusters you had in the lanes), so you will need to be patient. You can use multiple CPUs by providing the appropriate switch to make (e.g. -j) or to the SGE/distmake process.
        Last edited by GenoMax; 10-11-2011, 07:31 AM.


        • #5
          Hi Genomax,
          thanks for your helpful suggestions. Sorry, but we are biologists without strong computing skills, so we have attached a PDF file showing the structure of our Linux server. Could you take a look at it and check whether the software and data folders are in the correct locations?

          Following your suggestion, we have run the command as follows:

          [serlab-carso:bin]# ./setupBclToQseq.py -b /srv/illumina/Runs/111006_H112_0131_AB0B0VABXX/Data/Intensities/BaseCalls/ -o --in-place -P .clocs
          INFO:setupBclToQseq:setupBclToQseq.py version 1.9.0
          INFO:setupBclToQseq:Creating output directory /root/OLB_1.9/OLB-1.9.0/bin/--in-place
          INFO:setupBclToQseq:Configuring /root/OLB_1.9/OLB-1.9.0/share/makefiles/bclToQseq/Makefile to /root/OLB_1.9/OLB-1.9.0/bin/--in-place/Makefile
          INFO:setupBclToQseq:Creating the 'Makefile.config'
          INFO:setupBclToQseq:Output directory successfully initialized. Type 'make' in /root/OLB_1.9/OLB-1.9.0/bin/--in-place to start the conversion

          and we obtained qseq.txt files, as you can see in the PDF file.
          But now the second step, demultiplexing, doesn't work. Why? Do you have any explanation?
          Sorry, I realize we are making a lot of requests, but at the moment you are the only person helping us!

          • #6
            I am glad that at least part 1 has worked correctly.

            Based on the error you attached, it appears that your sample sheet file may not be formatted correctly.

            Is it in comma-separated values (CSV) format? If you create this file on a Windows machine and then move it to your server, use the "dos2unix" utility on the Unix server to convert it from DOS to Unix format.

            Make sure there are no spaces or special characters (such as $, #, @) anywhere in the sample sheet file. Replacing spaces with "_" (underscore) works well.
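The two checks above (DOS line endings and stray spaces or special characters) can be automated. The Python sketch below is an illustrative stand-in, not an Illumina tool: it converts CRLF/CR endings to LF as dos2unix would, replaces spaces inside fields with underscores, and reports any field still containing characters outside a conservative set (letters, digits, "_", "-", ".") that was chosen here as an assumption.

```python
import csv
import re
import sys
from typing import List, Tuple

# Assumption: a "safe" field uses only letters, digits, '_', '-' and '.';
# anything else ($, #, @, commas, ...) is reported for manual review.
SAFE_FIELD = re.compile(r"^[A-Za-z0-9_.\-]*$")


def clean_sample_sheet(text: str) -> Tuple[str, List[str]]:
    """Return (unix_text, problems) for a sample sheet's contents.

    unix_text has CRLF/CR line endings converted to LF and spaces inside
    fields replaced by '_' (leading/trailing whitespace is stripped).
    problems lists fields that still contain characters outside SAFE_FIELD.
    """
    unix_text = text.replace("\r\n", "\n").replace("\r", "\n")
    problems: List[str] = []
    cleaned_rows: List[str] = []
    for line_no, row in enumerate(csv.reader(unix_text.splitlines()), start=1):
        cleaned = [field.strip().replace(" ", "_") for field in row]
        for field in cleaned:
            if not SAFE_FIELD.match(field):
                problems.append(f"line {line_no}: {field!r}")
        cleaned_rows.append(",".join(cleaned))
    return "\n".join(cleaned_rows) + "\n", problems


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: python check_sheet.py IN.csv OUT.csv
    with open(sys.argv[1], newline="") as fh:  # keep \r so it can be removed
        fixed, issues = clean_sample_sheet(fh.read())
    for issue in issues:
        print("check", issue, file=sys.stderr)
    with open(sys.argv[2], "w", newline="\n") as out:
        out.write(fixed)
```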




            • #7
              Dear GenoMax,
              thanks for your suggestion; the problem with the demultiplexing command was indeed in the sample sheet CSV file.
              The demultiplex.pl command has now produced output directories in the Demultiplexed folder, such as the 001 directory shown in the PDF file, but we don't understand the order in which our sample libraries appear (our sample sheet CSV is attached), and the files still seem to be in qseq.txt format rather than FASTQ.
              How do we get a single fastq.txt file (4 lines per sequence) for each of our samples?

              Sorry for so many requests!
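The route recommended in this thread is GERALD, but a qseq file can also be converted directly. The Python sketch below is a swapped-in alternative, not part of CASAVA. It assumes the usual 11 tab-separated qseq columns (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag with 1 = passed), Phred+64 quality characters, and "." for uncalled bases; the FASTQ header layout it writes is one common convention, chosen here as an assumption.

```python
import sys
from typing import Iterable, Iterator, List


def qseq_to_fastq(lines: Iterable[str], sanger: bool = True,
                  passed_only: bool = True) -> Iterator[List[str]]:
    """Convert qseq lines into 4-line FASTQ records (header, seq, '+', qual)."""
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:
            continue  # skip blank or malformed lines
        machine, run, lane, tile, x, y, index, read, seq, qual, passed = f[:11]
        if passed_only and passed != "1":
            continue  # drop reads that failed the filter
        seq = seq.replace(".", "N")  # qseq writes uncalled bases as '.'
        if sanger:  # Phred+64 -> Phred+33 (subtract 31 from each code point)
            qual = "".join(chr(ord(c) - 31) for c in qual)
        header = f"@{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read}"
        yield [header, seq, "+", qual]


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (hypothetical): python qseq2fastq.py FILE_qseq.txt ... > sample.fastq
    for path in sys.argv[1:]:
        with open(path) as src:
            for rec in qseq_to_fastq(src):
                sys.stdout.write("\n".join(rec) + "\n")
```

Pass only the qseq files for the sequence read of one sample; if the index read was written to its own qseq files, those should not be mixed in.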

              • #8
                Giampe,

                Once the demultiplexing step completes, there should be a SamplesDirectories.csv file in the "Demultiplexed" directory that tells you which "bin" (001, 002, etc.) each sample was put in. Look for that information in the last column.
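The sample-to-bin lookup can be scripted. This Python sketch relies on the point above that the bin directory is in the last column of SamplesDirectories.csv; the header name of the sample column ("SampleID" here) is an assumption, so check it against your own file.

```python
import csv
import sys
from typing import Iterable, List, Tuple


def sample_bins(lines: Iterable[str],
                sample_column: str = "SampleID") -> List[Tuple[str, str]]:
    """Return (sample, bin directory) pairs from SamplesDirectories.csv.

    The bin is taken from the last column; the sample column is looked up
    by its (assumed) header name.
    """
    reader = csv.reader(lines)
    header = next(reader)
    col = header.index(sample_column)  # ValueError if the name differs
    pairs = []
    for row in reader:
        if row:  # skip blank lines
            pairs.append((row[col], row[-1]))
    return pairs


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], newline="") as fh:
        for sample, directory in sample_bins(fh):
            print(f"{sample}\t{directory}")
```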

                You will need to run at least a "sequence"-only analysis to get the sequence files. This is specified in the "config.template.txt" file. Again, check the manual or send an example of the file you used.

                There should be a "GERALD_*" directory in each of the bins (001, 002, etc.). That directory will contain the final sequence files. Unfortunately they will be named s_*_sequence.txt, so you will need to rename them appropriately (we rename them with the sample name/tag information) before you copy them out of each bin/GERALD_* directory.
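The rename-and-copy step could look like the Python sketch below. It is hypothetical: the bin-to-sample mapping is passed in by the caller (for example, built from SamplesDirectories.csv), and prefixing the sample name to the original file name is just one naming choice that keeps the lane number visible.

```python
import shutil
import sys
from pathlib import Path
from typing import Dict, List


def collect_sequences(bins: Dict[str, str], out_dir: str) -> List[Path]:
    """Copy <bin>/GERALD_*/s_*_sequence.txt to out_dir as <sample>_s_*_sequence.txt.

    bins maps a bin directory (e.g. ".../Demultiplexed/001") to a sample name.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    for bin_dir, sample in bins.items():
        for src in sorted(Path(bin_dir).glob("GERALD_*/s_*_sequence.txt")):
            dest = out / f"{sample}_{src.name}"
            if dest.exists():
                # e.g. two GERALD runs in one bin; refuse to overwrite
                raise FileExistsError(dest)
            shutil.copy2(src, dest)
            copied.append(dest)
    return copied


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage (hypothetical CLI): collect.py OUT_DIR BIN_DIR=SAMPLE [BIN_DIR=SAMPLE ...]
    pairs = dict(arg.split("=", 1) for arg in sys.argv[2:])
    for path in collect_sequences(pairs, sys.argv[1]):
        print(path)
```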


                • #9
                  Hi GenoMax,
                  OK, we have found the SamplesDirectories.csv file in the "Demultiplexed" directory. We can see six 00x bin directories, each with several qseq.txt files, but some of these files are empty (0 KB). We also noticed that some qseq.txt files in the same directory have the same lane number and the same barcode. So is there more than one file for each sample?
                  How do we run a "sequence"-only analysis to get the sequence files? We don't see the "config.template.txt" file or the GERALD_ directory; where are they?

                  Thank you again!
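On the question of several qseq files per lane: a likely explanation (an assumption, not confirmed in this thread) is that the pipeline writes one qseq file per tile, named s_<lane>_<read>_<tile>_qseq.txt, so each sample is spread across many files and a tile with no reads for a given index leaves an empty file. The Python sketch below groups the non-empty files in one bin by lane and read, sorted by tile, under that naming assumption.

```python
import os
import re
import sys
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed naming: s_<lane>_<read>_<tile>_qseq.txt, e.g. s_1_1_0001_qseq.txt
QSEQ_NAME = re.compile(r"^s_(\d+)_(\d+)_(\d+)_qseq\.txt$")


def group_qseq_files(directory: str) -> Dict[Tuple[int, int], List[str]]:
    """Map (lane, read) -> non-empty qseq paths in one bin, sorted by tile."""
    groups = defaultdict(list)
    for name in os.listdir(directory):
        m = QSEQ_NAME.match(name)
        path = os.path.join(directory, name)
        if m is None or os.path.getsize(path) == 0:
            continue  # not a qseq file, or a tile with no reads for this index
        lane, read, tile = (int(g) for g in m.groups())
        groups[(lane, read)].append((tile, path))
    return {key: [p for _, p in sorted(files)] for key, files in groups.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    for (lane, read), paths in sorted(group_qseq_files(sys.argv[1]).items()):
        print(f"lane {lane} read {read}: {len(paths)} non-empty tile files")
```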


                  • #10
                    Here is the relevant bit of info I had originally included with the command line for demultiplex.pl. You have to provide the configuration file for creating the final sequence files.

                    --alignment-config /path_to/config.template.txt

                    This configuration file is for GERALD; in it you specify that you want a sequence-only analysis (ANALYSIS sequence). You will find exact information about how to format this file in the manual (page 23 of the CASAVA v1.7 manual).
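As a starting point, a minimal GERALD configuration for a sequence-only run can be written as below. The Python sketch writes only the "ANALYSIS sequence" directive named above; the optional lane-prefix form ("12345678:ANALYSIS sequence") is an assumption, and your setup may need further parameters from the CASAVA manual.

```python
from pathlib import Path
from typing import Optional


def write_gerald_config(path: str, lanes: Optional[str] = None) -> str:
    """Write a minimal GERALD config requesting a sequence-only analysis.

    With lanes=None the directive is written without a prefix; lanes="1234"
    writes "1234:ANALYSIS sequence" (lane-prefix syntax is an assumption;
    confirm it against the CASAVA 1.7 manual).
    """
    directive = "ANALYSIS sequence"
    if lanes:
        if not lanes.isdigit():
            raise ValueError("lanes must be digits such as '12345678'")
        directive = f"{lanes}:{directive}"
    text = directive + "\n"
    Path(path).write_text(text)
    return text
```

The file can then be passed to demultiplex.pl with --alignment-config, using its full path.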

                    Please re-run the demultiplex.pl step with this command-line option (providing the config file) to get the actual sequence files. You will also need to add an option to your "make" command, as follows: "make -j no_of_cpu ALIGN=yes" (this is required to get GERALD to run).
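The make invocation can be sized to the machine automatically. This hypothetical Python helper builds "make -j <cpus> ALIGN=yes" using os.cpu_count(), and runs it in the directory given on the command line (assumed here to be the "Demultiplexed" directory).

```python
import os
import subprocess
import sys
from typing import List, Optional


def make_cmd(jobs: Optional[int] = None, align: bool = True) -> List[str]:
    """Build "make -j <jobs> [ALIGN=yes]".

    jobs defaults to the number of CPUs Python can see (os.cpu_count()),
    falling back to 1 when that is unknown.
    """
    n = jobs if jobs is not None else (os.cpu_count() or 1)
    if n < 1:
        raise ValueError("jobs must be >= 1")
    cmd = ["make", "-j", str(n)]
    if align:
        cmd.append("ALIGN=yes")  # needed for GERALD to run
    return cmd


if __name__ == "__main__" and len(sys.argv) > 1:
    # argv[1]: the "Demultiplexed" directory created by demultiplex.pl
    subprocess.run(make_cmd(), cwd=sys.argv[1], check=True)
```

On a shared server you may prefer to pass an explicit jobs value rather than using every CPU.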


                    Last edited by GenoMax; 10-13-2011, 11:25 AM.


                    • #11
                      Hi Genomax,
                      we are frustrated! Providing a config.template.txt to the demultiplexing command did not give the expected result; instead it returns a different error message, so there must be something wrong in our config.template.txt file. We are sending you our sample sheet file; could you edit a config.template.txt file for us? We have read page 24 of the CASAVA manual, but we find its explanation of the format confusing. We want to perform ANALYSIS sequence for all samples.
                      Another question: in which folder should we put the config.template.txt file?

                      Sorry, and thanks for your help. We hope for a quick reply!

                      P.S. You can also send information to my email address: [email protected]
                      or Skype account: giampe79

                      • #12
                        I am sorry, I did not see your last message until just now. Let me have a look and I will respond.

                        Note: See the response below. I will attach a config.txt file to it soon.

                        Last edited by GenoMax; 10-20-2011, 08:47 AM.


                        • #13
                          Try using the attached sample sheet file. I have already converted it to Unix format. I had to "gzip" it, so you will need to unzip it (for example with gunzip) before using it.

                          Both files can be in any location. When you run the demultiplex.pl command, just provide the full path to each file with the corresponding command-line switch (unless the file is in the current directory).
                          Last edited by GenoMax; 10-20-2011, 08:57 AM.
