
  • Convert illumina v1.5 fastq to sanger fastq

    Hi everybody!

    I am a very new user of next-generation sequencing. I downloaded BWA and SAMtools to analyse data from an Illumina GA II. I saw that BWA needs reads in .fastq format as input, but my data are in qseq.txt format.
    I saw that .txt and .fastq can be the same thing, but there are variants of .fastq. I read that BWA needs Sanger FASTQ, and I think I have Illumina v1.5 FASTQ.
    Do you know of any software to convert Illumina v1.5 FASTQ to Sanger FASTQ? Or do you know the code to do this?
    Thanks!

  • #2
    See the link for a short Perl script that converts .qseq.txt to a Sanger FASTQ file. The quality value conversion is actually done by this line:
    $q_line =~ tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/;
    It shifts every quality character down by 31 (Illumina Phred+64 to Sanger Phred+33) and maps anything below '@' to '!' (Phred 0). It's fairly easy to turn this into a quick-and-dirty Perl script that does the same thing for a FASTQ file:

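    use strict;
    use warnings;

    my $count = 0;
    while (<>) {
        chomp;    # strip the newline first, otherwise tr/// turns it into '!'
        # every fourth line (the quality line) is shifted from Phred+64 to Phred+33
        if ($count++ % 4 == 3) { tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/; }
        print "$_\n";
    }
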
    N.B.: The script above assumes that the sequence and quality values in the FASTQ file are each on a single line. This is not necessarily true, but you can usually get away with it for short-read data. Check the output carefully to make sure it is doing what you want; it should be fairly obvious if it gets out of sync, or if you run it on a Sanger FASTQ file by mistake.
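For comparison, here is a minimal Python sketch of the same conversion (not part of the original reply). It reproduces the tr mapping byte for byte, but it also checks that every four-line record is well formed and, when the hypothetical `strict` flag is on (the default), rejects quality characters below '@', which cannot occur in Phred+64 data. That catches the out-of-sync and already-Sanger cases from the N.B. only some of the time (high-quality Sanger data can sit entirely above '@'), so still check the output. It also includes a qseq.txt converter, assuming the usual 11 tab-separated qseq columns (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag) with '.' for no-calls; the read-name layout it builds is a hypothetical choice. Like the Perl script, it assumes single-line sequence and quality.

```python
import sys
from typing import Iterable, Iterator, List

# Same mapping as Perl's tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/:
# bytes 0x40-0xFF move down by 31 (Phred+64 -> Phred+33),
# bytes 0x00-0x3F all become '!' (Phred 0).
_TO_SANGER = bytes.maketrans(
    bytes(range(0x40, 0x100)) + bytes(range(0x00, 0x40)),
    bytes(range(0x21, 0xE1)) + b"!" * 0x40,
)


def illumina_to_sanger_qual(qual: str) -> str:
    """Convert one Illumina 1.3+/1.5+ quality string to Sanger encoding."""
    return qual.encode("latin-1").translate(_TO_SANGER).decode("latin-1")


def convert_fastq(lines: Iterable[str], strict: bool = True) -> Iterator[str]:
    """Yield Sanger FASTQ records from 4-line Illumina 1.3+/1.5+ FASTQ records."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) < 4:
            continue
        header, seq, plus, qual = record
        record = []
        # Out-of-sync input (e.g. multi-line records) usually shows up here.
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Characters below '@' cannot occur in Phred+64 data.
        if strict and qual and min(qual) < "@":
            raise ValueError(f"{header!r}: quality below '@', input may already be Sanger")
        yield f"{header}\n{seq}\n{plus}\n{illumina_to_sanger_qual(qual)}\n"
    if record:
        raise ValueError("truncated FASTQ record at end of input")


def qseq_to_fastq(line: str) -> str:
    """Convert one 11-column qseq.txt line to a Sanger FASTQ record."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 11:
        raise ValueError("expected 11 tab-separated qseq columns")
    machine, run, lane, tile, x, y, index, read_no, seq, qual, _passed_filter = fields
    # Hypothetical read-name layout; adjust to whatever your downstream tools expect.
    name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    return f"@{name}\n{seq.replace('.', 'N')}\n+\n{illumina_to_sanger_qual(qual)}\n"


def main(paths: List[str]) -> None:
    """Convert each FASTQ file named on the command line, writing to stdout."""
    if not paths:
        print("usage: illumina2sanger.py in.fastq [more.fastq ...] > out.fastq", file=sys.stderr)
        return
    for path in paths:
        with open(path, encoding="latin-1") as handle:
            sys.stdout.writelines(convert_fastq(handle))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage: `python3 illumina2sanger.py reads.fastq > reads.sanger.fastq` (the script name is arbitrary).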


    • #3
      Originally posted by zouzou
      Do you know if there is a software to convert illumina v1.5 fastq to sanger-fastq? Or do you know the code to do this ?
      Thanks !
      You can use several existing tools to do the conversion from Illumina FASTQ to Sanger FASTQ, including EMBOSS seqret, Biopython, BioPerl, BioJava, BioRuby etc.

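      Note that in FASTQ files from recent Illumina pipelines, some of the low quality scores have a special meaning. In pipeline 1.5+, for example, a quality of 2 (the character 'B') flags a trailing segment of the read as unreliable rather than being a true Phred score.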
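Before converting, it can help to check which encoding a file actually uses. Below is a heuristic sketch (the function names are hypothetical) based on the usual character ranges: Sanger uses Phred+33, Solexa and Illumina 1.3+ use an ASCII offset of 64 (Solexa scores can go down to -5, i.e. ';'), and Illumina 1.5+ normally starts at 'B'. It also measures the trailing run of 'B' (Q2) characters, so you can decide whether to trim it rather than treat it as real Phred 2 calls. A file whose lowest character is 'B' or above is ambiguous, so look at plenty of reads before deciding.

```python
from typing import Iterable


def guess_quality_encoding(quals: Iterable[str]) -> str:
    """Rough guess at a FASTQ quality encoding from the lowest character seen."""
    lowest = min((ord(c) for q in quals for c in q), default=None)
    if lowest is None:
        return "unknown"
    if lowest < 59:  # below ';': only Phred+33 (Sanger) is possible
        return "sanger"
    if lowest < 64:  # ';' to '?': only Solexa scores (offset 64, may be negative)
        return "solexa"
    if lowest < 66:  # '@' or 'A' (Q0/Q1) occur in Illumina 1.3+
        return "illumina-1.3+"
    # 'B' or above: consistent with 1.5+, but 1.3+ or high-quality Sanger is still possible.
    return "illumina-1.5+"


def b_tail_length(qual: str) -> int:
    """Length of the trailing run of 'B' (Q2) characters in an Illumina 1.5+ quality string."""
    return len(qual) - len(qual.rstrip("B"))
```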
      Last edited by maubp; 05-31-2010, 06:55 AM. Reason: adding missing last two words of my sentence.


      • #4
        Originally posted by zouzou
        Hi everybody !

        Do you know if there is a software to convert illumina v1.5 fastq to sanger-fastq? Or do you know the code to do this ?
        Thanks !
        Also, BFAST comes with a Perl script to perform the conversion. It's under scripts (


        • #5
          Originally posted by zouzou
          Hi everybody !

          Do you know if there is a software to convert illumina v1.5 fastq to sanger-fastq? Or do you know the code to do this ?
          Thanks !
          You could try patching the latest BWA version with the appropriate patch listed
          here (it's the first one). It adds an '-I' option to 'bwa aln' so that Illumina (pipeline 1.3+ or 1.5+) FASTQ files can be used, and trimmed, as if they were in Sanger scale. Qualities in the SAM output are in Sanger scale as well.



          • #6
            Hey, Galaxy has a tool called FASTQ Groomer under the NGS: QC and manipulation menu.
            You can convert between various quality formats (Sanger, Solexa, Illumina 1.3 and above, color space Sanger).
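Of the formats listed, converting Illumina 1.3+ to Sanger is just an offset change, but Solexa scores are log-odds values rather than Phred values, so they need a formula as well as an offset. The sketch below uses the standard mapping Q_phred = 10 * log10(10^(Q_solexa / 10) + 1), rounded to the nearest integer. It only illustrates the idea; it is not taken from Galaxy's code, and the function names are hypothetical.

```python
import math


def solexa_to_phred(q_solexa: int) -> int:
    """Map a Solexa (log-odds) quality to the nearest Phred quality."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def solexa_qual_to_sanger(qual: str) -> str:
    """Solexa FASTQ quality string (ASCII offset 64) -> Sanger string (offset 33)."""
    return "".join(chr(solexa_to_phred(ord(c) - 64) + 33) for c in qual)
```

At high qualities the two scales agree (Solexa 40 maps to Phred 40); they differ only near the bottom, where Solexa -5 maps to Phred 1 and Solexa 0 to Phred 3.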

            I think you can also download the script directly from the website ...

            Nicolas Tremblay
            Graduate Student

            Cardiovascular Genetics - Andelfinger Lab
            CHU Ste-Justine Research Center


            • #7
              Questions on '-I' option

              Originally posted by dawe
              You may try to patch latest bwa version with the appropriate patch listed
              here. It is the first one. It adds a '-I' option to 'bwa aln' predicate so that one can use Illumina (pipeline 1.3+ or 1.5+) fastq and trim as they were in sanger scale. Output in the SAM file is in Sanger scale as well.

              I have used the patch to update my bwa, following your directions. But I don't know how to use "-I". I browsed your patch file and saw " -I Input files are in Illumina quality scale." However, when I type bwa aln after applying your patch, I expected to see the "-I" option, but it isn't there.
              So, can you give me some explanation? Suppose I want to use a Sanger quality of 15: how should I set -q INT after applying your patch? Should I set it to 15 or not?
              I really appreciate your threads, and sorry for bothering you.

              [bioinformatics@localhost bwa-0.5.8a]$ bwa aln

              Usage: bwa aln [options] <prefix> <in.fq>

              Options: -n NUM max #diff (int) or missing prob under 0.02 err rate (float)
              -o INT maximum number or fraction of gap opens [1]
              -e INT maximum number of gap extensions, -1 for disabling long gaps [-1]
              -i INT do not put an indel within INT bp towards the ends [5]
              -d INT maximum occurrences for extending a long deletion [10]
              -l INT seed length [32]
              -k INT maximum differences in the seed [2]
              -m INT maximum entries in the queue [2000000]
              -t INT number of threads [1]
              -M INT mismatch penalty [3]
              -O INT gap open penalty [11]
              -E INT gap extension penalty [4]
              -R INT stop searching when there are >INT equally best hits [30]
              -q INT quality threshold for read trimming down to 35bp [0]
              -c input sequences are in the color space
              -L log-scaled gap penalty for long deletions
              -N non-iterative mode: search for all n-difference hits
              -f FILE file to write output to instead of stdout
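On the -q question: the BWA manual describes the trimming as keeping the first x bases, where x maximises the sum of (INT - q_i) over the trimmed bases i = x+1..l, applied only when the last base's quality q_l is below INT; the usage text above adds that reads are not trimmed below 35bp. Because the threshold is compared against decoded Phred values, a given INT should mean the same Phred cutoff whatever ASCII offset the file uses, provided the qualities are decoded with the right offset (which is what -I is for). Whether the patched -I build handles -q exactly this way is an assumption; this sketch (hypothetical function names) only illustrates the documented formula and is not BWA's source.

```python
from typing import List


def decode_phred(qual: str, offset: int = 33) -> List[int]:
    """ASCII quality string -> Phred scores (offset 33 for Sanger, 64 for Illumina 1.3+)."""
    return [ord(c) - offset for c in qual]


def trim_length(quals: List[int], threshold: int, min_len: int = 35) -> int:
    """Read length kept after 3' quality trimming, following the BWA manual's formula
    argmax_x sum_{i=x+1}^{l} (threshold - q_i), applied only if q_l < threshold.
    Reads are never trimmed below min_len (35bp, per the usage text)."""
    l = len(quals)
    if threshold <= 0 or l <= min_len or quals[-1] >= threshold:
        return l  # trimming disabled, read already short, or last base good enough
    best_x, best_sum, running = l, 0, 0
    # Keeping x bases trims bases x+1..l (1-based), i.e. quals[x:] (0-based),
    # so each step down in x adds quals[x] to the running sum.
    for x in range(l - 1, min_len - 1, -1):
        running += threshold - quals[x]
        if running > best_sum:
            best_x, best_sum = x, running
    return best_x
```

For example, a 40bp read whose last four bases are Q2 ('#' in Sanger, 'B' in Illumina 1.5+) is trimmed to 36bp with a threshold of 15, whichever encoding it was decoded from.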


              • #8
                It appears you haven't applied the patch (or you haven't installed the patched binary).



                • #9
                  Questions on BWA patch

                  Originally posted by dawe
                  It appears you haven't applied the patch (or you haven't installed the patched binary).

                  I'm sorry, I don't understand your reply. Could you give me some more explicit directions? Thanks very much!

                  I followed the directions:
                  cd bwa-source-directory
                  patch -p1 < patch.file


                  • #10
                    Originally posted by zeam
                    I'm sorry I don't unstand your reply.Would you give me some explicit directions.Thanks very much!

                    I followed the directions:
                    cd bwa-source-directory
                    patch -p1 < patch.file
                    Could you apply the patch successfully? If so, rebuild bwa (run make in the source directory) and then run
                    ./bwa aln
                    to see whether the -I option appears. If it does, replace the installed binary with the new one, e.g.

                    sudo install bwa `which bwa`
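The check "see whether the -I option appears" can be scripted, which is handy after reinstalling. This is a small sketch with hypothetical function names; it assumes the usage message lists one option per line, as in the `bwa aln` output quoted earlier in the thread, and checks both stdout and stderr because the usage text may be printed to either.

```python
import re
import subprocess


def usage_lists_option(usage: str, flag: str) -> bool:
    """True if some line of a usage message starts with `flag` (optionally after 'Options:')."""
    pattern = re.compile(r"^\s*(?:Options:\s*)?" + re.escape(flag) + r"\b", re.MULTILINE)
    return pattern.search(usage) is not None


def binary_has_option(binary: str = "bwa", flag: str = "-I") -> bool:
    """Run `<binary> aln` with no arguments and look for `flag` in its usage text.

    Raises FileNotFoundError if the binary is not on the PATH.
    """
    # With no arguments bwa prints usage and exits non-zero, so don't use check=True.
    proc = subprocess.run([binary, "aln"], capture_output=True, text=True)
    return usage_lists_option(proc.stdout + proc.stderr, flag)
```

For example, `binary_has_option("./bwa")` checks the freshly built binary, and `binary_has_option()` checks whichever `bwa` is first on the PATH.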


                    • #11
                      BWA Illumina Quality Patch

                      Hi dawe,

                      I just tried to apply your SVN r50 patch to the current SVN checkout, which is also at revision 50, and the patch fails.

                      $ patch -p1 < bwa-svn-r50_illumina-qual.patch 
                      missing header for unified diff at line 5 of patch
                      can't find file to patch at input line 5
                      Perhaps you used the wrong -p or --strip option?
                      The text leading up to this was:
                      |Index: bwape.c
                      |--- bwape.c	(revision 50)
                      |+++ bwape.c	(working copy)
                      File to patch:

                      What I did:
                      1) svn checkout of the current bio-bwa Subversion repository (revision 50)

                      svn co bio-bwa
                      bunch of stuff
                      Checked out revision 50.
                      2) cd bio-bwa/trunk/bwa
                      3) make
                      4) copied patch to current directory
                      5) attempted to patch as noted above

                      I tried the archived bwa-0.5.8 patch and that applied perfectly.

                      Any suggestions?

                      PS - thanks for this patch and the previous maq ill2sanger patch; they are life savers.


                      • #12
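                        My bad, sorry. As the 'patch' error suggests, you should use a different strip level. The file names in this patch (e.g. 'bwape.c') have no leading directory, so nothing should be stripped: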

                        $ patch -p0 < /path/to/patch
                        That should work.
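A simplified model of how `patch -pN` strips leading path components shows why -p1 fails on a header like '--- bwape.c' while -p0 keeps the name intact. It ignores edge cases such as repeated slashes, and the function name is hypothetical; it is not GNU patch's code.

```python
from typing import Optional


def strip_components(path: str, n: int) -> Optional[str]:
    """Simplified model of `patch -pN`: drop the first N slash-separated components.

    Returns None when the path has fewer than N slashes, i.e. no file name is left.
    """
    parts = path.split("/")
    if len(parts) <= n:
        return None
    return "/".join(parts[n:])
```

With `n=1`, "bwa/bwape.c" becomes "bwape.c", but plain "bwape.c" has nothing left, which matches the "can't find file to patch" prompt in the previous post.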



                        • #13
                          Thanks, that worked perfectly


                          • #14
                            I am new to NGS and bioinformatics. I just got my data and am trying out Galaxy, using FASTQ Groomer to convert it to Sanger FASTQ. I have 8 GB of data; does anyone know roughly how long this should take? It has been running for about 3.5 hours, and I don't know whether to stop it and run it again. Am I being impatient?

                            Sorry for the novice question.



                            • #15
                              It will depend on which Galaxy installation you are using (e.g. the main Penn State one), and how busy it is with other people's work. If you asked on the Galaxy mailing list you'd probably get a better answer.

