Convert illumina v1.5 fastq to sanger fastq


  • Convert illumina v1.5 fastq to sanger fastq

    Hi everybody!

    I am very new to next-generation sequencing. I downloaded BWA and SAMtools to analyse data from an Illumina GA II. I saw that BWA needs the reads in .fastq format as input, but my data is in qseq.txt format.
    I also saw that .txt and .fastq files can be the same thing, but that there are several variants of .fastq. I read that BWA needs sanger-fastq, and I think I have illumina v1.5-fastq.
    Do you know of any software that converts illumina v1.5 fastq to sanger-fastq? Or do you know the code to do this?
    Thanks!

  • #2
    See https://www.seqanswers.com/node/4344 for a short Perl script that converts .qseq.txt to a sanger-fastq file. The quality value conversion is actually done by this line:
    Code:
    $q_line =~ tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/;
    It's fairly easy to convert this into a quick-and-dirty perl script that will do the same thing for a fastq file:

    Code:
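    #!/usr/bin/perl
    # Convert the quality lines of an Illumina 1.3+/1.5+ FASTQ file (Phred+64)
    # to Sanger encoding (Phred+33). Reads stdin or named files, writes stdout.

    use strict;
    use warnings;

    my $count = 0;
    while (<>) {
        chomp;
        # Every fourth line (index 3 of each four-line record) holds the qualities.
        # Shift '@'..0xff down by 31; anything below '@' becomes '!'.
        if ($count++ % 4 == 3) { tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/; }
        print "$_\n";
    }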
    N.B.: The script above assumes that the sequence and quality values in the fastq file are on single lines. This is not necessarily true, but you can usually get away with it for short read data. You should check the output carefully, to make sure that it is doing what you want. It should be fairly obvious if it gets out of synchronization, or if you run it on a sanger-fastq file by mistake.
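For anyone who prefers Python, here is a minimal sketch of the same idea. It makes the same assumption as the Perl script (strict four-line records), and the function names are my own, not part of any library. Unlike the Perl version, which silently maps characters below '@' to '!', this sketch raises an error on them, on the assumption that such characters usually mean the file is already in Sanger encoding. It also checks the '@' and '+' lines so that a loss of synchronization is reported rather than left for you to spot.

```python
ILLUMINA_OFFSET = 64  # Illumina 1.3+/1.5+ FASTQ stores Phred score + 64
SANGER_OFFSET = 33    # Sanger FASTQ stores Phred score + 33


def illumina_to_sanger(quality):
    """Re-encode one quality string from Phred+64 to Phred+33."""
    converted = []
    for char in quality:
        score = ord(char) - ILLUMINA_OFFSET
        if score < 0:
            # A character below '@' cannot occur in Phred+64 data.
            raise ValueError(
                "%r is below '@'; is the input already Sanger-encoded?" % char
            )
        converted.append(chr(score + SANGER_OFFSET))
    return "".join(converted)


def convert_lines(lines):
    """Yield converted FASTQ lines, assuming strict four-line records."""
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        position = index % 4
        if position == 0 and not line.startswith("@"):
            raise ValueError("line %d: expected an '@' header" % (index + 1))
        if position == 2 and not line.startswith("+"):
            raise ValueError("line %d: expected a '+' separator" % (index + 1))
        # Only the fourth line of each record carries quality characters.
        yield illumina_to_sanger(line) if position == 3 else line
```

To use it on a file, iterate over `convert_lines(open_file_handle)` and print each yielded line.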



    • #3
      Originally posted by zouzou
      Do you know of any software that converts illumina v1.5 fastq to sanger-fastq? Or do you know the code to do this?
      Thanks!
      You can use several existing tools to do the conversion from Illumina FASTQ to Sanger FASTQ, including EMBOSS seqret, Biopython, BioPerl, BioJava, BioRuby etc.
      http://dx.doi.org/10.1093/nar/gkp1137

      Note that in FASTQ files from recent Illumina pipelines, some of the low quality scores have a special meaning:
      http://seqanswers.com/forums/showthread.php?p=17491
      Last edited by maubp; 05-31-2010, 06:55 AM. Reason: adding missing last two words of my sentence.
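As a rough illustration of the special low scores: as far as I know, Illumina pipeline 1.5 and later uses quality 2 (the character 'B' in the Phred+64 encoding) as a read-segment quality indicator, so a run of 'B' at the end of a read flags that whole tail as unreliable rather than giving per-base error estimates. The sketch below simply measures that tail. The function name and the default marker are my own assumptions for illustration; after conversion to Sanger encoding the same marker would appear as '#'.

```python
def trailing_marker_length(quality, marker="B"):
    """Count the consecutive marker characters that end a quality string."""
    return len(quality) - len(quality.rstrip(marker))


def split_flagged_tail(sequence, quality, marker="B"):
    """Return (kept_bases, flagged_bases) for one read."""
    flagged = trailing_marker_length(quality, marker)
    kept = len(sequence) - flagged
    return sequence[:kept], sequence[kept:]
```

For example, a read with qualities `hhhgfBBBB` has a flagged tail of four bases.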



      • #4
        Originally posted by zouzou
        Hi everybody!

        I am very new to next-generation sequencing. I downloaded BWA and SAMtools to analyse data from an Illumina GA II. I saw that BWA needs the reads in .fastq format as input, but my data is in qseq.txt format.
        I also saw that .txt and .fastq files can be the same thing, but that there are several variants of .fastq. I read that BWA needs sanger-fastq, and I think I have illumina v1.5-fastq.
        Do you know of any software that converts illumina v1.5 fastq to sanger-fastq? Or do you know the code to do this?
        Thanks!
        Also, bfast comes with a perl script to perform the conversion. It's under scripts (ill2fastq.pl).
        -drd



        • #5
          Originally posted by zouzou
          Hi everybody!

          Do you know of any software that converts illumina v1.5 fastq to sanger-fastq? Or do you know the code to do this?
          Thanks!
          You could try patching the latest bwa version with the appropriate patch listed
          here. It is the first one. It adds a '-I' option to the 'bwa aln' command so that one can use Illumina (pipeline 1.3+ or 1.5+) fastq and trim as if the qualities were in Sanger scale. Output in the SAM file is in Sanger scale as well.
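To make the two scales concrete, here is a small sketch of the arithmetic, assuming the usual offsets (33 for Sanger, 64 for Illumina 1.3+/1.5+). The helper names are made up for illustration and are not part of bwa or the patch. It shows why a threshold given as a Phred value (say 15) is the same number on both scales, while the characters that encode it differ, and what goes wrong if Illumina characters are read with the Sanger offset.

```python
SANGER_OFFSET = 33    # Sanger FASTQ: character = chr(Phred + 33)
ILLUMINA_OFFSET = 64  # Illumina 1.3+/1.5+ FASTQ: character = chr(Phred + 64)


def phred_to_char(score, offset):
    """Encode a Phred score as a FASTQ quality character."""
    return chr(score + offset)


def char_to_phred(char, offset):
    """Decode a FASTQ quality character into a Phred score."""
    return ord(char) - offset


# Phred 15 is written differently in the two encodings.
sanger_char = phred_to_char(15, SANGER_OFFSET)      # '0'
illumina_char = phred_to_char(15, ILLUMINA_OFFSET)  # 'O'

# Reading an Illumina character with the Sanger offset inflates
# the score by 64 - 33 = 31, so Phred 15 would be taken as 46.
misread = char_to_phred(illumina_char, SANGER_OFFSET)  # 46
```

This is why unconverted Illumina input makes every base look far better than it is, and quality trimming then has almost no effect.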

          d



          • #6
            Hey, Galaxy has a tool called FASTQ Groomer under the "NGS: QC and manipulation" menu.
            You can convert between various quality formats (sanger, solexa, Illumina 1.3 and above, colorspace sanger).

            I think you can also download the script directly from the website ...

            NT
            Nicolas Tremblay
            Graduate Student

            Cardiovascular Genetics - Andelfinger Lab
            CHU Ste-Justine Research Center



            • #7
              Questions on '-I' option

              Originally posted by dawe
              You could try patching the latest bwa version with the appropriate patch listed
              here. It is the first one. It adds a '-I' option to the 'bwa aln' command so that one can use Illumina (pipeline 1.3+ or 1.5+) fastq and trim as if the qualities were in Sanger scale. Output in the SAM file is in Sanger scale as well.

              d
              I used patch to update my bwa, following your directions, but I don't know how to use "-I". I browsed your patch file and saw " -I Input files are in Illumina quality scale." However, when I type bwa aln after applying your patch file, I expected to see the "-I" option, but I didn't.
              So, can you give me some explanation? Suppose I want to trim at Sanger quality 15: how should I set -q INT after applying your patch? Should I set it to 15 or not?
              I really appreciate your posts, and sorry for bothering you.

              [bwa-0.5.8a]$ bwa aln

              Usage: bwa aln [options] <prefix> <in.fq>

              Options: -n NUM    max #diff (int) or missing prob under 0.02 err rate (float) [0.04]
                       -o INT    maximum number or fraction of gap opens [1]
                       -e INT    maximum number of gap extensions, -1 for disabling long gaps [-1]
                       -i INT    do not put an indel within INT bp towards the ends [5]
                       -d INT    maximum occurrences for extending a long deletion [10]
                       -l INT    seed length [32]
                       -k INT    maximum differences in the seed [2]
                       -m INT    maximum entries in the queue [2000000]
                       -t INT    number of threads [1]
                       -M INT    mismatch penalty [3]
                       -O INT    gap open penalty [11]
                       -E INT    gap extension penalty [4]
                       -R INT    stop searching when there are >INT equally best hits [30]
                       -q INT    quality threshold for read trimming down to 35bp [0]
                       -c        input sequences are in the color space
                       -L        log-scaled gap penalty for long deletions
                       -N        non-iterative mode: search for all n-difference hits (slooow)
                       -f FILE   file to write output to instead of stdout



              • #8
                It appears you haven't applied the patch (or you haven't installed the patched binary).

                d



                • #9
                  Questions on BWA patch

                  Originally posted by dawe
                  It appears you haven't applied the patch (or you haven't installed the patched binary).

                  d
                  I'm sorry, I don't understand your reply. Could you give me some explicit directions? Thanks very much!

                  I followed the directions:
                  cd bwa-source-directory
                  patch -p1 < patch.file
                  make



                  • #10
                    Originally posted by zeam
                    I'm sorry, I don't understand your reply. Could you give me some explicit directions? Thanks very much!

                    I followed the directions:
                    cd bwa-source-directory
                    patch -p1 < patch.file
                    make
                    Were you able to apply the patch successfully? If so, try running
                    Code:
                    ./bwa aln
                    in the source directory and see whether the -I option appears. If it does, replace the installed binary with this newly built one, i.e.

                    Code:
                    sudo install bwa `which bwa`
                    d



                    • #11
                      BWA Illumina Quality Patch

                      Hi dawe,

                      I just tried to apply your SVN v50 patch to the current svn download, which lists version 50, and the patch fails.

                      Code:
                      $ patch -p1 < bwa-svn-r50_illumina-qual.patch 
                      missing header for unified diff at line 5 of patch
                      can't find file to patch at input line 5
                      Perhaps you used the wrong -p or --strip option?
                      The text leading up to this was:
                      --------------------------
                      |Index: bwape.c
                      |===================================================================
                      |--- bwape.c	(revision 50)
                      |+++ bwape.c	(working copy)
                      --------------------------
                      File to patch:
                      Steps:
                      1) svn download of current bio-bwa subversion (version 50)

                      Code:
                      svn co https://bio-bwa.svn.sourceforge.net/svnroot/bio-bwa bio-bwa
                      ....
                      bunch of stuff
                      ....
                      Checked out revision 50.
                      2) cd bio-bwa/trunk/bwa
                      3) make
                      4) copied patch to current directory
                      5) attempted to patch as noted above

                      I also tried the archived bwa-0.5.8 patch, and that applied perfectly.

                      Any suggestions?

                      PS - thanks for this patch and the previous maq ill2sanger patch; they are life savers.



                      • #12
                        My bad, sorry. Anyway, as the 'patch' error message suggests, you should use a different strip level:

                        Code:
                        $ patch -p0 < /path/to/patch
                        That should work.
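To see why -p0 works where -p1 did not: the strip level tells patch how many leading directory components to remove from the file names in the patch header. The header here names plain `bwape.c`, with no directory in front, so there is nothing for -p1 to strip. Below is a deliberately simplified model of that behaviour (the function is hypothetical, and the real patch program has more rules, for example around repeated slashes and absolute paths).

```python
def strip_components(path, count):
    """Drop `count` leading directory components, roughly like `patch -pN`.

    Returns None when the path has fewer than `count` slashes, which is
    the situation where patch cannot work out which file to modify.
    """
    parts = path.split("/")
    if len(parts) <= count:
        return None
    return "/".join(parts[count:])


# The header in this thread names the file without any directory.
print(strip_components("bwape.c", 0))        # bwape.c
print(strip_components("bwape.c", 1))        # None
# A patch made one directory level up would need -p1 instead.
print(strip_components("bwa/bwape.c", 1))    # bwape.c
```

So a patch whose header looks like `--- bwape.c` wants -p0 when run from the directory containing bwape.c, while one with `--- a/bwape.c` or `--- bwa/bwape.c` wants -p1.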

                        HTH
                        D



                        • #13
                          Thanks, that worked perfectly



                          • #14
                            I am new to NGS and bioinformatics. I just got my data and am trying out Galaxy. I am trying to use FASTQ Groomer to convert it to fastq-sanger. I have 8 GB of data; does anyone have an estimate of how long this process should take? It has been running for about 3.5 hours, and I don't know whether to quit and run it again. Am I being impatient?

                            Sorry for the novice/inexperienced question

                            Thanks
                            nsl



                            • #15
                              It will depend on which Galaxy installation you are using (e.g. the main http://usegalaxy.org Penn State one), and how busy it is with other people's work. If you asked on the Galaxy mailing list you'd probably get a better answer.
