Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • tgenahmet
    Member
    • Apr 2009
    • 11

    convert qseq to fastq that maq or bwa can use

    Hi all,

    I'm new here.

    I have the Illumina pipeline 1.3 installed. I'd like to use BWA for aligning. I need to get from qseq files to phred fastq files.

    Is there a tool that already exists to accomplish this?
  • acnoll
    Member
    • Mar 2008
    • 14

    #2
    Since bwa doesn't use quality values one can disregard the calibrations or scaling that Illumina implements as part of the pipeline. A simple awk command should get you filtered input from the qseq files for bwa.

    cat s_1_1_*_qseq.txt | awk -F '\t' '{gsub(/\./,"N", $9); if ($11 > 0) print "@"$1$2$3$4$5$6$8"\n"$9"\n""+"$1$2$3$4$5$6$8"\n"$10}' > s_1_1_bwa.txt

    Or you could use the sequence.txt files directly.

    Comment

    • chuck
      Member
      • Apr 2009
      • 13

      #3
      still need conversion of qseq into fastq

      Hello,

      I have been working with the previous Solexa format, with two separate files *seq.txt and *prb.txt, but just starting receiving a second round of data in the qseq format.

      Converting them into the previous *seq.txt or a fasta file is not a problem but still I would like to run these files in maq to align them against my reference.

      I am a bit confused as well about the issue of (@Phred+64) or (@solexa+64). After conversion, all of the values range from 2 to almost 40, so I assume these are the former.

      But, I have never run maq either and it seems it needs (@Phred+33), correct? I can write something to convert them, if no one else has a script handy, but I want to make sure that I am doing it properly.

      All the best,
      Chuck

      Comment

      • TylerBackman
        Member
        • Oct 2008
        • 13

        #4
        This perl function I wrote will convert all of the qseq files for a given lane into proper Phred fastq files. Sorry the forum seems to be removing all of the whitespace... You can PM me for a text file if you'd like.

        Use it like this:
        #!/usr/bin/perl
        use strict;
        use Carp;
        generate_sequence_list("bustard_path", "0", "8", "output.fastq"); # where "0" means single end and "8" means lane 8

        **** begin code ****
        sub generate_sequence_list {
        # **** BEGIN CONFIG OPTIONS ****
        my $bustard_path = $_[0];
        my $pair = $_[1]; # 0=single end, 1=first pair, 2=second pair
        my $lane = $_[2];
        my $output_fastq_file = $_[3];
        # **** END CONFIG OPTIONS ****

        my $this_tile = 1;
        my $qfilter = "";

        open(OUTFASTAQFILE, "> $output_fastq_file");

        if($pair > 0){
        $pair = "_" . $pair . "_" ;
        } else {
        $pair = "_1_";
        }

        while(-r $bustard_path . "/s_" . $lane . $pair . sprintf("%04d", $this_tile) . "_qseq.txt"){
        my $filename = $bustard_path . "/s_" . $lane . $pair . sprintf("%04d", $this_tile) . "_qseq.txt";
        open(INFILE, "< $filename");
        foreach my $thisline (<INFILE>) {
        my @this_line = split("\t", $thisline);
        croak("Error: invalid column number in $filename\n") unless(scalar(@this_line) == 11);
        if($this_line[10] == 1) {
        $qfilter = "Y";
        } else {
        $qfilter = "N";
        }
        # Convert quality scores
        my $quality_string = $this_line[9];
        my @quality_array = split(//, $quality_string);
        my $phred_quality_string = "";
        # convert each char to Phred quality score
        foreach my $this_char (@quality_array){
        my $phred_quality = ord($this_char) - 64; # convert illumina scaled phred char to phred quality score
        my $phred_char = chr($phred_quality + 33); # convert phred quality score into phred char
        $phred_quality_string = $phred_quality_string . $phred_char;
        }

        # replace "." gaps with N
        $this_line[8] =~ s/\./N/g;

        # output line
        print OUTFASTAQFILE "@" . $this_line[2] . ":" . $this_line[3] . ":" . # output label line
        $this_line[4] . ":" . $this_line[5] . ":" . $qfilter . "\n" .
        $this_line[8] . "\n+\n" . # output sequence
        $phred_quality_string . "\n"; # output quality string
        }
        close(INFILE);
        $this_tile++;
        }
        $this_tile--;
        croak("Error: 99 or less tiles in lane\n") unless($this_tile > 99);
        print "\tFound $this_tile tiles in lane $lane.\n";

        close(OUTFASTAQFILE);
        }
        **** end code ****
        Last edited by TylerBackman; 05-08-2009, 09:34 AM.
        @1
        NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
        +
        """"""""""""""""""""""""""""""""""""

        Comment

        • ashrafi_h
          Junior Member
          • Jan 2010
          • 7

          #5
          Usage

          Originally posted by TylerBackman View Post
          This perl function I wrote will convert all of the qseq files for a given lane into proper Phred fastq files. Sorry the forum seems to be removing all of the whitespace... You can PM me for a text file if you'd like.

          Use it like this:
          #!/usr/bin/perl
          use strict;
          use Carp;
          generate_sequence_list("bustard_path", "0", "8", "output.fastq"); # where "0" means single end and "8" means lane 8

          **** begin code ****
          sub generate_sequence_list {
          # **** BEGIN CONFIG OPTIONS ****
          my $bustard_path = $_[0];
          my $pair = $_[1]; # 0=single end, 1=first pair, 2=second pair
          my $lane = $_[2];
          my $output_fastq_file = $_[3];
          # **** END CONFIG OPTIONS ****

          my $this_tile = 1;
          my $qfilter = "";

          open(OUTFASTAQFILE, "> $output_fastq_file");

          if($pair > 0){
          $pair = "_" . $pair . "_" ;
          } else {
          $pair = "_1_";
          }

          while(-r $bustard_path . "/s_" . $lane . $pair . sprintf("%04d", $this_tile) . "_qseq.txt"){
          my $filename = $bustard_path . "/s_" . $lane . $pair . sprintf("%04d", $this_tile) . "_qseq.txt";
          open(INFILE, "< $filename");
          foreach my $thisline (<INFILE>) {
          my @this_line = split("\t", $thisline);
          croak("Error: invalid column number in $filename\n") unless(scalar(@this_line) == 11);
          if($this_line[10] == 1) {
          $qfilter = "Y";
          } else {
          $qfilter = "N";
          }
          # Convert quality scores
          my $quality_string = $this_line[9];
          my @quality_array = split(//, $quality_string);
          my $phred_quality_string = "";
          # convert each char to Phred quality score
          foreach my $this_char (@quality_array){
          my $phred_quality = ord($this_char) - 64; # convert illumina scaled phred char to phred quality score
          my $phred_char = chr($phred_quality + 33); # convert phred quality score into phred char
          $phred_quality_string = $phred_quality_string . $phred_char;
          }

          # replace "." gaps with N
          $this_line[8] =~ s/\./N/g;

          # output line
          print OUTFASTAQFILE "@" . $this_line[2] . ":" . $this_line[3] . ":" . # output label line
          $this_line[4] . ":" . $this_line[5] . ":" . $qfilter . "\n" .
          $this_line[8] . "\n+\n" . # output sequence
          $phred_quality_string . "\n"; # output quality string
          }
          close(INFILE);
          $this_tile++;
          }
          $this_tile--;
          croak("Error: 99 or less tiles in lane\n") unless($this_tile > 99);
          print "\tFound $this_tile tiles in lane $lane.\n";

          close(OUTFASTAQFILE);
          }
          **** end code ****
          Could you please add usage to the code or an example as per how to run the code. I put it the directory where all of my qseq files are. I changed the line and pair parameters according to my file names but the $bustard_path is not clear to me.

          Thanks a lot.

          Comment

          • TylerBackman
            Member
            • Oct 2008
            • 13

            #6
            Originally posted by ashrafi_h View Post
            Could you please add usage to the code or an example as per how to run the code. I put it the directory where all of my qseq files are. I changed the line and pair parameters according to my file names but the $bustard_path is not clear to me.

            Thanks a lot.
            "bustard_path" is the path on your filesystem where the base caller output is located. If you're running it from within the folder where your qseq files are, you can probably use "." to denote the current directory
            @1
            NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
            +
            """"""""""""""""""""""""""""""""""""

            Comment

            • ashrafi_h
              Junior Member
              • Jan 2010
              • 7

              #7
              I tried to run the script within the folder that I have the qseq files. something like the following. Name of qseq files are also like
              s_6_1_0074_qseq.txt = lane 6 read 1

              >perl Perl_qseq_to_fastq.pl .

              Results:
              perl Perl_qseq_to_fastq.pl .
              Error: 99 or less tiles in lane
              at Perl_qseq_to_fastq.pl line 61
              main::generate_sequence_list('bustard_path', 1, 6, 'output.fastq') called at Perl_qseq_to_fastq.pl line 4

              or

              >Perl_qseq_to_fastq.pl ..

              or from outside of the folder, lets say from one folders outside of qseq file folder

              perl Perl_qseq_to_fastq.pl /BaseCalls
              Error: 99 or less tiles in lane
              at Perl_qseq_to_fastq.pl line 61
              main::generate_sequence_list('bustard_path', 1, 6, 'output.fastq') called at Perl_qseq_to_fastq.pl line 4

              Comment

              • rgregor
                Member
                • Jun 2010
                • 11

                #8
                Can somebody explain what is the qseq quality control filter (column 11, either 0 or 1) and how is it obtained (Illumina)? Do you usually disregard sequences with qseq_filter = 0?

                http://jumpgate.caltech.edu/wiki/QSeq, see column 11

                tnx,
                Gregor

                Comment

                • lingfung.tang
                  Member
                  • Dec 2010
                  • 10

                  #9
                  and for the python script, what's the trim and pass_qc_msg refer to?

                  Comment

                  Latest Articles

                  Collapse

                  • seqadmin
                    New Genomics Tools and Methods Shared at AGBT 2025
                    by seqadmin


                    This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

                    The Headliner
                    The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
                    03-03-2025, 01:39 PM
                  • seqadmin
                    Investigating the Gut Microbiome Through Diet and Spatial Biology
                    by seqadmin




                    The human gut contains trillions of microorganisms that impact digestion, immune functions, and overall health1. Despite major breakthroughs, we’re only beginning to understand the full extent of the microbiome’s influence on health and disease. Advances in next-generation sequencing and spatial biology have opened new windows into this complex environment, yet many questions remain. This article highlights two recent studies exploring how diet influences microbial...
                    02-24-2025, 06:31 AM

                  ad_right_rmr

                  Collapse

                  News

                  Collapse

                  Topics Statistics Last Post
                  Started by seqadmin, Yesterday, 05:03 AM
                  0 responses
                  16 views
                  0 reactions
                  Last Post seqadmin  
                  Started by seqadmin, 03-19-2025, 07:27 AM
                  0 responses
                  15 views
                  0 reactions
                  Last Post seqadmin  
                  Started by seqadmin, 03-18-2025, 12:50 PM
                  0 responses
                  16 views
                  0 reactions
                  Last Post seqadmin  
                  Started by seqadmin, 03-03-2025, 01:15 PM
                  0 responses
                  185 views
                  0 reactions
                  Last Post seqadmin  
                  Working...