  • #16
    Originally posted by Oxygen81 View Post
    Hello everybody,

    like some others here, I am having problems converting my Solexa reads into an appropriate format for Maq.
    I tried the scripts fq_all2std.pl and the script of Farhad (#7), but nothing is working. Here is an example of my files:
    *.seq.txt
    >KN-964_02-2424_07-03179_1_1_1_1001_230
    GTGCTAA ... (36 sites)

    *.qual.txt
    >KN-964_02-2424_07-03179_1_1_1_1001_230
    32 32 32 32 32 32 32 ...

    The sequencing was done in 2007. The command seqprb2std is not contained in fq_all2std.pl, is it?
    You don't have a .prb file. You can't take a script that works on one, give it a different file, and expect it to work.

    Just looking at James Bonfield's code in post 7 (without testing it), I'd say try changing the part that goes:

    } else {
    my @qa = split('\t', $q);
    @qual = map {[split()]->[$hmap{substr($seq, $i++, 1)}]} @qa;
    }

    to

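    } else {
    # No "my" here: assign to the @qual that the original line assigns to,
    # which is assumed to be declared outside this block.
    # split ' ' splits on runs of whitespace and skips leading whitespace.
    @qual = split(' ', $q);
    }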

    That might work.

    Edit: Note that the script below works if you have a single .qual file, and a single .seq file. Bonfield's script takes all the files in the folder, so if you point it at the run directory, it does all 300 tiles for all 8 lanes.
    Last edited by swbarnes2; 08-07-2009, 10:09 AM. Reason: Appended info after later post

    Comment


    • #17
      Originally posted by Oxygen81 View Post
      Hello everybody,

      like some others here, I am having problems converting my Solexa reads into an appropriate format for Maq.
      I tried the scripts fq_all2std.pl and the script of Farhad (#7), but nothing is working. Here is an example of my files:
      *.seq.txt
      >KN-964_02-2424_07-03179_1_1_1_1001_230
      GTGCTAA ... (36 sites)

      *.qual.txt
      >KN-964_02-2424_07-03179_1_1_1_1001_230
      32 32 32 32 32 32 32 ...

      The sequencing was done in 2007. The command seqprb2std is not contained in fq_all2std.pl, is it? We did another sequencing run (paired-end) in 2008/2009 where I got a new format:
      >KN-1413_07-00010:6:1:727:174/1: GTGTG...:40 40 40 40 40
      These files are in standard FASTA format, with separate sequence and quality score files. Here is a Perl script which will take these two files as input and output a standard Sanger FASTQ file which can be used with MAQ.
      Code:
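      #!/usr/bin/perl
      
      use warnings;
      use strict;
      use File::Basename;
      
      # Usage: fastaQual2Fastq.pl foo.fasta   (expects foo.qual in the current directory)
      my $inFasta = $ARGV[0] or die "Usage: $0 foo.fasta\n";
      my $baseName = basename($inFasta, qw/.fasta .fna/);
      my $inQual = $baseName . ".qual";
      my $outFastq = $baseName . ".fastq";
      
      my %seqs;
      
      # Read one ">"-delimited record at a time instead of one line at a time.
      $/ = ">";
      
      open (FASTA, "<", $inFasta) or die "Cannot open $inFasta: $!\n";
      my $junk = (<FASTA>);   # discard the empty record before the first ">"
      
      while (my $frecord = <FASTA>) {
              chomp $frecord;
              my ($fdef, @seqLines) = split /\n/, $frecord;
              my $seq = join '', @seqLines;
              $seqs{$fdef} = $seq;
      }
      
      close FASTA;
      
      open (QUAL, "<", $inQual) or die "Cannot open $inQual: $!\n";
      $junk = <QUAL>;
      open (FASTQ, ">", $outFastq) or die "Cannot write $outFastq: $!\n";
      
      while (my $qrecord = <QUAL>) {
              chomp $qrecord;
              my ($qdef, @qualLines) = split /\n/, $qrecord;
              my $qualString = join ' ', @qualLines;
              # split ' ' tolerates repeated or leading spaces between scores
              my @quals = split ' ', $qualString;
              print FASTQ "@","$qdef\n";
              print FASTQ "$seqs{$qdef}\n";
              print FASTQ "+\n";
              # Sanger FASTQ stores each Phred score as chr(score + 33)
              foreach my $qual (@quals) {
                      print FASTQ chr($qual + 33);
              }
              print FASTQ "\n";
      }
      
      close QUAL;
      close FASTQ;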
      Some notes about this script.

      - It expects the filenames to be in the form foo.fasta (or foo.fna) and foo.qual so you would need to rename your *.seq.txt and *.qual.txt to match this format (or edit the code).

      - The sequence and quality score entries do NOT need to be in the same order in their respective files, but the definition lines between the two must match exactly. The cost of this is RAM: the script stores all of the sequences in a hash and then writes them out as it loops through the qual file.

      To use the script, save the text as fastaQual2Fastq.pl and make it executable. Make sure that both your .fasta and .qual files are named as described above and are in your current directory, then run the script:

      Code:
      % fastaQual2Fastq.pl foo.fasta
      When the script is done there will be a file named foo.fastq which you can use as input for MAQ.
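
For readers who prefer Python, the same idea can be sketched as below. This is an illustration under assumptions, not a port of the script above: the function names `parse_records` and `fasta_qual_to_fastq` and the sample reads are made up, and it works on in-memory strings rather than on files. It keeps the two properties described in the notes: records are matched by definition line rather than by order, and each Phred score is written as the character with code score + 33.

```python
def parse_records(text):
    """Split FASTA-style text into {definition line: list of body lines}."""
    records = {}
    for chunk in text.split(">")[1:]:  # anything before the first ">" is discarded
        header, *body = chunk.strip().splitlines()
        records[header] = body
    return records


def fasta_qual_to_fastq(fasta_text, qual_text):
    """Return Sanger FASTQ text; records are matched by definition line, not by order."""
    # All sequences are held in memory, which is the RAM cost mentioned above.
    seqs = {name: "".join(lines) for name, lines in parse_records(fasta_text).items()}
    out = []
    for name, lines in parse_records(qual_text).items():
        scores = [int(s) for s in " ".join(lines).split()]
        quality = "".join(chr(s + 33) for s in scores)  # Phred score -> ASCII, offset 33
        out.append(f"@{name}\n{seqs[name]}\n+\n{quality}\n")
    return "".join(out)


# Made-up reads; note the two files list them in a different order.
fasta = ">read_b\nACGT\n>read_a\nGTGC\n"
qual = ">read_a\n32 32 32 32\n>read_b\n40 30 20 10\n"
print(fasta_qual_to_fastq(fasta, qual), end="")
```

A score of 32, as in the *.qual.txt example, becomes "A" because 32 + 33 = 65. A definition line present in the quality text but missing from the sequence text raises a KeyError here instead of being written out with an empty sequence.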
      Last edited by kmcarr; 08-07-2009, 08:41 AM.

      Comment


      • #18
        Originally posted by kmcarr View Post
        When the script is done there will be a file named foo.fastq which you can use as input for MAQ.
        Dear kmcarr,

        thank you very much for your help! I tested your script and everything is working fine! :-)

        Originally posted by SWBARNES2 View Post
        That might work.
        Dear swbarnes2,

        thanks to you, too. I will try your solution tomorrow. I saw it too late.
        Last edited by Oxygen81; 08-10-2009, 08:02 AM.

        Comment


        • #19
          How about paired-end reads?

          When I use fq_all2std.pl, it only outputs single-end reads.

          Comment


          • #20
            Originally posted by baohua100 View Post
            How about paired-end reads?

            When I use fq_all2std.pl, it only outputs single-end reads.
            What input file are you using? As far as I can recall, Illumina never combines the read pairs in a single file. Read 1 and read 2 are always reported in separate fastq or qseq files with appropriate names (e.g. s_1_1_sequence.txt for lane 1 read 1). Read pairs are associated by their read names, not by being in the same file.

            Comment


            • #21
              kmcarr,

              I am very thankful for the script. It worked for me too!

              Comment


              • #22
                I have not been able to convert using fq_all2std.pl.
                For example, perl fq_all2std.pl scarf2std s_5_1_export.txt gives me:
                Use of uninitialized value in join or string at fq_all2std.pl line 68, <> line 1.
                Use of uninitialized value in join or string at fq_all2std.pl line 68, <> line 1.
                Use of uninitialized value in join or string at fq_all2std.pl line 68, <> line 1.
                Use of uninitialized value in join or string at fq_all2std.pl line 68, <> line 1.
                Use of uninitialized value in concatenation (.) or string at fq_all2std.pl line 69, <> line 1.
                @HWI-EAS393 0031 5 12 13532 8891 0 1 CTGCCAAGGAAGTCTCAAATTCAAGGAGAAGTTTCC fefffffffdffcfffffdefefffbefdffdfeef c20.fa 2553076 R 36 118 236 -107 F Y
                ____

                +
                Use of uninitialized value in split at fq_all2std.pl line 71, <> line 1.

                Is there any way to convert this format to FASTQ?
                Last edited by husamia; 12-08-2010, 12:30 PM.

                Comment


                • #23
                  Originally posted by kmcarr View Post
                  As far as I can recall Illumina never combines the read pairs in a single file. Read 1 and read 2 are always reported in separate fastq or qseq files with appropriate names (e.g. s_1_1_sequence.txt for lane 1 read 1). Read pairs are associated by their read names, not by being in the same file.
                  There are alignment tools (like BFAST) that require paired ends to be interleaved.

                  Comment


                  • #24
                    Originally posted by husamia View Post
                    I have not been able to convert using fq_all2std.pl.
                    For example, perl fq_all2std.pl scarf2std s_5_1_export.txt gives me:
                    Use of uninitialized value in join or string at fq_all2std.pl line 68, <> line 1.
                    Use of uninitialized value in join or string at fq_all2std.pl line 68, <> line 1.
                    Use of uninitialized value in join or string at fq_all2std.pl line 68, <> line 1.
                    Use of uninitialized value in join or string at fq_all2std.pl line 68, <> line 1.
                    Use of uninitialized value in concatenation (.) or string at fq_all2std.pl line 69, <> line 1.
                    @HWI-EAS393 0031 5 12 13532 8891 0 1 CTGCCAAGGAAGTCTCAAATTCAAGGAGAAGTTTCC fefffffffdffcfffffdefefffbefdffdfeef c20.fa 2553076 R 36 118 236 -107 F Y
                    ____

                    +
                    Use of uninitialized value in split at fq_all2std.pl line 71, <> line 1.

                    Is there any way to convert this format to FASTQ?
                    You used the command "scarf2std" but your input is an export file. You need to use this command instead.
                    Code:
                    perl fq_all2std.pl export2std s_5_1_export.txt
                    Originally posted by kmcarr;
                    As far as I can recall Illumina never combines the read pairs in a single file. Read 1 and read 2 are always reported in separate fastq or qseq files with appropriate names (e.g. s_1_1_sequence.txt for lane 1 read 1). Read pairs are associated by their read names, not by being in the same file.
                    Originally posted by adamdeluca View Post
                    There are alignment tools (like BFAST) that require paired ends to be interleaved.
                    http://seqanswers.com/forums/showthread.php?t=3905
                    True enough, but that does not change the fact that the Illumina software outputs separate files for reads 1 and 2. It is up to the user to merge the reads into a single file if required for downstream analysis.
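
As a rough sketch of what "merging" can look like, the Python below interleaves two FASTQ streams record by record. It assumes unwrapped four-line records, that both files list the pairs in the same order, and that read names end in /1 and /2; the function names and sample reads are hypothetical. Whether a given aligner expects exactly this layout should be checked against its own documentation.

```python
from itertools import islice


def fastq_records(lines):
    """Yield 4-line FASTQ records as tuples (assumes 4 lines per read, no wrapping)."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(read1_lines, read2_lines):
    """Yield lines alternating read 1 / read 2, checking that the names pair up."""
    # zip stops at the shorter input, so unequal file lengths are not detected here.
    for r1, r2 in zip(fastq_records(read1_lines), fastq_records(read2_lines)):
        name1 = r1[0].rstrip().rsplit("/", 1)[0]  # drop the trailing /1
        name2 = r2[0].rstrip().rsplit("/", 1)[0]  # drop the trailing /2
        if name1 != name2:
            raise ValueError(f"read names out of sync: {r1[0]!r} vs {r2[0]!r}")
        yield from r1
        yield from r2


# Made-up pair for illustration; real input would be two open file handles.
r1 = ["@pair1/1\n", "ACGT\n", "+\n", "IIII\n"]
r2 = ["@pair1/2\n", "TTGA\n", "+\n", "HHHH\n"]
print("".join(interleave(r1, r2)), end="")
```

Because the generators read lazily, passing open file objects instead of lists would process the files one pair at a time rather than loading them into memory.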

                    Comment


                    • #25
                      Originally posted by kmcarr View Post
                      You used the command "scarf2std" but your input is an export file. You need to use this command instead.
                      Code:
                      perl fq_all2std.pl export2std s_5_1_export.txt
                      You are right. I ran export2std and got converted FASTQ:
                      @HWI-EAS393_0031:5:12:13532:8891/1
                      CTGCCAAGGAAGTCTCAAATTCAAGGAGAAGTTTCC
                      +
                      GFGGGGGGGEGGDGGGGGEFGFGGGCFGEGGEGFFG
                      But the export file contains 43,236,910 reads and my converted file contains only 33,742,681 reads. Is there quality filtering, and what are the parameters?
                      By the way, I used this script:
                      http://github.com/dandavison/msg/raw.../fq_all2std.pl

                      Comment


                      • #26
                        Originally posted by husamia View Post
                        You are right. I ran export2std and got converted FASTQ:
                        @HWI-EAS393_0031:5:12:13532:8891/1
                        CTGCCAAGGAAGTCTCAAATTCAAGGAGAAGTTTCC
                        +
                        GFGGGGGGGEGGDGGGGGEFGFGGGCFGEGGEGFFG
                        But the export file contains 43,236,910 reads and my converted file contains only 33,742,681 reads. Is there quality filtering, and what are the parameters?
                        By the way, I used this script:
                        http://github.com/dandavison/msg/raw.../fq_all2std.pl
                        The export file contains information for all reads. The script only outputs filter-passing reads; that is, those reads which have a "Y" in the last column of the export file. Filter passing is determined during the image analysis stage of the Illumina pipeline, before the base calling and alignment stage. There are no arguments you can pass to fq_all2std.pl which will cause it to output all reads (and you probably wouldn't want to anyway). If you want it to output all the reads, you would need to modify the script or roll your own.
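
A "roll your own" converter could look roughly like the Python sketch below. The field layout (tab-separated; machine, run, lane, tile, x, y, index, read number, sequence, quality, with the filter flag in the last column) is an assumption inferred from the export line and the converted record posted earlier in this thread, and the function name and the `keep_failed` switch are hypothetical. It is not the code of fq_all2std.pl. Subtracting 31 from each quality character's code reproduces the converted record posted above.

```python
def export_line_to_fastq(line, keep_failed=False):
    """Convert one export line to a Sanger FASTQ record, or None if it is filtered out."""
    fields = line.rstrip("\n").split("\t")
    if fields[-1] != "Y" and not keep_failed:
        return None  # read did not pass the Illumina filter
    machine, run, lane, tile, x, y, _index, read = fields[:8]
    seq, qual = fields[8], fields[9]
    # Shift quality characters from offset 64 to offset 33 (Sanger): 64 - 33 = 31.
    sanger = "".join(chr(ord(c) - 31) for c in qual)
    return f"@{machine}_{run}:{lane}:{tile}:{x}:{y}/{read}\n{seq}\n+\n{sanger}\n"


# The export line posted in this thread, re-joined with tabs (an assumption about
# the original separators; empty columns lost in the post are not restored).
posted = ("HWI-EAS393 0031 5 12 13532 8891 0 1 "
          "CTGCCAAGGAAGTCTCAAATTCAAGGAGAAGTTTCC "
          "fefffffffdffcfffffdefefffbefdffdfeef "
          "c20.fa 2553076 R 36 118 236 -107 F Y")
line = "\t".join(posted.split())
print(export_line_to_fastq(line), end="")

failed = line[:-1] + "N"  # the same read with the filter flag flipped
print(export_line_to_fastq(failed))                                # skipped by default
print(export_line_to_fastq(failed, keep_failed=True) is not None)  # kept on request
```

Counting how many lines return None versus a record would account for the gap between the number of reads in the export file and in the converted file.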

                        Comment


                        • #27
                          Originally posted by kmcarr View Post
                          The export file contains information for all reads. The script only outputs filter-passing reads; that is, those reads which have a "Y" in the last column of the export file. Filter passing is determined during the image analysis stage of the Illumina pipeline, before the base calling and alignment stage. There are no arguments you can pass to fq_all2std.pl which will cause it to output all reads (and you probably wouldn't want to anyway). If you want it to output all the reads, you would need to modify the script or roll your own.
                          So 78% of my total reads passed quality filtering for one strand. Can we say this is a good measure of experiment success? And regardless of whether or not it is a good measure, is 78% a good number?

                          Comment


                          • #28
                            Originally posted by husamia View Post
                            So 78% of my total reads passed quality filtering for one strand.
                            78% passing is quite an acceptable number.
                            Can we say this is a good measure for experiment success?
                            No. The (default) Illumina quality filtering only considers the first 25 cycles of read 1. Even if the read quality goes completely off the rails after cycle 25, once the Illumina software calls it filter passed, it's passed. A program like FastQC can give you a better idea of the overall quality of your reads.
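
To make the point about later cycles concrete, here is a small Python sketch that computes the pass rate from the counts quoted earlier and the mean Phred score at each cycle of a Sanger-encoded FASTQ. It is a toy illustration, not a substitute for FastQC: the function name and the two sample reads are made up, and it assumes four-line records with offset-33 qualities.

```python
def mean_quality_per_cycle(fastq_lines, offset=33):
    """Return the mean Phred score at each read position (cycle)."""
    totals, counts = [], []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # only the 4th line of each record holds qualities
            continue
        for cycle, char in enumerate(line.rstrip("\n")):
            if cycle == len(totals):  # first time this cycle is seen
                totals.append(0)
                counts.append(0)
            totals[cycle] += ord(char) - offset
            counts[cycle] += 1
    return [t / c for t, c in zip(totals, counts)]


# Pass rate from the read counts quoted earlier in the thread.
passed, total = 33_742_681, 43_236_910
print(f"filter-passing: {passed / total:.1%}")

# Two made-up reads whose quality drops towards the end of the read.
reads = ["@r1\n", "ACGT\n", "+\n", "II5+\n",
         "@r2\n", "ACGT\n", "+\n", "I?5+\n"]
print(mean_quality_per_cycle(reads))
```

A per-cycle profile that stays flat through cycle 25 and then falls away is the situation described above: such reads still count as filter-passing.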

                            Comment


                            • #29
                              Hello All,

                              I have a prb file and I am trying to use this script on it. Right now I have a tar.gz file.
                              When I try to untar it, it reports an error.
                              Code:
                              [lakshmaa@hpcc01 Bustard_04NOV08_input_lane7]$ tar xzvf lane7_prb.tar.gz
                              
                              gzip: stdin: not in gzip format
                              tar: Child returned status 1
                              tar: Error exit delayed from previous errors
                              [lakshmaa@hpcc01 Bustard_04NOV08_input_lane7]$ ls
                              lane7_prb.tar.gz  lane7_qhg.tar.gz  lane7_seq.txt  lane7_sig2.tar.gz
                              I have no idea why this happens. Does anyone have any suggestions?

                              Thanks

                              Comment


                              • #30
                                Originally posted by lakshmaa View Post
                                Hello All,

                                I have a prb file and I am trying to use this script on it. Right now I have a tar.gz file.
                                When I try to untar it, it reports an error.
                                Code:
                                [lakshmaa@hpcc01 Bustard_04NOV08_input_lane7]$ tar xzvf lane7_prb.tar.gz
                                
                                gzip: stdin: not in gzip format
                                tar: Child returned status 1
                                tar: Error exit delayed from previous errors
                                [lakshmaa@hpcc01 Bustard_04NOV08_input_lane7]$ ls
                                lane7_prb.tar.gz  lane7_qhg.tar.gz  lane7_seq.txt  lane7_sig2.tar.gz
                                I have no idea why this happens. Does anyone have any suggestions?

                                Thanks
                                As it says, your gzip file is not in gzip format. It likely got truncated during transfer. Download the file again and retry.
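
Before downloading again, it can help to look at what the file actually contains. The Python sketch below is one way to do that, under stated assumptions: gzip streams begin with the two bytes 0x1f 0x8b, so a file without them is either an uncompressed tar that merely carries a .tar.gz name or something else entirely. The function name `describe_archive` and the demo file name are hypothetical.

```python
import gzip
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def describe_archive(data):
    """Classify raw bytes as 'gzip', 'tar' (uncompressed) or 'unknown'."""
    if data[:2] == GZIP_MAGIC:
        return "gzip"
    try:
        # mode "r:" accepts only an uncompressed tar archive
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:"):
            return "tar"
    except tarfile.TarError:
        return "unknown"


# Build a tiny uncompressed tar in memory for the demonstration.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    payload = b"example"
    info = tarfile.TarInfo("lane7_prb.txt")  # hypothetical member name
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
plain_tar = buf.getvalue()

print(describe_archive(plain_tar))                 # a tar that was never gzipped
print(describe_archive(gzip.compress(plain_tar)))  # a real .tar.gz
print(describe_archive(b"<html>error</html>"))     # e.g. a saved error page
```

If a real file turns out to be a plain tar, `tar xvf` without the `z` flag should unpack it; if it is neither, transferring it again as suggested above is the next step.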
                                Farhat Habib

                                Comment
