  • What is the difference between sol2std and sol2sanger

    Hello, does anyone know the difference between these two maq scripts (both available in the latest maq version, 0.7.1)?

    sol2sanger
    sol2sanger convert Solexa FASTQ to standard/Sanger FASTQ
    Usage: maq sol2sanger in.fastq out.fastq


    sol2std
    sol2std Convert Solexa/Illumina FastQ to the standard/Sanger FASTQ
    Usage: perl fq_all2std.pl sol2std in.txt > out.fastq


    I tried both scripts on the file "s_sequence.txt" that I got from Illumina. Here are the results for the first sequence:

    s_sequence.txt
    @GAII01_3:8:1:136:1560
    GAGTTCAATTTTGTTAATTCCTTTCTGAAAATAGAT
    +GAII01_3:8:1:136:1560
    WWWWWWWWWWWWWWWWWWWWWWWWWWLWWWRRRIRO


    sol2sanger_s_sequence.fastq
    @GAII01_3:8:1:136:1560
    GAGTTCAATTTTGTTAATTCCTTTCTGAAAATAGAT
    +
    88888888888888888888888888-888333+30


    sol2std_s_sequence.fastq
    @GAII01_3:8:1:136:1560
    GAGTTCAATTTTGTTAATTCCTTTCTGAAAATAGAT
    +
    88888888888888888888888888-888333+30!


    The only difference between the FASTQ files generated by the sol2sanger and sol2std scripts is that sol2std writes a "!" at the end of the 4th line.
    I checked, and it does this on every 4th line of the file (i.e., on every quality line).
    What does this mean?
    Which script is better?

    Thanks!
    Ines de Santiago

  • #2
    Hi,

    If you run the following command:
    Code:
    perl fq_all2std.pl instruction
    you get this output:
    Code:
    FASTQ format is first used in the Sanger Institute, and therefore
    we take the Sanger specification as the standard FASTQ. Although
    Solexa/Illumina reads file looks pretty much like the standard
    FASTQ, they are different in that the qualities are scaled
    differently. In the quality string, if you can see a character
    with its ASCII code higher than 90, probably your file is in the
    Solexa/Illumina format.
    
    Sometimes we also use an integer, instead of a single character,
    to explicitly show the qualities. In that case, negative
    qualities indicates that Solexa/Illumina qualities are used.
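For reference, here is a minimal Perl sketch of the rescaling described above. It assumes the commonly used conversion Q_phred = 10*log10(1 + 10^(Q_solexa/10)), rounded to the nearest integer, with Solexa qualities stored at ASCII offset 64 and Sanger qualities at offset 33. It is not copied from fq_all2std.pl, and the sub names are my own. Applied to the quality line from the first post, it reproduces the sol2sanger output.

```perl
use strict;
use warnings;

# Lookup table: Solexa quality character (offset 64, Solexa-scaled score)
# -> Sanger quality character (offset 33, Phred score).
# Assumed formula: Q_phred = 10*log10(1 + 10**(Q_solexa/10)), rounded.
sub build_conv_table {
    my @table;
    for my $ascii (0 .. 127) {
        my $q_sol   = $ascii - 64;
        my $q_phred = 10 * log(1 + 10 ** ($q_sol / 10)) / log(10);
        $table[$ascii] = chr(int($q_phred + 0.5) + 33);
    }
    return \@table;
}

# Convert a whole Solexa quality string, character by character.
sub sol2sanger_qual {
    my ($sol_qual, $table) = @_;
    return join '', map { $table->[ord $_] } split //, $sol_qual;
}

my $table = build_conv_table();
print sol2sanger_qual('WWWWWWWWWWWWWWWWWWWWWWWWWWLWWWRRRIRO', $table), "\n";
# prints 88888888888888888888888888-888333+30
```

Note that this table also maps the newline character (ASCII 10) to '!': its "Solexa score" is -54, which rounds to Phred 0, i.e. chr(33).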
    You might find these two threads informative as well:

    Well, to answer your question: both should produce the same output. However, as far as I can see, the ! is not introduced on purpose; it looks more like a bug to me.

    I've "annotated" the perl sol2std function of the fq_all2std.pl script (maq 0.7.1) below:

    Code:
    sub sol2std {
      my $max = 0;
      while (<>) {
        if (/^@/) {
          # print the first ID line
          print;

          # print the sequence line, skip the second ID ("+") line and
          # read the quality line
          #
          # this is where the actual problem is: the trailing newline
          # character is captured as well
          $_ = <>; print; $_ = <>; $_ = <>;

          # my solution is to add this line:
          #  chomp;
          my @t = split('', $_);

          # here the @t array contains all the quality
          # chars + the newline
          my $qual = '';

          # the extra newline is converted as well,
          # which is where the ! comes from
          $qual .= $conv_table[ord($_)] for (@t);
          print "+\n$qual\n";
        }
      }
    }
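A small self-contained Perl check of that diagnosis. It uses an assumed Solexa-to-Sanger formula (Q_phred = 10*log10(1 + 10^(Q_solexa/10)), offsets 64 in and 33 out) rather than the script's own table, and convert_qual is a hypothetical helper that mimics the loop above. Converting the quality line with its trailing newline gives the extra '!'; chomping it first removes it.

```perl
use strict;
use warnings;

# Assumed Solexa->Sanger lookup table, indexed by ASCII code 0..127.
my @conv_table = map {
    my $q_sol = $_ - 64;
    chr(int(10 * log(1 + 10 ** ($q_sol / 10)) / log(10) + 0.5) + 33);
} 0 .. 127;

# Convert every character of the line, like the sol2std loop does,
# including a trailing "\n" if it is still there.
sub convert_qual {
    my ($line) = @_;
    my $qual = '';
    $qual .= $conv_table[ord $_] for split //, $line;
    return $qual;
}

my $raw = "WWWWWWWWWWWWWWWWWWWWWWWWWWLWWWRRRIRO\n";   # as returned by <>

my $buggy = convert_qual($raw);    # newline is converted too -> trailing '!'

my $line = $raw;
chomp $line;                       # the proposed fix
my $fixed = convert_qual($line);

print "without chomp: $buggy\n";
print "with chomp:    $fixed\n";
```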

    I hope lh3 will read this thread and comment on it.

    Therefore, for the time being I would recommend using:
    Code:
    maq sol2sanger
    Best,



    • #3
      Thanks very much!



      • #4
        I have also encountered the trailing '!' produced by the perl sol2std function of the fq_all2std.pl script (maq 0.7.1). This causes an exception if the subsequent fastq file is fed into hmmSplicer. Thanks for the advice on using sol2sanger instead.
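Since the extra '!' makes every quality string one character longer than its sequence (37 vs. 36 in the example from the first post), a quick length check will flag files affected by this bug. My guess is that this mismatch is what trips up hmmSplicer, but I haven't confirmed that. Below is a minimal Perl sketch; find_length_mismatches is a hypothetical helper, and it assumes plain 4-line FASTQ records (no wrapped sequence or quality lines).

```perl
use strict;
use warnings;

# Hypothetical helper: read 4-line FASTQ records from a filehandle and
# return [header, seq length, qual length] for every record whose quality
# string is not the same length as its sequence.
sub find_length_mismatches {
    my ($fh) = @_;
    my @bad;
    while (my $header = <$fh>) {
        my $seq  = <$fh>;
        my $plus = <$fh>;                   # '+' separator line, not checked
        my $qual = <$fh>;
        last unless defined $qual;          # stop at a truncated final record
        chomp($header, $seq, $qual);
        push @bad, [$header, length $seq, length $qual]
            if length($seq) != length($qual);
    }
    return @bad;
}

# Example: the sol2std output record from the first post, read from a string.
my $fastq = join "\n",
    '@GAII01_3:8:1:136:1560',
    'GAGTTCAATTTTGTTAATTCCTTTCTGAAAATAGAT',
    '+',
    '88888888888888888888888888-888333+30!',
    '';
open my $fh, '<', \$fastq or die "cannot open in-memory file: $!";
my @bad = find_length_mismatches($fh);
close $fh;

printf "%s: seq %d, qual %d\n", @$_ for @bad;
```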

