  • base quality encoding changed after "bwa samse" command

    hello,

    Please look at the base quality string in my sample2New.fq file:


    @EBRI093151_0051:4:55:2998:9540#0/1
    ACAACACAGTGGGTTGGAGTAGAGCATCTCCAAAGGCCCTTTCCAATCCAACATGAGTAACTCAAGCTCTGCACCAGCCACGAAAAGGCAAGGCTTTGGAT
    +
    FFFFFFFFFFDFFBFDEAEEEFFFFFFFFCFFEFFCEEEDDFFEEEFEADDFDFDEEDFFE@FCDDD>ACDFADD?CCECDB<?@047:9@?BB+B@@@]]



    After running these commands:

    /opt/bwa-0.6.2/bwa index -a bwtsw -p ref reference.fa
    /opt/bwa-0.6.2/bwa aln -t 10 -f sample2New.sai -I ref sample2New.fq
    /opt/bwa-0.6.2/bwa samse -f sample2New.sam -r "@RG\tID:sample2\tPL:ILLUMINA\tPUu1\tLB:sample2\tSM:sample2" ref sample2New.sai sample2New.fq



    I see a changed base quality string in the sample2New.sam file:

    EBRI093151_0051:4:55:2998:9540#0 0 Chr10 377653 0 101M * 0 0 ACAACACAGTGGGTTGGAGTAGAGCATCTCCAAAGGCCCTTTCCAATCCAACATGAGTAACTCAAGCTCTGCACCAGCCACGAAAAGGCAAGGCTTTGGAT ''''''''''%''#'%&"&&&''''''''$''&''$&&&%%''&&&'&"%%'%'%&&%''&!'$%%%^_"$%'"%% $$&$%#^] !^Q^U^XESC^Z! ##^L#!!!>> RG:Z:sample2 XT:A:R NM:i:0 X0:i:3 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:101 XA:Z:Chr10,+33,101M,0;Chr10,+242847,101M,0;


    and of course the command

    java -Xmx8g -jar /opt/picard-tools-1.85/SortSam.jar SO=coordinate INPUT=sample2New.sam OUTPUT=sample2New.bam VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true


    fails with the error

    Exception in thread "main" java.lang.IllegalArgumentException: Invalid fastq character:

    Why is "bwa samse" changing the quality encoding?
    Do you have any idea what I'm doing wrong?

    thanks

  • #2
    Could you edit your post to use the [ code ] and [ /code ] tags? This is easily done via the advanced editor view where there is a button for this in the tool bar (not shown in the quick reply edit box).



    • #3
      I've not checked all the bases (due to the forum formatting), but this looks like a FASTQ encoding problem. It appears bwa assumed the obsolete Illumina-specific ASCII encoding of PHRED+64, while your data was actually in the original standard Sanger ASCII encoding of PHRED+33 (now adopted by Illumina). For background, see:


      In your FASTQ file, the first base has quality code 'F', ASCII character 70. Under the Sanger FASTQ scheme that means quality 70-33 = 37. However, if read in as the obsolete Illumina scheme it would be quality 70-64 = 6, which when output again in SAM format (which uses the Sanger FASTQ scheme) becomes 6+33 = ASCII 39 = ' (single quote).
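The arithmetic above can be checked with a short Python sketch. This only illustrates the offset shift; it is not bwa's actual code, and the function name `reencode_quality` is made up for this example. It also suggests why Picard rejects the file: any quality character below '@' (ASCII 64) decodes to a negative value under PHRED+64 and comes out as a character below '!' in the SAM file, including control characters.

```python
def reencode_quality(qual, in_offset=64, out_offset=33):
    """Decode each character with in_offset, then re-encode with out_offset.

    Hypothetical helper for illustration only; it mimics reading PHRED+33
    data as PHRED+64 and writing it back out as PHRED+33.
    """
    return "".join(chr(ord(c) - in_offset + out_offset) for c in qual)


# First 14 quality characters of the read in the original post.
fastq_qual = "FFFFFFFFFFDFFB"
print(reencode_quality(fastq_qual))  # ''''''''''%''#

# '>' is ASCII 62, so 62 - 64 = -2 and -2 + 33 = 31, a control character.
print(ord(reencode_quality(">")))  # 31
```

The first printed string matches the start of the quality field in the SAM record from the original post.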

      Solution - there is a command line option to tell bwa you have a Sanger style FASTQ file. Use it, otherwise you get a bad SAM/BAM file.



      • #4
        Thank you for your help Maubp,

        Your explanation helped me find the solution.

        The problem was in the "bwa aln" command:
        /opt/bwa-0.6.2/bwa aln -t 10 -f sample2New.sai -I ref sample2New.fq

        From the documentation we can see: "-I The input is in the Illumina 1.3+ read format (quality equals ASCII-64)." So everything is OK when I omit the -I option:

        /opt/bwa-0.6.2/bwa aln -t 10 -f sample2New.sai ref sample2New.fq
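Before deciding whether -I is appropriate, one rough check is to look for quality characters that cannot occur in a +64 encoding. The Python sketch below assumes a plain four-line-per-record FASTQ file; the helper names `quality_lines` and `looks_like_phred33` are made up for this example. A True result means the file cannot be PHRED+64 (or Solexa+64, whose lowest character is ';', ASCII 59). A False result is not proof of PHRED+64, since high-quality Sanger data may contain no low characters.

```python
import io
from itertools import islice


def quality_lines(handle):
    """Yield the quality line (every 4th line) of a 4-line-per-record FASTQ."""
    for line in islice(handle, 3, None, 4):
        yield line.rstrip("\n")


def looks_like_phred33(qualities):
    """Return True if any quality character is below ';' (ASCII 59).

    Such characters are impossible in the +64 encodings, so the data
    must be PHRED+33. False only means "could not tell".
    """
    return any(ord(c) < 59 for qual in qualities for c in qual)


# The record from the original post, wrapped in StringIO instead of a file.
record = (
    "@EBRI093151_0051:4:55:2998:9540#0/1\n"
    "ACAACACAGTGGGTTGGAGTAGAGCATCTCCAAAGGCCCTTTCCAATCCAACATGAGTAACTCAAGC"
    "TCTGCACCAGCCACGAAAAGGCAAGGCTTTGGAT\n"
    "+\n"
    "FFFFFFFFFFDFFBFDEAEEEFFFFFFFFCFFEFFCEEEDDFFEEEFEADDFDFDEEDFFE@FCDDD"
    ">ACDFADD?CCECDB<?@047:9@?BB+B@@@]]\n"
)
print(looks_like_phred33(quality_lines(io.StringIO(record))))  # True
```

For a real file, `open("sample2New.fq")` would replace the `io.StringIO(record)` wrapper.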


        Once again, Thank You for your help.



        • #5
          Well done - and thank you for posting back with the details for anyone searching about this again in the future.

          (I couldn't remember the details about the switch, and wasn't at a machine where I could quickly check - but this way you'll probably remember the problem and solution.)
