Hello dear NGS community,
I am new to this forum but have already read many threads, which helped me a lot.
I found many ways in this forum to convert Illumina FASTQ quality scores into Sanger FASTQ Phred scores. My data comes from a sequencer that uses the Illumina 1.5 encoding (thanks to FastQC for detecting that). For my Diploma thesis (I am the last of my kind with a Diploma) I am writing a pipeline script in Ruby that uses bwa, samtools, GATK and Picard. My professor wants me to convert all FASTQ files to Sanger FASTQ. I read about BioRuby, MAQ and other tools, but came to the conclusion that I want to write the conversion myself, so that users of the script won't need to install even more tools or patch bwa in order to use it. That's why I experimented with ASCII codes in Ruby, got some results, and would like to double-check them with your comments.
My results:
Here is an example read:
"NACGTTATACTTGTTAGCACAATCCAAGCTAGGCTAAGAAGTTCAAACATGGTGGACGTACCCACTGATCTTTTG "
Illumina 1.5 scores:
"BIKKGQNMLL[[[[[Y[[[[_______________YYYYYYYYYY[[[[[[Y[[YY[[[[_____________QQ"
(in numbers)
66 73 75 75 71 81 78 77 76 76 91 91 91 91 91 89 91 91 91 91 95 95 95 95 95 95 95 95 95 95 95 95 95 95 95 89 89 89 89 89 89 89 89 89 89 91 91 91 91 91 91 89 91 91 89 89 91 91 91 91 95 95 95 95 95 95 95 95 95 95 95 95 95 81 81
Sanger scores:
"#*,,(2/.--<<<<<:<<<<@@@@@@@@@@@@@@@::::::::::<<<<<<:<<::<<<<@@@@@@@@@@@@@22"
(in numbers)
35 42 44 44 40 50 47 46 45 45 60 60 60 60 60 58 60 60 60 60 64 64 64 64 64 64 64 64 64 64 64 64 64 64 64 58 58 58 58 58 58 58 58 58 58 60 60 60 60 60 60 58 60 60 58 58 60 60 60 60 64 64 64 64 64 64 64 64 64 64 64 64 64 50 50
I got the Sanger scores from a thread in this forum that uses a command line to convert the scores in BAM files (I couldn't find the thread again):
samtools view -h chrYvs48_2_1_KESC1_mymod_48_2_2_KESC1_mymod.bam | perl -lane '$"="\t"; if (/^@/) {print;} else {$F[10]=~ tr/\x40-\xff\x00-\x3f/\x21-\xe0\x21/;print "@F"}' | samtools view -Sbh - > Phred_score.bam
So my question is: can I simply subtract 31 from the numbers to get a Sanger quality score? There was also something about offsets, if I remember correctly... I would then convert these numbers back into ASCII characters and use them to replace the scores in the FASTQ file.
Is this the correct way, or where did I make mistakes?
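To make the idea concrete, here is a minimal Ruby sketch of what I have in mind. It assumes that Illumina 1.5 stores qualities as Phred+64 and Sanger as Phred+33, which would explain the difference of 31 that I see in the numbers above. The method name illumina15_to_sanger and the constants are just names I made up for this example, not part of any library.

```ruby
# Assumption: Illumina 1.5 qualities are ASCII = Phred + 64,
# Sanger qualities are ASCII = Phred + 33.
ILLUMINA_OFFSET = 64
SANGER_OFFSET   = 33
SHIFT = ILLUMINA_OFFSET - SANGER_OFFSET # 31

# Hypothetical helper: shift every quality character down by 31.
# Raises if a character would fall below '!' (ASCII 33), which would
# suggest the input was not Phred+64 encoded in the first place.
def illumina15_to_sanger(qual)
  chars = qual.each_byte.map do |code|
    shifted = code - SHIFT
    if shifted < SANGER_OFFSET
      raise ArgumentError, "quality character #{code.chr.inspect} is below the Phred+64 range"
    end
    shifted.chr
  end
  chars.join
end

# First ten quality characters of the example read above.
puts illumina15_to_sanger("BIKKGQNMLL") # prints #*,,(2/.--
```

In the pipeline I would apply this only to the fourth line of each FASTQ record (the quality line) and leave the other three lines untouched.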
Thank you in advance, Alex