**labrat73** · 05-18-2011, 08:26 AM

Originally posted by Xi Wang

You can use the script below (name it qseq2fastq.pl and replace the former one):

Code:

#!/usr/bin/perl

use warnings;
use strict;

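# Read qseq lines from the files named on the command line (or from STDIN).
while (<>) {
	chomp;
	# qseq columns are tab-separated: machine, run, lane, tile, x, y, index, read number, sequence, quality, filter.
	my @parts = split /\t/;
	# Title line: machine:lane:tile:x:y#index/read
	print "@","$parts[0]:$parts[2]:$parts[3]:$parts[4]:$parts[5]#$parts[6]/$parts[7]\n";
	print "$parts[8]\n";
	# Plus line repeating the title, followed by the quality string
	print "+","$parts[0]:$parts[2]:$parts[3]:$parts[4]:$parts[5]#$parts[6]/$parts[7]\n";
	print "$parts[9]\n";
}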

Greetings Xi Wang,

I have tried to use this script to convert from minimal fastq format to one in which the read name is listed before the base qualities. Here is my command line:

$ perl qseq2fastq.pl sequence.fastq > test.fastq

However, at each attempt I get an empty output file and the "use of uninitialized value in concatenation (.) or string" message in the terminal. Please excuse my ignorance, as I have only very limited knowledge of Perl scripts. I would appreciate it very much if you could explain what I am doing wrong and give me step-by-step instructions on how to run this script.

Many thanks!

**sklages** · 05-18-2011, 11:41 AM

Originally posted by labrat73

Greetings Xi Wang,

I have tried to use this script to convert from minimal fastq format to one in which the read name is listed before the base qualities. Here is my command line:

$ perl qseq2fastq.pl sequence.fastq > test.fastq

However, at each attempt I get an empty output file and the "use of uninitialized value in concatenation (.) or string" message in the terminal. Please excuse my ignorance, as I have only very limited knowledge of Perl scripts. I would appreciate it very much if you could explain what I am doing wrong and give me step-by-step instructions on how to run this script.

Many thanks!

You are trying to convert FASTQ to FASTQ; that is not what the script is for. The script above converts qseq format to FASTQ.
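The warnings follow from that mismatch: a FASTQ line contains no tabs, so after the split only the first field exists and all the others are undefined. As a rough illustration, here is a small Python sketch (not part of the script above; the function name `guess_format` is made up) that guesses which format a file is in from its first line. It assumes a qseq line has the usual 11 tab-separated columns.

```python
def guess_format(first_line: str) -> str:
    """Guess 'qseq', 'fastq' or 'unknown' from the first line of a file."""
    line = first_line.rstrip("\r\n")
    # Assumption: a qseq line has 11 tab-separated columns
    # (machine, run, lane, tile, x, y, index, read, sequence, quality, filter).
    if len(line.split("\t")) == 11:
        return "qseq"
    # A FASTQ record starts with an '@' title line.
    if line.startswith("@"):
        return "fastq"
    return "unknown"


# The first line of the file from the post above is a FASTQ title line.
print(guess_format("@SRR101483.1 SCS_0014:6:1:1063:16736/1"))
```

If this reports "fastq", the file does not need to go through a qseq converter at all.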

**labrat73** · 05-18-2011, 02:00 PM

Originally posted by sklages

You are trying to convert FASTQ to FASTQ; that is not what the script is for. The script above converts qseq format to FASTQ.

sklages-

thanks so much for your reply. i'm a bit confused because my file has the fastq extension and it looks like this:

@SRR101483.1 SCS_0014:6:1:1063:16736/1
GCGTAGGCTCTATCCCTAGAATGCAAAGGTGGTTCAACATACACAGATCAATAAATGTGATTCAC
+
DDDBDCC=D-5AA<B--CAAC5?A5@CC-=AA>>5CC:5=?:A5AC:C?D:C

:>

5?==@A@

when i try to run it, though, i keep getting an error. i compared it to other files that i've run and that's when i noticed that in other files, the title name appears again after the "+", immediately before the base qualities. i'm trying to convert or edit this file so that it looks like this:

@SRR101483.1 SCS_0014:6:1:1063:16736/1
GCGTAGGCTCTATCCCTAGAATGCAAAGGTGGTTCAACATACACAGATCAATAAATGTGATTCAC
+SRR101483.1 SCS_0014:6:1:1063:16736/1
DDDBDCC=D-5AA<B--CAAC5?A5@CC-=AA>>5CC:5=?:A5AC:C?D:C

:>

5?==@A@

i hope this makes sense and appreciate any advice you could offer.

best-

labrat73

**maubp** · 05-18-2011, 02:57 PM

Use [ code ] and [ /code ] tags to prevent the forum messing up the display of examples.

Your file is already in FASTQ format, just without the redundant, optional repeated identifier on the plus lines. You don't need to make that change.

As sklages said earlier, the script this thread is about converts from the Illumina qseq format into FASTQ.

**arcolombo698** · 06-25-2014, 04:51 PM

fastq validator

has anyone tried using this to test?

I have a very similar problem here where my .txt is in this format
where there is no line break after the '+'... however this is still in fastq format because the '+' line is optional... however some people here were still getting errors in the format i have posted below

has anyone used http://genome.sph.umich.edu/wiki/FastQValidator ?

@HWI-ST604_0134:4:1101:1391:1882#0/1
NATAGTGCTTTAGCATCATATCTAAGGCTGTTCGTCCTACATTGTTGAGGAAACAACTATGACCTCCCTTGGGTCGGTTGCTATGCAA AGCAATGCTAACA
+HWI-ST604_0134:4:1101:1391:1882#0/1
BUXRMZ[Z[[cccccccccccccccccccccccccccccc\cccccccccc_cccUYcccccccaccUYccccc_ccc__a\cac\_V __^X^^^\^^[^\
@HWI-ST604_0134:4:1101:1493:1886#0/1
NTAGATAATGATGCCACTGTTACAACTCTGTGCTTTGGGGTACCTAACAAGTCTCCCTCAGTGCCTCTCTGATTTGTAGCTAGTCAAT AGAATGAATAAAG
+HWI-ST604_0134:4:1101:1493:1886#0/1
BUXYX[[Z[[cccccc_cccccccc_ccccccccccc\ccZ____ccc_ccccccccccc[____ccccc_[cc_c_ccc_c_c_cc_ \_BBBBBBBBBBB

**sklages** · 06-25-2014, 09:51 PM

Originally posted by arcolombo698

has anyone tried using this to test?

I have a very similar problem here where my .txt is in this format
where there is no line break after the '+'... however this is still in fastq format because the '+' line is optional... however some people here were still getting errors in the format i have posted below

has anyone used http://genome.sph.umich.edu/wiki/FastQValidator ?

@HWI-ST604_0134:4:1101:1391:1882#0/1
NATAGTGCTTTAGCATCATATCTAAGGCTGTTCGTCCTACATTGTTGAGGAAACAACTATGACCTCCCTTGGGTCGGTTGCTATGCAA AGCAATGCTAACA
+HWI-ST604_0134:4:1101:1391:1882#0/1
BUXRMZ[Z[[cccccccccccccccccccccccccccccc\cccccccccc_cccUYcccccccaccUYccccc_ccc__a\cac\_V __^X^^^\^^[^\
@HWI-ST604_0134:4:1101:1493:1886#0/1
NTAGATAATGATGCCACTGTTACAACTCTGTGCTTTGGGGTACCTAACAAGTCTCCCTCAGTGCCTCTCTGATTTGTAGCTAGTCAAT AGAATGAATAAAG
+HWI-ST604_0134:4:1101:1493:1886#0/1
BUXYX[[Z[[cccccc_cccccccc_ccccccccccc\ccZ____ccc_ccccccccccc[____ccccc_[cc_c_ccc_c_c_cc_ \_BBBBBBBBBBB

I don't get it. There is a "linebreak" (newline) after your '+' line. So this is normal fastq format.

Btw, the '+' line is *not* optional, its content is! There must always be at least the '+' sign as header for the quality line. But it is optional to write any information after that (in the same line).
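To make that rule concrete, here is a minimal Python sketch of a per-record check. It is only an illustration and not how FastQValidator works: the function name `check_record` and the example read name are made up, and it assumes strict four-line records.

```python
def check_record(title: str, seq: str, plus: str, qual: str) -> list:
    """Return a list of problems found in one four-line FASTQ record."""
    problems = []
    if not title.startswith("@"):
        problems.append("title line does not start with '@'")
    if not plus.startswith("+"):
        # The '+' line itself is mandatory.
        problems.append("third line does not start with '+'")
    elif plus != "+" and plus[1:] != title[1:]:
        # Anything after '+' is optional, but if present it should repeat the title.
        problems.append("text after '+' does not repeat the title")
    if len(seq) != len(qual):
        problems.append("sequence and quality lengths differ")
    return problems


# Both plus-line styles pass; a missing '+' does not.
print(check_record("@read1", "ACGT", "+", "IIII"))
print(check_record("@read1", "ACGT", "+read1", "IIII"))
print(check_record("@read1", "ACGT", "", "IIII"))
```

The first two calls return an empty list, and the third reports the missing '+' line.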

**Brian Bushnell** · 06-25-2014, 10:28 PM

The problem I see is that the bases and qualities both have spaces in them, but otherwise it looks fine.

**sklages** · 06-25-2014, 10:30 PM

Originally posted by Brian Bushnell

The problem I see is that the bases and qualities both have spaces in them, but otherwise it looks fine.

You're right, maybe a copy&paste issue ..?
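One way to find out whether the spaces are really in the file, rather than introduced when pasting into the forum, is to scan the sequence and quality lines for whitespace. A rough Python sketch follows; the function name `lines_with_whitespace` and the tiny example records are made up, and it assumes strict four-line records, so lines 2 and 4 of each record are the sequence and quality strings.

```python
def lines_with_whitespace(lines):
    """Yield (line_number, text) for sequence/quality lines containing whitespace."""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        # In four-line records, lines 2 and 4 hold the sequence and the qualities.
        # Title lines are skipped because they may legitimately contain spaces.
        is_seq_or_qual = number % 4 in (2, 0)
        if is_seq_or_qual and any(ch.isspace() for ch in text):
            yield number, text


example = ["@r1 extra", "ACG T", "+", "III I", "@r2", "ACGT", "+", "IIII"]
for number, text in lines_with_whitespace(example):
    print(number, repr(text))
```

On a real file you could pass the open file handle instead of the example list, since iterating over it yields one line at a time.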
