
**vjimenez** · 07-15-2010, 02:14 PM

If you are working in LINUX, you can use awk as follows:

awk '$12 ~ /Y/{print "@"$1"_000"$2":"$3":"$4":"$5":"$6"#"$7"/"$8"\n"$9"\n+"$1"_000"$2":"$3":"$4":"$5":"$6"#"$7"/"$8"\n"$10}' s_1_export.txt > s_1_sequence.txt

**Asifullah** · 08-11-2010, 12:44 AM

Dear all,
I am a new user analyzing Illumina NGS data. I downloaded Bowtie on a 32-bit Linux system for reference-based assembly and successfully followed its tutorial for aligning the example data shipped in the software folder. But I am stuck at the SAMtools step for visualizing the alignment. Could someone please help me get past that step? I think I did not compile SAMtools correctly. Could you please provide a ready-to-run compiled version of SAMtools for a 32-bit SUSE Linux system? I would be much obliged. My email address for correspondence is ([email protected]).
Thanks all, and sorry if my question is too silly, as I am a new user of Bowtie.

Asif

**Asifullah** · 08-11-2010, 12:49 AM

Originally posted by kwebb

Hi

I'm trying to work through some of the various assembler programs before actually collecting my own Illumina data. I've found some test datasets here:

SHARCGS

http://sharcgs.molgen.mpg.de/download.shtml

but I'm not sure if the file formats are the same as raw data from the Genome Analyzer.

The files are s_4_seq.txt and s_4_prb.txt and the first few lines look like this:
s_4_seq.txt
4 1 56 910 AACTTACAATTGAAAATATAAACTCAT
4 1 64 716 AAGATGATTATATGTCTTCCTTTTCGA
4 1 890 894 TCAAACCAATCAGACCTATGTTTCATA

s_4_prb.txt
40 -40 -40 -40 40 -40 -40 -40 -40 40 -40 -40 -40 -4
0 -40 40 -40 -40 -40 40 40 -40 -40 -40 -40 40 -40
-40 40 -40 -40 -40 40 -40 -40 -40 -40 -40 -40 40

So my questions are
1. Is this the raw data format from the machine?
2. How do I get these files into fastq format? The maq converter and sanger perl scripts previously mentioned do not seem to work.

Thank you!

Hi,

I am facing the same format in my Illumina sequencing files as the one you have shown here. Could you please provide a Perl script for converting such data into FASTA or FASTQ format? I would be much obliged for any guidance. My email address for correspondence is (asifullah111"gmail.com).

regards
asif
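
For the seq.txt/prb.txt pair quoted above, a conversion could look roughly like the following Python sketch. It rests on several assumptions that are not confirmed in this thread: that each seq line holds lane, tile, x, y and the read; that the matching prb line holds four Solexa-scaled scores per base (in A, C, G, T order); that the quality of the called base is the largest of the four; and that the output should be Sanger-style FASTQ (Phred+33). The function names are made up for illustration.

```python
import math


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled score to the nearest integer Phred score."""
    return int(round(10 * math.log10(10 ** (q_solexa / 10.0) + 1)))


def seq_prb_to_fastq(seq_line, prb_line):
    """Build one FASTQ record from matching seq.txt and prb.txt lines.

    Assumes five whitespace-separated seq fields (lane, tile, x, y, read)
    and four integer scores per base on the prb line.
    """
    lane, tile, x, y, bases = seq_line.split()
    bases = bases.replace(".", "N")  # assumed no-call symbol in seq.txt
    scores = [int(v) for v in prb_line.split()]
    if len(scores) != 4 * len(bases):
        raise ValueError("prb line does not have four scores per base")
    quals = []
    for i in range(0, len(scores), 4):
        best = max(scores[i:i + 4])  # score of the called base (assumption)
        quals.append(chr(solexa_to_phred(best) + 33))
    name = f"{lane}_{tile}_{x}_{y}"  # hypothetical read-name scheme
    return f"@{name}\n{bases}\n+\n{''.join(quals)}\n"


def convert_files(seq_path, prb_path, out_path):
    """Walk both files in parallel and write one FASTQ record per read."""
    with open(seq_path) as seq, open(prb_path) as prb, open(out_path, "w") as out:
        for seq_line, prb_line in zip(seq, prb):
            out.write(seq_prb_to_fastq(seq_line, prb_line))
```

Usage would be something like convert_files("s_4_seq.txt", "s_4_prb.txt", "s_4.fastq"). It is worth checking the first few records by eye before trusting the output.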

**husamia** · 08-25-2010, 11:38 AM

Originally posted by vjimenez

If you are working in LINUX, you can use awk as follows:

awk '$12 ~ /Y/{print "@"$1"_000"$2":"$3":"$4":"$5":"$6"#"$7"/"$8"\n"$9"\n+"$1"_000"$2":"$3":"$4":"$5":"$6"#"$7"/"$8"\n"$10}' s_1_export.txt > s_1_sequence.txt

Just to clarify, is this for converting the SCARF ASCII format mentioned above? Is any quality trimming done? I ask because I got a smaller file than I expected: I started with a file of 43,236,910 reads and ended up with a file of 8,081,040 lines. Here is a sample of my input, which I take to be the same format as in the post above:
HWI-EAS393 0031 5 1 1295 9710 0 3 AGACGTGTGTCTGAGTAAGGAACCCGCGGGGAAGGG ]PLLPU\]Z_`^`L`aL^`LYb^bbc`^^cH``TL^ c10.fa 130687332 F 3A26T3T1 70 188 128 R Y

**vschulz** · 09-02-2010, 06:58 AM

The awk line only outputs sequences with Y in the 12th (QC??) field. If you want all sequences in the fastq output, you can do:

awk '{print "@"$1"_000"$2":"$3":"$4":"$5":"$6"#"$7"/"$8"\n"$9"\n+"$1"_000"$2":"$3":"$4":"$5":"$6"#"$7"/"$8"\n"$10}' s_1_export.txt > s_1_sequence.txt

Caveat: I don't know awk, but the output seems correct.
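
One possible explanation for the smaller output: awk splits on runs of whitespace by default, so empty columns collapse and $12 is not the same column on every line. In the sample export line quoted above, the 12th whitespace-separated token is the match position 130687332, so that read would be skipped even though its last field is Y. The Python sketch below splits on tabs instead and reads the filter flag from the last column. It assumes a tab-delimited export file with at least 22 columns and the pass/fail flag (Y or N) last, which is not confirmed in this thread. The function name is made up, and the read name omits the hard-coded "000" padding used in the awk line.

```python
def export_line_to_fastq(line, passed_only=True):
    """Turn one export.txt line into a FASTQ record, or None if filtered out.

    Assumes tab-delimited columns: machine, run, lane, tile, x, y, index,
    read number, sequence, quality, ..., with the Y/N filter flag last.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 22:
        raise ValueError("expected at least 22 tab-separated columns")
    if passed_only and fields[-1] != "Y":
        return None
    machine, run, lane, tile, x, y, index, read_no, seq, qual = fields[:10]
    name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    # The quality string is copied unchanged, as in the awk one-liners.
    return f"@{name}\n{seq}\n+{name}\n{qual}\n"


def export_to_fastq(export_path, fastq_path, passed_only=True):
    """Convert a whole export file and return the number of reads written."""
    written = 0
    with open(export_path) as src, open(fastq_path, "w") as dst:
        for line in src:
            record = export_line_to_fastq(line, passed_only)
            if record is not None:
                dst.write(record)
                written += 1
    return written
```

A FASTQ record is four lines, so the line count of the output divided by four should equal the number returned by export_to_fastq.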

**cgkumar** · 12-02-2010, 10:55 AM

Originally posted by alig

To lparsons,

Thank you. Yes I realised that later after I'd sent my post.

Also in case anyone else is looking to separate a fastq file into seq.fasta & qual.fasta files you actually need the other command within Maq

fq_all2std.pl std2qual <out.prefix> <in.fastq>

Thanks again

alig

Hi,

I need to convert Illumina files into .seq and .qual files for Phrap. I am unable to find the newest version of "fq_all2std.pl" that includes the "std2qual" command. Is there any other program that would convert the Illumina quality characters into Phred qualities?

Thanks,
Charu

**alig** · 12-02-2010, 05:12 PM

fq_all2std

Hi,

The "std2qual" command is part of the Perl script "fq_all2std.pl", which comes with maq-0.7.1.

thanks

ali
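
If the Maq script is not at hand, the quality conversion asked about above is small enough to sketch in Python. This assumes the quality characters are Phred scores stored with an ASCII offset of 64, as in Illumina pipeline 1.3 and later. Older pipeline versions used Solexa-scaled scores, and Sanger-style FASTQ uses an offset of 33, so the offset is left as a parameter. The function names are hypothetical, and this is not a reimplementation of std2qual.

```python
def quality_string_to_phred(qual_string, offset=64):
    """Decode an ASCII quality string into a list of integer scores."""
    scores = [ord(ch) - offset for ch in qual_string]
    if min(scores, default=0) < 0:
        raise ValueError("negative score: the offset is probably wrong")
    return scores


def to_seq_and_qual(name, seq, qual_string, offset=64):
    """Return FASTA-style sequence and quality entries for one read."""
    scores = quality_string_to_phred(qual_string, offset)
    seq_entry = f">{name}\n{seq}\n"
    qual_entry = f">{name}\n{' '.join(str(s) for s in scores)}\n"
    return seq_entry, qual_entry
```

Each read then contributes one entry to the .seq file and one entry with the same name to the .qual file.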
