**DineshCyanam** · 10-03-2010, 04:03 PM

Hi Johnathon,

I have run into the same problem as you have. So I am now converting my Illumina sequence files to fastq format.

I have used the same perl script as you did:

Code:

>fq_all2std.pl sol2std s_2_1_sequence.txt

Is the above command correct? Should I give an output file or does it create an output file by itself? This command has been running for 5 hrs now and it's just spitting out the fastq data on the screen. My read file is 3 GB. How long did it take for you to convert your files?

Thank You

Regards,
Dinesh Cyanam

**jdanderson** · 10-03-2010, 04:16 PM

Hello Dinesh,

Well first, you should probably stop your current run because it is just printing to the screen. I learned the hard way that you have to tell it where to put the output, e.g.:

fq_all2std.pl sol2std s_2_1_sequence.txt > place/where/i/want/my/file/s_2_1sequence.fq

How long it takes depends a bit on your computer's hardware. I don't remember off the top of my head how long it took me, but 5 hours seems a bit excessive (although the computer I use has 8 GB of RAM and a quad-core processor). I would recommend not using the computer for anything else while it's running, and making sure it keeps running even when you're not using it (maybe change the hibernation timeout).

Although I did use the sol2std command and got a "good" output file from it, I had trouble running it in Tophat afterwards (not sure if that's what you're planning or not). I ended up using the export2std command with the export.txt file and got good results in Tophat. Actually, when I changed to export2std I had also changed a couple of other things (like reinstalling Tophat), so I can't say for sure that switching to export2std on the export.txt file (instead of sol2std on the seq.txt file) was what fixed it. If it was, the reason may be that the export2std command standardized the length of the reads, I think (a couple of posts on here say that is critical for running Tophat).

I hope this helps. Let me know if you have any more questions; I'd be happy to respond.
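
One way to check the read-length point above before handing a converted file to Tophat is to tally the read lengths in the fastq file. The following is only a minimal sketch: the function name `fastq_length_report` is made up for illustration, and it assumes plain 4-line fastq records (no wrapped sequence lines). It also flags reads whose sequence and quality strings differ in length.

```python
from collections import Counter
from itertools import islice


def fastq_length_report(lines):
    """Tally read lengths for 4-line fastq records (hypothetical helper).

    Returns (length_counts, mismatched_names), where mismatched_names lists
    the reads whose sequence and quality strings have different lengths.
    """
    lengths = Counter()
    mismatched = []
    # Drop blank lines (e.g. a trailing newline at the end of the file).
    stripped = (line.rstrip("\r\n") for line in lines if line.strip())
    while True:
        record = list(islice(stripped, 4))
        if not record:
            break
        if (len(record) < 4
                or not record[0].startswith("@")
                or not record[2].startswith("+")):
            raise ValueError(f"malformed fastq record near: {record[0]!r}")
        name, seq, _, qual = record
        lengths[len(seq)] += 1
        if len(seq) != len(qual):
            mismatched.append(name[1:])
    return lengths, mismatched


# Tiny in-memory example; for a real file, pass an open file handle instead.
example = ["@r1\n", "ACGT\n", "+\n", "hhhh\n",
           "@r2\n", "ACG\n", "+\n", "hhhh\n"]
counts, bad = fastq_length_report(example)
print("read lengths:", dict(counts))
print("seq/qual length mismatches:", bad)
```

If more than one read length shows up, or any mismatches are listed, that is worth sorting out before blaming the aligner. Reading line by line keeps memory use low even for a multi-GB file.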

**DineshCyanam** · 10-03-2010, 04:23 PM

Thanks Johnathon,

Yes, I did realize that I have to output it to a file and stopped the process. And yes, I am planning to use this fastq file in Tophat and then visualize the data on the UCSC Genome Browser. So you're suggesting that I use the export2std command instead of sol2std?

And thanks for the prompt reply... Really appreciate that...

**DineshCyanam** · 10-03-2010, 04:36 PM

Anyway, I don't have access to the export files. I just found from another thread that the sol2std command adds a ! at the end of every sequence. So maybe that's why it failed in Tophat for you. https://www.seqanswers.com/node/842

So people are recommending the below command:

Code:

>maq sol2sanger <in.txt> <out.fq>

I am trying with the above command. Will post my result here soon.
Might need your help in running Tophat, Johnathon.

**jdanderson** · 10-03-2010, 04:44 PM

Hello Dinesh,

Please let me know how it turns out for you, I'm interested.

In all honesty I have had more trouble with formatting issues than anything else.

But again, let me know if I can be of any help. Good luck!

-
Johnathon

**jdanderson** · 10-03-2010, 05:39 PM

Hello Dinesh,

Check out this thread about sol2sanger:

Maq - sol2sanger problem - different sizes for the pair? - SEQanswers

http://seqanswers.com/forums/showthread.php?t=3310


-
Johnathon

**txm** · 10-04-2010, 01:07 PM

Hi Johnathon and Dinesh,

You can also try using Penn State's Galaxy - http://main.g2.bx.psu.edu/ for conversion between quality score formats. Among a set of other next-gen sequencing tools, it has a FASTQ Groomer tool that converts a variety of quality score formats, such as Illumina v1.3+ and Solexa (Illumina pipeline prior to v1.3), to FastQSanger. It also checks for line breaks etc. in your raw reads file. You can create an account and the items in your workflow will get saved as history items in your account.

Hope this helps,
txm
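
For anyone wondering what a quality-score conversion like this involves, here is a rough sketch. It is not Galaxy's or maq's actual code; the function names are made up for illustration. It relies on the commonly documented encodings: Sanger fastq stores Phred scores with ASCII offset 33, Illumina 1.3+ stores Phred scores with offset 64, and older Solexa files store Solexa-scaled scores with offset 64, which need the extra log transform shown below.

```python
import math

SANGER_OFFSET = 33    # Sanger fastq: Phred score + 33
ILLUMINA_OFFSET = 64  # Solexa and Illumina 1.3+ both use ASCII offset 64


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled score to the nearest integer Phred score."""
    return int(round(10 * math.log10(10 ** (q_solexa / 10.0) + 1)))


def to_sanger(quality, source):
    """Re-encode one quality string as Sanger (Phred+33).

    source is "illumina1.3" (Phred+64) or "solexa" (Solexa score + 64);
    both labels are illustrative names, not options of any real tool.
    """
    if source not in ("illumina1.3", "solexa"):
        raise ValueError(f"unknown source encoding: {source!r}")
    out = []
    for ch in quality:
        score = ord(ch) - ILLUMINA_OFFSET
        if source == "solexa":
            score = solexa_to_phred(score)
        out.append(chr(score + SANGER_OFFSET))
    return "".join(out)


# 'h' is score 40 and 'B' is score 2 in Illumina 1.3+ encoding.
print(to_sanger("hB", "illumina1.3"))
```

Only the quality line of each record changes; the read name and sequence lines are copied through untouched. Picking the wrong source encoding gives scores that look plausible but are wrong, so it is worth confirming which pipeline version produced the file.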

**jdanderson** · 10-04-2010, 04:04 PM

Hello Txm,

Thank you for your post. That's funny, because I had been to Galaxy's website before and looked under the Convert Formats header and didn't find the Illumina to Sanger module. However, upon reading your post I went back, looked around more, and found what you are talking about under the QC and Manipulation -> Fastq Groomer header. Thank you again, I am about to load my files and try it. Hopefully this will help with my latest issue with barcode demultiplexing:

FASTX Toolkit barcode splitter issue - SEQanswers

http://seqanswers.com/forums/showthread.php?t=7117


Cheers,
Johnathon

**txm** · 10-04-2010, 04:20 PM

Yeah, they also have a FASTQ Splitter tool, in the same section, if that's what you're talking about.
And at least on Galaxy, for any manipulation of the fastq files, you need to have the Sanger score format.

**jdanderson** · 10-04-2010, 04:48 PM

Hello Txm,

Thanks for posting again, Txm. I think the Splitter tool you mention is for dealing with paired-end runs, not so much for barcode demultiplexing. I think that there is something akin to a beta site for Galaxy where they have modules and add-ons they are testing, and they do have most of the FASTX Toolkit on there, but the Barcode Splitter tool is unfortunately not one of them.

Thank you for the posts. I can use all the help I can get!

Regards,
Johnathon

**DineshCyanam** · 10-06-2010, 06:41 AM

--- Deleted the post ---
