  • DineshCyanam
    Compendia Bio
    • Oct 2010
    • 35

    #16
    Hi Johnathon,

    I have run into the same problem as you, so I am now converting my Illumina sequence files to FASTQ format.

    I used the same Perl script as you:
    Code:
    >fq_all2std.pl sol2std s_2_1_sequence.txt
    Is the above command correct? Should I give an output file, or does it create one by itself? This command has been running for 5 hours now and it's just printing the FASTQ data to the screen. My read file is 3 GB. How long did it take you to convert your files?

    Thank You

    Regards,
    Dinesh Cyanam

    Comment

    • jdanderson
      Member
      • Sep 2010
      • 45

      #17
      Hello Dinesh,

      Well, first, you should probably stop your current run because it is just printing to the screen. I learned the hard way that you have to tell it where to put the output, e.g.:

      fq_all2std.pl sol2std s_2_1_sequence.txt > place/where/i/want/my/file/s_2_1sequence.fq

      How long it takes depends a bit on your computer's hardware. I don't remember offhand how long it took me, but 5 hours seems a bit excessive (although the computer I use has 8 GB RAM and a quad-core processor). I would recommend not using the computer for anything else while it's running, and making sure it keeps running even when you're not using it (maybe change the hibernation timeout).

      Although I did use the sol2std command and got a "good" output file from it, I had trouble running that file in TopHat afterwards (not sure if that's what you're planning or not). I ended up using the export2std command with the export.txt file and got good results in TopHat. However, when I changed to export2std I also changed a couple of other things (like reinstalling TopHat), so I can't say for sure that switching from sol2std on the seq.txt file to export2std on the export.txt file was what made the difference. If it was, the reason may be that export2std standardizes the length of the reads, which a couple of posts on here say is critical for running TopHat.
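
Since uniform read length comes up as a possible factor, here is a minimal Python sketch for checking it before running TopHat. It assumes plain 4-line FASTQ records (no wrapped sequence lines). The function name `read_length_counts` and the sample reads are made up for illustration; they are not part of fq_all2std.pl or TopHat.

```python
import io
from collections import Counter


def read_length_counts(handle):
    """Count how many reads of each length a 4-line-per-record FASTQ stream holds."""
    counts = Counter()
    for line_number, line in enumerate(handle):
        # Assumption: in 4-line FASTQ the sequence is the second line of each record.
        if line_number % 4 == 1:
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


# Two invented reads of different lengths, held in memory instead of a file.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nhhhhhhhh\n"
    "@read2\nACGTAC\n+\nhhhhhh\n"
)
lengths = read_length_counts(example)
print(dict(lengths))  # {8: 1, 6: 1}
```

More than one key in the result means the reads are not all the same length. For a real file, pass `open("your_file.fq")` instead of the in-memory example.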


      I hope this helps. Let me know if you have any more questions; I'd be happy to respond.
      Last edited by jdanderson; 10-03-2010, 04:19 PM. Reason: Expanded my answer

      Comment

      • DineshCyanam
        Compendia Bio
        • Oct 2010
        • 35

        #18
        Thanks Johnathon,

        Yes, I did realize that I have to write the output to a file, and I stopped the process. And yes, I am planning to use this FASTQ file in TopHat and then visualize the data in the UCSC Genome Browser. So you're suggesting that I use the export2std command instead of sol2std?

        And thanks for the prompt reply... Really appreciate that...

        Comment

        • DineshCyanam
          Compendia Bio
          • Oct 2010
          • 35

          #19
          Anyway, I don't have access to the export files. I just found in another thread that the sol2std command adds a ! at the end of every sequence, so maybe that's why it failed in TopHat for you. https://www.seqanswers.com/node/842

          So people are recommending the below command:
          Code:
          >maq sol2sanger <in.txt> <out.fq>
          I am trying with the above command. Will post my result here soon.
          I might need your help running TopHat, Johnathon.
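
For reference, the arithmetic behind a Solexa-to-Sanger conversion can be sketched in a few lines of Python. This is not maq's code, only an illustration of the standard relationship: Solexa-scaled scores are stored at ASCII offset 64, Sanger Phred scores at ASCII offset 33, and Q_phred = 10 * log10(10^(Q_solexa / 10) + 1). The function names are hypothetical, and the sketch only makes sense for files that really are Solexa-scaled.

```python
import math

SOLEXA_OFFSET = 64  # ASCII offset used for Solexa-scaled quality characters
SANGER_OFFSET = 33  # ASCII offset used for Sanger (Phred) quality characters


def solexa_to_phred(q_solexa: int) -> int:
    """Convert one Solexa-scaled score to the nearest integer Phred score."""
    return int(10 * math.log10(10 ** (q_solexa / 10) + 1) + 0.5)


def solexa_line_to_sanger(quality_line: str) -> str:
    """Re-encode a whole Solexa quality string as a Sanger quality string."""
    return "".join(
        chr(solexa_to_phred(ord(char) - SOLEXA_OFFSET) + SANGER_OFFSET)
        for char in quality_line
    )


# 'h' is Solexa 40, '@' is Solexa 0, ';' is Solexa -5 (the lowest valid score).
print(solexa_line_to_sanger("h@;"))  # I$"
```

Only the quality line of each record changes; the header and sequence lines are copied through as they are.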

          Comment

          • jdanderson
            Member
            • Sep 2010
            • 45

            #20
            Hello Dinesh,

            Please let me know how it turns out for you; I'm interested.

            In all honesty I have had more trouble with formatting issues than anything else.

            But again, let me know if i can be of any help. Good luck!

            -
            Johnathon

            Comment

            • jdanderson
              Member
              • Sep 2010
              • 45

              #21
              Hello Dinesh,

              Check out this thread about sol2sanger:

              Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


              -
              Johnathon

              Comment

              • txm
                Junior Member
                • Oct 2010
                • 2

                #22
                Hi Johnathon and Dinesh,

                You can also try using Penn State's Galaxy - http://main.g2.bx.psu.edu/ - for conversion between quality score formats. Among its other next-gen sequencing tools, it has a FASTQ Groomer tool that converts a variety of quality score formats, such as Illumina v1.3+ and Solexa (Illumina pipeline prior to v1.3), to FASTQ Sanger. It also checks for line breaks etc. in your raw reads file. You can create an account, and the items in your workflow will be saved as history items in your account.
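
To make the difference between these formats concrete, here is a small Python sketch. It is not how FASTQ Groomer is implemented; it relies only on the commonly documented encodings: Sanger stores Phred scores at ASCII offset 33 and Illumina 1.3+ stores Phred scores at offset 64, so that conversion is a fixed shift of 31. `guess_offset` is a hypothetical helper and only a heuristic: characters below ';' (ASCII 59) cannot occur in either 64-based format, but a Sanger file containing only high scores would be misread as 64-based.

```python
from typing import Iterable, Optional


def illumina13_to_sanger(quality_line: str) -> str:
    """Shift an Illumina 1.3+ (Phred+64) quality string to Sanger (Phred+33)."""
    if any(ord(char) < 64 for char in quality_line):
        raise ValueError("character below ASCII 64: not Illumina 1.3+ encoding")
    return "".join(chr(ord(char) - 31) for char in quality_line)


def guess_offset(quality_lines: Iterable[str]) -> Optional[int]:
    """Heuristically guess the ASCII offset (33 or 64) from the lowest character seen."""
    lowest = min(
        (ord(char) for line in quality_lines for char in line), default=None
    )
    if lowest is None:
        return None  # no quality characters to inspect
    return 33 if lowest < 59 else 64


print(illumina13_to_sanger("hhhB"))  # III#
```

Note that a Solexa-scaled file (pipeline prior to v1.3) also looks 64-based to this heuristic but needs a score conversion rather than a plain shift, which is one reason a dedicated tool is the safer choice.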

                Hope this helps,
                txm

                Comment

                • jdanderson
                  Member
                  • Sep 2010
                  • 45

                  #23
                  Hello Txm,

                  Thank you for your post. That's funny, because I had been to Galaxy's website before and looked under the Convert Formats header, but didn't find the Illumina-to-Sanger module. After reading your post I went back, looked around more, and found what you are talking about under QC and Manipulation -> FASTQ Groomer. Thank you again; I am about to load my files and try it. Hopefully this will help with my latest issue with barcode demultiplexing:

                  Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


                  Cheers,
                  Johnathon

                  Comment

                  • txm
                    Junior Member
                    • Oct 2010
                    • 2

                    #24
                    Yeah, they also have a FASTQ Splitter tool in the same section, if that's what you're talking about.
                    And at least on Galaxy, any manipulation of FASTQ files requires the Sanger score format.

                    Comment

                    • jdanderson
                      Member
                      • Sep 2010
                      • 45

                      #25
                      Hello Txm,

                      Thanks for posting again, Txm. I think the Splitter tool you mention is for dealing with paired-end runs, not so much for barcode demultiplexing. I think there is something akin to a beta site for Galaxy where they test modules and add-ons, and most of the FASTX Toolkit is on there, but the Barcode Splitter tool is unfortunately not among them.
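
For anyone curious what barcode demultiplexing involves, below is a minimal Python sketch. It is not the FASTX Toolkit's Barcode Splitter and it makes strong assumptions: 4-line FASTQ records, the barcode sits at the very start of each read, and only exact matches count. The function name, barcodes, and sample names are invented for illustration.

```python
import io
from collections import defaultdict


def demultiplex(handle, barcodes):
    """Group FASTQ records by an exact barcode match at the start of each read."""
    bins = defaultdict(list)
    while True:
        header = handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # skip stray blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        sample = "unmatched"
        for barcode, name in barcodes.items():
            if seq.startswith(barcode):
                sample = name
                # Trim the barcode from the read and from its quality string.
                seq, qual = seq[len(barcode):], qual[len(barcode):]
                break
        bins[sample].append((header.rstrip("\r\n"), seq, plus, qual))
    return dict(bins)


# Three invented reads: two carry a known barcode, one does not.
reads = io.StringIO(
    "@r1\nACGTTTTT\n+\nhhhhhhhh\n"
    "@r2\nTGCAGGGG\n+\nhhhhhhhh\n"
    "@r3\nNNNNNNNN\n+\nBBBBBBBB\n"
)
result = demultiplex(reads, {"ACGT": "sampleA", "TGCA": "sampleB"})
for sample, records in result.items():
    print(sample, len(records))
```

This keeps every record in memory, which is fine for a demonstration but not for a multi-gigabyte lane; a real run would write each bin to its own output file as it goes and would usually tolerate a mismatch or two.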

                      Thank you for the posts. I can use all the help I can get!

                      Regards,
                      Johnathon

                      Comment

                      • DineshCyanam
                        Compendia Bio
                        • Oct 2010
                        • 35

                        #26
                        --- Deleted the post ---

                        Comment
