
  • #16
    Hi Johnathon,

    I have run into the same problem as you have. So I am now converting my Illumina sequence files to fastq format.

    I used the same Perl script as you did:
    Code:
    >fq_all2std.pl sol2std s_2_1_sequence.txt
    Is the above command correct? Should I give an output file, or does it create an output file by itself? This command has been running for 5 hrs now and it's just spitting out the fastq data on the screen. My read file is 3 GB. How long did it take you to convert your files?

    Thank You

    Regards,
    Dinesh Cyanam

    Comment


    • #17
      Hello Dinesh,

      Well, first, you should probably stop your current run, because it is just printing to the screen. I learned the hard way that you have to tell it where to put the output, e.g.:

      fq_all2std.pl sol2std s_2_1_sequence.txt > place/where/i/want/my/file/s_2_1sequence.fq

      How long it takes depends a bit on the hardware of your computer. I don't remember off the top of my head how long it took me, but 5 hours seems a bit excessive (although the computer I use has 8GB RAM and a quad processor). I would recommend not using the computer for anything else while it's running, and making sure it keeps running even when you're not using it (maybe change the hibernation time constraint).

      Although I did use the sol2std command and got a "good" output file from it, I had trouble running it in Tophat afterwards (not sure if that's what you're planning or not). I ended up using the export2std command with the export.txt file and got good results in Tophat. Actually, when I changed to export2std I had also changed a couple of other things (like reinstalling Tophat), so I can't say for sure that switching from sol2std on the seq.txt file to export2std on the export.txt file was what fixed it. If it was, the reason may be that the export2std command standardized the length of the reads (which a couple of posts on here say is critical for running Tophat).


      I hope this helps. Let me know if you have any more questions; I'd be happy to respond.
      Last edited by jdanderson; 10-03-2010, 04:19 PM. Reason: Expanded my answer
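
Since uniform read length comes up above as a possible requirement for Tophat, here is a minimal sketch for checking it before a run. It assumes a plain 4-line-per-record FASTQ file (no wrapped sequence lines); the function name `read_length_counts` and the example reads are made up for illustration and are not part of fq_all2std.pl or Tophat.

```python
import io
from collections import Counter

def read_length_counts(handle):
    """Count read lengths in a 4-line-per-record FASTQ stream."""
    counts = Counter()
    for line_number, line in enumerate(handle):
        # In a 4-line record, the sequence is the second line (index 1).
        if line_number % 4 == 1:
            counts[len(line.rstrip("\r\n"))] += 1
    return counts

# Made-up example: two 5-base reads and one 4-base read.
example = io.StringIO(
    "@r1\nACGTA\n+\nIIIII\n"
    "@r2\nTTGCA\n+\nIIIII\n"
    "@r3\nACGT\n+\nIIII\n"
)
lengths = read_length_counts(example)
print(dict(lengths))
```

If the result has more than one key, the reads are not all the same length. For a real file, pass an open file handle instead of the `io.StringIO` example.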

      Comment


      • #18
        Thanks Johnathon,

        Yes, I did realize that I have to output it to a file, and I stopped the process. And yes, I am planning to use this fastq file in Tophat and then visualize the data on the UCSC Genome Browser. So you're suggesting that I use the export2std command instead of sol2std?

        And thanks for the prompt reply... Really appreciate that...

        Comment


        • #19
          Anyway, I don't have access to the export files. I just found from another thread that the sol2std command adds a ! at the end of every sequence. So maybe that's why it failed in Tophat for you. https://www.seqanswers.com/node/842

          People are recommending the command below instead:
          Code:
          >maq sol2sanger <in.txt> <out.fq>
          I am trying the above command now and will post my result here soon.
          I might need your help with running Tophat, Johnathon.
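
For readers wondering what a Solexa-to-Sanger conversion involves, here is a minimal sketch. It is not the code used by maq or fq_all2std.pl; it applies the standard relation between the two scales, Q_phred = 10 * log10(10^(Q_solexa / 10) + 1), and assumes the input is 4-line-per-record FASTQ with Solexa scores stored as ASCII character code minus 64. All function names are made up for illustration.

```python
import io
import math

SOLEXA_OFFSET = 64  # Solexa/early Illumina quality characters are score + 64
SANGER_OFFSET = 33  # Sanger FASTQ quality characters are Phred score + 33

def solexa_to_phred(q_solexa):
    """Convert one Solexa-scaled score to the Phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)

def solexa_qual_to_sanger(qual):
    """Re-encode a Solexa+64 quality string as a Sanger (Phred+33) string."""
    return "".join(
        chr(round(solexa_to_phred(ord(ch) - SOLEXA_OFFSET)) + SANGER_OFFSET)
        for ch in qual
    )

def convert_fastq(in_handle, out_handle):
    """Copy 4-line FASTQ records, rewriting only the quality line."""
    for index, line in enumerate(in_handle):
        line = line.rstrip("\r\n")
        if index % 4 == 3:  # fourth line of each record holds the qualities
            line = solexa_qual_to_sanger(line)
        out_handle.write(line + "\n")

# Made-up record; 'h', '@', ';' and 'J' encode Solexa scores 40, 0, -5 and 10.
source = io.StringIO("@r1\nACGT\n+\nh@;J\n")
converted = io.StringIO()
convert_fastq(source, converted)
print(converted.getvalue(), end="")
```

Note that "!" is the Sanger character for a Phred score of 0, so a trailing "!" in a converted quality string is a valid Sanger character even if it is not wanted there. Writing to an explicit output handle, rather than printing, also avoids the screen-flooding problem described earlier in the thread.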

          Comment


          • #20
            Hello Dinesh,

            Please let me know how it turns out for you; I'm interested.

            In all honesty I have had more trouble with formatting issues than anything else.

            But again, let me know if I can be of any help. Good luck!

            -
            Johnathon

            Comment


            • #21
              Hello Dinesh,

              Check out this thread about sol2sanger:

              Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


              -
              Johnathon

              Comment


              • #22
                Hi Johnathon and Dinesh,

                You can also try Penn State's Galaxy (http://main.g2.bx.psu.edu/) for converting between quality score formats. Among its other next-gen sequencing tools, it has a FASTQ Groomer tool that converts a variety of quality score formats, such as Illumina v1.3+ and Solexa (Illumina pipeline prior to v1.3), to FastQSanger. It also checks your raw reads file for problems such as stray line breaks. If you create an account, the items in your workflow are saved as history items in your account.

                Hope this helps,
                txm
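
Before choosing an input format in a converter such as the FASTQ Groomer, it can help to guess which encoding a file uses. The sketch below is a rough heuristic, not how Galaxy does it: it looks only at the lowest quality character seen. Characters below ASCII 59 can only occur in Sanger (Phred+33) files, and characters 59 to 63 correspond to the negative Solexa scores -5 to -1. The function name and labels are made up for illustration, and a Sanger file whose scores are all 26 or higher would be misclassified.

```python
def guess_quality_encoding(quality_lines):
    """Guess a FASTQ quality encoding from the lowest character observed."""
    codes = [ord(ch) for line in quality_lines for ch in line]
    if not codes:
        raise ValueError("no quality characters supplied")
    lowest = min(codes)
    if lowest < 59:
        return "sanger"        # Phred+33
    if lowest < 64:
        return "solexa"        # Solexa+64, negative scores present
    # Could also be Solexa data that happens to have no negative scores.
    return "illumina1.3+"      # Phred+64

# Made-up quality strings for each case.
print(guess_quality_encoding(["IIII!"]))
print(guess_quality_encoding(["hh;h"]))
print(guess_quality_encoding(["hhh@"]))
```

Feeding it the quality lines from the first few thousand records is usually enough to see low scores if they exist, but treat the answer as a hint and confirm against how the file was produced.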

                Comment


                • #23
                  Hello Txm,

                  Thank you for your post. That's funny, because I had been to Galaxy's website before and looked under the Convert Formats header, but didn't find an Illumina to Sanger module. However, upon reading your post I went back, looked around more, and found what you are talking about under the QC and Manipulation -> Fastq Groomer header. Thank you again; I am about to load my files and try it. Hopefully this will help with my latest issue with barcode demultiplexing:

                  Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


                  Cheers,
                  Johnathon

                  Comment


                  • #24
                    Yeah, they also have a FASTQ Splitter tool in the same section, if that's what you're talking about.
                    And at least on Galaxy, for any manipulation of fastq files, the files need to be in Sanger quality score format.

                    Comment


                    • #25
                      Hello Txm,

                      Thanks for posting again. I think the Splitter tool you mention is for dealing with paired-end runs, not so much for barcode demultiplexing. I think there is something akin to a beta site for Galaxy where they have modules and add-ons they are testing, and they do have most of the FASTX Toolkit on there, but the BarCode Splitter tool is unfortunately not one of them.

                      Thank you for the posts. I can use all the help I can get!

                      Regards,
                      Johnathon
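
To make the distinction concrete: barcode demultiplexing sorts reads into per-sample bins by a short index sequence, whereas a paired-end splitter separates the two mates of each read. Below is a minimal demultiplexing sketch under loud assumptions: the barcode is the first few bases of each read, all barcodes have the same length, and only exact matches count. It is not the FASTX Toolkit's barcode splitter, and the barcodes, sample names and function name are all hypothetical.

```python
import io
from collections import defaultdict

def demultiplex(handle, barcodes):
    """Bin 4-line FASTQ records by an exact-match barcode at the read start."""
    barcode_length = len(next(iter(barcodes)))  # assumes equal-length barcodes
    bins = defaultdict(list)
    record = []
    for line in handle:
        record.append(line.rstrip("\r\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            sample = barcodes.get(seq[:barcode_length], "unmatched")
            if sample != "unmatched":
                # Trim the barcode from both the bases and the qualities.
                seq, qual = seq[barcode_length:], qual[barcode_length:]
            bins[sample].append((header, seq, plus, qual))
            record = []
    return bins

# Hypothetical barcodes and made-up reads.
barcodes = {"ACGT": "sampleA", "TTAG": "sampleB"}
reads = io.StringIO(
    "@r1\nACGTGGGG\n+\nIIIIIIII\n"
    "@r2\nTTAGCCCC\n+\nIIIIIIII\n"
    "@r3\nGGGGAAAA\n+\nIIIIIIII\n"
)
bins = demultiplex(reads, barcodes)
for sample in sorted(bins):
    print(sample, len(bins[sample]))
```

Real tools typically also tolerate a mismatch or two in the barcode and can look for it at the end of the read; this sketch does neither.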

                      Comment


                      • #26
                        --- Deleted the post ---

                        Comment
