  • What does Illumina raw data look like?

    Hi

    I'm trying to work through some of the various assembler programs before actually collecting my own Illumina data. I've found some test datasets here:



    but I'm not sure if the file formats are the same as raw data from the Genome Analyzer.

    The files are s_4_seq.txt and s_4_prb.txt and the first few lines look like this:
    s_4_seq.txt
    4 1 56 910 AACTTACAATTGAAAATATAAACTCAT
    4 1 64 716 AAGATGATTATATGTCTTCCTTTTCGA
    4 1 890 894 TCAAACCAATCAGACCTATGTTTCATA

    s_4_prb.txt
    40 -40 -40 -40 40 -40 -40 -40 -40 40 -40 -40 -40 -40 -40 40 -40 -40 -40 40 40 -40 -40 -40 -40 40 -40 -40 40 -40 -40 -40 40 -40 -40 -40 -40 -40 -40 40

    So my questions are
    1. Is this the raw data format from the machine?
    2. How do I get these files into fastq format? The maq converter and sanger perl scripts previously mentioned do not seem to work.

    Thank you!
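For readers wondering how the two files relate, here is a minimal Python sketch of pairing a `_seq.txt` line with its `_prb.txt` line to build one Sanger FASTQ record. It rests on assumptions not confirmed in this thread: each prb line holds four Solexa-scaled scores per cycle (one each for A, C, G, T), and the quality of the called base is the largest of the four. The function names are hypothetical, and this is not the code of any Maq script.

```python
import math

def solexa_to_phred(sq):
    """Convert a Solexa-scaled score to the nearest Phred score."""
    return int(round(10 * math.log10(1 + 10 ** (sq / 10.0))))

def seqprb_to_fastq(seq_line, prb_line):
    """Build one Sanger FASTQ record from a seq line and its prb line."""
    lane, tile, x, y, bases = seq_line.split()
    scores = [int(v) for v in prb_line.split()]
    if len(scores) != 4 * len(bases):
        raise ValueError("expected four prb scores per base")
    quals = []
    for i in range(len(bases)):
        # Assumption: the called base carries the highest of its four scores.
        best = max(scores[4 * i: 4 * i + 4])
        quals.append(chr(33 + solexa_to_phred(best)))
    return "@%s:%s:%s:%s\n%s\n+\n%s\n" % (lane, tile, x, y, bases, "".join(quals))
```

Under this encoding a top score of 40 becomes `I` and a top score of 30 becomes `?`, so a quality line made of one repeated character just means every base has the same score.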

  • #2
    Update

    I've managed to convert my data using the solexa2fasta.pl script. However the tool included with Maq, sol2sanger, does not work with my data. Can someone please explain?

    Thank you!


    • #3
      There's a really good tool in the maq package (latest release) called fq_all2std.pl

      See below:


      Usage: fq_all2std.pl <command> <in.txt>

      Command: scarf2std Convert SCARF format to the standard/Sanger FASTQ
      fqint2std Convert FASTQ-int format to the standard/Sanger FASTQ
      sol2std Convert Solexa/Illumina FASTQ to the standard FASTQ
      fa2std Convert FASTA to the standard FASTQ
      seqprb2std Convert .seq and .prb files to the standard FASTQ
      fq2fa Convert various FASTQ-like format to FASTA
      export2sol Convert Solexa export format to Solexa FASTQ
      export2std Convert Solexa export format to Sanger FASTQ
      csfa2std Convert AB SOLiD read format to Sanger FASTQ
      instruction Explanation to different format
      example Show examples of various formats

      Note: Read/quality sequences MUST be presented in one line.
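As an illustration of what a conversion like `sol2std` has to do to the quality line, here is a rough Python sketch. It assumes the input uses the early Solexa FASTQ encoding (Solexa-scaled scores stored as ASCII with offset 64) and the output is Sanger FASTQ (Phred scores with offset 33). The function name is made up for this example, and the script's own implementation may differ.

```python
import math

def sol_qual_to_sanger(qual):
    """Re-encode a Solexa FASTQ quality string as a Sanger quality string."""
    out = []
    for ch in qual:
        sq = ord(ch) - 64  # Solexa-scaled score (assumed offset 64)
        phred = int(round(10 * math.log10(1 + 10 ** (sq / 10.0))))
        out.append(chr(33 + phred))  # Sanger encoding uses offset 33
    return "".join(out)
```

For example, `sol_qual_to_sanger("h^")` gives `I?`, i.e. scores 40 and 30. Only the quality line changes; the read name and the bases stay as they are.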


        • #5
          Great tool!

          Thanks for the info!


          • #6
            I have similar data, seq.txt and prb.txt files just like yours:
            seq.txt
            ........................................................................
            6 1 914 893 GCTACTGCCGTGACCTCATTTCTCTTA
            6 1 898 905 GAAAAAGAGAAAGTTTAGGAGATCGAT
            .....................................................................................
            prob.txt
            .....................................................................................
            -30 -30 30 -30 -30 30 -30 -30 -30 -30 -30 30 30 -30 -30 -30 -30 30 -30 -30 -30 -30 -30 30 -30 -30 30 -30 -30 30 -30 -30 -30 30 -30 -30 -30 -30 30 -30 -30 -30 -30 30 -30 -30 30 -30 30 -30 -30 -30 -30 30 -30 -30 -30 30 -30 -30 -30 -30 -30...
            .............................................................................

            But when I run
            fq_all2std.pl seqprb2std seq.txt prb.txt
            the output looks like this:
            ...........................................
            @6:1:914:893
            GCTACTGCCGTGACCTCATTTCTCTTA
            +
            ???????????????????????????
            ..................................................

            I also got lots of warning messages like this:
            Argument "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0..." isn't numeric in numeric gt (>) at /usr/local/bin/fq_all2std.pl line 152, <$fhq> line 6609.
            ....................................................................................................................................................................................................... line ......


            I wonder whether others are seeing this kind of warning too. If so, what do you think the problem is?
            I am now checking my prb.txt; I guess it contains some lines that were not accepted.
            Last edited by hannat; 01-26-2009, 04:00 AM.
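The `\0\0\0...` in the warning above suggests that at least one line of the prb file holds NUL bytes instead of numbers. A small Python sketch for locating such lines follows. It assumes a healthy prb line contains only whitespace-separated integers, and the function name is hypothetical.

```python
def find_bad_prb_lines(path, limit=10):
    """Return up to `limit` (line_number, reason) pairs for suspicious lines."""
    bad = []
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            if b"\0" in raw:
                bad.append((lineno, "contains NUL bytes"))
            else:
                try:
                    # Every token on a healthy line should parse as an integer.
                    [int(tok) for tok in raw.decode("ascii", "replace").split()]
                except ValueError:
                    bad.append((lineno, "non-numeric token"))
            if len(bad) >= limit:
                break
    return bad
```

Calling `find_bad_prb_lines("prb.txt")` returns a list of line numbers with a short reason for each, which can be compared with the line number reported in the warning.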


            • #7
              Hi,

              Re : There's a really good tool in the maq package (latest release) called fq_all2std.pl

              I tried to use

              fq2fa Convert various FASTQ-like format to FASTA

              to convert my Illumina seq data from FASTQ to FASTA, as I want the quality in FASTA format to run Mosaik's GigaBayes program.

              But the Maq perl script fq_all2std.pl fq2fa <in.txt> command

              just seemed to print the results to the screen & not place them in a fasta file.

              Am I doing something really silly here?

              Only I've got a 1.8 Gb illumina seq text file so this process takes a while & I need it in a file, not printed to the screen

              thanks alig


              • #8
                You simply need to redirect the standard output (which is printing to your screen) to a file:

                fq_all2std.pl fq2fa in.txt > out.fasta

                See http://www.december.com/unix/tutor/redirect.html for more info.
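If the conversion is launched from a Python script rather than typed at the shell, the same redirection can be sketched with the standard `subprocess` module. The helper name `run_to_file` is made up for this example.

```python
import subprocess

def run_to_file(cmd, out_path):
    """Run cmd and write its standard output to out_path, like `cmd > out_path`."""
    with open(out_path, "wb") as out:
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(cmd, stdout=out, check=True)
```

For the command above this would be called as `run_to_file(["fq_all2std.pl", "fq2fa", "in.txt"], "out.fasta")`, assuming the script is on your PATH.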


                • #9
                  convert fastq to fasta

                  To lparsons,

                  Thank you. Yes I realised that later after I'd sent my post.

                  Also in case anyone else is looking to separate a fastq file into seq.fasta & qual.fasta files you actually need the other command within Maq

                  fq_all2std.pl std2qual <out.prefix> <in.fastq>

                  Thanks again

                  alig
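For anyone without that command, the split itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions: four-line Sanger FASTQ records (offset 33), with output file names chosen for this example. It is not a description of what `std2qual` actually writes.

```python
def split_fastq(fastq_path, prefix, offset=33):
    """Write <prefix>.seq.fasta and <prefix>.qual.fasta from a 4-line FASTQ file."""
    with open(fastq_path) as fq, \
         open(prefix + ".seq.fasta", "w") as fa, \
         open(prefix + ".qual.fasta", "w") as ql:
        while True:
            header = fq.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = fq.readline().rstrip("\n")
            fq.readline()  # skip the '+' separator line
            qual = fq.readline().rstrip("\n")
            name = header[1:]  # drop the leading '@'
            fa.write(">%s\n%s\n" % (name, seq))
            # Quality file: space-separated numeric scores.
            scores = " ".join(str(ord(c) - offset) for c in qual)
            ql.write(">%s\n%s\n" % (name, scores))
```

Pass `offset=64` instead if the quality line uses the Illumina offset-64 encoding.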


                  • #10
                    Bowtie

                    Has anybody used Bowtie for mapping?


                    • #11
                      Bowtie for alignment

                      Originally posted by [email protected]
                      Has anybody used Bowtie for mapping?
                      Oh yeah! We have. And that is the best that I've come across in my career for alignment of short reads. Just too fast - Great for expression data.

                      Spade


                      • #12
                        Originally posted by spadejac
                        Oh yeah! We have. And that is the best that I've come across in my career for alignment of short reads. Just too fast - Great for expression data.

                        Spade
                        I heard Bowtie is great for mapping chromatin IPs and RNA back to a reference, but isn't as good as Maq for finding SNPs. Is this accurate?


                        • #13
                          Hi everyone,
                          I am a new user of BWA and would greatly appreciate any help.
                          I have paired-end Solexa data (in two files, s_2_1.export.txt and s_2_2_export.txt) in the following format (SCARF ASCII with mapping information):

                          HWI-EAS433 16 3 11 255 71 0 2 TGAAAGGGAATATCTTCATATAAAATCTAGACAAAAGCATTCTCAGAATC abbb``b_`aaab_bb``babaa_`a^b_a__aaa`aa`aa`_`aa[^a_
                          chr9.fa 66572916 F G32G3A10G1 33 0 chr7.fa 61087451 R Y

                          Now I would like to convert the Solexa export files to FASTQ so that I can use BWA. I tried the fq_all2std.pl export2std command, but it doesn't work. I also tried the scarf2std command; it converted my file, but the output was not in FASTQ format: other information (such as the Eland mapping position) was also included in the output file.

                          I don't have any experience writing Perl or other scripts.
                          Could you please help me?
                          Many thanks!
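A rough Python sketch of pulling a FASTQ record out of one export line is given below. It makes several assumptions that the thread does not confirm: the real file is tab-separated, the first ten columns are machine, run, lane, tile, x, y, index, read number, sequence and quality, and the quality string is Phred-scaled with ASCII offset 64 (older pipeline versions used Solexa-scaled scores, which would also need the log conversion). The read-name layout is chosen for this example only.

```python
def export_line_to_fastq(line, qual_offset=64):
    """Turn one tab-separated export line into a Sanger FASTQ record."""
    fields = line.rstrip("\n").split("\t")
    # Assumed column order; the mapping columns after the quality are ignored.
    machine, run, lane, tile, x, y, index, read_no, seq, qual = fields[:10]
    name = "%s_%s:%s:%s:%s:%s/%s" % (machine, run, lane, tile, x, y, read_no)
    # Shift the quality characters from the assumed offset 64 to offset 33.
    sanger = "".join(chr(ord(c) - qual_offset + 33) for c in qual)
    return "@%s\n%s\n+\n%s\n" % (name, seq, sanger)
```

Applying this to every line of each export file and writing the results out would give one FASTQ file per read of the pair.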


                          • #14
                            script format converter

                            Originally posted by alig
                            To lparsons,

                            Thank you. Yes I realised that later after I'd sent my post.

                            Also in case anyone else is looking to separate a fastq file into seq.fasta & qual.fasta files you actually need the other command within Maq

                            fq_all2std.pl std2qual <out.prefix> <in.fastq>

                            Thanks again

                            alig
                            Hi,

                            I checked my Maq 0.7.1 installation for this script but didn't find it. Do you know if it is still available, or did you find it as a supplementary Maq script? Thanks


                            • #15
                              @hannat..

                              Did you find a solution to your problem? I have the same kind of problem with my data.

                              Thanks
                              ~Adnan~
