Converting Solexa new format to FASTQ

  • Converting Solexa new format to FASTQ

    Hi,
    1. I got sequences from Illumina in the following single-line format:
    HWI-EAS306:1:1:16:678#0/1:GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN:a_\XNW`NKQ]X]UBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

    Any idea how to convert this to FASTQ in order to use MAQ (without losing the quality scores)? As far as I can see, the fq_all2std.pl script cannot handle this format.

    2. Many of the sequences are in the format <seq tag><3' adaptor><AAA...>. I therefore think MAQ fails to remove the 3' adaptor (because it is not at the 3' end of the sequence). Any idea how to overcome this in MAQ or other programs?

    Thanks
    Asaf

  • #2
    You have several options to convert Illumina 1.3+ FASTQ to Sanger FASTQ. All you really need to do is shift the ASCII values of the quality string as they both use PHRED scores.

    Option One - Use an updated MAQ fq_all2std.pl script. There is a patch for Illumina to Sanger conversion, but it isn't included in MAQ yet; see e.g.

    http://sourceforge.net/mailarchive/f..._name=maq-help

    Option Two - Use Biopython 1.51b (or later)

    Option Three - Use the latest BioPerl (not sure if this code is in a public release yet)

    Option Four - Use the latest EMBOSS seqret (but there are a couple of minor issues in version 6.1.0 to watch out for).
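
    For illustration, the ASCII shift itself can be sketched in a few lines of Python. This is only a minimal sketch, not the fq_all2std.pl patch or the Biopython code; the function names are made up for this example, and it assumes the input really is Illumina 1.3+ encoded (PHRED + 64), with Sanger output as PHRED + 33.

```python
ILLUMINA_OFFSET = 64  # Illumina 1.3+ FASTQ stores PHRED + 64
SANGER_OFFSET = 33    # Sanger FASTQ stores PHRED + 33


def illumina13_to_phred(quality):
    """Decode an Illumina 1.3+ quality string into a list of PHRED scores."""
    scores = [ord(char) - ILLUMINA_OFFSET for char in quality]
    if any(score < 0 for score in scores):
        # A negative value means the input was not PHRED + 64 encoded
        # (hypothetical sanity check, not part of any of the tools above).
        raise ValueError("quality string is not Illumina 1.3+ encoded")
    return scores


def illumina13_to_sanger(quality):
    """Re-encode an Illumina 1.3+ quality string as a Sanger quality string."""
    return "".join(chr(score + SANGER_OFFSET)
                   for score in illumina13_to_phred(quality))
```

    Only the quality line of each FASTQ record needs this treatment; the title and sequence lines are left unchanged.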


    • #3
      Originally posted by asafle
      Hi,
      1. I got sequences from Illumina in the following single-line format:
      HWI-EAS306:1:1:16:678#0/1:GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN:a_\XNW`NKQ]X]UBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

      Any idea how to convert this to FASTQ in order to use MAQ (without losing the quality scores)? As far as I can see, the fq_all2std.pl script cannot handle this format.

      I didn't read your message quite carefully enough. That looks like a 50bp read, a kind of FASTQ entry forced onto one line. Are there any tabs in there? What was the filename? The extension might be of interest.

      My guess is that, converted to an Illumina 1.3+ FASTQ file, it would look like this:

      Code:
      @HWI-EAS306:1:1:16:678#0/1
      GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN
      +
      a_\XNW`NKQ]X]UBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
      Or, as a Sanger standard FASTQ file,

      Code:
      @HWI-EAS306:1:1:16:678#0/1
      GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN
      +
      B@=9/8A/,2>9>6####################################
      Converted to a PHRED QUAL file,

      Code:
      >HWI-EAS306:1:1:16:678#0/1
      33 31 28 24 14 23 32 14 11 17 29 24 29 21 2 2 2 2 2 2 2 2 2
      2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
      If you have some other files with this one, you can probably confirm if this interpretation is correct or not.
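
      If that interpretation holds, splitting each record is straightforward. The Python below is a rough sketch under that assumption only: it treats the last two colon-separated fields as sequence and quality, and everything before them as the read name. The function name is hypothetical. Splitting from the right is safe here because ':' (ASCII 58) lies below the offset of 64 and so cannot occur in an Illumina 1.3+ quality string.

```python
def one_line_to_fastq(line):
    """Turn a 'name:sequence:quality' record into a four-line FASTQ entry.

    Assumes the guessed layout: the read name may itself contain colons,
    so the line is split from the right into exactly three fields.
    The quality string is passed through unchanged (still Illumina 1.3+).
    """
    name, sequence, quality = line.rstrip("\r\n").rsplit(":", 2)
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ for " + name)
    return "@{0}\n{1}\n+\n{2}\n".format(name, sequence, quality)
```

      Applied to the record above, this gives the Illumina 1.3+ FASTQ entry shown earlier; the quality line would still need shifting to get the Sanger version.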

      Peter


      • #4
        Originally posted by maubp
        You have several options to convert Illumina 1.3+ FASTQ to Sanger FASTQ. All you really need to do is shift the ASCII values of the quality string as they both use PHRED scores.

        Option One - Use an updated MAQ fq_all2std.pl script. There is a patch for Illumina to Sanger conversion, but it isn't included in MAQ yet; see e.g.

        http://sourceforge.net/mailarchive/f..._name=maq-help

        Option Two - Use Biopython 1.51b (or later)

        Option Three - Use the latest BioPerl (not sure if this code is in a public release yet)

        Option Four - Use the latest EMBOSS seqret (but there are a couple of minor issues in version 6.1.0 to watch out for).

        To add an option: I'd recommend patching MAQ with this patch.
        Hope it's helpful.
