
  • Converting Solexa new format to FASTQ

    Hi,
    1. I received sequences from Illumina in the following single-line format:
    HWI-EAS306:1:1:16:678#0/1:GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN:a_\XNW`NKQ]X]UBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

    Any idea how to convert this to FASTQ for use with MAQ (without losing the quality scores)? As far as I can see, the fq_all2std.pl script cannot handle this format.

    2. Many of the sequences are in the format <seq tag><3' adaptor><AAA...>. I therefore think MAQ fails to remove the 3' adaptor (because it is not at the 3' end of the sequence). Any idea how to overcome this in MAQ or other programs?

    Thanks
    Asaf

  • #2
    You have several options to convert Illumina 1.3+ FASTQ to Sanger FASTQ. All you really need to do is shift the ASCII values of the quality string down by 31 (from an offset of 64 to an offset of 33), as they both use PHRED scores.

    Option One - Use an updated MAQ fq_all2std.pl script, there is a patch for Illumina to Sanger, but it isn't included in MAQ yet, see e.g.



    Option Two - Use Biopython 1.51b (or later)

    Option Three - Use the latest BioPerl (not sure if this code is in a public release yet)

    Option Four - Use the latest EMBOSS seqret (but there are a couple of minor issues in version 6.1.0 to watch out for).
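
As an illustration of the ASCII shift described above, here is a minimal sketch in plain Python. It is not one of the four options listed and not the MAQ patch. The function names are made up for this example, and it assumes every FASTQ record occupies exactly four lines (no wrapped sequence or quality lines).

```python
ILLUMINA_OFFSET = 64  # Illumina 1.3+ stores PHRED score q as chr(q + 64)
SANGER_OFFSET = 33    # Sanger FASTQ stores PHRED score q as chr(q + 33)


def illumina_to_sanger_quality(quality):
    """Re-encode an Illumina 1.3+ quality string as a Sanger quality string."""
    shifted = []
    for char in quality:
        phred = ord(char) - ILLUMINA_OFFSET
        if phred < 0:
            # Characters below '@' cannot occur in Illumina 1.3+ encoding.
            raise ValueError("%r is not a valid Illumina 1.3+ quality character" % char)
        shifted.append(chr(phred + SANGER_OFFSET))
    return "".join(shifted)


def convert_fastq(in_handle, out_handle):
    """Copy four-line FASTQ records, re-encoding only the quality line.

    Assumption: records are not line-wrapped, so every fourth line is quality.
    """
    for index, line in enumerate(in_handle):
        line = line.rstrip("\n")
        if index % 4 == 3:
            line = illumina_to_sanger_quality(line)
        out_handle.write(line + "\n")


# Example: the first few quality characters of the read in this thread.
print(illumina_to_sanger_quality("a_\\XNW"))  # B@=9/8
```

For real data, one of the maintained tools listed above is probably the safer choice, since they also handle wrapped records and validate the input.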



    • #3
      Originally posted by asafle
      Hi,
      1. I received sequences from Illumina in the following single-line format:
      HWI-EAS306:1:1:16:678#0/1:GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN:a_\XNW`NKQ]X]UBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

      Any idea how to convert this to FASTQ for use with MAQ (without losing the quality scores)? As far as I can see, the fq_all2std.pl script cannot handle this format.
      I didn't read your message quite carefully enough. That looks like a 50bp read, a kind of FASTQ entry forced onto one line. Are there any tabs in there? What was the filename? The extension might be of interest.

      I would guess that, converted to an Illumina 1.3+ FASTQ file, it looks like this:

      Code:
      @HWI-EAS306:1:1:16:678#0/1
      GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN
      +
      a_\XNW`NKQ]X]UBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
      Or, as a Sanger standard FASTQ file,

      Code:
      @HWI-EAS306:1:1:16:678#0/1
      GGGGCTGTAGCTCAGNTGGTCGTATGNNNNNNNNNNNNNNNNNNNNNNNN
      +
      B@=9/8A/,2>9>6####################################
      Converted to a PHRED QUAL file,

      Code:
      >HWI-EAS306:1:1:16:678#0/1
      33 31 28 24 14 23 32 14 11 17 29 24 29 21 2 2 2 2 2 2 2 2 2
      2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
      If you have other files alongside this one, you can probably confirm whether this interpretation is correct.
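
The interpretation above can be sketched in plain Python. This assumes the guess in this post is right, i.e. each line is `name:sequence:quality` with Illumina 1.3+ (offset 64) qualities. The function names are invented for this example. Because the read name itself contains colons, the line is split from the right; that is safe under this assumption because `:` (ASCII 58) is below the lowest Illumina 1.3+ quality character `@` (ASCII 64).

```python
def parse_one_line_read(line):
    """Split 'name:sequence:quality' into its three fields.

    The read name contains colons, so split on the last two colons only.
    """
    name, sequence, quality = line.rstrip("\n").rsplit(":", 2)
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ for %s" % name)
    return name, sequence, quality


def to_fastq(name, sequence, quality):
    """Format one four-line FASTQ record (quality string left unchanged)."""
    return "@%s\n%s\n+\n%s\n" % (name, sequence, quality)


def phred_scores(quality, offset=64):
    """Decode a quality string to PHRED scores (offset 64 = Illumina 1.3+)."""
    return [ord(char) - offset for char in quality]


# The read from this thread, rebuilt here so the long runs stay readable.
sequence = "GGGGCTGTAGCTCAGNTGGTCGTATG" + "N" * 24
quality = "a_\\XNW`NKQ]X]U" + "B" * 36
line = "HWI-EAS306:1:1:16:678#0/1:" + sequence + ":" + quality

name, seq, qual = parse_one_line_read(line)
print(to_fastq(name, seq, qual), end="")
print(" ".join(str(score) for score in phred_scores(qual)))
```

The FASTQ record printed here is still in Illumina 1.3+ encoding; the quality line would need the shift to offset 33 before being used as Sanger FASTQ.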

      Peter



      • #4
        Originally posted by maubp
        You have several options to convert Illumina 1.3+ FASTQ to Sanger FASTQ. All you really need to do is shift the ASCII values of the quality string down by 31 (from an offset of 64 to an offset of 33), as they both use PHRED scores.

        Option One - Use an updated MAQ fq_all2std.pl script, there is a patch for Illumina to Sanger, but it isn't included in MAQ yet, see e.g.



        Option Two - Use Biopython 1.51b (or later)

        Option Three - Use the latest BioPerl (not sure if this code is in a public release yet)

        Option Four - Use the latest EMBOSS seqret (but there are a couple of minor issues in version 6.1.0 to watch out for).
        To add an option: I'd recommend patching MAQ with this patch.
        Hope it's helpful.
