  • tonybert
    Member
    • Aug 2012
    • 38

    fastx_quality_stats error with paired-end sequences

    Greetings, I have just recently received a HiSeq Illumina run (paired end, 72bp) of several genomes and metagenomes.

    I am trying to get quality statistics for the demultiplexed samples after combining the two paired-end .fastq files with shuffleSeqs.pl. When I run fastx_quality_stats on the resulting combined file, I get the following error:

    fastx_quality_stats: Invalid input: expecting FASTQ prefix character '@' on line 5. Is this a valid FASTQ file?

    I went back and ran fastx_quality_stats on each of the two paired-end files separately, and it worked fine on both.

    Has anyone else run into a similar problem when combining paired-end sequence data? I would appreciate any advice or a solution. I am fairly certain the combining step is what introduces the problem.

    shuffleSeqs.pl was downloaded from the following website:


    I believe it is also part of the Velvet package.

    Thanks,

    -Tony
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    Well, what was line 5 of the file? Perhaps you could show us the output of the command 'head -n 10 example.fastq' or similar? Use the [ code ] and [ /code ] tags to ensure the forum displays the output nicely (available as a button in the advanced view editor).
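To find exactly which line breaks the format, a small check like the one below could help. It is a minimal Python sketch that assumes the common unwrapped 4-line FASTQ layout (@title, sequence, + line, quality); the function name first_fastq_problem is hypothetical and is not part of FASTX-Toolkit or Biopython. Because it checks every field of each record, it may report a different line number than fastx_quality_stats does.

```python
from typing import Iterable, Optional


def first_fastq_problem(lines: Iterable[str]) -> Optional[str]:
    """Describe the first line that breaks the simple 4-line FASTQ layout,
    or return None if no problem is found.

    Assumes sequence and quality are not wrapped over several lines,
    which is typical for Illumina output but not required by FASTQ.
    """
    seq_len = 0
    line_no = 0
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        field = (line_no - 1) % 4  # 0=title, 1=sequence, 2=plus, 3=quality
        if field == 0 and not line.startswith("@"):
            return f"line {line_no}: expected '@' title line, got {line[:20]!r}"
        if field == 1:
            seq_len = len(line)
        if field == 2 and not line.startswith("+"):
            return f"line {line_no}: expected '+' line, got {line[:20]!r}"
        if field == 3 and len(line) != seq_len:
            return (f"line {line_no}: quality length {len(line)} "
                    f"!= sequence length {seq_len}")
    if line_no % 4 != 0:
        return f"file ends mid-record after line {line_no}"
    return None
```

For example, `with open("shuffled.fastq") as handle: print(first_fastq_problem(handle))` prints either a description of the first problem or `None`.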


    • tonybert
      Member
      • Aug 2012
      • 38

      #3
      Below is your requested output (maubp):
      head -n 10 shuffled.fastq
      @HWI-ST700693:263:C0K6DACXX:3:1101:2281:2077
      GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACACG
      @HWI-ST700693:263:C0K6DACXX:3:1101:2281:2077
      GGTTTCGAAAAGAGGGGGGGGGGGGGAGAGGGGGGGAAACCGGTGGGGCCCCCCCCCAANAAAAAAAAAAAAAAAA
      +
      @@@DDDDDHFFDHGIID<C?<BHG<GGGDCBBHB;?FGHI9??BBFH@>GCF;A.-==@C@C36@;@D########
      +
      ############################################################################
      @HWI-ST700693:263:C0K6DACXX:3:1101:2339:2112
      CATGTAGTGAACCATATGCTCCAGTAATACCTTGAACAATGACTCCTTTATTTTCATAATCAGAATCCTCTGGTTT


      • maubp
        Peter (Biopython etc)
        • Jul 2009
        • 1544

        #4
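        That FASTQ file is certainly messed up. My guess is you used a FASTA interleaving script which assumed 2 lines per record, while FASTQ files usually have 4 lines per record (@title, sequence, + line, quality). Copying 2 lines at a time from each file puts both titles and both sequences first, then the two "+" and quality line pairs, which is why line 5 is a "+" line instead of the "@" that fastx_quality_stats expects.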
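The effect of the 2-lines-per-record assumption can be reproduced on a toy read pair. This Python sketch alternates fixed-size blocks of lines from two mate files, the way a simple interleaving script would; the interleave helper and the read names and bases are made up for illustration.

```python
from typing import List


def interleave(lines_a: List[str], lines_b: List[str], lines_per_record: int) -> List[str]:
    """Alternate fixed-size blocks of lines from A and B, with no format checks.
    Assumes both inputs have the same number of lines."""
    out: List[str] = []
    for start in range(0, len(lines_a), lines_per_record):
        out.extend(lines_a[start:start + lines_per_record])
        out.extend(lines_b[start:start + lines_per_record])
    return out


# One toy read pair in 4-line FASTQ layout (hypothetical names and bases).
mate1 = ["@read1/1", "GATC", "+", "IIII"]
mate2 = ["@read1/2", "GGTT", "+", "####"]

two_line = interleave(mate1, mate2, lines_per_record=2)   # FASTA-style assumption
four_line = interleave(mate1, mate2, lines_per_record=4)  # fits unwrapped FASTQ

print(two_line[4])   # line 5 of the output is '+', not an '@' title line
print(four_line[4])  # line 5 of the output is '@read1/2'
```

With 2-line blocks the output has the same title, sequence, title, sequence, +, quality, +, quality pattern seen in the head of shuffled.fastq; with 4-line blocks each mate's record stays intact.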

        Which script exactly did you use? There is no shuffleSequences.pl script on Nick's blog post - it just mentions using Velvet’s bundled Perl script of that name.


        • tonybert
          Member
          • Aug 2012
          • 38

          #5
          Thanks for the prompt reply! Below is the script I used:

          $ cat shuffleSequences.pl
          #!/usr/bin/perl

          $filenameA = $ARGV[0];
          $filenameB = $ARGV[1];
          $filenameOut = $ARGV[2];

          open $FILEA, "< $filenameA";
          open $FILEB, "< $filenameB";

          open $OUTFILE, "> $filenameOut";

          while(<$FILEA>) {
          print $OUTFILE $_;
          $_ = <$FILEA>;
          print $OUTFILE $_;

          $_ = <$FILEB>;
          print $OUTFILE $_;
          $_ = <$FILEB>;
          print $OUTFILE $_;
          }


          • tonybert
            Member
            • Aug 2012
            • 38

            #6
            Also, this script was not actually on Nick Loman's blog, although it was mentioned in the text. I copied it from the following website:


            • maubp
              Peter (Biopython etc)
              • Jul 2009
              • 1544

              #7
              I really can't recommend running random Perl scripts found online like that - it doesn't even have a comment at the start telling you what it should be doing. However, from my limited Perl knowledge, I think it is doing a very simple interleaving process assuming 2 lines per record, which would be OK for short read FASTA files with no line wrapping, but it does absolutely no error checking - thus it mangled your data without warning.

              If you look at the actual Velvet repository, it has some more clearly labelled Perl scripts, with a version for FASTA and another for FASTQ:
              Short read de novo assembler using de Bruijn graphs, as published in: D.R. Zerbino and E. Birney. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 1...


              (They still need a bit of documentation and, in my personal view, error handling.)
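As an illustration of what that error handling might look like, here is a minimal Python sketch of a FASTQ interleaver that stops with an error instead of writing mangled output. It is not the Velvet script; the names read_fastq_records, pair_id and interleave_fastq are hypothetical. It assumes unwrapped 4-line records, mates in the same order in both files, and mate IDs that match once any text after the first space or a trailing /1 or /2 suffix is removed.

```python
import itertools
from typing import Iterable, Iterator, List


def read_fastq_records(lines: Iterable[str], name: str) -> Iterator[List[str]]:
    """Yield [title, sequence, plus, quality] records from unwrapped FASTQ,
    raising ValueError instead of silently passing malformed data along."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\r\n") for line in itertools.islice(it, 4)]
        if not record:
            return  # clean end of input
        if len(record) < 4:
            raise ValueError(f"{name}: truncated record at end of input")
        title, seq, plus, qual = record
        if not title.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"{name}: malformed record starting {title[:40]!r}")
        if len(seq) != len(qual):
            raise ValueError(f"{name}: sequence/quality length mismatch in {title[:40]!r}")
        yield record


def pair_id(title: str) -> str:
    """Read ID without the '@', any description after the first space,
    or a trailing /1 or /2 mate suffix."""
    read_id = title[1:].partition(" ")[0]
    if read_id.endswith(("/1", "/2")):
        read_id = read_id[:-2]
    return read_id


def interleave_fastq(lines_a: Iterable[str], lines_b: Iterable[str]) -> Iterator[str]:
    """Yield output lines (without newlines): each record from A, then its mate from B."""
    records_a = read_fastq_records(lines_a, "file A")
    records_b = read_fastq_records(lines_b, "file B")
    for rec_a, rec_b in itertools.zip_longest(records_a, records_b):
        if rec_a is None or rec_b is None:
            raise ValueError("input files contain different numbers of records")
        if pair_id(rec_a[0]) != pair_id(rec_b[0]):
            raise ValueError(f"mate IDs differ: {rec_a[0]!r} vs {rec_b[0]!r}")
        yield from rec_a
        yield from rec_b
```

Usage might look like `with open("reads_1.fastq") as fa, open("reads_2.fastq") as fb, open("shuffled.fastq", "w") as out:` followed by `for line in interleave_fastq(fa, fb): out.write(line + "\n")` (file names hypothetical). Because output is written as it is generated, a partial output file can remain if an error is raised partway through.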
