  • tonybert
    Member
    • Aug 2012
    • 38

    fastx_quality_stats error with paired end sequences

    Greetings, I have just recently received a HiSeq Illumina run (paired end, 72bp) of several genomes and metagenomes.

    I am currently trying to retrieve quality stat info for the demultiplexed samples after combining the two paired end .fastq files using shuffleSeqs.pl. When using fastx_quality_stats on the resulting combined file, I receive the following error:

    fastx_quality_stats: Invalid input: expecting FASTQ prefix character '@' on line 5. Is this a valid FASTQ file?

    I went back and tried using fastx_quality_stats on both of the paired end samples independently, and it worked just fine.

    Just curious if anyone else has run into a similar problem when trying to combine paired end sequence data, and if they would be willing to offer advice or a solution. I am fairly certain the combination step is the part of the process that is introducing the problem.

    shuffleSeqs.pl was downloaded from the following website:


    Although I am fairly certain this is a part of the Velvet package as well.

    Thanks,

    -Tony
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    Well, what was line 5 of the file? Perhaps you could show us the output of the command 'head -n 10 example.fastq' or similar? Use the [ code ] and [ /code ] tags to ensure the forum displays the output nicely (available as a button in the advanced view editor).


    • tonybert
      Member
      • Aug 2012
      • 38

      #3
      Below is your requested output (maubp):
      head -n 10 shuffled.fastq
      @HWI-ST700693:263:C0K6DACXX:3:1101:2281:2077
      GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACACG
      @HWI-ST700693:263:C0K6DACXX:3:1101:2281:2077
      GGTTTCGAAAAGAGGGGGGGGGGGGGAGAGGGGGGGAAACCGGTGGGGCCCCCCCCCAANAAAAAAAAAAAAAAAA
      +
      @@@DDDDDHFFDHGIID<C?<BHG<GGGDCBBHB;?FGHI9??BBFH@>GCF;A.-==@C@C36@;@D########
      +
      ############################################################################
      @HWI-ST700693:263:C0K6DACXX:3:1101:2339:2112
      CATGTAGTGAACCATATGCTCCAGTAATACCTTGAACAATGACTCCTTTATTTTCATAATCAGAATCCTCTGGTTT


      • maubp
        Peter (Biopython etc)
        • Jul 2009
        • 1544

        #4
        That FASTQ file is certainly messed up. My guess is you used a FASTA interleaving script which assumed 2 lines per record... while FASTQ files usually have 4 lines per record.

        Which script exactly did you use? There is no shuffleSequences.pl script on Nick's blog post - it just mentions using Velvet’s bundled Perl script of that name.
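
To illustrate the 4-lines-per-record point, here is a minimal Python sketch that reports the first structural problem in a FASTQ file. It is only an illustration under stated assumptions: the function name `first_fastq_problem` is hypothetical, it assumes unwrapped FASTQ (exactly 4 lines per record), and it is not how fastx_quality_stats itself validates input.

```python
from typing import Iterable, Optional


def first_fastq_problem(lines: Iterable[str]) -> Optional[str]:
    """Return a description of the first structural problem, or None.

    Assumption: unwrapped FASTQ, i.e. exactly 4 lines per record
    (header, sequence, '+' separator, quality).
    """
    record = []
    line_no = 0
    for line_no, line in enumerate(lines, start=1):
        record.append(line.rstrip("\r\n"))
        if len(record) < 4:
            continue
        start = line_no - 3  # line number of this record's header
        header, seq, plus, qual = record
        if not header.startswith("@"):
            return f"line {start}: expected '@' header, got {header[:20]!r}"
        if not plus.startswith("+"):
            return f"line {start + 2}: expected '+' separator, got {plus[:20]!r}"
        if len(seq) != len(qual):
            return f"line {start + 3}: sequence and quality lengths differ"
        record = []
    if record:
        return f"line {line_no}: file ends with an incomplete record"
    return None
```

Because this sketch also checks the '+' separator, a file laid out like the one in #3 (two header/sequence pairs followed by two separator/quality pairs) would be flagged at line 3, where a second '@' header sits in the separator position.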


        • tonybert
          Member
          • Aug 2012
          • 38

          #5
          Thanks for the prompt reply! Below is the script I used:

          $ cat shuffleSequences.pl
          #!/usr/bin/perl

          $filenameA = $ARGV[0];
          $filenameB = $ARGV[1];
          $filenameOut = $ARGV[2];

          open $FILEA, "< $filenameA";
          open $FILEB, "< $filenameB";

          open $OUTFILE, "> $filenameOut";

          while(<$FILEA>) {
          print $OUTFILE $_;
          $_ = <$FILEA>;
          print $OUTFILE $_;

          $_ = <$FILEB>;
          print $OUTFILE $_;
          $_ = <$FILEB>;
          print $OUTFILE $_;
          }


          • tonybert
            Member
            • Aug 2012
            • 38

            #6
            Also, this script was not actually on Nick Loman's blog; it was only mentioned in the text. I copied it from the following website:


            • maubp
              Peter (Biopython etc)
              • Jul 2009
              • 1544

              #7
              I really can't recommend running random Perl scripts found online like that; it doesn't even have a comment at the start telling you what it is supposed to do. From my limited Perl knowledge, I think it does a very simple interleaving that assumes 2 lines per record. That would be OK for short read FASTA files with no line wrapping, but the script does absolutely no error checking, so it mangled your data without warning.

              If you look at the actual Velvet repository, it has some more clearly labelled Perl scripts, with a version for FASTA and another for FASTQ:
              Short read de novo assembler using de Bruijn graphs, as published in: D.R. Zerbino and E. Birney. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 1...


              (They still need a bit of documentation and, in my personal view, error handling.)
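
For comparison, here is a rough Python sketch of FASTQ interleaving with basic error checking. This is not the Velvet script; the names `read_fastq` and `interleave_fastq` are hypothetical, and it assumes unwrapped FASTQ with 4 lines per record and the two files listing mates in the same order.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, plus, quality) from unwrapped FASTQ lines."""
    it = iter(lines)
    for first in it:
        # Pull the remaining 3 lines of this record; None marks a short read.
        rest = [next(it, None) for _ in range(3)]
        if any(r is None for r in rest):
            raise ValueError(f"truncated FASTQ record near {first.strip()!r}")
        header, seq, plus, qual = (s.rstrip("\r\n") for s in [first, *rest])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header, seq, plus, qual


def interleave_fastq(forward: Iterable[str], reverse: Iterable[str]) -> Iterator[str]:
    """Yield output lines alternating forward and reverse records."""
    pairs = zip_longest(read_fastq(forward), read_fastq(reverse))
    for fwd, rev in pairs:
        if fwd is None or rev is None:
            raise ValueError("input files contain different numbers of records")
        for record in (fwd, rev):
            for line in record:
                yield line + "\n"
```

With two open file handles this could be used as `out_handle.writelines(interleave_fastq(fwd_handle, rev_handle))`. It does not compare read names between mates, since naming conventions vary; that check would be a sensible addition for real data.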
