This Python solution may be a bit late but it will work on both multi-line fasta and fastq files.
- ASCII quality scores in fastq files can contain arbitrary characters like '>' and '@'. The only way to protect against this for multi-line fastq files is to keep a record of the sequence length.
- This code reads the entire set of reads into memory, then outputs random sequences -- this will not work on fasta/fastq files that are larger than the memory limit. While this is the only way to do it for streamed input, the code here uses file names, so can do a two-pass operation and reopen a file (see my previous post).
Leave a comment: