Biopieces are a collection of bioinformatics tools that can be pieced together easily and flexibly to perform both simple and complex tasks. They work on a data stream that can be passed through several different Biopieces, each performing one specific task: modifying or adding records to the data stream, creating plots, or uploading data to databases and web services. Biopieces are executed in a command-line environment. The data stream is initialized by specific Biopieces that read data from files, databases, or web services and output records to the stream, which is then passed to downstream Biopieces until it is terminated at the end of the analysis, as outlined below:
read_data | calculate_something | write_results
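To make the outline above concrete, here is a minimal sketch of the same pattern using plain shell functions. The three functions are hypothetical stand-ins named after the outline, not real Biopieces, and the one-line tab-separated record is a simplification assumed for illustration, not the record format Biopieces actually use.

```shell
#!/bin/sh
# Hypothetical stand-ins for the three stages in the outline above.
# Assumption: each record is a single "name<TAB>sequence" line.

read_data() {
    # Initialize the stream: emit one record per line.
    printf 'seq1\tACGTACGT\n'
    printf 'seq2\tACG\n'
}

calculate_something() {
    # Add a field to every record passing through: the sequence length.
    awk -F '\t' 'BEGIN { OFS = "\t" } { print $1, $2, length($2) }'
}

write_results() {
    # Terminate the stream: write all records to the file named in $1.
    cat > "$1"
}

read_data | calculate_something | write_results results.tsv
```

Each stage only reads records on standard input and writes records on standard output, so stages can be added, removed, or reordered without touching the others.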
The following example demonstrates how data from a next-generation sequencing experiment can be cleaned and analyzed, including plotting of scores and length distribution, removal of adaptor sequence, trimming and filtering using quality scores, mapping to a specified genome, and uploading the data to the UCSC Genome Browser for further analysis:
Code:
read_fastq -i data.fq |                               # Initialize data stream from a FASTQ file.
plot_scores -t png -o scores_unclean.png |            # Plot scores before cleaning.
find_adaptor -c 24 -a TCGTATGCCGTCTTC -p |            # Locate adaptor - including partial adaptor.
clip_adaptor |                                        # Clip any located adaptor.
trim_seq |                                            # End trim sequences according to quality scores.
grab -e 'SEQ_LEN > 18' |                              # Filter short sequences.
mean_scores -l |                                      # Locate local quality score minima.
grab -e 'SCORES_MEAN >= 15' |                         # Filter low local quality score minima.
write_fastq -o data_clean.fq |                        # Write the cleaned data to a FASTQ file.
plot_scores -t png -o scores_clean.png |              # Plot scores after cleaning.
plot_distribution -k SEQ_LEN -t png -o lengths.png |  # Plot sequence length distribution.
bowtie_seq -c 24 -g hg19 -m 2 |                       # Map sequences to the human genome with Bowtie.
upload_to_ucsc -d hg19 -t my_data -x                  # Upload the results to the UCSC Genome Browser.
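For readers unfamiliar with stream filtering, the length filter step in the pipeline above can be approximated with plain awk. This is only a rough analogue under stated assumptions: filter_min_len is a hypothetical helper, not the Biopieces grab tool, and the input is assumed to be simplified "name<TAB>sequence" lines rather than Biopieces records.

```shell
#!/bin/sh
# Hypothetical stand-in for a length filter stage; not the Biopieces grab tool.
# Assumption: each record is a single "name<TAB>sequence" line.

filter_min_len() {
    # Pass through only records whose sequence (field 2) is longer than $1 bases.
    awk -F '\t' -v min="$1" 'length($2) > min'
}

# read1 is 24 bases long and is kept; read2 is 8 bases long and is dropped.
printf 'read1\tACGTACGTACGTACGTACGTACGT\nread2\tACGTACGT\n' | filter_min_len 18
```

Records that fail the condition simply never reach the next stage, which is why filters can be placed anywhere in the stream.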
There are currently ~175 Biopieces.
EDIT
To make Biopieces more accessible, an installer has been released here.
EDIT
Updated the example