Goby is a Java framework and set of tools for storing and analyzing next-generation sequencing data. We have just released version 1.8 of the framework. Significant changes include:
For end-users:
- The new mode discover-sequence-variants will either (i) identify sequence variants within a group of samples or (ii) identify variants whose frequency is significantly enriched in one of two groups. This mode requires sorted/indexed alignments as input (such as those provided by GobyWeb). See the online tutorial for details.
- SamToCompact mode now populates the read quality scores for sequence variations (toQuality field).
- SGE helper scripts bz2compact.sh and keep-unique-reads.sh help process hundreds of lanes in parallel on an SGE grid. bz2compact.sh extracts fastq files compressed with BZip2 and converts them to compact-reads format. keep-unique-reads.sh determines the set of reads that are unique in each input .compact-reads file and writes this information to a .uniqset-keep.filter file [available only in source release].
- Mode concatenate-compact-reads now supports read index filters. This makes it possible to concatenate and keep only reads that are unique within each file.
- In the mode “alignment-to-annotation-counts”, the “--eval” option supports a new value “counts”, which outputs a format specifically designed for use with R’s DESeq, and notably with the R script geneDESeqAnalysis.R, which is used with GobyWeb.
- Fixed a bug in the extraction of sequence variations from SAM format, where matches on the reverse strand were assigned a read-index one larger than the correct value.
- Fixed a bug that prevented Goby tools from opening large alignment files (>3Gb).
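The per-file uniqueness filtering described for keep-unique-reads.sh and concatenate-compact-reads can be pictured with a small sketch. This is not Goby code and does not use the Goby API or its file formats: the class and method names are hypothetical, reads are modeled as plain strings, and "unique" is assumed here to mean keeping the first read that carries each distinct sequence within one input.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; these names are not part of Goby.
public class UniqueReadFilterSketch {

    /** Returns the read indices to keep: the first read carrying each distinct sequence. */
    public static Set<Integer> uniqueReadIndices(List<String> reads) {
        Set<String> seen = new HashSet<>();
        Set<Integer> keep = new TreeSet<>();
        for (int readIndex = 0; readIndex < reads.size(); readIndex++) {
            // Set.add returns true only the first time a sequence is encountered.
            if (seen.add(reads.get(readIndex))) {
                keep.add(readIndex);
            }
        }
        return keep;
    }

    /** Concatenates several inputs, filtering each one with its own read index filter. */
    public static List<String> concatenateWithFilters(List<List<String>> files) {
        List<String> out = new ArrayList<>();
        for (List<String> reads : files) {
            Set<Integer> keep = uniqueReadIndices(reads);
            for (int readIndex = 0; readIndex < reads.size(); readIndex++) {
                if (keep.contains(readIndex)) {
                    out.add(reads.get(readIndex));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> files = List.of(
                List.of("ACGT", "ACGT", "TTGA"),
                List.of("ACGT", "GGCC", "GGCC"));
        // Uniqueness is decided within each file, so ACGT survives once per file.
        System.out.println(concatenateWithFilters(files));
    }
}
```

Because the filter is computed per input, a sequence present in two files appears twice in the concatenated output, which matches the "unique within each file" behavior described above.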
For developers:
- Draft helper to iterate through individual reference positions of a sorted set of alignments (see IterateSortedAlignments).
- We now distribute a subset of Goby as the Goby IO API. This subset is packaged in the goby-io.jar file and released under the LGPL3 license. This makes it possible to include Goby-format input/output code directly in other software licensed under the LGPL3 (e.g., IGV).
- Updated picard/samtools to version 1.25.
- The Python API can now read gzip-compressed TooManyHits files.
A complete list of changes is available in the distribution. See the download page.