Just in time for the holidays, we are pleased to announce our new
release! (with uploaded URLs)
We are especially excited to be making RTG Core available
for non-commercial academic research under improved terms
(see LICENCE) in response to the feedback we have been receiving.
The main highlights are:
* Free for non-commercial academic research
* Unlimited duration (no license key file required)
* Source code available on GitHub at:
https://github.com/RealTimeGenomics/rtg-core
* The non-commercial release is available now via the following links:
rtg-core-3.4-non-commercial-linux-x64.zip (60.0 MB)
rtg-core-3.4-non-commercial-nojre.zip (13.4 MB)
rtg-core-3.4-non-commercial-windows-x64.zip (54.1 MB)
If you have any problems or questions, you can contact us at
[email protected] and we'll do our best to help you out.
If you require a license for commercial use, or wish to purchase
commercial support, contact us via [email protected].
Below are the release notes for RTG Core 3.4. We aim
to produce an updated release of RTG Tools but couldn't fit it in just
yet -- look for that in the new year.
=== Release Notes for RTG Core 3.4 ===
Below are the release notes for RTG Core, upon which RTG Core 3.4
is built. Not all features described below may be included in this
product.
RTG Core 3.4 (2014-12-20)
-------------------------
Major features of this release:
* Added the ability to run variant calling only on a list of regions
provided via BED file. This results in a large speed improvement
when performing exome variant calling, by avoiding computation
associated with off-target locations, as well as permitting fast
variant calling of target sites from whole genome data, or running
variant calling in haploid mode in areas of loss-of-heterozygosity.
* Added the ability to perform variant calling for sites where the
reference is unknown but where reads have been mapped. This can be
used to fill in gaps in draft reference assemblies. This includes
both sites where an N is observed in the reference, larger N-blocks
where reads have been mapped spanning the N block, and large
N-blocks where reads are anchored on one side by known reference.
* Workflow improvements to human pipeline processing to identify
mislabelled samples or incorrect pedigree. At the end of read
mapping, average coverage levels across chromosomes are examined and
a warning is issued if there appear to be gross chromosomal
abnormalities or if the coverage levels do not match expected levels
for the sex of the individual specified. A standalone tool for this
is also provided. Similarly, the mendelian analysis tool now
computes concordance with pedigree and issues a warning if low
concordance indicates a parent or child is inconsistent with the
supplied pedigree. In addition we have added two commands for
manipulating, extracting information from, and summarizing pedigree
files.
* New commands for metagenomics taxonomy and reference database
management. Previously using metagenomics databases other than those
pre-built by RTG was difficult and error-prone. Three commands have
been added to allow taxonomy construction starting from an NCBI
taxonomy dump, filtering the taxonomy based on user criteria, and
validating the structure of a metagenomics species reference
database.
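As a rough illustration of what restricting work to BED regions means at the coordinate level, the following Python sketch loads BED intervals and tests whether a VCF position falls inside them. It is not RTG's implementation (the release restricts the calling itself rather than filtering positions afterwards), and the function names `load_bed` and `in_regions` are hypothetical. The detail it is meant to show is that BED intervals are 0-based and half-open, while VCF positions are 1-based.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into sorted, merged (start, end) intervals per chromosome."""
    raw = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out = []
        for start, end in intervals:
            if out and start <= out[-1][1]:
                # Overlapping or touching the previous interval: extend it.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_regions(merged, chrom, pos_1based):
    """Return True if a 1-based (VCF-style) position lies in any BED interval."""
    intervals = merged.get(chrom, [])
    pos0 = pos_1based - 1  # convert to the 0-based BED coordinate system
    # Index of the last interval whose start is <= pos0.
    i = bisect.bisect_right(intervals, (pos0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]


bed = ["chr1\t100\t200\n", "chr1\t150\t300\n", "chr2\t0\t10\n"]
regions = load_bed(bed)
print(regions["chr1"])
print(in_regions(regions, "chr1", 101), in_regions(regions, "chr1", 100))
```

A per-position test like this cannot reproduce the edge effects of complex calls that straddle a region boundary, which is one reason region-restricted calling and after-the-fact filtering can give slightly different results.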
Detailed changes are listed below by area. Please read these through
fully, as some command-line flags have changed, so updates to your
pipeline scripts may be required. For more information on new
features, see the RTG Operations Manual.
=== Basic Formatting and Mapping
* map/cgmap/mapf: As an alternative to supplying --sex to specify the
sex of the individual being mapped, you may specify a pedigree file
containing the sex information for the sample. This requires you to
have either formatted the read set with read-group information or to
supply read group information at mapping time (the advantage of this
feature is that it lets you minimize the number of command-line
differences for each sample being mapped).
* map/cgmap: When mapping using a reference containing sex chromosome
information, average per-chromosome coverage information is used to
issue warnings when it is likely that the incorrect mapping sex has
been specified or if any autosomes have abnormal coverage levels
(perhaps indicating a chromosomal abnormality). This feature
requires you to be using a reference genome SDF containing chromosome
information, as described in the RTG Operations Manual.
* chrstats: New command to perform standalone average coverage
reporting and checking against expected coverage levels from
calibrated mapping files. This is essentially the same check that is
performed during mapping, but allows multiple mapping files to be
provided (either if multiple mapping runs were performed for a
single sample, or for batch reporting for multiple samples).
* calibrate: New option --merge to allow merging multiple alignment
files into a single output file while performing calibration. For
example, this can reduce the number of I/O operations needed to go
from multiple, uncalibrated, unindexed third party input files to a
single calibrated indexed BAM file.
* calibrate: New option --threads to allow calibration of multiple
files to use multiple cores. (Currently this option only takes effect
when used with the --merge option, not regular multi-file
calibration.)
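The coverage check described in the map/cgmap and chrstats items above can be pictured with a small Python sketch. This is only an illustration of the idea under stated assumptions, not the method RTG uses: the expected ratios, the `tolerance` value, the chromosome names and the function name `coverage_warnings` are all hypothetical.

```python
from statistics import median

# Assumed expected coverage relative to the autosomal median for a human
# sample. These ratios are illustrative only.
EXPECTED_RATIO = {
    "male": {"X": 0.5, "Y": 0.5},
    "female": {"X": 1.0, "Y": 0.0},
}


def coverage_warnings(coverage, sex, tolerance=0.25):
    """Return warnings for chromosomes whose coverage looks unexpected.

    coverage: dict mapping chromosome name to average coverage.
    sex: "male" or "female", as specified for the sample.
    """
    autosomes = [v for c, v in coverage.items() if c not in ("X", "Y")]
    baseline = median(autosomes)
    if baseline <= 0:
        raise ValueError("autosomal coverage must be positive")
    warnings = []
    for chrom, cov in coverage.items():
        # Autosomes are expected to sit at the baseline (ratio 1.0).
        expected = EXPECTED_RATIO[sex].get(chrom, 1.0)
        ratio = cov / baseline
        if abs(ratio - expected) > tolerance:
            warnings.append(
                f"{chrom}: coverage ratio {ratio:.2f}, "
                f"expected about {expected:.2f} for {sex}"
            )
    return warnings


sample = {"1": 30.0, "2": 31.0, "3": 29.0, "X": 15.0, "Y": 14.0}
print(coverage_warnings(sample, "male"))
print(coverage_warnings(sample, "female"))
```

With these made-up numbers the sample is consistent with a male and raises warnings for X and Y when checked as a female, which is the kind of mismatch that suggests a mislabelled sample.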
=== Variant Calling
* snp/family/population/somatic: New flag --bed-regions adds the
ability to only perform calling on the regions specified via a BED
file. This is more efficient than applying BED filtering via
--filter-bed. However, note that the results can sometimes differ,
due to edge effects of complex calling regions that cross region
boundaries.
* snp/family/population/somatic: Implemented variant calling across
N's in the reference. (This was previously occurring in some cases
where mappings across the N contain indels, but has now been fully
implemented). Calls where the reference is not a valid allele due to
containing an N are annotated with an NREF INFO tag for easy
filtering, and contain neither QUAL nor GL values.
* snp: As an alternative to supplying --sex to specify the sex of the
individual for variant calling, you may specify a pedigree file
containing the sex information for the sample. This can reduce the
number of command-line differences when processing multiple samples.
* family/population/somatic: Better error handling when input mappings
contain a record that does not correspond to one of the samples
being called.
* snp/family/population/somatic: Fixed a hang that could occur when
trying to clean up after an out-of-memory error.
* snp/family/population/somatic: Fixed a rare crash that could occur
at the end of chromosomes.
* somatic: Previously, a score indicating the likelihood of the
variant being somatic was stored in the QUAL field. This does not
strictly follow the VCF spec, so this score has been moved to the
new NCS INFO field.
* vcfannotate: The --fill-ac-an flag now does not add an AC annotation
when no ALTs are present in a record.
* vcffilter: New flag --region to extract and filter only the variants
contained within a single specified region.
* vcffilter: New flag --bed-regions to extract and filter only
variants contained within the regions contained in a BED file.
* vcffilter: Better error handling when applying criteria that require
GT be present to files that are missing the GT field.
* vcfmerge: The default behaviour has changed when merging variants at
the same position where the ALTs are different and the variants
contain FORMAT fields that cannot automatically be merged
(Number=A,G,R, or the special case of the AD FORMAT field). Now
these FORMAT fields are removed to allow the merge to proceed. There
is a new flag --preserve-formats to instead output separate variants
that keep those FORMAT fields.
* vcfeval: New flag --baseline-tp that allows additionally outputting
the baseline version of true positive variants (the regular tp.vcf
contains the called representation of true positive variants).
* vcfeval: --squash-ploidy treats heterozygous calls in baseline and
calls as homozygous ALT to allow a lenient comparison. Note that
genotypes at multi-allelic sites where neither allele is REF simply
choose the ALT with the highest index.
* vcfeval: Fixed an exception that could occur when processing variants
missing GT information for some samples.
* vcfeval: Fixed an exception that could occur when provided variants
that were outside the bounds of the supplied reference genome.
* vcfeval: Fixed an inconsistency when handling ROC files in locales
where ',' is the decimal separator.
* mendelian: The default is now to perform checks only on non-failing
variants. The --pass flag has been removed, and a new flag
--all-records has been added to obtain the behaviour of checking all
variant records regardless of filters.
* mendelian: Now performs concordance checking to detect sample
mislabelling and incorrect pedigree.
* mendelian: Removed --male and --female flags, which were only needed
for VCFs produced by versions of RTG prior to 2.7. If required,
alternative pedigree information can be supplied via the --pedigree
flag.
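The --squash-ploidy rule described in the vcfeval item above can be written out as a short Python sketch. It follows only the wording of that item (heterozygous calls become homozygous ALT, and where neither allele is REF the highest ALT index is chosen). It is not taken from the vcfeval source: the function name `squash_ploidy` is hypothetical, and the handling of missing and haploid genotypes is an assumption.

```python
import re


def squash_ploidy(gt):
    """Collapse a VCF GT string to a homozygous genotype for lenient matching.

    Assumptions (not from the release notes): fully missing genotypes are
    returned unchanged, partially missing ones use the called allele, and
    the number of alleles in the genotype is preserved.
    """
    alleles = re.split(r"[/|]", gt)
    called = [int(a) for a in alleles if a != "."]
    if not called:
        return gt  # nothing called, e.g. "./."
    top = max(called)
    if top == 0:
        return gt  # homozygous reference stays as it is
    # Every allele becomes the highest-indexed ALT that was observed.
    return "/".join([str(top)] * len(alleles))


for gt in ["0/1", "1|0", "1/2", "1/1", "0/0", "./."]:
    print(gt, "->", squash_ploidy(gt))
```

Applying the same transformation to both the baseline and the called genotypes means a heterozygous call can match a homozygous one at the same site, which is what makes the comparison lenient.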
=== Metagenomics
* ncbi2tax: New tool to generate an RTG taxonomy file from an NCBI
taxonomy dump.
* taxfilter: New tool for the custom filtering of taxonomy files and
metagenomic reference SDFs containing taxonomy information.
* taxstats: New tool for verifying the contents of a metagenomic
reference SDF.
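To give a feel for the kind of input and structural check involved, here is a minimal Python sketch that reads the `nodes.dmp` file from an NCBI taxonomy dump and reports nodes that cannot be traced back to the root. It does not reproduce ncbi2tax or taxstats and does not write the RTG taxonomy file format. The function names are hypothetical and only the first three columns of `nodes.dmp` (taxon ID, parent ID, rank) are used.

```python
def parse_nodes(lines):
    """Parse NCBI nodes.dmp lines into {tax_id: (parent_id, rank)}.

    Fields in the dump are separated by "<tab>|<tab>" and each line
    ends with "<tab>|".
    """
    nodes = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.endswith("\t|"):
            line = line[:-2]
        fields = line.split("\t|\t")
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def unreachable_from_root(nodes):
    """Return sorted IDs whose parent chain hits a missing node or a cycle."""
    bad = []
    for tax_id in nodes:
        seen = set()
        current = tax_id
        while True:
            if current not in nodes or current in seen:
                bad.append(tax_id)
                break
            seen.add(current)
            parent = nodes[current][0]
            if parent == current:
                break  # a root node is its own parent
            current = parent
    return sorted(bad)


dump = [
    "1\t|\t1\t|\tno rank\t|\t\t|\n",
    "2\t|\t1\t|\tsuperkingdom\t|\t\t|\n",
    "9\t|\t8\t|\tspecies\t|\t\t|\n",  # parent 8 is absent from the dump
]
nodes = parse_nodes(dump)
print(unreachable_from_root(nodes))
```

The walk is repeated from every node for clarity rather than speed; a real dump has well over a million nodes, so a practical tool would cache the nodes already known to be good.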
=== Other
* sdfsubseq: The output sequence name is the same as the input
sequence if the coordinates are unchanged.
* many: Added the ability to read BED from stdin by specifying '-' as
the BED file name (this is not supported in cases where a region
restriction is also being applied to the file, as this would require
the BED to be tabix indexed).
* many: Added the ability to read VCF from stdin by specifying '-' as
the VCF file name (not supported in cases where a region restriction
is also being applied to the file, as this would require the VCF to
be tabix indexed).
* many: Users of Linux bash can enable command and flag
completion. See the file rtg-bash-completion in the scripts
directory for more information.
* bgzip: New flag --no-terminate allows the omission of the block gzip
termination block. This permits advanced users to compress multiple
files for later fast concatenation (the termination block should be
present on the final file only).
* bgzip: New flag --compression-level allows altering the degree of
compression (thus speed) from 1 (least but fast) to 9 (best but
slow).
* rocplot: GUI mode has better error handling when there is no
graphical environment.
* rocplot: PNG output mode will attempt to use headless mode to
prevent an error when the graphical environment is unavailable.
* popsim: Speed improvements.
* readsim/cgsim: Added the --sam-rg flag to set the read group
information to be stored in the output SDF. Removed --diploid-input
as the recommended way to simulate diploid genomes is to use
samplereplay or the --output-sdf option of
samplesim/childsim/denovosim.
* readsimeval: New command for evaluating the accuracy of mapping reads
generated by readsim.
* pedfilter: New command for pedigree file filtering and simple
manipulation and conversion between pedigree PED files and
pedigree-augmented VCF headers.
* pedstats: New command for extracting and summarizing information
contained in a pedigree file.
* aview: The flag --dont-display-dots has been renamed to
--no-dots for consistency.
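The reason the bgzip --no-terminate flag above helps with concatenation can be shown with a short Python sketch. A block gzip file is a series of gzip members followed by a fixed 28-byte empty member that marks the end of the file, so parts written without that marker can be joined byte-for-byte as long as the last part carries it. The sketch is a simulation under stated assumptions: the parts are made with `gzip.compress`, which writes ordinary gzip members rather than true BGZF blocks, and the helper name `only_last_terminated` is hypothetical.

```python
import gzip

# The 28-byte BGZF end-of-file marker: an empty gzip member carrying the
# "BC" extra subfield.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00" "ff" "0600"
    "424302001b00" "0300" "00000000" "00000000"
)


def only_last_terminated(parts):
    """True if the final part ends with the EOF marker and no other part does."""
    *head, last = parts
    return last.endswith(BGZF_EOF) and not any(p.endswith(BGZF_EOF) for p in head)


# Simulated parts: ordinary gzip members standing in for BGZF data blocks.
part1 = gzip.compress(b"chr1\t100\n")             # written without a terminator
part2 = gzip.compress(b"chr2\t200\n") + BGZF_EOF  # final part keeps the terminator

joined = part1 + part2
print(only_last_terminated([part1, part2]))
print(gzip.decompress(joined))
```

Because the marker is itself a valid, empty gzip member, a general gzip reader simply sees one more member with no data, while BGZF-aware readers use its presence to detect truncated files.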