Command line options of the teach_sparse main program

Main options

quippy.teach_sparse_parse_command_line(*args, **kwargs)

This subroutine parses the main command line options.

args_str options

Name Type Default Comments
at_file str //MANDATORY// XYZ file with teaching configurations
config_type_parameter_name str config_type Identifier of property determining the type of input data in the at_file
config_type_sigma str None What sigma values to choose for each type of data. Format: {type:energy:force:virial:hessian}
core_ip_args str None QUIP init string for a potential to subtract from the data (and add back after prediction)
core_param_file str quip_params.xml QUIP XML file for a potential to subtract from the data (and add back after prediction)
default_sigma float //MANDATORY// Error in [energies forces virials hessians]
do_copy_at_file bool True Copy the at_file into the GAP XML file (should be set to False for NetCDF input).
do_e0_avg bool True Method of calculating e0 if not explicitly specified. If true, computes the average atomic energy in input data. If false, sets e0 to the lowest atomic energy in the input data.
do_ip_timing bool False Whether to enable timing of the interatomic potential.
e0 str 0.0 Atomic energy value to be subtracted from energies before fitting (and added back on after prediction). Specify a single number (used for all species) or values by species: {Ti:-150.0:O:-320}. energy = core + GAP + e0
e0_offset float 0.0 Offset of baseline. If zero, the offset is the average atomic energy of the input data or the e0 specified manually.
energy_parameter_name str energy Name of energy property in the at_file that describes the data
force_parameter_name str force Name of force property in the at_file that describes the data
gap str //MANDATORY// Initialisation string for GAPs
gp_file str gp_new.xml Output XML file
hessian_delta float 1.0e-2 Delta to use in numerical differentiation when obtaining second derivative for the Hessian covariance
hessian_parameter_name str hessian Name of hessian property in the at_file that describes the data
rnd_seed int -1 Random seed.
sigma_parameter_name str sigma Sigma parameters (error hyper) for a given configuration in the database. Overrides the command line sigmas. In the XYZ, it must be prepended by energy_, force_, virial_ or hessian_
sigma_per_atom bool True Interpretation of the energy and virial sigmas specified in >>default_sigma<< and >>config_type_sigma<<. If >>T<<, they are interpreted as per-atom errors, and the variance will be scaled according to the number of atoms in the configuration. If >>F<< they are treated as absolute errors and no scaling is performed. NOTE: sigmas specified on a per-configuration basis (see >>sigma_parameter_name<<) are always absolute.
sparse_jitter float 1.0e-10 Intrinsic error of atomic/bond energy, used to regularise the sparse covariance matrix
sparse_separate_file bool True Save sparse coordinates data in separate file
sparse_use_actual_gpcov bool False Use actual GP covariance for sparsification methods
template_file str Template XYZ file for initialising object
verbosity str NORMAL Verbosity control. Options: NORMAL, VERBOSE, NERD, ANAL.
virial_parameter_name str virial Name of virial property in the at_file that describes the data
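
The two do_e0_avg behaviours can be illustrated with a short sketch. It assumes that "atomic energy" means the total energy of a configuration divided by its number of atoms, and that the average is taken over configurations; the table does not state either detail, and the function name estimate_e0 is hypothetical.

```python
from statistics import mean


def estimate_e0(energies, n_atoms, do_e0_avg=True):
    """Return a baseline atomic energy from total energies and atom counts.

    Assumption: the atomic energy of a configuration is E / N.
    """
    per_atom = [e / n for e, n in zip(energies, n_atoms)]
    # do_e0_avg=True: average atomic energy; False: lowest atomic energy.
    return mean(per_atom) if do_e0_avg else min(per_atom)


# Made-up total energies and atom counts for three configurations.
energies = [-30.0, -64.0, -90.0]
n_atoms = [10, 20, 30]

e0_avg = estimate_e0(energies, n_atoms, do_e0_avg=True)
e0_min = estimate_e0(energies, n_atoms, do_e0_avg=False)

# Following energy = core + GAP + e0, the part left for the fit
# (with no core potential) is the total energy minus N * e0.
fit_targets = [e - n * e0_avg for e, n in zip(energies, n_atoms)]
```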


This routine is a wrapper around the Fortran routine teach_sparse_parse_command_line, defined in the file src/GAP-filler/teach_sparse_module.f95.
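
As a rough sketch of how an options string for the main options might be assembled, the helper below joins options as key=value pairs and writes booleans as T or F. The key=value syntax, the brace format used for default_sigma, and the helper names brace_colon and build_args_str are assumptions made for illustration; only the colon-separated format of e0 is taken from the table above. The gap value is a placeholder using documented GAP options, and a real GAP string would also name a descriptor, which is not covered here.

```python
def brace_colon(pairs):
    """Format a mapping as a brace-and-colon list, e.g. {Ti:-150.0:O:-320}."""
    return "{" + ":".join(f"{key}:{value}" for key, value in pairs.items()) + "}"


def build_args_str(options):
    """Join options as 'key=value' pairs (assumed syntax), booleans as T/F."""
    parts = []
    for key, value in options.items():
        if isinstance(value, bool):
            value = "T" if value else "F"
        parts.append(f"{key}={value}")
    return " ".join(parts)


args_str = build_args_str({
    "at_file": "train.xyz",  # hypothetical file name
    # Placeholder GAP string; descriptor-specific options are omitted.
    "gap": "{covariance_type=ARD_SE delta=1.0 n_sparse=50}",
    # Assumed order: energies, forces, virials, hessians.
    "default_sigma": "{0.001 0.1 0.1 0.0}",
    "e0": brace_colon({"Ti": -150.0, "O": -320}),
    "sigma_per_atom": True,
    "gp_file": "gp_new.xml",
})
print(args_str)
```

Keeping the formatting in one place makes it easier to check the mandatory options (at_file, default_sigma, gap) before the string is handed to the parser.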

GAP options

quippy.teach_sparse_parse_gap_str(*args, **kwargs)

This subroutine parses the options given in the gap string, for each GAP.

args_str options

Name Type Default Comments
add_species bool False Create species-specific descriptor, using the descriptor string as a template.
config_type_n_sparse str None Number of sparse points in each config type. Format: {type1:50:type2:100}
covariance_type str //MANDATORY// Type of covariance function to use. Available: ARD_SE, DOT_PRODUCT, BOND_REAL_SPACE, PP (piecewise polynomial)
delta float //MANDATORY// Set the standard deviation of the Gaussian process. Typically this would be set to the standard deviation (i.e. root mean square) of the function that is approximated with the Gaussian process.
f0 float 0.0 Set the mean of the Gaussian process. Defaults to 0.
mark_sparse_atoms bool False Reprints the original XYZ file after the sparsification process, with a sparse property added that is true for atoms associated with a sparse point.
n_sparse int 0 Number of sparse points to use in the sparsification of the Gaussian process
print_sparse_index str None If given, after determining the sparse points, their 1-based indices are appended to this file
sparse_file str None Read sparse points from a file. Integers, on a single line.
theta_fac str 1.0 Set the width of Gaussians for the ARD_SE and PP kernel by multiplying the range of each descriptor by theta_fac. Can be a single number or different for each dimension. For multiple theta_fac values, separate them with whitespace.
theta_file str None Set the width of Gaussians for the ARD_SE kernel from a file. There should be as many real numbers as the number of dimensions, in a single line
theta_uniform float 0.0 Set the width of Gaussians for the ARD_SE and PP kernel, same in each dimension.
zeta float 1.0 Exponent of soap type dot product covariance kernel


This routine is a wrapper around the Fortran routine teach_sparse_parse_gap_str, defined in the file src/GAP-filler/teach_sparse_module.f95.
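
The theta_fac rule described above (width = theta_fac times the range of each descriptor dimension) can be sketched as follows. This is only an illustration of the arithmetic, assuming "range" means maximum minus minimum over the data; the function name theta_from_fac and the sample descriptor values are hypothetical.

```python
def theta_from_fac(descriptors, theta_fac):
    """Per-dimension width: theta_fac times (max - min) of each dimension.

    theta_fac may be a single number or one value per dimension.
    """
    n_dim = len(descriptors[0])
    if isinstance(theta_fac, (int, float)):
        theta_fac = [theta_fac] * n_dim
    if len(theta_fac) != n_dim:
        raise ValueError("need one theta_fac per descriptor dimension")
    thetas = []
    for d in range(n_dim):
        column = [row[d] for row in descriptors]
        thetas.append(theta_fac[d] * (max(column) - min(column)))
    return thetas


# Three made-up descriptor vectors with two dimensions each.
descriptors = [[1.0, 10.0], [2.0, 14.0], [4.0, 12.0]]

theta_single = theta_from_fac(descriptors, 0.5)            # same factor everywhere
theta_per_dim = theta_from_fac(descriptors, [1.0, 0.25])   # one factor per dimension
```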

sparse_method options are:
  • RANDOM: default, chooses n_sparse random datapoints
  • PIVOT: finds the n_sparse “pivoting” points, based on the full covariance matrix
  • CLUSTER: performs a k-medoid clustering into n_sparse clusters, based on the full covariance matrix, and returns the medoids
  • UNIFORM: makes a histogram of the data based on n_sparse and returns a data point from each bin
  • KMEANS: k-means clustering based on the data points
  • COVARIANCE: greedy data point selection based on the sparse covariance matrix, to minimise the GP variance of all datapoints
  • UNIQ: selects unique datapoints from the dataset
  • FUZZY: fuzzy k-means clustering
  • FILE: reads sparse points from a file
  • INDEX_FILE: reads indices of sparse points from a file
  • CUR_COVARIANCE: CUR, based on the full covariance matrix
  • CUR_POINTS: CUR, based on the datapoints
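
To make two of the simpler methods concrete, here is a minimal sketch of RANDOM and UNIFORM selection. It is not the Fortran implementation: the UNIFORM version assumes one-dimensional data, equal-width bins, and that the first point found in each non-empty bin is kept, none of which is specified above. The function names are hypothetical.

```python
import random


def select_random(n_points, n_sparse, seed=0):
    """RANDOM: pick n_sparse distinct point indices at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_points), n_sparse))


def select_uniform(values, n_sparse):
    """UNIFORM (1D sketch): histogram the values into n_sparse bins and
    keep the index of the first point that falls in each non-empty bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_sparse
    if width == 0.0:
        width = 1.0  # all values identical: everything lands in bin 0
    chosen = {}
    for index, value in enumerate(values):
        # Clamp so that the maximum value falls in the last bin.
        bin_id = min(int((value - lo) / width), n_sparse - 1)
        chosen.setdefault(bin_id, index)
    return [chosen[bin_id] for bin_id in sorted(chosen)]


# Made-up one-dimensional descriptor values.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 9.9, 10.0]

uniform_idx = select_uniform(values, 3)
random_idx = select_random(len(values), 3, seed=42)
```

Note that a histogram-based selection can return fewer than n_sparse points when some bins are empty, whereas random selection always returns exactly n_sparse.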