Command line options of the teach_sparse main program
Main options

quippy.teach_sparse_parse_command_line(*args, **kwargs)
This subroutine parses the main command line options.
args_str options
| Name | Type | Default | Comments |
|---|---|---|---|
| at_file | str | //MANDATORY// | XYZ file with teaching configurations. |
| config_type_parameter_name | str | config_type | Identifier of the property that determines the type of input data in the at_file. |
| config_type_sigma | str | None | Sigma values to use for each type of data. Format: {type:energy:force:virial:hessian} |
| core_ip_args | str | None | QUIP init string for a potential to subtract from the data (and add back after prediction). |
| core_param_file | str | quip_params.xml | QUIP XML file for a potential to subtract from the data (and add back after prediction). |
| default_sigma | float | //MANDATORY// | Error in [energies forces virials hessians]. |
| do_copy_at_file | bool | True | Copy the at_file into the GAP XML file (should be set to False for NetCDF input). |
| do_ip_timing | bool | False | Enable timing of the interatomic potential. |
| e0_method | str | isolated | Method to determine e0, if not explicitly specified. Possible options: isolated (default; each atom type present in the XYZ needs an isolated representative with a valid energy), average (e0 is the average of all total energies across the XYZ). |
| e0_offset | float | 0.0 | Offset of baseline. If zero, the offset is the average atomic energy of the input data or the e0 specified manually. |
| energy_parameter_name | str | energy | Name of the energy property in the at_file that describes the data. |
| force_parameter_name | str | force | Name of the force property in the at_file that describes the data. |
| gap | str | //MANDATORY// | Initialisation string for GAPs. |
| gp_file | str | gp_new.xml | Output XML file. |
| hessian_delta | float | 1.0e-2 | Delta to use in numerical differentiation when obtaining the second derivative for the Hessian covariance. |
| hessian_parameter_name | str | hessian | Name of the hessian property in the at_file that describes the data. |
| local_property_parameter_name | str | local_property | Name of the local_property in the at_file that describes the data. |
| rnd_seed | int | 1 | Random seed. |
| sigma_parameter_name | str | sigma | Sigma parameters (error hyperparameters) for a given configuration in the database. Overrides the command line sigmas. In the XYZ, it must be prefixed by energy_, force_, virial_ or hessian_. |
| sigma_per_atom | bool | True | Interpretation of the energy and virial sigmas specified in default_sigma and config_type_sigma. If T, they are interpreted as per-atom errors, and the variance is scaled according to the number of atoms in the configuration. If F, they are treated as absolute errors and no scaling is performed. NOTE: sigmas specified on a per-configuration basis (see sigma_parameter_name) are always absolute. |
| sparse_jitter | float | 1.0e-10 | Intrinsic error of atomic/bond energy, used to regularise the sparse covariance matrix. |
| sparse_separate_file | bool | True | Save sparse coordinates data in a separate file. |
| sparse_use_actual_gpcov | bool | False | Use the actual GP covariance for sparsification methods. |
| sparsify_only_no_fit | bool | False | If true, sparsification is done, but no fitting. Print the sparse index by adding print_sparse_index=file.dat to the descriptor string. |
| template_file | str | template.xyz | Template XYZ file for initialising the object. |
| verbosity | str | NORMAL | Verbosity control. Options: NORMAL, VERBOSE, NERD, ANAL. |
| virial_parameter_name | str | virial | Name of the virial property in the at_file that describes the data. |

References
Routine is a wrapper around the Fortran routine teach_sparse_parse_command_line, defined in file src/GAPfiller/teach_sparse_module.f95.
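The options above are passed as a single string of key=value pairs. The following is a minimal sketch of how such a string could be assembled and how a config_type_sigma value could be split into its fields. It does not call quippy. The helper names (format_value, build_args_str, parse_config_type_sigma), the file name train.xyz, and the numeric values are hypothetical. The brace quoting of values that contain spaces, and the repetition of the five-field group for several config types, are assumptions rather than something this page states.

```python
def format_value(value):
    """Render one option value in key=value style (quoting rules are assumed)."""
    if isinstance(value, bool):
        # The option table refers to boolean values as T and F.
        return "T" if value else "F"
    if isinstance(value, (list, tuple)):
        # Assumption: multi-valued options are space-separated inside braces.
        return "{" + " ".join(str(v) for v in value) + "}"
    text = str(value)
    # Assumption: a value containing whitespace needs braces to stay one token.
    return "{" + text + "}" if " " in text else text


def build_args_str(options):
    """Join a dict of options into one 'key=value key=value ...' string."""
    return " ".join(f"{key}={format_value(val)}" for key, val in options.items())


def parse_config_type_sigma(spec):
    """Split '{type:energy:force:virial:hessian}' into a dict per config type."""
    fields = spec.strip("{}").split(":")
    if len(fields) % 5 != 0:
        raise ValueError("expected groups of 5 colon-separated fields")
    keys = ("energy", "force", "virial", "hessian")
    result = {}
    for i in range(0, len(fields), 5):
        name, *sigmas = fields[i:i + 5]
        result[name] = dict(zip(keys, (float(s) for s in sigmas)))
    return result


options = {
    "at_file": "train.xyz",  # hypothetical file name
    # A real gap string also needs a descriptor specification, omitted here.
    "gap": "covariance_type=ARD_SE delta=1.0 theta_uniform=1.0 n_sparse=50",
    "default_sigma": [0.001, 0.1, 0.1, 0.0],  # energies forces virials hessians
    "do_copy_at_file": False,
}
args_str = build_args_str(options)
print(args_str)
print(parse_config_type_sigma("{bulk:0.001:0.1:0.1:0.0}"))
```

Keeping the options in a dict until the last moment makes it easy to check that the three mandatory entries (at_file, gap, default_sigma) are present before the string is built.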
GAP options

quippy.teach_sparse_parse_gap_str(*args, **kwargs)
This subroutine parses the options given in the gap string, for each GAP.
args_str options
| Name | Type | Default | Comments |
|---|---|---|---|
| add_species | bool | False | Create species-specific descriptors, using the descriptor string as a template. |
| config_type_n_sparse | str | None | Number of sparse points in each config type. Format: {type1:50:type2:100} |
| covariance_type | str | //MANDATORY// | Type of covariance function to use. Available: ARD_SE, DOT_PRODUCT, BOND_REAL_SPACE, PP (piecewise polynomial). |
| delta | float | //MANDATORY// | Set the standard deviation of the Gaussian process. Typically this would be set to the standard deviation (i.e. root mean square) of the function that is approximated with the Gaussian process. |
| f0 | float | 0.0 | Set the mean of the Gaussian process. Defaults to 0. |
| mark_sparse_atoms | bool | False | Reprints the original XYZ file after the sparsification process, with a sparse property added that is true for atoms associated with a sparse point. |
| n_sparse | int | 0 | Number of sparse points to use in the sparsification of the Gaussian process. |
| print_sparse_index | str | None | If given, after determining the sparse points, their 1-based indices are appended to this file. |
| sparse_method | str | RANDOM | Sparsification method. RANDOM (default), PIVOT, CLUSTER, UNIFORM, KMEANS, COVARIANCE, NONE, FUZZY, FILE, INDEX_FILE, CUR_COVARIANCE, CUR_POINTS |
| theta_fac | str | 1.0 | Set the width of Gaussians for the ARD_SE and PP kernels by multiplying the range of each descriptor by theta_fac. Can be a single number or different for each dimension. For multiple theta_fac values, separate each value by whitespace. |
| theta_file | str | None | Set the width of Gaussians for the ARD_SE kernel from a file. There should be as many real numbers as the number of dimensions, on a single line. |
| theta_uniform | float | 0.0 | Set the width of Gaussians for the ARD_SE and PP kernels, same in each dimension. |
| unique_descriptor_tolerance | float | 1.0e-10 | Descriptor tolerance when filtering out duplicate data points. |
| unique_hash_tolerance | float | 1.0e-10 | Hash tolerance when filtering out duplicate data points. |
| zeta | float | 1.0 | Exponent of the SOAP-type dot product covariance kernel. |

References
Routine is a wrapper around the Fortran routine teach_sparse_parse_gap_str, defined in file src/GAPfiller/teach_sparse_module.f95.
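To illustrate how theta_fac and delta interact, here is a small sketch that derives per-dimension widths from the range of each descriptor dimension and evaluates a squared exponential kernel with them. The kernel shown is the textbook ARD squared exponential, delta^2 * exp(-0.5 * sum(((x - y) / theta)^2)). Treating this as the form used by ARD_SE is an assumption, as are the function names and the toy descriptor values.

```python
import math


def theta_from_fac(descriptors, theta_fac):
    """Width per dimension: theta_fac times the range (max - min) of that dimension."""
    n_dim = len(descriptors[0])
    if isinstance(theta_fac, (int, float)):
        facs = [theta_fac] * n_dim  # a single number applies to every dimension
    else:
        facs = list(theta_fac)      # one factor per dimension
    if len(facs) != n_dim:
        raise ValueError("need one theta_fac per descriptor dimension")
    thetas = []
    for d in range(n_dim):
        column = [x[d] for x in descriptors]
        thetas.append(facs[d] * (max(column) - min(column)))
    return thetas


def ard_se(x, y, thetas, delta):
    """Textbook ARD squared exponential (assumed form, not taken from the source)."""
    r2 = sum(((a - b) / t) ** 2 for a, b, t in zip(x, y, thetas))
    return delta ** 2 * math.exp(-0.5 * r2)


# Toy two-dimensional descriptors (hypothetical values).
descriptors = [[0.0, 1.0], [2.0, 1.5], [4.0, 3.0]]
thetas = theta_from_fac(descriptors, 0.5)
k_same = ard_se(descriptors[0], descriptors[0], thetas, delta=0.2)
k_far = ard_se(descriptors[0], descriptors[2], thetas, delta=0.2)
print(thetas, k_same, k_far)
```

With this form, the covariance of a point with itself is delta squared, which matches the description of delta as the standard deviation of the Gaussian process.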
 sparse_method options are:
 RANDOM: default, chooses n_sparse random datapoints
 PIVOT: based on the full covariance matrix finds the n_sparse “pivoting” points
 CLUSTER: based on the full covariance matrix performs a k-medoid clustering into n_sparse clusters, returning the medoids
 UNIFORM: makes a histogram of the data based on n_sparse and returns a data point from each bin
 KMEANS: k-means clustering based on the data points
 COVARIANCE: greedy data point selection based on the sparse covariance matrix, to minimise the GP variance of all datapoints
 UNIQ: selects unique datapoints from the dataset
 FUZZY: fuzzy k-means clustering
 FILE: reads sparse points from a file
 INDEX_FILE: reads indices of sparse points from a file
 CUR_COVARIANCE: CUR, based on the full covariance matrix
 CUR_POINTS: CUR, based on the datapoints
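
The two simplest methods in the list above can be sketched in a few lines. This is an illustration of the idea only, not the Fortran implementation: the function names are hypothetical, the UNIFORM sketch handles a one-dimensional descriptor and takes the first point it finds in each non-empty bin, and indices are 0-based here whereas print_sparse_index writes 1-based indices.

```python
import random


def select_random(n_points, n_sparse, rnd_seed=1):
    """RANDOM: choose n_sparse distinct data point indices at random."""
    rng = random.Random(rnd_seed)  # seeded for reproducibility
    return sorted(rng.sample(range(n_points), n_sparse))


def select_uniform(values, n_sparse):
    """UNIFORM (1-D sketch): split the data range into n_sparse equal-width
    bins and return the index of one data point from each non-empty bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_sparse or 1.0  # guard against identical values
    chosen = {}
    for index, value in enumerate(values):
        # Clamp so the maximum value falls into the last bin.
        bin_id = min(int((value - lo) / width), n_sparse - 1)
        chosen.setdefault(bin_id, index)  # keep the first point seen per bin
    return [chosen[b] for b in sorted(chosen)]


# Toy one-dimensional descriptor values (hypothetical).
values = [0.05, 0.10, 0.12, 0.50, 0.55, 0.90, 0.95, 1.00]
random_idx = select_random(len(values), 3)
uniform_idx = select_uniform(values, 3)
print(random_idx, uniform_idx)
```

RANDOM can pick several points from a densely sampled region, while UNIFORM spreads the sparse points across the range of the descriptor, which is why the two can give quite different sparse sets on clustered data.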