wfl.select package
Submodules
wfl.select.by_descriptor module
- wfl.select.by_descriptor.CUR(mat, num, stochastic=True, rng=None, exclude_list=None)
Compute selection by CUR of descriptors with dot-product, with optional exponentiation
- Parameters
mat (np.array(vec_len, n_vecs) or (n_vecs, n_vecs)) – rectangular array of descriptors as column vectors or square kernel matrix
num (int) – number to select
stochastic (bool, default True) – use stochastic selection algorithm
rng (numpy.random.Generator) – random number generator, required if stochastic
exclude_list (list(int), default None) – list of descriptor indices to exclude
- Returns
selected_inds – list of indices for selected descriptors
- Return type
list(int)
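The idea behind CUR selection can be illustrated with a minimal sketch, assuming leverage scores computed from the leading right singular vectors of a rectangular `mat` with descriptors as columns. The name `cur_select_sketch` and the scoring details are assumptions for illustration only; this is not the wfl implementation.

```python
import numpy as np


def cur_select_sketch(mat, num, stochastic=True, rng=None, exclude_list=None):
    """Hypothetical leverage-score column selection (not the wfl code)."""
    mat = np.asarray(mat, dtype=float)
    n_vecs = mat.shape[1]
    # Rows of vh are right singular vectors; each column of vh belongs to one descriptor.
    _, _, vh = np.linalg.svd(mat, full_matrices=False)
    k = min(num, vh.shape[0])
    # Leverage score: squared weight of each descriptor in the top-k singular vectors.
    scores = np.sum(vh[:k] ** 2, axis=0) / k
    if exclude_list:
        # Excluded descriptors get zero score, so they are never drawn stochastically.
        scores[list(exclude_list)] = 0.0
    scores = scores / scores.sum()
    if stochastic:
        if rng is None:
            raise ValueError("rng is required for stochastic selection")
        # Draw without replacement, with probability proportional to leverage score.
        selected = rng.choice(n_vecs, size=num, replace=False, p=scores)
    else:
        # Deterministic variant: take the highest-scoring descriptors.
        selected = np.argsort(scores)[::-1][:num]
    return [int(i) for i in selected]
```

With `stochastic=False` the sketch simply takes the `num` highest-scoring columns, which makes the result reproducible without an `rng`.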
- wfl.select.by_descriptor.CUR_conf_global(inputs, outputs, num, at_descs=None, at_descs_info_key=None, kernel_exp=None, stochastic=True, rng=None, keep_descriptor_info=True, exclude_list=None, center=True, leverage_score_key=None)
Select configurations from a list or iterable using CUR on global (per-config) descriptors
- Parameters
inputs (ConfigSet) – atomic configs to select from
outputs (OutputSpec) – where to write output to
num (int) – number to select
rng (numpy.random.Generator, default None) – random number generator, required if stochastic
at_descs (np.array(n_descs, desc_len), mutually exclusive with at_descs_info_key) – list of descriptor vectors
at_descs_info_key (str, mutually exclusive with at_descs) – key to Atoms.info dict containing per-config descriptor vector
kernel_exp (float, default None) – exponent to compute kernel (if other than 1)
stochastic (bool, default True) – use stochastic selection
keep_descriptor_info (bool, default True) – do not delete descriptor from info
exclude_list (iterable(Atoms)) – list of Atoms to exclude from CUR selection. Needs to be _exactly_ the same as actual Atoms objects in inputs, to full machine precision
center (bool, default True) – center data before doing SVD, as generally required for PCA
leverage_score_key (str, default None) – if not None, info key to store leverage score in
- Return type
ConfigSet corresponding to selected configs output
- wfl.select.by_descriptor.do_svd(at_descs, num, do_vectors='vh')
- wfl.select.by_descriptor.greedy_fps_conf_global(inputs, outputs, num, at_descs=None, at_descs_info_key=None, keep_descriptor_info=True, exclude_list=None, prev_selected_descs=None, O_N_sq=False, rng=None, verbose=False)
Select configurations from a list or iterable using greedy farthest point selection on global (per-config) descriptors
- Parameters
inputs (ConfigSet) – atomic configs to select from
outputs (OutputSpec) – where to write output to
num (int) – number to select
at_descs (np.array(n_descs, desc_len), mutually exclusive with at_descs_info_key) – list of descriptor vectors
at_descs_info_key (str, mutually exclusive with at_descs) – key to Atoms.info dict containing per-config descriptor vector
keep_descriptor_info (bool, default True) – do not delete descriptor from info
exclude_list (iterable(Atoms)) – list of Atoms to exclude from selection by descriptor
prev_selected_descs (np.array(n_prev_descs, desc_len), default None) – if present, list of previously selected descriptors to also be farthest from
O_N_sq (bool, default False) – use O(N^2) algorithm with smaller prefactor
rng (numpy.random.Generator) – random number generator
verbose (bool, default False) – more verbose output
- Returns
selected_configs – ConfigSet corresponding to selected configs output
- Return type
ConfigSet
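Greedy farthest point selection can be sketched as follows, assuming Euclidean distance between row descriptors and a random starting point when no previously selected descriptors are given. The name `greedy_fps_sketch` and the choice of distance metric are assumptions for illustration; the library may use a different similarity measure.

```python
import numpy as np


def greedy_fps_sketch(descs, num, rng, prev_selected_descs=None):
    """Hypothetical greedy farthest point selection on row descriptors."""
    descs = np.asarray(descs, dtype=float)
    n = descs.shape[0]
    # min_dist[i] is the distance from descriptor i to the closest point chosen so far.
    min_dist = np.full(n, np.inf)
    if prev_selected_descs is not None:
        # Start far away from everything that was selected previously.
        for prev in np.atleast_2d(prev_selected_descs):
            min_dist = np.minimum(min_dist, np.linalg.norm(descs - prev, axis=1))
        first = int(np.argmax(min_dist))
    else:
        # Assumption: with no history, the first point is picked at random.
        first = int(rng.integers(n))
    selected = [first]
    while len(selected) < num:
        last = descs[selected[-1]]
        min_dist = np.minimum(min_dist, np.linalg.norm(descs - last, axis=1))
        # The next pick is the descriptor farthest from all picks so far.
        selected.append(int(np.argmax(min_dist)))
    return selected
```

Updating only the running minimum distance after each pick keeps the cost at O(N) per selected point, rather than recomputing all pairwise distances.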
- wfl.select.by_descriptor.prep_descs_and_exclude(inputs, at_descs, at_descs_info_key, exclude_list)
process configs and/or input descriptor row array and exclude list to produce descriptors column array and indices of excluded configurations
- Parameters
inputs (ConfigSet) – input configurations
at_descs (np.ndarray(n_configs x desc_len), default None) – if not None, array of descriptors (as rows) for each config; mutually exclusive with at_descs_info_key, one is required
at_descs_info_key (str, default None) – key into Atoms.info dict for descriptor vector of each config; mutually exclusive with at_descs, one is required
exclude_list (iterable(Atoms)) – list of Atoms structures to be excluded from selection, to be converted into indices into enumerate(inputs)
- Returns
at_descs_cols (np.ndarray (desc_len x n_configs)) – array of descriptors (as columns) for each config
exclude_ind_list (list(int)) – indices of configurations to exclude
- wfl.select.by_descriptor.write_selected_and_clean(inputs, outputs, selected, at_descs_info_key=None, keep_descriptor_info=True)
Writes selected (by index) configs to output configset
- Parameters
inputs (ConfigSet) – input configuration set
outputs (OutputSpec) – target for output of selected configurations
selected (list(int)) – list of indices to be selected, cannot have duplicates
at_descs_info_key (str, default None) – key in info dict to delete if keep_descriptor_info is False
keep_descriptor_info (bool, default True) – keep descriptor in info dict
wfl.select.convex_hull module
- wfl.select.convex_hull.select(inputs, outputs, info_field, Zs=None, verbose=False)
wfl.select.flat_histogram module
- wfl.select.flat_histogram.biased_select_conf(inputs, outputs, num, info_field, rng, kT=-1.0, bins='auto', by_bin=True, replace=False, verbose=False)
select configurations by Boltzmann biased flat histogram on some quantity in Atoms.info
- Parameters
inputs (ConfigSet) – input configurations
outputs (OutputSpec) – output configurations
num (int) – number of configs to select
info_field (string) – Atoms.info key for quantity by which to do flat histogram and Boltzmann bias
rng (np.random.Generator) – random number generator
kT (float, default -1) – Boltzmann bias temperature, <= 0 to not bias; kT should have the same units as the quantity stored in info_field
bins (np.histogram bins argument, default 'auto') – argument to pass to np.histogram
by_bin (bool, default True) – do selections by bin, which is more accurate, but works badly for small kT and does not allow for selection with replacement
replace (bool, default False) – do selection with replacement (i.e. repeat configs)
verbose (bool, default False) – verbose output
- Return type
ConfigSet containing output configs
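The weighting behind a Boltzmann-biased flat histogram can be sketched as follows, assuming each config is weighted by the inverse population of its histogram bin, multiplied by a Boltzmann factor when kT > 0. The name `flat_histogram_select_sketch` is hypothetical, and the sketch draws per-config (closest in spirit to by_bin=False); it is not the wfl implementation.

```python
import numpy as np


def flat_histogram_select_sketch(values, num, rng, kT=-1.0, bins="auto", replace=False):
    """Hypothetical per-config flat-histogram selection with optional Boltzmann bias."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=bins)
    # Bin index of each value, using the interior edges so the last bin is closed.
    bin_inds = np.clip(np.digitize(values, edges[1:-1]), 0, len(counts) - 1)
    # Inverse bin population flattens the histogram of selected values.
    weights = 1.0 / counts[bin_inds]
    if kT > 0:
        # Boltzmann bias toward low values; shifting by the minimum avoids overflow.
        weights *= np.exp(-(values - values.min()) / kT)
    weights /= weights.sum()
    chosen = rng.choice(len(values), size=num, replace=replace, p=weights)
    return [int(i) for i in chosen]
```

Because kT divides the selected quantity directly, it has to be given in the same units as that quantity, and a small kT concentrates nearly all of the weight on the lowest values.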
wfl.select.selection_space module
- wfl.select.selection_space.compare_manual_minima(i, j, positions, nn_minima)
compare nearby minima (presumably computed efficiently with a ufunc reduceat, e.g. np.minimum.reduceat) to manual enumeration
- wfl.select.selection_space.minima_among_neighbors(positions, ranges, values, cartesian_distance=True)
find, for each config, the lowest-value config that is within some distance cutoff (in some feature space, typically composition and volume)
- Parameters
positions (float array(Nsamples, Nfeatures)) – array of positions in feature space
values (float array(Nsamples)) – values for each sample
cartesian_distance (bool, default True) – use Cartesian distance in feature space, otherwise require that the distance in each feature dimension is < range of that dimension (Chebyshev distance?)
- Returns
minima – value of nearby minimum for each sample
- Return type
float array(Nsamples)
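The neighbor-minimum search can be sketched by brute force as follows. The sketch assumes that `ranges` holds one cutoff per feature and that, in the Cartesian case, "nearby" means a distance below 1 after scaling each feature by its cutoff. Both are assumptions, since `ranges` is not described above, and the name `minima_among_neighbors_sketch` is hypothetical.

```python
import numpy as np


def minima_among_neighbors_sketch(positions, ranges, values, cartesian_distance=True):
    """Hypothetical O(N^2) search for the lowest value among nearby samples."""
    positions = np.asarray(positions, dtype=float)
    values = np.asarray(values, dtype=float)
    # Scale each feature by its cutoff so that "nearby" means scaled distance < 1.
    scaled = positions / np.asarray(ranges, dtype=float)
    diff = np.abs(scaled[:, None, :] - scaled[None, :, :])
    if cartesian_distance:
        near = np.sqrt(np.sum(diff ** 2, axis=2)) < 1.0
    else:
        # Per-dimension criterion: every feature must be within its own range.
        near = np.max(diff, axis=2) < 1.0
    # Each sample is its own neighbor, so every row has at least one finite entry.
    return np.where(near, values[None, :], np.inf).min(axis=1)
```

The per-dimension criterion accepts neighbors inside a box, while the Cartesian criterion accepts only those inside the inscribed ellipsoid, so the former never finds fewer neighbors than the latter.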
- wfl.select.selection_space.val_relative_to_nearby_composition_volume_min(inputs, outputs, vol_range, compos_range, info_field_in, info_field_out, Zs=None, per_atom=True)
compute the difference between some value and the corresponding values for configurations that are nearby in composition/volume space
- Parameters
inputs (ConfigSet) – input configurations
outputs (OutputSpec) – corresponding place for output configs
vol_range (float) – cutoff range for “nearby” in cell volume/atom (behavior for nonperiodic systems is not yet defined)
compos_range (float) – cutoff range for “nearby” in composition (fractional, i.e. 0.0-1.0)
info_field_in (str) – Atoms.info field containing quantity to be subtracted relative to “nearby” configs
info_field_out (str) – Atoms.info field to store value differences in
Zs (list(int), default None) – Zs that define the composition space, if None get from inputs
per_atom (bool, default True) – apply calculations to per-atom quantities, i.e. Atoms.info[info_field_in] / len(atoms)
- Returns
ConfigSet pointing to configurations with the saved relative value field
- Return type
ConfigSet
wfl.select.simple module
- wfl.select.simple.by_bool_func(*args, **kwargs)
apply a filter to a sequence of configs
- Parameters
inputs (iterable(Atoms)) – input quantities of type Atoms
outputs (OutputSpec or None) – where to write output atomic configs, or None for no output (i.e. only side-effects)
at_filter (callable) – callable that takes an Atoms and returns a bool indicating if it should be selected
autopara_info (AutoParaInfo / dict, optional) – information for automatic parallelization
- Returns
co – output configs
- Return type
ConfigSet
- wfl.select.simple.by_index(inputs, outputs, indices)
select configurations by index
- Parameters
inputs (ConfigSet) – source configurations
outputs (OutputSpec) – output configurations
indices (list(int)) – Indices to be selected. Values outside 0..len(inputs)-1 will be ignored. Repeated values will lead to multiple copies of configuration
- Return type
ConfigSet pointing to selected configurations
Notes
This routine depends on details of ConfigSet and OutputSpec, so it perhaps belongs as a use case of autoparallelize. However, since it can return multiple outputs for a single input, this cannot be done right now.
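The index semantics described for by_index (out-of-range values ignored, repeated values producing multiple copies) can be sketched on a plain Python sequence. The name `by_index_sketch` is hypothetical, and returning results in input order is an assumption rather than documented behavior.

```python
from collections import Counter


def by_index_sketch(configs, indices):
    """Hypothetical illustration of by_index semantics on a plain sequence."""
    # Count how many times each index was requested.
    counts = Counter(indices)
    selected = []
    for i, config in enumerate(configs):
        # Indices outside 0..len(configs)-1 never match, so they are ignored.
        # Repeats append the same object again; a real implementation may copy it.
        selected.extend([config] * counts.get(i, 0))
    return selected
```

For example, `by_index_sketch(["a", "b", "c"], [2, 0, 0, 7, -1])` yields two copies of the first item and one of the third, while 7 and -1 are silently dropped.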