wfl.select package#

Submodules#

wfl.select.by_descriptor module#

wfl.select.by_descriptor.CUR(mat, num, stochastic=True, rng=None, exclude_list=None)#

Compute selection by CUR of descriptors with dot-product, with optional exponentiation

Parameters

mat (np.array(vec_len, n_vecs) or (n_vecs, n_vecs)) – rectangular array of descriptors as column vectors or square kernel matrix
num (int) – number to select
stochastic (bool, default True) – use stochastic selection algorithm
rng (numpy.random.Generator) – random number generator, required if stochastic
exclude_list (list(int), default None) – list of descriptor indices to exclude

Returns

selected_inds – list of indices for selected descriptors

Return type

list(int)
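The column-selection step of a CUR decomposition can be sketched as follows. This is a minimal illustration, not the wfl implementation: it assumes the leading right singular vectors are already available as the rows of a hypothetical `vt` argument (in practice they would come from an SVD of `mat`), and that the leverage score of each column is the mean squared component across those vectors. The names `leverage_scores` and `cur_select` are made up for this sketch.

```python
import random


def leverage_scores(vt):
    """Leverage score of each column: mean squared component over the singular vectors."""
    k = len(vt)
    n = len(vt[0])
    return [sum(row[j] ** 2 for row in vt) / k for j in range(n)]


def cur_select(vt, num, stochastic=True, rng=None, exclude_list=None):
    """Pick column indices by leverage score (illustrative stand-in only)."""
    scores = leverage_scores(vt)
    # excluded columns get zero score, so they can never be picked
    for j in (exclude_list or []):
        scores[j] = 0.0
    candidates = [j for j, s in enumerate(scores) if s > 0.0]

    if not stochastic:
        # deterministic variant: highest leverage scores first
        return sorted(candidates, key=lambda j: -scores[j])[:num]

    if rng is None:
        raise ValueError("rng is required for stochastic selection")
    selected = []
    for _ in range(min(num, len(candidates))):
        # draw one column with probability proportional to its score, without replacement
        weights = [scores[j] for j in candidates]
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        selected.append(pick)
        candidates.remove(pick)
    return selected


# two orthonormal singular vectors over three columns
vt = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]
top_two = cur_select(vt, 2, stochastic=False)
random_three = cur_select(vt, 3, stochastic=True, rng=random.Random(1))
```

The stochastic branch trades reproducibility of the exact selection for better coverage of columns with moderate scores, which is presumably why an explicit `rng` is required in that mode.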

wfl.select.by_descriptor.CUR_conf_global(inputs, outputs, num, at_descs=None, at_descs_info_key=None, kernel_exp=None, stochastic=True, rng=None, keep_descriptor_info=True, exclude_list=None, center=True, leverage_score_key=None)#

Select configurations from a list or iterable using CUR on global (per-config) descriptors

Parameters

inputs (ConfigSet) – atomic configs to select from
outputs (OutputSpec) – where to write output to
num (int) – number to select
rng (numpy.random.Generator, default None) – random number generator, required if stochastic
at_descs (np.array(n_descs, desc_len), mutually exclusive with at_descs_info_key) – list of descriptor vectors
at_descs_info_key (str, mutually exclusive with at_descs) – key to Atoms.info dict containing per-config descriptor vector
kernel_exp (float, default None) – exponent to compute kernel (if other than 1)
stochastic (bool, default True) – use stochastic selection
keep_descriptor_info (bool, default True) – do not delete descriptor from info
exclude_list (iterable(Atoms)) – list of Atoms to exclude from CUR selection. Needs to be _exactly_ the same as actual Atoms objects in inputs, to full machine precision
center (bool, default True) – center data before doing SVD, as generally required for PCA
leverage_score_key (str, default None) – if not None, info key to store leverage score in

Return type

ConfigSet corresponding to selected configs output

wfl.select.by_descriptor.do_svd(at_descs, num, do_vectors='vh')#

wfl.select.by_descriptor.greedy_fps_conf_global(inputs, outputs, num, at_descs=None, at_descs_info_key=None, keep_descriptor_info=True, exclude_list=None, prev_selected_descs=None, O_N_sq=False, rng=None, verbose=False)#

Select configurations from a list or iterable using greedy farthest point selection on global (per-config) descriptors

Parameters

inputs (ConfigSet) – atomic configs to select from
outputs (OutputSpec) – where to write output to
num (int) – number to select
at_descs (np.array(n_descs, desc_len), mutually exclusive with at_descs_info_key) – list of descriptor vectors
at_descs_info_key (str, mutually exclusive with at_descs) – key to Atoms.info dict containing per-config descriptor vector
keep_descriptor_info (bool, default True) – do not delete descriptor from info
exclude_list (iterable(Atoms)) – list of Atoms to exclude from selection by descriptor
prev_selected_descs (np.array(n_prev_descs, desc_len), default None) – if present, list of previously selected descriptors that new selections should also be far from
O_N_sq (bool, default False) – use O(N^2) algorithm with smaller prefactor
rng (numpy.random.Generator) – random number generator
verbose (bool, default False) – more verbose output

Returns

selected_configs – corresponding to selected configs output

Return type

ConfigSet
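Greedy farthest point selection can be sketched in plain Python as below. This is an illustration under stated assumptions, not the wfl code: descriptors are plain tuples, distance is Euclidean, and the first point is drawn at random when no previously selected descriptors are supplied (which is one plausible reason for the `rng` argument). The function name `greedy_fps` is hypothetical.

```python
import math
import random


def greedy_fps(descs, num, prev_selected=None, rng=None):
    """Repeatedly pick the point farthest from everything selected so far."""
    rng = rng or random.Random()
    n = len(descs)
    if num > n:
        raise ValueError("cannot select more points than are available")

    # min_dist[i] = distance from point i to the nearest reference point
    min_dist = [math.inf] * n
    for prev in (prev_selected or []):
        for i, d in enumerate(descs):
            min_dist[i] = min(min_dist[i], math.dist(d, prev))

    selected = []
    while len(selected) < num:
        candidates = [i for i in range(n) if i not in selected]
        if not selected and not prev_selected:
            # nothing to be far from yet: start from a random point (assumption)
            nxt = rng.choice(candidates)
        else:
            nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        # update nearest-reference distances with the newly selected point
        for i, d in enumerate(descs):
            min_dist[i] = min(min_dist[i], math.dist(d, descs[nxt]))
    return selected


descs = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
far_from_origin = greedy_fps(descs, 2, prev_selected=[(0.0, 0.0)])
```

Keeping a running nearest-reference distance per point makes each new selection O(N), so selecting `num` points costs O(N * num) rather than recomputing all pairwise distances.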

wfl.select.by_descriptor.prep_descs_and_exclude(inputs, at_descs, at_descs_info_key, exclude_list)#

Process configs and/or an input array of descriptors (as rows), together with the exclude list, to produce an array of descriptors (as columns) and the indices of the excluded configurations

Parameters

inputs (ConfigSet) – input configurations
at_descs (np.ndarray( n_configs x desc_len ), default None) – if not None, array of descriptors (as rows) for each config. Mutually exclusive with at_descs_info_key; one of the two is required
at_descs_info_key (str, default None) – key into Atoms.info dict for descriptor vector of each config. Mutually exclusive with at_descs; one of the two is required
exclude_list (iterable(Atoms)) – list of Atoms structures to be excluded from selection, to be converted into indices into enumerate(inputs)

Returns

at_descs_cols (np.ndarray (desc_len x n_configs)) – array of descriptors (as columns) for each config
exclude_ind_list (list(int)) – indices of configurations to exclude
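The two conversions described here, rows to columns and excluded objects to indices, can be illustrated with plain lists. This is only a sketch: strings stand in for Atoms objects, equality via `==` is assumed to be how excluded configs are matched, and `prep_sketch` is a hypothetical name.

```python
def prep_sketch(configs, desc_rows, exclude_list=None):
    """Transpose per-config descriptor rows into columns and locate excluded configs."""
    # one row per config -> one column per config
    desc_cols = [list(col) for col in zip(*desc_rows)]
    excluded = list(exclude_list or [])
    # index of every config that compares equal to something in the exclude list
    exclude_inds = [i for i, c in enumerate(configs) if any(c == e for e in excluded)]
    return desc_cols, exclude_inds


configs = ["A", "B", "C"]            # stand-ins for Atoms objects
desc_rows = [[1, 2], [3, 4], [5, 6]]  # n_configs x desc_len
cols, inds = prep_sketch(configs, desc_rows, exclude_list=["B"])
```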

wfl.select.by_descriptor.write_selected_and_clean(inputs, outputs, selected, at_descs_info_key=None, keep_descriptor_info=True)#

Writes selected (by index) configs to output configset

Parameters

inputs (ConfigSet) – input configuration set
outputs (OutputSpec) – target for output of selected configurations
selected (list(int)) – list of indices to be selected, cannot have duplicates
at_descs_info_key (str, default None) – key in info dict to delete if keep_descriptor_info is False
keep_descriptor_info (bool, default True) – keep descriptor in info dict

wfl.select.convex_hull module#

wfl.select.convex_hull.select(inputs, outputs, info_field, Zs=None, verbose=False)#

wfl.select.flat_histogram module#

wfl.select.flat_histogram.biased_select_conf(inputs, outputs, num, info_field, rng, kT=-1.0, bins='auto', by_bin=True, replace=False, verbose=False)#

select configurations by Boltzmann biased flat histogram on some quantity in Atoms.info

Parameters

inputs (ConfigSet) – input configurations
outputs (OutputSpec) – output configurations
num (int) – number of configs to select
info_field (string) – Atoms.info key for quantity by which to do flat histogram and Boltzmann bias
rng (np.random.Generator) – random number generator
kT (float, default -1) – Boltzmann bias temperature, <= 0 to not bias. kT should have the same units as the quantity stored in info_field
bins (np.histogram bins argument, default 'auto') – argument to pass to np.histogram
by_bin (bool, default True) – do selections by bin, which is more accurate, but works badly for small kT and does not allow for selection with replacement
replace (bool, default False) – do selection with replacement (i.e. repeat configs)
verbose (bool, default False) – verbose output

Return type

ConfigSet containing output configs
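One way to picture Boltzmann-biased flat-histogram selection is as a per-config weight: the inverse of the population of the config's histogram bin, optionally multiplied by a Boltzmann factor. The sketch below makes several assumptions that are not taken from wfl: equal-width bins instead of np.histogram, weights measured relative to the minimum value, and selection without replacement over all configs at once (i.e. not the by_bin mode). The function names are hypothetical.

```python
import math
import random


def flat_histogram_weights(values, n_bins=5, kT=-1.0):
    """Normalized selection weights: 1 / (bin population), times exp(-v / kT) if kT > 0."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0.0:
        width = 1.0  # all values identical: everything lands in bin 0
    # equal-width bins; the maximum value is placed in the last bin
    bin_of = [min(int((v - lo) / width), n_bins - 1) for v in values]
    counts = [0] * n_bins
    for b in bin_of:
        counts[b] += 1

    weights = []
    for v, b in zip(values, bin_of):
        w = 1.0 / counts[b]                 # flattens the histogram
        if kT > 0:
            w *= math.exp(-(v - lo) / kT)   # favors low values
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def biased_select(values, num, rng, **kwargs):
    """Draw num distinct indices with probability proportional to their weight."""
    weights = flat_histogram_weights(values, **kwargs)
    remaining = list(range(len(values)))
    chosen = []
    for _ in range(num):
        pick = rng.choices(remaining, weights=[weights[i] for i in remaining], k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


values = [0.0, 0.1, 0.2, 0.3, 10.0]
unbiased = flat_histogram_weights(values)
biased = flat_histogram_weights(values, kT=1.0)
picked = biased_select(values, 3, random.Random(0))
```

Without bias, the lone config at 10.0 carries as much total weight as the four crowded near 0.0 combined; with a small positive kT the Boltzmann factor suppresses it again.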

wfl.select.selection_space module#

wfl.select.selection_space.compare_manual_minima(i, j, positions, nn_minima)#

Compare nearby minima (presumably from efficient np.reduceat) to manual enumeration

wfl.select.selection_space.minima_among_neighbors(positions, ranges, values, cartesian_distance=True)#

Find, for each config, the lowest value among configs that lie within some distance cutoff (in some feature space, typically composition and volume)

Parameters

positions (float array(Nsamples, Nfeatures)) – array of positions in feature space
values (float array(Nsamples)) – values for each sample
cartesian_distance (bool, default True) – use Cartesian distance in feature space; otherwise require that the distance in each feature dimension is < the range of that dimension (Chebyshev distance?)

Returns

minima – value of nearby minimum for each sample

Return type

float array(Nsamples)
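A brute-force version of the neighbor-minimum search may help clarify the two distance criteria. This sketch is not the wfl implementation: it assumes `ranges` holds one cutoff per feature dimension and that the Cartesian criterion is applied to range-scaled coordinates, neither of which is stated above.

```python
import math


def minima_among_neighbors_sketch(positions, ranges, values, cartesian_distance=True):
    """O(N^2) search: lowest value among samples within the cutoff of each sample."""
    minima = []
    for p_i in positions:
        best = math.inf
        for p_j, v_j in zip(positions, values):
            # separation along each feature, scaled by that feature's cutoff
            scaled = [abs(a - b) / r for a, b, r in zip(p_i, p_j, ranges)]
            if cartesian_distance:
                nearby = math.sqrt(sum(s * s for s in scaled)) < 1.0
            else:
                # every feature must individually be within its range
                nearby = max(scaled) < 1.0
            if nearby:
                best = min(best, v_j)
        minima.append(best)
    return minima


positions = [(0.0, 0.0), (0.8, 0.8), (3.0, 0.0)]
values = [1.0, -2.0, 5.0]
cart = minima_among_neighbors_sketch(positions, (1.0, 1.0), values, cartesian_distance=True)
per_dim = minima_among_neighbors_sketch(positions, (1.0, 1.0), values, cartesian_distance=False)
```

The first two samples are 0.8 apart in each dimension, so they are neighbors under the per-dimension criterion but not under the Cartesian one (distance about 1.13). Each sample always counts as its own neighbor, so the result is never larger than the sample's own value.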

wfl.select.selection_space.val_relative_to_nearby_composition_volume_min(inputs, outputs, vol_range, compos_range, info_field_in, info_field_out, Zs=None, per_atom=True)#

Compute the difference between some value and the minimum of the corresponding values for configurations that are nearby in composition/volume space

Parameters

inputs (ConfigSet) – input configurations
outputs (OutputSpec) – corresponding place for output configs
vol_range (float) – cutoff range for “nearby” in cell volume/atom [we should define what to do about nonperiodic systems]
compos_range (float) – cutoff range for “nearby” in composition (fractional, i.e. 0.0-1.0)
info_field_in (str) – Atoms.info field containing quantity to be subtracted relative to “nearby” configs
info_field_out (str) – Atoms.info field to store value differences in
Zs (list(int), default None) – Zs that defined the composition space, if None get from inputs
per_atom (bool, default True) – apply calculations to per-atom quantities, i.e. Atoms.info[info_field_in] / len(atoms)

Returns

ConfigSet pointing to configurations with the saved relative value field

Return type

ConfigSet
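The feature space and the relative value can be sketched as follows. Everything here is an assumption for illustration: configs are plain dicts with hypothetical keys `numbers`, `volume`, and `energy`, "nearby" uses the per-dimension criterion, and values are always taken per atom.

```python
def composition_volume_features(numbers, volume, Zs):
    """Fractional composition for each Z in Zs, followed by volume per atom."""
    n = len(numbers)
    return [numbers.count(Z) / n for Z in Zs] + [volume / n]


def relative_to_nearby_min(configs, Zs, vol_range, compos_range):
    """Per-atom value of each config minus the lowest per-atom value among nearby configs."""
    feats = [composition_volume_features(c["numbers"], c["volume"], Zs) for c in configs]
    vals = [c["energy"] / len(c["numbers"]) for c in configs]
    ranges = [compos_range] * len(Zs) + [vol_range]
    out = []
    for f_i, v_i in zip(feats, vals):
        # a config is always nearby to itself, so the list is never empty
        nearby = [v_j for f_j, v_j in zip(feats, vals)
                  if all(abs(a - b) < r for a, b, r in zip(f_i, f_j, ranges))]
        out.append(v_i - min(nearby))
    return out


configs = [
    {"numbers": [13, 13], "volume": 32.0, "energy": -6.0},
    {"numbers": [13, 13], "volume": 33.0, "energy": -7.0},
    {"numbers": [29, 29], "volume": 24.0, "energy": -8.0},
]
rel = relative_to_nearby_min(configs, Zs=[13, 29], vol_range=1.0, compos_range=0.1)
```

A result of zero marks a config that is itself the local minimum; positive values measure how far above the local minimum a config sits.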

wfl.select.simple module#

wfl.select.simple.by_bool_func(*args, **kwargs)#

apply a filter to a sequence of configs

Parameters

inputs (iterable(Atoms)) – input quantities of type Atoms
outputs (OutputSpec or None) – where to write output atomic configs, or None for no output (i.e. only side-effects)
at_filter (callable) – callable that takes an Atoms and returns a bool indicating if it should be selected
autopara_info (AutoParaInfo / dict, optional) – information for automatic parallelization

Returns

co – output configs

Return type

ConfigSet
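A filter callable of the kind described for at_filter might look like the sketch below. The `SimpleNamespace` objects merely stand in for Atoms (only an `info` dict is emulated), the `energy_per_atom` key and the `low_energy` function are hypothetical, and the list comprehension only mimics what the selection does with the callable.

```python
from types import SimpleNamespace


def low_energy(at, threshold=-1.0):
    """Return True if the config's stored per-atom energy is below the threshold."""
    # configs without the key are never selected
    return at.info.get("energy_per_atom", float("inf")) < threshold


# stand-ins for Atoms objects, each with an info dict
configs = [SimpleNamespace(info={"energy_per_atom": e}) for e in (-2.0, -0.5, -1.5)]
configs.append(SimpleNamespace(info={}))

selected = [at for at in configs if low_energy(at)]
```

With wfl itself, a callable like `low_energy` would presumably be passed as the at_filter argument alongside inputs and outputs.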

wfl.select.simple.by_index(inputs, outputs, indices)#

Select configurations by index

Parameters

inputs (ConfigSet) – source configurations
outputs (OutputSpec) – output configurations
indices (list(int)) – Indices to be selected. Values outside 0..len(inputs)-1 will be ignored. Repeated values will lead to multiple copies of configuration

Return type

ConfigSet pointing to selected configurations

Notes

This routine depends on details of ConfigSet and OutputSpec, so it perhaps belongs as a use case of autoparallelize. However, because it can return multiple outputs for a single input, that is not currently possible.
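The index semantics described for indices (out-of-range values ignored, repeats producing copies) can be mimicked on a plain list. This sketch assumes the output follows the order of the inputs rather than the order of indices, which is not stated above; `by_index_sketch` is a hypothetical name.

```python
from collections import Counter


def by_index_sketch(configs, indices):
    """Emit each config once per occurrence of its index, skipping out-of-range indices."""
    counts = Counter(i for i in indices if 0 <= i < len(configs))
    selected = []
    for i, config in enumerate(configs):
        # Counter returns 0 for indices that were never requested
        selected.extend([config] * counts[i])
    return selected


picked = by_index_sketch(["a", "b", "c"], [2, 0, 2, 7, -1])
```

Here index 2 is requested twice, so "c" appears twice, while 7 and -1 fall outside 0..len(inputs)-1 and are dropped.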
