wfl.select package#

Submodules#

wfl.select.by_descriptor module#

wfl.select.by_descriptor.CUR(mat, num, stochastic=True, rng=None, exclude_list=None)#

Compute selection by CUR of descriptors with dot-product, with optional exponentiation

Parameters
  • mat (np.array(vec_len, n_vecs) or (n_vecs, n_vecs)) – rectangular array of descriptors as column vectors or square kernel matrix

  • num (int) – number to select

  • stochastic (bool, default True) – use stochastic selection algorithm

  • rng (numpy.random.Generator) – random number generator, required if stochastic

  • exclude_list (list(int), default None) – list of descriptor indices to exclude

Returns

selected_inds – list of indices for selected descriptors

Return type

list(int)
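The following is a minimal sketch of leverage-score column selection, the general idea behind CUR. It is not the wfl implementation: the function name cur_select_sketch, the use of a dense numpy.linalg.svd, and the choice of how many singular vectors to keep are all assumptions made for illustration.

```python
import numpy as np


def cur_select_sketch(mat, num, stochastic=True, rng=None, exclude_list=None):
    """Pick `num` column indices of `mat` by leverage score (illustrative only)."""
    mat = np.asarray(mat, dtype=float)
    # Rows of vh are right singular vectors, ordered by decreasing singular value;
    # column j of vh belongs to descriptor (column) j of mat.
    _, _, vh = np.linalg.svd(mat, full_matrices=False)
    n_vecs_used = min(num, vh.shape[0])  # assumption: keep `num` leading vectors
    scores = np.sum(vh[:n_vecs_used] ** 2, axis=0)
    if exclude_list is not None and len(exclude_list) > 0:
        # Excluded descriptors get zero weight so they are never drawn
        scores[list(exclude_list)] = 0.0
    if stochastic:
        if rng is None:
            raise ValueError("rng is required when stochastic=True")
        probs = scores / scores.sum()
        selected = rng.choice(len(scores), size=num, replace=False, p=probs)
    else:
        # Deterministic variant: take the `num` largest scores
        selected = np.argsort(scores)[::-1][:num]
    return sorted(int(i) for i in selected)
```

With stochastic=True the indices are drawn with probability proportional to the score, so repeated calls with differently seeded generators can return different selections.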

wfl.select.by_descriptor.CUR_conf_global(inputs, outputs, num, at_descs=None, at_descs_info_key=None, kernel_exp=None, stochastic=True, rng=None, keep_descriptor_info=True, exclude_list=None, center=True, leverage_score_key=None)#

Select configurations from a list or iterable using CUR on global (per-config) descriptors

Parameters
  • inputs (ConfigSet) – atomic configs to select from

  • outputs (OutputSpec) – where to write output to

  • num (int) – number to select

  • rng (numpy.random.Generator, default None) – random number generator, used when stochastic is True

  • at_descs (np.array(n_descs, desc_len), mutually exclusive with at_descs_info_key) – list of descriptor vectors

  • at_descs_info_key (str, mutually exclusive with at_descs) – key to Atoms.info dict containing per-config descriptor vector

  • kernel_exp (float, default None) – exponent to compute kernel (if other than 1)

  • stochastic (bool, default True) – use stochastic selection

  • keep_descriptor_info (bool, default True) – do not delete descriptor from info

  • exclude_list (iterable(Atoms)) – list of Atoms to exclude from CUR selection. Needs to be _exactly_ the same as actual Atoms objects in inputs, to full machine precision

  • center (bool, default True) – center data before doing SVD, as generally required for PCA

  • leverage_score_key (str, default None) – if not None, info key to store leverage score in

Return type

ConfigSet corresponding to selected configs output
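As a rough illustration of how a row array of per-config descriptors might become either of the two matrix forms accepted by CUR, the sketch below transposes the rows into columns and, when an exponent is given, builds a square dot-product kernel raised elementwise to that power. The function name descriptor_matrix_sketch and the exact kernel definition are assumptions, not taken from the wfl source.

```python
import numpy as np


def descriptor_matrix_sketch(at_descs, kernel_exp=None):
    """Return descriptors as columns, or a square kernel if kernel_exp is given."""
    # Input rows are descriptors: (n_descs, desc_len) -> (desc_len, n_descs)
    descs_cols = np.asarray(at_descs, dtype=float).T
    if kernel_exp is None:
        return descs_cols
    # Assumed kernel: elementwise power of the dot-product (Gram) matrix
    return (descs_cols.T @ descs_cols) ** kernel_exp
```

For normalized descriptors the diagonal of such a kernel is 1, and a larger exponent sharpens the contrast between similar and dissimilar configurations.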

wfl.select.by_descriptor.do_svd(at_descs, num, do_vectors='vh')#

wfl.select.by_descriptor.greedy_fps_conf_global(inputs, outputs, num, at_descs=None, at_descs_info_key=None, keep_descriptor_info=True, exclude_list=None, prev_selected_descs=None, O_N_sq=False, rng=None, verbose=False)#

Select configurations from a list or iterable using greedy farthest point selection on global (per-config) descriptors

Parameters
  • inputs (ConfigSet) – atomic configs to select from

  • outputs (OutputSpec) – where to write output to

  • num (int) – number to select

  • at_descs (np.array(n_descs, desc_len), mutually exclusive with at_descs_info_key) – list of descriptor vectors

  • at_descs_info_key (str, mutually exclusive with at_descs) – key to Atoms.info dict containing per-config descriptor vector

  • keep_descriptor_info (bool, default True) – do not delete descriptor from info

  • exclude_list (iterable(Atoms)) – list of Atoms to exclude from selection by descriptor

  • prev_selected_descs (np.array(n_prev_descs, desc_len), default None) – if present, list of previously selected descriptors to also be farthest from

  • O_N_sq (bool, default False) – use O(N^2) algorithm with smaller prefactor

  • rng (numpy.random.Generator) – random number generator

  • verbose (bool, default False) – more verbose output

Returns

selected_configs – corresponding to selected configs output

Return type

ConfigSet
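Greedy farthest point selection can be sketched as follows. This is an illustration under stated assumptions, not the wfl code: the name greedy_fps_sketch is hypothetical, distances are plain Euclidean distances between descriptor rows, and the first point is drawn at random when no previously selected descriptors are supplied.

```python
import numpy as np


def greedy_fps_sketch(descs, num, prev_selected_descs=None, rng=None):
    """Return indices of `num` rows of `descs`, each farthest from those already chosen."""
    descs = np.asarray(descs, dtype=float)
    if num > len(descs):
        raise ValueError("cannot select more points than are available")
    selected = []
    if prev_selected_descs is not None and len(prev_selected_descs) > 0:
        prev = np.asarray(prev_selected_descs, dtype=float)
        # Distance from each candidate to its nearest previously selected descriptor
        diffs = descs[:, None, :] - prev[None, :, :]
        min_dist = np.linalg.norm(diffs, axis=2).min(axis=1)
    else:
        if rng is None:
            rng = np.random.default_rng()
        first = int(rng.integers(len(descs)))
        selected.append(first)
        min_dist = np.linalg.norm(descs - descs[first], axis=1)
        min_dist[first] = -np.inf  # never pick the same point twice
    while len(selected) < num:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        # Update each candidate's distance to its nearest selected point
        min_dist = np.minimum(min_dist, np.linalg.norm(descs - descs[nxt], axis=1))
        min_dist[nxt] = -np.inf
    return selected
```

Each iteration costs O(N) distance evaluations, so selecting num points from N candidates costs O(N num) in this form.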

wfl.select.by_descriptor.prep_descs_and_exclude(inputs, at_descs, at_descs_info_key, exclude_list)#

process configs and/or input descriptor row array and exclude list to produce descriptors column array and indices of excluded configurations

Parameters
  • inputs (ConfigSet) – input configurations

  • at_descs (np.ndarray( n_configs x desc_len ), default None) – if not None, array of descriptors (as rows) for each config. Mutually exclusive with at_descs_info_key; one of the two is required

  • at_descs_info_key (str, default None) – key into Atoms.info dict for descriptor vector of each config. Mutually exclusive with at_descs; one of the two is required

  • exclude_list (iterable(Atoms)) – list of Atoms structures to be excluded from selection, to be converted into indices into enumerate(inputs)

Returns

  • at_descs_cols (np.ndarray (desc_len x n_configs)) – array of descriptors (as columns) for each config

  • exclude_ind_list (list(int)) – indices of configurations to exclude
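The two outputs can be pictured with the sketch below, which transposes a row array of descriptors into columns and converts an exclude list into indices by equality comparison. It is a simplified stand-in, not the wfl routine: the name prep_descs_sketch is hypothetical, it handles only the at_descs case, and plain strings stand in for Atoms objects.

```python
import numpy as np


def prep_descs_sketch(configs, at_descs, exclude_list=None):
    """Return (descriptors as columns, indices of configs found in exclude_list)."""
    configs = list(configs)
    # Rows -> columns: (n_configs, desc_len) -> (desc_len, n_configs)
    at_descs_cols = np.asarray(at_descs, dtype=float).T
    if at_descs_cols.shape[1] != len(configs):
        raise ValueError("need exactly one descriptor row per config")
    exclude = list(exclude_list) if exclude_list is not None else []
    # A config is excluded if it compares equal to any entry of the exclude list
    exclude_ind_list = [
        i for i, config in enumerate(configs)
        if any(config == excluded for excluded in exclude)
    ]
    return at_descs_cols, exclude_ind_list
```

Because matching is by equality, an entry of the exclude list that differs even slightly from every input config produces no index, which is why exact agreement matters.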

wfl.select.by_descriptor.write_selected_and_clean(inputs, outputs, selected, at_descs_info_key=None, keep_descriptor_info=True)#

Writes selected (by index) configs to output configset

Parameters
  • inputs (ConfigSet) – input configuration set

  • outputs (OutputSpec) – target for output of selected configurations

  • selected (list(int)) – list of indices to be selected, cannot have duplicates

  • at_descs_info_key (str, default None) – key in info dict to delete if keep_descriptor_info is False

  • keep_descriptor_info (bool, default True) – keep descriptor in info dict

wfl.select.convex_hull module#

wfl.select.convex_hull.select(inputs, outputs, info_field, Zs=None, verbose=False)#

wfl.select.flat_histogram module#

wfl.select.flat_histogram.biased_select_conf(inputs, outputs, num, info_field, rng, kT=-1.0, bins='auto', by_bin=True, replace=False, verbose=False)#

select configurations by Boltzmann biased flat histogram on some quantity in Atoms.info

Parameters
  • inputs (ConfigSet) – input configurations

  • outputs (OutputSpec) – output configurations

  • num (int) – number of configs to select

  • info_field (string) – Atoms.info key for quantity by which to do flat histogram and Boltzmann bias

  • rng (np.random.Generator) – random number generator

  • kT (float, default -1) – Boltzmann bias temperature, <= 0 to not bias. kT should have the same unit as the “info_field” parameter

  • bins (np.histogram bins argument, default 'auto') – argument to pass to np.histogram

  • by_bin (bool, default True) – do selections by bin, which is more accurate, but works badly for small kT and does not allow for selection with replacement

  • replace (bool, default False) – do selection with replacement (i.e. repeat configs)

  • verbose (bool, default False) – verbose output

Return type

ConfigSet containing output configs
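One way to picture Boltzmann biased flat histogram selection is to weight each configuration by the inverse population of its histogram bin, multiply by a Boltzmann factor when kT > 0, and draw from the resulting probabilities. The sketch below does this per configuration rather than per bin; the function names and the exact weighting are assumptions for illustration, not the wfl implementation.

```python
import numpy as np


def flat_histogram_probs(values, kT=-1.0, bins="auto"):
    """Per-config selection probabilities that flatten the histogram of `values`."""
    values = np.asarray(values, dtype=float)
    edges = np.histogram_bin_edges(values, bins=bins)
    n_bins = len(edges) - 1
    # Bin index of each value; clip so the maximum value lands in the last bin
    bin_inds = np.clip(np.digitize(values, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(bin_inds, minlength=n_bins)
    weights = 1.0 / counts[bin_inds]  # crowded bins contribute less per config
    if kT > 0:
        # Boltzmann bias toward low values, shifted by the minimum for stability
        weights = weights * np.exp(-(values - values.min()) / kT)
    return weights / weights.sum()


def biased_select_sketch(values, num, rng, kT=-1.0, bins="auto", replace=False):
    """Draw `num` config indices using the flat-histogram probabilities."""
    probs = flat_histogram_probs(values, kT=kT, bins=bins)
    chosen = rng.choice(len(probs), size=num, replace=replace, p=probs)
    return sorted(int(i) for i in chosen)
```

Without the bias, every occupied bin carries the same total probability, so sparsely populated regions of the quantity are sampled as often as dense ones.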

wfl.select.selection_space module#

wfl.select.selection_space.compare_manual_minima(i, j, positions, nn_minima)#

compare nearby minima (presumably from an efficient ufunc reduceat such as np.minimum.reduceat) to manual enumeration

wfl.select.selection_space.minima_among_neighbors(positions, ranges, values, cartesian_distance=True)#

find, for each config, the lowest value among configs that are within some distance cutoffs (in some feature space, typically composition and volume)

Parameters
  • positions (float array(Nsamples, Nfeatures)) – array of positions in feature space

  • values (float array(Nsamples)) – values for each sample

  • cartesian_distance (bool, default True) – do Cartesian distance in feature space, otherwise require that max(dist) in each feature dimension is < range of that dimension (Chebyshev distance?)

Returns

minima – value of nearby minimum for each sample

Return type

float array(Nsamples)
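A brute-force O(N^2) sketch of the neighbor-minimum search is given below. It is not the wfl implementation: the name minima_among_neighbors_sketch is hypothetical, ranges is assumed to hold one cutoff per feature, and the Cartesian variant is assumed to measure distance after dividing each feature by its range.

```python
import numpy as np


def minima_among_neighbors_sketch(positions, ranges, values, cartesian_distance=True):
    """For each sample, return the lowest value among samples within the cutoffs."""
    positions = np.asarray(positions, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    values = np.asarray(values, dtype=float)
    # Pairwise per-feature separations in units of each feature's range: (N, N, F)
    scaled = np.abs(positions[:, None, :] - positions[None, :, :]) / ranges
    if cartesian_distance:
        near = np.sqrt((scaled ** 2).sum(axis=2)) < 1.0
    else:
        # Every feature must individually be within its own range
        near = scaled.max(axis=2) < 1.0
    # Each sample is always its own neighbor, so the minimum is well defined
    return np.where(near, values[None, :], np.inf).min(axis=1)
```

The two distance conventions can disagree: a pair separated by 0.8 of the range in each of two features is a neighbor under the per-feature test but not under the Cartesian one.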

wfl.select.selection_space.val_relative_to_nearby_composition_volume_min(inputs, outputs, vol_range, compos_range, info_field_in, info_field_out, Zs=None, per_atom=True)#

compute difference between some value and the corresponding values for configurations that are nearby in composition/volume space

Parameters
  • inputs (ConfigSet) – input configurations

  • outputs (OutputSpec) – corresponding place for output configs

  • vol_range (float) – cutoff range for “nearby” in cell volume/atom [we should define what to do about nonperiodic systems]

  • compos_range (float) – cutoff range for “nearby” in composition (fractional, i.e. 0.0-1.0)

  • info_field_in (str) – Atoms.info field containing quantity to be subtracted relative to “nearby” configs

  • info_field_out (str) – Atoms.info field to store value differences in

  • Zs (list(int), default None) – Zs that defined the composition space, if None get from inputs

  • per_atom (bool, default True) – apply calculations to per-atom quantities, i.e. Atoms.info[info_field_in] / len(atoms)

Returns

ConfigSet pointing to configurations with the saved relative value field

Return type

ConfigSet
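To make the composition/volume space concrete, the sketch below builds one feature-space position per configuration from its atomic numbers and cell volume: volume per atom followed by the fraction of each species in Zs. The function name and the column layout are hypothetical; the wfl routine may organize its features differently.

```python
import numpy as np


def composition_volume_positions(numbers_list, volumes, Zs):
    """Return an (n_configs, 1 + len(Zs)) array: volume/atom, then species fractions."""
    positions = []
    for numbers, volume in zip(numbers_list, volumes):
        n_atoms = len(numbers)
        # Fraction (0.0-1.0) of atoms of each species Z in this configuration
        fractions = [sum(1 for Z_at in numbers if Z_at == Z) / n_atoms for Z in Zs]
        positions.append([volume / n_atoms] + fractions)
    return np.array(positions, dtype=float)


# Matching cutoffs, one per feature column (values here are arbitrary examples)
vol_range, compos_range = 1.0, 0.1
Zs = [1, 8]
ranges = np.array([vol_range] + [compos_range] * len(Zs))
```

Positions and ranges in this form are the kind of input a neighbor-minimum search over feature space would take, with the per-atom value of each configuration then compared against the minimum found among its neighbors.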

wfl.select.simple module#

wfl.select.simple.by_bool_func(*args, **kwargs)#

apply a filter to a sequence of configs

Parameters
  • inputs (iterable(Atoms)) – input quantities of type Atoms

  • outputs (OutputSpec or None) – where to write output atomic configs, or None for no output (i.e. only side-effects)

  • at_filter (callable) – callable that takes an Atoms and returns a bool indicating if it should be selected

  • autopara_info (AutoParaInfo / dict, optional) – information for automatic parallelization

Returns

co – output configs

Return type

ConfigSet
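The at_filter argument is any callable that maps one configuration to a bool. The sketch below shows the filtering logic on its own, using types.SimpleNamespace objects with an info dict as stand-ins for Atoms; the helper names and the "energy" key are hypothetical, and the real function additionally handles output writing and parallelization.

```python
from types import SimpleNamespace


def by_bool_func_sketch(inputs, at_filter):
    """Keep only the configs for which at_filter(config) is true."""
    return [config for config in inputs if at_filter(config)]


def low_energy(config, e_max=0.0):
    """Example filter: select configs whose info['energy'] is below e_max."""
    return config.info.get("energy", float("inf")) < e_max


# Stand-in configs; a missing 'energy' key is treated as not selected
configs = [
    SimpleNamespace(info={"energy": -1.5}),
    SimpleNamespace(info={"energy": 0.7}),
    SimpleNamespace(info={}),
]
selected = by_bool_func_sketch(configs, low_energy)
```

Keeping the filter a pure function of a single configuration is what allows it to be applied independently to each input.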

wfl.select.simple.by_index(inputs, outputs, indices)#

select configurations by index

Parameters
  • inputs (ConfigSet) – source configurations

  • outputs (OutputSpec) – output configurations

  • indices (list(int)) – Indices to be selected. Values outside 0..len(inputs)-1 will be ignored. Repeated values will lead to multiple copies of the configuration

Return type

ConfigSet pointing to selected configurations

Notes

This routine depends on details of ConfigSet and OutputSpec, so it perhaps belongs as a use case of autoparallelize. However, since it can return multiple outputs for a single input, this cannot be done right now
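The documented handling of out-of-range and repeated indices can be sketched with a single pass over the inputs. This is an illustration only: the name by_index_sketch is hypothetical, and the assumption that outputs follow the order of the inputs (rather than the order of indices) is not confirmed by the text above.

```python
from collections import Counter


def by_index_sketch(inputs, indices):
    """Return configs at the given indices, ignoring out-of-range values."""
    counts = Counter(indices)  # how many times each index was requested
    selected = []
    for i, config in enumerate(inputs):
        # Indices below 0 or beyond the last input never match any i, so they
        # are silently ignored; repeated indices repeat the same object
        selected.extend([config] * counts.get(i, 0))
    return selected
```

A single pass like this works on any iterable, including one that cannot be indexed directly, and it shows why one input can yield several outputs.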

Module contents#