How to train a GAP model from scratch

steps

1. generate a small dataset of water structures
    - use CP2K if you have access to it
    - otherwise: use any simple potential implemented in ASE, just for trying this out I have used EMT here
1. generate e0 values: isolated atoms in the dataset
1. separate a training and a validation dataset
1. **train the model**
1. look at the outcome of the model

here we will fit twice, to see the difference between a 2b-only and a 2b+3b fit

[15]:
# general imports
import numpy as np
import matplotlib.pyplot as plt
from copy import deepcopy as cp

# ase imports
import ase.io
from ase import Atoms, Atom
from ase import units
from ase.build import molecule
# for MD
from ase.md.langevin import Langevin
from ase.io.trajectory import Trajectory
[2]:
# helper functions
def make_water(density, super_cell=[3, 3, 3]):
    """ Geenrates a supercell of water molecules with a desired density.
        Density in g/cm^3!!!"""
    h2o = molecule('H2O')
    a = np.cbrt((sum(h2o.get_masses()) * units.m ** 3 * 1E-6 ) / (density * units.mol))
    h2o.set_cell((a, a, a))
    h2o.set_pbc((True, True, True))
    #return cp(h2o.repeat(super_cell))
    return h2o.repeat(super_cell)

def rms_dict(x_ref, x_pred):
    """ Takes two datasets of the same shape and returns a dictionary containing RMS error data"""

    x_ref = np.array(x_ref)
    x_pred = np.array(x_pred)

    if np.shape(x_pred) != np.shape(x_ref):
        raise ValueError('WARNING: not matching shapes in rms')

    error_2 = (x_ref - x_pred) ** 2

    average = np.sqrt(np.average(error_2))
    std_ = np.sqrt(np.var(error_2))

    return {'rmse': average, 'std': std_}

generating data only with ASE, using the EMT calculator

This is only for the demonstration of how to do it, this run is will be done very fast. There is no practical use of the data beyond learning the use teach_sparse, quip, etc. with it.

[3]:
# Running MD with ASE's EMT

from ase.calculators.emt import EMT
calc = EMT()

T = 150  # Kelvin

# Set up a grid of water
water = make_water(1.0, [3, 3, 3])
water.set_calculator(calc)

# We want to run MD using the Langevin algorithm
# with a time step of 1 fs, the temperature T and the friction
# coefficient to 0.002 atomic units.
dyn = Langevin(water, 1 * units.fs, T * units.kB, 0.0002)

def printenergy(a=water):  # store a reference to atoms in the definition.
    """Function to print the potential, kinetic and total energy."""
    epot = a.get_potential_energy() / len(a)
    ekin = a.get_kinetic_energy() / len(a)
    print('Energy per atom: Epot = %.3feV  Ekin = %.3feV (T=%3.0fK)  '
          'Etot = %.3feV' % (epot, ekin, ekin / (1.5 * units.kB), epot + ekin))

dyn.attach(printenergy, interval=5)

# We also want to save the positions of all atoms after every 5th time step.
traj = Trajectory('dyn_emt.traj', 'w', water)
dyn.attach(traj.write, interval=5)

# Now run the dynamics
printenergy(water)
dyn.run(600)   # CHANGE THIS IF YOU WANT LONGER/SHORTER RUN
Energy per atom: Epot = 0.885eV  Ekin = 0.000eV (T=  0K)  Etot = 0.885eV
Energy per atom: Epot = 0.820eV  Ekin = 0.053eV (T=413K)  Etot = 0.874eV
Energy per atom: Epot = 0.660eV  Ekin = 0.208eV (T=1611K)  Etot = 0.868eV
Energy per atom: Epot = 0.632eV  Ekin = 0.224eV (T=1736K)  Etot = 0.857eV
Energy per atom: Epot = 0.869eV  Ekin = 0.005eV (T= 39K)  Etot = 0.874eV
Energy per atom: Epot = 0.781eV  Ekin = 0.088eV (T=682K)  Etot = 0.869eV
Energy per atom: Epot = 0.756eV  Ekin = 0.117eV (T=903K)  Etot = 0.872eV
Energy per atom: Epot = 0.706eV  Ekin = 0.164eV (T=1270K)  Etot = 0.870eV
Energy per atom: Epot = 0.703eV  Ekin = 0.158eV (T=1223K)  Etot = 0.861eV
Energy per atom: Epot = 0.867eV  Ekin = 0.006eV (T= 47K)  Etot = 0.873eV
Energy per atom: Epot = 0.704eV  Ekin = 0.160eV (T=1241K)  Etot = 0.864eV
Energy per atom: Epot = 0.679eV  Ekin = 0.190eV (T=1468K)  Etot = 0.868eV
Energy per atom: Epot = 0.822eV  Ekin = 0.051eV (T=391K)  Etot = 0.872eV
Energy per atom: Epot = 0.782eV  Ekin = 0.088eV (T=677K)  Etot = 0.869eV
Energy per atom: Epot = 0.775eV  Ekin = 0.096eV (T=739K)  Etot = 0.871eV
Energy per atom: Epot = 0.751eV  Ekin = 0.119eV (T=920K)  Etot = 0.870eV
Energy per atom: Epot = 0.752eV  Ekin = 0.117eV (T=904K)  Etot = 0.869eV
Energy per atom: Epot = 0.764eV  Ekin = 0.107eV (T=825K)  Etot = 0.871eV
Energy per atom: Epot = 0.752eV  Ekin = 0.120eV (T=927K)  Etot = 0.871eV
Energy per atom: Epot = 0.747eV  Ekin = 0.123eV (T=953K)  Etot = 0.871eV
Energy per atom: Epot = 0.724eV  Ekin = 0.145eV (T=1119K)  Etot = 0.868eV
Energy per atom: Epot = 0.703eV  Ekin = 0.166eV (T=1281K)  Etot = 0.869eV
Energy per atom: Epot = 0.738eV  Ekin = 0.132eV (T=1022K)  Etot = 0.870eV
Energy per atom: Epot = 0.703eV  Ekin = 0.165eV (T=1280K)  Etot = 0.868eV
Energy per atom: Epot = 0.697eV  Ekin = 0.172eV (T=1331K)  Etot = 0.869eV
Energy per atom: Epot = 0.731eV  Ekin = 0.140eV (T=1083K)  Etot = 0.871eV
Energy per atom: Epot = 0.716eV  Ekin = 0.155eV (T=1199K)  Etot = 0.871eV
Energy per atom: Epot = 0.706eV  Ekin = 0.163eV (T=1257K)  Etot = 0.869eV
Energy per atom: Epot = 0.721eV  Ekin = 0.149eV (T=1151K)  Etot = 0.870eV
Energy per atom: Epot = 0.685eV  Ekin = 0.184eV (T=1424K)  Etot = 0.869eV
Energy per atom: Epot = 0.706eV  Ekin = 0.164eV (T=1271K)  Etot = 0.870eV
Energy per atom: Epot = 0.721eV  Ekin = 0.150eV (T=1161K)  Etot = 0.871eV
Energy per atom: Epot = 0.690eV  Ekin = 0.180eV (T=1396K)  Etot = 0.870eV
Energy per atom: Epot = 0.663eV  Ekin = 0.207eV (T=1601K)  Etot = 0.870eV
Energy per atom: Epot = 0.623eV  Ekin = 0.247eV (T=1907K)  Etot = 0.869eV
Energy per atom: Epot = 0.609eV  Ekin = 0.261eV (T=2022K)  Etot = 0.870eV
Energy per atom: Epot = 0.576eV  Ekin = 0.294eV (T=2271K)  Etot = 0.870eV
Energy per atom: Epot = 0.563eV  Ekin = 0.306eV (T=2370K)  Etot = 0.870eV
Energy per atom: Epot = 0.571eV  Ekin = 0.298eV (T=2307K)  Etot = 0.869eV
Energy per atom: Epot = 0.605eV  Ekin = 0.265eV (T=2050K)  Etot = 0.870eV
Energy per atom: Epot = 0.566eV  Ekin = 0.302eV (T=2338K)  Etot = 0.869eV
Energy per atom: Epot = 0.579eV  Ekin = 0.290eV (T=2242K)  Etot = 0.868eV
Energy per atom: Epot = 0.580eV  Ekin = 0.289eV (T=2234K)  Etot = 0.868eV
Energy per atom: Epot = 0.557eV  Ekin = 0.311eV (T=2406K)  Etot = 0.868eV
Energy per atom: Epot = 0.605eV  Ekin = 0.264eV (T=2046K)  Etot = 0.870eV
Energy per atom: Epot = 0.541eV  Ekin = 0.327eV (T=2530K)  Etot = 0.868eV
Energy per atom: Epot = 0.521eV  Ekin = 0.347eV (T=2686K)  Etot = 0.868eV
Energy per atom: Epot = 0.535eV  Ekin = 0.339eV (T=2622K)  Etot = 0.874eV
Energy per atom: Epot = 0.540eV  Ekin = 0.333eV (T=2574K)  Etot = 0.873eV
Energy per atom: Epot = 0.597eV  Ekin = 0.278eV (T=2151K)  Etot = 0.875eV
Energy per atom: Epot = 0.534eV  Ekin = 0.340eV (T=2631K)  Etot = 0.874eV
Energy per atom: Epot = 0.490eV  Ekin = 0.384eV (T=2968K)  Etot = 0.874eV
Energy per atom: Epot = 0.496eV  Ekin = 0.378eV (T=2928K)  Etot = 0.875eV
Energy per atom: Epot = 0.495eV  Ekin = 0.384eV (T=2970K)  Etot = 0.879eV
Energy per atom: Epot = 0.456eV  Ekin = 0.424eV (T=3277K)  Etot = 0.880eV
Energy per atom: Epot = 0.454eV  Ekin = 0.430eV (T=3325K)  Etot = 0.883eV
Energy per atom: Epot = 0.527eV  Ekin = 0.358eV (T=2767K)  Etot = 0.884eV
Energy per atom: Epot = 0.493eV  Ekin = 0.391eV (T=3027K)  Etot = 0.885eV
Energy per atom: Epot = 0.496eV  Ekin = 0.389eV (T=3011K)  Etot = 0.885eV
Energy per atom: Epot = 0.500eV  Ekin = 0.384eV (T=2971K)  Etot = 0.884eV
Energy per atom: Epot = 0.508eV  Ekin = 0.374eV (T=2895K)  Etot = 0.882eV
Energy per atom: Epot = 0.470eV  Ekin = 0.411eV (T=3176K)  Etot = 0.881eV
Energy per atom: Epot = 0.430eV  Ekin = 0.451eV (T=3489K)  Etot = 0.881eV
Energy per atom: Epot = 0.404eV  Ekin = 0.478eV (T=3696K)  Etot = 0.882eV
Energy per atom: Epot = 0.410eV  Ekin = 0.471eV (T=3645K)  Etot = 0.881eV
Energy per atom: Epot = 0.473eV  Ekin = 0.408eV (T=3160K)  Etot = 0.881eV
Energy per atom: Epot = 0.429eV  Ekin = 0.451eV (T=3490K)  Etot = 0.880eV
Energy per atom: Epot = 0.439eV  Ekin = 0.442eV (T=3423K)  Etot = 0.881eV
Energy per atom: Epot = 0.465eV  Ekin = 0.416eV (T=3216K)  Etot = 0.881eV
Energy per atom: Epot = 0.448eV  Ekin = 0.432eV (T=3346K)  Etot = 0.880eV
Energy per atom: Epot = 0.465eV  Ekin = 0.417eV (T=3222K)  Etot = 0.882eV
Energy per atom: Epot = 0.457eV  Ekin = 0.423eV (T=3274K)  Etot = 0.880eV
Energy per atom: Epot = 0.425eV  Ekin = 0.456eV (T=3528K)  Etot = 0.881eV
Energy per atom: Epot = 0.415eV  Ekin = 0.465eV (T=3598K)  Etot = 0.880eV
Energy per atom: Epot = 0.458eV  Ekin = 0.427eV (T=3301K)  Etot = 0.884eV
Energy per atom: Epot = 0.444eV  Ekin = 0.442eV (T=3423K)  Etot = 0.886eV
Energy per atom: Epot = 0.386eV  Ekin = 0.499eV (T=3858K)  Etot = 0.885eV
Energy per atom: Epot = 0.386eV  Ekin = 0.499eV (T=3864K)  Etot = 0.885eV
Energy per atom: Epot = 0.421eV  Ekin = 0.466eV (T=3609K)  Etot = 0.887eV
Energy per atom: Epot = 0.390eV  Ekin = 0.498eV (T=3854K)  Etot = 0.888eV
Energy per atom: Epot = 0.404eV  Ekin = 0.488eV (T=3777K)  Etot = 0.892eV
Energy per atom: Epot = 0.411eV  Ekin = 0.480eV (T=3714K)  Etot = 0.891eV
Energy per atom: Epot = 0.429eV  Ekin = 0.460eV (T=3558K)  Etot = 0.889eV
Energy per atom: Epot = 0.391eV  Ekin = 0.497eV (T=3848K)  Etot = 0.888eV
Energy per atom: Epot = 0.379eV  Ekin = 0.507eV (T=3924K)  Etot = 0.886eV
Energy per atom: Epot = 0.429eV  Ekin = 0.460eV (T=3555K)  Etot = 0.889eV
Energy per atom: Epot = 0.431eV  Ekin = 0.458eV (T=3545K)  Etot = 0.889eV
Energy per atom: Epot = 0.440eV  Ekin = 0.447eV (T=3458K)  Etot = 0.887eV
Energy per atom: Epot = 0.417eV  Ekin = 0.467eV (T=3610K)  Etot = 0.884eV
Energy per atom: Epot = 0.445eV  Ekin = 0.442eV (T=3422K)  Etot = 0.888eV
Energy per atom: Epot = 0.437eV  Ekin = 0.447eV (T=3461K)  Etot = 0.884eV
Energy per atom: Epot = 0.439eV  Ekin = 0.447eV (T=3460K)  Etot = 0.886eV
Energy per atom: Epot = 0.424eV  Ekin = 0.464eV (T=3587K)  Etot = 0.888eV
Energy per atom: Epot = 0.394eV  Ekin = 0.493eV (T=3817K)  Etot = 0.887eV
Energy per atom: Epot = 0.370eV  Ekin = 0.515eV (T=3984K)  Etot = 0.885eV
Energy per atom: Epot = 0.373eV  Ekin = 0.511eV (T=3951K)  Etot = 0.883eV
Energy per atom: Epot = 0.402eV  Ekin = 0.485eV (T=3748K)  Etot = 0.886eV
Energy per atom: Epot = 0.380eV  Ekin = 0.506eV (T=3912K)  Etot = 0.886eV
Energy per atom: Epot = 0.404eV  Ekin = 0.483eV (T=3736K)  Etot = 0.887eV
Energy per atom: Epot = 0.443eV  Ekin = 0.444eV (T=3436K)  Etot = 0.887eV
Energy per atom: Epot = 0.419eV  Ekin = 0.469eV (T=3626K)  Etot = 0.888eV
Energy per atom: Epot = 0.440eV  Ekin = 0.451eV (T=3489K)  Etot = 0.891eV
Energy per atom: Epot = 0.434eV  Ekin = 0.453eV (T=3505K)  Etot = 0.887eV
Energy per atom: Epot = 0.423eV  Ekin = 0.468eV (T=3619K)  Etot = 0.891eV
Energy per atom: Epot = 0.369eV  Ekin = 0.523eV (T=4047K)  Etot = 0.892eV
Energy per atom: Epot = 0.413eV  Ekin = 0.478eV (T=3701K)  Etot = 0.891eV
Energy per atom: Epot = 0.359eV  Ekin = 0.531eV (T=4108K)  Etot = 0.890eV
Energy per atom: Epot = 0.401eV  Ekin = 0.491eV (T=3799K)  Etot = 0.892eV
Energy per atom: Epot = 0.414eV  Ekin = 0.477eV (T=3688K)  Etot = 0.890eV
Energy per atom: Epot = 0.376eV  Ekin = 0.513eV (T=3969K)  Etot = 0.889eV
Energy per atom: Epot = 0.368eV  Ekin = 0.522eV (T=4035K)  Etot = 0.890eV
Energy per atom: Epot = 0.370eV  Ekin = 0.521eV (T=4029K)  Etot = 0.891eV
Energy per atom: Epot = 0.379eV  Ekin = 0.512eV (T=3959K)  Etot = 0.891eV
Energy per atom: Epot = 0.393eV  Ekin = 0.502eV (T=3881K)  Etot = 0.894eV
Energy per atom: Epot = 0.373eV  Ekin = 0.515eV (T=3984K)  Etot = 0.888eV
Energy per atom: Epot = 0.367eV  Ekin = 0.530eV (T=4100K)  Etot = 0.897eV
Energy per atom: Epot = 0.380eV  Ekin = 0.520eV (T=4026K)  Etot = 0.900eV
Energy per atom: Epot = 0.384eV  Ekin = 0.523eV (T=4048K)  Etot = 0.908eV
Energy per atom: Epot = 0.417eV  Ekin = 0.485eV (T=3751K)  Etot = 0.902eV
Energy per atom: Epot = 0.402eV  Ekin = 0.518eV (T=4005K)  Etot = 0.919eV
Energy per atom: Epot = 0.417eV  Ekin = 0.558eV (T=4313K)  Etot = 0.974eV
[4]:
# wrap and save traj in .xyz --- the .traj is a non human readable database file, xyz is much better
out_traj = ase.io.read('dyn_emt.traj', ':')
for at in out_traj:
    at.wrap()
    if 'momenta' in at.arrays: del at.arrays['momenta']
ase.io.write('dyn_emt.xyz', out_traj, 'xyz')

get e0 for H and O - energies of the isolated atoms

This is the energy of the isolated atom, will be in the teach_sparse string in the following format: e0={H:energy:O:energy}

[5]:
isolated_H = Atoms('H', calculator=EMT())
isolated_O = Atoms('O', calculator=EMT())

print('e0_H:',isolated_H.get_potential_energy())
print('e0_O:',isolated_O.get_potential_energy())

# this made the e0 string be the following: e0={H:3.21:O:4.6}
e0_H: 3.21
e0_O: 4.6

separate the dataset into a training and a validation set

As we have 120 frames from the 600fs MD, I will do it 60,60 with taking even and odd frames for the two

[6]:
ase.io.write('train.xyz', out_traj[0::2])
ase.io.write('validate.xyz', out_traj[1::2])
[7]:
out_traj[0].arrays.keys()
[7]:
dict_keys(['numbers', 'positions'])

train our GAP model from the command line

Will use a fit of 2b only, using the desciptor distance_2b.

Let’s understand how this works. The bash command takes named arguments separated by spaces.

  • teach_sparse the command which actually does the fit
  • e0={H:3.21:O:4.6} the energies of the isolated atoms
  • energy_parameter_name=energy force_parameter_name=forces names of the parameters
  • do_copy_at_file=F sparse_separate_file=T just needed, don’t want to copy the training data and using separate files for the xml makes it faster
  • gp_file=GAP.xml filename of the potential parameters, I have always used this name, because I had separate directories for the different trainings potentials
  • at_file=train.xyz training file
  • default_sigma={0.008 0.04 0 0} sigma values to be used for energies, forces, stresses, hessians in order; this represents the accuracy of the data and the relative weight of them in the fit (more accurate –> more significant in the fit)
  • gap={...} the potential to be fit, separated by ‘:’

distance_2b - cutoff=4.0 radial, practically the highest distance the descriptor takes into account - covariance_type=ard_se use gausses in the fit - delta=0.5 what relative portion of the things shall be determined by this potential - theta_uniform=1.0 width of the gaussians - sparse_method=uniform use uniform bins to choose the sparse points - add_species=T take the species into account, so it will generate more GAPs automatically (see the output) - n_sparse=10 number of sparse points

notice, that the script is running in parallel, using all 8 cores of the current machine

[8]:
! teach_sparse e0={H:3.21:O:4.6} energy_parameter_name=energy force_parameter_name=forces do_copy_at_file=F sparse_separate_file=T gp_file=GAP.xml at_file=train.xyz default_sigma={0.008 0.04 0 0} gap={distance_2b cutoff=4.0 covariance_type=ard_se delta=0.5 theta_uniform=1.0 sparse_method=uniform add_species=T n_sparse=10}

libAtoms::Hello World: 07/10/2018   18:10:52
libAtoms::Hello World: git version  https://github.com/libAtoms/QUIP.git,531330f-dirty
libAtoms::Hello World: QUIP_ARCH    linux_x86_64_gfortran_openmp
libAtoms::Hello World: compiled on  Jul  2 2018 at 21:44:13
libAtoms::Hello World: OpenMP parallelisation with 8 threads
WARNING: libAtoms::Hello World: environment variable OMP_STACKSIZE not set explicitly. The default value - system and compiler dependent - may be too small for some applications.
libAtoms::Hello World: Random Seed = 65452599
libAtoms::Hello World: global verbosity = 0

Calls to system_timer will do nothing by default


================================ Input parameters ==============================

at_file = train.xyz
gap = "distance_2b cutoff=4.0 covariance_type=ard_se delta=0.5 theta_uniform=1.0 sparse_method=uniform add_species=T n_sparse=10"
e0 = H:3.21:O:4.6
e0_offset = 0.0
do_e0_avg = T
default_sigma = "0.008 0.04 0 0"
sparse_jitter = 1.0e-10
hessian_delta = 1.0e-2
core_param_file = quip_params.xml
core_ip_args =
energy_parameter_name = energy
force_parameter_name = forces
virial_parameter_name = virial
hessian_parameter_name = hessian
config_type_parameter_name = config_type
sigma_parameter_name = sigma
config_type_sigma =
sigma_per_atom = T
do_copy_at_file = F
sparse_separate_file = T
sparse_use_actual_gpcov = F
gp_file = GAP.xml
verbosity = NORMAL
rnd_seed = -1
do_ip_timing = F
template_file = template.xyz

========================================  ======================================


============== Gaussian Approximation Potentials - Database fitting ============


Initial parsing of command line arguments finished.
Found 1 GAPs.
Descriptors have been parsed
XYZ file read
Old GAP: {distance_2b cutoff=4.0 covariance_type=ard_se delta=0.5 theta_uniform=1.0 sparse_method=uniform add_species=T n_sparse=10}
New GAP: {distance_2b cutoff=4.0 covariance_type=ard_se delta=0.5 theta_uniform=1.0 sparse_method=uniform               n_sparse=10 Z1=8 Z2=8}
New GAP: {distance_2b cutoff=4.0 covariance_type=ard_se delta=0.5 theta_uniform=1.0 sparse_method=uniform               n_sparse=10 Z1=8 Z2=1}
New GAP: {distance_2b cutoff=4.0 covariance_type=ard_se delta=0.5 theta_uniform=1.0 sparse_method=uniform               n_sparse=10 Z1=1 Z2=1}
Multispecies support added where requested
Number of target energies (property name: energy) found: 60
Number of target forces (property name: forces) found: 14580
Number of target virials (property name: virial) found: 0
Number of target Hessian eigenvalues (property name: hessian) found: 0
Cartesian coordinates transformed to descriptors
Started sparse covariance matrix calculation of coordinate 1

Finished sparse covariance matrix calculation of coordinate 1
TIMER: gpFull_covarianceMatrix_sparse_Coordinate1_sparse  done in .91605700000000012 cpu secs, .11785597354173660 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_Coordinate1         done in .91605700000000012 cpu secs, .11797238700091839 wall clock secs.
Started sparse covariance matrix calculation of coordinate 2

Finished sparse covariance matrix calculation of coordinate 2
TIMER: gpFull_covarianceMatrix_sparse_Coordinate2_sparse  done in 3.7722349999999998 cpu secs, .47368377633392811 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_Coordinate2         done in 3.7722349999999998 cpu secs, .47379869595170021 wall clock secs.
Started sparse covariance matrix calculation of coordinate 3

Finished sparse covariance matrix calculation of coordinate 3
TIMER: gpFull_covarianceMatrix_sparse_Coordinate3_sparse  done in 3.5482220000000000 cpu secs, .44678380154073238 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_Coordinate3         done in 3.5522219999999995 cpu secs, .44689681194722652 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_LinearAlgebra       done in .56005000000000749E-001 cpu secs, .13479800894856453E-001 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_FunctionValues      done in .00000000000000000E+000 cpu secs, .14016032218933105E-003 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse                     done in 8.3005189999999995 cpu secs, 1.0586479119956493 wall clock secs.
TIMER: GP sparsify                                        done in 9.8286140000000000 cpu secs, 1.4589964970946312 wall clock secs.

libAtoms::Finalise: 07/10/2018   18:10:55
libAtoms::Finalise: Bye-Bye!

use the potential with QUIP on trani.xyz and validate.xyz

[9]:
# calculate train.xyz

! quip E=T F=T atoms_filename=train.xyz param_filename=GAP.xml | grep AT | sed 's/AT//' >> quip_train.xyz
! quip E=T F=T atoms_filename=validate.xyz param_filename=GAP.xml | grep AT | sed 's/AT//' >> quip_validate.xyz

make simple plots of the energies and forces on the EMT and GAP datas

[16]:
def energy_plot(in_file, out_file, ax, title='Plot of energy'):
    """ Plots the distribution of energy per atom on the output vs the input"""
    # read files
    in_atoms = ase.io.read(in_file, ':')
    out_atoms = ase.io.read(out_file, ':')
    # list energies
    ener_in = [at.get_potential_energy() / len(at.get_chemical_symbols()) for at in in_atoms]
    ener_out = [at.get_potential_energy() / len(at.get_chemical_symbols()) for at in out_atoms]
    # scatter plot of the data
    ax.scatter(ener_in, ener_out)
    # get the appropriate limits for the plot
    for_limits = np.array(ener_in +ener_out)
    elim = (for_limits.min() - 0.05, for_limits.max() + 0.05)
    ax.set_xlim(elim)
    ax.set_ylim(elim)
    # add line of slope 1 for refrence
    ax.plot(elim, elim, c='k')
    # set labels
    ax.set_ylabel('energy by GAP / eV')
    ax.set_xlabel('energy by EMT / eV')
    #set title
    ax.set_title(title)
    # add text about RMSE
    _rms = rms_dict(ener_in, ener_out)
    rmse_text = 'RMSE:\n' + str(np.round(_rms['rmse'], 3)) + ' +- ' + str(np.round(_rms['std'], 3)) + 'eV/atom'
    ax.text(0.9, 0.1, rmse_text, transform=ax.transAxes, fontsize='large', horizontalalignment='right',
            verticalalignment='bottom')

def force_plot(in_file, out_file, ax, symbol='HO', title='Plot of force'):
    """ Plots the distribution of firce components per atom on the output vs the input
        only plots for the given atom type(s)"""

    in_atoms = ase.io.read(in_file, ':')
    out_atoms = ase.io.read(out_file, ':')

    # extract data for only one species
    in_force, out_force = [], []
    for at_in, at_out in zip(in_atoms, out_atoms):
        # get the symbols
        sym_all = at_in.get_chemical_symbols()
        # add force for each atom
        for j, sym in enumerate(sym_all):
            if sym in symbol:
                in_force.append(at_in.get_forces()[j])
                #out_force.append(at_out.get_forces()[j]) \
                out_force.append(at_out.arrays['force'][j]) # because QUIP and ASE use different names
    # convert to np arrays, much easier to work with
    #in_force = np.array(in_force)
    #out_force = np.array(out_force)
    # scatter plot of the data
    ax.scatter(in_force, out_force)
    # get the appropriate limits for the plot
    for_limits = np.array(in_force + out_force)
    flim = (for_limits.min() - 1, for_limits.max() + 1)
    ax.set_xlim(flim)
    ax.set_ylim(flim)
    # add line of
    ax.plot(flim, flim, c='k')
    # set labels
    ax.set_ylabel('force by GAP / (eV/Å)')
    ax.set_xlabel('force by EMT / (eV/Å)')
    #set title
    ax.set_title(title)
    # add text about RMSE
    _rms = rms_dict(in_force, out_force)
    rmse_text = 'RMSE:\n' + str(np.round(_rms['rmse'], 3)) + ' +- ' + str(np.round(_rms['std'], 3)) + 'eV/Å'
    ax.text(0.9, 0.1, rmse_text, transform=ax.transAxes, fontsize='large', horizontalalignment='right',
            verticalalignment='bottom')
[25]:
fig, ax_list = plt.subplots(nrows=3, ncols=2, gridspec_kw={'hspace': 0.3})
fig.set_size_inches(15, 20)
ax_list = ax_list.flat[:]

energy_plot('train.xyz', 'quip_train.xyz', ax_list[0], 'Energy on training data')
energy_plot('validate.xyz', 'quip_validate.xyz', ax_list[1], 'Energy on validation data')
force_plot('train.xyz', 'quip_train.xyz', ax_list[2], 'H', 'Force on training data - H')
force_plot('train.xyz', 'quip_train.xyz', ax_list[3], 'O', 'Force on training data - O')
force_plot('validate.xyz', 'quip_validate.xyz', ax_list[4], 'H', 'Force on validation data - H')
force_plot('validate.xyz', 'quip_validate.xyz', ax_list[5], 'O',  'Force on validation data - O')

# if you wanted to have the same limits on the firce plots
#for ax in ax_list[2:]:
#    flim = (-20, 20)
#    ax.set_xlim(flim)
#    ax.set_ylim(flim)
../_images/tutorials_gap_tutorial_in_docker_17_0.png

train our GAP_3b model from the command line

Let’s add three ody terms to the fit, which will hopefully improve it. We will be using the desciprtors distance_2b and angle_3b.

angle_3b - theta_fac=0.5 this takes the input data and determines the width from that; useful here, because the dimensions of the descriptor are different - n_sparse=50 higher dimensional space, more sparse points

both training and quip takes significantly more time than the last one!!!

[12]:
! teach_sparse e0={H:3.21:O:4.6} energy_parameter_name=energy force_parameter_name=forces do_copy_at_file=F sparse_separate_file=T gp_file=GAP_3b.xml at_file=train.xyz default_sigma={0.008 0.04 0 0} gap={distance_2b cutoff=4.0 covariance_type=ard_se delta=0.5 theta_uniform=1.0 sparse_method=uniform add_species=T n_sparse=10 : angle_3b cutoff=3.5 covariance_type=ard_se delta=0.5 theta_fac=0.5 add_species=T n_sparse=30 sparse_method=uniform}

libAtoms::Hello World: 07/10/2018   18:11:09
libAtoms::Hello World: git version  https://github.com/libAtoms/QUIP.git,531330f-dirty
libAtoms::Hello World: QUIP_ARCH    linux_x86_64_gfortran_openmp
libAtoms::Hello World: compiled on  Jul  2 2018 at 21:44:13
libAtoms::Hello World: OpenMP parallelisation with 8 threads
WARNING: libAtoms::Hello World: environment variable OMP_STACKSIZE not set explicitly. The default value - system and compiler dependent - may be too small for some applications.
libAtoms::Hello World: Random Seed = 65469770
libAtoms::Hello World: global verbosity = 0

Calls to system_timer will do nothing by default


================================ Input parameters ==============================

at_file = train.xyz
gap = "distance_2b cutoff=4.0 covariance_type=ard_se delta=0.5 theta_uniform=1.0 sparse_method=uniform add_species=T n_sparse=10 : angle_3b cutoff=3.5 covariance_type=ard_se delta=0.5 theta_fac=0.5 add_species=T n_sparse=30 sparse_method=uniform"
e0 = H:3.21:O:4.6
e0_offset = 0.0
do_e0_avg = T
default_sigma = "0.008 0.04 0 0"
sparse_jitter = 1.0e-10
hessian_delta = 1.0e-2
core_param_file = quip_params.xml
core_ip_args =
energy_parameter_name = energy
force_parameter_name = forces
virial_parameter_name = virial
hessian_parameter_name = hessian
config_type_parameter_name = config_type
sigma_parameter_name = sigma
config_type_sigma =
sigma_per_atom = T
do_copy_at_file = F
sparse_separate_file = T
sparse_use_actual_gpcov = F
gp_file = GAP_3b.xml
verbosity = NORMAL
rnd_seed = -1
do_ip_timing = F
template_file = template.xyz

========================================  ======================================


============== Gaussian Approximation Potentials - Database fitting ============


Initial parsing of command line arguments finished.
Found 2 GAPs.
Descriptors have been parsed
XYZ file read
Old GAP: {distance_2b cutoff=4.0 covariance_type=ard_se delta=0.5 theta_uniform=1.0 sparse_method=uniform add_species=T n_sparse=10}
New GAP: {distance_2b cutoff=4.0 covariance_type=ard_se delta=0.5 theta_uniform=1.0 sparse_method=uniform               n_sparse=10 Z1=8 Z2=8}
New GAP: {distance_2b cutoff=4.0 covariance_type=ard_se delta=0.5 theta_uniform=1.0 sparse_method=uniform               n_sparse=10 Z1=8 Z2=1}
New GAP: {distance_2b cutoff=4.0 covariance_type=ard_se delta=0.5 theta_uniform=1.0 sparse_method=uniform               n_sparse=10 Z1=1 Z2=1}
Old GAP: { angle_3b cutoff=3.5 covariance_type=ard_se delta=0.5 theta_fac=0.5 add_species=T n_sparse=30 sparse_method=uniform}
New GAP: { angle_3b cutoff=3.5 covariance_type=ard_se delta=0.5 theta_fac=0.5               n_sparse=30 sparse_method=uniform Z=8 Z1=8 Z2=8}
New GAP: { angle_3b cutoff=3.5 covariance_type=ard_se delta=0.5 theta_fac=0.5               n_sparse=30 sparse_method=uniform Z=8 Z1=8 Z2=1}
New GAP: { angle_3b cutoff=3.5 covariance_type=ard_se delta=0.5 theta_fac=0.5               n_sparse=30 sparse_method=uniform Z=8 Z1=1 Z2=1}
New GAP: { angle_3b cutoff=3.5 covariance_type=ard_se delta=0.5 theta_fac=0.5               n_sparse=30 sparse_method=uniform Z=1 Z1=8 Z2=8}
New GAP: { angle_3b cutoff=3.5 covariance_type=ard_se delta=0.5 theta_fac=0.5               n_sparse=30 sparse_method=uniform Z=1 Z1=8 Z2=1}
New GAP: { angle_3b cutoff=3.5 covariance_type=ard_se delta=0.5 theta_fac=0.5               n_sparse=30 sparse_method=uniform Z=1 Z1=1 Z2=1}
Multispecies support added where requested
Number of target energies (property name: energy) found: 60
Number of target forces (property name: forces) found: 14580
Number of target virials (property name: virial) found: 0
Number of target Hessian eigenvalues (property name: hessian) found: 0
Cartesian coordinates transformed to descriptors
Started sparse covariance matrix calculation of coordinate 1

Finished sparse covariance matrix calculation of coordinate 1
TIMER: gpFull_covarianceMatrix_sparse_Coordinate1_sparse  done in .89205500000000093 cpu secs, .11343244090676308 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_Coordinate1         done in .89205500000000093 cpu secs, .11355240270495415 wall clock secs.
Started sparse covariance matrix calculation of coordinate 2

Finished sparse covariance matrix calculation of coordinate 2
TIMER: gpFull_covarianceMatrix_sparse_Coordinate2_sparse  done in 3.6442280000000018 cpu secs, .45859701186418533 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_Coordinate2         done in 3.6442280000000018 cpu secs, .45871092379093170 wall clock secs.
Started sparse covariance matrix calculation of coordinate 3

Finished sparse covariance matrix calculation of coordinate 3
TIMER: gpFull_covarianceMatrix_sparse_Coordinate3_sparse  done in 3.4482159999999986 cpu secs, .43381043337285519 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_Coordinate3         done in 3.4482159999999986 cpu secs, .43392318300902843 wall clock secs.
Started sparse covariance matrix calculation of coordinate 4

Finished sparse covariance matrix calculation of coordinate 4
TIMER: gpFull_covarianceMatrix_sparse_Coordinate4_sparse  done in 9.9606220000000008 cpu secs, 1.2572825513780117 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_Coordinate4         done in 9.9606220000000008 cpu secs, 1.2574077267199755 wall clock secs.
Started sparse covariance matrix calculation of coordinate 5

Finished sparse covariance matrix calculation of coordinate 5
TIMER: gpFull_covarianceMatrix_sparse_Coordinate5_sparse  done in 50.367147000000003 cpu secs, 6.3618728630244732 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_Coordinate5         done in 50.367147000000003 cpu secs, 6.3619935847818851 wall clock secs.
Started sparse covariance matrix calculation of coordinate 6

Finished sparse covariance matrix calculation of coordinate 6
TIMER: gpFull_covarianceMatrix_sparse_Coordinate6_sparse  done in 55.067442000000000 cpu secs, 6.9305843152105808 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_Coordinate6         done in 55.067442000000000 cpu secs, 6.9306991323828697 wall clock secs.
Started sparse covariance matrix calculation of coordinate 7

Finished sparse covariance matrix calculation of coordinate 7
TIMER: gpFull_covarianceMatrix_sparse_Coordinate7_sparse  done in 25.993623999999983 cpu secs, 3.2758836857974529 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_Coordinate7         done in 25.993623999999983 cpu secs, 3.2760034948587418 wall clock secs.
Started sparse covariance matrix calculation of coordinate 8

Finished sparse covariance matrix calculation of coordinate 8
TIMER: gpFull_covarianceMatrix_sparse_Coordinate8_sparse  done in 112.66704200000001 cpu secs, 14.186459138989449 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_Coordinate8         done in 112.67104200000003 cpu secs, 14.186579804867506 wall clock secs.
Started sparse covariance matrix calculation of coordinate 9

Finished sparse covariance matrix calculation of coordinate 9
TIMER: gpFull_covarianceMatrix_sparse_Coordinate9_sparse  done in 103.17444799999998 cpu secs, 12.990166073665023 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_Coordinate9         done in 103.17444799999998 cpu secs, 12.990285659208894 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_LinearAlgebra       done in .44402700000000550 cpu secs, .12523609958589077 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse_FunctionValues      done in .00000000000000000E+000 cpu secs, .13986043632030487E-003 wall clock secs.
TIMER: gpFull_covarianceMatrix_sparse                     done in 365.71485500000000 cpu secs, 46.178485954180360 wall clock secs.
TIMER: GP sparsify                                        done in 368.54703300000000 cpu secs, 47.065343659371138 wall clock secs.

libAtoms::Finalise: 07/10/2018   18:12:12
libAtoms::Finalise: Bye-Bye!

use the potential with QUIP on trani.xyz and validate.xyz

[13]:
# calculate train.xyz

! quip E=T F=T atoms_filename=train.xyz param_filename=GAP_3b.xml | grep AT | sed 's/AT//' >> quip_3b_train.xyz
! quip E=T F=T atoms_filename=validate.xyz param_filename=GAP_3b.xml | grep AT | sed 's/AT//' >> quip_3b_validate.xyz

look at the outputs - clear improvement

[26]:
fig, ax_list = plt.subplots(nrows=3, ncols=2, gridspec_kw={'hspace': 0.3})
fig.set_size_inches(15, 20)
ax_list = ax_list.flat[:]

energy_plot('train.xyz', 'quip_3b_train.xyz', ax_list[0], 'Energy on training data')
energy_plot('validate.xyz', 'quip_3b_validate.xyz', ax_list[1], 'Energy on validation data')
force_plot('train.xyz', 'quip_3b_train.xyz', ax_list[2], 'H', 'Force on training data - H')
force_plot('train.xyz', 'quip_3b_train.xyz', ax_list[3], 'O', 'Force on training data - O')
force_plot('validate.xyz', 'quip_3b_validate.xyz', ax_list[4], 'H', 'Force on validation data - H')
force_plot('validate.xyz', 'quip_3b_validate.xyz', ax_list[5], 'O',  'Force on validation data - O')

# if you wanted to have the same limits on the firce plots
#for ax in ax_list[2:]:
#    flim = (-20, 20)
#    ax.set_xlim(flim)
#    ax.set_ylim(flim)
../_images/tutorials_gap_tutorial_in_docker_23_0.png
[ ]: