Checkpointing
Checkpointing adds restart and rollback capabilities to ASE scripts. It stores
the current state of the simulation (and its history) in an ase.db database.
Something like what follows is found in many ASE scripts:
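import os
import ase.io
import ase.optimize

# `a` is an Atoms object with a calculator attached, set up earlier in the script.
if os.path.exists('atoms_after_relax.traj'):
    a = ase.io.read('atoms_after_relax.traj')
else:
    ase.optimize.FIRE(a).run(fmax=0.01)
    ase.io.write('atoms_after_relax.traj', a)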
The idea behind checkpointing is to replace this manual checkpointing capability with a unified infrastructure.
Manual checkpointing
The class Checkpoint takes care of storing and retrieving information from the
database. This information always includes an Atoms object, and it can include
attached information on the internal state of the script.
- class ase.calculators.checkpoint.Checkpoint(db='checkpoints.db', logfile=None)
- load(atoms=None)
Retrieve checkpoint data from file. If an atoms object is specified, the calculator connected to that object is copied to all returned atoms objects.
Returns tuple of values as passed to flush or save during checkpoint write.
In order to use checkpointing, first create a Checkpoint object:
from ase.calculators.checkpoint import Checkpoint, NoCheckpoint
CP = Checkpoint()
You can optionally choose a database filename. The default is checkpoints.db.
Code blocks are wrapped into checkpointed regions:
try:
    a = CP.load()
except NoCheckpoint:
    ase.optimize.FIRE(a).run(fmax=0.01)
    CP.save(a)
The code block in the except statement is executed only if it has not yet
been executed in a previous run of the script. The save() statement stores all
of its parameters in the database.
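The control flow can be illustrated without ASE. What follows is a minimal sketch, assuming a toy store (ToyCheckpoint, a hypothetical stand-in backed by a JSON file rather than an ase.db) that numbers checkpointed regions in the order they are reached. It is not the ASE implementation and only illustrates why the except block runs once:

```python
import json
import os
import tempfile


class NoCheckpoint(Exception):
    """Raised by the toy store when a region has no saved data yet."""


class ToyCheckpoint:
    """Hypothetical stand-in for Checkpoint, backed by a JSON file."""

    def __init__(self, path):
        self.path = path
        self.region = 0  # index of the last checkpointed region reached

    def _read(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def load(self):
        # Each load() call opens the next region of the script.
        self.region += 1
        data = self._read()
        key = str(self.region)
        if key not in data:
            raise NoCheckpoint
        values = data[key]
        return values[0] if len(values) == 1 else tuple(values)

    def save(self, *values):
        # Store the values under the region opened by the last load().
        data = self._read()
        data[str(self.region)] = list(values)
        with open(self.path, 'w') as f:
            json.dump(data, f)


def run_script(path, log):
    """One 'run' of a script with a single checkpointed region."""
    cp = ToyCheckpoint(path)
    try:
        result = cp.load()
    except NoCheckpoint:
        log.append('computed')     # stands in for an expensive relaxation
        result = sum(range(1000))
        cp.save(result)
    return result


db_path = os.path.join(tempfile.mkdtemp(), 'toy_checkpoints.json')
log = []
first = run_script(db_path, log)   # no checkpoint yet: the block executes
second = run_script(db_path, log)  # checkpoint found: the block is skipped
```

Both calls return the same value, but the expensive branch is entered only during the first one.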
This is not yet much shorter than the above example. The checkpointing object
can, however, store arbitrary information alongside the Atoms object. Imagine
we have computed elastic constants and don’t want to recompute them. We can
then use:
try:
    a, C = CP.load()
except NoCheckpoint:
    C = fit_elastic_constants(a)
    CP.save(a, C)
Note that one parameter to save() needs to be an Atoms object; the others can
be arbitrary. The load() statement returns these parameters in the order they
were stored upon save. In the above example, the elastic constants are stored
attached to the atomic configuration. If the script is executed again after the
elastic constants have already been computed, it will skip that computation and
just use the stored value.
If the checkpointed region contains a single statement, such as the above, there is a shorthand notation available:
C = CP(fit_elastic_constants)(a)
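One way to picture this shorthand is as a function wrapper that first tries to load and falls back to calling the function. The following is a minimal sketch under that assumption; ToyStore is a hypothetical in-memory stand-in for the database and fit_constant is an illustrative placeholder, so only the call pattern mirrors the line above:

```python
class NoCheckpoint(Exception):
    """Raised when nothing has been stored for the current region."""


class ToyStore:
    """Hypothetical in-memory stand-in for a checkpoint database."""

    def __init__(self, saved=None):
        self.saved = {} if saved is None else saved  # region index -> value
        self.region = 0

    def load(self):
        self.region += 1
        if self.region not in self.saved:
            raise NoCheckpoint
        return self.saved[self.region]

    def save(self, value):
        self.saved[self.region] = value

    def __call__(self, func):
        # Wrap func so that calling it becomes a checkpointed region.
        def wrapped(*args, **kwargs):
            try:
                return self.load()
            except NoCheckpoint:
                value = func(*args, **kwargs)
                self.save(value)
                return value
        return wrapped


calls = []


def fit_constant(x):
    """Illustrative placeholder for an expensive fit."""
    calls.append(x)
    return 2 * x


first_run = ToyStore()
c1 = first_run(fit_constant)(21)              # computes and stores
second_run = ToyStore(saved=first_run.saved)  # simulates rerunning the script
c2 = second_run(fit_constant)(21)             # loads the stored value
```

The wrapped function is called twice, but the underlying function body executes only in the first simulated run.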
Sometimes it is necessary to checkpoint an iterative loop. If the script terminates within that loop, it is useful to resume calculation from the same loop position:
try:
    a, converged, tip_x, tip_y = CP.load()
except NoCheckpoint:
    converged = False
    tip_x = tip_x0
    tip_y = tip_y0
while not converged:
    # ... do something to find better crack tip position ...
    converged = ...
    CP.flush(a, converged, tip_x, tip_y)
The above code block is an example of an iterative search for a crack tip
position. Note that the convergence flag needs to be stored in the database
so that the loop is not executed again once convergence has been reached. The
flush() statement overwrites the last value stored in the database.
As a rule, save() has to be used inside an except NoCheckpoint statement and
flush() outside.
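The resume behaviour of such a loop can be sketched without ASE. This is a minimal illustration, assuming a hypothetical in-memory store (ToyStore) whose flush() overwrites the state of the current region, and a made-up Interrupted exception that simulates the script being killed in the middle of the loop:

```python
class NoCheckpoint(Exception):
    """Raised when nothing has been stored for the current region."""


class Interrupted(Exception):
    """Simulates the script being killed in the middle of the loop."""


class ToyStore:
    """Hypothetical in-memory stand-in for a checkpoint database."""

    def __init__(self, saved=None):
        self.saved = {} if saved is None else saved
        self.region = 0

    def load(self):
        self.region += 1
        if self.region not in self.saved:
            raise NoCheckpoint
        return self.saved[self.region]

    def flush(self, *values):
        # Overwrite whatever was stored for the current region.
        self.saved[self.region] = values


def search(store, target, max_steps=None):
    """Move x toward target by 1 per iteration, flushing state each time."""
    try:
        converged, x, steps = store.load()
    except NoCheckpoint:
        converged, x, steps = False, 0, 0
    steps_this_run = 0
    while not converged:
        if max_steps is not None and steps_this_run >= max_steps:
            raise Interrupted
        x += 1
        steps += 1
        steps_this_run += 1
        converged = (x == target)
        store.flush(converged, x, steps)
    return x, steps, steps_this_run


saved = {}
try:
    search(ToyStore(saved), target=5, max_steps=3)  # killed after 3 steps
    interrupted = False
except Interrupted:
    interrupted = True

resumed = search(ToyStore(saved), target=5)  # continues from x == 3
rerun = search(ToyStore(saved), target=5)    # already converged: no steps
```

Because the convergence flag is part of the flushed state, the third call skips the loop entirely.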
Automatic checkpointing with the checkpoint calculator
The CheckpointCalculator is a shorthand for wrapping every single energy/force
evaluation in a checkpointed region. It wraps the actual calculator.
- class ase.calculators.checkpoint.CheckpointCalculator(calculator, db='checkpoints.db', logfile=None)
This wraps any calculator object to checkpoint whenever a calculation is performed.
This is particularly useful for expensive calculators (e.g. DFT) and allows the use of complex workflows.
Example usage:
calc = ...
cp_calc = CheckpointCalculator(calc)
atoms.calc = cp_calc
e = atoms.get_potential_energy()
# 1st time, does calc, writes to checkfile
# subsequent runs, reads from checkpoint file
Basic calculator implementation.
- restart: str
Prefix for restart file. May contain a directory. Default is None: don’t restart.
- ignore_bad_restart_file: bool
Deprecated, please do not use. Ignore broken or missing restart file. By default, it is an error if the restart file is missing or broken. Passing more than one positional argument to Calculator() is deprecated and will stop working in the future.
- directory: str or PurePath
Working directory in which to read and write files and perform calculations.
- label: str
Name used for all files. Not supported by all calculators. May contain a directory, but please use the directory parameter for that instead.
- atoms: Atoms object
Optional Atoms object to which the calculator will be attached. When restarting, atoms will get its positions and unit-cell updated from file.
- implemented_properties: List[str] = ['energy', 'forces', 'stress', 'stresses', 'dipole', 'charges', 'magmom', 'magmoms', 'free_energy', 'energies']
Properties calculator can handle (energy, forces, …)
- calculate(atoms, properties, system_changes)
Do the calculation.
- properties: list of str
List of what needs to be calculated. Can be any combination of ‘energy’, ‘forces’, ‘stress’, ‘dipole’, ‘charges’, ‘magmom’ and ‘magmoms’.
- system_changes: list of str
List of what has changed since last calculation. Can be any combination of these six: ‘positions’, ‘numbers’, ‘cell’, ‘pbc’, ‘initial_charges’ and ‘initial_magmoms’.
Subclasses need to implement this, but can ignore properties and system_changes if they want. Calculated properties should be inserted into the results dictionary as shown in this dummy example:
self.results = {'energy': 0.0,
                'forces': np.zeros((len(atoms), 3)),
                'stress': np.zeros(6),
                'dipole': np.zeros(3),
                'charges': np.zeros(len(atoms)),
                'magmom': 0.0,
                'magmoms': np.zeros(len(atoms))}
The subclass implementation should first call this implementation to set the atoms attribute and create any missing directories.
Example usage:
calc = ...
cp_calc = CheckpointCalculator(calc)
atoms.calc = cp_calc
e = atoms.get_potential_energy()
The first call to get_potential_energy() does the actual calculation; a rerun
of the script will load energies and forces from the database. Note that this
is useful for calculations in which each energy evaluation is slow (e.g. DFT),
but it is not recommended for molecular dynamics with classical potentials,
since every single time step will be dumped to the database. This will generate
huge files.
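The trade-off can be sketched with a toy wrapper. This is a minimal illustration, not the ASE implementation: ToyCheckpointCalculator and CountingCalculator are hypothetical classes, and a plain list stands in for the database. The wrapper replays stored results in call order and only delegates to the wrapped calculator once the stored results run out:

```python
class CountingCalculator:
    """Illustrative stand-in for an expensive calculator."""

    def __init__(self):
        self.evaluations = 0

    def energy(self, positions):
        self.evaluations += 1
        return sum(x * x for x in positions)


class ToyCheckpointCalculator:
    """Hypothetical wrapper that replays stored results in call order."""

    def __init__(self, calculator, stored):
        self.calculator = calculator
        self.stored = stored  # list of results shared between 'runs'
        self.call_index = 0

    def energy(self, positions):
        if self.call_index < len(self.stored):
            result = self.stored[self.call_index]  # replay a stored result
        else:
            result = self.calculator.energy(positions)
            self.stored.append(result)             # one entry per evaluation
        self.call_index += 1
        return result


geometries = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]
stored = []

# First run: every evaluation is computed and stored.
calc1 = CountingCalculator()
cp1 = ToyCheckpointCalculator(calc1, stored)
energies1 = [cp1.energy(g) for g in geometries]

# Rerun: all results are replayed, the wrapped calculator is never called.
calc2 = CountingCalculator()
cp2 = ToyCheckpointCalculator(calc2, stored)
energies2 = [cp2.energy(g) for g in geometries]

# An MD-like loop adds one stored entry per time step.
md_stored = []
md = ToyCheckpointCalculator(CountingCalculator(), md_stored)
for step in range(1000):
    md.energy([0.001 * step])
```

With a handful of expensive evaluations the stored list stays small, whereas the 1000-step loop already produces 1000 entries, which is the growth the warning above refers to.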