RunnerData for Runners

RunnerData is attached to a row in the ASE database as a Python dictionary, and a Runner uses it to run the simulation for that row.

RunnerData description
Key	Value type	Description
'name'	str	Name given to the run
'scheduler_options'	dict	Data to define the workflow manager options
'parents'	list	Parent rows in the database. The runner waits for completion of these tasks before running the present row.
'files'	dict	Files required during the run
'tasks'	list	List of tasks, python and shell, to be performed for the run
'keep_run'	bool	Boolean indicating if the run folder is to be kept after completion

RunnerData is designed to simplify the generation of this data.
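For orientation, the table above can be read as a plain Python dictionary. The sketch below is not part of runner: `EXPECTED_TYPES` and `check_runner_data` are hypothetical names used only to show the value type of each key, and the example values are made up.

```python
# Hypothetical helper, not part of runner: it checks only the value types
# listed in the table above.
EXPECTED_TYPES = {
    "name": str,
    "scheduler_options": dict,
    "parents": list,
    "files": dict,
    "tasks": list,
    "keep_run": bool,
}


def check_runner_data(data):
    """Return a list of mismatches between data and the table above."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    return problems


# Illustrative values only.
example = {
    "name": "myEnergyRun",
    "scheduler_options": {"-N": 1, "-n": 16},
    "parents": [2, 3],
    "files": {"get_energy.py": "def main(atoms_list):\n    return atoms_list[0]\n"},
    "tasks": [["shell", "module load anaconda3"], ["python", "get_energy.py"]],
    "keep_run": False,
}

# An empty list means every key in the table is present with the listed type.
print(check_runner_data(example))
```

Whether runner itself requires every key to be present is not stated here; the helper only compares a dictionary against the table.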

Template data:

template is initialised as:

>>> runnerdata = runner.RunnerData('myEnergyRun')

Scheduler options:

are added as:

>>> runnerdata.scheduler_options = scheduler_options

Scheduler options are defined differently based on the workflow manager used:

Slurm scheduler options in RunnerData

Terminal scheduler options in RunnerData

Parents:

are added as:

>>> # setting parents as row ids 2 and 3 from the same database
>>> runnerdata.parents = [2, 3]

Files:

are added as:

>>> runnerdata.add_file('get_energy.py')
>>> runnerdata.add_files(['BASIS', 'POTENTIAL'])

File contents can be text (str) or binary (bytes).

Format for a python run file:

The file should define a main function; this function is called at execution.

The first argument of the main function should accept a list. This is the list of atoms rows. Index 0 is the atoms row of the run, and the remaining entries are the atoms rows of the parents, in the order defined in the parents list.

The remaining parameters are passed as **kwargs, as defined in the tasks.

The function should return an atoms object, which is added in place at the row being run.

The key_value_pairs stored in ase.Atoms.info of the returned atoms object are used to update the key-value pairs in the database.

The remaining entries of ase.Atoms.info are used to update ase.db.row.data.
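A minimal sketch of a run file that follows the rules above. It is not taken from runner. It assumes each entry of the list exposes an `info` dictionary in the way `ase.Atoms` does (if your version passes database rows, convert them first). The `scale` parameter, the `n_parents` and `note` entries, and the `FakeAtoms` stand-in are hypothetical; the stand-in exists only so the sketch runs without ASE or a database.

```python
def main(atoms_list, scale=1.0, **kwargs):
    """Entry point of the run file; extra task parameters arrive as kwargs."""
    atoms = atoms_list[0]      # the row being run
    parents = atoms_list[1:]   # parents, in the order of the parents list

    # Entries under 'key_value_pairs' are meant for the row's key-value pairs.
    key_value_pairs = atoms.info.setdefault("key_value_pairs", {})
    key_value_pairs["n_parents"] = len(parents)
    key_value_pairs["scale"] = scale

    # Any other entry of info is meant for the row's data.
    atoms.info["note"] = "processed by main"
    return atoms


class FakeAtoms:
    """Stand-in for ase.Atoms, used only to exercise main() here."""

    def __init__(self):
        self.info = {}


result = main([FakeAtoms(), FakeAtoms(), FakeAtoms()], scale=2.0)
print(result.info)
```

In a real run file the stand-in class and the call at the bottom would be dropped, leaving only `main`.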

Tasks:

Runner supports shell and python tasks. A shell task is added as:

>>> runnerdata.append_tasks('shell', 'module load anaconda3')

This will be run as:

$ module load anaconda3

A python task is added with the python file:

>>> runnerdata.append_tasks('python', 'get_energy.py')

This will be called from python code as:

>>> from get_energy import main
>>> main(atoms_list, **parameters)

Here, if parameters are defined in the task as:

>>> runnerdata.append_tasks('python', 'get_energy.py', {'param': 0})

then the parameters dict passes param=0 to the main function as **kwargs.

If the python task is to be executed with a different python command:

>>> runnerdata.append_tasks('python', 'get_energy.py', {},
...                         'mpirun -n 4 python3')

This will be run as:

$ mpirun -n 4 python3 get_energy.py
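The mapping from a python task entry to the command line shown above can be pictured with a small helper. `python_command_line` is a hypothetical function written for this page, not runner's implementation; it only reproduces the rule stated here, that an optional python command given after the parameters replaces the default `python`.

```python
def python_command_line(task):
    """Build the command line for a python task entry (illustrative only).

    task is a list such as
    ['python', 'get_energy.py', {}, 'mpirun -n 4 python3'];
    the parameters dict and the python command are optional.
    """
    task_type, filename, *rest = task
    if task_type != "python":
        raise ValueError("expected a python task")
    # rest[0], if present, is the parameters dict; rest[1] is the python command.
    pycommand = rest[1] if len(rest) > 1 else "python"
    return f"{pycommand} {filename}"


print(python_command_line(["python", "get_energy.py"]))
print(python_command_line(["python", "get_energy.py", {}, "mpirun -n 4 python3"]))
```

Because the command is read from the fourth position, the parameters slot cannot be skipped, even when it is empty.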

Note

When adding a custom python command without parameters, the third argument must still be given, as an empty dict ({}).

Multiple python tasks can be added to the task list. The atoms object returned by one python task is passed as input to the next python task.
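The chaining of python tasks can be pictured as a simple loop. This is a conceptual sketch, not runner's code: `run_python_tasks` is hypothetical, the tasks are plain functions instead of files, atoms are represented by dictionaries, and it assumes the returned object takes the place of entry 0 of the list while the parents stay unchanged.

```python
def run_python_tasks(tasks, atoms_list):
    """Call each (function, parameters) pair in order, feeding the result forward."""
    atoms_list = list(atoms_list)
    for func, params in tasks:
        # Assumption: the returned atoms replace the entry of the row being run.
        atoms_list[0] = func(atoms_list, **params)
    return atoms_list[0]


def relax(atoms_list, steps=1):
    atoms = dict(atoms_list[0])
    atoms["relax_steps"] = steps
    return atoms


def get_energy(atoms_list, param=0):
    atoms = dict(atoms_list[0])
    # relax() ran first, so its output is visible here.
    atoms["energy_input"] = (atoms["relax_steps"], param)
    return atoms


final = run_python_tasks(
    [(relax, {"steps": 5}), (get_energy, {"param": 0})],
    [{"symbols": "H2"}, {"symbols": "parent"}],
)
print(final)
```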

Keep run:

is set as:

>>> runnerdata.keep_run = True

Note

Failed run folders are not deleted regardless of keep_run value. This aids in the debugging of the run.

class runner.utils.runnerdata.RunnerData(name='untitled_run')

Class to handle runner data using helper functions

Example

>>> # typical runner data
>>> data = {'scheduler_options': {'-N': 1,
...                               '-n': 16,
...                               '-t': '0:5:0:0',
...                               '--mem-per-cpu': 2000},
...         'name': '<calculation name>',
...         'parents': [],
...         'tasks': [['python', '<filename>'],  # simple python run
...                   ['python', '<filename>', <params>],
...                   ['python', '<filename>', <params>, '<pycommand>'],
...                   ['shell', '<command>']],  # any shell command
...         'files': {'<filename1>': '<contents, string or bytes>',
...                   '<filename2>': '<contents, string or bytes>'
...                  },
...         'keep_run': False,
...         'log': ''}
>>> runnerdata = RunnerData.from_data_dict(data)

where:

<params>: a dictionary of parameters, or an empty {} for no parameters

<pycommand>: a string giving the python command, for example 'python3' or 'mpirun -n 4 python3'; the default is 'python'

keep_run: a bool; if True, the run folder is kept after the status is done, otherwise the run folder is deleted.

Alternatively, RunnerData can be used to generate the data stepwise, using the helper functions provided:

>>> runnerdata = RunnerData('<calculation name>')
>>> runnerdata.add_file('<filename>')
>>> runnerdata.append_tasks('python',
...                         '<filename>',
...                         params_dict,
...                         '<pycommand>')
>>> runnerdata.add_scheduler_options({'-N': 1,
...                                   '-n': 16,
...                                   '-t': '0:5:0:0',
...                                   '--mem-per-cpu': 2000})
>>> # and so on

Parameters: name (str) – name of RunnerData

data: dictionary of the runner data

add_file(filename, add_as=None)

Add file to runner data

Parameters:

filename (str) – name of the file
add_as (str) – name the file should be added as

add_files(filenames, add_as=None)

Adds files to runner data

Parameters:

filenames (list) – list of filenames to be added
add_as (list, optional) – list of names the files should be added as in the runner data
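How `filenames` and `add_as` pair up can be pictured with a small stand-alone helper. `collect_files` is hypothetical and is not how runner is implemented: it assumes the names are matched by position, that the file's base name is used when `add_as` is omitted, and it reads text files only.

```python
import tempfile
from pathlib import Path


def collect_files(filenames, add_as=None):
    """Read files and key their contents by the stored name (illustrative only)."""
    if add_as is None:
        # Assumption: without add_as, the base name of each file is used.
        add_as = [Path(name).name for name in filenames]
    if len(add_as) != len(filenames):
        raise ValueError("add_as must have one name per file")
    return {stored: Path(name).read_text()
            for name, stored in zip(filenames, add_as)}


with tempfile.TemporaryDirectory() as tmp:
    basis = Path(tmp) / "BASIS_v2"
    basis.write_text("basis set data\n")
    # Store the contents of BASIS_v2 under the name BASIS.
    files = collect_files([str(basis)], add_as=["BASIS"])

print(files)
```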

add_scheduler_options(scheduler_options)

Adds scheduler_options to runner data

Parameters: scheduler_options (dict) – dictionary of options

append_tasks(task_type, *args)

Appends task to tasks

Example

>>> rdat = runner.RunnerData()
>>> # shell task_type followed by shell command
>>> rdat.append_tasks('shell', 'module load anaconda3')
>>> # python task_type followed by python file
>>> rdat.append_tasks('python', 'get_energy.py')
>>> # python task_type with parameters
>>> rdat.append_tasks('python', 'get_energy.py', {'param': 0})
>>> # python task_type with python execute command
>>> # NB: the 3rd argument has to be the parameters; if there are
>>> # no parameters, an empty dict has to be given.
>>> # default: python <python file>
>>> # to execute: mpirun -n 4 python3 get_energy.py
>>> rdat.append_tasks('python', 'get_energy.py', {},
...                   'mpirun -n 4 python3')

Parameters:

task_type (str) – task type, 'shell' or 'python'
*args – arguments for the task type (see the example). For the shell task_type, args is the shell command (str). For the python task_type, args is the python filename (str), optionally followed by the parameters (dict) and the python execute command (str).

property files: Files in RunnerData

classmethod from_data_dict(data)

Construct RunnerData from data dictionary

Parameters: data (dict) – runnerdata dictionary
Returns: class defining runner data
Return type: RunnerData

classmethod from_db(database, id_)

Get RunnerData from database

Parameters:

database (str) – ase database
id_ (int) – id in the database

Returns:

class defining runner data

Return type:

RunnerData

classmethod from_json(filename)

Get RunnerData from json

Parameters: filename (str) – name of json file
Returns: class defining runner data
Return type: RunnerData

get_runner_data(_skip_empty_task_test=False)

Helper function to get complete runner data

Returns: dictionary containing all options to run a job:
str: name of the calculation, for tags
list: list of parents attached to the present job
list: list of tasks to perform
dict: dictionary of filenames as key and strings as value
Return type: dict

property keep_run: Stores bool, indicates if the run should be saved after completing tasks

Note

Failed run folders are not deleted regardless of keep_run value. This aids in the debugging of the run.

property name: Name of the RunnerData

property parents: Parent simulations of the row

property scheduler_options: Scheduler_options in RunnerData

property tasks: tasks in RunnerData

to_db(database, ids)

Add run data to the given ids in the database

Parameters:

database (str) – ase database
ids (int, or list) – ids in the database

to_json(filename)

Saves RunnerData to json

Parameters: filename (str) – name of json file
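For a feel of what a JSON file of runner data might hold, the sketch below round-trips a plain data dictionary with the standard `json` module. It does not call `to_json` or `from_json`, and the on-disk layout runner actually uses is not documented here. The only point is that the keys and value types of the data dictionary survive JSON serialisation when file contents are strings; bytes contents would need an encoding step that this sketch does not cover.

```python
import json

# Illustrative values only; the layout follows the data dictionary shown earlier.
data = {
    "name": "myEnergyRun",
    "scheduler_options": {"-N": 1, "-n": 16},
    "parents": [2, 3],
    "tasks": [["python", "get_energy.py", {"param": 0}]],
    "files": {
        "get_energy.py": "def main(atoms_list, param=0):\n    return atoms_list[0]\n"
    },
    "keep_run": True,
}

text = json.dumps(data, indent=2)   # what would be written to a file
restored = json.loads(text)         # what would be read back

print(restored == data)
```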