RunnerData for Runners

RunnerData attaches to a row in the ase database as a python dictionary, and is used by a Runner to run the simulation for that row.

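RunnerData description

===================  ==========  ===========================================================
Key                  Value type  Description
===================  ==========  ===========================================================
'name'               str         Name given to the run
'scheduler_options'  dict        Data to define the workflow manager options
'parents'            list        Parent rows in the database. The runner waits for completion of these tasks before running the present row.
'files'              dict        Files required during the run
'tasks'              list        List of tasks, python and shell, to be performed for the run
'keep_run'           bool        Boolean indicating if the run folder is to be kept after completion
===================  ==========  ===========================================================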

The RunnerData class is designed to simplify the generation of this data.
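To make the layout described above concrete, the dictionary can also be written by hand. The sketch below is not part of runner: `EXPECTED_TYPES` and `check_runner_data` are hypothetical names, used only to show which value type each key is expected to hold.

```python
# Expected value type for each key, taken from the description above.
EXPECTED_TYPES = {
    'name': str,
    'scheduler_options': dict,
    'parents': list,
    'files': dict,
    'tasks': list,
    'keep_run': bool,
}


def check_runner_data(data):
    """Return a list of problems found in a hand-written data dict.

    Hypothetical helper for illustration only.
    """
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in data:
            problems.append(f'missing key: {key}')
        elif not isinstance(data[key], expected):
            problems.append(f'{key} should be {expected.__name__}')
    return problems


data = {
    'name': 'myEnergyRun',
    'scheduler_options': {},
    'parents': [2, 3],
    'files': {'get_energy.py': 'def main(atoms_rows, **kwargs): ...'},
    'tasks': [['shell', 'module load anaconda3'],
              ['python', 'get_energy.py', {'param': 0}]],
    'keep_run': False,
}
problems = check_runner_data(data)
```

An empty `problems` list means every key is present with the expected type; it says nothing about whether the values are meaningful to a particular workflow manager.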

Template data:

template is initialised as:

>>> runnerdata = runner.RunnerData('myEnergyRun')

Scheduler options:

are added as:

>>> runnerdata.scheduler_options = scheduler_options

The format of the scheduler options depends on the workflow manager used.

Parents:

are added as:

>>> # setting parents as row ids 2 and 3 from the same database
>>> runnerdata.parents = [2, 3]

Files:

are added as:

>>> runnerdata.add_file('get_energy.py')
>>> runnerdata.add_files(['BASIS', 'POTENTIAL'])

The file contents can be either text (str) or binary (bytes).
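The resulting files entry maps a name to the file contents. The following sketch mimics that mapping without using runner; `read_file_contents` and `build_files` are hypothetical helpers, and falling back to bytes when a file is not valid UTF-8 is an assumption of this sketch, not documented runner behaviour.

```python
from pathlib import Path


def read_file_contents(filename):
    """Return file contents as str, or as bytes if it is not UTF-8 text.

    Hypothetical helper; the UTF-8 fallback rule is an assumption.
    """
    raw = Path(filename).read_bytes()
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw


def build_files(filenames, add_as=None):
    """Map the names used in the run to the file contents."""
    # Without add_as, each file keeps the name it was given (assumption).
    names = list(add_as) if add_as is not None else list(filenames)
    if len(names) != len(filenames):
        raise ValueError('add_as must have one name per file')
    return {name: read_file_contents(f) for name, f in zip(names, filenames)}
```

For example, `build_files(['get_energy.py', 'POTENTIAL'])` would give a dict with two keys, each holding a str or bytes value depending on the file.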

Format for a python run file:

  • The file should define a main function; this function is called at execution.

  • The first argument of the main function should take a list. This is the list of atoms rows: index 0 is the atoms row of the run, and the rest are the atoms rows of the parents, in the order defined in the parents list.

  • The remaining parameters are passed as **kwargs, as defined in the tasks.

  • The function should return an atoms object, which is added in place at the row being run.

  • The key_value_pairs stored in ase.Atoms.info of the returned atoms object are updated in the database.

  • The ase.db.row.data is updated with the rest of ase.Atoms.info.
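A run file following the format above could look like the minimal sketch below. The calculation itself is left as a placeholder, and the keys `n_parents` and `parameters_used` are made up for illustration; only `main`, the list argument, `**kwargs`, and the use of `atoms.info` come from the description.

```python
def main(atoms_rows, **kwargs):
    """Entry point called when the task is executed.

    atoms_rows[0] is the row being run; atoms_rows[1:] are the parent
    rows, in the order given in the parents list.
    """
    row, parent_rows = atoms_rows[0], atoms_rows[1:]
    atoms = row.toatoms()  # ase.db row -> ase.Atoms

    # Placeholder result; a real run file would do its calculation here.
    n_parents = len(parent_rows)

    # Entries under 'key_value_pairs' update the row's key-value pairs.
    atoms.info['key_value_pairs'] = {'n_parents': n_parents}
    # The remaining entries of atoms.info update the row's data.
    atoms.info['parameters_used'] = dict(kwargs)
    return atoms
```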

Tasks:

Runner supports shell and python tasks. A shell task can be added as:

>>> runnerdata.append_tasks('shell', 'module load anaconda3')

This will be run as:

$ module load anaconda3

A python task is added by giving the python file:

>>> runnerdata.append_tasks('python', 'get_energy.py')

This will be called from python code as:

>>> from get_energy import main
>>> main(atoms_list, **parameters)

Here, if parameters are defined in the python task as:

>>> runnerdata.append_tasks('python', 'get_energy.py', {'param': 0})

then the parameters dict passes param=0 to the main function as a keyword argument.
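The keyword-argument passing can be tried in isolation. In this sketch the `main` function is a stand-in defined inline, not the contents of a real get_energy.py, and the strings in `atoms_list` are placeholders for atoms rows.

```python
def main(atoms_list, param=1, **kwargs):
    """Stand-in for the main function of get_energy.py."""
    return {'n_rows': len(atoms_list), 'param': param, 'extra': kwargs}


# Task entry as stored in the tasks list: type, file, parameters.
task = ['python', 'get_energy.py', {'param': 0}]

# Parameters are optional; a missing entry behaves like an empty dict.
parameters = task[2] if len(task) > 2 else {}

atoms_list = ['row being run', 'parent row 2', 'parent row 3']  # placeholders
result = main(atoms_list, **parameters)
```

Because the dict is unpacked with `**`, every key in the task parameters must be a valid keyword for `main`, either as a named argument or absorbed by `**kwargs`.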

If a python task is to be executed with a different python command, give the command as the fourth argument:

>>> runnerdata.append_tasks('python', 'get_energy.py', {},
...                         'mpirun -n 4 python3')

This will be run as:

$ mpirun -n 4 python3 get_energy.py

Note

  • When adding a custom python command without parameters, the third argument must still be given, as an empty dict ({}).

  • Multiple python functions can be added to the task list. The returned atoms object of one python task is sent as an input to the next python task.
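The relation between a task entry and the command shown for it can be summarised in a small function. `task_command` is a hypothetical helper that only restates the rules above, including the default python command; how runner arranges for main to be called with the atoms list is not covered here.

```python
def task_command(task):
    """Return the command line implied by one entry of the tasks list.

    Hypothetical helper; it only restates the rules described above.
    """
    task_type, *args = task
    if task_type == 'shell':
        (command,) = args  # a shell task carries exactly one string
        return command
    if task_type == 'python':
        filename = args[0]
        # args[1], if present, is the parameters dict; it is not part
        # of the command line.
        py_command = args[2] if len(args) > 2 else 'python'
        return f'{py_command} {filename}'
    raise ValueError(f'unknown task type: {task_type!r}')
```

The positional layout is the reason the empty dict is needed: without it, the custom command would sit in the slot reserved for parameters.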

Keep run:

is set as:

>>> runnerdata.keep_run = True

Note

Failed run folders are not deleted regardless of keep_run value. This aids in the debugging of the run.
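The rule in this note can be written as a small decision function. This is a sketch only: `should_delete_run_folder` is a hypothetical name, and the status strings 'done' and 'failed' are assumptions made for the example.

```python
def should_delete_run_folder(status, keep_run):
    """Decide whether a run folder is removed after the run ends.

    The status strings 'done' and 'failed' are assumptions of this sketch.
    """
    if status == 'failed':
        return False  # failed runs are always kept for debugging
    return status == 'done' and not keep_run
```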

class runner.utils.runnerdata.RunnerData(name='untitled_run')

Class to handle runner data using helper functions

Example

>>> # typical runner data
>>> data =  {'scheduler_options': {'-N': 1,
...                                '-n': 16,
...                                '-t': '0:5:0:0',
...                                '--mem-per-cpu': 2000},
...          'name': '<calculation name>',
...          'parents': [],
...          'tasks': [['python', '<filename>'], # simple python run
...                    ['python', '<filename>', <params>],
...                    ['python', '<filename>', <params>, '<pycommand>'],
...                    ['shell', '<command>']], # any shell command
...          'files': {'<filename1>': '<contents, string or bytes>',
...                    '<filename2>': '<contents, string or bytes>'
...                   },
...          'keep_run': False,
...          'log': ''}
>>> runnerdata = RunnerData.from_data_dict(data)

where:

  • <params>: can be a dictionary of parameters, or an empty {} for no parameters

  • <pycommand>: a string giving the python command, for example 'python3' or 'mpirun -n 4 python3'; the default is 'python'

  • keep_run: a bool; if True, the run folder is kept after the run reaches status done, otherwise the run folder is deleted.

Alternatively, RunnerData can be used to generate the data stepwise, using the methods provided:

>>> runnerdata = RunnerData('<calculation name>')
>>> runnerdata.add_file('<filename>')
>>> runnerdata.append_tasks('python',
...                         '<filename>',
...                         params_dict,
...                         '<pycommand>')
>>> runnerdata.add_scheduler_options({'-N': 1,
...                                   '-n': 16,
...                                   '-t': '0:5:0:0',
...                                   '--mem-per-cpu': 2000})
>>> # and so on

Parameters:

name (str) – name of RunnerData

data

dictionary of the runner data

add_file(filename, add_as=None)

Add file to runner data

Parameters:
  • filename (str) – name of the file

  • add_as (str) – name the file should be added as

add_files(filenames, add_as=None)

Adds files to runner data

Parameters:
  • filenames (list) – list of filenames to be added

  • add_as (list, optional) – list of names the files should be added as in the runner data

add_scheduler_options(scheduler_options)

Adds scheduler_options to runner data

Parameters:

scheduler_options (dict) – dictionary of options

append_tasks(task_type, *args)

Appends task to tasks

Example

>>> rdat = runner.RunnerData()
>>> # shell task_type followed by shell command
>>> rdat.append_tasks('shell', 'module load anaconda3')
>>> # python task_type followed by python file
>>> rdat.append_tasks('python', 'get_energy.py')
>>> # python task_type with parameters
>>> rdat.append_tasks('python', 'get_energy.py', {'param': 0})
>>> # python task_type with python execute command
>>> # NB: the 3rd argument has to be parameters, if no parameters
>>> # empty dict has to be given.
>>> # default: python <python file>
>>> # to execute: mpirun -n 4 python3 get_energy.py
>>> rdat.append_tasks('python', 'get_energy.py', {},
...                   'mpirun -n 4 python3')

Parameters:
  • task_type (str) – task type, ‘shell’ or ‘python’

  • *args – arguments for the task type (see the example). For the 'shell' task type, args is the shell command (str). For the 'python' task type, args are the python filename (str), optionally followed by the parameters (dict) and the python execute command (str).

property files

Files in RunnerData

classmethod from_data_dict(data)

Construct RunnerData from data dictionary

Parameters:

data (dict) – runnerdata dictionary

Returns:

class defining runner data

Return type:

RunnerData

classmethod from_db(database, id_)

get RunnerData from database

Parameters:
  • database (str) – ase database

  • id_ (int) – id in the database

Returns:

class defining runner data

Return type:

RunnerData

classmethod from_json(filename)

get RunnerData from json

Parameters:

filename (str) – name of json file

Returns:

class defining runner data

Return type:

RunnerData

get_runner_data(_skip_empty_task_test=False)

helper function to get complete runner data

Returns:

dictionary containing all options to run a job: the name of the calculation (str), used for tags; the list of parents attached to the present job (list); the list of tasks to perform (list); and the files (dict), with filenames as keys and their contents as string values

Return type:

dict

property keep_run

Stores bool, indicates if the run should be saved after completing tasks

Note

Failed run folders are not deleted regardless of keep_run value. This aids in the debugging of the run.

property name

Name of the RunnerData

property parents

Parent simulations of the row

property scheduler_options

Scheduler_options in RunnerData

property tasks

tasks in RunnerData

to_db(database, ids)

add run data to ids in database

Parameters:
  • database (str) – ase database

  • ids (int, or list) – ids in the database

to_json(filename)

Saves RunnerData to json

Parameters:

filename (str) – name of json file