RunnerData for Runners
RunnerData attaches to a row in the ase database as a python
dictionary, and is used by a Runner to run the simulation for
that row.
| Key | Value type | Description |
|---|---|---|
| 'name' | str | Name given to the run |
| 'scheduler_options' | dict | Data to define the workflow manager options |
| 'parents' | list | Parent rows in the database. The runner waits for completion of these tasks before running the present row. |
| 'files' | dict | Files required during the run |
| 'tasks' | list | List of tasks, python and shell, to be performed for the run |
| 'keep_run' | bool | Boolean indicating if the run folder is to be kept after completion |
RunnerData is designed to simplify the generation of this data.
- Template data:
template is initialised as:
>>> import runner
>>> runnerdata = runner.RunnerData('myEnergyRun')
- Scheduler options:
are added as:
>>> runnerdata.scheduler_options = scheduler_options
Scheduler options are defined differently depending on the workflow manager used.
- Parents:
are added as:
>>> # setting parents as row ids 2 and 3 from the same database
>>> runnerdata.parents = [2, 3]
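The waiting rule for parents can be pictured as the check below, a hypothetical sketch rather than the runner's own code. The status mapping from row id to a status string is an assumption; 'done' is the status label used later in this document.

```python
def parents_done(parents, status):
    """Return True when every parent row id has status 'done' (hypothetical helper)."""
    # Assumption: status maps row id -> status string for rows in the same database.
    return all(status.get(row_id) == 'done' for row_id in parents)


status = {2: 'done', 3: 'running'}   # hypothetical row statuses
print(parents_done([2, 3], status))  # prints False
status[3] = 'done'
print(parents_done([2, 3], status))  # prints True
```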
- Files:
are added as:
>>> runnerdata.add_file('get_energy.py')
>>> runnerdata.add_files(['BASIS', 'POTENTIAL'])
The files can be string or binary.
Format for a python run file:
The file should have a main function, which is called at execution. The first argument of main should take a list: the list of atoms rows. The 0th index is the atoms row of the run, and the remaining entries are the atoms rows of the parents, in the order defined in the parents list. Any further parameters are passed as **kwargs, as defined in the tasks.
The function should return an atoms object, which is added in place at the row being run. The key_value_pairs stored in ase.Atoms.info of the returned atoms object are updated in the database, and ase.db.row.data is updated with the rest of ase.Atoms.info.
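A minimal python run file following this format might look like the sketch below. It only touches the info dictionary of each atoms object and copies with copy.deepcopy, so it does not import ase; the scale keyword and the info entries it writes are hypothetical examples of a parameter and a result.

```python
import copy


def main(atoms_list, scale=1.0, **kwargs):
    """Entry point called by the Runner (sketch of a get_energy.py-style file).

    atoms_list[0] is the atoms of the row being run; atoms_list[1:] are the
    atoms of the parent rows, in the order given in the parents list.
    """
    atoms = copy.deepcopy(atoms_list[0])  # work on a copy of this row's atoms
    parents = atoms_list[1:]
    # Hypothetical results: how many parents were received and the value of
    # the (hypothetical) 'scale' parameter passed in from the task definition.
    atoms.info['n_parents'] = len(parents)
    atoms.info['scale'] = scale
    # Any other task parameters arrive in kwargs and are ignored here.
    return atoms  # written back in place to the row being run
```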
- Tasks:
Runner supports shell and python tasks. A shell task can be added as:
>>> runnerdata.append_tasks('shell', 'module load anaconda3')
This will be run as:
$ module load anaconda3
A python task is added with the python file:
>>> runnerdata.append_tasks('python', 'get_energy.py')
This will be called from python code as:
>>> from get_energy import main
>>> main(atoms_list, **parameters)
If python parameters are defined in the task as:
>>> runnerdata.append_tasks('python', 'get_energy.py', {'param': 0})
then the parameters dict passes param=0 to the main function as **kwargs.
If the python task is to be executed with a different python command:
>>> runnerdata.append_tasks('python', 'get_energy.py', {},
...                         'mpirun -n 4 python3')
This will be run as:
$ mpirun -n 4 python3 get_energy.py
Note
When adding a custom python command without parameters, the third argument must be an empty parameter dict, {}, which is passed to the function.
Multiple python tasks can be added to the task list. The atoms object returned by one python task is passed as input to the next python task.
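The chaining of python tasks can be sketched as a loop in which each task's returned atoms object replaces the first entry of the list handed to the next task. run_python_tasks is hypothetical: it takes already imported main functions with their parameter dicts instead of file names, and keeping the parent entries unchanged between tasks is an assumption.

```python
def run_python_tasks(atoms_list, tasks):
    """Run (main, params) pairs in order, chaining the returned atoms (hypothetical)."""
    atoms_list = list(atoms_list)  # leave the caller's list untouched
    for main, params in tasks:
        # Assumption: parents stay in place; only entry 0 (this row) is
        # replaced by the atoms returned from the previous task.
        atoms_list[0] = main(atoms_list, **params)
    return atoms_list[0]


# Plain numbers stand in for atoms objects in this toy example.
def add(atoms_list, n=0):
    return atoms_list[0] + n


def double(atoms_list):
    return atoms_list[0] * 2


print(run_python_tasks([1], [(add, {'n': 2}), (double, {})]))  # prints 6
```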
- Keep run:
is set as:
>>> runnerdata.keep_run = True
Note
Failed run folders are not deleted regardless of keep_run value. This aids in the debugging of the run.
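Taken together with keep_run, the cleanup rule reduces to a single condition. should_delete_run is a hypothetical name used only to make the rule explicit.

```python
def should_delete_run(keep_run, failed):
    """Return True if the run folder would be removed after the run (hypothetical)."""
    # Failed runs are always kept for debugging; successful runs are kept
    # only when keep_run is True.
    return not failed and not keep_run


for keep_run in (False, True):
    for failed in (False, True):
        print(f'keep_run={keep_run}, failed={failed}: '
              f'delete={should_delete_run(keep_run, failed)}')
```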
- class runner.utils.runnerdata.RunnerData(name='untitled_run')
Class to handle runner data using helper functions
Example
>>> # typical runner data
>>> data = {'scheduler_options': {'-N': 1,
...                               '-n': 16,
...                               '-t': '0:5:0:0',
...                               '--mem-per-cpu': 2000},
...         'name': '<calculation name>',
...         'parents': [],
...         'tasks': [['python', '<filename>'],  # simple python run
...                   ['python', '<filename>', <params>],
...                   ['python', '<filename>', <params>, '<pycommand>'],
...                   ['shell', '<command>']],  # any shell command
...         'files': {'<filename1>': '<contents, string or bytes>',
...                   '<filename2>': '<contents, string or bytes>'},
...         'keep_run': False,
...         'log': ''}
>>> runnerdata = RunnerData.from_data_dict(data)
where:
<params>: a dictionary of parameters, or an empty {} for no parameters
<pycommand>: the python command as a string, for example 'python3' or 'mpirun -n 4 python3'; the default is 'python'
keep_run: a bool; if True, the run folder is kept after the status is done, otherwise it is deleted.
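Following the task entry shapes above and the default <pycommand> of 'python', the command line that launches each task can be sketched as below. task_command is a hypothetical helper; it mirrors the command lines quoted in the tasks description but does not show how the runner passes the atoms list and parameters to main.

```python
def task_command(task):
    """Return the command line for one task entry (hypothetical sketch)."""
    task_type, *args = task
    if task_type == 'shell':
        # Shell tasks: the single argument is the command itself.
        return args[0]
    if task_type == 'python':
        filename = args[0]
        # args[1], if present, is the parameters dict; it is handed to main()
        # as **kwargs, so it does not appear on the command line.
        pycommand = args[2] if len(args) > 2 else 'python'  # documented default
        return f'{pycommand} {filename}'
    raise ValueError(f'unknown task type: {task_type!r}')


for task in [['shell', 'module load anaconda3'],
             ['python', 'get_energy.py'],
             ['python', 'get_energy.py', {'param': 0}],
             ['python', 'get_energy.py', {}, 'mpirun -n 4 python3']]:
    print(task_command(task))
# module load anaconda3
# python get_energy.py
# python get_energy.py
# mpirun -n 4 python3 get_energy.py
```

Because the command is read from a fixed position in the entry, the parameters slot has to hold {} when only a custom command is wanted.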
Alternatively, RunnerData can generate the data stepwise using the methods provided:
>>> runnerdata = RunnerData('<calculation name>')
>>> runnerdata.add_file('<filename>')
>>> runnerdata.append_tasks('python',
...                         '<filename>',
...                         params_dict,
...                         '<pycommand>')
>>> runnerdata.add_scheduler_options({'-N': 1,
...                                   '-n': 16,
...                                   '-t': '0:5:0:0',
...                                   '--mem-per-cpu': 2000})
>>> # and so on
- Parameters:
name (str) – name of RunnerData
- data
dictionary of the runner data
- add_file(filename, add_as=None)
Add file to runner data
- Parameters:
filename (str) – name of the file
add_as (str) – name the file should be added as
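Since file contents may be text or binary, a file could be read so that text stays a str and anything else stays bytes, as sketched below. read_file_contents is hypothetical; using the base name when add_as is not given, and falling back to bytes on a UnicodeDecodeError, are assumptions rather than documented behaviour of add_file.

```python
from pathlib import Path


def read_file_contents(filename, add_as=None):
    """Return (name, contents): str for UTF-8 text, bytes otherwise (hypothetical)."""
    path = Path(filename)
    raw = path.read_bytes()
    try:
        contents = raw.decode('utf-8')
    except UnicodeDecodeError:
        contents = raw  # keep binary files as bytes
    # Assumption: without add_as, the file is stored under its base name.
    name = add_as if add_as is not None else path.name
    return name, contents
```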
- add_files(filenames, add_as=None)
Adds files to runner data
- Parameters:
filenames (list) – list of filenames to be added
add_as (list, optional) – list of names the files should be added as in the runner data
- add_scheduler_options(scheduler_options)
Adds scheduler_options to runner data
- Parameters:
scheduler_options (dict) – dictionary of options
- append_tasks(task_type, *args)
Appends task to tasks
Example
>>> import runner
>>> rdat = runner.RunnerData()
>>> # shell task_type followed by shell command
>>> rdat.append_tasks('shell', 'module load anaconda3')
>>> # python task_type followed by python file
>>> rdat.append_tasks('python', 'get_energy.py')
>>> # python task_type with parameters
>>> rdat.append_tasks('python', 'get_energy.py', {'param': 0})
>>> # python task_type with python execute command
>>> # NB: the 3rd argument must be the parameters; if there are
>>> # no parameters, an empty dict must be given.
>>> # default: python <python file>
>>> # to execute: mpirun -n 4 python3 get_energy.py
>>> rdat.append_tasks('python', 'get_energy.py', {},
...                   'mpirun -n 4 python3')
- Parameters:
task_type (str) – task type, ‘shell’ or ‘python’
*args – arguments for the task type (see example): for the shell task_type, args is the shell command (str); for the python task_type, args are the python filename (str), parameters (dict), and python execute command (str)
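Judging from the tasks list in the class example, each append_tasks call adds one list entry of the form [task_type, *args]. The stand-alone append_task below is a hypothetical re-creation for illustration; its argument checks (a known task type, one shell command, and a dict in the parameters slot) are assumptions, not the library's validation.

```python
def append_task(tasks, task_type, *args):
    """Append one task entry such as ['python', 'get_energy.py', {}] (hypothetical)."""
    if task_type == 'shell':
        if len(args) != 1:
            raise ValueError('shell tasks take exactly one command string')
    elif task_type == 'python':
        if not 1 <= len(args) <= 3:
            raise ValueError('python tasks take filename, params, pycommand')
        if len(args) >= 2 and not isinstance(args[1], dict):
            # The parameters slot must be a dict, even an empty one.
            raise TypeError('python parameters must be a dict')
    else:
        raise ValueError(f'unknown task type: {task_type!r}')
    tasks.append([task_type, *args])
    return tasks


tasks = []
append_task(tasks, 'shell', 'module load anaconda3')
append_task(tasks, 'python', 'get_energy.py', {}, 'mpirun -n 4 python3')
print(tasks)
```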
- property files
Files in RunnerData
- classmethod from_data_dict(data)
Construct RunnerData from data dictionary
- Parameters:
data (dict) – runnerdata dictionary
- Returns:
class defining runner data
- Return type:
RunnerData
- classmethod from_db(database, id_)
Get RunnerData from a database row
- Parameters:
database (str) – ase database
id_ (int) – id of the row in the database
- Returns:
class defining runner data
- Return type:
RunnerData
- classmethod from_json(filename)
Get RunnerData from a JSON file
- Parameters:
filename (str) – name of json file
- Returns:
class defining runner data
- Return type:
RunnerData
- get_runner_data(_skip_empty_task_test=False)
Helper function to get the complete runner data
- Returns:
containing all options to run a job
str: name of the calculation, for tags
list: list of parents attached to the present job
list: list of tasks to perform
dict: dictionary of filenames as keys and strings as values
- Return type:
dict
- property keep_run
Stores a bool indicating whether the run folder should be kept after the tasks complete
Note
Failed run folders are not deleted regardless of keep_run value. This aids in the debugging of the run.
- property name
Name of the RunnerData
- property parents
Parent simulations of the row
- property scheduler_options
Scheduler_options in RunnerData
- property tasks
tasks in RunnerData