API Reference
Job Script API
The following functions may be used in your job scripts that are executed via cluster_utils.
- cluster_utils.initialize_job(cmd_line: list[str] | None = None, verbose: bool = True, dynamic: bool = True) -> AttributeDict
Read parameters from the command line and register with the cluster_utils server.
This function is intended to be called at the beginning of your job scripts. It does two things at once:
parse the command line arguments to get the parameters for the job, and
if server information is provided via command line arguments, register with the cluster_utils server (i.e. the main process that orchestrates the job execution).
- cluster_utils.finalize_job(metrics: MutableMapping[str, float], params) -> None
Save metrics and parameters and send metrics to the cluster_utils server.
Save the used parameters and resulting metrics to CSV files (file names defined by CLUSTER_PARAM_FILE and CLUSTER_METRIC_FILE) in the job’s working directory and report the metrics to the cluster_utils main process. Make sure to call this function at the end of your job script; otherwise cluster_utils will not receive the resulting metrics and will consider the job failed.
- Parameters:
- metrics: MutableMapping[str, float]
Dictionary with metrics that should be sent to the server.
- params
Parameters that were used to run the job (as returned by initialize_job()).
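A minimal job script combining the two functions might look like the sketch below. The parameter names learning_rate and num_epochs and the train() function are hypothetical stand-ins for your own settings and code. The import of cluster_utils is placed inside main() only so that train() can be imported and tested without cluster_utils installed; the script itself would end with the usual if __name__ == "__main__": main() guard.

```python
from typing import Dict


def train(learning_rate: float, num_epochs: int) -> Dict[str, float]:
    """Hypothetical stand-in for real training; returns the final metrics."""
    loss = 1.0
    for _ in range(num_epochs):
        loss *= 1.0 - learning_rate  # pretend each epoch shrinks the loss
    return {"loss": loss}


def main() -> None:
    # Imported here only so that train() can be used without cluster_utils installed.
    import cluster_utils

    # First call: read the job parameters and register with the server (if any).
    params = cluster_utils.initialize_job()

    # learning_rate and num_epochs are hypothetical parameter names.
    metrics = train(params.learning_rate, params.num_epochs)

    # Last call: write the CSV files and report the metrics to the server.
    cluster_utils.finalize_job(metrics, params)
```

Keeping the actual work in a plain function that returns the metrics dictionary makes it easy to test locally and to hand the result to finalize_job() as the final step.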
- cluster_utils.exit_for_resume() -> None
Send a “resume” request to the cluster_utils server and exit with return code 3.
Use this to split a single long-running job into multiple shorter jobs: regularly save the state of the job (e.g. checkpoints) and call this function to have the job restarted, so that the new run can continue from the saved state.
See Restart jobs using exit_for_resume() for more information.
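The sketch below shows one way to structure such a job, assuming params.working_dir is the job's working directory. The JSON checkpoint format, the file name checkpoint.json, and the parameters num_epochs and epochs_per_run are hypothetical choices for this example. The import of cluster_utils sits inside main() so the helpers can be tested without it; the script would call main() from its if __name__ == "__main__": block.

```python
import json
import os


def load_checkpoint(path: str) -> dict:
    """Return the saved training state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "loss": 1.0}


def save_checkpoint(path: str, state: dict) -> None:
    """Write the state to a temporary file first, then atomically replace the checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def train_some_epochs(state: dict, total_epochs: int, max_epochs_this_run: int) -> bool:
    """Advance state by at most max_epochs_this_run epochs; return True once training is done."""
    for _ in range(max_epochs_this_run):
        if state["epoch"] >= total_epochs:
            break
        state["loss"] *= 0.9  # stand-in for one epoch of real training
        state["epoch"] += 1
    return state["epoch"] >= total_epochs


def main() -> None:
    import cluster_utils  # imported here so the helpers above work without it

    params = cluster_utils.initialize_job()
    # Assumes params.working_dir is the job's working directory; num_epochs and
    # epochs_per_run are hypothetical parameters of this example.
    checkpoint_path = os.path.join(params.working_dir, "checkpoint.json")

    state = load_checkpoint(checkpoint_path)
    finished = train_some_epochs(state, params.num_epochs, params.epochs_per_run)
    if not finished:
        save_checkpoint(checkpoint_path, state)
        # Request a restart; this exits the process with return code 3.
        cluster_utils.exit_for_resume()

    cluster_utils.finalize_job({"loss": state["loss"]}, params)
```

Writing to a temporary file and then calling os.replace() means that a job interrupted while saving still leaves the previous checkpoint intact for the next run.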
- cluster_utils.announce_early_results(metrics)
Report intermediate results to cluster_utils.
Results reported with this function are used by the hyperparameter optimization to stop bad jobs early (see the kill_bad_jobs_early option).
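The sketch below reports the loss after every epoch, assuming announce_early_results() accepts a dictionary of metric names to values in the same form as finalize_job(). The training loop receives the reporting function as an argument, so it can be exercised with a plain list instead of cluster_utils; num_epochs is a hypothetical parameter, and main() would be called from the script's if __name__ == "__main__": block.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]


def train(num_epochs: int, report: Callable[[Metrics], None]) -> Metrics:
    """Hypothetical training loop that reports the loss after every epoch."""
    loss = 1.0
    for _ in range(num_epochs):
        loss *= 0.9  # stand-in for one epoch of real training
        report({"loss": loss})  # intermediate result, used for early stopping
    return {"loss": loss}


def main() -> None:
    import cluster_utils  # imported here so train() works without it

    params = cluster_utils.initialize_job()
    metrics = train(params.num_epochs, report=cluster_utils.announce_early_results)
    cluster_utils.finalize_job(metrics, params)
```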
- cluster_utils.announce_fraction_finished(fraction_finished: float) -> None
Report job progress to cluster_utils.
You may use this function to report the progress of the job. If progress is reported, cluster_utils uses it to estimate the remaining duration of the job.
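As a sketch, a training loop can pass the fraction of completed steps to this function. The reporting interval report_every and the parameter num_steps are hypothetical choices for this example; the interval simply avoids sending a report on every single step. The reporting function is passed in as an argument so the loop can be tested without cluster_utils, and main() would be called from the script's if __name__ == "__main__": block.

```python
from typing import Callable


def train(num_steps: int, report_progress: Callable[[float], None], report_every: int = 100) -> float:
    """Hypothetical training loop that reports its progress every report_every steps."""
    loss = 1.0
    for step in range(1, num_steps + 1):
        loss *= 0.99  # stand-in for one real training step
        # Report periodically and always once at the very end.
        if step % report_every == 0 or step == num_steps:
            report_progress(step / num_steps)
    return loss


def main() -> None:
    import cluster_utils  # imported here so train() works without it

    params = cluster_utils.initialize_job()
    # num_steps is a hypothetical parameter of this example.
    loss = train(params.num_steps, report_progress=cluster_utils.announce_fraction_finished)
    cluster_utils.finalize_job({"loss": loss}, params)
```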
- cluster_utils.cluster_main(main_func=None, **read_params_args)
Decorator for your main function to automatically register with cluster_utils.
Use this as a decorator to automatically wrap a function (usually main) with calls to initialize_job() and finalize_job(). The parameters read by initialize_job() will be passed as keyword arguments to the function. Further, the function is expected to return the metrics dictionary expected by finalize_job(). See Using the cluster_main Decorator for a usage example.
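The sketch below shows the shape of a function suitable for cluster_main(): it takes the job parameters as keyword arguments and returns the metrics dictionary. Here cluster_main() is applied as a plain function call inside main() rather than with @ syntax, only so that run() stays importable without cluster_utils; decorating run() with @cluster_utils.cluster_main and calling it without arguments should be equivalent. The parameter names learning_rate and num_epochs are hypothetical.

```python
from typing import Dict


def run(learning_rate: float, num_epochs: int, **other_params) -> Dict[str, float]:
    """Job body: receives the parameters as keyword arguments and returns the metrics.

    learning_rate and num_epochs are hypothetical parameter names; **other_params
    absorbs any parameters this function does not use.
    """
    loss = 1.0
    for _ in range(num_epochs):
        loss *= 1.0 - learning_rate  # stand-in for one epoch of real training
    return {"loss": loss}


def main() -> None:
    import cluster_utils  # imported here so run() works without it

    # The wrapper returned by cluster_main() calls initialize_job(), passes the
    # parameters to run() as keyword arguments, and hands the returned
    # dictionary to finalize_job().
    cluster_utils.cluster_main(run)()
```

Accepting **other_params keeps the function from failing when the parameter set contains entries it does not need.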
Output Filenames
The constants listed below define the names of output files that are written by cluster_utils. They are listed here so that other parts of the documentation can reference them.
- cluster_utils.base.constants.CLUSTER_METRIC_FILE = 'metrics.csv'
Name of the CSV file to which resulting metrics of a job are saved.
- cluster_utils.base.constants.CLUSTER_PARAM_FILE = 'param_choice.csv'
Name of the CSV file to which used parameters of a job are saved.
- cluster_utils.base.constants.JSON_SETTINGS_FILE = 'settings.json'
Name of the JSON file to which used parameters of a job are saved.
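After a run, you can inspect a job's results by reading these files from its working directory. The sketch below assumes that each CSV file consists of a header row followed by a single row of values; this layout is an assumption, so check it against files produced by your cluster_utils version. The file names are copied from the constants above so the snippet runs standalone; in your own code you could import them from cluster_utils.base.constants instead.

```python
import csv
import os
from typing import Dict, Tuple

# Values of the constants documented above, repeated so this runs standalone.
CLUSTER_METRIC_FILE = "metrics.csv"
CLUSTER_PARAM_FILE = "param_choice.csv"


def read_single_row_csv(path: str) -> Dict[str, str]:
    """Read a CSV file that is assumed to hold a header row and one data row."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if len(rows) != 1:
        raise ValueError(f"expected one data row in {path}, found {len(rows)}")
    return rows[0]


def read_job_results(working_dir: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Return (metrics, params) for one job; parameter values stay strings."""
    raw_metrics = read_single_row_csv(os.path.join(working_dir, CLUSTER_METRIC_FILE))
    metrics = {name: float(value) for name, value in raw_metrics.items()}
    params = read_single_row_csv(os.path.join(working_dir, CLUSTER_PARAM_FILE))
    return metrics, params
```

Parameter values are left as strings because a CSV file does not record their original types.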
Deprecated API
The following functions can still be used but are deprecated and may be removed in a future release. Do not use them in new code. The description of each function explains how existing code should be updated.
- cluster_utils.read_params_from_cmdline(cmd_line: list[str] | None = None, make_immutable: bool = True, verbose: bool = True, dynamic: bool = True, save_params: bool = True) -> AttributeDict
Alias for initialize_job().
- Deprecated:
This function is deprecated and will be removed in a future release. Use initialize_job() instead.
- cluster_utils.save_metrics_params(metrics: MutableMapping[str, float], params) -> None
Alias for finalize_job().
- Deprecated:
This function is deprecated and will be removed in a future release. Use finalize_job() instead.