mlair.run_modules.training

Training module.

Module Contents

Classes:

Training – Train your model with this module.

Attributes:

mlair.run_modules.training.__date__ = '2019-12-05'

class mlair.run_modules.training.Training

Bases: mlair.run_modules.run_environment.RunEnvironment

Train your model with this module.
This module is not required if only a fresh post-processing is performed. Either remove the training call from your run script or set create_new_model and train_model both to false.
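For example, in a run script built on the mlair.run entry point (a sketch; only the two flag names are taken from the text above, and mlair must be installed and configured):

```python
# Sketch of a run script that skips training and reuses the existing
# model, so that only post-processing is performed.
import mlair

mlair.run(train_model=False, create_new_model=False)
```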
Schedule of training:

- set_generators(): set generators for training, validation and testing and distribute according to batch size
- make_predict_function(): create predict function before distribution on multiple nodes (detailed information in method description)
- train(): start or resume training of model and save callbacks
- save_model(): save best model from training as final model
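The four-step schedule above can be sketched as a plain-Python skeleton (placeholder bodies for illustration, not MLAir's implementation):

```python
class TrainingSkeleton:
    """Skeleton of the training schedule (placeholder bodies, not MLAir code)."""

    def __init__(self):
        self.calls = []  # record the call order for illustration

    def set_generators(self):
        # set train/val/test generators, distributed by batch size
        self.calls.append("set_generators")

    def make_predict_function(self):
        # pre-compile the predict function before distribution
        self.calls.append("make_predict_function")

    def train(self):
        # start or resume training and save callbacks
        self.calls.append("train")

    def save_model(self):
        # persist the best model from training as the final model
        self.calls.append("save_model")

    def _run(self):
        # the schedule listed above, executed in order
        self.set_generators()
        self.make_predict_function()
        self.train()
        self.save_model()
```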
Required objects [scope] from data store:

- model [model]
- batch_size [.]
- epochs [.]
- callbacks [model]
- model_name [model]
- experiment_name [.]
- experiment_path [.]
- train_model [.]
- create_new_model [.]
- generator [train, val, test]
- plot_path [.]

Optional objects:

- permute_data [train, val, test]
- upsampling [train, val, test]

Sets:

- model [.]

Creates:

- <exp_name>_model-best.h5
- <exp_name>_model-best-callbacks-<name>.h5 (all callbacks from CallbackHandler)
- history.json
- history_lr.json (optional)
- <exp_name>_history_<name>.pdf (different monitoring plots depending on loss metrics and callbacks)
make_predict_function(self) → None

Create predict function.

Must be called before distributing the model. This is necessary because tf compiles the predict function only at the moment it is first used, which can cause problems if the model is distributed across different workers. To prevent this, the function is pre-compiled. See the discussion at https://stackoverflow.com/questions/40850089/is-keras-thread-safe/43393252#43393252
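The trick can be illustrated with a minimal tf.keras model (a sketch, not MLAir's code): running one dummy prediction up front forces Keras to build the predict function in the main process, before any worker touches the model.

```python
# Sketch: pre-compile the (otherwise lazily built) predict function
# by running one dummy prediction before distributing the model.
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(4,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="sgd", loss="mse")

# one dummy batch triggers the compilation of the predict function
_ = model.predict(np.zeros((1, 4)), verbose=0)
```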
_set_gen(self, mode: str) → None

Set and distribute the generator for the given mode with regard to the batch size.

Parameters:

- mode – name of the subset, one of ["train", "val", "test"]
set_generators(self) → None

Set all generators for the training, validation, and testing subsets.

The called sub-method automatically distributes the data according to the batch size. The subsets can be accessed as the class variables train_set, val_set, and test_set.
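The "distribute according to batch size" step boils down to slicing each subset's samples into batches; a minimal stand-alone sketch (not MLAir's generator classes):

```python
def make_batches(samples, batch_size):
    """Split a sequence of samples into consecutive batches of batch_size
    (the last batch may be smaller)."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

# make_batches(list(range(5)), 2) -> [[0, 1], [2, 3], [4]]
```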
train(self) → None

Perform training using keras fit().

Callbacks are stored locally in the experiment directory. The best model from training is saved to the class variable model. If the checkpoint file path is not empty, this method assumes that this is not a new training starting from the very beginning, but the resumption of a previously started and interrupted (or stopped and now continued) training. In that case, train() automatically loads the locally stored information and the corresponding model and proceeds with the already started training.
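The resume decision can be sketched with a hypothetical helper (not MLAir's implementation): if the checkpoint and its stored history exist, training continues after the last recorded epoch, otherwise it starts from scratch.

```python
import json
import os

def resume_epoch(checkpoint_path, history_path):
    """Return the epoch to resume from: 0 for a fresh training, otherwise
    the number of epochs already recorded in the stored history."""
    if not (os.path.exists(checkpoint_path) and os.path.exists(history_path)):
        return 0
    with open(history_path) as f:
        history = json.load(f)
    # keras counts epochs from 0, so len(loss) is the next epoch to run
    return len(history.get("loss", []))
```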
save_model(self) → None

Save the model in the local experiment directory. The model is named <experiment_name>_<custom_model_name>.h5.
save_callbacks_as_json(self, history: tensorflow.keras.callbacks.Callback, lr_sc: tensorflow.keras.callbacks.Callback, epo_timing: tensorflow.keras.callbacks.Callback) → None

Save callbacks (history, learning rate) of training.

- history.history -> history.json
- lr_sc.lr -> history_lr.json

Parameters:

- history – history object of training
- lr_sc – learning rate object
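Persisting such a mapping amounts to a json.dump of the callback's metric dictionary; a minimal sketch (not MLAir's exact code), with a plain dict standing in for history.history:

```python
import json

def dump_history(history_dict, path):
    """Write the per-epoch metrics of a training run to a JSON file.

    history_dict maps metric names to per-epoch value lists, as in
    keras History.history.
    """
    with open(path, "w") as f:
        json.dump(history_dict, f, indent=2)

# dump_history({"loss": [0.9, 0.5], "val_loss": [1.0, 0.7]}, "history.json")
```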
create_monitoring_plots(self, history: tensorflow.keras.callbacks.Callback, lr_sc: tensorflow.keras.callbacks.Callback, epoch_best: int = None) → None

Create plots of the history and the learning rate as functions of the number of epochs.

The plots are saved in the experiment's plot_path. The history plot is named <exp_name>_history_loss_val_loss.pdf, the learning rate plot <exp_name>_history_learning_rate.pdf.

Parameters:

- history – keras history object with the losses to plot (must include at least loss and val_loss)
- lr_sc – learning rate decay object with an 'lr' attribute
- epoch_best – number of the best epoch (counting starts at 0)
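Because epoch counting starts at 0, the best epoch is simply the index of the smallest validation loss; for example:

```python
def best_epoch(val_loss):
    """Index (0-based) of the epoch with the lowest validation loss."""
    return val_loss.index(min(val_loss))

# best_epoch([1.0, 0.6, 0.4, 0.5]) -> 2
```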
report_training(self)