:py:mod:`mlair.run_modules.post_processing`
===========================================

.. py:module:: mlair.run_modules.post_processing

.. autoapi-nested-parse::

   Post-processing module.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   mlair.run_modules.post_processing.PostProcessing


Attributes
~~~~~~~~~~

.. autoapisummary::

   mlair.run_modules.post_processing.__author__
   mlair.run_modules.post_processing.__date__


.. py:data:: __author__
   :annotation: = Lukas Leufen, Felix Kleinert


.. py:data:: __date__
   :annotation: = 2019-12-11


.. py:class:: PostProcessing

   Bases: :py:obj:`mlair.run_modules.run_environment.RunEnvironment`

   Perform post-processing for performance evaluation.

   Schedule of post-processing:

   #. train an ordinary least squares (OLS) model for reference
   #. create forecasts for NN, OLS, and persistence
   #. evaluate feature importance with bootstrapped predictions
   #. calculate skill scores
   #. create plots

   Required objects [scope] from data store:

   * `model` [.] or locally saved model plus `model_name` [model] and `model` [model]
   * `generator` [train, val, test, train_val]
   * `forecast_path` [.]
   * `plot_path` [postprocessing]
   * `model_path` [.]
   * `target_var` [.]
   * `sampling` [.]
   * `output_shape` [model]
   * `evaluate_feature_importance` [postprocessing] and if enabled:

     * `create_new_bootstraps` [postprocessing]
     * `bootstrap_path` [postprocessing]
     * `number_of_bootstraps` [postprocessing]

   Optional objects:

   * `batch_size` [model]

   Creates:

   * forecasts in `forecast_path` if enabled
   * bootstraps in `bootstrap_path` if enabled
   * plots in `plot_path`

   .. py:method:: _run(self)


   .. py:method:: estimate_sample_uncertainty(self, separate_ahead=False)

      Estimate sample uncertainty by using a bootstrap approach.

      Forecasts are split into individual blocks along time and randomly drawn with replacement. The resulting
      behaviour of the error indicates the robustness of each analysed model and helps to quantify which model
      might be superior to the others.


   .. py:method:: report_sample_uncertainty(self, percentiles: list = None)

      Store the raw results of the uncertainty estimate, calculate aggregate statistics, and store these as raw
      data as well as in Markdown and LaTeX format.


   .. py:method:: calculate_block_mse(self, evaluate_competitors=True, separate_ahead=False, block_length='1m')

      Transform data into blocks along the time axis.

      Block length can be any frequency like '1m' or '7d'. Data are only split along the time axis, which means
      that a single block can vary widely in the number of stations or the amount of actual data it contains.
      This is intended to analyse the robustness not only over time but also against the number of observations
      and the diversity of stations.


   .. py:method:: create_error_array(self, data)

      Calculate the squared error of all given time series in relation to the observation.


   .. py:method:: create_full_time_dim(data, dim, sampling, start, end)
      :staticmethod:

      Ensure that the time dimension is equidistant.

      Dates can be missing from the time axis if missing values have been dropped.


   .. py:method:: load_competitors(self, station_name: str) -> xarray.DataArray

      Load all requested and available competitors for a given station.

      Forecasts must be available in the competitor path like `//forecasts__test.nc`. The naming style is equal
      for all forecasts of MLAir, so that forecasts of a different experiment can easily be copied into the
      competitor path without any change.

      :param station_name: station indicator to load competitors for
      :return: a single xarray with all competing forecasts


   .. py:method:: calculate_feature_importance(self, create_new_bootstraps: bool, _iter: int = 0, bootstrap_type='singleinput', bootstrap_method='shuffle') -> None

      Calculate skill scores of bootstrapped data.

      Create bootstrapped data if `create_new_bootstraps` is true or if a failure occurred during skill score
      calculation (this will happen by default if no bootstrapped data is available locally). Set the class
      attribute `bootstrap_skill_scores`.
      This method is implemented in a recursive fashion, but is only allowed to call itself once.

      :param create_new_bootstraps: calculate all bootstrap predictions and overwrite already available predictions
      :param _iter: internal counter to reduce unnecessary recursive calls (maximum number is 2, otherwise something went wrong)


   .. py:method:: create_feature_importance_bootstrap_forecast(self, bootstrap_type, bootstrap_method) -> None

      Create bootstrapped predictions for all stations and variables.

      These forecasts are saved in `bootstrap_path` with the names `bootstraps_{var}_{station}.nc` and
      `bootstraps_labels_{station}.nc`.


   .. py:method:: calculate_feature_importance_skill_scores(self, bootstrap_type, bootstrap_method) -> Dict[str, xarray.DataArray]

      Calculate skill score of bootstrapped variables.

      Use the already created bootstrap predictions and the original (not bootstrapped) predictions to calculate
      skill scores for the bootstraps. The result is saved as an xarray DataArray in a dictionary structure
      separated for each station (keys of dictionary).

      :return: the result dictionary with station-wise skill scores


   .. py:method:: get_distinct_branches_from_bootstrap_iter(bootstrap_iter)
      :staticmethod:


   .. py:method:: rename_boot_var_with_branch(self, boot_var, bootstrap_type, branch_names=None, expected_len=0)


   .. py:method:: get_orig_prediction(self, path, file_name, prediction_name=None, reference_name=None)


   .. py:method:: repeat_data(data, number_of_repetition)
      :staticmethod:


   .. py:method:: _get_model_name(self)

      Return model name without path information.


   .. py:method:: _load_model(self) -> mlair.model_modules.AbstractModelClass

      Load NN model either from data store or from local path.

      :return: the model


   .. py:method:: plot(self)

      Create all plots.

      Plots are defined in the experiment setup by `plot_list`.
      By default, all of the following plots are enabled:

      * :py:class:`PlotBootstrapSkillScore`
      * :py:class:`PlotConditionalQuantiles`
      * :py:class:`PlotStationMap`
      * :py:class:`PlotMonthlySummary`
      * :py:class:`PlotClimatologicalSkillScore`
      * :py:class:`PlotCompetitiveSkillScore`
      * :py:class:`PlotTimeSeries`
      * :py:class:`PlotAvailability`

      .. note:: Bootstrap plots are only created if bootstraps are evaluated.


   .. py:method:: calculate_test_score(self)

      Evaluate test score of model and save locally.


   .. py:method:: train_ols_model(self)

      Train an ordinary least squares model on train data.


   .. py:method:: setup_persistence(self)

      Check if persistence is requested from competitors and store this information.


   .. py:method:: make_prediction(self, subset)

      Create predictions for NN, OLS, and persistence and add true observation as reference.

      Predictions are filled in an array with full index range. Therefore, predictions can have missing values.
      All predictions for a single station are stored locally under `__test.nc` and can be found inside
      `forecast_path`.


   .. py:method:: _get_frequency(self) -> str

      Get frequency abbreviation.


   .. py:method:: _create_competitor_forecast(self, station_name: str, competitor_name: str) -> xarray.DataArray

      Load and format the competing forecast of a distinct model indicated by `competitor_name` for a distinct
      station indicated by `station_name`.

      The name of the competitor is set in the `type` axis as indicator. This method will raise either a
      `FileNotFoundError` or a `KeyError` if no competitor could be found for the given station: either no file
      is provided in the expected path, or the forecast file contains no forecast for the given
      `competitor_name`. The forecast is trimmed to the start and end of the test subset.

      :param station_name: name of the station to load data for
      :param competitor_name: name of the model
      :return: the forecast of the given competitor


   .. py:method:: _create_observation(self, data, _, transformation_func: Callable, normalised: bool) -> xarray.DataArray

      Create observation as ground truth from given data.

      Inverse transformation is applied to the ground truth to get the output in the original space.

      :param data: observation
      :param transformation_func: a callable function to apply inverse transformation
      :param normalised: transform ground truth to original space if false, or use normalised predictions if true
      :return: filled data array with observation


   .. py:method:: _create_ols_forecast(self, input_data: xarray.DataArray, ols_prediction: xarray.DataArray, transformation_func: Callable, normalised: bool) -> xarray.DataArray

      Create ordinary least squares model forecast with given input data.

      Inverse transformation is applied to the forecast to get the output in the original space.

      :param input_data: transposed history from DataPrep
      :param ols_prediction: empty array in right shape to fill with data
      :param transformation_func: a callable function to apply inverse transformation
      :param normalised: transform prediction to original space if false, or use normalised predictions if true
      :return: filled data array with OLS predictions


   .. py:method:: _create_persistence_forecast(self, data, persistence_prediction: xarray.DataArray, transformation_func: Callable, normalised: bool) -> xarray.DataArray

      Create persistence forecast with given data.

      Persistence is derived from the value at t=0 and applied to all following time steps (t+1, ..., t+window).
      Inverse transformation is applied to the forecast to get the output in the original space.

      :param data: observation
      :param persistence_prediction: empty array in right shape to fill with data
      :param transformation_func: a callable function to apply inverse transformation
      :param normalised: transform prediction to original space if false, or use normalised predictions if true
      :return: filled data array with persistence predictions


   .. py:method:: _create_nn_forecast(self, nn_output: xarray.DataArray, nn_prediction: xarray.DataArray, transformation_func: Callable, normalised: bool) -> xarray.DataArray

      Create NN forecast for given input data.

      Inverse transformation is applied to the forecast to get the output in the original space. Furthermore,
      only the output of the main branch is returned (not all minor branches, if the network has multiple output
      branches). The main branch is defined to be the last entry of all outputs.

      :param nn_output: full NN model output
      :param nn_prediction: empty array in right shape to fill with data
      :param transformation_func: a callable function to apply inverse transformation
      :param normalised: transform prediction to original space if false, or use normalised predictions if true
      :return: filled data array with NN predictions


   .. py:method:: _create_empty_prediction_arrays(target_data, count=1)
      :staticmethod:

      Create array to collect all predictions. Expand target data by a station dimension.


   .. py:method:: create_fullindex(df: Union[xarray.DataArray, pandas.DataFrame, pandas.DatetimeIndex], freq: str) -> pandas.DataFrame
      :staticmethod:

      Create full index from first and last date inside df and resample with given frequency.

      :param df: use time range of this data set
      :param freq: frequency of full index
      :return: empty data frame with full index


   .. py:method:: create_forecast_arrays(index: pandas.DataFrame, ahead_names: List[Union[str, int]], time_dimension, ahead_dim='ahead', index_dim='index', type_dim='type', **kwargs)
      :staticmethod:

      Combine different forecast types into single xarray.

      :param index: index for forecasts (e.g. time)
      :param ahead_names: names of ahead values (e.g. hours or days)
      :param kwargs: forecast data, passed as xarrays
      :return: xarray of dimension 3: index, ahead_names, # predictions


   .. py:method:: _get_internal_data(self, station: str, path: str) -> Union[xarray.DataArray, None]

      Get internal data for given station.
      Internal data is defined as data that is already known to the model. From an evaluation perspective, this
      refers to data that is not test data, i.e. train and val data.

      :param station: name of station to load internal data


   .. py:method:: _get_external_data(self, station: str, path: str) -> Union[xarray.DataArray, None]

      Get external data for given station.

      External data is defined as data that is not known to the model. From an evaluation perspective, this
      refers to data that is not train or val data, i.e. test data.

      :param station: name of station to load external data


   .. py:method:: _combine_forecasts(self, forecast, competitor, dim=None)

      Combine forecast and competitor if both are xarray.

      If competitor is None, this returns forecast, and vice versa.


   .. py:method:: calculate_bias_free_error_metrics(self)


   .. py:method:: calculate_error_metrics(self) -> Tuple[Dict, Dict, Dict, Dict]

      Calculate error metrics and skill scores of NN forecast.

      The competitive skill score compares the NN prediction with persistence and ordinary least squares
      forecasts. The climatological skill score, in contrast, evaluates the NN prediction in terms of
      meaningfulness in comparison to different climatological references.

      :return: competitive and climatological skill scores, error metrics


   .. py:method:: calculate_average_skill_scores(scores, counts)
      :staticmethod:


   .. py:method:: calculate_average_errors(errors)
      :staticmethod:


   .. py:method:: report_feature_importance_results(self, results)

      Create a CSV file containing all results from feature importance.


   .. py:method:: report_error_metrics(self, errors, tag=None)


   .. py:method:: store_errors(self, errors)
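
The block bootstrap described for ``calculate_block_mse`` and ``estimate_sample_uncertainty`` can be illustrated with a minimal, self-contained sketch. It is not MLAir's implementation: the helper names ``block_mse`` and ``bootstrap_mean``, the synthetic data, and the choice of calendar-month blocks are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd


def block_mse(obs: pd.Series, forecast: pd.Series) -> pd.Series:
    """Average the squared error within calendar-month blocks (hypothetical helper)."""
    squared_error = (forecast - obs) ** 2
    # Blocks are formed along the time axis only, so they may differ in size.
    return squared_error.groupby(squared_error.index.to_period("M")).mean()


def bootstrap_mean(blocks: pd.Series, n_boots: int, seed: int = 0) -> np.ndarray:
    """Draw blocks with replacement and return the mean error of each draw."""
    rng = np.random.default_rng(seed)
    values = blocks.to_numpy()
    # Each row holds the block indices of one bootstrap realisation.
    idx = rng.integers(0, len(values), size=(n_boots, len(values)))
    return values[idx].mean(axis=1)


# Synthetic daily observations and a noisy forecast (illustrative data only).
time = pd.date_range("2020-01-01", "2020-12-31", freq="D")
rng = np.random.default_rng(42)
obs = pd.Series(rng.normal(size=len(time)), index=time)
forecast = obs + rng.normal(scale=0.5, size=len(time))

blocks = block_mse(obs, forecast)
boot = bootstrap_mean(blocks, n_boots=1000)
low, high = np.percentile(boot, [5, 95])
print(f"block-bootstrapped MSE, 5th to 95th percentile: {low:.3f} to {high:.3f}")
```

A narrow percentile range suggests that the error estimate is robust against the selection of time blocks. Repeating the same procedure for each competing forecast would allow the resulting error distributions to be compared.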