:py:mod:`mlair.run_modules.post_processing`
===========================================

.. py:module:: mlair.run_modules.post_processing

.. autoapi-nested-parse::

   Post-processing module.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   mlair.run_modules.post_processing.PostProcessing


Attributes
~~~~~~~~~~

.. autoapisummary::

   mlair.run_modules.post_processing.__author__
   mlair.run_modules.post_processing.__date__


.. py:data:: __author__
   :annotation: = Lukas Leufen, Felix Kleinert

   
.. py:data:: __date__
   :annotation: = 2019-12-11

   
.. py:class:: PostProcessing

   Bases: :py:obj:`mlair.run_modules.run_environment.RunEnvironment`

   Perform post-processing for performance evaluation.

   Schedule of post-processing:
       #. train an ordinary least squares (OLS) model for reference
       #. create forecasts for NN, OLS, and persistence
       #. evaluate feature importance with bootstrapped predictions
       #. calculate skill scores
       #. create plots

   Required objects [scope] from data store:
       * `model` [.] or locally saved model plus `model_name` [model] and `model` [model]
       * `generator` [train, val, test, train_val]
       * `forecast_path` [.]
       * `plot_path` [postprocessing]
       * `model_path` [.]
       * `target_var` [.]
       * `sampling` [.]
       * `output_shape` [model]
       * `evaluate_feature_importance` [postprocessing] and if enabled:

           * `create_new_bootstraps` [postprocessing]
           * `bootstrap_path` [postprocessing]
           * `number_of_bootstraps` [postprocessing]

   Optional objects
       * `batch_size` [model]

   Creates
       * forecasts in `forecast_path` if enabled
       * bootstraps in `bootstrap_path` if enabled
       * plots in `plot_path`


   .. py:method:: _run(self)


   .. py:method:: estimate_sample_uncertainty(self, separate_ahead=False)

      Estimate sample uncertainty by using a bootstrap approach. Forecasts are split into individual blocks along time
      and randomly drawn with replacement. The resulting spread of the error indicates how robust each analyzed model
      is and helps to quantify which model might be superior to the others.
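
      The resampling idea can be sketched in plain Python. This is only an illustration under assumptions: the
      helper name ``bootstrap_block_errors``, the number of draws, and the error values are hypothetical and not
      taken from MLAir.

      ```python
      import random
      import statistics

      def bootstrap_block_errors(block_errors, n_boots=1000, seed=0):
          """Resample block-wise errors with replacement and return the mean of each draw."""
          rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
          n_blocks = len(block_errors)
          means = []
          for _ in range(n_boots):
              # draw as many blocks as exist, with replacement
              draw = [block_errors[rng.randrange(n_blocks)] for _ in range(n_blocks)]
              means.append(statistics.fmean(draw))
          return means

      # hypothetical mean squared errors of one model for six time blocks
      block_mse = [4.1, 3.8, 5.2, 4.7, 3.9, 4.4]
      boot_means = bootstrap_block_errors(block_mse, n_boots=500)
      spread = statistics.pstdev(boot_means)  # a narrow spread hints at a robust model
      ```

      Comparing the distributions of ``boot_means`` across models gives a rough impression of whether one model's
      error is consistently lower than another's.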


   .. py:method:: report_sample_uncertainty(self, percentiles: list = None)

      Store the raw results of the uncertainty estimate, calculate aggregate statistics, and store these as raw data
      as well as in Markdown and LaTeX format.


   .. py:method:: calculate_block_mse(self, evaluate_competitors=True, separate_ahead=False, block_length='1m')

      Transform data into blocks along the time axis. Block length can be any frequency like '1m' or '7d'. Data are only
      split along the time axis, which means that individual blocks can differ considerably in the number of stations
      or the amount of actual data they contain. This is intended to analyze not only the robustness against time but
      also against the number of observations and the diversity of stations.
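
      A minimal sketch of the blocking step, assuming monthly blocks and plain Python lists instead of the xarray
      objects used in MLAir. The function ``monthly_block_mse`` and all values are hypothetical.

      ```python
      from collections import defaultdict
      from datetime import date

      def monthly_block_mse(dates, observations, forecasts):
          """Group squared errors by calendar month and return the MSE of each block."""
          blocks = defaultdict(list)
          for day, obs, fc in zip(dates, observations, forecasts):
              blocks[(day.year, day.month)].append((fc - obs) ** 2)
          return {key: sum(vals) / len(vals) for key, vals in sorted(blocks.items())}

      # hypothetical daily values; the second block holds fewer samples than the first
      dates = [date(2020, 1, 30), date(2020, 1, 31), date(2020, 2, 1)]
      obs = [10.0, 12.0, 11.0]
      fc = [11.0, 10.0, 11.0]
      block_mse = monthly_block_mse(dates, obs, fc)
      ```

      Because blocks are formed along time only, the number of samples per block can vary, as in this example.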


   .. py:method:: create_error_array(self, data)

      Calculate squared error of all given time series in relation to observation.


   .. py:method:: create_full_time_dim(data, dim, sampling, start, end)
      :staticmethod:

      Ensure the time dimension is equidistant. Dates can be missing if missing values have been dropped.
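
      The idea of restoring an equidistant axis can be illustrated as follows. This sketch uses only the standard
      library and a hypothetical helper ``fill_time_axis``; it does not reproduce the signature or the xarray-based
      behaviour of the method documented here.

      ```python
      from datetime import datetime, timedelta

      def fill_time_axis(values_by_time, start, end, step):
          """Return (timestamps, values) on an equidistant axis; gaps are filled with None."""
          timestamps, values = [], []
          current = start
          while current <= end:
              timestamps.append(current)
              values.append(values_by_time.get(current))  # None where a step was dropped
              current += step
          return timestamps, values

      # hypothetical daily series in which 3 January has been dropped
      series = {
          datetime(2020, 1, 1): 1.0,
          datetime(2020, 1, 2): 2.0,
          datetime(2020, 1, 4): 4.0,
      }
      times, filled = fill_time_axis(
          series, datetime(2020, 1, 1), datetime(2020, 1, 4), timedelta(days=1)
      )
      ```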


   .. py:method:: load_competitors(self, station_name: str) -> xarray.DataArray

      Load all requested and available competitors for a given station. Forecasts must be available in the competitor
      path like `<competitor_path>/<target_var>/forecasts_<station_name>_test.nc`. The naming style is equal for all
      forecasts of MLAir, so that forecasts of a different experiment can easily be copied into the competitor path
      without any change.

      :param station_name: station indicator to load competitors for

      :return: a single xarray with all competing forecasts
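
      As an illustration of the naming pattern described above, the expected file location could be assembled as in
      the following sketch. The helper ``competitor_file`` and the example values are hypothetical; only the path
      pattern is taken from the description.

      ```python
      from pathlib import Path

      def competitor_file(competitor_path, target_var, station_name):
          """Build the expected forecast file location for one station."""
          return Path(competitor_path) / target_var / f"forecasts_{station_name}_test.nc"

      # "competitors", "o3" and "DEBW107" are placeholder values for illustration
      path = competitor_file("competitors", "o3", "DEBW107")
      ```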


   .. py:method:: calculate_feature_importance(self, create_new_bootstraps: bool, _iter: int = 0, bootstrap_type='singleinput', bootstrap_method='shuffle') -> None

      Calculate skill scores of bootstrapped data.

      Create bootstrapped data if create_new_bootstraps is true or a failure occurred during skill score calculation
      (this will happen by default, if no bootstrapped data is available locally). Set class attribute
      bootstrap_skill_scores. This method is implemented in a recursive fashion, but is only allowed to call itself
      once.

      :param create_new_bootstraps: calculate all bootstrap predictions and overwrite already available predictions
      :param _iter: internal counter to reduce unnecessary recursive calls (maximum number is 2, otherwise something
          went wrong).
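
      The "retry once" recursion can be sketched generically. This is not the MLAir implementation: the function
      names, the callables, and the set of caught exceptions are assumptions made for the example.

      ```python
      def compute_with_retry(create_new, compute, create, _iter=0):
          """Run compute(); on failure, (re)create the inputs and try exactly once more."""
          if create_new:
              create()
          try:
              return compute()
          except (FileNotFoundError, KeyError):
              if _iter != 0:
                  raise  # second failure: give up instead of recursing forever
              return compute_with_retry(True, compute, create, _iter=_iter + 1)

      # hypothetical stand-ins: the "store" is empty until create() has been called
      store = {}

      def create():
          store["bootstraps"] = [0.1, 0.2]

      def compute():
          return sum(store["bootstraps"])

      result = compute_with_retry(False, compute, create)
      ```

      In this example the first attempt fails because no data exist yet, so the inputs are created and the
      calculation is repeated once.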


   .. py:method:: create_feature_importance_bootstrap_forecast(self, bootstrap_type, bootstrap_method) -> None

      Create bootstrapped predictions for all stations and variables.

      These forecasts are saved in bootstrap_path with the names `bootstraps_{var}_{station}.nc` and
      `bootstraps_labels_{station}.nc`.


   .. py:method:: calculate_feature_importance_skill_scores(self, bootstrap_type, bootstrap_method) -> Dict[str, xarray.DataArray]

      Calculate skill score of bootstrapped variables.

      Use already created bootstrap predictions and the original predictions (the not-bootstrapped ones) and calculate
      skill scores for the bootstraps. The result is saved as a xarray DataArray in a dictionary structure separated
      for each station (keys of dictionary).

      :return: The result dictionary with station-wise skill scores


   .. py:method:: get_distinct_branches_from_bootstrap_iter(bootstrap_iter)
      :staticmethod:


   .. py:method:: rename_boot_var_with_branch(self, boot_var, bootstrap_type, branch_names=None, expected_len=0)


   .. py:method:: get_orig_prediction(self, path, file_name, prediction_name=None, reference_name=None)


   .. py:method:: repeat_data(data, number_of_repetition)
      :staticmethod:


   .. py:method:: _get_model_name(self)

      Return model name without path information.


   .. py:method:: _load_model(self) -> mlair.model_modules.AbstractModelClass

      Load NN model either from data store or from local path.

      :return: the model


   .. py:method:: plot(self)

      Create all plots.

      Plots are defined in the experiment setup by `plot_list`. By default, all of the following plots are enabled:

      * :py:class:`PlotBootstrapSkillScore <src.plotting.postprocessing_plotting.PlotBootstrapSkillScore>`
      * :py:class:`PlotConditionalQuantiles <src.plotting.postprocessing_plotting.PlotConditionalQuantiles>`
      * :py:class:`PlotStationMap <src.plotting.postprocessing_plotting.PlotStationMap>`
      * :py:class:`PlotMonthlySummary <src.plotting.postprocessing_plotting.PlotMonthlySummary>`
      * :py:class:`PlotClimatologicalSkillScore <src.plotting.postprocessing_plotting.PlotClimatologicalSkillScore>`
      * :py:class:`PlotCompetitiveSkillScore <src.plotting.postprocessing_plotting.PlotCompetitiveSkillScore>`
      * :py:class:`PlotTimeSeries <src.plotting.postprocessing_plotting.PlotTimeSeries>`
      * :py:class:`PlotAvailability <src.plotting.postprocessing_plotting.PlotAvailability>`

      .. note:: Bootstrap plots are only created if bootstraps are evaluated.


   .. py:method:: calculate_test_score(self)

      Evaluate test score of model and save locally.


   .. py:method:: train_ols_model(self)

      Train an ordinary least squares (OLS) model on train data.


   .. py:method:: setup_persistence(self)

      Check if persistence is requested from competitors and store this information.


   .. py:method:: make_prediction(self, subset)

      Create predictions for NN, OLS, and persistence and add true observation as reference.

      Predictions are filled in an array with full index range. Therefore, predictions can have missing values. All
      predictions for a single station are stored locally under `<forecast/forecast_norm>_<station>_test.nc` and can
      be found inside `forecast_path`.


   .. py:method:: _get_frequency(self) -> str

      Get frequency abbreviation.


   .. py:method:: _create_competitor_forecast(self, station_name: str, competitor_name: str) -> xarray.DataArray

      Load and format the competing forecast of a distinct model indicated by `competitor_name` for a distinct station
      indicated by `station_name`. The name of the competitor is set in the `type` axis as indicator. This method will
      raise either a `FileNotFoundError` or `KeyError` if no competitor could be found for the given station. Either
      there is no file provided in the expected path or no forecast for given `competitor_name` in the forecast file.
      Forecast is trimmed on interval start and end of test subset.

      :param station_name: name of the station to load data for
      :param competitor_name: name of the model
      :return: the forecast of the given competitor


   .. py:method:: _create_observation(self, data, _, transformation_func: Callable, normalised: bool) -> xarray.DataArray

      Create observation as ground truth from given data.

      Inverse transformation is applied to the ground truth to get the output in the original space.

      :param data: observation
      :param transformation_func: a callable function to apply inverse transformation
      :param normalised: transform ground truth in original space if false, or use normalised predictions if true

      :return: filled data array with observation


   .. py:method:: _create_ols_forecast(self, input_data: xarray.DataArray, ols_prediction: xarray.DataArray, transformation_func: Callable, normalised: bool) -> xarray.DataArray

      Create ordinary least squares (OLS) model forecast with given input data.

      Inverse transformation is applied to the forecast to get the output in the original space.

      :param input_data: transposed history from DataPrep
      :param ols_prediction: empty array in right shape to fill with data
      :param transformation_func: a callable function to apply inverse transformation
      :param normalised: transform prediction in original space if false, or use normalised predictions if true

      :return: filled data array with ols predictions


   .. py:method:: _create_persistence_forecast(self, data, persistence_prediction: xarray.DataArray, transformation_func: Callable, normalised: bool) -> xarray.DataArray

      Create persistence forecast with given data.

      Persistence is derived from the value at t=0 and applied to all following time steps (t+1, ..., t+window).
      Inverse transformation is applied to the forecast to get the output in the original space.

      :param data: observation
      :param persistence_prediction: empty array in right shape to fill with data
      :param transformation_func: a callable function to apply inverse transformation
      :param normalised: transform prediction in original space if false, or use normalised predictions if true

      :return: filled data array with persistence predictions
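
      The persistence principle itself is simple enough to show in a few lines. The sketch below uses a plain list
      and a hypothetical helper ``persistence_forecast``; the inverse transformation and the array handling of the
      method are left out.

      ```python
      def persistence_forecast(value_at_t0, window):
          """Repeat the last known observation for the lead times t+1, ..., t+window."""
          return [value_at_t0] * window

      # hypothetical observation at t=0 and a forecast horizon of three steps
      forecast = persistence_forecast(42.5, window=3)
      ```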


   .. py:method:: _create_nn_forecast(self, nn_output: xarray.DataArray, nn_prediction: xarray.DataArray, transformation_func: Callable, normalised: bool) -> xarray.DataArray

      Create NN forecast for given input data.

      Inverse transformation is applied to the forecast to get the output in the original space. Furthermore, only the
      output of the main branch is returned (not all minor branches, if the network has multiple output branches). The
      main branch is defined to be the last entry of all outputs.

      :param nn_output: Full NN model output
      :param nn_prediction: empty array in right shape to fill with data
      :param transformation_func: a callable function to apply inverse transformation
      :param normalised: transform prediction in original space if false, or use normalised predictions if true

      :return: filled data array with nn predictions
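
      The selection of the main branch can be pictured as follows. This is a sketch under the assumption that a
      multi-branch model returns its outputs as a list; ``select_main_branch`` is a hypothetical helper, and tuples
      stand in for arrays.

      ```python
      def select_main_branch(nn_output):
          """Return the last output if the model has several branches, else the output itself."""
          if isinstance(nn_output, list):
              return nn_output[-1]
          return nn_output

      # hypothetical outputs: two minor branches followed by the main branch
      multi_branch = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)]
      single_branch = (0.7, 0.8)
      main_from_multi = select_main_branch(multi_branch)
      main_from_single = select_main_branch(single_branch)
      ```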


   .. py:method:: _create_empty_prediction_arrays(target_data, count=1)
      :staticmethod:

      Create array to collect all predictions. Expand target data by a station dimension.


   .. py:method:: create_fullindex(df: Union[xarray.DataArray, pandas.DataFrame, pandas.DatetimeIndex], freq: str) -> pandas.DataFrame
      :staticmethod:

      Create full index from first and last date inside df and resample with given frequency.

      :param df: use time range of this data set
      :param freq: frequency of full index

      :return: empty data frame with full index.


   .. py:method:: create_forecast_arrays(index: pandas.DataFrame, ahead_names: List[Union[str, int]], time_dimension, ahead_dim='ahead', index_dim='index', type_dim='type', **kwargs)
      :staticmethod:

      Combine different forecast types into single xarray.

      :param index: index for forecasts (e.g. time)
      :param ahead_names: names of ahead values (e.g. hours or days)
      :param kwargs: as xarrays; data of forecasts

      :return: xarray of dimension 3: index, ahead_names, # predictions


   .. py:method:: _get_internal_data(self, station: str, path: str) -> Union[xarray.DataArray, None]

      Get internal data for given station.

      Internal data is defined as data that is already known to the model. From an evaluation perspective, this
      refers to data that is not test data, and therefore to train and val data.

      :param station: name of station to load internal data.


   .. py:method:: _get_external_data(self, station: str, path: str) -> Union[xarray.DataArray, None]

      Get external data for given station.

      External data is defined as data that is not known to the model. From an evaluation perspective, this refers to
      data that is not train or val data, and therefore to test data.

      :param station: name of station to load external data.


   .. py:method:: _combine_forecasts(self, forecast, competitor, dim=None)

      Combine forecast and competitor if both are xarray. If competitor is None, this returns forecast and vice
      versa.


   .. py:method:: calculate_bias_free_error_metrics(self)


   .. py:method:: calculate_error_metrics(self) -> Tuple[Dict, Dict, Dict, Dict]

      Calculate error metrics and skill scores of NN forecast.

      The competitive skill score compares the NN prediction with persistence and ordinary least squares forecasts,
      whereas the climatological skill scores evaluate the NN prediction in terms of meaningfulness in comparison
      to different climatological references.

      :return: competitive and climatological skill scores, error metrics
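
      For orientation, a common MSE-based skill score has the form ``1 - MSE(forecast) / MSE(reference)``. The
      sketch below assumes this general definition; it is not taken from the MLAir source, and the function names
      and numbers are hypothetical.

      ```python
      def mse(forecast, observation):
          """Mean squared error of a forecast with respect to the observation."""
          return sum((f - o) ** 2 for f, o in zip(forecast, observation)) / len(observation)

      def skill_score(forecast, reference, observation):
          """1 - MSE(forecast) / MSE(reference): positive if forecast beats reference."""
          return 1.0 - mse(forecast, observation) / mse(reference, observation)

      obs = [1.0, 2.0, 3.0]
      nn = [1.0, 2.0, 4.0]      # hypothetical NN forecast
      persi = [0.0, 1.0, 2.0]   # hypothetical persistence forecast
      score = skill_score(nn, persi, obs)
      ```

      A positive score means the forecast has a lower error than the reference. The score is undefined if the
      reference matches the observation exactly, because its MSE is then zero.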


   .. py:method:: calculate_average_skill_scores(scores, counts)
      :staticmethod:


   .. py:method:: calculate_average_errors(errors)
      :staticmethod:


   .. py:method:: report_feature_importance_results(self, results)

      Create a csv file containing all results from feature importance.


   .. py:method:: report_error_metrics(self, errors, tag=None)


   .. py:method:: store_errors(self, errors)