:py:mod:`mlair.data_handler.data_handler_mixed_sampling`
========================================================

.. py:module:: mlair.data_handler.data_handler_mixed_sampling


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSamplingSingleStation
   mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSampling
   mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSamplingWithFilterSingleStation
   mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSamplingWithFirFilterSingleStation
   mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSamplingWithFirFilter
   mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSamplingWithClimateFirFilterSingleStation
   mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSamplingWithClimateFirFilter
   mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSamplingWithClimateAndFirFilter
   mlair.data_handler.data_handler_mixed_sampling.DataHandlerIFSSingleStation
   mlair.data_handler.data_handler_mixed_sampling.DataHandlerIFS


Attributes
~~~~~~~~~~

.. autoapisummary::

   mlair.data_handler.data_handler_mixed_sampling.__author__
   mlair.data_handler.data_handler_mixed_sampling.__date__


.. py:data:: __author__
   :annotation: = Lukas Leufen

.. py:data:: __date__
   :annotation: = 2020-11-05

.. py:class:: DataHandlerMixedSamplingSingleStation(*args, **kwargs)

   Bases: :py:obj:`mlair.data_handler.data_handler_single_station.DataHandlerSingleStation`

   :param window_history_offset: used to shift t0 by the specified value.
   :param window_history_end: used to set the last time step that is used to create a sample. A negative value
       indicates that not all values up to t0 are used; a positive value indicates usage of values at t > t0.
       Default is 0.

   .. py:method:: update_kwargs(parameter_name: str, default: Any, kwargs: dict)
      :staticmethod:

      Update a single element of kwargs in place to be usable for inputs and targets.
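The tuple update described for ``update_kwargs`` can be sketched as follows. This is a minimal illustrative re-implementation, not mlair's actual code, which may differ in detail:

```python
from typing import Any


def update_kwargs(parameter_name: str, default: Any, kwargs: dict) -> None:
    """Illustrative sketch: normalise a kwargs entry to an (input, target) tuple in place."""
    if parameter_name in kwargs:
        value = kwargs[parameter_name]
        if isinstance(value, tuple):
            # an existing tuple must provide exactly one value for inputs and one for targets
            assert len(value) == 2, f"{parameter_name} must have exactly two entries"
        else:
            # a scalar value is applied to inputs and targets alike
            kwargs[parameter_name] = (value, value)
    else:
        # parameter not given at all: use the default for both elements
        kwargs[parameter_name] = (default, default)
```

For example, ``update_kwargs("interpolation_limit", 1, {"interpolation_limit": 24})`` would leave ``(24, 24)`` in the dictionary (the parameter name here is only an illustration).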
      The updated value in the kwargs dictionary is a tuple consisting of the value applicable to the inputs
      as first element and the target's value as second element: (<input value>, <target value>). If the value
      for the given parameter_name is already a tuple, it is checked to have exactly two entries. If
      parameter_name is not included in kwargs, the given default value is used and applied to both elements
      of the updated tuple.

      :param parameter_name: name of the parameter that should be transformed to 2-dim
      :param default: the default value to fill in if the parameter is not in kwargs
      :param kwargs: the kwargs dictionary containing parameters

   .. py:method:: make_input_target(self)

   .. py:method:: load_and_interpolate(self, ind) -> [xarray.DataArray, pandas.DataFrame]

   .. py:method:: set_inputs_and_targets(self)

   .. py:method:: setup_data_path(self, data_path, sampling)

      Set two paths instead of a single path. Expects the sampling argument to be a list with two entries.

   .. py:method:: _extract_lazy(self, lazy_data)


.. py:class:: DataHandlerMixedSampling(id_class: data_handler, experiment_path: str, min_length: int = 0, extreme_values: num_or_list = None, extremes_on_right_tail_only: bool = False, name_affix=None, store_processed_data=True, iter_dim=DEFAULT_ITER_DIM, time_dim=DEFAULT_TIME_DIM, use_multiprocessing=True, max_number_multiprocessing=MAX_NUMBER_MULTIPROCESSING)

   Bases: :py:obj:`mlair.data_handler.DefaultDataHandler`

   Data handler using mixed sampling for input and target.

   .. py:attribute:: data_handler

   .. py:attribute:: data_handler_transformation

   .. py:attribute:: _requirements


.. py:class:: DataHandlerMixedSamplingWithFilterSingleStation(*args, **kwargs)

   Bases: :py:obj:`DataHandlerMixedSamplingSingleStation`, :py:obj:`mlair.data_handler.data_handler_with_filter.DataHandlerFilterSingleStation`

   :param window_history_offset: used to shift t0 by the specified value.
   :param window_history_end: used to set the last time step that is used to create a sample.
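A hypothetical sketch of how ``window_history_offset`` and ``window_history_end`` could delimit a history sample around t0. The function name and index logic below are illustrative assumptions, not mlair's implementation:

```python
def history_window(t0: int, window_history_size: int, offset: int = 0, end: int = 0) -> range:
    """Illustrative only: time indices of a history sample relative to t0.

    offset shifts the anchor t0; a negative end stops before the (shifted) t0,
    while a positive end reaches into t > t0.
    """
    anchor = t0 + offset
    return range(anchor - window_history_size, anchor + end + 1)


# with end=0 the sample runs up to the (shifted) t0 itself
print(list(history_window(t0=10, window_history_size=3)))  # [7, 8, 9, 10]
# a negative end drops the last steps before t0
print(list(history_window(t0=10, window_history_size=3, end=-1)))  # [7, 8, 9]
```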
       A negative value indicates that not all values up to t0 are used; a positive value indicates usage of
       values at t > t0. Default is 0.

   .. py:method:: _check_sampling(self, **kwargs)

   .. py:method:: apply_filter(self)
      :abstractmethod:

   .. py:method:: create_filter_index(self) -> pandas.Index
      :abstractmethod:

      Create name for filter dimension.

   .. py:method:: _create_lazy_data(self)
      :abstractmethod:

   .. py:method:: make_input_target(self)

      A FIR filter is applied on the input data that has hourly resolution. Labels Y are provided as
      aggregated values with daily resolution.

   .. py:method:: estimate_filter_width(self)
      :abstractmethod:

      Return maximum filter width.

   .. py:method:: _add_time_delta(date, delta)
      :staticmethod:

   .. py:method:: update_start_end(self, ind)

   .. py:method:: load_and_interpolate(self, ind) -> [xarray.DataArray, pandas.DataFrame]

   .. py:method:: _extract_lazy(self, lazy_data)


.. py:class:: DataHandlerMixedSamplingWithFirFilterSingleStation(*args, **kwargs)

   Bases: :py:obj:`DataHandlerMixedSamplingWithFilterSingleStation`, :py:obj:`mlair.data_handler.data_handler_with_filter.DataHandlerFirFilterSingleStation`

   :param window_history_offset: used to shift t0 by the specified value.
   :param window_history_end: used to set the last time step that is used to create a sample. A negative value
       indicates that not all values up to t0 are used; a positive value indicates usage of values at t > t0.
       Default is 0.

   .. py:method:: estimate_filter_width(self)

      Filter width is determined by the filter with the highest order.

   .. py:method:: apply_filter(self)

      Apply FIR filter only on inputs.

   .. py:method:: create_filter_index(self, add_unfiltered_index=True) -> pandas.Index

      Create name for filter dimension.

   .. py:method:: _extract_lazy(self, lazy_data)

   .. py:method:: _create_lazy_data(self)

   .. py:method:: _get_fs(**kwargs)
      :staticmethod:

      Return frequency in 1/day (not Hz).


.. py:class:: DataHandlerMixedSamplingWithFirFilter(*args, use_filter_branches=False, **kwargs)

   Bases: :py:obj:`mlair.data_handler.data_handler_with_filter.DataHandlerFirFilter`

   Data handler using mixed sampling for input and target. Inputs are temporally filtered.

   .. py:attribute:: data_handler

   .. py:attribute:: data_handler_transformation

   .. py:attribute:: _requirements


.. py:class:: DataHandlerMixedSamplingWithClimateFirFilterSingleStation(*args, **kwargs)

   Bases: :py:obj:`mlair.data_handler.data_handler_with_filter.DataHandlerClimateFirFilterSingleStation`, :py:obj:`DataHandlerMixedSamplingWithFirFilterSingleStation`

   Data handler for a single station to be used by a superior data handler. Inputs are FIR filtered. In
   contrast to the simple DataHandlerFirFilterSingleStation, this data handler is centered around t0 to have
   no time delay. For values in the future (t > t0), this data handler assumes a climatological value for the
   low-pass data and values of 0 for all residuum components.

   :param apriori: data to use as apriori information. This should be either an xarray DataArray containing
       monthly or any other heuristic to support the clim filter, or a list of such arrays containing
       heuristics for all residua in addition. The second option can be used together with apriori_type
       `residuum_stats`, which estimates the error of the residuum when the clim filter should be applied
       with exogenous parameters. If apriori_type is None or `zeros`, data can be provided, but this is not
       required in this case.
   :param apriori_type: set the type of information that is provided to the clim filter. For the first low
       pass, always a calculated or given statistic is used. For residuum prediction, a constant value of
       zero is assumed if apriori_type is None or `zeros`, and a climatology of the residuum is used for
       `residuum_stats`.
   :param apriori_diurnal: use diurnal anomalies of each hour in addition to the apriori information type
       chosen by parameter apriori_type.
       This is only applicable for hourly resolution data.
   :param apriori_sel_opts: specify parameters to select a subset of data before calculating the apriori
       information. Use this parameter, for example, if the apriori shall only be calculated on a shorter
       time period than is available in the given data.
   :param extend_length_opts: use this parameter to use future data in the filter calculation. This parameter
       does not affect the size of the history samples, as this is handled by the window_history_size
       parameter. Example: set extend_length_opts=7*24 to use the observations of the next 7 days to
       calculate the filtered components. Which data are finally used for the input samples is not affected
       by these 7 days. In case the range of a history sample exceeds the horizon of extend_length_opts, the
       history sample will also include data from climatological estimates.

   .. py:method:: _extract_lazy(self, lazy_data)


.. py:class:: DataHandlerMixedSamplingWithClimateFirFilter(*args, data_handler_class_unfiltered: data_handler_unfiltered = None, filter_add_unfiltered: bool = DEFAULT_FILTER_ADD_UNFILTERED, **kwargs)

   Bases: :py:obj:`mlair.data_handler.data_handler_with_filter.DataHandlerClimateFirFilter`

   Data handler using mixed sampling for input and target. Inputs are temporally filtered.

   .. py:attribute:: data_handler

   .. py:attribute:: data_handler_transformation

   .. py:attribute:: data_handler_unfiltered

   .. py:attribute:: _requirements

   .. py:attribute:: DEFAULT_FILTER_ADD_UNFILTERED
      :annotation: = False

   .. py:method:: _create_collection(self)

   .. py:method:: build(cls, station: str, **kwargs)
      :classmethod:

      Return initialised class.

   .. py:method:: build_update_transformation(cls, kwargs_dict, dh_type='filtered')
      :classmethod:

   .. py:method:: transformation(cls, set_stations, tmp_path=None, dh_transformation=None, **kwargs)
      :classmethod:

      ### supported transformation methods

      Currently supported methods are:

      * standardise (default, if method is not given)
      * centre
      * min_max
      * log

      ### mean and std estimation

      Mean and std (depending on method) are estimated. For each station, mean and std are calculated and
      afterwards aggregated using the mean value over all station-wise metrics. This method is not exactly
      accurate, especially regarding the std calculation, but is therefore much faster. Furthermore, it is a
      weighted mean, weighted by the time series length / number of data points, so a longer time series has
      more influence on the transformation settings than a short time series. The estimation of the std is
      less accurate, because the unweighted mean of all stds is not equal to the true std, but the mean of
      all station-wise stds is still a decent estimate. Finally, the real accuracy of mean and std is less
      important, because it is "just" a transformation / scaling.

      ### mean and std given

      If mean and std are not None, the default data handler expects these parameters to match the data and
      applies these values to the data. Make sure that all dimensions and/or coordinates are in agreement.

      ### min and max given

      If min and max are not None, the default data handler expects these parameters to match the data and
      applies these values to the data. Make sure that all dimensions and/or coordinates are in agreement.


.. py:class:: DataHandlerMixedSamplingWithClimateAndFirFilter(data_handler_class_chem, data_handler_class_meteo, data_handler_class_chem_unfiltered, data_handler_class_meteo_unfiltered, chem_vars, meteo_vars, *args, **kwargs)

   Bases: :py:obj:`DataHandlerMixedSamplingWithClimateFirFilter`

   Data handler using mixed sampling for input and target. Inputs are temporally filtered.

   .. py:attribute:: data_handler_climate_fir

   .. py:attribute:: data_handler_fir

   .. py:attribute:: data_handler_fir_pos

   .. py:attribute:: data_handler

   .. py:attribute:: data_handler_unfiltered

   .. py:attribute:: _requirements

   .. py:attribute:: chem_indicator
      :annotation: = chem

   .. py:attribute:: meteo_indicator
      :annotation: = meteo

   .. py:method:: _split_chem_and_meteo_variables(cls, **kwargs)
      :classmethod:

      Select all used variables and split them into the categories chem and other. Chemical variables are
      indicated by `cls.data_handler_climate_fir.chem_vars`. To indicate used variables, this method uses
      1) parameter `variables`, 2) keys from `statistics_per_var`, 3) keys from
      `cls.data_handler_climate_fir.DEFAULT_VAR_ALL_DICT`. Option 3) is also applied if 1) or 2) are given
      but None.

   .. py:method:: build(cls, station: str, **kwargs)
      :classmethod:

      Return initialised class.

   .. py:method:: correct_overwrite_option(cls, kwargs)
      :classmethod:

      Set `overwrite_local_data=False`.

   .. py:method:: set_data_handler_fir_pos(cls, **kwargs)
      :classmethod:

      Set the position of the FIR data handler to use either the faster FIR version or the slower climate
      FIR. This method sets the data handler indicator to 0 if either no parameter "extend_length_opts" is
      given or the parameter is of type dict but has no entry for the meteo_indicator. In all other cases,
      the indicator is set to 1.

   .. py:method:: prepare_build(cls, kwargs, var_list, var_type)
      :classmethod:

      Prepare for the build of the class. The `variables` parameter is updated by `var_list`, which should
      only include variables of a specific type (e.g. only chemical variables), indicated by `var_type`.
      Furthermore, this method cleans the `kwargs` dictionary as follows: for all parameters provided as a
      dict to separate between chem and meteo options (the dict must have keys from `cls.chem_indicator`
      and/or `cls.meteo_indicator`), this parameter is removed from kwargs and its value related to
      `var_type` is added again. In case there is no value for the given `var_type`, the parameter is not
      added at all (as this parameter is assumed to affect only other types of variables).

   .. py:method:: _create_collection(self)

   .. py:method:: transformation(cls, set_stations, tmp_path=None, **kwargs)
      :classmethod:

      ### supported transformation methods

      Currently supported methods are:

      * standardise (default, if method is not given)
      * centre
      * min_max
      * log

      ### mean and std estimation

      Mean and std (depending on method) are estimated. For each station, mean and std are calculated and
      afterwards aggregated using the mean value over all station-wise metrics. This method is not exactly
      accurate, especially regarding the std calculation, but is therefore much faster. Furthermore, it is a
      weighted mean, weighted by the time series length / number of data points, so a longer time series has
      more influence on the transformation settings than a short time series. The estimation of the std is
      less accurate, because the unweighted mean of all stds is not equal to the true std, but the mean of
      all station-wise stds is still a decent estimate. Finally, the real accuracy of mean and std is less
      important, because it is "just" a transformation / scaling.

      ### mean and std given

      If mean and std are not None, the default data handler expects these parameters to match the data and
      applies these values to the data. Make sure that all dimensions and/or coordinates are in agreement.

      ### min and max given

      If min and max are not None, the default data handler expects these parameters to match the data and
      applies these values to the data. Make sure that all dimensions and/or coordinates are in agreement.


.. py:class:: DataHandlerIFSSingleStation(*args, **kwargs)

   Bases: :py:obj:`DataHandlerMixedSamplingWithClimateFirFilterSingleStation`

   Data handler for a single station to be used by a superior data handler. Inputs are FIR filtered. In
   contrast to the simple DataHandlerFirFilterSingleStation, this data handler is centered around t0 to have
   no time delay.
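The station-wise aggregation described under "mean and std estimation" in the ``transformation`` docstring above can be sketched roughly as follows. This is an illustrative approximation under the stated semantics (length-weighted mean, unweighted mean of stds), not mlair's actual code:

```python
def aggregate_station_stats(means, stds, lengths):
    """Illustrative sketch: combine per-station means/stds into one transformation setting.

    The mean is weighted by each station's time series length; the std is the plain
    (unweighted) mean of the station-wise stds, which only approximates the true
    pooled std.
    """
    total = sum(lengths)
    mean = sum(m * n for m, n in zip(means, lengths)) / total
    std = sum(stds) / len(stds)  # decent but not exact estimate
    return mean, std


# two stations, the longer time series dominating the aggregated mean
print(aggregate_station_stats([10.0, 20.0], [2.0, 4.0], [300, 100]))  # (12.5, 3.0)
```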
   For values in the future (t > t0), this data handler assumes a climatological value for the low-pass data
   and values of 0 for all residuum components.

   :param apriori: data to use as apriori information. This should be either an xarray DataArray containing
       monthly or any other heuristic to support the clim filter, or a list of such arrays containing
       heuristics for all residua in addition. The second option can be used together with apriori_type
       `residuum_stats`, which estimates the error of the residuum when the clim filter should be applied
       with exogenous parameters. If apriori_type is None or `zeros`, data can be provided, but this is not
       required in this case.
   :param apriori_type: set the type of information that is provided to the clim filter. For the first low
       pass, always a calculated or given statistic is used. For residuum prediction, a constant value of
       zero is assumed if apriori_type is None or `zeros`, and a climatology of the residuum is used for
       `residuum_stats`.
   :param apriori_diurnal: use diurnal anomalies of each hour in addition to the apriori information type
       chosen by parameter apriori_type. This is only applicable for hourly resolution data.
   :param apriori_sel_opts: specify parameters to select a subset of data before calculating the apriori
       information. Use this parameter, for example, if the apriori shall only be calculated on a shorter
       time period than is available in the given data.
   :param extend_length_opts: use this parameter to use future data in the filter calculation. This parameter
       does not affect the size of the history samples, as this is handled by the window_history_size
       parameter. Example: set extend_length_opts=7*24 to use the observations of the next 7 days to
       calculate the filtered components. Which data are finally used for the input samples is not affected
       by these 7 days. In case the range of a history sample exceeds the horizon of extend_length_opts, the
       history sample will also include data from climatological estimates.

   .. py:method:: load_and_interpolate(self, ind) -> [xarray.DataArray, pandas.DataFrame]

   .. py:method:: make_input_target(self)

      A FIR filter is applied on the input data that has hourly resolution. Labels Y are provided as
      aggregated values with daily resolution.


.. py:class:: DataHandlerIFS(data_handler_class_chem, data_handler_class_meteo, data_handler_class_chem_unfiltered, data_handler_class_meteo_unfiltered, chem_vars, meteo_vars, *args, **kwargs)

   Bases: :py:obj:`DataHandlerMixedSamplingWithClimateAndFirFilter`

   Data handler using mixed sampling for input and target. Inputs are temporally filtered.

   .. py:attribute:: data_handler_fir

   .. py:method:: set_data_handler_fir_pos(cls, **kwargs)
      :classmethod:

      Set the position of the FIR data handler to always use climate FIR.
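The chem/meteo kwargs separation described for ``prepare_build`` (in ``DataHandlerMixedSamplingWithClimateAndFirFilter`` above) can be sketched as follows. This is a simplified illustrative re-implementation; the function name and exact handling in mlair may differ:

```python
CHEM, METEO = "chem", "meteo"  # mirrors chem_indicator / meteo_indicator


def prepare_build_kwargs(kwargs: dict, var_list: list, var_type: str) -> dict:
    """Illustrative sketch: restrict kwargs to one variable type (chem or meteo).

    Dict-valued parameters keyed by the type indicators are collapsed to the value
    for var_type, or dropped entirely if no value for var_type is present.
    """
    prepared = {"variables": var_list}
    for key, value in kwargs.items():
        if key == "variables":
            continue  # replaced by var_list above
        if isinstance(value, dict) and set(value) <= {CHEM, METEO}:
            if var_type in value:
                prepared[key] = value[var_type]
            # no entry for var_type: parameter only affects the other type, so drop it
        else:
            prepared[key] = value
    return prepared


kwargs = {"extend_length_opts": {"chem": 168, "meteo": 24}, "sampling": ("hourly", "daily")}
print(prepare_build_kwargs(kwargs, ["o3", "no2"], "chem"))
```

For the chem build, ``extend_length_opts`` collapses to its ``chem`` value (168) while plain parameters such as ``sampling`` pass through unchanged.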