:py:mod:`mlair.data_handler.default_data_handler`
==================================================

.. py:module:: mlair.data_handler.default_data_handler


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   mlair.data_handler.default_data_handler.DefaultDataHandler


Functions
~~~~~~~~~

.. autoapisummary::

   mlair.data_handler.default_data_handler.f_proc


Attributes
~~~~~~~~~~

.. autoapisummary::

   mlair.data_handler.default_data_handler.__author__
   mlair.data_handler.default_data_handler.__date__
   mlair.data_handler.default_data_handler.number
   mlair.data_handler.default_data_handler.num_or_list


.. py:data:: __author__
   :annotation: = Lukas Leufen

.. py:data:: __date__
   :annotation: = 2020-09-21

.. py:data:: number

.. py:data:: num_or_list

.. py:class:: DefaultDataHandler(id_class: data_handler, experiment_path: str, min_length: int = 0, extreme_values: num_or_list = None, extremes_on_right_tail_only: bool = False, name_affix=None, store_processed_data=True, iter_dim=DEFAULT_ITER_DIM, time_dim=DEFAULT_TIME_DIM, use_multiprocessing=True, max_number_multiprocessing=MAX_NUMBER_MULTIPROCESSING)

   Bases: :py:obj:`mlair.data_handler.abstract_data_handler.AbstractDataHandler`

   .. py:attribute:: _requirements

   .. py:attribute:: _store_attributes

   .. py:attribute:: _skip_args

   .. py:attribute:: DEFAULT_ITER_DIM
      :annotation: = Stations

   .. py:attribute:: DEFAULT_TIME_DIM
      :annotation: = datetime

   .. py:attribute:: MAX_NUMBER_MULTIPROCESSING
      :annotation: = 16

   .. py:method:: build(cls, station: str, **kwargs)
      :classmethod:

      Return initialised class.

   .. py:method:: _create_collection(self)

   .. py:method:: _reset_data(self)

   .. py:method:: _cleanup(self)

   .. py:method:: _store(self, fresh_store=False, store_processed_data=True)

   .. py:method:: get_store_attributes(self)

      Return all attribute names and values that are indicated by the store_attributes method.

   .. py:method:: _force_dask_computation(data)
      :staticmethod:

   .. py:method:: _load(self)

   .. py:method:: get_data(self, upsampling=False, as_numpy=True)

   .. py:method:: __repr__(self)

      Return repr(self).

   .. py:method:: __len__(self, upsampling=False)

   .. py:method:: get_X_original(self)

   .. py:method:: get_Y_original(self)

   .. py:method:: _to_numpy(d)
      :staticmethod:

   .. py:method:: get_X(self, upsampling=False, as_numpy=True)

   .. py:method:: get_Y(self, upsampling=False, as_numpy=True)

   .. py:method:: harmonise_X(self)

   .. py:method:: get_observation(self)

   .. py:method:: apply_transformation(self, data, base='target', dim=0, inverse=False)

      This method must return the transformed data. The flag inverse can be used to trigger either the transformation or its inverse.

   .. py:method:: multiply_extremes(self, extreme_values: num_or_list = 1.0, extremes_on_right_tail_only: bool = False, timedelta: Tuple[int, str] = (1, 'm'), dim=DEFAULT_TIME_DIM)

      Multiply extremes.

      This method extracts extreme values from self.labels as defined by the argument extreme_values. One can also decide to extract only extremes on the right tail of the distribution. If extreme_values is a list of floats/ints, all values larger than the given threshold (and, because extraction is performed in standardised space, smaller than its negative counterpart) are extracted iteratively. If, for example, extreme_values = [1., 2.], a value of 1.5 would be extracted once (for the 0th entry in the list), while a value of 2.5 would be extracted twice (once for each entry). Timedelta is used to mark the extracted values by adding one minute to each timestamp. As TOAR data are hourly, these "artificial" data points can easily be identified later.

      Extreme inputs and labels are stored in self.extremes_history and self.extreme_labels, respectively.

      :param extreme_values: user definition of the extreme value threshold(s)
      :param extremes_on_right_tail_only: if False, also multiply values which are smaller than -extreme_values; if True, only extract values larger than extreme_values
      :param timedelta: used as arguments for np.timedelta64 in order to mark extreme values on the datetime axis
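      A minimal, hypothetical sketch of this extraction logic (not the actual MLAir
      implementation; the label series, thresholds and per-copy time shift are invented
      for illustration) could look like this:

      .. code-block:: python

         import numpy as np
         import pandas as pd

         # hypothetical, already standardised label series (TOAR data are hourly)
         labels = pd.Series([0.3, 1.5, 2.5, -1.7],
                            index=pd.date_range("2020-01-01", periods=4, freq="h"))

         extreme_values = [1.0, 2.0]
         extremes_on_right_tail_only = False

         extracted = []
         for i, threshold in enumerate(extreme_values, start=1):
             if extremes_on_right_tail_only:
                 mask = labels > threshold        # right tail only
             else:
                 mask = labels.abs() > threshold  # both tails in standardised space
             copies = labels[mask].copy()
             # mark the i-th copy by shifting its timestamps by i minutes
             copies.index = copies.index + np.timedelta64(i, "m")
             extracted.append(copies)

         oversampled = pd.concat([labels] + extracted)
         # 1.5 is copied once (threshold 1.0), 2.5 twice (thresholds 1.0 and 2.0);
         # the minute offsets make the artificial points easy to identify later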
   .. py:method:: _add_timedelta(data, dim, timedelta)
      :staticmethod:

   .. py:method:: transformation(cls, set_stations, tmp_path=None, dh_transformation=None, **kwargs)
      :classmethod:

      ### supported transformation methods

      Currently supported methods are:

      * standardise (default, if method is not given)
      * centre
      * min_max
      * log

      ### mean and std estimation

      Mean and std (depending on the method) are estimated. For each station, mean and std are calculated and afterwards aggregated using the mean value over all station-wise metrics. This method is not exact, especially regarding the std calculation, but is much faster. Furthermore, it is a weighted mean, weighted by the length of each time series (the number of data points), so a longer time series has more influence on the transformation settings than a short one. The estimation of the std is less accurate, because a mean over the station-wise stds is not equal to the true std, but it is still a decent estimate. Finally, the real accuracy of mean and std is less important, because it is "just" a transformation / scaling.

      ### mean and std given

      If mean and std are not None, the default data handler expects these parameters to match the data and applies these values to the data. Make sure that all dimensions and/or coordinates are in agreement.

      ### min and max given

      If min and max are not None, the default data handler expects these parameters to match the data and applies these values to the data. Make sure that all dimensions and/or coordinates are in agreement.

   .. py:method:: aggregate_transformation(cls, transformation_dict, iter_dim)
      :classmethod:

   .. py:method:: update_transformation_dict(cls, dh, transformation_dict)
      :classmethod:

      Inner method that is performed in both the serial and the parallel approach.

   .. py:method:: get_coordinates(self)

      Return coordinates as dictionary with keys `lon` and `lat`.


.. py:function:: f_proc(data_handler, station, return_strategy='', tmp_path=None, **sp_keys)

   Try to create a data handler for the given arguments. If the build fails, the station does not fulfil all requirements and f_proc therefore returns None as an indication. On a successful build, f_proc returns the built data handler and the station that was used. This function must be implemented globally to work together with multiprocessing.
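   The module-level requirement exists because multiprocessing can only pickle worker
   functions that are importable at module level. A generic, hypothetical sketch of this
   pattern (not MLAir's actual pre-processing code; the worker and station ids are
   placeholders) looks like this:

   .. code-block:: python

      import multiprocessing

      # like f_proc, the worker must live at module level so multiprocessing can pickle it
      def build_station(station):
          # stand-in for f_proc(DefaultDataHandler, station, ...): return the built
          # handler (or None if the build failed) together with the station
          handler = f"handler for {station}"  # placeholder object
          return handler, station

      if __name__ == "__main__":
          stations = ["DEBW107", "DEBW013"]  # example station ids
          with multiprocessing.Pool(processes=2) as pool:
              results = pool.map(build_station, stations)
          # keep only successfully built handlers
          collection = [dh for dh, station in results if dh is not None]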