mlair.data_handler.data_handler_mixed_sampling

Module Contents

Classes

DataHandlerMixedSamplingSingleStation

Parameter window_history_offset: used to shift t0 according to the specified value.

DataHandlerMixedSampling

Data handler using mixed sampling for input and target.

DataHandlerMixedSamplingWithFilterSingleStation

Parameter window_history_offset: used to shift t0 according to the specified value.

DataHandlerMixedSamplingWithFirFilterSingleStation

Parameter window_history_offset: used to shift t0 according to the specified value.

DataHandlerMixedSamplingWithFirFilter

Data handler using mixed sampling for input and target. Inputs are temporally filtered.

DataHandlerMixedSamplingWithClimateFirFilterSingleStation

Data handler for a single station to be used by a superior data handler. Inputs are FIR filtered.

DataHandlerMixedSamplingWithClimateFirFilter

Data handler using mixed sampling for input and target. Inputs are temporally filtered.

DataHandlerMixedSamplingWithClimateAndFirFilter

Data handler using mixed sampling for input and target. Inputs are temporally filtered.

DataHandlerIFSSingleStation

Data handler for a single station to be used by a superior data handler. Inputs are FIR filtered.

DataHandlerIFS

Data handler using mixed sampling for input and target. Inputs are temporally filtered.

Attributes

__author__

__date__

mlair.data_handler.data_handler_mixed_sampling.__author__ = Lukas Leufen
mlair.data_handler.data_handler_mixed_sampling.__date__ = 2020-11-05
class mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSamplingSingleStation(*args, **kwargs)

Bases: mlair.data_handler.data_handler_single_station.DataHandlerSingleStation

Parameters
  • window_history_offset – used to shift t0 according to the specified value.

  • window_history_end – used to set the last time step that is used to create a sample. A negative value indicates that not all values up to t0 are used; a positive value indicates that values at t > t0 are used. Default is 0.

static update_kwargs(parameter_name: str, default: Any, kwargs: dict)

Update a single element of kwargs inplace to be usable for inputs and targets.

The updated value in the kwargs dictionary is a tuple consisting of the value applicable to the inputs as first element and the target’s value as second element: (<value_input>, <value_target>). If the value for the given parameter_name is already a tuple, it is checked to have exactly two entries. If parameter_name is not included in kwargs, the given default value is used and applied to both elements of the updated tuple.

Parameters
  • parameter_name – name of the parameter that should be expanded into a two-element tuple

  • default – the default value to fill if parameter is not in kwargs

  • kwargs – the kwargs dictionary containing parameters
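The expansion described above can be sketched as a standalone function. This is a minimal illustration, not the MLAir implementation: the name `update_kwargs_sketch` and the example parameter names are hypothetical, and raising an error for a tuple of the wrong length is an assumption, since the text only says that the length is checked.

```python
from typing import Any


def update_kwargs_sketch(parameter_name: str, default: Any, kwargs: dict) -> None:
    """Expand kwargs[parameter_name] in place to (<value_input>, <value_target>)."""
    value = kwargs.get(parameter_name, default)
    if isinstance(value, tuple):
        # Assumption: a tuple that does not hold exactly two entries is an error.
        if len(value) != 2:
            raise ValueError(
                f"{parameter_name} must have exactly two entries, got {len(value)}"
            )
        kwargs[parameter_name] = value
    else:
        # A single value (or the default) applies to inputs and targets alike.
        kwargs[parameter_name] = (value, value)


# Hypothetical parameter names, chosen only for illustration.
kwargs = {"sampling": ("hourly", "daily"), "interpolation_limit": 3}
update_kwargs_sketch("sampling", "daily", kwargs)                # already a 2-tuple
update_kwargs_sketch("interpolation_limit", 0, kwargs)           # single value
update_kwargs_sketch("interpolation_method", "linear", kwargs)   # missing, default used
print(kwargs)
```

After these calls every entry is a two-element tuple, so downstream code can index `[0]` for the inputs and `[1]` for the targets without checking the type first.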

make_input_target(self)
load_and_interpolate(self, ind) → [xarray.DataArray, pandas.DataFrame]
set_inputs_and_targets(self)
setup_data_path(self, data_path, sampling)

Sets two paths instead of a single path. Expects the sampling argument to be a list with two entries.

_extract_lazy(self, lazy_data)
class mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSampling(id_class: data_handler, experiment_path: str, min_length: int = 0, extreme_values: num_or_list = None, extremes_on_right_tail_only: bool = False, name_affix=None, store_processed_data=True, iter_dim=DEFAULT_ITER_DIM, time_dim=DEFAULT_TIME_DIM, use_multiprocessing=True, max_number_multiprocessing=MAX_NUMBER_MULTIPROCESSING)

Bases: mlair.data_handler.DefaultDataHandler

Data handler using mixed sampling for input and target.

data_handler
data_handler_transformation
_requirements
class mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSamplingWithFilterSingleStation(*args, **kwargs)

Bases: DataHandlerMixedSamplingSingleStation, mlair.data_handler.data_handler_with_filter.DataHandlerFilterSingleStation

Parameters
  • window_history_offset – used to shift t0 according to the specified value.

  • window_history_end – used to set the last time step that is used to create a sample. A negative value indicates that not all values up to t0 are used; a positive value indicates that values at t > t0 are used. Default is 0.

_check_sampling(self, **kwargs)
abstract apply_filter(self)
abstract create_filter_index(self) → pandas.Index

Create name for filter dimension.

abstract _create_lazy_data(self)
make_input_target(self)

A FIR filter is applied on the input data that has hourly resolution. Labels Y are provided as aggregated values with daily resolution.
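To make the mixed sampling idea concrete, the following sketch aggregates an hourly series into daily values, using only the standard library. It is an illustration under assumptions: the helper `aggregate_daily` is hypothetical, the daily mean is just one possible aggregation statistic, and the FIR filtering of the inputs is left out entirely.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Two days of synthetic hourly data: (timestamp, value) pairs.
start = datetime(2020, 1, 1)
hourly = [(start + timedelta(hours=h), float(h)) for h in range(48)]


def aggregate_daily(series):
    """Group hourly values by calendar day and return the mean per day."""
    buckets = defaultdict(list)
    for timestamp, value in series:
        buckets[timestamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(buckets.items())}


# Inputs stay at hourly resolution, targets are aggregated to daily resolution.
hourly_inputs = hourly
daily_targets = aggregate_daily(hourly)
print(len(hourly_inputs), "hourly input values")
print(len(daily_targets), "daily target values")
```

The point of the sketch is only the difference in resolution: 48 input time steps correspond to 2 target time steps.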

abstract estimate_filter_width(self)

Return maximum filter width.

static _add_time_delta(date, delta)
update_start_end(self, ind)
load_and_interpolate(self, ind) → [xarray.DataArray, pandas.DataFrame]
_extract_lazy(self, lazy_data)
class mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSamplingWithFirFilterSingleStation(*args, **kwargs)

Bases: DataHandlerMixedSamplingWithFilterSingleStation, mlair.data_handler.data_handler_with_filter.DataHandlerFirFilterSingleStation

Parameters
  • window_history_offset – used to shift t0 according to the specified value.

  • window_history_end – used to set the last time step that is used to create a sample. A negative value indicates that not all values up to t0 are used; a positive value indicates that values at t > t0 are used. Default is 0.

estimate_filter_width(self)

Filter width is determined by the filter with the highest order.
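As a minimal sketch of this rule, assuming the filter orders are available as a single integer or a flat sequence of integers (the function name and the example orders are hypothetical):

```python
def estimate_filter_width_sketch(filter_order):
    """Return the largest filter order, used here as the overall filter width."""
    # Accept a single order as well as a list or tuple of orders.
    if isinstance(filter_order, (list, tuple)):
        orders = list(filter_order)
    else:
        orders = [filter_order]
    return max(orders)


print(estimate_filter_width_sketch([49, 1201, 241]))
print(estimate_filter_width_sketch(73))
```

The widest filter determines how much additional data is needed around the requested period, which is why only the maximum matters.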

apply_filter(self)

Apply FIR filter only on inputs.

create_filter_index(self, add_unfiltered_index=True) → pandas.Index

Create name for filter dimension.

_extract_lazy(self, lazy_data)
_create_lazy_data(self)
static _get_fs(**kwargs)

Return the sampling frequency in 1/day (not Hz).
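A frequency in 1/day is simply the number of samples per day. The sketch below assumes that `sampling` arrives as an (input, target) tuple, that the input sampling determines the frequency, and that only hourly and daily resolution need to be handled; the function name is hypothetical.

```python
def get_fs_sketch(**kwargs) -> int:
    """Return the sampling frequency of the inputs in samples per day."""
    # Assumption: the first entry of the sampling tuple belongs to the inputs.
    sampling = kwargs["sampling"][0]
    samples_per_day = {"daily": 1, "hourly": 24}
    if sampling not in samples_per_day:
        raise ValueError(f"Unknown sampling rate {sampling!r}.")
    return samples_per_day[sampling]


fs = get_fs_sketch(sampling=("hourly", "daily"))
# Converting to Hz would require dividing by the number of seconds per day.
fs_hz = fs / 86400
print(fs, fs_hz)
```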

class mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSamplingWithFirFilter(*args, use_filter_branches=False, **kwargs)

Bases: mlair.data_handler.data_handler_with_filter.DataHandlerFirFilter

Data handler using mixed sampling for input and target. Inputs are temporally filtered.

data_handler
data_handler_transformation
_requirements
class mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSamplingWithClimateFirFilterSingleStation(*args, **kwargs)

Bases: mlair.data_handler.data_handler_with_filter.DataHandlerClimateFirFilterSingleStation, DataHandlerMixedSamplingWithFirFilterSingleStation

Data handler for a single station to be used by a superior data handler. Inputs are FIR filtered. In contrast to the simple DataHandlerFirFilterSingleStation, this data handler is centered around t0 to have no time delay. For values in the future (t > t0), this data handler assumes a climatological value for the low pass data and values of 0 for all residuum components.

Parameters
  • apriori – Data to use as apriori information. This should be either an xarray DataArray containing monthly or any other heuristic to support the clim filter, or a list of such arrays containing heuristics for all residua in addition. The second option can be used together with apriori_type residuum_stats, which estimates the error of the residuum when the clim filter should be applied with exogenous parameters. If apriori_type is None/zeros, data can be provided, but this is not required in this case.

  • apriori_type – set type of information that is provided to the clim filter. For the first low pass always a calculated or given statistic is used. For residuum prediction a constant value of zero is assumed if apriori_type is None or zeros, and a climatology of the residuum is used for residuum_stats.

  • apriori_diurnal – use diurnal anomalies of each hour as addition to the apriori information type chosen by parameter apriori_type. This is only applicable for hourly resolution data.

  • apriori_sel_opts – specify some parameters to select a subset of data before calculating the apriori information. Use this parameter, for example, if apriori shall only be calculated on a shorter time period than available in the given data.

  • extend_length_opts – use this parameter to use future data in the filter calculation. This parameter does not affect the size of the history samples as this is handled by the window_history_size parameter. Example: set extend_length_opts=7*24 to use the observation of the next 7 days to calculate the filtered components. Which data are finally used for the input samples is not affected by these 7 days. In case the range of history sample exceeds the horizon of extend_length_opts, the history sample will also include data from climatological estimates.
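The idea of replacing unknown future values with a climatological estimate can be sketched with plain Python. This is a simplified illustration and not the clim filter itself: the helpers `monthly_climatology` and `value_for_filter` are hypothetical, a monthly mean stands in for the apriori information, and residuum components (assumed to be 0 for t > t0) are not modelled.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_climatology(series):
    """Return the mean value per calendar month over all available years."""
    by_month = defaultdict(list)
    for day, value in series:
        by_month[day.month].append(value)
    return {month: mean(values) for month, values in by_month.items()}


def value_for_filter(day, t0, observed, climatology):
    """Use the observation up to t0 and the monthly climatology afterwards."""
    if day <= t0 and day in observed:
        return observed[day]
    return climatology[day.month]


# Synthetic observations for illustration only.
observations = [
    (date(2019, 1, 10), 2.0), (date(2020, 1, 12), 4.0),
    (date(2019, 7, 10), 20.0), (date(2020, 7, 12), 24.0),
]
climatology = monthly_climatology(observations)
observed = dict(observations)
t0 = date(2020, 1, 31)

past = value_for_filter(date(2020, 1, 12), t0, observed, climatology)
future = value_for_filter(date(2020, 7, 12), t0, observed, climatology)
print(past, future)
```

With a non-zero extend_length_opts, the switch from observations to the climatological estimate would happen later than t0; the sketch keeps the switch at t0 for simplicity.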

_extract_lazy(self, lazy_data)
class mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSamplingWithClimateFirFilter(*args, data_handler_class_unfiltered: data_handler_unfiltered = None, filter_add_unfiltered: bool = DEFAULT_FILTER_ADD_UNFILTERED, **kwargs)

Bases: mlair.data_handler.data_handler_with_filter.DataHandlerClimateFirFilter

Data handler using mixed sampling for input and target. Inputs are temporally filtered.

data_handler
data_handler_transformation
data_handler_unfiltered
_requirements
DEFAULT_FILTER_ADD_UNFILTERED = False
_create_collection(self)
classmethod build(cls, station: str, **kwargs)

Return initialised class.

classmethod build_update_transformation(cls, kwargs_dict, dh_type='filtered')
classmethod transformation(cls, set_stations, tmp_path=None, dh_transformation=None, **kwargs)

### supported transformation methods

Currently supported methods are:

  • standardise (default, if method is not given)

  • centre

  • min_max

  • log

### mean and std estimation

Mean and std (depending on method) are estimated. For each station, mean and std are calculated and afterwards aggregated using the mean value over all station-wise metrics. This method is not exactly accurate, especially regarding the std calculation, but is therefore much faster. Furthermore, it is a weighted mean, weighted by the time series length / number of data itself - a longer time series has more influence on the transformation settings than a short time series. The estimation of the std is less accurate, because the mean of all stds is not equal to the true std, but still the mean of all station-wise stds is a decent estimate. Finally, the real accuracy of mean and std is less important, because it is “just” a transformation / scaling.

### mean and std given

If mean and std are not None, the default data handler expects these parameters to match the data and applies these values to the data. Make sure that all dimensions and/or coordinates are in agreement.

### min and max given

If min and max are not None, the default data handler expects these parameters to match the data and applies these values to the data. Make sure that all dimensions and/or coordinates are in agreement.
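The accuracy argument from the mean and std estimation section can be checked with a small numeric sketch. It assumes a length-weighted average of station-wise statistics; the station names, the data, and the helper `aggregate_weighted` are made up for illustration and do not reflect MLAir's internal code.

```python
from statistics import fmean, pstdev

# Synthetic station data: one long and one short time series.
stations = {
    "station_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "station_b": [10.0, 12.0],
}


def aggregate_weighted(stations):
    """Average station-wise mean and std, weighted by time series length."""
    total = sum(len(values) for values in stations.values())
    mean = sum(len(v) * fmean(v) for v in stations.values()) / total
    std = sum(len(v) * pstdev(v) for v in stations.values()) / total
    return mean, std


approx_mean, approx_std = aggregate_weighted(stations)

# Reference values computed on the pooled data of all stations.
pooled = [x for values in stations.values() for x in values]
true_mean, true_std = fmean(pooled), pstdev(pooled)

print(f"mean: estimate={approx_mean:.3f} true={true_mean:.3f}")
print(f"std:  estimate={approx_std:.3f} true={true_std:.3f}")
```

The length-weighted mean reproduces the pooled mean, while the averaged std ignores the spread between stations and therefore underestimates the pooled std in this example. For a scaling step this deviation is usually acceptable.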

class mlair.data_handler.data_handler_mixed_sampling.DataHandlerMixedSamplingWithClimateAndFirFilter(data_handler_class_chem, data_handler_class_meteo, data_handler_class_chem_unfiltered, data_handler_class_meteo_unfiltered, chem_vars, meteo_vars, *args, **kwargs)

Bases: DataHandlerMixedSamplingWithClimateFirFilter

Data handler using mixed sampling for input and target. Inputs are temporally filtered.

data_handler_climate_fir
data_handler_fir
data_handler_fir_pos
data_handler
data_handler_unfiltered
_requirements
chem_indicator = chem
meteo_indicator = meteo
classmethod _split_chem_and_meteo_variables(cls, **kwargs)

Select all used variables and split them into categories chem and other.

Chemical variables are indicated by cls.data_handler_climate_fir.chem_vars. To indicate used variables, this method uses 1) parameter variables, 2) keys from statistics_per_var, 3) keys from cls.data_handler_climate_fir.DEFAULT_VAR_ALL_DICT. Option 3) is also applied if 1) or 2) are given but None.
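The lookup order for the used variables can be sketched as follows. The constants `CHEM_VARS` and `DEFAULT_VAR_ALL_DICT` below are hypothetical stand-ins for the class attributes named above, and the function is a standalone illustration rather than the actual classmethod.

```python
# Hypothetical stand-ins for cls.data_handler_climate_fir.chem_vars and
# cls.data_handler_climate_fir.DEFAULT_VAR_ALL_DICT.
CHEM_VARS = ["o3", "no2"]
DEFAULT_VAR_ALL_DICT = {
    "o3": "dma8eu", "no2": "dma8eu", "temp": "maximum", "u": "average_values",
}


def split_chem_and_meteo_variables_sketch(**kwargs):
    """Return (chem, other) variable lists following the order 1), 2), 3)."""
    if "variables" in kwargs:                      # option 1
        variables = kwargs["variables"]
    elif "statistics_per_var" in kwargs:           # option 2
        stats = kwargs["statistics_per_var"]
        variables = None if stats is None else list(stats.keys())
    else:
        variables = None
    if variables is None:                          # option 3, also for None values
        variables = list(DEFAULT_VAR_ALL_DICT.keys())
    chem = sorted(set(variables).intersection(CHEM_VARS))
    other = sorted(set(variables).difference(CHEM_VARS))
    return chem, other


print(split_chem_and_meteo_variables_sketch(variables=["temp", "o3"]))
print(split_chem_and_meteo_variables_sketch(statistics_per_var={"no2": "dma8eu", "u": "average_values"}))
print(split_chem_and_meteo_variables_sketch(variables=None))
```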

classmethod build(cls, station: str, **kwargs)

Return initialised class.

classmethod correct_overwrite_option(cls, kwargs)

Set overwrite_local_data=False.

classmethod set_data_handler_fir_pos(cls, **kwargs)

Set position of fir data handler to use either faster FIR version or slower climate FIR.

This method will set data handler indicator to 0 if either no parameter “extend_length_opts” is given or the parameter is of type dict but has no entry for the meteo_indicator. In all other cases, indicator is set to 1.
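The decision rule reads as a small function. The sketch below returns the indicator instead of setting a class attribute, treats an explicit None like a missing parameter (an assumption), and uses a hypothetical constant for the meteo indicator.

```python
METEO_INDICATOR = "meteo"  # mirrors the meteo_indicator attribute listed above


def fir_position_sketch(**kwargs) -> int:
    """Return 0 for the faster FIR handler and 1 for the climate FIR handler."""
    opts = kwargs.get("extend_length_opts")
    if opts is None:
        # Parameter not given (or None, by assumption): use the faster FIR version.
        return 0
    if isinstance(opts, dict) and METEO_INDICATOR not in opts:
        # Options only concern other variable types, so meteo needs no climate FIR.
        return 0
    return 1


print(fir_position_sketch())
print(fir_position_sketch(extend_length_opts={"chem": 168}))
print(fir_position_sketch(extend_length_opts={"meteo": 168}))
print(fir_position_sketch(extend_length_opts=168))
```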

classmethod prepare_build(cls, kwargs, var_list, var_type)

Prepares for build of class.

The variables parameter is updated by var_list, which should only include variables of a specific type (e.g. only chemical variables) indicated by var_type. Furthermore, this method cleans the kwargs dictionary as follows: every parameter provided as a dict to separate between chem and meteo options (the dict must have keys from cls.chem_indicator and/or cls.meteo_indicator) is removed from kwargs, and its value related to var_type is added again. If there is no value for the given var_type, the parameter is not added at all (as this parameter is assumed to affect only other types of variables).
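The cleaning step can be sketched as below. Unlike the method described above, this hypothetical standalone function returns a cleaned copy instead of modifying kwargs, and the parameter names in the example are chosen for illustration only.

```python
CHEM_INDICATOR, METEO_INDICATOR = "chem", "meteo"


def prepare_build_sketch(kwargs: dict, var_list: list, var_type: str) -> dict:
    """Return a copy of kwargs that is reduced to the options of var_type."""
    cleaned = dict(kwargs)
    cleaned["variables"] = var_list
    for name, value in kwargs.items():
        # A parameter counts as split if it is a dict keyed by chem and/or meteo.
        is_split = isinstance(value, dict) and bool(
            set(value).intersection({CHEM_INDICATOR, METEO_INDICATOR})
        )
        if is_split:
            del cleaned[name]
            if var_type in value:
                # Re-add only the value that belongs to the requested type.
                cleaned[name] = value[var_type]
    return cleaned


kwargs = {
    "variables": ["o3", "temp"],
    "extend_length_opts": {"meteo": 168},          # no chem entry
    "filter_order": {"chem": [49], "meteo": [25]},
    "sampling": ("hourly", "daily"),               # not split, kept as is
}
chem_kwargs = prepare_build_sketch(kwargs, ["o3"], CHEM_INDICATOR)
meteo_kwargs = prepare_build_sketch(kwargs, ["temp"], METEO_INDICATOR)
print(chem_kwargs)
print(meteo_kwargs)
```

In the chem case, extend_length_opts disappears completely because it only holds a meteo entry, which matches the rule that such a parameter is assumed to affect only other variable types.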

_create_collection(self)
classmethod transformation(cls, set_stations, tmp_path=None, **kwargs)

### supported transformation methods

Currently supported methods are:

  • standardise (default, if method is not given)

  • centre

  • min_max

  • log

### mean and std estimation

Mean and std (depending on method) are estimated. For each station, mean and std are calculated and afterwards aggregated using the mean value over all station-wise metrics. This method is not exactly accurate, especially regarding the std calculation, but is therefore much faster. Furthermore, it is a weighted mean, weighted by the time series length / number of data itself - a longer time series has more influence on the transformation settings than a short time series. The estimation of the std is less accurate, because the mean of all stds is not equal to the true std, but still the mean of all station-wise stds is a decent estimate. Finally, the real accuracy of mean and std is less important, because it is “just” a transformation / scaling.

### mean and std given

If mean and std are not None, the default data handler expects these parameters to match the data and applies these values to the data. Make sure that all dimensions and/or coordinates are in agreement.

### min and max given

If min and max are not None, the default data handler expects these parameters to match the data and applies these values to the data. Make sure that all dimensions and/or coordinates are in agreement.

class mlair.data_handler.data_handler_mixed_sampling.DataHandlerIFSSingleStation(*args, **kwargs)

Bases: DataHandlerMixedSamplingWithClimateFirFilterSingleStation

Data handler for a single station to be used by a superior data handler. Inputs are FIR filtered. In contrast to the simple DataHandlerFirFilterSingleStation, this data handler is centered around t0 to have no time delay. For values in the future (t > t0), this data handler assumes a climatological value for the low pass data and values of 0 for all residuum components.

Parameters
  • apriori – Data to use as apriori information. This should be either an xarray DataArray containing monthly or any other heuristic to support the clim filter, or a list of such arrays containing heuristics for all residua in addition. The second option can be used together with apriori_type residuum_stats, which estimates the error of the residuum when the clim filter should be applied with exogenous parameters. If apriori_type is None/zeros, data can be provided, but this is not required in this case.

  • apriori_type – set type of information that is provided to the clim filter. For the first low pass always a calculated or given statistic is used. For residuum prediction a constant value of zero is assumed if apriori_type is None or zeros, and a climatology of the residuum is used for residuum_stats.

  • apriori_diurnal – use diurnal anomalies of each hour as addition to the apriori information type chosen by parameter apriori_type. This is only applicable for hourly resolution data.

  • apriori_sel_opts – specify some parameters to select a subset of data before calculating the apriori information. Use this parameter, for example, if apriori shall only be calculated on a shorter time period than available in the given data.

  • extend_length_opts – use this parameter to use future data in the filter calculation. This parameter does not affect the size of the history samples as this is handled by the window_history_size parameter. Example: set extend_length_opts=7*24 to use the observation of the next 7 days to calculate the filtered components. Which data are finally used for the input samples is not affected by these 7 days. In case the range of history sample exceeds the horizon of extend_length_opts, the history sample will also include data from climatological estimates.

load_and_interpolate(self, ind) → [xarray.DataArray, pandas.DataFrame]
make_input_target(self)

A FIR filter is applied on the input data that has hourly resolution. Labels Y are provided as aggregated values with daily resolution.

class mlair.data_handler.data_handler_mixed_sampling.DataHandlerIFS(data_handler_class_chem, data_handler_class_meteo, data_handler_class_chem_unfiltered, data_handler_class_meteo_unfiltered, chem_vars, meteo_vars, *args, **kwargs)

Bases: DataHandlerMixedSamplingWithClimateAndFirFilter

Data handler using mixed sampling for input and target. Inputs are temporally filtered.

data_handler_fir
classmethod set_data_handler_fir_pos(cls, **kwargs)

Set position of fir data handler to always use climate FIR.