mlair.helpers.filter

Module Contents

Classes

FIRFilter

ClimateFIRFilter

KolmogorovZurbenkoBaseClass

KolmogorovZurbenkoFilterMovingWindow

Functions

fir_filter(data, fs, order=5, cutoff_low=None, cutoff_high=None, window='hamming', dim='variables', h=None, causal=True, padlen=None)

Expects xarray input.

fir_filter_convolve(data, h)

firwin_kzf(m: int, k: int) → numpy.array

Calculate the window weights for the Kolmogorov Zurbenko filter.

omega_null_kzf(m: int, k: int, alpha: float = 0.5) → float

filter_width_kzf(m: int, k: int) → int

Return the window width of the Kolmogorov Zurbenko filter.

class mlair.helpers.filter.FIRFilter(data, fs, order, cutoff, window, var_dim, time_dim, display_name=None, minimum_length=None, extend_end=0, plot_path=None, plot_dates=None, offset=0)
run(self)
create_visualization(self, filtered, filter_input_data, plot_dates, time_dim, sampling, h, minimum_length, order, i, extend_end, var_dim)
property filter_coefficients(self)
property filtered_data(self)
fir_filter(self, data, fs, cutoff_high, order, sampling='1d', time_dim='datetime', var_dim='variables', window: Union[str, Tuple] = 'hamming', minimum_length=None, new_dim='window', plot_dates=None, display_name=None)
static _calculate_filter_coefficients(window: Union[str, tuple], order: Union[int, tuple], cutoff_high: float, fs: float) → numpy.array

Calculate filter coefficients for a moving window, using scipy’s signal package for common window types and the local method firwin_kzf for the Kolmogorov Zurbenko filter (kzf). The resulting filter is a low-pass filter.

Parameters
  • window – name of the window type, given either as a string with the window’s name or as a tuple containing the name followed by window parameters (e.g. (“kaiser”, 5))

  • order – order of the filter as int, or the parameters m and k of the kzf

  • cutoff_high – cutoff frequency of the low-pass filter, in the same frequency units as fs

  • fs – sampling frequency of time series
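A minimal sketch of how such coefficients could be computed, assuming that the Kolmogorov Zurbenko filter is selected by the window name "kzf" and that order is passed to scipy.signal.firwin as the number of taps. The function name lowpass_coefficients is hypothetical and this is not the mlair implementation.

```python
import numpy as np
from scipy import signal


def lowpass_coefficients(window, order, cutoff_high, fs):
    """Hypothetical sketch of low-pass FIR coefficients (not the mlair code)."""
    name = window[0] if isinstance(window, tuple) else window
    if name == "kzf":  # assumption: KZ filter selected by this window name
        m, k = order  # order carries the KZ parameters (m, k)
        h = np.ones(1)
        for _ in range(k):
            # k passes of an m-point moving average = k-fold boxcar convolution
            h = np.convolve(h, np.ones(m) / m)
        return h
    # Windowed-sinc design; window may be "hamming" or a tuple like ("kaiser", 5).
    return signal.firwin(order, cutoff_high, window=window, fs=fs)


# Daily data (fs=1), 21-day cutoff period, 121 taps, Kaiser window with beta=5.
h = lowpass_coefficients(("kaiser", 5), 121, 1 / 21, fs=1)
h_kz = lowpass_coefficients("kzf", (3, 2), None, None)
```

With firwin's default scale=True, the coefficients of the windowed-sinc branch sum to one, i.e. the filter has unit gain at zero frequency.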

static _create_full_filter_result_array(template_array: xarray.DataArray, result_array: xarray.DataArray, new_dim: str, display_name: str = None) → xarray.DataArray

Create a result filter array with the same shape as the given template data (which should be the original input data before filtering). All gaps are filled with NaN.

Parameters
  • template_array – this array is used as template for shape and ordering of dims

  • result_array – array with data that are filled into template

  • new_dim – new dimension, which is moved to the end if already present or appended at the end otherwise

  • display_name – string that is attached to logging (default None)
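A sketch of the fill-into-template idea with xarray, assuming that all shared dimensions carry coordinates: xarray.align with join="left" reindexes the result onto the template's coordinates (missing labels become NaN), and new_dim is then placed last. fill_into_template is a hypothetical name, not part of mlair.

```python
import numpy as np
import pandas as pd
import xarray as xr


def fill_into_template(template, result, new_dim):
    """Hypothetical sketch: put `result` onto the coordinates of `template`."""
    # join="left" reindexes result onto the template's coordinates (gaps -> NaN)
    _, aligned = xr.align(template, result, join="left")
    # keep the template's dimension order and place new_dim last
    dims = [d for d in template.dims if d in aligned.dims] + [new_dim]
    return aligned.transpose(*dims)


t = pd.date_range("2020-01-01", periods=5, freq="D")
template = xr.DataArray(np.zeros(5), coords={"datetime": t}, dims=["datetime"])
result = xr.DataArray(np.ones((2, 3)), coords={"window": [0, 1], "datetime": t[1:4]},
                      dims=["window", "datetime"])
out = fill_into_template(template, result, "window")
```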

class mlair.helpers.filter.ClimateFIRFilter(data, fs, order, cutoff, window, time_dim, var_dim, apriori=None, apriori_type=None, apriori_diurnal=False, sel_opts=None, plot_path=None, minimum_length=None, new_dim=None, display_name=None, extend_length_opts: int = 0, extend_end: Union[dict, int] = 0, plot_dates=None, offset: int = 0)

Bases: FIRFilter

run(self)
_check_sel_opts(self)
static _next_order(order: list, minimum_length: Union[int, None], pos: int, window: Union[str, tuple]) → int
static create_monthly_unity_array(data: xarray.DataArray, time_dim: str, extend_range: int = 366) → xarray.DataArray

Create an xarray data array filled with ones in monthly resolution (set on the 16th of each month). Data are extended by extend_range days into the future and the past along time_dim.

Parameters
  • data – data to create monthly unity array from, must contain dimension time_dim

  • time_dim – name of temporal dimension

  • extend_range – number of days to extend data (default 366)

Returns

xarray in monthly resolution (centered at 16th day of month) with all values equal to 1
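A sketch of such a unity array, assuming it only needs the temporal dimension (the documented method may keep further dimensions of data). monthly_unity_array is a hypothetical name.

```python
import numpy as np
import pandas as pd
import xarray as xr


def monthly_unity_array(data, time_dim, extend_range=366):
    """Hypothetical sketch: ones on the 16th of every month of the extended range."""
    t = pd.to_datetime(data[time_dim].values)
    start = t.min() - pd.Timedelta(days=extend_range)
    end = t.max() + pd.Timedelta(days=extend_range)
    first = start.to_period("M").to_timestamp()  # first day of the start month
    months = pd.date_range(first, end, freq="MS") + pd.Timedelta(days=15)
    return xr.DataArray(np.ones(len(months)), coords={time_dim: months}, dims=[time_dim])


days = pd.date_range("2020-01-01", "2020-12-31", freq="D")
data = xr.DataArray(np.zeros(len(days)), coords={"datetime": days}, dims=["datetime"])
unity = monthly_unity_array(data, "datetime", extend_range=0)
```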

create_monthly_mean(self, data: xarray.DataArray, time_dim: str, sel_opts: dict = None, sampling: str = '1d') → xarray.DataArray

Calculate monthly means (12 values) and return a data array with the same resolution as the given data containing these monthly mean values. Sampling points are the 16th of each month (where the value equals the true monthly mean), and all values between two sampling points are interpolated linearly. The sel_opts parameter allows a pre-selection so that only a subset of the given data is used; only data from this subset enter the monthly statistic.

Parameters
  • data – data to apply statistical calculation on

  • time_dim – name of temporal axis

  • sel_opts – selection options as dict to select a subset of data (default None). For example, sel_opts={<time_dim>: "2006"} derives the monthly means only from data of the year 2006.

  • sampling – sampling of the returned data (default 1d)

Returns

array in the desired resolution containing interpolated monthly values. Months without valid data are returned as np.nan, which also affects data in the neighbouring months (before / after the sampling points on the 16th of each month).
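The steps described above can be sketched with xarray as follows. This is an illustration under assumptions, not the mlair implementation: monthly_mean_climatology is a hypothetical name, one extra month is padded on each side so that the interpolation covers the whole data period, and the default sampling is written as "1D" (the canonical pandas alias) instead of "1d".

```python
import numpy as np
import pandas as pd
import xarray as xr


def monthly_mean_climatology(data, time_dim, sel_opts=None, sampling="1D"):
    """Hypothetical sketch: 12 monthly means, placed on the 16th, interpolated."""
    subset = data.sel(sel_opts) if sel_opts else data  # optional pre-selection
    means = (subset.groupby(f"{time_dim}.month").mean(time_dim)
             .reindex(month=np.arange(1, 13)))  # months without data -> NaN
    t = pd.to_datetime(data[time_dim].values)
    # anchors on the 16th, padded by one month on each side of the data period
    first = (t.min() - pd.DateOffset(months=1)).to_period("M").to_timestamp()
    anchors = pd.date_range(first, t.max() + pd.DateOffset(months=1), freq="MS")
    anchors = anchors + pd.Timedelta(days=15)
    monthly = (means.sel(month=anchors.month.values)
               .rename({"month": time_dim})
               .assign_coords({time_dim: anchors}))
    # linear interpolation between anchors, cut back to the data period
    out = monthly.resample({time_dim: sampling}).interpolate("linear")
    return out.sel({time_dim: slice(t.min(), t.max())})


days = pd.date_range("2020-01-01", "2020-12-31", freq="D")
values = xr.DataArray(np.asarray(days.month, dtype=float),
                      coords={"datetime": days}, dims=["datetime"])
clim = monthly_mean_climatology(values, "datetime")
```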

static _compute_hourly_mean_per_month(data: xarray.DataArray, time_dim: str, as_anomaly: bool) → Dict[int, xarray.DataArray]

Calculate a separate mean value for each hour of each month (12 x 24 values in total). The average is either the anomaly from the monthly mean state or the raw mean value.

Parameters
  • data – data to calculate averages on

  • time_dim – name of temporal dimension

  • as_anomaly – indicates whether to calculate means as anomaly of a monthly mean or as raw mean values.

Returns

dictionary containing 12 months each with a 24-valued array (1 entry for each hour)
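A sketch of this 12 x 24 statistic, assuming anomalies are taken relative to the plain mean of each calendar month; hourly_mean_per_month is a hypothetical name.

```python
import numpy as np
import pandas as pd
import xarray as xr


def hourly_mean_per_month(data, time_dim, as_anomaly):
    """Hypothetical sketch: {month: 24 hourly means} as raw values or anomalies."""
    means = {}
    for month, group in data.groupby(f"{time_dim}.month"):
        hourly = group.groupby(f"{time_dim}.hour").mean(time_dim)  # 24 values
        if as_anomaly:
            # anomaly with respect to the overall mean of that month
            hourly = hourly - group.mean(time_dim)
        means[int(month)] = hourly
    return means


t = pd.date_range("2020-01-01", "2020-02-29 23:00", freq="60min")
da = xr.DataArray(np.asarray(t.hour + 100 * t.month, dtype=float),
                  coords={"datetime": t}, dims=["datetime"])
raw = hourly_mean_per_month(da, "datetime", as_anomaly=False)
anom = hourly_mean_per_month(da, "datetime", as_anomaly=True)
```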

static _create_seasonal_cycle_of_single_hour_mean(result_arr: xarray.DataArray, means: Dict[int, xarray.DataArray], hour: int, time_dim: str, sampling: str) → xarray.DataArray

Use the monthly means of a given hour to create an array with interpolated values at this hour for each day of the full time span covered by result_arr.

Parameters
  • result_arr – template array indicating the full time range and additional dimensions to keep

  • means – dictionary containing 24 hourly averages for each month (12 x 24 values in total)

  • hour – hour of interest as integer

  • time_dim – name of temporal dimension

  • sampling – sampling rate to interpolate

Returns

array with interpolated averages in sampling resolution containing only values for hour of interest

create_seasonal_hourly_mean(self, data: xarray.DataArray, time_dim: str, sel_opts: Dict[str, Any] = None, sampling: str = '1H', as_anomaly: bool = True) → xarray.DataArray

Compute climatological statistics on an hourly basis, either as raw data or as anomalies. For each month, an overall mean value (only used if anomalies are required) and the mean of each hour are calculated. The climatological diurnal cycle is positioned on the 16th of each month and interpolated in between, using a separate interpolation for each hour of the day. The returned array therefore contains data with a yearly cycle (if anomalies are not calculated) or data without a yearly cycle (if using anomalies). In both cases, the data have an amplitude that varies over the year.

Parameters
  • data – data to apply this method to

  • time_dim – name of temporal axis

  • sel_opts – specific selection options that are applied before calculation of climatological statistics (default None)

  • sampling – temporal resolution of data (default “1H”)

  • as_anomaly – specify whether to use anomalies or raw data including a seasonal cycle of the mean value (default: True)

Returns

climatological statistics for given data interpolated with given sampling rate

static extend_apriori(data: xarray.DataArray, apriori: xarray.DataArray, time_dim: str, sampling: str = '1d', display_name: str = None) → xarray.DataArray

Extend the time range of apriori information to span a period at least as long as that of data. This method may not work properly if apriori contains data from less than one year.

Parameters
  • data – data whose time range the apriori should span at minimum

  • apriori – data that is adjusted. It is assumed that these data vary in the course of the year but are the same for the same day in different years. Otherwise this method will introduce unintended artefacts in the apriori data.

  • time_dim – name of temporal dimension

  • sampling – sampling of data (e.g. “1m”, “1d”, default “1d”)

  • display_name – name to use for logging message (default None)

Returns

array with adjusted temporal coverage derived from apriori

static get_forecast_run_delta(data, time_dim)
combine_observation_and_apriori(self, data: xarray.DataArray, apriori: xarray.DataArray, time_dim: str, new_dim: str, extend_length_history: int, extend_length_future: int, extend_length_separator: int = 0, forecasts: xarray.DataArray = None, sampling: str = '1H', extend_end: int = 0, offset: int = 0) → xarray.DataArray

Combine historical data / observations (“data”) and climatological statistics (“apriori”). Historical data are used on the interval [t0 - extend_length_history, t0] and apriori is used on [t0 + 1, t0 + extend_length_future]. If indicated by extend_length_separator, the end of the history interval and the start of the apriori interval are shifted by the given number of time steps.

Parameters
  • data – historical data for past values, must contain dimensions time_dim and var_dim and might also have a new_dim dimension

  • apriori – climatological estimate for future values, must contain dimensions time_dim and var_dim, but can also have dimension new_dim

  • time_dim – name of temporal dimension

  • new_dim – name of the new dimension along which data are combined

  • extend_length_history – number of time steps to use from data

  • extend_length_future – number of time steps to use from apriori (minus 1)

  • extend_length_separator – position of last history value to use (default 0), this position indicates the last value that is used from data (followed by values from apriori). In other words, end of history interval and start of apriori interval are shifted by this value from t0 (positive or negative).

Returns

combined data array
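For a single reference time t0 and a one-dimensional series, the index bookkeeping described above can be sketched with NumPy. The sketch follows the interval definitions given in the description ([t0 - extend_length_history, t0 + extend_length_separator] from data, the remaining steps up to t0 + extend_length_future from apriori) and ignores forecasts, sampling, extend_end and offset; combine_history_and_apriori is a hypothetical name.

```python
import numpy as np


def combine_history_and_apriori(obs, apriori, t0, extend_length_history,
                                extend_length_future, extend_length_separator=0):
    """Hypothetical 1-D sketch of the history/apriori split around index t0."""
    # last index taken from the observations, shifted from t0 by the separator
    cut = t0 + extend_length_separator
    past = obs[t0 - extend_length_history:cut + 1]
    future = apriori[cut + 1:t0 + extend_length_future + 1]
    return np.concatenate([past, future])


obs = np.arange(20.0)
apriori = -np.arange(20.0)
combined = combine_history_and_apriori(obs, apriori, t0=10,
                                       extend_length_history=3, extend_length_future=2)
shifted = combine_history_and_apriori(obs, apriori, t0=10, extend_length_history=3,
                                      extend_length_future=2, extend_length_separator=-1)
```

The total length stays extend_length_history + extend_length_future + 1; the separator only moves the boundary between observed and apriori values.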

static create_full_time_dim(data, dim, freq)

Ensure that the time dimension is equidistant. Dates can be missing, e.g. if missing values have been dropped.
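One way to make the time axis equidistant is to reindex onto a complete pandas.date_range. A sketch assuming dim holds datetime values; full_time_axis is a hypothetical name.

```python
import numpy as np
import pandas as pd
import xarray as xr


def full_time_axis(data, dim, freq):
    """Hypothetical sketch: reindex `dim` onto a gap-free date range."""
    t = pd.to_datetime(data[dim].values)
    full = pd.date_range(t.min(), t.max(), freq=freq)
    return data.reindex({dim: full})  # dates that were dropped come back as NaN


dates = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04"])
gappy = xr.DataArray([1.0, 2.0, 4.0], coords={"datetime": dates}, dims=["datetime"])
complete = full_time_axis(gappy, "datetime", "D")
```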

create_pseudo_timeseries(self, data, time_dim, sampling, window_dim)
create_visualization(self, filtered, data, filter_input_data, plot_dates, time_dim, new_dim, sampling, extend_length_history, extend_length_future, minimum_length, h, variable_name, extend_length_opts=None, extend_end=None, offset=None, forecast=None)
static _get_year_interval(data: xarray.DataArray, time_dim: str) → Tuple[int, int]

Get year of start and end date of given data.

Parameters
  • data – data to extract dates from

  • time_dim – name of temporal axis

Returns

two-element tuple with start and end

static _calculate_filter_coefficients(window: Union[str, tuple], order: Union[int, tuple], cutoff_high: float, fs: float) → numpy.array

Calculate filter coefficients for a moving window, using scipy’s signal package for common window types and the local method firwin_kzf for the Kolmogorov Zurbenko filter (kzf). The resulting filter is a low-pass filter.

Parameters
  • window – name of the window type, given either as a string with the window’s name or as a tuple containing the name followed by window parameters (e.g. (“kaiser”, 5))

  • order – order of the filter as int, or the parameters m and k of the kzf

  • cutoff_high – cutoff frequency of the low-pass filter, in the same frequency units as fs

  • fs – sampling frequency of time series

static _trim_data_to_minimum_length(data: xarray.DataArray, extend_length_history: int, dim: str, extend_length_future: int = 0, offset: int = 0) → xarray.DataArray

Trim data along the given axis, starting at either -minimum_length (if given) or -extend_length_history and ending at extend_length_future (default 0).

Parameters
  • data – data to trim

  • extend_length_history – start number for trim range, only used if parameter minimum_length is not provided

  • dim – dim to apply trim on

  • extend_length_future – end of the trim range in the “future” direction (default 0)

Returns

trimmed data

static _create_full_filter_result_array(template_array: xarray.DataArray, result_array: xarray.DataArray, new_dim: str, display_name: str = None) → xarray.DataArray

Create a result filter array with the same shape as the given template data (which should be the original input data before filtering). All gaps are filled with NaN.

Parameters
  • template_array – this array is used as template for shape and ordering of dims

  • result_array – array with data that are filled into template

  • new_dim – new dimension, which is moved to the end if already present or appended at the end otherwise

  • display_name – string that is attached to logging (default None)

clim_filter(self, data, fs, cutoff_high, order, apriori=None, sel_opts=None, sampling='1d', time_dim='datetime', var_dim='variables', window: Union[str, Tuple] = 'hamming', minimum_length=0, next_order=0, new_dim='window', plot_dates=None, display_name=None, extend_opts: int = 0, extend_end: int = 0, forecasts=None, offset: int = 0)
static _create_time_range_extend(year: int, sampling: str, extend_length: int) → slice

Create a slice object for given year plus extend_length in sampling resolution.

Parameters
  • year – year to create time range for

  • sampling – sampling of time range

  • extend_length – number of time steps to extend out of given year

Returns

slice object with time range
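A sketch assuming that sampling is a pandas timedelta string such as "1d" or "1h" and that the year spans 1 January to 31 December (at midnight); time_range_extend is a hypothetical name.

```python
import pandas as pd


def time_range_extend(year, sampling, extend_length):
    """Hypothetical sketch: the given year plus extend_length steps on both sides."""
    step = pd.to_timedelta(sampling)  # e.g. "1d" or "1h"
    start = pd.Timestamp(year=year, month=1, day=1) - extend_length * step
    end = pd.Timestamp(year=year, month=12, day=31) + extend_length * step
    return slice(start, end)


daily = time_range_extend(2020, "1d", 10)
hourly = time_range_extend(2020, "1h", 24)
```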

static _create_tmp_dimension(data: xarray.DataArray) → str

Create a tmp dimension, preferably named ‘window’. If this name is already one of the dimensions, the tmp dimension name is multiplied by itself until it is not present in dims. The method raises a ValueError after 10 tries.

Parameters

data – data array for which a new tmp dimension with a unique name is created

Returns

valid name for a tmp dimension (preferably ‘window’)
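A sketch of the naming rule described above, assuming "multiplied by itself" means the name is concatenated with itself on each try; tmp_dimension_name is a hypothetical name.

```python
import numpy as np
import xarray as xr


def tmp_dimension_name(data, name="window", max_tries=10):
    """Hypothetical sketch: find an unused dimension name, preferably `name`."""
    candidate = name
    for _ in range(max_tries):
        if candidate not in data.dims:
            return candidate
        candidate += candidate  # "window" -> "windowwindow" -> ...
    raise ValueError(f"could not find an unused dimension name after {max_tries} tries")


free = tmp_dimension_name(xr.DataArray(np.zeros(2), dims=["x"]))
taken = tmp_dimension_name(xr.DataArray(np.zeros((2, 3)), dims=["window", "x"]))
```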

_shift_data(self, data: xarray.DataArray, index_value: range, time_dim: str, new_dim: str) → xarray.DataArray

Shift data multiple times to create history or future along dimension new_dim for each time step.

Parameters
  • data – data set to shift

  • index_value – range of integers to span history and/or future

  • time_dim – name of temporal dimension that should be shifted

  • new_dim – name of the dimension created by the data shift

Returns

shifted data
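A sketch using xarray's shift and concat, assuming that a negative index refers to past values (so index -1 holds the value of the previous time step) and that index_value becomes the coordinate of new_dim; shift_along_new_dim is a hypothetical name.

```python
import numpy as np
import pandas as pd
import xarray as xr


def shift_along_new_dim(data, index_value, time_dim, new_dim):
    """Hypothetical sketch: stack shifted copies of `data` along `new_dim`."""
    # shifting by -i puts the value of time step t + i at position t
    shifted = [data.shift({time_dim: -i}) for i in index_value]
    return xr.concat(shifted, dim=pd.Index(list(index_value), name=new_dim))


series = xr.DataArray(np.arange(5.0), coords={"t": np.arange(5)}, dims=["t"])
stacked = shift_along_new_dim(series, range(-1, 2), "t", "window")
```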

static create_index_array(index_name: str, index_value: range)

Create index array from a range object to use as index of a data array.

Parameters
  • index_name – name of the index dimension

  • index_value – range of values to use as indexes

Returns

index array for given range of values

property apriori_data(self)
property initial_apriori_data(self)
mlair.helpers.filter.fir_filter(data, fs, order=5, cutoff_low=None, cutoff_high=None, window='hamming', dim='variables', h=None, causal=True, padlen=None)

Expects xarray input.
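The documented function expects xarray input; the sketch below only illustrates the low-pass case on a plain NumPy array. It assumes that causal=True means one-directional filtering (scipy.signal.lfilter) and causal=False means zero-phase forward-backward filtering (scipy.signal.filtfilt, which is where padlen would apply). The name fir_lowpass is hypothetical; cutoff_low and the h argument are not covered.

```python
import numpy as np
from scipy import signal


def fir_lowpass(x, fs, order, cutoff_high, window="hamming", causal=True, padlen=None):
    """Hypothetical sketch of low-pass FIR filtering for a 1-D NumPy array."""
    h = signal.firwin(order, cutoff_high, window=window, fs=fs)
    if causal:
        # one-directional filtering: output at t only uses inputs up to t
        return signal.lfilter(h, 1.0, x)
    # forward-backward filtering: zero phase shift, needs len(x) > padlen
    return signal.filtfilt(h, 1.0, x, padlen=padlen)


x = np.ones(300)
y_causal = fir_lowpass(x, fs=1.0, order=31, cutoff_high=0.1)
y_zero_phase = fir_lowpass(x, fs=1.0, order=31, cutoff_high=0.1, causal=False)
```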

mlair.helpers.filter.fir_filter_convolve(data, h)
class mlair.helpers.filter.KolmogorovZurbenkoBaseClass(df, wl, itr, is_child=False, filter_dim='window')
set_child(self)
kz_filter(self, df, m, k)
spectral_calc(self)
static subtract(minuend, subtrahend)
run(self)
transfer_function(self)
omega_null(self, alpha=0.5)
period_null(self, alpha=0.5)
period_null_days(self, alpha=0.5)
plot_transfer_function(self, fig=None, name=None)
class mlair.helpers.filter.KolmogorovZurbenkoFilterMovingWindow(df, wl: Union[list, int], itr: Union[list, int], is_child=False, filter_dim='window', method='mean', percentile=0.5)

Bases: KolmogorovZurbenkoBaseClass

set_child(self)
kz_filter_new(self, df, wl, itr)

Apply the Kolmogorov Zurbenko filter, which passes the low-frequency part of the time series.

If the filter method is mean, max or min, this method calls construct and rechunk before the actual calculation to improve performance. If the filter method is median or percentile, this approach is not applicable and, depending on the data and window size, this method can become slow.

Parameters
  • wl (int) – window length

  • itr (int) – number of iterations
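A sketch of the construct-based path for the mean, max and min methods, assuming centred windows and skipping of NaNs at the series edges. kz_filter_moving_window is a hypothetical name; dask rechunking as well as the median/percentile methods are left out.

```python
import numpy as np
import xarray as xr


def kz_filter_moving_window(df, wl, itr, dim, method="mean"):
    """Hypothetical sketch: itr passes of a centred moving window of length wl."""
    if method not in ("mean", "max", "min"):
        raise ValueError(f"method {method!r} is not covered by this sketch")
    out = df
    for _ in range(itr):
        # materialise each centred window as an extra dimension ...
        windows = out.rolling({dim: wl}, center=True).construct("kz_window")
        # ... and reduce over it, skipping the NaN padding at the edges
        out = getattr(windows, method)("kz_window", skipna=True)
    return out


series = xr.DataArray(np.arange(7.0), dims=["t"])
smooth = kz_filter_moving_window(series, 3, 1, "t")
smooth_twice = kz_filter_moving_window(series, 3, 2, "t")
```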

kz_filter(self, df, wl, itr)

Apply the Kolmogorov Zurbenko filter, which passes the low-frequency part of the time series.

Parameters
  • wl (int) – window length

  • itr (int) – number of iterations

mlair.helpers.filter.firwin_kzf(m: int, k: int) → numpy.array

Calculate the window weights for the Kolmogorov Zurbenko filter.
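A sketch of these weights, assuming the usual definition of the Kolmogorov Zurbenko filter as k iterations of an m-point moving average: the combined window is then the k-fold convolution of an m-point boxcar. kz_weights is a hypothetical name.

```python
import numpy as np


def kz_weights(m, k):
    """Hypothetical sketch: weights of k passes of an m-point moving average."""
    weights = np.ones(1)
    for _ in range(k):
        weights = np.convolve(weights, np.ones(m) / m)
    return weights  # length k * (m - 1) + 1, sums to 1


w = kz_weights(3, 2)  # [1, 2, 3, 2, 1] / 9
```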

mlair.helpers.filter.omega_null_kzf(m: int, k: int, alpha: float = 0.5) → float
mlair.helpers.filter.filter_width_kzf(m: int, k: int) → int

Return the window width of the Kolmogorov Zurbenko filter.
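If the filter is taken as k passes of an m-point moving average (an assumption), each pass widens the support by m - 1 points, so the total width would be k(m - 1) + 1. kz_width is a hypothetical name.

```python
def kz_width(m, k):
    """Hypothetical sketch: support of k passes of an m-point moving average."""
    return k * (m - 1) + 1


width = kz_width(29, 3)  # 85
```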