`mlair.helpers.filter`¶

Module Contents¶

Classes¶

`FIRFilter`
`ClimateFIRFilter`
`KolmogorovZurbenkoBaseClass`
`KolmogorovZurbenkoFilterMovingWindow`

Functions¶

`fir_filter`(data, fs, order=5, cutoff_low=None, cutoff_high=None, window='hamming', dim='variables', h=None, causal=True, padlen=None)	Expects xarray.
`fir_filter_convolve`(data, h)
`firwin_kzf`(m: int, k: int) → numpy.array	Calculate weights of window for Kolmogorov Zurbenko filter.
`omega_null_kzf`(m: int, k: int, alpha: float = 0.5) → float
`filter_width_kzf`(m: int, k: int) → int	Returns window width of the Kolmogorov Zurbenko filter.

class mlair.helpers.filter.FIRFilter(data, fs, order, cutoff, window, var_dim, time_dim, display_name=None, minimum_length=None, extend_end=0, plot_path=None, plot_dates=None, offset=0)¶

run(self)¶

create_visualization(self, filtered, filter_input_data, plot_dates, time_dim, sampling, h, minimum_length, order, i, extend_end, var_dim)¶

property filter_coefficients(self)¶

property filtered_data(self)¶

fir_filter(self, data, fs, cutoff_high, order, sampling='1d', time_dim='datetime', var_dim='variables', window: Union[str, Tuple] = 'hamming', minimum_length=None, new_dim='window', plot_dates=None, display_name=None)¶

static _calculate_filter_coefficients(window: Union[str, tuple], order: Union[int, tuple], cutoff_high: float, fs: float) → numpy.array¶

Calculate filter coefficients for moving window using scipy’s signal package for common filter types and local method firwin_kzf for Kolmogorov Zurbenko filter (kzf). The filter is a low-pass filter.

Parameters

window – name of the window type, given either as a string with the window's name or as a tuple containing the name and additional parameters (e.g. ("kaiser", 5))
order – order of the filter to create as int or parameters m and k of kzf
cutoff_high – cutoff frequency to use for low-pass filter in frequency of fs
fs – sampling frequency of time series
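The coefficient calculation for the common window types can be sketched with scipy's `signal.firwin`. This is a minimal illustration, not the MLAir implementation: the helper name `lowpass_coefficients` is hypothetical, and it assumes that the filter order is used directly as the number of taps. The Kolmogorov Zurbenko case is not covered here.

```python
import numpy as np
from scipy import signal


def lowpass_coefficients(window, order, cutoff_high, fs):
    """Return symmetric low-pass FIR weights with unit gain at frequency 0."""
    # Assumption: the filter order is passed on unchanged as number of taps.
    return signal.firwin(order, cutoff_high, window=window, pass_zero=True, fs=fs)


# Daily data (fs = 1 sample per day), cutoff period of 21 days, Kaiser window.
h = lowpass_coefficients(("kaiser", 5), order=41, cutoff_high=1 / 21, fs=1.0)
```

The weights are symmetric around the central tap and sum to one, so a constant signal passes the filter unchanged.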

static _create_full_filter_result_array(template_array: xarray.DataArray, result_array: xarray.DataArray, new_dim: str, display_name: str = None) → xarray.DataArray¶

Create result filter array with the same shape as the given template data (should be the original input data before filtering the data). All gaps are filled with NaNs.

Parameters

template_array – this array is used as template for shape and ordering of dims
result_array – array with data that are filled into template
new_dim – new dimension which is shifted to the end (if already present) or appended at the end (if not present)
display_name – string that is attached to logging (default None)

class mlair.helpers.filter.ClimateFIRFilter(data, fs, order, cutoff, window, time_dim, var_dim, apriori=None, apriori_type=None, apriori_diurnal=False, sel_opts=None, plot_path=None, minimum_length=None, new_dim=None, display_name=None, extend_length_opts: int = 0, extend_end: Union[dict, int] = 0, plot_dates=None, offset: int = 0)¶

Bases: FIRFilter

run(self)¶

_check_sel_opts(self)¶

static _next_order(order: list, minimum_length: Union[int, None], pos: int, window: Union[str, tuple]) → int ¶

static create_monthly_unity_array(data: xarray.DataArray, time_dim: str, extend_range: int = 366) → xarray.DataArray¶

Create an xarray data array filled with ones with monthly resolution (set on the 16th of each month). Data is extended by extend_range days into the future and past along time_dim.

Parameters

data – data to create monthly unity array from, must contain dimension time_dim
time_dim – name of temporal dimension
extend_range – number of days to extend data (default 366)

Returns

xarray in monthly resolution (centered at 16th day of month) with all values equal to 1

create_monthly_mean(self, data: xarray.DataArray, time_dim: str, sel_opts: dict = None, sampling: str = '1d') → xarray.DataArray¶

Calculate monthly means (12 values) and return a data array with same resolution as given data containing these monthly mean values. Sampling points are the 16th of each month (this value is equal to the true monthly mean) and all other values between two points are interpolated linearly. It is possible to apply some pre-selection to use only a subset of given data using the sel_opts parameter. Only data from this subset are used to calculate the monthly statistic.

Parameters

data – data to apply statistical calculation on
time_dim – name of temporal axis
sel_opts – selection options as dict to select a subset of data (default None). For example, sel_opts={<time_dim>: "2006"} forces the method to derive the monthly means only from data of the year 2006.
sampling – sampling of the returned data (default 1d)

Returns

array in desired resolution containing interpolated monthly values. Months with no valid data are returned as np.nan, which also affects data in the neighbouring months (before / after sampling points which are the 16th of each month).
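The idea of anchoring monthly means on the 16th and interpolating linearly in between can be sketched with plain numpy. This is an illustration under stated assumptions, not the MLAir code: the helper `monthly_mean_interpolated` is hypothetical, it works on daily `datetime64[D]` dates instead of an xarray object, and it ignores `sel_opts`.

```python
import numpy as np


def monthly_mean_interpolated(dates, values):
    """dates: datetime64[D] array, values: 1-D floats of the same length."""
    months = dates.astype("datetime64[M]")
    month_of_year = months.astype("int64") % 12          # 0 = January
    # One climatological mean per calendar month (12 values in total).
    clim = np.array([np.nanmean(values[month_of_year == m]) for m in range(12)])
    # Anchor points: the 16th of each month, padded by one month on each side.
    one_month = np.timedelta64(1, "M")
    anchor_months = np.arange(months.min() - one_month,
                              months.max() + 2 * one_month, one_month)
    anchors = anchor_months.astype("datetime64[D]") + np.timedelta64(15, "D")
    anchor_values = clim[anchor_months.astype("int64") % 12]
    # Linear interpolation between the anchor points for every input date.
    return np.interp(dates.astype("int64"), anchors.astype("int64"), anchor_values)


dates = np.arange("2001-01-01", "2002-01-01", dtype="datetime64[D]")
values = (dates.astype("datetime64[M]").astype("int64") % 12).astype(float)
smooth = monthly_mean_interpolated(dates, values)
```

In this toy input every day carries its month index as value, so the result equals the monthly mean exactly on the 16th and moves linearly towards the neighbouring month's mean in between.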

static _compute_hourly_mean_per_month(data: xarray.DataArray, time_dim: str, as_anomaly: bool) → Dict[int, xarray.DataArray]¶

Calculate for each hour in each month a separate mean value (12 x 24 values in total). Average is either the anomaly of a monthly mean state or the raw mean value.

Parameters

data – data to calculate averages on
time_dim – name of temporal dimension
as_anomaly – indicates whether to calculate means as anomaly of a monthly mean or as raw mean values.

Returns

dictionary containing 12 months each with a 24-valued array (1 entry for each hour)
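A minimal numpy sketch of the 12 x 24 averaging follows. The helper `hourly_mean_per_month` is hypothetical, and the use of month numbers 1 to 12 as dictionary keys is an assumption; the sketch operates on a plain `datetime64[h]` array rather than on an xarray object.

```python
import numpy as np


def hourly_mean_per_month(times, values, as_anomaly):
    """times: datetime64[h] array, values: 1-D floats; returns {month: 24 means}."""
    month = times.astype("datetime64[M]").astype("int64") % 12 + 1   # 1..12
    hour = times.astype("datetime64[h]").astype("int64") % 24        # 0..23
    means = {}
    for m in range(1, 13):
        in_month = month == m
        per_hour = np.array(
            [np.nanmean(values[in_month & (hour == h)]) for h in range(24)]
        )
        if as_anomaly:
            # Remove the overall mean of the month, keep only the diurnal part.
            per_hour = per_hour - np.nanmean(values[in_month])
        means[m] = per_hour
    return means


times = np.arange("2001-01-01T00", "2002-01-01T00", dtype="datetime64[h]")
hour_of_day = times.astype("int64") % 24
month_number = times.astype("datetime64[M]").astype("int64") % 12 + 1
values = hour_of_day + 100.0 * month_number
raw = hourly_mean_per_month(times, values, as_anomaly=False)
anomaly = hourly_mean_per_month(times, values, as_anomaly=True)
```

With `as_anomaly=True` the monthly offset disappears and only the diurnal shape around zero remains.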

static _create_seasonal_cycle_of_single_hour_mean(result_arr: xarray.DataArray, means: Dict[int, xarray.DataArray], hour: int, time_dim: str, sampling: str) → xarray.DataArray¶

Use monthly means of a given hour to create an array with interpolated values at the indicated hour for each day of the full time span indicated by given result_arr.

Parameters

result_arr – template array indicating the full time range and additional dimensions to keep
means – dictionary containing 24 hourly averages for each month (12 x 24 values in total)
hour – integer of hour of interest
time_dim – name of temporal dimension
sampling – sampling rate to interpolate

Returns

array with interpolated averages in sampling resolution containing only values for hour of interest

create_seasonal_hourly_mean(self, data: xarray.DataArray, time_dim: str, sel_opts: Dict[str, Any] = None, sampling: str = '1H', as_anomaly: bool = True) → xarray.DataArray¶

Compute climatological statistics on an hourly basis either as raw data or anomalies. For each month, an overall mean value (only used if requiring anomalies) and the mean of each hour are calculated. The climatological diurnal cycle is positioned on the 16th of each month and interpolated in between by using a distinct interpolation for each hour of day. The returned array therefore contains data with a yearly cycle (if anomaly is not calculated) or data without a yearly cycle (if using anomalies). In both cases, the data have an amplitude that varies over the year.

Parameters

data – data to apply this method to
time_dim – name of temporal axis
sel_opts – specific selection options that are applied before calculation of climatological statistics (default None)
sampling – temporal resolution of data (default "1H")
as_anomaly – specify whether to use anomalies or raw data including a seasonal cycle of the mean value (default: True)

Returns

climatological statistics for given data interpolated with given sampling rate

static extend_apriori(data: xarray.DataArray, apriori: xarray.DataArray, time_dim: str, sampling: str = '1d', display_name: str = None) → xarray.DataArray¶

Extend the time range of apriori information so that it spans a period at least as long as data. This method may not work properly if apriori contains less than one year of data.

Parameters

data – data whose time range apriori should span at minimum
apriori – data that is adjusted. It is assumed that this data varies in the course of the year but is the same for the same day in different years. Otherwise this method will introduce some unintended artefacts in the apriori data.
time_dim – name of temporal dimension
sampling – sampling of data (e.g. "1m", "1d", default "1d")
display_name – name to use for logging message (default None)

Returns

array with adjusted temporal coverage derived from apriori
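The assumption that apriori is identical for the same calendar day in different years suggests a simple way to picture the extension: look up each target date by its month and day. The sketch below does exactly that with numpy. It is not the MLAir algorithm; the helper names are hypothetical, and the sketch only handles daily data.

```python
import numpy as np


def _month_day_key(dates):
    """Map datetime64[D] dates to a unique integer per (month, day) pair."""
    months = dates.astype("datetime64[M]")
    day = (dates - months.astype("datetime64[D]")).astype("int64")  # 0-based
    return (months.astype("int64") % 12) * 31 + day


def extend_by_calendar_day(target_dates, apriori_dates, apriori_values):
    """Reuse the apriori value of the same calendar day for every target date."""
    lookup = np.full(12 * 31, np.nan)
    lookup[_month_day_key(apriori_dates)] = apriori_values
    # Calendar days missing in apriori (e.g. 29 February) stay NaN.
    return lookup[_month_day_key(target_dates)]


apriori_dates = np.arange("2001-01-01", "2002-01-01", dtype="datetime64[D]")
apriori_values = np.arange(365.0)
target_dates = np.arange("2000-12-30", "2002-01-03", dtype="datetime64[D]")
extended = extend_by_calendar_day(target_dates, apriori_dates, apriori_values)
```

If apriori covers less than one full year, some calendar days have no counterpart and remain NaN, which illustrates why the method needs at least one year of apriori data.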

static get_forecast_run_delta(data, time_dim)¶

combine_observation_and_apriori(self, data: xarray.DataArray, apriori: xarray.DataArray, time_dim: str, new_dim: str, extend_length_history: int, extend_length_future: int, extend_length_separator: int = 0, forecasts: xarray.DataArray = None, sampling: str = '1H', extend_end: int = 0, offset: int = 0) → xarray.DataArray¶

Combine historical data / observations ("data") and climatological statistics ("apriori"). Historical data are used on the interval [t0 - extend_length_history, t0] and apriori is used on [t0 + 1, t0 + extend_length_future]. If indicated by extend_length_separator, it is possible to shift the end of the history interval and the start of the apriori interval by the given number of time steps.

Parameters

data – historical data for past values, must contain dimensions time_dim and var_dim and might also have a new_dim dimension
apriori – climatological estimate for future values, must contain dimensions time_dim and var_dim, but can also have dimension new_dim
time_dim – name of temporal dimension
new_dim – name of the new dimension along which data is combined
extend_length_history – number of time steps to use from data
extend_length_future – number of time steps to use from apriori (minus 1)
extend_length_separator – position of last history value to use (default 0), this position indicates the last value that is used from data (followed by values from apriori). In other words, end of history interval and start of apriori interval are shifted by this value from t0 (positive or negative).

Returns

combined data array
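For a single reference time step t0, the interval logic described above can be sketched with integer positions on plain numpy arrays. The helper `combine_window` is hypothetical and leaves out forecasts, `extend_end`, `offset` and the creation of the new dimension.

```python
import numpy as np


def combine_window(data, apriori, t0, history, future, separator=0):
    """Return observations up to t0 + separator followed by apriori values."""
    split = t0 + separator                    # last position taken from data
    past = data[t0 - history: split + 1]      # [t0 - history, split]
    ahead = apriori[split + 1: t0 + future + 1]   # [split + 1, t0 + future]
    return np.concatenate([past, ahead])


obs = np.arange(20.0)
clim = -np.arange(20.0)
window = combine_window(obs, clim, t0=10, history=3, future=2)
shifted = combine_window(obs, clim, t0=10, history=3, future=2, separator=-1)
```

Both windows have the same length of history + future + 1 values; a negative separator only moves the point at which observations are replaced by apriori values.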

static create_full_time_dim(data, dim, freq)¶: Ensure that the time dimension is equidistant. Dates can be missing if missing values have been dropped.
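Restoring an equidistant time axis amounts to building the full date range and filling the gaps with NaN. A minimal numpy sketch for daily data follows; the helper `fill_time_gaps` is hypothetical and fixes the frequency to one day instead of accepting a `freq` argument.

```python
import numpy as np


def fill_time_gaps(dates, values):
    """Return a gap-free daily axis and the values aligned to it (NaN in gaps)."""
    step = np.timedelta64(1, "D")
    full = np.arange(dates.min(), dates.max() + step, step)
    filled = np.full(full.shape, np.nan)
    # Position of each existing date on the full axis, counted in days.
    filled[(dates - full[0]).astype("int64")] = values
    return full, filled


dates = np.array(["2001-01-01", "2001-01-02", "2001-01-04"], dtype="datetime64[D]")
full, filled = fill_time_gaps(dates, np.array([1.0, 2.0, 4.0]))
```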

create_pseudo_timeseries(self, data, time_dim, sampling, window_dim)¶

create_visualization(self, filtered, data, filter_input_data, plot_dates, time_dim, new_dim, sampling, extend_length_history, extend_length_future, minimum_length, h, variable_name, extend_length_opts=None, extend_end=None, offset=None, forecast=None)¶

static _get_year_interval(data: xarray.DataArray, time_dim: str) → Tuple[int, int]¶

Get year of start and end date of given data.

Parameters

data – data to extract dates from
time_dim – name of temporal axis

Returns

two-element tuple with start and end

static _calculate_filter_coefficients(window: Union[str, tuple], order: Union[int, tuple], cutoff_high: float, fs: float) → numpy.array¶

Calculate filter coefficients for moving window using scipy’s signal package for common filter types and local method firwin_kzf for Kolmogorov Zurbenko filter (kzf). The filter is a low-pass filter.

Parameters

window – name of the window type, given either as a string with the window's name or as a tuple containing the name and additional parameters (e.g. ("kaiser", 5))
order – order of the filter to create as int or parameters m and k of kzf
cutoff_high – cutoff frequency to use for low-pass filter in frequency of fs
fs – sampling frequency of time series

static _trim_data_to_minimum_length(data: xarray.DataArray, extend_length_history: int, dim: str, extend_length_future: int = 0, offset: int = 0) → xarray.DataArray¶

Trim data along the given axis between either -minimum_length (if given) or -extend_length_history and extend_length_opts (which is set to 0 by default).

Parameters

data – data to trim
extend_length_history – start number for trim range, only used if parameter minimum_length is not provided
dim – dim to apply trim on
extend_length_future – number to use in “future”

Returns

trimmed data
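Trimming along a window dimension can be pictured as selecting all positions between `-extend_length_history` and `extend_length_future`. The numpy sketch below uses a hypothetical helper `trim_window`, assumes the window dimension is the last axis, and does not model the `offset` parameter.

```python
import numpy as np


def trim_window(values, offsets, history, future=0):
    """Keep only window positions with -history <= offset <= future."""
    keep = (offsets >= -history) & (offsets <= future)
    return values[..., keep], offsets[keep]


offsets = np.arange(-5, 4)                        # window positions -5 .. 3
values = np.tile(offsets.astype(float), (3, 1))   # 3 time steps x 9 positions
trimmed, kept = trim_window(values, offsets, history=2, future=1)
```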

static _create_full_filter_result_array(template_array: xarray.DataArray, result_array: xarray.DataArray, new_dim: str, display_name: str = None) → xarray.DataArray¶

Create result filter array with the same shape as the given template data (should be the original input data before filtering the data). All gaps are filled with NaNs.

Parameters

template_array – this array is used as template for shape and ordering of dims
result_array – array with data that are filled into template
new_dim – new dimension which is shifted to the end (if already present) or appended at the end (if not present)
display_name – string that is attached to logging (default None)

clim_filter(self, data, fs, cutoff_high, order, apriori=None, sel_opts=None, sampling='1d', time_dim='datetime', var_dim='variables', window: Union[str, Tuple] = 'hamming', minimum_length=0, next_order=0, new_dim='window', plot_dates=None, display_name=None, extend_opts: int = 0, extend_end: int = 0, forecasts=None, offset: int = 0)¶

static _create_time_range_extend(year: int, sampling: str, extend_length: int) → slice ¶

Create a slice object for given year plus extend_length in sampling resolution.

Parameters

year – year to create time range for
sampling – sampling of time range
extend_length – number of time steps to extend out of given year

Returns

slice object with time range
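A slice of this kind could be built with numpy datetimes as sketched below. The helper `create_time_range_extend` is hypothetical; it assumes that only the samplings "1d" and "1H" occur and that exactly `extend_length` steps are added on each side, which may differ from the actual boundaries used in MLAir.

```python
import numpy as np


def create_time_range_extend(year, sampling, extend_length):
    """Return slice(start, stop) covering the year plus extend_length steps."""
    unit = {"1d": "D", "1H": "h"}[sampling]     # assumed set of samplings
    first = np.datetime64(f"{year}-01-01", unit)
    last = np.datetime64(f"{year + 1}-01-01", unit) - np.timedelta64(1, unit)
    delta = np.timedelta64(extend_length, unit)
    return slice(first - delta, last + delta)


daily = create_time_range_extend(2001, "1d", 2)
hourly = create_time_range_extend(2001, "1H", 3)
```

Such a slice can be handed to label-based selection on the time dimension, where both bounds are inclusive.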

static _create_tmp_dimension(data: xarray.DataArray) → str ¶

Create a tmp dimension, preferably with the name 'window'. If this name is already used by one of the dimensions, the tmp dimension name is multiplied by itself until it is no longer present in dims. The method raises a ValueError after 10 tries.

Parameters: data – data array to create a new tmp dimension for with unique name
Returns: valid name for a tmp dimension (preferably ‘window’)
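The name search can be sketched in a few lines of plain Python. The function below is a hypothetical stand-in that takes the dimension names directly, and it interprets "multiplied by itself" as appending the name to itself, which is an assumption.

```python
def create_tmp_dimension(dims, base="window", max_tries=10):
    """Return a dimension name that does not collide with any name in dims."""
    name = base
    for _ in range(max_tries):
        if name not in dims:
            return name
        name += name          # assumed meaning of "multiplied by itself"
    raise ValueError(f"Could not find a unique name after {max_tries} tries.")


free = create_tmp_dimension(("datetime", "variables"))
doubled = create_tmp_dimension(("datetime", "window"))
```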

_shift_data(self, data: xarray.DataArray, index_value: range, time_dim: str, new_dim: str) → xarray.DataArray¶

Shift data multiple times to create history or future along dimension new_dim for each time step.

Parameters

data – data set to shift
index_value – range of integers to span history and/or future
time_dim – name of temporal dimension that should be shifted
new_dim – name of the dimension created by the data shift

Returns

shifted data
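The effect of shifting data several times and stacking the results along a new dimension can be sketched with numpy. The helper `shift_to_windows` is hypothetical, and the sign convention (negative index values point to the past) is an assumption.

```python
import numpy as np


def shift_to_windows(x, index_value):
    """Return array of shape (time, window) with x[t + i] at window position i."""
    n = len(x)
    out = np.full((n, len(index_value)), np.nan)
    for col, i in enumerate(index_value):
        src = np.arange(n) + i
        valid = (src >= 0) & (src < n)        # positions outside stay NaN
        out[valid, col] = x[src[valid]]
    return out


series = np.arange(5.0)
windows = shift_to_windows(series, range(-1, 2))   # one step back, one ahead
```

Each row now holds the local neighbourhood of one time step, so a filter can be applied as a weighted sum along the new axis.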

static create_index_array(index_name: str, index_value: range)¶

Create index array from a range object to use as index of a data array.

Parameters

index_name – name of the index dimension
index_value – range of values to use as indexes

Returns

index array for given range of values

property apriori_data(self)¶

property initial_apriori_data(self)¶

mlair.helpers.filter.fir_filter(data, fs, order=5, cutoff_low=None, cutoff_high=None, window='hamming', dim='variables', h=None, causal=True, padlen=None)¶: Expects xarray.

mlair.helpers.filter.fir_filter_convolve(data, h)¶

class mlair.helpers.filter.KolmogorovZurbenkoBaseClass(df, wl, itr, is_child=False, filter_dim='window')¶

set_child(self)¶

kz_filter(self, df, m, k)¶

spectral_calc(self)¶

static subtract(minuend, subtrahend)¶

run(self)¶

transfer_function(self)¶

omega_null(self, alpha=0.5)¶

period_null(self, alpha=0.5)¶

period_null_days(self, alpha=0.5)¶

plot_transfer_function(self, fig=None, name=None)¶

class mlair.helpers.filter.KolmogorovZurbenkoFilterMovingWindow(df, wl: Union[list, int], itr: Union[list, int], is_child=False, filter_dim='window', method='mean', percentile=0.5)¶

Bases: KolmogorovZurbenkoBaseClass

set_child(self)¶

kz_filter_new(self, df, wl, itr)¶

It passes the low frequency time series.

If the filter method is one of mean, max, or min, this method calls construct and rechunk before the actual calculation to improve performance. If the filter method is median or percentile, this approach is not applicable and, depending on the data and window size, the method can become slow.

Parameters

wl (int) – window length
itr (int) – number of iterations

kz_filter(self, df, wl, itr)¶

It passes the low frequency time series.

Parameters

wl (int) – window length
itr (int) – number of iterations
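A Kolmogorov Zurbenko low-pass with the mean method is commonly defined as a centred moving average of length wl that is applied itr times. The numpy sketch below follows this definition; the function names are hypothetical, wl is assumed to be odd, and edge values without a full window are set to NaN, which may differ from the edge handling in MLAir.

```python
import numpy as np


def moving_mean(x, wl):
    """Centred moving average of odd length wl, NaN where the window is cut."""
    half = wl // 2
    out = np.full(x.shape, np.nan)
    out[half:len(x) - half] = np.convolve(x, np.ones(wl) / wl, mode="valid")
    return out


def kz_filter(x, wl, itr):
    """Apply the moving average itr times to obtain the low-frequency part."""
    for _ in range(itr):
        x = moving_mean(x, wl)
    return x


ramp = np.arange(20.0)
low = kz_filter(ramp, wl=5, itr=3)
```

A linear trend passes the filter unchanged, while every iteration removes wl // 2 further values at each edge.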

mlair.helpers.filter.firwin_kzf(m: int, k: int) → numpy.array¶: Calculate weights of window for Kolmogorov Zurbenko filter.

mlair.helpers.filter.omega_null_kzf(m: int, k: int, alpha: float = 0.5) → float ¶

mlair.helpers.filter.filter_width_kzf(m: int, k: int) → int ¶: Returns window width of the Kolmogorov Zurbenko filter.
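Under the usual definition of the Kolmogorov Zurbenko filter as k iterations of a moving average of length m, the window weights are the k-fold convolution of a boxcar, and the window width is k * (m - 1) + 1. The sketch below illustrates both with numpy; `kz_weights` and `kz_width` are hypothetical names and not necessarily how `firwin_kzf` and `filter_width_kzf` are implemented.

```python
import numpy as np


def kz_weights(m, k):
    """Weights of k successive moving averages of length m as one window."""
    box = np.ones(m) / m
    weights = np.ones(1)
    for _ in range(k):
        weights = np.convolve(weights, box)
    return weights


def kz_width(m, k):
    """Number of weights of the combined window."""
    return k * (m - 1) + 1


w = kz_weights(3, 2)   # proportional to 1, 2, 3, 2, 1
```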
