mlair.helpers.filter
Module Contents¶
Classes¶
Functions¶
|
Expects xarray. |
|
|
|
Calculate weights of window for Kolmogorov Zurbenko filter. |
|
|
|
Returns window width of the Kolmogorov Zurbenko filter. |
-
class
mlair.helpers.filter.
FIRFilter
(data, fs, order, cutoff, window, var_dim, time_dim, display_name=None, minimum_length=None, extend_end=0, plot_path=None, plot_dates=None, offset=0)¶ -
run
(self)¶
-
create_visualization
(self, filtered, filter_input_data, plot_dates, time_dim, sampling, h, minimum_length, order, i, extend_end, var_dim)¶
-
property
filter_coefficients
(self)¶
-
property
filtered_data
(self)¶
-
fir_filter
(self, data, fs, cutoff_high, order, sampling='1d', time_dim='datetime', var_dim='variables', window: Union[str, Tuple] = 'hamming', minimum_length=None, new_dim='window', plot_dates=None, display_name=None)¶
-
static
_calculate_filter_coefficients
(window: Union[str, tuple], order: Union[int, tuple], cutoff_high: float, fs: float) → numpy.array¶ Calculate filter coefficients for moving window using scipy’s signal package for common filter types and local method firwin_kzf for Kolmogorov Zurbenko filter (kzf). The filter is a low-pass filter.
- Parameters
window – name of the window type which is either a string with the window’s name or a tuple containing the name but also some parameters (e.g. (“kaiser”, 5))
order – order of the filter to create as int or parameters m and k of kzf
cutoff_high – cutoff frequency of the low-pass filter, given in the same units as fs
fs – sampling frequency of time series
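The docstring above refers to scipy's signal package for the actual design. To make the idea of a windowed low-pass design explicit, here is a minimal NumPy-only sketch, assuming a Hamming window and an integer order. The function name `lowpass_fir_coefficients` is hypothetical and not part of mlair.

```python
import numpy as np


def lowpass_fir_coefficients(order: int, cutoff_high: float, fs: float) -> np.ndarray:
    """Windowed-sinc low-pass FIR design with a Hamming window (illustrative only)."""
    n = np.arange(order) - (order - 1) / 2   # tap positions centred on zero
    fc = cutoff_high / fs                    # cutoff in cycles per sample
    h = 2 * fc * np.sinc(2 * fc * n)         # ideal low-pass impulse response
    h *= np.hamming(order)                   # taper the truncated response
    return h / h.sum()                       # normalise to unit gain at frequency zero


# Daily sampling (fs = 1 per day) and a cutoff period of 21 days.
h = lowpass_fir_coefficients(order=43, cutoff_high=1 / 21, fs=1.0)
```

The resulting coefficients are symmetric and sum to one, so a constant signal passes through the filter unchanged.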
-
static
_create_full_filter_result_array
(template_array: xarray.DataArray, result_array: xarray.DataArray, new_dim: str, display_name: str = None) → xarray.DataArray¶ Create the filter result array with the same shape as the given template data (which should be the original input data before filtering). All gaps are filled with NaNs.
- Parameters
template_array – this array is used as template for shape and ordering of dims
result_array – array with data that are filled into template
new_dim – new dimension, which is shifted to the end if already present or appended at the end if not
display_name – string that is attached to logging (default None)
-
-
class
mlair.helpers.filter.
ClimateFIRFilter
(data, fs, order, cutoff, window, time_dim, var_dim, apriori=None, apriori_type=None, apriori_diurnal=False, sel_opts=None, plot_path=None, minimum_length=None, new_dim=None, display_name=None, extend_length_opts: int = 0, extend_end: Union[dict, int] = 0, plot_dates=None, offset: int = 0)¶ Bases:
FIRFilter
-
run
(self)¶
-
_check_sel_opts
(self)¶
-
static
_next_order
(order: list, minimum_length: Union[int, None], pos: int, window: Union[str, tuple]) → int¶
-
static
create_monthly_unity_array
(data: xarray.DataArray, time_dim: str, extend_range: int = 366) → xarray.DataArray¶ Create an xarray data array filled with ones with monthly resolution (set on the 16th of each month). Data is extended by extend_range days into the future and the past along time_dim.
- Parameters
data – data to create monthly unity array from, must contain dimension time_dim
time_dim – name of temporal dimension
extend_range – number of days to extend data (default 366)
- Returns
xarray in monthly resolution (centered at 16th day of month) with all values equal to 1
-
create_monthly_mean
(self, data: xarray.DataArray, time_dim: str, sel_opts: dict = None, sampling: str = '1d') → xarray.DataArray¶ Calculate monthly means (12 values) and return a data array with same resolution as given data containing these monthly mean values. Sampling points are the 16th of each month (this value is equal to the true monthly mean) and all other values between two points are interpolated linearly. It is possible to apply some pre-selection to use only a subset of given data using the sel_opts parameter. Only data from this subset are used to calculate the monthly statistic.
- Parameters
data – data to apply statistical calculation on
time_dim – name of temporal axis
sel_opts – selection options as dict to select a subset of data (default None). A given sel_opts with sel_opts={<time_dim>: “2006”} forces the method e.g. to derive the monthly means only from data of the year 2006.
sampling – sampling of the returned data (default 1d)
- Returns
array in the desired resolution containing interpolated monthly values. Months with no valid data are returned as np.nan, which also affects data in the neighbouring months (before / after the sampling points, which are the 16th of each month).
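The monthly-mean interpolation described above can be sketched with plain NumPy. This is only an illustration under assumptions: the data are synthetic, a single non-leap year is used, and the wrap-around at the year boundary is handled with the `period` argument of `np.interp`, which is not necessarily how the method itself treats the edges.

```python
import numpy as np

# One year of synthetic daily data (values are made up for illustration).
days = np.arange("2006-01-01", "2007-01-01", dtype="datetime64[D]")
day_index = (days - days[0]).astype(int)
values = 10.0 + 5.0 * np.sin(2 * np.pi * day_index / 365.0)

# Month index 0..11 for every day, then one mean value per calendar month.
month = days.astype("datetime64[M]").astype(int) % 12
monthly_mean = np.array([values[month == m].mean() for m in range(12)])

# Sampling points: the 16th of each month, expressed as day offsets.
anchors = np.array([np.datetime64(f"2006-{m:02d}-16") for m in range(1, 13)])
anchor_index = (anchors - days[0]).astype(int)

# Linear interpolation between sampling points; period=365 wraps around the year end.
interpolated = np.interp(day_index, anchor_index, monthly_mean, period=365)
```

On the 16th of each month the interpolated series equals the monthly mean; every other day lies on the straight line between the two neighbouring sampling points.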
-
static
_compute_hourly_mean_per_month
(data: xarray.DataArray, time_dim: str, as_anomaly: bool) → Dict[int, xarray.DataArray]¶ Calculate for each hour in each month a separate mean value (12 x 24 values in total). Average is either the anomaly of a monthly mean state or the raw mean value.
- Parameters
data – data to calculate averages on
time_dim – name of temporal dimension
as_anomaly – indicates whether to calculate means as anomaly of a monthly mean or as raw mean values.
- Returns
dictionary containing 12 months each with a 24-valued array (1 entry for each hour)
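A minimal NumPy sketch of the 12 x 24 averaging, assuming that "anomaly" means the hourly mean minus the overall mean of the same month. The function name `hourly_mean_per_month` and the synthetic input are hypothetical; the actual method works on xarray data.

```python
import numpy as np


def hourly_mean_per_month(times: np.ndarray, values: np.ndarray, as_anomaly: bool) -> dict:
    """Return {month: array of 24 hourly means} for hourly input (illustrative only)."""
    month = times.astype("datetime64[M]").astype(int) % 12 + 1       # 1..12
    hour = (times - times.astype("datetime64[D]")).astype("timedelta64[h]").astype(int)
    result = {}
    for m in range(1, 13):
        in_month = month == m
        if not in_month.any():
            continue                                                  # skip months without data
        hourly = np.array([values[in_month & (hour == h)].mean() for h in range(24)])
        if as_anomaly:
            hourly = hourly - values[in_month].mean()                 # subtract monthly mean state
        result[m] = hourly
    return result


# Synthetic hourly series whose value equals the hour of day.
times = np.arange("2006-01-01T00", "2007-01-01T00", dtype="datetime64[h]")
values = (np.arange(times.size) % 24).astype(float)
raw = hourly_mean_per_month(times, values, as_anomaly=False)
anomaly = hourly_mean_per_month(times, values, as_anomaly=True)
```

With complete months, the 24 anomaly values of each month sum to zero, which is why the anomaly variant carries no yearly cycle of the mean.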
-
static
_create_seasonal_cycle_of_single_hour_mean
(result_arr: xarray.DataArray, means: Dict[int, xarray.DataArray], hour: int, time_dim: str, sampling: str) → xarray.DataArray¶ Use monthly means of a given hour to create an array with interpolated values at the indicated hour for each day of the full time span indicated by given result_arr.
- Parameters
result_arr – template array indicating the full time range and additional dimensions to keep
means – dictionary containing 24 hourly averages for each month (12 x 24 values in total)
hour – integer of hour of interest
time_dim – name of temporal dimension
sampling – sampling rate to interpolate
- Returns
array with interpolated averages in sampling resolution containing only values for hour of interest
-
create_seasonal_hourly_mean
(self, data: xarray.DataArray, time_dim: str, sel_opts: Dict[str, Any] = None, sampling: str = '1H', as_anomaly: bool = True) → xarray.DataArray¶ Compute climatological statistics on an hourly basis, either as raw data or as anomalies. For each month, an overall mean value (only used if anomalies are required) and the mean of each hour are calculated. The climatological diurnal cycle is positioned on the 16th of each month and interpolated in between by using a distinct interpolation for each hour of day. The returned array therefore contains data with a yearly cycle (if anomaly is not calculated) or data without a yearly cycle (if using anomalies). In both cases, the data have an amplitude that varies over the year.
- Parameters
data – data to apply this method to
time_dim – name of temporal axis
sel_opts – specific selection options that are applied before calculation of climatological statistics (default None)
sampling – temporal resolution of data (default “1H”)
as_anomaly – specify whether to use anomalies or raw data including a seasonal cycle of the mean value (default: True)
- Returns
climatological statistics for given data interpolated with given sampling rate
-
static
extend_apriori
(data: xarray.DataArray, apriori: xarray.DataArray, time_dim: str, sampling: str = '1d', display_name: str = None) → xarray.DataArray¶ Extend the time range of the apriori information so that it spans a period at least as long as data. This method may not work properly if apriori contains less than one year of data.
- Parameters
data – data whose time range apriori should span at minimum
apriori – data that is adjusted. It is assumed that this data varies in the course of the year but is the same for the same day in different years. Otherwise this method will introduce some unintended artefacts in the apriori data.
time_dim – name of temporal dimension
sampling – sampling of data (e.g. “1m”, “1d”, default “1d”)
display_name – name to use for logging message (default None)
- Returns
array with adjusted temporal coverage derived from apriori
-
static
get_forecast_run_delta
(data, time_dim)¶
-
combine_observation_and_apriori
(self, data: xarray.DataArray, apriori: xarray.DataArray, time_dim: str, new_dim: str, extend_length_history: int, extend_length_future: int, extend_length_separator: int = 0, forecasts: xarray.DataArray = None, sampling: str = '1H', extend_end: int = 0, offset: int = 0) → xarray.DataArray¶ Combine historical data / observations (“data”) and climatological statistics (“apriori”). Historical data are used on the interval [t0 - extend_length_history, t0] and apriori is used on [t0 + 1, t0 + extend_length_future]. If indicated by extend_length_separator, the end of the history interval and the start of the apriori interval can be shifted by the given number of time steps.
- Parameters
data – historical data for past values, must contain dimensions time_dim and var_dim and might also have a new_dim dimension
apriori – climatological estimate for future values, must contain dimensions time_dim and var_dim, but can also have dimension new_dim
time_dim – name of temporal dimension
new_dim – name of the new dimension along which data is combined
extend_length_history – number of time steps to use from data
extend_length_future – number of time steps to use from apriori (minus 1)
extend_length_separator – position of last history value to use (default 0), this position indicates the last value that is used from data (followed by values from apriori). In other words, end of history interval and start of apriori interval are shifted by this value from t0 (positive or negative).
- Returns
combined data array
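The interval logic can be illustrated for a single reference step t0 with one-dimensional NumPy arrays. This is a simplified sketch under assumptions: the helper `combine_window` is hypothetical, it ignores the forecasts, extend_end, and offset arguments, and it treats the separator as a plain shift of the boundary between the two intervals.

```python
import numpy as np


def combine_window(data, apriori, t0, history, future, separator=0):
    """Join observations up to t0 + separator with apriori values afterwards."""
    split = t0 + 1 + separator                 # first index taken from apriori
    past = data[t0 - history:split]            # [t0 - history, t0 + separator]
    ahead = apriori[split:t0 + future + 1]     # [t0 + 1 + separator, t0 + future]
    return np.concatenate([past, ahead])


data = np.arange(10)        # stands in for observations
apriori = -np.arange(10)    # stands in for the climatological estimate

window = combine_window(data, apriori, t0=5, history=3, future=2)
shifted = combine_window(data, apriori, t0=5, history=3, future=2, separator=-1)
```

Here `window` holds the observations at steps 2 to 5 followed by the apriori values at steps 6 and 7. With `separator=-1` the total length stays the same, but the value at step 5 is already taken from apriori.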
-
static
create_full_time_dim
(data, dim, freq)¶ Ensure that the time dimension is equidistant. Dates can be missing if missing values have been dropped.
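A minimal sketch of restoring an equidistant time axis with NumPy, assuming sorted timestamps that already lie on the target grid. The helper `fill_time_gaps` is hypothetical; with xarray, a reindex onto a full date range would serve the same purpose.

```python
import numpy as np


def fill_time_gaps(times: np.ndarray, values: np.ndarray, step: np.timedelta64):
    """Return a gap-free time axis and values with NaN at the re-inserted dates."""
    full = np.arange(times[0], times[-1] + step, step)   # complete, equidistant axis
    filled = np.full(full.shape, np.nan)
    filled[np.searchsorted(full, times)] = values        # put known values at their dates
    return full, filled


# 2020-01-03 was dropped, e.g. because its value was missing.
times = np.array(["2020-01-01", "2020-01-02", "2020-01-04"], dtype="datetime64[D]")
values = np.array([1.0, 2.0, 4.0])
full, filled = fill_time_gaps(times, values, np.timedelta64(1, "D"))
```

The returned axis contains all four days, and the re-inserted date carries NaN.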
-
create_pseudo_timeseries
(self, data, time_dim, sampling, window_dim)¶
-
create_visualization
(self, filtered, data, filter_input_data, plot_dates, time_dim, new_dim, sampling, extend_length_history, extend_length_future, minimum_length, h, variable_name, extend_length_opts=None, extend_end=None, offset=None, forecast=None)¶
-
static
_get_year_interval
(data: xarray.DataArray, time_dim: str) → Tuple[int, int]¶ Get year of start and end date of given data.
- Parameters
data – data to extract dates from
time_dim – name of temporal axis
- Returns
two-element tuple with start and end
-
static
_calculate_filter_coefficients
(window: Union[str, tuple], order: Union[int, tuple], cutoff_high: float, fs: float) → numpy.array¶ Calculate filter coefficients for moving window using scipy’s signal package for common filter types and local method firwin_kzf for Kolmogorov Zurbenko filter (kzf). The filter is a low-pass filter.
- Parameters
window – name of the window type which is either a string with the window’s name or a tuple containing the name but also some parameters (e.g. (“kaiser”, 5))
order – order of the filter to create as int or parameters m and k of kzf
cutoff_high – cutoff frequency of the low-pass filter, given in the same units as fs
fs – sampling frequency of time series
-
static
_trim_data_to_minimum_length
(data: xarray.DataArray, extend_length_history: int, dim: str, extend_length_future: int = 0, offset: int = 0) → xarray.DataArray¶ Trim data along the given axis between either -minimum_length (if given) or -extend_length_history and extend_length_opts (which is set to 0 by default).
- Parameters
data – data to trim
extend_length_history – start number for trim range, only used if parameter minimum_length is not provided
dim – dim to apply trim on
extend_length_future – number to use in “future”
- Returns
trimmed data
-
static
_create_full_filter_result_array
(template_array: xarray.DataArray, result_array: xarray.DataArray, new_dim: str, display_name: str = None) → xarray.DataArray¶ Create the filter result array with the same shape as the given template data (which should be the original input data before filtering). All gaps are filled with NaNs.
- Parameters
template_array – this array is used as template for shape and ordering of dims
result_array – array with data that are filled into template
new_dim – new dimension which is shifted/appended to/at the end (if present or not)
display_name – string that is attached to logging (default None)
-
clim_filter
(self, data, fs, cutoff_high, order, apriori=None, sel_opts=None, sampling='1d', time_dim='datetime', var_dim='variables', window: Union[str, Tuple] = 'hamming', minimum_length=0, next_order=0, new_dim='window', plot_dates=None, display_name=None, extend_opts: int = 0, extend_end: int = 0, forecasts=None, offset: int = 0)¶
-
static
_create_time_range_extend
(year: int, sampling: str, extend_length: int) → slice¶ Create a slice object for given year plus extend_length in sampling resolution.
- Parameters
year – year to create time range for
sampling – sampling of time range
extend_length – number of time steps to extend out of given year
- Returns
slice object with time range
-
static
_create_tmp_dimension
(data: xarray.DataArray) → str¶ Create a tmp dimension, preferably with the name ‘window’. If this name is already used by one of the dimensions, the tmp dimension name is multiplied by itself until it is not present in dims. The method raises a ValueError after 10 tries.
- Parameters
data – data array to create a new tmp dimension for with unique name
- Returns
valid name for a tmp dimension (preferably ‘window’)
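A small sketch of the naming rule, assuming that "multiplied by itself" means the name is appended to itself on every try. The function name, its arguments, and the error message are hypothetical.

```python
def create_tmp_dimension(existing_dims, base="window", max_tries=10):
    """Return a dimension name that does not clash with existing_dims."""
    name = base
    for _ in range(max_tries):
        if name not in existing_dims:
            return name
        name += name  # "window" -> "windowwindow" -> ...
    raise ValueError("Could not create a unique dimension name.")


free = create_tmp_dimension(("datetime", "variables"))
clash = create_tmp_dimension(("datetime", "window"))
```

If "window" is unused it is returned as is; otherwise the name grows until it no longer collides with an existing dimension.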
-
_shift_data
(self, data: xarray.DataArray, index_value: range, time_dim: str, new_dim: str) → xarray.DataArray¶ Shift data multiple times to create history or future along dimension new_dim for each time step.
- Parameters
data – data set to shift
index_value – range of integers to span history and/or future
time_dim – name of temporal dimension that should be shifted
new_dim – name of the dimension created by the data shift
- Returns
shifted data
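The shifting can be pictured for a one-dimensional series with NumPy. This sketch assumes that index i of the new dimension holds, for time step t, the value at t + i, and that positions without a source value are filled with NaN. The helper `shift_data` is hypothetical and expects every shift to be smaller than the series length.

```python
import numpy as np


def shift_data(series: np.ndarray, index_value: range) -> np.ndarray:
    """Stack shifted copies of series; column j holds the values at t + index_value[j]."""
    n = len(series)
    out = np.full((n, len(index_value)), np.nan)
    for col, lag in enumerate(index_value):
        if lag < 0:                       # history: value from |lag| steps earlier
            out[-lag:, col] = series[:n + lag]
        elif lag > 0:                     # future: value from lag steps later
            out[:n - lag, col] = series[lag:]
        else:                             # the current time step itself
            out[:, col] = series
    return out


series = np.arange(5, dtype=float)
shifted = shift_data(series, range(-1, 2))   # one step of history, t0, one step of future
```

Each row of `shifted` is the window around one time step, for example row 2 contains the values at steps 1, 2, and 3.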
-
static
create_index_array
(index_name: str, index_value: range)¶ Create index array from a range object to use as index of a data array.
- Parameters
index_name – name of the index dimension
index_value – range of values to use as indexes
- Returns
index array for given range of values
-
property
apriori_data
(self)¶
-
property
initial_apriori_data
(self)¶
-
-
mlair.helpers.filter.
fir_filter
(data, fs, order=5, cutoff_low=None, cutoff_high=None, window='hamming', dim='variables', h=None, causal=True, padlen=None)¶ Expects xarray.
-
mlair.helpers.filter.
fir_filter_convolve
(data, h)¶
-
class
mlair.helpers.filter.
KolmogorovZurbenkoBaseClass
(df, wl, itr, is_child=False, filter_dim='window')¶ -
set_child
(self)¶
-
kz_filter
(self, df, m, k)¶
-
spectral_calc
(self)¶
-
static
subtract
(minuend, subtrahend)¶
-
run
(self)¶
-
transfer_function
(self)¶
-
omega_null
(self, alpha=0.5)¶
-
period_null
(self, alpha=0.5)¶
-
period_null_days
(self, alpha=0.5)¶
-
plot_transfer_function
(self, fig=None, name=None)¶
-
-
class
mlair.helpers.filter.
KolmogorovZurbenkoFilterMovingWindow
(df, wl: Union[list, int], itr: Union[list, int], is_child=False, filter_dim='window', method='mean', percentile=0.5)¶ Bases:
KolmogorovZurbenkoBaseClass
-
set_child
(self)¶
-
kz_filter_new
(self, df, wl, itr)¶ It passes the low frequency time series.
If the filter method is one of mean, max, or min, this method calls construct and rechunk before the actual calculation to improve performance. If the filter method is either median or percentile, this approach is not applicable and, depending on the data and window size, the method can become slow.
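For the mean case, a Kolmogorov Zurbenko filter is a moving average of window length wl applied itr times. The following NumPy sketch is not the class's implementation and `kz_weights` is a hypothetical helper; it builds the equivalent single set of weights by repeated convolution and applies it to a series.

```python
import numpy as np


def kz_weights(m: int, k: int) -> np.ndarray:
    """Weights equivalent to k passes of an m-point moving average."""
    box = np.ones(m) / m          # a single moving-average window
    weights = np.array([1.0])
    for _ in range(k):
        weights = np.convolve(weights, box)
    return weights


weights = kz_weights(m=5, k=3)                             # total width k * (m - 1) + 1 = 13
series = np.full(50, 3.0)                                  # constant test series
smoothed = np.convolve(series, weights, mode="valid")     # keep fully covered steps only
```

The weights sum to one, so the low-frequency part of a series (here a constant) passes through unchanged.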
-