:py:mod:`mlair.helpers.filter`
==============================

.. py:module:: mlair.helpers.filter


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   mlair.helpers.filter.FIRFilter
   mlair.helpers.filter.ClimateFIRFilter
   mlair.helpers.filter.KolmogorovZurbenkoBaseClass
   mlair.helpers.filter.KolmogorovZurbenkoFilterMovingWindow


Functions
~~~~~~~~~

.. autoapisummary::

   mlair.helpers.filter.fir_filter
   mlair.helpers.filter.fir_filter_convolve
   mlair.helpers.filter.firwin_kzf
   mlair.helpers.filter.omega_null_kzf
   mlair.helpers.filter.filter_width_kzf


.. py:class:: FIRFilter(data, fs, order, cutoff, window, var_dim, time_dim, display_name=None, minimum_length=None, extend_end=0, plot_path=None, plot_dates=None, offset=0)

   .. py:method:: run(self)

   .. py:method:: create_visualization(self, filtered, filter_input_data, plot_dates, time_dim, sampling, h, minimum_length, order, i, extend_end, var_dim)

   .. py:method:: filter_coefficients(self)
      :property:

   .. py:method:: filtered_data(self)
      :property:

   .. py:method:: fir_filter(self, data, fs, cutoff_high, order, sampling='1d', time_dim='datetime', var_dim='variables', window: Union[str, Tuple] = 'hamming', minimum_length=None, new_dim='window', plot_dates=None, display_name=None)

   .. py:method:: _calculate_filter_coefficients(window: Union[str, tuple], order: Union[int, tuple], cutoff_high: float, fs: float) -> numpy.array
      :staticmethod:

      Calculate filter coefficients for a moving window. Common filter types use scipy's signal package, and the Kolmogorov Zurbenko filter (kzf) uses the local method firwin_kzf. The filter is a low-pass filter.

      :param window: name of the window type, either a string with the window's name or a tuple containing the name and additional parameters (e.g. `("kaiser", 5)`)
      :param order: order of the filter as int, or the parameters m and k of the kzf
      :param cutoff_high: cutoff frequency of the low-pass filter, given in the same units as fs
      :param fs: sampling frequency of the time series

   .. py:method:: _create_full_filter_result_array(template_array: xarray.DataArray, result_array: xarray.DataArray, new_dim: str, display_name: str = None) -> xarray.DataArray
      :staticmethod:

      Create a result filter array with the same shape as the given template data, which should be the original input data before filtering. All gaps are filled with NaNs.

      :param template_array: array used as template for the shape and ordering of dims
      :param result_array: array with data that are filled into the template
      :param new_dim: new dimension that is moved to the end if present, or appended at the end otherwise
      :param display_name: string that is attached to logging (default None)


.. py:class:: ClimateFIRFilter(data, fs, order, cutoff, window, time_dim, var_dim, apriori=None, apriori_type=None, apriori_diurnal=False, sel_opts=None, plot_path=None, minimum_length=None, new_dim=None, display_name=None, extend_length_opts: int = 0, extend_end: Union[dict, int] = 0, plot_dates=None, offset: int = 0)

   Bases: :py:obj:`FIRFilter`

   .. py:method:: run(self)

   .. py:method:: _check_sel_opts(self)

   .. py:method:: _next_order(order: list, minimum_length: Union[int, None], pos: int, window: Union[str, tuple]) -> int
      :staticmethod:

   .. py:method:: create_monthly_unity_array(data: xarray.DataArray, time_dim: str, extend_range: int = 366) -> xarray.DataArray
      :staticmethod:

      Create an xarray data array filled with ones at monthly resolution, with each value set on the 16th of the month. The data are extended by extend_range days into the future and the past along time_dim.

      :param data: data to create the monthly unity array from, must contain dimension time_dim
      :param time_dim: name of temporal dimension
      :param extend_range: number of days to extend data (default 366)
      :returns: xarray in monthly resolution (centered on the 16th day of each month) with all values equal to 1

   .. py:method:: create_monthly_mean(self, data: xarray.DataArray, time_dim: str, sel_opts: dict = None, sampling: str = '1d') -> xarray.DataArray

      Calculate monthly means (12 values) and return a data array with the same resolution as the given data, containing these monthly mean values. Sampling points are the 16th of each month, where the value equals the true monthly mean. All values between two sampling points are interpolated linearly. The sel_opts parameter can apply a pre-selection so that only a subset of the given data is used. Only data from this subset are used to calculate the monthly statistic.

      :param data: data to apply the statistical calculation on
      :param time_dim: name of temporal axis
      :param sel_opts: selection options as dict to select a subset of data (default None). For example, `sel_opts={<time_dim>: "2006"}` forces the method to derive the monthly means only from data of the year 2006.
      :param sampling: sampling of the returned data (default 1d)
      :returns: array in the desired resolution containing interpolated monthly values. Months with no valid data are returned as np.nan. This also affects data in the neighbouring months, before and after the sampling points on the 16th of each month.

   .. py:method:: _compute_hourly_mean_per_month(data: xarray.DataArray, time_dim: str, as_anomaly: bool) -> Dict[int, xarray.DataArray]
      :staticmethod:

      Calculate a separate mean value for each hour in each month (12 x 24 values in total). The average is either the anomaly from the monthly mean state or the raw mean value.

      :param data: data to calculate averages on
      :param time_dim: name of temporal dimension
      :param as_anomaly: indicates whether to calculate means as anomalies from the monthly mean or as raw mean values
      :returns: dictionary containing 12 months, each with a 24-valued array (1 entry for each hour)

   .. py:method:: _create_seasonal_cycle_of_single_hour_mean(result_arr: xarray.DataArray, means: Dict[int, xarray.DataArray], hour: int, time_dim: str, sampling: str) -> xarray.DataArray
      :staticmethod:

      Use the monthly means of a given hour to create an array with interpolated values at that hour. The array covers each day of the full time span indicated by the given result_arr.

      :param result_arr: template array indicating the full time range and additional dimensions to keep
      :param means: dictionary containing 24 hourly averages for each month (12 x 24 values in total)
      :param hour: hour of interest as integer
      :param time_dim: name of temporal dimension
      :param sampling: sampling rate to interpolate
      :returns: array with interpolated averages in sampling resolution containing only values for the hour of interest

   .. py:method:: create_seasonal_hourly_mean(self, data: xarray.DataArray, time_dim: str, sel_opts: Dict[str, Any] = None, sampling: str = '1H', as_anomaly: bool = True) -> xarray.DataArray

      Compute climatological statistics on an hourly basis, either as raw data or as anomalies. For each month, the method calculates an overall mean value (only used if anomalies are required) and the mean of each hour. The climatological diurnal cycle is positioned on the 16th of each month. Values in between use a separate interpolation for each hour of the day. The returned array therefore contains data with a yearly cycle if anomalies are not calculated, or data without a yearly cycle if anomalies are used. In both cases, the data have an amplitude that varies over the year.

      :param data: data to apply this method to
      :param time_dim: name of temporal axis
      :param sel_opts: specific selection options that are applied before calculating the climatological statistics (default None)
      :param sampling: temporal resolution of data (default "1H")
      :param as_anomaly: specify whether to use anomalies or raw data including a seasonal cycle of the mean value (default True)
      :returns: climatological statistics for the given data, interpolated with the given sampling rate

   .. py:method:: extend_apriori(data: xarray.DataArray, apriori: xarray.DataArray, time_dim: str, sampling: str = '1d', display_name: str = None) -> xarray.DataArray
      :staticmethod:

      Extend the time range of the apriori information so that it spans a longer period than data, or at least an equally long one. This method may not work properly if apriori contains data from less than one year.

      :param data: data whose time range apriori should span at minimum
      :param apriori: data that is adjusted. It is assumed that this data varies over the course of the year but is the same for the same day in different years. Otherwise this method will introduce unintended artefacts in the apriori data.
      :param time_dim: name of temporal dimension
      :param sampling: sampling of data (e.g. "1m", "1d", default "1d")
      :param display_name: name to use for logging message (default None)
      :returns: array with adjusted temporal coverage derived from apriori

   .. py:method:: get_forecast_run_delta(data, time_dim)
      :staticmethod:

   .. py:method:: combine_observation_and_apriori(self, data: xarray.DataArray, apriori: xarray.DataArray, time_dim: str, new_dim: str, extend_length_history: int, extend_length_future: int, extend_length_separator: int = 0, forecasts: xarray.DataArray = None, sampling: str = '1H', extend_end: int = 0, offset: int = 0) -> xarray.DataArray

      Combine historical data / observations ("data") and climatological statistics ("apriori").
      Historical data are used on the interval [t0 - extend_length_history, t0] and apriori is used on [t0 + 1, t0 + extend_length_future]. If indicated by extend_length_separator, the end of the history interval and the start of the apriori interval can be shifted by the given number of time steps.

      :param data: historical data for past values, must contain dimensions time_dim and var_dim and might also have a new_dim dimension
      :param apriori: climatological estimate for future values, must contain dimensions time_dim and var_dim, but can also have dimension new_dim
      :param time_dim: name of temporal dimension
      :param new_dim: name of the new dim along which data is combined
      :param extend_length_history: number of time steps to use from data
      :param extend_length_future: number of time steps to use from apriori (minus 1)
      :param extend_length_separator: position of the last history value to use (default 0). This position marks the last value taken from data, followed by values from apriori. In other words, the end of the history interval and the start of the apriori interval are shifted from t0 by this value (positive or negative).
      :returns: combined data array

   .. py:method:: create_full_time_dim(data, dim, freq)
      :staticmethod:

      Ensure that the time dimension is equidistant. Dates may be missing from it if missing values have been dropped.

   .. py:method:: create_pseudo_timeseries(self, data, time_dim, sampling, window_dim)

   .. py:method:: create_visualization(self, filtered, data, filter_input_data, plot_dates, time_dim, new_dim, sampling, extend_length_history, extend_length_future, minimum_length, h, variable_name, extend_length_opts=None, extend_end=None, offset=None, forecast=None)

   .. py:method:: _get_year_interval(data: xarray.DataArray, time_dim: str) -> Tuple[int, int]
      :staticmethod:

      Get the years of the start and end date of the given data.

      :param data: data to extract dates from
      :param time_dim: name of temporal axis
      :returns: two-element tuple with start and end

   .. py:method:: _calculate_filter_coefficients(window: Union[str, tuple], order: Union[int, tuple], cutoff_high: float, fs: float) -> numpy.array
      :staticmethod:

      Calculate filter coefficients for a moving window. Common filter types use scipy's signal package, and the Kolmogorov Zurbenko filter (kzf) uses the local method firwin_kzf. The filter is a low-pass filter.

      :param window: name of the window type, either a string with the window's name or a tuple containing the name and additional parameters (e.g. `("kaiser", 5)`)
      :param order: order of the filter as int, or the parameters m and k of the kzf
      :param cutoff_high: cutoff frequency of the low-pass filter, given in the same units as fs
      :param fs: sampling frequency of the time series

   .. py:method:: _trim_data_to_minimum_length(data: xarray.DataArray, extend_length_history: int, dim: str, extend_length_future: int = 0, offset: int = 0) -> xarray.DataArray
      :staticmethod:

      Trim data along the given axis. The trim range runs from either -minimum_length (if given) or -extend_length_history up to extend_length_opts (which defaults to 0).

      :param data: data to trim
      :param extend_length_history: start number for the trim range, only used if parameter minimum_length is not provided
      :param dim: dim to apply the trim on
      :param extend_length_future: number of time steps to keep in the "future"
      :returns: trimmed data

   .. py:method:: _create_full_filter_result_array(template_array: xarray.DataArray, result_array: xarray.DataArray, new_dim: str, display_name: str = None) -> xarray.DataArray
      :staticmethod:

      Create a result filter array with the same shape as the given template data, which should be the original input data before filtering. All gaps are filled with NaNs.

      :param template_array: array used as template for the shape and ordering of dims
      :param result_array: array with data that are filled into the template
      :param new_dim: new dimension that is moved to the end if present, or appended at the end otherwise
      :param display_name: string that is attached to logging (default None)

   .. py:method:: clim_filter(self, data, fs, cutoff_high, order, apriori=None, sel_opts=None, sampling='1d', time_dim='datetime', var_dim='variables', window: Union[str, Tuple] = 'hamming', minimum_length=0, next_order=0, new_dim='window', plot_dates=None, display_name=None, extend_opts: int = 0, extend_end: int = 0, forecasts=None, offset: int = 0)

   .. py:method:: _create_time_range_extend(year: int, sampling: str, extend_length: int) -> slice
      :staticmethod:

      Create a slice object for the given year plus extend_length in sampling resolution.

      :param year: year to create the time range for
      :param sampling: sampling of the time range
      :param extend_length: number of time steps to extend beyond the given year
      :returns: slice object with time range

   .. py:method:: _create_tmp_dimension(data: xarray.DataArray) -> str
      :staticmethod:

      Create a tmp dimension, preferably named 'window'. If this name is already one of the dimensions, the name is concatenated with itself until it is no longer present in dims. The method raises a ValueError after 10 tries.

      :param data: data array for which a new tmp dimension with a unique name is created
      :returns: valid name for a tmp dimension (preferably 'window')

   .. py:method:: _shift_data(self, data: xarray.DataArray, index_value: range, time_dim: str, new_dim: str) -> xarray.DataArray

      Shift data multiple times to create history or future along dimension new_dim for each time step.

      :param data: data set to shift
      :param index_value: range of integers to span history and/or future
      :param time_dim: name of temporal dimension that should be shifted
      :param new_dim: name of the dimension created by the data shift
      :returns: shifted data

   .. py:method:: create_index_array(index_name: str, index_value: range)
      :staticmethod:

      Create an index array from a range object to use as index of a data array.

      :param index_name: name of the index dimension
      :param index_value: range of values to use as indexes
      :returns: index array for the given range of values

   .. py:method:: apriori_data(self)
      :property:

   .. py:method:: initial_apriori_data(self)
      :property:


.. py:function:: fir_filter(data, fs, order=5, cutoff_low=None, cutoff_high=None, window='hamming', dim='variables', h=None, causal=True, padlen=None)

   Expects xarray data as input.

.. py:function:: fir_filter_convolve(data, h)

.. py:class:: KolmogorovZurbenkoBaseClass(df, wl, itr, is_child=False, filter_dim='window')

   .. py:method:: set_child(self)

   .. py:method:: kz_filter(self, df, m, k)

   .. py:method:: spectral_calc(self)

   .. py:method:: subtract(minuend, subtrahend)
      :staticmethod:

   .. py:method:: run(self)

   .. py:method:: transfer_function(self)

   .. py:method:: omega_null(self, alpha=0.5)

   .. py:method:: period_null(self, alpha=0.5)

   .. py:method:: period_null_days(self, alpha=0.5)

   .. py:method:: plot_transfer_function(self, fig=None, name=None)

.. py:class:: KolmogorovZurbenkoFilterMovingWindow(df, wl: Union[list, int], itr: Union[list, int], is_child=False, filter_dim='window', method='mean', percentile=0.5)

   Bases: :py:obj:`KolmogorovZurbenkoBaseClass`

   .. py:method:: set_child(self)

   .. py:method:: kz_filter_new(self, df, wl, itr)

      Pass the low-frequency component of the time series (low-pass filtering). If the filter method is mean, max, or min, this method calls construct and rechunk before the actual calculation to improve performance. If the filter method is median or percentile, this approach is not applicable. Depending on the data and window size, the method can then become slow.

      :param wl: window length
      :type wl: int
      :param itr: number of iterations
      :type itr: int

   .. py:method:: kz_filter(self, df, wl, itr)

      Pass the low-frequency component of the time series (low-pass filtering).

      :param wl: window length
      :type wl: int
      :param itr: number of iterations
      :type itr: int

.. py:function:: firwin_kzf(m: int, k: int) -> numpy.array

   Calculate the window weights for the Kolmogorov Zurbenko filter.

.. py:function:: omega_null_kzf(m: int, k: int, alpha: float = 0.5) -> float

.. py:function:: filter_width_kzf(m: int, k: int) -> int

   Return the window width of the Kolmogorov Zurbenko filter.
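
The low-pass design described for ``_calculate_filter_coefficients`` and the convolution done by ``fir_filter_convolve`` can be sketched with scipy's ``signal`` package. This is a minimal, non-authoritative sketch and not MLAir's code. The helper names ``lowpass_coefficients`` and ``apply_fir`` are hypothetical. Two further points are assumptions: the mapping ``numtaps = order + 1``, and the centred convolution normalised by the sum of the weights.

```python
import numpy as np
from scipy import signal


def lowpass_coefficients(order: int, cutoff_high: float, fs: float, window="hamming") -> np.ndarray:
    """Windowed-sinc low-pass FIR coefficients (hypothetical helper).

    Assumption: a filter of the given order has order + 1 taps.
    `window` may be a name or a tuple such as ("kaiser", 5).
    """
    return signal.firwin(order + 1, cutoff_high, window=window, fs=fs)


def apply_fir(data: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Centred ("same") convolution; dividing by sum(h) keeps the mean level unchanged."""
    return signal.convolve(data, h, mode="same", method="direct") / np.sum(h)


# daily data (fs = 1 per day), keeping variations with periods longer than about three weeks
h = lowpass_coefficients(order=40, cutoff_high=1 / 21, fs=1.0)
smoothed = apply_fir(np.sin(2 * np.pi * np.arange(365) / 365.0), h)
```

Values within half a filter length of either end are affected by the zero padding of ``mode="same"``. That edge effect is what the apriori extension in ``ClimateFIRFilter`` is meant to avoid.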
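
The idea behind ``ClimateFIRFilter.create_monthly_mean`` can be illustrated with xarray. It computes 12 monthly means, pins each one to the 16th of its month, and interpolates linearly in between. This is a minimal sketch under stated assumptions. The function name ``monthly_mean_climatology`` is hypothetical. ``sel_opts`` and ``sampling`` are not handled, and the result is interpolated onto the input time axis. The one-month padding of anchor points at both ends is also an assumption.

```python
import numpy as np
import pandas as pd
import xarray as xr


def monthly_mean_climatology(data: xr.DataArray, time_dim: str = "datetime") -> xr.DataArray:
    """Monthly means placed on the 16th of each month and linearly interpolated (sketch)."""
    # 12 climatological values; months without any data become NaN
    monthly = data.groupby(f"{time_dim}.month").mean(time_dim).reindex(month=np.arange(1, 13))
    times = pd.DatetimeIndex(data[time_dim].values)
    # anchor points on the 16th, padded by one month so every date lies between two anchors
    first = times.min().to_period("M") - 1
    last = times.max().to_period("M") + 1
    anchors = pd.period_range(first, last, freq="M").to_timestamp() + pd.Timedelta(days=15)
    values = monthly.sel(month=np.asarray(anchors.month)).values
    anchored = xr.DataArray(values, coords={time_dim: anchors}, dims=[time_dim])
    # linear interpolation between anchor points onto the original time axis
    return anchored.interp({time_dim: data[time_dim].values})
```

Because a missing month yields NaN at its anchor, linear interpolation spreads that NaN into the half-months on either side. This matches the behaviour described for the returned array.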
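
The 12 x 24 statistics described for ``ClimateFIRFilter._compute_hourly_mean_per_month`` could be computed roughly as follows. Here an anomaly is taken relative to the overall mean of the same month. The helper name ``hourly_mean_per_month`` is hypothetical. Returning an all-NaN array for a month without data is an assumption of this sketch.

```python
import numpy as np
import xarray as xr


def hourly_mean_per_month(data: xr.DataArray, time_dim: str = "datetime", as_anomaly: bool = True) -> dict:
    """Return {month: 24-valued array of hourly means} (sketch, not MLAir's code)."""
    month = data[time_dim].dt.month
    means = {}
    for m in range(1, 13):
        subset = data.where(month == m, drop=True)
        if subset.sizes[time_dim] == 0:
            # assumption: months without data yield NaN for every hour
            means[m] = xr.DataArray(np.full(24, np.nan), coords={"hour": np.arange(24)}, dims=["hour"])
            continue
        hourly = subset.groupby(f"{time_dim}.hour").mean(time_dim)
        if as_anomaly:
            # anomaly with respect to the monthly mean state
            hourly = hourly - subset.mean(time_dim)
        means[m] = hourly
    return means
```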
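
For a single time index ``t0`` and plain NumPy arrays, the interval split described for ``ClimateFIRFilter.combine_observation_and_apriori`` reduces to slicing and concatenation. Observations are used on ``[t0 - extend_length_history, t0 + extend_length_separator]`` and apriori from there to ``t0 + extend_length_future``. This sketch uses the hypothetical name ``combine_history_and_apriori``. It ignores ``new_dim``, ``forecasts``, ``extend_end`` and ``offset``, whereas the documented method builds such a window for every time step.

```python
import numpy as np


def combine_history_and_apriori(obs: np.ndarray, apriori: np.ndarray, t0: int,
                                extend_length_history: int, extend_length_future: int,
                                extend_length_separator: int = 0) -> np.ndarray:
    """Window around t0: observations up to t0 + separator, apriori afterwards (sketch)."""
    last_obs = t0 + extend_length_separator  # last index taken from the observations
    start, stop = t0 - extend_length_history, t0 + extend_length_future
    if start < 0 or stop >= min(len(obs), len(apriori)):
        raise IndexError("window exceeds the available data")
    history = obs[start:last_obs + 1]
    future = apriori[last_obs + 1:stop + 1]
    return np.concatenate([history, future])
```

Whatever the separator, the window always has ``extend_length_history + extend_length_future + 1`` values. The separator only moves the boundary between observed and climatological values.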
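
``ClimateFIRFilter._create_tmp_dimension`` and ``ClimateFIRFilter._shift_data`` can be approximated with xarray as shown below. The first picks a unique dimension name; the second stacks shifted copies of the data along that new dimension. The helper names are hypothetical. The sign convention is an assumption: offset ``i`` holds the value found ``i`` steps later, so negative offsets are the past.

```python
import pandas as pd
import xarray as xr


def create_tmp_dimension(data: xr.DataArray, max_tries: int = 10) -> str:
    """Return 'window', concatenated with itself until it is not a dim of data (sketch)."""
    new_dim = "window"
    for _ in range(max_tries):
        if new_dim not in data.dims:
            return new_dim
        new_dim += new_dim  # 'window' -> 'windowwindow' -> ...
    raise ValueError("could not create a unique temporary dimension name")


def shift_data(data: xr.DataArray, index_value: range, time_dim: str, new_dim: str) -> xr.DataArray:
    """Stack shifted copies of data along new_dim, one per offset in index_value (sketch)."""
    # shift by -i so that position t holds the value at t + i (NaN where no value exists)
    shifted = [data.shift({time_dim: -i}) for i in index_value]
    return xr.concat(shifted, dim=pd.Index(list(index_value), name=new_dim))
```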
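
The Kolmogorov Zurbenko filter KZ(m, k) applies a centred moving average of width m, k times. Its combined weights are therefore the k-fold self-convolution of a length-m boxcar, and each pass adds m - 1 samples, giving a total width of k(m - 1) + 1. The sketch below assumes this is what ``firwin_kzf`` and ``filter_width_kzf`` compute. ``kz_moving_average`` is a hypothetical helper that checks the equivalence on the interior of a series, with the edges dropped.

```python
import numpy as np


def firwin_kzf(m: int, k: int) -> np.ndarray:
    """KZ(m, k) weights: k-fold self-convolution of a normalised length-m boxcar (sketch)."""
    box = np.ones(m) / m
    h = np.array([1.0])
    for _ in range(k):
        h = np.convolve(h, box)
    return h  # sums to 1 because every boxcar sums to 1


def filter_width_kzf(m: int, k: int) -> int:
    """Width of the KZ(m, k) window: each of the k passes adds m - 1 samples."""
    return k * (m - 1) + 1


def kz_moving_average(x: np.ndarray, m: int, k: int) -> np.ndarray:
    """Iterate a length-m moving average k times, keeping only fully covered points."""
    y = np.asarray(x, dtype=float)
    for _ in range(k):
        y = np.convolve(y, np.ones(m) / m, mode="valid")
    return y
```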