:py:mod:`mlair.data_handler.data_handler_single_station`
========================================================

.. py:module:: mlair.data_handler.data_handler_single_station

.. autoapi-nested-parse::

   Data Preparation class to handle data processing for machine learning.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   mlair.data_handler.data_handler_single_station.DataHandlerSingleStation


Attributes
~~~~~~~~~~

.. autoapisummary::

   mlair.data_handler.data_handler_single_station.__author__
   mlair.data_handler.data_handler_single_station.__date__
   mlair.data_handler.data_handler_single_station.date
   mlair.data_handler.data_handler_single_station.str_or_list
   mlair.data_handler.data_handler_single_station.number
   mlair.data_handler.data_handler_single_station.num_or_list
   mlair.data_handler.data_handler_single_station.data_or_none


.. py:data:: __author__
   :annotation: = Lukas Leufen, Felix Kleinert

.. py:data:: __date__
   :annotation: = 2020-07-20

.. py:data:: date

.. py:data:: str_or_list

.. py:data:: number

.. py:data:: num_or_list

.. py:data:: data_or_none

.. py:class:: DataHandlerSingleStation(station, data_path, statistics_per_var=None, sampling: Union[str, Tuple[str]] = DEFAULT_SAMPLING, target_dim=DEFAULT_TARGET_DIM, target_var=DEFAULT_TARGET_VAR, time_dim=DEFAULT_TIME_DIM, iter_dim=DEFAULT_ITER_DIM, window_dim=DEFAULT_WINDOW_DIM, window_history_size=DEFAULT_WINDOW_HISTORY_SIZE, window_history_offset=DEFAULT_WINDOW_HISTORY_OFFSET, window_history_end=DEFAULT_WINDOW_HISTORY_END, window_lead_time=DEFAULT_WINDOW_LEAD_TIME, interpolation_limit: Union[int, Tuple[int]] = DEFAULT_INTERPOLATION_LIMIT, interpolation_method: Union[str, Tuple[str]] = DEFAULT_INTERPOLATION_METHOD, overwrite_local_data: bool = False, transformation=None, store_data_locally: bool = True, min_length: int = 0, start=None, end=None, variables=None, data_origin: Dict = None, lazy_preprocessing: bool = False, overwrite_lazy_data=False, era5_data_path=None, era5_file_names=None, ifs_data_path=None, ifs_file_names=None, **kwargs)

   Bases: :py:obj:`mlair.data_handler.abstract_data_handler.AbstractDataHandler`

   :param window_history_offset: used to shift t0 according to the specified value.
   :param window_history_end: used to set the last time step that is used to create a sample. A negative value
       indicates that not all values up to t0 are used, a positive value indicates usage of values at t>t0.
       Default is 0.

   .. py:attribute:: DEFAULT_VAR_ALL_DICT

   .. py:attribute:: DEFAULT_WINDOW_LEAD_TIME
      :annotation: = 3

   .. py:attribute:: DEFAULT_WINDOW_HISTORY_SIZE
      :annotation: = 13

   .. py:attribute:: DEFAULT_WINDOW_HISTORY_OFFSET
      :annotation: = 0

   .. py:attribute:: DEFAULT_WINDOW_HISTORY_END
      :annotation: = 0

   .. py:attribute:: DEFAULT_TIME_DIM
      :annotation: = datetime

   .. py:attribute:: DEFAULT_TARGET_VAR
      :annotation: = o3

   .. py:attribute:: DEFAULT_TARGET_DIM
      :annotation: = variables

   .. py:attribute:: DEFAULT_ITER_DIM
      :annotation: = Stations

   .. py:attribute:: DEFAULT_WINDOW_DIM
      :annotation: = window

   .. py:attribute:: DEFAULT_SAMPLING
      :annotation: = daily

   .. py:attribute:: DEFAULT_INTERPOLATION_LIMIT
      :annotation: = 0

   .. py:attribute:: DEFAULT_INTERPOLATION_METHOD
      :annotation: = linear

   .. py:attribute:: chem_vars
      :annotation: = ['benzene', 'ch4', 'co', 'ethane', 'no', 'no2', 'nox', 'o3', 'ox', 'pm1', 'pm10', 'pm2p5',...

   .. py:attribute:: _hash
      :annotation: = ['station', 'statistics_per_var', 'data_origin', 'sampling', 'target_dim', 'target_var',...

   .. py:method:: clean_up(self)

   .. py:method:: __str__(self)

      Return str(self).

   .. py:method:: __len__(self)

   .. py:method:: shape(self)
      :property:

   .. py:method:: __repr__(self)

      Return repr(self).

   .. py:method:: get_transposed_history(self) -> xarray.DataArray

      Return history.

      :return: history with dimensions datetime, window, Stations, variables.

   .. py:method:: get_transposed_label(self) -> xarray.DataArray

      Return label.

      :return: label with dimensions datetime*, window*, Stations, variables.

   .. py:method:: get_X(self, **kwargs)

   .. py:method:: get_Y(self, **kwargs)

   .. py:method:: get_coordinates(self)

      Return coordinates as dictionary with keys `lon` and `lat`.

   .. py:method:: call_transform(self, inverse=False)

   .. py:method:: transform(self, data_in, dim: Union[str, int] = 0, inverse: bool = False, opts=None, transformation_dim=DEFAULT_TARGET_DIM)

      Transform data according to given transformation settings.

      This function transforms an xarray.DataArray (along dim) or a pandas.DataFrame (along axis), either
      standardising it to mean=0 and std=1 (`method=standardise`) or centring it to mean=0 without changing the
      data scale (`method=centre`). Furthermore, this sets an internal instance attribute for later inverse
      transformation.

      This method will raise an AssertionError if an internal transform method was already set ('inverse=False')
      or if the internal transform method, internal mean and internal standard deviation weren't set
      ('inverse=True').

      :param string/int dim: This param is not used for inverse transformation.

          | for xarray.DataArray as string: name of dimension which should be standardised
          | for pandas.DataFrame as int: axis of dimension which should be standardised
      :param inverse: Switch between transformation and inverse transformation.

      :return: xarray.DataArrays or pandas.DataFrames:

          #. mean: Mean of data
          #. std: Standard deviation of data
          #. data: Standardised data

   .. py:method:: setup_samples(self)

      Set up samples.

      This method prepares and creates samples X, and labels Y.

   .. py:method:: store_lazy(self)

   .. py:method:: _create_lazy_data(self)

   .. py:method:: load_lazy(self)

   .. py:method:: _extract_lazy(self, lazy_data)

   .. py:method:: make_input_target(self)

   .. py:method:: set_inputs_and_targets(self)

   .. py:method:: make_samples(self)

   .. py:method:: load_data(self, path, station, statistics_per_var, sampling, store_data_locally=False, data_origin: Dict = None, start=None, end=None)

      Load data and meta data either from local disk (preferred) or download new data by using a custom download method.

      Data is downloaded if no local data is available or if the parameter overwrite_local_data is true. In both
      cases, downloaded data is only stored locally if store_data_locally is not disabled. If this parameter is
      not set, it is assumed that data should be saved locally.

   .. py:method:: check_station_meta(meta, station, data_origin, statistics_per_var)
      :staticmethod:

      Search for the entries in meta data and compare the value with the requested values.

      Will raise a FileNotFoundError if the values mismatch.

   .. py:method:: check_for_negative_concentrations(self, data: xarray.DataArray, minimum: int = 0) -> xarray.DataArray

      Set all negative concentrations to zero.

      Names of all concentrations are extracted from https://join.fz-juelich.de/services/rest/surfacedata/
      #2.1 Parameters. Currently, this check is applied on "benzene", "ch4", "co", "ethane", "no", "no2", "nox",
      "o3", "ox", "pm1", "pm10", "pm2p5", "propane", "so2", and "toluene".

      :param data: data array containing variables to check
      :param minimum: minimum value, by default this should be 0

      :return: corrected data

   .. py:method:: setup_data_path(self, data_path: str, sampling: str)

   .. py:method:: shift(self, data: xarray.DataArray, dim: str, window: int, offset: int = 0) -> xarray.DataArray

      Shift data multiple times to represent history (if window <= 0) or lead time (if window > 0).

      :param data: data set to shift
      :param dim: dimension along which the shift is applied
      :param window: number of steps to shift (corresponds to the window length)
      :param offset: use offset to move the window by as many time steps as given in offset. This can be used if
          the index time of a history element is not the last timestamp. For example, you could use offset=23
          when dealing with hourly data in combination with daily data (values from 00 to 23 are aggregated on
          00 the same day).

      :return: shifted data

   .. py:method:: create_index_array(index_name: str, index_value: Iterable[int], squeeze_dim: str) -> xarray.DataArray
      :staticmethod:

      Create a 1D xr.DataArray with given index name and value.

      :param index_name: name of dimension
      :param index_value: values of this dimension

      :return: this array

   .. py:method:: _set_file_name(path, station, statistics_per_var)
      :staticmethod:

   .. py:method:: _set_meta_file_name(path, station, statistics_per_var)
      :staticmethod:

   .. py:method:: interpolate(self, data, dim: str, method: str = 'linear', limit: int = None, use_coordinate: Union[bool, str] = True, sampling='daily', **kwargs)

      Interpolate values according to different methods.

      (Copied from xarray's ``DataArray.interpolate_na``.)

      :param dim:
          Specifies the dimension along which to interpolate.
      :param method:
          {'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
          'polynomial', 'barycentric', 'krog', 'pchip', 'spline', 'akima'}, optional.
          String indicating which method to use for interpolation:

          - 'linear': linear interpolation (Default). Additional keyword
            arguments are passed to ``numpy.interp``.
          - 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'polynomial':
            are passed to ``scipy.interpolate.interp1d``. If method=='polynomial',
            the ``order`` keyword argument must also be provided.
          - 'barycentric', 'krog', 'pchip', 'spline', and 'akima': use their
            respective ``scipy.interpolate`` classes.
      :param limit:
          Default is None. Maximum number of consecutive NaNs to fill. Must be greater than 0,
          or None for no limit.
      :param use_coordinate:
          Default is True. Specifies which index to use as the x values in the interpolation
          formulated as `y = f(x)`. If False, values are treated as if
          equally spaced along `dim`. If True, the IndexVariable `dim` is
          used. If use_coordinate is a string, it specifies the name of a
          coordinate variable to use as the index.
      :param kwargs:

      :return: xarray.DataArray

   .. py:method:: create_full_time_dim(data, dim, sampling)
      :staticmethod:

      Ensure that the time dimension is equidistant. Dates may be absent if missing values have been dropped.

   .. py:method:: make_history_window(self, dim_name_of_inputs: str, window: int, dim_name_of_shift: str) -> None

      Create an xr.DataArray containing history data.

      Shift the data window+1 times and create an xarray.DataArray which has a new dimension 'window' containing
      the shifted data. This is used to represent history in the data. Results are stored in the history attribute.

      :param dim_name_of_inputs: Name of dimension which contains the input variables
      :param window: number of time steps to look back in history.
          Note: window will be treated as a negative value. This should be in agreement with looking back on
          a time line. Nonetheless, positive values are allowed, but they are converted to their negative
          counterpart.
      :param dim_name_of_shift: Dimension along which the shift will be applied

   .. py:method:: make_labels(self, dim_name_of_target: str, target_var: str_or_list, dim_name_of_shift: str, window: int) -> None

      Create an xr.DataArray containing labels.

      Labels are defined as the consecutive target values (t+1, ...t+n) following the current time step t. Set
      label attribute.

      :param dim_name_of_target: Name of dimension which contains the target variable
      :param target_var: Name of target variable in 'dimension'
      :param dim_name_of_shift: Name of dimension on which xarray.DataArray.shift will be applied
      :param window: lead time of label

   .. py:method:: make_observation(self, dim_name_of_target: str, target_var: str_or_list, dim_name_of_shift: str) -> None

      Create an xr.DataArray containing observations.

      Observations are defined as value of the current time step t. Set observation attribute.

      :param dim_name_of_target: Name of dimension which contains the observation variable
      :param target_var: Name of observation variable(s) in 'dimension'
      :param dim_name_of_shift: Name of dimension on which xarray.DataArray.shift will be applied

   .. py:method:: remove_nan(self, dim: str) -> None

      Remove all slices along dim which contain NaNs in history, label, and observation.

      This is done to present only a full matrix to keras.fit. Update history, label, and observation attribute.

      :param dim: dimension along which the removal is performed.

   .. py:method:: _slice_prep(self, data: xarray.DataArray, start=None, end=None) -> xarray.DataArray

      Set start and end date for slicing and execute self._slice().

      :param data: data to slice
      :param coord: name of axis to slice

      :return: sliced data

   .. py:method:: _slice(data: xarray.DataArray, start: Union[date, str], end: Union[date, str], coord: str) -> xarray.DataArray
      :staticmethod:

      Slice through a given data_item (for example select only values of 2011).

      :param data: data to slice
      :param start: start date of slice
      :param end: end date of slice
      :param coord: name of axis to slice

      :return: sliced data

   .. py:method:: setup_transformation(self, transformation: Union[None, dict, Tuple]) -> Tuple[Optional[dict], Optional[dict]]

      Set up transformation by extracting all relevant information.

      * Either return new empty DataClass instances if given transformation arg is None,
      * or return given object twice if transformation is a DataClass instance,
      * or return the inputs and targets attributes if transformation is a TransformationClass instance (default
        design behaviour)

   .. py:method:: check_inverse_transform_params(method: str, mean=None, std=None, min=None, max=None) -> None
      :staticmethod:

      Support inverse_transformation method.

      Validate if all required statistics are available for given method. E.g. centering requires mean only,
      whereas normalisation requires mean and standard deviation. Will raise an AttributeError on missing
      requirements.

      :param mean: data with all mean values
      :param std: data with all standard deviation values
      :param method: name of transformation method

   .. py:method:: inverse_transform(self, data_in, opts, transformation_dim) -> xarray.DataArray

      Perform inverse transformation.

      Will raise an AssertionError if no transformation was performed before. Checks first if all required
      statistics are available for inverse transformation. Class attributes data, mean and std are overwritten
      by new data afterwards. Thereby, mean, std, and the private transform method are set to None to indicate
      that the current data is not transformed.

   .. py:method:: apply_transformation(self, data, base=None, dim=0, inverse=False)

      Apply transformation on external data. Specify if transformation should be based on parameters related to
      input or target data using `base`. This method can also apply inverse transformation.

      :param data:
      :param base:
      :param dim:
      :param inverse:

      :return:

   .. py:method:: _hash_list(self)

   .. py:method:: _get_hash(self)
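
As a rough illustration of how ``make_history_window``, ``make_labels``, and ``remove_nan`` fit together, the sketch below builds history and label pairs from a plain Python list. It is not MLAir code and does not use xarray. The function name ``make_samples`` and its arguments are hypothetical, and it assumes a history offset of zero and that time steps with an incomplete window are simply skipped, instead of being filled with NaNs and removed afterwards.

```python
from typing import List, Sequence, Tuple


def make_samples(
    series: Sequence[float], history_size: int, lead_time: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Build (history, label) pairs from a 1D series (hypothetical helper).

    For each time step t that has a complete window, the history holds the
    history_size + 1 values series[t - history_size] ... series[t], and the
    label holds the lead_time values series[t + 1] ... series[t + lead_time].
    """
    history: List[List[float]] = []
    labels: List[List[float]] = []
    # Only keep time steps where both windows fit entirely into the series.
    for t in range(history_size, len(series) - lead_time):
        history.append(list(series[t - history_size : t + 1]))
        labels.append(list(series[t + 1 : t + lead_time + 1]))
    return history, labels


# Usage: ten consecutive values, two steps of look-back, three steps of lead time.
series = [float(v) for v in range(10)]
history, labels = make_samples(series, history_size=2, lead_time=3)
print(history[0], labels[0])  # [0.0, 1.0, 2.0] [3.0, 4.0, 5.0]
```

With the class defaults documented above (``DEFAULT_WINDOW_HISTORY_SIZE = 13`` and ``DEFAULT_WINDOW_LEAD_TIME = 3``), each sample in this sketch would hold 14 history values and 3 label values.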