mlair.helpers.data_sources.join

Functions to access join database.

Module Contents

Functions

download_join(station_name: Union[str, List[str]], stat_var: dict, station_type: str = None, sampling: str = 'daily', data_origin: Dict = None) → [pandas.DataFrame, pandas.DataFrame]

Read data from JOIN/TOAR.

_correct_meta(meta)

split_network_and_origin(origin_network_dict: dict) → Tuple[Union[None, dict], Union[None, dict]]

Split given dict into network and data origin.

filter_network(network: list) → Union[list, None]

Filter given list of networks.

correct_data_format(data)

Transform to the standard data format.

load_series_information(station_name: List[str], station_type: str_or_none, network_name: str_or_none, join_url_base: str, headers: Dict, data_origin: Dict = None, stat_var: Dict = None) → [Dict, Dict]

List all series ids that are available for given station id and network name.

_create_parameter_name_opts(stat_var)

_create_network_name_opts(network_name)

_select_distinct_series(vars: List[Dict], data_origin: Dict = None, network_name: Union[str, List[str]] = None) → [Dict, Dict]

Select distinct series ids for all variables. Also check if a parameter is from REA or not.

_select_distinct_network(vars: dict, network_name: Union[list, dict]) → dict

Select distinct series regarding network name. The order in which the network names are provided in parameter network_name indicates priority (from high to low).

_select_distinct_data_origin(vars: List[Dict], data_origin: Dict) → (Dict[str, List], Dict)

Select distinct series regarding their data origin. Series are grouped as list according to their variable’s name.

_save_to_pandas(df: Union[pandas.DataFrame, None], data: dict, stat: str, var: str) → pandas.DataFrame

Save given data in data frame.

_lower_list(args: List[str]) → Iterator[str]

Lower all elements of given list.

Attributes

__author__

__date__

str_or_none

var_all_dic

mlair.helpers.data_sources.join.__author__ = Felix Kleinert, Lukas Leufen
mlair.helpers.data_sources.join.__date__ = 2019-10-16
mlair.helpers.data_sources.join.str_or_none
mlair.helpers.data_sources.join.download_join(station_name: Union[str, List[str]], stat_var: dict, station_type: str = None, sampling: str = 'daily', data_origin: Dict = None) → [pandas.DataFrame, pandas.DataFrame]

Read data from JOIN/TOAR.

Parameters
  • station_name – Station name e.g. DEBY122

  • stat_var – dictionary with variables as keys (e.g. ‘O3’) and the statistic to retrieve for each variable as values (e.g. ‘mean’)

  • station_type – set the station type like “traffic” or “background”, can be none

  • sampling – sampling rate of the downloaded data, either set to daily or hourly (default daily)

  • data_origin – additional dictionary to specify data origin as key (for variable) value (origin) pair. Valid origins are “REA” for reanalysis data and “” (empty string) for observational data.

Returns

data frame with all variables and statistics and meta data frame with all meta information
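The argument shapes described above can be sketched as follows. This is only an illustration: the variable name "temp", the chosen statistics, and the wrapper function fetch_station are assumptions, not part of the documented API. The call itself needs an installed MLAir and network access to JOIN, so it sits inside a function that is not executed here.

```python
# Illustrative argument values; "temp" and the statistics are assumptions.
stat_var = {"o3": "dma8eu", "temp": "mean"}   # variable -> statistic
data_origin = {"o3": "", "temp": "REA"}       # "" = observation, "REA" = reanalysis


def fetch_station(station_name="DEBY122"):
    """Hypothetical wrapper: download daily data for one background station."""
    # Imported lazily so that this sketch can be read without MLAir installed.
    from mlair.helpers.data_sources.join import download_join

    data, meta = download_join(
        station_name=station_name,
        stat_var=stat_var,
        station_type="background",
        sampling="daily",
        data_origin=data_origin,
    )
    return data, meta
```

Per the return description, the first element would hold the variables and statistics and the second the meta information.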

mlair.helpers.data_sources.join._correct_meta(meta)
mlair.helpers.data_sources.join.split_network_and_origin(origin_network_dict: dict) → Tuple[Union[None, dict], Union[None, dict]]

Split given dict into network and data origin.

Method is required to transform Toar-Data v2 structure (using only origin) into Toar-Data v1 (JOIN) structure (which uses origin and network parameter). Furthermore, EEA network (v2) is renamed to AIRBASE (v1).

mlair.helpers.data_sources.join.filter_network(network: list) → Union[list, None]

Filter given list of networks.

Parameters

network – list of various network names (can contain duplicates)

Returns

sorted list with unique entries
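A minimal sketch of such a filter, not the MLAir implementation. It assumes that "sorted" means first-occurrence order is kept, that None entries are skipped, and that None is returned when nothing remains (suggested by the Union[list, None] return type). The name filter_network_sketch is hypothetical.

```python
from typing import List, Optional


def filter_network_sketch(network: List[Optional[str]]) -> Optional[List[str]]:
    """Drop duplicates and None entries while keeping first-occurrence order."""
    unique: List[str] = []
    for name in network:
        if name is not None and name not in unique:
            unique.append(name)
    # Assumption: an empty result is reported as None instead of [].
    return unique or None


networks = filter_network_sketch(["UBA", "AIRBASE", "UBA", None])
```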

mlair.helpers.data_sources.join.correct_data_format(data)

Transform to the standard data format.

For some cases (e.g. hourly data), the data is returned as a list instead of a dictionary with keys datetime, values and metadata. This function addresses this issue and transforms the data into the dictionary version.

Parameters

data – data in hourly format

Returns

the same data but formatted to fit with aggregated format
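The transformation could look roughly like the sketch below. The layout of the list is an assumption: every element is taken to be a [datetime, value] pair, except the last one, which is taken to carry the metadata. The function name correct_data_format_sketch is hypothetical.

```python
def correct_data_format_sketch(data: list) -> dict:
    """Turn a list-style response into the datetime/values/metadata dictionary."""
    # Assumption: the last list element holds the metadata, all others are rows.
    *rows, metadata = data
    formatted = {"datetime": [], "values": [], "metadata": metadata}
    for timestamp, value in rows:
        formatted["datetime"].append(timestamp)
        formatted["values"].append(value)
    return formatted


raw = [["2019-10-16 00:00", 41.2], ["2019-10-16 01:00", 39.8], {"station": "DEBW107"}]
formatted = correct_data_format_sketch(raw)
```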

mlair.helpers.data_sources.join.load_series_information(station_name: List[str], station_type: str_or_none, network_name: str_or_none, join_url_base: str, headers: Dict, data_origin: Dict = None, stat_var: Dict = None) → [Dict, Dict]

List all series ids that are available for given station id and network name.

Parameters
  • station_name – Station name e.g. DEBW107

  • station_type – station type like “traffic” or “background”

  • network_name – measurement network of the station like “UBA” or “AIRBASE”

  • join_url_base – base url name to download data from

  • headers – additional headers information like authorization, can be empty

  • data_origin – additional information to select a distinct series e.g. from reanalysis (REA) or from observation (“”, empty string). This dictionary should contain a key for each variable and the origin information as value

Returns

all available series for the requested station stored in a dictionary with parameter name (variable) as key and the series id as value.

mlair.helpers.data_sources.join._create_parameter_name_opts(stat_var)
mlair.helpers.data_sources.join._create_network_name_opts(network_name)
mlair.helpers.data_sources.join._select_distinct_series(vars: List[Dict], data_origin: Dict = None, network_name: Union[str, List[str]] = None) → [Dict, Dict]

Select distinct series ids for all variables. Also check if a parameter is from REA or not.

mlair.helpers.data_sources.join._select_distinct_network(vars: dict, network_name: Union[list, dict]) → dict

Select distinct series regarding network name. The order in which the network names are provided in parameter network_name indicates priority (from high to low). If no network name is provided, the first entry is used and a logging info is issued. If network names are given but no match can be found, this method raises a ValueError.

Parameters
  • vars – dictionary with all series candidates already grouped by variable name as key. Value should be a list of possible candidates to select from. Each candidate must be a dictionary with at least keys id and network_name.

  • network_name – list of networks to use with decreasing priority (1st element has highest priority). Can be an empty list, indicating that the first candidate is always used for each variable.

Returns

dictionary with single series reference for each variable
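The priority rule described above can be sketched as follows. This is not the MLAir code: it handles only the list form of network_name, compares names by exact match, and returns the whole candidate dictionary as the series reference. The name select_distinct_network_sketch is hypothetical.

```python
import logging
from typing import Dict, List


def select_distinct_network_sketch(
    candidates_by_var: Dict[str, List[dict]], network_name: List[str]
) -> Dict[str, dict]:
    """Pick one candidate per variable, honouring the network priority order."""
    selected: Dict[str, dict] = {}
    for var, candidates in candidates_by_var.items():
        if not network_name:
            # No preference given: fall back to the first candidate and log it.
            logging.info("no network given for %s, using first candidate", var)
            selected[var] = candidates[0]
            continue
        for network in network_name:  # highest priority first
            match = next((c for c in candidates if c["network_name"] == network), None)
            if match is not None:
                selected[var] = match
                break
        else:
            raise ValueError(f"no series for {var} in networks {network_name}")
    return selected


candidates = {"o3": [{"id": 1, "network_name": "AIRBASE"},
                     {"id": 2, "network_name": "UBA"}]}
preferred = select_distinct_network_sketch(candidates, ["UBA", "AIRBASE"])
```

With ["UBA", "AIRBASE"] the UBA series wins although AIRBASE is listed first among the candidates, because the order of network_name decides, not the order of the candidates.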

mlair.helpers.data_sources.join._select_distinct_data_origin(vars: List[Dict], data_origin: Dict) → (Dict[str, List], Dict)

Select distinct series regarding their data origin. Series are grouped as list according to their variable’s name. As series can be reported with different network attribution, results might contain multiple entries for a variable. This method assumes the default data origin for chemical variables as “” (empty string) and for meteorological variables as REA.

Parameters
  • vars – list of all entries to check data origin for

  • data_origin – data origin to match series with, if empty default values are used

Returns

dictionary with unique variable names as keys and list of respective series as values
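The grouping and the default origins might be sketched like this. Several things are assumptions: the entry keys parameter_name and parameter_attribute, the explicit met_vars argument (the documentation does not say how meteorological variables are recognised), and the second return value being the completed origin dictionary. The function name is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def select_distinct_data_origin_sketch(
    series: List[dict],
    data_origin: Optional[Dict[str, str]] = None,
    met_vars: Iterable[str] = (),
) -> Tuple[Dict[str, List[dict]], Dict[str, str]]:
    """Group series by variable name, keeping only entries of the wanted origin."""
    origin = dict(data_origin or {})
    meteorological = {name.lower() for name in met_vars}
    grouped: Dict[str, List[dict]] = {}
    for entry in series:
        name = entry["parameter_name"].lower()
        if name not in origin:
            # Defaults: reanalysis for meteorological variables, observation otherwise.
            origin[name] = "REA" if name in meteorological else ""
        if entry["parameter_attribute"].lower() == origin[name].lower():
            grouped.setdefault(name, []).append(entry)
    return grouped, origin


series = [
    {"id": 1, "parameter_name": "o3", "parameter_attribute": ""},
    {"id": 2, "parameter_name": "o3", "parameter_attribute": "REA"},
    {"id": 3, "parameter_name": "temp", "parameter_attribute": "REA"},
]
grouped, origin = select_distinct_data_origin_sketch(series, met_vars=["temp"])
```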

mlair.helpers.data_sources.join._save_to_pandas(df: Union[pandas.DataFrame, None], data: dict, stat: str, var: str) → pandas.DataFrame

Save given data in data frame.

If the given data frame is not empty, the data is appended as a new column.

Parameters
  • df – data frame to append the new data, can be none

  • data – new data to append or format as data frame containing the keys ‘datetime’ and ‘<stat>’

  • stat – extracted statistic to get values from data (e.g. ‘mean’, ‘dma8eu’)

  • var – variable the data is from (e.g. ‘o3’)

Returns

newly created or concatenated data frame
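A minimal sketch of this create-or-append behaviour using pandas. It assumes that the timestamps are strings that pandas.to_datetime can parse and that the new column is named after var. The name save_to_pandas_sketch is hypothetical.

```python
from typing import Optional

import pandas as pd


def save_to_pandas_sketch(
    df: Optional[pd.DataFrame], data: dict, stat: str, var: str
) -> pd.DataFrame:
    """Create a one-column frame from data[stat], or append it to df as a column."""
    index = pd.to_datetime(data["datetime"])
    column = pd.DataFrame({var: data[stat]}, index=index)
    if df is None:
        return column
    # Align on the datetime index and add the new variable as a further column.
    return pd.concat([df, column], axis=1)


timestamps = ["2019-10-16 00:00", "2019-10-17 00:00"]
frame = save_to_pandas_sketch(None, {"datetime": timestamps, "mean": [1.0, 2.0]}, "mean", "o3")
frame = save_to_pandas_sketch(frame, {"datetime": timestamps, "dma8eu": [3.0, 4.0]}, "dma8eu", "no2")
```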

mlair.helpers.data_sources.join._lower_list(args: List[str]) → Iterator[str]

Lower all elements of given list.

Parameters

args – list with string entries to lower

Returns

iterator that lowers all list entries
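Because the return value is an iterator, the entries are lowered lazily and can be consumed only once. A sketch of that behaviour, written as a generator under the hypothetical name lower_list_sketch:

```python
from typing import Iterator, List


def lower_list_sketch(args: List[str]) -> Iterator[str]:
    """Yield each entry in lower case, one at a time."""
    for entry in args:
        yield entry.lower()


lowered = lower_list_sketch(["UBA", "AirBase"])
first_pass = list(lowered)   # consumes the iterator
second_pass = list(lowered)  # already exhausted
```

Callers that need the values more than once would wrap the result in list() first.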

mlair.helpers.data_sources.join.var_all_dic