`mlair.helpers.data_sources.join`¶

Functions to access join database.

Module Contents¶

Functions¶

`download_join`(station_name: Union[str, List[str]], stat_var: dict, station_type: str = None, sampling: str = ‘daily’, data_origin: Dict = None) → [pandas.DataFrame, pandas.DataFrame]	Read data from JOIN/TOAR.
`_correct_meta`(meta)
`split_network_and_origin`(origin_network_dict: dict) → Tuple[Union[None, dict], Union[None, dict]]	Split given dict into network and data origin.
`filter_network`(network: list) → Union[list, None]	Filter given list of networks.
`correct_data_format`(data)	Transform to the standard data format.
`load_series_information`(station_name: List[str], station_type: str_or_none, network_name: str_or_none, join_url_base: str, headers: Dict, data_origin: Dict = None, stat_var: Dict = None) → [Dict, Dict]	List all series ids that are available for given station id and network name.
`_create_parameter_name_opts`(stat_var)
`_create_network_name_opts`(network_name)
`_select_distinct_series`(vars: List[Dict], data_origin: Dict = None, network_name: Union[str, List[str]] = None) → [Dict, Dict]	Select distinct series ids for all variables. Also check if a parameter is from REA or not.
`_select_distinct_network`(vars: dict, network_name: Union[list, dict]) → dict	Select distinct series regarding network name. The order the network names are provided in parameter network_name indicates priority (from high to low).
`_select_distinct_data_origin`(vars: List[Dict], data_origin: Dict) → (Dict[str, List], Dict)	Select distinct series regarding their data origin. Series are grouped as list according to their variable’s name.
`_save_to_pandas`(df: Union[pandas.DataFrame, None], data: dict, stat: str, var: str) → pandas.DataFrame	Save given data in data frame.
`_lower_list`(args: List[str]) → Iterator[str]	Lower all elements of given list.

Attributes¶

`__author__`
`__date__`
`str_or_none`
`var_all_dic`

mlair.helpers.data_sources.join.__author__ = Felix Kleinert, Lukas Leufen¶

mlair.helpers.data_sources.join.__date__ = 2019-10-16¶

mlair.helpers.data_sources.join.str_or_none¶

mlair.helpers.data_sources.join.download_join(station_name: Union[str, List[str]], stat_var: dict, station_type: str = None, sampling: str = 'daily', data_origin: Dict = None) → [pandas.DataFrame, pandas.DataFrame]¶

Read data from JOIN/TOAR.

Parameters

station_name – Station name e.g. DEBY122
stat_var – dictionary with variables as keys (e.g. ‘O3’) and the statistic to extract for each variable as values (e.g. ‘mean’)
station_type – set the station type like “traffic” or “background”, can be none
sampling – sampling rate of the downloaded data, either set to daily or hourly (default daily)
data_origin – additional dictionary mapping each variable (key) to its data origin (value). Valid origins are “REA” for reanalysis data and “” (empty string) for observational data.

Returns

data frame with all variables and statistics and meta data frame with all meta information
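
The following sketch shows how the arguments might fit together: `stat_var` maps each variable to a statistic, and `data_origin` maps the same variables to their origin. The variable name `temp` and the wrapper `fetch_station` are illustrative assumptions and not part of this module. The actual call needs MLAir installed and access to the JOIN service, so it is wrapped in a function and not executed here.

```python
# Variable -> statistic. "o3", "mean" and "dma8eu" appear in this reference;
# "temp" is an assumed variable name used only for illustration.
stat_var = {"o3": "dma8eu", "temp": "mean"}

# Variable -> origin: "" selects observational data, "REA" reanalysis data.
data_origin = {"o3": "", "temp": "REA"}


def fetch_station(station_name="DEBW107"):
    """Hypothetical wrapper; requires MLAir and access to the JOIN service."""
    from mlair.helpers.data_sources.join import download_join

    data, meta = download_join(
        station_name=station_name,
        stat_var=stat_var,
        station_type="background",
        sampling="daily",
        data_origin=data_origin,
    )
    return data, meta
```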

mlair.helpers.data_sources.join._correct_meta(meta)¶

mlair.helpers.data_sources.join.split_network_and_origin(origin_network_dict: dict) → Tuple[Union[None, dict], Union[None, dict]]¶

Split given dict into network and data origin.

Method is required to transform Toar-Data v2 structure (using only origin) into Toar-Data v1 (JOIN) structure (which uses origin and network parameter). Furthermore, EEA network (v2) is renamed to AIRBASE (v1).

mlair.helpers.data_sources.join.filter_network(network: list) → Union[list, None]¶

Filter given list of networks.

Parameters: network – list of various network names (can contain duplicates)
Returns: sorted list with unique entries

mlair.helpers.data_sources.join.correct_data_format(data)¶

Transform to the standard data format.

For some cases (e.g. hourly data), the data is returned as a list instead of a dictionary with keys datetime, values and metadata. This function addresses this issue and transforms the data into the dictionary version.

Parameters: data – data in hourly format
Returns: the same data but formatted to fit with aggregated format
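
A minimal sketch of such a transformation is given below. It assumes the list holds `[datetime, value]` pairs followed by the metadata as the final element; that layout and the name `to_aggregated_format` are assumptions made for illustration, not the module's confirmed behaviour.

```python
def to_aggregated_format(data):
    """Turn [[datetime, value], ..., metadata] into the dictionary layout."""
    *records, metadata = data  # assumption: metadata is the last list element
    formatted = {"datetime": [], "values": [], "metadata": metadata}
    for timestamp, value in records:
        formatted["datetime"].append(timestamp)
        formatted["values"].append(value)
    return formatted


hourly = [
    ["2019-10-16 00:00", 41.0],
    ["2019-10-16 01:00", 39.5],
    {"station": "DEBW107"},
]
result = to_aggregated_format(hourly)
```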

mlair.helpers.data_sources.join.load_series_information(station_name: List[str], station_type: str_or_none, network_name: str_or_none, join_url_base: str, headers: Dict, data_origin: Dict = None, stat_var: Dict = None) → [Dict, Dict]¶

List all series ids that are available for given station id and network name.

Parameters

station_name – Station name e.g. DEBW107
station_type – station type like “traffic” or “background”
network_name – measurement network of the station like “UBA” or “AIRBASE”
join_url_base – base url name to download data from
headers – additional headers information like authorization, can be empty
data_origin – additional information to select a distinct series e.g. from reanalysis (REA) or from observation (“”, empty string). This dictionary should contain a key for each variable and the origin information as value

Returns

all available series for the requested station stored in a dictionary with parameter name (variable) as key and the series id as value.

mlair.helpers.data_sources.join._create_parameter_name_opts(stat_var)¶

mlair.helpers.data_sources.join._create_network_name_opts(network_name)¶

mlair.helpers.data_sources.join._select_distinct_series(vars: List[Dict], data_origin: Dict = None, network_name: Union[str, List[str]] = None) → [Dict, Dict]¶

Select distinct series ids for all variables. Also check if a parameter is from REA or not.

mlair.helpers.data_sources.join._select_distinct_network(vars: dict, network_name: Union[list, dict]) → dict ¶

Select distinct series regarding network name. The order the network names are provided in parameter network_name indicates priority (from high to low). If no network name is provided, first entry is used and a logging info is issued. In case network names are given but no match can be found, this method raises a ValueError.

Parameters

vars – dictionary with all series candidates already grouped by variable name as key. Value should be a list of possible candidates to select from. Each candidate must be a dictionary with at least keys id and network_name.
network_name – list of networks to use in order of decreasing priority (the first element has the highest priority). Can be an empty list, indicating that the first candidate is always used for each variable.

Returns

dictionary with single series reference for each variable
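
The selection rule described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the module's code: the name `select_by_network` is hypothetical, network names are compared by exact string match, and only the list form of `network_name` is handled.

```python
import logging


def select_by_network(vars, network_name):
    """Pick one series per variable, honouring the network priority order."""
    selected = {}
    for var, candidates in vars.items():
        if not network_name:
            # No preference given: fall back to the first candidate.
            logging.info("no network given for %s, using first series", var)
            selected[var] = candidates[0]
            continue
        for network in network_name:  # earlier entries have higher priority
            match = [c for c in candidates if c["network_name"] == network]
            if match:
                selected[var] = match[0]
                break
        else:
            # Networks were requested, but none of them offers this variable.
            raise ValueError(f"no series for {var} in networks {network_name}")
    return selected


candidates = {
    "o3": [
        {"id": 1, "network_name": "AIRBASE"},
        {"id": 2, "network_name": "UBA"},
    ]
}
```

With `["UBA", "AIRBASE"]` the UBA series is chosen even though the AIRBASE series is listed first among the candidates, because the order of `network_name` decides, not the order of the candidates.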

mlair.helpers.data_sources.join._select_distinct_data_origin(vars: List[Dict], data_origin: Dict) → (Dict[str, List], Dict)¶

Select distinct series regarding their data origin. Series are grouped as list according to their variable’s name. As series can be reported with different network attribution, results might contain multiple entries for a variable. This method assumes the default data origin for chemical variables as “” (empty string) and for meteorological variables as REA.

Parameters

vars – list of all entries to check data origin for
data_origin – data origin to match series with, if empty default values are used

Returns

dictionary with unique variable names as keys and list of respective series as values
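
As a rough sketch of grouping by data origin with defaults: the series keys `name` and `origin`, the set `CHEMICAL`, and the function name `select_by_origin` are all hypothetical and chosen for illustration; the real series records and the module's list of chemical variables may differ. Returning the origins that were actually applied as the second element is likewise an assumption.

```python
# Assumption: an illustrative set of chemical variables, not the module's list.
CHEMICAL = {"o3", "no", "no2"}


def select_by_origin(vars, data_origin=None):
    """Group series by variable name, keeping those with the wanted origin."""
    data_origin = dict(data_origin or {})
    grouped = {}
    for series in vars:
        name = series["name"]  # hypothetical key for the variable name
        if name not in data_origin:
            # Defaults: observations for chemical variables, REA otherwise.
            data_origin[name] = "" if name in CHEMICAL else "REA"
        if series["origin"] == data_origin[name]:  # hypothetical key
            grouped.setdefault(name, []).append(series)
    return grouped, data_origin


series = [
    {"id": 1, "name": "o3", "origin": "", "network_name": "UBA"},
    {"id": 2, "name": "o3", "origin": "", "network_name": "AIRBASE"},
    {"id": 3, "name": "o3", "origin": "REA", "network_name": "UBA"},
    {"id": 4, "name": "temp", "origin": "REA", "network_name": "UBA"},
]
grouped, used_origin = select_by_origin(series)
```

Here `o3` keeps two entries because the same observational series is reported under two networks.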

mlair.helpers.data_sources.join._save_to_pandas(df: Union[pandas.DataFrame, None], data: dict, stat: str, var: str) → pandas.DataFrame ¶

Save given data in data frame.

If the given data frame is not empty, the data is appended as a new column.

Parameters

df – data frame to append the new data to, can be none
data – new data to append or format as data frame containing the keys ‘datetime’ and ‘<stat>’
stat – extracted statistic to get values from data (e.g. ‘mean’, ‘dma8eu’)
var – variable the data is from (e.g. ‘o3’)

Returns

newly created or concatenated data frame
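
The create-or-append behaviour can be sketched with pandas as below. The function name `save_to_frame`, the variable `temp`, and the use of `pandas.to_datetime` for parsing the time stamps are assumptions for illustration; the module may parse dates differently.

```python
import pandas as pd


def save_to_frame(df, data, stat, var):
    """Return a frame with `data[stat]` stored in a column named `var`."""
    index = pd.to_datetime(data["datetime"])
    new = pd.DataFrame({var: data[stat]}, index=index)
    if df is None:
        return new  # nothing to append to yet
    return pd.concat([df, new], axis=1)  # add as a new column


dates = ["2019-10-16 00:00", "2019-10-17 00:00"]
o3 = {"datetime": dates, "dma8eu": [41.0, 39.5]}
temp = {"datetime": dates, "mean": [11.2, 9.8]}

frame = save_to_frame(None, o3, "dma8eu", "o3")
frame = save_to_frame(frame, temp, "mean", "temp")
```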

mlair.helpers.data_sources.join._lower_list(args: List[str]) → Iterator[str]¶

Lower all elements of given list.

Parameters: args – list with string entries to lower
Returns: iterator that lowers all list entries
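
A generator is one way to obtain such an iterator. The sketch below uses the hypothetical name `lower_list` and only illustrates the lazy behaviour: nothing is lowered until the iterator is consumed.

```python
from typing import Iterator, List


def lower_list(args: List[str]) -> Iterator[str]:
    """Yield each string of the list in lower case."""
    for string in args:
        yield string.lower()


networks = list(lower_list(["UBA", "AirBase"]))
```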

mlair.helpers.data_sources.join.var_all_dic¶
