:py:mod:`mlair.helpers.data_sources.join`
=========================================

.. py:module:: mlair.helpers.data_sources.join

.. autoapi-nested-parse::

   Functions to access the JOIN database.


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   mlair.helpers.data_sources.join.download_join
   mlair.helpers.data_sources.join._correct_meta
   mlair.helpers.data_sources.join.split_network_and_origin
   mlair.helpers.data_sources.join.filter_network
   mlair.helpers.data_sources.join.correct_data_format
   mlair.helpers.data_sources.join.load_series_information
   mlair.helpers.data_sources.join._create_parameter_name_opts
   mlair.helpers.data_sources.join._create_network_name_opts
   mlair.helpers.data_sources.join._select_distinct_series
   mlair.helpers.data_sources.join._select_distinct_network
   mlair.helpers.data_sources.join._select_distinct_data_origin
   mlair.helpers.data_sources.join._save_to_pandas
   mlair.helpers.data_sources.join._lower_list


Attributes
~~~~~~~~~~

.. autoapisummary::

   mlair.helpers.data_sources.join.__author__
   mlair.helpers.data_sources.join.__date__
   mlair.helpers.data_sources.join.str_or_none
   mlair.helpers.data_sources.join.var_all_dic


.. py:data:: __author__
   :annotation: = Felix Kleinert, Lukas Leufen

   
.. py:data:: __date__
   :annotation: = 2019-10-16

   
.. py:data:: str_or_none
   

.. py:function:: download_join(station_name: Union[str, List[str]], stat_var: dict, station_type: str = None, sampling: str = 'daily', data_origin: Dict = None) -> [pandas.DataFrame, pandas.DataFrame]

   Read data from JOIN/TOAR.

   :param station_name: station name or list of station names, e.g. DEBY122
   :param stat_var: dictionary with variables as keys (e.g. 'O3') and the statistic to extract for each variable as
       values (e.g. 'mean')
   :param station_type: set the station type like "traffic" or "background", can be None
   :param sampling: sampling rate of the downloaded data, either "daily" or "hourly" (default "daily")
   :param data_origin: optional dictionary that maps each variable (key) to its data origin (value). Valid
       origins are "REA" for reanalysis data and "" (empty string) for observational data.

   :returns: a data frame with all variables and statistics, and a meta data frame with all meta information
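
The argument shapes described above can be sketched as follows. This is only an illustration: the call itself requires an installed MLAir and network access to JOIN, so it is wrapped in a helper, ``load_station`` (a hypothetical name), that is defined but not executed here. The station type ``"background"`` and the statistic ``"dma8eu"`` are example values taken from the parameter descriptions on this page.

```python
# Argument shapes for download_join; all values are examples from the parameter descriptions.
stat_var = {"o3": "dma8eu"}   # variable -> statistic to extract
data_origin = {"o3": ""}      # variable -> origin ("" = observation, "REA" = reanalysis)


def load_station(station="DEBY122"):
    """Call download_join; needs an installed MLAir and network access to JOIN."""
    # Imported lazily so that defining this helper works without MLAir installed.
    from mlair.helpers.data_sources.join import download_join

    data, meta = download_join(
        station_name=station,
        stat_var=stat_var,
        station_type="background",
        sampling="daily",
        data_origin=data_origin,
    )
    return data, meta
```

Using the same keys in ``stat_var`` and ``data_origin`` keeps the origin selection unambiguous for every requested variable.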


.. py:function:: _correct_meta(meta)


.. py:function:: split_network_and_origin(origin_network_dict: dict) -> Tuple[Union[None, dict], Union[None, dict]]

   Split given dict into network and data origin.

   This function is required to transform the Toar-Data v2 structure (which uses only origin) into the Toar-Data v1
   (JOIN) structure (which uses both an origin and a network parameter). Furthermore, the EEA network (v2) is renamed
   to AIRBASE (v1).
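
A minimal sketch of such a split is shown below. It is not the MLAir implementation and rests on several assumptions: the input maps each variable to one entry or a list of entries, an entry equal to ``"REA"`` denotes the data origin while every other non-empty entry is a network name, and the result is returned in the order (data origin, network). The function name ``split_sketch`` and the variable ``temp`` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union


def split_sketch(
    origin_network: Optional[Dict[str, Union[str, List[str]]]],
) -> Tuple[Optional[dict], Optional[dict]]:
    """Illustrative only: separate "REA" (origin) from network names per variable."""
    if origin_network is None:
        return None, None
    data_origin, network = {}, {}
    for var, entries in origin_network.items():
        entries = [entries] if isinstance(entries, str) else list(entries)
        data_origin[var] = ""  # assumption: observational data unless "REA" is listed
        network[var] = []
        for entry in entries:
            if entry.upper() == "REA":
                data_origin[var] = "REA"
            elif entry != "":
                # v2 calls the network EEA, v1 (JOIN) calls it AIRBASE
                network[var].append("AIRBASE" if entry.upper() == "EEA" else entry)
    return data_origin, network


origin, networks = split_sketch({"o3": ["UBA", "EEA"], "temp": "REA"})
```

In this sketch, ``origin`` holds one origin string per variable and ``networks`` holds the remaining network names with ``EEA`` already renamed.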


.. py:function:: filter_network(network: list) -> Union[list, None]

   Filter given list of networks.

   :param network: list of various network names (can contain duplicates)
   :return: sorted list with unique entries
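
One way to remove duplicates is sketched below. The sketch assumes that the original order of first occurrence is kept (because the order of networks expresses priority elsewhere in this module), that ``None`` entries are dropped, and that ``None`` is returned when nothing remains, which would match the ``Union[list, None]`` return type. These are assumptions, and ``filter_network_sketch`` is a hypothetical name rather than the MLAir function.

```python
from typing import List, Optional


def filter_network_sketch(network: List[Optional[str]]) -> Optional[List[str]]:
    """Illustrative only: drop duplicates and None while keeping the first-seen order."""
    unique = []
    for name in network:
        if name is not None and name not in unique:
            unique.append(name)
    # assumption: an empty result is reported as None
    return unique or None


filtered = filter_network_sketch(["UBA", "AIRBASE", "UBA", None])
```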


.. py:function:: correct_data_format(data)

   Transform to the standard data format.

   For some cases (e.g. hourly data), the data is returned as list instead of a dictionary with keys datetime, values
   and metadata. This function addresses this issue and transforms the data into the dictionary version.

   :param data: data in hourly format

   :return: the same data but formatted to fit with aggregated format
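
The transformation could look like the sketch below. The layout of the incoming list is not specified on this page; the sketch assumes a sequence of ``[datetime, value]`` pairs followed by one final metadata element. The function name ``to_dict_format`` and the sample values are hypothetical.

```python
def to_dict_format(data: list) -> dict:
    """Illustrative only: turn [[datetime, value], ..., metadata] into a dictionary."""
    *rows, metadata = data  # assumption: the last element carries the metadata
    formatted = {"datetime": [], "values": [], "metadata": metadata}
    for timestamp, value in rows:
        formatted["datetime"].append(timestamp)
        formatted["values"].append(value)
    return formatted


hourly = [
    ["2020-01-01 00:00", 31.2],
    ["2020-01-01 01:00", 29.8],
    {"station": "DEBW107"},
]
as_dict = to_dict_format(hourly)
```

The resulting dictionary has the keys ``datetime``, ``values`` and ``metadata`` named in the description above.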


.. py:function:: load_series_information(station_name: List[str], station_type: str_or_none, network_name: str_or_none, join_url_base: str, headers: Dict, data_origin: Dict = None, stat_var: Dict = None) -> [Dict, Dict]

   List all series ids that are available for given station id and network name.

   :param station_name: Station name e.g. DEBW107
   :param station_type: station type like "traffic" or "background"
   :param network_name: measurement network of the station like "UBA" or "AIRBASE"
   :param join_url_base: base url name to download data from
   :param headers: additional headers information like authorization, can be empty
   :param data_origin: additional information to select a distinct series e.g. from reanalysis (REA) or from observation
       ("", empty string). This dictionary should contain a key for each variable and the data origin as value
   :return: all available series for the requested station stored in a dictionary with parameter name (variable) as
       key and the series id as value.


.. py:function:: _create_parameter_name_opts(stat_var)


.. py:function:: _create_network_name_opts(network_name)


.. py:function:: _select_distinct_series(vars: List[Dict], data_origin: Dict = None, network_name: Union[str, List[str]] = None) -> [Dict, Dict]

   Select distinct series ids for all variables. Also check if a parameter is from REA or not.


.. py:function:: _select_distinct_network(vars: dict, network_name: Union[list, dict]) -> dict

   Select distinct series regarding network name. The order in which the network names are provided in parameter
   `network_name` indicates priority (from high to low). If no network name is provided, the first entry is used and
   an info message is logged. If network names are given but no match can be found, this method raises a ValueError.

   :param vars: dictionary with all series candidates already grouped by variable name as key. Value should be a list
       of possible candidates to select from. Each candidate must be a dictionary with at least keys `id` and
       `network_name`.
   :param network_name: list of networks to use with decreasing priority (the first element has the highest priority).
       Can be an empty list, indicating that the first candidate is always used for each variable.
   :return: dictionary with single series reference for each variable
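
The priority rule described above can be sketched as follows. This is a simplified illustration and not the MLAir code: it compares network names exactly and returns the series ``id`` as the reference for each variable, both of which are assumptions. ``pick_by_network`` is a hypothetical name.

```python
import logging
from typing import Dict, List


def pick_by_network(candidates: Dict[str, List[dict]], network_name: List[str]) -> Dict[str, int]:
    """Illustrative only: pick one series id per variable by network priority."""
    selected = {}
    for var, series in candidates.items():
        if not network_name:
            # no preference given: fall back to the first candidate and report it
            selected[var] = series[0]["id"]
            logging.info("no network given for %s, using series %s", var, series[0]["id"])
            continue
        for network in network_name:  # highest priority first
            match = next((s for s in series if s["network_name"] == network), None)
            if match is not None:
                selected[var] = match["id"]
                break
        else:
            # networks were requested, but none of them is available for this variable
            raise ValueError(f"no series of {var} matches any of {network_name}")
    return selected


candidates = {"o3": [{"id": 1, "network_name": "AIRBASE"}, {"id": 2, "network_name": "UBA"}]}
chosen = pick_by_network(candidates, ["UBA", "AIRBASE"])
```

Here ``UBA`` wins because it is listed first, even though the ``AIRBASE`` series appears earlier among the candidates.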


.. py:function:: _select_distinct_data_origin(vars: List[Dict], data_origin: Dict) -> (Dict[str, List], Dict)

   Select distinct series regarding their data origin. Series are grouped as a list according to their variable's name.
   As series can be reported with different network attribution, results might contain multiple entries for a variable.
   This method assumes the default data origin for chemical variables as ``""`` (empty string) and for meteorological
   variables as `REA`.

   :param vars: list of all entries to check data origin for
   :param data_origin: data origin to match series with, if empty default values are used
   :return: dictionary with unique variable names as keys and list of respective series as values


.. py:function:: _save_to_pandas(df: Union[pandas.DataFrame, None], data: dict, stat: str, var: str) -> pandas.DataFrame

   Save given data in data frame.

   If the given data frame is not empty, the data is appended as a new column.

   :param df: data frame to append the new data to, can be None
   :param data: new data to append or format as data frame containing the keys 'datetime' and '<stat>'
   :param stat: extracted statistic to get values from data (e.g. 'mean', 'dma8eu')
   :param var: variable the data is from (e.g. 'o3')

   :return: newly created or concatenated data frame


.. py:function:: _lower_list(args: List[str]) -> Iterator[str]

   Lower all elements of given list.

   :param args: list with string entries to lower

   :return: iterator that lowers all list entries
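
A lazy lowering of this kind can be written as a generator, as in the following sketch. It follows the signature shown above but is not copied from MLAir; ``lower_list_sketch`` is a hypothetical name.

```python
from typing import Iterator, List


def lower_list_sketch(args: List[str]) -> Iterator[str]:
    """Illustrative only: yield each entry in lower case, one at a time."""
    for string in args:
        yield string.lower()


lowered = list(lower_list_sketch(["UBA", "AirBase"]))  # ['uba', 'airbase']
```

Because the result is an iterator, it must be wrapped in ``list`` (or consumed in a loop) before the lowered entries can be indexed or reused.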


.. py:data:: var_all_dic