old database

TABLE parameters

parameter_name: name of the parameter [always lowercase] (primary key)

parameter_long_name: long name of the parameter
parameter_display_name: display name of the parameter
parameter_cf_standard_name: CF standard name of parameter (if defined)
parameter_units: physical unit in which the parameter is stored in the database
(all chemical gas-phase compounds are stored in nmol mol-1, for example;
note that the unit in which data were originally reported may differ)
parameter_formula: chemical formula of the parameter

TABLE networks

network_name: name of the observing network as published (e.g. GAW, EMEP, AirBase, CAPMoN, …) (primary key)

datacenter_name: name of the datacenter that hosts the data from this network
(e.g. wdcgg, ebas, …)
datacenter_fullname: unabbreviated name of the datacenter
datacenter_url: the base url (template?) for accessing data from the network

TABLE stations

numid: internal serial number of the station (primary key)

network_name: name of the network (see networks table)
station_id: station code within the network
station_local_id: alternate station code
station_type: characterisation of site (e.g. “background”, “industrial”, “traffic”)
station_type_of_area: characterisation of station environment (e.g. “urban”, “suburban”, “rural”, “remote”)
station_category: other classification of stations (e.g. GAW category (global, regional, contributing))
station_name: full name of the station
station_country: country which operates the station
station_state: province/state/territory to which station belongs (may be blank)
station_lon: longitude coordinate of station (decimal degrees_east). This is our best estimate of the station location which is not always identical to the official station coordinates.
station_lat: latitude coordinate of station (decimal degrees_north). This is our best estimate of the station location which is not always identical to the official station coordinates.
station_alt: altitude of station (in m above sea level). This is our best estimate of the station altitude. It is not always identical to the reported station altitude and frequently uses the elevation from google earth instead (see station_alt_flag).
station_alt_flag: Flag value to document where station_alt was taken from:
0 : Reported station altitude
1 : Google maps elevation
2 : ETOPO1 elevation
3 : Station report or similar
4 : Personal communication
5 : Other source
station_coordinate_status: an integer flag indicating our knowledge about the real station location. Note that this flag has been introduced rather late and may not always reflect the actual status of verification yet. Flag values are:
-1 : not checked (default value)
0 : verified by google earth or other means. This means that a building or container which looks like a measurement site could be visibly identified or that a google earth feature is consistent with a detailed station description and is found at the location that is given in the station description. While in most cases the coordinates associated with a flag value of 0 will be exact within 10 metres or so, there are some stations where the accuracy is lower, for example if the air quality monitoring site is part of a larger campus and we could not exactly identify the building or container site of the air quality measurements.
1 : verification not possible, but no reason to doubt that the measurement location should be accurate to within 100 metres or so. This means that no obvious station feature could be seen on google earth, but the area corresponds to the station description and could be a place where measurements are made.
2 : unspecified potential issue with the station coordinates. This means that after checking the station location on google earth, comparing the reported station altitude to the google elevation, and looking at the station_type, station_type_of_area, and station_category information, something appears wrong, but for lack of better knowledge we retain station coordinates as given. This flag value is used particularly in cases when the coordinates of the same station are reported differently in various archives and if we could not locate the exact station location on google earth.
3 : obvious error in station coordinate information. For example, a continental site is located in an ocean or lake, the measurement site is in the middle of a dense forest, etc. The station coordinates could not be corrected for lack of better information.
4 : severe mismatch between reported station altitude and google elevation at station location (> 100 m) indicating wrong station coordinates. This flag value is only set after a potential correction of the station_alt value (see station_alt_flag), i.e. if we could not resolve a gross altitude difference. Note that for measurement sites on tall towers or in mountainous terrain, altitude differences > 100 m may be correct and the coordinate status will not be flagged as 4 then.
5 : no coordinates available – given coordinates are completely invented!
6 : no station metadata available – given metadata is completely invented!
station_reported_alt: This is the station altitude as reported by the data provider. Note: due to edits of obvious station coordinate errors before introducing the coordinate flagging scheme, there may be cases where the reported altitude in our database differs from the reported altitude in the original data sets.
station_google_alt: Terrain elevation derived from the google maps API (see https://maps.googleapis.com/maps/api/elevation/json?locations=47.05444,12.958342; example coordinates of Sonnblick, Austria).
google_resolution: the horizontal resolution of google maps at the station location. This provides some indication of the accuracy of the station_google_alt information.
station_etopo_alt: Terrain elevation at the station location from the ~1 km resolution ETOPO1 dataset.
station_etopo_min_alt_5km: Minimum elevation from the ETOPO1 dataset in an area of 5 km radius around the station location. This can be used to find out if a high altitude station is located in mountainous terrain or on a plateau (see station_etopo_relative_alt).
station_etopo_relative_alt: Station elevation above the surrounding area. Derived by subtracting the minimum altitude within a 5 km radius around the station location from the actual station altitude. The area altitude is obtained from the _etopo1_ map.
station_timezone: time zone of station - note that all data will be stored as UTC, but the timezone information is needed to convert data back to local time for display.
station_population_density: Y2010 human population per square km (original horizontal resolution of ~?? km)
station_max_population_density_5km: maximum population density in a radius of 5 km around the station location.
station_max_population_density_25km: maximum population density in a radius of 25 km around the station location.
station_nightlight_1km: Y2013 nighttime lights brightness values (original 1 km horizontal resolution)
station_nightlight_5km: Y2013 nighttime lights brightness values (5 km horizontal resolution)
station_max_nightlight_25km: Maximum nighttime light intensity in a radius of 25 km around the station location.
station_nox_emissions: Y2010 NOx emissions from EDGAR HTAP inventory V2 (gridded data in units of g m-2 yr-1)
station_omi_no2_column: Average Y2011-Y2015 tropospheric NO2 columns from OMI at 0.1 degree resolution in units of 10^15 molecules cm-2.
station_rice_production: Y2000 rice production amount at station location (units: thousand tons)
station_wheat_production: Y2000 wheat production amount at station location (units: thousand tons)
station_climatic_zone: Climatic zone
The climate zones are grouped according to IPCC, 2006 as follows:
0 : unclassified
1 : Warm Temperate Moist
2 : Warm Temperate Dry
3 : Cool Temperate Moist
4 : Cool Temperate Dry
5 : Polar Moist
6 : Polar Dry
7 : Boreal Moist
8 : Boreal Dry
9 : Tropical Montane
10 : Tropical Wet
11 : Tropical Moist
12 : Tropical Dry
station_htap_region: An integer denoting the “tier1” region defined in the task force on hemispheric transport of air pollution (TFHTAP) coordinated model studies (see http://www.htap.org). Region codes are:
02 OCN Non-arctic/Antarctic Ocean
03 NAM US+Canada (up to 66 N; polar circle)
04 EUR Western + Eastern EU+Turkey (up to 66 N; polar circle)
05 SAS South Asia: India, Nepal, Pakistan, Afghanistan, Bangladesh, Sri Lanka
06 EAS East Asia: China, Korea, Japan
07 SEA South East Asia
08 PAN Pacific, Australia+ New Zealand
09 NAF Northern Africa+Sahara+Sahel
10 SAF Sub Saharan/sub Sahel Africa
11 MDE Middle East: S. Arabia, Oman, etc, Iran, Iraq
12 MCA Mexico, Central America, Caribbean, Guyanas, Venezuela, Colombia
13 SAM S. America
14 RBU Russia, Belarus, Ukraine
15 CAS Central Asia
16 NPO Arctic Circle (North of 66 N)+Greenland
17 SPO Antarctic
station_dominant_landcover: The dominant IGBP landcover classification at the station location extracted from the MODIS MCD12C1 dataset. Landcover type values are:
0 Water
1 Evergreen Needleleaf forest
2 Evergreen Broadleaf forest
3 Deciduous Needleleaf forest
4 Deciduous Broadleaf forest
5 Mixed forest
6 Closed shrublands
7 Open shrublands
8 Woody savannas
9 Savannas
10 Grasslands
11 Permanent wetlands
12 Croplands
13 Urban and built-up
14 Cropland/Natural vegetation mosaic
15 Snow and ice
16 Barren or sparsely vegetated
255 Fill Value/Unclassified
station_landcover_description: Text information about the landcover types and their area fractions in a radius of 25 km around the station location.
station_toar_category: A station classification for the Tropospheric Ozone Assessment Report based on the station proxy data that are stored in the database. These categories are:
0 unclassified
1 rural, low elevation: derived as (station_omi_no2_column <= 8 and station_nightlight_1km <= 25 and station_population_density <= 2500 and station_alt <= 1500 and station_etopo_relative_alt < 500). Note that this scheme may not catch all sites that are designated as rural. It will, however, provide a selection with reasonable certainty that no urban sites are included.
2 rural, high elevation: (station_omi_no2_column <= 8 and station_nightlight_1km <= 25 and station_population_density <= 2500 and station_alt > 1500).
3 urban; classified (station_population_density >= 15000 and station_nightlight_1km >= 60 and station_max_nightlight_25km == 63). Again, the intention here is to make reasonably sure that a site classified as urban really carries an urban signature.
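
The category rules above can be sketched as a small Python function. This is only an illustration of the thresholds listed in the text, not the code used to populate the database. The function name and argument names are hypothetical; the altitude inputs are assumed to be the station_alt and station_etopo_relative_alt columns, and missing proxy values are not handled.

```python
def toar_category(omi_no2_column, nightlight_1km, population_density,
                  max_nightlight_25km, alt, relative_alt):
    """Return a station_toar_category code (0-3) from the station proxy data.

    Hypothetical helper: thresholds are copied from the category list,
    inputs are assumed to be plain numbers (no missing values).
    """
    # Shared "rural" signature of categories 1 and 2
    rural_signature = (omi_no2_column <= 8
                       and nightlight_1km <= 25
                       and population_density <= 2500)

    if rural_signature and alt <= 1500 and relative_alt < 500:
        return 1  # rural, low elevation
    if rural_signature and alt > 1500:
        return 2  # rural, high elevation
    if (population_density >= 15000
            and nightlight_1km >= 60
            and max_nightlight_25km == 63):
        return 3  # urban
    return 0      # unclassified


# Example values are invented for illustration only
low_rural = toar_category(2.0, 5, 100, 20, alt=300, relative_alt=50)
high_rural = toar_category(2.0, 5, 100, 20, alt=2000, relative_alt=800)
urban = toar_category(12.0, 62, 20000, 63, alt=50, relative_alt=10)
```

Note that a low-altitude site with a rural signature but a relative altitude of 500 m or more matches none of the rules and remains unclassified (0).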

Data sources:

Unique constraint for (network_name, station_id).

TABLE parameter_series

id: an internal serial number (primary key)

station_numid: foreign reference to numeric station identifier
parameter_name: name of the species or variable; foreign key to the parameters table
parameter_label: automatically generated label of a parameter_series (see below)
parameter_attribute: a user-defined attribute that may distinguish two series of the
same variable measured at the same station (e.g. cpt/all and cpt/filtered)
parameter_sampling_type: examples: “continuous”, “flask”, “filter”
parameter_measurement_method: instrument principle of measurement
parameter_original_units: physical units in which variable values were expressed in
the original data files
parameter_calibration: information on the calibration of the variable, such as
calibration procedure and/or calibration scale
parameter_contributor_shortname: abbreviated string of parameter_contributor
parameter_contributor: institute or name who provided data to network datacenter
If more than one contributor exists, the names will be separated by ;
parameter_contributor_country: country of contributor
parameter_dataset_type: the type of the high-frequency data (“hourly”, “event”, etc.);
this determines the data table name (e.g. “o3_hourly”, “co_event”).
parameter_status: an internal status flag which may be used to suppress display or analysis of
an individual timeseries. Flag values are:
0 : everything OK - use this dataset in any analyses
1 : data was embargoed by originator; do not display publicly
2 : NRT data ingestion; no metadata available, metadata was invented
creation_date: date when this entry was added to the parameter_series table
modification_date: date when this entry was last modified
comments: any comments on this variable data for this station
data_start_date: start date of the variable data available for this station
data_end_date: end date of the variable data for this station (note: the data start and end dates do not
account for gaps (missing data) within the available data)
parameter_pi: principal investigator of timeseries
parameter_pi_email: email of parameter_pi

Unique constraint = (station_numid, parameter_label)

Explanation of parameter_label:

Rule: With N = parameter_name, A = parameter_attribute, C = parameter_contributor_shortname, and T = parameter_dataset_type, the parameter_label is defined as N[-A[-C]][:T]. A is only added to the label if N (or N:T) is not unique, C is only added if N-A (or N-A:T) is not unique. Note that parameter_labels for a given station and parameter_name are automatically recomputed if another dataset of this variable at this station is added. Thus, parameter_labels are not static. Their primary use is the display of unique links to individual parameter_series in the map markers of the JOIN interface.

Examples:

1. CPT134S00 has two different hourly ozone series:
[station_numid, ‘o3’, ‘filtered’, ‘SAWS’, ‘hourly’]
[station_numid, ‘o3’, ‘all’, ‘SAWS’, ‘hourly’]
Since everything to the right of the parameter_attribute is the same, the two labels will become O3-filtered, O3-all.

2. Let’s assume we have two CO series from one site, one as hourly data and one from flask samples:
[station_numid, ‘co’, ‘’, ‘Juelich’, ‘hourly’]
[station_numid, ‘co’, ‘’, ‘Juelich’, ‘flask’]
Here the labels will be CO:hourly and CO:flask.

3. If the variable is contributed by two different providers, the contributor_shortname will be added:
[station_numid, ‘c2h6’, ‘’, ‘DWD’, ‘hourly’]
[station_numid, ‘c2h6’, ‘’, ‘NOAA’, ‘event’]
Here the labels will be C2H6–DWD:hourly and C2H6–NOAA:event.

data tables and unit conversions

data tables

:red:`a. TABLE <parameter>_hourly`

id: the parameter_series id
datetime: the date and time value of the observation (UTC time, beginning of (1 hour) averaging period)
value: the measurement value
flag: a data quality flag (WMO code table 033 020)
preliminary: a boolean value indicating if these are final, validated or preliminary (near realtime) data

primary key: (id, datetime)

:red:`b. TABLE <parameter>_monthly`

id: the parameter_series id
datetime: the date and time value of the observation (UTC time, beginning of (1 month) averaging period)
value: the measurement value
flag: a data quality flag (WMO code table 033 020)
preliminary: a boolean value indicating if these are final, validated or preliminary (near realtime) data

primary key: (id, datetime)

:red:`c. TABLE <parameter>_event`

id: the parameter_series id
datetime: the date and time value of the observation (UTC time, beginning of averaging period)
value: the measurement value
flag: a data quality flag (WMO code table 033 020)
preliminary: a boolean value indicating if these are final, validated or preliminary (near realtime) data

primary key: (id, datetime)
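
To illustrate the data table layout described above, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. It is not the actual database definition: the column types, the helper names data_table_name and create_data_table, and the sample row are assumptions made for this example. Only the column names, the composite primary key (id, datetime), and the naming pattern <parameter>_<dataset type> come from the text.

```python
import sqlite3


def data_table_name(parameter_name, dataset_type):
    """Build a data table name such as 'o3_hourly' or 'co_event'."""
    return f"{parameter_name}_{dataset_type}"


def create_data_table(conn, table):
    """Create one data table; column types are assumed, not documented."""
    # Table names cannot be bound as SQL parameters, so validate them first
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    conn.execute(
        f"""CREATE TABLE {table} (
                id          INTEGER NOT NULL,  -- parameter_series id
                datetime    TEXT    NOT NULL,  -- UTC, start of averaging period
                value       REAL,              -- measurement value
                flag        INTEGER,           -- data quality flag
                preliminary INTEGER,           -- boolean stored as 0/1
                PRIMARY KEY (id, datetime)
            )"""
    )


conn = sqlite3.connect(":memory:")
table = data_table_name("o3", "hourly")
create_data_table(conn, table)

# Invented sample row: series 1, one hourly value
row = (1, "2015-07-01 12:00:00", 41.5, 0, 0)
conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", row)

# A second value for the same (id, datetime) violates the primary key
try:
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The composite primary key means that each parameter series can hold at most one value per time stamp.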

unit conversions

In the database, all parameter series are stored in common default units. These are listed in the table below. In some cases, unit conversion may be necessary when adding data from the original format into the JOIN database. These conversions are also described below.

========  ==============  =======  ============  ==========  ================================
name      display name    formula  default unit  moleweight  unit conversion
========  ==============  =======  ============  ==========  ================================
benzene   Benzene         C6H6     nmol mol-1    78.1104     µg m-3->default: *0.30802
ch4       CH4             CH4      nmol mol-1    16.0425     µg m-3->default: *1.49973
co        CO              CO       nmol mol-1    28.0104     µg m-3->default: *0.85895
ethane    Ethane          C2H6     nmol mol-1    30.0669     µg m-3->default: *0.77698
mpxylene  m,p-Xylene      C8H10    nmol mol-1    106.165     µg m-3->default: *0.22662
no        NO              NO       nmol mol-1    30.0061     µg m-3->default: *0.80182
no2       NO2             NO2      nmol mol-1    46.0055     µg m-3->default: *0.52297
nox       NOx                      nmol mol-1                µg m-3->default: not possible!
                                                             µg(NO2) m-3->default: *0.52297
o3        Ozone           O3       nmol mol-1    47.9982     µg m-3->default: *0.50124
oxylene   o-Xylene        C8H10    nmol mol-1    106.165     µg m-3->default: *0.22662
propane   Propane         C3H8     nmol mol-1    44.0922     µg m-3->default: *0.52982
so2       SO2             SO2      nmol mol-1    64.0648     µg m-3->default: *0.37555
toluene   Toluene         C7H8     nmol mol-1    92.1362     µg m-3->default: *0.26113
pm1       PM 1                     µg m-3                    none
pm10      PM 10                    µg m-3                    none
pm2p5     PM 2.5                   µg m-3                    none
humidity  Humidity                 g kg-1                    kg kg-1->g kg-1: *1000
                                                             g m-3->g kg-1: *1.2041
                                                             *293K/(actual temp[K])
                                                             *(actual press[hPa])
                                                             /1013.25hPa
                                                             (for sea-level stations the
                                                             last factor can be omitted)
press     Pressure                 hPa                       Pa->hPa: *0.01
                                                             mbar->hPa: *1.
temp      Temperature              degC                      K->degC: -273.15
wdir      Wind direction           degrees                   none
wspeed    Wind speed               m s-1                     knots->m/s: *0.51444
========  ==============  =======  ============  ==========  ================================
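
The simple meteorological conversions listed above (pressure, temperature, wind speed, and the kg kg-1 humidity case) could be applied with a small lookup, as in the following sketch. The dictionary layout and the names CONVERSIONS and convert are hypothetical; only the numerical factors are taken from the table. The temperature- and pressure-dependent humidity conversion from g m-3 is deliberately left out.

```python
# (from_unit, to_unit) -> conversion function; factors copied from the table
CONVERSIONS = {
    ("Pa", "hPa"): lambda x: x * 0.01,
    ("mbar", "hPa"): lambda x: x * 1.0,
    ("K", "degC"): lambda x: x - 273.15,
    ("knots", "m s-1"): lambda x: x * 0.51444,
    ("kg kg-1", "g kg-1"): lambda x: x * 1000.0,
}


def convert(value, from_unit, to_unit):
    """Convert a value to the database default unit (hypothetical helper)."""
    if from_unit == to_unit:
        return value  # already in the default unit
    try:
        func = CONVERSIONS[(from_unit, to_unit)]
    except KeyError:
        raise ValueError(f"no conversion from {from_unit} to {to_unit}") from None
    return func(value)


pressure_hpa = convert(101325.0, "Pa", "hPa")
temperature_degc = convert(293.15, "K", "degC")
wind_ms = convert(10.0, "knots", "m s-1")
```

Raising an error for unknown unit pairs, rather than passing values through unchanged, helps avoid silently storing data in the wrong unit.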

Explanation of conversion from concentration to mixing ratio:

We assume that data are reported in concentration “at standard conditions”, and we take “standard conditions” to mean 20 degC and 1013.25 hPa. The density of air under these conditions is 1.2041 kg m-3. The mole fraction of a gas X is given by mu = c(X)/dens * mw(air)/mw(X), hence for ozone, for example, you will get mu = c(X)/1.2041 * 28.97 / 48.0 = c(X) * 0.50124. Note that 1 kg m-3 equals 10^9 µg m-3 and 1 mol mol-1 equals 10^9 nmol mol-1, so the two factors cancel and this conversion factor directly converts from µg m-3 to nmol mol-1.

Useful tools:

- Air density calculator (http://www.gribble.org/cycling/air_density.html)
- Molecular Mass Calculator (http://www.bmrb.wisc.edu/metabolomics/mol_mass.php)

Note that for NOx no such conversion can be performed, because NOx is a mixture of two components with different molecular masses. Hence, by default the concept of NOx only makes sense for mole fractions (or mixing ratios). Sometimes NOx is given as “NOx expressed as NO2” or “NOx expressed as N”. Then you can of course use the conversion formula above using either 46 or 14 as molecular mass.
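
As a worked illustration of the formula mu = c(X)/dens * mw(air)/mw(X), the following Python sketch computes the conversion factor from µg m-3 to nmol mol-1 for a given molecular weight. The function and constant names are hypothetical; the air density (1.2041 kg m-3) and the molecular weight of air (28.97) are the values stated in the explanation above, and the sketch assumes the same standard conditions of 20 degC and 1013.25 hPa.

```python
AIR_DENSITY = 1.2041  # kg m-3 at 20 degC and 1013.25 hPa (from the text)
MW_AIR = 28.97        # g mol-1, molecular weight of air (from the text)


def ugm3_to_nmolmol_factor(mol_weight):
    """Factor that converts µg m-3 to nmol mol-1 for a gas of given mol. weight."""
    return MW_AIR / (AIR_DENSITY * mol_weight)


def ugm3_to_nmolmol(concentration, mol_weight):
    """Convert a concentration in µg m-3 to a mole fraction in nmol mol-1."""
    return concentration * ugm3_to_nmolmol_factor(mol_weight)


# Ozone with the rounded molecular weight 48.0 used in the text
o3_factor = ugm3_to_nmolmol_factor(48.0)        # approximately 0.50124

# "NOx expressed as NO2": use the NO2 molecular weight from the table
nox_as_no2_factor = ugm3_to_nmolmol_factor(46.0055)  # approximately 0.52297

# 100 µg m-3 of ozone expressed as a mole fraction
o3_mixing_ratio = ugm3_to_nmolmol(100.0, 48.0)
```

For NOx reported without a reference species, no molecular weight can be passed to this function, which mirrors the statement that the conversion is not possible in that case.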