Changelog¶

All notable changes to this project will be documented in this file.

v2.4.0 - 2023-06-30 - IFS data and bias-corrected evaluation¶

general:¶

support IFS data (local) and ERA5 data (from toar db)
bias free evaluation

new features:¶

can load local IFS forecast data as input (#450)
can also load ERA5 data from toar db as alternative to locally stored data (#449)
new plot to show monthly data distributions in subsets (#445)
can load a DL model from external path (#448)
introduced option to bias-correct model’s and competitors’ forecasts (#442)
can use different interpolation methods when having CAMS as competitor (#444)

technical:¶

change toar statistics from api v1 to api v2 (#454)
now able to set configuration paths for local era5 and ifs data as experiment parameter (#457)
improved retry strategy when downloading data from toar db (#453)
updated packages (#452)
calculation of filter apriori is more robust now, properties are stored inside experiment folder (#447, #451)

v2.3.0 - 2022-11-25 - new models and plots¶

general:¶

new model classes for ResNet and U-Net
new plots and variations of existing plots

new features:¶

new model classes: ResNet (#419), U-Net (#423)
seasonal mse stack plot (#422)
new aggregated and line versions of Time Evolution Plot (#424, #427)
box-and-whisker plots are created for all error metrics (#431)
new split and frequency distribution versions of box-and-whisker plots for error metrics (#425, #434)
new evaluation metric: mean error / bias (#430)
conditional quantiles are now available for all competitors too (#435)
new map plot showing mse at locations (#432)

technical:¶

speed up in model setup (#421)
bugfix for boundary trim in FIR filter (#418)
persistence is now calculated only on demand (#426)
block mse are stored locally in a file (#428)
fix issue with boolean variables not recognized by argparse (#417)
renaming of ahead labels (#436)
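
For context on the argparse fix (#417): with `type=bool`, argparse calls `bool()` on the raw string, so any non-empty value, including `"False"`, evaluates to `True`. The following is a minimal sketch of one common workaround. The `str2bool` helper and both flag names are hypothetical and only illustrate the problem; MLAir may have resolved the issue differently.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert a command line string into a real boolean (hypothetical helper)."""
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


parser = argparse.ArgumentParser()
# naive variant: bool("False") is True because the string is non-empty
parser.add_argument("--naive_flag", type=bool, default=False)
# converted variant: the string content is interpreted explicitly
parser.add_argument("--safe_flag", type=str2bool, default=False)

args = parser.parse_args(["--naive_flag", "False", "--safe_flag", "False"])
print(args.naive_flag, args.safe_flag)  # prints: True False
```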

v2.2.0 - 2022-08-16 - new data sources and python3.9¶

general:¶

new data sources: era5 data and ToarDB V2
CAMS competitor available
improved execution speed
MLAir is now updated to python3.9

new features:¶

new data loading method to load era5 data on Jülich systems (#393)
new data loading method to load data from ToarDB V2 (#396)
implemented competitor model using CAMS ensemble forecasts (#394)
OLS competitor is only calculated if provided in competitor list (#404)
experimental: snapshot creation to skip preprocessing stage (#346, #405, #406)
new workflow HyperSearchWorkflow stopping after training stage (#408)

technical:¶

fixed minor issues and improved execution speed in postprocessing (#401, #413)
improved speed in keras iterator creation (#409)
solved bug for very long competitor time series (#395)
updated python, HPC and CI environment (#402, #403, #407, #410)
fix for climateFIR data handler (#399)
fix for report model error (#416)

v2.1.0 - 2022-06-07 - new evaluation metrics and improved training¶

general:¶

new evaluation metrics, IOA and MNMB
advanced train options for early stopping
reduced execution time by refactoring

new features:¶

uncertainty estimation of MSE is now applied for each season separately (#374)
added different configurations of early stopping to use either last trained or best epoch (#378)
train monitoring plots now add a star for best epoch when using early stopping (#367)
new evaluation metric index of agreement, IOA (#376)
new evaluation metric modified normalised mean bias, MNMB (#380)
new plot available that shows temporal evolution of MSE for each station (#381)
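
As a rough illustration of the two new metrics, the sketch below assumes the commonly used definitions: Willmott's index of agreement, and the modified normalised mean bias as `2/N * sum((p - o) / (p + o))`. The function names are hypothetical and the code is not taken from MLAir, whose implementation may differ in detail (for example in how it handles dimensions or missing values).

```python
from typing import Sequence


def index_of_agreement(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Willmott's index of agreement; 1 indicates perfect agreement.

    Undefined (division by zero) if all observations and predictions
    equal the observation mean.
    """
    obs_mean = sum(obs) / len(obs)
    squared_error = sum((p - o) ** 2 for o, p in zip(obs, pred))
    potential_error = sum(
        (abs(p - obs_mean) + abs(o - obs_mean)) ** 2 for o, p in zip(obs, pred)
    )
    return 1.0 - squared_error / potential_error


def modified_normalised_mean_bias(obs: Sequence[float], pred: Sequence[float]) -> float:
    """MNMB; bounded by -2 and 2 for positive data, 0 indicates no bias."""
    return 2.0 / len(obs) * sum((p - o) / (p + o) for o, p in zip(obs, pred))


# made-up example values, for illustration only
observations = [10.0, 20.0, 30.0]
forecasts = [12.0, 18.0, 33.0]
print(index_of_agreement(observations, forecasts))
print(modified_normalised_mean_bias(observations, forecasts))
```

Because MNMB divides by the sum of forecast and observation, this form is only meaningful for strictly positive quantities such as concentrations.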

technical:¶

reduced loading of forecast path from data store (#328)
bug fix for uncaught error during transformation (#385)
bug fix for data handler with climate and fir filter that caused the transformation to always be calculated with the fir filter (#387)
improved duration for latex report creation at end of preprocessing (#388)
enhanced speed for make prediction in postprocessing (#389)
fix to always create version badge from version and not from tag name (#382)

v2.0.0 - 2022-04-08 - tf2 usage, new model classes, and improved uncertainty estimate¶

general:¶

MLAir now uses tensorflow v2
new customisable model classes for CNN and RNN
improved uncertainty estimate

new features:¶

MLAir depends now on tensorflow v2 (#331)
new CNN class that can be configured layer-wise (#368)
new RNN class that can be configured in more detail (#361)
new branched-input CNN class (#368)
new branched-input RNN class (#362)
set custom model display name that is used in plots (#341)
specify names of input branches to use in feature importance plots (#356)
uncertainty estimate of model error is now calculated for each forecast step additionally (#359)
data transformation properties are stored locally and can be loaded into an experiment run (#345)
uncertainty estimate includes now a Mann-Whitney U rank test (#355)
data handlers can now have access to “future” data specified by new parameter extend_length_opts (#339)

technical:¶

MLAir now uses python3.8 on Jülich HPC systems (#375)
no support of MLAir for tensorflow v1.X, replaced by tf v2.X (#331)
all data handlers with filters can return data as branches (#370)
bug fix to force model name and competitor names to be unique (#366, #369)
fix to use only a single forecast step (#315)
CI pipeline adjustments (#340, #365)
new option to set the level of the print logging (#364)
advanced logging for batch data creation and in postprocessing (#350, #360)
batch data creation is skipped on disabled training (#341)
multiprocessing pools are now closed properly (#342)
bug fix if no competitor data is available (#343)
bug fix for model loading (#343)
models plotted by PlotSampleUncertaintyFromBootstrap are now ordered by mean error (#344)
fix for usage of lazy data that caused unintended reloading of data (#347)
fix for latex reports not showing all stations and competitors (#349)
refactoring of hard coded dimension names in skill scores calculation (#357)
bug fix for order of bootstrap methods in feature importance calculation that caused errors (#358)
distinguish now between window_history_offset (pos of last time step), window_history_size (total length of input sample), and extend_length_opts (“future” data that is available at given time) (#353)

v1.5.0 - 2021-11-11 - new uncertainty estimation¶

general:¶

introduces method to estimate sample uncertainty
improved multiprocessing
last release with tensorflow v1 support

new features:¶

test set sample uncertainty estimation during postprocessing (#333)
support of Kolmogorov Zurbenko filter for data handlers with filters (#334)

technical:¶

new communication scheme for multiprocessing (#321, #322)
improved error reporting (#323)
feature importance returns now unaggregated results (#335)
error metrics are reported for all competitors (#332)
minor bugfixes and refacs (#330, #326, #329, #325, #324, #320, #337)

v1.4.0 - 2021-07-27 - new model classes and data handlers, improved usability and transparency¶

general:¶

many technical adjustments to improve usability and transparency of MLAir
new FCN and CNN classes for easy NN model creation
new plots

new features:¶

new FCN class that can be customized in many ways (#284)
also new CNN class (#289)
added new bootstrap analysis method: mean bootstrapping (#300)
new data handler using FIR filters (#306)
performance measures are now stored in local files (#286)
histogram plots for inputs and targets (#299)
periodogram plots for filtered data (#298)

technical:¶

a calling run script can be stored inside the experiment folder if a reference to this script is passed as argument (#99)
new callback to track epoch-runtime (#312)
added switch to use multiprocessing (#297)
customize maximum number of parallel processes (#308)
support non-monotonic window lead times (#313)
resolved bug with FileExistsError (#311)
resolved bug if no chemical is used at all (#307)
min/max scaler now scales between -1 and 1 (#302)
added missing offset parameter to some data handlers (#305)
improved data store logging (#304)
improved logging message on station removal in preprocessing (#294)
limited number of retries in JOIN module (#296)
adjusted competing skill score plot (#301)
transformation parameter check (#295)
implemented lazy data preprocessing for selected data handlers (#292)
fix bug in separation of scales data handler (#290)
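
To illustrate the changed min/max scaling range (#302), here is a minimal sketch of scaling to the interval [-1, 1] and back. The function names are hypothetical and the code is not MLAir's transformation code, which works on its own data structures.

```python
def min_max_scale(values, lower=-1.0, upper=1.0):
    """Linearly map values so that min -> lower and max -> upper."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        raise ValueError("cannot scale constant data")
    return [lower + (upper - lower) * (v - v_min) / span for v in values]


def min_max_inverse(scaled, v_min, v_max, lower=-1.0, upper=1.0):
    """Undo min_max_scale given the original minimum and maximum."""
    return [v_min + (s - lower) * (v_max - v_min) / (upper - lower) for s in scaled]


raw = [0.0, 5.0, 10.0]
scaled = min_max_scale(raw)
print(scaled)  # prints: [-1.0, 0.0, 1.0]
print(min_max_inverse(scaled, min(raw), max(raw)))  # prints: [0.0, 5.0, 10.0]
```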

v1.3.0 - 2021-02-24 - competitors and improved transformation¶

general:¶

release of official MLAir logo (#274)
new transformation schema for better independence of MLAir and data handler (#272)
competing models can be included in postprocessing for direct comparison (#198)

new features:¶

new helper functions for geographic issues (#280)
default data handler and inheritances can use min/max and log transformation (#276, #275)
include IntelliO3-ts model as reference via automatic download (#131)

technical:¶

experiment name now always includes target sampling type (#263)
competitive skill score plot is refactored (#260)
bug fix for climatological skill scores (#259)
bug fix for custom objects handling (#277)
bug fix for monitoring plots when multiple output branches are used (#278)
update requirements to newer version and dependencies (#262, #273)
HPC scripts are updated to work properly with parallel data processing (#281)

v1.2.1 - 2021-02-08 - bug fix for recursive import error¶

general:¶

applied bug fix

technical:¶

bug fix for recursive import error (#269)

v1.2.0 - 2020-12-18 - parallel preprocessing and improved data handlers¶

general:¶

new plots
parallelism for faster preprocessing
improved data handler with mixed sampling types
enhanced test coverage

new features:¶

station map plot highlights now subsets on the map and displays number of stations for each subset (#227, #231)
two new data availability plots PlotAvailabilityHistogram (#191, #192, #223)
introduced parallel code in preprocessing if system supports parallelism (#164, #224, #225)
data handler DataHandlerMixedSampling (and inheritances) supports an offset parameter to end inputs at a different time than 00 hours (#220)
args for data handler DataHandlerMixedSampling (and inheritances) that differ for input and target can now be passed as tuple (#229)

technical:¶

added templates for release and bug issues (#189)
improved test coverage (#236, #238, #239, #240, #241, #242, #243, #244, #245)
station map plot includes now number of stations for each subset (#231)
postprocessing plots are encapsulated in try except statements (#107)
updated git settings (#213)
bug fix for data handler (#235)
reordering and bug fix for preprocessing reporting (#207, #232)
bug fix for outdated system path style (#226)
new plots are included in default plot list (#211)
helpers/join connection to ToarDB (e.g. used by DefaultDataHandler) reports now which variable could not be loaded (#222)
plot PlotBootstrapSkillScore can now additionally highlight specific variables, but not included in postprocessing up to now (#201)
data handler DataHandlerMixedSampling has now a reduced data loading (#221)

v1.1.0 - 2020-11-18 - hourly resolution support and new data handlers¶

general:¶

MLAir can be used with 1H resolution data from JOIN
new data handlers to use the Kolmogorov-Zurbenko filter and mixed sampling types

new features:¶

new data handler DataHandlerKzFilter to use Kolmogorov-Zurbenko filter (kz filter) on inputs (#195)
new data handler DataHandlerMixedSampling that can use mixed sampling types for input and target (#197)
new data handler DataHandlerMixedSamplingWithFilter that uses kz filter and mixed sampling (#197)
new data handler DataHandlerSeparationOfScales to use filter-dependent time step sizes on filtered inputs using mixed sampling (#196)
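
For readers unfamiliar with the Kolmogorov-Zurbenko filter mentioned above: it is an iterated centred moving average with window length `m` applied `k` times. The sketch below shows only this basic idea; the function names are hypothetical, edges are handled with truncated windows, and missing values are not treated. It is not the implementation used by DataHandlerKzFilter.

```python
def moving_average(values, window):
    """Centred moving average; windows are truncated at the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def kz_filter(values, window, iterations):
    """Apply the moving average `iterations` times (KZ filter with m=window, k=iterations)."""
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    result = list(values)
    for _ in range(iterations):
        result = moving_average(result, window)
    return result


# a single spike is spread out and damped with every iteration
print(kz_filter([0.0, 0.0, 3.0, 0.0, 0.0], window=3, iterations=1))
# prints: [0.0, 1.0, 1.0, 1.0, 0.0]
```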

technical:¶

bug fix for very short time series in TimeSeriesPlot (#215)
bug fix for variable dictionary when using hourly resolution (#212)
variable naming for data from JOIN interface harmonised (#206)
transformation setup is now separated for inputs and targets (#202)
bug fix in PlotClimatologicalSkillScore if only single station is used (#193)
preprocessed data is now stored inside experiment and not in the data folder

v1.0.0 - 2020-10-08 - official release of new version 1.0.0¶

general:¶

This is the first official release of MLAir ready for use
updated license, installation instruction

technical:¶

restructured order of packages in requirements

v0.12.2 - 2020-10-01 - HDFML support¶

general:¶

HDFML support

technical:¶

installation script for HDFML adjusted, #183

v0.12.1 - 2020-09-28 - examples in notebook¶

general:¶

introduced a notebook documentation for easy starting, #174
updated special installation instructions for the Juelich HPC systems, #172

new features:¶

names of input and output shape are renamed consistently to: input_shape, and output_shape, #175

technical:¶

it is possible to assign a custom name to a run module (e.g. used in logging), #173

v0.12.0 - 2020-09-21 - Documentation and Bugfixes¶

general:¶

improved documentation including installation instructions and many examples from the paper, #153
bugfixes (see technical)

new features:¶

MyLittleModel is now a pure feed-forward network (before it had a CNN part), #168

technical:¶

new compile options check to ensure its execution, #154
bugfix for key errors in time series plot, #169
bugfix for not used kwargs in DefaultDataHandler, #170
trainable parameter is renamed to train_model to prevent confusion with the tf trainable parameter, #162
fixed HPC installation failure, #159

v0.11.0 - 2020-08-24 - Advanced Data Handling for MLAir¶

general¶

Introduce advanced data handling with much more flexibility (independent of TOAR DB, custom data handling is pluggable), #144
default data handler is still using TOAR DB

new features¶

default data handler using TOAR DB refactored according to advanced data handling, #140, #141, #152
data sets are handled as collections, #142, and are iterable in a standard way (StandardIterator) and optimised for keras (KerasIterator), #143
automatically moving station map plot, #136

technical¶

model modules available from package, #139
renaming of parameter time dimension, #151
refactoring of README.md, #138

v0.10.0 - 2020-07-15 - MLAir is official name, Workflows, easy Model plug-in¶

general¶

Official project name is released: MLAir (Machine Learning on Air data)
a model class can now easily be plugged into MLAir, #121
introduced new concept of workflows, #134

new features¶

workflows are used to execute a sequence of run modules, #134
default workflows for standard and the Juelich HPC systems are available, custom workflows can be defined, #134
seasonal decomposition is available for conditional quantile plot, #112
map plot is created with coordinates, #108
flatten_tails are now more general and easier to customise, #114
model classes have custom compile options (replaces set_loss), #110
model can be set in ExperimentSetup from outside, #121
default experiment settings can be queried using get_defaults(), #123
training and model settings are reported as MarkDown and Tex tables, #145

technical¶

Juelich HPC systems are supported and installation scripts are available, #106
data store is tracked, I/O is saved and illustrated in a plot, #116
batch size, epoch parameter have to be defined in ExperimentSetup, #127, #122
automatic documentation with sphinx, #109
default experiment settings are updated, #123
refactoring of experiment path and its default naming, #124
refactoring of some parameter names, #146
preparation for package distribution with pip, #119
all run scripts are updated to run with workflows, #134
the experiment folder is restructured, #130

v0.9.0 - 2020-04-15 - faster bootstraps, extreme value upsampling¶

general¶

improved and faster bootstrap workflow
new plot PlotAvailability
extreme values upsampling
improved runtime environment

new features¶

entire bootstrap workflow has been refactored and is much faster now, can be skipped with evaluate_bootstraps=False, #60
upsampling of extreme values, set with parameter extreme_values=[your_values_standardised] (e.g. [1, 2]) and extremes_on_right_tail_only=<True/False> if only right tail of distribution is affected or both, #58, #87
minimal data length property (in total and for all subsets), #76
custom objects in model class to load customised model objects like padding class, loss, #72
new plot for data availability: PlotAvailability, #103
introduced (default) plot_list to specify which plots to draw
latex and markdown information on sample sizes for each station, #90

technical¶

implemented tests on gpu and from scratch for develop, release and master branches, #95
usage of tensorflow 1.13.1 (gpu / cpu), separated in 2 different requirements, #81
new abstract plot class to have uniform plot class design
New time tracking wrapper to use for functions or classes
improved logger (info on display, debug into file), #73, #85, #88
improved run environment, especially for error handling, #86
prefix general in data store scope is now optional and can be skipped. If given scope is not general, it is treated as subscope, #82
all 2D Padding classes are now selected by Padding2D(padding_name=<padding_type>) e.g. Padding2D(padding_name="SymPad2D"), #78
custom learning rate (or lr_decay) is optional now, #71