Changelog

All notable changes to this project will be documented in this file.

v1.4.0 - 2021-07-27 -

general:

  • many technical adjustments to improve usability and transparency of MLAir

  • new FCN and CNN classes for easy NN model creation

  • new plots

new features:

  • new FCN class that can be customized in many ways (#284)

  • new CNN class (#289)

  • added new bootstrap analysis method: mean bootstrapping (#300)

  • new data handler using FIR filters (#306)

  • performance measures are now stored in local files (#286)

  • histogram plots for inputs and targets (#299)

  • periodogram plots for filtered data (#298)
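For readers unfamiliar with FIR filtering (see the new data handler from #306), the following is a minimal sketch of a windowed-sinc low-pass FIR filter using only NumPy. It is not the code used by MLAir's data handler. The function names fir_lowpass_taps and apply_fir, the Hamming window, and the cutoff convention (fraction of the sampling rate) are assumptions chosen for illustration.

```python
import numpy as np


def fir_lowpass_taps(cutoff: float, numtaps: int) -> np.ndarray:
    """Windowed-sinc low-pass FIR taps (hypothetical helper, not MLAir API).

    cutoff is given as a fraction of the sampling rate (0 < cutoff < 0.5).
    """
    if not 0 < cutoff < 0.5:
        raise ValueError("cutoff must lie between 0 and 0.5 (Nyquist)")
    if numtaps % 2 == 0:
        raise ValueError("use an odd number of taps so the filter has a centre tap")
    n = np.arange(numtaps) - (numtaps - 1) / 2   # symmetric tap positions
    taps = 2 * cutoff * np.sinc(2 * cutoff * n)  # ideal low-pass impulse response
    taps *= np.hamming(numtaps)                  # Hamming window reduces ripple
    return taps / taps.sum()                     # normalise to unit gain at 0 Hz


def apply_fir(x, taps: np.ndarray) -> np.ndarray:
    """Filter x; mode='valid' drops edges that the filter does not fully cover."""
    return np.convolve(np.asarray(x, dtype=float), taps, mode="valid")


taps = fir_lowpass_taps(cutoff=0.1, numtaps=51)
smooth = apply_fir(np.ones(100), taps)  # a constant signal passes unchanged
```

Dropping the edges keeps the sketch simple; a real data handler has to decide how to treat the first and last numtaps // 2 time steps.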

technical:

  • the calling run script can be stored inside the experiment folder if a reference to this script is passed as an argument (#99)

  • new callback to track epoch-runtime (#312)

  • added switch to use multiprocessing (#297)

  • customize maximum number of parallel processes (#308)

  • support non-monotonic window lead times (#313)

  • resolved bug with FileExistsError (#311)

  • resolved bug that occurred if no chemical variable is used at all (#307)

  • min/max scaler now scales between -1 and 1 (#302)

  • added missing offset parameter to some data handlers (#305)

  • improved data store logging (#304)

  • improved logging message on station removal in preprocessing (#294)

  • limited number of retries in JOIN module (#296)

  • adjusted competing skill score plot (#301)

  • transformation parameter check (#295)

  • implemented lazy data preprocessing for selected data handlers (#292)

  • fixed bug in separation of scales data handler (#290)
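A linear min/max scaling to [-1, 1] (#302) can be written as x' = 2 (x - min) / (max - min) - 1. The NumPy sketch below shows this scaling and its inverse; the function names minmax_scale and minmax_inverse are hypothetical, and this is not MLAir's own transformation code.

```python
import numpy as np


def minmax_scale(x, x_min=None, x_max=None):
    """Scale x linearly so that x_min -> -1 and x_max -> 1 (hypothetical helper).

    If x_min/x_max are not given they are taken from x; returning them lets the
    same parameters be reused, e.g. to scale test data like the training data.
    """
    x = np.asarray(x, dtype=float)
    x_min = float(x.min()) if x_min is None else x_min
    x_max = float(x.max()) if x_max is None else x_max
    scaled = 2 * (x - x_min) / (x_max - x_min) - 1
    return scaled, x_min, x_max


def minmax_inverse(scaled, x_min, x_max):
    """Undo minmax_scale and return values in the original units."""
    return (np.asarray(scaled, dtype=float) + 1) / 2 * (x_max - x_min) + x_min


scaled, lo, hi = minmax_scale([0.0, 5.0, 10.0])  # scaled -> [-1, 0, 1]
```

Values outside the range seen when fitting are not clipped, so reused parameters can map new data beyond [-1, 1].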

v1.3.0 - 2021-02-24 - competitors and improved transformation

general:

  • release of official MLAir logo (#274)

  • new transformation schema for better independence of MLAir and data handler (#272)

  • competing models can be included in postprocessing for direct comparison (#198)

new features:

  • new helper functions for geographic issues (#280)

  • default data handler and its subclasses can use min/max and log transformations (#276, #275)

  • include IntelliO3-ts model as reference via automatic download (#131)

technical:

  • experiment name now always includes target sampling type (#263)

  • competitive skill score plot is refactored (#260)

  • bug fix for climatological skill scores (#259)

  • bug fix for custom objects handling (#277)

  • bug fix for monitoring plots when multiple output branches are used (#278)

  • updated requirements and dependencies to newer versions (#262, #273)

  • HPC scripts are updated to work properly with parallel data processing (#281)

v1.2.1 - 2021-02-08 - bug fix for recursive import error

general:

  • applied bug fix

technical:

  • bug fix for recursive import error (#269)

v1.2.0 - 2020-12-18 - parallel preprocessing and improved data handlers

general:

  • new plots

  • parallelism for faster preprocessing

  • improved data handler with mixed sampling types

  • enhanced test coverage

new features:

  • station map plot now highlights subsets on the map and displays the number of stations for each subset (#227, #231)

  • two new data availability plots PlotAvailabilityHistogram (#191, #192, #223)

  • introduced parallel code in preprocessing if the system supports parallelism (#164, #224, #225)

  • data handler DataHandlerMixedSampling (and its subclasses) supports an offset parameter to end inputs at a time other than 00 hours (#220)

  • arguments for data handler DataHandlerMixedSampling (and its subclasses) that differ between input and target can now be passed as a tuple (#229)

technical:

  • added templates for release and bug issues (#189)

  • improved test coverage (#236, #238, #239, #240, #241, #242, #243, #244, #245)

  • station map plot now includes the number of stations for each subset (#231)

  • postprocessing plots are wrapped in try/except statements (#107)

  • updated git settings (#213)

  • bug fix for data handler (#235)

  • reordering and bug fix for preprocessing reporting (#207, #232)

  • bug fix for outdated system path style (#226)

  • new plots are included in default plot list (#211)

  • helpers/join connection to ToarDB (e.g. used by DefaultDataHandler) now reports which variable could not be loaded (#222)

  • plot PlotBootstrapSkillScore can now additionally highlight specific variables; this option is not yet used in postprocessing (#201)

  • data handler DataHandlerMixedSampling now has reduced data loading (#221)

v1.1.0 - 2020-11-18 - hourly resolution support and new data handlers

general:

  • MLAir can be used with 1H resolution data from JOIN

  • new data handlers to use the Kolmogorov-Zurbenko filter and mixed sampling types

new features:

  • new data handler DataHandlerKzFilter to use Kolmogorov-Zurbenko filter (kz filter) on inputs (#195)

  • new data handler DataHandlerMixedSampling that can use mixed sampling types for input and target (#197)

  • new data handler DataHandlerMixedSamplingWithFilter that uses kz filter and mixed sampling (#197)

  • new data handler DataHandlerSeparationOfScales to use filter-dependent time step sizes on filtered inputs with mixed sampling (#196)
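The Kolmogorov-Zurbenko filter KZ(m, k) used by the new data handlers applies a moving average of odd window length m iteratively k times. A minimal NumPy sketch follows. It simply drops the edges that a full window cannot cover, whereas DataHandlerKzFilter may treat edges and missing values differently; the function name kz_filter is hypothetical.

```python
import numpy as np


def kz_filter(x, m: int, k: int) -> np.ndarray:
    """Kolmogorov-Zurbenko filter KZ(m, k): k passes of a moving average of odd width m.

    Each pass uses np.convolve(..., mode="valid"), so the output is shorter than
    the input by k * (m - 1) points (no edge or missing-value handling).
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("window length m must be a positive odd integer")
    if k < 1:
        raise ValueError("number of iterations k must be at least 1")
    window = np.ones(m) / m
    y = np.asarray(x, dtype=float)
    for _ in range(k):
        y = np.convolve(y, window, mode="valid")
    return y


series = np.arange(20.0)
trend = kz_filter(series, m=3, k=2)  # a linear series stays linear: 2.0, 3.0, ..., 17.0
```

Each additional pass damps short-period variations further; an alternating series, for example, shrinks by a factor of 3 per pass when m = 3.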

technical:

  • bug fix for very short time series in TimeSeriesPlot (#215)

  • bug fix for variable dictionary when using hourly resolution (#212)

  • variable naming for data from JOIN interface harmonised (#206)

  • transformation setup is now separated for inputs and targets (#202)

  • bug fix in PlotClimatologicalSkillScore if only single station is used (#193)

  • preprocessed data is now stored inside experiment and not in the data folder

v1.0.0 - 2020-10-08 - official release of new version 1.0.0

general:

  • This is the first official release of MLAir ready for use

  • updated license, installation instructions

technical:

  • restructured order of packages in requirements

v0.12.2 - 2020-10-01 - HDFML support

general:

  • HDFML support

technical:

  • installation script for HDFML adjusted, #183

v0.12.1 - 2020-09-28 - examples in notebook

general:

  • introduced notebook documentation for an easy start, #174

  • updated special installation instructions for the Juelich HPC systems, #172

new features:

  • input and output shape names are consistently renamed to input_shape and output_shape, #175

technical:

  • it is possible to assign a custom name to a run module (e.g. used in logging), #173

v0.12.0 - 2020-09-21 - Documentation and Bugfixes

general:

  • improved documentation, including installation instructions and many examples from the paper, #153

  • bugfixes (see technical)

new features:

  • MyLittleModel is now a pure feed-forward network (before it had a CNN part), #168

technical:

  • new compile options check to ensure its execution, #154

  • bugfix for key errors in time series plot, #169

  • bugfix for unused kwargs in DefaultDataHandler, #170

  • trainable parameter is renamed to train_model to prevent confusion with the tf trainable parameter, #162

  • fixed HPC installation failure, #159

v0.11.0 - 2020-08-24 - Advanced Data Handling for MLAir

general:

  • Introduced advanced data handling with much more flexibility (independent of TOAR DB, custom data handling is pluggable), #144

  • default data handler is still using TOAR DB

new features:

  • default data handler using TOAR DB refactored according to advanced data handling, #140, #141, #152

  • data sets are handled as collections, #142, and are iterable in a standard way (StandardIterator) and optimised for keras (KerasIterator), #143

  • automatically moving station map plot, #136

technical:

  • model modules available from package, #139

  • renaming of parameter time dimension, #151

  • refactoring of README.md, #138

v0.10.0 - 2020-07-15 - MLAir is official name, Workflows, easy Model plug-in

general:

  • Official project name is released: MLAir (Machine Learning on Air data)

  • a model class can now easily be plugged into MLAir, #121

  • introduced new concept of workflows, #134

new features:

  • workflows are used to execute a sequence of run modules, #134

  • default workflows for standard and the Juelich HPC systems are available, custom workflows can be defined, #134

  • seasonal decomposition is available for conditional quantile plot, #112

  • map plot is created with coordinates, #108

  • flatten_tails are now more general and easier to customise, #114

  • model classes have custom compile options (replaces set_loss), #110

  • model can be set in ExperimentSetup from outside, #121

  • default experiment settings can be queried using get_defaults(), #123

  • training and model settings are reported as Markdown and TeX tables, #145

technical:

  • Juelich HPC systems are supported and installation scripts are available, #106

  • data store is tracked, I/O is saved and illustrated in a plot, #116

  • batch size and number of epochs have to be defined in ExperimentSetup, #127, #122

  • automatic documentation with sphinx, #109

  • default experiment settings are updated, #123

  • refactoring of experiment path and its default naming, #124

  • refactoring of some parameter names, #146

  • preparation for package distribution with pip, #119

  • all run scripts are updated to run with workflows, #134

  • the experiment folder is restructured, #130

v0.9.0 - 2020-04-15 - faster bootstraps, extreme value upsampling

general:

  • improved and faster bootstrap workflow

  • new plot PlotAvailability

  • extreme values upsampling

  • improved runtime environment

new features:

  • entire bootstrap workflow has been refactored and is much faster now, can be skipped with evaluate_bootstraps=False, #60

  • upsampling of extreme values, set with parameter extreme_values=[your_values_standardised] (e.g. [1, 2]) and extremes_on_right_tail_only=<True/False> to choose whether only the right tail or both tails of the distribution are affected, #58, #87

  • minimal data length property (in total and for all subsets), #76

  • custom objects in model class to load customised model objects such as a padding class or loss, #72

  • new plot for data availability: PlotAvailability, #103

  • introduced (default) plot_list to specify which plots to draw

  • latex and markdown information on sample sizes for each station, #90
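The extreme value upsampling (#58, #87) can be pictured as duplicating training samples whose standardised target lies beyond each threshold in extreme_values. The sketch below assumes that one copy is appended per exceeded threshold, so with extreme_values=[1, 2] a sample at 2.5 appears three times in total. This counting rule, the function name upsample_extremes and the array layout (samples along the first axis, one target value per sample) are assumptions, not a description of MLAir's internals.

```python
import numpy as np


def upsample_extremes(inputs, targets, extreme_values, extremes_on_right_tail_only=False):
    """Append one extra copy of each sample per exceeded threshold (hypothetical helper).

    inputs:  array with samples along the first axis
    targets: 1-D array of standardised target values, one per sample
    """
    inputs = np.asarray(inputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    extra_inputs, extra_targets = [], []
    for threshold in sorted(extreme_values):
        if extremes_on_right_tail_only:
            mask = targets > threshold          # only large positive values
        else:
            mask = np.abs(targets) > threshold  # both tails of the distribution
        extra_inputs.append(inputs[mask])
        extra_targets.append(targets[mask])
    return (np.concatenate([inputs, *extra_inputs]),
            np.concatenate([targets, *extra_targets]))


x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.5, -2.5, 3.0])
x_up, y_up = upsample_extremes(x, y, extreme_values=[1, 2])
```

Duplicating whole samples keeps inputs and targets aligned; the extended set is typically shuffled before training.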

technical:

  • implemented tests on GPU and from scratch for develop, release and master branches, #95

  • usage of TensorFlow 1.13.1 (GPU / CPU), split into 2 separate requirements files, #81

  • new abstract plot class to have uniform plot class design

  • new time tracking wrapper to use for functions or classes

  • improved logger (info on display, debug into file), #73, #85, #88

  • improved run environment, especially for error handling, #86

  • prefix general in data store scope is now optional and can be skipped. If given scope is not general, it is treated as subscope, #82

  • all 2D Padding classes are now selected by Padding2D(padding_name=<padding_type>) e.g. Padding2D(padding_name="SymPad2D"), #78

  • custom learning rate (or lr_decay) is optional now, #71