mlair.data_handler.iterator

Module Contents

Classes

StandardIterator

DataCollection

KerasIterator

Base object for fitting to a sequence of data, such as a dataset.

Functions

_save_to_pickle(path, X: List[numpy.ndarray], Y: List[numpy.ndarray], index: int) → None

Save data as a pickle file with variables X and Y, using the given index for the file name <index>.pickle.

_get_batch(data_list: List[numpy.ndarray], b: int, batch_size: int) → List[numpy.ndarray]

Get batch according to batch size from data list.

_permute_data(X, Y)

_get_number_of_mini_batches(number_of_samples: int, batch_size: int) → int

Return the number of mini batches as the floored ratio of the number of samples to the batch size.

f_proc(data, upsampling, mod_rank, batch_size, _path, index)

Attributes

__author__

__date__

mlair.data_handler.iterator.__author__ = Lukas Leufen
mlair.data_handler.iterator.__date__ = 2020-07-07
class mlair.data_handler.iterator.StandardIterator(collection: list)

Bases: collections.Iterator

_position: int
__next__(self)

Return next element or stop iteration.
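
A minimal usage sketch, assuming StandardIterator simply walks the given list until it is exhausted (the sample collection below is hypothetical):

```python
from mlair.data_handler.iterator import StandardIterator

collection = ["station_A", "station_B", "station_C"]  # hypothetical elements
iterator = StandardIterator(collection)

for element in iterator:  # __next__ yields each element, then raises StopIteration
    print(element)
```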

class mlair.data_handler.iterator.DataCollection(collection: list = None, name: str = None)

Bases: collections.Iterable

property name(self)
__len__(self)
__iter__(self) → collections.Iterator
__getitem__(self, index)
add(self, element)
_set_mapping(self)
keys(self)
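
A hedged sketch of typical DataCollection usage; the string elements are placeholders (in MLAir the elements are usually data handler objects), and integer indexing in __getitem__ is an assumption:

```python
from mlair.data_handler.iterator import DataCollection

collection = DataCollection(name="demo")
collection.add("element_0")  # placeholder elements
collection.add("element_1")

print(collection.name)       # "demo"
print(len(collection))       # 2
print(collection[0])         # access by position (assumed integer indexing)
for element in collection:   # __iter__ returns a fresh iterator over the elements
    print(element)
```
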
class mlair.data_handler.iterator.KerasIterator(collection: DataCollection, batch_size: int, batch_path: str, shuffle_batches: bool = False, model=None, upsampling=False, name=None, use_multiprocessing=False, max_number_multiprocessing=1)

Bases: tensorflow.keras.utils.Sequence

Base object for fitting to a sequence of data, such as a dataset.

Every Sequence must implement the __getitem__ and the __len__ methods. If you want to modify your dataset between epochs you may implement on_epoch_end. The method __getitem__ should return a complete batch.

Notes:

Sequence is a safer way to do multiprocessing. This structure guarantees that the network will only train once on each sample per epoch, which is not the case with generators.

Examples:

```python
from skimage.io import imread
from skimage.transform import resize
import numpy as np
import math
from tensorflow.keras.utils import Sequence  # base class used by the example

# Here, `x_set` is a list of paths to the images
# and `y_set` are the associated classes.

class CIFAR10Sequence(Sequence):

    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]

        return np.array([
            resize(imread(file_name), (200, 200))
            for file_name in batch_x]), np.array(batch_y)
```
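
The code above is the generic Sequence example from the Keras documentation. A hedged sketch of wiring the MLAir KerasIterator itself into training might look as follows; my_data_handlers and my_model are hypothetical placeholders, not part of this module:

```python
from mlair.data_handler.iterator import DataCollection, KerasIterator

collection = DataCollection(my_data_handlers, name="train")    # my_data_handlers: hypothetical list of data handlers
batches = KerasIterator(collection, batch_size=512, batch_path="/tmp/batches",
                        shuffle_batches=True, model=my_model)  # my_model: hypothetical compiled Keras model

my_model.fit(batches, epochs=10)  # a keras Sequence can be passed directly to fit
```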

__len__(self) → int

Number of batches in the Sequence.

Returns

The number of batches in the Sequence.

__getitem__(self, index: int) → Tuple[numpy.ndarray, numpy.ndarray]

Get batch for given index.

_get_model_rank(self)
__data_generation(self, index: int) → Tuple[numpy.ndarray, numpy.ndarray]

Load pickle data from disk.

static _concatenate(new: List[numpy.ndarray], old: List[numpy.ndarray]) → List[numpy.ndarray]

Concatenate two lists of data along axis=0.
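
A hedged sketch of the pairwise concatenation this helper presumably performs; placing the old data before the new data in the result is an assumption:

```python
import numpy as np

def concatenate_sketch(new, old):
    """Sketch: element-wise concatenation of two lists of arrays along axis 0."""
    return [np.concatenate([o, n], axis=0) for n, o in zip(new, old)]
```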

static _concatenate_multi(*args: List[numpy.ndarray]) → List[numpy.ndarray]

Concatenate multiple lists of data along axis=0.

_prepare_batches(self, use_multiprocessing=False, max_process=1) → None

Prepare all batches as locally stored files.

Walk through all elements of the collection and split (or merge) the data according to the batch size. Data sets that are too long are divided into multiple batches. Batches that are not completely filled are retained together with the remainder of the next collection elements; these retained data are concatenated and again split into batches. If data still remain afterwards, they are saved as a final, smaller batch. All batches are enumerated by a running index starting at 0. A list with all batch numbers is stored in the class parameter indexes. This method can either run serially or use multiprocessing to reduce computation time.
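
A serial, simplified sketch of the batching logic described above. It reuses the module-level helpers documented below, but the element accessors get_X()/get_Y() and the exact remainder handling are assumptions rather than the actual implementation:

```python
from mlair.data_handler.iterator import (KerasIterator, _get_batch,
                                          _get_number_of_mini_batches, _save_to_pickle)

def prepare_batches_sketch(collection, batch_size, path):
    """Serial sketch: split/merge collection elements into fixed-size batches on disk."""
    index, remaining = -1, None
    for data in collection:
        X, Y = data.get_X(), data.get_Y()      # assumption: elements expose get_X()/get_Y()
        if remaining is not None:              # merge leftovers retained from the previous element
            X = KerasIterator._concatenate(X, remaining[0])
            Y = KerasIterator._concatenate(Y, remaining[1])
        length = X[0].shape[0]
        mini_batches = _get_number_of_mini_batches(length, batch_size)
        for b in range(mini_batches):          # write every completely filled batch to disk
            index += 1
            _save_to_pickle(path, _get_batch(X, b, batch_size), _get_batch(Y, b, batch_size), index)
        if length % batch_size != 0:           # retain the unfilled remainder for the next element
            remaining = (_get_batch(X, mini_batches, batch_size),
                         _get_batch(Y, mini_batches, batch_size))
        else:
            remaining = None
    if remaining is not None:                  # leftover data become the final, smaller batch
        index += 1
        _save_to_pickle(path, remaining[0], remaining[1], index)
    return list(range(index + 1))              # running batch numbers, stored as indexes in the real class
```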

static _cleanup_path(path: str, create_new: bool = True) → None

First remove the existing path, then create an empty path if enabled.

on_epoch_end(self) → None

Randomly shuffle indexes if enabled.

mlair.data_handler.iterator._save_to_pickle(path, X: List[numpy.ndarray], Y: List[numpy.ndarray], index: int) → None

Save data as a pickle file with variables X and Y, using the given index for the file name <index>.pickle.
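
A hedged sketch of what this helper presumably does; the file naming follows the <index>.pickle pattern from the docstring, while the layout inside the file is an assumption:

```python
import os
import pickle

def save_to_pickle_sketch(path, X, Y, index):
    """Sketch: dump X and Y into <path>/<index>.pickle."""
    file = os.path.join(path, f"{index}.pickle")
    with open(file, "wb") as f:
        pickle.dump({"X": X, "Y": Y}, f)  # assumption: both variables stored together in one file
```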

mlair.data_handler.iterator._get_batch(data_list: List[numpy.ndarray], b: int, batch_size: int) → List[numpy.ndarray]

Get batch according to batch size from data list.
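
Based on the docstring, the b-th batch is presumably a slice of length batch_size taken from every array in the list; a minimal sketch:

```python
def get_batch_sketch(data_list, b, batch_size):
    """Sketch: return the b-th slice of size batch_size from each array in data_list."""
    return [data[b * batch_size:(b + 1) * batch_size] for data in data_list]
```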

mlair.data_handler.iterator._permute_data(X, Y)
mlair.data_handler.iterator._get_number_of_mini_batches(number_of_samples: int, batch_size: int) → int

Return the number of mini batches as the floored ratio of the number of samples to the batch size.
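
For positive integers the floored ratio is plain integer division, for example:

```python
number_of_samples, batch_size = 1050, 100
print(number_of_samples // batch_size)  # 10 full mini batches; the remaining 50 samples do not count here
```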

mlair.data_handler.iterator.f_proc(data, upsampling, mod_rank, batch_size, _path, index)