:py:mod:`mlair.data_handler.iterator`
=====================================

.. py:module:: mlair.data_handler.iterator


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   mlair.data_handler.iterator.StandardIterator
   mlair.data_handler.iterator.DataCollection
   mlair.data_handler.iterator.KerasIterator



Functions
~~~~~~~~~

.. autoapisummary::

   mlair.data_handler.iterator._save_to_pickle
   mlair.data_handler.iterator._get_batch
   mlair.data_handler.iterator._permute_data
   mlair.data_handler.iterator._get_number_of_mini_batches
   mlair.data_handler.iterator.f_proc



Attributes
~~~~~~~~~~

.. autoapisummary::

   mlair.data_handler.iterator.__author__
   mlair.data_handler.iterator.__date__


.. py:data:: __author__
   :annotation: = Lukas Leufen


.. py:data:: __date__
   :annotation: = 2020-07-07


.. py:class:: StandardIterator(collection: list)

   Bases: :py:obj:`collections.Iterator`

   .. py:attribute:: _position
      :annotation: :int


   .. py:method:: __next__(self)

      Return next element or stop iteration.



.. py:class:: DataCollection(collection: list = None, name: str = None)

   Bases: :py:obj:`collections.Iterable`

   .. py:method:: name(self)
      :property:


   .. py:method:: __len__(self)


   .. py:method:: __iter__(self) -> collections.Iterator


   .. py:method:: __getitem__(self, index)


   .. py:method:: add(self, element)


   .. py:method:: _set_mapping(self)


   .. py:method:: keys(self)



.. py:class:: KerasIterator(collection: DataCollection, batch_size: int, batch_path: str, shuffle_batches: bool = False, model=None, upsampling=False, name=None, use_multiprocessing=False, max_number_multiprocessing=1)

   Bases: :py:obj:`tensorflow.keras.utils.Sequence`

   Base object for fitting to a sequence of data, such as a dataset.

   Every `Sequence` must implement the `__getitem__` and the `__len__` methods.
   If you want to modify your dataset between epochs you may implement `on_epoch_end`.
   The method `__getitem__` should return a complete batch.

   Notes:

   `Sequence` is a safer way to do multiprocessing. This structure guarantees that the
   network will only train once on each sample per epoch, which is not the case with
   generators.

   Examples:

   ```python
   from skimage.io import imread
   from skimage.transform import resize
   import numpy as np
   import math

   # Here, `x_set` is a list of paths to the images
   # and `y_set` are the associated classes.

   class CIFAR10Sequence(Sequence):

       def __init__(self, x_set, y_set, batch_size):
           self.x, self.y = x_set, y_set
           self.batch_size = batch_size

       def __len__(self):
           return math.ceil(len(self.x) / self.batch_size)

       def __getitem__(self, idx):
           batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
           batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]

           return np.array([
               resize(imread(file_name), (200, 200))
               for file_name in batch_x]), np.array(batch_y)
   ```

   .. py:method:: __len__(self) -> int

      Number of batches in the Sequence.

      :returns: The number of batches in the Sequence.


   .. py:method:: __getitem__(self, index: int) -> Tuple[numpy.ndarray, numpy.ndarray]

      Get batch for given index.


   .. py:method:: _get_model_rank(self)


   .. py:method:: __data_generation(self, index: int) -> Tuple[numpy.ndarray, numpy.ndarray]

      Load pickle data from disk.


   .. py:method:: _concatenate(new: List[numpy.ndarray], old: List[numpy.ndarray]) -> List[numpy.ndarray]
      :staticmethod:

      Concatenate two lists of data along axis=0.


   .. py:method:: _concatenate_multi(*args: List[numpy.ndarray]) -> List[numpy.ndarray]
      :staticmethod:

      Concatenate multiple lists of data along axis=0.

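   The two static helpers above concatenate lists of numpy arrays element-wise along
   axis=0. Below is a minimal, self-contained sketch of that documented behaviour; the
   function names are illustrative stand-ins and do not reproduce the module's actual
   implementation.

   .. code-block:: python

      import numpy as np
      from typing import List

      def concat_pair(new: List[np.ndarray], old: List[np.ndarray]) -> List[np.ndarray]:
          # Concatenate corresponding arrays of both lists along the sample axis (axis=0).
          return [np.concatenate([o, n], axis=0) for o, n in zip(old, new)]

      def concat_multi(*args: List[np.ndarray]) -> List[np.ndarray]:
          # Same idea for an arbitrary number of lists: zip them branch-wise, then stack.
          return [np.concatenate(arrays, axis=0) for arrays in zip(*args)]

      # Example: two collection elements with 4 and 6 samples and two input branches each.
      a = [np.zeros((4, 3)), np.zeros((4, 5))]
      b = [np.ones((6, 3)), np.ones((6, 5))]
      print([x.shape for x in concat_pair(b, a)])   # [(10, 3), (10, 5)]
      print([x.shape for x in concat_multi(a, b)])  # [(10, 3), (10, 5)]
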
   .. py:method:: _prepare_batches(self, use_multiprocessing=False, max_process=1) -> None

      Prepare all batches as locally stored files.

      Walk through all elements of the collection and split (or merge) the data according
      to the batch size. Data sets that are too long are divided into multiple batches.
      Batches that are not fully filled are retained together with the remains of the next
      collection elements. These retained data are concatenated and also split into
      batches. If data still remain afterwards, they are saved as a final, smaller batch.
      All batches are enumerated by a running index starting at 0. A list with all batch
      numbers is stored in the class's parameter indexes.

      This method can either use a serial approach or use multiprocessing to decrease
      computational time.


   .. py:method:: _cleanup_path(path: str, create_new: bool = True) -> None
      :staticmethod:

      First remove existing path, second create empty path if enabled.


   .. py:method:: on_epoch_end(self) -> None

      Randomly shuffle indexes if enabled.



.. py:function:: _save_to_pickle(path, X: List[numpy.ndarray], Y: List[numpy.ndarray], index: int) -> None

   Save data as pickle file with variables X and Y and given index as .pickle.


.. py:function:: _get_batch(data_list: List[numpy.ndarray], b: int, batch_size: int) -> List[numpy.ndarray]

   Get batch according to batch size from data list.


.. py:function:: _permute_data(X, Y)


.. py:function:: _get_number_of_mini_batches(number_of_samples: int, batch_size: int) -> int

   Return number of mini batches as the floored ratio of number of samples to batch size.


.. py:function:: f_proc(data, upsampling, mod_rank, batch_size, _path, index)
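
The batch preparation described above relies on the two helpers `_get_number_of_mini_batches`
and `_get_batch`. The following self-contained sketch illustrates their documented behaviour
(floored ratio of samples to batch size, and slicing every branch of a data list for batch
number `b`); the bodies are illustrative assumptions and not the module's actual code.

.. code-block:: python

   import math
   from typing import List

   import numpy as np

   def number_of_mini_batches(number_of_samples: int, batch_size: int) -> int:
       # Floored ratio: 25 samples with batch size 10 yield 2 full batches; the
       # remaining 5 samples are retained and merged with the next collection element.
       return math.floor(number_of_samples / batch_size)

   def get_batch(data_list: List[np.ndarray], b: int, batch_size: int) -> List[np.ndarray]:
       # Slice every array in the list along axis 0 to extract batch number b.
       return [data[b * batch_size:(b + 1) * batch_size, ...] for data in data_list]

   # Example: two input branches with 25 samples each and a batch size of 10.
   X = [np.random.rand(25, 7, 1, 5), np.random.rand(25, 5, 1, 2)]
   print(number_of_mini_batches(25, 10))          # 2
   print([x.shape for x in get_batch(X, 0, 10)])  # [(10, 7, 1, 5), (10, 5, 1, 2)]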