:py:mod:`mlair.data_handler.iterator`
=====================================

.. py:module:: mlair.data_handler.iterator


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   mlair.data_handler.iterator.StandardIterator
   mlair.data_handler.iterator.DataCollection
   mlair.data_handler.iterator.KerasIterator



Functions
~~~~~~~~~

.. autoapisummary::

   mlair.data_handler.iterator._save_to_pickle
   mlair.data_handler.iterator._get_batch
   mlair.data_handler.iterator._permute_data
   mlair.data_handler.iterator._get_number_of_mini_batches
   mlair.data_handler.iterator.f_proc



Attributes
~~~~~~~~~~

.. autoapisummary::

   mlair.data_handler.iterator.__author__
   mlair.data_handler.iterator.__date__


.. py:data:: __author__
   :annotation: = Lukas Leufen


.. py:data:: __date__
   :annotation: = 2020-07-07


.. py:class:: StandardIterator(collection: list)

   Bases: :py:obj:`collections.Iterator`

   .. py:attribute:: _position
      :annotation: :int


   .. py:method:: __next__(self)

      Return next element or stop iteration.



.. py:class:: DataCollection(collection: list = None, name: str = None)

   Bases: :py:obj:`collections.Iterable`

   .. py:method:: name(self)
      :property:


   .. py:method:: __len__(self)


   .. py:method:: __iter__(self) -> collections.Iterator


   .. py:method:: __getitem__(self, index)


   .. py:method:: add(self, element)


   .. py:method:: _set_mapping(self)


   .. py:method:: keys(self)



.. py:class:: KerasIterator(collection: DataCollection, batch_size: int, batch_path: str, shuffle_batches: bool = False, model=None, upsampling=False, name=None, use_multiprocessing=False, max_number_multiprocessing=1)

   Bases: :py:obj:`tensorflow.keras.utils.Sequence`

   Base object for fitting to a sequence of data, such as a dataset.

   Every `Sequence` must implement the `__getitem__` and the `__len__` methods.
   If you want to modify your dataset between epochs you may implement `on_epoch_end`.
   The method `__getitem__` should return a complete batch.

   Notes:

   `Sequence` is a safer way to do multiprocessing. This structure guarantees that the
   network will only train once on each sample per epoch, which is not the case with
   generators.

   Examples:

   ```python
   from skimage.io import imread
   from skimage.transform import resize
   import numpy as np
   import math

   # Here, `x_set` is a list of paths to the images
   # and `y_set` are the associated classes.

   class CIFAR10Sequence(Sequence):

       def __init__(self, x_set, y_set, batch_size):
           self.x, self.y = x_set, y_set
           self.batch_size = batch_size

       def __len__(self):
           return math.ceil(len(self.x) / self.batch_size)

       def __getitem__(self, idx):
           batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
           batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]

           return np.array([
               resize(imread(file_name), (200, 200))
               for file_name in batch_x]), np.array(batch_y)
   ```

   .. py:method:: __len__(self) -> int

      Number of batches in the Sequence.

      :returns: The number of batches in the Sequence.


   .. py:method:: __getitem__(self, index: int) -> Tuple[numpy.ndarray, numpy.ndarray]

      Get batch for given index.


   .. py:method:: _get_model_rank(self)


   .. py:method:: __data_generation(self, index: int) -> Tuple[numpy.ndarray, numpy.ndarray]

      Load pickle data from disk.


   .. py:method:: _concatenate(new: List[numpy.ndarray], old: List[numpy.ndarray]) -> List[numpy.ndarray]
      :staticmethod:

      Concatenate two lists of data along axis=0.


   .. py:method:: _concatenate_multi(*args: List[numpy.ndarray]) -> List[numpy.ndarray]
      :staticmethod:

      Concatenate multiple lists of data along axis=0.

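   The two static helpers above concatenate lists of numpy arrays element-wise along
   axis=0. Below is a minimal, self-contained sketch of that documented behaviour; the
   function names are illustrative stand-ins and do not reproduce the module's actual
   implementation.

   .. code-block:: python

      import numpy as np
      from typing import List

      def concat_pair(new: List[np.ndarray], old: List[np.ndarray]) -> List[np.ndarray]:
          # Concatenate corresponding arrays of both lists along the sample axis (axis=0).
          return [np.concatenate([o, n], axis=0) for o, n in zip(old, new)]

      def concat_multi(*args: List[np.ndarray]) -> List[np.ndarray]:
          # Same idea for an arbitrary number of lists: zip them branch-wise, then stack.
          return [np.concatenate(arrays, axis=0) for arrays in zip(*args)]

      # Example: two collection elements with 4 and 6 samples and two input branches each.
      a = [np.zeros((4, 3)), np.zeros((4, 5))]
      b = [np.ones((6, 3)), np.ones((6, 5))]
      print([x.shape for x in concat_pair(b, a)])   # [(10, 3), (10, 5)]
      print([x.shape for x in concat_multi(a, b)])  # [(10, 3), (10, 5)]
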
   .. py:method:: _prepare_batches(self, use_multiprocessing=False, max_process=1) -> None

      Prepare all batches as locally stored files.

      Walk through all elements of the collection and split (or merge) the data according
      to the batch size. Data sets that are too long are divided into multiple batches.
      Batches that are not fully filled are retained together with the remains of the next
      collection elements. These retained data are concatenated and also split into
      batches. If data still remain afterwards, they are saved as a final, smaller batch.
      All batches are enumerated by a running index starting at 0. A list with all batch
      numbers is stored in the class's parameter indexes.

      This method can either use a serial approach or use multiprocessing to decrease
      computational time.


   .. py:method:: _cleanup_path(path: str, create_new: bool = True) -> None
      :staticmethod:

      First remove existing path, second create empty path if enabled.


   .. py:method:: on_epoch_end(self) -> None

      Randomly shuffle indexes if enabled.



.. py:function:: _save_to_pickle(path, X: List[numpy.ndarray], Y: List[numpy.ndarray], index: int) -> None

   Save data as pickle file with variables X and Y and given index as .pickle.


.. py:function:: _get_batch(data_list: List[numpy.ndarray], b: int, batch_size: int) -> List[numpy.ndarray]

   Get batch according to batch size from data list.


.. py:function:: _permute_data(X, Y)


.. py:function:: _get_number_of_mini_batches(number_of_samples: int, batch_size: int) -> int

   Return number of mini batches as the floored ratio of number of samples to batch size.


.. py:function:: f_proc(data, upsampling, mod_rank, batch_size, _path, index)
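
The batch preparation described above relies on the two helpers `_get_number_of_mini_batches`
and `_get_batch`. The following self-contained sketch illustrates their documented behaviour
(floored ratio of samples to batch size, and slicing every branch of a data list for batch
number `b`); the bodies are illustrative assumptions and not the module's actual code.

.. code-block:: python

   import math
   from typing import List

   import numpy as np

   def number_of_mini_batches(number_of_samples: int, batch_size: int) -> int:
       # Floored ratio: 25 samples with batch size 10 yield 2 full batches; the
       # remaining 5 samples are retained and merged with the next collection element.
       return math.floor(number_of_samples / batch_size)

   def get_batch(data_list: List[np.ndarray], b: int, batch_size: int) -> List[np.ndarray]:
       # Slice every array in the list along axis 0 to extract batch number b.
       return [data[b * batch_size:(b + 1) * batch_size, ...] for data in data_list]

   # Example: two input branches with 25 samples each and a batch size of 10.
   X = [np.random.rand(25, 7, 1, 5), np.random.rand(25, 5, 1, 2)]
   print(number_of_mini_batches(25, 10))          # 2
   print([x.shape for x in get_batch(X, 0, 10)])  # [(10, 7, 1, 5), (10, 5, 1, 2)]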