mlair.data_handler.iterator

Module Contents

Classes

StandardIterator
DataCollection
KerasIterator                  Base object for fitting to a sequence of data, such as a dataset.
Functions

_save_to_pickle                Save data as pickle file with variables X and Y and given index as <index>.pickle.
_get_batch                     Get batch according to batch size from data list.
_permute_data
_get_number_of_mini_batches    Return number of mini batches as the floored ratio of number of samples to batch size.
f_proc
Attributes

mlair.data_handler.iterator.__date__ = 2020-07-07
class mlair.data_handler.iterator.StandardIterator(collection: list)

    Bases: collections.Iterator

    _position: int

    __next__(self)
        Return next element or stop iteration.
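A minimal sketch of how an iterator of this shape can behave, assuming it simply walks the wrapped list via the `_position` counter and raises `StopIteration` at the end (the body is illustrative, not the actual mlair implementation):

```python
class StandardIterator:
    """Iterate over the elements of a plain list."""

    def __init__(self, collection: list):
        self._collection = collection
        self._position = 0  # index of the next element to return

    def __iter__(self):
        return self

    def __next__(self):
        """Return next element or stop iteration."""
        try:
            element = self._collection[self._position]
            self._position += 1
        except IndexError:
            raise StopIteration()
        return element

it = StandardIterator(["a", "b"])
print(list(it))  # ['a', 'b']
```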
class mlair.data_handler.iterator.DataCollection(collection: list = None, name: str = None)

    Bases: collections.Iterable

    property name(self)

    __len__(self)

    __iter__(self) → collections.Iterator

    __getitem__(self, index)

    add(self, element)

    _set_mapping(self)

    keys(self)
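The class's behavior can be sketched as follows, assuming (this is a guess from the method names, not the actual mlair implementation) that `_set_mapping` builds a string-keyed index so elements can be fetched by name as well as by position:

```python
class DataCollection:
    """List-like container that also supports lookup by string key."""

    def __init__(self, collection: list = None, name: str = None):
        self._collection = collection if collection is not None else []
        self._name = name
        self._mapping = {}
        self._set_mapping()

    @property
    def name(self):
        return self._name

    def __len__(self):
        return len(self._collection)

    def __iter__(self):
        return iter(self._collection)

    def __getitem__(self, index):
        # integer index -> positional access; string -> lookup via mapping
        if isinstance(index, str):
            return self._collection[self._mapping[index]]
        return self._collection[index]

    def add(self, element):
        self._collection.append(element)
        self._mapping[str(element)] = len(self._collection) - 1

    def _set_mapping(self):
        self._mapping = {str(e): i for i, e in enumerate(self._collection)}

    def keys(self):
        return list(self._mapping.keys())
```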
class mlair.data_handler.iterator.KerasIterator(collection: DataCollection, batch_size: int, batch_path: str, shuffle_batches: bool = False, model=None, upsampling=False, name=None, use_multiprocessing=False, max_number_multiprocessing=1)

    Bases: tensorflow.keras.utils.Sequence

    Base object for fitting to a sequence of data, such as a dataset.

    Every Sequence must implement the __getitem__ and the __len__ methods. If you want to modify your dataset between epochs, you may implement on_epoch_end. The method __getitem__ should return a complete batch.

    Notes:
        Sequence is a safer way to do multiprocessing. This structure guarantees that the network will only train once on each sample per epoch, which is not the case with generators.
    Examples:

    ```python
    from skimage.io import imread
    from skimage.transform import resize
    import numpy as np
    import math

    # Here, x_set is a list of paths to the images
    # and y_set are the associated classes.

    class CIFAR10Sequence(Sequence):

        def __init__(self, x_set, y_set, batch_size):
            self.x, self.y = x_set, y_set
            self.batch_size = batch_size

        def __len__(self):
            return math.ceil(len(self.x) / self.batch_size)

        def __getitem__(self, idx):
            batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
            batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
            return np.array([
                resize(imread(file_name), (200, 200))
                for file_name in batch_x]), np.array(batch_y)
    ```
    __len__(self) → int
        Number of batches in the Sequence.

        Returns:
            The number of batches in the Sequence.

    __getitem__(self, index: int) → Tuple[numpy.ndarray, numpy.ndarray]
        Get batch for given index.

    _get_model_rank(self)

    __data_generation(self, index: int) → Tuple[numpy.ndarray, numpy.ndarray]
        Load pickle data from disk.

    static _concatenate(new: List[numpy.ndarray], old: List[numpy.ndarray]) → List[numpy.ndarray]
        Concatenate two lists of data along axis=0.

    static _concatenate_multi(*args: List[numpy.ndarray]) → List[numpy.ndarray]
        Concatenate an arbitrary number of lists of data along axis=0.
    _prepare_batches(self, use_multiprocessing=False, max_process=1) → None
        Prepare all batches as locally stored files.

        Walk through all elements of the collection and split (or merge) the data according to the batch size. Data sets that are too long are divided into multiple batches. Batches that are not fully filled are retained together with remains from the next collection elements; these retained data are concatenated and also split into batches. If data still remain afterwards, they are saved as a final, smaller batch. All batches are enumerated by a running index starting at 0, and a list with all batch numbers is stored in the class's parameter indexes. This method can either use a serial approach or use multiprocessing to decrease computational time.
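The split-and-merge walk described above can be sketched like this. `prepare_batches` is a hypothetical in-memory helper written for illustration; the real method additionally writes each batch to disk and optionally uses multiprocessing, both omitted here:

```python
import numpy as np

def prepare_batches(collection, batch_size):
    """Cut full batches from each element, carry remainders over to the
    next element, and emit any leftover as a final smaller batch."""
    batches, remainder = [], None
    for x in collection:
        if remainder is not None:
            # merge the retained leftover with the next element's data
            x = np.concatenate([remainder, x], axis=0)
            remainder = None
        n_full = len(x) // batch_size
        for b in range(n_full):
            batches.append(x[b * batch_size:(b + 1) * batch_size])
        if len(x) % batch_size:
            remainder = x[n_full * batch_size:]
    if remainder is not None:
        batches.append(remainder)  # final, smaller batch
    return batches

parts = [np.arange(5), np.arange(7)]  # 12 samples in total
lens = [len(b) for b in prepare_batches(parts, 4)]
print(lens)  # [4, 4, 4]
```

Note how the one leftover sample from the first element is prepended to the second element before that element is split, so no sample is dropped and no batch except the last can be under-filled.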
mlair.data_handler.iterator._save_to_pickle(path, X: List[numpy.ndarray], Y: List[numpy.ndarray], index: int) → None
    Save data as pickle file with variables X and Y and given index as <index>.pickle.
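A minimal sketch of such a helper. The file name `<index>.pickle` follows the docstring above; that the two variables are stored together in one dict is an assumption about the on-disk layout:

```python
import os
import pickle
import tempfile

def save_to_pickle(path, X, Y, index):
    """Hypothetical re-implementation: store X and Y together in
    <index>.pickle inside `path`."""
    with open(os.path.join(path, f"{index}.pickle"), "wb") as f:
        pickle.dump({"X": X, "Y": Y}, f)

path = tempfile.mkdtemp()
save_to_pickle(path, X=[[1, 2]], Y=[[3]], index=0)
with open(os.path.join(path, "0.pickle"), "rb") as f:
    stored = pickle.load(f)
print(sorted(stored))  # ['X', 'Y']
```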
mlair.data_handler.iterator._get_batch(data_list: List[numpy.ndarray], b: int, batch_size: int) → List[numpy.ndarray]
    Get batch according to batch size from data list.
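The slicing this implies can be sketched as follows (a hypothetical re-implementation, assuming batch number `b` is 0-based and cut along the first axis of every array in the list):

```python
import numpy as np

def get_batch(data_list, b, batch_size):
    # cut batch number b of length batch_size along the sample axis
    return [data[b * batch_size:(b + 1) * batch_size, ...] for data in data_list]

X = [np.arange(10).reshape(10, 1)]
print(get_batch(X, 2, 4)[0].ravel())  # [8 9] (last, partial batch)
```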
mlair.data_handler.iterator._permute_data(X, Y)
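The docs give no description for this function; a plausible reading of the name (a guess, not the confirmed mlair behavior) is that it draws one random permutation over the sample axis and applies it to every array in X and Y so that inputs and targets stay aligned:

```python
import numpy as np

def permute_data(X, Y):
    """Shuffle samples with one shared permutation (hypothetical sketch)."""
    p = np.random.permutation(len(X[0]))  # one order for inputs and targets
    return [x[p] for x in X], [y[p] for y in Y]

X, Y = [np.arange(5)], [np.arange(5) * 10]
Xp, Yp = permute_data(X, Y)
# alignment is preserved: each target is still 10x its input
assert (Yp[0] == Xp[0] * 10).all()
```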
mlair.data_handler.iterator._get_number_of_mini_batches(number_of_samples: int, batch_size: int) → int
    Return number of mini batches as the floored ratio of number of samples to batch size.
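The floored ratio means only completely filled batches are counted, as this one-liner sketch shows:

```python
def get_number_of_mini_batches(number_of_samples: int, batch_size: int) -> int:
    # floored ratio: a trailing partial batch does not count
    return number_of_samples // batch_size

print(get_number_of_mini_batches(10, 4))  # 2
```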
mlair.data_handler.iterator.f_proc(data, upsampling, mod_rank, batch_size, _path, index)