nalp.datasets¶
Because we need data, right? Datasets are composed of classes and methods that prepare data for feeding into neural networks.
A dataset package to transform encoded data into real datasets.
- class nalp.datasets.ImageDataset(images: numpy.array, batch_size: Optional[int] = 256, shape: Optional[Tuple[int, int]] = None, normalize: Optional[bool] = True, shuffle: Optional[bool] = True)¶
Bases:
nalp.core.Dataset
An ImageDataset class is responsible for creating a dataset that encodes images for adversarial generation.
- __init__(self, images: numpy.array, batch_size: Optional[int] = 256, shape: Optional[Tuple[int, int]] = None, normalize: Optional[bool] = True, shuffle: Optional[bool] = True)¶
Initialization method.
- Parameters
images – An array of images.
batch_size – Size of batches.
shape – A tuple containing the target shape, used to force a reshape of the array, if needed.
normalize – Whether images should be normalized between -1 and 1.
shuffle – Whether batches should be shuffled or not.
- _preprocess(self, images: numpy.array, shape: Tuple[int, int], normalize: bool)¶
Pre-processes an array of images by reshaping and normalizing them, if necessary.
- Parameters
images – An array of images.
shape – A tuple containing the target shape, used to force a reshape of the array, if needed.
normalize – Whether images should be normalized between -1 and 1.
- Returns
Slices of pre-processed tensor-based images.
- Return type
(tf.data.Dataset)
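The pre-processing step can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: the helper name `preprocess_images` is hypothetical, and the `127.5` normalization constant assumes 8-bit pixel inputs in `[0, 255]`.

```python
import numpy as np

def preprocess_images(images, shape=None, normalize=True):
    """Sketch of ImageDataset-style pre-processing: optional reshape and
    normalization of pixel intensities into [-1, 1] (assumes 8-bit input)."""
    images = images.astype("float32")
    if shape:
        # Force each image into the requested shape (e.g. flatten 28x28 to 784)
        images = images.reshape(-1, *shape)
    if normalize:
        # Map [0, 255] pixel intensities into [-1, 1]
        images = (images - 127.5) / 127.5
    return images

# A fake batch of four 28x28 grayscale "images"
batch = np.random.randint(0, 256, size=(4, 28, 28))
processed = preprocess_images(batch, shape=(784,), normalize=True)
print(processed.shape)  # (4, 784)
```

In the actual class, the resulting array is wrapped into a `tf.data.Dataset` of tensor slices, which is where the batching and shuffling happen.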
- class nalp.datasets.LanguageModelingDataset(encoded_tokens: numpy.array, max_contiguous_pad_length: Optional[int] = 1, batch_size: Optional[int] = 64, shuffle: Optional[bool] = True)¶
Bases:
nalp.core.Dataset
A LanguageModelingDataset class is responsible for creating a dataset that predicts the next timestep (t+1) given a timestep (t).
- __init__(self, encoded_tokens: numpy.array, max_contiguous_pad_length: Optional[int] = 1, batch_size: Optional[int] = 64, shuffle: Optional[bool] = True)¶
Initialization method.
- Parameters
encoded_tokens – An array of encoded tokens.
max_contiguous_pad_length – Maximum length to pad contiguous text.
batch_size – Size of batches.
shuffle – Whether batches should be shuffled or not.
- _create_input_target(self, sequence: tensorflow.Tensor)¶
Creates inputs (t) and targets (t+1) using the next-timestep approach.
- Parameters
sequence – A tensor holding the sequence to be mapped.
- Returns
Input and target tensors.
- Return type
(Tuple[tf.Tensor, tf.Tensor])
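The next-timestep mapping is the standard language-modeling shift: the input is the sequence without its last token, and the target is the same sequence offset by one. A minimal NumPy sketch (the helper name is hypothetical; the real method operates on `tf.Tensor` objects):

```python
import numpy as np

def create_input_target(sequence):
    """Sketch of the next-timestep mapping: for every position t, the
    model sees sequence[t] as input and sequence[t + 1] as its target."""
    return sequence[:-1], sequence[1:]

# Encoded tokens for a short sequence
sequence = np.array([10, 20, 30, 40, 10])
x, y = create_input_target(sequence)
print(x)  # [10 20 30 40]
print(y)  # [20 30 40 10]
```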
- _create_sequences(self, encoded_tokens: numpy.array, rank: int, max_contiguous_pad_length: int)¶
Creates sequences of the desired length.
- Parameters
encoded_tokens – An array of encoded tokens.
rank – Number of array dimensions (rank).
max_contiguous_pad_length – Maximum sequence length.
- Returns
Slices of tensor-based sequences.
- Return type
(tf.data.Dataset)