nalp.core

The core package is the parent of everything in NALP. Here you will find the parent classes that define the basics of our structure, providing the variables and methods used to construct the other modules.

A core package containing all the basic classes and functions that serve as the foundation of NALP's common modules.

class nalp.core.Adversarial(discriminator: Discriminator, generator: Generator, name: Optional[str] = '')

Bases: tensorflow.keras.Model

An Adversarial class is responsible for providing a customizable implementation of Generative Adversarial Networks (GANs).

property D(self)

Discriminator architecture.

property G(self)

Generator architecture.

__init__(self, discriminator: Discriminator, generator: Generator, name: Optional[str] = '')

Initialization method.

Parameters
  • discriminator – Network’s discriminator architecture.

  • generator – Network’s generator architecture.

  • name – The model’s identifier string.

_discriminator_loss(self, y_real: tensorflow.Tensor, y_fake: tensorflow.Tensor)

Calculates the loss for the discriminator architecture.

Parameters
  • y_real – A tensor containing the real data targets.

  • y_fake – A tensor containing the fake data targets.

Returns

The loss based on the discriminator network.

Return type

(tf.Tensor)

_generator_loss(self, y_fake: tensorflow.Tensor)

Calculates the loss for the generator architecture.

Parameters

y_fake – A tensor containing the fake data targets.

Returns

The loss based on the generator network.

Return type

(tf.Tensor)
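The two loss methods above typically implement the standard GAN objective: the discriminator is penalized for mislabeling real and fake samples, while the generator is rewarded when its fakes are scored as real. A minimal, framework-free sketch of that objective, assuming plain Python logits instead of `tf.Tensor` (the helper names are illustrative, not NALP's API):

```python
import math

def bce_from_logit(target: float, logit: float) -> float:
    """Binary cross-entropy for a single target/logit pair."""
    prob = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))

def discriminator_loss(real_logits, fake_logits):
    # Real samples should be scored as 1, fake samples as 0.
    real = sum(bce_from_logit(1.0, l) for l in real_logits) / len(real_logits)
    fake = sum(bce_from_logit(0.0, l) for l in fake_logits) / len(fake_logits)
    return real + fake

def generator_loss(fake_logits):
    # The generator wants the discriminator to score its fakes as 1.
    return sum(bce_from_logit(1.0, l) for l in fake_logits) / len(fake_logits)
```

A well-trained discriminator (high logits on real data, low on fake data) drives `discriminator_loss` toward zero while `generator_loss` grows, and vice versa.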

compile(self, d_optimizer: tensorflow.keras.optimizers, g_optimizer: tensorflow.keras.optimizers)

Main building method.

Parameters
  • d_optimizer – An optimizer instance for the discriminator.

  • g_optimizer – An optimizer instance for the generator.

fit(self, batches: nalp.core.dataset.Dataset, epochs: Optional[int] = 100)

Trains the model.

Parameters
  • batches – Training batches containing samples.

  • epochs – The maximum number of training epochs.

property history(self)

History dictionary.

step(self, x: tensorflow.Tensor)

Performs a single batch optimization step.

Parameters

x – A tensor containing the inputs.
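A single optimization step typically generates a fake batch, scores both the real and fake batches with the discriminator, and computes the two losses. Ignoring the gradient bookkeeping, the data flow can be sketched with stand-in callables (all names here are illustrative):

```python
def adversarial_step(x, generate, discriminate, d_loss, g_loss):
    """One GAN step's data flow: fake batch -> two scores -> two losses."""
    fake = generate(len(x))        # sample as many fakes as real inputs
    y_real = discriminate(x)       # discriminator scores for real data
    y_fake = discriminate(fake)    # discriminator scores for fake data
    # In the real method, the gradient of each loss updates its own network
    # through the optimizer passed to compile().
    return d_loss(y_real, y_fake), g_loss(y_fake)
```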

class nalp.core.Corpus(min_frequency: Optional[int] = 1)

A Corpus class is used to define the first step of the workflow.

It serves as a base class to load raw text, audio and sentences.

Note that this class only provides basic properties and methods that are invoked by its children; thus, it should not be instantiated directly.

__init__(self, min_frequency: Optional[int] = 1)

Initialization method.

_build(self)

Builds the vocabulary based on the tokens.

_check_token_frequency(self)

Cuts tokens that do not meet a minimum frequency value.

_create_tokenizer(self, corpus_type: str)

Creates a tokenizer based on the input type.

Parameters

corpus_type – A type to create the tokenizer. Should be char or word.

Returns

The created tokenizer.

Return type

(callable)
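A rough sketch of what such a factory can look like; the real method wires up NALP's own tokenizer utilities, while this assumed version just returns a plain splitting function:

```python
def create_tokenizer(corpus_type: str):
    """Returns a callable that splits raw text into tokens."""
    if corpus_type == "char":
        # Character-level: every character becomes a token.
        return lambda text: list(text)
    if corpus_type == "word":
        # Word-level: split on whitespace.
        return lambda text: text.split()
    raise RuntimeError(f"Corpus type should be `char` or `word`, got `{corpus_type}`")
```

The returned callable can then feed `_build`, which counts the tokens and assembles the vocabulary mappings.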

property index_vocab(self)

Maps indexes to vocabulary tokens.

property min_frequency(self)

Minimum token frequency.

property tokens(self)

List of input tokens.

property vocab(self)

Vocabulary tokens.

property vocab_index(self)

Maps vocabulary tokens to indexes.

property vocab_size(self)

Vocabulary size.

class nalp.core.Dataset(shuffle: Optional[bool] = True)

A Dataset class is responsible for receiving encoded tokens and persisting data that will be fed as input to the networks.

__init__(self, shuffle: Optional[bool] = True)

Initialization method.

Parameters

shuffle – Whether batches should be shuffled or not.

_build(self, sliced_data: tensorflow.Tensor, batch_size: int)

Builds the batches based on the pre-processed, sliced data.

Parameters
  • sliced_data – Slices of tensor-based data.

  • batch_size – Size of batches.

property batches(self)

An instance of tensorflow’s dataset batches.

property shuffle(self)

Whether data should be shuffled or not.
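Internally this class wraps TensorFlow's dataset slicing, shuffling, and batching. A framework-free sketch of the same behavior (function name and signature are illustrative):

```python
import random

def build_batches(data, batch_size, shuffle=True, seed=0):
    """Optionally shuffles a sequence, then slices it into fixed-size batches."""
    data = list(data)
    if shuffle:
        random.Random(seed).shuffle(data)
    # The final batch may be smaller when len(data) is not a multiple of batch_size.
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
```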

class nalp.core.Discriminator(name: Optional[str] = '')

Bases: tensorflow.keras.Model

A Discriminator class is responsible for providing an easy way to implement the discriminative part of a neural network, when custom training or additional settings are not needed.

__init__(self, name: Optional[str] = '')

Initialization method.

Note that basic variables shared by all children should be declared here, e.g., layers.

Parameters

name – The model’s identifier string.

abstract call(self, x: tensorflow.Tensor, training: Optional[bool] = True)

Method that defines the model's forward pass whenever the class is called.

Note that you will need to implement this method directly in each child class, as every neural network has its own forward pass implementation.

Parameters
  • x – A tensorflow’s tensor holding input data.

  • training – Whether architecture is under training or not.

Raises

NotImplementedError.
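A child class overrides `call` with its own forward pass. The sketch below mimics the pattern without TensorFlow; a real child would extend the actual base class, declare `tf.keras` layers in `__init__`, and chain them in `call` (the toy subclass here is an assumption for illustration):

```python
class Discriminator:
    """Framework-free stand-in for the abstract base class."""

    def __init__(self, name: str = ""):
        self.name = name

    def call(self, x, training: bool = True):
        raise NotImplementedError

class ToyDiscriminator(Discriminator):
    """Illustrative child: a single fixed linear unit instead of real layers."""

    def __init__(self, weight: float = 0.5, bias: float = 0.0):
        super().__init__(name="toy_discriminator")
        self.weight, self.bias = weight, bias

    def call(self, x, training: bool = True):
        # Forward pass: one logit per input value.
        return [self.weight * value + self.bias for value in x]
```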

class nalp.core.Encoder

An Encoder class is responsible for receiving a Corpus and encoding it into a representation (e.g., integer indices or word2vec vectors).

abstract decode(self)

This method decodes the encoded representation. Note that you need to define your own decoding algorithm in each child class.

Raises

NotImplementedError.

abstract encode(self)

This method encodes new data based on previous learning. Note that you need to define your own encoding algorithm in each child class.

Raises

NotImplementedError.

property encoder(self)

An encoder generic object.

abstract learn(self)

This method learns an encoding representation. Note that each child class needs to define its own learning algorithm (representation).

Raises

NotImplementedError.
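The simplest concrete Encoder learns an integer representation: `learn` builds a token-to-index mapping, `encode` applies it, and `decode` inverts it. A hedged sketch of such a child, not NALP's actual implementation:

```python
class IntegerEncoder:
    """Sketch of an Encoder child that maps tokens to integer indexes."""

    def __init__(self):
        self.encoder = None
        self.decoder = None

    def learn(self, tokens):
        # Learn the representation: one index per unique token.
        self.encoder = {t: i for i, t in enumerate(sorted(set(tokens)))}
        self.decoder = {i: t for t, i in self.encoder.items()}

    def encode(self, tokens):
        return [self.encoder[t] for t in tokens]

    def decode(self, ids):
        return [self.decoder[i] for i in ids]
```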

class nalp.core.Generator(name: Optional[str] = '')

Bases: tensorflow.keras.Model

A Generator class is responsible for providing an easy way to implement the generative part of a neural network, when custom training or additional settings are not needed.

__init__(self, name: Optional[str] = '')

Initialization method.

Note that basic variables shared by all children should be declared here, e.g., layers.

Parameters

name – The model’s identifier string.

abstract call(self, x: tensorflow.Tensor, training: Optional[bool] = True)

Method that holds vital information whenever this class is called.

Note that you will need to implement this method directly on its child. Essentially, each neural network has its own forward pass implementation.

Parameters
  • x – A tensorflow’s tensor holding input data.

  • training – Whether architecture is under training or not.

Raises

NotImplementedError.

generate_greedy_search(self, start: str, max_length: Optional[int] = 100)

Generates text by using greedy search, where the sampled token is always the one with the maximum probability.

Parameters
  • start – The start string to generate the text.

  • max_length – Maximum length of generated text.

Returns

Generated text.

Return type

(List[str])
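Greedy search never samples: at each step it appends the argmax token and feeds the extended sequence back into the model. A minimal sketch, where `next_probs` is an assumed stand-in for the model's forward pass:

```python
def generate_greedy_search(next_probs, start, max_length=100):
    """Greedy decoding: always pick the highest-probability next token."""
    tokens = list(start)
    for _ in range(max_length):
        probs = next_probs(tokens)                # distribution over the vocabulary
        tokens.append(max(probs, key=probs.get))  # argmax, never a sample
    return tokens
```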

generate_temperature_sampling(self, start: str, max_length: Optional[int] = 100, temperature: Optional[float] = 1.0)

Generates text by using temperature sampling, where the sampled token is sampled according to a multinomial/categorical distribution.

Parameters
  • start – The start string to generate the text.

  • max_length – Maximum length of generated text.

  • temperature – A temperature value to sample the token.

Returns

Generated text.

Return type

(List[str])
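Temperature sampling divides the logits by the temperature before the softmax: values below 1 sharpen the distribution toward the argmax, values above 1 flatten it toward uniform. A stdlib-only sketch of the sampling step (function name is illustrative):

```python
import math
import random

def temperature_sample(logits, temperature=1.0, rng=None):
    """Samples one vocabulary index from softmax(logits / temperature)."""
    rng = rng or random
    scaled = [l / temperature for l in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    # random.choices normalizes the weights, so an explicit softmax sum is not needed.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```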

generate_top_sampling(self, start: str, max_length: Optional[int] = 100, k: Optional[int] = 0, p: Optional[float] = 0.0)

Generates text by using top-k and top-p sampling, where the sampled token is drawn from the distribution restricted to the k most likely tokens and/or to the smallest set of tokens whose cumulative probability reaches p.

Parameters
  • start – The start string to generate the text.

  • max_length – Maximum length of generated text.

  • k – Indicates the amount of likely words.

  • p – Maximum cumulative probability to be thresholded.

Returns

Generated text.

Return type

(List[str])
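Both filters restrict the distribution before sampling: top-k keeps the k most likely tokens, top-p keeps the smallest prefix (in probability order) whose cumulative mass reaches p, and the survivors are renormalized. A sketch of that filtering step, assuming a plain probability list rather than NALP's tensors (function name is illustrative):

```python
def top_filter(probs, k=0, p=0.0):
    """Keeps top-k and/or nucleus (top-p) indexes, renormalized; 0 disables a filter."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = order[:k] if k > 0 else order
    if p > 0.0:
        cumulative, nucleus = 0.0, []
        for i in keep:
            nucleus.append(i)
            cumulative += probs[i]
            if cumulative >= p:
                break
        keep = nucleus
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}
```

Sampling then proceeds over the returned, renormalized distribution only.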