
Text or numbers? Encodings are used to build embeddings, and embeddings are what get fed into neural networks. Remember that networks cannot read raw text, so you will usually want to pre-encode your data with a well-known encoder.

An encoding package containing encoders, decoders, and all text-to-vector necessities.

class nalp.encoders.IntegerEncoder

Bases: nalp.core.encoder.Encoder

An IntegerEncoder class is responsible for encoding text into integers.


Initialization method.

decode(self, encoded_tokens: numpy.array)

Decodes the encoding back to tokens.


Parameters

encoded_tokens – A numpy array containing the encoded tokens.


Returns

Decoded tokens.

Return type


property decoder(self)

Decoder dictionary.

encode(self, tokens: List[str])

Encodes new tokens using the previously learned dictionary.


Parameters

tokens – A list of tokens to be encoded.


Returns

Encoded tokens.

Return type


learn(self, dictionary: Dict[str, Any], reverse_dictionary: Dict[str, Any])

Learns an integer vectorization encoding.

Parameters

  • dictionary – The vocabulary to index mapping.

  • reverse_dictionary – The index to vocabulary mapping.
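
The learn, encode, and decode steps described above can be illustrated without importing the library. The following is a minimal sketch in plain Python, not NALP's implementation: the helper names `build_dictionaries`, `encode_tokens`, and `decode_tokens` are hypothetical, and plain lists stand in for the numpy arrays the real class works with. It only shows how a vocabulary-to-index mapping and its reverse make the encoding round-trippable.

```python
from typing import Dict, List, Tuple


def build_dictionaries(tokens: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Builds token->index and index->token mappings (hypothetical helper)."""
    # Sorting the unique tokens makes the index assignment deterministic.
    vocabulary = sorted(set(tokens))
    dictionary = {token: index for index, token in enumerate(vocabulary)}
    reverse_dictionary = {index: token for token, index in dictionary.items()}
    return dictionary, reverse_dictionary


def encode_tokens(tokens: List[str], dictionary: Dict[str, int]) -> List[int]:
    """Maps each token to its integer index."""
    return [dictionary[token] for token in tokens]


def decode_tokens(encoded: List[int], reverse_dictionary: Dict[int, str]) -> List[str]:
    """Maps each integer index back to its token."""
    return [reverse_dictionary[index] for index in encoded]


tokens = ["the", "cat", "sat", "on", "the", "mat"]
dictionary, reverse_dictionary = build_dictionaries(tokens)

encoded = encode_tokens(tokens, dictionary)
decoded = decode_tokens(encoded, reverse_dictionary)

print(encoded)  # [4, 0, 3, 2, 4, 1]
print(decoded == tokens)  # True
```

Keeping both mappings around is what makes decoding a simple lookup; a token that was never seen during learning would raise a `KeyError` in this sketch.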

class nalp.encoders.Word2vecEncoder

Bases: nalp.core.encoder.Encoder

A Word2vecEncoder class is responsible for learning a Word2Vec encoding and then encoding new data with it.


Initialization method.

decode(self, encoded_tokens: numpy.array)

Decodes the encoding back to tokens.


Parameters

encoded_tokens – A numpy array containing the encoded tokens.


Returns

Decoded tokens.

Return type


encode(self, tokens: List[str])

Encodes the data into a Word2Vec representation.


Parameters

tokens – Tokens to be encoded.

learn(self, tokens: List[str], max_features: Optional[int] = 128, window_size: Optional[int] = 5, min_count: Optional[int] = 1, algorithm: Optional[bool] = 0, learning_rate: Optional[float] = 0.01, iterations: Optional[int] = 1000)

Learns a Word2Vec representation following the Word2Vec methodology.

Either the CBOW or the skip-gram algorithm can be used for the learning procedure.

Parameters

  • tokens – A list of tokens.

  • max_features – Maximum number of features to be fitted.

  • window_size – Maximum distance between current and predicted word.

  • min_count – Minimum number of occurrences a word needs in order to be used.

  • algorithm – 1 for skip-gram, 0 for CBOW.

  • learning_rate – Value of the learning rate.

  • iterations – Number of iterations.
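
To make the difference between the two algorithms and the role of `window_size` concrete, here is a minimal plain-Python sketch of how training examples could be drawn from a token list. It is an illustration under simplifying assumptions, not the library's training code: the functions `skipgram_pairs` and `cbow_pairs` are hypothetical, the window is fixed rather than sampled, and no model is trained. Skip-gram predicts each context word from the center word, while CBOW predicts the center word from its whole context.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window_size: int) -> List[Tuple[str, str]]:
    """Builds (center, context) pairs: the center word predicts each neighbor."""
    pairs = []
    for i, center in enumerate(tokens):
        # Clip the window to the boundaries of the token list.
        start = max(0, i - window_size)
        end = min(len(tokens), i + window_size + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens: List[str], window_size: int) -> List[Tuple[List[str], str]]:
    """Builds (context, center) pairs: the neighbors jointly predict the center."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window_size)
        end = min(len(tokens), i + window_size + 1)
        context = [tokens[j] for j in range(start, end) if j != i]
        pairs.append((context, center))
    return pairs


tokens = ["the", "cat", "sat"]

sg = skipgram_pairs(tokens, window_size=1)
cb = cbow_pairs(tokens, window_size=1)

print(sg)  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
print(cb)  # [(['cat'], 'the'), (['the', 'sat'], 'cat'), (['cat'], 'sat')]
```

A larger `window_size` yields more pairs per word and captures broader context, at the cost of more training work per pass over the data.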