Iñigo López-Riobóo Botana committed
Commit 8e5fab5 · 1 Parent(s): 6f72295
Update README.md
README.md CHANGED

@@ -98,7 +98,11 @@ You can check the [original GitHub repository](https://github.com/microsoft/DialoGPT)
 
 ## Limitations
 
-- This model uses the original English-based tokenizer from the [GPT-2 paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).
+- This model uses the original English-based tokenizer from the [GPT-2 paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf).
+Spanish tokenization was not considered, but Spanish and English have similar grammatical structure for encoding text, and this overlap may help the model transfer its knowledge from English to Spanish.
+Moreover, the BPE (Byte Pair Encoding) implementation of the GPT-2 tokenizer **can assign a representation to every Unicode string**.
+**From the GPT-2 paper**:
+> Since our approach can assign a probability to any Unicode string, this allows us to evaluate our LMs on any dataset regardless of pre-processing, tokenization, or vocab size.
 - This model is intended to be used **just for single-turn chitchat conversations in Spanish**.
 - This model's generation capabilities are limited to the extent of the aforementioned fine-tuning dataset.
 - This model generates short answers, providing general context dialogue in a professional style.
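The byte-level BPE point in the updated text is easy to verify: any Spanish string, accents and "ñ" included, round-trips through the English GPT-2 tokenizer with no unknown tokens. A minimal sketch, assuming the Hugging Face `transformers` package and the stock `gpt2` tokenizer (the example string is illustrative):

```python
from transformers import AutoTokenizer

# The original English GPT-2 tokenizer, which DialoGPT reuses unchanged.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "¿Qué tal? Mañana iré a la peluquería."
ids = tokenizer.encode(text)

# Byte-level BPE falls back to byte-sized pieces, so every Unicode string
# gets a representation, and decoding recovers the input exactly.
print(ids)
assert tokenizer.decode(ids) == text
```

Because the BPE merges were learned on English data, Spanish text typically costs more tokens per word than English, but nothing is ever mapped to an unknown token.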
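For the single-turn usage described in the bullets, the sketch below shows how a DialoGPT-style model is commonly queried: one user turn terminated by the EOS token, one generated reply. The model id `your-user/dialogpt-spanish` is a hypothetical placeholder, not this repository's actual name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-user/dialogpt-spanish"  # hypothetical id; substitute the real one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# DialoGPT-style inputs terminate each turn with the EOS token.
prompt = "Hola, ¿qué tal la semana?" + tokenizer.eos_token
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate one short reply; the model is tuned for single-turn exchanges,
# so no conversation history is accumulated here.
output_ids = model.generate(
    input_ids,
    max_new_tokens=50,
    pad_token_id=tokenizer.eos_token_id,
)
reply = tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)
print(reply)
```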