Text preprocessing
This tokenizer was trained on tweets that were preprocessed as follows:
- User mentions (@user_name) have been replaced with the word user.
- URLs have been replaced with the word url.
- WIP.

If you are going to use this tokenizer, we recommend preprocessing your own dataset in the same manner.
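
The two replacements listed above could be reproduced with something like the following minimal sketch. The regular expressions and the function name `preprocess_tweet` are assumptions made for illustration; the exact patterns used when building the training data are not documented here, so the results may differ on edge cases (for example, punctuation directly after a URL).

```python
import re

# Assumed patterns: the original preprocessing rules are not specified in detail.
MENTION_RE = re.compile(r"@\w+")                 # e.g. @user_name
URL_RE = re.compile(r"(?:https?://|www\.)\S+")   # e.g. https://example.com/x


def preprocess_tweet(text: str) -> str:
    """Replace URLs with 'url' and user mentions with 'user'."""
    # URLs are handled first so that an '@' inside a URL is not
    # mistaken for a user mention.
    text = URL_RE.sub("url", text)
    text = MENTION_RE.sub("user", text)
    return text


if __name__ == "__main__":
    print(preprocess_tweet("@user_name check https://example.com/x now"))
    # prints: user check url now
```

Apply the same function to every text in your dataset before tokenizing, so that the inputs match what the tokenizer saw during training.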