
Text preprocessing

This tokenizer was trained on tweets that were preprocessed as follows:

  1. User mentions (@user_name) were replaced with the word "user".
  2. URLs were replaced with the word "url".
  3. WIP.

If you use this tokenizer, we recommend preprocessing your own dataset in the same way.
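The two substitutions listed above could be reproduced with a couple of regular expressions. This is only a minimal sketch: the card does not say how mentions and URLs were detected, so the patterns `MENTION_RE` and `URL_RE` and the helper `preprocess_tweet` are hypothetical and may not match the original preprocessing exactly.

```python
import re

# Hypothetical patterns: the card does not specify how mentions and URLs were matched.
# A mention is "@" followed by word characters, not preceded by a word character
# (so e-mail addresses such as me@example.com are left alone).
MENTION_RE = re.compile(r"(?<!\w)@\w+")
# A URL is anything starting with http://, https:// or www. up to the next whitespace.
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def preprocess_tweet(text: str) -> str:
    """Replace URLs with 'url' and user mentions with 'user'."""
    # Replace URLs first so that an '@' inside a URL is not treated as a mention.
    text = URL_RE.sub("url", text)
    text = MENTION_RE.sub("user", text)
    return text


if __name__ == "__main__":
    print(preprocess_tweet("@user_name check https://example.com now"))
    # → user check url now
```

Applying the same function to every text in your dataset before tokenizing keeps its inputs close to what the tokenizer saw during training, assuming these patterns approximate the original ones.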