# Text preprocessing

This tokenizer was trained on tweets that were preprocessed as follows:

1) User mentions (@user_name) have been replaced with the word *user*.
2) URLs have been replaced with the word *url*.
3) WIP.

If you are going to use this tokenizer, we recommend preprocessing your own dataset in the same way; a minimal sketch is shown below.
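
For reference, here is one way to apply these replacements in Python. The regular-expression patterns for mentions and URLs are assumptions, since the exact patterns used during training are not specified here:

```python
import re

def preprocess_tweet(text: str) -> str:
    """Apply the same style of replacements described above.

    Note: the patterns below are approximations; the exact rules used
    to build the training data are not published in this README.
    """
    # Replace user mentions (@user_name) with the word "user"
    text = re.sub(r"@\w+", "user", text)
    # Replace URLs with the word "url"
    text = re.sub(r"https?://\S+|www\.\S+", "url", text)
    return text

print(preprocess_tweet("Great post @some_user, see https://example.com/article"))
# -> "Great post user, see url"
```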