jcollado committed on
Commit
1756e41
·
1 Parent(s): b3dbfef

Card, work in progress

Files changed (1)
  1. README.md +8 -0
README.md ADDED
@@ -0,0 +1,8 @@
+ # Text preprocessing
+
+ This tokenizer was trained on tweets that were preprocessed as follows:
+
+ 1) User mentions (@user_name) were replaced with the word *user*.
+ 2) URLs were replaced with the word *url*.
+ 3) WIP.
+ If you are going to use this tokenizer, we recommend preprocessing your own dataset in the same manner.
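
Steps 1 and 2 of the card could be approximated as in the sketch below. The regular expressions and the `preprocess_tweet` name are assumptions made for illustration; the card does not state the exact patterns used when the tokenizer was trained, and step 3 (marked WIP) is deliberately not covered.

```python
import re

# Assumed patterns: the card does not specify the exact rules that were used.
# A mention is taken to be "@" followed by word characters (letters, digits, "_").
MENTION_RE = re.compile(r"@\w+")
# A URL is taken to be anything starting with http://, https:// or www.
# and running up to the next whitespace character.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")


def preprocess_tweet(text: str) -> str:
    """Replace URLs with 'url' and user mentions with 'user' (hypothetical helper)."""
    # URLs are handled first so that an "@" inside a URL path
    # is not mistaken for a user mention.
    text = URL_RE.sub("url", text)
    text = MENTION_RE.sub("user", text)
    return text


if __name__ == "__main__":
    print(preprocess_tweet("@user_name have you seen https://example.com/page ?"))
```

Because `\S+` runs to the next whitespace, punctuation attached to the end of a URL is swallowed into the *url* token; whether that matches the original preprocessing is unknown, so check a sample of your own data before relying on it.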