jcollado committed on
Commit a6beb47 · 1 Parent(s): bbb094c

Card template added.

Files changed (1)
  1. README.md +9 -0
README.md ADDED

# Text preprocessing

This tokenizer has been trained on tweets that were preprocessed as follows:

1) User mentions (@user_name) have been replaced with the word *user*.
2) URLs have been replaced with the word *url*.
3) WIP.

If you are going to use this tokenizer, we recommend preprocessing your own dataset in the same way.
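The two documented replacements can be sketched in Python as follows. This is a minimal sketch, not the preprocessing code used to train the tokenizer. The card does not say which patterns were used, so the regular expressions and the `preprocess_tweet` helper below are hypothetical. Step 3 is still marked WIP and is not covered.

```python
import re

# Assumed pattern (not from the card): a URL starts with http://, https:// or www.
# and runs until the next whitespace character.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")

# Assumed pattern (not from the card): a mention is "@" followed by word characters,
# not preceded by a word character, so e-mail addresses like foo@bar.com are kept.
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def preprocess_tweet(text: str) -> str:
    """Hypothetical helper: replace URLs with 'url' and user mentions with 'user'."""
    # Replace URLs first so an "@" inside a URL path is not treated as a mention.
    text = URL_RE.sub("url", text)
    text = MENTION_RE.sub("user", text)
    return text


if __name__ == "__main__":
    print(preprocess_tweet("@user_name check this out: https://example.com/a?b=1"))
    # prints: user check this out: url
```

Handling URLs before mentions is a design choice of this sketch: links such as `https://medium.com/@someone/post` collapse to a single *url* token instead of producing a stray *user* inside the link.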