Set strip_accents to true in tokenizer_config.json
- README.md +2 -0
- tokenizer_config.json +1 -1
README.md
CHANGED
@@ -3,6 +3,8 @@ language: de
 license: mit
 ---
 
+This is a fork of [dbmdz/bert-base-german-uncased](https://huggingface.co/dbmdz/bert-base-german-uncased) with `strip_accents` being set to `true` in the tokenizer.
+
 # 🤗 + 📚 dbmdz German BERT models
 
 In this repository the MDZ Digital Library team (dbmdz) at the Bavarian State
tokenizer_config.json
CHANGED
@@ -1 +1 @@
-{"do_lower_case": true, "max_len": 512, "init_inputs": []}
+{"do_lower_case": true, "max_len": 512, "init_inputs": [], "strip_accents": true}
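To make the effect of the config change concrete, here is a minimal standard-library sketch of what accent stripping typically means for an uncased tokenizer: decompose each character (Unicode NFD) and drop the combining marks. This is an illustration under assumptions, not the tokenizer's actual code; the helper names `strip_accents` and `normalize` are hypothetical, and the real tokenizer may apply further steps.

```python
import json
import unicodedata

# Config line as it appears on the "+" side of the tokenizer_config.json diff.
NEW_CONFIG = '{"do_lower_case": true, "max_len": 512, "init_inputs": [], "strip_accents": true}'


def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize(text: str, config: dict) -> str:
    """Hypothetical helper: apply the two flags roughly as an uncased tokenizer might."""
    if config.get("do_lower_case"):
        text = text.lower()
    if config.get("strip_accents"):
        text = strip_accents(text)
    return text


config = json.loads(NEW_CONFIG)
print(normalize("Über Äpfel und Öl", config))  # uber apfel und ol
```

Under this sketch German umlauts collapse to their base vowels (`ü` becomes `u`), while `ß` is left unchanged because it has no combining-mark decomposition.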