Commit f9a58b3 by erksch · 1 Parent(s): b705f0e

Set strip_accents to true in tokenizer_config.json
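
For readers unfamiliar with the option, here is a minimal sketch of what accent stripping usually means in BERT-style tokenizers: decompose the text to Unicode NFD and drop the combining marks. The helper names `strip_accents` and `normalize` are hypothetical and only illustrate the idea; this is not the tokenizer's actual code path.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks (Unicode category Mn) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize(text: str, do_lower_case: bool = True, strip: bool = True) -> str:
    """Hypothetical helper mirroring the two config flags touched by this repo."""
    if do_lower_case:
        text = text.lower()
    if strip:
        text = strip_accents(text)
    return text


if __name__ == "__main__":
    # German umlauts decompose into a base letter plus a combining diaeresis,
    # so they collapse to the base letter; "ß" has no decomposition and stays.
    print(normalize("Über Größe"))
```

Under this sketch, umlauts such as "ä", "ö" and "ü" map to "a", "o" and "u", which is the practical effect of the flag on German input.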

Files changed (2)
  1. README.md +2 -0
  2. tokenizer_config.json +1 -1
README.md CHANGED
@@ -3,6 +3,8 @@ language: de
 license: mit
 ---
 
+This is a fork of [dbmdz/bert-base-german-uncased](https://huggingface.co/dbmdz/bert-base-german-uncased) with `strip_accents` being set to `true` in the tokenizer.
+
 # 🤗 + 📚 dbmdz German BERT models
 
 In this repository the MDZ Digital Library team (dbmdz) at the Bavarian State
tokenizer_config.json CHANGED
@@ -1 +1 @@
-{"do_lower_case": true, "max_len": 512, "init_inputs": []}
+{"do_lower_case": true, "max_len": 512, "init_inputs": [], "strip_accents": true}
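
The one-line config change can be reproduced programmatically. The following is a sketch only, using the standard `json` module on the old config string from this diff; the function name `set_strip_accents` is an assumption made for illustration and is not part of any library.

```python
import json

# Old contents of tokenizer_config.json, copied from the removed line of the diff.
OLD_CONFIG = '{"do_lower_case": true, "max_len": 512, "init_inputs": []}'


def set_strip_accents(config_text: str, value: bool = True) -> str:
    """Return the config JSON with the "strip_accents" key set to `value`."""
    config = json.loads(config_text)
    config["strip_accents"] = value
    # Default separators and insertion order keep the single-line layout.
    return json.dumps(config)


if __name__ == "__main__":
    print(set_strip_accents(OLD_CONFIG))
```

Because the new key is appended last and the existing keys keep their order, the result matches the added line of the diff.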