E3-JSI/gliner-multi-pii-domains-v1


eriknovak committed on Aug 14, 2024

Commit 4ac4f51 (verified) · 1 parent: f6fb544

Update README.md


README.md +47 -12


@@ -22,11 +22,28 @@ GLiNER is a Named Entity Recognition (NER) model capable of identifying any enti
 This model has been trained by fine-tuning `urchade/gliner_multi_pii-v1` on a synthetic dataset covering PII for the domains: `healthcare`, `finance`, `legal`, `banking` and `general`.
 This model is capable of recognizing various types of *personally identifiable information* (PII), including but not limited to these entity types: `person`, `organization`, `phone number`, `address`, `passport number`, `email`, `credit card number`, `social security number`, `health insurance id number`, `date of birth`, `mobile phone number`, `bank account number`, `medication`, `cpf`, `driver's license number`, `tax identification number`, `medical condition`, `identity card number`, `national id number`, `ip address`, `email address`, `iban`, `credit card expiration date`, `username`, `health insurance number`, `registration number`, `student id number`, `insurance number`, `flight number`, `landline phone number`, `blood type`, `cvv`, `reservation number`, `digital signature`, `social media handle`, `license plate number`, `cnpj`, `postal code`, `passport number`, `serial number`, `vehicle registration number`, `credit card brand`, `fax number`, `visa number`, `insurance company`, `identity document number`, `transaction number`, `national health insurance number`, `cvc`, `birth certificate number`, `train ticket number`, `passport expiration date`, and `social security number`.
-## English example
 ```python
 text = """
 Medical Record
@@ -46,17 +63,20 @@ Next Examination Date:
 15-11-2024
 """
-# Labels for entity prediction
 labels = ["name", "social security number", "date of birth", "date"]
-# Perform entity prediction
-entities = trained_model.predict_entities(text, labels, threshold=0.5)
-# Display predicted entities and their labels
 for entity in entities:
     print(entity["text"], "=>", entity["label"])
 ```
 ```text
 John Doe => name
 15-01-1985 => date of birth
@@ -66,9 +86,17 @@ John Doe => name
 15-11-2024 => date
 ```
-## Dutch example
 ```python
 text = """
 Medisch dossier
@@ -89,17 +117,20 @@ Volgende onderzoekdatum:
 15-11-2024
 """
-# Labels for entity prediction
 labels = ["naam", "bmurgerservicenummer", "geboortedatum", "datum"]
-# Perform entity prediction
-entities = trained_model.predict_entities(text, labels, threshold=0.2)
-# Display predicted entities and their labels
 for entity in entities:
     print(entity["text"], "=>", entity["label"])
 ```
 ```text
 Jan de Vries => naam
 15-01-1985 => geboortedatum
@@ -107,4 +138,8 @@ Jan de Vries => naam
 987-65-4321 => bmurgerservicenummer
 Jan de Vries => naam
 15-11-2024 => datum
-```

 This model has been trained by fine-tuning `urchade/gliner_multi_pii-v1` on a synthetic dataset covering PII for the domains: `healthcare`, `finance`, `legal`, `banking` and `general`.
 This model is capable of recognizing various types of *personally identifiable information* (PII), including but not limited to these entity types: `person`, `organization`, `phone number`, `address`, `passport number`, `email`, `credit card number`, `social security number`, `health insurance id number`, `date of birth`, `mobile phone number`, `bank account number`, `medication`, `cpf`, `driver's license number`, `tax identification number`, `medical condition`, `identity card number`, `national id number`, `ip address`, `email address`, `iban`, `credit card expiration date`, `username`, `health insurance number`, `registration number`, `student id number`, `insurance number`, `flight number`, `landline phone number`, `blood type`, `cvv`, `reservation number`, `digital signature`, `social media handle`, `license plate number`, `cnpj`, `postal code`, `passport number`, `serial number`, `vehicle registration number`, `credit card brand`, `fax number`, `visa number`, `insurance company`, `identity document number`, `transaction number`, `national health insurance number`, `cvc`, `birth certificate number`, `train ticket number`, `passport expiration date`, and `social security number`.
+## Usage
+To use the model, install the [GLiNER](https://github.com/urchade/GLiNER) library. Once it is installed, load the model and use it to extract entities from text.
+```bash
+pip install gliner
+```
+What follows are some examples of its intended use.
+### Extract entities from English medical text
 ```python
+from gliner import GLiNER
+# initialize GLiNER with this model
+model = GLiNER.from_pretrained("E3-JSI/gliner-multi-pii-domains-v1")
+# prepare the text for entity extraction
 text = """
 Medical Record
 15-11-2024
 """
+# prepare the labels/entities to be extracted
+# this model should work best when entity types are in lowercase
 labels = ["name", "social security number", "date of birth", "date"]
+# perform entity extraction
+entities = model.predict_entities(text, labels, threshold=0.5)
+# display predicted entities and their labels
 for entity in entities:
     print(entity["text"], "=>", entity["label"])
 ```
+**Expected output**
 ```text
 John Doe => name
 15-01-1985 => date of birth
 15-11-2024 => date
 ```
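A common next step after extraction is masking the detected spans. The sketch below is not part of this commit. It assumes each entity dict also carries `start` and `end` character offsets alongside `text` and `label`, which is how GLiNER's `predict_entities` output is usually shaped. `redact_entities` is a hypothetical helper, and the entity list is hand-written so the snippet runs without downloading the model.

```python
def redact_entities(text, entities):
    """Replace each entity span with a [LABEL] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = "[" + entity["label"].upper() + "]"
        text = text[: entity["start"]] + placeholder + text[entity["end"] :]
    return text


# Hand-written stand-in for the output of model.predict_entities(...);
# the start/end offsets are assumed fields, not shown in the README.
sample_text = "Patient: John Doe, born 15-01-1985."
sample_entities = [
    {"start": 9, "end": 17, "text": "John Doe", "label": "name"},
    {"start": 24, "end": 34, "text": "15-01-1985", "label": "date of birth"},
]

redacted = redact_entities(sample_text, sample_entities)
print(redacted)
```

Replacing from the end of the string backwards avoids recomputing offsets after each substitution. Overlapping spans are not handled here and would need to be merged first.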
+### Extract entities from Dutch medical text
 ```python
+from gliner import GLiNER
+# initialize GLiNER with this model
+model = GLiNER.from_pretrained("E3-JSI/gliner-multi-pii-domains-v1")
+# prepare the text for entity extraction
 text = """
 Medisch dossier
 15-11-2024
 """
+# prepare the labels/entities to be extracted
+# this model should work best when entity types are in lowercase
 labels = ["naam", "bmurgerservicenummer", "geboortedatum", "datum"]
+# perform entity extraction
+entities = model.predict_entities(text, labels, threshold=0.2)
+# display predicted entities and their labels
 for entity in entities:
     print(entity["text"], "=>", entity["label"])
 ```
+**Expected output**
 ```text
 Jan de Vries => naam
 15-01-1985 => geboortedatum
 987-65-4321 => bmurgerservicenummer
 Jan de Vries => naam
 15-11-2024 => datum
+```
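The expected output above lists `Jan de Vries => naam` twice, presumably because the name occurs more than once in the text. Where only distinct values matter, the pairs can be de-duplicated. This is a minimal sketch, not part of this commit: `unique_entities` is a hypothetical helper, and the hand-written list stands in for the model output.

```python
def unique_entities(entities):
    """Keep the first occurrence of each (text, label) pair, preserving order."""
    seen = set()
    unique = []
    for entity in entities:
        key = (entity["text"], entity["label"])
        if key not in seen:
            seen.add(key)
            unique.append(entity)
    return unique


# Hand-written stand-in mirroring the expected output shown in the README.
sample_entities = [
    {"text": "Jan de Vries", "label": "naam"},
    {"text": "15-01-1985", "label": "geboortedatum"},
    {"text": "987-65-4321", "label": "bmurgerservicenummer"},
    {"text": "Jan de Vries", "label": "naam"},
    {"text": "15-11-2024", "label": "datum"},
]

deduped = unique_entities(sample_entities)
for entity in deduped:
    print(entity["text"], "=>", entity["label"])
```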
+## Acknowledgements
+ Funded by the European Union. UK participants in Horizon Europe Project PREPARE are supported by UKRI grant number 10086219 (Trilateral Research). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or European Health and Digital Executive Agency (HADEA) or UKRI. Neither the European Union nor the granting authority nor UKRI can be held responsible for them. Grant Agreement 101080288 PREPARE HORIZON-HLTH-2022-TOOL-12-01.