Token Classification
GLiNER
PyTorch
Safetensors
Erik Novak committed on
Commit 510c82c · verified
1 Parent(s): 39f8623

Create README.md

Files changed (1)
  1. README.md +110 -0
README.md ADDED
@@ -0,0 +1,110 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - fr
+ - de
+ - es
+ - pt
+ - it
+ - sl
+ - el
+ - nl
+ library_name: gliner
+ pipeline_tag: token-classification
+ ---
+
+
+ # Model Card for GLiNER PII Domains
+
+ GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and to Large Language Models (LLMs), which, despite their flexibility, are costly and too large for resource-constrained scenarios.
+
+ This model was trained by fine-tuning `urchade/gliner_multi_pii-v1` on a synthetic dataset covering PII in the following domains: `healthcare`, `finance`, `legal`, `banking`, and `general`.
+
+ This model can recognize various types of *personally identifiable information* (PII), including but not limited to these entity types: `person`, `organization`, `phone number`, `address`, `passport number`, `email`, `credit card number`, `social security number`, `health insurance id number`, `date of birth`, `mobile phone number`, `bank account number`, `medication`, `cpf`, `driver's license number`, `tax identification number`, `medical condition`, `identity card number`, `national id number`, `ip address`, `email address`, `iban`, `credit card expiration date`, `username`, `health insurance number`, `registration number`, `student id number`, `insurance number`, `flight number`, `landline phone number`, `blood type`, `cvv`, `reservation number`, `digital signature`, `social media handle`, `license plate number`, `cnpj`, `postal code`, `serial number`, `vehicle registration number`, `credit card brand`, `fax number`, `visa number`, `insurance company`, `identity document number`, `transaction number`, `national health insurance number`, `cvc`, `birth certificate number`, `train ticket number`, and `passport expiration date`.
+
+
+ ## English example
+
+ ```python
+ from gliner import GLiNER
+
+ # Assumption: replace the placeholder with this model's Hugging Face repository id
+ trained_model = GLiNER.from_pretrained("<model-repo-id>")
+
+ text = """
+ Medical Record
+
+ Patient Name: John Doe
+ Date of Birth: 15-01-1985
+ Date of Examination: 20-05-2024
+ Social Security Number: 123-45-6789
+
+ Examination Procedure:
+ John Doe underwent a routine physical examination. The procedure included measuring vital signs (blood pressure, heart rate, temperature), a comprehensive blood panel, and a cardiovascular stress test. The patient also reported occasional headaches and dizziness, prompting a neurological assessment and an MRI scan to rule out any underlying issues.
+
+ Medication Prescribed:
+
+ Ibuprofen 200 mg: Take one tablet every 6-8 hours as needed for headache and pain relief.
+ Lisinopril 10 mg: Take one tablet daily to manage high blood pressure.
+ Next Examination Date:
+ 15-11-2024
+ """
+
+ # Labels for entity prediction
+ labels = ["name", "social security number", "date of birth", "date"]
+
+ # Perform entity prediction
+ entities = trained_model.predict_entities(text, labels, threshold=0.5)
+
+ # Display predicted entities and their labels
+ for entity in entities:
+     print(entity["text"], "=>", entity["label"])
+ ```
+
+ ```text
+ John Doe => name
+ 15-01-1985 => date of birth
+ 20-05-2024 => date
+ 123-45-6789 => social security number
+ John Doe => name
+ 15-11-2024 => date
+ ```
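A common follow-up to detection is masking the detected spans. The sketch below is not part of the committed README. It assumes each entity dictionary also carries character offsets under `start` and `end`, alongside the `text` and `label` keys used above. The helper name `redact_entities` is hypothetical, and the sample entities are hand-written for illustration rather than real model output.

```python
def redact_entities(text, entities):
    """Replace each entity span in `text` with a bracketed label placeholder."""
    redacted = text
    # Process spans from right to left so earlier offsets stay valid
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = "[" + entity["label"].upper() + "]"
        redacted = redacted[: entity["start"]] + placeholder + redacted[entity["end"] :]
    return redacted


# Hand-written sample in the assumed output format (not real model output)
sample_text = "Patient Name: John Doe, SSN: 123-45-6789"
sample_entities = [
    {"start": 14, "end": 22, "text": "John Doe", "label": "name"},
    {"start": 29, "end": 40, "text": "123-45-6789", "label": "social security number"},
]

print(redact_entities(sample_text, sample_entities))
```

This sketch assumes the spans do not overlap; overlapping predictions would need to be merged or resolved before masking.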
+
+ ## Dutch example
+
+ ```python
+ text = """
+ Medisch dossier
+
+ Naam patiënt: Jan de Vries
+ Geboortedatum: 15-01-1985
+ Datum van onderzoek: 20-05-2024
+ Burgerservicenummer: 987-65-4321
+
+ Onderzoeksprocedure:
+ Jan de Vries onderging een routine lichamelijk onderzoek. De procedure omvatte het meten van de vitale functies (bloeddruk, hartslag, temperatuur), een uitgebreid bloedonderzoek en een cardiovasculaire inspanningstest. De patiënt meldde ook af en toe hoofdpijn en duizeligheid, wat aanleiding gaf tot een neurologische beoordeling en een MRI-scan om eventuele onderliggende problemen uit te sluiten.
+
+ Voorgeschreven medicatie:
+
+ Paracetamol 500 mg: Neem één tablet elke 6-8 uur indien nodig voor hoofdpijn en pijnverlichting.
+ Amlodipine 5 mg: Neem één tablet dagelijks om hoge bloeddruk te beheersen.
+
+ Volgende onderzoekdatum:
+ 15-11-2024
+ """
+
+ # Labels for entity prediction
+ labels = ["naam", "burgerservicenummer", "geboortedatum", "datum"] # for v2.1 use capital case for better performance
+
+ # Perform entity prediction
+ entities = trained_model.predict_entities(text, labels, threshold=0.2)
+
+ # Display predicted entities and their labels
+ for entity in entities:
+     print(entity["text"], "=>", entity["label"])
+ ```
+
+ ```text
+ Jan de Vries => naam
+ 15-01-1985 => geboortedatum
+ 20-05-2024 => datum
+ 987-65-4321 => burgerservicenummer
+ Jan de Vries => naam
+ 15-11-2024 => datum
+ ```
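The two examples pass different `threshold` values (0.5 and 0.2). One possible way to experiment with cut-offs without re-running the model is to predict once with a low threshold and filter afterwards. The sketch below is not part of the committed README. It assumes each entity dictionary carries a confidence value under `score`; the helper name `filter_by_score` and the scores in the sample are hypothetical illustrations, not model output.

```python
def filter_by_score(entities, default_min=0.5, per_label_min=None):
    """Keep entities whose score reaches the minimum set for their label."""
    per_label_min = per_label_min or {}
    kept = []
    for entity in entities:
        # Fall back to the default minimum when the label has no override
        minimum = per_label_min.get(entity["label"], default_min)
        if entity["score"] >= minimum:
            kept.append(entity)
    return kept


# Hand-written sample with invented scores (not real model output)
sample_entities = [
    {"text": "Jan de Vries", "label": "naam", "score": 0.91},
    {"text": "20-05-2024", "label": "datum", "score": 0.34},
    {"text": "MRI-scan", "label": "datum", "score": 0.22},
]

kept = filter_by_score(sample_entities, default_min=0.5, per_label_min={"datum": 0.3})
for entity in kept:
    print(entity["text"], "=>", entity["label"])
```

With these invented scores, the first two entities are kept and the third is dropped because it falls below the per-label minimum for `datum`.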