---
license: apache-2.0
---
# Extract Legal Entities from Insurance Documents using BERT Transformers
This model is a fine-tuned BERT transformer for named entity recognition (NER) of legal entities in life insurance demand letters.
The dataset is publicly available here:
https://github.com/aws-samples/aws-legal-entity-extraction.git
The model extracts the following entities:
* Law Firm
* Law Office Address
* Insurance Company
* Insurance Company Address
* Policy Holder Name
* Beneficiary Name
* Policy Number
* Payout
* Required Action
* Sender
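
For token classification, each entity type is usually expanded into per-token tags. The snippet below is a minimal sketch that builds `id2label` / `label2id` mappings from the list above, assuming a BIO tagging scheme and upper-case tag names; the tag names and the helper `build_label_maps` are illustrative assumptions, and the labels actually used by this model may differ.

```python
# Entity types copied from the list above.
ENTITY_TYPES = [
    "Law Firm",
    "Law Office Address",
    "Insurance Company",
    "Insurance Company Address",
    "Policy Holder Name",
    "Beneficiary Name",
    "Policy Number",
    "Payout",
    "Required Action",
    "Sender",
]


def build_label_maps(entity_types):
    """Build id2label / label2id dicts (hypothetical helper, BIO scheme assumed)."""
    labels = ["O"]  # "O" marks tokens outside any entity
    for name in entity_types:
        tag = name.upper().replace(" ", "_")
        labels.append(f"B-{tag}")  # first token of an entity
        labels.append(f"I-{tag}")  # continuation tokens of the same entity
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


id2label, label2id = build_label_maps(ENTITY_TYPES)
```

Mappings of this shape are what a token classification model config typically carries so that predicted class ids can be turned back into readable tags.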
## HF Space
https://huggingface.co/spaces/aimlnerd/legal-entity-ner-transformers
This Space exposes the model as a Gradio app and contains the training dataset and the training code.
The dataset consists of legal requisition/demand letters for life insurance; however, this approach can be used in any industry and for any document type that may benefit from spatial data in NER training.
## Data preprocessing
The OCRed data is stored as JSON in ```data/raw_data/annotations```.
I wrote this script to convert the JSON data into a format suitable for HF TokenClassification:
```source/services/ner/awscomprehend_2_ner_format.py```
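
To illustrate the kind of conversion involved, here is a minimal sketch that turns character-offset entity spans into word tokens with BIO tags. It assumes annotations reduced to `(start, end, label)` tuples with an exclusive `end`, splits on whitespace, and uses made-up sample text; the real JSON schema in `data/raw_data/annotations` and the logic in the script above may differ, and `spans_to_bio` is a hypothetical helper.

```python
import re


def spans_to_bio(text, spans):
    """Convert (start, end, label) character spans into tokens and BIO tags.

    Hypothetical helper: a word is assigned to a span if the word's first
    character falls inside that span.
    """
    tokens, tags = [], []
    seen = set()  # indices of spans that already received their B- tag
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for i, (start, end, label) in enumerate(spans):
            if start <= match.start() < end:
                prefix = "I" if i in seen else "B"
                tag = f"{prefix}-{label}"
                seen.add(i)
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


# Made-up example sentence and offsets, for illustration only.
text = "Policy 12345 is held by Jane Doe"
spans = [(7, 12, "POLICY_NUMBER"), (24, 32, "POLICY_HOLDER_NAME")]
tokens, tags = spans_to_bio(text, spans)
```

The resulting parallel `tokens` and `tags` lists match the word-level layout that token classification datasets commonly use.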
## Fine-tuning the BERT Transformers model
```source/services/ner/train/train.py```
This script fine-tunes the BERT model and uploads it to Hugging Face.
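
One step that fine-tuning for token classification generally needs is aligning word-level labels with BERT's sub-word tokens. The sketch below shows one common convention, assuming the word ids come from a fast tokenizer's `word_ids()` method and that only the first sub-token of each word is labeled; `align_labels` is a hypothetical helper and `train.py` may handle this differently.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def align_labels(word_ids, word_labels):
    """Map word-level label ids onto sub-word tokens (hypothetical helper).

    word_ids: one entry per token, None for special tokens such as [CLS]/[SEP].
    word_labels: one label id per original word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)  # special token: excluded from the loss
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first sub-token keeps the label
        else:
            aligned.append(IGNORE_INDEX)  # later sub-tokens of the same word are masked
        previous = word_id
    return aligned


# Illustrative input: the word at index 1 was split into two sub-tokens.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 3, 4]
aligned = align_labels(word_ids, word_labels)
```

Masking the trailing sub-tokens keeps the loss and the evaluation metrics at one prediction per word.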