license: apache-2.0
Extract Legal Entities from Insurance Documents using BERT transformers
This model is a fine-tuned BERT transformer for named entity recognition (NER) of legal entities in life insurance demand letters.
The dataset is publicly available at https://github.com/aws-samples/aws-legal-entity-extraction.git
The model extracts the following entities:
- Law Firm
- Law Office Address
- Insurance Company
- Insurance Company Address
- Policy Holder Name
- Beneficiary Name
- Policy Number
- Payout
- Required Action
- Sender
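Token classification models usually predict one tag per token rather than one entity name per document, so the entity types listed above are typically expanded into a BIO tag set (`B-` for the first token of an entity, `I-` for the following tokens, `O` for everything else). The sketch below shows one way to derive such a tag set. The tag spellings (for example `B-LAW_FIRM`) and the helper name `build_label_maps` are assumptions made for illustration; the labels actually stored in this model's configuration may be named differently.

```python
# Entity types as listed in this README.
ENTITY_TYPES = [
    "Law Firm",
    "Law Office Address",
    "Insurance Company",
    "Insurance Company Address",
    "Policy Holder Name",
    "Beneficiary Name",
    "Policy Number",
    "Payout",
    "Required Action",
    "Sender",
]


def build_label_maps(entity_types):
    """Build BIO label <-> id mappings (hypothetical helper, not from the repo)."""
    labels = ["O"]  # "outside": the token belongs to no entity
    for name in entity_types:
        tag = name.upper().replace(" ", "_")  # assumed tag spelling
        labels.append(f"B-{tag}")  # first token of an entity
        labels.append(f"I-{tag}")  # continuation token of an entity
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


label2id, id2label = build_label_maps(ENTITY_TYPES)
```

With ten entity types this yields 21 labels. Mappings of this shape are what the `id2label` and `label2id` arguments of a Transformers model configuration expect.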
HF Space
The Space at https://huggingface.co/spaces/aimlnerd/legal-entity-ner-transformers exposes the model as a Gradio app and contains the training dataset and the training code.
The dataset consists of legal requisition/demand letters for life insurance. However, this approach can be applied to any industry and document type that may benefit from spatial data in NER training.
Data preprocessing
The OCRed data is available as JSON in data/raw_data/annotations.
I wrote this code to convert the JSON data into a format suitable for HF TokenClassification:
source/services/ner/awscomprehend_2_ner_format.py
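To illustrate the kind of conversion involved, here is a minimal sketch that turns character-offset entity annotations into word-level BIO tags. It is not the code from the script above. The input shape (a text string plus a list of dicts with `start`, `end`, and `label` keys), the whitespace tokenization, and the function name `spans_to_bio` are all assumptions; the actual annotation JSON may use different field names and carry extra information such as page geometry.

```python
import re


def spans_to_bio(text, entities):
    """Convert character-offset entity spans into word-level BIO tags.

    Hypothetical helper. `entities` is assumed to be a list of dicts with
    "start" (inclusive), "end" (exclusive), and "label" keys.
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):  # whitespace tokenization
        tok_start, tok_end = match.span()
        tag = "O"
        for ent in entities:
            # A token belongs to an entity if their character ranges overlap.
            if tok_start < ent["end"] and tok_end > ent["start"]:
                prefix = "B" if tok_start <= ent["start"] else "I"
                tag = f"{prefix}-{ent['label']}"
                break
        tokens.append(match.group())
        tags.append(tag)
    return {"tokens": tokens, "ner_tags": tags}


# Made-up example sentence, not taken from the dataset.
text = "Policy number AB-1234 is held by Jane Doe."
entities = [
    {"start": text.index("AB-1234"), "end": text.index("AB-1234") + 7,
     "label": "POLICY_NUMBER"},
    {"start": text.index("Jane"), "end": text.index("Jane") + 8,
     "label": "POLICY_HOLDER_NAME"},
]
example = spans_to_bio(text, entities)
```

The result has parallel `tokens` and `ner_tags` lists, which is the layout commonly used for token classification datasets. Note that the trailing period stays attached to `Doe.` because the sketch splits on whitespace only.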
Fine-tuning the BERT Transformers model
source/services/ner/train/train.py
This code fine-tunes the BERT model and uploads it to Hugging Face.
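One step that fine-tuning BERT for NER commonly needs is aligning word-level labels with subword tokens, because the tokenizer may split one word into several pieces and adds special tokens such as `[CLS]` and `[SEP]`. The sketch below shows a common convention: label only the first piece of each word and mask the rest with `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. Whether train.py follows this exact convention is an assumption, and `align_labels_with_tokens` is a hypothetical helper name. The `word_ids` argument has the shape returned by the `word_ids()` method of a fast tokenizer's encoding.

```python
def align_labels_with_tokens(word_ids, word_labels, ignore_index=-100):
    """Map word-level label ids onto subword tokens (hypothetical helper).

    word_ids:    one entry per subword token; None marks a special token,
                 otherwise the index of the word the token came from.
    word_labels: one label id per original word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(ignore_index)  # special token: excluded from the loss
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first piece keeps the label
        else:
            aligned.append(ignore_index)  # later pieces of the same word are masked
        previous = word_id
    return aligned


# Three words where the second word was split into two subword pieces.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 3, 4]
aligned = align_labels_with_tokens(word_ids, word_labels)
```

An alternative design is to give continuation pieces the matching `I-` label instead of masking them. Either choice works as long as training and evaluation use the same one.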