Extract Legal Entities from Insurance Documents using BERT Transformers

This model is a BERT transformer fine-tuned for named entity recognition (NER) of legal entities in life insurance demand letters.

The dataset is publicly available at https://github.com/aws-samples/aws-legal-entity-extraction.git

The model extracts the following entities:

  • Law Firm
  • Law Office Address
  • Insurance Company
  • Insurance Company Address
  • Policy Holder Name
  • Beneficiary Name
  • Policy Number
  • Payout
  • Required Action
  • Sender
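Token-classification models usually learn entity types like these through a BIO tagging scheme. In BIO, "B-" marks the first word of an entity, "I-" marks the words that continue it, and "O" marks everything else. The sketch below builds such a label map from the list above. The BIO scheme and the upper-case tag names (e.g. LAW_FIRM) are assumptions for illustration. The labels the published model actually uses are in its config.id2label.

```python
# Entity types as listed in this model card.
ENTITY_TYPES = [
    "Law Firm", "Law Office Address", "Insurance Company",
    "Insurance Company Address", "Policy Holder Name", "Beneficiary Name",
    "Policy Number", "Payout", "Required Action", "Sender",
]


def build_bio_labels(entity_types):
    """Return (labels, label2id, id2label) for a BIO tagging scheme.

    The tag naming (upper case, spaces -> underscores) is a hypothetical
    convention, not necessarily the one used by the published model.
    """
    labels = ["O"]
    for name in entity_types:
        tag = name.upper().replace(" ", "_")  # "Law Firm" -> "LAW_FIRM"
        labels += [f"B-{tag}", f"I-{tag}"]
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return labels, label2id, id2label


labels, label2id, id2label = build_bio_labels(ENTITY_TYPES)
```

When setting up a fine-tuning run, the resulting dicts can be passed as id2label and label2id, together with num_labels=len(labels), to AutoModelForTokenClassification.from_pretrained.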

HF Space

The Space at https://huggingface.co/spaces/aimlnerd/legal-entity-ner-transformers exposes the model as a Gradio app. It also contains the training dataset and the training code.
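Outside the Gradio app, the model can be loaded with the transformers token-classification pipeline. The following is a minimal sketch under two assumptions: the transformers library is installed, and the model ID shown on this page (aimlnerd/bert-finetuned-legalentity-ner-accelerate) is used. The helper group_by_entity and its 0.5 score threshold are hypothetical and are not part of this repo.

```python
from collections import defaultdict

MODEL_ID = "aimlnerd/bert-finetuned-legalentity-ner-accelerate"


def load_ner_pipeline(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use).

    transformers is imported lazily so the pure helper below has no heavy deps.
    aggregation_strategy="simple" merges subword predictions into word spans.
    """
    from transformers import pipeline
    return pipeline("token-classification", model=model_id,
                    aggregation_strategy="simple")


def group_by_entity(predictions, min_score=0.5):
    """Collect aggregated pipeline output into {entity_group: [text, ...]}.

    Spans scoring below min_score (a hypothetical threshold) are dropped.
    """
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"].strip())
    return dict(grouped)
```

For example, group_by_entity(load_ner_pipeline()(letter_text)) returns a dict that maps each predicted entity group to the text spans found in letter_text, a demand letter given as a plain string. The group names come from the model's own label set.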

The dataset consists of legal requisition/demand letters for life insurance. However, the same approach can be applied to documents in any industry that may benefit from spatial data in NER training.

Data preprocessing

The OCRed data is stored as JSON in data/raw_data/annotations. I wrote source/services/ner/awscomprehend_2_ner_format.py to convert this JSON into a format suitable for HF TokenClassification.
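The repo's converter works on the AWS Comprehend/OCR JSON, whose exact schema is not reproduced here. The function below is a rough, hypothetical sketch of the core step. It turns character-offset entity annotations, given as (begin, end-exclusive, type), into the word-level tokens/ner_tags layout commonly used for HF token classification datasets. Both the input format and the whitespace tokenization are assumptions.

```python
import re


def spans_to_bio(text, spans):
    """Convert character-offset entity spans into word-level BIO tags.

    spans: list of (begin, end, entity_type), end exclusive (assumed format).
    Returns {"tokens": [...], "ner_tags": [...]} with string tags.
    """
    tokens, tags = [], []
    # Whitespace tokenization, keeping each word's character offsets.
    for match in re.finditer(r"\S+", text):
        start, end = match.span()
        tag = "O"
        for begin, stop, etype in spans:
            if start < stop and end > begin:  # word overlaps this entity span
                # The word containing the span start gets B-, later words get I-.
                tag = ("B-" if start <= begin else "I-") + etype
                break
        tokens.append(match.group())
        tags.append(tag)
    return {"tokens": tokens, "ner_tags": tags}
```

Before training, the string tags would still need to be mapped to integer ids, for example with a label2id dict.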

Fine-tuning the BERT Transformers model

source/services/ner/train/train.py fine-tunes the BERT model and uploads it to Hugging Face.
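Any BERT token-classification fine-tune has to deal with the WordPiece tokenizer splitting words into subwords, so word-level labels must be expanded to match. A common convention, used in the Hugging Face token classification tutorial, labels only the first subword of each word. The remaining subwords and the special tokens get -100, which PyTorch's cross-entropy loss ignores by default. The function below sketches that convention and is not necessarily what train.py does. Its word_ids argument is assumed to come from a fast tokenizer's BatchEncoding.word_ids().

```python
def align_labels_with_word_ids(word_ids, word_labels):
    """Expand word-level label ids to one label per subword token.

    word_ids: per-subword word index from a fast tokenizer, None for
              special tokens such as [CLS] and [SEP].
    word_labels: integer label id for each word.
    Returns a list aligned with word_ids; -100 marks positions the loss ignores.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(-100)  # special token or padding
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            aligned.append(-100)  # continuation subword: not scored
        previous = word_id
    return aligned
```

With a fast tokenizer this would be called as enc = tokenizer(words, is_split_into_words=True, truncation=True) followed by align_labels_with_word_ids(enc.word_ids(), word_label_ids).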

Model size: 108M parameters (F32 tensors, Safetensors format)
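For a rough sense of scale, storing 108M parameters as 32-bit floats (4 bytes each) takes about 432 MB for the weights alone. This excludes optimizer state and activations during training:

$$108 \times 10^{6}\ \text{params} \times 4\ \text{bytes/param} = 4.32 \times 10^{8}\ \text{bytes} \approx 432\ \text{MB}\ (\approx 412\ \text{MiB})$$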

Model ID: aimlnerd/bert-finetuned-legalentity-ner-accelerate (used by 1 Space)