license: apache-2.0

Extract Legal Entities from Insurance Documents using BERT transformers

This model is a fine-tuned BERT transformer for named entity recognition (NER) of legal entities in life insurance demand letters.

The dataset is publicly available at https://github.com/aws-samples/aws-legal-entity-extraction.git

The model extracts the following entities:

  • Law Firm
  • Law Office Address
  • Insurance Company
  • Insurance Company Address
  • Policy Holder Name
  • Beneficiary Name
  • Policy Number
  • Payout
  • Required Action
  • Sender
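
For token classification, entity types like these are usually expanded into per-token tags. The following is a minimal sketch, assuming a BIO tagging scheme and upper-case label names with underscores. Both are assumptions made for illustration; the labels the model actually uses may differ.

```python
# Entity types listed in this model card.
ENTITY_TYPES = [
    "Law Firm", "Law Office Address", "Insurance Company",
    "Insurance Company Address", "Policy Holder Name", "Beneficiary Name",
    "Policy Number", "Payout", "Required Action", "Sender",
]


def build_bio_labels(entity_types):
    """Return the BIO tag list: "O" plus a B-/I- pair for each entity type."""
    labels = ["O"]
    for name in entity_types:
        tag = name.upper().replace(" ", "_")  # assumed naming convention
        labels.extend([f"B-{tag}", f"I-{tag}"])
    return labels


LABELS = build_bio_labels(ENTITY_TYPES)
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {label: i for i, label in ID2LABEL.items()}
```

Mappings of this shape are what a token classification model config typically takes as its id-to-label and label-to-id tables.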

HF Space

https://huggingface.co/spaces/aimlnerd/legal-entity-ner-transformers This Space exposes the model as a Gradio app and contains the training dataset and the training code.

The dataset consists of legal requisition/demand letters for life insurance; however, this approach can be applied to documents in any industry that may benefit from spatial data in NER training.

Data preprocessing

The OCRed data is available as JSON in data/raw_data/annotations. I wrote source/services/ner/awscomprehend_2_ner_format.py to convert the JSON data into a format suitable for HF TokenClassification.
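
To illustrate the kind of conversion involved, here is a minimal sketch that turns character-offset annotations into token-level BIO tags. The input schema (`begin`, `end`, `type`) and the function name `spans_to_bio` are hypothetical; the actual JSON layout and the logic in awscomprehend_2_ner_format.py may differ.

```python
import re


def spans_to_bio(text, entities):
    """Split text on whitespace and tag each token with a BIO label.

    entities: list of dicts with "begin"/"end" character offsets
    (end exclusive) and a "type" string. This schema is an assumption.
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        start, end = match.span()
        tag = "O"
        for ent in entities:
            # A token belongs to an entity if the two spans overlap.
            if start < ent["end"] and end > ent["begin"]:
                prefix = "B" if start <= ent["begin"] else "I"
                tag = f"{prefix}-{ent['type']}"
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


text = "Policy number 12345 issued by Acme Life"
entities = [
    {"begin": 14, "end": 19, "type": "POLICY_NUMBER"},
    {"begin": 30, "end": 39, "type": "INSURANCE_COMPANY"},
]
tokens, tags = spans_to_bio(text, entities)
```

Whitespace splitting keeps the sketch short; OCR output with punctuation attached to words may need a finer tokenizer before tagging.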

Finetuning BERT Transformers model

source/services/ner/train/train.py This code fine-tunes the BERT model and uploads it to Hugging Face.
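
One step that fine-tuning BERT for token classification commonly needs is aligning word-level labels with subword tokens. The sketch below assumes a list of word indices per subword token, such as a fast tokenizer's `word_ids()` returns, and labels only the first subword of each word. Whether train.py uses this exact strategy is an assumption; `align_labels` is a hypothetical helper name.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def align_labels(word_ids, word_label_ids):
    """Map word-level label ids onto subword tokens.

    word_ids: one entry per subword token, giving the index of its source
    word, or None for special tokens such as [CLS] and [SEP].
    Only the first subword of each word keeps its label; the rest are masked.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(word_label_ids[word_id])
        previous = word_id
    return aligned


# Three words, the second split into two subwords, plus special tokens.
example = align_labels([None, 0, 1, 1, 2, None], [0, 3, 4])
```

Masked positions are skipped by the loss, so each word contributes one training signal regardless of how many subwords it is split into.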