---
license: apache-2.0
---

# Extract Legal Entities from Insurance Documents using BERT Transformers

This model is a fine-tuned BERT Transformer for named entity recognition (NER) of legal entities in life insurance demand letters.

The dataset is publicly available here:
https://github.com/aws-samples/aws-legal-entity-extraction.git

The model extracts the following entities:

* Law Firm
* Law Office Address
* Insurance Company
* Insurance Company Address
* Policy Holder Name
* Beneficiary Name
* Policy Number
* Payout
* Required Action
* Sender
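
For token classification, entity types like these are typically expanded into BIO tags (`B-` for the first token of an entity, `I-` for the tokens that continue it, `O` for everything else). The sketch below shows one way to build such a label set. The upper-case label strings are assumptions made for illustration and may not match the label names the model actually uses.

```python
# Entity types from the list above, written as hypothetical label strings.
ENTITY_TYPES = [
    "LAW_FIRM",
    "LAW_OFFICE_ADDRESS",
    "INSURANCE_COMPANY",
    "INSURANCE_COMPANY_ADDRESS",
    "POLICY_HOLDER_NAME",
    "BENEFICIARY_NAME",
    "POLICY_NUMBER",
    "PAYOUT",
    "REQUIRED_ACTION",
    "SENDER",
]

# "O" marks tokens outside any entity; each entity type gets a B- and an I- tag.
labels = ["O"] + [f"{prefix}-{entity}" for entity in ENTITY_TYPES for prefix in ("B", "I")]

# Mappings of the kind a token classification model config usually carries.
label2id = {label: idx for idx, label in enumerate(labels)}
id2label = {idx: label for label, idx in label2id.items()}

print(len(labels))
```

With ten entity types this yields 21 labels: two per entity plus `O`.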

## HF Space
https://huggingface.co/spaces/aimlnerd/legal-entity-ner-transformers
This Space exposes the model as a Gradio app and also contains the training dataset and the training code.

The dataset consists of legal requisition/demand letters for life insurance. However, the same approach can be applied to any industry or document type that may benefit from spatial data in NER training.

## Data preprocessing
The OCRed data is available as JSON in ```data/raw_data/annotations```.
I wrote this code to convert the JSON data into a format suitable for HF TokenClassification:
```source/services/ner/awscomprehend_2_ner_format.py```
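
To give a rough idea of what such a conversion involves, here is a minimal sketch that turns character-offset annotations into word-level BIO tags. It is not the script above: the function name `spans_to_bio`, the `(start, end, label)` span format, the whitespace tokenization, and the sample sentence are all assumptions made for illustration, and the real annotation JSON may be structured differently.

```python
import re


def spans_to_bio(text, spans):
    """Convert character spans (start, end, label), end exclusive, to word-level BIO tags."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):  # naive whitespace tokenization
        tok_start, tok_end = match.span()
        tag = "O"
        for start, end, label in spans:
            if tok_start < end and tok_end > start:  # token overlaps the span
                # The token that covers the span start gets B-, later ones get I-.
                tag = ("B-" if tok_start <= start else "I-") + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


# Hypothetical example sentence and annotations.
text = "Policy 12345 was issued by Acme Life Insurance"
spans = [(7, 12, "POLICY_NUMBER"), (27, 46, "INSURANCE_COMPANY")]

tokens, tags = spans_to_bio(text, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

The resulting parallel `tokens` and `tags` lists are the shape that token classification datasets commonly use, with one tag per word.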

## Finetuning BERT Transformers model
```source/services/ner/train/train.py```
This script fine-tunes the BERT model and uploads it to the Hugging Face Hub.
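
One step that fine-tuning BERT for NER usually requires is aligning word-level labels with subword tokens, because the tokenizer may split one word into several pieces. The sketch below shows a common way to do this, assuming a list of word indices of the kind returned by a fast tokenizer's `word_ids()` method. The function name `align_labels` and the sample values are hypothetical, and this is not necessarily how `train.py` handles it.

```python
def align_labels(word_ids, word_label_ids, ignore_index=-100):
    """Map word-level label ids onto subword tokens.

    Special tokens (word id None) and non-initial subwords receive ignore_index,
    which PyTorch's cross-entropy loss skips by default.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(ignore_index)  # e.g. [CLS], [SEP], padding
        elif word_id != previous:
            aligned.append(word_label_ids[word_id])  # first subword of a word
        else:
            aligned.append(ignore_index)  # continuation subword
        previous = word_id
    return aligned


# Hypothetical input: three words, the second split into two subwords.
word_ids = [None, 0, 1, 1, 2, None]
word_label_ids = [0, 3, 0]

aligned = align_labels(word_ids, word_label_ids)
print(aligned)  # [-100, 0, 3, -100, 0, -100]
```

Labelling only the first subword of each word is one convention; another is to repeat the word's label on every subword.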