Overview

This model detects personal information using a reduced label set compared to its base model. The smaller label set is intended to improve labeling precision when identifying personal information across multiple languages.


Features

  • Improved Precision: The model uses fewer labels than its base model, which improves the precision of the labeling procedure and makes the identification of sensitive information more reliable.

  • Model Versions:

      • Maximum Accuracy Focus: This version aims to achieve the highest possible accuracy in the detection process, making it suitable for applications where minimizing errors is crucial.

      • Maximum Precision Focus: This variant is designed to maximize the precision of the detection, ideal for scenarios where false positives are particularly undesirable.
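
To illustrate what a reduced label set means in practice, the sketch below collapses fine-grained labels into coarser classes. The label names and the mapping are hypothetical, chosen only for illustration; this README does not list the actual labels of this model or of its base model.

```python
# Hypothetical fine-grained -> reduced label mapping.
# These names are illustrative assumptions, not the model's actual label set.
REDUCED_LABELS = {
    "FIRSTNAME": "NAME",
    "LASTNAME": "NAME",
    "STREET": "ADDRESS",
    "CITY": "ADDRESS",
    "ZIPCODE": "ADDRESS",
}


def reduce_label(label: str) -> str:
    """Map a fine-grained label to its reduced class; unmapped labels pass through."""
    return REDUCED_LABELS.get(label, label)


def reduce_sequence(labels: list[str]) -> list[str]:
    """Apply the reduction to a whole sequence of token labels."""
    return [reduce_label(label) for label in labels]


if __name__ == "__main__":
    print(reduce_sequence(["FIRSTNAME", "LASTNAME", "O", "CITY"]))
```

With fewer classes to choose between, confusions such as first name versus last name no longer count as labeling errors, which is one way a smaller label set can raise precision.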


Installation

To run this model, you will need to install the dependencies:

pip install torch transformers safetensors

Usage

Load and run the model using PyTorch and transformers:

import json
from types import SimpleNamespace

from safetensors.torch import load_file
from transformers import AutoConfig, AutoModelForTokenClassification, BertTokenizerFast

# Load the config
config = AutoConfig.from_pretrained("folder_to_model")

# Initialize the model with the config (weights are still random at this point)
model = AutoModelForTokenClassification.from_config(config)

# Load the safetensors weights; load_file expects the path to a .safetensors file
state_dict = load_file("folder_to_tensors")

# Load the state dict into the model and switch to inference mode
model.load_state_dict(state_dict)
model.eval()

# Load the tokenizer
tokenizer = BertTokenizerFast.from_pretrained("google-bert/bert-base-multilingual-cased")

# Load the label mapper if needed
with open("pii_model/label_mapper.json", "r") as f:
    label_mapper_data = json.load(f)

# LabelMapper is not defined in this README; a plain namespace stands in for it here
label_mapper = SimpleNamespace(
    label_to_id=label_mapper_data["label_to_id"],
    # JSON object keys are strings, so convert the ids back to int
    id_to_label={int(k): v for k, v in label_mapper_data["id_to_label"].items()},
    num_labels=label_mapper_data["num_labels"],
)

# Process outputs for analysis...
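
One possible way to process the outputs is to merge consecutive tokens that share a label into character spans. The following is a minimal sketch, not the repository's own post-processing: the function name `group_entities`, the use of "O" as the non-PII label, and the optional "B-"/"I-" prefixes are all assumptions, since the label scheme is not documented here.

```python
def group_entities(text, offsets, label_ids, id_to_label, outside_label="O"):
    """Merge consecutive tokens with the same label into character spans.

    offsets:   (start, end) character offsets per token; special tokens are
               expected to have start == end, as fast tokenizers report them.
    label_ids: predicted label id per token.
    """
    entities = []
    current = None
    for (start, end), label_id in zip(offsets, label_ids):
        label = id_to_label[label_id]
        # Assumption: labels may carry a BIO prefix such as "B-" or "I-".
        if label[:2] in ("B-", "I-"):
            entity_type, begins = label[2:], label.startswith("B-")
        else:
            entity_type, begins = label, False

        if start == end or label == outside_label:
            # Special token or non-PII token: close any open span.
            if current is not None:
                entities.append(current)
                current = None
            continue

        if current is not None and current["label"] == entity_type and not begins:
            # Same entity type continues: extend the open span.
            current["end"] = end
        else:
            if current is not None:
                entities.append(current)
            current = {"label": entity_type, "start": start, "end": end}

    if current is not None:
        entities.append(current)
    for entity in entities:
        entity["text"] = text[entity["start"]:entity["end"]]
    return entities


# Possible usage with the objects loaded above (requires torch and transformers):
# enc = tokenizer(text, return_offsets_mapping=True, return_tensors="pt")
# offsets = enc.pop("offset_mapping")[0].tolist()
# label_ids = model(**enc).logits.argmax(-1)[0].tolist()
# entities = group_entities(text, offsets, label_ids, label_mapper.id_to_label)
```

Working on character offsets rather than token strings avoids having to undo WordPiece splitting when reconstructing the detected text.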

Evaluation

  • Accuracy Model: Focused on minimizing overall errors; optimized for the highest accuracy.
  • Precision Model: Designed to minimize false positives; optimized for precision-driven applications.
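
The difference between the two targets can be made concrete with a small token-level sketch. The helper functions and the toy labels below are illustrative assumptions, not the evaluation code or results of this repository; "O" is assumed to mark non-PII tokens.

```python
def accuracy(true_labels, pred_labels):
    """Share of all tokens whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return correct / len(true_labels)


def precision(true_labels, pred_labels, outside_label="O"):
    """Share of tokens flagged as PII whose predicted label is correct."""
    flagged = [(t, p) for t, p in zip(true_labels, pred_labels) if p != outside_label]
    if not flagged:
        return 0.0  # nothing was flagged, so there is nothing to be precise about
    return sum(t == p for t, p in flagged) / len(flagged)


if __name__ == "__main__":
    # Made-up toy data: one missed NAME token and one false positive.
    true = ["O", "NAME", "NAME", "O", "EMAIL", "O"]
    pred = ["O", "NAME", "O", "O", "EMAIL", "NAME"]
    print(f"accuracy:  {accuracy(true, pred):.3f}")
    print(f"precision: {precision(true, pred):.3f}")
```

A missed PII token lowers accuracy but leaves precision untouched, while a false positive lowers both, which is why a precision-focused variant can afford to be more conservative about what it flags.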

Disclaimer

The publisher of this repository is not affiliated with Ai4Privacy or Ai Suisse SA.

Honorary Mention

This repo was created during the Hackathon organized by NeuralWave.

Model size: 278M params (Safetensors, tensor type F32)