King-Harry committed on
Commit b2116cb · verified · 1 Parent(s): 2aaeaca

Update README.md

Files changed (1)
  1. README.md +11 -11
README.md CHANGED
@@ -4,7 +4,7 @@ Ninja Masker 2

  # Model Card: Ninja-Masker-2-PII-Redaction

- ## Model Overview

  **Model Name:** Ninja-Masker-2-PII-Redaction
  **Model Type:** Language Model for PII Redaction
@@ -13,19 +13,19 @@ Ninja Masker 2

  **Model Repository:** [Hugging Face Hub - Ninja-Masker-2-PII-Redaction](https://huggingface.co/King-Harry/Ninja-Masker-2-PII-Redaction)

- ### Model Description

  Ninja-Masker-2-PII-Redaction is an updated fine-tuned language model designed to identify and redact Personally Identifiable Information (PII) from text data. The model is based on the Meta-Llama-3.1-8B architecture and has been fine-tuned on a dataset of over 30,000 input-output pairs to perform accurate PII masking using a set of predefined tags. It is nice and small and thus fairly cost efficient, yet powerful.

- ### Preprocessing

  The training data was formatted using a specific Alpaca-style prompt structure. Each prompt was paired with an instruction and input context, and the model was trained to generate the appropriate redacted output. The model was trained on a variety of PII types, including but not limited to names, email addresses, phone numbers, and credit card information.

- ### Quantization and Optimization

  To optimize performance and reduce memory usage, the model was fine-tuned using 4-bit quantization. Additional optimizations included the use of Flash Attention (Xformers) and gradient checkpointing, which allowed for efficient training and inference.

- ### Training Details

  - **Dataset:** HarryRoy/Ninja-Redact-2-large (Custom PII redaction dataset)
  - **Training Environment:** Google Colab, NVIDIA A100 GPU
@@ -38,27 +38,27 @@ To optimize performance and reduce memory usage, the model was fine-tuned using
  - Epochs: 1 (500 steps)
  - Optimizer: AdamW 8-bit

- ### Model Performance

  The model was evaluated based on its ability to accurately redact PII from text while maintaining the original context and meaning. The fine-tuning process resulted in a model that effectively identifies and replaces PII with the appropriate tags in various text scenarios.

- ### Use Cases

  - **Data Anonymization:** Useful for redacting PII in datasets before sharing or analysis.
  - **Email and Document Redaction:** Can be integrated into email processing systems or document management workflows to automatically redact sensitive information.
  - **Customer Support:** Enhances customer support systems by ensuring PII is automatically redacted in customer communications.

- ### Limitations

  - **Tag Set:** The model relies on a predefined set of tags for redaction. It may not recognize PII types outside of this set.
  - **Context Dependence:** While the model performs well in most scenarios, its accuracy may decrease with highly complex or ambiguous input contexts.
  - **Inference Speed:** Depending on the hardware, the model's inference speed may vary, especially for long sequences.

- ### Ethical Considerations

  The model is designed for responsible data management, ensuring that sensitive information is properly anonymized. However, users should be aware of the limitations and should not rely solely on automated redaction for highly sensitive data.

- ### How to Use

  To use this model, you can load it from the Hugging Face Hub and integrate it into your Python or API-based applications. Below is an example of how to load and use the model:

@@ -119,7 +119,7 @@ print(redacted_text[0])
  ```


- ### Citation

  If you use this model, please consider citing the model repository:

 

  # Model Card: Ninja-Masker-2-PII-Redaction

+ ## 🧠 Model Overview

  **Model Name:** Ninja-Masker-2-PII-Redaction
  **Model Type:** Language Model for PII Redaction
 

  **Model Repository:** [Hugging Face Hub - Ninja-Masker-2-PII-Redaction](https://huggingface.co/King-Harry/Ninja-Masker-2-PII-Redaction)

+ ### 📝 Model Description

  Ninja-Masker-2-PII-Redaction is an updated fine-tuned language model designed to identify and redact Personally Identifiable Information (PII) from text data. The model is based on the Meta-Llama-3.1-8B architecture and has been fine-tuned on a dataset of over 30,000 input-output pairs to perform accurate PII masking using a set of predefined tags. It is nice and small and thus fairly cost efficient, yet powerful.

+ ### 🛠️ Preprocessing

  The training data was formatted using a specific Alpaca-style prompt structure. Each prompt was paired with an instruction and input context, and the model was trained to generate the appropriate redacted output. The model was trained on a variety of PII types, including but not limited to names, email addresses, phone numbers, and credit card information.
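The Alpaca-style prompt structure described above can be sketched in Python as follows. This is a minimal illustration assuming the widely used Alpaca "instruction + input" template; the exact template wording and instruction text used to train this model are not shown in this README, and the helper names `build_prompt` and `build_training_example` are hypothetical.

```python
# Minimal sketch of an Alpaca-style prompt builder.
# ASSUMPTION: the common Alpaca "instruction + input" template. The exact
# template and instruction wording used for Ninja-Masker-2 are not confirmed.

ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{response}"
)


def build_prompt(instruction: str, text: str, response: str = "") -> str:
    """Fill the template. Leave `response` empty at inference time so the
    model generates the redacted output after '### Response:'."""
    return ALPACA_TEMPLATE.format(instruction=instruction, input=text, response=response)


def build_training_example(instruction: str, text: str, redacted: str, eos_token: str) -> str:
    """Training string: prompt plus the redacted target, terminated with the
    tokenizer's EOS token so the model learns where its answer ends."""
    return build_prompt(instruction, text, redacted) + eos_token


if __name__ == "__main__":
    # Hypothetical instruction, for illustration only.
    instruction = "Redact all PII in the input using the predefined tags."
    print(build_prompt(instruction, "Contact Jane Doe at jane@example.com."))
```

At inference time the response slot stays empty so generation continues after `### Response:`. Appending the EOS token to training examples is a common convention for this template rather than a confirmed detail of this model's pipeline.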

+ ### ⚙️ Quantization and Optimization

  To optimize performance and reduce memory usage, the model was fine-tuned using 4-bit quantization. Additional optimizations included the use of Flash Attention (Xformers) and gradient checkpointing, which allowed for efficient training and inference.
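To put the memory saving from 4-bit quantization in perspective, here is a rough weight-only estimate. It assumes about 8.0 billion parameters (Llama 3.1 8B, rounded) and deliberately ignores quantization scaling constants, activations, optimizer state, adapters, and the KV cache, so real usage is higher. `weight_memory_gb` is a hypothetical helper.

```python
# Back-of-the-envelope weight-memory estimate for an ~8B-parameter model.
# ASSUMPTIONS: ~8.0e9 parameters (rounded); weights only. Ignores the small
# per-block constants 4-bit schemes store, activations, optimizer state,
# LoRA adapters, and the KV cache.


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    params = 8.0e9
    for label, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label:>5}: ~{weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions the weights shrink from roughly 16 GB in bf16 to about 4 GB at 4 bits, a 4× reduction in weight storage.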

+ ### 📉 Training Details

  - **Dataset:** HarryRoy/Ninja-Redact-2-large (Custom PII redaction dataset)
  - **Training Environment:** Google Colab, NVIDIA A100 GPU
 
  - Epochs: 1 (500 steps)
  - Optimizer: AdamW 8-bit

+ ### 🚀 Model Performance

  The model was evaluated based on its ability to accurately redact PII from text while maintaining the original context and meaning. The fine-tuning process resulted in a model that effectively identifies and replaces PII with the appropriate tags in various text scenarios.

+ ### 💡 Use Cases

  - **Data Anonymization:** Useful for redacting PII in datasets before sharing or analysis.
  - **Email and Document Redaction:** Can be integrated into email processing systems or document management workflows to automatically redact sensitive information.
  - **Customer Support:** Enhances customer support systems by ensuring PII is automatically redacted in customer communications.

+ ### ⚠️ Limitations

  - **Tag Set:** The model relies on a predefined set of tags for redaction. It may not recognize PII types outside of this set.
  - **Context Dependence:** While the model performs well in most scenarios, its accuracy may decrease with highly complex or ambiguous input contexts.
  - **Inference Speed:** Depending on the hardware, the model's inference speed may vary, especially for long sequences.

+ ### ⚖️ Ethical Considerations

  The model is designed for responsible data management, ensuring that sensitive information is properly anonymized. However, users should be aware of the limitations and should not rely solely on automated redaction for highly sensitive data.

+ ### 📖 How to Use

  To use this model, you can load it from the Hugging Face Hub and integrate it into your Python or API-based applications. Below is an example of how to load and use the model:

 
  ```


+ ### 📄 Citation

  If you use this model, please consider citing the model repository: