---
language:
- en
---

# Text Classification Toxicity

This model is a fine-tuned version of [nreimers/MiniLMv2-L6-H384-distilled-from-BERT-Large](https://huggingface.co/nreimers/MiniLMv2-L6-H384-distilled-from-BERT-Large) on the [Jigsaw 1st Kaggle competition](https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge) dataset, using [unitary/toxic-bert](https://huggingface.co/unitary/toxic-bert) as the teacher model.
The quantized version in ONNX format can be found [here](https://huggingface.co/minuva/MiniLMv2-toxic-jigaw-lite-onnx).

The model predicts only two labels (toxicity and severe toxicity). For the model with all labels, refer to this [page]().
12
+
13
+ # Load the Model
14
+
15
+ ```py
16
+ from transformers import pipeline
17
+
18
+ pipe = pipeline(model='minuva/MiniLMv2-toxic-jijgsaw-lite', task='text-classification')
19
+ pipe("This is pure trash")
20
+ # [{'label': 'toxic', 'score': 0.9383478164672852}]
21
+ ```

# Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 6e-05
- train_batch_size: 48
- eval_batch_size: 48
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10
- warmup_ratio: 0.1

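With `lr_scheduler_type: linear` and `warmup_ratio: 0.1`, the learning rate ramps from 0 up to 6e-05 over the first 10% of steps, then decays linearly back to 0. A minimal plain-Python sketch of that schedule (the function name and the 1000-step total are illustrative, not taken from the training run):

```python
def linear_lr(step, total_steps, base_lr=6e-5, warmup_ratio=0.1):
    """Linear warmup for the first warmup_ratio of steps, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# With 10 epochs the total step count depends on dataset size; 1000 is illustrative.
print(linear_lr(50, 1000))    # mid-warmup: 3e-05
print(linear_lr(1000, 1000))  # end of training: 0.0
```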

# Metrics (comparison with teacher model)

| Teacher (params) | Student (params) | Set (metric) | Score (teacher) | Score (student) |
|---------------------------|-----------------------------------|----------------|-----------------|-----------------|
| unitary/toxic-bert (110M) | MiniLMv2-toxic-jijgsaw-lite (23M) | Test (ROC_AUC) | 0.982677        | 0.9815          |

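Put differently, the student keeps nearly all of the teacher's test ROC AUC at roughly a fifth of the parameter count. A quick back-of-the-envelope check using only the numbers from the table above:

```python
teacher_auc, student_auc = 0.982677, 0.9815
teacher_params, student_params = 110e6, 23e6

# Fraction of the teacher's ROC AUC retained by the student.
auc_retention = student_auc / teacher_auc
# Fraction of the teacher's parameter count used by the student.
size_ratio = student_params / teacher_params

print(f"AUC retained: {auc_retention:.1%}")          # ~99.9%
print(f"Model size: {size_ratio:.0%} of the teacher")  # ~21%
```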

# Deployment

Check this [repository](https://github.com/minuva/toxicity-prediction-serverless) to see how to deploy this model in a serverless environment with fast CPU inference and low resource usage.