SetFit with sentence-transformers/paraphrase-mpnet-base-v2

This is a SetFit model for text classification. It uses sentence-transformers/paraphrase-mpnet-base-v2 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
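
As a rough illustration of step 1, the sketch below builds the kind of sentence pairs that contrastive fine-tuning consumes: two texts with the same label form a positive pair (target similarity 1.0), and two texts with different labels form a negative pair (target 0.0). The function name `make_contrastive_pairs` and the four-example set are hypothetical and chosen only for illustration; the pair sampling inside SetFit may differ in detail.

```python
from itertools import combinations


def make_contrastive_pairs(examples):
    """Return (text_a, text_b, target) triples for every unordered pair.

    target is 1.0 when both texts share a label, otherwise 0.0.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Hypothetical few-shot set drawn from the label examples in this card.
few_shot = [
    ("Please go to the 3rd floor.", "RequestMoveToFloor"),
    ("Can you take me to floor 5?", "RequestMoveToFloor"),
    ("Stop the elevator.", "Stop"),
    ("Sure.", "Confirm"),
]

pairs = make_contrastive_pairs(few_shot)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
print(len(pairs), len(positives), len(negatives))  # 6 1 5
```

Step 2 then fits the classification head on embeddings produced by the fine-tuned Sentence Transformer.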

Model Details

Model Description

Model Sources

Model Labels

Label Examples
RequestMoveToFloor
  • 'Please go to the 3rd floor.'
  • 'Can you take me to floor 5?'
  • 'I need to go to the 8th floor.'
RequestMoveToFloorByX
  • 'Go one floor up'
  • 'Take me up two floors'
  • 'Move me down one level'
Confirm
  • "Yes, that's right."
  • 'Sure.'
  • 'Exactly.'
RequestEmployeeLocation
  • 'Where is Erik Velldal’s office?'
  • 'Which floor is Andreas Austeng on?'
  • 'Can you tell me where Birthe Soppe’s office is?'
CurrentFloor
  • 'Which floor are we on?'
  • 'What floor is this?'
  • 'Are we on the 5th floor?'
Stop
  • 'Stop the elevator.'
  • "Wait, don't go to that floor."
  • 'No, not that floor.'
OutOfCoverage
  • "What's the capital of France?"
  • 'How many floors does this building have?'
  • 'Can you make a phone call for me?'

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("victomoe/setfit-intent-classifier-2")
# Run inference
preds = model("Absolutely.")
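
A prediction is usually only the first step, since an application then has to act on it. The sketch below routes a predicted label to a response. It assumes the model returns the label as a string matching the names under Label Examples, and it accepts any callable as `predict` so that it can be tried without downloading the model. `route_intent`, the response strings and `fake_predict` are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical responses keyed by label names listed in this card.
HANDLERS: Dict[str, str] = {
    "Confirm": "Proceeding.",
    "Stop": "Stopping the elevator.",
    "CurrentFloor": "Reporting the current floor.",
}
FALLBACK = "Sorry, I cannot help with that."


def route_intent(predict: Callable[[str], object], utterance: str) -> str:
    """Classify an utterance and return the response for its label."""
    label = str(predict(utterance))  # assumes the prediction is a label string
    return HANDLERS.get(label, FALLBACK)


def fake_predict(text: str) -> str:
    """Stand-in for the loaded model, for illustration only."""
    return "Confirm" if text == "Absolutely." else "OutOfCoverage"


print(route_intent(fake_predict, "Absolutely."))
print(route_intent(fake_predict, "What's the capital of France?"))
```

With the real model, passing `model` in place of `fake_predict` should behave the same way, provided its predictions are string labels.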

Training Details

Training Set Metrics

Training set   Min   Median   Max
Word count     1     5.1533   9

Label                     Training Sample Count
Confirm                   22
CurrentFloor              21
OutOfCoverage             22
RequestEmployeeLocation   22
RequestMoveToFloor        23
RequestMoveToFloorByX     20
Stop                      20

Training Hyperparameters

  • batch_size: (32, 32)
  • num_epochs: (10, 10)
  • max_steps: -1
  • sampling_strategy: oversampling
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • l2_weight: 0.01
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False

Training Results

Epoch Step Training Loss Validation Loss
0.0017 1 0.1415 -
0.0829 50 0.1863 -
0.1658 100 0.1559 -
0.2488 150 0.0966 -
0.3317 200 0.0363 -
0.4146 250 0.009 -
0.4975 300 0.0035 -
0.5804 350 0.0024 -
0.6633 400 0.0017 -
0.7463 450 0.0015 -
0.8292 500 0.0011 -
0.9121 550 0.0009 -
0.9950 600 0.0008 -
1.0779 650 0.0007 -
1.1609 700 0.0006 -
1.2438 750 0.0005 -
1.3267 800 0.0005 -
1.4096 850 0.0005 -
1.4925 900 0.0007 -
1.5755 950 0.0004 -
1.6584 1000 0.0004 -
1.7413 1050 0.0004 -
1.8242 1100 0.0004 -
1.9071 1150 0.0003 -
1.9900 1200 0.0003 -
2.0730 1250 0.0003 -
2.1559 1300 0.0003 -
2.2388 1350 0.0003 -
2.3217 1400 0.0003 -
2.4046 1450 0.0003 -
2.4876 1500 0.0003 -
2.5705 1550 0.0002 -
2.6534 1600 0.0002 -
2.7363 1650 0.0004 -
2.8192 1700 0.0002 -
2.9022 1750 0.0002 -
2.9851 1800 0.0002 -
3.0680 1850 0.0002 -
3.1509 1900 0.0002 -
3.2338 1950 0.0002 -
3.3167 2000 0.0002 -
3.3997 2050 0.0002 -
3.4826 2100 0.0002 -
3.5655 2150 0.0002 -
3.6484 2200 0.0002 -
3.7313 2250 0.0002 -
3.8143 2300 0.0002 -
3.8972 2350 0.0002 -
3.9801 2400 0.0002 -
4.0630 2450 0.0002 -
4.1459 2500 0.0002 -
4.2289 2550 0.0002 -
4.3118 2600 0.0002 -
4.3947 2650 0.0002 -
4.4776 2700 0.0002 -
4.5605 2750 0.0002 -
4.6434 2800 0.0001 -
4.7264 2850 0.0001 -
4.8093 2900 0.0001 -
4.8922 2950 0.0001 -
4.9751 3000 0.0001 -
5.0580 3050 0.0001 -
5.1410 3100 0.0001 -
5.2239 3150 0.0001 -
5.3068 3200 0.0001 -
5.3897 3250 0.0001 -
5.4726 3300 0.0001 -
5.5556 3350 0.0003 -
5.6385 3400 0.0004 -
5.7214 3450 0.0001 -
5.8043 3500 0.0001 -
5.8872 3550 0.0001 -
5.9701 3600 0.0001 -
6.0531 3650 0.0001 -
6.1360 3700 0.0001 -
6.2189 3750 0.0001 -
6.3018 3800 0.0001 -
6.3847 3850 0.0001 -
6.4677 3900 0.0001 -
6.5506 3950 0.0001 -
6.6335 4000 0.0001 -
6.7164 4050 0.0001 -
6.7993 4100 0.0001 -
6.8823 4150 0.0001 -
6.9652 4200 0.0001 -
7.0481 4250 0.0001 -
7.1310 4300 0.0001 -
7.2139 4350 0.0001 -
7.2968 4400 0.0001 -
7.3798 4450 0.0001 -
7.4627 4500 0.0001 -
7.5456 4550 0.0001 -
7.6285 4600 0.0001 -
7.7114 4650 0.0001 -
7.7944 4700 0.0001 -
7.8773 4750 0.0001 -
7.9602 4800 0.0001 -
8.0431 4850 0.0001 -
8.1260 4900 0.0001 -
8.2090 4950 0.0001 -
8.2919 5000 0.0001 -
8.3748 5050 0.0001 -
8.4577 5100 0.0001 -
8.5406 5150 0.0001 -
8.6235 5200 0.0001 -
8.7065 5250 0.0001 -
8.7894 5300 0.0001 -
8.8723 5350 0.0001 -
8.9552 5400 0.0001 -
9.0381 5450 0.0001 -
9.1211 5500 0.0001 -
9.2040 5550 0.0001 -
9.2869 5600 0.0001 -
9.3698 5650 0.0001 -
9.4527 5700 0.0001 -
9.5357 5750 0.0001 -
9.6186 5800 0.0001 -
9.7015 5850 0.0001 -
9.7844 5900 0.0001 -
9.8673 5950 0.0001 -
9.9502 6000 0.0001 -
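
The Epoch and Step columns can be cross-checked against the label counts and batch size reported earlier. The back-of-the-envelope sketch below assumes that pairs are formed from every unordered combination of two training samples and that the oversampling strategy repeats the rarer pair type until positive and negative pairs are equally frequent. Under that assumption the arithmetic reproduces the epoch fractions in the table.

```python
from math import ceil, comb

# Per-label training sample counts from the Training Set Metrics table.
label_counts = {
    "Confirm": 22,
    "CurrentFloor": 21,
    "OutOfCoverage": 22,
    "RequestEmployeeLocation": 22,
    "RequestMoveToFloor": 23,
    "RequestMoveToFloorByX": 20,
    "Stop": 20,
}
batch_size = 32

n_samples = sum(label_counts.values())
# Same-label pairs are positives; all remaining pairs are negatives.
positive_pairs = sum(comb(n, 2) for n in label_counts.values())
negative_pairs = comb(n_samples, 2) - positive_pairs

# Assumption: oversampling balances both pair types at the larger count.
pairs_per_epoch = 2 * max(positive_pairs, negative_pairs)
steps_per_epoch = ceil(pairs_per_epoch / batch_size)

print(n_samples, positive_pairs, negative_pairs, steps_per_epoch)
# 150 1536 9639 603
print(round(50 / steps_per_epoch, 4), round(6000 / steps_per_epoch, 4))
# 0.0829 9.9502
```

The results match the table rows for steps 50 and 6000, which suggests about 603 optimizer steps per epoch for the embedding fine-tuning phase.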

Framework Versions

  • Python: 3.10.8
  • SetFit: 1.1.0
  • Sentence Transformers: 3.1.1
  • Transformers: 4.38.2
  • PyTorch: 2.1.2
  • Datasets: 2.17.1
  • Tokenizers: 0.15.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}