SetFit with mini1013/master_domain

This is a SetFit model for text classification. It uses mini1013/master_domain as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
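The contrastive step works on pairs of training texts rather than on single texts. The snippet below is a minimal sketch of how such pairs could be built from a small labeled set, assuming a target of 1.0 for two texts with the same label and 0.0 otherwise. The function name `make_contrastive_pairs` and the toy data are hypothetical, and this is a simplified illustration, not SetFit's own sampler.

```python
import random
from itertools import combinations


def make_contrastive_pairs(texts, labels, seed=42):
    """Return (text_a, text_b, target) triples for every unordered pair of texts.

    target is 1.0 when both texts share a label and 0.0 otherwise (assumed convention).
    """
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    # Shuffle deterministically so positive and negative pairs are mixed
    random.Random(seed).shuffle(pairs)
    return pairs


# Toy placeholder data (hypothetical, not taken from the training set)
texts = ["meter A", "meter B", "strip A", "strip B"]
labels = [1.0, 1.0, 2.0, 2.0]
pairs = make_contrastive_pairs(texts, labels)
```

With four texts in two classes this yields six pairs, two of them positive. A few labeled examples therefore expand into many training pairs, which is what makes the few-shot setting workable.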

Model Details

Model Description

Model Sources

Model Labels

Label Examples
1.0
  • '프리스타일 리브레 무채혈 연속혈당측정기(24년1월)얼라이브패치1매 거래명세서 광명헬스케어'
  • 'SD 코드프리 혈당측정기(측정기+채혈기+침10매+파우치)P 스토어알파'
  • '올메디쿠스 글루코닥터 탑 혈당계 AGM-4100+파우치+채혈기+채혈침 10개 엠에스메디칼'
2.0
  • '에스디 SD 코드프리 측정지
0.0
  • '비디 울트라파인 인슐린 주사기 1박스 100입 324901 [31G 6mm 0.5ml] BD 펜니들 주사바늘 울트라파인2 BD 인슐린 31G 8mm 3/10ml(0.5단위) 1박스(320440) 더메디칼샵'
  • 'BD 비디 울트라파인 인슐린 주사기 시린지 31G 6mm 1ml 324903 100입 주식회사 더에스지엠'
  • '정림 멸균 일회용 주사기 3cc 23g 25mm 100개입 멸균주사기 10cc 18G 38mm(100ea/pck) (주)케이디상사'

Evaluation

Metrics

Label Metric
all 0.9787

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("mini1013/master_cate_lh7")
# Run inference
preds = model("녹십자 혈당시험지 당뇨 시험지 그린닥터 50매 시험지100매+체혈침100개 자재스토어")
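To classify several product titles at once, a small wrapper can pair each input with its prediction. This is a sketch under two assumptions: the loaded model can be called with a list of strings and returns one prediction per input, and the labels are numeric (as in the label examples of this card). The helper name `label_texts` is hypothetical, and a stand-in predictor is used so the sketch runs without downloading the model.

```python
def label_texts(predict, texts):
    """Pair each input text with its predicted label as a plain float."""
    texts = list(texts)
    if not texts:
        return []
    preds = predict(texts)
    return [(text, float(pred)) for text, pred in zip(texts, preds)]


# With the model loaded as shown earlier, the call would look like:
#   results = label_texts(model, ["...", "..."])

# Stand-in predictor (hypothetical) that always returns label 0.0
def fake_predict(batch):
    return [0.0 for _ in batch]


results = label_texts(fake_predict, ["example product title"])
```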

Training Details

Training Set Metrics

Training set Min Median Max
Word count 4 9.62 21

Label Training Sample Count
0.0 50
1.0 50
2.0 50
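Word-count statistics of this kind can be computed from raw texts in a few lines. The sketch below assumes that words are split on whitespace, which may not match the tokenization used to produce the figures in this card. The function name `word_count_stats` and the sample texts are hypothetical.

```python
from statistics import median


def word_count_stats(texts):
    """Return (min, median, max) of whitespace-separated word counts."""
    counts = [len(text.split()) for text in texts]
    return min(counts), median(counts), max(counts)


# Placeholder texts (hypothetical) with 4, 6 and 8 words
texts = ["a b c d", "a b c d e f", "a b c d e f g h"]
stats = word_count_stats(texts)
```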

Training Hyperparameters

  • batch_size: (512, 512)
  • num_epochs: (20, 20)
  • max_steps: -1
  • sampling_strategy: oversampling
  • num_iterations: 40
  • body_learning_rate: (2e-05, 2e-05)
  • head_learning_rate: 2e-05
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False
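The `loss` setting above names CosineSimilarityLoss. In Sentence Transformers, this loss is by default the mean squared error between the cosine similarity of two embeddings and a target score. The pure-Python sketch below works through that formula on toy 2-D vectors. The target convention (1.0 for a same-label pair, 0.0 otherwise) and all values are assumptions for illustration, not data from this model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between cosine similarity and target over (u, v, target) triples."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


# Toy embeddings (hypothetical)
pairs = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # same label, same direction: squared error 0
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # different labels, orthogonal: squared error 0
    ([1.0, 0.0], [1.0, 0.0], 0.0),  # different labels, same direction: squared error 1
]
loss = cosine_similarity_loss(pairs)
```

Only the third pair contributes, so the loss is 1/3. Training pushes same-label embeddings toward similarity 1 and different-label embeddings toward similarity 0.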

Training Results

Epoch Step Training Loss Validation Loss
0.0417 1 0.4565 -
2.0833 50 0.1836 -
4.1667 100 0.1645 -
6.25 150 0.0004 -
8.3333 200 0.0001 -
10.4167 250 0.0001 -
12.5 300 0.0 -
14.5833 350 0.0 -
16.6667 400 0.0 -
18.75 450 0.0 -
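The Epoch and Step columns can be cross-checked against the hyperparameters and the training sample counts. The arithmetic below assumes that each training sample contributes `num_iterations` positive and `num_iterations` negative pairs. That assumption is not stated in this card, but it reproduces the epoch values in the table.

```python
import math

n_samples = 3 * 50      # three labels with 50 training samples each
num_iterations = 40
batch_size = 512
num_epochs = 20

# Assumption: num_iterations positive plus num_iterations negative pairs per sample
n_pairs = 2 * num_iterations * n_samples
steps_per_epoch = math.ceil(n_pairs / batch_size)
total_steps = steps_per_epoch * num_epochs


def epoch_at(step):
    """Fractional epoch reached after the given number of optimizer steps."""
    return round(step / steps_per_epoch, 4)
```

Under this assumption there are 12,000 pairs and 24 steps per epoch, so step 1 falls at epoch 0.0417, step 50 at 2.0833, and step 450 at 18.75, as in the table.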

Framework Versions

  • Python: 3.10.12
  • SetFit: 1.1.0.dev0
  • Sentence Transformers: 3.1.1
  • Transformers: 4.46.1
  • PyTorch: 2.4.0+cu121
  • Datasets: 2.20.0
  • Tokenizers: 0.20.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
Model size: 111M params (Safetensors, tensor type F32)

Model tree for mini1013/master_cate_lh7

Base model: klue/roberta-base (this model is a fine-tune of it)
