SentenceTransformer based on x2bee/KoModernBERT-base-mlm_v02

This is a sentence-transformers model finetuned from x2bee/KoModernBERT-base-mlm_v02 on the korean_nli_dataset dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: x2bee/KoModernBERT-base-mlm_v02
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: korean_nli_dataset
  • Language: Korean

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': True, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
)
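
The encoder applies weighted-mean pooling over the token embeddings and then a 768-to-768 Dense layer with Tanh activation. To confirm this configuration at runtime, the three modules can be inspected by index once the model is loaded (a minimal sketch; installation and loading are covered under Usage below):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("x2bee/KoModernBERT_SBERT_compare_mlmlv5")

# The pipeline stages listed above are addressable by index
transformer, pooling, dense = model[0], model[1], model[2]
print(model.max_seq_length)            # 512
print(pooling.get_pooling_mode_str())  # 'weightedmean'
print(dense.get_config_dict())         # in_features=768, out_features=768, Tanh activation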

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("x2bee/KoModernBERT_SBERT_compare_mlmlv5")
# Run inference
sentences = [
    '한 여자와 소년이 경찰 오토바이에 앉아 있다.',
    '여자와 소년이 밖에 있다.',
    '한 남자가 물 위에 밧줄을 매고 있다.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
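
Beyond pairwise similarity, the embeddings can also be used for semantic search. Below is a minimal retrieval sketch using the library's util.semantic_search helper; the query string is invented for illustration:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("x2bee/KoModernBERT_SBERT_compare_mlmlv5")

corpus = [
    '한 여자와 소년이 경찰 오토바이에 앉아 있다.',
    '여자와 소년이 밖에 있다.',
    '한 남자가 물 위에 밧줄을 매고 있다.',
]
query = '오토바이에 앉아 있는 사람들'  # hypothetical query

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Retrieve the top-2 corpus sentences most similar to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit['corpus_id']], hit['score'])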

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.6374
spearman_cosine 0.6328
pearson_euclidean 0.6327
spearman_euclidean 0.6122
pearson_manhattan 0.6346
spearman_manhattan 0.6154
pearson_dot 0.5941
spearman_dot 0.5742
pearson_max 0.6374
spearman_max 0.6328
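
These metric names match the library's EmbeddingSimilarityEvaluator, which the training logs below refer to as sts_dev. A minimal sketch of computing the same correlations on your own scored pairs (the pairs here are placeholders; use a real dev split in practice):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("x2bee/KoModernBERT_SBERT_compare_mlmlv5")

# Hypothetical dev pairs with gold similarity scores in [0, 1]
sentences1 = ['한 여자와 소년이 경찰 오토바이에 앉아 있다.', '소녀들은 물을 뿌리며 놀면서 킥킥 웃는다.']
sentences2 = ['여자와 소년이 밖에 있다.', '수도 본관이 고장나서 큰길이 범람했다.']
scores = [0.5, 0.0]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, scores, name="sts_dev")
results = evaluator(model)
print(results)  # Pearson/Spearman correlations for cosine, Euclidean, Manhattan, and dot similarity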

Training Details

Training Dataset

korean_nli_dataset

  • Dataset: korean_nli_dataset at 51cc968
  • Size: 550,152 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 8 tokens, mean: 21.76 tokens, max: 76 tokens
    • sentence2 (string): min: 4 tokens, mean: 14.36 tokens, max: 44 tokens
    • score (float): min: 0.0, mean: 0.49, max: 1.0
  • Samples:
    • sentence1: 몸에 맞지 않는 노란색 셔츠와 파란색 플래드 스커트를 입은 나이든 여성이 두 개의 통 옆에 앉아 있다.
      sentence2: 여자가 역기를 들어올리고 있다.
      score: 0.0
    • sentence1: 갈색 코트를 입은 선글라스를 쓴 한 남성이 담배를 피우며 손님들이 길거리 스탠드에서 물건을 구입하자 코를 긁는다.
      sentence2: 갈색 코트를 입은 선글라스를 쓴 청년이 담배를 피우며 손님들이 스테이트 스탠드에서 구매하고 있을 때 코를 긁는다.
      score: 0.5
    • sentence1: 소녀들은 물을 뿌리며 놀면서 킥킥 웃는다.
      sentence2: 수도 본관이 고장나서 큰길이 범람했다.
      score: 0.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
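
CosineSimilarityLoss fits the cosine similarity of each embedding pair to the gold score, and the loss_fct shown above is simply the default MSE objective. A minimal construction sketch:

import torch.nn as nn
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("x2bee/KoModernBERT-base-mlm_v02")

# nn.MSELoss() is already the default loss_fct; passed explicitly here to mirror the parameters above
loss = CosineSimilarityLoss(model, loss_fct=nn.MSELoss())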
    

Evaluation Dataset

korean_nli_dataset

  • Dataset: korean_nli_dataset at 51cc968
  • Size: 550,152 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 4 tokens, mean: 21.88 tokens, max: 76 tokens
    • sentence2 (string): min: 5 tokens, mean: 14.14 tokens, max: 38 tokens
    • score (float): min: 0.0, mean: 0.52, max: 1.0
  • Samples:
    • sentence1: 한 역사학자와 그의 친구는 연구를 위해 더 많은 화석을 찾기 위해 광산을 파고 있다.
      sentence2: 역사가는 공부를 위해 친구와 함께 땅을 파고 있다.
      score: 0.5
    • sentence1: 소년은 회전목마에 도움을 받는다.
      sentence2: 소년이 당나귀를 타고 있다.
      score: 0.0
    • sentence1: 세탁실에서 사색적인 포즈를 취하고 있는 남자.
      sentence2: 한 남자가 파티오 밖에 있다.
      score: 0.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 2
  • learning_rate: 1e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.3
  • push_to_hub: True
  • hub_model_id: x2bee/KoModernBERT_SBERT_compare_mlmlv5
  • batch_sampler: no_duplicates
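
The non-default values above map directly onto SentenceTransformerTrainingArguments. A hedged end-to-end sketch follows; the dataset path and output directory are placeholders (the card only names "korean_nli_dataset" at revision 51cc968), while the hyperparameters are taken from this section:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CosineSimilarityLoss
from sentence_transformers.training_args import SentenceTransformerTrainingArguments, BatchSamplers

model = SentenceTransformer("x2bee/KoModernBERT-base-mlm_v02")

# Placeholder dataset id; columns are expected to be sentence1, sentence2, score
dataset = load_dataset("path/to/korean_nli_dataset", split="train")
dataset = dataset.train_test_split(test_size=0.01, seed=42)

loss = CosineSimilarityLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="output/KoModernBERT_SBERT",  # placeholder
    eval_strategy="epoch",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,
    learning_rate=1e-5,
    num_train_epochs=2,
    warmup_ratio=0.3,
    push_to_hub=True,
    hub_model_id="x2bee/KoModernBERT_SBERT_compare_mlmlv5",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    loss=loss,
)
trainer.train()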

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.3
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: x2bee/KoModernBERT_SBERT_compare_mlmlv5
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss sts_dev_spearman_max
0 0 - - 0.3994
0.0980 100 0.3216 - -
0.1960 200 0.2019 - -
0.2940 300 0.1451 - -
0.3920 400 0.1327 - -
0.4900 500 0.1231 - -
0.5879 600 0.1138 - -
0.6859 700 0.1091 - -
0.7839 800 0.106 - -
0.8819 900 0.1047 - -
0.9799 1000 0.1029 - -
1.0 1021 - 0.1003 0.6352
1.0774 1100 0.0999 - -
1.1754 1200 0.0994 - -
1.2734 1300 0.0989 - -
1.3714 1400 0.0974 - -
1.4694 1500 0.0975 - -
1.5674 1600 0.0945 - -
1.6654 1700 0.0933 - -
1.7634 1800 0.0922 - -
1.8613 1900 0.0928 - -
1.9593 2000 0.0928 - -
1.9985 2040 - 0.0955 0.6328

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 3.3.1
  • Transformers: 4.48.0.dev0
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}