SentenceTransformer based on x2bee/ModernBert_MLM_kotoken_v02
This is a sentence-transformers model finetuned from x2bee/ModernBert_MLM_kotoken_v02 on the korean_nli_dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: x2bee/ModernBert_MLM_kotoken_v02
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: korean_nli_dataset
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': True, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
)
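The stack encodes text with ModernBERT, pools token embeddings with a weighted mean (`pooling_mode_weightedmean_tokens`), and passes the result through a 768-to-768 Dense layer with Tanh activation. As a quick sanity check, you can inspect the loaded modules; a minimal sketch (module indices follow the printout above, and the expected pooling-mode string is an assumption):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("x2bee/KoModernBERT_SBERT_compare_mlmlv2")

# Module 1 is the Pooling layer from the printout above
pooling = model[1]
print(pooling.get_pooling_mode_str())  # expected: "weightedmean"

# Module 2 is the Dense head (768 -> 768, Tanh)
print(model[2])
```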
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("x2bee/KoModernBERT_SBERT_compare_mlmlv2")
# Run inference
sentences = [
'한 여자와 소년이 경찰 오토바이에 앉아 있다.',
'여자와 소년이 밖에 있다.',
'한 남자가 물 위에 밧줄을 매고 있다.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
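Continuing the snippet above, the similarity matrix can be used directly for ranking. A small sketch (reusing `sentences` and `similarities` from the code above) that picks the sentence closest to the first one:

```python
import torch

# Mask out the sentence itself before taking the argmax
scores = similarities[0].clone()
scores[0] = float("-inf")
best = int(torch.argmax(scores))
print(sentences[best])  # the entailed sentence should outrank the unrelated one
```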
Evaluation
Metrics
Semantic Similarity
- Dataset: `sts_dev`
- Evaluated with `EmbeddingSimilarityEvaluator`
Metric | Value |
---|---|
pearson_cosine | 0.6423 |
spearman_cosine | 0.6388 |
pearson_euclidean | 0.6317 |
spearman_euclidean | 0.6107 |
pearson_manhattan | 0.6325 |
spearman_manhattan | 0.6126 |
pearson_dot | 0.5865 |
spearman_dot | 0.5676 |
pearson_max | 0.6423 |
spearman_max | 0.6388 |
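`EmbeddingSimilarityEvaluator` computes embedding distances for sentence pairs and correlates them with gold similarity scores, which is where the Pearson/Spearman figures above come from. A minimal sketch of running such an evaluation yourself; the pairs and gold scores below are placeholders taken from the sample tables further down, not the actual `sts_dev` split:

```python
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Placeholder pairs with gold scores in [0, 1]; not the real sts_dev data
dev_evaluator = EmbeddingSimilarityEvaluator(
    sentences1=[
        "한 역사학자와 그의 친구는 연구를 위해 더 많은 화석을 찾기 위해 광산을 파고 있다.",
        "소년은 회전목마에 도움을 받는다.",
        "소녀들은 물을 뿌리며 놀면서 킥킥 웃는다.",
    ],
    sentences2=[
        "역사가는 공부를 위해 친구와 함께 땅을 파고 있다.",
        "소년이 당나귀를 타고 있다.",
        "수도 본관이 고장나서 큰길이 범람했다.",
    ],
    scores=[0.5, 0.0, 0.0],
    name="sts_dev",
)
results = dev_evaluator(model)  # dict of Pearson/Spearman correlations
print(results)
```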
Training Details
Training Dataset
korean_nli_dataset
- Dataset: korean_nli_dataset at 51cc968
- Size: 550,152 training samples
- Columns: `sentence1`, `sentence2`, and `score`
- Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |
| details | min: 8 tokens, mean: 21.76 tokens, max: 76 tokens | min: 4 tokens, mean: 14.36 tokens, max: 44 tokens | min: 0.0, mean: 0.49, max: 1.0 |
- Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| 몸에 맞지 않는 노란색 셔츠와 파란색 플래드 스커트를 입은 나이든 여성이 두 개의 통 옆에 앉아 있다. | 여자가 역기를 들어올리고 있다. | 0.0 |
| 갈색 코트를 입은 선글라스를 쓴 한 남성이 담배를 피우며 손님들이 길거리 스탠드에서 물건을 구입하자 코를 긁는다. | 갈색 코트를 입은 선글라스를 쓴 청년이 담배를 피우며 손님들이 스테이트 스탠드에서 구매하고 있을 때 코를 긁는다. | 0.5 |
| 소녀들은 물을 뿌리며 놀면서 킥킥 웃는다. | 수도 본관이 고장나서 큰길이 범람했다. | 0.0 |
- Loss: `CosineSimilarityLoss` with these parameters: `{"loss_fct": "torch.nn.modules.loss.MSELoss"}`
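`CosineSimilarityLoss` embeds both sentences, takes the cosine similarity of the two embeddings, and fits it to the gold score with the configured `loss_fct` (here MSE, which is also the library default). A minimal sketch of its construction, reusing `model` from the usage section:

```python
import torch.nn as nn
from sentence_transformers.losses import CosineSimilarityLoss

# Fit cos(u, v) to the gold score with a mean-squared-error objective
train_loss = CosineSimilarityLoss(model=model, loss_fct=nn.MSELoss())
```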
Evaluation Dataset
korean_nli_dataset
- Dataset: korean_nli_dataset at 51cc968
- Size: 550,152 evaluation samples
- Columns: `sentence1`, `sentence2`, and `score`
- Approximate statistics based on the first 1000 samples:

| | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |
| details | min: 4 tokens, mean: 21.88 tokens, max: 76 tokens | min: 5 tokens, mean: 14.14 tokens, max: 38 tokens | min: 0.0, mean: 0.52, max: 1.0 |
- Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| 한 역사학자와 그의 친구는 연구를 위해 더 많은 화석을 찾기 위해 광산을 파고 있다. | 역사가는 공부를 위해 친구와 함께 땅을 파고 있다. | 0.5 |
| 소년은 회전목마에 도움을 받는다. | 소년이 당나귀를 타고 있다. | 0.0 |
| 세탁실에서 사색적인 포즈를 취하고 있는 남자. | 한 남자가 파티오 밖에 있다. | 0.0 |
- Loss: `CosineSimilarityLoss` with these parameters: `{"loss_fct": "torch.nn.modules.loss.MSELoss"}`
Training Hyperparameters
Non-Default Hyperparameters
- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `gradient_accumulation_steps`: 2
- `learning_rate`: 1e-05
- `num_train_epochs`: 2
- `warmup_ratio`: 0.3
- `push_to_hub`: True
- `hub_model_id`: x2bee/KoModernBERT_SBERT_compare_mlmlv2
- `batch_sampler`: no_duplicates
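Put together, a training run with these non-default values might look like the following sketch. The dataset repository id `x2bee/korean_nli_dataset` and the `train` split name are assumptions (the card only says "korean_nli_dataset at 51cc968"), and loading the base checkpoint directly falls back to mean pooling rather than the weighted-mean-plus-Dense stack described above:

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss
from sentence_transformers.training_args import BatchSamplers

# Assumed repository id and split name; the revision prefix comes from the
# card and may need to be the full commit hash
train_dataset = load_dataset(
    "x2bee/korean_nli_dataset", split="train", revision="51cc968"
)

model = SentenceTransformer("x2bee/ModernBert_MLM_kotoken_v02")
loss = CosineSimilarityLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="output",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,
    learning_rate=1e-5,
    num_train_epochs=2,
    warmup_ratio=0.3,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
    # eval_strategy="epoch" from the card is omitted here: it also
    # requires passing an eval_dataset to the trainer
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```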
All Hyperparameters
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 2
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 1e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 2
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.3
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: True
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: True
- `resume_from_checkpoint`: None
- `hub_model_id`: x2bee/KoModernBERT_SBERT_compare_mlmlv2
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
Training Logs
Epoch | Step | Training Loss | Validation Loss | sts_dev_spearman_max |
---|---|---|---|---|
0 | 0 | - | - | 0.4039 |
0.0980 | 100 | 0.3204 | - | - |
0.1960 | 200 | 0.1993 | - | - |
0.2940 | 300 | 0.1451 | - | - |
0.3920 | 400 | 0.1332 | - | - |
0.4900 | 500 | 0.1235 | - | - |
0.5879 | 600 | 0.1153 | - | - |
0.6859 | 700 | 0.1109 | - | - |
0.7839 | 800 | 0.1073 | - | - |
0.8819 | 900 | 0.1065 | - | - |
0.9799 | 1000 | 0.1039 | - | - |
1.0 | 1021 | - | 0.1022 | 0.6360 |
1.0774 | 1100 | 0.1013 | - | - |
1.1754 | 1200 | 0.1009 | - | - |
1.2734 | 1300 | 0.1003 | - | - |
1.3714 | 1400 | 0.0987 | - | - |
1.4694 | 1500 | 0.0985 | - | - |
1.5674 | 1600 | 0.0954 | - | - |
1.6654 | 1700 | 0.0942 | - | - |
1.7634 | 1800 | 0.0932 | - | - |
1.8613 | 1900 | 0.094 | - | - |
1.9593 | 2000 | 0.0937 | - | - |
1.9985 | 2040 | - | 0.0972 | 0.6388 |
Framework Versions
- Python: 3.11.10
- Sentence Transformers: 3.3.1
- Transformers: 4.48.0.dev0
- PyTorch: 2.5.1+cu124
- Accelerate: 1.2.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}