---
license: mit
---

japanese-sexual-moderation is a fine-tuned version of [studio-ousia/luke-japanese-large-lite](https://huggingface.co/studio-ousia/luke-japanese-large-lite).

It scores whether a short sentence is sexual in nature.

The version as of 2023/9/17 was trained on a limited amount of data, so its scoring tendencies may reflect biases in the dataset.

This model was created to calculate ERP scores for [japanese-llm-roleplay-benchmark](https://github.com/oshizo/japanese-llm-roleplay-benchmark).

## Usage

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "oshizo/japanese-sexual-moderation"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    problem_type="multi_label_classification",
    num_labels=1,
)
model.eval()  # disable dropout for inference

# "Mount Fuji is the highest mountain in Japan." (a non-sexual sentence)
text = "富士山は日本で一番高い山です。"

with torch.no_grad():
    encoding = tokenizer(text, return_tensors="pt")
    # One raw logit per input sentence, shape (batch_size, 1)
    score = model(**encoding).logits

print(score)
# tensor([[-2.7863]])
```

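The model returns a raw logit rather than a probability. With `problem_type="multi_label_classification"`, `transformers` trains a single-label head with a binary cross-entropy-with-logits loss, so a logistic sigmoid maps the logit to a value in (0, 1). The sketch below assumes, as the example above suggests, that higher logits mean "more sexual". The `flag_sexual` helper and its 0.5 threshold are hypothetical illustrations, not something this model card specifies; a threshold suited to your data may differ, given the dataset bias noted above.

```python
import math
from typing import Iterable, List


def logit_to_probability(logit: float) -> float:
    """Map a raw logit to (0, 1) with the logistic sigmoid."""
    # Numerically stable form: only ever call exp() on a non-positive value,
    # so large-magnitude logits cannot overflow.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def flag_sexual(logits: Iterable[float], threshold: float = 0.5) -> List[bool]:
    """Hypothetical helper: flag inputs whose probability reaches `threshold`.

    Assumes higher logits mean more sexual; the 0.5 default is an
    illustrative choice, not a value published for this model.
    """
    return [logit_to_probability(x) >= threshold for x in logits]


# The logit shown in the usage example, for the Mount Fuji sentence.
p = logit_to_probability(-2.7863)
print(f"{p:.3f}")  # → 0.058

print(flag_sexual([-2.7863, 1.2]))  # → [False, True]
```

To use it with the model output, pass `score.squeeze(-1).tolist()` to `flag_sexual`; alternatively, `torch.sigmoid(score)` computes the same probabilities directly on the tensor.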