Model Card for robeczech-binary-online-risks-cs

This model is fine-tuned for binary text classification of online risks in Czech instant messenger dialogs of adolescents.

Model Description

The model was fine-tuned on a dataset of Czech instant messenger dialogs of adolescents. The classification is binary: the model outputs probabilities for the labels {0, 1}, indicating whether online risks are present or not.
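
As a rough illustration of how two raw scores become the label probabilities described above, the following sketch applies a softmax to made-up logits with NumPy. The logit values and the `softmax` helper are assumptions for illustration only, and the card does not state which of the two labels means "risk present", so check the model configuration before interpreting the result.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability before exponentiating
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Made-up logits for two inputs, one column per label (0 and 1)
logits = np.array([[2.0, -1.0],
                   [0.0, 0.0]])

probs = softmax(logits)          # each row sums to 1
labels = probs.argmax(axis=-1)   # index of the most probable label per row

for label, row in zip(labels, probs):
    print(f'{label}: {row}')
```

Each row of `probs` sums to 1, so for a binary model the two entries are complementary.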

  • Developed by: Anonymous
  • Language(s): cs
  • Finetuned from: ufal/robeczech-base

Usage

Here is how to use this model to classify a context window of a dialogue:

import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Prepare input texts. This model is fine-tuned for Czech
test_texts = ['Utterance1;Utterance2;Utterance3']

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    'justtherightsize/robeczech-binary-online-risks-cs', num_labels=2).to(device)

# Truncate from the left so the end of a long input is kept
tokenizer = AutoTokenizer.from_pretrained(
    'justtherightsize/robeczech-binary-online-risks-cs',
    use_fast=False, truncation_side='left')
assert tokenizer.truncation_side == 'left'

# Define helper functions
def get_probs(text, tokenizer, model):
    inputs = tokenizer(text, padding=True, truncation=True, max_length=256,
                       return_tensors="pt").to(device)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return outputs[0].softmax(1)

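def preds2class(probs, threshold=0.5):
    # Flag every class probability that reaches the threshold
    pclasses = np.zeros(probs.shape)
    pclasses[np.where(probs >= threshold)] = 1
    # argmax returns the first flagged class per row, so the result is 1
    # only when class 1 reaches the threshold and class 0 does not
    return pclasses.argmax(-1)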

def print_predictions(texts):
    probabilities = [get_probs(
        texts[i], tokenizer, model).cpu().detach().numpy()[0]
                     for i in range(len(texts))]
    predicted_classes = preds2class(np.array(probabilities))
    for c, p in zip(predicted_classes, probabilities):
        print(f'{c}: {p}')

# Run the prediction
print_predictions(test_texts)
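
The placeholder input in the usage code joins three utterances with semicolons. The sketch below shows one possible way to build such context windows from a list of utterances. The `build_context_windows` helper, the sliding-window scheme, the window size of 3, and the sample Czech utterances are all assumptions for illustration; the card does not specify how windows were built for training.

```python
def build_context_windows(utterances, window_size=3, sep=';'):
    # For each utterance, join it with up to (window_size - 1) preceding
    # utterances so that every window ends at the current utterance
    windows = []
    for end in range(1, len(utterances) + 1):
        start = max(0, end - window_size)
        windows.append(sep.join(utterances[start:end]))
    return windows

# Hypothetical dialogue, oldest utterance first
dialogue = ['Ahoj', 'Jak se máš?', 'Dobře, a ty?', 'Taky dobře']

test_texts = build_context_windows(dialogue, window_size=3)
for text in test_texts:
    print(text)
```

The resulting `test_texts` list can be passed to `print_predictions` from the usage code. Because the tokenizer there is loaded with `truncation_side='left'`, an input longer than `max_length` loses tokens from its start, so the latest utterances in a window are kept.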