File size: 3,377 Bytes
3bc4ffd
a8a5ccc
0a6e134
 
 
 
 
 
a8a5ccc
 
 
 
6f4cc65
 
 
 
e8854c0
6f4cc65
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
19d99a5
8ced5e9
6f4cc65
 
 
8ced5e9
6f4cc65
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
a8a5ccc
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
---
license: mit
language:
- uk
metrics:
- f1
pipeline_tag: token-classification
widget:
- text: Як пукнути та не вкакатися - розказує лікар вищого ступення
  example_title: Приклад 1
- text: Що таке дихання маткою? - розказує лікар гінеколог
  example_title: Приклад 2
---

# Instruction Detection Model

Welcome to the repository for the instruction detection model! This model, hosted on the [Hugging Face Model Hub](https://huggingface.co/models), is designed specifically for detecting instructions in newspaper titles. **Ukranian Language Only**. Utilizing token classification, it scans through the given input - a newspaper title, and labels tokens that appear to signify an instruction.

Model Card: [zeusfsx/instruction-detection](https://huggingface.co/zeusfsx/instruction-detection)

## Table of Contents

- [Introduction](#introduction)
- [Usage](#usage)
- [Training](#training)
- [Evaluation](#evaluation)
- [License](#license)
- [Citation](#citation)
- [Contact Information](#contact-information)

## Introduction

In the age of information, newspaper titles are often crafted to attract attention and occasionally incorporate direct or indirect instructions. This model can help analyze these titles, detect such instructions, and tag them accordingly.

It employs token classification task, a common technique in Natural Language Processing (NLP), to detect and label instructions in the text.

## Usage

Here's how to use this model:

### In Python

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("zeusfsx/instruction-detection")
model = AutoModelForTokenClassification.from_pretrained("zeusfsx/instruction-detection")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Your example newspaper title here"

output = nlp(example)
print(output)
```

This will return a list of recognized tokens marked with label 'INSTRUCTION'. 

## Training

It's based on the transformer architecture and specifically uses the [xlm-roberta-base-uk](https://huggingface.co/ukr-models/xlm-roberta-base-uk) model from `ukr-models`, fine-tuned for the token classification task. The training data was carefully chosen to include a balanced distribution of titles containing instructions and those not containing instructions.
The dataset contains newspaper titles (~6k titles), with tokens representing instructions manually labeled.

## Evaluation

Model performance was evaluated using a held-out test set, again consisting of manually labeled newspaper titles. F1 - 0.9601, ACCURACY - 0.9968 for the 'INSTRUCTION' label

## License

This project is licensed under the terms of the MIT license.

## Citation

If you use our model or this repository in your research, please cite it as follows:

```
@misc{instruction-detection,
  author = {Oleksandr Korovii},
  title = {Instruction Detection Model},
  year = {2023},
  publisher = {HuggingFace Model Hub},
  url = {https://huggingface.co/zeusfsx/instruction-detection},
  note = {Accessed: 2023-06-20}
}

```

## Contact Information

For any questions or suggestions, feel free to open an issue in this repository.
Contributions to improve this model or the associated documentation are welcome!