File size: 3,933 Bytes
d01e73c
 
0cda9dc
 
 
 
 
 
 
 
 
 
d01e73c
0cda9dc
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
9daac66
0cda9dc
 
 
 
 
 
 
ee26528
0cda9dc
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
ee26528
0cda9dc
 
 
 
 
 
 
 
ee26528
0cda9dc
ee26528
9daac66
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
---
license: mit
language:
- vi
metrics:
- accuracy
library_name: transformers
tags:
- poem
- vietnamese
- classification
- evaluation
---
# Vietnamese poem classification and evaluation 📜🔍

A Vietnamese poem classifer using [BertForSequenceClassification](https://huggingface.co/trituenhantaoio/bert-base-vietnamese-uncased) with the accuracy of ```99.7%```

This is a side project during the making of our [Vietnamese poem generator](https://github.com/Anshler/poem_generator)

## Features

* Classify Vietnamese poem into categories of ```4 chu```, ```5 chu```, ```7 chu```, ```luc bat``` and ```8 chu```
* Score the quality of each poem, based soldly on its conformation to the rigid rule of various types of Vietnamese poem. Using 3 criterias: Length, Tone and Rhyme as follow: ```score = L/10 + 3T/10 + 6R/10```

The rules for each genre are defined below:

| Genre | Length | Tone | Rhyme |
|------------------|------------------|--------------|------------------------|
| 4 chu    | - 4 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 2nd word is uneven (trắc), the 4th word is even (bằng) <br>- Vice versa | Last word (4th) of each line: <br>- Continuous rhyme (gieo vần tiếp) <br>- Alternating rhyme (gieo vần tréo) <br>- Three-line rhyme (gieo vần ba)|
| 5 chu    | - 5 words per line <br>- 4 lines per stanza (optional)  | Same as "4 chu" | Same as "4 chu" |
| 7 chu    | - 7 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 2nd word is uneven (trắc), the 4th word is even (bằng), the 6th word is uneven (trắc) <br> - 5th word and last word (7th) must have different tone | The last word of 1st, 2nd, 4th line per stanza must have same tone and rhyme |
| luc bat    | - 6 words in odd line <br>- 8 words in even line <br>- 4 lines per stanza (optional) | For 6-word line: <br>- If the 2nd word is uneven (trắc) the 4th word is even (bằng), the 6th word is uneven (trắc) <br><br> For 8-word line: <br>- Must be same as previous 6-word line <br>- The last word (8th) mut have same tone as 6th word but different accent | The last word (6th) in 6-word line must rhyme with the 6th word in the next 8-word line and the 8th word in the previous 8-word line |
| 8 chu    | - 8 words per line <br>- 4 lines per stanza (optional) | For each line: <br>- If the 3rd word is uneven (trắc), the 5th word is even (bằng), the 8th word is uneven (trắc)| Same as "4 chu" |




## Data

A collection of 171188 Vietnamese poems with different genres: luc-bat, 5-chu, 7-chu, 8-chu, 4-chu. Download [here](https://github.com/fsoft-ailab/Poem-Generator/raw/master/dataset/poems_dataset.zip)

For more detail, refer to the _Acknowledgments_ section

## Training

Training code is in our repo [Vietnamese poem generator](https://github.com/Anshler/poem_generator)

Run:
```
python poem_classifier_training.py
```

## Installation

```
pip install vietnamese-poem-classifier
```
Or

```
pip install git+https://github.com/Anshler/vietnamese-poem-classifier
```

## Inference

```python
from vietnamese_poem_classifier.poem_classifier import PoemClassifier

classifier = PoemClassifier()

poem = '''Người đi theo gió đuổi mây
          Tôi buồn nhặt nhạnh tháng ngày lãng quên
          Em theo hú bóng kim tiền
          Bần thần tôi ngẫm triền miên thói đời.'''

classifier.predict(poem)

#>> [{'label': 'luc bat', 'confidence': 0.9999017715454102, 'poem_score': 0.75, 'l_score': 1.0, 't_score': 1.0, 'r_score': 0.5833333333333333}]
```

## Model

The model's weights are published at Huggingface [Anshler/vietnamese-poem-classifier](https://huggingface.co/Anshler/vietnamese-poem-classifier) 

## Acknowledgments

_This project was inspired by the evaluation method from ```fsoft-ailab```'s_ [SP-GPT2 Poem-Generator](https://github.com/fsoft-ailab/Poem-Generator)

_Dataset also taken from their repo_