---
library_name: transformers
license: llama3
language:
- ko
- en
pipeline_tag: text-generation
---

# davidkim205/ko-gemma-2-9b-it

davidkim205/ko-gemma-2-9b-it is one of several models being researched to improve the performance of Korean language models. 

(To be released soon.)

## Model Details

* **Model Developers**: davidkim (Changyeon Kim)
* **Repository**: -
* **Base model**: google/gemma-2-9b-it
* **SFT dataset**: qa_ability_1851.jsonl

## Usage
### Chat Template
```
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "davidkim205/ko-gemma-2-9b-it"

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config)

chat = [
    { "role": "system", "content":"๋‹น์‹ ์€ ์งˆ๋ฌธ์— ๋Œ€ํ•ด์„œ ์ž์„ธํžˆ ์„ค๋ช…ํ•˜๋Š” AI์ž…๋‹ˆ๋‹ค."},
    { "role": "user", "content": "๋”ฅ๋Ÿฌ๋‹์„ ์–ด๋–ป๊ฒŒ ๊ณต๋ถ€ํ•ด์•ผํ•˜๋‚˜์š”?" },
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=1024)
print(tokenizer.decode(outputs[0]))

```
Output:
```
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Loading checkpoint shards: 100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 4/4 [00:04<00:00,  1.04s/it]
/home/david/anaconda3/envs/eval/lib/python3.10/site-packages/bitsandbytes/nn/modules.py:426: UserWarning: Input type into Linear4bit is torch.float16, but bnb_4bit_compute_dtype=torch.float32 (default). This will lead to slow inference or training speed.
  warnings.warn(
<bos>๋‹น์‹ ์€ ์งˆ๋ฌธ์— ๋Œ€ํ•ด์„œ ์ž์„ธํžˆ ์„ค๋ช…ํ•˜๋Š” AI์ž…๋‹ˆ๋‹ค.<start_of_turn>user
๋”ฅ๋Ÿฌ๋‹์„ ์–ด๋–ป๊ฒŒ ๊ณต๋ถ€ํ•ด์•ผํ•˜๋‚˜์š”?<end_of_turn>
<start_of_turn>model
๋”ฅ๋Ÿฌ๋‹์„ ๊ณต๋ถ€ํ•˜๋Š” ๊ฒƒ์€ ํฅ๋ฏธ๋กญ๊ณ  ๋ณด๋žŒ ์žˆ๋Š” ์—ฌ์ •์ด ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค! 

ํ•˜์ง€๋งŒ ์–ด๋””์„œ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•ด์•ผ ํ• ์ง€ ๋ง‰๋ง‰ํ•˜๊ฒŒ ๋Š๊ปด์งˆ ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. 

๋‹ค์Œ์€ ๋”ฅ๋Ÿฌ๋‹์„ ๊ณต๋ถ€ํ•˜๊ธฐ ์œ„ํ•œ ๋‹จ๊ณ„๋ณ„ ๊ฐ€์ด๋“œ์ž…๋‹ˆ๋‹ค.

**1๋‹จ๊ณ„: ๊ธฐ์ดˆ ๋‹ค์ง€๊ธฐ**

* **์ˆ˜ํ•™**: ๋”ฅ๋Ÿฌ๋‹์˜ ๊ธฐ๋ฐ˜์ด ๋˜๋Š” ์„ ํ˜•๋Œ€์ˆ˜, ๋ฏธ์ ๋ถ„, ํ™•๋ฅ  ๋ฐ ํ†ต๊ณ„์— ๋Œ€ํ•œ ๊ธฐ๋ณธ ์ง€์‹์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. Khan Academy, Coursera ๋“ฑ ์˜จ๋ผ์ธ ํ”Œ๋žซํผ์—์„œ ์ˆ˜ํ•™ ๊ฐ•์ขŒ๋ฅผ ๋“ฃ๋Š” ๊ฒƒ์„ ์ถ”์ฒœํ•ฉ๋‹ˆ๋‹ค.
* **ํ”„๋กœ๊ทธ๋ž˜๋ฐ**: Python์€ ๋”ฅ๋Ÿฌ๋‹ ๋ถ„์•ผ์—์„œ ๊ฐ€์žฅ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š” ํ”„๋กœ๊ทธ๋ž˜๋ฐ ์–ธ์–ด์ž…๋‹ˆ๋‹ค. Python ๊ธฐ์ดˆ ๋ฌธ๋ฒ•, ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ, ํ•จ์ˆ˜ ๋“ฑ์„ ์ตํžˆ์„ธ์š”. Codecademy, Google's Python Class ๋“ฑ์˜ ํ”Œ๋žซํผ์—์„œ Python์„ ๋ฐฐ์šธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
* **๊ธฐ๋ณธ ๋จธ์‹ ๋Ÿฌ๋‹**: ๋”ฅ๋Ÿฌ๋‹์„ ์ดํ•ดํ•˜๊ธฐ ์ „์— ๊ธฐ๋ณธ์ ์ธ ๋จธ์‹ ๋Ÿฌ๋‹ ๊ฐœ๋…์„ ์ตํžˆ๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. 
    * ๋ถ„๋ฅ˜, ํšŒ๊ท€, ํด๋Ÿฌ์Šคํ„ฐ๋ง ๋“ฑ์˜ ๋จธ์‹ ๋Ÿฌ๋‹ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ดํ•ดํ•˜๊ณ , Scikit-learn ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ์‹ค์Šต์„ ํ•ด๋ณด์„ธ์š”.

**2๋‹จ๊ณ„: ๋”ฅ๋Ÿฌ๋‹ ๊ฐœ๋… ํ•™์Šต**

* **์˜จ๋ผ์ธ ๊ฐ•์ขŒ**: Coursera, edX, Udacity ๋“ฑ์˜ ํ”Œ๋žซํผ์—์„œ ์ œ๊ณตํ•˜๋Š” ๋”ฅ๋Ÿฌ๋‹ ๊ฐ•์ขŒ๋ฅผ ์ˆ˜๊ฐ•ํ•˜์„ธ์š”. Andrew Ng์˜ Deep Learning Specialization์€ ๋”ฅ๋Ÿฌ๋‹ ๋ถ„์•ผ์˜ ๊ธฐ๋ณธ ๊ฐœ๋…์„ ํƒ„ํƒ„ํ•˜๊ฒŒ ๋‹ค์ง€๋Š” ๋ฐ ์ข‹์€ ์„ ํƒ์ž…๋‹ˆ๋‹ค.
* **์ฑ…**: ๋”ฅ๋Ÿฌ๋‹์— ๋Œ€ํ•œ ์ดํ•ด๋ฅผ ์‹ฌํ™”์‹œํ‚ค๊ธฐ ์œ„ํ•ด ์ฑ…์„ ์ฝ๋Š” ๊ฒƒ๋„ ์ข‹์€ ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค. 
    * "Deep Learning" (Ian Goodfellow, Yoshua Bengio, Aaron Courville)์€ ๋”ฅ๋Ÿฌ๋‹ ๋ถ„์•ผ์˜ ์ „๋ฌธ๊ฐ€๋ฅผ ์œ„ํ•œ ์‹ฌ๋„ ์žˆ๋Š” ์ฑ…์ž…๋‹ˆ๋‹ค. 
    * "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow" (Aurรฉlien Gรฉron)์€ ์‹ค์Šต ์ค‘์‹ฌ์œผ๋กœ ๋”ฅ๋Ÿฌ๋‹์„ ๋ฐฐ์šฐ๊ณ  ์‹ถ์€ ์‚ฌ๋žŒ์—๊ฒŒ ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค.
* **๋ธ”๋กœ๊ทธ ๋ฐ ๊ธฐ์‚ฌ**: ๋”ฅ๋Ÿฌ๋‹ ๊ด€๋ จ ์ตœ์‹  ํŠธ๋ Œ๋“œ์™€ ์—ฐ๊ตฌ ๋™ํ–ฅ์„ ํŒŒ์•…ํ•˜๊ธฐ ์œ„ํ•ด ๋ธ”๋กœ๊ทธ ๋ฐ ๊ธฐ์‚ฌ๋ฅผ ์ฝ๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค.

**3๋‹จ๊ณ„: ์‹ค์Šต ๋ฐ ํ”„๋กœ์ ํŠธ ์ง„ํ–‰**

* **๋ฐ์ดํ„ฐ์…‹**: Kaggle, UCI Machine Learning Repository ๋“ฑ์˜ ํ”Œ๋žซํผ์—์„œ ๋‹ค์–‘ํ•œ ๋ฐ์ดํ„ฐ์…‹์„ ์ฐพ์•„ ์‹ค์Šตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
* **๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ**: TensorFlow, PyTorch, Keras ๋“ฑ์˜ ๋”ฅ๋Ÿฌ๋‹ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ๊ตฌ์ถ•ํ•˜๊ณ  ํ›ˆ๋ จํ•˜์„ธ์š”.
* **ํ”„๋กœ์ ํŠธ**: ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ์ˆ ์„ ์ ์šฉํ•˜์—ฌ ์‹ค์ œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋Š” ํ”„๋กœ์ ํŠธ๋ฅผ ์ง„ํ–‰ํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. 
    * ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜, ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ, ์˜ˆ์ธก ๋ชจ๋ธ ๊ฐœ๋ฐœ ๋“ฑ ๋‹ค์–‘ํ•œ ํ”„๋กœ์ ํŠธ๋ฅผ ํ†ตํ•ด ๋”ฅ๋Ÿฌ๋‹ ์‹ค๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

**์ถ”๊ฐ€ ํŒ**

* **์ปค๋ฎค๋‹ˆํ‹ฐ ํ™œ๋™**: ๋”ฅ๋Ÿฌ๋‹ ๊ด€๋ จ ์ปค๋ฎค๋‹ˆํ‹ฐ์— ์ฐธ์—ฌํ•˜์—ฌ ๋‹ค๋ฅธ ์‚ฌ๋žŒ๋“ค๊ณผ ๊ต๋ฅ˜ํ•˜๊ณ  ์งˆ๋ฌธ์„ ํ•ด๋ณด์„ธ์š”.
* **๊พธ์ค€ํ•จ**: ๋”ฅ๋Ÿฌ๋‹์€ ๋ณต์žกํ•œ ๋ถ„์•ผ์ด๋ฏ€๋กœ ๊พธ์ค€ํžˆ ๊ณต๋ถ€ํ•˜๊ณ  ์‹ค์Šตํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.


<end_of_turn><eos>

```
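
The decoded text above contains the prompt, the reply, and the special tokens in one string. If only the reply is needed, it can be cut out by splitting on the turn markers. The following is a minimal sketch, not part of the original model card: `extract_model_reply` is a hypothetical helper, and it assumes that other outputs use the same `<start_of_turn>model` and `<end_of_turn>` layout as the sample shown here.

```python
# Markers copied from the sample output; that every output follows the
# same layout is an assumption.
MODEL_TURN = "<start_of_turn>model\n"
END_OF_TURN = "<end_of_turn>"


def extract_model_reply(decoded: str) -> str:
    """Return the text of the last model turn in a decoded sequence."""
    # Everything before the last model-turn header is the prompt.
    _, found, reply = decoded.rpartition(MODEL_TURN)
    if not found:
        raise ValueError("no model turn found in decoded text")
    # Drop the end-of-turn marker and anything after it (for example <eos>).
    reply = reply.split(END_OF_TURN, 1)[0]
    return reply.strip()


# Shortened stand-in for the real decoded output.
sample = (
    "<bos>system text<start_of_turn>user\n"
    "question<end_of_turn>\n"
    "<start_of_turn>model\n"
    "answer line 1\nanswer line 2\n\n<end_of_turn><eos>"
)
print(extract_model_reply(sample))
```

An alternative that avoids string matching is to decode only the newly generated token ids, for example `tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)`.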
## Benchmark

### kollm_evaluation
https://github.com/davidkim205/kollm_evaluation

|       Tasks       |Version|Filter|n-shot| Metric |Value |   |Stderr|
|-------------------|-------|------|-----:|--------|-----:|---|------|
|kobest             |N/A    |none  |     0|acc     |0.5150|±  |0.0073|
|                   |       |none  |     0|f1      |0.4494|±  |N/A   |
| - kobest_boolq    |      1|none  |     0|acc     |0.6154|±  |0.0130|
|                   |       |none  |     0|f1      |0.5595|±  |N/A   |
| - kobest_copa     |      1|none  |     0|acc     |0.4710|±  |0.0158|
|                   |       |none  |     0|f1      |0.4700|±  |N/A   |
| - kobest_hellaswag|      1|none  |     0|acc     |0.3880|±  |0.0218|
|                   |       |none  |     0|f1      |0.3832|±  |N/A   |
|                   |       |none  |     0|acc_norm|0.4780|±  |0.0224|
| - kobest_sentineg |      1|none  |     0|acc     |0.5189|±  |0.0251|
|                   |       |none  |     0|f1      |0.4773|±  |N/A   |
| - kobest_wic      |      1|none  |     0|acc     |0.4873|±  |0.0141|
|                   |       |none  |     0|f1      |0.3276|±  |N/A   |
|ko_truthfulqa      |      2|none  |     0|acc     |0.3390|±  |0.0166|
|ko_mmlu            |      1|none  |     0|acc     |0.1469|±  |0.0019|
|                   |       |none  |     0|acc_norm|0.1469|±  |0.0019|
|ko_hellaswag       |      1|none  |     0|acc     |0.2955|±  |0.0046|
|                   |       |none  |     0|acc_norm|0.3535|±  |0.0048|
|ko_common_gen      |      1|none  |     0|acc     |0.5825|±  |0.0126|
|                   |       |none  |     0|acc_norm|0.5825|±  |0.0126|
|ko_arc_easy        |      1|none  |     0|acc     |0.2329|±  |0.0124|
|                   |       |none  |     0|acc_norm|0.2867|±  |0.0132|
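
The Stderr column gives a rough sense of how much each score could move with a different sample of test items. As an illustration only, the sketch below turns a few rows of the table into approximate 95% intervals. It assumes a normal approximation (value ± 1.96 × stderr), which is a choice made for this example and not something the benchmark itself reports; `approx_ci95` is a hypothetical helper.

```python
def approx_ci95(value: float, stderr: float) -> tuple[float, float]:
    """Approximate 95% interval under a normal approximation (assumption)."""
    half_width = 1.96 * stderr
    return value - half_width, value + half_width


# (acc, stderr) pairs copied from the table above.
rows = {
    "kobest": (0.5150, 0.0073),
    "ko_truthfulqa": (0.3390, 0.0166),
    "ko_mmlu": (0.1469, 0.0019),
}

for task, (acc, stderr) in rows.items():
    low, high = approx_ci95(acc, stderr)
    print(f"{task}: acc={acc:.4f}, approx. 95% interval {low:.4f} to {high:.4f}")
```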



### Evaluation of KEval
keval is an evaluation model trained on the prompts and dataset used in a benchmark for evaluating Korean language models, one of the various approaches that use ChatGPT to evaluate models. It is intended to compensate for the shortcomings of the existing lm-evaluation-harness.

https://huggingface.co/davidkim205/keval-7b

| model                                                                                    |   ned |   exe_time |   evalscore |   count |
|:-----------------------------------------------------------------------------------------|------:|-----------:|------------:|--------:|
| claude-3-opus-20240229                                                                   | nan   |      nan   |        8.79 |      42 |
| gpt-4-turbo-2024-04-09                                                                   | nan   |      nan   |        8.71 |      42 |
| Qwen2-72B-Instruct                                                                       | nan   |    29850.5 |        7.85 |      42 |
| WizardLM-2-8x22B                                                                         | nan   |   133831   |        7.57 |      42 |
| ***ko-gemma-2-9b-it***                                                                   | nan   |    30789.5 |        7.52 |      42 |
| HyperClovaX                                                                              | nan   |      nan   |        7.44 |      42 |
| gemma-2-9b-it                                                                            | nan   |    23531.7 |        7.4  |      42 |
| glm-4-9b-chat                                                                            | nan   |    24825.6 |        7.31 |      42 |
| Ko-Llama-3-8B-Instruct                                                                   | nan   |    10697.5 |        6.81 |      42 |
| Qwen2-7B-Instruct                                                                        | nan   |    11856.3 |        6.02 |      42 |
| Not-WizardLM-2-7B                                                                        | nan   |    12955.7 |        5.26 |      42 |
| gemma-1.1-7b-it                                                                          | nan   |     6950.5 |        4.99 |      42 |
| Mistral-7B-Instruct-v0.3                                                                 | nan   |    19631.4 |        4.89 |      42 |
| Phi-3-small-128k-instruct                                                                | nan   |    26747.5 |        3.52 |      42 |
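
To read the table at a glance, the short sketch below ranks the models by `evalscore` and computes the gap between ko-gemma-2-9b-it and the `gemma-2-9b-it` row. The scores are copied from the table. Treating that row as the google/gemma-2-9b-it base model named in Model Details is an assumption.

```python
# evalscore values copied from the KEval table above.
evalscore = {
    "claude-3-opus-20240229": 8.79,
    "gpt-4-turbo-2024-04-09": 8.71,
    "Qwen2-72B-Instruct": 7.85,
    "WizardLM-2-8x22B": 7.57,
    "ko-gemma-2-9b-it": 7.52,
    "HyperClovaX": 7.44,
    "gemma-2-9b-it": 7.40,
    "glm-4-9b-chat": 7.31,
    "Ko-Llama-3-8B-Instruct": 6.81,
    "Qwen2-7B-Instruct": 6.02,
    "Not-WizardLM-2-7B": 5.26,
    "gemma-1.1-7b-it": 4.99,
    "Mistral-7B-Instruct-v0.3": 4.89,
    "Phi-3-small-128k-instruct": 3.52,
}

# Rank models from highest to lowest score (1 = best).
ranking = sorted(evalscore, key=evalscore.get, reverse=True)
rank = ranking.index("ko-gemma-2-9b-it") + 1

# Gap to the row assumed to be the base model.
delta = round(evalscore["ko-gemma-2-9b-it"] - evalscore["gemma-2-9b-it"], 2)

print(f"rank {rank} of {len(ranking)}, {delta:+.2f} vs gemma-2-9b-it")
```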