---
license: other
commercial: false
datasets:
- aisquared/databricks-dolly-15k
language:
- en
library_name: transformers
---


# Model Card for `chopt-1_3b`

<!-- Provide a quick summary of what the model is/does. -->

AI Squared's `chopt-1_3b` is a large language model derived from Meta AI's Open Pre-trained Transformer (OPT) language models and fine-tuned on a corpus of 15k records ([Databricks' "Dolly 15k" Dataset](https://huggingface.co/datasets/aisquared/databricks-dolly-15k)) to help it exhibit chat-based capabilities. Despite the permissive license of the Dolly 15k dataset, this model is a derivative of OPT and is therefore restricted to use for **non-commercial research purposes**. The ChOPT family of models from AI Squared is licensed under the OPT-175B license, Copyright (c) Meta Platforms, Inc. All Rights Reserved.

While `chopt-1_3b` is **not a state-of-the-art model**, we believe that the level of interactivity achievable with such a small, cheaply trained model is important to showcase, as it continues to demonstrate that creating powerful AI capabilities may be much more accessible than previously thought.


### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** AI Squared, Inc.
- **Shared by:** AI Squared, Inc.
- **Model type:** Large Language Model
- **Language(s) (NLP):** EN
- **License:** other
- **Finetuned from model:** OPT


## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

**`chopt-1_3b` is not a state-of-the-art language model.** `chopt-1_3b` is an experimental technology and is not designed for use in any
environment other than research. Furthermore, the model can sometimes exhibit undesired behaviors, including but not limited to
factual inaccuracies, biases, offensive responses, toxicity, and hallucinations.
As with any other LLM, we advise users to exercise good judgment when applying this technology.


## Usage

The code below shows how to use `chopt-1_3b` in the way in which it was trained. While the model can be used "out of the box" with the
`transformers` library, using the function defined below to create a response from the model will achieve better results.

### Load Model and Tokenizer from this Repository Using the `transformers` Package

```python
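# AutoModelForCausalLM and AutoTokenizer load the model weights and the matching tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer
import re  # used below to parse the generated text

model_id = 'aisquared/chopt-1_3b'

# Left padding keeps each prompt adjacent to the tokens generated after it
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side = 'left')
# device_map = 'auto' (requires the `accelerate` package) places the weights on the available device(s)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code = True, device_map = 'auto')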
```


### Create the Prompt Format and Other Variables

```python
PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""

END_KEY = '### End'
RESPONSE_KEY = '### Response:\n'
```
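
As a quick check of the template, the sketch below fills in the `{instruction}` placeholder and prints the exact text the model would receive. The instruction string is a hypothetical example chosen only for illustration, and the template is repeated so the snippet runs on its own.

```python
# Same template as above, repeated so this snippet is self-contained.
PROMPT = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""

# Hypothetical instruction, used only for illustration.
example_instruction = "Name three primary colors."

# str.format substitutes the instruction into the {instruction} placeholder.
formatted_prompt = PROMPT.format(instruction=example_instruction)
print(formatted_prompt)
```

The formatted prompt ends with `### Response:` followed by a newline, so generation continues directly from the point where the answer is expected.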


### Create a Function to Retrieve a Response

```python
def create_response(
        instruction,
        model,
        tokenizer,
        do_sample = True,
        max_new_tokens = 256,
        top_p = 0.92,
        top_k = 0,
        **kwargs
):
    """
    Create a response from the model by using a formatted prompt
    """
    # Format the instruction with the training prompt and move the token IDs to the model's device
    input_ids = tokenizer(
        PROMPT.format(instruction=instruction), return_tensors="pt"
    ).input_ids.to(model.device)

    gen_tokens = model.generate(
        input_ids,
        pad_token_id=tokenizer.pad_token_id,
        do_sample=do_sample,
        max_new_tokens=max_new_tokens,
        top_p=top_p,
        top_k=top_k,
        **kwargs,
    )
    decoded = tokenizer.batch_decode(gen_tokens)[0]

    # The response appears after "### Response:".  The model has been trained to append "### End" at the end.
    m = re.search(r"#+\s*Response:\s*(.+?)#+\s*End", decoded, flags=re.DOTALL)
    if m is None:
        # The model might not generate the "### End" sequence before reaching the max tokens.  In this case, return
        # everything after "### Response:".
        m = re.search(r"#+\s*Response:\s*(.+)", decoded, flags=re.DOTALL)

    # Return None when no "### Response:" marker is found at all
    return m.group(1).strip() if m else None
```
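
The parsing step can be tried without loading the model. The following is a minimal sketch that applies the same two regular expressions to hand-written strings standing in for decoded output; `extract_response` is a hypothetical helper name introduced here, and the sample strings are not real generations.

```python
import re


def extract_response(decoded):
    """Hypothetical helper mirroring the parsing step of create_response."""
    # Preferred case: the text between "### Response:" and "### End".
    m = re.search(r"#+\s*Response:\s*(.+?)#+\s*End", decoded, flags=re.DOTALL)
    if m is None:
        # Fallback: "### End" is missing, so take everything after "### Response:".
        m = re.search(r"#+\s*Response:\s*(.+)", decoded, flags=re.DOTALL)
    return m.group(1).strip() if m else None


# Hand-written stand-ins for decoded model output (not real generations).
complete = "### Instruction:\nSay hello.\n\n### Response:\nHello there!\n\n### End"
truncated = "### Instruction:\nSay hello.\n\n### Response:\nHello there, my name is"

print(extract_response(complete))
print(extract_response(truncated))
print(extract_response("no markers here"))
```

With the model and tokenizer loaded as shown earlier, a call such as `create_response('Write a haiku about autumn.', model, tokenizer)` returns the extracted text as a string, or `None` when no `### Response:` marker is found.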

### Model Performance Metrics

We present results from various benchmarks on the EleutherAI LLM Evaluation Harness for all models in the ChOPT family, alongside the
corresponding OPT base models. Model results are sorted by mean score, ascending, to provide an ordering. These metrics further show that
none of the ChOPT models are state of the art; rather, they suggest that chat-like behaviors in LLMs can be trained almost independently of model size.

| Model               |   openbookqa |   arc_easy |   winogrande |   hellaswag |   arc_challenge |     piqa |    boolq |
|:--------------------|-------------:|-----------:|-------------:|------------:|----------------:|---------:|---------:|
| chopt-125m          |        0.178 |   0.443182 |     0.501973 |    0.294165 |        0.197099 | 0.630577 | 0.476758 |
| chopt-research-125m |        0.17  |   0.436027 |     0.503552 |    0.294762 |        0.205631 | 0.62568  | 0.48685  |
| opt-125m            |        0.166 |   0.435606 |     0.501973 |    0.291775 |        0.190273 | 0.6284   | 0.554434 |
| chopt-350m          |        0.178 |   0.450758 |     0.508287 |    0.325334 |        0.21843  | 0.650707 | 0.559633 |
| opt-350m            |        0.176 |   0.441077 |     0.52644  |    0.320056 |        0.207338 | 0.645267 | 0.57737  |
| chopt-research-350m |        0.172 |   0.462542 |     0.514601 |    0.327524 |        0.235495 | 0.643634 | 0.589908 |
| opt-1.3b            |        0.234 |   0.569865 |     0.596685 |    0.414957 |        0.232935 | 0.718172 | 0.577676 |
| chopt-research-1_3b |        0.232 |   0.564815 |     0.59116  |    0.424716 |        0.276451 | 0.713275 | 0.634557 |
| chopt-1_3b          |        0.236 |   0.569444 |     0.584057 |    0.42621  |        0.268771 | 0.723069 | 0.658104 |
| opt-2.7b            |        0.25  |   0.608165 |     0.608524 |    0.458176 |        0.267918 | 0.738303 | 0.603058 |
| chopt-2_7b          |        0.276 |   0.616582 |     0.601421 |    0.472615 |        0.288396 | 0.75136  | 0.552294 |
| chopt-research-2_7b |        0.262 |   0.610269 |     0.625099 |    0.458176 |        0.295222 | 0.742111 | 0.636697 |
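
The ordering by mean score can be reproduced from the table. The sketch below does so for the three 1.3B rows, assuming "mean score" means the unweighted arithmetic mean over the seven benchmarks (the exact averaging is not stated here); the variable names are illustrative.

```python
from statistics import mean

# Scores copied from the table rows for the 1.3B models, in column order:
# openbookqa, arc_easy, winogrande, hellaswag, arc_challenge, piqa, boolq.
scores = {
    "opt-1.3b": [0.234, 0.569865, 0.596685, 0.414957, 0.232935, 0.718172, 0.577676],
    "chopt-research-1_3b": [0.232, 0.564815, 0.59116, 0.424716, 0.276451, 0.713275, 0.634557],
    "chopt-1_3b": [0.236, 0.569444, 0.584057, 0.42621, 0.268771, 0.723069, 0.658104],
}

# Assumption: an unweighted arithmetic mean across all seven benchmarks.
mean_scores = {name: mean(values) for name, values in scores.items()}

# Sort ascending by mean score, matching the ordering used in the table.
ranked = sorted(mean_scores, key=mean_scores.get)
for name in ranked:
    print(f"{name}: {mean_scores[name]:.4f}")
```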