|
--- |
|
datasets: |
|
- Trelis/openassistant-yi |
|
- Trelis/function_calling_v3 |
|
language: |
|
- en |
|
inference: false |
|
extra_gated_prompt: "Purchase access to this repo [HERE](https://buy.stripe.com/bIYfZKbSd81Rf3a29d)!" |
|
tags: |
|
- yi |
|
- long context |
|
- commercial use |
|
- gguf |
|
- function-calling |
|
- function calling |
|
--- |
|
# Function Calling Fine-tuned Yi Chat 200k Context |
|
|
|
Purchase access to this model [here](https://buy.stripe.com/bIYfZKbSd81Rf3a29d). |
|
|
|
This model is fine-tuned for function calling. |
|
- The function metadata format is the same as that used by the OpenAI API.
|
- The model is suitable for commercial use. |
|
- A GGUF version is in the gguf branch. |
|
|
|
> There are AWQ and GPTQ models available in their respective branches, but there are bugs when running inference with both. Therefore, using the main branch with bf16 precision or bitsandbytes NF4 quantization is recommended.
|
|
|
Check out other fine-tuned function calling models [here](https://trelis.com/function-calling/). |
|
|
|
## Quick Server Setup |
|
Runpod one-click template [here](https://runpod.io/gsc?template=j29uypqrc1&ref=jmfkcdio). You must add a Hugging Face Hub access token (`HUGGING_FACE_HUB_TOKEN`) to the environment variables, as this is a gated model.
|
|
|
Runpod Affiliate [Link](https://runpod.io?ref=jmfkcdio) (helps support the Trelis channel). |
|
|
|
## Inference Scripts |
|
See below for the sample prompt format.
|
|
|
Complete inference scripts are available for purchase [here](https://trelis.com/enterprise-server-api-and-inference-guide/): |
|
- Easily format prompts using `tokenizer.apply_chat_template` (starting from OpenAI-formatted functions and a list of messages).

- Automate catching, handling, and chaining of function calls (a rough sketch of the catching step follows below).
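
The purchased scripts are not shown here, but a minimal sketch of the catching step, assuming a hypothetical local `get_current_weather` implementation, might look like this:

```
import json

def get_current_weather(city, format="celsius"):
    # Hypothetical local implementation, for illustration only.
    return {"temperature": "15 C", "condition": "Cloudy"}

AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}

def try_function_call(response_text):
    """Return the function's result if the response is a function call, else None."""
    try:
        call = json.loads(response_text)
        fn = AVAILABLE_FUNCTIONS[call["name"]]
        return fn(**call.get("arguments", {}))
    except (json.JSONDecodeError, KeyError, TypeError):
        return None  # plain-text answer, or an unknown/malformed call
```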
|
|
|
## Prompt Format |
|
```
B_FUNC, E_FUNC = "You have access to the following functions. Use them if required:\n\n", "\n\n"
B_INST, E_INST = "Human: ", " Assistant:" # Yi style for function calling; no trailing space
prompt = f"{B_INST}{B_FUNC}{functionList.strip()}{E_FUNC}{user_prompt.strip()}{E_INST}\n\n"
```
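
For illustration, here is how that format might be filled in, reusing the example weather function from the chat-template section below (the specific values are illustrative assumptions):

```
import json

# Example metadata and question, taken from the chat-template example below.
functionList = json.dumps([{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "This function gets the current weather in a given city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "The city, e.g., San Francisco"}},
            "required": ["city"]
        }
    }
}], indent=4)
user_prompt = "What is the current weather in London?"

B_FUNC, E_FUNC = "You have access to the following functions. Use them if required:\n\n", "\n\n"
B_INST, E_INST = "Human: ", " Assistant:"
prompt = f"{B_INST}{B_FUNC}{functionList.strip()}{E_FUNC}{user_prompt.strip()}{E_INST}\n\n"
print(prompt)
```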
|
|
|
### Using tokenizer.apply_chat_template |
|
For easier prompt construction, you can use the tokenizer's chat template.
|
|
|
Set up `messages`: |
|
```
[
    {
        "role": "function_metadata",
        "content": "FUNCTION_METADATA"
    },
    {
        "role": "user",
        "content": "What is the current weather in London?"
    },
    {
        "role": "function_call",
        "content": "{\n \"name\": \"get_current_weather\",\n \"arguments\": {\n \"city\": \"London\"\n }\n}"
    },
    {
        "role": "function_response",
        "content": "{\n \"temperature\": \"15 C\",\n \"condition\": \"Cloudy\"\n}"
    },
    {
        "role": "assistant",
        "content": "The current weather in London is Cloudy with a temperature of 15 Celsius"
    }
]
```
|
|
|
with `FUNCTION_METADATA` as: |
|
```
[
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "This function gets the current weather in a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city, e.g., San Francisco"
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use."
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_clothes",
            "description": "This function provides a suggestion of clothes to wear based on the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "temperature": {
                        "type": "string",
                        "description": "The temperature, e.g., 15 C or 59 F"
                    },
                    "condition": {
                        "type": "string",
                        "description": "The weather condition, e.g., 'Cloudy', 'Sunny', 'Rainy'"
                    }
                },
                "required": ["temperature", "condition"]
            }
        }
    }
]
```
|
and then apply the chat template to get a formatted prompt: |
|
```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Trelis/Yi-34B-200K-Llamafied-chat-SFT-function-calling-v3', trust_remote_code=True)

prompt = tokenizer.apply_chat_template(messages, tokenize=False)
```
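
From there, a minimal generation sketch (assuming access to the gated repo, sufficient GPU memory, and the `messages` list defined above) might be:

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'Trelis/Yi-34B-200K-Llamafied-chat-SFT-function-calling-v3'
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 is the recommended precision (see note above)
    device_map="auto",
    trust_remote_code=True,
)

prompt = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```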
|
Since this is a gated model, you first need to run:
|
```
pip install huggingface_hub
huggingface-cli login
```
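
Alternatively, you can authenticate programmatically with a token from your Hugging Face account settings:

```
# Programmatic alternative to `huggingface-cli login`.
from huggingface_hub import login

login(token="hf_...")  # replace with your own access token
```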
|
|
|
### Manual Prompt
|
```
Human: You have access to the following functions. Use them if required:

[
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the stock price of an array of stocks",
            "parameters": {
                "type": "object",
                "properties": {
                    "names": {
                        "type": "array",
                        "items": {
                            "type": "string"
                        },
                        "description": "An array of stocks"
                    }
                },
                "required": [
                    "names"
                ]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_big_stocks",
            "description": "Get the names of the largest N stocks by market cap",
            "parameters": {
                "type": "object",
                "properties": {
                    "number": {
                        "type": "integer",
                        "description": "The number of largest stocks to get the names of, e.g. 25"
                    },
                    "region": {
                        "type": "string",
                        "description": "The region to consider, can be \"US\" or \"World\"."
                    }
                },
                "required": [
                    "number"
                ]
            }
        }
    }
]

Get the names of the five largest stocks by market cap Assistant:

{
    "name": "get_big_stocks",
    "arguments": {
        "number": 5
    }
}<|endoftext|>
```
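
To chain calls using the chat template instead, one approach (a sketch reusing the roles shown earlier, with a hypothetical function result) is to append the model's call and your function's output to `messages`, then re-apply the template and generate again:

```
import json

# The model's function call (from the completion above) and a hypothetical result.
model_call = '{\n    "name": "get_big_stocks",\n    "arguments": {\n        "number": 5\n    }\n}'
function_result = {"names": ["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA"]}  # hypothetical

messages.append({"role": "function_call", "content": model_call})
messages.append({"role": "function_response", "content": json.dumps(function_result, indent=4)})

prompt = tokenizer.apply_chat_template(messages, tokenize=False)  # then generate as before
```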
|
|
|
# Dataset |
|
See [Trelis/function_calling_v3](https://huggingface.co/datasets/Trelis/function_calling_v3). |
|
|
|
# License |
|
This model may be used commercially for inference according to the terms of the Yi license, or for further fine-tuning and inference. Users may not re-publish or re-sell this model in the same or derivative form (including fine-tunes). |
|
|
|
**The SFT chat fine-tuned model's repo card follows below.**
|
# ✨ Yi 200k context SFT models |
|
These are chat fine-tuned versions of the Yi 200k context length models: |
|
- Supervised Fine-tuning allows the model to respond in a cleaner chat format that ends with EOS tokens. |
|
- Note that this is a fine-tune of the llamafied model, meaning that all llama platforms can be used for inference. |
|
|
|
Available models: |
|
- Purchase access to the 6B model [here](https://buy.stripe.com/9AQ00M5tP3LBg7e00J) |
|
- Purchase access to the 34B model [here](https://buy.stripe.com/28o00Mf4p81RaMUdRA) |
|
|
|
GGUF models are in the base model repos (along with the bf16 weight safetensors). AWQ models are in the '-AWQ' repos (34B AWQ will be released by EOD 20 Nov 2023). When you purchase access, you get access to all model variants for that model size. |
|
|
|
Notably: |
|
- The data used for fine-tuning is Apache 2.0 licensed and was not generated using AI, which allows this chat model to be used commercially. This is particularly useful for preparing and generating data to train other models.
|
- The purchase of access to this model grants the user permission to use the model commercially for inference or fine-tuning and inference. |
|
|
|
## Prompt format
|
```
# Yi style
B_INST, E_INST = "Human: ", " Assistant:"
prompt = f"{B_INST}{user_prompt.strip()}{E_INST}"
```
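
As a minimal sketch, this can be wrapped in a helper:

```
def format_prompt(user_prompt: str) -> str:
    """Wrap a single user turn in the Yi-style chat format above."""
    B_INST, E_INST = "Human: ", " Assistant:"
    return f"{B_INST}{user_prompt.strip()}{E_INST}"

print(format_prompt("Summarise the Yi license in one sentence."))
# -> Human: Summarise the Yi license in one sentence. Assistant:
```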
|
|
|
**The original model card follows below.**
|
|
|
Llamafied version of 01-ai's [Yi-6B-200k](https://huggingface.co/01-ai/Yi-6B-200K) for ease of use. |
|
|
|
## Model Performance |
|
|
|
| Model | MMLU (5-shot) | CMMLU (5-shot) | C-Eval (5-shot) | GAOKAO (0-shot) | BBH (3-shot@1) | Common-sense Reasoning | Reading Comprehension | Math & Code |
| :------------ | :------: | :------: | :------: | :------: | :------: | :--------------------: | :-------------------: | :---------: |
| LLaMA2-34B | 62.6 | - | - | - | 44.1 | 69.9 | 68.0 | 26.0 |
| LLaMA2-70B | 68.9 | 53.3 | - | 49.8 | 51.2 | 71.9 | 69.4 | 36.8 |
| Baichuan2-13B | 59.2 | 62.0 | 58.1 | 54.3 | 48.8 | 64.3 | 62.4 | 23.0 |
| Qwen-14B | 66.3 | 71.0 | 72.1 | 62.5 | 53.4 | 73.3 | 72.5 | **39.8** |
| Skywork-13B | 62.1 | 61.8 | 60.6 | 68.1 | 41.7 | 72.4 | 61.4 | 24.9 |
| InternLM-20B | 62.1 | 59.0 | 58.8 | 45.5 | 52.5 | 78.3 | - | 30.4 |
| Aquila-34B | 67.8 | 71.4 | 63.1 | - | - | - | - | - |
| Falcon-180B | 70.4 | 58.0 | 57.8 | 59.0 | 54.0 | 77.3 | 68.8 | 34.0 |
| Yi-6B | 63.2 | 75.5 | 72.0 | 72.2 | 42.8 | 72.3 | 68.7 | 19.8 |
| Yi-6B-200K | 64.0 | 75.3 | 73.5 | 73.9 | 42.0 | 72.0 | 69.1 | 19.0 |
| **Yi-34B** | **76.3** | **83.7** | 81.4 | 82.8 | **54.3** | **80.1** | 76.4 | 37.1 |
| Yi-34B-200K | 76.1 | 83.6 | **81.9** | **83.4** | 52.7 | 79.7 | **76.6** | 36.3 |
|
|
|
While benchmarking open-source models, we have observed a disparity between the results generated by our pipeline and those reported in public sources (e.g. OpenCompass). Upon conducting a more in-depth investigation of this difference, we have discovered that various models may employ different prompts, post-processing strategies, and sampling techniques, potentially resulting in significant variations in the outcomes. Our prompt and post-processing strategy remains consistent with the original benchmark, and greedy decoding is employed during evaluation without any post-processing for the generated content. For scores that were not reported by the original authors (including scores reported with different settings), we try to get results with our pipeline.
|
|
|
To evaluate the model's capability extensively, we adopted the methodology outlined in Llama2. Specifically, we included PIQA, SIQA, HellaSwag, WinoGrande, ARC, OBQA, and CSQA to assess common sense reasoning. SQuAD, QuAC, and BoolQ were incorporated to evaluate reading comprehension. CSQA was exclusively tested using a 7-shot setup, while all other tests were conducted with a 0-shot configuration. Additionally, we introduced GSM8K (8-shot@1), MATH (4-shot@1), HumanEval (0-shot@1), and MBPP (3-shot@1) under the category "Math & Code". Due to technical constraints, we did not test Falcon-180B on QuAC and OBQA; the score is derived by averaging the scores on the remaining tasks. Since the scores for these two tasks are generally lower than the average, we believe that Falcon-180B's performance was not underestimated.
|
|
|
## Usage |
|
|
|
Please visit our [GitHub repository](https://github.com/01-ai/Yi) for general guidance on how to use this model.
|
|
|
## Disclaimer |
|
|
|
Although we use data compliance checking algorithms during the training process to ensure the compliance of the trained model to the best of our ability, due to the complexity of the data and the diversity of language model usage scenarios, we cannot guarantee that the model will generate correct and reasonable output in all scenarios. Please be aware that there is still a risk of the model producing problematic outputs. We will not be responsible for any risks and issues resulting from misuse, misguidance, illegal usage, and related misinformation, as well as any associated data security concerns.
|
|
|
## License |
|
|
|
The Yi series models are fully open for academic research and free commercial usage with permission via applications. All usage must adhere to the [Model License Agreement 2.0](https://huggingface.co/01-ai/Yi-6B-200K/blob/main/LICENSE). To apply for the official commercial license, please contact us ([[email protected]](mailto:[email protected])).
|
|