Model Summary

phi-2-tool-use is a fine-tuned version of Phi-2 for function calling. The model was fine-tuned on the public function-calling dataset glaiveai/glaive-function-calling-v2.

The purpose of the experiment is to understand the quality of the pre-trained Phi-2 model. phi-2-tool-use can generalize to call simple tools/functions not seen during fine-tuning.

Decoding

Format your prompt as

"""SYSTEM: {system_content}\n\nUSER: {user_content} {eos_token} ASSISTANT:"""

where system_content is the system message containing a description of the tool/function as a JSON schema, user_content is the user message, and eos_token is the tokenizer's EOS token. The model can handle multi-turn dialogue, as it was trained on such data.
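
As a minimal sketch of the template above, the helper below assembles a single-turn prompt from a tool schema and a user message. The names `build_prompt` and `SYSTEM_PREFIX` are hypothetical; the prefix sentence is copied from the example in this card, and `<|endoftext|>` is assumed as the EOS token because that example uses it. Note that `json.dumps` produces different whitespace than the hand-written schema in the example; whether the model is sensitive to this has not been checked here.

```python
import json

# Instruction sentence taken from the example prompt in this model card.
SYSTEM_PREFIX = (
    "You are a helpful assistant with access to the following functions. "
    "Use them if required - "
)


def build_prompt(tool_schema: dict, user_content: str,
                 eos_token: str = "<|endoftext|>") -> str:
    """Assemble a single-turn prompt in the SYSTEM/USER/ASSISTANT format.

    eos_token defaults to "<|endoftext|>" (an assumption based on the
    example prompt); pass tokenizer.eos_token to be safe.
    """
    system_content = SYSTEM_PREFIX + json.dumps(tool_schema)
    return f"SYSTEM: {system_content}\n\nUSER: {user_content} {eos_token} ASSISTANT:"


if __name__ == "__main__":
    schema = {
        "name": "get_exchange_rate",
        "description": "Get the exchange rate between two currencies",
        "parameters": {
            "type": "object",
            "properties": {
                "base_currency": {"type": "string"},
                "target_currency": {"type": "string"},
            },
            "required": ["base_currency", "target_currency"],
        },
    }
    print(build_prompt(schema, "Convert 100 USD to CAD"))
```

The resulting string can be passed to the tokenizer in place of a hand-written prompt.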

Here's a full-fledged example:

import torch
import transformers

model_name_or_path = "lxuechen/phi-2-tool-use"
model: transformers.PreTrainedModel = transformers.AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    low_cpu_mem_usage=True,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16
)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name_or_path)

input_text = """SYSTEM: You are a helpful assistant with access to the following functions. Use them if required - { "name": "get_exchange_rate", "description": "Get the exchange rate between two currencies", "parameters": { "type": "object", "properties": { "base_currency": { "type": "string", "description": "The currency to convert from" }, "target_currency": { "type": "string", "description": "The currency to convert to" } }, "required": [ "base_currency", "target_currency" ] } }\n\nUSER: Convert 100 USD to CAD <|endoftext|> ASSISTANT:"""

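# Tokenize the prompt and move the tensors to the model's device.
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(
    inputs["input_ids"],
    max_length=1024,  # upper bound on prompt + generated tokens
    temperature=0.7,
    top_p=0.9,
    do_sample=True,  # sample using the temperature/top_p settings above
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
# The decoded text contains the prompt followed by the model's reply.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))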
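
The decoded output above includes the prompt, so it can be convenient to keep only the model's reply. The sketch below is one possible way to do that on plain strings; `extract_completion` is a hypothetical helper, not part of transformers. It assumes the `ASSISTANT:` marker survives decoding unchanged, and it counts the markers in the prompt so that any `ASSISTANT:` text the model generates itself stays in the reply.

```python
def extract_completion(decoded: str, prompt: str, marker: str = "ASSISTANT:") -> str:
    """Return the text generated after the prompt's final `marker`.

    The prompt cannot simply be stripped as a prefix, because decoding with
    skip_special_tokens=True removes the EOS token from it.
    """
    n_markers = prompt.count(marker)
    if n_markers == 0:
        raise ValueError(f"prompt does not contain {marker!r}")
    # Split only on the markers that belong to the prompt.
    parts = decoded.split(marker, n_markers)
    if len(parts) <= n_markers:
        raise ValueError("decoded text does not contain the prompt's markers")
    return parts[-1].strip()


if __name__ == "__main__":
    prompt = "SYSTEM: tools\n\nUSER: Convert 100 USD to CAD <|endoftext|> ASSISTANT:"
    # Illustrative decoded string (not real model output).
    decoded = "SYSTEM: tools\n\nUSER: Convert 100 USD to CAD  ASSISTANT: example reply"
    print(extract_completion(decoded, prompt))
```

An alternative is to slice the generated token ids at the prompt length before decoding, which avoids string matching altogether.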

Training

The model was fine-tuned with supervised fine-tuning (SFT) on glaiveai/glaive-function-calling-v2.

Hyperparameters:

  • learning rate: peak of 2e-5, with linear warmup over the first 3% of training followed by cosine decay
  • epochs: 2
  • batch size: 64
  • context length: 2048
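
To make the learning-rate schedule in the list above concrete, here is a minimal sketch of linear warmup followed by cosine decay. It is an illustration, not the training code used for this model: the function name `lr_at_step` is hypothetical, and it assumes that warmup starts from zero, that the 3% refers to optimizer steps, and that the cosine decays all the way to zero.

```python
import math


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 2e-5, warmup_ratio: float = 0.03) -> float:
    """Learning rate at `step` for linear warmup + cosine decay.

    Assumptions (not stated in the model card): warmup starts at 0 and
    the cosine decays to 0 at `total_steps`.
    """
    warmup_steps = max(1, round(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak.
        return peak_lr * step / warmup_steps
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # illustrative step count, not the real training length
    for s in (0, 15, 30, 515, 1000):
        print(s, lr_at_step(s, total))
```

With 1000 total steps, the rate rises linearly for the first 30 steps, peaks at 2e-5, and then follows half a cosine period down to zero.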

Model size: 2.78B params (Safetensors, tensor type FP16)