---
language:
  - code
pipeline_tag: text-generation
tags:
  - llama-2
license: llama2
---

Opencsg-CodeLlama-13b-v0.1 [中文] [English]

OpenCSG

[OpenCSG Community] [github] [wechat] [Twitter]

OpenCSG stands for Converged resources, Software refined, and Generative LM. The 'C' represents Converged resources, indicating the integration and full utilization of hybrid resources. The 'S' stands for Software refined, signifying software that is refined by large models. The 'G' represents Generative LM, which denotes widespread, inclusive, and democratized generative large models.

The vision of OpenCSG is to empower every industry, every company, and every individual to own their models. We adhere to the principles of openness and open source, making the large model software stack of OpenCSG available to the community. We welcome everyone to use our models, give feedback, and contribute collaboratively.

Model Description

CodeLlama is a collection of pretrained and fine-tuned generative text models derived from Llama 2, ranging in scale from 7 billion to 34 billion parameters. Based on CodeLlama, opencsg-CodeLlama-v0.1 is a series of models fine-tuned with a full-parameter fine-tuning method.

This is the repository for the 13B version, fine-tuned from CodeLlama-13b-hf.
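For reference, the snippet below is a minimal sketch of what full-parameter fine-tuning of the base model could look like with the Hugging Face Trainer. The dataset path, sequence length, and hyperparameters are illustrative placeholders only, not the configuration actually used to train opencsg-CodeLlama-13b-v0.1.

# Minimal full-parameter fine-tuning sketch with the Hugging Face Trainer.
# Dataset and hyperparameters below are illustrative placeholders only.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "codellama/CodeLlama-13b-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Hypothetical multi-language code dataset with a "text" column.
dataset = load_dataset("json", data_files="code_sft.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="codellama-13b-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()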

Model Eval

HumanEval is the most common code generation benchmark for evaluating model performance, especially on the completion of code exercises. To some extent, model evaluation is a kind of metaphysics: different models are sensitive to different decoding methods, parameters, and instructions. It is impractical for us to hand-tune a specific configuration for each fine-tuned model, because a real LLM should retain its general capability regardless of how users set the parameters.

Thus, OpenCSG worked hard to provide a relatively fair method to compare the fine-tuned models on the HumanEval benchmark. To simplify the comparison, we chose the Pass@1 metric for the Python language, even though our fine-tuning dataset includes samples in multiple languages.

For fairness, we evaluated both the fine-tuned and the original CodeLlama models using only the original prompts of the benchmark cases, without any additional instructions.

In addition, we used greedy decoding for every model during the evaluation.
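As a reference point, the following is a minimal sketch of this evaluation protocol (greedy Pass@1 generation on the original HumanEval prompts), assuming the openai/human-eval package is installed; it is not the exact harness we used.

# Minimal sketch: greedy Pass@1 generation on HumanEval prompts.
# Assumes the openai/human-eval package and a local GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from human_eval.data import read_problems, write_jsonl

model_id = "opencsg/opencsg-CodeLlama-13b-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

samples = []
for task_id, problem in read_problems().items():
    # Use the original HumanEval prompt only, with no extra instruction.
    inputs = tokenizer(problem["prompt"], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, do_sample=False, max_new_tokens=256)  # greedy decoding
    completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                  skip_special_tokens=True)
    samples.append({"task_id": task_id, "completion": completion})

write_jsonl("samples.jsonl", samples)
# Score afterwards with: evaluate_functional_correctness samples.jsonl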

| Model                           | HumanEval Python Pass@1 |
| ------------------------------- | ----------------------- |
| CodeLlama-7b-hf                 | 30.5%                   |
| opencsg-CodeLlama-7b-v0.1 (4k)  | 42.7%                   |
| CodeLlama-13b-hf                | 36.0%                   |
| opencsg-CodeLlama-13b-v0.1 (4k) | 45.1%                   |
| CodeLlama-34b-hf                | 48.2%                   |
| opencsg-CodeLlama-34b-v0.1 (4k) | 48.8%                   |

TODO

  • We will provide more benchmark scores for the fine-tuned models in the future.
  • We will provide practical problems from the field of software engineering to evaluate the performance of the fine-tuned models.

Model Usage

from transformers import AutoTokenizer
import transformers
import torch

model = "opencsg/opencsg-CodeLlama-13b-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
input_text = "#write a quick sort algorithm."
sequences = pipeline(
    input_text,
    do_sample=False,  # greedy decoding, consistent with the evaluation setup
    top_k=10,
    temperature=0.1,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=256,
)
for seq in sequences:
    print(seq['generated_text'][len(input_text):])

Training

Basic Model

CodeLlama-13b-hf

Hardware

  • GPUs: 8 Tesla A800
  • Training time: 4 hours

Software

License