library_name: peft
license: apache-2.0
datasets:
  - Abirate/english_quotes
language:
  - en
pipeline_tag: text-generation
tags:
  - text-generation-inference

hipnologo/GPT-Neox-20b-QLoRA-FineTune-english_quotes_dataset

Training procedure

The following bitsandbytes quantization config was used during training:

  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: bfloat16
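
To give a feel for what the 4-bit settings above mean for memory, here is a rough back-of-the-envelope sketch. It is not taken from the training run. It assumes about 20e9 base parameters, a quantization block size of 64, and a nested block size of 256 for double quantization (the values described in the QLoRA paper). It also ignores layers that stay in higher precision, activations, and the LoRA weights themselves. The function names are made up for illustration.

```python
# Rough estimate of weight storage under 4-bit quantization.
# Assumptions (not from this model card): ~20e9 parameters, block size 64,
# nested block size 256 for double quantization.

def bits_per_param(double_quant: bool, block_size: int = 64,
                   nested_block_size: int = 256) -> float:
    """Average storage bits per quantized weight, including scaling constants."""
    weight_bits = 4.0
    if double_quant:
        # One 8-bit constant per block, plus one fp32 constant per nested block.
        overhead = 8 / block_size + 32 / (block_size * nested_block_size)
    else:
        # One fp32 scaling constant per block.
        overhead = 32 / block_size
    return weight_bits + overhead


def weight_gib(n_params: float, bits: float) -> float:
    """Convert a parameter count and bits per parameter to GiB."""
    return n_params * bits / 8 / 2**30


N_PARAMS = 20e9  # assumed nominal size of the base model

for dq in (False, True):
    bits = bits_per_param(dq)
    print(f"double_quant={dq}: {bits:.3f} bits/param, "
          f"~{weight_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions, bnb_4bit_use_double_quant saves roughly 0.37 bits per parameter compared with plain 4-bit block quantization.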

Model description

This model is a fine-tuned version of EleutherAI/gpt-neox-20b, trained on the Abirate/english_quotes dataset with QLoRA (LoRA adapters on top of a 4-bit quantized base model) using the PEFT library.

How to use

The code below performs the following steps:

  1. Imports the necessary libraries: torch and classes from the transformers library.
  2. Specifies the model_id as "hipnologo/GPT-Neox-20b-QLoRA-FineTune-english_quotes_dataset".
  3. Defines a BitsAndBytesConfig object named bnb_config with the following configuration:
    • load_in_4bit set to True
    • bnb_4bit_use_double_quant set to True
    • bnb_4bit_quant_type set to "nf4"
    • bnb_4bit_compute_dtype set to torch.bfloat16
  4. Initializes an AutoTokenizer object named tokenizer by loading the tokenizer for the specified model_id.
  5. Initializes an AutoModelForCausalLM object named model by loading the pre-trained model for the specified model_id, passing bnb_config as the quantization_config. The argument device_map={"":0} places the whole model on GPU 0 (cuda:0).
  6. Defines a variable text with the value "Twenty years from now".
  7. Defines a variable device with the value "cuda:0", representing the device on which the model will be executed.
  8. Encodes the text using the tokenizer and converts it to a PyTorch tensor, assigning it to the inputs variable. The tensor is moved to the specified device.
  9. Generates text using the model.generate method by passing the inputs tensor and setting the max_new_tokens parameter to 20. The generated output is assigned to the outputs variable.
  10. Decodes the outputs tensor using the tokenizer to obtain the generated text without special tokens, and assigns it to the generated_text variable.
  11. Prints the generated_text.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "hipnologo/GPT-Neox-20b-QLoRA-FineTune-english_quotes_dataset"

# 4-bit NF4 quantization with double quantization; computation runs in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map={"": 0} places the entire model on GPU 0
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})

text = "Twenty years from now"
device = "cuda:0"
# Tokenize the prompt and move the tensors to the same GPU as the model
inputs = tokenizer(text, return_tensors="pt").to(device)

# Generate up to 20 new tokens, then decode them back to text
outputs = model.generate(**inputs, max_new_tokens=20)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
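
The snippet above passes the repository ID straight to AutoModelForCausalLM. If that does not work in your environment, one possible alternative is to load the base model first and attach the adapter with PEFT. This is a sketch that assumes the repository contains only LoRA adapter weights and that the tokenizer of EleutherAI/gpt-neox-20b is the right one to use. Neither is confirmed by this card. The function name load_with_adapter is hypothetical.

```python
ADAPTER_ID = "hipnologo/GPT-Neox-20b-QLoRA-FineTune-english_quotes_dataset"
BASE_MODEL_ID = "EleutherAI/gpt-neox-20b"  # base model named in the model description


def load_with_adapter(adapter_id: str = ADAPTER_ID, base_id: str = BASE_MODEL_ID):
    """Load the 4-bit base model on GPU 0 and attach the LoRA adapter (assumed layout)."""
    # Heavy dependencies are imported here so that importing this file stays cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Same quantization settings as listed under "Training procedure".
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    # Assumption: the adapter was trained with the base model's tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_id, quantization_config=bnb_config, device_map={"": 0}
    )

    # Wrap the quantized base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling load_with_adapter() downloads the full base model and needs a CUDA GPU with enough memory. The returned tokenizer and model can then be used with the same tokenizer(...) and model.generate(...) calls as in the snippet above.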

Framework versions

  • PEFT 0.4.0.dev0

Trainable parameters

  • trainable params: 8650752
  • all params: 10597552128
  • trainable%: 0.08162971878329976
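
The percentage follows directly from the two counts. The short check below reproduces it. The helper name is made up for illustration. Note that the reported total is roughly half of the base model's nominal 20B parameters. A plausible explanation, not confirmed by this card, is that 4-bit weights are stored packed, so fewer elements are counted.

```python
# Counts as reported in this model card.
trainable_params = 8_650_752
all_params = 10_597_552_128


def trainable_percentage(trainable: int, total: int) -> float:
    """Share of parameters that receive gradient updates, in percent."""
    return 100 * trainable / total


pct = trainable_percentage(trainable_params, all_params)
print(f"trainable%: {pct:.4f}")
```

Only about 0.08% of the counted parameters are updated during fine-tuning. The rest of the base model stays frozen.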

License

This model is licensed under Apache 2.0. Please see the LICENSE for more information.