Introducing GenZ Infinite

The model is a fine-tuned version of Genz-13B-v2 with a context size of 16K tokens. Its attention is replaced with the Λ-shaped ("lambda") attention from the LM-Infinite paper, which lets the model handle sequences of 120K+ tokens without degrading perplexity.
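As a rough illustration of the idea (not the repo's implementation), lambda attention restricts each query to two branches: a small "global" set of tokens at the very start of the sequence and a "local" window of the most recent tokens. LM-Infinite additionally caps the relative distance seen by the position encoding. The sketch below builds such a mask in plain Python. The function names are hypothetical, and whether the window bound is inclusive is an assumption.

```python
def lambda_mask(seq_len: int, n_global: int, n_local: int) -> list[list[bool]]:
    """Return mask[i][j] == True if query i may attend to key j (hypothetical helper)."""
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i              # never attend to future tokens
            is_global = j < n_global     # the first n_global tokens are always visible
            is_local = i - j < n_local   # the n_local most recent tokens, including i itself
            row.append(causal and (is_global or is_local))
        mask.append(row)
    return mask


def capped_distance(i: int, j: int, limit_distance: int) -> int:
    """Relative distance given to the position encoding, clipped at limit_distance (assumed behavior)."""
    return min(i - j, limit_distance)


if __name__ == "__main__":
    # Print the mask: 'x' = attended, '.' = masked out
    for row in lambda_mask(seq_len=8, n_global=2, n_local=3):
        print("".join("x" if allowed else "." for allowed in row))
```

The printed pattern has a column of always-visible leading tokens plus a diagonal band, the "Λ" shape. Because of this shape, the number of keys each query attends to stays bounded as the input grows.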

Generate responses

Use the generate.py script from the GitHub repo:

python generate.py --base_model budecosystem/genz-13b-infinite

To use the model in your own code, load it and convert it with the convert_llama_model function:

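import torch
from transformers import GenerationConfig, AutoModelForCausalLM, AutoTokenizer
# convert_llama_model comes from model/llama.py in the GitHub repo
from model.llama import convert_llama_model

# Lambda-attention settings
local_branch = 2048    # size of the local window of recent tokens
global_branch = 10     # number of leading tokens every query can always attend to
limit_distance = 2048  # distance limit; not passed to convert_llama_model in this snippet

tokenizer = AutoTokenizer.from_pretrained("budecosystem/genz-13b-infinite")
model = AutoModelForCausalLM.from_pretrained(
    "budecosystem/genz-13b-infinite",
    torch_dtype=torch.float16,
    device_map="auto",
)
# Patch the LLaMA attention layers to use lambda attention
model = convert_llama_model(model, local_branch, global_branch)

# Generate a response (replace the placeholder with your prompt)
inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)
output = model.generate(**inputs, generation_config=GenerationConfig(max_new_tokens=256))
print(tokenizer.decode(output[0], skip_special_tokens=True))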
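To see why local_branch and global_branch bound the cost of long inputs, the sketch below counts how many keys each query attends to. It assumes a causal Λ-shaped mask in which query i sees the global_branch leading tokens plus the local_branch most recent tokens (including itself). This is an illustrative model of the mask, not code from the repo. The helper names are hypothetical.

```python
LOCAL_BRANCH = 2048  # value used in the snippet above
GLOBAL_BRANCH = 10   # value used in the snippet above


def keys_per_query(i: int, local_branch: int, global_branch: int) -> int:
    """Keys visible to 0-based query position i under the assumed lambda mask."""
    local = min(i + 1, local_branch)  # recent tokens, including i itself
    # Leading tokens that are not already inside the local window
    extra_global = max(0, min(global_branch, i - local_branch + 1))
    return local + extra_global


def total_attention_pairs(seq_len: int, local_branch: int, global_branch: int) -> int:
    """Total (query, key) pairs computed for a sequence of seq_len tokens."""
    return sum(keys_per_query(i, local_branch, global_branch) for i in range(seq_len))


if __name__ == "__main__":
    for seq_len in (4096, 16384, 120_000):
        full = seq_len * (seq_len + 1) // 2  # standard causal attention
        lam = total_attention_pairs(seq_len, LOCAL_BRANCH, GLOBAL_BRANCH)
        print(f"{seq_len:>7} tokens: full={full:,} lambda={lam:,} ratio={lam / full:.3f}")
```

Under this assumption, each query attends to at most local_branch + global_branch = 2058 keys. Total attention work therefore grows linearly with sequence length rather than quadratically.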

Evaluation

Passkey retrieval results at different input lengths (in tokens):

Task                4096   5120   8192   16384
Passkey retrieval   100    75     48     30
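For context, a passkey retrieval test hides a short number inside long filler text and asks the model to repeat it. The exact prompt and scoring behind the results above are not given here. The sketch below follows the filler-and-needle format commonly used for this test, and it assumes exact-match scoring on the first number the model generates. All names are hypothetical.

```python
import random
import re

TASK = ("There is an important info hidden inside a lot of irrelevant text. "
        "Find it and memorize them. I will quiz you about the important information there.")
FILLER = "The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again."
QUESTION = "What is the pass key? The pass key is"


def build_passkey_prompt(n_before: int, n_after: int, pass_key: int) -> str:
    """Place the passkey sentence between n_before and n_after copies of the filler."""
    needle = f"The pass key is {pass_key}. Remember it. {pass_key} is the pass key."
    parts = [TASK] + [FILLER] * n_before + [needle] + [FILLER] * n_after + [QUESTION]
    return " ".join(parts)


def is_correct(completion: str, pass_key: int) -> bool:
    """Score a completion: the first integer it contains must equal the passkey."""
    match = re.search(r"\d+", completion)
    return match is not None and int(match.group()) == pass_key


if __name__ == "__main__":
    key = random.randint(10000, 99999)
    prompt = build_passkey_prompt(n_before=50, n_after=50, pass_key=key)
    print(len(prompt.split()), "words")
```

To target a specific column of the table (for example 8192 tokens), increase the filler counts until the tokenized prompt reaches the desired length.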

Training details

The model was trained on 4 A100 80GB GPUs for approximately 55 hours.

Hyperparameter                Value
per_device_train_batch_size   1
gradient_accumulation_steps   1
epochs                        3
steps                         8550
learning_rate                 2e-4
lr_scheduler_type             cosine
warmup_steps                  1000
optimizer                     adamw
fp16                          True
GPUs                          4 × A100 80GB
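The table implies a few derived quantities. The sketch below computes them, assuming that the 4 GPUs run data-parallel, that 8550 is the total number of optimizer steps across all 3 epochs, and that the cosine schedule matches the Hugging Face Trainer's default (linear warmup, then cosine decay to zero). These are assumptions, not details stated in this card.

```python
import math

# Values from the hyperparameter table
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 1
NUM_GPUS = 4
EPOCHS = 3
TOTAL_STEPS = 8550
WARMUP_STEPS = 1000
PEAK_LR = 2e-4
TRAIN_HOURS = 55  # approximate, from the text above

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS  # sequences per optimizer step
steps_per_epoch = TOTAL_STEPS // EPOCHS
examples_per_epoch = steps_per_epoch * effective_batch
seconds_per_step = TRAIN_HOURS * 3600 / TOTAL_STEPS         # rough average


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed Trainer-style schedule)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(effective_batch, steps_per_epoch, examples_per_epoch)  # 4 2850 11400
    print(f"~{seconds_per_step:.1f} s per step")
    for s in (0, 500, 1000, 4775, 8550):
        print(s, f"{lr_at(s):.2e}")
```

Under these assumptions, each epoch covers about 11,400 sequences. The learning rate peaks at 2e-4 at step 1000 and falls to half that value midway through the decay phase (step 4775).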

Acknowledgments

We'd like to thank the open-source community and the researchers whose foundational work paved the way for this model. Special shoutout to the authors of the LM-Infinite paper and its GitHub repo.
