metadata
library_name: transformers
license: apache-2.0

The following content is mostly from https://huggingface.co/state-spaces/mamba-2.8b-hf

Mamba

This repository contains the transformers-compatible mamba-2.8b-ultrachat. The checkpoints are untouched, but the full config.json and tokenizer are pushed to this repo. For details of the original model before conversion, see https://huggingface.co/xiuyul/mamba-2.8b-ultrachat.

Usage

You need to install transformers from main until transformers=4.39.0 is released.

pip install git+https://github.com/huggingface/transformers@main
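To confirm that the installed transformers build is recent enough, a small check along these lines may help. This is a minimal sketch: the helper names are hypothetical, and the version parsing is a simplification that only compares the leading numeric components, so a development build such as 4.39.0.dev0 installed from main counts as 4.39.0.

```python
from importlib import metadata


def release_tuple(version):
    """Leading numeric components of a version string, e.g. '4.39.0.dev0' -> (4, 39, 0)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed_version, minimum="4.39.0"):
    """True if the installed version is at least the minimum (numeric parts only)."""
    return release_tuple(installed_version) >= release_tuple(minimum)


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None
```

Calling `meets_minimum(installed_transformers_version())` after a successful install should return True. Check for `None` first if transformers might be missing.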

We also recommend installing both causal-conv1d and mamba-ssm:

pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm

If either of the two is not installed, the "eager" implementation will be used. Otherwise, the more optimised CUDA kernels will be used.
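To see in advance which path to expect, you can check whether both optional packages are importable. This is only a sketch of the install condition described above: the helper name is hypothetical, the import names `mamba_ssm` and `causal_conv1d` are assumed to match the pip packages, and the check says nothing about whether a CUDA device is actually available.

```python
from importlib.util import find_spec


def optional_kernels_installed(modules=("mamba_ssm", "causal_conv1d")):
    """True only if every listed top-level module can be found without importing it."""
    return all(find_spec(name) is not None for name in modules)


if optional_kernels_installed():
    print("Both optional kernel packages are installed.")
else:
    print("At least one package is missing; expect the eager implementation.")
```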

Generation

You can use the classic generate API:

>>> from transformers import MambaForCausalLM, AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("han1997/mamba-2.8b-ultrachat-hf")
>>> model = MambaForCausalLM.from_pretrained("han1997/mamba-2.8b-ultrachat-hf")
>>> input_ids = tokenizer("Hey how are you doing?", return_tensors="pt")["input_ids"]

>>> out = model.generate(input_ids, max_new_tokens=10)
>>> print(tokenizer.batch_decode(out))
["Hey how are you doing?\n\nI'm doing great.\n\nI"]
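The decoded output above echoes the prompt, because generate returns the prompt tokens followed by the new ones. If you only want the continuation, one option is to slice off the prompt tokens before decoding. The helper below is a hypothetical sketch, not part of the transformers API. It assumes a single prompt and relies only on the tokenizer call, `generate`, and `decode` already used above.

```python
def generate_continuation(model, tokenizer, prompt, max_new_tokens=10):
    """Return only the newly generated text, without the echoed prompt."""
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    prompt_len = len(input_ids[0])  # number of tokens in the single prompt sequence
    out = model.generate(input_ids, max_new_tokens=max_new_tokens)
    new_tokens = out[0][prompt_len:]  # drop the prompt tokens from the output
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the model and tokenizer loaded as above, `generate_continuation(model, tokenizer, "Hey how are you doing?")` would return just the text that follows the question.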

PEFT finetuning example

To fine-tune with the peft library, we recommend keeping the model in float32.

from datasets import load_dataset
from trl import SFTTrainer
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments

# Load the tokenizer and model; the model is not cast to half precision (see the float32 note above)
tokenizer = AutoTokenizer.from_pretrained("han1997/mamba-2.8b-ultrachat-hf")
model = AutoModelForCausalLM.from_pretrained("han1997/mamba-2.8b-ultrachat-hf")

# Small quotes dataset used as example training data
dataset = load_dataset("Abirate/english_quotes", split="train")

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    logging_dir="./logs",
    logging_steps=10,
    learning_rate=2e-3,
)

# Rank-8 LoRA adapters on the listed Mamba modules
lora_config = LoraConfig(
    r=8,
    target_modules=["x_proj", "embeddings", "in_proj", "out_proj"],
    task_type="CAUSAL_LM",
    bias="none",
)

# SFTTrainer applies the LoRA config to the model and trains on the "quote" text column
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    peft_config=lora_config,
    train_dataset=dataset,
    dataset_text_field="quote",
)
trainer.train()
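To get a feel for what r=8 means in the LoRA config above, the sketch below counts the parameters a single adapter adds. It uses the standard LoRA factorisation, where a layer mapping in_features to out_features gets two small matrices of shapes (r x in_features) and (out_features x r). The layer shape in the example is hypothetical and chosen only for illustration; it is not read from this model's config.

```python
def lora_added_params(in_features, out_features, r=8):
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


# Hypothetical layer shape, for illustration only
in_features, out_features = 1024, 2048
full = in_features * out_features
added = lora_added_params(in_features, out_features, r=8)
print(f"adapter adds {added} parameters, {added / full:.2%} of the frozen weight matrix")
```

Doubling r doubles the adapter size, so the rank is the main knob for trading memory against adapter capacity.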