leafspark
/

IridiumLlama-72B-v0.1

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

IridiumLlama-72B-v0.1 / README.md

leafspark's picture

docs: add model card

3a58e69 verified 6 months ago

|

1.74 kB

metadata

license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
language:
  - en
  - zh
library_name: transformers
tags:
  - mergekit
  - llama

FeatherLlama-72B-v0.1

Model Description

FeatherLlama is a 72B parameter language model created through a merge of Qwen2-72B-Instruct, calme2.1-72b, and magnum-72b-v1 using model_stock.

This is converted from leafspark/FeatherQwen2-72B-v0.1

Features

72 billion parameters
Sharded in 31 files (unlike FeatherQwen2, which has 1,043 shards due to the merging process)
Combines Magnum prose with Calam smarts
Llamaified for easy use

Technical Specifications

Architecture

LlamaForCasualLM
Models: Qwen2-72B-Instruct (base), calme2.1-72b, magnum-72b-v1
Merged layers: 80
Total tensors: 1,043

Tensor Distribution

Attention layers: 560 files
MLP layers: 240 files
Layer norms: 160 files
Miscellaneous (embeddings, output): 83 files

Merging

Custom script utilizing safetensors library.

Usage

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("leafspark/FeatherLlama-72B-v0.1", 
                                             device_map="auto", 
                                             torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("leafspark/FeatherLlama-72B-v0.1")

GGUFs

Find them here: leafspark/FeatherLlama-72B-v0.1-GGUF

Hardware Requirements

Minimum ~140GB of storage
~140GB VRAM