leafspark's picture
docs: add model card
3a58e69 verified
|
raw
history blame
1.74 kB
metadata
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
language:
  - en
  - zh
library_name: transformers
tags:
  - mergekit
  - llama

FeatherLlama-72B-v0.1

Model Description

FeatherLlama is a 72B parameter language model created through a merge of Qwen2-72B-Instruct, calme2.1-72b, and magnum-72b-v1 using model_stock.

This is converted from leafspark/FeatherQwen2-72B-v0.1

Features

  • 72 billion parameters
  • Sharded in 31 files (unlike FeatherQwen2, which has 1,043 shards due to the merging process)
  • Combines Magnum prose with Calam smarts
  • Llamaified for easy use

Technical Specifications

Architecture

  • LlamaForCasualLM
  • Models: Qwen2-72B-Instruct (base), calme2.1-72b, magnum-72b-v1
  • Merged layers: 80
  • Total tensors: 1,043

Tensor Distribution

  • Attention layers: 560 files
  • MLP layers: 240 files
  • Layer norms: 160 files
  • Miscellaneous (embeddings, output): 83 files

Merging

Custom script utilizing safetensors library.

Usage

Loading the Model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("leafspark/FeatherLlama-72B-v0.1", 
                                             device_map="auto", 
                                             torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("leafspark/FeatherLlama-72B-v0.1")

GGUFs

Find them here: leafspark/FeatherLlama-72B-v0.1-GGUF

Hardware Requirements

  • Minimum ~140GB of storage
  • ~140GB VRAM