Monet: Mixture of Monosemantic Experts for Transformers
Model Summary
Monet introduces a novel approach to improving mechanistic interpretability in large language models (LLMs) using a Sparse Mixture-of-Experts (SMoE) architecture with 262,144 experts per layer. By integrating sparse dictionary learning directly into end-to-end pretraining, Monet addresses polysemanticity, the core problem in which a single neuron encodes multiple unrelated concepts, while preserving overall model performance.
Resources and Technical Documentation
- GitHub Repository: https://github.com/dmis-lab/Monet
- Paper: https://arxiv.org/abs/2412.04139
- Model Hub: https://huggingface.co/MonetLLM
- Demo: https://huggingface.co/spaces/MonetLLM/monet-vd-1.4B-100BT-hf-viewer
Available Checkpoints
Base Models
| Model | Dataset | #Params | #Tokens | Checkpoint | Demo |
|---|---|---|---|---|---|
| Monet-VD | FineWeb-Edu | 850M | 100BT | monet-vd-850M-100BT-hf | |
| Monet-VD | FineWeb-Edu | 1.4B | 100BT | monet-vd-1.4B-100BT-hf | Viewer |
| Monet-VD | FineWeb-Edu | 4.1B | 100BT | monet-vd-4.1B-100BT-hf | |
| Monet-VD | StarCoderData | 1.4B | 100BT | codemonet-vd-1.4B-100BT-hf | Viewer |
| Monet-HD | FineWeb-Edu | 850M | 100BT | monet-hd-850M-100BT-hf | |
| Monet-HD | FineWeb-Edu | 1.4B | 100BT | monet-hd-1.4B-100BT-hf | |
| Monet-HD | FineWeb-Edu | 4.1B | 100BT | monet-hd-4.1B-100BT-hf | |
Instruction-Tuned Models
| Model | Purpose | Recipe | #Params | Checkpoint |
|---|---|---|---|---|
| Monet-VD | Chat Completion | SmolLM | 1.4B | monet-vd-1.4B-100BT-chat-hf |
| Monet-VD | Vision-Language Model | LLaVA | 1.6B | visionmonet-vd-1.4B-100BT-hf |
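The checkpoint names above follow a visible pattern, and the examples further down load them from the `MonetLLM` organization on the Hugging Face Hub. A small hypothetical helper that assembles a repo id from that pattern; the function name and parameters are illustrative assumptions, and not every combination it can produce exists on the Hub:

```python
def monet_repo_id(arch: str, size: str, prefix: str = "", suffix: str = "") -> str:
    """Hypothetical helper: build a Hub repo id from the naming pattern in the tables.

    arch:   "vd" or "hd"
    size:   "850M", "1.4B" or "4.1B"
    prefix: "" (base), "code" (CodeMonet) or "vision" (VisionMonet)
    suffix: "" or "-chat" (instruction-tuned chat model)
    The pattern is inferred from the listed checkpoints, not an official API.
    """
    if arch not in {"vd", "hd"}:
        raise ValueError(f"unknown architecture: {arch!r}")
    return f"MonetLLM/{prefix}monet-{arch}-{size}-100BT{suffix}-hf"


print(monet_repo_id("hd", "4.1B"))                 # MonetLLM/monet-hd-4.1B-100BT-hf
print(monet_repo_id("vd", "1.4B", prefix="code"))  # MonetLLM/codemonet-vd-1.4B-100BT-hf
print(monet_repo_id("vd", "1.4B", suffix="-chat")) # MonetLLM/monet-vd-1.4B-100BT-chat-hf
```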
Evaluation
Open-Ended LLM Benchmarks
| Model | MMLU | ARC | WG | PIQA | SIQA | OBQA | HS | CSQA | Avg. |
|---|---|---|---|---|---|---|---|---|---|
| 0-shot | | | | | | | | | |
| Monet-HD 850M | 0.320 | 0.460 | 0.506 | 0.699 | 0.416 | 0.364 | 0.465 | 0.337 | 0.446 |
| Monet-VD 850M | 0.328 | 0.456 | 0.530 | 0.708 | 0.417 | 0.356 | 0.488 | 0.343 | 0.453 |
| Monet-HD 1.4B | 0.338 | 0.471 | 0.538 | 0.714 | 0.418 | 0.382 | 0.501 | 0.339 | 0.463 |
| Monet-VD 1.4B | 0.352 | 0.495 | 0.522 | 0.727 | 0.423 | 0.418 | 0.529 | 0.363 | 0.478 |
| Monet-HD 4.1B | 0.375 | 0.558 | 0.560 | 0.741 | 0.427 | 0.414 | 0.571 | 0.379 | 0.503 |
| Monet-VD 4.1B | 0.380 | 0.547 | 0.557 | 0.751 | 0.437 | 0.424 | 0.604 | 0.389 | 0.511 |
| 5-shot | | | | | | | | | |
| Monet-HD 850M | 0.332 | 0.537 | 0.510 | 0.697 | 0.409 | 0.346 | 0.479 | 0.420 | 0.466 |
| Monet-VD 850M | 0.341 | 0.548 | 0.520 | 0.709 | 0.437 | 0.368 | 0.504 | 0.454 | 0.485 |
| Monet-HD 1.4B | 0.352 | 0.544 | 0.530 | 0.720 | 0.432 | 0.360 | 0.518 | 0.441 | 0.487 |
| Monet-VD 1.4B | 0.360 | 0.547 | 0.526 | 0.730 | 0.441 | 0.422 | 0.551 | 0.501 | 0.510 |
| Monet-HD 4.1B | 0.385 | 0.603 | 0.545 | 0.742 | 0.463 | 0.412 | 0.588 | 0.545 | 0.535 |
| Monet-VD 4.1B | 0.398 | 0.625 | 0.564 | 0.761 | 0.470 | 0.438 | 0.619 | 0.525 | 0.550 |

Abbreviations: WG = WinoGrande, OBQA = OpenBookQA, HS = HellaSwag, CSQA = CommonsenseQA.
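The Avg. column appears to be the unweighted mean of the eight benchmark scores. A quick sketch that recomputes it for the Monet-VD 1.4B rows, assuming that definition; because it starts from the rounded per-benchmark values in the table, differences of up to about 0.001 are expected:

```python
from statistics import fmean

BENCHMARKS = ["MMLU", "ARC", "WG", "PIQA", "SIQA", "OBQA", "HS", "CSQA"]

# Per-benchmark scores and reported Avg., copied from the Monet-VD 1.4B rows.
rows = {
    "Monet-VD 1.4B (0-shot)": ([0.352, 0.495, 0.522, 0.727, 0.423, 0.418, 0.529, 0.363], 0.478),
    "Monet-VD 1.4B (5-shot)": ([0.360, 0.547, 0.526, 0.730, 0.441, 0.422, 0.551, 0.501], 0.510),
}


def unweighted_average(scores: list[float]) -> float:
    """Assumed definition of Avg.: the plain mean over all eight benchmarks."""
    if len(scores) != len(BENCHMARKS):
        raise ValueError("expected one score per benchmark")
    return fmean(scores)


for name, (scores, reported) in rows.items():
    avg = unweighted_average(scores)
    # Reported averages were presumably computed from unrounded scores.
    print(f"{name}: recomputed {avg:.4f}, reported {reported:.3f}")
```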
Detoxification
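Detoxification performance is evaluated on the Monet-VD 1.4B model. In each table, the first row (–) is the unmasked baseline; lower masking thresholds mask a larger fraction of experts (Masking Ratio).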
RealToxicityPrompts
| Masking Threshold | Masking Ratio | Exp. Max. Toxicity (Toxic prompts) | Exp. Max. Toxicity (Non-Toxic prompts) | Toxicity Prob. (Toxic prompts) | Toxicity Prob. (Non-Toxic prompts) | Avg. Perf. |
|---|---|---|---|---|---|---|
| – | – | 0.795 | 0.269 | 0.926 | 0.08 | 0.478 |
| 0.2 | 1.0% | 0.767 | 0.268 | 0.909 | 0.07 | 0.479 |
| 0.1 | 4.1% | 0.657 | 0.270 | 0.768 | 0.08 | 0.478 |
| 0.05 | 14.4% | 0.552 | 0.256 | 0.564 | 0.05 | 0.467 |
ToxiGen
| Masking Threshold | Masking Ratio | RoBERTa Score (Hate) | RoBERTa Score (Neutral) | Avg. Perf. |
|---|---|---|---|---|
| – | – | 0.642 | 0.035 | 0.478 |
| 0.2 | 1.4% | 0.643 | 0.033 | 0.478 |
| 0.1 | 5.4% | 0.504 | 0.028 | 0.473 |
| 0.05 | 15.0% | 0.430 | 0.027 | 0.455 |
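In both tables, lowering the masking threshold raises the masking ratio. A hypothetical sketch of that relationship, assuming each expert has already been assigned a scalar toxicity score and that experts scoring at or above the threshold are masked by zeroing their routing weights. How Monet actually scores and masks experts is described in the paper; the function names and toy scores here are illustrative assumptions only:

```python
def mask_experts(scores: dict[int, float], threshold: float) -> tuple[set[int], float]:
    """Hypothetical: mask every expert whose toxicity score is >= threshold.

    `scores` maps expert index -> an assumed per-expert toxicity score.
    Returns the set of masked experts and the masking ratio (fraction masked).
    """
    masked = {idx for idx, s in scores.items() if s >= threshold}
    return masked, len(masked) / len(scores)


def apply_expert_mask(gates: list[float], masked: set[int]) -> list[float]:
    """Zero the routing weights of masked experts so they contribute nothing."""
    return [0.0 if i in masked else g for i, g in enumerate(gates)]


# Toy scores for five experts (made up for illustration).
toy_scores = {0: 0.01, 1: 0.07, 2: 0.12, 3: 0.25, 4: 0.03}
for t in (0.2, 0.1, 0.05):
    masked, ratio = mask_experts(toy_scores, t)
    print(t, sorted(masked), f"{ratio:.0%}")  # lower threshold -> more experts masked

print(apply_expert_mask([0.1, 0.2, 0.3, 0.4, 0.5], {3}))
```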
Examples
Text Generation
```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/monet-vd-1.4B-100BT-hf"
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # Monet uses custom modeling code from the Hub
)
print(pipe("The key to life is", max_new_tokens=20, do_sample=True)[0]["generated_text"])
```
Code Generation
```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/codemonet-vd-1.4B-100BT-hf"
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Prompt with a function signature and docstring; the model completes the body.
text = '''
def print_len(x: str):
    """For a given string x, print the length of x."""
'''
# Keep only the first block of generated code (up to the first blank line).
print(pipe(text, max_new_tokens=10)[0]["generated_text"].split("\n\n")[0])
```
Chat Completion
```python
import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/monet-vd-1.4B-100BT-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Format the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi! How are you?"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(pipe(text, max_new_tokens=30, do_sample=True)[0]["generated_text"])
```
Using vLLM
A custom vLLM implementation of Monet is provided in the GitHub repository; the `modeling_monet_vllm` module from the repository must be importable before running the example below.

```python
from vllm import LLM, ModelRegistry, SamplingParams
from modeling_monet_vllm import MonetForCausalLM

# Register the Monet architecture with vLLM
ModelRegistry.register_model("MonetForCausalLM", MonetForCausalLM)

model = LLM(
    "MonetLLM/monet-vd-1.4B-100BT-hf",
    trust_remote_code=True,
    dtype="bfloat16",
    gpu_memory_utilization=0.8,
)
sampling_params = SamplingParams(max_tokens=20, temperature=1.0)
print(model.generate("The key to life is", sampling_params)[0].outputs[0].text)
```
Training
Model
- Architecture: Monet
- Pretraining tokens: 100B
- Precision: bfloat16
Hardware
- TPUs: TPU v4-64 Pod slice (supported by the TPU Research Cloud (TRC) program)
Software
Intended Use
Primary Intended Uses
This model is designed to advance research on language models and to serve as a foundational component for generative AI features. Its primary applications, mostly in English, include:
- Mechanistic interpretability research for language models
- Text generation with enhanced interpretability
- Code generation (CodeMonet variant)
- Chat completion (instruction-tuned variant)
- Vision-language tasks (VisionMonet variant)
Out-of-Scope Uses
This model has not been explicitly developed or tested for all potential downstream applications. Therefore:
- Limitations & Mitigations: Developers should be mindful of common language model limitations and should thoroughly evaluate and mitigate risks to accuracy, safety, and fairness, especially in high-stakes or high-risk scenarios.
- Legal & Regulatory Compliance: Developers must comply with any applicable laws and regulations (e.g., privacy, trade compliance), taking into account the model’s English-focused training (refer to FineWeb-Edu).
- No License Modification: Nothing in this Model Card modifies or restricts the license under which this model is released.
- Unsupported Programming Languages: Programming languages not covered by StarCoderData (the CodeMonet training data) are outside the model’s intended scope.
Model Architecture
Monet introduces a novel Mixture-of-Experts (MoE) architecture with several key innovations:
- Parameter-efficient expert decomposition: the total parameter count grows in proportion to the square root of the number of experts.
- Fine-grained expert specialization: experts tend to capture individual (monosemantic) concepts, giving clearer insight into model behavior.
- Precise manipulation of knowledge: enables control over domain knowledge, programming language capabilities, and toxicity level.
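The square-root scaling in the first bullet can be illustrated with back-of-the-envelope arithmetic. This sketch assumes a simplification, not Monet's exact parameterization: each expert is formed by pairing one of √E first-half components with one of √E second-half components, each component having a fixed size. Under that assumption, 262,144 = 512² experts need only 2 × 512 stored components instead of 262,144 full experts:

```python
import math


def expert_param_counts(num_experts: int, params_per_component: int) -> tuple[int, int]:
    """Compare parameter counts for independent vs. product-decomposed experts.

    Illustrative assumption: every expert consists of two components of
    `params_per_component` parameters each. Independent experts store both
    components for every expert; decomposed experts pair one of sqrt(E)
    first components with one of sqrt(E) second components.
    """
    root = math.isqrt(num_experts)
    if root * root != num_experts:
        raise ValueError("num_experts must be a perfect square for this sketch")
    independent = num_experts * 2 * params_per_component
    decomposed = 2 * root * params_per_component
    return independent, decomposed


independent, decomposed = expert_param_counts(262_144, params_per_component=1)
print(math.isqrt(262_144))     # 512 components per half
print(independent, decomposed)  # 524288 1024
```

Quadrupling the number of experts only doubles the decomposed count, which is the square-root behavior the bullet describes.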
Ethical Considerations
Transparency
- Designed specifically for enhanced interpretability
- Enables understanding of internal model behavior
- Allows tracking of knowledge attribution
Control
- Supports toxicity mitigation
- Enables domain-specific knowledge control
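- Largely maintains general performance while adjusting behavior (e.g., Avg. Perf. moves from 0.478 to 0.467 at the strictest RealToxicityPrompts masking threshold)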
License and Usage
Monet is licensed under the Apache 2.0 license. The model is primarily intended for research and educational use. Important licensing notes:
- Instruction-tuned models have been fine-tuned on a dataset mix that includes outputs generated by third-party models
- Research and educational use is encouraged
- Commercial use is subject to Apache 2.0 license terms
Citation
@article{park2024monet,
title={{Monet: Mixture of Monosemantic Experts for Transformers}},
author={Jungwoo Park and Young Jin Ahn and Kee-Eung Kim and Jaewoo Kang},
journal={arXiv preprint arXiv:2412.04139},
year={2024}
}