Monet: Mixture of Monosemantic Experts for Transformers

Model Summary

Monet introduces a novel approach for improving mechanistic interpretability in large language models (LLMs) using a Sparse Mixture-of-Experts (SMoE) architecture with 262,144 experts. By integrating sparse dictionary learning directly into end-to-end pretraining, Monet tackles the core issue of polysemanticity—where single neurons encode multiple unrelated concepts—while preserving overall model performance.
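The checkpoints listed below load with the standard Hugging Face transformers API. The snippet that follows is a minimal sketch using the 1.4B Monet-VD checkpoint as an example; any checkpoint from the tables below can be substituted.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MonetLLM/monet-vd-1.4B-100BT-hf"  # any checkpoint listed below works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # Monet ships custom modeling code
)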

Resources and Technical Documentation

Available Checkpoints

Base Models

Model | Dataset | #Params | #Tokens | Checkpoint | Demo
--- | --- | --- | --- | --- | ---
Monet-VD | FineWeb-Edu | 850M | 100BT | monet-vd-850M-100BT-hf |
Monet-VD | FineWeb-Edu | 1.4B | 100BT | monet-vd-1.4B-100BT-hf | Viewer
Monet-VD | FineWeb-Edu | 4.1B | 100BT | monet-vd-4.1B-100BT-hf |
Monet-VD | StarCoderData | 1.4B | 100BT | codemonet-vd-1.4B-100BT-hf | Viewer
Monet-HD | FineWeb-Edu | 850M | 100BT | monet-hd-850M-100BT-hf |
Monet-HD | FineWeb-Edu | 1.4B | 100BT | monet-hd-1.4B-100BT-hf |
Monet-HD | FineWeb-Edu | 4.1B | 100BT | monet-hd-4.1B-100BT-hf |

Instruction-Tuned Models

Model | Purpose | Recipe | #Params | Checkpoint
--- | --- | --- | --- | ---
Monet-VD | Chat Completion | SmolLM | 1.4B | monet-vd-1.4B-100BT-chat-hf
Monet-VD | Vision-Language Model | LLaVA | 1.6B | visionmonet-vd-1.4B-100BT-hf

Evaluation

Open-Ended LLM Benchmarks

Benchmark abbreviations: WG = WinoGrande, SIQA = Social IQa, OBQA = OpenBookQA, HS = HellaSwag, CSQA = CommonsenseQA.

0-shot

Model | MMLU | ARC | WG | PIQA | SIQA | OBQA | HS | CSQA | Avg.
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Monet-HD 850M | 0.320 | 0.460 | 0.506 | 0.699 | 0.416 | 0.364 | 0.465 | 0.337 | 0.446
Monet-VD 850M | 0.328 | 0.456 | 0.530 | 0.708 | 0.417 | 0.356 | 0.488 | 0.343 | 0.453
Monet-HD 1.4B | 0.338 | 0.471 | 0.538 | 0.714 | 0.418 | 0.382 | 0.501 | 0.339 | 0.463
Monet-VD 1.4B | 0.352 | 0.495 | 0.522 | 0.727 | 0.423 | 0.418 | 0.529 | 0.363 | 0.478
Monet-HD 4.1B | 0.375 | 0.558 | 0.560 | 0.741 | 0.427 | 0.414 | 0.571 | 0.379 | 0.503
Monet-VD 4.1B | 0.380 | 0.547 | 0.557 | 0.751 | 0.437 | 0.424 | 0.604 | 0.389 | 0.511

5-shot

Model | MMLU | ARC | WG | PIQA | SIQA | OBQA | HS | CSQA | Avg.
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Monet-HD 850M | 0.332 | 0.537 | 0.510 | 0.697 | 0.409 | 0.346 | 0.479 | 0.420 | 0.466
Monet-VD 850M | 0.341 | 0.548 | 0.520 | 0.709 | 0.437 | 0.368 | 0.504 | 0.454 | 0.485
Monet-HD 1.4B | 0.352 | 0.544 | 0.530 | 0.720 | 0.432 | 0.360 | 0.518 | 0.441 | 0.487
Monet-VD 1.4B | 0.360 | 0.547 | 0.526 | 0.730 | 0.441 | 0.422 | 0.551 | 0.501 | 0.510
Monet-HD 4.1B | 0.385 | 0.603 | 0.545 | 0.742 | 0.463 | 0.412 | 0.588 | 0.545 | 0.535
Monet-VD 4.1B | 0.398 | 0.625 | 0.564 | 0.761 | 0.470 | 0.438 | 0.619 | 0.525 | 0.550

Detoxification

Detoxification is evaluated on the Monet-VD 1.4B model by masking toxicity-related experts at varying thresholds. Avg. Perf. is the model's average 0-shot score on the open-ended benchmarks above.

RealToxicityPrompts

Masking Threshold | Masking Ratio | Exp. Max. Toxicity (Toxic) | Exp. Max. Toxicity (Non-Toxic) | Toxicity Prob. (Toxic) | Toxicity Prob. (Non-Toxic) | Avg. Perf.
--- | --- | --- | --- | --- | --- | ---
– (no masking) | – | 0.795 | 0.269 | 0.926 | 0.08 | 0.478
0.2 | 1.0% | 0.767 | 0.268 | 0.909 | 0.07 | 0.479
0.1 | 4.1% | 0.657 | 0.270 | 0.768 | 0.08 | 0.478
0.05 | 14.4% | 0.552 | 0.256 | 0.564 | 0.05 | 0.467

ToxiGen

Masking Threshold | Masking Ratio | RoBERTa Score (Hate) | RoBERTa Score (Neutral) | Avg. Perf.
--- | --- | --- | --- | ---
– (no masking) | – | 0.642 | 0.035 | 0.478
0.2 | 1.4% | 0.643 | 0.033 | 0.478
0.1 | 5.4% | 0.504 | 0.028 | 0.473
0.05 | 15.0% | 0.430 | 0.027 | 0.455
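The tables above report, for each masking threshold, the fraction of experts that end up masked. The sketch below only illustrates that bookkeeping, using randomly generated placeholder scores in place of Monet's actual per-expert toxicity estimates; it is not the released evaluation code.

# Illustrative only: relates a masking threshold to a masking ratio.
# The per-expert scores here are random placeholders, not Monet's estimates.
import numpy as np

NUM_EXPERTS = 262_144
rng = np.random.default_rng(0)
toxicity_scores = rng.beta(0.5, 20.0, size=NUM_EXPERTS)  # hypothetical scores in [0, 1]

for threshold in (0.2, 0.1, 0.05):
    masked = toxicity_scores > threshold  # experts that would be removed at inference
    print(f"threshold={threshold:<4}  masking ratio={masked.mean():.1%}")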

Examples

Text Generation

import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/monet-vd-1.4B-100BT-hf"
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
print(pipe("The key to life is", max_new_tokens=20, do_sample=True)[0]["generated_text"])

Code Generation

import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/codemonet-vd-1.4B-100BT-hf"
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=AutoTokenizer.from_pretrained(model_name),
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

text = '''
def print_len(x: str):
    """For a given string x, print the length of x."""
'''
print(pipe(text, max_new_tokens=10)[0]["generated_text"].split("\n\n")[0])

Chat Completion

import torch
from transformers import AutoTokenizer, pipeline

model_name = "MonetLLM/monet-vd-1.4B-100BT-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipe = pipeline(
    "text-generation",
    model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi! How are you?"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(pipe(text, max_new_tokens=30, do_sample=True)[0]["generated_text"])

Using vLLM

A custom vLLM implementation of Monet (modeling_monet_vllm.py) is provided in the repository.

from vllm import LLM, ModelRegistry, SamplingParams
from modeling_monet_vllm import MonetForCausalLM

# Register Monet architecture with vLLM
ModelRegistry.register_model("MonetForCausalLM", MonetForCausalLM)

model = LLM(
    "MonetLLM/monet-vd-1.4B-100BT-hf",
    trust_remote_code=True,
    dtype="bfloat16",
    gpu_memory_utilization=0.8
)
sampling_params = SamplingParams(max_tokens=20, temperature=1.0)
print(model.generate("The key to life is", sampling_params)[0].outputs[0].text)

Training

Model

  • Architecture: Monet
  • Pretraining tokens: 100B
  • Precision: bfloat16

Hardware

Software

Intended Use

Primary Intended Uses

This model is designed to advance research on language models and serve as a foundational component for generative AI-driven functionalities. Its primary applications, mostly in English, include:

  • Mechanistic interpretability research for language models
  • Text generation with enhanced interpretability
  • Code generation (CodeMonet variant)
  • Chat completion (instruction-tuned variant)
  • Vision-language tasks (VisionMonet variant)

Out-of-Scope Uses

This model has not been explicitly developed or tested for all potential downstream applications. Therefore:

  1. Limitations & Mitigations: Developers should be mindful of common language model limitations, and thoroughly evaluate and mitigate risks regarding accuracy, safety, and fairness—especially in high-stakes or high-risk scenarios.
  2. Legal & Regulatory Compliance: Developers must comply with any applicable laws and regulations (e.g., privacy, trade compliance), taking into account the model’s English-focused training (refer to FineWeb-Edu).
  3. No License Modification: Nothing in this Model Card modifies or restricts the license under which this model is released.
  4. Unsupported Programming Languages: Code generation in programming languages not covered by StarCoderData (CodeMonet variant) falls outside the model's intended scope.

Model Architecture

Monet introduces a novel Mixture-of-Experts (MoE) architecture with several key innovations:

  • Parameter-efficient expert decomposition: the overall parameter count grows in proportion to the square root of the number of experts (see the sketch after this list)
  • Fine-grained expert specialization: offers clear insight into model behavior
  • Precise manipulation of knowledge: enables control over domain knowledge, programming-language capabilities, and toxicity levels
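As a rough illustration of the square-root scaling mentioned above, the sketch below compares the storage of a naive MoE (one set of projections per expert) against a decomposition that materializes only about sqrt(N) expert components per side. The layer widths are assumed placeholders, not Monet's actual configuration.

# Back-of-the-envelope sketch of O(N) vs. O(sqrt(N)) expert storage.
# HIDDEN_DIM and EXPERT_DIM are assumed placeholders, not Monet's real sizes.
import math

NUM_EXPERTS = 262_144        # total experts per layer (512 x 512)
HIDDEN_DIM = 2048            # assumed model width
EXPERT_DIM = 16              # assumed per-expert bottleneck width

# Naive MoE: every expert keeps its own up/down projections -> grows with N.
naive_params = NUM_EXPERTS * 2 * HIDDEN_DIM * EXPERT_DIM

# Decomposed layout: ~sqrt(N) components per side, composed pairwise -> grows with sqrt(N).
side = math.isqrt(NUM_EXPERTS)  # 512
decomposed_params = 2 * side * HIDDEN_DIM * EXPERT_DIM

print(f"{side} x {side} = {side * side:,} experts")
print(f"naive MoE storage:  {naive_params / 1e9:.1f}B parameters")
print(f"decomposed storage: {decomposed_params / 1e6:.1f}M parameters")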

Ethical Considerations

Transparency

  • Designed specifically for enhanced interpretability
  • Enables understanding of internal model behavior
  • Allows tracking of knowledge attribution

Control

  • Supports toxicity mitigation
  • Enables domain-specific knowledge control
  • Maintains performance while adjusting behavior

License and Usage

Monet is licensed under the Apache 2.0 license. The model is primarily intended for research and educational use. Important licensing notes:

  • Instruction-tuned models have been fine-tuned on a dataset mix that includes outputs generated by third-party models
  • Research and educational use is encouraged
  • Commercial use is subject to Apache 2.0 license terms

Citation

@article{park2024monet,
      title={{Monet: Mixture of Monosemantic Experts for Transformers}}, 
      author={Jungwoo Park and Young Jin Ahn and Kee-Eung Kim and Jaewoo Kang},
      journal={arXiv preprint arXiv:2404.05567},
      year={2024}
}