This repository contains sparse autoencoders (SAEs) trained to analyze the internal representations of the Llama 3.1 8B Instruct model. The autoencoders are trained on residual stream activations collected while the model processes code-related instruction data.

We apply these specialized, lightweight SAEs to a coding task in our blog post Sieve.

Model Details

  • Model Type: TopK Sparse Autoencoder
  • Base Model: Llama 3.1 8B Instruct
  • Training Data: 1B tokens of code data from:
    • StackOverflow Python dataset
    • Tested-143k Python Alpaca dataset
  • Architecture: Linear encoder-decoder with ReLU and TopK activation (k=64, 512)
  • File Format: PyTorch .pt files containing the following tensors (loaded in the sketch after this list):
    • W_enc_DF: Encoder weight matrix
    • b_enc_F: Encoder bias vector
    • W_dec_FD: Decoder weight matrix
    • b_dec_D: Decoder bias vector
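
A minimal loading and forward-pass sketch, assuming the tensor names above and a standard TopK SAE forward pass. The checkpoint filename is a placeholder, and the exact pre-processing (e.g. subtracting b_dec_D before encoding) is an assumption based on common TopK SAE implementations, not confirmed by this card:

```python
import torch

# Placeholder filename -- substitute an actual checkpoint from this repo.
state = torch.load("sae_layer_12.pt", map_location="cpu")

W_enc = state["W_enc_DF"]  # (d_model, n_features) encoder weights
b_enc = state["b_enc_F"]   # (n_features,) encoder bias
W_dec = state["W_dec_FD"]  # (n_features, d_model) decoder weights
b_dec = state["b_dec_D"]   # (d_model,) decoder bias

def encode(x, k=64):
    # ReLU pre-activations, then keep only the k largest features per token.
    pre = torch.relu((x - b_dec) @ W_enc + b_enc)
    vals, idx = torch.topk(pre, k=k, dim=-1)
    acts = torch.zeros_like(pre)
    acts.scatter_(-1, idx, vals)
    return acts

def decode(acts):
    # Linear decoder: reconstruct the residual stream from sparse features.
    return acts @ W_dec + b_dec
```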

Usage

The autoencoders can be used to analyze and interpret the internal representations that Llama 3.1 8B Instruct forms when processing code. Because they were trained on a narrow, code-specific data mixture, they are not recommended for general-purpose use; their intended use is reproducing the Sieve evaluation results for Llama 3.1 8B Instruct.
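
A minimal end-to-end sketch using the Hugging Face transformers API and the encode/decode helpers from the loading sketch above. Treating hidden_states[12] as the layer-12 residual stream point these SAEs were trained on is an assumption; consult the Sieve repo for the canonical setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt")

# output_hidden_states returns the residual stream after the embedding
# (index 0) and after each transformer layer; index 12 is the
# post-layer-12 stream.
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
resid = out.hidden_states[12].float()  # (batch, seq, d_model)

feats = encode(resid)   # sparse SAE features (see loading sketch above)
recon = decode(feats)   # SAE reconstruction of the residual stream
mse = (recon - resid).pow(2).mean()
```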

Full example usage can be found in the Sieve repo.

Training Details

  • Training Data Size: 1B tokens
  • Domain: Python code and code-related instructions
  • Target: Residual stream activations from layers 8, 10, and 12 of Llama 3.1 8B Instruct
  • Compute: Approximately 9 A100 GPU-hours

License

MIT

Citation

If you use these models in your research, please cite:

@article{karvonen2024sieve,
    title={Sieve: SAEs Beat Baselines on a Real-World Task (A Code Generation Case Study)},
    author={Karvonen, Adam and Pai, Dhruv and Wang, Mason and Keigwin, Ben},
    journal={Tilde Research Blog},
    year={2024},
    month={12},
    url={https://www.tilderesearch.com/blog/sieve},
    note={Blog post}
}