---
license: apache-2.0
language:
- en
tags:
- mechanistic interpretability
- sparse autoencoder
- llama
- llama-3
---

## Model Information

An SAE (Sparse Autoencoder) for [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct).

It is trained on layer 9 of Llama 3.2 1B and reaches a final L0 of 63 during training, meaning that about 63 features are active per token on average.

This model is used to decompose Llama's activations into interpretable features.
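
The decomposition can be illustrated with a minimal sketch. This card does not document the SAE's exact architecture, so the snippet below assumes a plain ReLU autoencoder; the toy dimensions, the random weights, and the names `encode`, `decode`, and `l0` are all hypothetical and stand in for the released weights and real layer 9 activations. It shows how activations map to non-negative features, how they are reconstructed, and how an L0 value is computed.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 16   # toy activation width (hypothetical, not the real model's)
d_sae = 64     # toy dictionary size (hypothetical)

# Random parameters stand in for trained SAE weights (assumption).
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_dec = np.zeros(d_model)


def encode(x):
    """Map activations (n_tokens, d_model) to non-negative feature activations."""
    return np.maximum((x - b_dec) @ W_enc + b_enc, 0.0)


def decode(f):
    """Reconstruct activations (n_tokens, d_model) from feature activations."""
    return f @ W_dec + b_dec


def l0(f):
    """Average number of non-zero features per token."""
    return float((f > 0).sum(axis=-1).mean())


# Stand-in for the layer 9 activations of 8 tokens.
x = rng.normal(size=(8, d_model))
features = encode(x)
x_hat = decode(features)

print(features.shape, x_hat.shape, l0(features))
```

With trained weights, `l0(features)` would be the quantity reported above as the final L0, and each column of `features` would correspond to one candidate interpretable feature.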

The SAE weights are released under the Apache 2.0 license; Llama 3.2 1B itself must be used under Meta's Llama 3.2 License.

## How to use

A Jupyter notebook is provided to test the model:

<a target="_blank" href="https://colab.research.google.com/github/qrsch/SAE/blob/main/SAE.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" width="200px"/>
</a>

## Training

Our SAE was trained on the [LMSYS-Chat-1M dataset](https://arxiv.org/pdf/2309.11998) using a single RTX 3090. The training script will be provided soon in the following repository: https://github.com/qrsch/SAE

## Acknowledgements

This release would not have been possible without the work of [Goodfire](https://www.goodfire.ai/) and [Anthropic](https://transformer-circuits.pub/).

```
                                       .x+=:.                                                             
                                      z`    ^%                                                  .uef^"    
               .u    .                   .   <k                           .u    .             :d88E       
    .u@u     .d88B :@8c       .u       .@8Ned8"      .u          u      .d88B :@8c        .   `888E       
 .zWF8888bx ="8888f8888r   ud8888.   .@^%8888"    ud8888.     us888u.  ="8888f8888r  .udR88N   888E .z8k  
.888  9888    4888>'88"  :888'8888. x88:  `)8b. :888'8888. .@88 "8888"   4888>'88"  <888'888k  888E~?888L 
I888  9888    4888> '    d888 '88%" 8888N=*8888 d888 '88%" 9888  9888    4888> '    9888 'Y"   888E  888E 
I888  9888    4888>      8888.+"     %8"    R88 8888.+"    9888  9888    4888>      9888       888E  888E 
I888  9888   .d888L .+   8888L        @8Wou 9%  8888L      9888  9888   .d888L .+   9888       888E  888E 
`888Nx?888   ^"8888*"    '8888c. .+ .888888P`   '8888c. .+ 9888  9888   ^"8888*"    ?8888u../  888E  888E 
 "88" '888      "Y"       "88888%   `   ^"F      "88888%   "888*""888"     "Y"       "8888P'  m888N= 888> 
       88E                  "YP'                   "YP'     ^Y"   ^Y'                  "P'     `Y"   888  
       98>                                                                                          J88"  
       '8                                                                                           @%    
        `                                                                                         :"      
```