Buck Shlegeris
bshlgrs
AI & ML interests
None yet
Recent Activity
authored a paper about 1 month ago: Alignment faking in large language models
authored a paper about 1 year ago: AI Control: Improving Safety Despite Intentional Subversion
authored a paper about 1 year ago: Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Organizations
bshlgrs's activity
No public activity