Buck Shlegeris

bshlgrs

Recent Activity

authored a paper about 1 month ago

Alignment faking in large language models

authored a paper about 1 year ago

AI Control: Improving Safety Despite Intentional Subversion

authored a paper about 1 year ago

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
