Jordan Taylor

JordanTensor

https://sites.google.com/view/jordantensor

AI & ML interests

Mechanistic interpretability, mechanistic anomaly detection, model internals techniques and AI safety techniques generally.

Recent Activity

liked a model 11 days ago

deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

updated a collection 17 days ago

Obfuscated Backdoors

Collections 1

Models 44

JordanTensor/gemma-sandbagging-mzpd84pf-step1984

Updated Dec 11, 2024

JordanTensor/gemma-sandbagging-mzpd84pf-step1968

Updated Dec 11, 2024

JordanTensor/gemma-sandbagging-mzpd84pf-step1952

Updated Dec 11, 2024

JordanTensor/gemma-sandbagging-mzpd84pf-step1936

Updated Dec 11, 2024

JordanTensor/gemma-sandbagging-mzpd84pf-step800

Updated Dec 11, 2024

JordanTensor/gemma-sandbagging-mzpd84pf-step400

Updated Dec 11, 2024

JordanTensor/gemma-sandbagging-mzpd84pf-step384

Updated Dec 11, 2024

JordanTensor/gemma-sandbagging-mzpd84pf-step368

Updated Dec 11, 2024

JordanTensor/gemma-sandbagging-mzpd84pf-step352

Updated Dec 11, 2024

JordanTensor/gemma-sandbagging-mzpd84pf-step336

Updated Dec 11, 2024

Datasets 3

JordanTensor/sandbagging-sciq

Viewer • Updated Dec 7, 2024 • 13.7k • 49 • 1

JordanTensor/sandbagging-prefixes

Viewer • Updated Dec 7, 2024 • 9.9k • 116 • 1

JordanTensor/bias_in_bios_verified_software_devs_only

Viewer • Updated Oct 9, 2024 • 5.9k • 33