I've tried using a bunch of your models, but it always fails. Where can I find a working unlearned model?

#1
by MKhoriaty - opened

Load model directly

from transformers import AutoModel
model = AutoModel.from_pretrained("PhillipGuo/gemma-7b_Unlearning_basketball_Lora128")

OSError: Can't load tokenizer for 'LLM-LAT/zephyr7b-beta-rmu-lat-unlearn-wmdp-bio-cyber'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'LLM-LAT/zephyr7b-beta-rmu-lat-unlearn-wmdp-bio-cyber' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.

Apologies about that. We have been focusing on rebuttals recently and haven't had a chance to publicly release our models. Here are some models whose answers to "What sport does [athlete] play?" we have edited, for only the first 64 rows of this dataset: https://github.com/magikarp01/tasks/blob/master/facts/data/sports.csv (edited to the "inject_sport_without_golf" column):
https://huggingface.co/PhillipGuo/gemma-manual_interp-forget_first_64_unsplit-inject_random_without_golf-run1 (using our interpretability method)
https://huggingface.co/PhillipGuo/gemma-localized_ap-forget_first_64_unsplit-inject_random_without_golf-run1 (using attribution patching)

Here are two in which every athlete who plays basketball has been edited to play golf instead: https://huggingface.co/PhillipGuo/gemma-manual_interp-forget_basketball_split-inject_golf-run1 and https://huggingface.co/PhillipGuo/gemma-localized_ct-forget_basketball_split-inject_golf-run1. These are from earlier and the hyperparameters may not have been optimal, sorry. We will finalize these models and the code release after rebuttals. For now, you should be able to load these models directly with AutoModel.from_pretrained("PhillipGuo/gemma-manual_interp-forget_first_64_unsplit-inject_random_without_golf-run1").
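In case it helps, here is a minimal sketch of loading one of the repos above together with a tokenizer. It rests on a few assumptions that are not confirmed in this thread: that the repos hold full model weights rather than adapters, that AutoModelForCausalLM (swapped in for AutoModel so the model can generate text) works for them, and that "google/gemma-7b" is the right base tokenizer to fall back on if a repo ships without tokenizer files, which is what the OSError above suggests can happen. The helper name load_model_and_tokenizer is made up for illustration.

```python
# Repo IDs taken from the links in this thread.
MODEL_IDS = [
    "PhillipGuo/gemma-manual_interp-forget_first_64_unsplit-inject_random_without_golf-run1",
    "PhillipGuo/gemma-localized_ap-forget_first_64_unsplit-inject_random_without_golf-run1",
    "PhillipGuo/gemma-manual_interp-forget_basketball_split-inject_golf-run1",
    "PhillipGuo/gemma-localized_ct-forget_basketball_split-inject_golf-run1",
]

# ASSUMPTION: the edited models are derived from this base checkpoint.
FALLBACK_TOKENIZER_ID = "google/gemma-7b"


def load_model_and_tokenizer(repo_id, fallback_tokenizer_id=FALLBACK_TOKENIZER_ID):
    """Hypothetical helper: load a model, falling back to a base tokenizer.

    transformers is imported here so the module itself imports without it.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # ASSUMPTION: the repo contains full weights loadable as a causal LM.
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    try:
        tokenizer = AutoTokenizer.from_pretrained(repo_id)
    except OSError:
        # The repo has no tokenizer files; use the assumed base tokenizer.
        tokenizer = AutoTokenizer.from_pretrained(fallback_tokenizer_id)
    return model, tokenizer


# Example usage (downloads several GB of weights):
# model, tokenizer = load_model_and_tokenizer(MODEL_IDS[0])
```

If the base Gemma checkpoint is gated for your account, you may need to log in with a Hugging Face token before the tokenizer fallback can download anything.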

We also have private Gemma 2 and Llama models. We don't upload every model we train, because we have so many different combinations of parameters that it is typically easier to just retrain a model to perform a new evaluation. If you need those, I can upload them to Hugging Face.
