load model
Hello, I would like to know how to load and use the model, given its different attention design. From config.json, it appears to be based on BertForMaskedLM, so can we load it directly with BertForMaskedLM.from_pretrained('magicslabnu/OutEffHop_bert_base')?
Thanks for your message. We will upload the model file to Hugging Face next week, so you will be able to use it from there shortly. In the meantime, you can still reproduce the results. As we mentioned, attention is a special case of the Hopfield model, and BERT is built on the attention architecture, so you can simply replace the vanilla softmax with Softmax_1 in the BERT model to obtain the OutEffHop version of BERT. You can then reproduce the results from the Hugging Face checkpoints by loading them into the modified architecture. If you have more questions, feel free to contact me directly at [email protected]
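A minimal PyTorch sketch of the softmax swap described above, assuming the common definition Softmax_1(x)_i = exp(x_i) / (1 + Σ_j exp(x_j)), i.e. an ordinary softmax with one extra logit fixed at 0 whose probability mass is discarded. The helpers softmax_1 and attention are hypothetical illustrations, not functions from the OutEffHop code base.

```python
import torch


def softmax_1(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Softmax_1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)) along `dim`.

    Assumed definition: an ordinary softmax with an extra logit fixed at 0
    whose probability mass is dropped, so the outputs may sum to less than 1.
    """
    # Shift by m = max(max_j x_j, 0) for numerical stability; the result is
    # unchanged because numerator and denominator are both scaled by exp(-m).
    m = torch.clamp(x.amax(dim=dim, keepdim=True), min=0.0).detach()
    e = torch.exp(x - m)
    return e / (torch.exp(-m) + e.sum(dim=dim, keepdim=True))


def attention(q, k, v, softmax_fn=torch.softmax):
    """Scaled dot-product attention with a pluggable normalizer.

    q, k, v: tensors of shape (..., seq_len, head_dim).
    """
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return softmax_fn(scores, dim=-1) @ v


torch.manual_seed(0)
q, k, v = (torch.randn(2, 4, 8) for _ in range(3))
vanilla = attention(q, k, v)                           # standard softmax
outeffhop = attention(q, k, v, softmax_fn=softmax_1)   # softmax swapped for Softmax_1
```

Because the denominator carries an extra 1, a head whose scores are all very negative can assign almost no weight to any token instead of being forced to spread a full unit of probability, which is the behavior the swap is meant to enable.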
You can use code like the following to replace the vanilla self-attention layers with ours:
# Excerpt from a run_mlm.py-style training script: model_args, data_args,
# config and logger are defined earlier in that script.
# BertUnpadSelfAttentionWithExtras and SOFTMAX_MAPPING come from the
# OutEffHop code base and must be imported from there.
import torch
from transformers import AutoModelForMaskedLM

if model_args.model_name_or_path:
    # Resolve the requested dtype ("auto" and None are passed through as-is)
    torch_dtype = (
        model_args.torch_dtype
        if model_args.torch_dtype in ["auto", None]
        else getattr(torch, model_args.torch_dtype)
    )
    # Load the pretrained checkpoint (still vanilla BERT attention at this point)
    model = AutoModelForMaskedLM.from_pretrained(
        model_args.model_name_or_path,
        from_tf=bool(".ckpt" in model_args.model_name_or_path),
        config=config,
        cache_dir=model_args.cache_dir,
        revision=model_args.model_revision,
        token=model_args.token,
        trust_remote_code=model_args.trust_remote_code,
        torch_dtype=torch_dtype,
        low_cpu_mem_usage=model_args.low_cpu_mem_usage,
    )
else:
    logger.info("Training new model from scratch")
    model = AutoModelForMaskedLM.from_config(config, trust_remote_code=model_args.trust_remote_code)

# >> Replace each self-attention module with ours (softmax_1 attention)
# NOTE: currently assumes BERT (model.bert.encoder.layer)
logger.info("Replacing BERT self-attention with custom softmax_1 attention")
for layer_idx in range(len(model.bert.encoder.layer)):
    old_self = model.bert.encoder.layer[layer_idx].attention.self
    new_self = BertUnpadSelfAttentionWithExtras(
        config,
        position_embedding_type=None,
        softmax_fn=SOFTMAX_MAPPING["softmax1"],
        ssm_eps=None,
        tau=None,
        max_seq_length=data_args.max_seq_length,
        skip_attn=False,
        fine_tuning=False,
    )
    # Copy the loaded query/key/value weights into the new module.
    # strict=False silently skips any keys whose names do not match.
    if model_args.model_name_or_path is not None:
        new_self.load_state_dict(old_self.state_dict(), strict=False)
    # Match the dtype/device of the replaced module (e.g. when torch_dtype=float16)
    ref_param = next(old_self.parameters())
    new_self = new_self.to(device=ref_param.device, dtype=ref_param.dtype)
    model.bert.encoder.layer[layer_idx].attention.self = new_self

print(model)
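The swap in the snippet above works because the new attention module reuses the parameter names of the one it replaces, so load_state_dict can carry the pretrained weights across. The self-contained PyTorch sketch below reproduces that pattern on hypothetical toy classes (ToySelfAttention, ToyLayer, ToyEncoder and swap_softmax are illustrations only, not the OutEffHop or transformers classes), again assuming Softmax_1 means a softmax with an extra zero logit.

```python
import torch
import torch.nn as nn


def softmax_1(x, dim=-1):
    # Assumed Softmax_1: softmax over x plus one extra zero logit, extra slot dropped
    dim = dim % x.dim()
    zeros = torch.zeros_like(x.narrow(dim, 0, 1))
    probs = torch.softmax(torch.cat([x, zeros], dim=dim), dim=dim)
    return probs.narrow(dim, 0, x.size(dim))


class ToySelfAttention(nn.Module):
    """Hypothetical single-head self-attention with a pluggable softmax."""

    def __init__(self, hidden, softmax_fn=torch.softmax):
        super().__init__()
        self.query = nn.Linear(hidden, hidden)
        self.key = nn.Linear(hidden, hidden)
        self.value = nn.Linear(hidden, hidden)
        self.softmax_fn = softmax_fn

    def forward(self, x):
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        return self.softmax_fn(scores, dim=-1) @ v


class ToyLayer(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        self.self_attn = ToySelfAttention(hidden)

    def forward(self, x):
        return x + self.self_attn(x)  # residual connection


class ToyEncoder(nn.Module):
    def __init__(self, hidden, n_layers):
        super().__init__()
        self.hidden = hidden
        self.layer = nn.ModuleList([ToyLayer(hidden) for _ in range(n_layers)])

    def forward(self, x):
        for layer in self.layer:
            x = layer(x)
        return x


def swap_softmax(model, softmax_fn):
    """Replace every layer's attention with a new module using `softmax_fn`,
    copying the old weights. Returns the load_state_dict report per layer."""
    reports = []
    for layer in model.layer:
        old = layer.self_attn
        new = ToySelfAttention(model.hidden, softmax_fn=softmax_fn)
        # strict=False would silently skip mismatched names, so keep the report
        reports.append(new.load_state_dict(old.state_dict(), strict=False))
        ref = next(old.parameters())
        layer.self_attn = new.to(device=ref.device, dtype=ref.dtype)
    return reports
```

Swapping in torch.softmax first is a cheap sanity check: the output should be unchanged, which confirms the weights were copied. Since strict=False silently ignores mismatched keys, inspecting missing_keys and unexpected_keys is worthwhile in the real BERT swap as well.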