Error deploying Aguila on AWS SageMaker
Hello!
I'm trying to deploy this model on AWS SageMaker by following the steps provided in the documentation. However, I'm encountering some errors during the endpoint creation process. I've double-checked my configurations, but the issues persist.
If anyone has experience deploying this model on AWS SageMaker or any insights into resolving similar errors, I'd greatly appreciate your help. Thanks in advance for any assistance you can offer!
Errors:
Error: DownloadError
    utils.convert_files(local_pt_files, local_st_files)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 84, in convert_files
    convert_file(pt_file, sf_file)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 62, in convert_file
    save_file(pt_state, str(sf_file), metadata={"format": "pt"})
  File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 232, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 394, in _flatten
    raise RuntimeError(
And the following error:
RuntimeError:
Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'transformer.h.6.mlp.dense_4h_to_h.weight', 'transformer.h.26.mlp.dense_4h_to_h.weight', 'transformer.h.4.self_attention.query_key_value.weight', 'transformer.h.22.mlp.dense_h_to_4h.weight', 'transformer.h.23.mlp.dense_h_to_4h.weight', 'transformer.h.5.mlp.dense_h_to_4h.weight', 'transformer.h.25.mlp.dense_h_to_4h.weight', 'transformer.h.25.mlp.dense_4h_to_h.weight', 'transformer.h.0.mlp.dense_4h_to_h.weight', 'transformer.h.11.mlp.dense_h_to_4h.weight', 'transformer.h.29.self_attention.dense.weight', 'transformer.h.24.self_attention.query_key_value.weight', 'transformer.h.24.mlp.dense_h_to_4h.weight', 'transformer.h.14.mlp.dense_4h_to_h.weight', 'transformer.h.1.self_attention.dense.weight', 'transformer.h.13.mlp.dense_4h_to_h.weight', 'transformer.h.8.self_attention.query_key_value.weight', 'transformer.h.20.self_attention.query_key_value.weight', 'transformer.h.27.mlp.dense_h_to_4h.weight', 'transformer.h.22.self_attention.query_key_value.weight', 'transformer.h.11.self_attention.query_key_value.weight', 'transformer.h.23.self_attention.query_key_value.weight', 'transformer.h.13.self_attention.dense.weight', 'transformer.h.15.mlp.dense_h_to_4h.weight', 'transformer.h.9.mlp.dense_h_to_4h.weight', 'transformer.h.15.self_attention.query_key_value.weight', 'transformer.h.24.mlp.dense_4h_to_h.weight', 'transformer.h.31.self_attention.query_key_value.weight', 'transformer.h.7.self_attention.dense.weight', 'transformer.h.27.self_attention.query_key_value.weight', 'transformer.h.1.mlp.dense_h_to_4h.weight', 'transformer.h.21.mlp.dense_4h_to_h.weight', 'transformer.h.24.self_attention.dense.weight', 'transformer.h.16.mlp.dense_4h_to_h.weight', 'transformer.h.20.mlp.dense_4h_to_h.weight', 'transformer.h.27.self_attention.dense.weight', 'transformer.h.4.mlp.dense_4h_to_h.weight', 'transformer.h.3.mlp.dense_h_to_4h.weight', 
'transformer.h.25.self_attention.dense.weight', 'transformer.h.7.mlp.dense_4h_to_h.weight', 'transformer.h.17.self_attention.query_key_value.weight', 'transformer.h.19.self_attention.dense.weight', 'transformer.h.12.self_attention.query_key_value.weight', 'transformer.h.3.self_attention.dense.weight', 'transformer.h.28.mlp.dense_h_to_4h.weight', 'transformer.h.19.mlp.dense_h_to_4h.weight', 'transformer.h.20.self_attention.dense.weight', 'transformer.h.14.self_attention.query_key_value.weight', 'transformer.h.21.mlp.dense_h_to_4h.weight', 'transformer.h.12.mlp.dense_h_to_4h.weight', 'transformer.h.29.mlp.dense_h_to_4h.weight', 'transformer.h.6.mlp.dense_h_to_4h.weight', 'transformer.h.14.mlp.dense_h_to_4h.weight', 'transformer.h.30.self_attention.dense.weight', 'transformer.h.10.self_attention.query_key_value.weight', 'transformer.h.6.self_attention.query_key_value.weight', 'transformer.h.10.mlp.dense_4h_to_h.weight', 'transformer.h.23.mlp.dense_4h_to_h.weight', 'transformer.h.21.self_attention.query_key_value.weight', 'transformer.h.30.self_attention.query_key_value.weight', 'transformer.h.8.mlp.dense_h_to_4h.weight', 'transformer.h.30.mlp.dense_h_to_4h.weight', 'transformer.h.18.self_attention.query_key_value.weight', 'transformer.h.5.mlp.dense_4h_to_h.weight', 'transformer.h.15.mlp.dense_4h_to_h.weight', 'transformer.h.26.self_attention.dense.weight', 'transformer.h.9.self_attention.query_key_value.weight', 'transformer.h.17.mlp.dense_4h_to_h.weight', 'transformer.h.10.mlp.dense_h_to_4h.weight', 'transformer.h.6.self_attention.dense.weight', 'transformer.h.2.mlp.dense_4h_to_h.weight', 'transformer.h.5.self_attention.dense.weight', 'transformer.h.9.mlp.dense_4h_to_h.weight', 'transformer.h.3.mlp.dense_4h_to_h.weight', 'transformer.h.17.mlp.dense_h_to_4h.weight', 'transformer.h.27.mlp.dense_4h_to_h.weight', 'transformer.h.29.self_attention.query_key_value.weight', 'transformer.h.5.self_attention.query_key_value.weight', 
'transformer.h.11.self_attention.dense.weight', 'transformer.h.19.mlp.dense_4h_to_h.weight', 'transformer.h.16.mlp.dense_h_to_4h.weight', 'transformer.h.8.mlp.dense_4h_to_h.weight', 'transformer.h.30.mlp.dense_4h_to_h.weight', 'transformer.h.31.mlp.dense_h_to_4h.weight', 'transformer.h.1.mlp.dense_4h_to_h.weight', 'transformer.h.28.self_attention.dense.weight', 'transformer.h.22.mlp.dense_4h_to_h.weight', 'transformer.h.31.self_attention.dense.weight', 'transformer.h.4.mlp.dense_h_to_4h.weight', 'transformer.h.19.self_attention.query_key_value.weight', 'transformer.h.0.self_attention.dense.weight', 'transformer.h.1.self_attention.query_key_value.weight', 'transformer.h.17.self_attention.dense.weight', 'transformer.h.18.self_attention.dense.weight', 'transformer.h.23.self_attention.dense.weight', 'transformer.h.28.self_attention.query_key_value.weight', 'transformer.h.12.mlp.dense_4h_to_h.weight', 'transformer.h.16.self_attention.query_key_value.weight', 'transformer.h.22.self_attention.dense.weight', 'transformer.h.18.mlp.dense_4h_to_h.weight', 'transformer.h.2.self_attention.query_key_value.weight', 'transformer.h.18.mlp.dense_h_to_4h.weight', 'transformer.h.8.self_attention.dense.weight', 'transformer.h.12.self_attention.dense.weight', 'transformer.h.29.mlp.dense_4h_to_h.weight', 'transformer.h.10.self_attention.dense.weight', 'transformer.h.26.mlp.dense_h_to_4h.weight', 'transformer.h.31.mlp.dense_4h_to_h.weight', 'transformer.h.3.self_attention.query_key_value.weight', 'transformer.h.16.self_attention.dense.weight', 'transformer.h.9.self_attention.dense.weight', 'transformer.h.21.self_attention.dense.weight', 'transformer.h.0.self_attention.query_key_value.weight', 'transformer.h.28.mlp.dense_4h_to_h.weight', 'transformer.word_embeddings.weight', 'transformer.h.0.mlp.dense_h_to_4h.weight', 'transformer.h.4.self_attention.dense.weight', 'transformer.h.13.self_attention.query_key_value.weight', 'transformer.h.7.mlp.dense_h_to_4h.weight', 
'transformer.h.2.mlp.dense_h_to_4h.weight', 'transformer.h.7.self_attention.query_key_value.weight', 'transformer.h.11.mlp.dense_4h_to_h.weight', 'transformer.h.2.self_attention.dense.weight', 'transformer.h.13.mlp.dense_h_to_4h.weight', 'transformer.h.14.self_attention.dense.weight', 'transformer.h.15.self_attention.dense.weight', 'transformer.h.25.self_attention.query_key_value.weight', 'transformer.h.26.self_attention.query_key_value.weight', 'transformer.h.20.mlp.dense_h_to_4h.weight'}, {'transformer.h.22.input_layernorm.bias', 'transformer.h.17.input_layernorm.bias', 'transformer.h.20.input_layernorm.weight', 'transformer.h.20.input_layernorm.bias', 'transformer.h.1.input_layernorm.bias', 'transformer.h.18.input_layernorm.bias', 'transformer.h.28.input_layernorm.bias', 'transformer.h.7.input_layernorm.bias', 'transformer.h.5.input_layernorm.weight', 'transformer.h.8.input_layernorm.weight', 'transformer.h.0.input_layernorm.weight', 'transformer.h.9.input_layernorm.bias', 'transformer.h.12.input_layernorm.weight', 'transformer.h.19.input_layernorm.weight', 'transformer.h.30.input_layernorm.bias', 'transformer.h.31.input_layernorm.weight', 'transformer.h.6.input_layernorm.bias', 'transformer.h.7.input_layernorm.weight', 'transformer.h.6.input_layernorm.weight', 'transformer.ln_f.weight', 'transformer.h.5.input_layernorm.bias', 'transformer.h.13.input_layernorm.weight', 'transformer.h.13.input_layernorm.bias', 'transformer.h.30.input_layernorm.weight', 'transformer.h.19.input_layernorm.bias', 'transformer.h.18.input_layernorm.weight', 'transformer.h.16.input_layernorm.bias', 'transformer.h.27.input_layernorm.bias', 'transformer.h.21.input_layernorm.weight', 'transformer.h.14.input_layernorm.weight', 'transformer.h.16.input_layernorm.weight', 'transformer.h.10.input_layernorm.bias', 'transformer.h.25.input_layernorm.bias', 'transformer.h.23.input_layernorm.bias', 'transformer.h.29.input_layernorm.weight', 'transformer.h.11.input_layernorm.weight', 
'transformer.h.3.input_layernorm.bias', 'transformer.ln_f.bias', 'transformer.h.22.input_layernorm.weight', 'transformer.h.28.input_layernorm.weight', 'transformer.h.0.input_layernorm.bias', 'transformer.h.1.input_layernorm.weight', 'transformer.h.14.input_layernorm.bias', 'transformer.h.24.input_layernorm.bias', 'transformer.h.8.input_layernorm.bias', 'transformer.h.21.input_layernorm.bias', 'transformer.h.10.input_layernorm.weight', 'transformer.h.12.input_layernorm.bias', 'transformer.h.27.input_layernorm.weight', 'transformer.h.31.input_layernorm.bias', 'transformer.h.11.input_layernorm.bias', 'transformer.h.23.input_layernorm.weight', 'transformer.h.26.input_layernorm.bias', 'transformer.h.29.input_layernorm.bias', 'transformer.h.15.input_layernorm.bias', 'transformer.h.2.input_layernorm.weight', 'transformer.h.2.input_layernorm.bias', 'transformer.h.24.input_layernorm.weight', 'transformer.h.4.input_layernorm.weight', 'transformer.h.26.input_layernorm.weight', 'transformer.h.15.input_layernorm.weight', 'transformer.h.4.input_layernorm.bias', 'transformer.h.9.input_layernorm.weight', 'transformer.h.25.input_layernorm.weight', 'transformer.h.3.input_layernorm.weight', 'transformer.h.17.input_layernorm.weight'}].
A potential way to correctly save your model is to use `save_model`.
More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
Hi!
The error is related to safetensors. We have uploaded safetensors weights, but we are still having some trouble with the text-generation-inference container, so it might still fail.
Sorry for the inconvenience. We hope to fix it soon.
Hi!
Everything should work now :-)
We have tested it with v0.9.3 of text-generation-inference.
Sorry for the delay,
Best regards,
Joan
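
For readers who want to pin the container to the version mentioned above, a rough sketch using the SageMaker Python SDK could look like the following. Several things here are assumptions rather than details from this thread: the helper names `build_tgi_env` and `deploy`, the choice of one GPU, and the availability of a 0.9.3 image in your region. The model ID, IAM role, and instance type are left as parameters to fill in with your own values.

```python
def build_tgi_env(model_id, num_gpus=1):
    """Environment variables read by the Hugging Face LLM (TGI) container."""
    return {
        "HF_MODEL_ID": model_id,      # Hub repository to load
        "SM_NUM_GPUS": str(num_gpus),  # number of GPUs to shard across
    }


def deploy(model_id, role_arn, instance_type, tgi_version="0.9.3"):
    """Create an endpoint with a pinned TGI version (needs AWS credentials)."""
    # Imported lazily so build_tgi_env works without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    # Look up the container image for the requested TGI version.
    image_uri = get_huggingface_llm_image_uri("huggingface", version=tgi_version)
    model = HuggingFaceModel(
        image_uri=image_uri,
        env=build_tgi_env(model_id),
        role=role_arn,
    )
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```

Calling `deploy(...)` with the model's Hub ID, an execution role ARN, and a GPU instance type would create the endpoint. Nothing is created until that call is made.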