Does whisper-large-v3 work on SageMaker?
I've been trying to deploy it on SageMaker, and I can't get it to work once deployed.
I keep getting this error:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "Wrong index found for \u003c|0.02|\u003e: should be None but found 50366."
I can't find much about this error anywhere, but was wondering if it had to do with a transformers version problem.
Here's the code I'm using:
import sagemaker
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serializers import DataSerializer

# IAM role with SageMaker permissions (assumes this runs inside SageMaker)
role = sagemaker.get_execution_role()

# Hub model configuration
hub = {
    'HF_MODEL_ID': 'openai/whisper-large-v3',
    'HF_TASK': 'automatic-speech-recognition',
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
audio_serializer = DataSerializer(content_type='audio/x-audio')
predictor = huggingface_model.deploy(
    initial_instance_count=1,        # number of instances
    instance_type='ml.g4dn.xlarge',  # ec2 instance type
    serializer=audio_serializer,
)
I've been banging my head against this same problem all week. As best I can tell, the "Deploy this model using SageMaker SDK" instructions are incorrect.
In particular, it seems the AWS Deep Learning Containers only support up to transformers version 4.26.0, which is too low.
I've been following these guides and deploying a model.tar.gz that essentially consists of only a code/inference.py and a code/requirements.txt file, so that I can force transformers==4.36.2, which does seem to work:
https://github.com/aws/sagemaker-huggingface-inference-toolkit#-user-defined-codemodules
https://aws.amazon.com/blogs/machine-learning/hugging-face-on-amazon-sagemaker-bring-your-own-scripts-and-data/
The inference.py pretty much just follows the getting started instructions and looks more or less like this:
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Use the GPU with half precision when available, otherwise CPU with float32
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model_id = "openai/whisper-large-v3"
task = "automatic-speech-recognition"

def model_fn(model_dir):
    # model_dir is ignored; the weights are pulled from the Hub by model_id
    model = AutoModelForSpeechSeq2Seq.from_pretrained(
        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
    )
    model.to(device)
    processor = AutoProcessor.from_pretrained(model_id)
    return pipeline(
        task,
        model=model,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        return_timestamps=True,
        torch_dtype=torch_dtype,
        device=device,
    )
And actually, now that I'm thinking about it, you may be able to get away with having just the requirements.txt file and setting the HF_MODEL_ID and HF_TASK environment variables...
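For anyone who wants to reproduce the packaging described above, here is a minimal sketch of building that model.tar.gz with only the Python standard library. The function name build_model_archive and its parameters are made up for illustration. The pinned version is the one from this post, and inference_source is left optional to cover the untested requirements-only variant.

```python
import io
import tarfile

# Version pin taken from the post above; adjust as needed.
REQUIREMENTS = "transformers==4.36.2\n"


def build_model_archive(archive_path, inference_source=None, requirements=REQUIREMENTS):
    """Write a model.tar.gz holding code/requirements.txt and, optionally, code/inference.py.

    Hypothetical helper: returns the sorted list of member names it wrote.
    """
    files = {"code/requirements.txt": requirements}
    if inference_source is not None:
        files["code/inference.py"] = inference_source

    with tarfile.open(archive_path, "w:gz") as tar:
        for name, text in files.items():
            payload = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # the size must be set before adding the member
            tar.addfile(info, io.BytesIO(payload))
    return sorted(files)
```

The resulting archive would then be uploaded to S3, with its S3 URI passed as model_data to HuggingFaceModel in place of (or alongside) the env settings. I have only verified the variant that includes inference.py.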
Wow, this is great! Thank you so much for posting this. I didn't know you could do this.
thank you so much for your support
I have written this for a seamless deployment https://dev.to/mohalbakerkaw/deploying-openais-whisper-large-v3-model-on-sagemaker-using-hugging-face-libraries-hlh
Can you help with how I can pass generate_kwargs? The goal is to set the task to transcribe; I don't want all my transcripts to be in English.
https://huggingface.co/openai/whisper-large-v3/discussions/71
They talk about the issue in this discussion, but I'm not sure how to deal with it on SageMaker.
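One possible approach, offered as an untested sketch: the inference toolkit linked earlier in the thread lets code/inference.py override predict_fn, so the generate_kwargs can be fixed there, because a raw audio/x-audio request body has no room for extra parameters. This assumes the toolkit hands audio requests to predict_fn as a dict with the raw bytes under "inputs", and that the second argument is the pipeline returned by model_fn. GENERATE_KWARGS is an illustrative name, not part of any library.

```python
# Hypothetical addition to code/inference.py, next to model_fn.

# Assumption: fixed generation options for every request. Add a "language"
# entry (for example "french") to pin the output language.
GENERATE_KWARGS = {"task": "transcribe"}


def predict_fn(data, pipe):
    """Run the ASR pipeline from model_fn with the Whisper generate_kwargs above."""
    # Assumption: audio requests arrive as {"inputs": <raw audio bytes>};
    # fall back to the payload itself if it is not a dict.
    inputs = data.pop("inputs", data) if isinstance(data, dict) else data
    if isinstance(inputs, bytearray):
        # The pipeline documents bytes (an encoded audio file) as an accepted input.
        inputs = bytes(inputs)
    return pipe(inputs, generate_kwargs=dict(GENERATE_KWARGS))
```

With task set to transcribe, the output should stay in the spoken language rather than being translated to English, but please check this against the transformers version pinned in requirements.txt.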