changed use_flash_attention_2=True to attn_implementation="flash_attention_2"
#53 opened by macadeliccc
I receive this warning when using `use_flash_attention_2=True`:

```
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
```
Using `attn_implementation="flash_attention_2"` resolves the warning:
```python
from transformers import AutoModelForSpeechSeq2Seq

# model_id and torch_dtype as defined earlier in the snippet
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True, attn_implementation="flash_attention_2"
)
```
That will only work for people using the latest transformers release, so let's keep it as-is for now.
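If you want a snippet that works on both old and new releases, a minimal sketch is a runtime version guard. This assumes `attn_implementation` landed in transformers 4.36; the `model_id` value here is hypothetical, so substitute your own checkpoint:

```python
import torch
import transformers
from packaging import version
from transformers import AutoModelForSpeechSeq2Seq

model_id = "openai/whisper-large-v3"  # hypothetical checkpoint; substitute your own
torch_dtype = torch.float16

# Assumption: attn_implementation was introduced in transformers 4.36;
# older releases only understand the deprecated use_flash_attention_2 flag.
if version.parse(transformers.__version__) >= version.parse("4.36.0"):
    attn_kwargs = {"attn_implementation": "flash_attention_2"}
else:
    attn_kwargs = {"use_flash_attention_2": True}

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
    use_safetensors=True,
    **attn_kwargs,
)
```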