changed use_flash_attention_2=True to attn_implementation="flash_attention_2"

#53 opened by macadeliccc

I receive this warning when using use_flash_attention_2=True:

The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
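For context, this is roughly the load call that triggers it (a minimal sketch; the model ID and dtype are placeholders standing in for whatever the README snippet uses, and flash-attn 2 must be installed):

import torch
from transformers import AutoModelForSpeechSeq2Seq

model_id = "openai/whisper-large-v3"   # placeholder checkpoint; substitute the repo's own model ID
torch_dtype = torch.float16            # flash-attn 2 needs fp16/bf16

# Deprecated form: passing use_flash_attention_2=True emits the warning above
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True,
    use_safetensors=True, use_flash_attention_2=True,
)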

Using attn_implementation="flash_attention_2" resolves the warning:

from transformers import AutoModelForSpeechSeq2Seq

# model_id and torch_dtype defined as in the README usage snippet
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True, attn_implementation="flash_attention_2"
)

That will only work for people who are on the latest transformers release, so let's keep the README as it is for now.
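If backwards compatibility is the concern, one option (just a sketch, not part of this PR; the 4.36.0 cutoff for attn_implementation is my assumption) is to pick the kwarg based on the installed transformers version:

import torch
import transformers
from packaging import version
from transformers import AutoModelForSpeechSeq2Seq

model_id = "openai/whisper-large-v3"   # placeholder checkpoint; substitute the repo's own model ID

# Newer transformers releases accept attn_implementation (assumed cutoff: 4.36.0);
# fall back to the older use_flash_attention_2 kwarg otherwise.
if version.parse(transformers.__version__) >= version.parse("4.36.0"):
    attn_kwargs = {"attn_implementation": "flash_attention_2"}
else:
    attn_kwargs = {"use_flash_attention_2": True}

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    use_safetensors=True,
    **attn_kwargs,
)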

Cannot merge
This branch has merge conflicts in the following files:
  • README.md
