Crystalcareai/Quiet-Star-Custom

Text Generation

Crystalcareai committed on Apr 3, 2024

Commit 1d25390 · verified · 1 Parent(s): 974e6b8

Update modeling_quiet.py

Files changed (1)

modeling_quiet.py +1 -1

@@ -100,7 +100,7 @@ def _prepare_4d_causal_attention_mask_for_sdpa(attention_mask, input_shape, inpu
             # - if the model is a decoder, apply a causal mask in addition to the padding mask
             # - if the model is an encoder, make the mask broadcastable to [batch_size, num_heads, seq_length, seq_length]
             if past_key_values_length > 0:
-                attention_mask = attention_mask.to(dtype=torch.long)
+                attention_mask = attention_mask.to(dtype=torch.bfloat16)
                 attention_mask = attention_mask[:, past_key_values_length:]
             expanded_attn_mask = attention_mask[:, None, None, :]
             combined_attention_mask = expanded_attn_mask