model is too busy #23
by tangtang1995 - opened
Hi, I'm using the huggingface_hub InferenceClient for inference, but today I keep getting this error:
"Model too busy, unable to get response in less than 120 second(s)"
Same issue here, and on top of that the stream never ends:
from huggingface_hub import InferenceClient

client = InferenceClient(api_key="YOUR_HF_TOKEN")

SYSTEM_PROMPT = "..."  # my actual system prompt
messages = [{"role": "system", "content": SYSTEM_PROMPT}]

stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=messages,
    temperature=0.1,
    top_p=0.2,
    presence_penalty=0.6,
    frequency_penalty=0.6,
    max_tokens=6144,
    stream=True,
)

for chunk in stream:  # this loops forever!!
    print(chunk.choices[0].delta.content)
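For what it's worth, a defensive way to consume the stream is to stop on the server's finish_reason and cap the total wall-clock time, so a stuck stream can't spin forever. A minimal sketch (the 120-second budget is an arbitrary choice, not a library default):

import time

deadline = time.monotonic() + 120  # assumed budget, tune as needed
for chunk in stream:
    if time.monotonic() > deadline:
        break  # give up on a stream that never terminates
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk's delta.content can be None
        print(delta, end="")
    if chunk.choices[0].finish_reason is not None:
        break  # server signalled the end of generation
print()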