model is too busy

#23
by tangtang1995 - opened

Hi, I'm using the huggingface_hub InferenceClient for inference, but today I keep getting this error:

"Model too busy, unable to get response in less than 120 second(s)"

And when the call does go through, the stream never ends! Here's my code:

from huggingface_hub import InferenceClient

SYSTEM_PROMPT = "You are a helpful assistant."  # placeholder; my real prompt is longer

client = InferenceClient(api_key="YOUR_HF_TOKEN")

messages = [{"role": "system", "content": SYSTEM_PROMPT}]

stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=messages,
    temperature=0.1,
    top_p=0.2,
    presence_penalty=0.6,
    frequency_penalty=0.6,
    max_tokens=6144,
    stream=True,
)

for chunk in stream:  # this loop never ends!
    print(chunk.choices[0].delta.content or "", end="")
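For now I'm working around the "too busy" error with a simple retry loop. This is just a sketch, assuming the overload surfaces as an HfHubHTTPError or a timeout when the request is made; the chat_with_retry name and the backoff values are my own choices, not anything official:

import time

from huggingface_hub import InferenceClient, InferenceTimeoutError
from huggingface_hub.utils import HfHubHTTPError

# timeout makes the client fail fast instead of hanging indefinitely
client = InferenceClient(api_key="YOUR_HF_TOKEN", timeout=120)

def chat_with_retry(messages, retries=3, backoff=10.0):
    # Retry the request a few times, pausing longer each time,
    # when the endpoint reports it is overloaded or times out.
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="Qwen/Qwen2.5-72B-Instruct",
                messages=messages,
                max_tokens=6144,
                stream=True,
            )
        except (InferenceTimeoutError, HfHubHTTPError):
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(backoff * (attempt + 1))  # linear backoff: 10s, 20s, ...

This only papers over the problem on my side, though. Is the model actually overloaded right now, or am I doing something wrong?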
