model is too busy

#23
by tangtang1995 - opened

Hi, I'm using the huggingface_hub InferenceClient for inference, but today I keep getting this error:

"Model too busy, unable to get response in less than 120 second(s)"

And when the call does go through, the stream never ends! Here's my code:

from huggingface_hub import InferenceClient

SYSTEM_PROMPT = "You are a helpful assistant."  # placeholder; my real prompt is longer

client = InferenceClient(api_key="YOUR_HF_TOKEN")

messages = [{"role": "system", "content": SYSTEM_PROMPT}]

stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=messages,
    temperature=0.1,
    top_p=0.2,
    presence_penalty=0.6,
    frequency_penalty=0.6,
    max_tokens=6144,
    stream=True,
)

for chunk in stream:  # this loop never ends!
    print(chunk.choices[0].delta.content or "", end="")
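For now I'm working around the "too busy" error with a simple retry loop. This is just a sketch, assuming the overload surfaces as an HfHubHTTPError or a timeout when the request is made; the chat_with_retry name and the backoff values are my own choices, not anything official:

import time

from huggingface_hub import InferenceClient, InferenceTimeoutError
from huggingface_hub.utils import HfHubHTTPError

# timeout makes the client fail fast instead of hanging indefinitely
client = InferenceClient(api_key="YOUR_HF_TOKEN", timeout=120)

def chat_with_retry(messages, retries=3, backoff=10.0):
    # Retry the request a few times, pausing longer each time,
    # when the endpoint reports it is overloaded or times out.
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="Qwen/Qwen2.5-72B-Instruct",
                messages=messages,
                max_tokens=6144,
                stream=True,
            )
        except (InferenceTimeoutError, HfHubHTTPError):
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(backoff * (attempt + 1))  # linear backoff: 10s, 20s, ...

This only papers over the problem on my side, though. Is the model actually overloaded right now, or am I doing something wrong?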
