---
title: LLM-Perf Leaderboard
emoji: 🏆🏋️
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 4.26.0
app_file: app.py
pinned: true
license: apache-2.0
tags: [llm perf leaderboard, llm performance leaderboard, llm, performance, leaderboard]
---
# LLM-Perf Leaderboard
## About
The 🤗 LLM-Perf Leaderboard 🏋️ is a leaderboard at the intersection of quality and performance.
Its aim is to benchmark the performance (latency, throughput, memory and energy)
of Large Language Models (LLMs) across different hardware, backends and optimizations
using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark).
Anyone from the community can request a new base model or hardware/backend/optimization
configuration for automated benchmarking:
- Model evaluation requests should be made in the
[🤗 Open LLM Leaderboard 🏆](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard);
we scrape the [list of canonical base models](https://github.com/huggingface/optimum-benchmark/blob/main/llm_perf/utils.py) from there.
- Hardware/backend/optimization configuration requests should be made in the
[🤗 LLM-Perf Leaderboard 🏋️](https://huggingface.co/spaces/optimum/llm-perf-leaderboard) Space or the
[Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) repository (where the code is hosted).
## Details
- To avoid communication-dependent results, only one GPU is used.
- Score is the average evaluation score obtained from the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
- LLMs are run with a batch size of 1 and a prompt size of 256 tokens, generating 64 tokens, for at least 10 iterations and 10 seconds.
- Energy consumption is measured in kWh using CodeCarbon, taking into account the GPU, CPU, RAM and location of the machine.
- We measure three types of memory: Max Allocated Memory, Max Reserved Memory and Max Used Memory. The first two are reported by PyTorch and the last one is observed using PyNVML.
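
As a rough illustration of the "at least 10 iterations and 10 seconds" rule above, here is a minimal Python sketch of such a measurement loop. It is not the Optimum-Benchmark implementation: `measure_latencies`, `tokens_per_second` and the dummy workload are hypothetical names used only for this example, and computing throughput as generated tokens divided by mean latency is an assumption.

```python
import time
from statistics import mean


def measure_latencies(fn, min_iterations=10, min_duration=10.0):
    """Call fn() until BOTH minimums are met; return per-call latencies in seconds."""
    latencies = []
    start = time.perf_counter()
    # Keep going while either minimum (iteration count or elapsed time) is unmet.
    while len(latencies) < min_iterations or time.perf_counter() - start < min_duration:
        t0 = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - t0)
    return latencies


def tokens_per_second(latencies, batch_size=1, new_tokens=64):
    """Generated tokens per second, based on the mean latency of one call."""
    return batch_size * new_tokens / mean(latencies)


if __name__ == "__main__":
    # Dummy workload standing in for one generation call (hypothetical).
    # min_duration is shortened here so the demo finishes quickly.
    latencies = measure_latencies(lambda: time.sleep(0.01), min_duration=0.2)
    print(f"{len(latencies)} iterations, mean latency {mean(latencies):.4f} s")
    print(f"throughput: {tokens_per_second(latencies):.1f} tokens/s")
```

In a real run, `fn` would wrap a single generation call with a 256-token prompt and 64 new tokens. CUDA calls are asynchronous, so a GPU measurement would also need to synchronize the device before reading the clock.
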
All of our benchmarks are run by this single script,
[benchmark_cuda_pytorch.py](https://github.com/huggingface/optimum-benchmark/blob/llm-perf/llm-perf/benchmark_cuda_pytorch.py),
using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) to guarantee reproducibility and consistency.
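
To make the three memory figures listed above more concrete, the sketch below shows one way they could be read with PyTorch and PyNVML. It is an illustration under assumptions, not the code used by the benchmark script: `read_gpu_memory` and `bytes_to_mb` are hypothetical helpers, and the PyNVML value is a single instantaneous reading, so tracking a true maximum would require polling it during the run.

```python
def bytes_to_mb(num_bytes):
    """Convert a byte count to megabytes (10**6 bytes)."""
    return num_bytes / 1e6


def read_gpu_memory(device_index=0):
    """Return allocated, reserved and used memory (in MB) for one CUDA device.

    Imports are local so this module can still be imported without a GPU stack.
    """
    import pynvml
    import torch

    # Peak values tracked by PyTorch's caching allocator since the last reset.
    allocated = torch.cuda.max_memory_allocated(device_index)
    reserved = torch.cuda.max_memory_reserved(device_index)

    # Device-wide usage reported by the driver at this instant (all processes).
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        used = pynvml.nvmlDeviceGetMemoryInfo(handle).used
    finally:
        pynvml.nvmlShutdown()

    return {
        "allocated_mb": bytes_to_mb(allocated),
        "reserved_mb": bytes_to_mb(reserved),
        "used_mb": bytes_to_mb(used),
    }
```

Note that the NVML device index can differ from the PyTorch index when `CUDA_VISIBLE_DEVICES` is set, so the mapping may need adjusting on multi-GPU machines.
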
## How to run locally
To run the LLM-Perf Leaderboard locally on your machine, follow these steps:
### 1. Clone the Repository
First, clone the repository to your local machine:
```bash
git clone https://huggingface.co/spaces/optimum/llm-perf-leaderboard
cd llm-perf-leaderboard
```
### 2. Install the Required Dependencies
Install the necessary Python packages listed in the requirements.txt file:
`pip install -r requirements.txt`
### 3. Run the Application
You can run the Gradio application in one of the following ways:
- Option 1: Using Python
`python app.py`
- Option 2: Using the Gradio CLI (includes hot-reload)
`gradio app.py`
### 4. Access the Application
Once the application is running, you can access it locally in your web browser at http://127.0.0.1:7860/
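
If you prefer to check from a script that the app is up, a small standard-library probe like the one below can help. This is only a sketch: `app_is_reachable` is a hypothetical helper, and it assumes the app is listening on the address given above.

```python
import urllib.request


def app_is_reachable(url="http://127.0.0.1:7860/", timeout=2.0):
    """Return True if the URL answers with HTTP 200, False on connection errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts and HTTP error responses.
        return False


if __name__ == "__main__":
    print("reachable" if app_is_reachable() else "not reachable")
```
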