shuyuej committed on
Commit 9b0cb12
1 Parent(s): 3bdf21a

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -17,7 +17,9 @@ For real-world deployment, please refer to the [vLLM Distributed Inference and S
  > [!NOTE]
  > The vLLM version we are using is `0.6.2`. Please check [this version](https://github.com/vllm-project/vllm/releases/tag/v0.6.2).

- vLLM can be deployed as a server that implements the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API. By default, it starts the server at `http://localhost:8000`.
+ vLLM can be deployed as a server that implements the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API.
+ By default, it starts the server at `http://localhost:8000`.
+ Please use vLLM to serve the base model with the LoRA adapter by including the `--enable-lora` flag and specifying `--lora-modules`:
  ```shell
  vllm serve shuyuej/Llama-3.3-70B-Instruct-GPTQ \
  --quantization gptq \
@@ -28,7 +30,7 @@ vllm serve shuyuej/Llama-3.3-70B-Instruct-GPTQ \
  --pipeline-parallel-size 4 \
  --api-key token-abc123 \
  --enable-lora \
- --lora-modules adapter=checkpoint-18640
+ --lora-modules adapter=Public-Shared-LoRA-for-Llama-3.3-70B-Instruct-GPTQ/checkpoint-18640
  ```

  Since this server is compatible with OpenAI API, you can use it as a drop-in replacement for any applications using OpenAI API.
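For a quick end-to-end check of the served adapter, the OpenAI-compatible endpoints can be exercised with `curl`. This is a minimal sketch, not part of the commit: it assumes the `vllm serve` command above is running on the default `http://localhost:8000`, that the API key is `token-abc123`, and that the LoRA adapter was registered under the name `adapter` via `--lora-modules`.

```shell
# List the models the server exposes; the LoRA adapter registered with
# --lora-modules should show up alongside the base model.
curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer token-abc123"

# Route a chat completion through the LoRA adapter by passing its
# registered name ("adapter") as the model field.
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer token-abc123" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "adapter",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'
```

Passing `shuyuej/Llama-3.3-70B-Instruct-GPTQ` as the `model` field should instead route the request to the base model without the adapter.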