Update README.md
README.md CHANGED
@@ -17,7 +17,9 @@ For real-world deployment, please refer to the [vLLM Distributed Inference and S
 > [!NOTE]
 > The vLLM version we are using is `0.6.2`. Please check [this version](https://github.com/vllm-project/vllm/releases/tag/v0.6.2).
 
-vLLM can be deployed as a server that implements the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API.
+vLLM can be deployed as a server that implements the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API.
+By default, it starts the server at `http://localhost:8000`.
+Please use vLLM to serve the base model with the LoRA adapter by including the `--enable-lora` flag and specifying `--lora-modules`:
 ```shell
 vllm serve shuyuej/Llama-3.3-70B-Instruct-GPTQ \
     --quantization gptq \
@@ -28,7 +30,7 @@ vllm serve shuyuej/Llama-3.3-70B-Instruct-GPTQ \
     --pipeline-parallel-size 4 \
     --api-key token-abc123 \
     --enable-lora \
-    --lora-modules adapter=checkpoint-18640
+    --lora-modules adapter=Public-Shared-LoRA-for-Llama-3.3-70B-Instruct-GPTQ/checkpoint-18640
 ```
 
 Since this server is compatible with OpenAI API, you can use it as a drop-in replacement for any applications using OpenAI API.
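As a quick sanity check, the OpenAI-compatible `/v1/models` endpoint lists the served models; the LoRA module registered via `--lora-modules` should appear under the name `adapter`. A minimal sketch, assuming the default `http://localhost:8000` address and the `token-abc123` API key from the command above:

```shell
# List the models exposed by the OpenAI-compatible server.
# The LoRA module registered with --lora-modules shows up as "adapter".
curl http://localhost:8000/v1/models \
    -H "Authorization: Bearer token-abc123"
```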
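Likewise, a minimal sketch of a chat completion request against the LoRA adapter, under the same assumptions (default address, `token-abc123` API key, adapter name `adapter`); any OpenAI client can be pointed at `http://localhost:8000/v1` in the same way:

```shell
# Send a chat completion request to the served LoRA adapter,
# exactly as one would against the OpenAI API.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer token-abc123" \
    -d '{
        "model": "adapter",
        "messages": [{"role": "user", "content": "Hello!"}]
    }'
```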