shuyuej committed (verified)
Commit: 6007f1a
Parent(s): dd008a9

Update README.md

Files changed (1)
  1. README.md +0 -14
README.md CHANGED
@@ -14,20 +14,6 @@ For real-world deployment, please refer to the [vLLM Distributed Inference and S
 
  vLLM can be deployed as a server that implements the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API. By default, it starts the server at `http://localhost:8000`.
  ```shell
- vllm serve shuyuej/Llama-3.3-70B-Instruct-GPTQ \
- --quantization gptq \
- --trust-remote-code \
- --dtype float16 \
- --max-model-len 4096 \
- --distributed-executor-backend mp \
- --pipeline-parallel-size 4 \
- --api-key token-abc123
- ```
- Please check [here](https://docs.vllm.ai/en/latest/usage/engine_args.html) if you wanna change `Engine Arguments`.
-
- If you would like to deploy your LoRA adapter, please refer to the [vLLM documentation](https://docs.vllm.ai/en/latest/usage/lora.html#serving-lora-adapters) for a detailed guide.
- It provides step-by-step instructions on how to serve LoRA adapters effectively in a vLLM environment.
- ```shell
  vllm serve shuyuej/Llama-3.3-70B-Instruct-GPTQ \
  --quantization gptq \
  --trust-remote-code \
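
For reference (not part of this commit; the prompt is illustrative and the port and API key are taken from the command above), a server started with `vllm serve` answers standard OpenAI-style requests, so it can be sanity-checked with a plain `curl` call against the chat-completions endpoint:
```shell
# Query the OpenAI-compatible server started by `vllm serve` above.
# The Authorization header must match the --api-key passed to the server.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer token-abc123" \
    -d '{
          "model": "shuyuej/Llama-3.3-70B-Instruct-GPTQ",
          "messages": [{"role": "user", "content": "Hello!"}]
        }'
```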
 