shuyuej committed (verified)
Commit: 6007f1a
Parent(s): dd008a9

Update README.md

Files changed (1)
  1. README.md +0 -14
README.md CHANGED
@@ -14,20 +14,6 @@ For real-world deployment, please refer to the [vLLM Distributed Inference and S
 
  vLLM can be deployed as a server that implements the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API. By default, it starts the server at `http://localhost:8000`.
  ```shell
- vllm serve shuyuej/Llama-3.3-70B-Instruct-GPTQ \
- --quantization gptq \
- --trust-remote-code \
- --dtype float16 \
- --max-model-len 4096 \
- --distributed-executor-backend mp \
- --pipeline-parallel-size 4 \
- --api-key token-abc123
- ```
- Please check [here](https://docs.vllm.ai/en/latest/usage/engine_args.html) if you wanna change `Engine Arguments`.
-
- If you would like to deploy your LoRA adapter, please refer to the [vLLM documentation](https://docs.vllm.ai/en/latest/usage/lora.html#serving-lora-adapters) for a detailed guide.
- It provides step-by-step instructions on how to serve LoRA adapters effectively in a vLLM environment.
- ```shell
  vllm serve shuyuej/Llama-3.3-70B-Instruct-GPTQ \
  --quantization gptq \
  --trust-remote-code \
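
For reference (not part of this commit; the prompt is illustrative and the port and API key are taken from the command above), a server started with `vllm serve` answers standard OpenAI-style requests, so it can be sanity-checked with a plain `curl` call against the chat-completions endpoint:
```shell
# Query the OpenAI-compatible server started by `vllm serve` above.
# The Authorization header must match the --api-key passed to the server.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer token-abc123" \
    -d '{
          "model": "shuyuej/Llama-3.3-70B-Instruct-GPTQ",
          "messages": [{"role": "user", "content": "Hello!"}]
        }'
```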
 