Exact parameters to match https://chat.lmsys.org/
I downloaded vicuna-13b-v1.5 and ran it locally. I noticed that the performance (accuracy) seems worse than when I tried vicuna-13b at https://chat.lmsys.org/. Maybe this is due to the generation parameter settings. At https://chat.lmsys.org/ only three parameters are shown: temperature, top_p, and max_new_tokens. Is it possible to get the full list of parameters used by https://chat.lmsys.org/ so that a local setup matches its performance? Thanks!
How are you running vicuna-13b locally? Use FastChat to make sure the settings are correct.
https://github.com/lm-sys/FastChat#model-weights
I'm in a unique situation: one laptop with an internet connection but no CUDA, and one workstation with CUDA but no internet. So what I did was first run FastChat on the laptop:
python -m fastchat.serve.cli --model-path lmsys/vicuna-13b-v1.5
This downloaded the weights and tried to start a chat, but then stopped and reported an error about CUDA not being available.
Then I transferred the downloaded weights to the workstation and used Hugging Face transformers there to run the model.
There I found that sampling with the generation parameters temperature=0.7 and top_p=1 (which I think amounts to not using top_p) gave better results than greedy search. However, the results are still worse than what I got at chat.lmsys.org.
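
To sanity-check the guess that top_p=1 amounts to not using top_p, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. It is a simplified illustration, not the actual Hugging Face or FastChat implementation; the function names softmax and top_p_filter and the example logits are made up for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize over that set.
    Returns a dict mapping token index -> renormalized probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# Hypothetical logits for a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits, temperature=0.7)

full = top_p_filter(probs, top_p=1.0)    # no token is dropped
narrow = top_p_filter(probs, top_p=0.5)  # only the most likely token(s) survive

print(len(full), len(narrow))
```

With top_p=1.0 every token is kept and the distribution is unchanged, so in this sketch only the temperature affects sampling. Note also that in Hugging Face transformers, temperature and top_p only take effect when generate is called with do_sample=True; otherwise decoding is greedy.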