12.5 tokens/second on 16 GB RAM with 4 GB VRAM GPU Nvidia GTX 1050 Ti

#5
by JLouisBiz - opened

While I cannot yet tell how bling-phi-3.5 differs
from phi-3.5-mini, I am now using it and will report
my findings here. So far I am getting 12.5 tokens/second
on a machine with 16 GB RAM and an Nvidia GTX 1050 Ti GPU
with 4 GB VRAM, and I am satisfied.

I have also been testing Qwen-1.5B and
rocked-3B, which work faster for some other
tasks.

This model, based on Phi-3.5-mini, works well for
summaries. It was important to me to be able to run it
locally on low-end hardware.

For now I am satisfied 😊, though I am still looking
for differences from the original Phi-3.5-mini 🎯.
