FrankC0st1e committed on
Commit 49e4794 · 1 Parent(s): 6c769fa

add vllm inference example

Files changed (1)
  1. README.md +26 -3
README.md CHANGED
@@ -18,11 +18,11 @@ MiniCPM3-4B is the 3rd generation of MiniCPM series. The overall performance of
 
  Compared to MiniCPM1.0/MiniCPM2.0, MiniCPM3-4B has a more powerful and versatile skill set to enable more general usage. MiniCPM3-4B supports function call, along with code interpreter. Please refer to []() for usage guidelines.
 
- MiniCPM3-4B has a 32k context window. Equipped with LLMxMapreduce, MiniCPM3-4B can handle infinite contexts theoretically, without requiring huge amount of memory.
+ MiniCPM3-4B has a 32k context window. Equipped with LLMxMapReduce, MiniCPM3-4B can handle infinite contexts theoretically, without requiring huge amount of memory.
 
  ## Usage
+ ### Inference with Transformers
  ```python
-
  from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch
 
@@ -42,7 +42,7 @@ model_outputs = model.generate(
      max_new_tokens=1024,
      top_p=0.7,
      temperature=0.7,
-     repetition_penalty=1.02
+     repetition_penalty=1.02
  )
 
  output_token_ids = [
@@ -53,6 +53,29 @@ responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0
  print(responses)
  ```
 
+ ### Inference with [vLLM](https://github.com/vllm-project/vllm)
+ ```python
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+ model_name = "openbmb/MiniCPM3-4B"
+ prompt = [{"role": "user", "content": "Recommend 5 tourist attractions in Beijing."}]
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
+
+ llm = LLM(
+     model=model_name,
+     trust_remote_code=True,
+     tensor_parallel_size=1
+ )
+ sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)
+
+ outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)
+
+ print(outputs[0].outputs[0].text)
+ ```
+
  ## Evaluation Results
 
  <table>
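
Review note: the two examples in this diff use the same sampling values, but the token limit is spelled `max_new_tokens` in the Transformers `generate` call and `max_tokens` in vLLM's `SamplingParams`. The following is a minimal sketch of keeping one settings dict and renaming only that key. The helper `to_vllm_sampling_kwargs` and the mapping `HF_TO_VLLM` are hypothetical names introduced here for illustration; neither is part of Transformers or vLLM.

```python
# Hypothetical helper (not part of transformers or vLLM): rename the
# generate() keyword arguments shown in the Transformers example to the
# names that SamplingParams takes in the vLLM example.
HF_TO_VLLM = {"max_new_tokens": "max_tokens"}


def to_vllm_sampling_kwargs(hf_kwargs: dict) -> dict:
    """Return a copy of hf_kwargs with Transformers-only names renamed."""
    return {HF_TO_VLLM.get(name, name): value for name, value in hf_kwargs.items()}


# Sampling values taken from the Transformers example in the diff.
hf_kwargs = {
    "max_new_tokens": 1024,
    "top_p": 0.7,
    "temperature": 0.7,
    "repetition_penalty": 1.02,
}

vllm_kwargs = to_vllm_sampling_kwargs(hf_kwargs)
print(vllm_kwargs)
```

With vLLM installed, the result could be passed on as `SamplingParams(**vllm_kwargs)`, which yields the same arguments as the `SamplingParams(...)` call added in this commit.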