Zimix committed
Commit 1e519a7 · 1 Parent(s): e11bdd6

Update README.md

Files changed (1)
  1. README.md +21 -11
README.md CHANGED
@@ -20,20 +20,30 @@ Wenzhong-GPT2-110M is one of the Wenzhong series, which has smaller parameters.

  ### load model
  ```python
- from transformers import GPT2Tokenizer, GPT2Model
- tokenizer = GPT2Tokenizer.from_pretrained('IDEA-CCNL/Wenzhong-GPT2-3.5B')
- model = GPT2Model.from_pretrained('IDEA-CCNL/Wenzhong-GPT2-3.5B')
- text = "Replace me by any text you'd like."
- encoded_input = tokenizer(text, return_tensors='pt')
- output = model(**encoded_input)
- ```
+ from transformers import GPT2Tokenizer, GPT2LMHeadModel
+ hf_model_path = 'IDEA-CCNL/Wenzhong-GPT2-110M'
+ tokenizer = GPT2Tokenizer.from_pretrained(hf_model_path)
+ model = GPT2LMHeadModel.from_pretrained(hf_model_path)
+ ```
  ### generation
  ```python
- from transformers import pipeline, set_seed
- set_seed(55)
- generator = pipeline('text-generation', model='IDEA-CCNL/Wenzhong-GPT2-3.5B')
- generator("北京位于", max_length=30, num_return_sequences=1)
+ question = "北京是中国的"
+ inputs = tokenizer(question, return_tensors='pt')
+ generation_output = model.generate(**inputs,
+                                    return_dict_in_generate=True,
+                                    output_scores=True,
+                                    max_length=150,
+                                    # max_new_tokens=80,
+                                    do_sample=True,
+                                    top_p=0.6,
+                                    # num_beams=5,
+                                    eos_token_id=50256,
+                                    pad_token_id=0,
+                                    # stopping_criteria=StoppingCriteriaList([custom_stopping(stop_token=50256)]),
+                                    num_return_sequences=5)

+ for idx, sentence in enumerate(generation_output.sequences):
+     print('next sentence %d:\n' % idx, tokenizer.decode(sentence).split('<|endoftext|>')[0])
+     print('*' * 40)
  ```

  ## Citation
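
The removed example made generation reproducible with `set_seed(55)`, while the updated snippet samples with `top_p` and will return different continuations on each run. If deterministic output is wanted, `transformers.set_seed` can still be called before `model.generate`; a minimal sketch reusing the checkpoint, prompt, and sampling settings from the updated example (not part of this commit):

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel, set_seed

hf_model_path = 'IDEA-CCNL/Wenzhong-GPT2-110M'
tokenizer = GPT2Tokenizer.from_pretrained(hf_model_path)
model = GPT2LMHeadModel.from_pretrained(hf_model_path)

set_seed(55)  # fix the sampling seed so repeated runs give the same continuations
inputs = tokenizer("北京是中国的", return_tensors='pt')
sequences = model.generate(**inputs,
                           do_sample=True,
                           top_p=0.6,
                           max_length=150,
                           eos_token_id=50256,
                           pad_token_id=0,
                           num_return_sequences=5)
for idx, sentence in enumerate(sequences):
    # keep only the text before the first <|endoftext|> marker
    print('sentence %d:' % idx, tokenizer.decode(sentence).split('<|endoftext|>')[0])
```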