svakhreev katek committed · Commit 6ce8719 · 1 Parent(s): d03fe92

Move example section higher (#2)

- Move example section higher (b56cc3846fdebea2b3ee7c08fe1810795c297198)


Co-authored-by: Kate K <[email protected]>

Files changed (1)
  1. README.md +52 -52
README.md CHANGED
@@ -589,6 +589,58 @@ You can start using it right now by downloading the
 
 And it's multi-language (see MultiPL-HumanEval and other metrics below) and it works as a chat (see the section below).
 
+# It Works As a Chat
+
+The primary application of this model is code completion (infill) in multiple programming languages.
+But it works as a chat quite well.
+
+HumanEval results using instruction following (chat) format, against models specialized for chat only:
+
+Model | Size | pass@1 | pass@10 |
+-----------------------|--------|----------|----------|
+<b>Refact-1.6-fim</b> | 1.6b | 38.4% | 55.6% |
+StableCode-instruct | 3b | 26.9% | 36.2% |
+OctoGeeX | 6b | 44.7% | |
+CodeLlama-instruct | 7b | 34.8% | 64.3% |
+CodeGen2.5-instruct | 7b | 36.2% | 60.87 |
+CodeLlama-instruct | 13b | 42.7% | 71.6% |
+StarChat-β | 15b | 33.5% | |
+OctoCoder | 15b | 46.2% | |
+
+
+# Example
+
+Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:
+
+```python
+# pip install -q transformers
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+checkpoint = "smallcloudai/Refact-1_6B-fim"
+device = "cuda" # for GPU usage or "cpu" for CPU usage
+
+tokenizer = AutoTokenizer.from_pretrained(checkpoint)
+model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)
+
+prompt = '<fim_prefix>def print_hello_world():\n """<fim_suffix>\n print("Hello world!")<fim_middle>'
+
+inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
+outputs = model.generate(inputs, max_length=100, temperature=0.2)
+print("-"*80)
+print(tokenizer.decode(outputs[0]))
+```
+
+# Chat Format
+
+The same model works as chat (experimental).
+
+```python
+prompt_template = "<empty_output>SYSTEM {system}\n" \
+                  "<empty_output>USER {query}\n" \
+                  "<empty_output>ASSISTANT"
+prompt = prompt_template.format(system="You are a programming assistant",
+                                query="How do I sort a list in Python?")
+
 
 # Architecture
 
@@ -646,58 +698,6 @@ and to perform well on a wide range of metrics. The best attempt took 40B tokens
 The Refact-1.6B model was trained on text in English. But it has seen a lot more languages in
 code comments. Its performance on non-English languages is lower, for sure.
 
-
-# It Works As a Chat
-
-The primary application of this model is code completion (infill) in multiple programming languages.
-But it works as a chat quite well.
-
-HumanEval results using instruction following (chat) format, against models specialized for chat only:
-
-Model | Size | pass@1 | pass@10 |
------------------------|--------|----------|----------|
-<b>Refact-1.6-fim</b> | 1.6b | 38.4% | 55.6% |
-StableCode-instruct | 3b | 26.9% | 36.2% |
-OctoGeeX | 6b | 44.7% | |
-CodeLlama-instruct | 7b | 34.8% | 64.3% |
-CodeGen2.5-instruct | 7b | 36.2% | 60.87 |
-CodeLlama-instruct | 13b | 42.7% | 71.6% |
-StarChat-β | 15b | 33.5% | |
-OctoCoder | 15b | 46.2% | |
-
-
-# Example
-
-Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:
-
-```python
-# pip install -q transformers
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
-checkpoint = "smallcloudai/Refact-1_6B-fim"
-device = "cuda" # for GPU usage or "cpu" for CPU usage
-
-tokenizer = AutoTokenizer.from_pretrained(checkpoint)
-model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)
-
-prompt = '<fim_prefix>def print_hello_world():\n """<fim_suffix>\n print("Hello world!")<fim_middle>'
-
-inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
-outputs = model.generate(inputs, max_length=100, temperature=0.2)
-print("-"*80)
-print(tokenizer.decode(outputs[0]))
-```
-
-# Chat Format
-
-The same model works as chat (experimental).
-
-```python
-prompt_template = "<empty_output>SYSTEM {system}\n" \
-                  "<empty_output>USER {query}\n" \
-                  "<empty_output>ASSISTANT"
-prompt = prompt_template.format(system="You are a programming assistant",
-                                query="How do I sort a list in Python?")
 ```
 
 # Model Stats
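
For reference, the two prompt layouts moved by this commit (fill-in-the-middle and chat) can be assembled with plain string helpers. The sketch below is illustrative only: the function names `build_fim_prompt` and `build_chat_prompt` are hypothetical and not part of the repository, and the token strings are copied from the README text in the diff. It does not load the model.

```python
# Hypothetical helpers; token strings are copied from the README text in the diff.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"
ROLE_MARKER = "<empty_output>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Place prefix and suffix around the special tokens; the model is
    expected to generate the missing middle after <fim_middle>."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def build_chat_prompt(system: str, query: str) -> str:
    """Format a single-turn chat prompt in the SYSTEM/USER/ASSISTANT layout."""
    return (
        f"{ROLE_MARKER}SYSTEM {system}\n"
        f"{ROLE_MARKER}USER {query}\n"
        f"{ROLE_MARKER}ASSISTANT"
    )


if __name__ == "__main__":
    print(build_fim_prompt('def print_hello_world():\n    """',
                           '\n    print("Hello world!")'))
    print(build_chat_prompt("You are a programming assistant",
                            "How do I sort a list in Python?"))
```

Either string can then be passed to `tokenizer.encode(...)` in place of the hard-coded `prompt` shown in the README example.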