RichardErkhov committed
Commit 7dd91f2 · verified · 1 Parent(s): 88b7e18

uploaded readme

Files changed (1): README.md (+148 −0)
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

gemma2-2b-fraud - GGUF
- Model creator: https://huggingface.co/jslin09/
- Original model: https://huggingface.co/jslin09/gemma2-2b-fraud/
| Name | Quant method | Size |
| ---- | ---- | ---- |
| [gemma2-2b-fraud.Q2_K.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q2_K.gguf) | Q2_K | 1.15GB |
| [gemma2-2b-fraud.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q3_K_S.gguf) | Q3_K_S | 1.27GB |
| [gemma2-2b-fraud.Q3_K.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q3_K.gguf) | Q3_K | 1.36GB |
| [gemma2-2b-fraud.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q3_K_M.gguf) | Q3_K_M | 1.36GB |
| [gemma2-2b-fraud.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q3_K_L.gguf) | Q3_K_L | 1.44GB |
| [gemma2-2b-fraud.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.IQ4_XS.gguf) | IQ4_XS | 1.47GB |
| [gemma2-2b-fraud.Q4_0.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q4_0.gguf) | Q4_0 | 1.52GB |
| [gemma2-2b-fraud.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.IQ4_NL.gguf) | IQ4_NL | 1.53GB |
| [gemma2-2b-fraud.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q4_K_S.gguf) | Q4_K_S | 1.53GB |
| [gemma2-2b-fraud.Q4_K.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q4_K.gguf) | Q4_K | 1.59GB |
| [gemma2-2b-fraud.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q4_K_M.gguf) | Q4_K_M | 1.59GB |
| [gemma2-2b-fraud.Q4_1.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q4_1.gguf) | Q4_1 | 1.64GB |
| [gemma2-2b-fraud.Q5_0.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q5_0.gguf) | Q5_0 | 1.75GB |
| [gemma2-2b-fraud.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q5_K_S.gguf) | Q5_K_S | 1.75GB |
| [gemma2-2b-fraud.Q5_K.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q5_K.gguf) | Q5_K | 1.79GB |
| [gemma2-2b-fraud.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q5_K_M.gguf) | Q5_K_M | 1.79GB |
| [gemma2-2b-fraud.Q5_1.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q5_1.gguf) | Q5_1 | 1.87GB |
| [gemma2-2b-fraud.Q6_K.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q6_K.gguf) | Q6_K | 2.0GB |
| [gemma2-2b-fraud.Q8_0.gguf](https://huggingface.co/RichardErkhov/jslin09_-_gemma2-2b-fraud-gguf/blob/main/gemma2-2b-fraud.Q8_0.gguf) | Q8_0 | 2.59GB |

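These GGUF files run with [llama.cpp](https://github.com/ggerganov/llama.cpp) or any GGUF-compatible runtime. As an illustrative sketch (not part of either the quantizer's or the original author's instructions), the snippet below uses the llama-cpp-python bindings; the chosen quant file and context size are assumptions, and the sampling values mirror the widget parameters in the original model card below.

```python
# Sketch: run one of the GGUF quantizations above with llama-cpp-python.
# Assumes gemma2-2b-fraud.Q4_K_M.gguf has already been downloaded locally.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma2-2b-fraud.Q4_K_M.gguf",  # any file from the table above
    n_ctx=2048,  # assumed context window size
)

# One of the original card's widget prompts; sampling values follow its front matter.
prompt = "王大明意圖為自己不法所有,基於竊盜之犯意,"
out = llm(prompt, max_tokens=400, temperature=0.75, top_k=50, top_p=0.9)
print(out["choices"][0]["text"])
```

Smaller quants (Q2_K, Q3_K_*) trade accuracy for memory; Q8_0 stays closest to the original weights.
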
Original model description:
---
license: gemma
datasets:
- jslin09/Fraud_Case_Verdicts
language:
- zh
base_model:
- google/gemma-2-2b
pipeline_tag: text-generation
text-generation:
  parameters:
    max_length: 400
    max_new_tokens: 400
    do_sample: true
    temperature: 0.75
    top_k: 50
    top_p: 0.9
tags:
- legal
widget:
- text: 王大明意圖為自己不法所有,基於竊盜之犯意,
  example_title: Generate criminal facts for a theft case
- text: 騙人布意圖為自己不法所有,基於詐欺取財之犯意,
  example_title: Generate criminal facts for a fraud case
- text: 梅友乾明知其無資力支付酒店消費,亦無付款意願,竟意圖為自己不法之所有,
  example_title: Generate criminal facts for dine-and-dash fraud
- text: 闕很大明知金融帳戶之存摺、提款卡及密碼係供自己使用之重要理財工具,
  example_title: Generate criminal facts for aiding fraud by selling a bank account
- text: 通訊王明知近來盛行以虛設、租賃、借用或買賣行動電話人頭門號之方式,供詐騙集團作為詐欺他人交付財物等不法用途,
  example_title: Generate criminal facts for aiding fraud by selling a phone SIM card
- text: 趙甲王基於行使偽造特種文書及詐欺取財之犯意,
  example_title: Fraud by uttering forged special documents (contracts, license plates, etc.)
library_name: transformers
---
# Automatic Drafting of the "Criminal Facts" Section of Court Judgments
This model was fine-tuned from [Google Gemma2:2b](https://huggingface.co/google/gemma-2-2b) on a dataset built from the "fraud" case judgments published by Taiwan's Judicial Yuan, and it can automatically generate draft "criminal facts" paragraphs for fraud and theft cases. The data cover January 1, 2011 through December 31, 2021 (ROC years 100 to 110) and comprise 74,823 original documents (judgments and rulings), from which only the "criminal facts" field of each judgment was used. The data were split three ways: 59,858 documents (about 80% of the original data) for training, with the remaining 20% divided evenly between a validation set (7,482 documents) and a test set (7,483 documents). When testing on the model's page, wait for the model to load and generate the first short clause, then keep pressing the Compute button to continue generating text, or enter your own test input in the text box. A more complete experience is available [here](https://huggingface.co/spaces/jslin09/legal_document_drafting).

# Usage Examples
To call this model from your own program, you can use the following Python code, which generates the content of the "criminal facts" section of a criminal judgment by calling the Hugging Face Inference API.
<details>
<summary>Click to expand</summary>
<pre>
<code>
import json
from time import sleep

import requests
from tqdm.auto import trange

API_URL = "https://api-inference.huggingface.co/models/jslin09/gemma2-2b-fraud"
API_TOKEN = 'XXXXXXXXXXXXXXX' # your API token for calling the model
headers = {"Authorization": f"Bearer {API_TOKEN}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return json.loads(response.content.decode("utf-8"))

prompt = "森上梅前明知其無資力支付酒店消費,亦無付款意願,竟意圖為自己不法之所有,"
query_dict = {
    "inputs": prompt,
}
text_len = 300
t = trange(text_len, desc='Drafting', leave=True)
for i in t:
    response = query(query_dict)
    try:
        # Feed the accumulated text back in so each call extends the draft.
        response_text = response[0]['generated_text']
        query_dict["inputs"] = response_text
        t.set_description(f"{i}: {response_text}")
        t.refresh()
    except KeyError:
        sleep(30) # If the server is too busy to respond, wait 30 seconds and retry.
print(query_dict["inputs"])
</code>
</pre>
</details>
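
Each loop iteration feeds the accumulated `generated_text` back in as the next input, so the draft is extended by one API call per step; the `KeyError` branch simply waits out a busy or still-loading server before retrying.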

Alternatively, to download this model to your own machine and run it with the transformers library, see the following code:
<details>
<summary>Click to expand</summary>
<pre>
<code>
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("jslin09/gemma2-2b-fraud")
model = AutoModelForCausalLM.from_pretrained("jslin09/gemma2-2b-fraud")
</code>
</pre>
</details>
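
The block above only loads the model. As an illustrative extension (not part of the original card), generation could then look like the sketch below; the prompt is one of the widget examples and the sampling values mirror the front matter above.

```python
# Illustrative sketch: draft a "criminal facts" paragraph with the model and
# tokenizer loaded above. Sampling values mirror the card's front matter.
inputs = tokenizer("騙人布意圖為自己不法所有,基於詐欺取財之犯意,", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=400,
    do_sample=True,
    temperature=0.75,
    top_k=50,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```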

# Acknowledgements
The compute required to fine-tune this model, an NVIDIA H100, was provided by [評律網](https://www.pingluweb.com.tw/). Many thanks.

# Citation

```
@misc{lin2024legal,
      title={Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model},
      author={Chun-Hsien Lin and Pu-Jen Cheng},
      year={2024},
      eprint={2406.04202},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.04202}
}
```