webbigdata/C3TR-Adapter

@@ -644,6 +644,119 @@ Muzan: "Is there anything else you want to say?"
 Wakuraba: "This guy is going to be killed too. Everything depends on this guy's mood. I'm going to die too."
 ```
+## SpeedUp Sample
+Using unsloth can speed up inference at the cost of a slight loss in accuracy.
+```
+pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
+pip install transformers==4.43.3
+pip install bitsandbytes==0.43.3
+pip install accelerate==0.33.0
+pip install peft==0.12.0
+pip install flash-attn --no-build-isolation
+pip install --upgrade pip
+# unsloth builds for CUDA 12.1 / torch 2.3; the "ampere" extra is meant for RTX 30xx-class or newer GPUs.
+python -m pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"
+pip install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"
+```
+```
+import os
+import time
+import torch
+from unsloth import FastLanguageModel
+from transformers import TextStreamer
+os.environ["TOKENIZERS_PARALLELISM"] = "false"
+max_seq_length = 2048
+load_in_4bit = True
+dtype = torch.bfloat16
+adp_name = "webbigdata/C3TR-Adapter"
+# Load the adapter in 4-bit, then switch unsloth to its inference mode.
+model, tokenizer = FastLanguageModel.from_pretrained(
+    adp_name,
+    max_seq_length = max_seq_length,
+    dtype = dtype,
+    load_in_4bit = load_in_4bit,
+)
+FastLanguageModel.for_inference(model)
+def trans(instruction,  input):
+    system =  """You are a highly skilled professional Japanese-English and English-Japanese translator. Translate the given text accurately, taking into account the context and specific instructions provided. Steps may include hints enclosed in square brackets [] with the key and value separated by a colon:. Only when the subject is specified in the Japanese sentence, the subject will be added when translating into English. If no additional instructions or context are provided, use your expertise to consider what the most appropriate context is and provide a natural translation that aligns with that context. When translating, strive to faithfully reflect the meaning and tone of the original text, pay attention to cultural nuances and differences in language usage, and ensure that the translation is grammatically correct and easy to read. After completing the translation, review it once more to check for errors or unnatural expressions. For technical terms and proper nouns, either leave them in the original language or use appropriate translations as necessary. Take a deep breath, calm down, and start translating."""
+    prompt = f"""{system}
+<start_of_turn>### Instruction:
+{instruction}
+### Input:
+{input}
+<end_of_turn>
+<start_of_turn>### Response:
+"""
+    inputs = tokenizer(prompt, return_tensors="pt",
+        padding=True, max_length=2400, truncation=True).to("cuda")
+    class CountingStreamer(TextStreamer):
+        """Streams text to stdout and counts the newly generated tokens."""
+        def __init__(self, tokenizer):
+            super().__init__(tokenizer)
+            self.tokenizer = tokenizer
+            self.token_count = 0
+            self.prompt_seen = False
+        def put(self, text):
+            if isinstance(text, torch.Tensor):
+                n_tokens = text.shape[-1]
+            elif isinstance(text, list):
+                n_tokens = len(text)
+            elif isinstance(text, str):
+                n_tokens = len(self.tokenizer.encode(text, add_special_tokens=False))
+            else:
+                raise TypeError(f"Unexpected type for text: {type(text)}")
+            # generate() pushes the prompt ids first; leave them out of the count.
+            if self.prompt_seen:
+                self.token_count += n_tokens
+            self.prompt_seen = True
+            super().put(text)
+    counting_streamer = CountingStreamer(tokenizer)
+    start_time = time.time()
+    _ = model.generate(**inputs, streamer=counting_streamer, max_new_tokens=2400,
+        early_stopping=False)
+    elapsed_time = time.time() - start_time
+    generated_tokens = counting_streamer.token_count
+    # Guard against a zero elapsed time before dividing.
+    tokens_per_second = generated_tokens / elapsed_time if elapsed_time > 0 else 0
+    print(f"generated_tokens: {generated_tokens}")
+    print(f"elapsed_time: {elapsed_time}")
+    print(f"Token generation speed: {tokens_per_second:.2f} tokens/sec")
+    return tokens_per_second
+tokens_per_second  = trans("Translate English to Japanese.\nWhen translating, please use the following hints:\n[writing_style: journalistic]",
+"""Tech war: China narrows AI gap with US despite chip restrictions
+China is narrowing the artificial intelligence (AI) gap with the US through rapid progress in deploying applications and state-backed adoption of the technology, despite the lack of access to advanced chips, according to industry experts and analysts.
+""")
+```
 ## Attention
 Because of a bug that degrades performance when this adapter is merged into the model and saved, **do not merge the base model (unsloth/gemma-2-9b-it-bnb-4bit) with the adapter and save the result.**
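
The sample call to `trans` in the diff above writes its instruction by hand: a direction sentence, a fixed hint preamble, then a `[key: value]` hint. A minimal sketch of a helper that assembles the same string might look like the following. The name `build_instruction` and its parameters are hypothetical and not part of the adapter or unsloth; only the `writing_style` key appears in the sample, and placing several hints one per line is an assumption.

```python
def build_instruction(source_lang, target_lang, hints=None):
    """Assemble an instruction string in the style of the sample call.

    Hypothetical helper: not part of C3TR-Adapter or unsloth.
    """
    instruction = f"Translate {source_lang} to {target_lang}."
    if hints:
        # Assumption: each hint goes on its own line as [key: value].
        hint_lines = "\n".join(f"[{key}: {value}]" for key, value in hints.items())
        instruction += (
            "\nWhen translating, please use the following hints:\n" + hint_lines
        )
    return instruction


# Reproduces the instruction used in the sample call.
instruction = build_instruction(
    "English", "Japanese", {"writing_style": "journalistic"}
)
print(instruction)
```

The returned string can be passed as the first argument to `trans`, with the text to translate as the second.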