---
base_model: ytu-ce-cosmos/Turkish-Llama-8b-Instruct-v0.1
license: llama3
language:
- tr
- en
tags:
- gguf
- ggml
- llama3
- cosmosllama
- turkish llama
---
# CosmosLLaMa GGUFs
## Objective
Real-time applications often require quantized models, so we provide our model in GGUF format. GGUF is the file format of the
GGML project, which aims to democratize the use of large models. Depending on the quantization type, this repository contains 20+ model files.
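For example, a single quantization variant can be fetched ahead of time with `huggingface_hub`. A minimal download sketch follows; the exact filename below is illustrative, so pick the variant you need from the file list:
```py
# A minimal download sketch, assuming the huggingface_hub package is installed.
# The filename below is illustrative; check the repository's file list for the
# exact name of the quantization variant you want.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="ytu-ce-cosmos/Turkish-Llama-8b-Instruct-v0.1-GGUF",
    filename="Turkish-Llama-8b-Instruct-v0.1.Q4_K.gguf",  # hypothetical filename
)
print(model_path)  # local path of the downloaded GGUF file
```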
### Features
* All quantization variants and their details are listed by Hugging Face in the file list on the right.
* All the models have been tested with the `llama.cpp` tools `llama-cli` and `llama-server`; a minimal `llama-server` client sketch follows this list.
* Furthermore, we made a YouTube video introducing the basics of running these models in `lmstudio`. 👇
[![lmstudio_yt](https://img.youtube.com/vi/JRID-6sRl7I/0.jpg)](https://www.youtube.com/watch?v=JRID-6sRl7I)
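Because the files were verified against `llama-server`, the sketch below shows one way to query a locally running server through its OpenAI-compatible endpoint. It assumes `llama-server` has been started with one of these GGUF files and listens on its default port 8080; adjust the base URL for other setups.
```py
# A minimal client sketch, assuming llama-server is already running locally
# (default port 8080) with one of the GGUF files from this repository loaded.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="local-model",  # llama-server serves a single model; the name is not checked
    messages=[
        {"role": "system", "content": "Sen bir yapay zeka asistanısın."},
        {"role": "user", "content": "Türkiye'nin başkenti neresidir?"},
    ],
    temperature=0.8,
)
print(completion.choices[0].message.content)
```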
### Code Example
A usage example with `llama-cpp-python`:
```py
from llama_cpp import Llama

# Inference parameters (taken from the LM Studio preset for this model;
# the sampling-related entries are passed to llama-cpp-python below,
# the rest are kept for reference)
inference_params = {
    "n_threads": 4,
    "n_predict": -1,  # -1 = generate until EOS or the context is full
    "top_k": 40,
    "min_p": 0.05,
    "top_p": 0.95,
    "temp": 0.8,
    "repeat_penalty": 1.1,
    "input_prefix": "<|start_header_id|>user<|end_header_id|>\n\n",
    "input_suffix": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "antiprompt": [],
    "pre_prompt": "Sen bir yapay zeka asistanısın. Kullanıcı sana bir görev verecek. Amacın görevi olabildiğince sadık bir şekilde tamamlamak.",
    "pre_prompt_suffix": "<|eot_id|>",
    "pre_prompt_prefix": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n",
    "seed": -1,
    "tfs_z": 1,
    "typical_p": 1,
    "repeat_last_n": 64,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "n_keep": 0,
    "logit_bias": {},
    "mirostat": 0,
    "mirostat_tau": 5,
    "mirostat_eta": 0.1,
    "memory_f16": True,
    "multiline_input": False,
    "penalize_nl": True,
}

# Download the Q4_K file from the Hub and initialize the model
llama = Llama.from_pretrained(
    repo_id="ytu-ce-cosmos/Turkish-Llama-8b-Instruct-v0.1-GGUF",
    filename="*Q4_K.gguf",
    n_threads=inference_params["n_threads"],
    verbose=False,
)

# Example input
user_input = "Türkiye'nin başkenti neresidir?"

# Construct the Llama 3 prompt: system block, then the user turn,
# ending with the assistant header so the model continues from there
prompt = (
    f"{inference_params['pre_prompt_prefix']}{inference_params['pre_prompt']}"
    f"{inference_params['pre_prompt_suffix']}"
    f"{inference_params['input_prefix']}{user_input}{inference_params['input_suffix']}"
)

# Generate the response using the sampling settings defined above
response = llama(
    prompt,
    max_tokens=inference_params["n_predict"],
    temperature=inference_params["temp"],
    top_k=inference_params["top_k"],
    top_p=inference_params["top_p"],
    min_p=inference_params["min_p"],
    repeat_penalty=inference_params["repeat_penalty"],
)

# Output the response
print(response["choices"][0]["text"])
```
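Continuing from the example above, `llama-cpp-python` also offers a higher-level chat API. The sketch below assumes, as is typical for these conversions, that the Llama 3 chat template is embedded in the GGUF metadata, in which case the prompt does not have to be assembled by hand:
```py
# A minimal sketch using the high-level chat API; it relies on the chat template
# stored in the GGUF metadata (assumed here to be the Llama 3 template).
messages = [
    {"role": "system", "content": inference_params["pre_prompt"]},
    {"role": "user", "content": "Türkiye'nin başkenti neresidir?"},
]
response = llama.create_chat_completion(messages=messages, temperature=0.8)
print(response["choices"][0]["message"]["content"])
```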
The quantization was performed with `llama.cpp`; in our experience, this method gives the most stable results.
As expected, the higher-bit models yield better inference quality, while inference time is similar across the low-bit models.
Each model's memory footprint can be estimated from the quantization docs of either [Hugging Face](https://huggingface.co/docs/transformers/main/en/quantization/overview) or [llama.cpp](https://github.com/ggerganov/llama.cpp/tree/master/examples/quantize).
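As a rough back-of-the-envelope check, file size scales with parameter count times bits per weight; the bits-per-weight figures below are approximations, not exact specifications:
```py
# A rough size estimate: parameters x bits-per-weight / 8 gives bytes.
# The bits-per-weight values are approximate; see the quantize docs linked
# above for exact per-type numbers.
def approx_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

print(f"{approx_gguf_size_gb(8e9, 4.5):.1f} GB")   # ~Q4_K: about 4.5 GB
print(f"{approx_gguf_size_gb(8e9, 16.0):.1f} GB")  # F16: about 16 GB
```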
# Acknowledgments
- Research supported with Cloud TPUs from [Google's TensorFlow Research Cloud](https://sites.research.google/trc/about/) (TFRC). Thanks for providing access to the TFRC ❤️
- Thanks to the generous support from the Hugging Face team, it is possible to download models from their S3 storage 🤗
# Citation
```bibtex
@inproceedings{kesgin2024optimizing,
title={Optimizing Large Language Models for Turkish: New Methodologies in Corpus Selection and Training},
author={Kesgin, H Toprak and Yuce, M Kaan and Dogan, Eren and Uzun, M Egemen and Uz, Atahan and {\.I}nce, Elif and Erdem, Yusuf and Shbib, Osama and Zeer, Ahmed and Amasyali, M Fatih},
booktitle={2024 Innovations in Intelligent Systems and Applications Conference (ASYU)},
pages={1--6},
year={2024},
organization={IEEE}
}
```
## Contact
COSMOS AI Research Group, Yildiz Technical University Computer Engineering Department
https://cosmos.yildiz.edu.tr/
[email protected] |