---
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
license: apache-2.0
language:
- en
---

![Header](https://raw.githubusercontent.com/Aayan-Mishra/Images/refs/heads/main/Athena.png)

# Athena-1 0.5B

Athena-1 0.5B is a fine-tuned, instruction-following large language model derived from [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct). Designed for ultra-lightweight applications, Athena-1 0.5B balances compactness with robust performance, making it suitable for environments with limited computational resources.

---

## Key Features

### ⚡ Ultra-Lightweight and Efficient

*   **Compact Size:** With just **500 million parameters**, Athena-1 0.5B is ideal for edge devices and low-resource environments.
*   **Instruction Following:** Fine-tuned for reliable adherence to user instructions.
*   **Coding and Mathematics:** Capable of handling basic coding and mathematical tasks.

### 📖 Contextual Understanding

*   **Context Length:** Supports up to **16,384 tokens**, enabling processing of moderately sized conversations or documents.
*   **Token Generation:** Can generate up to **4K tokens** of coherent output.
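
The two limits above interact: a long prompt leaves less room for output. The following is a minimal sketch of that arithmetic, assuming "4K" means 4,096 tokens and that prompt and generated tokens share the 16,384-token window. The helper name `max_new_tokens_for` is hypothetical and not part of any library.

```python
CONTEXT_LENGTH = 16_384  # context window stated in this model card
MAX_GENERATION = 4_096   # assumption: "4K tokens" read as 4,096


def max_new_tokens_for(prompt_tokens: int) -> int:
    """Return how many tokens can still be generated after a prompt of this length."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens
    # Never exceed the generation cap, and never go below zero.
    return max(0, min(MAX_GENERATION, remaining))


print(max_new_tokens_for(1_000))   # 4096: the generation cap applies
print(max_new_tokens_for(15_000))  # 1384: the context window applies
```

The result can be passed as `max_new_tokens` when calling `generate`, with the prompt length taken from the tokenizer output.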

### 🌍 Multilingual Support

*   Supports **20+ languages**, including:
    *   English, Chinese, French, Spanish, German, Italian, Russian
    *   Japanese, Korean, Vietnamese, Thai, and more.

### 📊 Structured Data & Outputs

*   **Structured Data Interpretation:** Handles formats like tables and JSON effectively.
*   **Structured Output Generation:** Produces well-formatted outputs for data-specific tasks.
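
A small model may wrap JSON in extra text or a code fence, so it helps to parse replies defensively. The sketch below shows one way to request JSON and extract it from a reply string. Both helper names (`build_json_messages`, `parse_json_reply`) and the prompt wording are illustrative assumptions, not part of the model or of `transformers`.

```python
import json
from typing import Dict, List


def build_json_messages(text: str, fields: List[str]) -> List[Dict[str, str]]:
    """Build a chat message asking for a JSON object with the given keys."""
    instruction = (
        "Extract the following fields from the text and answer with a single "
        f"JSON object using exactly these keys: {', '.join(fields)}.\n\n"
        f"Text: {text}"
    )
    return [{"role": "user", "content": instruction}]


def parse_json_reply(reply: str) -> dict:
    """Parse the outermost {...} span in a reply, ignoring surrounding text."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start : end + 1])


# A hand-written reply stands in for real model output here.
sample_reply = 'Sure!\n```json\n{"name": "Ada", "age": 36}\n```'
print(parse_json_reply(sample_reply))
```

`json.loads` still raises `json.JSONDecodeError` on malformed output, so callers may want to retry or fall back when parsing fails.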

---

## Model Details

*   **Base Model:** [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
*   **Architecture:** Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
*   **Parameters:** 500M total.
*   **Layers:** Same as the base model, assuming fine-tuning left the architecture unchanged.
*   **Attention Heads:** Same as the base model, assuming fine-tuning left the architecture unchanged.
*   **Context Length:** Up to **16,384 tokens**.

---

## Applications

Athena-1 0.5B is optimized for:

*   **Conversational AI:** Powers lightweight and responsive chatbots.
*   **Code Assistance:** Basic code generation, debugging, and explanations.
*   **Mathematical Assistance:** Solves fundamental math problems.
*   **Document Processing:** Summarizes and analyzes smaller documents effectively.
*   **Multilingual Tasks:** Supports global use cases with a compact model.
*   **Structured Data:** Reads and generates structured formats like JSON and tables.
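
For the conversational use case, a chatbot typically keeps the running history as a list of role/content messages. This is a minimal sketch of one turn of such a loop. The `chat_turn` helper and the `generate` callback are hypothetical; in practice the callback would wrap a `transformers` pipeline or `model.generate` call.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    history: List[Message],
    user_text: str,
    generate: Callable[[List[Message]], str],
) -> List[Message]:
    """Append the user message, call the generator, and append its reply."""
    updated = history + [{"role": "user", "content": user_text}]
    reply = generate(updated)
    return updated + [{"role": "assistant", "content": reply}]


# A stand-in generator so the sketch runs without downloading a model.
def echo_generator(messages: List[Message]) -> str:
    return "You said: " + messages[-1]["content"]


history: List[Message] = []
history = chat_turn(history, "Hello!", echo_generator)
history = chat_turn(history, "What can you do?", echo_generator)
print(len(history))  # 4: two user messages and two assistant replies
```

Returning a new list instead of mutating `history` keeps each turn easy to test and to truncate when the conversation approaches the context limit.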

---

## Quickstart

Here’s how you can use Athena-1 0.5B for quick text generation:

```python
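from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "Spestly/Athena-1-0.5B"  # Update if the repository name differs

messages = [
    {"role": "user", "content": "What can you do?"},
]

# Option 1: use a pipeline as a high-level helper
pipe = pipeline("text-generation", model=model_id)
print(pipe(messages, max_new_tokens=256))

# Option 2: load the tokenizer and model directly
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format the conversation with the chat template, then generate a reply
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))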
```