InHUMAN committed on
Commit 4112c84 · verified · 1 Parent(s): aa5dd4e

Update README.md

Files changed (1):
  1. README.md +26 -2
README.md CHANGED
@@ -4,8 +4,32 @@ language:
  - en
  pipeline_tag: text-generation
  ---
- Model Name: **Maximum-218M**
-
- First attempt to build GPT from scratch. Used RoPE and GeGLU
+ # Maximum Language Model (218M)
+
+ A transformer-based language model inspired by the GPT architecture, incorporating RoPE (Rotary Position Embeddings) and GeGLU (a GELU-based gated linear unit) activations for enhanced performance.
+
+ ## Model Specifications
+
+ - **Parameters**: 218M
+ - **Training Data**: 3M tokens
+ - **Key Features**:
+   - RoPE (Rotary Position Embeddings) for better position encoding
+   - GeGLU activation function for improved gradient flow
+   - Transformer-based architecture
+
+
+ ### Position Embeddings
+ The model uses RoPE (Rotary Position Embeddings) instead of traditional absolute positional encodings. RoPE rotates query and key vectors by position-dependent angles, which enables:
+ - Better relative position modeling
+ - Enhanced extrapolation to longer sequences
+ - Theoretical backing for position-aware attention
+
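To make the idea concrete, here is a minimal PyTorch-style sketch of rotary position embeddings as they are commonly implemented. It is an illustration only: the function names, tensor shapes, and head sizes below are assumptions for the example, not code from this model or this commit.

```python
import torch

def rope_frequencies(head_dim: int, seq_len: int, base: float = 10000.0) -> torch.Tensor:
    # One rotation frequency per pair of channels: theta_j = base^(-2j/head_dim)
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    angles = torch.outer(positions, inv_freq)              # (seq_len, head_dim/2)
    # Complex unit vectors e^{i * m * theta_j} for position m and channel pair j
    return torch.polar(torch.ones_like(angles), angles)

def apply_rope(x: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, n_heads, head_dim); pair up channels as complex numbers
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    freqs = freqs[: x.shape[1]].unsqueeze(0).unsqueeze(2)  # (1, seq_len, 1, head_dim/2)
    x_rotated = torch.view_as_real(x_complex * freqs).flatten(-2)
    return x_rotated.type_as(x)

# Usage: rotate queries and keys (not values) before the attention dot product.
q = torch.randn(1, 16, 4, 64)                              # (batch, seq, heads, head_dim)
freqs = rope_frequencies(head_dim=64, seq_len=16)
q_rot = apply_rope(q, freqs)
print(q_rot.shape)                                         # torch.Size([1, 16, 4, 64])
```

Because the rotation applied to a query at position m and a key at position n cancels down to a function of m minus n, the attention scores depend on relative offsets, which is what underlies the relative-position and extrapolation claims above.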
+ ### Activation Function
+ GeGLU (a GELU-based gated linear unit) is used as the feed-forward activation, which:
+ - Provides better gradient flow during training
+ - Combines the benefits of gating mechanisms with GELU's properties
+ - Helps mitigate vanishing gradient problems
+
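Likewise, here is a minimal sketch of a GeGLU feed-forward block as typically defined (the GELU variant of the gated linear unit from Shazeer's "GLU Variants Improve Transformer"). The layer names and dimensions are assumptions for the example, not taken from this model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLU(nn.Module):
    """Feed-forward block with a GELU-gated linear unit:
    GeGLU(x) = (GELU(x W_gate) * (x W_up)) W_down
    """
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_ff, bias=False)  # gating branch
        self.up_proj = nn.Linear(d_model, d_ff, bias=False)    # value branch
        self.down_proj = nn.Linear(d_ff, d_model, bias=False)  # back to model width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.gelu(self.gate_proj(x)) * self.up_proj(x))

# Usage: drop-in replacement for the usual Linear -> GELU -> Linear MLP in a block.
ffn = GeGLU(d_model=768, d_ff=2048)
y = ffn(torch.randn(2, 16, 768))   # (batch, seq, d_model) in, same shape out
```

The gating branch modulates the value branch elementwise, which is where the improved gradient flow mentioned above comes from, while the down projection keeps the block the same shape as a standard two-layer MLP.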