---
library_name: transformers
license: apache-2.0
tags:
- jamba
- mamba
- moe
---



# Expert weights of [Jamba-v0.1](https://huggingface.co/ai21labs/Jamba-v0.1)

These weights are required for follow-up research.

The original model is **[AI21lab's Jamba-v0.1](https://huggingface.co/ai21labs/Jamba-v0.1)**, which requires an **A100 80GB GPU**. Unfortunately, such a GPU was generally not available via Google Colab or other cloud computing services. Therefore, **MoE (Mixture of Experts) splitting** was attempted, using the following resources as a basis:
- **Original Model:** [Jamba-v0.1](https://huggingface.co/ai21labs/Jamba-v0.1)
- **MoE Layer Separation:** see [this script](https://github.com/TechxGenus/Jamba-utils/blob/main/dense_downcycling.py) and [TechxGenus/Jamba-v0.1-9B](https://huggingface.co/TechxGenus/Jamba-v0.1-9B); a rough sketch of the idea appears below.
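
The sketch below only illustrates the general idea of "dense downcycling": keeping a single expert per MoE layer and discarding the rest. It is not the referenced `dense_downcycling.py` script; the state-dict key pattern `.experts.<idx>.`, the choice of keeping expert 0, and the handling of router weights are all assumptions, and the linked script should be consulted for the actual procedure.

```python
# Illustrative sketch of MoE dense downcycling: keep one expert per MoE layer.
# Assumptions (not verified against the real Jamba checkpoint):
#   - expert weights live under state-dict keys containing ".experts.<idx>.",
#   - keeping expert 0 approximates a dense MLP layer,
#   - router/gating weights would also need removal in a real conversion.
import re
import torch
from transformers import AutoModelForCausalLM

EXPERT_TO_KEEP = 0  # hypothetical choice: keep the first expert of every MoE layer

# Requires a transformers version with Jamba support (and a lot of CPU RAM).
model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)

dense_state_dict = {}
for name, tensor in model.state_dict().items():
    match = re.search(r"\.experts\.(\d+)\.", name)  # assumed key pattern
    if match is None:
        dense_state_dict[name] = tensor  # non-expert weights are kept as-is
    elif int(match.group(1)) == EXPERT_TO_KEEP:
        # Strip the expert index so the key looks like a plain dense MLP weight.
        dense_state_dict[re.sub(r"\.experts\.\d+\.", ".", name)] = tensor
    # Weights of all other experts are dropped.

torch.save(dense_state_dict, "jamba_dense_experts.pt")
```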

For quick experimentation, check [ai21labs/Jamba-tiny-random](https://huggingface.co/ai21labs/Jamba-tiny-random), which has 128M parameters (instead of 52B), is initialized with random weights, and did not undergo any training.
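
A minimal loading sketch for that tiny random checkpoint, assuming a recent `transformers` version with native Jamba support; since the weights are untrained, the generated text is meaningless and only useful for testing the pipeline:

```python
# Load the 128M-parameter random-weight Jamba checkpoint for quick pipeline tests.
# The output is gibberish by design: the model was never trained.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-tiny-random"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

inputs = tokenizer("Hello, Jamba!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```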