---
license: apache-2.0
library_name: transformers
datasets:
- roneneldan/TinyStories
language:
- en
tags:
- custom_code
- minGRU
- hf_integration
---

# MinGRU Sentiment Analysis

![minGRU](minGRU.jpg)

First Hugging Face integration of minGRU models from the paper "[**Were RNNs All We Needed?**](https://arxiv.org/abs/2410.01201)".

This model uses the GPT-2 tokenizer and was trained on the roneneldan/TinyStories dataset.

**Note: This is an experimental model. Don't forget to train the model before using it!**

Make sure you have installed the [**minGRU-pytorch**](https://github.com/lucidrains/minGRU-pytorch) library by running `pip install minGRU-pytorch`.

For the modeling and configuration code, see [**minGRU-hf**](https://github.com/suayptalha/minGRU-hf/tree/main).
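
Since the model ships with custom code, it would typically be loaded with `trust_remote_code=True`. The snippet below is a rough sketch only: the repository id is a placeholder, and whether `AutoModelForCausalLM` is the right auto class depends on how the minGRU-hf integration registers itself.

```py
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "path-or-repo-id-of-this-model"  # placeholder: use this repository's id or a local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2's tokenizer has no pad token by default

# trust_remote_code=True is needed because the model uses custom modeling code
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```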

# Training:

Training code:

```py
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import get_scheduler


def train_model(model, tokenizer, train_data, output_dir, epochs=3, batch_size=16, learning_rate=5e-5, block_size=128):
    # TinyStoriesDataset is assumed to be defined elsewhere (see the sketch below);
    # it should yield fixed-length blocks of token ids.
    train_dataset = TinyStoriesDataset(train_data, tokenizer, block_size)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
    scheduler = get_scheduler("linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=len(train_loader) * epochs)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    model.train()
    for epoch in range(epochs):
        print(f"Epoch {epoch + 1}/{epochs}")
        epoch_loss = 0
        progress_bar = tqdm(train_loader, desc="Training")
        for batch in progress_bar:
            batch = batch.to(device)

            # Causal language modeling: the inputs double as the labels.
            outputs = model(batch, labels=batch)
            loss = outputs.loss

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()

            epoch_loss += loss.item()
            progress_bar.set_postfix(loss=loss.item())

        print(f"Epoch {epoch + 1} Loss: {epoch_loss / len(train_loader)}")

    model.save_pretrained(output_dir, safe_serialization=False)
    tokenizer.save_pretrained(output_dir)
```
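
The snippet above assumes a `TinyStoriesDataset` that turns each story into a fixed-length block of token ids. A minimal sketch of such a dataset (my assumption, not the original implementation; it expects `tokenizer.pad_token` to be set, e.g. to the EOS token for GPT-2) might look like this:

```py
from torch.utils.data import Dataset


class TinyStoriesDataset(Dataset):
    """Hypothetical dataset: tokenizes each story into a fixed-length block of ids."""

    def __init__(self, data, tokenizer, block_size=128):
        self.examples = []
        for story in data:  # assumes each item has a "text" field, as in roneneldan/TinyStories
            ids = tokenizer(
                story["text"],
                truncation=True,
                max_length=block_size,
                padding="max_length",  # requires tokenizer.pad_token to be set
                return_tensors="pt",
            )["input_ids"].squeeze(0)
            self.examples.append(ids)

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]
```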

You can use this code snippet for fine-tuning!
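
As a rough end-to-end sketch, assuming `model` and `tokenizer` are loaded as shown earlier and `train_model` is defined as above (output directory and hyperparameters are placeholders), the function could be driven like this:

```py
from datasets import load_dataset

# Fetch the TinyStories training split used for this model.
train_data = load_dataset("roneneldan/TinyStories", split="train")

train_model(
    model,
    tokenizer,
    train_data,
    output_dir="./minGRU-tinystories",  # placeholder output directory
    epochs=1,
    batch_size=16,
    learning_rate=5e-5,
    block_size=128,
)
```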

# Credits:

https://arxiv.org/abs/2410.01201

I am thankful to Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio, and Hossein Hajimirsadeghi for their paper.