---
license: apache-2.0
library_name: transformers
datasets:
- roneneldan/TinyStories
language:
- en
tags:
- custom_code
- minGRU
- hf_integration
---
|
|
|
# MinGRU Sentiment Analysis |
|
|
|
![minGRU](minGRU.jpg) |
|
|
|
First Hugging Face integration of minGRU models from the paper "[**Were RNNs All We Needed?**](https://arxiv.org/abs/2410.01201)". |
|
|
|
This model uses the GPT-2 tokenizer and was trained on the roneneldan/TinyStories dataset.
|
|
|
**Note: This is an experimental model. Don't forget to train the model before use!**
|
|
|
Make sure you have installed the "[**minGRU-pytorch**](https://github.com/lucidrains/minGRU-pytorch)" library by running `pip install minGRU-pytorch`.
|
|
|
For the modeling and configuration code, see [**minGRU-hf**](https://github.com/suayptalha/minGRU-hf/tree/main).
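
Because this model ships custom modeling code (note the `custom_code` tag), it must be loaded with `trust_remote_code=True`. A minimal loading sketch; the repo id below is a placeholder for this repository's actual id, and it assumes the custom code registers with `AutoModelForCausalLM`:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "suayptalha/minGRU-Sentiment-Analysis"  # placeholder: substitute this repository's actual id

tokenizer = AutoTokenizer.from_pretrained(repo_id)  # GPT-2 tokenizer, per the note above
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
```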
|
|
|
# Training: |
|
|
|
Training code: |
|
|
|
```py
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import get_scheduler

def train_model(model, tokenizer, train_data, output_dir, epochs=3, batch_size=16, learning_rate=5e-5, block_size=128):
    train_dataset = TinyStoriesDataset(train_data, tokenizer, block_size)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
    scheduler = get_scheduler(
        "linear", optimizer=optimizer, num_warmup_steps=0,
        num_training_steps=len(train_loader) * epochs,
    )

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    model.train()
    for epoch in range(epochs):
        print(f"Epoch {epoch + 1}/{epochs}")
        epoch_loss = 0
        progress_bar = tqdm(train_loader, desc="Training")
        for batch in progress_bar:
            batch = batch.to(device)

            # Causal LM objective: the inputs double as the labels
            outputs = model(batch, labels=batch)
            loss = outputs.loss

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()

            epoch_loss += loss.item()
            progress_bar.set_postfix(loss=loss.item())

        print(f"Epoch {epoch + 1} Loss: {epoch_loss / len(train_loader)}")

    model.save_pretrained(output_dir, safe_serialization=False)
    tokenizer.save_pretrained(output_dir)
```
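
The snippet above assumes a `TinyStoriesDataset` class that is not shown here. Below is a minimal sketch of such a dataset, assuming each row of roneneldan/TinyStories exposes a `text` field, and padding short stories with the EOS token (the GPT-2 tokenizer has no pad token):

```py
import torch
from torch.utils.data import Dataset

class TinyStoriesDataset(Dataset):
    """Tokenizes each story into a fixed-length block of token ids."""

    def __init__(self, data, tokenizer, block_size=128):
        self.examples = []
        for story in data:
            ids = tokenizer(story["text"], truncation=True, max_length=block_size)["input_ids"]
            # Pad short stories with EOS up to block_size
            ids += [tokenizer.eos_token_id] * (block_size - len(ids))
            self.examples.append(torch.tensor(ids, dtype=torch.long))

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]
```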
|
|
|
You can use these code snippets for fine-tuning!
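
After training, you can run a quick sanity check with greedy decoding. A short sketch, assuming the custom model implements the standard `generate` API:

```py
model.eval()

prompt = "Once upon a time"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=50)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```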
|
|
|
# Credits: |
|
|
|
https://arxiv.org/abs/2410.01201 |
|
|
|
I am thankful to Leo Feng, Frederick Tung, Mohamed Osama Ahmed, Yoshua Bengio and Hossein Hajimirsadeghi for their paper.