---
language:
- ru
- en
license: apache-2.0
library_name: torchtune
base_model:
- mistralai/Mistral-7B-v0.1
datasets:
- IlyaGusev/rulm
---

mistralai/Mistral-7B-v0.1 with all layers fine-tuned on ~4B tokens from the IlyaGusev/rulm dataset.

Training took 130 hours on 2x Tesla H100.
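
A minimal inference sketch with Hugging Face Transformers (assumptions: the repo id `kirv/Mistral-7b-tokens4b-v1` is taken from the loss-curve image URL below, and the checkpoint is available in Transformers format):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the loss-curve image URL in this card; assumes the
# weights are stored in Hugging Face Transformers format.
model_id = "kirv/Mistral-7b-tokens4b-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Вопрос: Что такое машинное обучение?\nОтвет:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```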

```
batch_size: 20
epochs: 1
optimizer:
  _component_: torch.optim.AdamW
  lr: 5e-6
  weight_decay: 0.01
loss:
  _component_: torch.nn.CrossEntropyLoss
max_steps_per_epoch: null
gradient_accumulation_steps: 5
```
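
The `_component_` entries map directly onto PyTorch classes. As a rough plain-PyTorch illustration of the optimizer, loss, and gradient-accumulation settings above (a toy model and random data stand in for the real recipe):

```python
import torch
from torch import nn

# Toy stand-ins so the sketch runs end to end; the real run trains Mistral-7B.
vocab_size, seq_len, batch_size = 100, 16, 4
model = nn.Sequential(nn.Embedding(vocab_size, 32), nn.Linear(32, vocab_size))

# Mirrors the config above.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6, weight_decay=0.01)
loss_fn = nn.CrossEntropyLoss()
grad_accum_steps = 5  # gradient_accumulation_steps

optimizer.zero_grad()
for step in range(20):
    input_ids = torch.randint(0, vocab_size, (batch_size, seq_len))
    labels = torch.randint(0, vocab_size, (batch_size, seq_len))
    logits = model(input_ids)  # [batch, seq, vocab]
    loss = loss_fn(logits.view(-1, vocab_size), labels.view(-1))
    (loss / grad_accum_steps).backward()  # average over accumulated batches
    if (step + 1) % grad_accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```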

Sequence length: 1024 tokens. (Assuming batch_size is per device, that works out to 20 × 5 × 2 = 200 sequences, or ~205K tokens, per optimizer step.)

Loss curve:

![image/png](https://huggingface.co/kirv/Mistral-7b-tokens4b-v1/resolve/main/loss.png?download=true)

Evaluated with https://github.com/NLP-Core-Team/mmlu_ru:

4-bit quantization: accuracy_total=41.86218134391028
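
A sketch of loading the model 4-bit quantized via bitsandbytes (the card does not state which 4-bit scheme produced the score above; NF4 through Transformers is shown here as one common choice, not necessarily the one used):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "kirv/Mistral-7b-tokens4b-v1"  # repo id taken from the image URL above

# Assumption: NF4 4-bit quantization; the exact scheme behind the reported
# accuracy is not specified in this card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```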
|