Model Card for SmolLM-135M-de

A German version of HuggingFaceTB/SmolLM-135M, taught German through continued pretraining (CPT) on about 6 billion tokens.

If you are looking for a chat model, try this fine tune or the corresponding adapter model.

Model Details

Model Description

The base model is HuggingFaceTB/SmolLM-135M, which I further trained on about 6 billion German-language tokens.

  • Model type: Large Language Model (Llama architecture)
  • Language(s) (NLP): German
  • License: Apache 2.0
  • Finetuned from model: HuggingFaceTB/SmolLM-135M

Uses

I built this mainly as a small experimental model for quickly benchmarking datasets and similar tasks. Because the model is so small, I am unsure how useful it is in real-world scenarios.

This is a base model without any chat fine-tuning, so it should not be used as-is. It outputs mostly correct German, which was my goal.

If you are looking for a chat model, try this adapter.

Bias, Risks, and Limitations

This is a very small model and will output blatantly wrong information. I have not done any further filtering on the source datasets, so it is possible that the model will generate lewd or otherwise inappropriate content. Use with care.

I would strongly recommend against using this model in a production setting, at least without further fine tuning and preference optimization.

How to Get Started with the Model

Use the code below to get started with the model.

# adapted from the original SmolLM repo
# pip install transformers torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "LemiSt/SmolLM-135M-de"
# use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
# base model: give it text to continue rather than a chat-style instruction
inputs = tokenizer.encode("Rezept für einen leckeren veganen Schokokuchen:\n", return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
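
Because this is a base model, prompts work best as text to be continued, like the recipe heading in the example above. The following is a minimal sketch of one way to build such a prompt from a few worked examples. The helper name `build_few_shot_prompt` and the "Frage:/Antwort:" layout are assumptions for illustration and are not part of this repository. No claim is made here about how well the model answers prompts of this kind.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(examples: Sequence[Tuple[str, str]], question: str) -> str:
    """Format question/answer pairs as plain text that a base model can continue.

    Hypothetical helper: the layout is an illustrative choice, not a format
    the model was specifically trained on.
    """
    blocks = [f"Frage: {q}\nAntwort: {a}" for q, a in examples]
    # Leave the final answer open so the model's continuation fills it in.
    blocks.append(f"Frage: {question}\nAntwort:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        [
            ("Was ist die Hauptstadt von Frankreich?", "Paris"),
            ("Was ist die Hauptstadt von Italien?", "Rom"),
        ],
        "Was ist die Hauptstadt von Deutschland?",
    )
    print(prompt)
```

The resulting string can be passed to `tokenizer.encode` in place of the recipe prompt.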

Training Details

Training Data

Training Procedure

This was trained with axolotl, using full fine-tuning (no LoRA or similar). I used a sequence length of 2048 with an effective batch size of 512, a learning rate of 0.003 with the adamw_bnb_8bit optimizer, and a cosine scheduler. Due to an error I made in calculating the token count, I accidentally trained for nearly 2 epochs, and the learning rate did not reach its proper minimum.
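
To put the numbers above in context, here is a rough back-of-the-envelope sketch of tokens per optimizer step and the resulting step count. It assumes every sequence is filled to the full 2048 tokens and that "about 6 billion" is the total number of tokens processed; neither is confirmed here. The cosine function is a generic textbook form with a hypothetical floor of 0 and no warmup, not the exact schedule of this run. It only illustrates one way a learning rate can end above its floor (stopping before the planned end of the schedule) and is not a reconstruction of what happened in training.

```python
import math

# Values stated in the training description above.
SEQUENCE_LENGTH = 2048
EFFECTIVE_BATCH_SIZE = 512
PEAK_LR = 0.003

# Assumptions for illustration only (not stated in the model card).
TOTAL_TOKENS = 6_000_000_000  # "about 6 billion", taken as tokens processed
MIN_LR = 0.0                  # hypothetical floor of the schedule

# Upper bound on tokens per step: assumes every sequence is fully packed.
tokens_per_step = SEQUENCE_LENGTH * EFFECTIVE_BATCH_SIZE
total_steps = round(TOTAL_TOKENS / tokens_per_step)


def cosine_lr(step: int, planned_steps: int) -> float:
    """Plain cosine decay from PEAK_LR to MIN_LR without warmup."""
    progress = min(step / planned_steps, 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(f"tokens per optimizer step: {tokens_per_step:,}")
    print(f"approximate optimizer steps: {total_steps:,}")
    # Stopping before the planned end leaves the learning rate above MIN_LR.
    print(f"LR at the planned end: {cosine_lr(total_steps, total_steps):.6f}")
    print(f"LR at 80% of the plan: {cosine_lr(int(0.8 * total_steps), total_steps):.6f}")
```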
