BabyMistral Model Card
Model Overview
BabyMistral is a compact language model for text generation. It is built on the Mistral architecture and, at 1.5 billion parameters, is intended for settings where a small memory and compute footprint matters.
Key Specifications
- Parameters: 1.5 billion
- Training Data: 1.5 trillion tokens
- Architecture: Based on Mistral
- Training Duration: 70 days
- Hardware: 4x NVIDIA A100 GPUs
Model Details
Architecture
BabyMistral uses the Mistral architecture developed by Mistral AI, a decoder-only transformer design known for its efficiency. The model applies this architecture at 1.5 billion parameters, trading some capability for lower computational cost.
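To make the link between architecture settings and parameter count concrete, here is a minimal sketch that counts the weights of a Mistral-style decoder (grouped-query attention, gated MLP, RMSNorm, no biases). The function `decoder_param_count` is a hypothetical helper, not part of any library. This card does not list BabyMistral's layer count or hidden size, so the sanity check uses the published Mistral-7B-v0.1 configuration instead; those values are not BabyMistral's.

```python
def decoder_param_count(vocab_size, hidden_size, num_layers, num_heads,
                        num_kv_heads, intermediate_size, tie_embeddings=False):
    """Count weights in a Mistral-style decoder-only transformer (no bias terms)."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim

    # Attention: query and output projections are hidden x hidden;
    # key and value projections are hidden x kv_dim (grouped-query attention).
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms

    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return embeddings + num_layers * per_layer + final_norm + lm_head


# Sanity check with the Mistral-7B-v0.1 configuration (NOT BabyMistral's).
mistral_7b = decoder_param_count(
    vocab_size=32000,
    hidden_size=4096,
    num_layers=32,
    num_heads=32,
    num_kv_heads=8,
    intermediate_size=14336,
)
print(f"{mistral_7b:,}")  # 7,241,732,096
```

Reaching roughly 1.5 billion parameters with the same layout means choosing smaller values for some of these settings. The specific values used by BabyMistral would need to be read from the model's own configuration file.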
Training
- Dataset Size: 1.5 trillion tokens
- Training Approach: Trained from scratch
- Hardware: 4x NVIDIA A100 GPUs
- Duration: 70 days of continuous training
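As a rough illustration of what these figures imply, the sketch below derives the average token throughput each GPU would need to sustain, plus an approximate total compute budget. The "6 FLOPs per parameter per token" figure is a common rule of thumb for dense transformers and is an assumption here, not a number from this card. The function names are hypothetical.

```python
# Figures taken from the Training section of this card.
TOKENS = 1.5e12   # training tokens
PARAMS = 1.5e9    # model parameters
NUM_GPUS = 4      # NVIDIA A100 GPUs
DAYS = 70         # continuous training time

SECONDS_PER_DAY = 24 * 60 * 60


def tokens_per_second_per_gpu(tokens, num_gpus, days):
    """Average token throughput each GPU must sustain over the whole run."""
    gpu_seconds = num_gpus * days * SECONDS_PER_DAY
    return tokens / gpu_seconds


def approx_training_flops(params, tokens):
    """Rule-of-thumb estimate (assumption): ~6 FLOPs per parameter per token."""
    return 6 * params * tokens


throughput = tokens_per_second_per_gpu(TOKENS, NUM_GPUS, DAYS)
total_flops = approx_training_flops(PARAMS, TOKENS)

print(f"Implied throughput: {throughput:,.0f} tokens/s per GPU")
print(f"Approximate training compute: {total_flops:.2e} FLOPs")
```

With the card's numbers this works out to about 62,000 tokens per second per GPU and on the order of 1.35e22 FLOPs in total.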
Capabilities
BabyMistral is designed for a wide range of natural language processing tasks, including:
- Text completion and generation
- Creative writing assistance
- Dialogue systems
- Question answering
- Language understanding tasks
Usage
Getting Started
To use BabyMistral with the Hugging Face Transformers library:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model weights and the matching tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("OEvortex/BabyMistral")
tokenizer = AutoTokenizer.from_pretrained("OEvortex/BabyMistral")

# Define the chat input
chat = [
    # { "role": "system", "content": "You are BabyMistral" },
    { "role": "user", "content": "Hey there! How are you?" }
]

# Render the chat with the model's chat template and tokenize it
inputs = tokenizer.apply_chat_template(
    chat,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate text
outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)

# Keep only the newly generated tokens (drop the prompt) and decode them
response = outputs[0][inputs.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
# Example output (sampling is enabled, so results vary):
# I am doing well! How can I assist you today?
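The snippet above leaves the system message commented out. A small helper can make that choice explicit when building the message list. This is a minimal sketch: `build_chat` is a hypothetical function, not part of Transformers, and the card does not say whether the model's chat template accepts a system role, so test that before relying on it.

```python
def build_chat(user_message, system_prompt=None):
    """Return a message list in the role/content format used by chat templates."""
    chat = []
    # Only include a system message when one is explicitly provided.
    if system_prompt:
        chat.append({"role": "system", "content": system_prompt})
    chat.append({"role": "user", "content": user_message})
    return chat


chat = build_chat("Hey there! How are you?", system_prompt="You are BabyMistral")
print(chat)
```

The resulting list can be passed to `tokenizer.apply_chat_template` in place of the hand-written `chat` list.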
Ethical Considerations
While BabyMistral is a powerful tool, users should be aware of its limitations and potential biases:
- The model may reproduce biases present in its training data
- It should not be used as a sole source of factual information
- Generated content should be reviewed for accuracy and appropriateness
Limitations
- May struggle with very specialized or technical domains
- Lacks real-time knowledge beyond its training data
- Potential for generating plausible-sounding but incorrect information