Model Card for AuroraGPT-IT-v4-0125

Instruct-tuned AuroraGPT-7B model, created from 2250 iterations (970 per epoch) over the IT-v4 dataset (described below).

Usage

This model uses a fairly standard chat interface. Using the supplied tokenizer, you can convert input messages:

messages = [
    {"role": "system", "content": <system_prompt>},
    {"role": "user", "content": <user_prompt>},
]

to a chat string using tokenizer.apply_chat_template(messages). A usage sketch follows.
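A minimal end-to-end usage sketch, assuming the model and tokenizer load through the standard transformers Auto* classes; the prompts and generation settings below are illustrative, not prescriptive:

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "argonne-private/AuroraGPT-IT-v4-0125"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # illustrative system prompt
    {"role": "user", "content": "Give a one-paragraph Fermi estimate of piano tuners in Chicago."},  # illustrative user prompt
]

# Render the messages into the model's chat format, then tokenize.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))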

Training Data

Trained on an aggregation of several datasets (an illustrative loading sketch follows the list):

  • open-phi/textbooks
  • open-phi/programming_books_llama
  • openchat/openchat_sharegpt4_dataset
  • nvidia/ChatQA-Training-Data
  • In-house 4o-mini reflect tuned fermi problems
  • In-house 4o-mini reflect tuned theorem QA
  • jeffmeloy/sonnet3.5_science_conversations
  • HuggingFaceH4/ultrachat_200k
  • microsoft/orca-math-word-problems-200k
  • m-a-p/CodeFeedback-Filtered-Instruction
  • teknium/OpenHermes-2.5
  • openbmb/UltraInteract_sft
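The exact IT-v4 mixing ratios, filtering, and prompt formatting are not described here. Purely as an illustration, the public sources above could be pulled and combined with the Hugging Face datasets library; the splits and the common-format mapping below are assumptions:

from datasets import load_dataset, concatenate_datasets

# name -> split; the splits are guesses and vary per dataset.
sources = {
    "open-phi/textbooks": "train",
    "microsoft/orca-math-word-problems-200k": "train",
    "HuggingFaceH4/ultrachat_200k": "train_sft",
}

parts = [load_dataset(name, split=split) for name, split in sources.items()]

# Column schemas differ across sources, so each dataset would first be
# mapped to a shared chat/message format before concatenation (omitted here).
# combined = concatenate_datasets(parts)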

Training Procedure

Trained on 32 nodes of the Polaris supercomputer using PyTorch FSDP with hybrid sharding (a configuration sketch follows the list):

  • LR = 5 × 10⁻⁵
  • Per-GPU batch size = 1
  • Gradient accumulation steps = 6
  • Global batch size = 768
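A minimal configuration sketch of hybrid sharding with PyTorch FSDP; the wrapping policy, precision, and optimizer choices here are assumptions, and the actual Polaris training script is not reproduced:

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision, ShardingStrategy

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = build_model()  # hypothetical helper returning the 7B base model

# HYBRID_SHARD: full parameter sharding within a node, replication across nodes.
fsdp_model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
    mixed_precision=MixedPrecision(param_dtype=torch.bfloat16),  # precision is an assumption
    device_id=torch.cuda.current_device(),
)

optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=5e-5)
accum_steps = 6  # per-GPU batch 1 x 6 accumulation steps x 128 GPUs (32 nodes x 4) = global batch 768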