Model Card for AuroraGPT-IT-v4-0125

Instruct-tuned AuroraGPT-7B model, created from 2250 iterations (970 per epoch) over the IT-v4 dataset (described below).

Usage

This model uses a fairly standard chat interface. Using the supplied tokenizer, you can convert input messages:

messages = [
    {"role": "system", "content": <system_prompt>},
    {"role": "user", "content": <user_prompt>},
]

to a chat string using tokenizer.apply_chat_template(messages). A usage sketch follows.
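A minimal end-to-end usage sketch, assuming the model and tokenizer load through the standard transformers Auto* classes; the prompts and generation settings below are illustrative, not prescriptive:

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "argonne-private/AuroraGPT-IT-v4-0125"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # illustrative system prompt
    {"role": "user", "content": "Give a one-paragraph Fermi estimate of piano tuners in Chicago."},  # illustrative user prompt
]

# Render the messages into the model's chat format, then tokenize.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))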

Training Data

Trained on an aggregation of several datasets (an illustrative loading sketch follows the list):

  • open-phi/textbooks
  • open-phi/programming_books_llama
  • openchat/openchat_sharegpt4_dataset
  • nvidia/ChatQA-Training-Data
  • In-house 4o-mini reflect tuned fermi problems
  • In-house 4o-mini reflect tuned theorem QA
  • jeffmeloy/sonnet3.5_science_conversations
  • HuggingFaceH4/ultrachat_200k
  • microsoft/orca-math-word-problems-200k
  • m-a-p/CodeFeedback-Filtered-Instruction
  • teknium/OpenHermes-2.5
  • openbmb/UltraInteract_sft
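The exact IT-v4 mixing ratios, filtering, and prompt formatting are not described here. Purely as an illustration, the public sources above could be pulled and combined with the Hugging Face datasets library; the splits and the common-format mapping below are assumptions:

from datasets import load_dataset, concatenate_datasets

# name -> split; the splits are guesses and vary per dataset.
sources = {
    "open-phi/textbooks": "train",
    "microsoft/orca-math-word-problems-200k": "train",
    "HuggingFaceH4/ultrachat_200k": "train_sft",
}

parts = [load_dataset(name, split=split) for name, split in sources.items()]

# Column schemas differ across sources, so each dataset would first be
# mapped to a shared chat/message format before concatenation (omitted here).
# combined = concatenate_datasets(parts)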

Training Procedure

Trained on 32 nodes of the Polaris supercomputer using PyTorch FSDP with hybrid sharding (a configuration sketch follows the list):

  • LR = 5 × 10⁻⁵
  • Per-GPU batch size = 1
  • Gradient accumulation steps = 6
  • Global batch size = 768
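A minimal configuration sketch of hybrid sharding with PyTorch FSDP; the wrapping policy, precision, and optimizer choices here are assumptions, and the actual Polaris training script is not reproduced:

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision, ShardingStrategy

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = build_model()  # hypothetical helper returning the 7B base model

# HYBRID_SHARD: full parameter sharding within a node, replication across nodes.
fsdp_model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
    mixed_precision=MixedPrecision(param_dtype=torch.bfloat16),  # precision is an assumption
    device_id=torch.cuda.current_device(),
)

optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=5e-5)
accum_steps = 6  # per-GPU batch 1 x 6 accumulation steps x 128 GPUs (32 nodes x 4) = global batch 768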