Triangle104 committed on
Commit 943ee09 · verified · 1 Parent(s): 6d02d30

Update README.md

Files changed (1):
  1. README.md +230 -0
README.md CHANGED
@@ -15,6 +15,236 @@ tags:
This model was converted to GGUF format from [`allenai/OLMo-2-1124-7B`](https://huggingface.co/allenai/OLMo-2-1124-7B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/allenai/OLMo-2-1124-7B) for more details on the model.
---
## Model details

We introduce OLMo 2, a new family of 7B and 13B models featuring a 9-point increase in MMLU, among other evaluation improvements, compared to the original OLMo 7B model. These gains come from training on the OLMo-mix-1124 and Dolmino-mix-1124 datasets and a staged training approach.

OLMo is a series of Open Language Models designed to enable the science of language models. These models are trained on the Dolma dataset. We are releasing all code, checkpoints, logs (coming soon), and associated training details.
## Installation

OLMo 2 will be supported in the next release of Transformers; until then, install it from the main branch:

```sh
pip install --upgrade git+https://github.com/huggingface/transformers.git
```
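To confirm the installed build recognizes the OLMo 2 architecture, a quick check along these lines should work (a minimal sketch; it relies on `AutoConfig` raising an unrecognized-architecture error on older Transformers builds):

```python
from transformers import AutoConfig

# Loads only the config; raises if this Transformers build does not know the architecture.
config = AutoConfig.from_pretrained("allenai/OLMo-2-1124-7B")
print(config.model_type)
```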
## Inference

You can use OLMo 2 with the standard Hugging Face `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-2-1124-7B")
message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
```
Optionally, to run on GPU, move both the inputs and the model to CUDA:

```python
inputs = {k: v.to('cuda') for k, v in inputs.items()}
olmo = olmo.to('cuda')
```

Then generate:

```python
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```

Example output:

```
'Language modeling is a key component of any text-based application, but its effectiveness...'
```
For faster performance, you can quantize the model using the following method (requires `bitsandbytes`):

```python
import torch
from transformers import AutoModelForCausalLM

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B",
                                            torch_dtype=torch.float16,
                                            load_in_8bit=True)  # requires bitsandbytes
```

The quantized model is more sensitive to data types and CUDA operations. To avoid potential issues, it is recommended to pass the inputs directly to CUDA using:

```python
inputs.input_ids.to('cuda')
```
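On recent Transformers releases the `load_in_8bit` flag is routed through a quantization config object; a minimal sketch of the equivalent call (assuming `bitsandbytes` and a CUDA device are available):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit loading expressed via BitsAndBytesConfig instead of the bare load_in_8bit flag.
quant_config = BitsAndBytesConfig(load_in_8bit=True)
olmo_8bit = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-2-1124-7B",
    torch_dtype=torch.float16,
    quantization_config=quant_config,
    device_map="auto",
)
```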
We have released checkpoints for these models. For pretraining, the naming convention is `stepXXX-tokensYYYB`. For checkpoints that are ingredients of the model soup, the naming convention is `stage2-ingredientN-stepXXX-tokensYYYB`.

To load a specific model revision with Hugging Face, simply add the `revision` argument:

```python
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-2-1124-7B", revision="step1000-tokens5B")
```

Or, you can access all the revisions for the model via the following code snippet:

```python
from huggingface_hub import list_repo_refs

out = list_repo_refs("allenai/OLMo-2-1124-7B")
branches = [b.name for b in out.branches]
```
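Since revision names follow the conventions above, the branch list can be filtered with plain string matching; a small sketch (assuming the stage-2 ingredient checkpoints are published as branches with the `stage2-ingredient` prefix):

```python
# Pretraining checkpoints look like "stepXXX-tokensYYYB",
# soup ingredients like "stage2-ingredientN-stepXXX-tokensYYYB".
pretrain_branches = [b for b in branches if b.startswith("step")]
stage2_branches = [b for b in branches if b.startswith("stage2-ingredient")]
print(f"{len(pretrain_branches)} pretraining revisions, {len(stage2_branches)} stage-2 ingredient revisions")
```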
## Fine-tuning

Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or from many intermediate checkpoints. Two recipes for tuning are available.

Fine-tune with the OLMo repository:

```sh
torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
    --data.paths=[{path_to_data}/input_ids.npy] \
    --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
    --load_path={path_to_checkpoint} \
    --reset_trainer_state
```

For more documentation, see the GitHub readme.

Further fine-tuning support is being developed in AI2's Open Instruct repository: https://github.com/allenai/open-instruct.
## Model Description

- Developed by: Allen Institute for AI (Ai2)
- Model type: a Transformer-style autoregressive language model
- Language(s) (NLP): English
- License: the code and model are released under Apache 2.0
- Contact: technical inquiries: [email protected]; press: [email protected]
- Date cutoff: Dec. 2023
## Model Sources

- Project page: https://allenai.org/olmo
- Repositories:
  - Core repo (training, inference, fine-tuning, etc.): https://github.com/allenai/OLMo
  - Evaluation code: https://github.com/allenai/OLMo-Eval
  - Further fine-tuning code: https://github.com/allenai/open-instruct
- Paper: coming soon
## Pretraining

|  | OLMo 2 7B | OLMo 2 13B |
|---|---|---|
| Pretraining Stage 1 (OLMo-Mix-1124) | 4 trillion tokens (1 epoch) | 5 trillion tokens (1.2 epochs) |
| Pretraining Stage 2 (Dolmino-Mix-1124) | 50B tokens (3 runs), merged | 100B tokens (3 runs) + 300B tokens (1 run), merged |
| Post-training (Tulu 3 SFT OLMo mix) | SFT + DPO + PPO (preference mix) | SFT + DPO + PPO (preference mix) |
### Stage 1: Initial Pretraining

- Dataset: OLMo-Mix-1124 (3.9T tokens)
- Coverage: 90%+ of total pretraining budget
- 7B model: ~1 epoch
- 13B model: 1.2 epochs (5T tokens)
### Stage 2: Fine-tuning

- Dataset: Dolmino-Mix-1124 (843B tokens)
- Three training mixes:
  - 50B tokens
  - 100B tokens
  - 300B tokens
- Mix composition: 50% high-quality data + academic/Q&A/instruction/math content
### Model Merging

- 7B model: 3 versions trained on the 50B mix, merged via model souping (see the sketch below)
- 13B model: 3 versions on the 100B mix + 1 version on the 300B mix, merged for the final checkpoint
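Model souping here refers to averaging the weights of separately trained runs into a single checkpoint. A minimal sketch of uniform souping over PyTorch state dicts (the file names are placeholders, not the actual OLMo checkpoint layout):

```python
import torch

# Hypothetical checkpoint paths; the real soup ingredients live in the
# stage2-ingredientN-* revisions on the Hub.
ckpt_paths = ["ingredient1.pt", "ingredient2.pt", "ingredient3.pt"]

state_dicts = [torch.load(p, map_location="cpu") for p in ckpt_paths]
souped = {}
for key in state_dicts[0]:
    # Uniform soup: element-wise mean of each parameter tensor across runs.
    souped[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)

torch.save(souped, "souped_checkpoint.pt")
```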
## Bias, Risks, and Limitations

Like any base language model or fine-tuned model without safety filtering, these models can easily be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, many statements from OLMo or any LLM are often inaccurate, so facts should be verified.
## Citation

A technical manuscript is forthcoming!
## Model Card Contact

For errors in this model card, contact [email protected].

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux):
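For example (a sketch of the usual GGUF-my-repo instructions; the repo and file names below are placeholders for this repo's actual GGUF quant):

```sh
brew install llama.cpp

# Run inference directly from a GGUF file on the Hub (placeholder repo/file names).
llama-cli --hf-repo Triangle104/OLMo-2-1124-7B-Q4_K_M-GGUF \
  --hf-file olmo-2-1124-7b-q4_k_m.gguf \
  -p "Language modeling is"

# Or start an OpenAI-compatible server:
llama-server --hf-repo Triangle104/OLMo-2-1124-7B-Q4_K_M-GGUF \
  --hf-file olmo-2-1124-7b-q4_k_m.gguf -c 2048
```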