---
datasets:
- togethercomputer/RedPajama-Data-V2
language:
- de
pipeline_tag: text-generation
library_name: coremltools
license: other
tags:
- coreml
- tinyllama
- german-language-model
---

# LLäMmlein 1B CoreML

This repository contains the CoreML version of [LLäMmlein 1B](https://huggingface.co/LSX-UniWue/LLaMmlein_1B), a German language model trained from scratch using the [TinyLlama](https://github.com/jzhang38/TinyLlama) codebase on the German portion of [RedPajama V2](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-V2).

## Model Details

- **Model Type:** German language model based on the TinyLlama architecture
- **Language:** German
- **Framework:** CoreML
- **Original Model:** [LSX-UniWue/LLaMmlein_1B](https://huggingface.co/LSX-UniWue/LLaMmlein_1B)
- **Size:** 1B parameters
- **Format:** CoreML (.mlpackage)
- **Minimum Deployment Target:** iOS 16
- **Compute Units:** ALL (CPU, GPU, and Neural Engine)
- **Input Sequence Length:** 512 tokens

## Conversion Process

The model was converted from PyTorch to CoreML using the following steps:

```python
import torch
import numpy as np
from transformers import AutoModelForCausalLM, AutoTokenizer
import coremltools as ct

# Load the original model and tokenizer
model = AutoModelForCausalLM.from_pretrained("LSX-UniWue/LLaMmlein_1B")
tokenizer = AutoTokenizer.from_pretrained("LSX-UniWue/LLaMmlein_1B")

# Set model to eval mode
model.eval()

# Create example input for tracing
text = "Ein Beispieltext"
inputs = tokenizer(text, return_tensors="pt")


# Wrapper that returns only the logits tensor, so the traced graph has a single tensor output
class ModelWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids):
        return self.model(input_ids).logits


# Wrap the model and trace it to TorchScript
wrapped_model = ModelWrapper(model)
traced_model = torch.jit.trace(wrapped_model, inputs.input_ids)

# Convert to CoreML
model_mlpackage = ct.convert(
    traced_model,
    inputs=[
        ct.TensorType(
            name="input_ids",
            shape=inputs.input_ids.shape,  # fixed shape taken from the example input
            dtype=np.int32,
        )
    ],
    source="pytorch",
    minimum_deployment_target=ct.target.iOS16,
    convert_to="mlprogram",
    compute_precision=ct.precision.FLOAT16,
    compute_units=ct.ComputeUnit.ALL,
)

model_mlpackage.save("LLaMmlein_1B.mlpackage")
```

## Usage

To use this model on Apple devices:

```swift
import CoreML

// Load the model
let config = MLModelConfiguration()
let model = try LLaMmlein_1B(configuration: config)

// Prepare input: copy the token IDs into an MLMultiArray of shape [1, sequenceLength].
// The shape must match the input shape the model was converted with.
let tokenIds: [Int32] = [/* your tokenized input */]
let inputIds = try MLMultiArray(shape: [1, NSNumber(value: tokenIds.count)], dataType: .int32)
for (index, tokenId) in tokenIds.enumerated() {
    inputIds[index] = NSNumber(value: tokenId)
}

// Make prediction
let prediction = try model.prediction(input_ids: inputIds)
```

## Performance Considerations

- The model is optimized for the Apple Neural Engine
- Recommended for devices running iOS 16 or later
- Best performance is achieved with a batch size of 1
- Maximum sequence length is set to 512 tokens

## Original Model Information

The original model was trained on the German portion of RedPajama V2. For more details about the base model:

- Visit the [project page](https://www.informatik.uni-wuerzburg.de/datascience/projects/nlp/llammlein/)
- Read the [research paper](https://arxiv.org/abs/2411.11171)
- Check the [SuperGLEBer benchmark](https://lsx-uniwue.github.io/SuperGLEBer-site/) for evaluation results

## License

This model inherits its license from the original LLäMmlein 1B model.

## Citation

If you use this model, please cite the original work:

```bibtex
@misc{llammlein2024,
  title={LLäMmlein: A German Language Model},
  author={LSX-UniWue},
  year={2024},
  publisher={Hugging Face},
  journal={Hugging Face Hub},
  howpublished={\url{https://huggingface.co/LSX-UniWue/LLaMmlein_1B}},
}
```

For the original model description and evaluation results, see the [original model card](https://huggingface.co/LSX-UniWue/LLaMmlein_1B).
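
## Preparing Fixed-Length Input

The conversion above exports a fixed-shape `input_ids` tensor, so token ID lists generally have to be truncated or padded to that length before prediction. The following is a minimal sketch of that step in plain Python, assuming the 512-token sequence length stated in this card. The helper name `fit_to_length`, the right-padding strategy, and the padding ID of 0 are illustrative assumptions, not part of the original model or of coremltools; the actual padding ID should be taken from the tokenizer.

```python
from typing import List, Tuple

MAX_LEN = 512  # sequence length stated in this model card


def fit_to_length(
    token_ids: List[int], max_len: int = MAX_LEN, pad_id: int = 0
) -> Tuple[List[int], int]:
    """Truncate or right-pad token_ids to exactly max_len entries.

    Returns the fixed-length list and the index of the last real token.
    pad_id=0 is an assumption; use the tokenizer's actual padding ID.
    """
    if not token_ids:
        raise ValueError("token_ids must not be empty")
    # Keep the most recent tokens when the input is too long.
    kept = token_ids[-max_len:]
    last_real = len(kept) - 1
    # Right-pad so the real tokens stay at the start of the sequence.
    padded = kept + [pad_id] * (max_len - len(kept))
    return padded, last_real


ids, last = fit_to_length([5, 17, 42])
print(len(ids), last)  # prints: 512 2
```

With a causal language model, padding on the right does not change the logits at earlier positions, so the next-token prediction can be read from the logits at the returned index of the last real token.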