Ambuj Varshney
metadata
license: apache-2.0
datasets:
  - HuggingFaceFW/fineweb
language:
  - en
library_name: transformers
tags:
  - IoT
  - sensor
  - embedded

TinyLLM

Overview

This repository hosts a small language model developed as part of the TinyLLM framework ([arxiv link]). The models are designed and fine-tuned with sensor data to support embedded sensing applications, so that a language model can be hosted locally on devices with limited computing power, such as single-board computers. They are based on the GPT-2 architecture and were trained on NVIDIA H100 GPUs. This repository provides base models that can be fine-tuned further for specific downstream tasks in embedded sensing.

Model Information

  • Parameters: 124M (Hidden Size = 768)
  • Architecture: Decoder-only transformer
  • Training Data: Up to 10B tokens from the SHL and FineWeb datasets, combined in a 4:6 ratio
  • Input and Output Modality: Text
  • Context Length: 1024 tokens
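
As a rough sanity check on the figures above, the sketch below estimates the parameter count of a GPT-2-style decoder and splits a token budget by the 4:6 ratio. Only the hidden size (768), context length (1024), token budget (10B), and ratio come from this card. The layer count (12), vocabulary size (50,257), 4x MLP width, and tied input/output embeddings are assumptions taken from the standard GPT-2 small configuration, and reading the ratio as SHL:FineWeb follows the order in which the datasets are listed. The function names are illustrative only.

```python
# Rough parameter count for a GPT-2-style decoder-only transformer.
# From the model card: hidden size 768, context length 1024.
# ASSUMED (standard GPT-2 small, not stated in the card): 12 layers,
# 50,257-token vocabulary, MLP width 4x hidden, tied input/output embeddings.

def gpt2_param_count(hidden=768, context=1024, layers=12, vocab=50257):
    token_emb = vocab * hidden            # token embedding (shared with output head)
    pos_emb = context * hidden            # learned position embedding
    layer_norm = 2 * hidden               # scale + bias
    # Attention: fused QKV projection plus output projection (weights + biases).
    attn = (hidden * 3 * hidden + 3 * hidden) + (hidden * hidden + hidden)
    # MLP: up-projection to 4x hidden, then down-projection (weights + biases).
    mlp = (hidden * 4 * hidden + 4 * hidden) + (4 * hidden * hidden + hidden)
    block = 2 * layer_norm + attn + mlp   # two layer norms per block
    return token_emb + pos_emb + layers * block + layer_norm  # plus final layer norm


def split_tokens(total_tokens, shl_parts=4, fineweb_parts=6):
    # ASSUMED: the 4:6 ratio is SHL:FineWeb, in the order the card lists them.
    parts = shl_parts + fineweb_parts
    shl = total_tokens * shl_parts // parts
    return {"SHL": shl, "FineWeb": total_tokens - shl}


print(f"{gpt2_param_count():,} parameters")   # 124,439,808 parameters
print(split_tokens(10_000_000_000))           # upper bound of 10B tokens
```

Under these assumptions the count comes to about 124M, which matches the figure listed above; a checkpoint with a padded vocabulary or untied embeddings would report a different number.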

Acknowledgements

We acknowledge the open-source frameworks llm.c and llama.cpp, and the sensor dataset provided by SHL, all of which were instrumental in training and testing these models.

Usage

The model can be used in two primary ways:

  1. With Hugging Face’s Transformers Library

    import torch
    from transformers import pipeline

    path = "tinyllm/124M-0.4"
    prompt = "The sea is blue but it's his red sea"

    # Load the model in bfloat16; device_map="auto" needs the accelerate package.
    generator = pipeline(
        "text-generation",
        model=path,
        max_new_tokens=30,
        repetition_penalty=1.3,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
    )
    print(generator(prompt)[0]["generated_text"])
    
  2. With llama.cpp

  Generate a GGUF model file with the convert_hf_to_gguf.py script from llama.cpp, then use the generated GGUF file for inference.

    python3 convert_hf_to_gguf.py models/mymodel/
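
The conversion and inference steps can be sketched end to end as follows. This is a minimal illustration, not the authors' workflow: it only assembles and prints the two command lines. The output file name `model-f16.gguf` and the helper function names are hypothetical, the script is assumed to be run from a llama.cpp checkout, and the `llama-cli` binary name and location depend on how llama.cpp was built. The `--outfile` and `--outtype` options of the conversion script and the `-m`, `-p`, `-n` options of `llama-cli` are used with their usual meanings.

```python
import shlex


def convert_command(model_dir, gguf_path, outtype="f16"):
    # Convert a Hugging Face model directory to a single GGUF file.
    return [
        "python3", "convert_hf_to_gguf.py", model_dir,
        "--outfile", gguf_path,
        "--outtype", outtype,
    ]


def generate_command(gguf_path, prompt, n_predict=30):
    # Run text generation on the GGUF file; -n limits the number of new tokens.
    return ["llama-cli", "-m", gguf_path, "-p", prompt, "-n", str(n_predict)]


MODEL_DIR = "models/mymodel/"                  # path from the example above
GGUF_PATH = "models/mymodel/model-f16.gguf"    # hypothetical output name
PROMPT = "The sea is blue but it's his red sea"

for cmd in (convert_command(MODEL_DIR, GGUF_PATH),
            generate_command(GGUF_PATH, PROMPT)):
    # Print shell-safe command lines; pass cmd to subprocess.run to execute them.
    print(shlex.join(cmd))
```

Building the commands as argument lists keeps the prompt correctly quoted whether they are printed for a shell or passed directly to `subprocess.run(cmd, check=True)`.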
    

Disclaimer

This model is intended solely for research purposes.