---
title: README
emoji: 📚
colorFrom: yellow
colorTo: red
sdk: static
pinned: false
---
|
|
|
# About Us |
|
|
|
MosaicML’s mission is to make efficient training of ML models accessible. We continually productionize state-of-the-art research on efficient model training, and we study how these methods combine in order to ensure that model training is ✨ as optimized as possible ✨. These findings are baked into our highly efficient model training stack, the MosaicML platform.
|
|
|
If you have questions, please feel free to reach out to us on [Twitter](https://twitter.com/mosaicml) or by [Email](mailto:[email protected]), or join our [Slack channel](https://join.slack.com/t/mosaicml-community/shared_invite/zt-w0tiddn9-WGTlRpfjcO9J5jyrMub1dg)!
|
|
|
# [LLM Foundry](https://github.com/mosaicml/llm-foundry/tree/main) |
|
|
|
This repo contains code for training, finetuning, evaluating, and deploying LLMs for inference with [Composer](https://github.com/mosaicml/composer) and the [MosaicML platform](https://www.mosaicml.com/training). |
|
# [Composer Library](https://github.com/mosaicml/composer) |
|
|
|
The open source Composer library makes it easy to train models faster at the algorithmic level. It is built on top of PyTorch. Use our collection of speedup methods in your own training loop or, for the best experience, with our Composer trainer.
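
As a rough sketch of what the second option looks like in code, here is a `Trainer` run with two speedup algorithms enabled. The CIFAR-10 dataset, ResNet-18 model, and hyperparameters are illustrative assumptions, not a tuned recipe:

```python
# Illustrative sketch: train a small vision model with two Composer
# speedup algorithms enabled. Dataset, model, and hyperparameters are
# assumptions for demonstration purposes.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

from composer import Trainer
from composer.algorithms import BlurPool, LabelSmoothing
from composer.models import ComposerClassifier

train_dataset = datasets.CIFAR10(
    "/tmp/cifar10", train=True, download=True, transform=transforms.ToTensor()
)
train_dataloader = DataLoader(train_dataset, batch_size=128, shuffle=True)

# Wrap a plain PyTorch module so the Trainer knows how to run it.
model = ComposerClassifier(models.resnet18(num_classes=10), num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    optimizers=optimizer,
    max_duration="1ep",  # one epoch, just to demonstrate
    algorithms=[BlurPool(), LabelSmoothing(smoothing=0.1)],  # speedup methods
)
trainer.fit()
```

Swapping methods in and out is just a matter of editing the `algorithms` list; the trainer applies them to the training loop for you.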
|
|
|
# [StreamingDataset](https://github.com/mosaicml/streaming) |
|
|
|
Fast, accurate streaming of training data from cloud storage. We built StreamingDataset to make training on large datasets from cloud storage as fast, cheap, and scalable as possible. |
|
|
|
It’s specially designed for multi-node, distributed training of large models, maximizing correctness guarantees, performance, and ease of use. Now you can efficiently train anywhere, independent of your training data location. Just stream in the data you need, when you need it. To learn more about why we built StreamingDataset, read our [announcement blog](https://www.mosaicml.com/blog/mosaicml-streamingdataset).
|
|
|
StreamingDataset is compatible with any data type, including images, text, video, and multimodal data. |
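
In practice, a dataset is converted ahead of time into Streaming's MDS shard format, where each sample field gets a declared encoding. A minimal sketch, with the output path and field names as illustrative assumptions:

```python
# Illustrative sketch: write an image-text dataset in MDS format.
from PIL import Image
from streaming import MDSWriter

# Each column maps a field name to an encoding ('jpeg', 'str', 'int', ...).
columns = {"image": "jpeg", "caption": "str"}

# Placeholder output location; a remote URI (e.g. s3://...) also works.
with MDSWriter(out="/tmp/my-dataset", columns=columns) as out:
    sample = {
        "image": Image.new("RGB", (32, 32)),  # stand-in image
        "caption": "a tiny solid-color square",
    }
    out.write(sample)
```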
|
|
|
With support for major cloud storage providers (AWS, OCI, and GCS are supported today; Azure is coming soon) and designed as a drop-in replacement for your PyTorch [IterableDataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.IterableDataset) class, StreamingDataset seamlessly integrates into your existing training workflows.
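
And a minimal sketch of that drop-in usage, assuming shards like the ones written above have been uploaded to cloud storage; the bucket path and cache directory are placeholders:

```python
# Illustrative sketch: stream training data from cloud storage.
from torch.utils.data import DataLoader
from streaming import StreamingDataset

# Placeholder paths: point `remote` at your uploaded shards and `local`
# at a scratch directory for the on-disk cache.
dataset = StreamingDataset(
    remote="s3://my-bucket/my-dataset",
    local="/tmp/my-dataset-cache",
    shuffle=True,
    batch_size=32,
)

# StreamingDataset is an IterableDataset, so it drops straight into a
# standard PyTorch DataLoader (no sampler or DataLoader-level shuffle).
dataloader = DataLoader(dataset, batch_size=32)
for batch in dataloader:
    pass  # your training step here
```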
|
|
|
# [MosaicML Examples Repo](https://github.com/mosaicml/examples) |
|
|
|
This repo contains reference examples for training ML models quickly and to high accuracy. It's designed to be easily forked and modified. |
|
|
|
It currently features the following examples: |
|
|
|
* [ResNet-50 + ImageNet](https://github.com/mosaicml/examples#resnet-50--imagenet) |
|
* [DeeplabV3 + ADE20k](https://github.com/mosaicml/examples#deeplabv3--ade20k) |
|
* [GPT / Large Language Models](https://github.com/mosaicml/examples#large-language-models-llms) |
|
* [BERT](https://github.com/mosaicml/examples#bert) |
|
# [MosaicML Platform](https://mcli.docs.mosaicml.com/en/latest/getting_started/installation.html) |
|
|
|
The proprietary MosaicML Platform enables you to easily train large AI models on your data, in your secure environment. |
|
|
|
With the MosaicML Platform, you can train large AI models at scale with a single command. We handle the rest — orchestration, efficiency, node failures, infrastructure. |
|
|
|
Our platform is fully interoperable, cloud agnostic, and enterprise proven. It also integrates seamlessly with your existing workflows, experiment trackers, and data pipelines.
|
|