Flax Community

non-profit

https://github.com/huggingface/transformers/tree/master/examples/research_projects/jax-projects

AI & ML interests

JAX, Flax, TPU, 🤗

Recent Activity

gagan3012 authored a paper 15 days ago

DateLogicQA: Benchmarking Temporal Biases in Large Language Models

flax-community's activity

ncoop57

authored 2 papers 14 days ago

Stable Code Technical Report

Paper • 2404.01226 • Published Apr 1, 2024 • 1

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published 17 days ago • 116

versae

authored a paper 22 days ago

The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective

Paper • 2412.09460 • Published 22 days ago • 5

stefan-it

posted an update 26 days ago

My latest project is the outcome of the last 2+ years working with TPUs from the amazing TPU Research Cloud (TRC) program and training Encoder-only LMs with the TensorFlow Model Garden library.

👉 Link: https://github.com/stefan-it/model-garden-lms

An overview of some features:

- Cheat sheet for setting up a TPU VM Pod (with all necessary dependencies) to pretrain LMs with TF Model Garden
- Conversion scripts that convert TF Model Garden weights to Hugging Face Transformers-compatible models
- Supported architectures include BERT, BERT with Token Dropping and TEAMS

I also released BERT-based models pretrained on the great Hugging Face FineWeb and FineWeb-Edu datasets (10BT subset). With more to come!

👉 Model Hub Link: https://huggingface.co/model-garden-lms

If you find these resources useful, please give them a like!

Made in the Bavarian Oberland with ❤️ and 🥨.

infinitylogesh

authored a paper about 2 months ago

MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering

Paper • 2203.14371 • Published Mar 27, 2022

tasnim

authored a paper 2 months ago

DM-Codec: Distilling Multimodal Representations for Speech Tokenization

Paper • 2410.15017 • Published Oct 19, 2024 • 1

gkuwanto

authored a paper 3 months ago

WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines

Paper • 2410.12705 • Published Oct 16, 2024 • 30

Muennighoff

authored a paper 3 months ago

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 104

versae

authored a paper 4 months ago

Whispering in Norwegian: Navigating Orthographic and Dialectic Challenges

Paper • 2402.01917 • Published Feb 2, 2024

Muennighoff

authored a paper 4 months ago

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published Sep 3, 2024 • 77

vumichien

authored a paper 5 months ago

Consent in Crisis: The Rapid Decline of the AI Data Commons

Paper • 2407.14933 • Published Jul 20, 2024 • 12

LouisCastricato

authored a paper 5 months ago

PERSONA: A Reproducible Testbed for Pluralistic Alignment

Paper • 2407.17387 • Published Jul 24, 2024 • 18

Muennighoff

authored a paper 5 months ago

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

Paper • 2407.16741 • Published Jul 23, 2024 • 68

morgan

posted an update 5 months ago

Llama 3.1 405B Instruct beats GPT-4o on MixEval-Hard

Just ran MixEval for Llama 3.1 405B, Sonnet-3.5 and GPT-4o, with 405B landing right between the other two at 66.19.

The GPT-4o result of 64.7 replicated locally, but Sonnet-3.5 actually scored 70.25/69.45 in my replications 🤔 Still well ahead of the other two, though.

Sample of one of the eval calls here: https://wandb.ai/morgan/MixEval/weave/calls/07b05ae2-2ef5-4525-98a6-c59963b76fe1

Quick auto-logging tracing for openai-compatible clients and many more here: https://wandb.github.io/weave/quickstart/