Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks Paper • 2411.01192 • Published Nov 2, 2024 • 3
AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages Paper • 2305.06897 • Published May 11, 2023 • 8
Enhancing Amharic-LLaMA: Integrating Task Specific and Generative Datasets Paper • 2402.08015 • Published Feb 12, 2024 • 1
AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages Paper • 2211.03263 • Published Nov 7, 2022
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark Paper • 2406.05967 • Published Jun 10, 2024 • 5
InkubaLM: A small language model for low-resource African languages Paper • 2408.17024 • Published Aug 30, 2024 • 13
Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition Paper • 2407.13559 • Published Jul 18, 2024 • 14
AraT5: Text-to-Text Transformers for Arabic Language Generation Paper • 2109.12068 • Published Aug 31, 2021
TURJUMAN: A Public Toolkit for Neural Arabic Machine Translation Paper • 2206.03933 • Published May 27, 2022
ORCA: A Challenging Benchmark for Arabic Language Understanding Paper • 2212.10758 • Published Dec 21, 2022
Dolphin: A Challenging and Diverse Benchmark for Arabic NLG Paper • 2305.14989 • Published May 24, 2023
Octopus: A Multitask Model and Toolkit for Arabic Natural Language Generation Paper • 2310.16127 • Published Oct 24, 2023
Cheetah: Natural Language Generation for 517 African Languages Paper • 2401.01053 • Published Jan 2, 2024 • 1
SERENGETI: Massively Multilingual Language Models for Africa Paper • 2212.10785 • Published Dec 21, 2022 • 1
ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic Paper • 2101.01785 • Published Dec 27, 2020
AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance Detection for Fact Checking Paper • 2104.13559 • Published Apr 28, 2021
Decay No More: A Persistent Twitter Dataset for Learning Social Meaning Paper • 2204.04611 • Published Apr 10, 2022