63 477

jiakai

real-jiakai

https://blog.gujiakai.top

AI & ML interests

LLM && Smart QA

Recent Activity

liked a model about 23 hours ago

jukofyork/creative-writer-32b-preview

liked a model about 23 hours ago

tablegpt/TableGPT2-7B

upvoted an article 1 day ago

✴️ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use

real-jiakai's activity

upvoted an article 1 day ago

Article

✴️ ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use

3 days ago • 9

upvoted a paper 1 day ago

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Paper • 2501.01257 • Published 4 days ago • 40

upvoted 2 articles 3 days ago

Article

Introducing Observers: AI Observability with Hugging Face datasets through a lightweight SDK

Nov 21, 2024 • 35

Article

🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark

4 days ago • 30

upvoted a paper 3 days ago

Executable Code Actions Elicit Better LLM Agents

Paper • 2402.01030 • Published Feb 1, 2024 • 32

upvoted a collection 4 days ago

Open LLM Leaderboard best models ❤️‍🔥

A daily uploaded list of models with best evaluations on the LLM leaderboard: • 64 items • Updated about 1 hour ago • 497

upvoted a collection 10 days ago

GTE models

General Text Embedding Models Released by Tongyi Lab of Alibaba Group • 19 items • Updated 16 days ago • 19

upvoted 2 papers 17 days ago

OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain

Paper • 2412.13018 • Published 20 days ago • 41

Are Your LLMs Capable of Stable Reasoning?

Paper • 2412.13147 • Published 20 days ago • 91

upvoted a paper 22 days ago

Phi-4 Technical Report

Paper • 2412.08905 • Published 25 days ago • 96

upvoted a paper 24 days ago

StreamChat: Chatting with Streaming Video

Paper • 2412.08646 • Published 26 days ago • 17

upvoted a paper 30 days ago

Judging LLM-as-a-judge with MT-Bench and Chatbot Arena

Paper • 2306.05685 • Published Jun 9, 2023 • 32

upvoted a collection about 1 month ago

🔱 Sailor2 Language Models

Sailing in South-East Asia with Inclusive Multilingual LLMs • 9 items • Updated Dec 3, 2024 • 22

upvoted 2 papers about 1 month ago

Open-Sora Plan: Open-Source Large Video Generation Model

Paper • 2412.00131 • Published Nov 28, 2024 • 33

o1-Coder: an o1 Replication for Coding

Paper • 2412.00154 • Published Nov 29, 2024 • 42

upvoted 2 collections about 1 month ago

Nov 29 Releases 🌲🌲

25 items • Updated Dec 2, 2024 • 10

Qwen2.5

Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 453

upvoted a paper about 1 month ago

Star Attention: Efficient LLM Inference over Long Sequences

Paper • 2411.17116 • Published Nov 26, 2024 • 48

upvoted a paper about 2 months ago

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 112

upvoted an article about 2 months ago

Article

Releasing the largest multilingual open pretraining dataset

Nov 13, 2024 • 98