BahaaGalal's Collections
LLM for Coding
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
Paper
•
2405.07990
•
Published
•
16
Large Language Models as Planning Domain Generators
Paper
•
2405.06650
•
Published
•
9
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
Paper
•
2404.12753
•
Published
•
41
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper
•
2404.07972
•
Published
•
46
LLoCO: Learning Long Contexts Offline
Paper
•
2404.07979
•
Published
•
20
CodecLM: Aligning Language Models with Tailored Synthetic Data
Paper
•
2404.05875
•
Published
•
16
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models
Paper
•
2404.06209
•
Published
•
4
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper
•
2404.05719
•
Published
•
82
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Paper
•
2404.03820
•
Published
•
24
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
Paper
•
2404.03543
•
Published
•
15
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Paper
•
2404.02575
•
Published
•
48
RAFT: Adapting Language Model to Domain Specific RAG
Paper
•
2403.10131
•
Published
•
67
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper
•
2403.09629
•
Published
•
75
Design2Code: How Far Are We From Automating Front-End Engineering?
Paper
•
2403.03163
•
Published
•
93
StarCoder 2 and The Stack v2: The Next Generation
Paper
•
2402.19173
•
Published
•
136
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
Paper
•
2402.16671
•
Published
•
26
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs
Paper
•
2402.15491
•
Published
•
13
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
Paper
•
2402.14658
•
Published
•
82
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
Paper
•
2402.14261
•
Published
•
10
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper
•
2402.13249
•
Published
•
11
Chain-of-Thought Reasoning Without Prompting
Paper
•
2402.10200
•
Published
•
104
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Paper
•
2402.09727
•
Published
•
36
MPIrigen: MPI Code Generation through Domain-Specific Language Models
Paper
•
2402.09126
•
Published
•
12
Multi-line AI-assisted Code Authoring
Paper
•
2402.04141
•
Published
•
9
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Paper
•
2402.01391
•
Published
•
41
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
Paper
•
2401.16467
•
Published
•
9
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
Paper
•
2401.03065
•
Published
•
11