Hugging Face Open LLM Leaderboard

Open-weight LLM / VLM benchmark dashboard

Compare open-weight language and vision-language models for local AI, private deployment, and data-compliant model selection.


Top models by selected metric

Benchmark profile of top 5

Score vs parameter count

Family averages in top set

Top 10 by benchmark

Ranked models

Click column headers to sort. Benchmark cells are color-scaled from weak to strong within each metric.
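As a rough illustration of what "color-scaled within each metric" can mean, the sketch below rescales every benchmark column independently to the range 0 to 1, so the weakest score in a column maps to 0 and the strongest to 1. This is an assumption about the approach, not the dashboard's actual code: the function name `scale_within_metric`, the metric names, and the scores are all hypothetical.

```python
from typing import Dict, List, Optional


def scale_within_metric(scores: List[Optional[float]]) -> List[Optional[float]]:
    """Min-max scale one benchmark column to [0, 1]; missing scores stay None."""
    present = [s for s in scores if s is not None]
    if not present:
        return [None] * len(scores)
    lo, hi = min(present), max(present)
    if hi == lo:
        # All models tie on this metric: use the midpoint of the scale.
        return [None if s is None else 0.5 for s in scores]
    return [None if s is None else (s - lo) / (hi - lo) for s in scores]


# Hypothetical scores for three models on two made-up metrics.
table: Dict[str, List[Optional[float]]] = {
    "metric_a": [40.0, 55.0, 70.0],
    "metric_b": [10.0, None, 30.0],
}

# Each column is scaled on its own, so colors are comparable only within a metric.
scaled = {metric: scale_within_metric(column) for metric, column in table.items()}
```

Because each column is scaled separately, a deep color in one benchmark column says nothing about how that score compares with values in another column.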

From benchmark shortlist to production choice

Benchmarks help you cut the search space. Workflow tests decide the model.

If you need a local or open-weight model for manufacturing, due diligence, document workflows, or regulated data settings, I can help define the eval set, shortlist models, and turn the result into a deployment decision.

Book a 30-minute call