Hugging Face Open LLM Leaderboard
Open-weight LLM / VLM benchmark dashboard
Compare open-weight language and vision-language models for local AI, private deployment, and data-compliant model selection.
Charts: top models by selected metric; benchmark profile of the top 5; score vs. parameter count; family averages in the top set; top 10 by benchmark.
Ranked models
Click column headers to sort. Benchmark cells are color-scaled from weak to strong within each metric.
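The page does not say how the per-metric color scale is computed. The sketch below assumes one common approach: a min-max normalization inside each benchmark column, mapped onto a red (weak) to green (strong) gradient. The function name `color_scale`, the two-color gradient, the model names and the scores are all hypothetical and are not taken from the dashboard.

```python
from typing import Dict


def color_scale(scores: Dict[str, Dict[str, float]], metric: str) -> Dict[str, str]:
    """Map each model's score on ONE metric to a hex color (hypothetical scheme).

    Normalization uses only that metric's column, so a color reflects a model's
    rank among the listed models on that benchmark, not its absolute score.
    """
    values = {model: per_metric[metric] for model, per_metric in scores.items()}
    lo, hi = min(values.values()), max(values.values())
    colors: Dict[str, str] = {}
    for model, v in values.items():
        # Min-max normalize to [0, 1]; a column where every score is equal maps to the midpoint.
        t = 0.5 if hi == lo else (v - lo) / (hi - lo)
        red, green = round(255 * (1 - t)), round(255 * t)
        colors[model] = f"#{red:02x}{green:02x}00"
    return colors


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    scores = {
        "model-a": {"bench_x": 40.0, "bench_y": 80.0},
        "model-b": {"bench_x": 20.0, "bench_y": 90.0},
        "model-c": {"bench_x": 30.0, "bench_y": 85.0},
    }
    print(color_scale(scores, "bench_x"))
    print(color_scale(scores, "bench_y"))
```

Because each column is scaled on its own, the same model can be the greenest cell on one benchmark and the reddest on another. That makes strengths and weaknesses easy to spot, but it means colors cannot be compared across columns, and a narrow spread of scores is stretched to the full gradient.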
From benchmark shortlist to production choice
Benchmarks help you cut the search space. Workflow tests decide the model.
If you need a local or open-weight model for manufacturing, due diligence, document workflows, or regulated data settings, I can help define the eval set, shortlist models, and turn the result into a deployment decision.