Hugging Face Open LLM Leaderboard

Open-weight LLM / VLM benchmark dashboard

Compare open-weight language and vision-language models for local AI, private deployment, and data-compliant model selection.

[Charts: top models by selected metric; benchmark profile of the top 5; score vs. parameter count; family averages in the top set; top 10 by benchmark]

Ranked models

Click column headers to sort. Benchmark cells are color-scaled from weak to strong within each metric.
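The page does not say how the per-metric color scaling is computed. The sketch below assumes simple min-max normalization within each benchmark column. Each score becomes a value in [0, 1], where 0 is the weakest model on that metric and 1 is the strongest, and that value is then mapped onto a color ramp. The helper names, the red-to-green ramp, and the example rows are hypothetical and for illustration only.

```python
def scale_within_metric(rows: list[dict], metric: str) -> list[float]:
    """Min-max scale one metric column to [0, 1] (0 = weakest, 1 = strongest).

    Assumes higher scores are better for every metric.
    """
    values = [row[metric] for row in rows]
    lo, hi = min(values), max(values)
    if hi == lo:
        # All models tie on this metric, so there is no contrast; use the midpoint.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def color_scale_table(rows: list[dict], metrics: list[str]) -> dict[str, list[float]]:
    """Scale each metric column independently, so colors compare models within a metric."""
    return {m: scale_within_metric(rows, m) for m in metrics}


def to_color(t: float) -> str:
    """Map a scaled value to a hex color on a hypothetical red (weak) to green (strong) ramp."""
    red = round(255 * (1 - t))
    green = round(255 * t)
    return f"#{red:02x}{green:02x}00"


if __name__ == "__main__":
    # Hypothetical models and scores, not leaderboard data.
    rows = [
        {"model": "model-a", "metric_1": 80.0, "metric_2": 40.0},
        {"model": "model-b", "metric_1": 60.0, "metric_2": 50.0},
        {"model": "model-c", "metric_1": 70.0, "metric_2": 45.0},
    ]
    scaled = color_scale_table(rows, ["metric_1", "metric_2"])
    for metric, column in scaled.items():
        print(metric, [to_color(t) for t in column])
```

Because each column is scaled on its own, a model can be the strongest (green) on one metric and the weakest (red) on another. This is why the colors compare models within a metric and not across metrics.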

From benchmark shortlist to production choice

Benchmarks help you cut the search space. Workflow tests decide the model.

If you need a local or open-weight model for manufacturing, due diligence, document workflows, or regulated data settings, I can help define the eval set, shortlist models, and turn the result into a deployment decision.

Agenovation AI advisory: book a 30-minute call, or start with the AI opportunity scorecard.