LiveBench.ai Data Analysis


Top Models by NormScore - Livebench

Rank  Model Name                  NormScore - Livebench  Coding  Data Analysis  IF      Language  Mathematics  Reasoning
1     o3 High                     80.707                 80.759  78.445         81.813  79.864    79.021       83.234
2     o3 Medium                   78.933                 82.091  79.655         80.048  76.571    74.950       81.201
3     o4-Mini High                78.116                 84.271  81.060         80.653  67.464    78.834       78.599
4     Gemini 2.5 Pro Preview      77.443                 74.948  76.161         76.509  74.841    83.062       78.191
5     Claude 3.7 Sonnet Thinking  74.983                 77.127  82.618         77.146  73.864    73.645       68.034
6     o4-Mini Medium              73.752                 78.136  80.213         77.659  63.538    75.241       70.038
7     Qwen 3 235B A22B            73.573                 68.813  81.021         83.281  62.604    73.282       70.099
8     DeepSeek R1                 72.047                 79.024  81.385         76.409  57.950    72.506       68.991
9     Qwen 3 32B                  71.429                 67.643  79.847         80.862  58.699    70.425       69.496
10    Grok 3 Mini Beta (High)     70.778                 57.270  74.695         74.716  63.603    71.374       78.496

Category Performance Comparison

Category Scores by Model

Model                       Coding  Data Analysis  IF      Language  Mathematics  Reasoning
o3 High                     76.715  67.020         86.175  75.996    85.004       93.333
o3 Medium                   77.863  68.193         84.321  73.481    80.657       91.000
o4-Mini High                79.976  68.328         84.958  66.055    84.895       88.111
Gemini 2.5 Pro Preview      71.081  62.475         80.592  69.314    89.157       87.528
Claude 3.7 Sonnet Thinking  73.194  69.107         81.254  68.269    78.999       76.167
o4-Mini Medium              74.219  68.472         81.825  62.409    81.020       78.472
Qwen 3 235B A22B            65.325  68.308         87.729  60.609    78.778       78.611
DeepSeek R1                 74.985  69.625         80.508  54.771    77.910       77.167
Qwen 3 32B                  64.238  68.289         85.171  55.153    75.583       77.750
Grok 3 Mini Beta (High)     54.516  64.578         78.704  59.087    77.005       87.611
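
The category scores above can be compared column by column to see which model leads each category. The following is a minimal Python sketch, assuming the ten rows are copied by hand into a dictionary. The names `CATEGORIES`, `SCORES`, and `best_per_category` are illustrative and are not part of LiveBench.

```python
# Column order of the category score table (assumed to match the table above).
CATEGORIES = ["Coding", "Data Analysis", "IF", "Language", "Mathematics", "Reasoning"]

# Scores transcribed from the category score table; one list per model,
# in the same order as CATEGORIES.
SCORES = {
    "o3 High": [76.715, 67.020, 86.175, 75.996, 85.004, 93.333],
    "o3 Medium": [77.863, 68.193, 84.321, 73.481, 80.657, 91.000],
    "o4-Mini High": [79.976, 68.328, 84.958, 66.055, 84.895, 88.111],
    "Gemini 2.5 Pro Preview": [71.081, 62.475, 80.592, 69.314, 89.157, 87.528],
    "Claude 3.7 Sonnet Thinking": [73.194, 69.107, 81.254, 68.269, 78.999, 76.167],
    "o4-Mini Medium": [74.219, 68.472, 81.825, 62.409, 81.020, 78.472],
    "Qwen 3 235B A22B": [65.325, 68.308, 87.729, 60.609, 78.778, 78.611],
    "DeepSeek R1": [74.985, 69.625, 80.508, 54.771, 77.910, 77.167],
    "Qwen 3 32B": [64.238, 68.289, 85.171, 55.153, 75.583, 77.750],
    "Grok 3 Mini Beta (High)": [54.516, 64.578, 78.704, 59.087, 77.005, 87.611],
}


def best_per_category(scores, categories):
    """Return {category: (model, score)} for the highest score in each column."""
    best = {}
    for i, category in enumerate(categories):
        # Pick the model whose i-th score is largest.
        model = max(scores, key=lambda name: scores[name][i])
        best[category] = (model, scores[model][i])
    return best


if __name__ == "__main__":
    for category, (model, score) in best_per_category(SCORES, CATEGORIES).items():
        print(f"{category}: {model} ({score:.3f})")
```

Among these ten models, no single model leads every category: for example, the Coding column is topped by o4-Mini High, while Mathematics is topped by Gemini 2.5 Pro Preview.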

Categories and Benchmarks

Category                    Benchmarks
Coding                      code_completion, code_generation
Data Analysis               tablejoin, tablereformat
IF (Instruction Following)  paraphrase, simplify, story_generation, summarize
Language                    connections, plot_unscrambling, typos
Mathematics                 AMPS_Hard, math_comp, olympiad
Reasoning                   spatial, web_of_lies_v3, zebra_puzzle
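
The category-to-benchmark mapping above can be held in a small lookup structure when grouping per-benchmark results. This is a rough Python sketch under that assumption. The names `CATEGORY_BENCHMARKS` and `category_of` are hypothetical helpers, not a LiveBench API.

```python
# Mapping transcribed from the "Categories and Benchmarks" table.
CATEGORY_BENCHMARKS = {
    "Coding": ["code_completion", "code_generation"],
    "Data Analysis": ["tablejoin", "tablereformat"],
    "IF": ["paraphrase", "simplify", "story_generation", "summarize"],
    "Language": ["connections", "plot_unscrambling", "typos"],
    "Mathematics": ["AMPS_Hard", "math_comp", "olympiad"],
    "Reasoning": ["spatial", "web_of_lies_v3", "zebra_puzzle"],
}

# Reverse index: benchmark name -> category, built once for fast lookup.
BENCHMARK_TO_CATEGORY = {
    benchmark: category
    for category, benchmarks in CATEGORY_BENCHMARKS.items()
    for benchmark in benchmarks
}


def category_of(benchmark):
    """Return the category a benchmark belongs to, or None if it is not listed."""
    return BENCHMARK_TO_CATEGORY.get(benchmark)


if __name__ == "__main__":
    print(category_of("zebra_puzzle"))
    print(len(BENCHMARK_TO_CATEGORY), "benchmarks in total")
```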

Select a Model

  • ChatGPT-4o
  • Claude 3.5 Haiku
  • Claude 3.5 Sonnet
  • Claude 3.7 Sonnet
  • Claude 3.7 Sonnet Thinking
  • Command R
  • Command R Plus
  • DeepSeek R1
  • DeepSeek R1 Distill Llama 70B
  • DeepSeek R1 Distill Qwen 32B
  • DeepSeek V3.1
  • Dracarys2 72B Instruct
  • Dracarys2 Llama 3.1 70B Instruct
  • Gemini 2.0 Flash
  • Gemini 2.0 Flash Lite
  • Gemini 2.5 Flash Preview
  • Gemini 2.5 Pro Preview
  • Gemma 3 27B
  • GPT-4.1
  • GPT-4.1 Mini
  • GPT-4.1 Nano
  • GPT-4.5 Preview
  • GPT-4o
  • GPT-4o Mini
  • Grok 3 Beta
  • Grok 3 Mini Beta (High)
  • Hunyuan Turbos
  • LearnLM 1.5 Pro Experimental
  • LearnLM 2.0 Flash Experimental
  • Llama 3.3 70B Instruct Turbo
  • Llama 4 Maverick 17B 128E Instruct
  • Mistral Large
  • Mistral Small
  • Nova Lite
  • Nova Micro
  • Nova Pro
  • o3 High
  • o3 Medium
  • o4-Mini High
  • o4-Mini Medium
  • Qwen 3 235B A22B
  • Qwen 3 30B A3B
  • Qwen 3 32B
  • Qwen2.5 72B Instruct Turbo
  • Qwen2.5 7B Instruct Turbo
  • Qwen2.5 Max
  • QwQ 32B
  • Step 2 16K
