LiveBench.ai Data Analysis

Top Models by NormScore - Livebench

Rank Model Name NormScore - Livebench Coding Data Analysis IF Language Mathematics Reasoning
1 o3 High 80.707 80.759 78.445 81.813 79.864 79.021 83.234
2 o3 Medium 78.933 82.091 79.655 80.048 76.571 74.950 81.201
3 o4-Mini High 78.116 84.271 81.060 80.653 67.464 78.834 78.599
4 Gemini 2.5 Pro Preview 77.443 74.948 76.161 76.509 74.841 83.062 78.191
5 Claude 3.7 Sonnet Thinking 74.983 77.127 82.618 77.146 73.864 73.645 68.034
6 o4-Mini Medium 73.752 78.136 80.213 77.659 63.538 75.241 70.038
7 Qwen 3 235B A22B 73.573 68.813 81.021 83.281 62.604 73.282 70.099
8 DeepSeek R1 72.047 79.024 81.385 76.409 57.950 72.506 68.991
9 Qwen 3 32B 71.429 67.643 79.847 80.862 58.699 70.425 69.496
10 Grok 3 Mini Beta (High) 70.778 57.270 74.695 74.716 63.603 71.374 78.496
11 Gemini 2.5 Flash Preview 70.549 63.566 75.464 75.013 65.258 76.085 65.727
12 QwQ 32B 69.522 64.575 82.328 77.703 53.219 70.698 68.503
13 Qwen 3 30B A3B 66.100 50.005 77.598 79.026 57.539 67.093 59.496
14 GPT-4.5 Preview 65.740 80.195 69.083 68.639 69.383 63.458 48.652
15 Claude 3.7 Sonnet 65.179 78.298 70.006 72.614 69.294 60.548 43.817
16 Grok 3 Beta 64.174 77.571 65.874 80.445 58.509 58.705 43.547
17 GPT-4.1 62.883 77.127 76.079 73.121 58.983 58.030 39.692
18 DeepSeek V3.1 62.814 72.607 70.174 77.378 51.454 66.566 39.567
19 Gemini 2.0 Flash 61.084 68.409 68.774 81.459 47.634 59.028 39.416
20 ChatGPT-4o 61.082 81.647 76.310 68.259 54.473 51.794 43.549
21 Qwen2.5 Max 60.646 70.427 75.216 71.551 63.473 53.317 34.375
22 LearnLM 2.0 Flash Experimental 58.797 67.803 61.993 79.523 47.997 57.139 35.486
23 Claude 3.5 Sonnet 58.206 77.853 65.855 65.784 60.505 47.458 38.352
24 GPT-4.1 Mini 58.053 75.957 68.215 66.737 41.200 54.783 47.887
25 Llama 4 Maverick 17B 128E Instruct 56.731 56.987 57.919 71.904 53.021 56.691 39.286
26 DeepSeek R1 Distill Llama 70B 55.592 48.956 69.427 66.385 39.831 54.230 53.525
27 Step 2 16K 54.781 60.822 71.369 75.833 42.387 41.029 37.775
28 Gemini 2.0 Flash Lite 53.994 62.558 75.175 72.767 37.032 51.345 28.743
29 GPT-4o 53.213 73.051 71.464 61.645 48.869 38.682 35.454
30 Hunyuan Turbos 51.743 53.073 53.992 72.311 37.646 53.728 34.043
31 Llama 3.3 70B Instruct Turbo 50.925 54.687 47.536 78.489 48.095 38.714 28.966
32 LearnLM 1.5 Pro Experimental 50.515 62.113 47.961 64.714 42.200 53.229 31.153
33 Mistral Large 50.424 66.351 60.738 64.482 45.040 39.703 30.290
34 Gemma 3 27B 49.909 51.620 43.554 71.127 45.122 48.733 30.679
35 Dracarys2 72B Instruct 49.468 62.153 56.107 61.915 36.478 48.980 33.469
36 Qwen2.5 72B Instruct Turbo 49.412 60.539 58.338 61.150 40.200 48.519 30.496
37 DeepSeek R1 Distill Qwen 32B 47.628 48.673 57.573 52.884 32.903 55.704 39.942
38 Dracarys2 Llama 3.1 70B Instruct 47.175 43.468 61.468 60.031 46.786 37.838 32.700
39 Mistral Small 46.316 52.346 60.029 60.437 38.092 35.843 33.021
40 Nova Pro 45.557 52.346 52.380 63.716 42.878 35.131 25.375
41 Claude 3.5 Haiku 44.930 55.978 60.585 58.729 43.077 32.412 23.096
42 GPT-4.1 Nano 44.723 66.633 48.582 54.625 32.438 39.780 31.566
43 GPT-4o Mini 42.676 58.037 60.275 53.914 32.658 35.589 22.822
44 Nova Lite 39.364 47.544 47.304 51.408 30.204 32.501 28.580
45 Command R Plus 35.451 28.575 51.950 54.661 33.872 21.213 19.242
46 Qwen2.5 7B Instruct Turbo 33.889 36.162 41.549 49.427 20.096 34.463 19.770
47 Nova Micro 33.664 30.471 42.879 45.596 26.367 31.951 22.749
48 Command R 32.294 27.566 42.092 52.791 30.737 17.192 18.243

Category Performance Comparison

Category Scores by Model

Model Coding Data Analysis IF Language Mathematics Reasoning
o3 High 76.715 67.020 86.175 75.996 85.004 93.333
o3 Medium 77.863 68.193 84.321 73.481 80.657 91.000
o4-Mini High 79.976 68.328 84.958 66.055 84.895 88.111
Gemini 2.5 Pro Preview 71.081 62.475 80.592 69.314 89.157 87.528
Claude 3.7 Sonnet Thinking 73.194 69.107 81.254 68.269 78.999 76.167
o4-Mini Medium 74.219 68.472 81.825 62.409 81.020 78.472
Qwen 3 235B A22B 65.325 68.308 87.729 60.609 78.778 78.611
DeepSeek R1 74.985 69.625 80.508 54.771 77.910 77.167
Qwen 3 32B 64.238 68.289 85.171 55.153 75.583 77.750
Grok 3 Mini Beta (High) 54.516 64.578 78.704 59.087 77.005 87.611
Gemini 2.5 Flash Preview 60.334 65.530 79.021 59.432 81.802 73.472
QwQ 32B 61.360 69.529 81.829 51.481 76.084 76.722
Qwen 3 30B A3B 47.474 66.597 83.234 55.576 72.202 66.833
GPT-4.5 Preview 76.072 60.070 72.325 64.759 67.940 54.417
Claude 3.7 Sonnet 74.281 59.965 76.492 63.194 64.654 49.111
Grok 3 Beta 73.576 55.629 84.738 53.797 62.752 48.528
GPT-4.1 73.194 66.404 77.046 54.551 62.386 44.389
DeepSeek V3.1 68.907 64.019 81.471 46.823 71.437 44.278
Gemini 2.0 Flash 64.743 59.916 85.788 42.387 63.189 44.250
ChatGPT-4o 77.480 66.520 71.921 49.428 55.717 48.806
Qwen2.5 Max 66.794 64.271 75.346 58.369 56.868 38.528
LearnLM 2.0 Flash Experimental 64.299 51.419 83.759 43.344 61.102 39.722
Claude 3.5 Sonnet 73.898 56.187 69.296 54.477 50.543 43.222
GPT-4.1 Mini 72.107 61.338 70.308 37.996 58.779 53.778
Llama 4 Maverick 17B 128E Instruct 54.195 47.113 75.746 49.648 60.579 43.833
DeepSeek R1 Distill Llama 70B 46.648 60.810 69.938 37.050 58.802 59.806
Step 2 16K 57.578 62.348 79.883 38.405 43.683 42.389
Gemini 2.0 Flash Lite 59.308 65.385 76.629 33.941 54.967 32.250
GPT-4o 69.290 63.530 64.942 44.683 41.478 39.750
Hunyuan Turbos 50.352 47.986 76.129 34.458 57.466 38.222
Llama 3.3 70B Instruct Turbo 51.822 40.787 82.671 43.966 41.404 32.528
LearnLM 1.5 Pro Experimental 58.926 39.298 68.158 37.864 56.708 34.861
Mistral Large 62.890 54.196 67.929 40.453 42.202 33.833
Gemma 3 27B 48.944 38.797 74.904 41.314 52.267 34.417
Dracarys2 72B Instruct 58.726 48.477 65.217 33.058 52.250 37.486
Qwen2.5 72B Instruct Turbo 57.257 50.159 64.392 36.466 51.877 34.083
DeepSeek R1 Distill Qwen 32B 46.326 46.940 55.713 30.915 60.132 44.361
Dracarys2 Llama 3.1 70B Instruct 41.136 55.128 63.242 42.367 40.299 36.667
Mistral Small 49.648 52.140 63.663 34.586 38.392 37.083
Nova Pro 49.648 44.344 67.129 38.935 37.696 28.250
Claude 3.5 Haiku 53.169 54.119 61.879 39.707 34.841 26.194
GPT-4.1 Nano 63.212 49.820 57.537 30.958 42.391 35.583
GPT-4o Mini 55.022 55.099 56.800 29.879 38.047 25.639
Nova Lite 45.040 41.238 54.129 27.620 34.616 32.000
Command R Plus 27.128 49.235 57.612 30.861 22.815 21.639
Qwen2.5 7B Instruct Turbo 34.293 42.332 52.109 18.380 36.814 22.306
Nova Micro 28.919 41.295 48.042 24.192 34.147 25.417
Command R 26.103 39.767 55.617 27.933 18.351 20.583
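
Rows in the score tables are whitespace-separated, and the model names themselves contain spaces and digits (for example `Step 2 16K`), so splitting a row from the left is ambiguous. The following is a minimal sketch of one way to read a row of the category table, assuming every row ends in exactly six numeric scores in the header's order. `CATEGORIES` and `parse_category_row` are hypothetical names introduced for illustration and are not part of LiveBench.

```python
# Column order assumed from the table header above.
CATEGORIES = ["Coding", "Data Analysis", "IF", "Language", "Mathematics", "Reasoning"]


def parse_category_row(line):
    """Split one flattened table row into (model_name, {category: score})."""
    # Split from the right: the last six fields are scores, whatever is left is the name.
    parts = line.rsplit(maxsplit=len(CATEGORIES))
    if len(parts) != len(CATEGORIES) + 1:
        raise ValueError(f"expected a name and {len(CATEGORIES)} scores: {line!r}")
    name, *raw_scores = parts
    return name, dict(zip(CATEGORIES, map(float, raw_scores)))


# Row copied verbatim from the table above.
name, scores = parse_category_row("Step 2 16K 57.578 62.348 79.883 38.405 43.683 42.389")
print(name, scores["Reasoning"])
```

The ranked NormScore table could be read the same way by stripping the leading rank first and expecting seven trailing scores instead of six.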

Categories and Benchmarks

Category Benchmarks
Coding code_completion, code_generation
Data Analysis tablejoin, tablereformat
IF paraphrase, simplify, story_generation, summarize
Language connections, plot_unscrambling, typos
Mathematics AMPS_Hard, math_comp, olympiad
Reasoning spatial, web_of_lies_v3, zebra_puzzle

Detailed Performance for gpt-4.1-mini-2025-04-14

Benchmark Scores

Category Benchmark Score
Coding code_completion 69.565
Coding code_generation 74.648
Data Analysis tablejoin 28.558
Data Analysis tablereformat 94.118
IF paraphrase 70.133
IF simplify 72.433
IF story_generation 72.417
IF summarize 66.250
Language connections 32.333
Language plot_unscrambling 27.655
Language typos 54.000
Mathematics AMPS_Hard 80.000
Mathematics math_comp 62.500
Mathematics olympiad 33.836
Reasoning spatial 60.000
Reasoning web_of_lies_v3 53.333
Reasoning zebra_puzzle 48.000
Average Score: 58.811
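
The page does not say how the per-category scores or the Average Score are derived. The figures are consistent with unweighted means, though: averaging the benchmark scores within each category reproduces the GPT-4.1 Mini row of the Category Performance Comparison table to within rounding, and averaging all 17 benchmark scores gives approximately 58.811. The sketch below checks that arithmetic under this assumption. `SCORES`, `category_means`, and `overall_mean` are illustrative names, not LiveBench code.

```python
from statistics import mean

# Benchmark scores copied from the detailed table for gpt-4.1-mini-2025-04-14.
SCORES = {
    "Coding": {"code_completion": 69.565, "code_generation": 74.648},
    "Data Analysis": {"tablejoin": 28.558, "tablereformat": 94.118},
    "IF": {
        "paraphrase": 70.133,
        "simplify": 72.433,
        "story_generation": 72.417,
        "summarize": 66.250,
    },
    "Language": {"connections": 32.333, "plot_unscrambling": 27.655, "typos": 54.000},
    "Mathematics": {"AMPS_Hard": 80.000, "math_comp": 62.500, "olympiad": 33.836},
    "Reasoning": {"spatial": 60.000, "web_of_lies_v3": 53.333, "zebra_puzzle": 48.000},
}


def category_means(scores):
    """Unweighted mean of the benchmark scores inside each category (assumed method)."""
    return {category: mean(benchmarks.values()) for category, benchmarks in scores.items()}


def overall_mean(scores):
    """Unweighted mean over every benchmark, ignoring category boundaries (assumed method)."""
    return mean(s for benchmarks in scores.values() for s in benchmarks.values())


for category, value in category_means(SCORES).items():
    print(f"{category}: {value:.3f}")
print(f"Average Score: {overall_mean(SCORES):.3f}")
```

Because the published scores are already rounded to three decimals, a recomputed mean can differ from the table in the last digit.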