LiveBench.ai Data Analysis
Top Models Performance Overview
Top Models by NormScore - Livebench
Rank | Model Name | NormScore - Livebench | Coding | Data Analysis | IF | Language | Mathematics | Reasoning |
---|---|---|---|---|---|---|---|---|
1 | o3 High | 80.707 | 80.759 | 78.445 | 81.813 | 79.864 | 79.021 | 83.234 |
2 | o3 Medium | 78.933 | 82.091 | 79.655 | 80.048 | 76.571 | 74.950 | 81.201 |
3 | o4-Mini High | 78.116 | 84.271 | 81.060 | 80.653 | 67.464 | 78.834 | 78.599 |
4 | Gemini 2.5 Pro Preview | 77.443 | 74.948 | 76.161 | 76.509 | 74.841 | 83.062 | 78.191 |
5 | Claude 3.7 Sonnet Thinking | 74.983 | 77.127 | 82.618 | 77.146 | 73.864 | 73.645 | 68.034 |
6 | o4-Mini Medium | 73.752 | 78.136 | 80.213 | 77.659 | 63.538 | 75.241 | 70.038 |
7 | Qwen 3 235B A22B | 73.573 | 68.813 | 81.021 | 83.281 | 62.604 | 73.282 | 70.099 |
8 | DeepSeek R1 | 72.047 | 79.024 | 81.385 | 76.409 | 57.950 | 72.506 | 68.991 |
9 | Qwen 3 32B | 71.429 | 67.643 | 79.847 | 80.862 | 58.699 | 70.425 | 69.496 |
10 | Grok 3 Mini Beta (High) | 70.778 | 57.270 | 74.695 | 74.716 | 63.603 | 71.374 | 78.496 |
11 | Gemini 2.5 Flash Preview | 70.549 | 63.566 | 75.464 | 75.013 | 65.258 | 76.085 | 65.727 |
12 | QwQ 32B | 69.522 | 64.575 | 82.328 | 77.703 | 53.219 | 70.698 | 68.503 |
13 | Qwen 3 30B A3B | 66.100 | 50.005 | 77.598 | 79.026 | 57.539 | 67.093 | 59.496 |
14 | GPT-4.5 Preview | 65.740 | 80.195 | 69.083 | 68.639 | 69.383 | 63.458 | 48.652 |
15 | Claude 3.7 Sonnet | 65.179 | 78.298 | 70.006 | 72.614 | 69.294 | 60.548 | 43.817 |
16 | Grok 3 Beta | 64.174 | 77.571 | 65.874 | 80.445 | 58.509 | 58.705 | 43.547 |
17 | GPT-4.1 | 62.883 | 77.127 | 76.079 | 73.121 | 58.983 | 58.030 | 39.692 |
18 | DeepSeek V3.1 | 62.814 | 72.607 | 70.174 | 77.378 | 51.454 | 66.566 | 39.567 |
19 | Gemini 2.0 Flash | 61.084 | 68.409 | 68.774 | 81.459 | 47.634 | 59.028 | 39.416 |
20 | ChatGPT-4o | 61.082 | 81.647 | 76.310 | 68.259 | 54.473 | 51.794 | 43.549 |
21 | Qwen2.5 Max | 60.646 | 70.427 | 75.216 | 71.551 | 63.473 | 53.317 | 34.375 |
22 | LearnLM 2.0 Flash Experimental | 58.797 | 67.803 | 61.993 | 79.523 | 47.997 | 57.139 | 35.486 |
23 | Claude 3.5 Sonnet | 58.206 | 77.853 | 65.855 | 65.784 | 60.505 | 47.458 | 38.352 |
24 | GPT-4.1 Mini | 58.053 | 75.957 | 68.215 | 66.737 | 41.200 | 54.783 | 47.887 |
25 | Llama 4 Maverick 17B 128E Instruct | 56.731 | 56.987 | 57.919 | 71.904 | 53.021 | 56.691 | 39.286 |
26 | DeepSeek R1 Distill Llama 70B | 55.592 | 48.956 | 69.427 | 66.385 | 39.831 | 54.230 | 53.525 |
27 | Step 2 16K | 54.781 | 60.822 | 71.369 | 75.833 | 42.387 | 41.029 | 37.775 |
28 | Gemini 2.0 Flash Lite | 53.994 | 62.558 | 75.175 | 72.767 | 37.032 | 51.345 | 28.743 |
29 | GPT-4o | 53.213 | 73.051 | 71.464 | 61.645 | 48.869 | 38.682 | 35.454 |
30 | Hunyuan Turbos | 51.743 | 53.073 | 53.992 | 72.311 | 37.646 | 53.728 | 34.043 |
31 | Llama 3.3 70B Instruct Turbo | 50.925 | 54.687 | 47.536 | 78.489 | 48.095 | 38.714 | 28.966 |
32 | LearnLM 1.5 Pro Experimental | 50.515 | 62.113 | 47.961 | 64.714 | 42.200 | 53.229 | 31.153 |
33 | Mistral Large | 50.424 | 66.351 | 60.738 | 64.482 | 45.040 | 39.703 | 30.290 |
34 | Gemma 3 27B | 49.909 | 51.620 | 43.554 | 71.127 | 45.122 | 48.733 | 30.679 |
35 | Dracarys2 72B Instruct | 49.468 | 62.153 | 56.107 | 61.915 | 36.478 | 48.980 | 33.469 |
36 | Qwen2.5 72B Instruct Turbo | 49.412 | 60.539 | 58.338 | 61.150 | 40.200 | 48.519 | 30.496 |
37 | DeepSeek R1 Distill Qwen 32B | 47.628 | 48.673 | 57.573 | 52.884 | 32.903 | 55.704 | 39.942 |
38 | Dracarys2 Llama 3.1 70B Instruct | 47.175 | 43.468 | 61.468 | 60.031 | 46.786 | 37.838 | 32.700 |
39 | Mistral Small | 46.316 | 52.346 | 60.029 | 60.437 | 38.092 | 35.843 | 33.021 |
40 | Nova Pro | 45.557 | 52.346 | 52.380 | 63.716 | 42.878 | 35.131 | 25.375 |
41 | Claude 3.5 Haiku | 44.930 | 55.978 | 60.585 | 58.729 | 43.077 | 32.412 | 23.096 |
42 | GPT-4.1 Nano | 44.723 | 66.633 | 48.582 | 54.625 | 32.438 | 39.780 | 31.566 |
43 | GPT-4o Mini | 42.676 | 58.037 | 60.275 | 53.914 | 32.658 | 35.589 | 22.822 |
44 | Nova Lite | 39.364 | 47.544 | 47.304 | 51.408 | 30.204 | 32.501 | 28.580 |
45 | Command R Plus | 35.451 | 28.575 | 51.950 | 54.661 | 33.872 | 21.213 | 19.242 |
46 | Qwen2.5 7B Instruct Turbo | 33.889 | 36.162 | 41.549 | 49.427 | 20.096 | 34.463 | 19.770 |
47 | Nova Micro | 33.664 | 30.471 | 42.879 | 45.596 | 26.367 | 31.951 | 22.749 |
48 | Command R | 32.294 | 27.566 | 42.092 | 52.791 | 30.737 | 17.192 | 18.243 |
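For working with the leaderboard programmatically, here is a minimal pandas sketch (transcribing just the top three rows; the full table has the same shape). Ranking within each column shows that the overall NormScore leader is not the leader in every category; among these three, o4-Mini High tops Coding.

```python
import pandas as pd

# Top three leaderboard rows, transcribed from the table above.
cols = ["Model", "NormScore", "Coding", "Data Analysis", "IF",
        "Language", "Mathematics", "Reasoning"]
rows = [
    ("o3 High",      80.707, 80.759, 78.445, 81.813, 79.864, 79.021, 83.234),
    ("o3 Medium",    78.933, 82.091, 79.655, 80.048, 76.571, 74.950, 81.201),
    ("o4-Mini High", 78.116, 84.271, 81.060, 80.653, 67.464, 78.834, 78.599),
]
df = pd.DataFrame(rows, columns=cols).set_index("Model")

# Rank within each column (1 = best): the NormScore leader, o3 High,
# is only third in Coding among these three models.
print(df.rank(ascending=False).astype(int))
```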
Category Performance Comparison
Category Scores by Model
Model | Coding | Data Analysis | IF | Language | Mathematics | Reasoning |
---|---|---|---|---|---|---|
o3 High | 76.715 | 67.020 | 86.175 | 75.996 | 85.004 | 93.333 |
o3 Medium | 77.863 | 68.193 | 84.321 | 73.481 | 80.657 | 91.000 |
o4-Mini High | 79.976 | 68.328 | 84.958 | 66.055 | 84.895 | 88.111 |
Gemini 2.5 Pro Preview | 71.081 | 62.475 | 80.592 | 69.314 | 89.157 | 87.528 |
Claude 3.7 Sonnet Thinking | 73.194 | 69.107 | 81.254 | 68.269 | 78.999 | 76.167 |
o4-Mini Medium | 74.219 | 68.472 | 81.825 | 62.409 | 81.020 | 78.472 |
Qwen 3 235B A22B | 65.325 | 68.308 | 87.729 | 60.609 | 78.778 | 78.611 |
DeepSeek R1 | 74.985 | 69.625 | 80.508 | 54.771 | 77.910 | 77.167 |
Qwen 3 32B | 64.238 | 68.289 | 85.171 | 55.153 | 75.583 | 77.750 |
Grok 3 Mini Beta (High) | 54.516 | 64.578 | 78.704 | 59.087 | 77.005 | 87.611 |
Gemini 2.5 Flash Preview | 60.334 | 65.530 | 79.021 | 59.432 | 81.802 | 73.472 |
QwQ 32B | 61.360 | 69.529 | 81.829 | 51.481 | 76.084 | 76.722 |
Qwen 3 30B A3B | 47.474 | 66.597 | 83.234 | 55.576 | 72.202 | 66.833 |
GPT-4.5 Preview | 76.072 | 60.070 | 72.325 | 64.759 | 67.940 | 54.417 |
Claude 3.7 Sonnet | 74.281 | 59.965 | 76.492 | 63.194 | 64.654 | 49.111 |
Grok 3 Beta | 73.576 | 55.629 | 84.738 | 53.797 | 62.752 | 48.528 |
GPT-4.1 | 73.194 | 66.404 | 77.046 | 54.551 | 62.386 | 44.389 |
DeepSeek V3.1 | 68.907 | 64.019 | 81.471 | 46.823 | 71.437 | 44.278 |
Gemini 2.0 Flash | 64.743 | 59.916 | 85.788 | 42.387 | 63.189 | 44.250 |
ChatGPT-4o | 77.480 | 66.520 | 71.921 | 49.428 | 55.717 | 48.806 |
Qwen2.5 Max | 66.794 | 64.271 | 75.346 | 58.369 | 56.868 | 38.528 |
LearnLM 2.0 Flash Experimental | 64.299 | 51.419 | 83.759 | 43.344 | 61.102 | 39.722 |
Claude 3.5 Sonnet | 73.898 | 56.187 | 69.296 | 54.477 | 50.543 | 43.222 |
GPT-4.1 Mini | 72.107 | 61.338 | 70.308 | 37.996 | 58.779 | 53.778 |
Llama 4 Maverick 17B 128E Instruct | 54.195 | 47.113 | 75.746 | 49.648 | 60.579 | 43.833 |
DeepSeek R1 Distill Llama 70B | 46.648 | 60.810 | 69.938 | 37.050 | 58.802 | 59.806 |
Step 2 16K | 57.578 | 62.348 | 79.883 | 38.405 | 43.683 | 42.389 |
Gemini 2.0 Flash Lite | 59.308 | 65.385 | 76.629 | 33.941 | 54.967 | 32.250 |
GPT-4o | 69.290 | 63.530 | 64.942 | 44.683 | 41.478 | 39.750 |
Hunyuan Turbos | 50.352 | 47.986 | 76.129 | 34.458 | 57.466 | 38.222 |
Llama 3.3 70B Instruct Turbo | 51.822 | 40.787 | 82.671 | 43.966 | 41.404 | 32.528 |
LearnLM 1.5 Pro Experimental | 58.926 | 39.298 | 68.158 | 37.864 | 56.708 | 34.861 |
Mistral Large | 62.890 | 54.196 | 67.929 | 40.453 | 42.202 | 33.833 |
Gemma 3 27B | 48.944 | 38.797 | 74.904 | 41.314 | 52.267 | 34.417 |
Dracarys2 72B Instruct | 58.726 | 48.477 | 65.217 | 33.058 | 52.250 | 37.486 |
Qwen2.5 72B Instruct Turbo | 57.257 | 50.159 | 64.392 | 36.466 | 51.877 | 34.083 |
DeepSeek R1 Distill Qwen 32B | 46.326 | 46.940 | 55.713 | 30.915 | 60.132 | 44.361 |
Dracarys2 Llama 3.1 70B Instruct | 41.136 | 55.128 | 63.242 | 42.367 | 40.299 | 36.667 |
Mistral Small | 49.648 | 52.140 | 63.663 | 34.586 | 38.392 | 37.083 |
Nova Pro | 49.648 | 44.344 | 67.129 | 38.935 | 37.696 | 28.250 |
Claude 3.5 Haiku | 53.169 | 54.119 | 61.879 | 39.707 | 34.841 | 26.194 |
GPT-4.1 Nano | 63.212 | 49.820 | 57.537 | 30.958 | 42.391 | 35.583 |
GPT-4o Mini | 55.022 | 55.099 | 56.800 | 29.879 | 38.047 | 25.639 |
Nova Lite | 45.040 | 41.238 | 54.129 | 27.620 | 34.616 | 32.000 |
Command R Plus | 27.128 | 49.235 | 57.612 | 30.861 | 22.815 | 21.639 |
Qwen2.5 7B Instruct Turbo | 34.293 | 42.332 | 52.109 | 18.380 | 36.814 | 22.306 |
Nova Micro | 28.919 | 41.295 | 48.042 | 24.192 | 34.147 | 25.417 |
Command R | 26.103 | 39.767 | 55.617 | 27.933 | 18.351 | 20.583 |
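One comparison this table makes easy is same model, thinking vs. non-thinking. A short matplotlib sketch (values transcribed from the rows above) contrasting Claude 3.7 Sonnet with its Thinking variant: the gains are concentrated in Reasoning and Mathematics, while Coding is essentially unchanged.

```python
import numpy as np
import matplotlib.pyplot as plt

# Category scores transcribed from the comparison table above.
categories = ["Coding", "Data Analysis", "IF", "Language",
              "Mathematics", "Reasoning"]
models = {
    "Claude 3.7 Sonnet":          [74.281, 59.965, 76.492, 63.194, 64.654, 49.111],
    "Claude 3.7 Sonnet Thinking": [73.194, 69.107, 81.254, 68.269, 78.999, 76.167],
}

# Grouped bar chart, one group per category.
x = np.arange(len(categories))
width = 0.35
fig, ax = plt.subplots(figsize=(9, 4))
for i, (name, scores) in enumerate(models.items()):
    ax.bar(x + i * width, scores, width, label=name)
ax.set_xticks(x + width / 2)
ax.set_xticklabels(categories, rotation=20, ha="right")
ax.set_ylabel("Category score")
ax.legend()
fig.tight_layout()
plt.show()
```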
Categories and Benchmarks
Category | Benchmarks |
---|---|
Coding | code_completion, code_generation |
Data Analysis | tablejoin, tablereformat |
IF | paraphrase, simplify, story_generation, summarize |
Language | connections, plot_unscrambling, typos |
Mathematics | AMPS_Hard, math_comp, olympiad |
Reasoning | spatial, web_of_lies_v3, zebra_puzzle |
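For downstream scripting, the mapping above translates directly into a plain Python dict; "IF" is LiveBench's instruction-following category, and the benchmark names are the identifiers used in the per-model breakdown below.

```python
# Category-to-benchmark mapping, transcribed from the table above.
# "IF" is LiveBench's instruction-following category.
CATEGORIES = {
    "Coding": ["code_completion", "code_generation"],
    "Data Analysis": ["tablejoin", "tablereformat"],
    "IF": ["paraphrase", "simplify", "story_generation", "summarize"],
    "Language": ["connections", "plot_unscrambling", "typos"],
    "Mathematics": ["AMPS_Hard", "math_comp", "olympiad"],
    "Reasoning": ["spatial", "web_of_lies_v3", "zebra_puzzle"],
}
```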
Detailed Performance for llama-3.3-70b-instruct-turbo
Benchmark Scores
Category | Benchmark | Score |
---|---|---|
Coding | code_completion | 54.348 |
Coding | code_generation | 49.296 |
Data Analysis | tablejoin | 22.750 |
Data Analysis | tablereformat | 58.824 |
IF | paraphrase | 78.200 |
IF | simplify | 82.517 |
IF | story_generation | 84.883 |
IF | summarize | 85.083 |
Language | connections | 36.667 |
Language | plot_unscrambling | 35.230 |
Language | typos | 60.000 |
Mathematics | AMPS_Hard | 53.000 |
Mathematics | math_comp | 39.583 |
Mathematics | olympiad | 31.629 |
Reasoning | spatial | 38.000 |
Reasoning | web_of_lies_v3 | 27.333 |
Reasoning | zebra_puzzle | 32.250 |

Average Score: 51.153
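These per-benchmark numbers reproduce the aggregates above: each category score in the comparison table is, at least for this model, the unweighted mean of its benchmark scores, and the listed average is the mean over all 17 benchmarks. A small sketch verifying both:

```python
from statistics import mean

# Benchmark scores for llama-3.3-70b-instruct-turbo, grouped by category,
# transcribed from the table above.
scores = {
    "Coding": {"code_completion": 54.348, "code_generation": 49.296},
    "Data Analysis": {"tablejoin": 22.750, "tablereformat": 58.824},
    "IF": {"paraphrase": 78.200, "simplify": 82.517,
           "story_generation": 84.883, "summarize": 85.083},
    "Language": {"connections": 36.667, "plot_unscrambling": 35.230,
                 "typos": 60.000},
    "Mathematics": {"AMPS_Hard": 53.000, "math_comp": 39.583,
                    "olympiad": 31.629},
    "Reasoning": {"spatial": 38.000, "web_of_lies_v3": 27.333,
                  "zebra_puzzle": 32.250},
}

# Category scores match the comparison table: Coding 51.822, IF 82.671,
# Reasoning 32.528, and so on.
for category, benchmarks in scores.items():
    print(f"{category}: {mean(benchmarks.values()):.3f}")

# The reported average (51.153) is the mean over all 17 benchmark scores,
# not the mean of the six category means (which would be 48.863).
flat = [s for benchmarks in scores.values() for s in benchmarks.values()]
print(f"Average Score: {mean(flat):.3f}")
```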