LiveBench.ai Data Analysis
Top Models Performance Overview
Top Models by NormScore - Livebench
Rank | Model Name | NormScore - Livebench | Coding | Data Analysis | IF | Language | Mathematics | Reasoning |
---|---|---|---|---|---|---|---|---|
1 | o3 High | 80.707 | 80.759 | 78.445 | 81.813 | 79.864 | 79.021 | 83.234 |
2 | o3 Medium | 78.933 | 82.091 | 79.655 | 80.048 | 76.571 | 74.950 | 81.201 |
3 | o4-Mini High | 78.116 | 84.271 | 81.060 | 80.653 | 67.464 | 78.834 | 78.599 |
4 | Gemini 2.5 Pro Preview | 77.443 | 74.948 | 76.161 | 76.509 | 74.841 | 83.062 | 78.191 |
5 | Claude 3.7 Sonnet Thinking | 74.983 | 77.127 | 82.618 | 77.146 | 73.864 | 73.645 | 68.034 |
6 | o4-Mini Medium | 73.752 | 78.136 | 80.213 | 77.659 | 63.538 | 75.241 | 70.038 |
7 | Qwen 3 235B A22B | 73.573 | 68.813 | 81.021 | 83.281 | 62.604 | 73.282 | 70.099 |
8 | DeepSeek R1 | 72.047 | 79.024 | 81.385 | 76.409 | 57.950 | 72.506 | 68.991 |
9 | Qwen 3 32B | 71.429 | 67.643 | 79.847 | 80.862 | 58.699 | 70.425 | 69.496 |
10 | Grok 3 Mini Beta (High) | 70.778 | 57.270 | 74.695 | 74.716 | 63.603 | 71.374 | 78.496 |
Category Performance Comparison
Category Scores by Model
Model | Coding | Data Analysis | IF | Language | Mathematics | Reasoning |
---|---|---|---|---|---|---|
o3 High | 76.715 | 67.020 | 86.175 | 75.996 | 85.004 | 93.333 |
o3 Medium | 77.863 | 68.193 | 84.321 | 73.481 | 80.657 | 91.000 |
o4-Mini High | 79.976 | 68.328 | 84.958 | 66.055 | 84.895 | 88.111 |
Gemini 2.5 Pro Preview | 71.081 | 62.475 | 80.592 | 69.314 | 89.157 | 87.528 |
Claude 3.7 Sonnet Thinking | 73.194 | 69.107 | 81.254 | 68.269 | 78.999 | 76.167 |
o4-Mini Medium | 74.219 | 68.472 | 81.825 | 62.409 | 81.020 | 78.472 |
Qwen 3 235B A22B | 65.325 | 68.308 | 87.729 | 60.609 | 78.778 | 78.611 |
DeepSeek R1 | 74.985 | 69.625 | 80.508 | 54.771 | 77.910 | 77.167 |
Qwen 3 32B | 64.238 | 68.289 | 85.171 | 55.153 | 75.583 | 77.750 |
Grok 3 Mini Beta (High) | 54.516 | 64.578 | 78.704 | 59.087 | 77.005 | 87.611 |
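
The per-model category table above can be scanned programmatically to find the leader in each category. The following is a minimal sketch: the values are copied from that table, while the names `CATEGORIES`, `SCORES`, and `best_per_category` are hypothetical and chosen only for illustration.

```python
# Category scores copied from the per-model category table above.
# Column order matches CATEGORIES.
CATEGORIES = ["Coding", "Data Analysis", "IF", "Language", "Mathematics", "Reasoning"]

SCORES = {
    "o3 High": [76.715, 67.020, 86.175, 75.996, 85.004, 93.333],
    "o3 Medium": [77.863, 68.193, 84.321, 73.481, 80.657, 91.000],
    "o4-Mini High": [79.976, 68.328, 84.958, 66.055, 84.895, 88.111],
    "Gemini 2.5 Pro Preview": [71.081, 62.475, 80.592, 69.314, 89.157, 87.528],
    "Claude 3.7 Sonnet Thinking": [73.194, 69.107, 81.254, 68.269, 78.999, 76.167],
    "o4-Mini Medium": [74.219, 68.472, 81.825, 62.409, 81.020, 78.472],
    "Qwen 3 235B A22B": [65.325, 68.308, 87.729, 60.609, 78.778, 78.611],
    "DeepSeek R1": [74.985, 69.625, 80.508, 54.771, 77.910, 77.167],
    "Qwen 3 32B": [64.238, 68.289, 85.171, 55.153, 75.583, 77.750],
    "Grok 3 Mini Beta (High)": [54.516, 64.578, 78.704, 59.087, 77.005, 87.611],
}


def best_per_category(scores, categories):
    """Return {category: (model, score)} for the highest score in each column."""
    best = {}
    for i, category in enumerate(categories):
        # Pick the model whose i-th score is the largest.
        model = max(scores, key=lambda m: scores[m][i])
        best[category] = (model, scores[model][i])
    return best


if __name__ == "__main__":
    for category, (model, score) in best_per_category(SCORES, CATEGORIES).items():
        print(f"{category}: {model} ({score:.3f})")
```

Keeping the scores as plain lists keyed by model name mirrors the table layout, so a row can be checked against the source at a glance.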
Categories and Benchmarks
Category | Benchmarks |
---|---|
Coding | code_completion, code_generation |
Data Analysis | tablejoin, tablereformat |
IF | paraphrase, simplify, story_generation, summarize |
Language | connections, plot_unscrambling, typos |
Mathematics | AMPS_Hard, math_comp, olympiad |
Reasoning | spatial, web_of_lies_v3, zebra_puzzle |
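
The category-to-benchmark mapping above is convenient to hold as a dictionary when grouping per-benchmark results. This is a small sketch using only the names listed in the table; `CATEGORY_BENCHMARKS`, `BENCHMARK_TO_CATEGORY`, and `category_of` are hypothetical identifiers, not part of any LiveBench tooling.

```python
# Mapping copied from the "Categories and Benchmarks" table above.
CATEGORY_BENCHMARKS = {
    "Coding": ["code_completion", "code_generation"],
    "Data Analysis": ["tablejoin", "tablereformat"],
    "IF": ["paraphrase", "simplify", "story_generation", "summarize"],
    "Language": ["connections", "plot_unscrambling", "typos"],
    "Mathematics": ["AMPS_Hard", "math_comp", "olympiad"],
    "Reasoning": ["spatial", "web_of_lies_v3", "zebra_puzzle"],
}

# Reverse index: benchmark name -> category name.
BENCHMARK_TO_CATEGORY = {
    benchmark: category
    for category, benchmarks in CATEGORY_BENCHMARKS.items()
    for benchmark in benchmarks
}


def category_of(benchmark):
    """Look up the category a benchmark belongs to (raises KeyError if unknown)."""
    return BENCHMARK_TO_CATEGORY[benchmark]


if __name__ == "__main__":
    print(category_of("zebra_puzzle"))
    print(len(BENCHMARK_TO_CATEGORY), "benchmarks in total")
```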