Detailed Results

Per-task results for all models. View the full results repository: SWE-AGI-Eval

Model | Task | Difficulty | Passed | Pass Rate | Duration | LOC | Cost | Total Actions | Spec Understanding | Planning | Code Understanding | Code Writing | Debugging | Hygiene | External Search | Other
claude-opus-4.5 | c99 | hard | No | 45.3% (53/117) | 0.69h | 4937 | $20.39 | 290 | 11.0% (32) | 5.2% (15) | 34.8% (101) | 20.7% (60) | 25.9% (75) | 0.7% (2) | 0.0% (0) | 1.7% (5)
claude-opus-4.5 | capnp | medium | Yes | 100.0% (111/111) | 0.25h | 2690 | $7.55 | 87 | 21.8% (19) | 8.0% (7) | 36.8% (32) | 20.7% (18) | 11.5% (10) | 1.1% (1) | 0.0% (0) | 0.0% (0)
claude-opus-4.5 | cdcl | hard | No | 99.6% (4295/4312) | 3.0h | 1020 | $20.63 | 334 | 2.7% (9) | 8.4% (28) | 32.6% (109) | 27.2% (91) | 28.7% (96) | 0.0% (0) | 0.0% (0) | 0.3% (1)
claude-opus-4.5 | csv | easy | Yes | 100.0% (98/98) | 0.49h | 483 | $11.89 | 149 | 4.7% (7) | 6.7% (10) | 18.1% (27) | 32.9% (49) | 34.2% (51) | 0.0% (0) | 2.0% (3) | 1.3% (2)
claude-opus-4.5 | ecma262 | hard | No | 23.1% (143/618) | 0.64h | 5172 | $21.78 | 347 | 6.1% (21) | 3.5% (12) | 42.1% (146) | 30.3% (105) | 13.0% (45) | 0.0% (0) | 0.0% (0) | 5.2% (18)
claude-opus-4.5 | git_object | easy | Yes | 100.0% (1000/1000) | 0.68h | 1065 | $15.81 | 261 | 10.0% (26) | 3.4% (9) | 33.7% (88) | 24.1% (63) | 24.1% (63) | 0.0% (0) | 0.0% (0) | 4.6% (12)
claude-opus-4.5 | hpack | easy | Yes | 100.0% (129/129) | 0.38h | 1561 | $10.53 | 143 | 22.4% (32) | 6.3% (9) | 25.9% (37) | 25.2% (36) | 19.6% (28) | 0.0% (0) | 0.0% (0) | 0.7% (1)
claude-opus-4.5 | html5 | hard | No | 56.5% (4648/8221) | 1.0h | 7583 | $14.38 | 181 | 16.0% (29) | 5.5% (10) | 16.6% (30) | 36.5% (66) | 23.2% (42) | 0.0% (0) | 0.0% (0) | 2.2% (4)
claude-opus-4.5 | ini | easy | Yes | 100.0% (98/98) | 0.28h | 1070 | $6.43 | 97 | 11.3% (11) | 6.2% (6) | 21.6% (21) | 30.9% (30) | 29.9% (29) | 0.0% (0) | 0.0% (0) | 0.0% (0)
claude-opus-4.5 | jq | hard | Yes | 100.0% (218/218) | 1.3h | 7812 | $34.47 | 464 | 3.4% (16) | 3.0% (14) | 47.4% (220) | 27.4% (127) | 18.1% (84) | 0.2% (1) | 0.0% (0) | 0.4% (2)
claude-opus-4.5 | lua | hard | No | 96.3% (132/137) | 1.2h | 6782 | $34.15 | 465 | 2.8% (13) | 5.8% (27) | 51.4% (239) | 17.4% (81) | 18.9% (88) | 0.0% (0) | 0.0% (0) | 3.7% (17)
claude-opus-4.5 | protobuf | easy | Yes | 100.0% (141/141) | 0.17h | 1051 | $5.21 | 94 | 26.6% (25) | 16.0% (15) | 20.2% (19) | 26.6% (25) | 10.6% (10) | 0.0% (0) | 0.0% (0) | 0.0% (0)
claude-opus-4.5 | pug | medium | No | 35.5% (89/251) | 1.1h | 3422 | $22.82 | 309 | 5.8% (18) | 3.2% (10) | 27.8% (86) | 35.3% (109) | 20.1% (62) | 0.0% (0) | 0.6% (2) | 7.1% (22)
claude-opus-4.5 | python | hard | No | 60.8% (397/653) | 3.2h | 10932 | $60.45 | 848 | 3.2% (27) | 2.4% (20) | 50.1% (425) | 23.5% (199) | 18.8% (159) | 0.0% (0) | 0.9% (8) | 1.2% (10)
claude-opus-4.5 | r6rs | hard | No | 54.0% (735/1362) | 2.9h | 8585 | $36.57 | 539 | 6.5% (35) | 4.6% (25) | 44.5% (240) | 22.1% (119) | 21.3% (115) | 0.0% (0) | 0.0% (0) | 0.9% (5)
claude-opus-4.5 | toml | medium | No | 82.3% (603/733) | 0.65h | 2855 | $18.27 | 292 | 9.2% (27) | 3.8% (11) | 34.2% (100) | 26.7% (78) | 24.0% (70) | 1.4% (4) | 0.0% (0) | 0.7% (2)
claude-opus-4.5 | uri | easy | Yes | 100.0% (138/138) | 0.31h | 1320 | $6.81 | 124 | 22.6% (28) | 6.5% (8) | 22.6% (28) | 25.0% (31) | 21.8% (27) | 0.8% (1) | 0.0% (0) | 0.8% (1)
claude-opus-4.5 | url | medium | No | 87.0% (1062/1220) | 1.1h | 2517 | $18.36 | 290 | 11.0% (32) | 5.9% (17) | 29.7% (86) | 31.4% (91) | 20.0% (58) | 0.0% (0) | 0.0% (0) | 2.1% (6)
claude-opus-4.5 | wasm | medium | Yes | 100.0% (800/800) | 1.7h | 6101 | $30.91 | 450 | 6.7% (30) | 2.2% (10) | 32.2% (145) | 24.0% (108) | 34.0% (153) | 0.0% (0) | 0.0% (0) | 0.9% (4)
claude-opus-4.5 | xml | medium | Yes | 100.0% (735/735) | 2.4h | 3241 | $39.42 | 534 | 2.6% (14) | 6.4% (34) | 32.6% (174) | 26.4% (141) | 31.8% (170) | 0.0% (0) | 0.0% (0) | 0.2% (1)
claude-opus-4.5 | yaml | medium | No | 68.1% (235/345) | 1.4h | 3596 | $33.91 | 491 | 3.9% (19) | 3.1% (15) | 42.4% (208) | 23.4% (115) | 24.4% (120) | 0.0% (0) | 0.8% (4) | 2.0% (10)
claude-opus-4.5 | zip | medium | No | 88.0% (958/1089) | 2.1h | 2006 | $37.19 | 595 | 4.9% (29) | 1.2% (7) | 44.2% (263) | 21.3% (127) | 19.7% (117) | 0.2% (1) | 0.0% (0) | 8.6% (51)
claude-opus-4.6 | c99 | hard | Yes | 100.0% (117/117) | 1.1h | 4979 | $27.12 | 427 | 13.8% (59) | 4.7% (20) | 39.1% (167) | 19.9% (85) | 11.0% (47) | 0.0% (0) | 0.0% (0) | 11.5% (49)
claude-opus-4.6 | capnp | medium | Yes | 100.0% (111/111) | 0.38h | 2605 | $9.37 | 195 | 16.4% (32) | 5.6% (11) | 39.0% (76) | 26.2% (51) | 11.8% (23) | 0.5% (1) | 0.0% (0) | 0.5% (1)
claude-opus-4.6 | cdcl | hard | Yes | 100.0% (4312/4312) | 6.7h | 1203 | $46.19 | 772 | 6.0% (46) | 7.5% (58) | 44.3% (342) | 15.9% (123) | 22.4% (173) | 0.1% (1) | 2.3% (18) | 1.4% (11)
claude-opus-4.6 | csv | easy | Yes | 100.0% (98/98) | 0.70h | 474 | $12.44 | 214 | 7.9% (17) | 5.6% (12) | 30.4% (65) | 27.6% (59) | 26.2% (56) | 0.5% (1) | 0.9% (2) | 0.9% (2)
claude-opus-4.6 | ecma262 | hard | No | 60.2% (372/618) | 9.3h | 13901 | $0.00 | 2384 | 6.5% (155) | 10.6% (253) | 53.4% (1273) | 10.2% (243) | 16.9% (402) | 0.0% (1) | 1.3% (31) | 1.1% (26)
claude-opus-4.6 | git_object | easy | Yes | 100.0% (1000/1000) | 0.36h | 1291 | $8.27 | 206 | 12.6% (26) | 5.3% (11) | 52.4% (108) | 16.5% (34) | 10.2% (21) | 0.5% (1) | 0.0% (0) | 2.4% (5)
claude-opus-4.6 | hpack | easy | Yes | 100.0% (129/129) | 0.68h | 5941 | $9.37 | 136 | 11.8% (16) | 8.8% (12) | 27.2% (37) | 24.3% (33) | 8.8% (12) | 0.0% (0) | 5.1% (7) | 14.0% (19)
claude-opus-4.6 | html5 | hard | Yes | 100.0% (8221/8221) | 12.5h | 10585 | $0.00 | 2179 | 5.8% (126) | 4.4% (96) | 48.5% (1057) | 10.3% (225) | 12.1% (264) | 0.0% (0) | 14.0% (304) | 4.9% (107)
claude-opus-4.6 | ini | easy | Yes | 100.0% (98/98) | 0.54h | 923 | $9.05 | 146 | 9.6% (14) | 6.8% (10) | 34.9% (51) | 24.0% (35) | 21.9% (32) | 0.0% (0) | 1.4% (2) | 1.4% (2)
claude-opus-4.6 | jq | hard | Yes | 100.0% (218/218) | 1.3h | 6055 | $25.49 | 459 | 3.7% (17) | 6.5% (30) | 52.5% (241) | 22.0% (101) | 11.5% (53) | 0.0% (0) | 2.6% (12) | 1.1% (5)
claude-opus-4.6 | lua | hard | No | 97.1% (133/137) | 1.5h | 6688 | $398.24 | 425 | 8.0% (34) | 7.5% (32) | 38.4% (163) | 21.2% (90) | 20.5% (87) | 0.2% (1) | 0.5% (2) | 3.8% (16)
claude-opus-4.6 | protobuf | easy | Yes | 100.0% (141/141) | 0.20h | 858 | $4.77 | 97 | 14.4% (14) | 8.2% (8) | 27.8% (27) | 32.0% (31) | 17.5% (17) | 0.0% (0) | 0.0% (0) | 0.0% (0)
claude-opus-4.6 | pug | medium | No | 51.4% (129/251) | 1.8h | 4246 | $51.69 | 528 | 6.1% (32) | 8.3% (44) | 50.8% (268) | 16.1% (85) | 12.9% (68) | 0.2% (1) | 0.9% (5) | 4.7% (25)
claude-opus-4.6 | python | hard | No | 0.0% (0/653) | 5.2h | 16991 | $135.91 | 2190 | 5.1% (112) | 6.0% (132) | 57.2% (1253) | 12.1% (264) | 14.4% (315) | 0.1% (3) | 1.4% (31) | 3.7% (80)
claude-opus-4.6 | r6rs | hard | No | 91.9% (1252/1362) | 7.9h | 20422 | $190.32 | 3146 | 7.5% (236) | 5.8% (183) | 48.4% (1524) | 14.8% (466) | 18.9% (595) | 0.0% (0) | 0.4% (14) | 4.1% (128)
claude-opus-4.6 | toml | medium | No | 98.0% (718/733) | 13.0h | 6779 | $0.00 | 3093 | 8.6% (267) | 5.3% (165) | 36.6% (1132) | 14.2% (438) | 15.3% (474) | 0.2% (5) | 18.5% (572) | 1.3% (40)
claude-opus-4.6 | uri | easy | Yes | 100.0% (138/138) | 0.23h | 1198 | $4.72 | 101 | 19.8% (20) | 5.9% (6) | 27.7% (28) | 24.8% (25) | 18.8% (19) | 1.0% (1) | 0.0% (0) | 2.0% (2)
claude-opus-4.6 | url | medium | Yes | 100.0% (1220/1220) | 1.2h | 4065 | $26.97 | 432 | 22.7% (98) | 4.4% (19) | 22.0% (95) | 16.0% (69) | 14.6% (63) | 0.2% (1) | 12.7% (55) | 7.4% (32)
claude-opus-4.6 | wasm | medium | Yes | 100.0% (800/800) | 0.54h | 4152 | $13.66 | 211 | 22.7% (48) | 5.2% (11) | 29.4% (62) | 19.9% (42) | 17.1% (36) | 0.0% (0) | 0.5% (1) | 5.2% (11)
claude-opus-4.6 | xml | medium | Yes | 100.0% (735/735) | 1.6h | 2839 | $35.88 | 542 | 6.5% (35) | 4.4% (24) | 39.5% (214) | 28.0% (152) | 17.3% (94) | 0.0% (0) | 4.1% (22) | 0.2% (1)
claude-opus-4.6 | yaml | medium | No | 99.7% (344/345) | 1.8h | 3721 | $29.83 | 513 | 6.0% (31) | 4.5% (23) | 47.8% (245) | 11.5% (59) | 20.9% (107) | 0.2% (1) | 3.3% (17) | 5.8% (30)
claude-opus-4.6 | zip | medium | Yes | 100.0% (1089/1089) | 8.1h | 10530 | $1016.53 | 1990 | 3.6% (71) | 6.5% (130) | 57.3% (1140) | 7.1% (141) | 10.2% (203) | 0.0% (0) | 4.9% (97) | 10.5% (208)
claude-sonnet-4.5 | csv | easy | No | 98.0% (96/98) | 0.23h | 572 | $5.11 | 128 | 7.8% (10) | 7.8% (10) | 15.6% (20) | 46.1% (59) | 21.9% (28) | 0.0% (0) | 0.0% (0) | 0.8% (1)
claude-sonnet-4.5 | git_object | easy | No | 0.0% (0/1000) | 0.16h | 693 | $2.48 | 94 | 7.4% (7) | 7.4% (7) | 37.2% (35) | 28.7% (27) | 16.0% (15) | 0.0% (0) | 0.0% (0) | 3.2% (3)
claude-sonnet-4.5 | hpack | easy | No | 99.2% (128/129) | 0.53h | 1571 | $10.42 | 303 | 4.3% (13) | 5.9% (18) | 29.7% (90) | 31.0% (94) | 22.8% (69) | 0.0% (0) | 0.0% (0) | 6.3% (19)
claude-sonnet-4.5 | ini | easy | No | 71.4% (70/98) | 0.31h | 882 | $6.68 | 160 | 3.1% (5) | 4.4% (7) | 35.0% (56) | 31.2% (50) | 25.0% (40) | 1.2% (2) | 0.0% (0) | 0.0% (0)
claude-sonnet-4.5 | protobuf | easy | No | 92.9% (131/141) | 0.34h | 1098 | $9.17 | 150 | 6.7% (10) | 3.3% (5) | 13.3% (20) | 42.7% (64) | 32.0% (48) | 0.7% (1) | 0.0% (0) | 1.3% (2)
claude-sonnet-4.5 | uri | easy | No | 94.9% (131/138) | 0.35h | 761 | $6.80 | 154 | 4.5% (7) | 4.5% (7) | 22.1% (34) | 39.6% (61) | 24.7% (38) | 0.6% (1) | 0.0% (0) | 3.9% (6)
deepseek-v3.2 | csv | easy | No | 0.0% (0/98) | 2.7h | 1058 | $0.62 | 578 | 2.8% (16) | 3.1% (18) | 31.8% (184) | 44.6% (258) | 17.3% (100) | 0.0% (0) | 0.3% (2) | 0.0% (0)
deepseek-v3.2 | git_object | easy | No | 0.0% (0/1000) | 1.9h | 1099 | $0.39 | 320 | 6.2% (20) | 4.7% (15) | 49.1% (157) | 21.9% (70) | 14.7% (47) | 0.0% (0) | 1.2% (4) | 2.2% (7)
deepseek-v3.2 | hpack | easy | No | 0.0% (0/129) | 2.9h | 1625 | $0.67 | 555 | 3.6% (20) | 2.5% (14) | 35.5% (197) | 41.3% (229) | 16.4% (91) | 0.0% (0) | 0.0% (0) | 0.7% (4)
deepseek-v3.2 | ini | easy | No | 0.0% (0/98) | 5.5h | 456 | $1.06 | 972 | 3.2% (31) | 4.2% (41) | 37.8% (367) | 36.5% (355) | 17.6% (171) | 0.2% (2) | 0.2% (2) | 0.3% (3)
deepseek-v3.2 | protobuf | easy | No | 0.0% (0/141) | 4.6h | 1297 | $0.97 | 867 | 5.2% (45) | 1.6% (14) | 37.0% (321) | 40.9% (355) | 15.2% (132) | 0.0% (0) | 0.0% (0) | 0.0% (0)
deepseek-v3.2 | uri | easy | Yes | 100.0% (138/138) | 2.6h | 887 | $0.42 | 355 | 6.2% (22) | 5.4% (19) | 26.5% (94) | 37.5% (133) | 23.7% (84) | 0.6% (2) | 0.0% (0) | 0.3% (1)
gemini-3-flash-preview | csv | easy | No | 99.0% (97/98) | 0.57h | 260 | $22.30 | 155 | 3.2% (5) | 0.0% (0) | 8.4% (13) | 42.6% (66) | 45.8% (71) | 0.0% (0) | 0.0% (0) | 0.0% (0)
gemini-3-flash-preview | git_object | easy | No | 0.0% (0/1000) | 0.02h | 115 | $0.08 | 8 | 37.5% (3) | 0.0% (0) | 62.5% (5) | 0.0% (0) | 0.0% (0) | 0.0% (0) | 0.0% (0) | 0.0% (0)
gemini-3-flash-preview | hpack | easy | No | 0.0% (0/129) | 0.16h | 1099 | $1.81 | 59 | 8.5% (5) | 0.0% (0) | 13.6% (8) | 61.0% (36) | 16.9% (10) | 0.0% (0) | 0.0% (0) | 0.0% (0)
gemini-3-flash-preview | ini | easy | No | 0.0% (0/98) | 0.30h | 235 | $2.42 | 110 | 4.5% (5) | 0.0% (0) | 17.3% (19) | 48.2% (53) | 29.1% (32) | 0.0% (0) | 0.0% (0) | 0.9% (1)
gemini-3-flash-preview | protobuf | easy | Yes | 100.0% (141/141) | 0.17h | 911 | $2.17 | 46 | 13.0% (6) | 0.0% (0) | 13.0% (6) | 41.3% (19) | 32.6% (15) | 0.0% (0) | 0.0% (0) | 0.0% (0)
gemini-3-flash-preview | uri | easy | Yes | 100.0% (138/138) | 0.28h | 727 | $2.82 | 94 | 4.3% (4) | 0.0% (0) | 10.6% (10) | 46.8% (44) | 33.0% (31) | 0.0% (0) | 1.1% (1) | 4.3% (4)
gemini-3-pro-preview | csv | easy | No | 0.0% (0/98) | 0.40h | 413 | N/A | 91 | 3.3% (3) | 0.0% (0) | 8.8% (8) | 54.9% (50) | 33.0% (30) | 0.0% (0) | 0.0% (0) | 0.0% (0)
gemini-3-pro-preview | git_object | easy | No | 0.0% (0/1000) | 0.27h | 770 | N/A | 103 | 2.9% (3) | 0.0% (0) | 3.9% (4) | 46.6% (48) | 27.2% (28) | 0.0% (0) | 13.6% (14) | 5.8% (6)
gemini-3-pro-preview | hpack | easy | No | 99.2% (128/129) | 0.53h | 1431 | N/A | 88 | 6.8% (6) | 0.0% (0) | 10.2% (9) | 62.5% (55) | 19.3% (17) | 1.1% (1) | 0.0% (0) | 0.0% (0)
gemini-3-pro-preview | ini | easy | No | 0.0% (0/98) | 0.21h | 521 | N/A | 48 | 8.3% (4) | 0.0% (0) | 10.4% (5) | 58.3% (28) | 22.9% (11) | 0.0% (0) | 0.0% (0) | 0.0% (0)
gemini-3-pro-preview | protobuf | easy | No | 0.0% (0/141) | 0.13h | 704 | N/A | 23 | 21.7% (5) | 0.0% (0) | 13.0% (3) | 52.2% (12) | 13.0% (3) | 0.0% (0) | 0.0% (0) | 0.0% (0)
gemini-3-pro-preview | uri | easy | No | 0.0% (0/138) | 0.26h | 423 | N/A | 69 | 4.3% (3) | 0.0% (0) | 4.3% (3) | 69.6% (48) | 20.3% (14) | 0.0% (0) | 1.4% (1) | 0.0% (0)
glm-4.7 | csv | easy | No | 95.9% (94/98) | 0.32h | 373 | $0.40 | 141 | 7.8% (11) | 4.3% (6) | 47.5% (67) | 13.5% (19) | 16.3% (23) | 0.0% (0) | 6.4% (9) | 4.3% (6)
glm-4.7 | git_object | easy | No | 0.0% (0/1000) | 0.48h | 797 | $0.42 | 112 | 4.5% (5) | 4.5% (5) | 43.8% (49) | 17.0% (19) | 15.2% (17) | 0.0% (0) | 0.0% (0) | 15.2% (17)
glm-4.7 | hpack | easy | No | 0.0% (0/129) | 0.41h | 945 | $0.50 | 131 | 10.7% (14) | 2.3% (3) | 29.8% (39) | 28.2% (37) | 19.1% (25) | 0.0% (0) | 3.1% (4) | 6.9% (9)
glm-4.7 | ini | easy | Yes | 100.0% (98/98) | 1.4h | 1196 | $1.57 | 438 | 1.8% (8) | 2.3% (10) | 25.6% (112) | 30.8% (135) | 38.6% (169) | 0.0% (0) | 0.2% (1) | 0.7% (3)
glm-4.7 | protobuf | easy | No | 89.4% (126/141) | 0.87h | 1015 | $0.86 | 245 | 6.9% (17) | 4.1% (10) | 24.5% (60) | 22.0% (54) | 28.6% (70) | 0.4% (1) | 0.0% (0) | 13.5% (33)
glm-4.7 | uri | easy | Yes | 100.0% (138/138) | 0.67h | 1097 | $1.11 | 326 | 3.4% (11) | 5.5% (18) | 31.3% (102) | 31.6% (103) | 25.8% (84) | 0.0% (0) | 0.6% (2) | 1.8% (6)
gpt-5.2-codex-high | c99 | hard | Yes | 100.0% (117/117) | 3.1h | 5052 | $17.68 | 580 | 10.9% (63) | 0.2% (1) | 48.6% (282) | 25.7% (149) | 11.9% (69) | 2.4% (14) | 0.0% (0) | 0.3% (2)
gpt-5.2-codex-high | capnp | medium | Yes | 100.0% (111/111) | 1.2h | 2798 | $7.04 | 265 | 12.5% (33) | 0.0% (0) | 55.8% (148) | 18.1% (48) | 10.9% (29) | 1.9% (5) | 0.0% (0) | 0.8% (2)
gpt-5.2-codex-high | cdcl | hard | No | 99.8% (4305/4312) | 5.1h | 1380 | $11.75 | 393 | 2.3% (9) | 0.3% (1) | 35.9% (141) | 31.3% (123) | 14.2% (56) | 13.5% (53) | 0.0% (0) | 2.5% (10)
gpt-5.2-codex-high | csv | easy | Yes | 100.0% (98/98) | 0.68h | 440 | $3.69 | 143 | 7.7% (11) | 0.7% (1) | 56.6% (81) | 19.6% (28) | 9.1% (13) | 5.6% (8) | 0.0% (0) | 0.7% (1)
gpt-5.2-codex-high | ecma262 | hard | No | 97.7% (604/618) | 42.2h | 33302 | N/A | 9794 | 0.8% (82) | 0.0% (0) | 71.2% (6973) | 17.3% (1696) | 7.8% (765) | 0.4% (38) | 0.3% (33) | 2.1% (207)
gpt-5.2-codex-high | git_object | easy | Yes | 100.0% (1000/1000) | 1.2h | 1164 | $9.78 | 324 | 4.6% (15) | 0.3% (1) | 54.3% (176) | 23.5% (76) | 12.7% (41) | 3.4% (11) | 0.0% (0) | 1.2% (4)
gpt-5.2-codex-high | hpack | easy | Yes | 100.0% (129/129) | 1.0h | 1157 | $6.68 | 204 | 10.8% (22) | 0.5% (1) | 43.1% (88) | 26.5% (54) | 10.3% (21) | 2.5% (5) | 0.0% (0) | 6.4% (13)
gpt-5.2-codex-high | html5 | hard | No | 78.4% (6444/8221) | 3.0h | 6433 | $12.66 | 386 | 10.1% (39) | 0.3% (1) | 33.9% (131) | 28.2% (109) | 21.0% (81) | 0.5% (2) | 0.0% (0) | 6.0% (23)
gpt-5.2-codex-high | ini | easy | Yes | 100.0% (98/98) | 0.77h | 927 | $4.95 | 176 | 4.0% (7) | 0.0% (0) | 42.6% (75) | 31.8% (56) | 15.9% (28) | 5.1% (9) | 0.0% (0) | 0.6% (1)
gpt-5.2-codex-high | jq | hard | Yes | 100.0% (218/218) | 1.9h | 6416 | $16.67 | 522 | 3.3% (17) | 0.0% (0) | 45.2% (236) | 31.6% (165) | 16.7% (87) | 3.3% (17) | 0.0% (0) | 0.0% (0)
gpt-5.2-codex-high | lua | hard | Yes | 100.0% (137/137) | 2.5h | 5574 | $18.40 | 568 | 1.8% (10) | 0.2% (1) | 52.6% (299) | 34.7% (197) | 9.3% (53) | 1.1% (6) | 0.0% (0) | 0.4% (2)
gpt-5.2-codex-high | protobuf | easy | Yes | 100.0% (141/141) | 0.50h | 1670 | $3.83 | 136 | 12.5% (17) | 0.7% (1) | 54.4% (74) | 20.6% (28) | 8.1% (11) | 2.2% (3) | 0.0% (0) | 1.5% (2)
gpt-5.2-codex-high | pug | medium | Yes | 100.0% (251/251) | 24.6h | 14251 | $176.90 | 5093 | 5.4% (277) | 0.0% (1) | 54.0% (2752) | 19.2% (977) | 10.3% (523) | 1.3% (68) | 2.7% (140) | 7.0% (355)
gpt-5.2-codex-high | python | hard | Yes | 100.0% (653/653) | 1.8h | 7675 | $17.70 | 511 | 3.1% (16) | 0.0% (0) | 55.4% (283) | 24.9% (127) | 10.2% (52) | 1.6% (8) | 0.0% (0) | 4.9% (25)
gpt-5.2-codex-high | r6rs | hard | No | 53.4% (727/1362) | 2.9h | 6436 | $20.19 | 654 | 10.4% (68) | 0.2% (1) | 49.1% (321) | 28.9% (189) | 10.2% (67) | 0.2% (1) | 0.0% (0) | 1.1% (7)
gpt-5.2-codex-high | toml | medium | Yes | 100.0% (733/733) | 3.0h | 2280 | $17.25 | 474 | 8.9% (42) | 0.0% (0) | 39.0% (185) | 28.1% (133) | 19.2% (91) | 4.0% (19) | 0.0% (0) | 0.8% (4)
gpt-5.2-codex-high | uri | easy | Yes | 100.0% (138/138) | 0.72h | 1128 | $4.59 | 187 | 4.3% (8) | 0.5% (1) | 49.2% (92) | 28.9% (54) | 14.4% (27) | 2.1% (4) | 0.0% (0) | 0.5% (1)
gpt-5.2-codex-high | url | medium | No | 91.2% (1112/1220) | 1.2h | 4849 | $8.40 | 338 | 17.8% (60) | 0.0% (0) | 48.5% (164) | 16.6% (56) | 10.4% (35) | 0.6% (2) | 0.0% (0) | 6.2% (21)
gpt-5.2-codex-high | wasm | medium | Yes | 100.0% (800/800) | 2.1h | 3479 | $15.67 | 491 | 10.4% (51) | 0.2% (1) | 49.1% (241) | 21.0% (103) | 13.4% (66) | 2.4% (12) | 3.3% (16) | 0.2% (1)
gpt-5.2-codex-high | xml | medium | Yes | 100.0% (735/735) | 1.9h | 4946 | $14.81 | 483 | 4.1% (20) | 0.2% (1) | 42.2% (204) | 32.5% (157) | 16.4% (79) | 4.6% (22) | 0.0% (0) | 0.0% (0)
gpt-5.2-codex-high | yaml | medium | Yes | 100.0% (345/345) | 3.9h | 3664 | $26.78 | 721 | 3.5% (25) | 0.1% (1) | 40.9% (295) | 34.0% (245) | 16.0% (115) | 3.6% (26) | 0.8% (6) | 1.1% (8)
gpt-5.2-codex-high | zip | medium | Yes | 100.0% (1089/1089) | 3.2h | 1346 | $20.32 | 693 | 2.5% (17) | 0.0% (0) | 42.6% (295) | 12.0% (83) | 7.8% (54) | 1.2% (8) | 1.9% (13) | 32.2% (223)
gpt-5.3-codex-xhigh | c99 | hard | Yes | 100.0% (117/117) | 0.62h | 3624 | $5.82 | 218 | 6.9% (15) | 0.0% (0) | 39.9% (87) | 25.2% (55) | 19.7% (43) | 0.5% (1) | 0.5% (1) | 7.3% (16)
gpt-5.3-codex-xhigh | capnp | medium | Yes | 100.0% (111/111) | 0.53h | 3114 | $4.88 | 194 | 13.4% (26) | 0.0% (0) | 42.8% (83) | 19.1% (37) | 21.6% (42) | 1.0% (2) | 0.0% (0) | 2.1% (4)
gpt-5.3-codex-xhigh | cdcl | hard | No | 99.6% (4295/4312) | 2.2h | 1650 | $11.14 | 216 | 4.2% (9) | 0.0% (0) | 35.2% (76) | 18.5% (40) | 28.2% (61) | 0.5% (1) | 0.0% (0) | 13.4% (29)
gpt-5.3-codex-xhigh | csv | easy | Yes | 100.0% (98/98) | 0.21h | 467 | $1.85 | 84 | 8.3% (7) | 0.0% (0) | 48.8% (41) | 22.6% (19) | 17.9% (15) | 0.0% (0) | 0.0% (0) | 2.4% (2)
gpt-5.3-codex-xhigh | ecma262 | hard | No | 17.5% (108/618) | 0.75h | 7445 | $5.45 | 201 | 10.0% (20) | 0.0% (0) | 48.3% (97) | 21.9% (44) | 14.9% (30) | 0.5% (1) | 2.5% (5) | 2.0% (4)
gpt-5.3-codex-xhigh | git_object | easy | Yes | 100.0% (1000/1000) | 0.44h | 840 | $2.80 | 125 | 4.0% (5) | 0.0% (0) | 59.2% (74) | 24.8% (31) | 9.6% (12) | 0.8% (1) | 0.0% (0) | 1.6% (2)
gpt-5.3-codex-xhigh | hpack | easy | Yes | 100.0% (129/129) | 0.33h | 1793 | $3.10 | 142 | 12.0% (17) | 0.7% (1) | 44.4% (63) | 22.5% (32) | 16.2% (23) | 0.0% (0) | 0.0% (0) | 4.2% (6)
gpt-5.3-codex-xhigh | html5 | hard | No | 86.2% (7086/8221) | 3.3h | 11080 | $15.45 | 329 | 7.9% (26) | 0.0% (0) | 30.1% (99) | 20.1% (66) | 29.5% (97) | 0.6% (2) | 7.3% (24) | 4.6% (15)
gpt-5.3-codex-xhigh | ini | easy | Yes | 100.0% (98/98) | 0.26h | 1495 | $2.57 | 99 | 5.1% (5) | 0.0% (0) | 40.4% (40) | 29.3% (29) | 25.3% (25) | 0.0% (0) | 0.0% (0) | 0.0% (0)
gpt-5.3-codex-xhigh | jq | hard | Yes | 100.0% (218/218) | 1.1h | 4914 | $11.41 | 435 | 2.8% (12) | 0.0% (0) | 32.0% (139) | 20.7% (90) | 22.5% (98) | 0.2% (1) | 0.0% (0) | 21.8% (95)
gpt-5.3-codex-xhigh | lua | hard | Yes | 100.0% (137/137) | 0.93h | 9625 | $7.09 | 215 | 4.2% (9) | 0.0% (0) | 42.3% (91) | 35.8% (77) | 16.7% (36) | 0.0% (0) | 0.0% (0) | 0.9% (2)
gpt-5.3-codex-xhigh | protobuf | easy | Yes | 100.0% (141/141) | 0.19h | 1590 | $2.05 | 109 | 13.8% (15) | 0.0% (0) | 46.8% (51) | 24.8% (27) | 12.8% (14) | 1.8% (2) | 0.0% (0) | 0.0% (0)
gpt-5.3-codex-xhigh | pug | medium | Yes | 100.0% (251/251) | 3.5h | 2709 | $22.69 | 708 | 5.5% (39) | 0.0% (0) | 35.6% (252) | 18.4% (130) | 15.0% (106) | 1.0% (7) | 13.6% (96) | 11.0% (78)
gpt-5.3-codex-xhigh | python | hard | Yes | 100.0% (653/653) | 2.3h | 7953 | $13.86 | 419 | 5.3% (22) | 0.0% (0) | 56.1% (235) | 20.3% (85) | 12.6% (53) | 0.2% (1) | 2.9% (12) | 2.6% (11)
gpt-5.3-codex-xhigh | r6rs | hard | Yes | 100.0% (1362/1362) | 2.6h | 3751 | $13.72 | 377 | 8.2% (31) | 0.0% (0) | 46.2% (174) | 12.5% (47) | 15.9% (60) | 1.6% (6) | 1.6% (6) | 14.1% (53)
gpt-5.3-codex-xhigh | toml | medium | Yes | 100.0% (733/733) | 0.88h | 2149 | $6.64 | 237 | 12.2% (29) | 0.0% (0) | 33.3% (79) | 14.3% (34) | 30.4% (72) | 3.4% (8) | 5.5% (13) | 0.8% (2)
gpt-5.3-codex-xhigh | uri | easy | Yes | 100.0% (138/138) | 0.25h | 1645 | $2.63 | 95 | 12.6% (12) | 0.0% (0) | 33.7% (32) | 25.3% (24) | 28.4% (27) | 0.0% (0) | 0.0% (0) | 0.0% (0)
gpt-5.3-codex-xhigh | url | medium | Yes | 100.0% (1220/1220) | 0.32h | 926 | $2.88 | 125 | 7.2% (9) | 0.0% (0) | 53.6% (67) | 16.0% (20) | 17.6% (22) | 0.8% (1) | 0.0% (0) | 4.8% (6)
gpt-5.3-codex-xhigh | wasm | medium | Yes | 100.0% (800/800) | 0.76h | 5085 | $8.01 | 263 | 10.3% (27) | 0.0% (0) | 42.6% (112) | 14.8% (39) | 16.3% (43) | 0.0% (0) | 4.2% (11) | 11.8% (31)
gpt-5.3-codex-xhigh | xml | medium | Yes | 100.0% (735/735) | 1.2h | 4504 | $10.57 | 339 | 5.9% (20) | 0.0% (0) | 30.1% (102) | 32.2% (109) | 22.4% (76) | 0.0% (0) | 8.0% (27) | 1.5% (5)
gpt-5.3-codex-xhigh | yaml | medium | Yes | 100.0% (345/345) | 0.90h | 856 | $47.64 | 288 | 6.6% (19) | 0.0% (0) | 50.0% (144) | 19.8% (57) | 17.7% (51) | 0.3% (1) | 0.7% (2) | 4.9% (14)
gpt-5.3-codex-xhigh | zip | medium | Yes | 100.0% (1089/1089) | 1.2h | 1258 | $10.83 | 330 | 3.3% (11) | 0.0% (0) | 43.6% (144) | 16.4% (54) | 19.1% (63) | 0.3% (1) | 0.0% (0) | 17.3% (57)
kimi-k2.5 | csv | easy | No | 96.9% (95/98) | 0.23h | 383 | N/A | 32 | 15.6% (5) | 9.4% (3) | 37.5% (12) | 21.9% (7) | 15.6% (5) | 0.0% (0) | 0.0% (0) | 0.0% (0)
kimi-k2.5 | git_object | easy | No | 75.9% (759/1000) | 2.1h | 1299 | N/A | 121 | 5.0% (6) | 5.8% (7) | 38.0% (46) | 30.6% (37) | 15.7% (19) | 0.8% (1) | 0.0% (0) | 4.1% (5)
kimi-k2.5 | hpack | easy | Yes | 100.0% (129/129) | 0.54h | 1961 | N/A | 160 | 6.9% (11) | 3.1% (5) | 20.0% (32) | 21.9% (35) | 32.5% (52) | 0.6% (1) | 0.0% (0) | 15.0% (24)
kimi-k2.5 | ini | easy | No | 83.7% (82/98) | 1.0h | 998 | N/A | 99 | 6.1% (6) | 0.0% (0) | 19.2% (19) | 31.3% (31) | 35.4% (35) | 0.0% (0) | 0.0% (0) | 8.1% (8)
kimi-k2.5 | protobuf | easy | No | 95.7% (135/141) | 0.85h | 1076 | N/A | 176 | 9.7% (17) | 2.3% (4) | 15.3% (27) | 26.1% (46) | 31.8% (56) | 0.6% (1) | 0.0% (0) | 14.2% (25)
kimi-k2.5 | uri | easy | Yes | 100.0% (138/138) | 1.2h | 1260 | N/A | 238 | 4.2% (10) | 3.4% (8) | 13.4% (32) | 27.3% (65) | 46.6% (111) | 0.4% (1) | 0.0% (0) | 4.6% (11)
qwen3-max-2026-01-23 | csv | easy | No | 0.0% (0/98) | 0.85h | 398 | $60.10 | 201 | 2.5% (5) | 1.5% (3) | 14.4% (29) | 60.2% (121) | 21.4% (43) | 0.0% (0) | 0.0% (0) | 0.0% (0)
qwen3-max-2026-01-23 | git_object | easy | No | 0.0% (0/1000) | 2.6h | 643 | $78.25 | 240 | 4.2% (10) | 4.6% (11) | 46.2% (111) | 30.4% (73) | 13.3% (32) | 0.0% (0) | 1.2% (3) | 0.0% (0)
qwen3-max-2026-01-23 | hpack | easy | No | 0.0% (0/129) | 0.52h | 656 | $23.51 | 97 | 30.9% (30) | 6.2% (6) | 7.2% (7) | 30.9% (30) | 24.7% (24) | 0.0% (0) | 0.0% (0) | 0.0% (0)
qwen3-max-2026-01-23 | ini | easy | No | 53.1% (52/98) | 1.3h | 568 | $23.66 | 149 | 8.1% (12) | 5.4% (8) | 33.6% (50) | 32.2% (48) | 20.8% (31) | 0.0% (0) | 0.0% (0) | 0.0% (0)
qwen3-max-2026-01-23 | protobuf | easy | No | 0.0% (0/141) | 6.9h | 1755 | $153.29 | 402 | 4.5% (18) | 3.5% (14) | 17.7% (71) | 49.3% (198) | 24.9% (100) | 0.0% (0) | 0.0% (0) | 0.2% (1)
qwen3-max-2026-01-23 | uri | easy | No | 30.4% (42/138) | 3.4h | 1080 | $29.57 | 101 | 9.9% (10) | 9.9% (10) | 22.8% (23) | 33.7% (34) | 21.8% (22) | 0.0% (0) | 2.0% (2) | 0.0% (0)
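
Each row above carries redundant information: the pass rate is printed next to its passed/total counts, and every action-category share is printed next to its raw count. The sketch below shows one way to parse a row and cross-check that arithmetic. It is illustrative only and not part of the SWE-AGI-Eval tooling; the names `parse_row` and `check_row`, the dictionary keys, and the 0.05-point rounding tolerance are assumptions made for this example. The parser accepts a row written on one line or spread over several lines, with or without `|` separators.

```python
import re

# Column names for the eight action-category cells, in table order.
CATEGORIES = [
    "Spec Understanding", "Planning", "Code Understanding", "Code Writing",
    "Debugging", "Hygiene", "External Search", "Other",
]

# Hypothetical pattern for one row after whitespace has been normalized.
ROW_RE = re.compile(
    r"(?P<model>\S+) (?P<task>\S+) (?P<difficulty>easy|medium|hard) "
    r"(?P<passed>Yes|No) (?P<rate>[\d.]+)% \((?P<ok>\d+)/(?P<total>\d+)\) "
    r"(?P<hours>[\d.]+)h (?P<loc>\d+) (?P<cost>\$[\d.]+|N/A) (?P<actions>\d+) "
    r"(?P<shares>(?:[\d.]+% \(\d+\) ?){8})"
)


def parse_row(text):
    """Parse one result row given as a single line or as several lines."""
    # Drop optional "|" separators and collapse all whitespace, newlines included.
    flat = " ".join(text.replace("|", " ").split())
    m = ROW_RE.fullmatch(flat)
    if m is None:
        raise ValueError(f"unrecognized row: {flat!r}")
    pairs = re.findall(r"([\d.]+)% \((\d+)\)", m["shares"])
    return {
        "model": m["model"],
        "task": m["task"],
        "difficulty": m["difficulty"],
        "passed": m["passed"] == "Yes",
        "pass_rate": float(m["rate"]),
        "tests_passed": int(m["ok"]),
        "tests_total": int(m["total"]),
        "hours": float(m["hours"]),
        "loc": int(m["loc"]),
        # "N/A" in the Cost column becomes None.
        "cost_usd": None if m["cost"] == "N/A" else float(m["cost"][1:]),
        "actions": int(m["actions"]),
        "categories": {
            name: (float(share), int(count))
            for name, (share, count) in zip(CATEGORIES, pairs)
        },
    }


def check_row(row, tol=0.05):
    """Return a list of mismatches between printed percentages and counts."""
    problems = []
    expected = 100.0 * row["tests_passed"] / row["tests_total"]
    if abs(expected - row["pass_rate"]) > tol + 1e-9:
        problems.append(f"pass rate {row['pass_rate']} vs computed {expected:.2f}")
    counted = sum(count for _, count in row["categories"].values())
    if counted != row["actions"]:
        problems.append(f"category counts sum to {counted}, not {row['actions']}")
    for name, (share, count) in row["categories"].items():
        if abs(100.0 * count / row["actions"] - share) > tol + 1e-9:
            problems.append(f"{name}: {share}% does not match {count} actions")
    return problems


if __name__ == "__main__":
    example = """claude-opus-4.5 c99 hard No
    45.3%
    (53/117)
    0.69h 4937 $20.39 290 11.0% (32) 5.2% (15) 34.8% (101) 20.7% (60) 25.9% (75) 0.7% (2) 0.0% (0) 1.7% (5)"""
    print(check_row(parse_row(example)))
```

An empty list from `check_row` means the printed percentages agree with the counts to within rounding. The check says nothing about whether the underlying counts, durations, or costs are themselves correct.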