OpenClaw Model Rankings

Top 10 of 87 qualifying models

Score weights: 45% Terminal-Bench, 40% Capability, 15% Cost
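
As a rough check on how these weights combine, the sketch below assumes the OpenClaw score is a plain weighted sum of three component scores on a 0–100 scale. The names `openclaw_score` and `implied_cost_score` are hypothetical, and the page does not publish a per-model cost score, so the example only backs out the cost component that the weights would imply from the figures listed for GPT-5.4 (xhigh).

```python
# Weights as stated on the page: 45% Terminal-Bench, 40% Capability, 15% Cost.
WEIGHTS = {"terminal_bench": 0.45, "capability": 0.40, "cost": 0.15}


def openclaw_score(terminal_bench: float, capability: float, cost: float) -> float:
    """Weighted sum of three 0-100 component scores (assumed combination rule)."""
    return (
        WEIGHTS["terminal_bench"] * terminal_bench
        + WEIGHTS["capability"] * capability
        + WEIGHTS["cost"] * cost
    )


def implied_cost_score(total: float, terminal_bench: float, capability: float) -> float:
    """Solve the weighted sum for the cost component, which the page does not list."""
    partial = (
        WEIGHTS["terminal_bench"] * terminal_bench
        + WEIGHTS["capability"] * capability
    )
    return (total - partial) / WEIGHTS["cost"]


# GPT-5.4 (xhigh): OpenClaw 73.6, Terminal-Bench score 100.0, Capability 53.4.
cost = implied_cost_score(73.6, 100.0, 53.4)
print(round(cost, 1))                               # implied, not published
print(round(openclaw_score(100.0, 53.4, cost), 1))  # round-trips to the listed score
```

The implied cost component is only as reliable as the weighted-sum assumption; treat it as a consistency check, not as a value reported by the leaderboard.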

#	Model	Provider	OpenClaw Score	Intelligence	Coding	Capability	Terminal-Bench Hard	Terminal-Bench Score	TTFT (s)	Speed Score	Run Cost ($)	Input ($/1M)	Output ($/1M)	Value
#1	GPT-5.4 (xhigh)	OpenAI	73.6	51.4	71.1	53.4	57.6%	100.0	97.04	1.0	0.090	2.500	15.000	14.6
#2	GLM-5.2 (max)	Z AI	72.6	51.1	68.8	52.9	50.8%	87.7	0.856	93.0	0.034	1.400	4.400	27.7
#3	Gemini 3.1 Pro Preview	Google	70.2	46.5	68.8	48.7	53.8%	93.2	20.86	18.0	0.072	2.000	12.000	14.6
#4	GPT-5.4 mini (xhigh)	OpenAI	70.0	40.0	56.1	41.6	52.3%	90.4	10.63	37.2	0.027	0.750	4.500	23.9
#5	Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)	Anthropic	69.8	59.9	76.5	61.6	62.9%	100.0	46.22	1.0	0.320	10.000	50.000	8.2
#6	Claude Opus 4.8 (Adaptive Reasoning, Max Effort)	Anthropic	69.6	55.7	74.3	57.6	58.3%	100.0	19.19	20.4	0.160	5.000	25.000	11.3
#7	Qwen3.7 Max	Alibaba	68.4	46.0	66.0	48.0	50.8%	87.7	1.59	82.9	0.060	2.500	7.500	16.2
#8	DeepSeek V4 Pro (Reasoning, Max Effort)	DeepSeek	68.4	44.3	59.4	45.8	46.2%	79.5	1.16	88.3	0.0087	0.435	0.870	61.5
#9	GPT-5.5 (xhigh)	OpenAI	67.9	54.8	74.9	56.8	60.6%	100.0	32.93	4.6	0.180	5.000	30.000	10.2
#10	Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)	Anthropic	67.6	47.2	63.0	48.8	53.0%	91.8	41.22	1.0	0.096	3.000	15.000	11.7
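
The page does not define how the Capability figure is derived. The ten rows above are, however, consistent with a 90/10 blend of the Intelligence (Int) and Coding (Code) values. The sketch below checks that reading; the blend weights and the name `capability_blend` are inferences from the table, not a documented formula.

```python
# (model, Int, Code, Capability) copied from the top-10 table.
ROWS = [
    ("GPT-5.4 (xhigh)", 51.4, 71.1, 53.4),
    ("GLM-5.2 (max)", 51.1, 68.8, 52.9),
    ("Gemini 3.1 Pro Preview", 46.5, 68.8, 48.7),
    ("GPT-5.4 mini (xhigh)", 40.0, 56.1, 41.6),
    ("Claude Fable 5", 59.9, 76.5, 61.6),
    ("Claude Opus 4.8", 55.7, 74.3, 57.6),
    ("Qwen3.7 Max", 46.0, 66.0, 48.0),
    ("DeepSeek V4 Pro", 44.3, 59.4, 45.8),
    ("GPT-5.5 (xhigh)", 54.8, 74.9, 56.8),
    ("Claude Sonnet 4.6", 47.2, 63.0, 48.8),
]


def capability_blend(intelligence: float, coding: float) -> float:
    """Assumed blend: 90% Intelligence, 10% Coding (inferred, not documented)."""
    return 0.9 * intelligence + 0.1 * coding


for model, intelligence, coding, listed in ROWS:
    blended = round(capability_blend(intelligence, coding), 1)
    status = "matches" if blended == listed else "differs"
    print(f"{model}: blend {blended} vs listed {listed} ({status})")
```

A match on ten rows does not prove that the site uses this formula; other weightings that round to the same values cannot be ruled out from this sample alone.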

Scores (0–100) are percentile-normalized across all qualifying models; they are not raw benchmark percentages. A standard run is 12,000 input tokens plus 4,000 output tokens. Data via Artificial Analysis.
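
The Run Cost figures can be reproduced from the per-1M token prices and the standard run size. The sketch below is a minimal illustration; the function name `run_cost` is hypothetical, and it assumes cost is simply tokens multiplied by the listed price, with no caching discounts or reasoning-token surcharges.

```python
def run_cost(
    input_price_per_m: float,
    output_price_per_m: float,
    input_tokens: int = 12_000,
    output_tokens: int = 4_000,
) -> float:
    """Dollar cost of one run, given prices in dollars per 1M tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Prices taken from the leaderboard rows.
print(f"GPT-5.4 (xhigh): ${run_cost(2.5, 15.0):.3f}")     # listed as $0.090
print(f"GLM-5.2 (max):   ${run_cost(1.4, 4.4):.3f}")      # listed as $0.034
print(f"DeepSeek V4 Pro: ${run_cost(0.435, 0.87):.4f}")   # listed as $0.0087
```

Because output tokens are priced several times higher than input tokens for most models here, the 4,000 output tokens account for half or more of the run cost in every row, despite being a quarter of the total token count.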

Value Map

The best models cluster in the top-left of the chart: high Intelligence Index, low run cost. Bubble size scales with the Terminal-Bench Hard score.


Best models for OpenClaw usage