# Model Comparison

Which model should I use? Quick recommendations below, then price tiers, then the full comparison table.
## Which model for what?

| Use case | Pick | Why |
|---|---|---|
| Complex coding / large refactors | Claude Opus 4.6 | Best SWE-bench (72.5%); 1M context reads entire codebases. |
| Daily coding assistant (fast + cheap) | Claude Sonnet 4.6 | Best speed/quality ratio: $3/M in, still 93.8% HumanEval. |
| Batch processing / pipelines | GPT-4.1 nano | $0.10/M in; route 1M messages/day for about $4 total. |
| Self-hosted / private (best open source) | DeepSeek V3.2 | 94% HumanEval, MIT license; cheapest API if you don't self-host. |
| Self-hosted on a single GPU | Qwen 3 30B | 3B active params run fast on an RTX 4090; hybrid thinking mode. |
| Hard math / science / reasoning | o3 | 96.7% AIME; use reasoning_effort='high'. DeepSeek R1 is the open-source alternative. |
| Budget reasoning | Grok 3 mini | $0.30/M in; reasoning at a fraction of o3's $2/M. |
| Long documents (500K+ tokens) | Gemini 2.5 Pro / Llama 4 Scout | 1M context (Gemini) or 10M (Scout, open source). |
| Multimodal (images + video) | Gemma 4 | Open source, Apache 2.0, native image + video; #3 on LMSYS. |
| RAG / search embeddings | OpenAI / Nomic | OpenAI for quality; Nomic for self-hosted, zero-cost. |
| Code completion in editor | Qwen 2.5 Coder 32B | 92.7% HumanEval, Apache 2.0; self-hosted Copilot replacement. |
| On AWS (via Bedrock) | Claude / Llama | Both families are available on Bedrock; no external API keys needed. |
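The recommendations above amount to a small routing table. A minimal sketch, assuming you label incoming tasks yourself; the task labels and `pick_model` helper are illustrative, not a real API, and the picks are read off this page:

```python
# Task-label -> recommended model, mirroring the use-case list above.
# Nothing here calls a real API; the labels are hypothetical.
PICKS = {
    "large_refactor":   "Claude Opus 4.6",    # best SWE-bench, 1M context
    "daily_coding":     "Claude Sonnet 4.6",  # speed/quality balance
    "batch_pipeline":   "GPT-4.1 nano",       # $0.10/M input
    "self_hosted":      "DeepSeek V3.2",      # open weights, 94% HumanEval
    "single_gpu":       "Qwen 3 30B",         # 3B active params
    "hard_math":        "o3",                 # pair with reasoning_effort='high'
    "budget_reasoning": "Grok 3 mini",        # $0.30/M input
}

def pick_model(task: str) -> str:
    """Return the recommended model for a task label, or a safe default."""
    # Fall back to the daily-driver pick for unrecognized labels.
    return PICKS.get(task, "Claude Sonnet 4.6")
```

In practice you would branch further on privacy requirements (self-host vs API) and budget before the task label, but a flat dict keeps the mapping auditable.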
## Price tiers (input per 1M tokens)

**Free / self-host:** Llama 4 Maverick · Llama 4 Scout · Llama 3.3 70B · DeepSeek R1 · DeepSeek V3 · DeepSeek V3.2 · Qwen 3 235B · Qwen 3 30B · Qwen 2.5 Coder 32B · Mistral Large 3 · Ministral 3 (3B/8B/14B) · Codestral 25.01 · Gemma 4 31B Dense · Gemma 4 27B MoE · Gemma 4 E4B · Gemma 3 27B · Phi-4

**< $1/M:** Claude Haiku 4.5 · GPT-4.1 mini · GPT-4.1 nano · Gemini 2.5 Flash · Grok 3 mini · Kimi K2.5 · Moonshot v1 (8K/32K/128K)

**$1–5/M:** Claude Opus 4.7 · Claude Sonnet 4.6 · GPT-4.1 · o3 · o4-mini · Gemini 2.5 Pro · Grok 3

**> $5/M:** Claude Opus 4.6
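To turn per-token prices into a budget, multiply traffic by the input rate. A minimal sketch, using input prices from this page (output tokens, usually priced higher, are ignored for simplicity):

```python
# Input prices in USD per 1M tokens, copied from the tiers above.
INPUT_PRICE_PER_M = {
    "GPT-4.1 nano": 0.10,
    "Grok 3 mini": 0.30,
    "Claude Sonnet 4.6": 3.00,
    "Claude Opus 4.6": 15.00,
}

def input_cost(model: str, msgs_per_day: int, tokens_per_msg: int, days: int = 30) -> float:
    """Input-token cost in USD for `days` of traffic at the given volume."""
    total_tokens = msgs_per_day * tokens_per_msg * days
    return total_tokens / 1_000_000 * INPUT_PRICE_PER_M[model]

# 1M messages/day at ~40 input tokens each on GPT-4.1 nano:
# 40M tokens/day x $0.10/M = $4/day -> the "$4 total" batch figure above.
```

The same traffic on Claude Opus 4.6 would cost 150x more per input token, which is why tier choice dominates pipeline cost.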
## Full comparison

| Model | Vendor | License | Context | Input price | HumanEval | SWE-bench | Strengths |
|---|---|---|---|---|---|---|---|
| Claude Haiku 4.5 | Anthropic | CLOSED | 200K tokens | $0.80/M | 88.5% | — | speed · cost · structured output |
| Claude Opus 4.6 | Anthropic | CLOSED | 1M tokens | $15/M | 95.4% | 72.5% | reasoning · long context · tool use |
| Claude Opus 4.7 | Anthropic | CLOSED | 1M tokens | $5/M | — | — | step-change over Opus 4.6 · agentic coding · new tokenizer · adaptive thinking |
| Claude Sonnet 4.6 | Anthropic | CLOSED | 200K tokens | $3/M | 93.8% | 65.2% | speed · cost-efficiency · coding |
| Codestral 25.01 | Mistral | OPEN | 256K tokens | self-host | 91.0% | — | code completion · FIM (fill-in-middle) · 80+ languages |
| DeepSeek R1 | DeepSeek | OPEN | 128K tokens | self-host | — | 49.2% | reasoning · math · coding |
| DeepSeek V3 | DeepSeek | OPEN | 128K tokens | self-host | 92.1% | — | coding · math · MoE 37B active / 671B total |
| DeepSeek V3.2 | DeepSeek | OPEN | 164K tokens | self-host | 94.0% | — | coding · math · sparse attention (DSA) |
| Gemini 2.5 Flash | Google | CLOSED | 1M tokens | $0.15/M | — | — | speed · cost · long context |
| Gemini 2.5 Pro | Google | CLOSED | 1M tokens | $1.25/M | — | 63.8% | multimodal · long context · search grounding |
| Gemma 3 27B | Google | OPEN | 128K tokens | self-host | 78.0% | — | compact · multimodal · runs on single GPU |
| Gemma 4 27B MoE | Google | OPEN | 128K tokens | self-host | — | — | MoE efficiency · multimodal · images + video |
| Gemma 4 31B Dense | Google | OPEN | 256K tokens | self-host | — | — | multimodal · images + video · 35+ languages |
| Gemma 4 E4B | Google | OPEN | 128K tokens | self-host | — | — | tiny · on-device · multimodal + audio |
| GPT-4.1 | OpenAI | CLOSED | 1M tokens | $2/M | 95.3% | 54.6% | coding · instruction following · long context |
| GPT-4.1 mini | OpenAI | CLOSED | 1M tokens | $0.40/M | 92.5% | 28.8% | cost · speed · long context |
| GPT-4.1 nano | OpenAI | CLOSED | 1M tokens | $0.10/M | — | — | ultra-cheap · fast · classification |
| Grok 3 | xAI | CLOSED | 128K tokens | $3/M | — | — | reasoning · real-time data · math |
| Grok 3 mini | xAI | CLOSED | 128K tokens | $0.30/M | — | — | fast reasoning · very cheap · math |
| Kimi K2.5 | Moonshot AI | CLOSED | 256K tokens | $0.55/M | — | — | reasoning · multimodal · cheap |
| Llama 3.3 70B | Meta | OPEN | 128K tokens | self-host | 88.4% | — | mature ecosystem · fine-tuning friendly · wide hardware support |
| Llama 4 Maverick | Meta | OPEN | 1M tokens | self-host | 84.8% | — | multilingual · multimodal · MoE architecture |
| Llama 4 Scout | Meta | OPEN | 10M tokens | self-host | — | — | longest context (10M) · MoE 17B active / 109B total · fits single H100 |
| Ministral 3 (3B/8B/14B) | Mistral | OPEN | 128K tokens | self-host | — | — | edge-friendly · vision · dense |
| Mistral Large 3 | Mistral | OPEN | 256K tokens | self-host | — | — | MoE 41B active / 675B total · multilingual · function calling |
| Moonshot v1 (8K/32K/128K) | Moonshot AI | CLOSED | 8K / 32K / 128K tokens | $0.14/M | — | — | very cheap · no hidden reasoning · reliable JSON |
| o3 | OpenAI | CLOSED | 200K tokens | $2/M | — | 69.1% | reasoning · math · science |
| o4-mini | OpenAI | CLOSED | 200K tokens | $1.10/M | — | 68.1% | reasoning · coding · cost-efficient reasoning |
| Phi-4 | Microsoft | OPEN | 16K tokens | self-host | — | — | 14B params · STEM reasoning · MIT license |
| Qwen 2.5 Coder 32B | Alibaba | OPEN | 128K tokens | self-host | 92.7% | — | code completion · code generation · Apache 2.0 |
| Qwen 3 235B | Alibaba | OPEN | 128K tokens | self-host | 90.2% | — | hybrid thinking · MoE 22B active · Apache 2.0 |
| Qwen 3 30B | Alibaba | OPEN | 128K tokens | self-host | — | — | MoE 3B active / 30B total · runs on consumer GPU · hybrid thinking |
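The context-window column is often the hard constraint for long-document work, so it is worth checking before price. A minimal sketch that pre-filters models by window size, using token counts copied from the table above (the `reserve` headroom for the reply is an arbitrary illustrative default):

```python
# Context windows in tokens, copied from the comparison table above
# (a representative subset, not the full table).
CONTEXT_WINDOW = {
    "Claude Sonnet 4.6": 200_000,
    "Claude Opus 4.6": 1_000_000,
    "Gemini 2.5 Pro": 1_000_000,
    "Llama 4 Scout": 10_000_000,
    "Phi-4": 16_000,
}

def models_that_fit(doc_tokens: int, reserve: int = 4_000) -> list[str]:
    """Models whose window holds the document plus `reserve` tokens of headroom."""
    return sorted(m for m, w in CONTEXT_WINDOW.items() if doc_tokens + reserve <= w)

# A 500K-token document rules out every 200K-and-under window,
# leaving only the 1M+ models from this subset.
```

This kind of filter composes naturally with a price sort: fit first, then pick the cheapest survivor.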