AI Models
52 models · 1 new in 60d
- ▾Claude Opus 4.7New
Anthropic · 1M tokens · $5/M → $25/M
Best for: Most capable generally available model. Complex multi-step coding, long agentic workflows, 1M-token codebase reads.
How: client.messages.create(model='claude-opus-4-7', ...). Adaptive thinking is on by default — no separate extended-thinking mode needed.
Example: Use Claude Code CLI with --model claude-opus-4-7 to handle PR-sized refactors end-to-end in a single run.
SWE-bench step-change over Opus 4.6Context 1M (~555k words)agentic codingnew tokenizeradaptive thinking1M context128k max outputAPI: api.anthropic.com (model: claude-opus-4-7) · AWS Bedrock · GCP Vertex AI · Microsoft Foundry
Step-change improvement in agentic coding vs Opus 4.6. New tokenizer means 1M tokens ≈ 555k words (vs 750k for Sonnet 4.6).
- ▾Kimi K2.5
Moonshot AI · 256K tokens · $0.55/M → $2.19/M
Best for: Budget alternative to flagship models, Chinese language tasks
How: OpenAI SDK with base_url='https://api.moonshot.ai/v1'. WARNING: has implicit reasoning that eats max_tokens.
Example: Use moonshot-v1-8k instead for structured JSON tasks — kimi-k2.5 wastes tokens on hidden thinking.
reasoningmultimodalcheapAPI: api.moonshot.ai — OpenAI-compatible
Watch:hidden thinking burns tokenstemperature locked to 1 - ▾Claude Opus 4.6
Anthropic · 1M tokens · $15/M → $75/M
Best for: Complex multi-step coding, large codebase refactors, long-document analysis
How: Best via Claude Code CLI for coding tasks. For API: messages.create() with system prompt + tools.
Example: claude-code: point it at a repo, describe the feature, it reads/edits/tests autonomously.
SWE-bench 72.5%GPQA Diamond 74.9%HumanEval 95.4%reasoninglong contexttool useagentic workflowscode generationAPI: api.anthropic.com — SDK: pip install anthropic / npm i @anthropic-ai/sdk
- ▾Claude Sonnet 4.6
Anthropic · 200K tokens · $3/M → $15/M
Best for: Production API backends, real-time chat, moderate complexity coding
How: Drop-in replacement for Opus when you need faster/cheaper. Same API, just change model ID.
Example: Use as the default model in your API gateway — upgrade to Opus only for hard problems.
SWE-bench 65.2%HumanEval 93.8%speedcost-efficiencycodingtool useAPI: api.anthropic.com — same SDK as Opus
- ▾Gemini 2.5 Flash
Google · 1M tokens · $0.15/M → $0.60/M
Best for: High-volume processing, real-time apps, budget-conscious pipelines
How: Set thinking_budget to control reasoning cost. 0 = no thinking, 24576 = max.
Example: Summarize 1000 GitHub issues per hour for a triage dashboard at ~$1.
speedcostlong contextthinking budget controlAPI: Same SDK as Gemini Pro. model='gemini-2.5-flash-preview-05-20'
- ▾Claude Haiku 4.5
Anthropic · 200K tokens · $0.80/M → $4/M
Best for: Pipelines, batch processing, structured data extraction, routing
How: Use for high-volume, low-complexity tasks: classification, extraction, summarization.
Example: Process 10K support tickets per hour to classify priority and extract entities.
HumanEval 88.5%speedcoststructured outputclassificationAPI: api.anthropic.com — same SDK
- ▾GPT-4.1
OpenAI · 1M tokens · $2/M → $8/M
Best for: General-purpose API integration, multimodal apps, coding assistance
How: client.chat.completions.create(model='gpt-4.1', messages=[...]). Supports vision, tools, JSON mode.
Example: Build a PR review bot that reads diffs + screenshots and posts comments.
SWE-bench 54.6%HumanEval 95.3%codinginstruction followinglong contextmultimodalAPI: api.openai.com — SDK: pip install openai / npm i openai
- ▾GPT-4.1 mini
OpenAI · 1M tokens · $0.40/M → $1.60/M
Best for: Embeddings preprocessing, log parsing, lightweight generation
How: Same API as GPT-4.1. Best for high-volume, simple tasks where cost matters.
Example: Parse 50K structured logs per hour and extract error patterns.
SWE-bench 28.8%HumanEval 92.5%costspeedlong contextAPI: api.openai.com — same SDK
- ▾GPT-4.1 nano
OpenAI · 1M tokens · $0.10/M → $0.40/M
Best for: Intent classification, entity extraction at massive scale
How: Use for routing, tagging, simple extraction where quality bar is lower.
Example: Route 1M incoming messages per day to the right service for $4 total.
ultra-cheapfastclassificationAPI: api.openai.com — same SDK
- ▾o3
OpenAI · 200K tokens · $2/M → $8/M
Best for: Hard math, science, multi-step planning, complex debugging
How: Use reasoning_effort param: 'low'/'medium'/'high'. No system prompt — use developer message instead.
Example: Debug a distributed system deadlock by feeding it the full trace + architecture.
GPQA Diamond 79.7%AIME 2024 96.7%SWE-bench 69.1%reasoningmathscienceplanningAPI: api.openai.com — same SDK, just model='o3'
- ▾o4-mini
OpenAI · 200K tokens · $1.10/M → $4.40/M
Best for: Coding with reasoning, moderate-complexity math, budget reasoning
How: Cheaper reasoning model. Use when o3 is overkill but you need chain-of-thought.
Example: Generate a migration plan for a database schema change with safety checks.
AIME 2024 93.4%SWE-bench 68.1%reasoningcodingcost-efficient reasoningAPI: api.openai.com — same SDK
- ▾GPT-Image-1
OpenAI · N/A · $5/M tokens → $40/M tokens
Best for: UI mockups, marketing assets, diagrams with text
How: Supports text overlays, inpainting, and style control. Best text rendering of any model.
Example: Generate architecture diagrams with accurate labels from a text description.
text renderinginstruction followingeditingAPI: api.openai.com — client.images.generate(model='gpt-image-1')
- ▾Gemini 2.5 Pro
Google · 1M tokens · $1.25/M → $10/M
Best for: Long-document analysis, multimodal tasks, apps needing search grounding
How: client.models.generate_content(model='gemini-2.5-pro', contents=[...]). Supports grounding with Google Search.
Example: Feed a 200-page architecture doc and ask it to find security issues.
SWE-bench 63.8%GPQA Diamond 67.2%multimodallong contextsearch groundingcode generationAPI: generativelanguage.googleapis.com — SDK: pip install google-genai
- ▾Grok 3
xAI · 128K tokens · $3/M → $15/M
Best for: Tasks needing real-time information, math-heavy problems
How: OpenAI SDK with base_url override. Also supports live search via tools.
Example: Monitor real-time tech news and generate summaries using live search.
GPQA Diamond 68.2%AIME 2024 93.3%reasoningreal-time datamathAPI: api.x.ai — OpenAI-compatible SDK. Set base_url='https://api.x.ai/v1'
- ▾Grok 3 mini
xAI · 128K tokens · $0.30/M → $0.50/M
Best for: Budget reasoning tasks, math, lightweight chain-of-thought
How: Excellent cost-to-reasoning ratio. Use reasoning_effort param.
Example: Validate Terraform plans with reasoning about dependency chains for pennies.
fast reasoningvery cheapmathAPI: api.x.ai — same as Grok 3
- ▾Flux.1 Pro
Black Forest Labs · N/A · $0.05/image → N/A
Best for: High-quality image generation, product photography
How: API or self-host Flux.1 Schnell (open). Pro via API only.
Example: Generate product mockups for landing pages programmatically.
photorealismprompt adherencecommercial licenseAPI: api.bfl.ml OR via Replicate, fal.ai
- ▾Moonshot v1 (8K/32K/128K)
Moonshot AI · 8K / 32K / 128K tokens · $0.14/M → $0.28/M
Best for: Batch processing, structured extraction, JSON pipelines
How: Best for structured output tasks. Supports response_format: json_object. No reasoning overhead.
Example: Process RSS feeds into structured summaries for pennies per 1000 articles.
very cheapno hidden reasoningreliable JSONAPI: api.moonshot.ai — OpenAI-compatible. model='moonshot-v1-8k'
- ▾text-embedding-3-large
OpenAI · 8K tokens · $0.13/M → N/A
Best for: RAG pipelines, semantic search, document retrieval
How: Set dimensions param to reduce size (e.g., 256 for fast search, 3072 for max quality).
Example: Index your internal docs and build a search API with pgvector + this model.
3072 dimensionsstrong retrievalmatryoshka supportAPI: api.openai.com — client.embeddings.create(model='text-embedding-3-large')
- ▾GPT-Rosalind
OpenAI · N/A · api
Best for: life sciences research
How: N/A
Example: N/A
accelerate drug discoverygenomics analysisprotein reasoningscientific research workflowsAuto-discovered from news articles.