AI Models

52 models · 1 new in 60d

Compare →
  • Claude Opus 4.7New

    Anthropic · 1M tokens · $5/M → $25/M

    Best for: Most capable generally available model. Complex multi-step coding, long agentic workflows, 1M-token codebase reads.

    How: client.messages.create(model='claude-opus-4-7', ...). Adaptive thinking is on by default — no separate extended-thinking mode needed.

    Example: Use Claude Code CLI with --model claude-opus-4-7 to handle PR-sized refactors end-to-end in a single run.

    SWE-bench step-change over Opus 4.6Context 1M (~555k words)
    agentic codingnew tokenizeradaptive thinking1M context128k max output

    API: api.anthropic.com (model: claude-opus-4-7) · AWS Bedrock · GCP Vertex AI · Microsoft Foundry

    Step-change improvement in agentic coding vs Opus 4.6. New tokenizer means 1M tokens ≈ 555k words (vs 750k for Sonnet 4.6).

  • Kimi K2.5

    Moonshot AI · 256K tokens · $0.55/M → $2.19/M

    Best for: Budget alternative to flagship models, Chinese language tasks

    How: OpenAI SDK with base_url='https://api.moonshot.ai/v1'. WARNING: has implicit reasoning that eats max_tokens.

    Example: Use moonshot-v1-8k instead for structured JSON tasks — kimi-k2.5 wastes tokens on hidden thinking.

    reasoningmultimodalcheap

    API: api.moonshot.ai — OpenAI-compatible

    Watch:hidden thinking burns tokenstemperature locked to 1
  • Claude Opus 4.6

    Anthropic · 1M tokens · $15/M → $75/M

    Best for: Complex multi-step coding, large codebase refactors, long-document analysis

    How: Best via Claude Code CLI for coding tasks. For API: messages.create() with system prompt + tools.

    Example: claude-code: point it at a repo, describe the feature, it reads/edits/tests autonomously.

    SWE-bench 72.5%GPQA Diamond 74.9%HumanEval 95.4%
    reasoninglong contexttool useagentic workflowscode generation

    API: api.anthropic.com — SDK: pip install anthropic / npm i @anthropic-ai/sdk

  • Claude Sonnet 4.6

    Anthropic · 200K tokens · $3/M → $15/M

    Best for: Production API backends, real-time chat, moderate complexity coding

    How: Drop-in replacement for Opus when you need faster/cheaper. Same API, just change model ID.

    Example: Use as the default model in your API gateway — upgrade to Opus only for hard problems.

    SWE-bench 65.2%HumanEval 93.8%
    speedcost-efficiencycodingtool use

    API: api.anthropic.com — same SDK as Opus

  • GPT-4.1

    OpenAI · 1M tokens · $2/M → $8/M

    Best for: General-purpose API integration, multimodal apps, coding assistance

    How: client.chat.completions.create(model='gpt-4.1', messages=[...]). Supports vision, tools, JSON mode.

    Example: Build a PR review bot that reads diffs + screenshots and posts comments.

    SWE-bench 54.6%HumanEval 95.3%
    codinginstruction followinglong contextmultimodal

    API: api.openai.com — SDK: pip install openai / npm i openai

  • Gemini 2.5 Pro

    Google · 1M tokens · $1.25/M → $10/M

    Best for: Long-document analysis, multimodal tasks, apps needing search grounding

    How: client.models.generate_content(model='gemini-2.5-pro', contents=[...]). Supports grounding with Google Search.

    Example: Feed a 200-page architecture doc and ask it to find security issues.

    SWE-bench 63.8%GPQA Diamond 67.2%
    multimodallong contextsearch groundingcode generation

    API: generativelanguage.googleapis.com — SDK: pip install google-genai

  • Grok 3

    xAI · 128K tokens · $3/M → $15/M

    Best for: Tasks needing real-time information, math-heavy problems

    How: OpenAI SDK with base_url override. Also supports live search via tools.

    Example: Monitor real-time tech news and generate summaries using live search.

    GPQA Diamond 68.2%AIME 2024 93.3%
    reasoningreal-time datamath

    API: api.x.ai — OpenAI-compatible SDK. Set base_url='https://api.x.ai/v1'