AI Models

186 models · 4 new in 60d

Compare →
  • Claude Opus 4.7

    Anthropic · 1M tokens · $5/M → $25/M

    Best for: Most capable generally available model. Complex multi-step coding, long agentic workflows, 1M-token codebase reads.

    How: client.messages.create(model='claude-opus-4-7', ...). Adaptive thinking is on by default — no separate extended-thinking mode needed.

    Example: Use Claude Code CLI with --model claude-opus-4-7 to handle PR-sized refactors end-to-end in a single run.

    SWE-bench step-change over Opus 4.6Context 1M (~555k words)
    agentic codingnew tokenizeradaptive thinking1M context128k max output

    API: api.anthropic.com (model: claude-opus-4-7) · AWS Bedrock · GCP Vertex AI · Microsoft Foundry

    Step-change improvement in agentic coding vs Opus 4.6. New tokenizer means 1M tokens ≈ 555k words (vs 750k for Sonnet 4.6).

  • Seedance 2.0 Pro

    ByteDance · N/A · credit-based → per-second

    Best for: Cost-sensitive Chinese-market video, fast iteration on social shorts, longer narrative clips.

    How: Two tiers — Seedance 2.0 Pro for quality and Seedance 2.0 Lite for fast/cheap drafts. Both expose text-to-video and image-to-video; v2 adds longer shot length and stronger prompt adherence over v1.

    Example: POST volcengineapi.com/seedance/v2/videos { prompt: 'a hummingbird in flight, slow motion', mode: 'pro' }

    high-fidelity 1080p (with 4K up-res)longer coherent shotsimproved motion physicsLite tier for cheap iteration

    API: Volcengine / ByteDance API

    Successor to Seedance 1.0 (2025-06). ByteDance's competitor to Sora / Veo / Kling. The Lite tier remains notably cheaper than competitors at comparable quality.

  • Sora 2

    OpenAI · N/A · see openai.com/pricing → per-second tiered

    Best for: Marketing reels, b-roll, storyboard previz, social-media shorts.

    How: Generate up to 60s clips from a text prompt or seed image. Audio and lip-sync included.

    Example: client.videos.generate(model='sora-2', prompt='aerial shot of a coastal city at sunrise, 1080p, 10s')

    high-fidelity 1080p videorealistic motion physicslong-shot consistencyaudio + dialogue generation

    API: api.openai.com — client.videos.generate(model='sora-2')

    Successor to Sora 1 — adds native audio and longer coherent shots.

  • Kimi K2.5

    Moonshot AI · 256K tokens · $0.55/M → $2.19/M

    Best for: Budget alternative to flagship models, Chinese language tasks

    How: OpenAI SDK with base_url='https://api.moonshot.ai/v1'. WARNING: has implicit reasoning that eats max_tokens.

    Example: Use moonshot-v1-8k instead for structured JSON tasks — kimi-k2.5 wastes tokens on hidden thinking.

    reasoningmultimodalcheap

    API: api.moonshot.ai — OpenAI-compatible

    Watch:hidden thinking burns tokenstemperature locked to 1
  • Claude Opus 4.6

    Anthropic · 1M tokens · $15/M → $75/M

    Best for: Complex multi-step coding, large codebase refactors, long-document analysis

    How: Best via Claude Code CLI for coding tasks. For API: messages.create() with system prompt + tools.

    Example: claude-code: point it at a repo, describe the feature, it reads/edits/tests autonomously.

    SWE-bench 72.5%GPQA Diamond 74.9%HumanEval 95.4%
    reasoninglong contexttool useagentic workflowscode generation

    API: api.anthropic.com — SDK: pip install anthropic / npm i @anthropic-ai/sdk

  • Claude Sonnet 4.6

    Anthropic · 200K tokens · $3/M → $15/M

    Best for: Production API backends, real-time chat, moderate complexity coding

    How: Drop-in replacement for Opus when you need faster/cheaper. Same API, just change model ID.

    Example: Use as the default model in your API gateway — upgrade to Opus only for hard problems.

    SWE-bench 65.2%HumanEval 93.8%
    speedcost-efficiencycodingtool use

    API: api.anthropic.com — same SDK as Opus

  • Gemini 2.5 Flash

    Google · 1M tokens · $0.15/M → $0.60/M

    Best for: High-volume processing, real-time apps, budget-conscious pipelines

    How: Set thinking_budget to control reasoning cost. 0 = no thinking, 24576 = max.

    Example: Summarize 1000 GitHub issues per hour for a triage dashboard at ~$1.

    speedcostlong contextthinking budget control

    API: Same SDK as Gemini Pro. model='gemini-2.5-flash-preview-05-20'

  • Veo 3

    Google · N/A · Vertex AI pricing → per-second tiered

    Best for: Photoreal cinematic clips, ad creative, talking-head shorts with audio.

    How: Vertex AI: generate(model='veo-3.0-generate-preview', prompt='...'). Gemini API exposes the same model.

    Example: ai.models.generate_videos(model='veo-3.0-generate-preview', prompt='timelapse of a city under heavy rain')

    1080p / 4K up-ressynchronized audiostrong prompt adherence8s native, longer with stitching

    API: Vertex AI / Gemini API — model: veo-3.0-generate-preview

  • Claude Haiku 4.5

    Anthropic · 200K tokens · $0.80/M → $4/M

    Best for: Pipelines, batch processing, structured data extraction, routing

    How: Use for high-volume, low-complexity tasks: classification, extraction, summarization.

    Example: Process 10K support tickets per hour to classify priority and extract entities.

    HumanEval 88.5%
    speedcoststructured outputclassification

    API: api.anthropic.com — same SDK

  • GPT-4.1

    OpenAI · 1M tokens · $2/M → $8/M

    Best for: General-purpose API integration, multimodal apps, coding assistance

    How: client.chat.completions.create(model='gpt-4.1', messages=[...]). Supports vision, tools, JSON mode.

    Example: Build a PR review bot that reads diffs + screenshots and posts comments.

    SWE-bench 54.6%HumanEval 95.3%
    codinginstruction followinglong contextmultimodal

    API: api.openai.com — SDK: pip install openai / npm i openai

  • GPT-4.1 mini

    OpenAI · 1M tokens · $0.40/M → $1.60/M

    Best for: Embeddings preprocessing, log parsing, lightweight generation

    How: Same API as GPT-4.1. Best for high-volume, simple tasks where cost matters.

    Example: Parse 50K structured logs per hour and extract error patterns.

    SWE-bench 28.8%HumanEval 92.5%
    costspeedlong context

    API: api.openai.com — same SDK

  • GPT-4.1 nano

    OpenAI · 1M tokens · $0.10/M → $0.40/M

    Best for: Intent classification, entity extraction at massive scale

    How: Use for routing, tagging, simple extraction where quality bar is lower.

    Example: Route 1M incoming messages per day to the right service for $4 total.

    ultra-cheapfastclassification

    API: api.openai.com — same SDK

  • o3

    OpenAI · 200K tokens · $2/M → $8/M

    Best for: Hard math, science, multi-step planning, complex debugging

    How: Use reasoning_effort param: 'low'/'medium'/'high'. No system prompt — use developer message instead.

    Example: Debug a distributed system deadlock by feeding it the full trace + architecture.

    GPQA Diamond 79.7%AIME 2024 96.7%SWE-bench 69.1%
    reasoningmathscienceplanning

    API: api.openai.com — same SDK, just model='o3'

  • o4-mini

    OpenAI · 200K tokens · $1.10/M → $4.40/M

    Best for: Coding with reasoning, moderate-complexity math, budget reasoning

    How: Cheaper reasoning model. Use when o3 is overkill but you need chain-of-thought.

    Example: Generate a migration plan for a database schema change with safety checks.

    AIME 2024 93.4%SWE-bench 68.1%
    reasoningcodingcost-efficient reasoning

    API: api.openai.com — same SDK

  • GPT-Image-1

    OpenAI · N/A · $5/M tokens → $40/M tokens

    Best for: UI mockups, marketing assets, diagrams with text

    How: Supports text overlays, inpainting, and style control. Best text rendering of any model.

    Example: Generate architecture diagrams with accurate labels from a text description.

    text renderinginstruction followingediting

    API: api.openai.com — client.images.generate(model='gpt-image-1')

  • Kling 2.1

    Kuaishou · N/A · credit-based → per-second

    Best for: Cost-sensitive video generation, dance / sports content.

    How: Cheaper alternative to Sora/Veo with strong human-motion fidelity.

    Example: POST klingai.com/v1/videos/text2video { prompt: '...', duration: 10 }

    realistic human motionlong shot generationcompetitive quality at lower price

    API: klingai.com — REST API

  • Gemini 2.5 Pro

    Google · 1M tokens · $1.25/M → $10/M

    Best for: Long-document analysis, multimodal tasks, apps needing search grounding

    How: client.models.generate_content(model='gemini-2.5-pro', contents=[...]). Supports grounding with Google Search.

    Example: Feed a 200-page architecture doc and ask it to find security issues.

    SWE-bench 63.8%GPQA Diamond 67.2%
    multimodallong contextsearch groundingcode generation

    API: generativelanguage.googleapis.com — SDK: pip install google-genai

  • Runway Gen-4

    Runway · N/A · credit-based → per-second

    Best for: Short narrative content where the same character appears in multiple scenes.

    How: Pass a reference image and a prompt; returns a 5–10s clip with consistent characters across shots.

    Example: POST runwayml.com/v1/image_to_video { promptImage: ..., promptText: '...' }

    character & object consistency across shotsimage-to-videolip-synccreator workflow

    API: runwayml.com — REST API + web app

  • Grok 3

    xAI · 128K tokens · $3/M → $15/M

    Best for: Tasks needing real-time information, math-heavy problems

    How: OpenAI SDK with base_url override. Also supports live search via tools.

    Example: Monitor real-time tech news and generate summaries using live search.

    GPQA Diamond 68.2%AIME 2024 93.3%
    reasoningreal-time datamath

    API: api.x.ai — OpenAI-compatible SDK. Set base_url='https://api.x.ai/v1'

  • Grok 3 mini

    xAI · 128K tokens · $0.30/M → $0.50/M

    Best for: Budget reasoning tasks, math, lightweight chain-of-thought

    How: Excellent cost-to-reasoning ratio. Use reasoning_effort param.

    Example: Validate Terraform plans with reasoning about dependency chains for pennies.

    fast reasoningvery cheapmath

    API: api.x.ai — same as Grok 3

  • Pika 2.2

    Pika Labs · N/A · credit-based → per-clip

    Best for: Social shorts, music videos, rapid creative iteration.

    How: Strong for short, stylized clips and quick iteration. Pikaframes lets you set start/end frames.

    Example: Use Pikaframes: upload start + end image, prompt the in-between motion.

    fast iterationpikaframes (keyframe interpolation)lipsync

    API: pika.art — web app + API

  • Flux.1 Pro

    Black Forest Labs · N/A · $0.05/image → N/A

    Best for: High-quality image generation, product photography

    How: API or self-host Flux.1 Schnell (open). Pro via API only.

    Example: Generate product mockups for landing pages programmatically.

    photorealismprompt adherencecommercial license

    API: api.bfl.ml OR via Replicate, fal.ai

  • Moonshot v1 (8K/32K/128K)

    Moonshot AI · 8K / 32K / 128K tokens · $0.14/M → $0.28/M

    Best for: Batch processing, structured extraction, JSON pipelines

    How: Best for structured output tasks. Supports response_format: json_object. No reasoning overhead.

    Example: Process RSS feeds into structured summaries for pennies per 1000 articles.

    very cheapno hidden reasoningreliable JSON

    API: api.moonshot.ai — OpenAI-compatible. model='moonshot-v1-8k'

  • text-embedding-3-large

    OpenAI · 8K tokens · $0.13/M → N/A

    Best for: RAG pipelines, semantic search, document retrieval

    How: Set dimensions param to reduce size (e.g., 256 for fast search, 3072 for max quality).

    Example: Index your internal docs and build a search API with pgvector + this model.

    3072 dimensionsstrong retrievalmatryoshka support

    API: api.openai.com — client.embeddings.create(model='text-embedding-3-large')

  • ESM2

    NVIDIA · 128K tokens · api

    Best for: computational biology tasks

    How: Fine-tune ESM2 using NVIDIA BioNeMo recipes

    Example: Fine-tuning ESM2 with LoRA for specific protein tasks

    protein language understandinggenomic sequences

    Auto-discovered from news articles.

  • NVIDIA BioNeMo

    NVIDIA · N/A · api

    Best for: Computational biology tasks

    How: Use NVIDIA BioNeMo recipes for fine-tuning

    Example: Fine-tuning ESM2 protein language models

    Fine-tuning biological foundation modelsPretrained on massive corpora of protein or genomic sequences

    Auto-discovered from news articles.

  • Ryzen AI Halo

    AMD · N/A · api

    Best for: petite PC development

    How: work with either Microsoft Windows or Linux

    Example: use in AI development platforms

    Linux-friendlypowered by AMD Ryzen AI Max+

    Auto-discovered from news articles.

  • Claude Code

    Anthropic · 128K tokens · api

    Best for: use in infrastructure management tasks

    How: connect AI to your infrastructure through the Model Context Protocol (MCP)

    Example: AI assistants like GitHub Copilot, IBM Bob, Claude Code etc. to interact with Terraform through the Model Context Protocol (MCP)

    interacts with Terraformsupports infrastructure management

    Auto-discovered from news articles.

  • DiffusionGemma

    NVIDIA · 128K tokens · api

    Best for: real-time AI applications such as chat assistants, copilots, and agentic workflows

    How: Run DiffusionGemma on NVIDIA for high-throughput text generation

    Example: Developers can leverage DiffusionGemma for building real-time AI applications

    Developer-ReadyHigh-ThroughputText Generation

    Auto-discovered from news articles.

  • Claude Mythos 5New

    Anthropic · 1M tokens · →

    Best for: Available through Project Glasswing. Successor to Claude Mythos Preview.

    How: client.messages.create({model: "claude-mythos-5", messages: [...]})

    Example: Use via the Anthropic SDK with model='claude-mythos-5'.

    1M tokens contextadaptive thinking128k tokens max output

    API: api.anthropic.com — model: claude-mythos-5 · AWS Bedrock · GCP Vertex AI

    Max output: 128k tokens. Adaptive thinking enabled by default.

  • Claude Fable 5New

    Anthropic · 1M tokens · →

    Best for: Anthropic's most capable widely released model, for the most demanding reasoning and long-horizon agentic work

    How: client.messages.create({model: "claude-fable-5", messages: [...]})

    Example: Use via the Anthropic SDK with model='claude-fable-5'.

    1M tokens contextadaptive thinking128k tokens max outputagentic coding

    API: api.anthropic.com — model: claude-fable-5 · AWS Bedrock · GCP Vertex AI

    Max output: 128k tokens. Adaptive thinking enabled by default.

  • NVIDIA Nemotron Speech

    NVIDIA · api

    Best for: Training speech AI models for clinical applications

    How: Evaluate Clinical ASR Models Faster with Agent Skills and NVIDIA Nemotron Speech

    Example: Training a speech AI model to correctly recognize drug names like Acetaminophen, Amlodipine

    Recognizing or synthesizing clinical terminology

    Auto-discovered from news articles.

  • Google Gemini modelsNew

    Google · 128K tokens · api

    Best for: AI applications

    How: integrate with Apple's new AI architecture

    Example: use in AI-powered applications

    AI architectureinnovative

    Auto-discovered from news articles.

  • Nemotron 3 Ultra

    NVIDIA · api

    Best for: maintaining context and completing tasks across many turns

    How: deploy on Renesas RZ/V series for production

    Example: use in chatbots evolving into long-running agents

    faster reasoningmore efficientlong-running agents

    Auto-discovered from news articles.

  • Claude Opus 4.8New

    Anthropic · 1M tokens · $5/M → $25/M

    Best for: Anthropic's most capable Opus-tier model for complex reasoning and agentic coding

    How: client.messages.create({model: "claude-opus-4-8", messages: [...]})

    Example: Use via the Anthropic SDK with model='claude-opus-4-8'.

    1M tokens contextadaptive thinking128k tokens max outputagentic coding

    API: api.anthropic.com — model: claude-opus-4-8 · AWS Bedrock · GCP Vertex AI

    Max output: 128k tokens. Adaptive thinking enabled by default.

  • NVIDIA Nemotron 3 Ultra

    NVIDIA · api

    Best for: Maintaining context and efficiency across many turns

    How: Integrate with existing chatbot frameworks to enhance long-running agent capabilities

    Example: Use Nemotron 3 Ultra to power a chatbot that can reason and maintain context over multiple interactions

    Faster reasoningMore efficient for long-running agents

    Auto-discovered from news articles.

  • Gemma 4 QAT

    Google · 128K tokens · api

    Best for: Mobile and laptop applications requiring efficient AI models

    How: Integrate Gemma 4 QAT models into your application for on-device AI processing

    Example: Use Gemma 4 QAT for image recognition on smartphones with low latency and power consumption

    Optimizing compression for mobile and laptop efficiency

    Auto-discovered from news articles.

  • Phi-4-mini

    Microsoft · api

    Best for: expanding on-device AI capabilities in Microsoft Edge

    How: use Prompt and Writing Assistance APIs in Microsoft Edge

    Example: integrated with Microsoft Edge for on-device AI tasks

    on-device AInew models and APIs for the web

    Auto-discovered from news articles.

  • Mellum2

    JetBrains · api

    Best for: Advanced AI tasks

    How: Integrate Mellum2 into your AI workflows

    Example: Use Mellum2 for complex problem-solving and decision-making

    12B Mixture-of-Experts Model

    Auto-discovered from news articles.

  • NVIDIA Cosmos 3

    NVIDIA · N/A · api

    Best for: Developing Physical AI systems that need to understand and act within the real world

    How: Integrate NVIDIA Cosmos 3 into your Physical AI system to enable reasoning and action capabilities

    Example: Using NVIDIA Cosmos 3 to develop a robot that can understand and interact with its environment

    Physical AI reasoningAction modelsUnderstanding real world

    Auto-discovered from news articles.

  • Gemini 3.5

    Google · 128K tokens · api

    Best for: General AI applications

    How: Integrate with Google I/O 2026

    Example: Watch 9 videos showing the capabilities of Gemini 3.5

    Advanced capabilitiesHigh performance

    Auto-discovered from news articles.

  • Gemini Omni

    Google · 128K tokens · api

    Best for: General AI applications

    How: Integrate with Google I/O 2026

    Example: Watch 9 videos showing the capabilities of Gemini Omni

    Advanced capabilitiesHigh performance

    Auto-discovered from news articles.

  • NVIDIA Blackwell

    NVIDIA · 128K tokens · api

    Best for: financial trading landscape

    How: Enables sophisticated analysis

    Example: revolutionizing financial trading landscape

    sophisticated analysisvast amounts of unstructured data

    Auto-discovered from news articles.

  • ChatGPT

    OpenAI · 128K tokens · api

    Best for: conversational AI and content generation in Portuguese

    How: Use ChatGPT API to integrate with applications

    Example: Generate news articles in Portuguese

    dialoguecontent creationinformation retrieval

    Auto-discovered from news articles.

  • NVIDIA Cloud Partner (NCP) reference architecture

    NVIDIA · N/A · api

    Best for: governments, enterprises, and telcos

    How: N/A

    Example: N/A

    sovereign AI factoriesbased on NCP reference architecture

    Auto-discovered from news articles.

  • Gordon

    Docker · api

    Best for: container workflow management

    How: Integrate Gordon with Docker Desktop

    Example: Gordon proposes fixes and takes action across your entire Docker workflow

    understands environmentproposes fixestakes action across Docker workflow

    Auto-discovered from news articles.

  • Mythos

    Cloudflare · N/A · api

    Best for: analyzing live code across critical parts of infrastructure

    How: Point Mythos at live code to observe its strengths and weaknesses

    Example: Mythos was used to analyze live code across critical parts of Cloudflare's infrastructure

    security-focusedcode analysis

    Auto-discovered from news articles.

  • NVIDIA Vera Rubin Platform

    NVIDIA · 128K tokens · api

    Best for: Agentic inference workloads

    How: Integrate with NVIDIA's platform for inference

    Example: Use for non-deterministic trajectories in AI

    Solving Agentic AI’s Scale-Up ProblemRuntime dynamics of inference workloads

    Auto-discovered from news articles.

  • NVIDIA DLSS 4.5

    NVIDIA · N/A · api

    Best for: AI-powered game development

    How: Integrate NVIDIA DLSS 4.5 with Unreal Engine 5

    Example: Game developers can enhance game performance and visuals

    Dynamic Multi Frame GenerationMulti Frame Generation 6Xsecond-generation RTX

    Auto-discovered from news articles.

  • Seedance 1.0 Pro

    ByteDance · N/A · credit-based → per-second

    Best for: Cost-sensitive Chinese-market video, fast iteration on social shorts.

    How: Two tiers: Seedance 1.0 Pro for top quality and Seedance 1.0 Lite for fast/cheap drafts. Both expose text-to-video and image-to-video.

    Example: POST volcengineapi.com/seedance/v1/videos { prompt: 'a hummingbird in flight, slow motion', mode: 'pro' }

    fast generation1080p outputstrong prompt adherenceLite variant for cheap iteration

    API: Volcengine / ByteDance API

    ByteDance's competitor to Sora / Veo / Kling. Lite tier is notably cheaper than competitors at similar quality.

  • NVIDIA Nemotron 3 Nano Omni

    NVIDIA · api

    Best for: multimodal agent reasoning in a single efficient open model

    How: Run NVIDIA Nemotron 3 Nano Omni locally in a single command

    Example: reasoning across screens, documents, audio, video, and text within a single perception-to-action loop

    understand and reason across video, audio, images, and language

    Auto-discovered from news articles.

  • DeepSeek-V4-Flash

    DeepSeek · api

    Best for: enabling highly efficient operations

    How: Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints

    Example: DeepSeek just launched its fourth generation of flagship models

    highly efficient

    Auto-discovered from news articles.

  • DeepSeek-V4-Pro

    DeepSeek · api

    Best for: enabling highly efficient operations

    How: Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints

    Example: DeepSeek just launched its fourth generation of flagship models

    highly efficient

    Auto-discovered from news articles.

  • Google TPU 8th Generation

    Google · N/A · api

    Best for: powering AI applications

    How: Deploy Google's 8th generation TPUs for your AI workloads

    Example: Use the new TPUs for training and inference in AI applications

    specialized chipsfuture of AI

    Auto-discovered from news articles.

  • Google TPUv8

    Google · N/A · api

    Best for: AI acceleration

    How: deploy Google TPUv8 in your cloud environment

    Example: use Google TPUv8 for AI model training and inference

    specialized chipspower the future of AI

    Auto-discovered from news articles.

  • Google's 8th generation TPU

    Google AI · N/A · api

    Best for: AI acceleration

    How: Deploy Google's 8th generation TPU for AI workloads.

    Example: Use the TPU for training and inference of AI models.

    specialized chipspower the future of AI

    Auto-discovered from news articles.

  • GPT-5.5

    OpenAI · 128K tokens · api

    Best for: coding, research, and data analysis

    How: Integrate GPT-5.5 into your tools for advanced tasks.

    Example: Use GPT-5.5 for coding assistance or data analysis.

    fastermore capablecomplex tasks

    Auto-discovered from news articles.

  • Google's eighth generation TPU

    Google · N/A · api

    Best for: AI applications requiring high-performance computing

    How: deploy on Google Cloud to leverage the new TPU capabilities

    Example: use for training and inference of large AI models

    powering the future of AItwo specialized chips

    Auto-discovered from news articles.

  • OpenAI Privacy Filter

    OpenAI · api

    Best for: text privacy and compliance

    How: Integrate into text processing workflows

    Example: Automatically redact sensitive information from documents

    detecting and redacting PIIstate-of-the-art accuracy

    Auto-discovered from news articles.

  • Google's TPU (eighth generation)

    Google · api

    Best for: AI acceleration

    How: Deploy in Google Cloud for AI tasks

    Example: Use for training and inference in AI applications

    specialized chipspower the future of AI

    Auto-discovered from news articles.

  • Codex

    OpenAI · 128K tokens · api

    Best for: enterprises to deploy and scale Codex

    How: partner with Accenture, PwC, Infosys, and others

    Example: help enterprises deploy and scale Codex across the software development lifecycle

    deploy and scale across the software development lifecycle

    Auto-discovered from news articles.

  • GPT-Rosalind

    OpenAI · N/A · api

    Best for: life sciences research

    How: N/A

    Example: N/A

    accelerate drug discoverygenomics analysisprotein reasoningscientific research workflows

    Auto-discovered from news articles.