AI Models

186 models · 0 new in 60d

Compare →
  • Gemini 2.5 Flash

    Google · 1M tokens · $0.15/M → $0.60/M

    Best for: High-volume processing, real-time apps, budget-conscious pipelines

    How: Set thinking_budget to control reasoning cost. 0 = no thinking, 24576 = max.

    Example: Summarize 1000 GitHub issues per hour for a triage dashboard at ~$1.

    speedcostlong contextthinking budget control

    API: Same SDK as Gemini Pro. model='gemini-2.5-flash-preview-05-20'

  • Claude Haiku 4.5

    Anthropic · 200K tokens · $0.80/M → $4/M

    Best for: Pipelines, batch processing, structured data extraction, routing

    How: Use for high-volume, low-complexity tasks: classification, extraction, summarization.

    Example: Process 10K support tickets per hour to classify priority and extract entities.

    HumanEval 88.5%
    speedcoststructured outputclassification

    API: api.anthropic.com — same SDK

  • GPT-4.1 mini

    OpenAI · 1M tokens · $0.40/M → $1.60/M

    Best for: Embeddings preprocessing, log parsing, lightweight generation

    How: Same API as GPT-4.1. Best for high-volume, simple tasks where cost matters.

    Example: Parse 50K structured logs per hour and extract error patterns.

    SWE-bench 28.8%HumanEval 92.5%
    costspeedlong context

    API: api.openai.com — same SDK

  • GPT-4.1 nano

    OpenAI · 1M tokens · $0.10/M → $0.40/M

    Best for: Intent classification, entity extraction at massive scale

    How: Use for routing, tagging, simple extraction where quality bar is lower.

    Example: Route 1M incoming messages per day to the right service for $4 total.

    ultra-cheapfastclassification

    API: api.openai.com — same SDK

  • Moonshot v1 (8K/32K/128K)

    Moonshot AI · 8K / 32K / 128K tokens · $0.14/M → $0.28/M

    Best for: Batch processing, structured extraction, JSON pipelines

    How: Best for structured output tasks. Supports response_format: json_object. No reasoning overhead.

    Example: Process RSS feeds into structured summaries for pennies per 1000 articles.

    very cheapno hidden reasoningreliable JSON

    API: api.moonshot.ai — OpenAI-compatible. model='moonshot-v1-8k'

  • Gemma 4 QAT

    Google · 128K tokens · api

    Best for: Mobile and laptop applications requiring efficient AI models

    How: Integrate Gemma 4 QAT models into your application for on-device AI processing

    Example: Use Gemma 4 QAT for image recognition on smartphones with low latency and power consumption

    Optimizing compression for mobile and laptop efficiency

    Auto-discovered from news articles.

  • Phi-4-mini

    Microsoft · api

    Best for: expanding on-device AI capabilities in Microsoft Edge

    How: use Prompt and Writing Assistance APIs in Microsoft Edge

    Example: integrated with Microsoft Edge for on-device AI tasks

    on-device AInew models and APIs for the web

    Auto-discovered from news articles.