AI Models

186 models · 54 new in 60d

Compare →
  • Claude Opus 4.7

    Anthropic · 1M tokens · $5/M → $25/M

    Best for: Most capable generally available model. Complex multi-step coding, long agentic workflows, 1M-token codebase reads.

    How: client.messages.create(model='claude-opus-4-7', ...). Adaptive thinking is on by default — no separate extended-thinking mode needed.

    Example: Use Claude Code CLI with --model claude-opus-4-7 to handle PR-sized refactors end-to-end in a single run.

    SWE-bench step-change over Opus 4.6Context 1M (~555k words)
    agentic codingnew tokenizeradaptive thinking1M context128k max output

    API: api.anthropic.com (model: claude-opus-4-7) · AWS Bedrock · GCP Vertex AI · Microsoft Foundry

    Step-change improvement in agentic coding vs Opus 4.6. New tokenizer means 1M tokens ≈ 555k words (vs 750k for Sonnet 4.6).

  • Gemma 4 31B DenseOpen

    Google · 256K tokens · self-host

    Best for: Self-hosted multimodal production, commercial use, multilingual apps

    How: Dense 31B — fits on a single A100 or 2x RTX 4090. Apache 2.0 = fully commercial. Supports images and video natively.

    Example: Deploy as a private multimodal assistant that reads screenshots, logs, and video clips.

    LMSYS Arena #3 textMMLU ~82%
    multimodalimages + video35+ languagesApache 2.0dense architecture
    Hardware to self-host
    VRAM: 20GB (quantized) / 62GB (FP16)
    GPU: 1× A100 80GB or 2× RTX 4090 24GB
    RAM: 32GB+ system RAM

    31B dense. Native multimodal (images + video) increases compute cost vs text-only.

    API: Ollama, vLLM, Hugging Face, Vertex AI. ollama run gemma4:31b

    Brand new (Apr 2026). Ranked #3 on LMSYS Arena text leaderboard at launch.

  • Gemma 4 27B MoEOpen

    Google · 128K tokens · self-host

    Best for: Faster self-hosted inference, cost-efficient multimodal

    How: MoE variant — faster inference than the 31B dense. Same multimodal capabilities.

    Example: Process image-based monitoring alerts faster than the dense variant at the same quality.

    LMSYS Arena #6 text
    MoE efficiencymultimodalimages + videoApache 2.0
    Hardware to self-host
    VRAM: 18GB (quantized) / 54GB (FP16)
    GPU: RTX 4090 24GB or 1× A100 40GB
    RAM: 32GB+ system RAM

    27B total MoE — faster inference than the 31B dense thanks to sparse activations.

    API: Ollama, vLLM, Hugging Face. ollama run gemma4:27b-moe

  • Gemma 4 E4BOpen

    Google · 128K tokens · self-host

    Best for: Edge, mobile, IoT, on-device AI with multimodal input

    How: 4B params — runs on any hardware. Supports images, video, AND native audio input.

    Example: Run on a Raspberry Pi to process security camera feeds with voice commands.

    tinyon-devicemultimodal + audioApache 2.0
    Hardware to self-host
    VRAM: 3GB (quantized) / 8GB (FP16)
    GPU: Any — CPU, phone, Jetson, Raspberry Pi 5, integrated GPU
    RAM: 4-8GB system RAM

    4B params. Edge-first design: runs on phones, SBCs, IoT devices.

    API: Ollama, Hugging Face. Runs on phones and Raspberry Pi.

  • DeepSeek V3.2Open

    DeepSeek · 164K tokens · self-host

    Best for: Long-context coding, upgraded V3 deployments

    How: Drop-in upgrade from V3. Uses Dynamic Sparse Attention for better long-context performance.

    Example: Feed your entire microservice codebase and get cross-service dependency analysis.

    HumanEval 94.0%
    codingmathsparse attention (DSA)MIT licenseimproved context
    Hardware to self-host
    VRAM: 350GB (quantized)
    GPU: 8× H100 80GB
    RAM: 512GB+ system RAM

    Same hardware footprint as V3 — 671B with sparse attention.

    API: api.deepseek.com OR self-host via vLLM. Same OpenAI-compatible API.

  • Mistral Large 3Open

    Mistral · 256K tokens · self-host

    Best for: European deployments, agent workflows, long-context multilingual apps

    How: Major upgrade from Large 2. MoE architecture with 41B active params. Same API, just change model ID.

    Example: Build a multi-tool agent that queries DBs, calls APIs, and generates reports in 30+ languages.

    MoE 41B active / 675B totalmultilingualfunction calling256K context
    Hardware to self-host
    VRAM: 350GB (quantized)
    GPU: 8× H100 80GB
    RAM: 512GB+ system RAM

    675B MoE (41B active). Datacenter class — most users go via api.mistral.ai.

    API: api.mistral.ai OR self-host via vLLM. OpenAI-compatible.

  • Ministral 3 (3B/8B/14B)Open

    Mistral · 128K tokens · self-host

    Best for: Edge deployment, on-device AI, lightweight vision tasks

    How: 3B fits on phones, 8B on laptops, 14B on dev GPUs. All have vision support.

    Example: Run 8B on a Jetson to classify manufacturing defects from camera feeds.

    edge-friendlyvisiondense3 sizes
    Hardware to self-host
    VRAM: 2GB (3B) / 6GB (8B) / 10GB (14B quantized)
    GPU: Phone/CPU (3B) · Laptop GPU (8B) · RTX 3060+ (14B)
    RAM: 8-16GB system RAM

    All three sizes are dense with vision. 3B runs on phones, 8B on laptops, 14B on dev GPUs.

    API: Ollama, vLLM, Hugging Face. Also on Mistral API.

  • Seedance 2.0 Pro

    ByteDance · N/A · credit-based → per-second

    Best for: Cost-sensitive Chinese-market video, fast iteration on social shorts, longer narrative clips.

    How: Two tiers — Seedance 2.0 Pro for quality and Seedance 2.0 Lite for fast/cheap drafts. Both expose text-to-video and image-to-video; v2 adds longer shot length and stronger prompt adherence over v1.

    Example: POST volcengineapi.com/seedance/v2/videos { prompt: 'a hummingbird in flight, slow motion', mode: 'pro' }

    high-fidelity 1080p (with 4K up-res)longer coherent shotsimproved motion physicsLite tier for cheap iteration

    API: Volcengine / ByteDance API

    Successor to Seedance 1.0 (2025-06). ByteDance's competitor to Sora / Veo / Kling. The Lite tier remains notably cheaper than competitors at comparable quality.

  • Sora 2

    OpenAI · N/A · see openai.com/pricing → per-second tiered

    Best for: Marketing reels, b-roll, storyboard previz, social-media shorts.

    How: Generate up to 60s clips from a text prompt or seed image. Audio and lip-sync included.

    Example: client.videos.generate(model='sora-2', prompt='aerial shot of a coastal city at sunrise, 1080p, 10s')

    high-fidelity 1080p videorealistic motion physicslong-shot consistencyaudio + dialogue generation

    API: api.openai.com — client.videos.generate(model='sora-2')

    Successor to Sora 1 — adds native audio and longer coherent shots.

  • Wan 2.2Open

    Alibaba · N/A · self-host

    Best for: Best-in-class open-source video. The 5B variant runs on a single 24GB consumer card.

    How: ComfyUI nodes ship official support. Or `python generate.py --task t2v-A14B --prompt '...'` from the WanX repo.

    Example: python generate.py --task t2v-A14B --prompt 'a corgi running on the beach at sunset' --resolution 720P

    MoE video architectureopen weightsT2V + I2V5B small variant for consumer GPUs720p output
    Hardware to self-host
    VRAM: 24GB (5B variant) / 80GB (A14B)
    GPU: RTX 4090 for 5B · H100 for A14B
    RAM: 32–64GB system RAM

    5B model fits a single 4090. A14B MoE delivers Sora-class quality but needs an H100 or 2× 4090 with offload.

    API: huggingface.co/Wan-AI/Wan2.2-T2V-A14B · Wan-AI/Wan2.2-T2V-5B

  • Kimi K2.5

    Moonshot AI · 256K tokens · $0.55/M → $2.19/M

    Best for: Budget alternative to flagship models, Chinese language tasks

    How: OpenAI SDK with base_url='https://api.moonshot.ai/v1'. WARNING: has implicit reasoning that eats max_tokens.

    Example: Use moonshot-v1-8k instead for structured JSON tasks — kimi-k2.5 wastes tokens on hidden thinking.

    reasoningmultimodalcheap

    API: api.moonshot.ai — OpenAI-compatible

    Watch:hidden thinking burns tokenstemperature locked to 1
  • Claude Opus 4.6

    Anthropic · 1M tokens · $15/M → $75/M

    Best for: Complex multi-step coding, large codebase refactors, long-document analysis

    How: Best via Claude Code CLI for coding tasks. For API: messages.create() with system prompt + tools.

    Example: claude-code: point it at a repo, describe the feature, it reads/edits/tests autonomously.

    SWE-bench 72.5%GPQA Diamond 74.9%HumanEval 95.4%
    reasoninglong contexttool useagentic workflowscode generation

    API: api.anthropic.com — SDK: pip install anthropic / npm i @anthropic-ai/sdk

  • Claude Sonnet 4.6

    Anthropic · 200K tokens · $3/M → $15/M

    Best for: Production API backends, real-time chat, moderate complexity coding

    How: Drop-in replacement for Opus when you need faster/cheaper. Same API, just change model ID.

    Example: Use as the default model in your API gateway — upgrade to Opus only for hard problems.

    SWE-bench 65.2%HumanEval 93.8%
    speedcost-efficiencycodingtool use

    API: api.anthropic.com — same SDK as Opus

  • Gemini 2.5 Flash

    Google · 1M tokens · $0.15/M → $0.60/M

    Best for: High-volume processing, real-time apps, budget-conscious pipelines

    How: Set thinking_budget to control reasoning cost. 0 = no thinking, 24576 = max.

    Example: Summarize 1000 GitHub issues per hour for a triage dashboard at ~$1.

    speedcostlong contextthinking budget control

    API: Same SDK as Gemini Pro. model='gemini-2.5-flash-preview-05-20'

  • Veo 3

    Google · N/A · Vertex AI pricing → per-second tiered

    Best for: Photoreal cinematic clips, ad creative, talking-head shorts with audio.

    How: Vertex AI: generate(model='veo-3.0-generate-preview', prompt='...'). Gemini API exposes the same model.

    Example: ai.models.generate_videos(model='veo-3.0-generate-preview', prompt='timelapse of a city under heavy rain')

    1080p / 4K up-ressynchronized audiostrong prompt adherence8s native, longer with stitching

    API: Vertex AI / Gemini API — model: veo-3.0-generate-preview

  • Claude Haiku 4.5

    Anthropic · 200K tokens · $0.80/M → $4/M

    Best for: Pipelines, batch processing, structured data extraction, routing

    How: Use for high-volume, low-complexity tasks: classification, extraction, summarization.

    Example: Process 10K support tickets per hour to classify priority and extract entities.

    HumanEval 88.5%
    speedcoststructured outputclassification

    API: api.anthropic.com — same SDK

  • GPT-4.1

    OpenAI · 1M tokens · $2/M → $8/M

    Best for: General-purpose API integration, multimodal apps, coding assistance

    How: client.chat.completions.create(model='gpt-4.1', messages=[...]). Supports vision, tools, JSON mode.

    Example: Build a PR review bot that reads diffs + screenshots and posts comments.

    SWE-bench 54.6%HumanEval 95.3%
    codinginstruction followinglong contextmultimodal

    API: api.openai.com — SDK: pip install openai / npm i openai

  • GPT-4.1 mini

    OpenAI · 1M tokens · $0.40/M → $1.60/M

    Best for: Embeddings preprocessing, log parsing, lightweight generation

    How: Same API as GPT-4.1. Best for high-volume, simple tasks where cost matters.

    Example: Parse 50K structured logs per hour and extract error patterns.

    SWE-bench 28.8%HumanEval 92.5%
    costspeedlong context

    API: api.openai.com — same SDK

  • GPT-4.1 nano

    OpenAI · 1M tokens · $0.10/M → $0.40/M

    Best for: Intent classification, entity extraction at massive scale

    How: Use for routing, tagging, simple extraction where quality bar is lower.

    Example: Route 1M incoming messages per day to the right service for $4 total.

    ultra-cheapfastclassification

    API: api.openai.com — same SDK

  • o3

    OpenAI · 200K tokens · $2/M → $8/M

    Best for: Hard math, science, multi-step planning, complex debugging

    How: Use reasoning_effort param: 'low'/'medium'/'high'. No system prompt — use developer message instead.

    Example: Debug a distributed system deadlock by feeding it the full trace + architecture.

    GPQA Diamond 79.7%AIME 2024 96.7%SWE-bench 69.1%
    reasoningmathscienceplanning

    API: api.openai.com — same SDK, just model='o3'

  • o4-mini

    OpenAI · 200K tokens · $1.10/M → $4.40/M

    Best for: Coding with reasoning, moderate-complexity math, budget reasoning

    How: Cheaper reasoning model. Use when o3 is overkill but you need chain-of-thought.

    Example: Generate a migration plan for a database schema change with safety checks.

    AIME 2024 93.4%SWE-bench 68.1%
    reasoningcodingcost-efficient reasoning

    API: api.openai.com — same SDK

  • Llama 4 MaverickOpen

    Meta · 1M tokens · self-host

    Best for: Self-hosted production deployments, privacy-sensitive workloads

    How: ollama run llama4-maverick OR deploy on vLLM with tensor parallelism. Also available hosted on Together/Groq.

    Example: Deploy on 2x A100 GPUs behind your API gateway for private code review.

    MMLU 88.4%HumanEval 84.8%
    multilingualmultimodalMoE architecture17B active / 400B total
    Hardware to self-host
    VRAM: 200GB (quantized)
    GPU: 2× H100 80GB or 4× A100 80GB
    RAM: 256GB system RAM

    400B total params (17B active). FP16 needs ~800GB, FP8 ~400GB, INT4 ~200GB.

    API: Self-host via vLLM, Ollama, or use via Together, Fireworks, Groq

  • Llama 4 ScoutOpen

    Meta · 10M tokens · self-host

    Best for: Processing entire codebases, very long documents, single-GPU deployments

    How: Fits on a single H100. Best open model for extreme context lengths.

    Example: Feed your entire monorepo into context and ask about cross-service dependencies.

    MMLU 86.2%
    longest context (10M)MoE 17B active / 109B totalfits single H100
    Hardware to self-host
    VRAM: 80GB
    GPU: 1× H100 80GB
    RAM: 128GB system RAM

    17B active params, fits in a single H100 at FP8.

    API: Same as Maverick — vLLM, Ollama, Together, Fireworks

  • Qwen 3 235BOpen

    Alibaba · 128K tokens · self-host

    Best for: Flexible thinking control, commercial self-hosting, multilingual

    How: Supports /think and /no_think tags to toggle reasoning on/off per request. Apache 2.0 = fully commercial.

    Example: Use /no_think for fast classification, /think for complex debugging — same model.

    AIME 2024 85.7%HumanEval 90.2%
    hybrid thinkingMoE 22B activeApache 2.0multilingual
    Hardware to self-host
    VRAM: 140GB (quantized)
    GPU: 4× A100 80GB or 2× H100
    RAM: 256GB+ system RAM

    235B total (22B active). MoE architecture — only 22B params active per forward pass.

    API: Self-host via vLLM/SGLang or use via Together, Fireworks. Also on Alibaba Cloud.

  • Qwen 3 30BOpen

    Alibaba · 128K tokens · self-host

    Best for: Local development, laptop-friendly reasoning, privacy

    How: Excellent for local dev. MoE means only 3B params active — fast on consumer hardware.

    Example: Run on your dev machine as a private coding assistant with reasoning.

    AIME 2024 66.7%
    MoE 3B active / 30B totalruns on consumer GPUhybrid thinking
    Hardware to self-host
    VRAM: 20GB (quantized) / 60GB (FP16)
    GPU: RTX 4090 24GB (quantized) or 1× A100
    RAM: 32GB+ system RAM

    30B total (3B active). The 3B active params make inference fast on consumer hardware.

    API: ollama run qwen3:30b — fits on RTX 4090 (24GB)

  • GPT-Image-1

    OpenAI · N/A · $5/M tokens → $40/M tokens

    Best for: UI mockups, marketing assets, diagrams with text

    How: Supports text overlays, inpainting, and style control. Best text rendering of any model.

    Example: Generate architecture diagrams with accurate labels from a text description.

    text renderinginstruction followingediting

    API: api.openai.com — client.images.generate(model='gpt-image-1')

  • Kling 2.1

    Kuaishou · N/A · credit-based → per-second

    Best for: Cost-sensitive video generation, dance / sports content.

    How: Cheaper alternative to Sora/Veo with strong human-motion fidelity.

    Example: POST klingai.com/v1/videos/text2video { prompt: '...', duration: 10 }

    realistic human motionlong shot generationcompetitive quality at lower price

    API: klingai.com — REST API

  • Gemini 2.5 Pro

    Google · 1M tokens · $1.25/M → $10/M

    Best for: Long-document analysis, multimodal tasks, apps needing search grounding

    How: client.models.generate_content(model='gemini-2.5-pro', contents=[...]). Supports grounding with Google Search.

    Example: Feed a 200-page architecture doc and ask it to find security issues.

    SWE-bench 63.8%GPQA Diamond 67.2%
    multimodallong contextsearch groundingcode generation

    API: generativelanguage.googleapis.com — SDK: pip install google-genai

  • Gemma 3 27BOpen

    Google · 128K tokens · self-host

    Best for: On-device/edge deployment, multimodal at small scale

    How: ollama run gemma3:27b. Fits on RTX 3090/4090. Good multimodal + tool use at small size.

    Example: Run on a dev server to process screenshots and generate bug reports.

    MMLU 75.6%HumanEval 78.0%
    compactmultimodalruns on single GPUfunction calling
    Hardware to self-host
    VRAM: 18GB (quantized) / 54GB (FP16)
    GPU: RTX 3090/4090 24GB or 1× A100 40GB
    RAM: 32GB+ system RAM

    27B dense. Fits on a single high-end consumer GPU with quantization.

    API: Ollama, vLLM, Hugging Face. Also on Vertex AI.

  • Runway Gen-4

    Runway · N/A · credit-based → per-second

    Best for: Short narrative content where the same character appears in multiple scenes.

    How: Pass a reference image and a prompt; returns a 5–10s clip with consistent characters across shots.

    Example: POST runwayml.com/v1/image_to_video { promptImage: ..., promptText: '...' }

    character & object consistency across shotsimage-to-videolip-synccreator workflow

    API: runwayml.com — REST API + web app

  • Grok 3

    xAI · 128K tokens · $3/M → $15/M

    Best for: Tasks needing real-time information, math-heavy problems

    How: OpenAI SDK with base_url override. Also supports live search via tools.

    Example: Monitor real-time tech news and generate summaries using live search.

    GPQA Diamond 68.2%AIME 2024 93.3%
    reasoningreal-time datamath

    API: api.x.ai — OpenAI-compatible SDK. Set base_url='https://api.x.ai/v1'

  • Grok 3 mini

    xAI · 128K tokens · $0.30/M → $0.50/M

    Best for: Budget reasoning tasks, math, lightweight chain-of-thought

    How: Excellent cost-to-reasoning ratio. Use reasoning_effort param.

    Example: Validate Terraform plans with reasoning about dependency chains for pennies.

    fast reasoningvery cheapmath

    API: api.x.ai — same as Grok 3

  • Nomic Embed Text v2-MoEOpen

    Nomic AI · 8K tokens · self-host

    Best for: Self-hosted RAG, privacy-first search, zero-cost embeddings

    How: Self-host for zero cost. Comparable quality to OpenAI embeddings.

    Example: Run alongside pgvector on the same server — full RAG pipeline with zero API costs.

    MoE embeddingmatryoshkaApache 2.0self-hostable
    Hardware to self-host
    VRAM: 2GB or CPU-only
    GPU: Any — runs on CPU at reasonable speed
    RAM: 4-8GB system RAM

    Tiny MoE embedding model. CPU inference is fast enough for most use cases.

    API: pip install nomic OR Ollama. Also hosted on Nomic Atlas.

  • Step-Video-T2VOpen

    StepFun · N/A · self-host

    Best for: Highest-quality open-source video model when you have the hardware to run it.

    How: Clone stepfun-ai/Step-Video-T2V repo, install requirements, run sample.py with your prompt.

    Example: python sample_video.py --prompt 'underwater coral reef, schools of fish' --num-frames 204 --resolution 544x992

    30B paramsMIT licensecompetitive with Sora-class qualitydeep compression video VAE
    Hardware to self-host
    VRAM: 80GB (FP16) / 40GB (FP8)
    GPU: H100 80GB · or 2× A100 40GB · or A100 40GB with FP8
    RAM: 128GB system RAM

    30B is genuinely heavy but the MIT license + quality tradeoff is uniquely permissive in the open-source video space.

    API: huggingface.co/stepfun-ai/stepvideo-t2v

  • DeepSeek R1Open

    DeepSeek · 128K tokens · self-host

    Best for: Budget reasoning, self-hosted chain-of-thought, research

    How: API is OpenAI-compatible. Self-host the 70B distill on 2x A100. MIT license = no restrictions.

    Example: Run the 14B distill locally for debugging complex distributed system issues.

    AIME 2024 79.8%SWE-bench 49.2%GPQA Diamond 71.5%
    reasoningmathcodingMIT licensedistillable
    Hardware to self-host
    VRAM: 10GB (14B distill) / 48GB (70B distill) / 1TB+ (full 671B)
    GPU: RTX 4090 (14B) · 2× A100 (70B) · 8× H100 (full)
    RAM: Full model needs 256GB+ system RAM

    Full 671B MoE is massive. Distilled versions (14B, 32B, 70B) are far more practical.

    API: api.deepseek.com ($0.55/M in, $2.19/M out) OR self-host via vLLM/Ollama

  • Codestral 25.01Open

    Mistral · 256K tokens · self-host

    Best for: Code completion, inline suggestions, editor integration

    How: Supports FIM for inline completion. Integrate with any editor via LSP or Continue.dev.

    Example: Deploy as your team's FIM-capable completion server behind an LSP proxy.

    HumanEval 91.0%
    code completionFIM (fill-in-middle)80+ languages
    Hardware to self-host
    VRAM: 16GB (quantized) / 45GB (FP16)
    GPU: RTX 4090 24GB or 1× A100 40GB
    RAM: 32GB+ system RAM

    22B dense. Fits on a single consumer GPU with quantization.

    API: codestral.mistral.ai — dedicated code endpoint

  • Llama 3.3 70BOpen

    Meta · 128K tokens · self-host

    Best for: Proven workhorse for self-hosted deployments, fine-tuning base

    How: ollama run llama3.3:70b. For production: vLLM on 2x A100 or 4x A10G.

    Example: Fine-tune on your internal docs for a private knowledge base chatbot.

    MMLU 86.0%HumanEval 88.4%
    mature ecosystemfine-tuning friendlywide hardware support
    Hardware to self-host
    VRAM: 40GB (4-bit) / 140GB (FP16)
    GPU: 2× A100 80GB or 4× A10G 24GB
    RAM: 64GB+ system RAM

    70B dense. Widely supported — runs on Ollama with quantization on 48GB VRAM.

    API: Ollama, vLLM, TGI, or hosted (Together $0.60/M, Groq, Fireworks)

  • DeepSeek V3Open

    DeepSeek · 128K tokens · self-host

    Best for: Cost-sensitive production APIs, coding tasks, math-heavy pipelines

    How: Cheapest top-tier API. OpenAI-compatible. Self-host needs 8x A100.

    Example: Replace GPT-4 in your CI pipeline for automated code review at 1/10th the cost.

    HumanEval 92.1%MMLU 88.5%
    codingmathMoE 37B active / 671B totalMIT license
    Hardware to self-host
    VRAM: 350GB (quantized) / 1.3TB (FP16)
    GPU: 8× H100 80GB or 8× A100 80GB
    RAM: 512GB+ system RAM

    671B total (37B active). Most users rent via API — self-hosting needs datacenter hardware.

    API: api.deepseek.com ($0.27/M in, $1.10/M out) OR self-host

  • Phi-4Open

    Microsoft · 16K tokens · self-host

    Best for: Edge deployment, STEM tasks, embedded AI in products

    How: ollama run phi4. MIT license — embed in commercial products freely.

    Example: Embed in a CI pipeline to validate config files and Terraform plans.

    GPQA Diamond 56.2%MATH 80.4%
    14B paramsSTEM reasoningMIT licenseruns on laptop
    Hardware to self-host
    VRAM: 9GB (quantized) / 28GB (FP16)
    GPU: Any 8GB+ GPU (RTX 3060, laptop 4050, etc.)
    RAM: 16GB system RAM

    14B dense. Runs locally on most developer laptops with quantization.

    API: Ollama, Hugging Face, Azure AI

  • Pika 2.2

    Pika Labs · N/A · credit-based → per-clip

    Best for: Social shorts, music videos, rapid creative iteration.

    How: Strong for short, stylized clips and quick iteration. Pikaframes lets you set start/end frames.

    Example: Use Pikaframes: upload start + end image, prompt the in-between motion.

    fast iterationpikaframes (keyframe interpolation)lipsync

    API: pika.art — web app + API

  • HunyuanVideoOpen

    Tencent · N/A · self-host

    Best for: Self-hosted video generation, research, building custom pipelines.

    How: Clone repo, install diffusers, run sample.py with your prompt. Or use ComfyUI workflows.

    Example: python sample_video.py --prompt 'a cat surfing at sunset' --video-length 129 --infer-steps 50

    fully open weights13B paramscompetitive qualityfine-tunable
    Hardware to self-host
    VRAM: 60GB
    GPU: H100 80GB or 2× RTX 4090 (with offload)
    RAM: 64GB system RAM

    Quantized to 8-bit fits on a single A100 40GB. Comfy workflows offload UNet to CPU at the cost of speed.

    API: huggingface.co/tencent/HunyuanVideo

  • Qwen 2.5 Coder 32BOpen

    Alibaba · 128K tokens · self-host

    Best for: Private code completion, self-hosted Copilot replacement

    How: ollama run qwen2.5-coder:32b. Plug into Continue.dev or Copilot alternatives.

    Example: Set up as your team's private code completion backend — zero data leaves your infra.

    HumanEval 92.7%LiveCodeBench 48.5%
    code completioncode generationApache 2.0
    Hardware to self-host
    VRAM: 20GB (quantized) / 64GB (FP16)
    GPU: RTX 4090 24GB or 1× A100 40GB
    RAM: 32GB+ system RAM

    32B dense. Fits on a single consumer GPU with 4-bit quantization.

    API: Ollama, vLLM, or hosted on Together/Fireworks

  • LTX-VideoOpen

    Lightricks · N/A · self-host

    Best for: When you need quick turnaround — prototypes, drafts, dataset generation. Speed-first open-source video.

    How: diffusers LTXPipeline. Generates a 5-second 768×512 clip in seconds on an H100; a few minutes on a 4090.

    Example: from diffusers import LTXPipeline; pipe = LTXPipeline.from_pretrained('Lightricks/LTX-Video'); pipe(prompt='falling autumn leaves').frames

    real-time generation on H1002B params (small footprint)fast iteration loop13B variant for higher quality
    Hardware to self-host
    VRAM: 12GB (2B) / 24GB (13B)
    GPU: RTX 3090 / 4090 for 2B · H100 for 13B
    RAM: 32GB system RAM

    Smallest/fastest of the open-source video models — great for iterating on prompts before committing GPU time to bigger models.

    API: huggingface.co/Lightricks/LTX-Video · Lightricks/LTX-Video-13B

  • Mochi 1Open

    Genmo · N/A · self-host

    Best for: Open-source video where commercial use matters — the Apache 2.0 license is unrestricted.

    How: Diffusers pipeline or the official genmoai/models repo. ComfyUI workflows are well-documented.

    Example: python -m mochi_preview.cli --prompt 'time lapse of a city street at golden hour' --num-frames 84

    Apache 2.0 (commercial-friendly)10B paramshigh motion fidelityactive community fine-tunes
    Hardware to self-host
    VRAM: 60GB (full precision) / 24GB with quantization
    GPU: H100 80GB · or RTX 4090 with FP8 quant + offload
    RAM: 64GB system RAM

    Memory-hungry at full precision but FP8 / GGUF quants from the community fit a single 4090.

    API: huggingface.co/genmo/mochi-1-preview

  • Flux.1 Pro

    Black Forest Labs · N/A · $0.05/image → N/A

    Best for: High-quality image generation, product photography

    How: API or self-host Flux.1 Schnell (open). Pro via API only.

    Example: Generate product mockups for landing pages programmatically.

    photorealismprompt adherencecommercial license

    API: api.bfl.ml OR via Replicate, fal.ai

  • CogVideoX-5BOpen

    THUDM / Zhipu AI · N/A · self-host

    Best for: Pioneer open-source T2V — solid baseline for self-hosted experimentation and fine-tuning.

    How: pip install diffusers; CogVideoXPipeline.from_pretrained('THUDM/CogVideoX-5b') and run with a text prompt.

    Example: from diffusers import CogVideoXPipeline; pipe = CogVideoXPipeline.from_pretrained('THUDM/CogVideoX-5b'); pipe(prompt='a panda playing piano').frames

    fully open weights5B paramsfits on a single 24GB carddiffusers integration
    Hardware to self-host
    VRAM: 18GB (with CPU offload) / 24GB native
    GPU: RTX 4090 24GB
    RAM: 32GB system RAM

    Quantized + offload tricks let it run on 12GB. Slower than newer entries but the most fine-tuned-on open video model.

    API: huggingface.co/THUDM/CogVideoX-5b

  • Moonshot v1 (8K/32K/128K)

    Moonshot AI · 8K / 32K / 128K tokens · $0.14/M → $0.28/M

    Best for: Batch processing, structured extraction, JSON pipelines

    How: Best for structured output tasks. Supports response_format: json_object. No reasoning overhead.

    Example: Process RSS feeds into structured summaries for pennies per 1000 articles.

    very cheapno hidden reasoningreliable JSON

    API: api.moonshot.ai — OpenAI-compatible. model='moonshot-v1-8k'

  • text-embedding-3-large

    OpenAI · 8K tokens · $0.13/M → N/A

    Best for: RAG pipelines, semantic search, document retrieval

    How: Set dimensions param to reduce size (e.g., 256 for fast search, 3072 for max quality).

    Example: Index your internal docs and build a search API with pgvector + this model.

    3072 dimensionsstrong retrievalmatryoshka support

    API: api.openai.com — client.embeddings.create(model='text-embedding-3-large')

  • ESM2

    NVIDIA · 128K tokens · api

    Best for: computational biology tasks

    How: Fine-tune ESM2 using NVIDIA BioNeMo recipes

    Example: Fine-tuning ESM2 with LoRA for specific protein tasks

    protein language understandinggenomic sequences

    Auto-discovered from news articles.

  • NVIDIA BioNeMo

    NVIDIA · N/A · api

    Best for: Computational biology tasks

    How: Use NVIDIA BioNeMo recipes for fine-tuning

    Example: Fine-tuning ESM2 protein language models

    Fine-tuning biological foundation modelsPretrained on massive corpora of protein or genomic sequences

    Auto-discovered from news articles.

  • FastContext 1.0 4B SFTNewOpen

    microsoft · self-host

    Best for: Trending on HuggingFace (114 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("microsoft/FastContext-1.0-4B-SFT")

    transformerssafetensorsqwen3text-generationExplorer SubAgent

    API: huggingface.co/microsoft/FastContext-1.0-4B-SFT

    Auto-discovered from HuggingFace trending. 114 likes, 13 downloads.

  • MiMo V2.5 Pro FP4 DFlashNewOpen

    XiaomiMiMo · self-host

    Best for: Trending on HuggingFace (115 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("XiaomiMiMo/MiMo-V2.5-Pro-FP4-DFlash")

    transformerssafetensorsmimo_v2text-generationagent

    API: huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro-FP4-DFlash

    Auto-discovered from HuggingFace trending. 115 likes, 4K downloads.

  • Gemma 4 12B Coder Fable5 Composer2.5 V1 GGUFNewOpen

    yuxinlu1 · self-host

    Best for: Trending on HuggingFace (736 likes this week)

    How: Available on Hugging Face. 20K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF")

    ggufgemma4codingcodereasoning

    API: huggingface.co/yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF

    Auto-discovered from HuggingFace trending. 736 likes, 20K downloads.

  • Rio 3.5 Open 397BNewOpen

    prefeitura-rio · self-host

    Best for: Trending on HuggingFace (304 likes this week)

    How: Available on Hugging Face. 189K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("prefeitura-rio/Rio-3.5-Open-397B")

    transformerssafetensorsqwen3_5_moeimage-text-to-textconversational

    API: huggingface.co/prefeitura-rio/Rio-3.5-Open-397B

    Auto-discovered from HuggingFace trending. 304 likes, 189K downloads.

  • Ryzen AI Halo

    AMD · N/A · api

    Best for: petite PC development

    How: work with either Microsoft Windows or Linux

    Example: use in AI development platforms

    Linux-friendlypowered by AMD Ryzen AI Max+

    Auto-discovered from news articles.

  • Qwopus3.6 27B Coder MTP GGUFNewOpen

    Jackrong · self-host

    Best for: Trending on HuggingFace (203 likes this week)

    How: Available on Hugging Face. 62K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF")

    transformersggufllama.cppimage-text-to-textvision

    API: huggingface.co/Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF

    Auto-discovered from HuggingFace trending. 203 likes, 62K downloads.

  • SCAIL 2NewOpen

    zai-org · self-host

    Best for: Trending on HuggingFace (191 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("zai-org/SCAIL-2")

    diffuserscharacter-animationvideo-generationpose-drivendiffusion

    API: huggingface.co/zai-org/SCAIL-2

    Auto-discovered from HuggingFace trending. 191 likes, 0 downloads.

  • MiniMax M3NewOpen

    MiniMaxAI · self-host

    Best for: Trending on HuggingFace (857 likes this week)

    How: Available on Hugging Face. 14K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M3")

    transformerssafetensorsminimax_m3_vlimage-text-to-textmultimodal

    API: huggingface.co/MiniMaxAI/MiniMax-M3

    Auto-discovered from HuggingFace trending. 857 likes, 14K downloads.

  • Kimi K2.7 CodeNewOpen

    moonshotai · self-host

    Best for: Trending on HuggingFace (756 likes this week)

    How: Available on Hugging Face. 57K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("moonshotai/Kimi-K2.7-Code")

    transformerssafetensorskimi_k25image-feature-extractioncompressed-tensors

    API: huggingface.co/moonshotai/Kimi-K2.7-Code

    Auto-discovered from HuggingFace trending. 756 likes, 57K downloads.

  • Claude Code

    Anthropic · 128K tokens · api

    Best for: use in infrastructure management tasks

    How: connect AI to your infrastructure through the Model Context Protocol (MCP)

    Example: AI assistants like GitHub Copilot, IBM Bob, Claude Code etc. to interact with Terraform through the Model Context Protocol (MCP)

    interacts with Terraformsupports infrastructure management

    Auto-discovered from news articles.

  • Gemma 4 26B A4B It Qat GGUFNewOpen

    unsloth · self-host

    Best for: Trending on HuggingFace (144 likes this week)

    How: Available on Hugging Face. 129K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("unsloth/gemma-4-26B-A4B-it-qat-GGUF")

    transformersggufgemma4image-text-to-textunsloth

    API: huggingface.co/unsloth/gemma-4-26B-A4B-it-qat-GGUF

    Auto-discovered from HuggingFace trending. 144 likes, 129K downloads.

  • Diffusiongemma 26B A4B It GGUFNewOpen

    unsloth · self-host

    Best for: Trending on HuggingFace (277 likes this week)

    How: Available on Hugging Face. 107K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("unsloth/diffusiongemma-26B-A4B-it-GGUF")

    ggufgemma4unslothgemmagoogle

    API: huggingface.co/unsloth/diffusiongemma-26B-A4B-it-GGUF

    Auto-discovered from HuggingFace trending. 277 likes, 107K downloads.

  • DiffusionGemma

    NVIDIA · 128K tokens · api

    Best for: real-time AI applications such as chat assistants, copilots, and agentic workflows

    How: Run DiffusionGemma on NVIDIA for high-throughput text generation

    Example: Developers can leverage DiffusionGemma for building real-time AI applications

    Developer-ReadyHigh-ThroughputText Generation

    Auto-discovered from news articles.

  • Nex N2 MiniNewOpen

    nex-agi · self-host

    Best for: Trending on HuggingFace (220 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("nex-agi/Nex-N2-mini")

    transformerssafetensorsqwen3_5_moeimage-text-to-texttext-generation

    API: huggingface.co/nex-agi/Nex-N2-mini

    Auto-discovered from HuggingFace trending. 220 likes, 8K downloads.

  • Diffusiongemma 26B A4B ItNewOpen

    google · self-host

    Best for: Trending on HuggingFace (895 likes this week)

    How: Available on Hugging Face. 312K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("google/diffusiongemma-26B-A4B-it")

    transformerssafetensorsdiffusion_gemmaimage-text-to-textconversational

    API: huggingface.co/google/diffusiongemma-26B-A4B-it

    Auto-discovered from HuggingFace trending. 895 likes, 312K downloads.

  • Claude Mythos 5New

    Anthropic · 1M tokens · →

    Best for: Available through Project Glasswing. Successor to Claude Mythos Preview.

    How: client.messages.create({model: "claude-mythos-5", messages: [...]})

    Example: Use via the Anthropic SDK with model='claude-mythos-5'.

    1M tokens contextadaptive thinking128k tokens max output

    API: api.anthropic.com — model: claude-mythos-5 · AWS Bedrock · GCP Vertex AI

    Max output: 128k tokens. Adaptive thinking enabled by default.

  • Claude Fable 5New

    Anthropic · 1M tokens · →

    Best for: Anthropic's most capable widely released model, for the most demanding reasoning and long-horizon agentic work

    How: client.messages.create({model: "claude-fable-5", messages: [...]})

    Example: Use via the Anthropic SDK with model='claude-fable-5'.

    1M tokens contextadaptive thinking128k tokens max outputagentic coding

    API: api.anthropic.com — model: claude-fable-5 · AWS Bedrock · GCP Vertex AI

    Max output: 128k tokens. Adaptive thinking enabled by default.

  • NVIDIA Nemotron Speech

    NVIDIA · api

    Best for: Training speech AI models for clinical applications

    How: Evaluate Clinical ASR Models Faster with Agent Skills and NVIDIA Nemotron Speech

    Example: Training a speech AI model to correctly recognize drug names like Acetaminophen, Amlodipine

    Recognizing or synthesizing clinical terminology

    Auto-discovered from news articles.

  • Gemma 4 12B OBLITERATEDNewOpen

    OBLITERATUS · self-host

    Best for: Trending on HuggingFace (326 likes this week)

    How: Available on Hugging Face. 71K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("OBLITERATUS/Gemma-4-12B-OBLITERATED")

    transformerssafetensorsggufgemma4_unifiedimage-text-to-text

    API: huggingface.co/OBLITERATUS/Gemma-4-12B-OBLITERATED

    Auto-discovered from HuggingFace trending. 326 likes, 71K downloads.

  • Nex N2 ProNewOpen

    nex-agi · self-host

    Best for: Trending on HuggingFace (288 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("nex-agi/Nex-N2-Pro")

    transformerssafetensorsqwen3_5_moeimage-text-to-texttext-generation

    API: huggingface.co/nex-agi/Nex-N2-Pro

    Auto-discovered from HuggingFace trending. 288 likes, 4K downloads.

  • North Mini Code 1.0NewOpen

    CohereLabs · self-host

    Best for: Trending on HuggingFace (394 likes this week)

    How: Available on Hugging Face. 11K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("CohereLabs/North-Mini-Code-1.0")

    transformerssafetensorscohere2_moetext-generationconversational

    API: huggingface.co/CohereLabs/North-Mini-Code-1.0

    Auto-discovered from HuggingFace trending. 394 likes, 11K downloads.

  • Google Gemini modelsNew

    Google · 128K tokens · api

    Best for: AI applications

    How: integrate with Apple's new AI architecture

    Example: use in AI-powered applications

    AI architectureinnovative

    Auto-discovered from news articles.

  • NVIDIA Nemotron 3 Ultra 550B A55B NVFP4NewOpen

    nvidia · self-host

    Best for: Trending on HuggingFace (160 likes this week)

    How: Available on Hugging Face. 91K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4")

    transformerssafetensorsnemotron_htext-generationnvidia

    API: huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4

    Auto-discovered from HuggingFace trending. 160 likes, 91K downloads.

  • Nemotron 3 Ultra

    NVIDIA · api

    Best for: maintaining context and completing tasks across many turns

    How: deploy on Renesas RZ/V series for production

    Example: use in chatbots evolving into long-running agents

    faster reasoningmore efficientlong-running agents

    Auto-discovered from news articles.

  • Claude Opus 4.8New

    Anthropic · 1M tokens · $5/M → $25/M

    Best for: Anthropic's most capable Opus-tier model for complex reasoning and agentic coding

    How: client.messages.create({model: "claude-opus-4-8", messages: [...]})

    Example: Use via the Anthropic SDK with model='claude-opus-4-8'.

    1M tokens contextadaptive thinking128k tokens max outputagentic coding

    API: api.anthropic.com — model: claude-opus-4-8 · AWS Bedrock · GCP Vertex AI

    Max output: 128k tokens. Adaptive thinking enabled by default.

  • MisoTTSNewOpen

    MisoLabs · self-host

    Best for: Trending on HuggingFace (188 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("MisoLabs/MisoTTS")

    pytorchsafetensorstext-to-speechspeech-synthesisvoice

    API: huggingface.co/MisoLabs/MisoTTS

    Auto-discovered from HuggingFace trending. 188 likes, 0 downloads.

  • NVIDIA Nemotron 3 Ultra

    NVIDIA · api

    Best for: Maintaining context and efficiency across many turns

    How: Integrate with existing chatbot frameworks to enhance long-running agent capabilities

    Example: Use Nemotron 3 Ultra to power a chatbot that can reason and maintain context over multiple interactions

    Faster reasoningMore efficient for long-running agents

    Auto-discovered from news articles.

  • NVIDIA Nemotron 3 Ultra 550B A55B BF16NewOpen

    nvidia · self-host

    Best for: Trending on HuggingFace (189 likes this week)

    How: Available on Hugging Face. 59K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16")

    transformerssafetensorsnemotron_htext-generationnvidia

    API: huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16

    Auto-discovered from HuggingFace trending. 189 likes, 59K downloads.

  • Higgs Audio V3 Tts 4bNewOpen

    bosonai · self-host

    Best for: Trending on HuggingFace (446 likes this week)

    How: Available on Hugging Face. 38K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("bosonai/higgs-audio-v3-tts-4b")

    transformerssafetensorshiggs_multimodal_qwen3text-generationtext-to-speech

    API: huggingface.co/bosonai/higgs-audio-v3-tts-4b

    Auto-discovered from HuggingFace trending. 446 likes, 38K downloads.

  • Gemma 4 QAT

    Google · 128K tokens · api

    Best for: Mobile and laptop applications requiring efficient AI models

    How: Integrate Gemma 4 QAT models into your application for on-device AI processing

    Example: Use Gemma 4 QAT for image recognition on smartphones with low latency and power consumption

    Optimizing compression for mobile and laptop efficiency

    Auto-discovered from news articles.

  • Nemotron 3.5 Asr Streaming 0.6bNewOpen

    nvidia · self-host

    Best for: Trending on HuggingFace (424 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("nvidia/nemotron-3.5-asr-streaming-0.6b")

    nemospeech-recognitioncache-aware ASRautomatic-speech-recognitionstreaming-asr

    API: huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b

    Auto-discovered from HuggingFace trending. 424 likes, 5K downloads.

  • Ideogram 4 Nf4NewOpen

    ideogram-ai · self-host

    Best for: Trending on HuggingFace (334 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("ideogram-ai/ideogram-4-nf4")

    diffuserssafetensorstext-to-imageimage-generationdiffusion

    API: huggingface.co/ideogram-ai/ideogram-4-nf4

    Auto-discovered from HuggingFace trending. 334 likes, 3K downloads.

  • Ideogram 4 Fp8NewOpen

    ideogram-ai · self-host

    Best for: Trending on HuggingFace (548 likes this week)

    How: Available on Hugging Face. 11K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("ideogram-ai/ideogram-4-fp8")

    diffuserssafetensorstext-to-imageimage-generationdiffusion

    API: huggingface.co/ideogram-ai/ideogram-4-fp8

    Auto-discovered from HuggingFace trending. 548 likes, 11K downloads.

  • Gemma 4 12b It GGUFNewOpen

    unsloth · self-host

    Best for: Trending on HuggingFace (599 likes this week)

    How: Available on Hugging Face. 926K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("unsloth/gemma-4-12b-it-GGUF")

    ggufgemma4unslothgemmagoogle

    API: huggingface.co/unsloth/gemma-4-12b-it-GGUF

    Auto-discovered from HuggingFace trending. 599 likes, 926K downloads.

  • Mellum2 12B A2.5B ThinkingNewOpen

    JetBrains · self-host

    Best for: Trending on HuggingFace (274 likes this week)

    How: Available on Hugging Face. 18K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("JetBrains/Mellum2-12B-A2.5B-Thinking")

    transformerssafetensorsmellumtext-generationconversational

    API: huggingface.co/JetBrains/Mellum2-12B-A2.5B-Thinking

    Auto-discovered from HuggingFace trending. 274 likes, 18K downloads.

  • Bonsai Image Ternary 4B Gemlite 2bitNewOpen

    prism-ml · self-host

    Best for: Trending on HuggingFace (92 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("prism-ml/bonsai-image-ternary-4B-gemlite-2bit")

    diffuserssafetensorsternary1.58-bitgemlite

    API: huggingface.co/prism-ml/bonsai-image-ternary-4B-gemlite-2bit

    Auto-discovered from HuggingFace trending. 92 likes, 0 downloads.

  • MOSS TTS V1.5NewOpen

    OpenMOSS-Team · self-host

    Best for: Trending on HuggingFace (95 likes this week)

    How: Available on Hugging Face. 19K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("OpenMOSS-Team/MOSS-TTS-v1.5")

    safetensorsmoss_tts_delaytext-to-speechcustom_codezh

    API: huggingface.co/OpenMOSS-Team/MOSS-TTS-v1.5

    Auto-discovered from HuggingFace trending. 95 likes, 19K downloads.

  • Qwen3.6 35B A3B NVFP4NewOpen

    nvidia · self-host

    Best for: Trending on HuggingFace (193 likes this week)

    How: Available on Hugging Face. 822K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("nvidia/Qwen3.6-35B-A3B-NVFP4")

    Model Optimizersafetensorsqwen3_5_moenvidiaModelOpt

    API: huggingface.co/nvidia/Qwen3.6-35B-A3B-NVFP4

    Auto-discovered from HuggingFace trending. 193 likes, 822K downloads.

  • Phi-4-mini

    Microsoft · api

    Best for: expanding on-device AI capabilities in Microsoft Edge

    How: use Prompt and Writing Assistance APIs in Microsoft Edge

    Example: integrated with Microsoft Edge for on-device AI tasks

    on-device AInew models and APIs for the web

    Auto-discovered from news articles.

  • Mellum2

    JetBrains · api

    Best for: Advanced AI tasks

    How: Integrate Mellum2 into your AI workflows

    Example: Use Mellum2 for complex problem-solving and decision-making

    12B Mixture-of-Experts Model

    Auto-discovered from news articles.

  • NVIDIA Cosmos 3

    NVIDIA · N/A · api

    Best for: Developing Physical AI systems that need to understand and act within the real world

    How: Integrate NVIDIA Cosmos 3 into your Physical AI system to enable reasoning and action capabilities

    Example: Using NVIDIA Cosmos 3 to develop a robot that can understand and interact with its environment

    Physical AI reasoningAction modelsUnderstanding real world

    Auto-discovered from news articles.

  • PaddleOCR VL 1.6NewOpen

    PaddlePaddle · self-host

    Best for: Trending on HuggingFace (269 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("PaddlePaddle/PaddleOCR-VL-1.6")

    PaddleOCRsafetensorspaddleocr_vlERNIE4.5PaddlePaddle

    API: huggingface.co/PaddlePaddle/PaddleOCR-VL-1.6

    Auto-discovered from HuggingFace trending. 269 likes, 9K downloads.

  • LFM2.5 8B A1B GGUFNewOpen

    LiquidAI · self-host

    Best for: Trending on HuggingFace (177 likes this week)

    How: Available on Hugging Face. 87K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-8B-A1B-GGUF")

    ggufliquidlfm2edgellama.cpp

    API: huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF

    Auto-discovered from HuggingFace trending. 177 likes, 87K downloads.

  • Gemini 3.5

    Google · 128K tokens · api

    Best for: General AI applications

    How: Integrate with Google I/O 2026

    Example: Watch 9 videos showing the capabilities of Gemini 3.5

    Advanced capabilitiesHigh performance

    Auto-discovered from news articles.

  • Gemini Omni

    Google · 128K tokens · api

    Best for: General AI applications

    How: Integrate with Google I/O 2026

    Example: Watch 9 videos showing the capabilities of Gemini Omni

    Advanced capabilitiesHigh performance

    Auto-discovered from news articles.

  • Qwen3.6 27B OBLITERATEDNewOpen

    OBLITERATUS · self-host

    Best for: Trending on HuggingFace (120 likes this week)

    How: Available on Hugging Face. 17K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("OBLITERATUS/Qwen3.6-27B-OBLITERATED")

    transformerssafetensorsggufqwen3_5_texttext-generation

    API: huggingface.co/OBLITERATUS/Qwen3.6-27B-OBLITERATED

    Auto-discovered from HuggingFace trending. 120 likes, 17K downloads.

  • Step 3.7 FlashNewOpen

    stepfun-ai · self-host

    Best for: Trending on HuggingFace (359 likes this week)

    How: Available on Hugging Face. 47K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("stepfun-ai/Step-3.7-Flash")

    transformerssafetensorsstep3p7text-generationvision-language

    API: huggingface.co/stepfun-ai/Step-3.7-Flash

    Auto-discovered from HuggingFace trending. 359 likes, 47K downloads.

  • LFM2.5 8B A1BNewOpen

    LiquidAI · self-host

    Best for: Trending on HuggingFace (551 likes this week)

    How: Available on Hugging Face. 135K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("LiquidAI/LFM2.5-8B-A1B")

    transformerssafetensorslfm2_moetext-generationliquid

    API: huggingface.co/LiquidAI/LFM2.5-8B-A1B

    Auto-discovered from HuggingFace trending. 551 likes, 135K downloads.

  • LocateAnything 3BOpen

    nvidia · self-host

    Best for: Trending on HuggingFace (2063 likes this week)

    How: Available on Hugging Face. 87K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("nvidia/LocateAnything-3B")

    transformerssafetensorslocateanythingimage-feature-extractionnvidia

    API: huggingface.co/nvidia/LocateAnything-3B

    Auto-discovered from HuggingFace trending. 2063 likes, 87K downloads.

  • NVIDIA Blackwell

    NVIDIA · 128K tokens · api

    Best for: financial trading landscape

    How: Enables sophisticated analysis

    Example: revolutionizing financial trading landscape

    sophisticated analysisvast amounts of unstructured data

    Auto-discovered from news articles.

  • Lens TurboNewOpen

    microsoft · self-host

    Best for: Trending on HuggingFace (125 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("microsoft/Lens-Turbo")

    diffuserssafetensorstext-to-imageenarxiv:2605.21573

    API: huggingface.co/microsoft/Lens-Turbo

    Auto-discovered from HuggingFace trending. 125 likes, 1K downloads.

  • LensNewOpen

    microsoft · self-host

    Best for: Trending on HuggingFace (138 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("microsoft/Lens")

    diffuserssafetensorstext-to-imageenarxiv:2605.21573

    API: huggingface.co/microsoft/Lens

    Auto-discovered from HuggingFace trending. 138 likes, 1K downloads.

  • Qwopus3.6 27B V2 MTP GGUFNewOpen

    Jackrong · self-host

    Best for: Trending on HuggingFace (178 likes this week)

    How: Available on Hugging Face. 125K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("Jackrong/Qwopus3.6-27B-v2-MTP-GGUF")

    transformersggufllama.cppimage-text-to-textvision

    API: huggingface.co/Jackrong/Qwopus3.6-27B-v2-MTP-GGUF

    Auto-discovered from HuggingFace trending. 178 likes, 125K downloads.

  • ChatGPT

    OpenAI · 128K tokens · api

    Best for: conversational AI and content generation in Portuguese

    How: Use ChatGPT API to integrate with applications

    Example: Generate news articles in Portuguese

    dialoguecontent creationinformation retrieval

    Auto-discovered from news articles.

  • MiniCPM5 1BNewOpen

    openbmb · self-host

    Best for: Trending on HuggingFace (776 likes this week)

    How: Available on Hugging Face. 101K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("openbmb/MiniCPM5-1B")

    transformerssafetensorsllamatext-generationminicpm

    API: huggingface.co/openbmb/MiniCPM5-1B

    Auto-discovered from HuggingFace trending. 776 likes, 101K downloads.

  • NVIDIA Cloud Partner (NCP) reference architecture

    NVIDIA · N/A · api

    Best for: governments, enterprises, and telcos

    How: N/A

    Example: N/A

    sovereign AI factoriesbased on NCP reference architecture

    Auto-discovered from news articles.

  • Qwopus3.6 27B V2 GGUFNewOpen

    Jackrong · self-host

    Best for: Trending on HuggingFace (183 likes this week)

    How: Available on Hugging Face. 29K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("Jackrong/Qwopus3.6-27B-v2-GGUF")

    transformersggufllama.cppimage-text-to-textvision

    API: huggingface.co/Jackrong/Qwopus3.6-27B-v2-GGUF

    Auto-discovered from HuggingFace trending. 183 likes, 29K downloads.

  • SANA WM_bidirectionalNewOpen

    Efficient-Large-Model · self-host

    Best for: Trending on HuggingFace (86 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("Efficient-Large-Model/SANA-WM_bidirectional")

    diffuserssafetensorstext-to-videoimage-to-videocamera-control

    API: huggingface.co/Efficient-Large-Model/SANA-WM_bidirectional

    Auto-discovered from HuggingFace trending. 86 likes, 0 downloads.

  • Command A Plus 05 2026 Bf16NewOpen

    CohereLabs · self-host

    Best for: Trending on HuggingFace (126 likes this week)

    How: Available on Hugging Face. 14K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("CohereLabs/command-a-plus-05-2026-bf16")

    transformerssafetensorscohere2_visionimage-text-to-textconversational

    API: huggingface.co/CohereLabs/command-a-plus-05-2026-bf16

    Auto-discovered from HuggingFace trending. 126 likes, 14K downloads.

  • Intern S2 PreviewNewOpen

    internlm · self-host

    Best for: Trending on HuggingFace (86 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("internlm/Intern-S2-Preview")

    transformerssafetensorsintern_s2_previewimage-text-to-textconversational

    API: huggingface.co/internlm/Intern-S2-Preview

    Auto-discovered from HuggingFace trending. 86 likes, 2K downloads.

  • Command A Plus 05 2026 W4a4NewOpen

    CohereLabs · self-host

    Best for: Trending on HuggingFace (213 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("CohereLabs/command-a-plus-05-2026-w4a4")

    transformerssafetensorscohere2_visionimage-text-to-textconversational

    API: huggingface.co/CohereLabs/command-a-plus-05-2026-w4a4

    Auto-discovered from HuggingFace trending. 213 likes, 8K downloads.

  • Gordon

    Docker · api

    Best for: container workflow management

    How: Integrate Gordon with Docker Desktop

    Example: Gordon proposes fixes and takes action across your entire Docker workflow

    understands environmentproposes fixestakes action across Docker workflow

    Auto-discovered from news articles.

  • Ring 2.6 1TNewOpen

    inclusionAI · self-host

    Best for: Trending on HuggingFace (89 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("inclusionAI/Ring-2.6-1T")

    transformerssafetensorsbailing_hybridtext-generationconversational

    API: huggingface.co/inclusionAI/Ring-2.6-1T

    Auto-discovered from HuggingFace trending. 89 likes, 3K downloads.

  • HRM Text 1BNewOpen

    sapientinc · self-host

    Best for: Trending on HuggingFace (751 likes this week)

    How: Available on Hugging Face. 135K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("sapientinc/HRM-Text-1B")

    transformerssafetensorshrm_texttext-generationhrm

    API: huggingface.co/sapientinc/HRM-Text-1B

    Auto-discovered from HuggingFace trending. 751 likes, 135K downloads.

  • Mythos

    Cloudflare · N/A · api

    Best for: analyzing live code across critical parts of infrastructure

    How: Point Mythos at live code to observe its strengths and weaknesses

    Example: Mythos was used to analyze live code across critical parts of Cloudflare's infrastructure

    security-focusedcode analysis

    Auto-discovered from news articles.

  • Qwopus3.5 9B Coder GGUFNewOpen

    Jackrong · self-host

    Best for: Trending on HuggingFace (181 likes this week)

    How: Available on Hugging Face. 39K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("Jackrong/Qwopus3.5-9B-Coder-GGUF")

    transformersgguftext-generation-inferenceunslothqwen3_5

    API: huggingface.co/Jackrong/Qwopus3.5-9B-Coder-GGUF

    Auto-discovered from HuggingFace trending. 181 likes, 39K downloads.

  • Qwen3.6 35B A3B MTP GGUFNewOpen

    unsloth · self-host

    Best for: Trending on HuggingFace (393 likes this week)

    How: Available on Hugging Face. 628K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3.6-35B-A3B-MTP-GGUF")

    transformersggufunslothqwenqwen3_5_moe

    API: huggingface.co/unsloth/Qwen3.6-35B-A3B-MTP-GGUF

    Auto-discovered from HuggingFace trending. 393 likes, 628K downloads.

  • Qwen3.6 27B MTP GGUFNewOpen

    unsloth · self-host

    Best for: Trending on HuggingFace (613 likes this week)

    How: Available on Hugging Face. 983K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3.6-27B-MTP-GGUF")

    transformersggufunslothqwenqwen3_5

    API: huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF

    Auto-discovered from HuggingFace trending. 613 likes, 983K downloads.

  • Qwen3.6 27B MTP UD GGUFNewOpen

    havenoammo · self-host

    Best for: Trending on HuggingFace (87 likes this week)

    How: Available on Hugging Face. 43K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("havenoammo/Qwen3.6-27B-MTP-UD-GGUF")

    transformersggufunslothqwenqwen3_5

    API: huggingface.co/havenoammo/Qwen3.6-27B-MTP-UD-GGUF

    Auto-discovered from HuggingFace trending. 87 likes, 43K downloads.

  • Supertonic 3NewOpen

    Supertone · self-host

    Best for: Trending on HuggingFace (771 likes this week)

    How: Available on Hugging Face. 58K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("Supertone/supertonic-3")

    supertoniconnxtext-to-speechspeech-synthesistts

    API: huggingface.co/Supertone/supertonic-3

    Auto-discovered from HuggingFace trending. 771 likes, 58K downloads.

  • Qwopus3.6 35B A3B V1 GGUFNewOpen

    Jackrong · self-host

    Best for: Trending on HuggingFace (117 likes this week)

    How: Available on Hugging Face. 67K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("Jackrong/Qwopus3.6-35B-A3B-v1-GGUF")

    transformersgguftext-generation-inferenceunslothqwen3_6

    API: huggingface.co/Jackrong/Qwopus3.6-35B-A3B-v1-GGUF

    Auto-discovered from HuggingFace trending. 117 likes, 67K downloads.

  • Sulphur 2 BaseNewOpen

    SulphurAI · self-host

    Best for: Trending on HuggingFace (1537 likes this week)

    How: Available on Hugging Face. 1666K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("SulphurAI/Sulphur-2-base")

    diffusersgguftext-to-videobase_model:Lightricks/LTX-2.3base_model:quantized:Lightricks/LTX-2.3

    API: huggingface.co/SulphurAI/Sulphur-2-base

    Auto-discovered from HuggingFace trending. 1537 likes, 1.7M downloads.

  • Scenema AudioOpen

    ScenemaAI · self-host

    Best for: Trending on HuggingFace (101 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("ScenemaAI/scenema-audio")

    scenema-audioaudio-generationdiffusiontext-to-audiovoice-cloning

    API: huggingface.co/ScenemaAI/scenema-audio

    Auto-discovered from HuggingFace trending. 101 likes, 237 downloads.

  • Deepseek V4 GgufOpen

    antirez · self-host

    Best for: Trending on HuggingFace (139 likes this week)

    How: Available on Hugging Face. 284K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("antirez/deepseek-v4-gguf")

    ggufquantizeddeepseekdeepseek-v4deepseek-v4-flash

    API: huggingface.co/antirez/deepseek-v4-gguf

    Auto-discovered from HuggingFace trending. 139 likes, 284K downloads.

  • DramaboxOpen

    ResembleAI · self-host

    Best for: Trending on HuggingFace (239 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("ResembleAI/Dramabox")

    ltx-audio-ttsdramabox-ttsttsvoice-cloningaudio-generation

    API: huggingface.co/ResembleAI/Dramabox

    Auto-discovered from HuggingFace trending. 239 likes, 1K downloads.

  • MiniCPM V 4.6Open

    openbmb · self-host

    Best for: Trending on HuggingFace (1084 likes this week)

    How: Available on Hugging Face. 445K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("openbmb/MiniCPM-V-4.6")

    transformerssafetensorsminicpmv4_6image-text-to-textminicpm-v

    API: huggingface.co/openbmb/MiniCPM-V-4.6

    Auto-discovered from HuggingFace trending. 1084 likes, 445K downloads.

  • NVIDIA Nemotron 3 Nano Omni 30B A3B Reasoning GGUFOpen

    unsloth · self-host

    Best for: Trending on HuggingFace (100 likes this week)

    How: Available on Hugging Face. 45K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-GGUF")

    ggufnvidiaunslothnemotron-3multimodal

    API: huggingface.co/unsloth/NVIDIA-Nemotron-3-Nano-Omni-30B-A3B-Reasoning-GGUF

    Auto-discovered from HuggingFace trending. 100 likes, 45K downloads.

  • Z AnimeOpen

    SeeSee21 · self-host

    Best for: Trending on HuggingFace (418 likes this week)

    How: Available on Hugging Face. 16K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("SeeSee21/Z-Anime")

    diffuserssafetensorsggufz-animetext-to-image

    API: huggingface.co/SeeSee21/Z-Anime

    Auto-discovered from HuggingFace trending. 418 likes, 16K downloads.

  • Ling 2.6 1TOpen

    inclusionAI · self-host

    Best for: Trending on HuggingFace (111 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("inclusionAI/Ling-2.6-1T")

    transformerssafetensorsbailing_hybridtext-generationconversational

    API: huggingface.co/inclusionAI/Ling-2.6-1T

    Auto-discovered from HuggingFace trending. 111 likes, 642 downloads.

  • LTX 2.3 WorkflowsOpen

    RuneXX · self-host

    Best for: Trending on HuggingFace (564 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("RuneXX/LTX-2.3-Workflows")

    ltxltx-2comfyuicomfygguf

    API: huggingface.co/RuneXX/LTX-2.3-Workflows

    Auto-discovered from HuggingFace trending. 564 likes, 0 downloads.

  • Fara 7BOpen

    microsoft · self-host

    Best for: Trending on HuggingFace (593 likes this week)

    How: Available on Hugging Face. 15K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("microsoft/Fara-7B")

    transformerssafetensorsqwen2_5_vlimage-text-to-textmultimodal

    API: huggingface.co/microsoft/Fara-7B

    Auto-discovered from HuggingFace trending. 593 likes, 15K downloads.

  • NVIDIA Vera Rubin Platform

    NVIDIA · 128K tokens · api

    Best for: Agentic inference workloads

    How: Integrate with NVIDIA's platform for inference

    Example: Use for non-deterministic trajectories in AI

    Solving Agentic AI’s Scale-Up ProblemRuntime dynamics of inference workloads

    Auto-discovered from news articles.

  • NeedleOpen

    Cactus · self-host

    Best for: function-calling tasks

    How: run Needle on consumer devices

    Example: function-calling (tool use) model

    26M parameter modelruns at 6000 tok/s prefill1200 tok/s decode on consumer devices

    Auto-discovered from news articles.

  • Qwen3.6 27B Heretic Uncensored FINETUNE NEO CODE Di IMatrix MAX GGUFOpen

    DavidAU · self-host

    Best for: Trending on HuggingFace (105 likes this week)

    How: Available on Hugging Face. 144K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF")

    transformersggufunslothhereticuncensored

    API: huggingface.co/DavidAU/Qwen3.6-27B-Heretic-Uncensored-FINETUNE-NEO-CODE-Di-IMatrix-MAX-GGUF

    Auto-discovered from HuggingFace trending. 105 likes, 144K downloads.

  • LTX2.3 10ErosOpen

    TenStrip · self-host

    Best for: Trending on HuggingFace (281 likes this week)

    How: Available on Hugging Face. 136K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("TenStrip/LTX2.3-10Eros")

    diffusersimage-to-videoregion:us

    API: huggingface.co/TenStrip/LTX2.3-10Eros

    Auto-discovered from HuggingFace trending. 281 likes, 136K downloads.

  • Granite 4.1 30bOpen

    ibm-granite · self-host

    Best for: Trending on HuggingFace (100 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-4.1-30b")

    transformerssafetensorsgranitetext-generationlanguage

    API: huggingface.co/ibm-granite/granite-4.1-30b

    Auto-discovered from HuggingFace trending. 100 likes, 6K downloads.

  • Granite 4.1 8bOpen

    ibm-granite · self-host

    Best for: Trending on HuggingFace (157 likes this week)

    How: Available on Hugging Face. 20K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-4.1-8b")

    transformerssafetensorsgranitetext-generationlanguage

    API: huggingface.co/ibm-granite/granite-4.1-8b

    Auto-discovered from HuggingFace trending. 157 likes, 20K downloads.

  • Ling 2.6 FlashOpen

    inclusionAI · self-host

    Best for: Trending on HuggingFace (456 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("inclusionAI/Ling-2.6-flash")

    safetensorsbailing_hybridtext-generationconversationalcustom_code

    API: huggingface.co/inclusionAI/Ling-2.6-flash

    Auto-discovered from HuggingFace trending. 456 likes, 1K downloads.

  • OmniVoiceOpen

    k2-fsa · self-host

    Best for: Trending on HuggingFace (872 likes this week)

    How: Available on Hugging Face. 2236K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("k2-fsa/OmniVoice")

    omnivoicesafetensorszero-shotmultilingualvoice-cloning

    API: huggingface.co/k2-fsa/OmniVoice

    Auto-discovered from HuggingFace trending. 872 likes, 2.2M downloads.

  • CyberSecQwen-4BOpen

    Hugging Face · 128K tokens · self-host

    Best for: defensive cyber tasks

    How: use the model for specialized cyber defense tasks

    Example: model can be used for detecting and preventing cyber threats

    defensive cyberspecializedlocally-runnable

    Auto-discovered from news articles.

  • GAIAOpen

    AMD · self-host

    Best for: local AI applications on Windows and Linux

    How: integrate with Lemonade SDK

    Example: use GAIA for local AI tasks on Windows and Linux systems

    easy to useleverages Lemonade SDK

    Auto-discovered from news articles.

  • NVIDIA DLSS 4.5

    NVIDIA · N/A · api

    Best for: AI-powered game development

    How: Integrate NVIDIA DLSS 4.5 with Unreal Engine 5

    Example: Game developers can enhance game performance and visuals

    Dynamic Multi Frame GenerationMulti Frame Generation 6Xsecond-generation RTX

    Auto-discovered from news articles.

  • Laguna XS.2Open

    poolside · self-host

    Best for: Trending on HuggingFace (228 likes this week)

    How: Available on Hugging Face. 14K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("poolside/Laguna-XS.2")

    transformerssafetensorslagunatext-generationlaguna-xs.2

    API: huggingface.co/poolside/Laguna-XS.2

    Auto-discovered from HuggingFace trending. 228 likes, 14K downloads.

  • Qwen3.6 27B DFlashOpen

    z-lab · self-host

    Best for: Trending on HuggingFace (262 likes this week)

    How: Available on Hugging Face. 29K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("z-lab/Qwen3.6-27B-DFlash")

    transformerssafetensorsqwen3feature-extractiondflash

    API: huggingface.co/z-lab/Qwen3.6-27B-DFlash

    Auto-discovered from HuggingFace trending. 262 likes, 29K downloads.

  • Qwen3.6 35B A3B DFlashOpen

    z-lab · self-host

    Best for: Trending on HuggingFace (165 likes this week)

    How: Available on Hugging Face. 27K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("z-lab/Qwen3.6-35B-A3B-DFlash")

    transformerssafetensorsqwen3feature-extractiondflash

    API: huggingface.co/z-lab/Qwen3.6-35B-A3B-DFlash

    Auto-discovered from HuggingFace trending. 165 likes, 27K downloads.

  • Qwen3.6 27B Uncensored HauhauCS AggressiveOpen

    HauhauCS · self-host

    Best for: Trending on HuggingFace (265 likes this week)

    How: Available on Hugging Face. 303K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive")

    ggufuncensoredqwen3.6visionmultimodal

    API: huggingface.co/HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Aggressive

    Auto-discovered from HuggingFace trending. 265 likes, 303K downloads.

  • Hy3 PreviewOpen

    tencent · self-host

    Best for: Trending on HuggingFace (189 likes this week)

    How: Available on Hugging Face. 14K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("tencent/Hy3-preview")

    transformerssafetensorshy_v3text-generationconversational

    API: huggingface.co/tencent/Hy3-preview

    Auto-discovered from HuggingFace trending. 189 likes, 14K downloads.

  • Seedance 1.0 Pro

    ByteDance · N/A · credit-based → per-second

    Best for: Cost-sensitive Chinese-market video, fast iteration on social shorts.

    How: Two tiers: Seedance 1.0 Pro for top quality and Seedance 1.0 Lite for fast/cheap drafts. Both expose text-to-video and image-to-video.

    Example: POST volcengineapi.com/seedance/v1/videos { prompt: 'a hummingbird in flight, slow motion', mode: 'pro' }

    fast generation1080p outputstrong prompt adherenceLite variant for cheap iteration

    API: Volcengine / ByteDance API

    ByteDance's competitor to Sora / Veo / Kling. Lite tier is notably cheaper than competitors at similar quality.

  • NVIDIA Nemotron 3 Nano Omni

    NVIDIA · api

    Best for: multimodal agent reasoning in a single efficient open model

    How: Run NVIDIA Nemotron 3 Nano Omni locally in a single command

    Example: reasoning across screens, documents, audio, video, and text within a single perception-to-action loop

    understand and reason across video, audio, images, and language

    Auto-discovered from news articles.

  • MiMo V2.5 ProOpen

    XiaomiMiMo · self-host

    Best for: Trending on HuggingFace (506 likes this week)

    How: Available on Hugging Face. 40K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("XiaomiMiMo/MiMo-V2.5-Pro")

    safetensorsmimo_v2text-generationagentlong-context

    API: huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro

    Auto-discovered from HuggingFace trending. 506 likes, 40K downloads.

  • DeepSeek-V4-Flash

    DeepSeek · api

    Best for: enabling highly efficient operations

    How: Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints

    Example: DeepSeek just launched its fourth generation of flagship models

    highly efficient

    Auto-discovered from news articles.

  • DeepSeek-V4-Pro

    DeepSeek · api

    Best for: enabling highly efficient operations

    How: Build with DeepSeek V4 Using NVIDIA Blackwell and GPU-Accelerated Endpoints

    Example: DeepSeek just launched its fourth generation of flagship models

    highly efficient

    Auto-discovered from news articles.

  • Qwen3.6 27B FP8Open

    Qwen · self-host

    Best for: Trending on HuggingFace (160 likes this week)

    How: Available on Hugging Face. 745K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.6-27B-FP8")

    transformerssafetensorsqwen3_5image-text-to-textconversational

    API: huggingface.co/Qwen/Qwen3.6-27B-FP8

    Auto-discovered from HuggingFace trending. 160 likes, 745K downloads.

  • Google TPU 8th Generation

    Google · N/A · api

    Best for: powering AI applications

    How: Deploy Google's 8th generation TPUs for your AI workloads

    Example: Use the new TPUs for training and inference in AI applications

    specialized chipsfuture of AI

    Auto-discovered from news articles.

  • Google TPUv8

    Google · N/A · api

    Best for: AI acceleration

    How: deploy Google TPUv8 in your cloud environment

    Example: use Google TPUv8 for AI model training and inference

    specialized chipspower the future of AI

    Auto-discovered from news articles.

  • Google's 8th generation TPU

    Google AI · N/A · api

    Best for: AI acceleration

    How: Deploy Google's 8th generation TPU for AI workloads.

    Example: Use the TPU for training and inference of AI models.

    specialized chipspower the future of AI

    Auto-discovered from news articles.

  • GPT-5.5

    OpenAI · 128K tokens · api

    Best for: coding, research, and data analysis

    How: Integrate GPT-5.5 into your tools for advanced tasks.

    Example: Use GPT-5.5 for coding assistance or data analysis.

    fastermore capablecomplex tasks

    Auto-discovered from news articles.

  • DeepSeek V4 FlashOpen

    deepseek-ai · self-host

    Best for: Trending on HuggingFace (1371 likes this week)

    How: Available on Hugging Face. 3525K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-V4-Flash")

    transformerssafetensorsconversationallicense:miteval-results

    API: huggingface.co/deepseek-ai/DeepSeek-V4-Flash

    Auto-discovered from HuggingFace trending. 1371 likes, 3.5M downloads.

  • DeepSeek V4 ProOpen

    deepseek-ai · self-host

    Best for: Trending on HuggingFace (4867 likes this week)

    How: Available on Hugging Face. 2935K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-V4-Pro")

    transformerssafetensorsdeepseek_v4text-generationconversational

    API: huggingface.co/deepseek-ai/DeepSeek-V4-Pro

    Auto-discovered from HuggingFace trending. 4867 likes, 2.9M downloads.

  • Google's eighth generation TPU

    Google · N/A · api

    Best for: AI applications requiring high-performance computing

    How: deploy on Google Cloud to leverage the new TPU capabilities

    Example: use for training and inference of large AI models

    powering the future of AItwo specialized chips

    Auto-discovered from news articles.

  • OpenAI Privacy Filter

    OpenAI · api

    Best for: text privacy and compliance

    How: Integrate into text processing workflows

    Example: Automatically redact sensitive information from documents

    detecting and redacting PIIstate-of-the-art accuracy

    Auto-discovered from news articles.

  • Google's TPU (eighth generation)

    Google · api

    Best for: AI acceleration

    How: Deploy in Google Cloud for AI tasks

    Example: Use for training and inference in AI applications

    specialized chipspower the future of AI

    Auto-discovered from news articles.

  • Qwen3.6 35B A3B Claude 4.6 Opus Reasoning Distilled GGUFOpen

    hesamation · self-host

    Best for: Trending on HuggingFace (200 likes this week)

    How: Available on Hugging Face. 129K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF")

    ggufllama.cppqwenqwen3.6qwen3_5_moe

    API: huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF

    Auto-discovered from HuggingFace trending. 200 likes, 129K downloads.

  • Qwen3.6 27B GGUFOpen

    unsloth · self-host

    Best for: Trending on HuggingFace (633 likes this week)

    How: Available on Hugging Face. 1355K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3.6-27B-GGUF")

    transformersggufunslothqwenqwen3_5

    API: huggingface.co/unsloth/Qwen3.6-27B-GGUF

    Auto-discovered from HuggingFace trending. 633 likes, 1.4M downloads.

  • Qwen3.6 27BOpen

    Qwen · self-host

    Best for: Trending on HuggingFace (1554 likes this week)

    How: Available on Hugging Face. 5064K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.6-27B")

    transformerssafetensorsqwen3_5image-text-to-textconversational

    API: huggingface.co/Qwen/Qwen3.6-27B

    Auto-discovered from HuggingFace trending. 1554 likes, 5.1M downloads.

  • Codex

    OpenAI · 128K tokens · api

    Best for: enterprises to deploy and scale Codex

    How: partner with Accenture, PwC, Infosys, and others

    Example: help enterprises deploy and scale Codex across the software development lifecycle

    deploy and scale across the software development lifecycle

    Auto-discovered from news articles.

  • Qwopus GLM 18B Merged GGUFOpen

    Jackrong · self-host

    Best for: Trending on HuggingFace (201 likes this week)

    How: Available on Hugging Face. 70K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("Jackrong/Qwopus-GLM-18B-Merged-GGUF")

    ggufmergefrankenmergeqwen3.5reasoning

    API: huggingface.co/Jackrong/Qwopus-GLM-18B-Merged-GGUF

    Auto-discovered from HuggingFace trending. 201 likes, 70K downloads.

  • Kimi K2.6Open

    moonshotai · self-host

    Best for: Trending on HuggingFace (1197 likes this week)

    How: Available on Hugging Face. 825K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("moonshotai/Kimi-K2.6")

    transformerssafetensorskimi_k25feature-extractioncompressed-tensors

    API: huggingface.co/moonshotai/Kimi-K2.6

    Auto-discovered from HuggingFace trending. 1197 likes, 825K downloads.

  • Qwen3.6 35B A3B FP8Open

    Qwen · self-host

    Best for: Trending on HuggingFace (158 likes this week)

    How: Available on Hugging Face. 490K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.6-35B-A3B-FP8")

    transformerssafetensorsqwen3_5_moeimage-text-to-textconversational

    API: huggingface.co/Qwen/Qwen3.6-35B-A3B-FP8

    Auto-discovered from HuggingFace trending. 158 likes, 490K downloads.

  • Qwen3.6 35B A3B Uncensored HauhauCS AggressiveOpen

    HauhauCS · self-host

    Best for: Trending on HuggingFace (1860 likes this week)

    How: Available on Hugging Face. 2698K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive")

    ggufuncensoredqwen3.6moevision

    API: huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive

    Auto-discovered from HuggingFace trending. 1860 likes, 2.7M downloads.

  • GPT-Rosalind

    OpenAI · N/A · api

    Best for: life sciences research

    How: N/A

    Example: N/A

    accelerate drug discoverygenomics analysisprotein reasoningscientific research workflows

    Auto-discovered from news articles.

  • ERNIE Image Turbo GGUFOpen

    unsloth · self-host

    Best for: Trending on HuggingFace (180 likes this week)

    How: Available on Hugging Face. 30K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("unsloth/ERNIE-Image-Turbo-GGUF")

    ggmlgguftext-to-imageunslothbase_model:baidu/ERNIE-Image-Turbo

    API: huggingface.co/unsloth/ERNIE-Image-Turbo-GGUF

    Auto-discovered from HuggingFace trending. 180 likes, 30K downloads.

  • Gemma 4 31B It NVFP4 TurboOpen

    LilaRest · self-host

    Best for: Trending on HuggingFace (247 likes this week)

    How: Available on Hugging Face. 105K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("LilaRest/gemma-4-31B-it-NVFP4-turbo")

    transformerssafetensorsgemma4text-generationgemma-4-31b-it

    API: huggingface.co/LilaRest/gemma-4-31B-it-NVFP4-turbo

    Auto-discovered from HuggingFace trending. 247 likes, 105K downloads.

  • Supergemma4 26b Uncensored Mlx 4bit V2Open

    Jiunsong · self-host

    Best for: Trending on HuggingFace (172 likes this week)

    How: Available on Hugging Face. 14K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2")

    mlxsafetensorsgemma4uncensoredapple-silicon

    API: huggingface.co/Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2

    Auto-discovered from HuggingFace trending. 172 likes, 14K downloads.

  • Gemma 4 E4B It OBLITERATEDOpen

    OBLITERATUS · self-host

    Best for: Trending on HuggingFace (526 likes this week)

    How: Available on Hugging Face. 128K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("OBLITERATUS/gemma-4-E4B-it-OBLITERATED")

    safetensorsggufgemma4abliterateduncensored

    API: huggingface.co/OBLITERATUS/gemma-4-E4B-it-OBLITERATED

    Auto-discovered from HuggingFace trending. 526 likes, 128K downloads.

  • Gemma 4 31B JANG_4M CRACKOpen

    dealignai · self-host

    Best for: Trending on HuggingFace (1487 likes this week)

    How: Available on Hugging Face. 170K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("dealignai/Gemma-4-31B-JANG_4M-CRACK")

    mlxsafetensorsgemma4abliterateduncensored

    API: huggingface.co/dealignai/Gemma-4-31B-JANG_4M-CRACK

    Auto-discovered from HuggingFace trending. 1487 likes, 170K downloads.

  • ERNIE Image TurboOpen

    baidu · self-host

    Best for: Trending on HuggingFace (344 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("baidu/ERNIE-Image-Turbo")

    diffuserssafetensorstext-to-image8Blicense:apache-2.0

    API: huggingface.co/baidu/ERNIE-Image-Turbo

    Auto-discovered from HuggingFace trending. 344 likes, 6K downloads.

  • Qwen3.6 35B A3B GGUFOpen

    unsloth · self-host

    Best for: Trending on HuggingFace (966 likes this week)

    How: Available on Hugging Face. 2500K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3.6-35B-A3B-GGUF")

    transformersggufunslothqwenqwen3_5_moe

    API: huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF

    Auto-discovered from HuggingFace trending. 966 likes, 2.5M downloads.

  • Supergemma4 26b Uncensored Gguf V2Open

    Jiunsong · self-host

    Best for: Trending on HuggingFace (627 likes this week)

    How: Available on Hugging Face. 267K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("Jiunsong/supergemma4-26b-uncensored-gguf-v2")

    ggufgemma4uncensoredfastllama.cpp

    API: huggingface.co/Jiunsong/supergemma4-26b-uncensored-gguf-v2

    Auto-discovered from HuggingFace trending. 627 likes, 267K downloads.

  • GLM 5.1Open

    zai-org · self-host

    Best for: Trending on HuggingFace (1472 likes this week)

    How: Available on Hugging Face. 171K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("zai-org/GLM-5.1")

    transformerssafetensorsglm_moe_dsatext-generationconversational

    API: huggingface.co/zai-org/GLM-5.1

    Auto-discovered from HuggingFace trending. 1472 likes, 171K downloads.

  • ERNIE ImageOpen

    baidu · self-host

    Best for: Trending on HuggingFace (550 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("baidu/ERNIE-Image")

    diffuserssafetensorstext-to-image8Blicense:apache-2.0

    API: huggingface.co/baidu/ERNIE-Image

    Auto-discovered from HuggingFace trending. 550 likes, 7K downloads.

  • HY Embodied 0.5Open

    tencent · self-host

    Best for: Trending on HuggingFace (897 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("tencent/HY-Embodied-0.5")

    transformerssafetensorshunyuan_vl_motimage-text-to-texthunyuan

    API: huggingface.co/tencent/HY-Embodied-0.5

    Auto-discovered from HuggingFace trending. 897 likes, 2K downloads.

  • Qwen3.6 35B A3BOpen

    Qwen · self-host

    Best for: Trending on HuggingFace (1803 likes this week)

    How: Available on Hugging Face. 5477K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.6-35B-A3B")

    transformerssafetensorsqwen3_5_moeimage-text-to-textconversational

    API: huggingface.co/Qwen/Qwen3.6-35B-A3B

    Auto-discovered from HuggingFace trending. 1803 likes, 5.5M downloads.

  • MiniMax M2.7Open

    MiniMaxAI · self-host

    Best for: Trending on HuggingFace (1052 likes this week)

    How: Available on Hugging Face. 469K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2.7")

    transformerssafetensorsminimax_m2text-generationconversational

    API: huggingface.co/MiniMaxAI/MiniMax-M2.7

    Auto-discovered from HuggingFace trending. 1052 likes, 469K downloads.

  • Nucleus ImageOpen

    NucleusAI · self-host

    Best for: Trending on HuggingFace (213 likes this week)

    How: Available on Hugging Face.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("NucleusAI/Nucleus-Image")

    diffuserssafetensorsmoesparse-moediffusion

    API: huggingface.co/NucleusAI/Nucleus-Image

    Auto-discovered from HuggingFace trending. 213 likes, 2K downloads.

  • Gemma 4 31B ItOpen

    google · self-host

    Best for: Trending on HuggingFace (2640 likes this week)

    How: Available on Hugging Face. 9794K downloads.

    Example: from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained("google/gemma-4-31B-it")

    transformerssafetensorsgemma4image-text-to-textconversational

    API: huggingface.co/google/gemma-4-31B-it

    Auto-discovered from HuggingFace trending. 2640 likes, 9.8M downloads.