AI Models

52 models · 2 new in 60d

  • Gemma 4 27B MoE · New · Open

    Google · 128K tokens · self-host

    Best for: Faster self-hosted inference, cost-efficient multimodal

    How: MoE variant — faster inference than the 31B dense. Same multimodal capabilities.

    Example: Process image-based monitoring alerts faster than the dense variant at the same quality.

    LMSYS Arena: #6 (text)
    MoE efficiency · multimodal · images + video · Apache 2.0
    Hardware to self-host
    VRAM: 18GB (quantized) / 54GB (FP16)
    GPU: RTX 4090 24GB or 1× A100 40GB
    RAM: 32GB+ system RAM

    27B total MoE — faster inference than the 31B dense thanks to sparse activations.

    API: Ollama, vLLM, Hugging Face. ollama run gemma4:27b-moe
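    The `ollama run` command above starts an interactive session; for the monitoring-alert example, requests can also go through Ollama's local HTTP API. A minimal sketch, assuming a default Ollama install on localhost:11434 with the model already pulled (the prompt text and helper name are illustrative):

```python
import base64
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_alert_request(image_bytes: bytes, model: str = "gemma4:27b-moe") -> dict:
    """Build an Ollama /api/generate payload that attaches a monitoring
    screenshot as a base64-encoded image (prompt text is illustrative)."""
    return {
        "model": model,
        "prompt": "Summarize this monitoring alert and suggest a likely cause.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

payload = build_alert_request(b"\x89PNG fake image bytes")
print(json.dumps(payload, indent=2))

# Sending it requires a running Ollama server with the model pulled:
# import requests
# resp = requests.post(OLLAMA_URL, json=payload, timeout=120)
```

    The `images` field is how Ollama's generate endpoint accepts base64-encoded image input for multimodal models.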

  • Gemma 4 E4B · New · Open

    Google · 128K tokens · self-host

    Best for: Edge, mobile, IoT, on-device AI with multimodal input

    How: 4B params, small enough for nearly any device. Supports images, video, and native audio input.

    Example: Run on a Raspberry Pi to process security camera feeds with voice commands.

    tiny · on-device · multimodal + audio · Apache 2.0
    Hardware to self-host
    VRAM: 3GB (quantized) / 8GB (FP16)
    GPU: Any — CPU, phone, Jetson, Raspberry Pi 5, integrated GPU
    RAM: 4-8GB system RAM

    4B params. Edge-first design: runs on phones, SBCs, IoT devices.

    API: Ollama, Hugging Face. Runs on phones and Raspberry Pi.
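    The VRAM figures above follow from model size times bytes per parameter. A back-of-the-envelope estimator, assuming a ~20% overhead factor for KV cache and runtime buffers (the 1.2 overhead value is an assumption, not a published spec):

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weights = params x bytes/param, plus ~20%
    overhead for KV cache and runtime buffers (the 1.2 factor is an assumption)."""
    return round(params_billion * bytes_per_param * overhead, 1)

print(estimate_vram_gb(4, 2.0))   # → 9.6  (FP16; the card lists 8 GB for weights alone)
print(estimate_vram_gb(4, 0.55))  # → 2.6  (~4-bit quantized; the card lists 3 GB)
```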

  • Ministral 3 (3B/8B/14B) · Open

    Mistral · 128K tokens · self-host

    Best for: Edge deployment, on-device AI, lightweight vision tasks

    How: 3B fits on phones, 8B on laptops, 14B on dev GPUs. All have vision support.

    Example: Run 8B on a Jetson to classify manufacturing defects from camera feeds.

    edge-friendly · vision · dense · 3 sizes
    Hardware to self-host
    VRAM: 2GB (3B) / 6GB (8B) / 10GB (14B quantized)
    GPU: Phone/CPU (3B) · Laptop GPU (8B) · RTX 3060+ (14B)
    RAM: 8-16GB system RAM

    All three sizes are dense with vision. 3B runs on phones, 8B on laptops, 14B on dev GPUs.

    API: Ollama, vLLM, Hugging Face. Also on Mistral API.

  • Qwen 3 30B · Open

    Alibaba · 128K tokens · self-host

    Best for: Local development, laptop-friendly reasoning, privacy

    How: Excellent for local dev. MoE means only 3B params active — fast on consumer hardware.

    Example: Run on your dev machine as a private coding assistant with reasoning.

    AIME 2024 66.7%
    MoE 3B active / 30B total · runs on consumer GPU · hybrid thinking
    Hardware to self-host
    VRAM: 20GB (quantized) / 60GB (FP16)
    GPU: RTX 4090 24GB (quantized) or 1× A100
    RAM: 32GB+ system RAM

    30B total (3B active). The 3B active params make inference fast on consumer hardware.

    API: ollama run qwen3:30b — fits on RTX 4090 (24GB)
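    For the private-coding-assistant example, Ollama also exposes a chat endpoint alongside `ollama run`. A minimal payload builder, assuming the `qwen3:30b` tag from the command above (the system prompt and helper name are illustrative):

```python
def build_chat_payload(code: str, model: str = "qwen3:30b") -> dict:
    """Ollama /api/chat payload asking the local model to review a snippet;
    the model tag matches the `ollama run` command above."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a concise code reviewer."},
            {"role": "user", "content": f"Review this function:\n{code}"},
        ],
        "stream": False,
    }

payload = build_chat_payload("def add(a, b): return a - b")
print(payload["model"])  # → qwen3:30b
# Send with: requests.post("http://localhost:11434/api/chat", json=payload)
```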

  • Gemma 3 27B · Open

    Google · 128K tokens · self-host

    Best for: On-device/edge deployment, multimodal at small scale

    How: ollama run gemma3:27b. Fits on RTX 3090/4090. Good multimodal + tool use at small size.

    Example: Run on a dev server to process screenshots and generate bug reports.

    MMLU 75.6% · HumanEval 78.0%
    compact · multimodal · runs on single GPU · function calling
    Hardware to self-host
    VRAM: 18GB (quantized) / 54GB (FP16)
    GPU: RTX 3090/4090 24GB or 1× A100 40GB
    RAM: 32GB+ system RAM

    27B dense. Fits on a single high-end consumer GPU with quantization.

    API: Ollama, vLLM, Hugging Face. Also on Vertex AI.
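    When served with vLLM, the model is reachable through an OpenAI-compatible chat API, and screenshots travel as base64 data URLs inside the message content. A sketch of the bug-report example (the prompt text and helper name are illustrative):

```python
import base64

def screenshot_message(png_bytes: bytes) -> list[dict]:
    """OpenAI-style multimodal chat message for a vLLM server hosting Gemma 3;
    the image is embedded as a base64 data URL per the OpenAI chat convention."""
    data_url = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the UI bug in this screenshot as a bug report."},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }]

messages = screenshot_message(b"fake png bytes")
print(messages[0]["content"][1]["image_url"]["url"][:22])  # → data:image/png;base64,
# POST messages to the server's OpenAI-compatible /v1/chat/completions endpoint.
```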

  • Phi-4 · Open

    Microsoft · 16K tokens · self-host

    Best for: Edge deployment, STEM tasks, embedded AI in products

    How: ollama run phi4. MIT license — embed in commercial products freely.

    Example: Embed in a CI pipeline to validate config files and Terraform plans.

    GPQA Diamond 56.2% · MATH 80.4%
    14B params · STEM reasoning · MIT license · runs on laptop
    Hardware to self-host
    VRAM: 9GB (quantized) / 28GB (FP16)
    GPU: Any 8GB+ GPU (RTX 3060, laptop 4050, etc.)
    RAM: 16GB system RAM

    14B dense. Runs locally on most developer laptops with quantization.

    API: Ollama, Hugging Face, Azure AI
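    Phi-4's 16K-token window is the main constraint when feeding it large Terraform plans in CI. A rough guard using the common ~4-characters-per-token heuristic (an approximation, not Phi-4's actual tokenizer; leave extra headroom for the prompt and the reply):

```python
def fit_context(text: str, max_tokens: int = 16_000, chars_per_token: int = 4) -> str:
    """Truncate input to Phi-4's 16K-token window using the rough
    ~4 chars/token heuristic (an approximation, not Phi-4's tokenizer)."""
    budget = max_tokens * chars_per_token
    return text if len(text) <= budget else text[:budget]

plan = 'resource "aws_s3_bucket" example {}\n' * 5000  # stand-in for a large Terraform plan
print(len(fit_context(plan)))  # → 64000
```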