AI Models

52 models · 0 new in 60d

  • Codestral 25.01 · Open

    Mistral · 256K tokens · self-host

    Best for: Code completion, inline suggestions, editor integration

    How: Supports FIM for inline completion. Integrate with any editor via LSP or Continue.dev.

    Example: Deploy as your team's FIM-capable completion server behind an LSP proxy.

    HumanEval 91.0%
    code completion · FIM (fill-in-middle) · 80+ languages
    Hardware to self-host
    VRAM: 16GB (quantized) / 45GB (FP16)
    GPU: RTX 4090 24GB or 1× A100 40GB
    RAM: 32GB+ system RAM

    22B dense. Fits on a single consumer GPU with quantization.

    API: codestral.mistral.ai — dedicated code endpoint
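The FIM integration described above can be sketched as a raw HTTP call. A minimal sketch, assuming Codestral's dedicated endpoint exposes a `/v1/fim/completions` route and a `codestral-latest` model id; verify both against Mistral's current API docs before relying on them.

```python
import json
import urllib.request

# Assumed FIM endpoint on the dedicated code host; check Mistral's docs.
CODESTRAL_FIM_URL = "https://codestral.mistral.ai/v1/fim/completions"

def build_fim_payload(prompt: str, suffix: str, max_tokens: int = 64) -> dict:
    """Fill-in-middle request: the model generates the code that belongs
    between `prompt` (text before the cursor) and `suffix` (text after)."""
    return {
        "model": "codestral-latest",  # assumed model id
        "prompt": prompt,
        "suffix": suffix,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # deterministic output suits editor completion
    }

# Complete a function body given the code on either side of the cursor.
payload = build_fim_payload(
    prompt="def is_even(n: int) -> bool:\n    return ",
    suffix="\n\nprint(is_even(4))",
)

req = urllib.request.Request(
    CODESTRAL_FIM_URL,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # send only with a real API key
```

An LSP proxy or Continue.dev would build the same payload from the editor's cursor position, splitting the buffer into `prompt` and `suffix`.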

  • Qwen 2.5 Coder 32B · Open

    Alibaba · 128K tokens · self-host

    Best for: Private code completion, self-hosted Copilot replacement

    How: Run ollama run qwen2.5-coder:32b. Plug into Continue.dev or other Copilot alternatives.

    Example: Set up as your team's private code completion backend — zero data leaves your infra.

    HumanEval 92.7% · LiveCodeBench 48.5%
    code completion · code generation · Apache 2.0
    Hardware to self-host
    VRAM: 20GB (quantized) / 64GB (FP16)
    GPU: RTX 4090 24GB or 1× A100 40GB
    RAM: 32GB+ system RAM

    32B dense. Fits on a single consumer GPU with 4-bit quantization.

    API: Ollama, vLLM, or hosted on Together/Fireworks
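A private completion backend like the one described can be queried over Ollama's local HTTP API. A minimal sketch, assuming Ollama's default port 11434 and its `/api/generate` route; the field names below follow that API but should be checked against the current Ollama reference.

```python
import json
import urllib.request

# Default local Ollama endpoint; nothing leaves your infrastructure.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_completion_payload(prompt: str,
                             model: str = "qwen2.5-coder:32b") -> dict:
    """Build a non-streaming completion request for a local Ollama server."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"temperature": 0.0},
    }

payload = build_completion_payload(
    "# Python function that reverses a string\ndef reverse(s):\n"
)

req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Requires a running Ollama with the model pulled:
# completion = json.loads(urllib.request.urlopen(req).read())["response"]
```

Pointing Continue.dev (or a similar plugin) at this server gives every editor on the team the same private completion backend.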