AI Hardware

26 GPUs & accelerators · self-host price guide · cloud rates

What to buy for each model size

3B-8B (Llama 3.1 8B, Gemma 3n E4B, Phi-4)

Raspberry Pi 5 — Slow but works
$80
Jetson Orin Nano — Fast edge inference
$249
Apple M4 MacBook Air — Silent + fast
$1099
Cloud: T4 — $250/mo if 24/7
$0.35/hr

13B-34B (Qwen 2.5 Coder 32B, Llama 2 13B)

RTX 3060 12GB — 4-bit only, slow but cheap
$280
RTX 3090 24GB (used) — 🏆 Best value. Full speed.
$700
RTX 4090 24GB — Fastest consumer option
$1800
Cloud: A10G — $720/mo if 24/7
$1.00/hr

70B (Llama 3 70B, Qwen 2.5 72B)

2× RTX 3090 (used) — 48GB total, 4-bit at good speed
$1400
RTX 5090 32GB — 70B at ~3-bit on a single card
$2000
M3 Ultra Mac Studio 192GB — Full FP16 on unified memory
$5999
Cloud: A100 80GB — 70B fits at 8-bit. $1150/mo if 24/7
$1.60/hr

235B MoE (Qwen 3 235B)

M3 Ultra Mac Studio 192GB — Quantized fits
$5999
2× A100 80GB — Standard self-host setup
~$24k used
1× MI300X 192GB — Fits on a single card
$15k
Cloud: H100 80GB ×2 — $3600/mo if 24/7
$5/hr

400B+ (Llama 3.1 405B, DeepSeek V3 671B MoE)

2× B200 192GB — 405B at 4-bit with headroom; DeepSeek V3 fits too
~$90k
2× H200 141GB — 405B at 4-bit (282GB total)
~$80k
2× MI300X 192GB — 🏆 Cheapest route to 405B at 4-bit
~$30k
Cloud: H200 ×2 — $5100/mo if 24/7
$3.50/hr each
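The "$/mo if 24/7" figures above are just the hourly rate times roughly 730 hours in a month. A minimal sketch (rates taken from the tables above; the guide rounds slightly):

```python
# Monthly cloud cost: hourly rate x ~730 hours/month (24 * 365 / 12).
HOURS_PER_MONTH = 730

def monthly_cost(rate_per_hr: float, utilization: float = 1.0) -> float:
    """Dollars per month at the given utilization (1.0 = running 24/7)."""
    return rate_per_hr * HOURS_PER_MONTH * utilization

# Rates from the quick guide above.
for name, rate in [("T4", 0.35), ("A10G", 1.00), ("A100 80GB", 1.60), ("H200", 3.50)]:
    print(f"{name}: ${monthly_cost(rate):,.0f}/mo if 24/7")
```

Running a GPU only during working hours (~25% utilization) cuts these by 4x, which is usually where renting beats buying.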

57 hardware items

  • Google TPU v5e

    Google · 2023 · 819 GB/s

    16 GB
    VRAM
rent only
$1.20/hr cloud

    Fits: Up to 70B in pods (multi-chip)

Cloud rental: $1.20/hr (GCP) · only available as a service
    Memory BW: 819 GB/s

GCP-only. Cheap and fast for JAX/TF workloads. LLM serving via Google's JetStream or vLLM's TPU backend.

Raspberry Pi 5 (8GB) — Budget king

    Other · 2023 · 17 GB/s

    8 GB
    VRAM
    $80

Fits: 3B models at 1-3 tok/s (Phi-3 mini, Gemma 3n E4B)

    Price (new): $80
    Memory BW: 17 GB/s

    CPU inference only via llama.cpp. Fine for tiny models + learning.
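The 17 GB/s figure is why the Pi tops out at a few tokens per second: LLM decoding is memory-bandwidth-bound, because each generated token streams (roughly) the whole model through memory. A back-of-envelope ceiling, assuming ~2 GB of weights for a 3B model at 4-bit:

```python
# Decode-speed ceiling: tokens/sec <= memory bandwidth / model size,
# since each token reads roughly every weight once. Real throughput
# lands well below this bound (compute limits, cache misses, overhead).
def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

MODEL_GB = 2.0  # ~3B parameters at 4-bit (assumed size)
print(f"Pi 5 (17 GB/s):      {decode_ceiling_tok_s(17, MODEL_GB):.1f} tok/s max")
print(f"Apple M4 (120 GB/s): {decode_ceiling_tok_s(120, MODEL_GB):.0f} tok/s max")
print(f"RTX 3090 (936 GB/s): {decode_ceiling_tok_s(936, MODEL_GB):.0f} tok/s max")
```

The gap between the Pi's ~8 tok/s ceiling and the observed 1-3 tok/s is the CPU compute bottleneck.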

RTX 3060 (12GB) — Budget king

    NVIDIA · 2021 · 13 TFLOPS · 360 GB/s

    12 GB
    GDDR6
    $220 used

    Fits: 7-8B models at 4-bit (Llama 3.1 8B, Gemma 2 9B)

    Price (new): $280
    Price (used): $220
    FP16 compute: 13 TFLOPS
    Memory BW: 360 GB/s
    Power: 170W TDP

    Cheapest entry to CUDA AI. Slow but works. 12GB is the key feature.
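The "Fits" lines throughout this list follow a simple rule of thumb: weight memory ≈ parameter count × bytes per parameter, plus headroom for KV cache and activations. A sketch (the 20% overhead factor is my assumption, not from this guide):

```python
# Rough VRAM requirement: parameters (billions) x bytes/param, plus ~20%
# headroom for KV cache, activations, and framework overhead (assumed).
BYTES_PER_PARAM = {"fp16": 2.0, "8bit": 1.0, "4bit": 0.5}

def vram_needed_gb(params_b: float, quant: str, overhead: float = 1.2) -> float:
    return params_b * BYTES_PER_PARAM[quant] * overhead

print(f"8B at 4-bit:  {vram_needed_gb(8, '4bit'):.1f} GB")   # fits this 12 GB card
print(f"70B at 4-bit: {vram_needed_gb(70, '4bit'):.1f} GB")  # wants 2x 24 GB cards
print(f"70B at fp16:  {vram_needed_gb(70, 'fp16'):.0f} GB")  # unified-memory territory
```

Long contexts grow the KV cache well past 20%, so treat tight fits as optimistic.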

  • NVIDIA Jetson Orin Nano (8GB)

    NVIDIA · 2023 · 20 TFLOPS · 68 GB/s

    8 GB
    VRAM
    $249

Fits: 3-7B quantized (Gemma 3n E4B, Phi-4 int4)

    Price (new): $249
    FP16 compute: 20 TFLOPS
    Memory BW: 68 GB/s
    Power: 15W TDP

    Best SBC option. CUDA support — same code as desktop/server.

  • RTX A4000 (16GB)

    NVIDIA · 2021 · 19 TFLOPS · 448 GB/s

    16 GB
    GDDR6 ECC
    $650 used
    $0.17/hr live

Fits: 13B at 8-bit / 30B 4-bit (tight)

    Price (new): $1,100
    Price (used): $650
    Cloud on-demand: $0.17/hr · RunPod (live)
    Cloud spot: $0.09/hr · RunPod (live)
    FP16 compute: 19 TFLOPS
    Memory BW: 448 GB/s
    Power: 140W TDP

    Single-slot, low-power. Good for quiet homelab servers. ECC VRAM is nice for long runs.

RTX 3090 (24GB, used) — Best value

    NVIDIA · 2020 · 36 TFLOPS · 936 GB/s

    24 GB
    GDDR6X
    $700 used
    $0.22/hr live

Fits: 8B FP16 / 13-14B 8-bit / 30-34B 4-bit / 70B 2-bit

    Price (used): $700
    Cloud on-demand: $0.22/hr · RunPod (live)
    Cloud spot: $0.11/hr · RunPod (live)
    FP16 compute: 36 TFLOPS
    Memory BW: 936 GB/s
    Power: 350W TDP

    The undisputed used-market champion. 24GB VRAM at $700 beats almost everything new under $2000.
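Whether $700 up front beats $0.22/hr rented comes down to expected hours of use. A minimal breakeven sketch (hardware cost only; electricity and resale value are ignored):

```python
def breakeven_hours(purchase_price: float, cloud_rate_per_hr: float) -> float:
    """Hours of use after which buying is cheaper than renting."""
    return purchase_price / cloud_rate_per_hr

hours = breakeven_hours(700, 0.22)  # used RTX 3090 vs RunPod on-demand
print(f"Breakeven: {hours:,.0f} hours, ~{hours / 730:.1f} months of 24/7 use")
```

Against the $0.11/hr spot rate the breakeven roughly doubles to about nine months of 24/7 use, which is why heavy daily users buy and occasional users rent.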

  • RTX 4070 Ti Super (16GB)

    NVIDIA · 2024 · 45 TFLOPS · 672 GB/s

    16 GB
    GDDR6X
    $800

Fits: 13-14B at 8-bit / 30B 4-bit (tight)

    Price (new): $800
    FP16 compute: 45 TFLOPS
    Memory BW: 672 GB/s
    Power: 285W TDP

    Good perf/W. 16GB is limiting for bigger models — 3090 used is better value.

  • AMD Radeon RX 7900 XTX (24GB)

    AMD · 2022 · 61 TFLOPS · 960 GB/s

    24 GB
    GDDR6
    $900

Fits: 8B FP16 / 13-14B 8-bit / 30-34B 4-bit

    Price (new): $900
    FP16 compute: 61 TFLOPS
    Memory BW: 960 GB/s
    Power: 355W TDP

    24GB VRAM cheaper than NVIDIA. ROCm support works with llama.cpp + vLLM, but ecosystem is smaller.

  • NVIDIA T4 (16GB)

    NVIDIA · 2018 · 65 TFLOPS · 320 GB/s

    16 GB
    GDDR6
    $900 used
$0.35/hr cloud

    Fits: 7-13B 4-bit

    Price (used): $900
    Cloud rental: $0.35/hr (AWS g4dn)
    FP16 compute: 65 TFLOPS
    Memory BW: 320 GB/s
    Power: 70W TDP

    Very cheap to rent. Too slow for production serving but fine for batch inference.

  • Apple M4 (MacBook Air)

    Apple · 2025 · 120 GB/s

    16 GB
    LPDDR5X (unified)
    $1,099

    Fits: 7B-14B quantized models at good speed

    Price (new): $1,099
    Memory BW: 120 GB/s

    Unified memory is great for LLMs. Use llama.cpp Metal backend.

RTX 4090 (24GB) — Sweet spot

    NVIDIA · 2022 · 82 TFLOPS · 1008 GB/s

    24 GB
    GDDR6X
    $1,400 used
    $0.34/hr live

Fits: 8B FP16 / 13-14B 8-bit / 30-34B 4-bit / 70B 2-bit

    Price (new): $1,800
    Price (used): $1,400
    Cloud on-demand: $0.34/hr · RunPod (live)
    Cloud spot: $0.20/hr · RunPod (live)
    FP16 compute: 82 TFLOPS
    Memory BW: 1008 GB/s
    Power: 450W TDP

    Best single-card consumer GPU. 2x faster than 3090 at same VRAM.

RTX 5090 (32GB) — Flagship

    NVIDIA · 2025 · 104 TFLOPS · 1792 GB/s

    32 GB
    GDDR7
    $2,000
    $0.69/hr live

Fits: 13-14B FP16 / 30-34B 4-bit / 70B at ~3-bit

    Price (new): $2,000
    Cloud on-demand: $0.69/hr · RunPod (live)
    FP16 compute: 104 TFLOPS
    Memory BW: 1792 GB/s
    Power: 575W TDP

Current flagship consumer card. 32GB runs Llama 3 70B on a single card with aggressive ~3-bit quants.

Apple M4 Pro (64GB) — Sweet spot

    Apple · 2025 · 273 GB/s

    64 GB
    LPDDR5X (unified)
    $2,499

    Fits: Up to ~45B quantized (Qwen 2.5 Coder 32B, Gemma 3 27B)

    Price (new): $2,499
    Memory BW: 273 GB/s

    Best dev laptop for local AI. Silent + no cooling issues.

  • NVIDIA L4 (24GB)

    NVIDIA · 2023 · 121 TFLOPS · 300 GB/s

    24 GB
    GDDR6
    $2,500
    $0.44/hr live

Fits: 13-14B at 8-bit / 30-34B 4-bit

    Price (new): $2,500
    Cloud on-demand: $0.44/hr · RunPod (live)
    FP16 compute: 121 TFLOPS
    Memory BW: 300 GB/s
    Power: 72W TDP

    Modern replacement for T4. Single-slot, low-power — great for density.

  • RTX A6000 (48GB)

    NVIDIA · 2020 · 39 TFLOPS · 768 GB/s

    48 GB
    GDDR6 ECC
    $3,500 used
    $0.33/hr live

Fits: 30B at 8-bit / 70B 4-bit / 120B 2-bit

    Price (new): $4,500
    Price (used): $3,500
    Cloud on-demand: $0.33/hr · RunPod (live)
    Cloud spot: $0.25/hr · RunPod (live)
    FP16 compute: 39 TFLOPS
    Memory BW: 768 GB/s
    Power: 300W TDP

    Best 'fits in a desktop' workstation card. 48GB VRAM without datacenter cost.

  • Apple M3 Ultra (192GB)

    Apple · 2025 · 800 GB/s

    192 GB
    LPDDR5 (unified)
    $5,999

    Fits: Up to 235B MoE (Qwen 3 235B) or 70B dense models

    Price (new): $5,999
    Memory BW: 800 GB/s

    Mac Studio. Runs models that require multi-GPU on NVIDIA, on a single box.

  • RTX 6000 Ada (48GB)

    NVIDIA · 2022 · 91 TFLOPS · 960 GB/s

    48 GB
    GDDR6 ECC
    $6,800
    $0.74/hr live

Fits: 30B at 8-bit / 70B 4-bit

    Price (new): $6,800
    Cloud on-demand: $0.74/hr · RunPod (live)
    Cloud spot: $0.40/hr · RunPod (live)
    FP16 compute: 91 TFLOPS
    Memory BW: 960 GB/s
    Power: 300W TDP

Ada generation of the A6000 and about 2x faster. Worth it only if you need the speed and have the budget.

NVIDIA A100 40GB — Sweet spot

    NVIDIA · 2020 · 312 TFLOPS · 1555 GB/s

    40 GB
    HBM2e
    $8,000 used
$1.10/hr cloud

Fits: 30B at 8-bit / 70B 4-bit (tight)

    Price (used): $8,000
    Cloud rental: $1.10/hr (RunPod) / $3.06/hr (AWS)
    FP16 compute: 312 TFLOPS
    Memory BW: 1555 GB/s
    Power: 400W TDP

    The de-facto standard for serious AI training + inference. Huge ecosystem. Used price dropping fast.

  • NVIDIA A100 80GB

    NVIDIA · 2021 · 312 TFLOPS · 2039 GB/s

    80 GB
    HBM2e
    $12,000 used
$1.60/hr cloud

Fits: 70B at 8-bit / 120B+ at 4-bit

    Price (used): $12,000
    Cloud rental: $1.60/hr (RunPod) / $4.10/hr (AWS p4de)
    FP16 compute: 312 TFLOPS
    Memory BW: 2039 GB/s
    Power: 400W TDP

The most-rented ML GPU. 80GB fits Llama 3 70B at 8-bit on a single card.

AMD MI300X (192GB) — Best value

    AMD · 2023 · 1307 TFLOPS · 5300 GB/s

    192 GB
    HBM3
    $15,000
    $0.50/hr live

Fits: 70B FP16 with headroom / 405B at 4-bit across two cards

    Price (new): $15,000
    Cloud on-demand: $0.50/hr · RunPod (live)
    FP16 compute: 1307 TFLOPS
    Memory BW: 5300 GB/s
    Power: 750W TDP

    More VRAM than H100 at half the price. ROCm is decent now — works with vLLM, PyTorch, llama.cpp.

NVIDIA H100 (80GB) — Flagship

    NVIDIA · 2022 · 989 TFLOPS · 3350 GB/s

    80 GB
    HBM3
    $30,000
$2.50/hr cloud

Fits: 70B at 8-bit / 120B+ at 4-bit / 405B across 3+ cards

    Price (new): $30,000
    Cloud rental: $2.50/hr (RunPod) / $8.00/hr (AWS p5)
    FP16 compute: 989 TFLOPS
    Memory BW: 3350 GB/s
    Power: 700W TDP

    3x faster than A100 for modern transformer workloads. FP8 support doubles it again.

  • NVIDIA H200 (141GB)

    NVIDIA · 2024 · 989 TFLOPS · 4800 GB/s

    141 GB
    HBM3e
    $40,000
$3.50/hr cloud

Fits: 70B at 8-bit with massive batch and context; 405B at 4-bit across two cards

    Price (new): $40,000
    Cloud rental: $3.50/hr (RunPod)
    FP16 compute: 989 TFLOPS
    Memory BW: 4800 GB/s
    Power: 700W TDP

    Same compute as H100 but 76% more VRAM + 43% more bandwidth. Huge win for long context.

NVIDIA B200 (192GB) — Flagship

    NVIDIA · 2024 · 2250 TFLOPS · 8000 GB/s

    192 GB
    HBM3e
    $45,000
$5.00/hr cloud

Fits: 70B FP16 with big headroom; 405B at 4-bit needs a pair (a single card falls ~10 GB short)

    Price (new): $45,000
    Cloud rental: $5.00/hr (limited availability)
    FP16 compute: 2250 TFLOPS
    Memory BW: 8000 GB/s
    Power: 1000W TDP

    Blackwell gen. 2.5x H100 perf. FP4 native. Current king for new deployments.

  • NVIDIA GB200 (NVL72)

    NVIDIA · 2024 · 162000 TFLOPS · 576000 GB/s

    13824 GB
    HBM3e
    $3,000,000

    Fits: Frontier training — GPT-5-scale models

    Price (new): $3,000,000
    FP16 compute: 162000 TFLOPS
    Memory BW: 576000 GB/s
    Power: 120000W TDP

    Full rack: 72 Blackwell GPUs + 36 Grace CPUs. Only relevant if you're training a frontier model.

  • NVIDIA A10G (24GB)

    NVIDIA · 2021 · 125 TFLOPS · 600 GB/s

    24 GB
    GDDR6
    rent only
$1.00/hr cloud

Fits: 13-14B at 8-bit / 30-34B 4-bit

    Cloud rental: $1.00/hr (AWS g5)
    FP16 compute: 125 TFLOPS
    Memory BW: 600 GB/s
    Power: 150W TDP

    AWS's workhorse inference GPU. 4x A10G matches a single A100 40GB for many workloads.

  • Google TPU v5p

    Google · 2024 · 2765 GB/s

    95 GB
    VRAM
    rent only
$4.20/hr cloud

    Fits: Frontier training (Gemini-scale)

    Cloud rental: $4.20/hr (GCP)
    Memory BW: 2765 GB/s

    Top-tier GCP training chip. Compares to H100-H200 for transformer workloads.

  • A100 PCIe

    NVIDIA · 2026

    80 GB
    VRAM
    rent only

    Fits: 80GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • A100 SXM 40GB

    NVIDIA · 2026

    40 GB
    VRAM
    rent only

    Fits: 40GB VRAM — 13B-70B quantized

    Auto-discovered via RunPod live pricing.

  • A100 SXM

    NVIDIA · 2026

    80 GB
    VRAM
    rent only

    Fits: 80GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • B300

    NVIDIA · 2026

    288 GB
    VRAM
    rent only

    Fits: 288GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • RTX 3070

    NVIDIA · 2026

    8 GB
    VRAM
    rent only

    Fits: 8GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 3080

    NVIDIA · 2026

    10 GB
    VRAM
    rent only

    Fits: 10GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 3080 Ti

    NVIDIA · 2026

    12 GB
    VRAM
    rent only

    Fits: 12GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 3090 Ti

    NVIDIA · 2026

    24 GB
    VRAM
    rent only

    Fits: 24GB VRAM — 13B-70B quantized

    Auto-discovered via RunPod live pricing.

  • RTX 4080

    NVIDIA · 2026

    16 GB
    VRAM
    rent only

    Fits: 16GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 4080 SUPER

    NVIDIA · 2026

    16 GB
    VRAM
    rent only

    Fits: 16GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 5080

    NVIDIA · 2026

    16 GB
    VRAM
    rent only

    Fits: 16GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • H100 SXM

    NVIDIA · 2026

    80 GB
    VRAM
    rent only

    Fits: 80GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • H100 NVL

    NVIDIA · 2026

    94 GB
    VRAM
    rent only

    Fits: 94GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • H100 PCIe

    NVIDIA · 2026

    80 GB
    VRAM
    rent only

    Fits: 80GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • H200 SXM

    NVIDIA · 2026

    141 GB
    VRAM
    rent only

    Fits: 141GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • NVIDIA H200 NVL

    NVIDIA · 2026

    143 GB
    VRAM
    rent only

    Fits: 143GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • L40

    NVIDIA · 2026

    48 GB
    VRAM
    rent only

    Fits: 48GB VRAM — 13B-70B quantized

    Auto-discovered via RunPod live pricing.

  • L40S

    NVIDIA · 2026

    48 GB
    VRAM
    rent only

    Fits: 48GB VRAM — 13B-70B quantized

    Auto-discovered via RunPod live pricing.

  • RTX 2000 Ada

    NVIDIA · 2026

    16 GB
    VRAM
    rent only

    Fits: 16GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 4000 Ada

    NVIDIA · 2026

    20 GB
    VRAM
    rent only

    Fits: 20GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 4000 Ada SFF

    NVIDIA · 2026

    20 GB
    VRAM
    rent only

    Fits: 20GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 5000 Ada

    NVIDIA · 2026

    32 GB
    VRAM
    rent only

    Fits: 32GB VRAM — 13B-70B quantized

    Auto-discovered via RunPod live pricing.

  • RTX A2000

    NVIDIA · 2026

    6 GB
    VRAM
    rent only

    Fits: 6GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX A4500

    NVIDIA · 2026

    20 GB
    VRAM
    rent only

    Fits: 20GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX A5000

    NVIDIA · 2026

    24 GB
    VRAM
    rent only

    Fits: 24GB VRAM — 13B-70B quantized

    Auto-discovered via RunPod live pricing.

  • RTX PRO 4500

    NVIDIA · 2026

    32 GB
    VRAM
    rent only

    Fits: 32GB VRAM — 13B-70B quantized

    Auto-discovered via RunPod live pricing.

  • RTX PRO 6000 MaxQ

    NVIDIA · 2026

    96 GB
    VRAM
    rent only

    Fits: 96GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • RTX PRO 6000

    NVIDIA · 2026

    96 GB
    VRAM
    rent only

    Fits: 96GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • RTX PRO 6000 WK

    NVIDIA · 2026

    96 GB
    VRAM
    rent only

    Fits: 96GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • Tesla V100

    NVIDIA · 2026

    16 GB
    VRAM
    rent only

    Fits: 16GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • V100 SXM2

    NVIDIA · 2026

    16 GB
    VRAM
    rent only

    Fits: 16GB VRAM — small models

    Auto-discovered via RunPod live pricing.