AI Hardware
26 GPUs & accelerators · self-host price guide · cloud rates
What to buy for each model size
3B-8B (Llama 3.1 8B, Gemma 3n E4B, Phi-4)
13B-34B (Qwen 2.5 Coder 32B, Llama 2 13B)
70B (Llama 3 70B, Qwen 2.5 72B)
235B MoE (Qwen 3 235B)
400B+ (Llama 3 405B, DeepSeek V3 671B MoE)
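The tiers above follow from simple arithmetic: weights take parameters × bytes per parameter, plus headroom for KV cache and runtime. A rough sketch of that rule of thumb (the flat 2 GB pad and the per-quant byte counts are assumptions, not measurements):

```python
# Rough VRAM estimate for dense-model inference: weights + flat KV/runtime pad.
# Rule of thumb only -- real usage depends on context length and runtime.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def vram_needed_gb(params_b: float, quant: str = "int4",
                   kv_overhead_gb: float = 2.0) -> float:
    """Approximate GB needed to run a dense model of `params_b` billion
    parameters at the given quantization, plus a flat KV-cache pad."""
    weights_gb = params_b * BYTES_PER_PARAM[quant]
    return weights_gb + kv_overhead_gb

for model, size in [("Llama 3.1 8B", 8), ("Qwen 2.5 Coder 32B", 32),
                    ("Llama 3 70B", 70), ("Llama 3 405B", 405)]:
    print(f"{model}: ~{vram_needed_gb(size):.0f} GB at 4-bit, "
          f"~{vram_needed_gb(size, 'fp16'):.0f} GB at FP16")
```

This is why a 24GB card handles 30-34B models only at 4-bit, and why a 70B model at FP16 (~142 GB) needs multiple 80GB cards.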
57 hardware items (26 curated with full specs, plus 31 auto-discovered via RunPod live pricing)
- Google TPU v5e
Google · 2023 · 819 GB/s
16 GB HBM · rent only · $1.20/hr cloud · Fits: up to 70B in pods (multi-chip)
Cloud rental: $1.20/hr (GCP; only available as a service) · Memory BW: 819 GB/s. GCP-only. Cheap and fast for JAX/TF workloads; works with vLLM via JetStream.
- Raspberry Pi 5 (8GB) · Budget king
Other · 2023 · 17 GB/s
8 GB LPDDR4X · $80 · Fits: 3B models at 1-3 tok/s (Phi-3 mini, Gemma 3n E4B)
Price (new): $80 · Memory BW: 17 GB/s. CPU-only inference via llama.cpp. Fine for tiny models and learning.
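The tok/s figures in this guide track memory bandwidth: generating each token streams the full set of active weights once, so bandwidth divided by model size in memory gives a hard ceiling. Real throughput lands below it, especially on CPUs like the Pi 5, where compute is the bottleneck. A sketch using bandwidth numbers from this list:

```python
# Bandwidth-bound ceiling on decode speed for a dense model:
# tok/s <= memory bandwidth / bytes of weights read per token.
# Actual throughput is lower (compute limits, cache behavior, batching).

def max_tok_per_s(bandwidth_gb_s: float, params_b: float,
                  bytes_per_param: float = 0.5) -> float:
    """Upper bound on tokens/sec; default assumes 4-bit weights."""
    model_gb = params_b * bytes_per_param
    return bandwidth_gb_s / model_gb

print(f"Pi 5  (17 GB/s),  3B @ 4-bit: <= {max_tok_per_s(17, 3):.0f} tok/s")
print(f"3090 (936 GB/s),  8B @ 4-bit: <= {max_tok_per_s(936, 8):.0f} tok/s")
print(f"3090 (936 GB/s), 34B @ 4-bit: <= {max_tok_per_s(936, 34):.0f} tok/s")
```

The Pi's observed 1-3 tok/s sits well under its ~11 tok/s bandwidth ceiling because its CPU, not its memory, is the limit; on GPUs the ceiling is usually the better predictor.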
- RTX 3060 (12GB) · Budget king
NVIDIA · 2021 · 13 TFLOPS · 360 GB/s
12 GB GDDR6 · $220 used · Fits: 7-8B models at 4-bit (Llama 3.1 8B, Gemma 2 9B)
Price (new): $280 · Price (used): $220 · FP16 compute: 13 TFLOPS · Memory BW: 360 GB/s · Power: 170W TDP. Cheapest entry to CUDA AI. Slow but it works, and the 12GB is the key feature.
- NVIDIA Jetson Orin Nano (8GB)
NVIDIA · 2023 · 20 TFLOPS · 68 GB/s
8 GB LPDDR5 (unified) · $249 · Fits: 3-7B quantized (Gemma 3n E4B, Phi-4 int4)
Price (new): $249 · FP16 compute: 20 TFLOPS · Memory BW: 68 GB/s · Power: 15W TDP. Best SBC option, with CUDA support: the same code as desktop/server.
- RTX A4000 (16GB)
NVIDIA · 2021 · 19 TFLOPS · 448 GB/s
16 GB GDDR6 ECC · $650 used · $0.17/hr live · Fits: 7B FP16 / 30B 4-bit
Price (new): $1,100 · Price (used): $650 · Cloud on-demand: $0.17/hr · Cloud spot: $0.09/hr (RunPod, live) · FP16 compute: 19 TFLOPS · Memory BW: 448 GB/s · Power: 140W TDP. Single-slot and low-power; good for quiet homelab servers, and ECC VRAM helps on long runs.
- RTX 3090 (24GB, used) · Best value
NVIDIA · 2020 · 36 TFLOPS · 936 GB/s
24 GB GDDR6X · $700 used · $0.22/hr live · Fits: 8B FP16 / 30-34B 4-bit / 70B 2-bit
Price (used): $700 · Cloud on-demand: $0.22/hr · Cloud spot: $0.11/hr (RunPod, live) · FP16 compute: 36 TFLOPS · Memory BW: 936 GB/s · Power: 350W TDP. The used-market champion: 24GB of VRAM at $700 beats almost everything new under $2,000.
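With both a used price and a live cloud rate on the same card, you can estimate when buying beats renting: purchase price divided by the rental rate net of home electricity. A sketch (the $0.15/kWh rate is an assumption, and idle time, resale value, and the rest of the host machine are ignored):

```python
# Break-even point for buying a GPU vs renting the same card in the cloud.
# Prices from the listing above; electricity rate is an assumed example.

def break_even_hours(purchase_usd: float, cloud_usd_hr: float,
                     tdp_w: float = 0.0, elec_usd_kwh: float = 0.15) -> float:
    """Hours of use at which owning becomes cheaper than renting."""
    local_usd_hr = (tdp_w / 1000.0) * elec_usd_kwh  # power cost at home
    return purchase_usd / (cloud_usd_hr - local_usd_hr)

# Used RTX 3090: $700 to buy vs $0.22/hr on-demand, 350W under load
hours = break_even_hours(700, 0.22, tdp_w=350)
print(f"RTX 3090 breaks even after ~{hours:.0f} hours of use")
```

Under these assumptions that is roughly 4,200 hours, about six months of continuous use; for occasional workloads, renting a 3090-class card usually wins.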
- RTX 4070 Ti Super (16GB)
NVIDIA · 2024 · 45 TFLOPS · 672 GB/s
16 GB GDDR6X · $800 · Fits: 7B FP16 / 30B 4-bit
Price (new): $800 · FP16 compute: 45 TFLOPS · Memory BW: 672 GB/s · Power: 285W TDP. Good perf/W, but 16GB is limiting for bigger models; a used 3090 is better value.
- AMD Radeon RX 7900 XTX (24GB)
AMD · 2022 · 61 TFLOPS · 960 GB/s
24 GB GDDR6 · $900 · Fits: 8B FP16 / 30-34B 4-bit
Price (new): $900 · FP16 compute: 61 TFLOPS · Memory BW: 960 GB/s · Power: 355W TDP. 24GB of VRAM for less than NVIDIA charges. ROCm works with llama.cpp and vLLM, but the ecosystem is smaller.
- NVIDIA T4 (16GB)
NVIDIA · 2018 · 65 TFLOPS · 320 GB/s
16 GB GDDR6 · $900 used · $0.35/hr cloud · Fits: 7-13B 4-bit
Price (used): $900 · Cloud rental: $0.35/hr (AWS g4dn) · FP16 compute: 65 TFLOPS · Memory BW: 320 GB/s · Power: 70W TDP. Very cheap to rent; too slow for production serving but fine for batch inference.
- Apple M4 (MacBook Air)
Apple · 2025 · 120 GB/s
16 GB LPDDR5X (unified) · $1,099 · Fits: 7B-14B quantized models at good speed
Price (new): $1,099 · Memory BW: 120 GB/s. Unified memory is great for LLMs; use the llama.cpp Metal backend.
- RTX 4090 (24GB) · Sweet spot
NVIDIA · 2022 · 82 TFLOPS · 1008 GB/s
24 GB GDDR6X · $1,400 used · $0.34/hr live · Fits: 8B FP16 / 30-34B 4-bit / 70B 2-bit
Price (new): $1,800 · Price (used): $1,400 · Cloud on-demand: $0.34/hr · Cloud spot: $0.20/hr (RunPod, live) · FP16 compute: 82 TFLOPS · Memory BW: 1008 GB/s · Power: 450W TDP. Best single-card consumer GPU; roughly 2x a 3090 at the same VRAM.
- RTX 5090 (32GB) · Flagship
NVIDIA · 2025 · 104 TFLOPS · 1792 GB/s
32 GB GDDR7 · $2,000 · $0.69/hr live · Fits: 14B FP16 / 32B 4-bit / 70B 3-bit
Price (new): $2,000 · Cloud on-demand: $0.69/hr (RunPod, live) · FP16 compute: 104 TFLOPS · Memory BW: 1792 GB/s · Power: 575W TDP. Current flagship consumer card; 32GB runs 30B-class models at 4-bit with long context, and squeezes Llama 3 70B onto a single card at ~3-bit.
- Apple M4 Pro (64GB) · Sweet spot
Apple · 2025 · 273 GB/s
64 GB LPDDR5X (unified) · $2,499 · Fits: up to ~45B quantized (Qwen 2.5 Coder 32B, Gemma 3 27B)
Price (new): $2,499 · Memory BW: 273 GB/s. Best dev laptop for local AI; silent, with no cooling issues.
- NVIDIA L4 (24GB)
NVIDIA · 2023 · 121 TFLOPS · 300 GB/s
24 GB GDDR6 · $2,500 · $0.44/hr live · Fits: 8B FP16 / 30B 4-bit
Price (new): $2,500 · Cloud on-demand: $0.44/hr (RunPod, live) · FP16 compute: 121 TFLOPS · Memory BW: 300 GB/s · Power: 72W TDP. The modern replacement for the T4; single-slot and low-power, great for density.
- RTX A6000 (48GB)
NVIDIA · 2020 · 39 TFLOPS · 768 GB/s
48 GB GDDR6 ECC · $3,500 used · $0.33/hr live · Fits: 20B FP16 / 70B 4-bit / 120B 2-bit
Price (new): $4,500 · Price (used): $3,500 · Cloud on-demand: $0.33/hr · Cloud spot: $0.25/hr (RunPod, live) · FP16 compute: 39 TFLOPS · Memory BW: 768 GB/s · Power: 300W TDP. Best 'fits in a desktop' workstation card: 48GB of VRAM without datacenter cost.
- Apple M3 Ultra (192GB)
Apple · 2025 · 800 GB/s
192 GB LPDDR5 (unified) · $5,999 · Fits: up to 235B MoE (Qwen 3 235B) or 70B dense models
Price (new): $5,999 · Memory BW: 800 GB/s. Mac Studio; runs models on a single box that would need multi-GPU on NVIDIA.
- RTX 6000 Ada (48GB)
NVIDIA · 2022 · 91 TFLOPS · 960 GB/s
48 GB GDDR6 ECC · $6,800 · $0.74/hr live · Fits: 20B FP16 / 70B 4-bit
Price (new): $6,800 · Cloud on-demand: $0.74/hr · Cloud spot: $0.40/hr (RunPod, live) · FP16 compute: 91 TFLOPS · Memory BW: 960 GB/s · Power: 300W TDP. Ada generation of the A6000, about 2x faster; worth it only if you need the speed and have the budget.
- NVIDIA A100 40GB · Sweet spot
NVIDIA · 2020 · 312 TFLOPS · 1555 GB/s
40 GB HBM2e · $8,000 used · $1.10/hr cloud · Fits: 14B FP16 / 30B 8-bit / 70B 4-bit
Price (used): $8,000 · Cloud rental: $1.10/hr (RunPod) / $3.06/hr (AWS) · FP16 compute: 312 TFLOPS · Memory BW: 1555 GB/s · Power: 400W TDP. The de facto standard for serious AI training and inference; huge ecosystem, and used prices are dropping fast.
- NVIDIA A100 80GB
NVIDIA · 2021 · 312 TFLOPS · 2039 GB/s
80 GB HBM2e · $12,000 used · $1.60/hr cloud · Fits: 34B FP16 / 70B 8-bit / 120B 4-bit
Price (used): $12,000 · Cloud rental: $1.60/hr (RunPod) / $4.10/hr (AWS p4de) · FP16 compute: 312 TFLOPS · Memory BW: 2039 GB/s · Power: 400W TDP. The most-rented ML GPU; 80GB fits Llama 3 70B at 8-bit on a single card.
- AMD MI300X (192GB) · Best value
AMD · 2023 · 1307 TFLOPS · 5300 GB/s
192 GB HBM3 · $15,000 · $0.50/hr live · Fits: 70B FP16 / 235B MoE 4-bit / 405B at ~3-bit
Price (new): $15,000 · Cloud on-demand: $0.50/hr (RunPod, live) · FP16 compute: 1307 TFLOPS · Memory BW: 5300 GB/s · Power: 750W TDP. More VRAM than an H100 at half the price. ROCm is decent now: it works with vLLM, PyTorch, and llama.cpp.
- NVIDIA H100 (80GB) · Flagship
NVIDIA · 2022 · 989 TFLOPS · 3350 GB/s
80 GB HBM3 · $30,000 · $2.50/hr cloud · Fits: 34B FP16 / 70B 8-bit / 405B across multiple cards
Price (new): $30,000 · Cloud rental: $2.50/hr (RunPod) / $8.00/hr (AWS p5) · FP16 compute: 989 TFLOPS · Memory BW: 3350 GB/s · Power: 700W TDP. 3x faster than an A100 on modern transformer workloads; FP8 support doubles that again.
- NVIDIA H200 (141GB)
NVIDIA · 2024 · 989 TFLOPS · 4800 GB/s
141 GB HBM3e · $40,000 · $3.50/hr cloud · Fits: Qwen 3 235B MoE at 4-bit on ONE card; 70B at 8-bit with massive batch size
Price (new): $40,000 · Cloud rental: $3.50/hr (RunPod) · FP16 compute: 989 TFLOPS · Memory BW: 4800 GB/s · Power: 700W TDP. Same compute as the H100 but 76% more VRAM and 43% more bandwidth; a huge win for long context.
- NVIDIA B200 (192GB) · Flagship
NVIDIA · 2024 · 2250 TFLOPS · 8000 GB/s
192 GB HBM3e · $45,000 · $5.00/hr cloud · Fits: 70B FP16 / 235B MoE at 4-bit on a single card
Price (new): $45,000 · Cloud rental: $5.00/hr (limited availability) · FP16 compute: 2250 TFLOPS · Memory BW: 8000 GB/s · Power: 1000W TDP. Blackwell generation: roughly 2.5x H100 performance with native FP4. Current king for new deployments.
- NVIDIA GB200 (NVL72)
NVIDIA · 2024 · 162000 TFLOPS · 576000 GB/s
13,824 GB HBM3e (rack aggregate) · $3,000,000 · Fits: frontier training, GPT-5-scale models
Price (new): $3,000,000 · FP16 compute: 162,000 TFLOPS · Memory BW: 576,000 GB/s (aggregate) · Power: 120kW. Full rack: 72 Blackwell GPUs + 36 Grace CPUs. Only relevant if you're training a frontier model.
- NVIDIA A10G (24GB)
NVIDIA · 2021 · 125 TFLOPS · 600 GB/s
24 GB GDDR6 · rent only · $1.00/hr cloud · Fits: 8B FP16 / 30B 4-bit
Cloud rental: $1.00/hr (AWS g5) · FP16 compute: 125 TFLOPS · Memory BW: 600 GB/s · Power: 150W TDP. AWS's workhorse inference GPU; 4x A10G matches a single A100 40GB for many workloads.
- Google TPU v5p
Google · 2024 · 2765 GB/s
95 GB HBM · rent only · $4.20/hr cloud · Fits: frontier training (Gemini-scale)
Cloud rental: $4.20/hr (GCP) · Memory BW: 2765 GB/s. Top-tier GCP training chip; comparable to H100-H200 for transformer workloads.
The remaining cards were auto-discovered via RunPod live pricing: all NVIDIA, rent only, with only VRAM reported (name · VRAM · what fits).
- A100 PCIe · 80 GB · large models
- A100 SXM 40GB · 40 GB · 13B-70B quantized
- A100 SXM · 80 GB · large models
- B300 · 288 GB · large models
- RTX 3070 · 8 GB · small models
- RTX 3080 · 10 GB · small models
- RTX 3080 Ti · 12 GB · small models
- RTX 3090 Ti · 24 GB · 13B-70B quantized
- RTX 4080 · 16 GB · small models
- RTX 4080 SUPER · 16 GB · small models
- RTX 5080 · 16 GB · small models
- H100 SXM · 80 GB · large models
- H100 NVL · 94 GB · large models
- H100 PCIe · 80 GB · large models
- H200 SXM · 141 GB · large models
- H200 NVL · 143 GB · large models
- L40 · 48 GB · 13B-70B quantized
- L40S · 48 GB · 13B-70B quantized
- RTX 2000 Ada · 16 GB · small models
- RTX 4000 Ada · 20 GB · small models
- RTX 4000 Ada SFF · 20 GB · small models
- RTX 5000 Ada · 32 GB · 13B-70B quantized
- RTX A2000 · 6 GB · small models
- RTX A4500 · 20 GB · small models
- RTX A5000 · 24 GB · 13B-70B quantized
- RTX PRO 4500 · 32 GB · 13B-70B quantized
- RTX PRO 6000 MaxQ · 96 GB · large models
- RTX PRO 6000 · 96 GB · large models
- RTX PRO 6000 WK · 96 GB · large models
- Tesla V100 · 16 GB · small models
- V100 SXM2 · 16 GB · small models