AI Hardware

26 GPUs & accelerators · self-host price guide · cloud rates

What to buy for each model size

3B-8B (Llama 3.1 8B, Gemma 3n E4B, Phi-4)

Raspberry Pi 5 — Slow but works
$80
Jetson Orin Nano — Fast edge inference
$249
Apple M4 MacBook Air — Silent + fast
$1099
Cloud: T4 — $250/mo if 24/7
$0.35/hr

13B-34B (Qwen 2.5 Coder 32B, Llama 2 13B)

RTX 3060 12GB — 4-bit only, slow but cheap
$280
RTX 3090 24GB (used) — 🏆 Best value. Full speed.
$700
RTX 4090 24GB — Fastest consumer option
$1800
Cloud: A10G — $720/mo if 24/7
$1.00/hr

70B (Llama 3 70B, Qwen 2.5 72B)

2× RTX 3090 (used) — 48GB total, 4-bit at good speed
$1400
RTX 5090 32GB — 70B at ~3-bit on a single card
$2000
M3 Ultra Mac Studio 192GB — Full FP16 on unified memory
$5999
Cloud: A100 80GB — 70B fits at 8-bit. $1150/mo if 24/7
$1.60/hr

235B MoE (Qwen 3 235B)

M3 Ultra Mac Studio 192GB — Quantized fits
$5999
2× A100 80GB — Standard self-host setup
~$24k used
1× MI300X 192GB — Fits on a single card
$15k
Cloud: H100 80GB ×2 — $3600/mo if 24/7
$5/hr

400B+ (Llama 3.1 405B, DeepSeek V3 671B MoE)

2× B200 192GB — 405B at 4-bit with headroom; DeepSeek V3 fits too
~$90k
2× H200 141GB — 405B at 4-bit (282GB total)
~$80k
2× MI300X 192GB — 🏆 Cheapest route to 405B at 4-bit
~$30k
Cloud: H200 ×2 — $5100/mo if 24/7
$3.50/hr each
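The "$/mo if 24/7" figures above are just the hourly rate times roughly 730 hours in a month. A minimal sketch (rates taken from the tables above; the guide rounds slightly):

```python
# Monthly cloud cost: hourly rate x ~730 hours/month (24 * 365 / 12).
HOURS_PER_MONTH = 730

def monthly_cost(rate_per_hr: float, utilization: float = 1.0) -> float:
    """Dollars per month at the given utilization (1.0 = running 24/7)."""
    return rate_per_hr * HOURS_PER_MONTH * utilization

# Rates from the quick guide above.
for name, rate in [("T4", 0.35), ("A10G", 1.00), ("A100 80GB", 1.60), ("H200", 3.50)]:
    print(f"{name}: ${monthly_cost(rate):,.0f}/mo if 24/7")
```

Running a GPU only during working hours (~25% utilization) cuts these by 4x, which is usually where renting beats buying.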

57 hardware items

  • Google TPU v5e

    Google · 2023 · 819 GB/s

    16 GB
    VRAM
rent only
$1.20/hr cloud

    Fits: Up to 70B in pods (multi-chip)

Cloud rental: $1.20/hr (GCP) · only available as a service
    Memory BW: 819 GB/s

GCP-only. Cheap and fast for JAX/TF workloads. LLM serving via Google's JetStream or vLLM's TPU backend.

Raspberry Pi 5 (8GB) — Budget king

    Other · 2023 · 17 GB/s

    8 GB
    VRAM
    $80

Fits: 3B models at 1-3 tok/s (Phi-3 mini, Gemma 3n E4B)

    Price (new): $80
    Memory BW: 17 GB/s

    CPU inference only via llama.cpp. Fine for tiny models + learning.
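The 17 GB/s figure is why the Pi tops out at a few tokens per second: LLM decoding is memory-bandwidth-bound, because each generated token streams (roughly) the whole model through memory. A back-of-envelope ceiling, assuming ~2 GB of weights for a 3B model at 4-bit:

```python
# Decode-speed ceiling: tokens/sec <= memory bandwidth / model size,
# since each token reads roughly every weight once. Real throughput
# lands well below this bound (compute limits, cache misses, overhead).
def decode_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

MODEL_GB = 2.0  # ~3B parameters at 4-bit (assumed size)
print(f"Pi 5 (17 GB/s):      {decode_ceiling_tok_s(17, MODEL_GB):.1f} tok/s max")
print(f"Apple M4 (120 GB/s): {decode_ceiling_tok_s(120, MODEL_GB):.0f} tok/s max")
print(f"RTX 3090 (936 GB/s): {decode_ceiling_tok_s(936, MODEL_GB):.0f} tok/s max")
```

The gap between the Pi's ~8 tok/s ceiling and the observed 1-3 tok/s is the CPU compute bottleneck.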

RTX 3060 (12GB) — Budget king

    NVIDIA · 2021 · 13 TFLOPS · 360 GB/s

    12 GB
    GDDR6
    $220 used

    Fits: 7-8B models at 4-bit (Llama 3.1 8B, Gemma 2 9B)

    Price (new): $280
    Price (used): $220
    FP16 compute: 13 TFLOPS
    Memory BW: 360 GB/s
    Power: 170W TDP

    Cheapest entry to CUDA AI. Slow but works. 12GB is the key feature.
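The "Fits" lines throughout this list follow a simple rule of thumb: weight memory ≈ parameter count × bytes per parameter, plus headroom for KV cache and activations. A sketch (the 20% overhead factor is my assumption, not from this guide):

```python
# Rough VRAM requirement: parameters (billions) x bytes/param, plus ~20%
# headroom for KV cache, activations, and framework overhead (assumed).
BYTES_PER_PARAM = {"fp16": 2.0, "8bit": 1.0, "4bit": 0.5}

def vram_needed_gb(params_b: float, quant: str, overhead: float = 1.2) -> float:
    return params_b * BYTES_PER_PARAM[quant] * overhead

print(f"8B at 4-bit:  {vram_needed_gb(8, '4bit'):.1f} GB")   # fits this 12 GB card
print(f"70B at 4-bit: {vram_needed_gb(70, '4bit'):.1f} GB")  # wants 2x 24 GB cards
print(f"70B at fp16:  {vram_needed_gb(70, 'fp16'):.0f} GB")  # unified-memory territory
```

Long contexts grow the KV cache well past 20%, so treat tight fits as optimistic.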

  • NVIDIA Jetson Orin Nano (8GB)

    NVIDIA · 2023 · 20 TFLOPS · 68 GB/s

    8 GB
    VRAM
    $249

Fits: 3-7B quantized (Gemma 3n E4B, Phi-4 int4)

    Price (new): $249
    FP16 compute: 20 TFLOPS
    Memory BW: 68 GB/s
    Power: 15W TDP

    Best SBC option. CUDA support — same code as desktop/server.

  • RTX A4000 (16GB)

    NVIDIA · 2021 · 19 TFLOPS · 448 GB/s

    16 GB
    GDDR6 ECC
    $650 used
    $0.17/hr live

Fits: 13B at 8-bit / 30B 4-bit (tight)

    Price (new): $1,100
    Price (used): $650
    Cloud on-demand: $0.17/hr · RunPod (live)
    Cloud spot: $0.09/hr · RunPod (live)
    FP16 compute: 19 TFLOPS
    Memory BW: 448 GB/s
    Power: 140W TDP

    Single-slot, low-power. Good for quiet homelab servers. ECC VRAM is nice for long runs.

RTX 3090 (24GB, used) — Best value

    NVIDIA · 2020 · 36 TFLOPS · 936 GB/s

    24 GB
    GDDR6X
    $700 used
    $0.22/hr live

Fits: 8B FP16 / 13-14B 8-bit / 30-34B 4-bit / 70B 2-bit

    Price (used): $700
    Cloud on-demand: $0.22/hr · RunPod (live)
    Cloud spot: $0.11/hr · RunPod (live)
    FP16 compute: 36 TFLOPS
    Memory BW: 936 GB/s
    Power: 350W TDP

    The undisputed used-market champion. 24GB VRAM at $700 beats almost everything new under $2000.
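Whether $700 up front beats $0.22/hr rented comes down to expected hours of use. A minimal breakeven sketch (hardware cost only; electricity and resale value are ignored):

```python
def breakeven_hours(purchase_price: float, cloud_rate_per_hr: float) -> float:
    """Hours of use after which buying is cheaper than renting."""
    return purchase_price / cloud_rate_per_hr

hours = breakeven_hours(700, 0.22)  # used RTX 3090 vs RunPod on-demand
print(f"Breakeven: {hours:,.0f} hours, ~{hours / 730:.1f} months of 24/7 use")
```

Against the $0.11/hr spot rate the breakeven roughly doubles to about nine months of 24/7 use, which is why heavy daily users buy and occasional users rent.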

  • RTX 4070 Ti Super (16GB)

    NVIDIA · 2024 · 45 TFLOPS · 672 GB/s

    16 GB
    GDDR6X
    $800

Fits: 13-14B at 8-bit / 30B 4-bit (tight)

    Price (new): $800
    FP16 compute: 45 TFLOPS
    Memory BW: 672 GB/s
    Power: 285W TDP

    Good perf/W. 16GB is limiting for bigger models — 3090 used is better value.

  • AMD Radeon RX 7900 XTX (24GB)

    AMD · 2022 · 61 TFLOPS · 960 GB/s

    24 GB
    GDDR6
    $900

Fits: 8B FP16 / 13-14B 8-bit / 30-34B 4-bit

    Price (new): $900
    FP16 compute: 61 TFLOPS
    Memory BW: 960 GB/s
    Power: 355W TDP

    24GB VRAM cheaper than NVIDIA. ROCm support works with llama.cpp + vLLM, but ecosystem is smaller.

  • NVIDIA T4 (16GB)

    NVIDIA · 2018 · 65 TFLOPS · 320 GB/s

    16 GB
    GDDR6
    $900 used
$0.35/hr cloud

    Fits: 7-13B 4-bit

    Price (used): $900
    Cloud rental: $0.35/hr (AWS g4dn)
    FP16 compute: 65 TFLOPS
    Memory BW: 320 GB/s
    Power: 70W TDP

    Very cheap to rent. Too slow for production serving but fine for batch inference.

  • Apple M4 (MacBook Air)

    Apple · 2025 · 120 GB/s

    16 GB
    LPDDR5X (unified)
    $1,099

    Fits: 7B-14B quantized models at good speed

    Price (new): $1,099
    Memory BW: 120 GB/s

    Unified memory is great for LLMs. Use llama.cpp Metal backend.

RTX 4090 (24GB) — Sweet spot

    NVIDIA · 2022 · 82 TFLOPS · 1008 GB/s

    24 GB
    GDDR6X
    $1,400 used
    $0.34/hr live

Fits: 8B FP16 / 13-14B 8-bit / 30-34B 4-bit / 70B 2-bit

    Price (new): $1,800
    Price (used): $1,400
    Cloud on-demand: $0.34/hr · RunPod (live)
    Cloud spot: $0.20/hr · RunPod (live)
    FP16 compute: 82 TFLOPS
    Memory BW: 1008 GB/s
    Power: 450W TDP

    Best single-card consumer GPU. 2x faster than 3090 at same VRAM.

RTX 5090 (32GB) — Flagship

    NVIDIA · 2025 · 104 TFLOPS · 1792 GB/s

    32 GB
    GDDR7
    $2,000
    $0.69/hr live

Fits: 13-14B FP16 / 30-34B 4-bit / 70B at ~3-bit

    Price (new): $2,000
    Cloud on-demand: $0.69/hr · RunPod (live)
    FP16 compute: 104 TFLOPS
    Memory BW: 1792 GB/s
    Power: 575W TDP

Current flagship consumer card. 32GB runs Llama 3 70B on a single card with aggressive ~3-bit quants.

Apple M4 Pro (64GB) — Sweet spot

    Apple · 2025 · 273 GB/s

    64 GB
    LPDDR5X (unified)
    $2,499

    Fits: Up to ~45B quantized (Qwen 2.5 Coder 32B, Gemma 3 27B)

    Price (new): $2,499
    Memory BW: 273 GB/s

    Best dev laptop for local AI. Silent + no cooling issues.

  • NVIDIA L4 (24GB)

    NVIDIA · 2023 · 121 TFLOPS · 300 GB/s

    24 GB
    GDDR6
    $2,500
    $0.44/hr live

Fits: 13-14B at 8-bit / 30-34B 4-bit

    Price (new): $2,500
    Cloud on-demand: $0.44/hr · RunPod (live)
    FP16 compute: 121 TFLOPS
    Memory BW: 300 GB/s
    Power: 72W TDP

    Modern replacement for T4. Single-slot, low-power — great for density.

  • RTX A6000 (48GB)

    NVIDIA · 2020 · 39 TFLOPS · 768 GB/s

    48 GB
    GDDR6 ECC
    $3,500 used
    $0.33/hr live

Fits: 30B at 8-bit / 70B 4-bit / 120B 2-bit

    Price (new): $4,500
    Price (used): $3,500
    Cloud on-demand: $0.33/hr · RunPod (live)
    Cloud spot: $0.25/hr · RunPod (live)
    FP16 compute: 39 TFLOPS
    Memory BW: 768 GB/s
    Power: 300W TDP

    Best 'fits in a desktop' workstation card. 48GB VRAM without datacenter cost.

  • Apple M3 Ultra (192GB)

    Apple · 2025 · 800 GB/s

    192 GB
    LPDDR5 (unified)
    $5,999

    Fits: Up to 235B MoE (Qwen 3 235B) or 70B dense models

    Price (new): $5,999
    Memory BW: 800 GB/s

    Mac Studio. Runs models that require multi-GPU on NVIDIA, on a single box.

  • RTX 6000 Ada (48GB)

    NVIDIA · 2022 · 91 TFLOPS · 960 GB/s

    48 GB
    GDDR6 ECC
    $6,800
    $0.74/hr live

Fits: 30B at 8-bit / 70B 4-bit

    Price (new): $6,800
    Cloud on-demand: $0.74/hr · RunPod (live)
    Cloud spot: $0.40/hr · RunPod (live)
    FP16 compute: 91 TFLOPS
    Memory BW: 960 GB/s
    Power: 300W TDP

Ada generation of the A6000 and about 2x faster. Worth it only if you need the speed and have the budget.

NVIDIA A100 40GB — Sweet spot

    NVIDIA · 2020 · 312 TFLOPS · 1555 GB/s

    40 GB
    HBM2e
    $8,000 used
$1.10/hr cloud

Fits: 30B at 8-bit / 70B 4-bit (tight)

    Price (used): $8,000
    Cloud rental: $1.10/hr (RunPod) / $3.06/hr (AWS)
    FP16 compute: 312 TFLOPS
    Memory BW: 1555 GB/s
    Power: 400W TDP

    The de-facto standard for serious AI training + inference. Huge ecosystem. Used price dropping fast.

  • NVIDIA A100 80GB

    NVIDIA · 2021 · 312 TFLOPS · 2039 GB/s

    80 GB
    HBM2e
    $12,000 used
$1.60/hr cloud

Fits: 70B at 8-bit / 120B+ at 4-bit

    Price (used): $12,000
    Cloud rental: $1.60/hr (RunPod) / $4.10/hr (AWS p4de)
    FP16 compute: 312 TFLOPS
    Memory BW: 2039 GB/s
    Power: 400W TDP

The most-rented ML GPU. 80GB fits Llama 3 70B at 8-bit on a single card.

AMD MI300X (192GB) — Best value

    AMD · 2023 · 1307 TFLOPS · 5300 GB/s

    192 GB
    HBM3
    $15,000
    $0.50/hr live

Fits: 70B FP16 with headroom / 405B at 4-bit across two cards

    Price (new): $15,000
    Cloud on-demand: $0.50/hr · RunPod (live)
    FP16 compute: 1307 TFLOPS
    Memory BW: 5300 GB/s
    Power: 750W TDP

    More VRAM than H100 at half the price. ROCm is decent now — works with vLLM, PyTorch, llama.cpp.

NVIDIA H100 (80GB) — Flagship

    NVIDIA · 2022 · 989 TFLOPS · 3350 GB/s

    80 GB
    HBM3
    $30,000
$2.50/hr cloud

Fits: 70B at 8-bit / 120B+ at 4-bit / 405B across 3+ cards

    Price (new): $30,000
    Cloud rental: $2.50/hr (RunPod) / $8.00/hr (AWS p5)
    FP16 compute: 989 TFLOPS
    Memory BW: 3350 GB/s
    Power: 700W TDP

    3x faster than A100 for modern transformer workloads. FP8 support doubles it again.

  • NVIDIA H200 (141GB)

    NVIDIA · 2024 · 989 TFLOPS · 4800 GB/s

    141 GB
    HBM3e
    $40,000
$3.50/hr cloud

Fits: 70B at 8-bit with massive batch and context; 405B at 4-bit across two cards

    Price (new): $40,000
    Cloud rental: $3.50/hr (RunPod)
    FP16 compute: 989 TFLOPS
    Memory BW: 4800 GB/s
    Power: 700W TDP

    Same compute as H100 but 76% more VRAM + 43% more bandwidth. Huge win for long context.

NVIDIA B200 (192GB) — Flagship

    NVIDIA · 2024 · 2250 TFLOPS · 8000 GB/s

    192 GB
    HBM3e
    $45,000
$5.00/hr cloud

Fits: 70B FP16 with big headroom; 405B at 4-bit needs a pair (a single card falls ~10 GB short)

    Price (new): $45,000
    Cloud rental: $5.00/hr (limited availability)
    FP16 compute: 2250 TFLOPS
    Memory BW: 8000 GB/s
    Power: 1000W TDP

    Blackwell gen. 2.5x H100 perf. FP4 native. Current king for new deployments.

  • NVIDIA GB200 (NVL72)

    NVIDIA · 2024 · 162000 TFLOPS · 576000 GB/s

    13824 GB
    HBM3e
    $3,000,000

    Fits: Frontier training — GPT-5-scale models

    Price (new): $3,000,000
    FP16 compute: 162000 TFLOPS
    Memory BW: 576000 GB/s
    Power: 120000W TDP

    Full rack: 72 Blackwell GPUs + 36 Grace CPUs. Only relevant if you're training a frontier model.

  • NVIDIA A10G (24GB)

    NVIDIA · 2021 · 125 TFLOPS · 600 GB/s

    24 GB
    GDDR6
    rent only
$1.00/hr cloud

Fits: 13-14B at 8-bit / 30-34B 4-bit

    Cloud rental: $1.00/hr (AWS g5)
    FP16 compute: 125 TFLOPS
    Memory BW: 600 GB/s
    Power: 150W TDP

    AWS's workhorse inference GPU. 4x A10G matches a single A100 40GB for many workloads.

  • Google TPU v5p

    Google · 2024 · 2765 GB/s

    95 GB
    VRAM
    rent only
$4.20/hr cloud

    Fits: Frontier training (Gemini-scale)

    Cloud rental: $4.20/hr (GCP)
    Memory BW: 2765 GB/s

    Top-tier GCP training chip. Compares to H100-H200 for transformer workloads.

  • A100 PCIe

    NVIDIA · 2026

    80 GB
    VRAM
    rent only

    Fits: 80GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • A100 SXM 40GB

    NVIDIA · 2026

    40 GB
    VRAM
    rent only

    Fits: 40GB VRAM — 13B-70B quantized

    Auto-discovered via RunPod live pricing.

  • A100 SXM

    NVIDIA · 2026

    80 GB
    VRAM
    rent only

    Fits: 80GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • B300

    NVIDIA · 2026

    288 GB
    VRAM
    rent only

    Fits: 288GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • RTX 3070

    NVIDIA · 2026

    8 GB
    VRAM
    rent only

    Fits: 8GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 3080

    NVIDIA · 2026

    10 GB
    VRAM
    rent only

    Fits: 10GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 3080 Ti

    NVIDIA · 2026

    12 GB
    VRAM
    rent only

    Fits: 12GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 3090 Ti

    NVIDIA · 2026

    24 GB
    VRAM
    rent only

    Fits: 24GB VRAM — 13B-70B quantized

    Auto-discovered via RunPod live pricing.

  • RTX 4080

    NVIDIA · 2026

    16 GB
    VRAM
    rent only

    Fits: 16GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 4080 SUPER

    NVIDIA · 2026

    16 GB
    VRAM
    rent only

    Fits: 16GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 5080

    NVIDIA · 2026

    16 GB
    VRAM
    rent only

    Fits: 16GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • H100 SXM

    NVIDIA · 2026

    80 GB
    VRAM
    rent only

    Fits: 80GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • H100 NVL

    NVIDIA · 2026

    94 GB
    VRAM
    rent only

    Fits: 94GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • H100 PCIe

    NVIDIA · 2026

    80 GB
    VRAM
    rent only

    Fits: 80GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • H200 SXM

    NVIDIA · 2026

    141 GB
    VRAM
    rent only

    Fits: 141GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • NVIDIA H200 NVL

    NVIDIA · 2026

    143 GB
    VRAM
    rent only

    Fits: 143GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • L40

    NVIDIA · 2026

    48 GB
    VRAM
    rent only

    Fits: 48GB VRAM — 13B-70B quantized

    Auto-discovered via RunPod live pricing.

  • L40S

    NVIDIA · 2026

    48 GB
    VRAM
    rent only

    Fits: 48GB VRAM — 13B-70B quantized

    Auto-discovered via RunPod live pricing.

  • RTX 2000 Ada

    NVIDIA · 2026

    16 GB
    VRAM
    rent only

    Fits: 16GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 4000 Ada

    NVIDIA · 2026

    20 GB
    VRAM
    rent only

    Fits: 20GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 4000 Ada SFF

    NVIDIA · 2026

    20 GB
    VRAM
    rent only

    Fits: 20GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX 5000 Ada

    NVIDIA · 2026

    32 GB
    VRAM
    rent only

    Fits: 32GB VRAM — 13B-70B quantized

    Auto-discovered via RunPod live pricing.

  • RTX A2000

    NVIDIA · 2026

    6 GB
    VRAM
    rent only

    Fits: 6GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX A4500

    NVIDIA · 2026

    20 GB
    VRAM
    rent only

    Fits: 20GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • RTX A5000

    NVIDIA · 2026

    24 GB
    VRAM
    rent only

    Fits: 24GB VRAM — 13B-70B quantized

    Auto-discovered via RunPod live pricing.

  • RTX PRO 4500

    NVIDIA · 2026

    32 GB
    VRAM
    rent only

    Fits: 32GB VRAM — 13B-70B quantized

    Auto-discovered via RunPod live pricing.

  • RTX PRO 6000 MaxQ

    NVIDIA · 2026

    96 GB
    VRAM
    rent only

    Fits: 96GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • RTX PRO 6000

    NVIDIA · 2026

    96 GB
    VRAM
    rent only

    Fits: 96GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • RTX PRO 6000 WK

    NVIDIA · 2026

    96 GB
    VRAM
    rent only

    Fits: 96GB VRAM — large models

    Auto-discovered via RunPod live pricing.

  • Tesla V100

    NVIDIA · 2026

    16 GB
    VRAM
    rent only

    Fits: 16GB VRAM — small models

    Auto-discovered via RunPod live pricing.

  • V100 SXM2

    NVIDIA · 2026

    16 GB
    VRAM
    rent only

    Fits: 16GB VRAM — small models

    Auto-discovered via RunPod live pricing.