NVIDIA RTX 4090 vs Apple M4 Max (128GB)

How these GPUs compare for running local LLMs — VRAM, bandwidth, price, and per-model fit across popular open-weights models.

54 models compared · 24 GB vs 128 GB VRAM · 1008 GB/s vs 546 GB/s · $1,999 vs $3,999
GPU A: NVIDIA RTX 4090
VRAM: 24 GB
Bandwidth: 1008 GB/s
Street price: $1,999
Vendor: nvidia

GPU B (runs more models): Apple M4 Max (128GB)
VRAM: 128 GB
Bandwidth: 546 GB/s
Street price: $3,999
Vendor: apple

The short answer

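Apple M4 Max (128GB) can run 12 models that NVIDIA RTX 4090 can't fit in VRAM, mostly the larger models. For the 33 models both can handle, the RTX 4090 is at least 20% faster on every one, although it often gets there by running a smaller quantization (for example Q4_K_M against FP16 on the M4 Max). If you want headroom for bigger models, Apple M4 Max (128GB) is the clear choice; if the models you care about fit in 24 GB, the RTX 4090 is the faster and cheaper option.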
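To make the "fits in VRAM" idea concrete, here is a minimal sketch that estimates weight size from parameter count and quantization, then picks the highest-precision format that fits. The bits-per-weight values are approximate figures for common GGUF formats, and `overhead_gb` (room for KV cache and runtime buffers) is a hypothetical allowance. None of this is confirmed as the method behind this page's numbers, so some results may differ from the table.

```python
from typing import Optional

# Approximate bits per weight for common GGUF quantization formats.
# These are assumed values for illustration, ordered from highest precision down.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "Q5_K_M": 5.69,
    "Q4_K_M": 4.85,
}


def weights_gb(params_b: float, quant: str) -> float:
    """Estimated weight size in GB for a model with params_b billion parameters."""
    return params_b * BITS_PER_WEIGHT[quant] / 8


def best_quant_that_fits(
    params_b: float, memory_gb: float, overhead_gb: float = 1.5
) -> Optional[str]:
    """Return the highest-precision format whose weights plus overhead fit, else None."""
    for quant in BITS_PER_WEIGHT:  # dicts preserve insertion order: FP16 first
        if weights_gb(params_b, quant) + overhead_gb <= memory_gb:
            return quant
    return None


if __name__ == "__main__":
    for label, params_b in [("8B", 8.0), ("70.6B", 70.6), ("671B", 671.0)]:
        print(label, best_quant_that_fits(params_b, 24), best_quant_that_fits(params_b, 128))
```

Under these assumptions an 8B model fits both devices at FP16, a 70.6B model fits only in 128 GB (at Q8_0), and a 671B model fits neither.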

12 models B runs, A can't
33 models A is 20%+ faster
0 equal (both run, <20% diff)
9 too large for either

Model-by-model fit

Tie% is shown in the Winner column when both GPUs run the model within 20% of each other.
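The Winner column logic can be sketched as below. The 20% threshold comes from the text above; measuring the gap relative to the slower GPU is an assumption, and the labels `A (only)`, `B (faster)`, and `Tie` are hypothetical since they do not appear in this comparison.

```python
from typing import Optional

TIE_THRESHOLD = 0.20  # "within 20%" per the page; the exact definition is assumed


def winner(a_tps: Optional[float], b_tps: Optional[float]) -> str:
    """Classify one row. None means the model is too large for that GPU."""
    if a_tps is None and b_tps is None:
        return "Too large for either"
    if a_tps is None:
        return "B (only)"
    if b_tps is None:
        return "A (only)"
    fast, slow = max(a_tps, b_tps), min(a_tps, b_tps)
    # Assumption: the gap is measured relative to the slower GPU's throughput.
    if fast / slow - 1 < TIE_THRESHOLD:
        return "Tie"
    return "A (faster)" if a_tps > b_tps else "B (faster)"


if __name__ == "__main__":
    print(winner(42, 7))       # both run, A far ahead
    print(winner(None, 4))     # only B fits the model
    print(winner(None, None))  # neither fits
```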

Model | Params · family | NVIDIA RTX 4090 | Apple M4 Max (128GB) | Winner
c4ai-command-r-v01 35B | 35B · command | 42 tok/s · Q4_K_M | 7 tok/s · FP16 | A (faster)
Command-R+ 104B | 104B · command | Too large | 4 tok/s · Q8_0 | B (only)
DeepSeek R1 Distill Llama 8B | 8B · deepseek | 52 tok/s · FP16 | 28 tok/s · FP16 | A (faster)
DeepSeek R1 Distill Qwen 14B | 14.8B · deepseek | 56 tok/s · Q8_0 | 15 tok/s · FP16 | A (faster)
DeepSeek R1 Distill Llama 70B | 70.6B · deepseek | Too large | 7 tok/s · Q8_0 | B (only)
DeepSeek R1 671B | 671B · deepseek | Too large | Too large | -
DeepSeek-V3 685B | 685B · deepseek | Too large | Too large | -
DeepSeek-V3.2 685.4B | 685.4B · deepseek | Too large | Too large | -
gemma-2-9b | 9.2B · gemma | 45 tok/s · FP16 | 25 tok/s · FP16 | A (faster)
gemma-2-27b | 27.2B · gemma | 38 tok/s · Q6_K | 8 tok/s · FP16 | A (faster)
Llama 3.1 8B Compact | 8B · llama | 47 tok/s · FP16 | 25 tok/s · FP16 | A (faster)
CodeLlama 34B | 34B · llama | 44 tok/s · Q4_K_M | 7 tok/s · FP16 | A (faster)
CodeLlama 34B | 34B · llama | 44 tok/s · Q4_K_M | 7 tok/s · FP16 | A (faster)
Llama 3.3 70B | 70.6B · llama | Too large | 7 tok/s · Q8_0 | B (only)
Llama 3.1 70B | 70.6B · llama | Too large | 7 tok/s · Q8_0 | B (only)
Llama 4 Scout 17B | 109B · llama | Too large | 5 tok/s · Q6_K | B (only)
Llama-4-Maverick-17B-128E | 400B · llama | Too large | Too large | -
Llama 3.1 405B | 405B · llama | Too large | Too large | -
Mistral 7B v0.1 | 7.25B · mistral | 57 tok/s · FP16 | 31 tok/s · FP16 | A (faster)
Codestral 22B | 22.2B · mistral | 46 tok/s · Q6_K | 10 tok/s · FP16 | A (faster)
Mixtral 8x7B Instruct v0.1 | 47B · mixtral | Too large | 4 tok/s · FP16 | B (only)
Mistral Large 2 123B | 123B · mistral | Too large | 5 tok/s · Q6_K | B (only)
Phi-4-mini 3.8B | 3.8B · phi | 106 tok/s · FP16 | 57 tok/s · FP16 | A (faster)
Phi-4 14B | 14B · phi | 53 tok/s · Q8_0 | 14 tok/s · FP16 | A (faster)
Qwen 2.5 1.5B | 1.5B · qwen | 246 tok/s · FP16 | 133 tok/s · FP16 | A (faster)
Qwen 2.5 3B | 3.1B · qwen | 128 tok/s · FP16 | 69 tok/s · FP16 | A (faster)
Qwen3.5-4B | 4.7B · qwen | 87 tok/s · FP16 | 47 tok/s · FP16 | A (faster)
Qwen 2.5 7B | 7.6B · qwen | 55 tok/s · FP16 | 30 tok/s · FP16 | A (faster)
Qwen 2.5 7B | 7.6B · qwen | 55 tok/s · FP16 | 30 tok/s · FP16 | A (faster)
Qwen 3 8B | 8B · qwen | 47 tok/s · FP16 | 25 tok/s · FP16 | A (faster)
Qwen3.5-9B | 9.7B · qwen | 43 tok/s · FP16 | 23 tok/s · FP16 | A (faster)
Qwen 3 32B | 32B · qwen | 41 tok/s · Q4_K_M | 6 tok/s · FP16 | A (faster)
Qwen3.5-35B-A3B | 36B · qwen | 41 tok/s · Q4_K_M | 6 tok/s · FP16 | A (faster)
Qwen 2.5 72B | 72.7B · qwen | Too large | 6 tok/s · Q8_0 | B (only)
Qwen 2.5 72B | 72.7B · qwen | Too large | 6 tok/s · Q8_0 | B (only)
Llama 3.2 1B | 1.24B · llama | 300 tok/s · FP16 | 163 tok/s · FP16 | A (faster)
Llama 4 Scout 17B | 109B · llama | Too large | 5 tok/s · Q6_K | B (only)
DeepSeek R1 671B | 671B · deepseek | Too large | Too large | -
Gemma 3 27B | 27B · gemma | 40 tok/s · Q5_K_M | 7 tok/s · FP16 | A (faster)
Qwen 3 8B | 8B · qwen | 47 tok/s · FP16 | 25 tok/s · FP16 | A (faster)
Qwen 3 32B | 32B · qwen | 41 tok/s · Q4_K_M | 6 tok/s · FP16 | A (faster)
Llama 3.1 8B Compact | 8B · llama | 47 tok/s · FP16 | 25 tok/s · FP16 | A (faster)
Mixtral 8x7B Instruct v0.1 | 47B · mixtral | Too large | 4 tok/s · FP16 | B (only)
Mistral Small 3.2 24B | 24B · mistral | 45 tok/s · Q6_K | 10 tok/s · FP16 | A (faster)
Command A 111B | 111B · command | Too large | 4 tok/s · Q8_0 | B (only)
DeepSeek R1 0528 | 685B · deepseek | Too large | Too large | -
DeepSeek-V3-0324 | 684.5B · deepseek | Too large | Too large | -
DeepSeek-R1-0528-Qwen3-8B | 8.2B · qwen | 53 tok/s · FP16 | 29 tok/s · FP16 | A (faster)
Qwen3-235B-A22B-Instruct-2507 | 235B · qwen | Too large | Too large | -
Qwen3-30B-A3B-Instruct-2507 | 30B · qwen | 52 tok/s · Q4_K_M | 16 tok/s · Q8_0 | A (faster)
Qwen3-4B-Instruct-2507 | 4B · qwen | 110 tok/s · FP16 | 59 tok/s · FP16 | A (faster)
gemma-4-E4B-it | 8B · gemma | 55 tok/s · FP16 | 30 tok/s · FP16 | A (faster)
gemma-4-26B-A4B-it | 26.5B · gemma | 41 tok/s · Q6_K | 18 tok/s · Q8_0 | A (faster)
gemma-4-31B-it | 32.7B · gemma | 48 tok/s · Q4_K_M | 15 tok/s · Q8_0 | A (faster)
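
In the rows where both GPUs run the same model at FP16, the throughput ratio sits close to the bandwidth ratio (1008 / 546, about 1.85). That is consistent with a common rule of thumb: single-stream token generation is memory-bound, so tokens per second cannot exceed bandwidth divided by the bytes of weights read per token. The sketch below computes that ceiling. It is an illustrative upper bound for dense models, not necessarily how this page derives its figures, and the function name is hypothetical.

```python
def bandwidth_bound_tps(params_b: float, bits_per_weight: float, bandwidth_gbs: float) -> float:
    """Rough ceiling on decode tokens/s if every weight is read once per token.

    Assumes a dense model (all parameters active per token) and ignores
    KV-cache reads, compute limits, and runtime overhead.
    """
    weight_gb = params_b * bits_per_weight / 8
    return bandwidth_gbs / weight_gb


if __name__ == "__main__":
    rtx_4090 = bandwidth_bound_tps(8.0, 16.0, 1008.0)  # 8B model at FP16
    m4_max = bandwidth_bound_tps(8.0, 16.0, 546.0)
    print(f"RTX 4090 ceiling: {rtx_4090:.1f} tok/s")
    print(f"M4 Max ceiling:   {m4_max:.1f} tok/s")
    print(f"Ratio:            {rtx_4090 / m4_max:.2f}")
```

For an 8B model at FP16 the ceiling works out to 63 tok/s on the RTX 4090, and the 8B FP16 rows in the table (47 to 55 tok/s) fall below it, as expected for a bound that ignores overhead.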
