NVIDIA RTX 4070 vs Apple M2 Ultra (192GB)

How these GPUs compare for running local LLMs — VRAM, bandwidth, price, and per-model fit across popular open-weights models.

54 models compared · 12 GB vs 192 GB VRAM · 504 GB/s vs 800 GB/s · $655 vs $5,499
GPU A

NVIDIA RTX 4070

VRAM: 12 GB
Bandwidth: 504 GB/s
Street price: $655
Vendor: NVIDIA
GPU B (runs more models)

Apple M2 Ultra (192GB)

VRAM (unified memory): 192 GB
Bandwidth: 800 GB/s
Street price: $5,499
Vendor: Apple
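Dividing each street price by the memory and bandwidth figures above gives a rough sense of what each dollar buys. The following sketch does only that arithmetic and uses only the spec-card numbers. `SPECS` and `cost_ratios` are illustrative names.

```python
from typing import Tuple

# Spec-card figures from this comparison:
# (name, memory in GB, bandwidth in GB/s, street price in USD)
SPECS = [
    ("NVIDIA RTX 4070", 12, 504, 655),
    ("Apple M2 Ultra (192GB)", 192, 800, 5499),
]


def cost_ratios(memory_gb: float, bandwidth_gbs: float, price_usd: float) -> Tuple[float, float]:
    """Return (dollars per GB of memory, dollars per GB/s of bandwidth)."""
    return price_usd / memory_gb, price_usd / bandwidth_gbs


if __name__ == "__main__":
    for name, mem, bw, price in SPECS:
        per_gb, per_gbs = cost_ratios(mem, bw, price)
        print(f"{name}: ${per_gb:.2f} per GB, ${per_gbs:.2f} per GB/s")
```

On these figures the M2 Ultra costs about $28.64 per GB of memory, compared with $54.58 for the RTX 4070. Per GB/s of bandwidth, the M2 Ultra costs about $6.87 and the RTX 4070 about $1.30.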

The short answer

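Apple M2 Ultra (192GB) can run 26 models that NVIDIA RTX 4070 can't fit in its 12 GB of VRAM. All of them have 22B parameters or more. On the 20 models both can run, speeds are not similar. The RTX 4070 is at least 20% faster on 14 of them, and the M2 Ultra is at least 20% faster on the other 6.

The split tracks precision as much as hardware. Every RTX 4070 win pairs a quantized build on the 4070 (Q8_0, Q5_K_M or Q4_K_M) with FP16 on the M2 Ultra. All six M2 Ultra wins are small models (4.7B or under) that both machines run at FP16.

If you want headroom for bigger models, Apple M2 Ultra (192GB) is the clear choice, at about 8.4 times the RTX 4070's street price.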

26 models: B runs, A can't
14 models: A is 20%+ faster
6 models: B is 20%+ faster
0 models: roughly equal (both run, <20% difference)
8 models: too large for either
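Whether a model fits on a machine comes down mostly to the size of its weights at a given precision. The sketch below shows that check under stated assumptions; it is not the method this comparison uses:

- The FP16 and Q8_0 bits-per-weight values are exact.
- The K-quant values are approximate averages.
- The fixed 1.5 GB allowance for KV cache and runtime buffers is a hypothetical placeholder.

```python
# Approximate storage cost per weight, in bits.
# FP16 and Q8_0 are exact: a Q8_0 block holds 32 int8 weights plus one
# FP16 scale, i.e. 34 bytes per 32 weights = 8.5 bits. The K-quant values
# are assumed rough averages and vary slightly from model to model.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
}

# Hypothetical allowance for KV cache, activations and runtime buffers.
OVERHEAD_GB = 1.5


def weights_gb(params_billion: float, quant: str) -> float:
    """Size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits(params_billion: float, quant: str, memory_gb: float,
         overhead_gb: float = OVERHEAD_GB) -> bool:
    """True if the weights plus the assumed overhead fit in memory_gb."""
    return weights_gb(params_billion, quant) + overhead_gb <= memory_gb


if __name__ == "__main__":
    # 8B model on a 12 GB card: FP16 weights alone are ~16 GB, Q8_0 ~8.5 GB.
    print(fits(8, "FP16", 12), fits(8, "Q8_0", 12))  # False True
    # 70.6B at FP16 (~141 GB) vs 671B at Q4_K_M (~403 GB), both on 192 GB.
    print(fits(70.6, "FP16", 192), fits(671, "Q4_K_M", 192))  # True False
```

This check matches the broad pattern in the results. An 8B model's FP16 weights alone exceed the RTX 4070's 12 GB. A 70B-class model at FP16 fits comfortably in 192 GB.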

Model-by-model fit

When both GPUs run a model within 20% of each other, the Winner column shows a tie instead.
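The bucket logic behind the counts above can be written as a small function. This sketch assumes that "20%+ faster" means the faster speed is at least 1.2 times the slower one. The page does not say which card the percentage is measured against. `winner` is an illustrative name.

```python
from typing import Optional


def winner(a_tok_s: Optional[float], b_tok_s: Optional[float],
           threshold: float = 0.20) -> str:
    """Classify one row; None means the model does not fit on that GPU.

    Assumes "20%+ faster" means the faster speed is at least
    (1 + threshold) times the slower one.
    """
    if a_tok_s is None and b_tok_s is None:
        return "Neither"
    if a_tok_s is None:
        return "B (only)"
    if b_tok_s is None:
        return "A (only)"
    fast, slow = max(a_tok_s, b_tok_s), min(a_tok_s, b_tok_s)
    if fast >= slow * (1 + threshold):
        return "A (faster)" if a_tok_s > b_tok_s else "B (faster)"
    return "Tie"


if __name__ == "__main__":
    # (RTX 4070 tok/s, M2 Ultra tok/s) for a few rows of the table.
    rows = {
        "DeepSeek R1 Distill Llama 8B": (50, 41),
        "Llama 3.2 1B": (150, 238),
        "gemma-2-27b": (None, 12),
        "DeepSeek R1 671B": (None, None),
    }
    for name, (a, b) in rows.items():
        print(f"{name}: {winner(a, b)}")
```

Under this reading, every row in the table gets the Winner it shows. The closest call is DeepSeek R1 Distill Llama 8B at 50 vs 41 tok/s (about 1.22×).

The table also rules out the other possible reading, where the slower speed must be at most 0.8 times the faster one. Under that reading, several RTX 4070 wins, such as this one and gemma-2-9b (44 vs 36), would become ties. The table's zero-tie count excludes that.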

| Model | Parameters · family | NVIDIA RTX 4070 | Apple M2 Ultra (192GB) | Winner |
|---|---|---|---|---|
| c4ai-command-r-v01 35B | 35B · command | Too large | 10 tok/s · FP16 | B (only) |
| Command-R+ 104B | 104B · command | Too large | 6 tok/s · Q8_0 | B (only) |
| DeepSeek R1 Distill Llama 8B | 8B · deepseek | 50 tok/s · Q8_0 | 41 tok/s · FP16 | A (faster) |
| DeepSeek R1 Distill Qwen 14B | 14.8B · deepseek | 49 tok/s · Q4_K_M | 23 tok/s · FP16 | A (faster) |
| DeepSeek R1 Distill Llama 70B | 70.6B · deepseek | Too large | 5 tok/s · FP16 | B (only) |
| DeepSeek R1 671B | 671B · deepseek | Too large | Too large | Neither |
| DeepSeek-V3 685B | 685B · deepseek | Too large | Too large | Neither |
| DeepSeek-V3.2 685.4B | 685.4B · deepseek | Too large | Too large | Neither |
| gemma-2-9b | 9.2B · gemma | 44 tok/s · Q8_0 | 36 tok/s · FP16 | A (faster) |
| gemma-2-27b | 27.2B · gemma | Too large | 12 tok/s · FP16 | B (only) |
| Llama 3.1 8B Compact | 8B · llama | 47 tok/s · Q8_0 | 37 tok/s · FP16 | A (faster) |
| CodeLlama 34B | 34B · llama | Too large | 10 tok/s · FP16 | B (only) |
| CodeLlama 34B | 34B · llama | Too large | 10 tok/s · FP16 | B (only) |
| Llama 3.3 70B | 70.6B · llama | Too large | 5 tok/s · FP16 | B (only) |
| Llama 3.1 70B | 70.6B · llama | Too large | 5 tok/s · FP16 | B (only) |
| Llama 4 Scout 17B | 109B · llama | Too large | 5 tok/s · Q8_0 | B (only) |
| Llama-4-Maverick-17B-128E | 400B · llama | Too large | Too large | Neither |
| Llama 3.1 405B | 405B · llama | Too large | Too large | Neither |
| Mistral 7B v0.1 | 7.25B · mistral | 55 tok/s · Q8_0 | 45 tok/s · FP16 | A (faster) |
| Codestral 22B | 22.2B · mistral | Too large | 15 tok/s · FP16 | B (only) |
| Mixtral 8x7B Instruct v0.1 | 47B · mixtral | Too large | 6 tok/s · FP16 | B (only) |
| Mistral Large 2 123B | 123B · mistral | Too large | 5 tok/s · Q8_0 | B (only) |
| Phi-4-mini 3.8B | 3.8B · phi | 53 tok/s · FP16 | 84 tok/s · FP16 | B (faster) |
| Phi-4 14B | 14B · phi | 39 tok/s · Q5_K_M | 21 tok/s · FP16 | A (faster) |
| Qwen 2.5 1.5B | 1.5B · qwen | 123 tok/s · FP16 | 195 tok/s · FP16 | B (faster) |
| Qwen 2.5 3B | 3.1B · qwen | 64 tok/s · FP16 | 102 tok/s · FP16 | B (faster) |
| Qwen3.5-4B | 4.7B · qwen | 43 tok/s · FP16 | 69 tok/s · FP16 | B (faster) |
| Qwen 2.5 7B | 7.6B · qwen | 53 tok/s · Q8_0 | 43 tok/s · FP16 | A (faster) |
| Qwen 2.5 7B | 7.6B · qwen | 53 tok/s · Q8_0 | 43 tok/s · FP16 | A (faster) |
| Qwen 3 8B | 8B · qwen | 47 tok/s · Q8_0 | 37 tok/s · FP16 | A (faster) |
| Qwen3.5-9B | 9.7B · qwen | 42 tok/s · Q8_0 | 34 tok/s · FP16 | A (faster) |
| Qwen 3 32B | 32B · qwen | Too large | 9 tok/s · FP16 | B (only) |
| Qwen3.5-35B-A3B | 36B · qwen | Too large | 9 tok/s · FP16 | B (only) |
| Qwen 2.5 72B | 72.7B · qwen | Too large | 5 tok/s · FP16 | B (only) |
| Qwen 2.5 72B | 72.7B · qwen | Too large | 5 tok/s · FP16 | B (only) |
| Llama 3.2 1B | 1.24B · llama | 150 tok/s · FP16 | 238 tok/s · FP16 | B (faster) |
| Llama 4 Scout 17B | 109B · llama | Too large | 5 tok/s · Q8_0 | B (only) |
| DeepSeek R1 671B | 671B · deepseek | Too large | Too large | Neither |
| Gemma 3 27B | 27B · gemma | Too large | 11 tok/s · FP16 | B (only) |
| Qwen 3 8B | 8B · qwen | 47 tok/s · Q8_0 | 37 tok/s · FP16 | A (faster) |
| Qwen 3 32B | 32B · qwen | Too large | 9 tok/s · FP16 | B (only) |
| Llama 3.1 8B Compact | 8B · llama | 47 tok/s · Q8_0 | 37 tok/s · FP16 | A (faster) |
| Mixtral 8x7B Instruct v0.1 | 47B · mixtral | Too large | 6 tok/s · FP16 | B (only) |
| Mistral Small 3.2 24B | 24B · mistral | Too large | 15 tok/s · FP16 | B (only) |
| Command A 111B | 111B · command | Too large | 6 tok/s · Q8_0 | B (only) |
| DeepSeek R1 0528 | 685B · deepseek | Too large | Too large | Neither |
| DeepSeek-V3-0324 | 684.5B · deepseek | Too large | Too large | Neither |
| DeepSeek-R1-0528-Qwen3-8B | 8.2B · qwen | 53 tok/s · Q8_0 | 42 tok/s · FP16 | A (faster) |
| Qwen3-235B-A22B-Instruct-2507 | 235B · qwen | Too large | 5 tok/s · Q4_K_M | B (only) |
| Qwen3-30B-A3B-Instruct-2507 | 30B · qwen | Too large | 23 tok/s · Q8_0 | B (only) |
| Qwen3-4B-Instruct-2507 | 4B · qwen | 55 tok/s · FP16 | 87 tok/s · FP16 | B (faster) |
| gemma-4-E4B-it | 8B · gemma | 55 tok/s · Q8_0 | 44 tok/s · FP16 | A (faster) |
| gemma-4-26B-A4B-it | 26.5B · gemma | Too large | 26 tok/s · Q8_0 | B (only) |
| gemma-4-31B-it | 32.7B · gemma | Too large | 21 tok/s · Q8_0 | B (only) |
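For a single chat stream on a dense model, generating each token means reading roughly every weight from memory once. Decode speed therefore tends to scale with memory bandwidth divided by the size of the weights. That rule of thumb is an assumption here; the page does not state it.

The six rows where both machines run FP16 make it easy to check against the table's own numbers. `FP16_ROWS` and `speed_ratios` are illustrative names.

```python
# Rows from the table where both machines run FP16:
# (model, RTX 4070 tok/s, M2 Ultra tok/s)
FP16_ROWS = [
    ("Phi-4-mini 3.8B", 53, 84),
    ("Qwen 2.5 1.5B", 123, 195),
    ("Qwen 2.5 3B", 64, 102),
    ("Qwen3.5-4B", 43, 69),
    ("Llama 3.2 1B", 150, 238),
    ("Qwen3-4B-Instruct-2507", 55, 87),
]

# Memory-bandwidth ratio from the spec cards (M2 Ultra / RTX 4070).
BANDWIDTH_RATIO = 800 / 504


def speed_ratios(rows):
    """Return (model, M2 Ultra tok/s divided by RTX 4070 tok/s) for each row."""
    return [(name, b / a) for name, a, b in rows]


if __name__ == "__main__":
    print(f"bandwidth ratio: {BANDWIDTH_RATIO:.3f}")
    for name, ratio in speed_ratios(FP16_ROWS):
        print(f"{name}: {ratio:.3f} ({ratio / BANDWIDTH_RATIO - 1:+.1%} vs bandwidth)")
```

Each ratio falls within about 1.1% of 800 / 504 ≈ 1.59, so on these rows the M2 Ultra's lead tracks its bandwidth advantage.

Where the RTX 4070 wins, it reads Q8_0 or smaller quantized weights (8.5 bits per weight or fewer). The M2 Ultra reads 16-bit FP16 weights in those rows. That roughly halves the bytes the 4070 moves per token, which more than offsets the bandwidth gap.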

