Local inference performance analysis
System: macOS 26.3
CPU: Apple M4 Pro
RAM: 48 GB
GPU: Apple M4 Pro GPU (unified memory)
Total Runs: 62
Best Avg F1: 98.8% (mlx/mlx-community/Qwen3-8B-4bit)
Fastest Avg: 7.1 s (mlx/mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit)
Lowest Avg Memory: 2.8 GB (mlx/Narutoouz/GLM-4-9B-0414-4bit-DWQ)
F1 × speed^0.3 × confidence · higher is better
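The score formula in this caption (presumably the one behind the Composite column) can be sketched as follows. The report does not define `speed` or `confidence`, so this is only an illustration under stated assumptions: `speed` is taken as the reciprocal of run time in seconds, `confidence` as a factor between 0 and 1, and the function and parameter names are hypothetical.

```python
def composite_score(f1: float, time_s: float, confidence: float,
                    speed_exponent: float = 0.3) -> float:
    """Illustrative composite: F1 x speed^0.3 x confidence.

    ASSUMPTIONS (not stated in the report):
      - speed is defined as 1 / time_s (runs per second)
      - confidence is a multiplier in [0, 1]
    """
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    speed = 1.0 / time_s  # assumed definition of speed
    # The 0.3 exponent dampens speed, so accuracy dominates the ranking.
    return f1 * speed ** speed_exponent * confidence


if __name__ == "__main__":
    # Hypothetical values, not taken from the benchmark table.
    slow = composite_score(f1=0.95, time_s=20.0, confidence=1.0)
    fast = composite_score(f1=0.95, time_s=10.0, confidence=1.0)
    print(f"slow={slow:.3f} fast={fast:.3f}")
```

Under these assumptions, halving the run time raises the score by a factor of 2^0.3 (about 1.23), so a fast but inaccurate model cannot easily outrank an accurate one.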
Bubble size = memory · top-left is ideal
Average time & memory per model
| Timestamp | Model | Backend | Time (s) | Memory (GB) | Tokens | F1 | Composite | Precision | Recall | Prompt | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|