Local LLM Benchmark Report

Local inference performance analysis

62 runs

System: macOS 26.3
CPU: Apple M4 Pro
RAM: 48 GB
GPU: Apple M4 Pro GPU (unified memory)

Last run: 2026-02-27 10:18
Models tested: 10

Total Runs: 62
Best Avg F1: 98.8% (mlx/mlx-community/Qwen3-8B-4bit)
Fastest Avg: 7.1 s (mlx/mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit)
Lowest Avg Memory: 2.8 GB (mlx/Narutoouz/GLM-4-9B-0414-4bit-DWQ)

Composite Score by Model

F1 × speed^0.3 × confidence · higher is better
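The composite formula in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report does not say how "speed" or "confidence" are derived, so the helper `speed_from_seconds` (speed taken as the inverse of wall-clock time) and the function names are hypothetical, not the report's actual implementation.

```python
def composite_score(f1: float, speed: float, confidence: float,
                    speed_exponent: float = 0.3) -> float:
    """Composite = F1 x speed^0.3 x confidence, as stated in the chart caption.

    The exponent below 1 damps the influence of speed, so an accuracy
    difference moves the score more than an equal relative speed difference.
    """
    if speed < 0:
        raise ValueError("speed must be non-negative")
    return f1 * (speed ** speed_exponent) * confidence


def speed_from_seconds(seconds: float) -> float:
    """ASSUMPTION: speed is the inverse of run time. The report does not define it."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 1.0 / seconds


if __name__ == "__main__":
    # Placeholder inputs, not values taken from the report.
    fast = composite_score(0.9, speed_from_seconds(5.0), 1.0)
    slow = composite_score(0.9, speed_from_seconds(10.0), 1.0)
    print(fast > slow)
```

With the exponent at 0.3, halving the run time raises the score by a factor of 2^0.3 (roughly 1.23), while doubling F1 would double it, which fits the "higher is better" reading of the chart.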

Bubble size = memory · top-left is ideal

Average time & memory per model
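Per-model averages like these could be derived from the per-run rows roughly as follows. This is a minimal sketch: the column names match the run table header ("Model", "Time (s)", "Memory (GB)"), but the function name `average_per_model` is hypothetical, and it assumes every row counts toward the average regardless of its Status value, which the report does not confirm.

```python
from collections import defaultdict


def average_per_model(rows):
    """Average 'Time (s)' and 'Memory (GB)' per 'Model'.

    rows: iterable of dicts keyed by the run table's column names.
    ASSUMPTION: all rows are included, whatever their Status.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # [time sum, memory sum, run count]
    for row in rows:
        acc = totals[row["Model"]]
        acc[0] += float(row["Time (s)"])
        acc[1] += float(row["Memory (GB)"])
        acc[2] += 1
    return {
        model: {"avg_time_s": t / n, "avg_memory_gb": m / n, "runs": n}
        for model, (t, m, n) in totals.items()
    }


if __name__ == "__main__":
    # Placeholder rows for illustration only, not data from the report.
    sample = [
        {"Model": "model-a", "Time (s)": "2.0", "Memory (GB)": "1.0"},
        {"Model": "model-a", "Time (s)": "4.0", "Memory (GB)": "3.0"},
        {"Model": "model-b", "Time (s)": "8.0", "Memory (GB)": "6.0"},
    ]
    print(average_per_model(sample))
```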

Timestamp	Model	Backend	Time (s)	Memory (GB)	Tokens	F1	Composite	Precision	Recall	Prompt	Status