Local LLM Benchmark Report

Local inference performance analysis

62 runs

System

macOS 26.3

CPU

Apple M4 Pro

RAM

48 GB

GPU

Apple M4 Pro GPU (unified memory)

Last run: 2026-02-27 10:18

Models tested: 10

Total Runs

62

Best Avg F1

98.8%

mlx/mlx-community/Qwen3-8B-4bit

Fastest Avg

7.1s

mlx/mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit

Lowest Avg Memory

2.8 GB

mlx/Narutoouz/GLM-4-9B-0414-4bit-DWQ

Composite Score by Model

F1 × speed^0.3 × confidence · higher is better
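
The caption above appears to describe a product of F1, speed raised to the power 0.3, and confidence. The sketch below shows how such a score could be computed. The report does not define "speed" or "confidence", so both definitions here are assumptions: speed is taken as the fastest observed time divided by the model's own time (a value in (0, 1]), and confidence as a value in [0, 1]. The function names `relative_speed` and `composite_score` are hypothetical.

```python
def relative_speed(time_s: float, fastest_time_s: float) -> float:
    """Assumed speed term: 1.0 for the fastest model, smaller for slower ones."""
    if time_s <= 0 or fastest_time_s <= 0:
        raise ValueError("times must be positive")
    return fastest_time_s / time_s


def composite_score(f1: float, speed: float, confidence: float,
                    speed_exponent: float = 0.3) -> float:
    """F1 x speed^0.3 x confidence, following the chart caption.

    The 0.3 exponent dampens the speed term, so quality (F1) dominates:
    halving speed costs about 19% of the score, not 50%.
    """
    if speed < 0:
        raise ValueError("speed must be non-negative")
    return f1 * (speed ** speed_exponent) * confidence


# Illustrative inputs only, not values from the results table:
# a model twice as slow as the fastest one, with perfect F1 and confidence.
slow = relative_speed(time_s=14.2, fastest_time_s=7.1)
score = composite_score(f1=1.0, speed=slow, confidence=1.0)
```

With an exponent below 1, a large difference in speed moves the score less than the same relative difference in F1, which is consistent with a ranking that puts quality first.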

Speed vs Quality

Bubble size = memory · top-left is ideal

Metrics Over Time

Resource Usage

Average time & memory per model

All Results

Timestamp | Model | Backend | Time (s) | Memory (GB) | Tokens | F1 | Composite | Precision | Recall | Prompt | Status
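
The table lists F1 alongside Precision and Recall. Under the standard definition, F1 is the harmonic mean of the other two. The report does not say how the benchmark computes its F1 column, so treating it this way is an assumption, and `f1_score` below is a hypothetical helper rather than the benchmark's own code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision + recall == 0:
        # Both zero: define F1 as 0 to avoid division by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the harmonic mean is pulled toward the smaller of the two inputs, a row with high precision but low recall (or the reverse) should show an F1 closer to the lower value.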