Local LLM setups, model benchmarking, Apple Silicon inference, and AI routing architecture
3 articles
Two local models, three cloud tiers, 31 automated jobs. Here's how I decide what goes where — and why scheduled jobs never run on local.
Qwen3 14B on Apple Silicon MLX: 10.68 t/s, 0.764s TTFT, 10GB RAM. Real numbers from my actual Mac Mini — not a synthetic benchmark.
The honest reason I run a local LLM — zero marginal cost changes how you work with AI. Here's what it took to set up and what I actually use it for.