Mac Studio Local AI Workbench
Production-grade personal AI workstation — local-first, governed, documented, and restorable. Built in 10 days on Apple Silicon.
/Volumes/OKH-Local/05_Research_Vault/mac-studio-setup_DONE_2026-05-12.tar.gz
2026-05-13 post-baseline: Ollama plist corrected, Open WebUI updated to latest, Edge PWA resolved. Project page published.
2026-05-27/28: OpenClaw installed, agent named Larry, MEMORY.md seeded, SearXNG web search live.
What This Is Worth
In May 2026, a Facebook Marketplace listing offered an M1 Mac Studio 32GB with "PageSpace AI pre-installed" for $7,500. PageSpace is open-source and free. An M1 Mac Studio 32GB sells for $600–$900 on the secondary market.
This workbench runs on a Mac Studio M4 Max, three chip generations newer (M1 to M4), with substantially more memory and compute. It runs 10 local models across three inference runtimes. It connects to 11 live MCP servers. It has a documented operating model, governance policies, restore scripts, verified benchmarks, and an offsite backup strategy.
The $7,500 listing was selling the box. This build is the operating model.
Status Board
q8_0 KV cache
/Volumes/OKH-Local/07_Local_LLMs/ollama/models
localhost:3000
/Volumes/OKH-Local/07_Local_LLMs/huggingface-cache
.zprofile
Timeline
2026
Hardware planning and research
2026
Hardware unboxed, OS baseline
2026
Development foundation
.zprofile. Generated GitHub SSH key (ed25519), added as "Mac Studio M4 Max", verified with Hi OKHP3!.
eval lines in .zprofile from paste issue — cleaned to single line.
2026
Ollama install and first models
OKH-Local. Configured OLLAMA_MODELS, OLLAMA_FLASH_ATTENTION, OLLAMA_KV_CACHE_TYPE in .zprofile and Homebrew plist. Pulled phi4:14b, gemma3:12b, gemma3:27b, codestral:22b.
OLLAMA_MODELS env var not picked up by Homebrew service — required editing homebrew.mxcl.ollama.plist directly. Models initially landed in ~/.ollama — corrected with mv and manifest cleanup.
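The plist fix above can be sketched as a small merge of environment variables into the service definition. This is a minimal illustration, not the exact edit that was made: the sample plist contents and the value `"1"` for `OLLAMA_FLASH_ATTENTION` are assumptions, while the models path and `q8_0` come from this write-up.

```python
import plistlib

# Values taken from this write-up; the flash-attention value "1" is an assumption.
OLLAMA_ENV = {
    "OLLAMA_MODELS": "/Volumes/OKH-Local/07_Local_LLMs/ollama/models",
    "OLLAMA_FLASH_ATTENTION": "1",
    "OLLAMA_KV_CACHE_TYPE": "q8_0",
}


def with_ollama_env(plist_bytes: bytes, env: dict) -> bytes:
    """Return a copy of a launchd plist with `env` merged into EnvironmentVariables."""
    plist = plistlib.loads(plist_bytes)
    merged = dict(plist.get("EnvironmentVariables", {}))
    merged.update(env)
    plist["EnvironmentVariables"] = merged
    return plistlib.dumps(plist)


# Illustrative stand-in for the service plist (not the real file contents).
sample = plistlib.dumps({
    "Label": "homebrew.mxcl.ollama",
    "ProgramArguments": ["/opt/homebrew/bin/ollama", "serve"],
    "RunAtLoad": True,
})

patched = plistlib.loads(with_ollama_env(sample, OLLAMA_ENV))
print(patched["EnvironmentVariables"]["OLLAMA_MODELS"])
```

Working on bytes rather than a file path keeps the sketch free of side effects. Applying it to the real plist would mean reading and writing that file, then restarting the service.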
2026
Model collection and LM Studio
2026
Open WebUI, VSCode, and MCP foundation
localhost:3000. Installed VSCode 1.119.0 + Continue.dev v1.2.22. Configured Continue.dev with all local Ollama models (Codestral as primary autocomplete). Installed Node.js 26.0.0 for MCP server support.
2026
MCP servers and app ecosystem
claude_desktop_config.json with Notion and GitHub MCP. Used full paths to fix PATH issues for background service. Pulled GitHub MCP via Docker. Verified 11 MCP servers connected in Claude Desktop: Notion, GitHub, PageSpace, Mermaid Chart (enterprise), Google suite (Calendar/Gmail/Drive), Microsoft 365, and additional business tool integrations.
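The full-path fix can be sketched as follows. Apps launched outside a login shell generally do not see the PATH set in .zprofile, so each MCP server command is resolved to an absolute path before it is written to the config. The server name and package below are hypothetical placeholders, not the entries used in this build.

```python
import json
import shutil


def resolve(cmd: str) -> str:
    """Return the absolute path of `cmd`, or the bare name if it is not on PATH."""
    return shutil.which(cmd) or cmd


def mcp_config(servers: dict) -> dict:
    """Build the `mcpServers` mapping with absolute command paths."""
    return {
        "mcpServers": {
            name: {**spec, "command": resolve(spec["command"])}
            for name, spec in servers.items()
        }
    }


# Hypothetical server entry for illustration only.
servers = {
    "example-server": {
        "command": "npx",
        "args": ["-y", "example-mcp-package"],
    }
}

config = mcp_config(servers)
print(json.dumps(config, indent=2))
```

Running this from an interactive shell, where PATH is complete, and pasting the output into claude_desktop_config.json is one way to avoid hand-typing paths.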
2026
App ecosystem and Edge PWAs
/Applications/. 30+ Edge PWAs. All brand properties pinned as apps.
sudo rm -rf.
2026
VSCode extension audit across machines
2026
Git workspace, OneDrive sync, and repos
okhp3 alias in .zprofile. Updated VSCode workspace file to OKH root. Converted all 28 repo remotes from HTTPS to SSH via batch script. Installed mlx-lm 0.31.3, verified at 139 tok/s on Phi-4 mini.
.git/index with file locks during initial sync. Decision: skip for now (low risk — commits come from Replit/Claude Code/Codex/Mac).
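The HTTPS-to-SSH remote conversion mentioned in this entry could look roughly like the sketch below. The original batch script is not shown on this page, so this is an assumed reconstruction of the idea. The example URL is a placeholder, and `convert_origin` is defined but not called here.

```python
import re
import subprocess

_HTTPS_REMOTE = re.compile(
    r"^https://(?P<host>[^/]+)/(?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def https_to_ssh(url: str) -> str:
    """Convert an HTTPS Git remote to its SSH form; return other URLs unchanged."""
    m = _HTTPS_REMOTE.match(url.strip())
    if m is None:
        return url
    return f"git@{m['host']}:{m['owner']}/{m['repo']}.git"


def convert_origin(repo_dir: str) -> str:
    """Rewrite the origin remote of one repository and return the new URL."""
    current = subprocess.run(
        ["git", "-C", repo_dir, "remote", "get-url", "origin"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    new = https_to_ssh(current)
    if new != current:
        subprocess.run(
            ["git", "-C", repo_dir, "remote", "set-url", "origin", new],
            check=True,
        )
    return new


# Placeholder URL for illustration.
print(https_to_ssh("https://github.com/example-user/example-repo.git"))
```

Looping `convert_origin` over each repository directory would cover the batch case. Remotes that are already in SSH form pass through unchanged.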
2026
Normalization, benchmarking, and baseline archive
models/models path from UI misconfiguration. Created compatibility symlinks. Externalized HuggingFace cache, authenticated as okhp3. Created model inventories, benchmark workspace, restore script, and verification script. Benchmarked llama3.1:8b and phi4:14b across 5 tests. Created known-good baseline documentation and archived closure package.
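A benchmark run against Ollama can be sketched with the standard library alone. This is not the benchmark workspace described above, only an illustration of the mechanics: it posts one prompt to Ollama's local generate endpoint and derives throughput from the `eval_count` and `eval_duration` fields of the response. The sample response is synthetic, and `run_once` needs a running server, so it is defined but not called.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )


def tokens_per_second(result: dict) -> float:
    """Generation throughput from eval_count and eval_duration (nanoseconds)."""
    return result["eval_count"] / result["eval_duration"] * 1e9


def run_once(model: str, prompt: str) -> dict:
    """Send one prompt and return the parsed response (requires a running server)."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.load(resp)


# Synthetic response for illustration only, not a measured result.
fake = {"response": "ok", "eval_count": 200, "eval_duration": 2_000_000_000}
print(round(tokens_per_second(fake), 1))
```

Saving the raw response next to each verdict makes it possible to re-grade results later without rerunning the models.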
2026
Post-baseline hardening and publication
/Volumes/OKH-Local/07_Local_LLMs/ollama/models. Edited plist directly, restarted service, confirmed all 6 models visible. Updated Open WebUI — stopped and removed old container, pulled latest image, recreated container with original run parameters. The v0.9.5 update banner resolved. Resolved Edge PWA localhost authentication — Open WebUI pinned as standalone Edge app. Confirmed access at localhost:3000 without login prompt. Project page published to overkillhill.com.
2026
OpenClaw installation
2026
Larry comes online
Meet Larry
Larry is an OpenClaw autonomous agent running on the Mac Studio M4 Max. He doesn't wait for a prompt. He runs in the background, starts at login, and handles the zero-cost tier of the Council of AIs workflow: background tasks, Apple Notes, Reminders, file operations, RAG queries, scheduled jobs, and first-pass summarization.
The baseline stack — Ollama, LM Studio, Open WebUI, mlx-lm — is infrastructure. Open WebUI is a chat interface. You open it, you type, you get a response. Larry is something different.
The original Larry the Lobster — SpongeBob's friend from Bikini Bottom — was the lifeguard at Goo Lagoon. He was the only thing standing between the residents and a watery grave. This Larry has a similar mandate: protect the workflow from token waste and busywork.
He introduced himself on first boot with: "Larry's got it."
| | Open WebUI | Larry (OpenClaw) |
|---|---|---|
| What it is | Chat interface | Autonomous agent |
| How it works | You send a message | Works in background |
| Model | Any local Ollama model | gemma3:27b (131k context) |
| Skills | — | 28 eligible (Notes, GitHub, Notion, web search, more) |
| Starts at | Manual | Login (LaunchAgent) |
| Best for | Interactive queries | Background tasks, automation |
Model Inventory
| Model | Format | Size | Runtime | Use |
|---|---|---|---|---|
| phi4:14b | GGUF | 9.1 GB | Ollama | Fast reasoning, instruction following. Daily driver. |
| gemma3:12b | GGUF | 8.1 GB | Ollama | General purpose, mid-tier |
| gemma3:27b | GGUF | 17 GB | Ollama | Flagship general, heavy reasoning. Daily driver (quality). |
| codestral:22b | GGUF | 12 GB | Ollama | Code generation, Continue.dev autocomplete |
| mistral-small3.1:24b | GGUF | 15 GB | Ollama | General purpose, fast |
| llama3.1:8b | GGUF | 4.9 GB | Ollama | Lightweight utility, bulk tasks |
| gemma-4-E4B-it | GGUF Q4_K_M | 6.33 GB | LM Studio | Multimodal (image input), fast MoE |
| gemma-4-E2B-it | GGUF | ~5 GB | LM Studio | Ultra-light, fastest responses |
| gemma-4-26B-A4B-it | GGUF | ~16 GB | LM Studio | MoE, 4B active params, larger knowledge |
| Phi-4-mini-instruct-4bit | MLX 4-bit | 2.18 GB | mlx-lm | Direct MLX inference (139 tok/s on Apple Silicon) |
Note: llama3.3:70b (42 GB) was attempted and removed — exceeds the 36 GB ceiling with apps running.
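The ceiling check behind that note is simple arithmetic, sketched here with sizes copied from the inventory. The `headroom_gb` parameter is a hypothetical knob, not part of the documented setup. On-disk size is only a lower bound on memory use, since the KV cache and runtime overhead come on top.

```python
MEMORY_CEILING_GB = 36  # ceiling quoted in the note above

# On-disk sizes in GB, copied from the inventory and the note.
MODEL_SIZES_GB = {
    "phi4:14b": 9.1,
    "gemma3:27b": 17,
    "llama3.3:70b": 42,
}


def fits(size_gb: float, ceiling_gb: float = MEMORY_CEILING_GB,
         headroom_gb: float = 0.0) -> bool:
    """True if the model plus optional headroom stays within the ceiling."""
    return size_gb + headroom_gb <= ceiling_gb


for name, size in MODEL_SIZES_GB.items():
    verdict = "fits" if fits(size) else "exceeds ceiling"
    print(f"{name}: {size} GB, {verdict}")
```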
Model Selection Policy
Western-lab models only. Every model in this workbench comes from a Western-lab open-weight release: Meta (Llama), Google (Gemma), Mistral AI (Mistral, Codestral), and Microsoft (Phi).
This is a deliberate policy, not a default. Chinese cloud AI services operate under PRC data law. Open-weight models from any lab are architecturally isolated once downloaded — weights are inert files with no network access — but maintaining a clean Western-only boundary is simpler to reason about and easier to defend.
Models excluded: Any cloud-connected Chinese AI service. DeepSeek, Qwen, and similar open-weight releases are technically capable but fall outside this policy's boundary.
Benchmark Snapshot
| Test | llama3.1:8b | phi4:14b |
|---|---|---|
| Exact instruction following | ✅ Pass | ✅ Pass |
| Voice transcript cleanup | ⚠️ Functional (formatting issue) | ✅ Pass |
| YAML generation | ❌ Fail | ❌ Fail |
| Architecture summary | ⚠️ Functional (semantic flaw) | ⚠️ Partial fail |
| Mermaid diagram generation | ❌ Fail | ⚠️ Partial fail |
Conclusion: Neither model is governance-grade for autonomous structured artifact generation under simple prompting. A second benchmark pass with strict prompts is recommended before declaring the models inadequate. gemma3:12b, gemma3:27b, mistral-small3.1:24b, and codestral:22b are not yet benchmarked.
The Operating Model
The Mac Studio M4 Max is not a replacement for frontier AI. It is the zero-cost inference tier at the bottom of a deliberate token routing hierarchy.
The Council of AIs workflow pairs Claude and ChatGPT as blind-iterating collaborators in Notion, uses Replit for execution against locked specs, and GitHub as the versioned source of truth. Every part of that workflow has a cost. Local models handle the parts where cost should be zero: bulk processing, first-pass summarization, RAG queries against personal knowledge, and repetitive validation checks.
Cheap tokens think broadly. Expensive tokens act precisely.
This workbench is what makes those economics work at a $40/month total subscription budget.
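The routing hierarchy described above can be sketched as a lookup from task kind to tier. The task labels are hypothetical identifiers derived from the list of zero-cost work in this section. The real workflow is a human-driven process, not this function.

```python
from enum import Enum


class Tier(Enum):
    LOCAL = "local"        # zero-cost inference on the Mac Studio
    FRONTIER = "frontier"  # paid frontier models


# Task kinds this write-up assigns to the zero-cost tier (labels are hypothetical).
LOCAL_TASKS = {
    "bulk_processing",
    "first_pass_summarization",
    "rag_query",
    "validation_check",
}


def route(task_kind: str) -> Tier:
    """Send known cheap task kinds to the local tier, everything else upward."""
    return Tier.LOCAL if task_kind in LOCAL_TASKS else Tier.FRONTIER


for kind in ("first_pass_summarization", "final_spec_review"):
    print(kind, "->", route(kind).value)
```

Defaulting unknown task kinds to the frontier tier is a deliberate choice in this sketch: a misrouted cheap task costs a few tokens, while a misrouted precise task costs quality.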
Next Phase
Open WebUI as Edge PWA — resolve Edge localhost authentication ✓ Completed 2026-05-13
RAG Roadmap — The baseline is complete. The next phase is the retrieval layer.
nomic-embed-text + Qdrant + Open WebUI Knowledge will turn this workbench into a local AI that knows the author's actual work — not just its training data. Personal writing, GitHub repos, conversation archives, and workspace exports will be chunked, embedded, and indexed locally. The model will retrieve before it generates.
This is how a generic local AI becomes a genuine second brain.
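The retrieve-before-generate flow can be illustrated without any of the planned components. In this sketch a bag-of-words vector stands in for nomic-embed-text and a plain Python list stands in for Qdrant. Both are assumptions for illustration, and only the chunk, embed, and rank steps carry over to the real stack.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 40, overlap: int = 10) -> list:
    """Split text into overlapping windows of `size` words."""
    assert size > overlap >= 0
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list, k: int = 1) -> list:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


docs = [
    "ollama stores models on the external volume",
    "larry handles apple notes and reminders",
]
print(retrieve("where are the models stored", docs, k=1))
```

The retrieved chunks would then be placed in the prompt ahead of the question, which is the step that lets the model answer from personal material instead of training data alone.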
Build Your Own
Everything you need to replicate this stack.
Core Toolchain
AI Runtimes
Storage Architecture
/Volumes/OKH-Local/07_Local_LLMs/ollama/models
/Volumes/OKH-Local/07_Local_LLMs/lm-studio/models
/Volumes/OKH-Local/07_Local_LLMs/huggingface-cache
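A verification pass over these locations can be sketched as below. It is not the verification script from the baseline archive, only an assumed minimal version of the same idea. The demonstration uses a temporary directory so it runs on any machine.

```python
import tempfile
from pathlib import Path

# Storage locations listed above.
STORAGE_PATHS = [
    "/Volumes/OKH-Local/07_Local_LLMs/ollama/models",
    "/Volumes/OKH-Local/07_Local_LLMs/lm-studio/models",
    "/Volumes/OKH-Local/07_Local_LLMs/huggingface-cache",
]


def missing_dirs(paths) -> list:
    """Return the paths that are not existing directories."""
    return [p for p in paths if not Path(p).is_dir()]


# Demonstration against a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    present = str(Path(tmp) / "models")
    Path(present).mkdir()
    absent = str(Path(tmp) / "huggingface-cache")
    demo_missing = missing_dirs([present, absent])

print("missing in demo:", [Path(p).name for p in demo_missing])
```

On the workstation itself, `missing_dirs(STORAGE_PATHS)` returning an empty list would indicate that the external volume is mounted and all three stores are in place.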