✅ Baseline complete · 2026-05-12 · Updated 2026-05-13

Mac Studio Local AI Workbench

Production-grade personal AI workstation — local-first, governed, documented, and restorable. Built in 10 days on Apple Silicon.

M4 Max · 36GB unified memory · 512GB internal SSD · 1TB OKH-Local NVMe · macOS Sequoia · 10 local models · 11 MCP servers · 1 autonomous agent
BASELINE COMPLETE — Build finished, normalized, benchmarked, and archived on 2026-05-12. System is in a known-good, restorable state. Archive: /Volumes/OKH-Local/05_Research_Vault/mac-studio-setup_DONE_2026-05-12.tar.gz
2026-05-13 post-baseline: Ollama plist corrected, Open WebUI updated to latest, Edge PWA resolved. Project page published.
2026-05-27/28: OpenClaw installed, agent named Larry, MEMORY.md seeded, SearXNG web search live.

What This Is Worth

💡

In May 2026, a Facebook Marketplace listing offered an M1 Mac Studio 32GB with "PageSpace AI pre-installed" for $7,500. PageSpace is open-source and free. An M1 Mac Studio 32GB sells for $600–$900 on the secondary market.

This workbench runs on a Mac Studio M4 Max, three chip generations newer (M1 to M4), with slightly more memory (36GB vs. 32GB) and substantially more compute. It runs 10 local models across three inference runtimes. It connects to 11 live MCP servers. It has a documented operating model, governance policies, restore scripts, verified benchmarks, and an offsite backup strategy.

Hardware cost: $2,250
Software cost: $0 (all open-source, free, or covered by existing subscriptions)

The $7,500 listing was selling the box. This build is the operating model.

Status Board

Hardware unboxed — KVM, Satechi stand, SN7100 NVMe, Logitech Brio MX
macOS Sequoia updated, Apple Watch Auto Unlock configured
Homebrew 5.1.8 installed, PATH configured
Git 2.54.0, Python 3.14.4, Node.js 26.0.0, GitHub CLI 2.92.0
GitHub SSH key (ed25519) configured and verified
Ollama 0.23.1 with MLX acceleration, Flash Attention, q8_0 KV cache
Model storage normalized to /Volumes/OKH-Local/07_Local_LLMs/ollama/models
LM Studio 0.4.12 with MLX backend active
Open WebUI running in Docker on localhost:3000
VSCode 1.119.0 + 43 extensions + Continue.dev 1.2.22
Claude Desktop + 11 MCP servers (Notion, GitHub, PageSpace, Mermaid, Google suite, M365, more)
ChatGPT Desktop, Codex, and ChatGPT Atlas installed
Microsoft Office suite, OneDrive, Edge, Notion, Perplexity installed
mlx-lm 0.31.3 installed — 139 tok/s on Phi-4 mini via direct Apple Silicon inference
HuggingFace cache normalized to /Volumes/OKH-Local/07_Local_LLMs/huggingface-cache
Storage baseline normalized, verified, benchmarked, and archived
Git identity configured, VSCode Settings Sync active across machines
28 GitHub repos mapped, SSH remotes updated
OneDrive sync active, OKH alias configured in .zprofile
Ollama plist path corrected to canonical location · 2026-05-13
Open WebUI updated to latest — container recreated, v0.9.5 resolved · 2026-05-13
Edge PWA localhost auth resolved — Open WebUI pinned as standalone Edge app · 2026-05-13
OpenClaw 2026.5.26 installed — local AI agent, LaunchAgent configured, gateway on port 18789 · 2026-05-27
Larry persona configured — gemma3:27b primary, 28 skills eligible, SearXNG web search active · 2026-05-28
MEMORY.md seeded — agent context-aware: knows Jamie, the Council of AIs stack, and the BFS firewall · 2026-05-28

Timeline

May 3, 2026

Hardware planning and research

Did Researched local AI stack options — Ollama vs LM Studio, MLX vs llama.cpp performance, HuggingFace tier requirements, model selection policy (Western-lab only).
Worked Decision: Ollama + LM Studio + Open WebUI stack. HuggingFace free tier confirmed sufficient.
Broke N/A — research phase only.
May 4, 2026

Hardware unboxed, OS baseline

Did Unboxed Mac Studio M4 Max, connected to KVM with Dell 34″ widescreen, installed SN7100 into Satechi stand, ran macOS Sequoia updates, configured Apple Watch Auto Unlock, installed Claude Desktop and ChatGPT Desktop.
Worked Apple Watch Auto Unlock working immediately. Claude and ChatGPT signed in — used as setup runbook going forward.
Broke Nothing — clean start.
May 5, 2026

Development foundation

Did Installed Homebrew 5.1.8, Git 2.54.0, Python 3.14.4. Configured PATH in .zprofile. Generated GitHub SSH key (ed25519), added it as "Mac Studio M4 Max", and verified authentication with GitHub's "Hi OKHP3!" greeting.
Worked Full development foundation clean. Homebrew Git correctly overrides Apple Git. SSH auth confirmed.
Broke Duplicate eval lines in .zprofile from paste issue — cleaned to single line.
May 5, 2026

Ollama install and first models

Did Installed Ollama 0.23.1 via Homebrew (MLX + mlx-c as auto-dependencies). Renamed external drive to OKH-Local. Configured OLLAMA_MODELS, OLLAMA_FLASH_ATTENTION, OLLAMA_KV_CACHE_TYPE in .zprofile and Homebrew plist. Pulled phi4:14b, gemma3:12b, gemma3:27b, codestral:22b.
Worked Ollama running as Homebrew background service with MLX acceleration. First inference passed.
Broke OLLAMA_MODELS env var not picked up by Homebrew service — required editing homebrew.mxcl.ollama.plist directly. Models initially landed in ~/.ollama — corrected with mv and manifest cleanup.
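The plist fix above comes down to this: launchd jobs do not read .zprofile, so the variables have to live in the job's own EnvironmentVariables dictionary. A minimal sketch using Python's standard plistlib. The plist location is passed in because its full path is not recorded here, and the value "1" for OLLAMA_FLASH_ATTENTION is an assumption.

```python
import plistlib
from pathlib import Path

# Values from the build notes; the flash-attention value "1" is assumed.
OLLAMA_ENV = {
    "OLLAMA_MODELS": "/Volumes/OKH-Local/07_Local_LLMs/ollama/models",
    "OLLAMA_FLASH_ATTENTION": "1",
    "OLLAMA_KV_CACHE_TYPE": "q8_0",
}


def set_launchd_env(plist_path: Path, env: dict) -> dict:
    """Merge env vars into a launchd plist's EnvironmentVariables dict."""
    with plist_path.open("rb") as fh:
        job = plistlib.load(fh)
    # launchd reads per-job environment from this key, not from shell profiles.
    job.setdefault("EnvironmentVariables", {}).update(env)
    with plist_path.open("wb") as fh:
        plistlib.dump(job, fh)
    return job
```

The service still has to be restarted afterwards so launchd reloads the job definition.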
May 6, 2026

Model collection and LM Studio

Did Pulled mistral-small3.1:24b and llama3.1:8b. Attempted llama3.3:70b — system froze, required restart, model removed (42GB exceeds 36GB ceiling). Installed LM Studio 0.4.12, confirmed MLX v1.6.0 backend, downloaded Gemma4 E2B, E4B, 26B A4B. Set model loading guardrails to Balanced. Configured LM Studio server on port 1234.
Worked LM Studio MLX backend auto-detected and active. Gemma4 E4B multimodal model running.
Broke llama3.3:70b caused system freeze — 42GB exceeds 36GB ceiling with apps running. Gemma4 31B rejected by guardrails (87GB requirement — correct behavior).
May 7, 2026

Open WebUI, VSCode, and MCP foundation

Did Installed Docker Desktop 4.72.0 (required Rosetta update). Deployed Open WebUI container on localhost:3000. Installed VSCode 1.119.0 + Continue.dev v1.2.22. Configured Continue.dev with all local Ollama models (Codestral as primary autocomplete). Installed Node.js 26.0.0 for MCP server support.
Worked Open WebUI auto-detecting Ollama models. Continue.dev configured as local Copilot replacement.
Broke Docker Desktop failed initial start — Rosetta update pending. Resolved after update.
May 8, 2026

MCP servers and app ecosystem

Did Configured Claude Desktop claude_desktop_config.json with Notion and GitHub MCP. Used full paths to fix PATH issues for background service. Pulled GitHub MCP via Docker. Verified 11 MCP servers connected in Claude Desktop: Notion, GitHub, PageSpace, Mermaid Chart (enterprise), Google suite (Calendar/Gmail/Drive), Microsoft 365, and additional business tool integrations.
Worked All 11 MCP servers live in Claude Desktop.
Broke Notion and GitHub tokens required rotation during setup — standard credential hygiene after any setup session. GitHub MCP npm package deprecated — switched to Docker-based server. Full path required for both MCP commands (Claude Desktop doesn't inherit shell PATH).
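The full-path requirement can be handled mechanically: resolve each command once from a shell where PATH is correct, then write the absolute path into claude_desktop_config.json. A sketch, assuming the usual mcpServers layout with command and args per server. The server names and arguments in the usage note are placeholders, not the actual configuration.

```python
import json
import shutil


def absolute_command(name: str) -> str:
    """Resolve a command to its full path; GUI apps do not inherit the shell PATH."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name} not found on PATH")
    return path


def build_mcp_config(servers: dict) -> str:
    """Return config JSON text with every server command made absolute."""
    resolved = {
        name: {**spec, "command": absolute_command(spec["command"])}
        for name, spec in servers.items()
    }
    return json.dumps({"mcpServers": resolved}, indent=2)
```

For example, build_mcp_config({"github": {"command": "docker", "args": ["run", "-i", "--rm", "<image>"]}}) would emit the entry with docker expanded to wherever it is installed. Tokens are better supplied through an env block than hard-coded in args.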
May 9, 2026

App ecosystem and Edge PWAs

Did Installed Microsoft Edge, Office suite, OneDrive, Notion Desktop, Perplexity (native), GitHub Desktop. Inventoried all Edge PWAs. Confirmed brand PWAs (AskJamie™, OverKill Hill P³™, Glee-fully™) present. Installed PageSpace MCP.
Worked Full app ecosystem clean. 26 applications in /Applications/. 30+ Edge PWAs. All brand properties pinned as apps.
Broke Several Office apps installed twice (Homebrew + DMG). Cleaned with sudo rm -rf.
May 10, 2026

VSCode extension audit across machines

Did Conducted full cross-machine extension audit. Identified Windows-only and redundant extensions. Built MVP lists. Installed 43 extensions on Mac Studio via bulk install command. VSCode Settings Sync enabled — pushing Solarized Light theme across machines.
Worked All 43 extensions installed clean.
Broke Continue.dev flagged Photos Library permissions error — harmless, just noisy.
May 11, 2026

Git workspace, OneDrive sync, and repos

Did Configured Git global identity (OKHP3 / noreply email). Set up OneDrive sync. Created okhp3 alias in .zprofile. Updated VSCode workspace file to OKH root. Updated all 28 repo SSH remotes from HTTPS to SSH via batch script. Installed mlx-lm 0.31.3 — verified at 139 tok/s on Phi-4 mini.
Worked Workspace file correct and opening all 28 repos. SSH remotes updated in one pass. mlx-lm functional — 4× faster than Ollama for MLX-compatible models.
Broke Git pull inside OneDrive timing out — OneDrive holding .git/index with file locks during initial sync. Decision: skip for now (low risk — commits come from Replit/Claude Code/Codex/Mac).
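The HTTPS-to-SSH remote update is a pure string rewrite per repository. The batch script itself is not reproduced here; the following is a sketch of the URL conversion it would need, assuming all remotes are on github.com. The repository name in the usage note is hypothetical.

```python
import re

# Matches https://github.com/<owner>/<repo>, with optional ".git" and trailing slash.
_HTTPS_GITHUB = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def https_to_ssh(url: str) -> str:
    """Convert a GitHub HTTPS remote to SSH form; leave anything else untouched."""
    match = _HTTPS_GITHUB.match(url)
    if match is None:
        return url
    return f"git@github.com:{match['owner']}/{match['repo']}.git"
```

https_to_ssh("https://github.com/OKHP3/demo.git") gives git@github.com:OKHP3/demo.git, which can then be applied in each repo with git remote set-url origin followed by the new URL.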
May 12, 2026

Normalization, benchmarking, and baseline archive

Did Normalized Ollama model storage. Fixed LM Studio nested models/models path from UI misconfiguration. Created compatibility symlinks. Externalized HuggingFace cache, authenticated as okhp3. Created model inventories, benchmark workspace, restore script, and verification script. Benchmarked llama3.1:8b and phi4:14b across 5 tests. Created known-good baseline documentation and archived closure package.
Worked Full storage normalization verified. All 6 Ollama models confirmed. Benchmark smoke tests completed. Archive integrity check passed.
Broke Ollama stale path after move — required service stop/restart with explicit env vars. HuggingFace first token failed (invalid) — second succeeded.
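The verification script is not shown on this page. A check of the archive in the same spirit might look like the sketch below: read every member of the .tar.gz end to end so truncation or decompression errors surface, and record a SHA-256 to compare against any later copy. The function names are illustrative.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path) -> int:
    """Read every regular file in a .tar.gz; return how many were read."""
    count = 0
    with tarfile.open(path, "r:gz") as tar:
        for member in tar:
            if member.isfile():
                with tar.extractfile(member) as fh:
                    while fh.read(1 << 20):
                        pass  # discard data; errors raise here
                count += 1
    return count
```

Storing the digest next to the archive lets an offsite copy be checked without unpacking it.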
May 13, 2026

Post-baseline hardening and publication

Did Corrected Ollama plist path — the Homebrew launchd service had been pointing at the pre-normalization path via symlink rather than the canonical /Volumes/OKH-Local/07_Local_LLMs/ollama/models. Edited plist directly, restarted service, confirmed all 6 models visible. Updated Open WebUI — stopped and removed old container, pulled latest image, recreated container with original run parameters. The v0.9.5 update banner resolved. Resolved Edge PWA localhost authentication — Open WebUI pinned as standalone Edge app. Confirmed access at localhost:3000 without login prompt. Project page published to overkillhill.com.
Worked All three punch list blockers resolved. Ollama service confirmed running on canonical path. Open WebUI loading cleanly as Edge PWA. Project page live and indexed.
Broke Nothing during this session. Plist correction and container recreation both executed cleanly.
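The container update follows a stop, remove, pull, run sequence. The original run parameters are not recorded on this page, so the sketch below assumes the image tag, container name, and data volume from Open WebUI's published Docker instructions; only the host port 3000 comes from this build. It prints the commands rather than executing them.

```python
import shlex

IMAGE = "ghcr.io/open-webui/open-webui:main"  # assumed image tag
NAME = "open-webui"                           # assumed container name


def recreate_commands(host_port: int = 3000) -> list:
    """Commands to replace the container while keeping its named data volume."""
    return [
        ["docker", "stop", NAME],
        ["docker", "rm", NAME],
        ["docker", "pull", IMAGE],
        ["docker", "run", "-d", "--name", NAME,
         "-p", f"{host_port}:8080",
         "-v", "open-webui:/app/backend/data",  # named volume survives the rm
         "--restart", "always", IMAGE],
    ]


for cmd in recreate_commands():
    print(shlex.join(cmd))
```

Chats and settings persist across the recreate only because they live in the named volume, not in the container.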
May 27, 2026

OpenClaw installation

Did Installed OpenClaw 2026.5.26 via npm. Ran setup and onboard wizards. Configured Ollama local-only mode, LaunchAgent for auto-start at login, session-memory and command-logger hooks. Selected 11 skills during onboard. Disabled iMessage channel (imsg binary not available via npm or Homebrew — requires source build). Resolved context overflow — phi4:14b's 16k window cannot accommodate OpenClaw's workspace system prompt; switched primary model to gemma3:27b (131k window, 9% utilization). Cleared bloated session files from iMessage crash loop. Agent confirmed responding.
Worked OpenClaw 2026.5.26 running as LaunchAgent. gemma3:27b responding cleanly. Control UI accessible at localhost:18789.
Broke phi4:14b context overflow on every message (system prompt overhead exceeds 16k window). iMessage channel crash-looping due to missing imsg binary — disabled.
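The overflow is a budget problem: system prompt plus room for a reply has to fit inside the model's window. A small sketch of that arithmetic. The token counts are illustrative, not measured, and the window sizes assume "16k" and "131k" mean 16,384 and 131,072 tokens.

```python
def fits_context(prompt_tokens: int, reply_reserve: int, window: int) -> bool:
    """True if the prompt plus room reserved for the reply fits in the window."""
    return prompt_tokens + reply_reserve <= window


def utilization(prompt_tokens: int, window: int) -> float:
    """Fraction of the context window taken by the prompt."""
    return prompt_tokens / window


SYSTEM_PROMPT_TOKENS = 12_000  # hypothetical workspace prompt size
REPLY_RESERVE = 8_192          # hypothetical reply allowance

for name, window in {"phi4:14b": 16_384, "gemma3:27b": 131_072}.items():
    ok = fits_context(SYSTEM_PROMPT_TOKENS, REPLY_RESERVE, window)
    print(name, "fits" if ok else "overflows",
          f"({utilization(SYSTEM_PROMPT_TOKENS, window):.0%} used by prompt)")
```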
May 28, 2026

Larry comes online

Did Named the agent Larry — modeled after Larry the Lobster from SpongeBob SquarePants. Rewrote IDENTITY.md and SOUL.md with full character context and voice rules. Seeded MEMORY.md from a multi-AI identity consolidation (Claude + ChatGPT + Copilot + Perplexity + Gemini + Notion). Ran openclaw doctor — corrected context windows for all 7 models, raised bootstrap limit to 20,000 characters. Fixed clawhub and mcporter binary paths. Deployed SearXNG in Docker on port 8888, configured as Larry's web search provider. Set tools.profile to full. Confirmed web_search returning results. Eligible skill count: 28.
Worked Larry introduced himself correctly on cold start from workspace files alone. Web search confirmed functional via SearXNG JSON API.
Broke SearXNG returned stale version data on first search query — search engine quality tuning pending (open localhost:8888/preferences to configure preferred engines).
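The JSON API can also be queried directly, which helps when judging result quality outside the agent. A sketch using only the standard library, assuming the instance on port 8888 has the json output format enabled in its settings. The result field names follow SearXNG's JSON response, and the limit parameter is just a convenience added here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def searx_url(query: str, base: str = "http://localhost:8888") -> str:
    """Build a SearXNG search URL that asks for JSON output."""
    return f"{base}/search?{urlencode({'q': query, 'format': 'json'})}"


def search(query: str, limit: int = 5) -> list:
    """Return title and URL for the first few results."""
    with urlopen(searx_url(query), timeout=10) as resp:
        payload = json.load(resp)
    return [
        {"title": r.get("title"), "url": r.get("url")}
        for r in payload.get("results", [])[:limit]
    ]
```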

Meet Larry

Larry is an OpenClaw autonomous agent running on the Mac Studio M4 Max. He doesn't wait for a prompt. He runs in the background, starts at login, and handles the zero-cost tier of the Council of AIs workflow: background tasks, Apple Notes, Reminders, file operations, RAG queries, scheduled jobs, and first-pass summarization.

The baseline stack — Ollama, LM Studio, Open WebUI, mlx-lm — is infrastructure. Open WebUI is a chat interface. You open it, you type, you get a response. Larry is something different.

The original Larry the Lobster — SpongeBob's friend from Bikini Bottom — was the lifeguard at Goo Lagoon. He was the only thing standing between the residents and a watery grave. This Larry has a similar mandate: protect the workflow from token waste and busywork.

He introduced himself on first boot with: "Larry's got it."

| | Open WebUI | Larry (OpenClaw) |
|---|---|---|
| What it is | Chat interface | Autonomous agent |
| How it works | You send a message | Works in background |
| Model | Any local Ollama model | gemma3:27b (131k context) |
| Skills | | 28 eligible (Notes, GitHub, Notion, web search, more) |
| Starts at | Manual | Login (LaunchAgent) |
| Best for | Interactive queries | Background tasks, automation |

openclaw.ai ↗    View artifact repo on GitHub ↗

Model Inventory

| Model | Format | Size | Runtime | Use |
|---|---|---|---|---|
| phi4:14b | GGUF | 9.1 GB | Ollama | Fast reasoning, instruction following. Daily driver. |
| gemma3:12b | GGUF | 8.1 GB | Ollama | General purpose, mid-tier |
| gemma3:27b | GGUF | 17 GB | Ollama | Flagship general, heavy reasoning. Daily driver (quality). |
| codestral:22b | GGUF | 12 GB | Ollama | Code generation, Continue.dev autocomplete |
| mistral-small3.1:24b | GGUF | 15 GB | Ollama | General purpose, fast |
| llama3.1:8b | GGUF | 4.9 GB | Ollama | Lightweight utility, bulk tasks |
| gemma-4-E4B-it | GGUF Q4_K_M | 6.33 GB | LM Studio | Multimodal (image input), fast MoE |
| gemma-4-E2B-it | GGUF | ~5 GB | LM Studio | Ultra-light, fastest responses |
| gemma-4-26B-A4B-it | GGUF | ~16 GB | LM Studio | MoE, 4B active params, larger knowledge |
| Phi-4-mini-instruct-4bit | MLX 4-bit | 2.18 GB | mlx-lm | Direct MLX inference, 139 tok/s on Apple Silicon |

Note: llama3.3:70b (42 GB) was attempted and removed — exceeds the 36 GB ceiling with apps running.
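The ceiling can be checked before pulling a model. A rough sketch: weights have to fit in what is left of unified memory after macOS and open apps. The 8 GB reservation is a hypothetical allowance, and the check ignores KV cache growth, so it is optimistic for long contexts.

```python
TOTAL_GB = 36.0     # unified memory on this machine
RESERVED_GB = 8.0   # hypothetical allowance for macOS and open apps


def fits_in_memory(model_gb: float, total_gb: float = TOTAL_GB,
                   reserved_gb: float = RESERVED_GB) -> bool:
    """Rough check that model weights fit in the memory left for inference."""
    return model_gb <= total_gb - reserved_gb


# Sizes taken from the inventory and the note above.
for name, size_gb in {"gemma3:27b": 17.0, "llama3.3:70b": 42.0}.items():
    print(name, "fits" if fits_in_memory(size_gb) else "too large")
```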

Model Selection Policy

🛡️

Western-lab models only. Every model in this workbench comes from a Western-lab open-weight release: Meta (Llama), Google (Gemma), Mistral AI (Mistral, Codestral), and Microsoft (Phi).

This is a deliberate policy, not a default. Chinese cloud AI services operate under PRC data law. Open-weight models from any lab are architecturally isolated once downloaded — weights are inert files with no network access — but maintaining a clean Western-only boundary is simpler to reason about and easier to defend.

Models excluded: Any cloud-connected Chinese AI service. DeepSeek, Qwen, and similar open-weight releases are technically capable but fall outside this policy's boundary.

Benchmark Snapshot

| Test | llama3.1:8b | phi4:14b |
|---|---|---|
| Exact instruction following | ✅ Pass | ✅ Pass |
| Voice transcript cleanup | ⚠️ Functional (formatting issue) | ✅ Pass |
| YAML generation | ❌ Fail | ❌ Fail |
| Architecture summary | ⚠️ Functional (semantic flaw) | ⚠️ Partial fail |
| Mermaid diagram generation | ❌ Fail | ⚠️ Partial fail |

Conclusion: Neither model is governance-grade for autonomous structured artifact generation under simple prompting. A strict prompt benchmark pass is recommended before declaring models inadequate. gemma3:12b, gemma3:27b, mistral-small3.1:24b, and codestral:22b are not yet benchmarked.

The Operating Model

⚙️

The Mac Studio M4 Max is not a replacement for frontier AI. It is the zero-cost inference tier at the bottom of a deliberate token routing hierarchy.

The Council of AIs workflow pairs Claude and ChatGPT as blind-iterating collaborators in Notion, uses Replit for execution against locked specs, and GitHub as the versioned source of truth. Every part of that workflow has a cost. Local models handle the parts where cost should be zero: bulk processing, first-pass summarization, RAG queries against personal knowledge, and repetitive validation checks.

Cheap tokens think broadly. Expensive tokens act precisely.

This workbench is what makes those economics work at a $40/month total subscription budget.

Next Phase

Execute ChatGPT content update — populate all GitHub repository stub docs
Backup baseline archive to OneDrive and Time Machine
Run strict prompt benchmark — all 6 Ollama models, same 5 tests with tighter prompts
Benchmark remaining 4 models: gemma3:12b, gemma3:27b, mistral-small3.1:24b, codestral:22b
Deploy RAG stack: nomic-embed-text + Qdrant + Open WebUI Knowledge corpus
ASUS VSCode extension cleanup (uninstall commands ready, not yet executed)

Open WebUI as Edge PWA — resolve Edge localhost authentication ✓ Completed 2026-05-13

🧠

RAG Roadmap — The baseline is complete. The next phase is the retrieval layer.

nomic-embed-text + Qdrant + Open WebUI Knowledge will turn this workbench into a local AI that knows the author's actual work — not just its training data. Personal writing, GitHub repos, conversation archives, and workspace exports will be chunked, embedded, and indexed locally. The model will retrieve before it generates.

This is how a generic local AI becomes a genuine second brain.
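The chunk-and-embed step of that roadmap can be sketched ahead of the full stack. The chunk size and overlap below are placeholder values, not tuned settings. The embed function assumes Ollama's embeddings endpoint on its default port 11434 with nomic-embed-text already pulled. Indexing into Qdrant is left out.

```python
import json
from urllib.request import Request, urlopen


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list:
    """Split text into fixed-size character chunks that overlap slightly."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that the last chunk is not pure overlap.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str, host: str = "http://localhost:11434") -> list:
    """Request one embedding vector from a local Ollama server."""
    body = json.dumps({"model": "nomic-embed-text", "prompt": text}).encode()
    req = Request(f"{host}/api/embeddings", data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=60) as resp:
        return json.load(resp)["embedding"]
```

Each chunk's vector would then be stored with its source path so a retrieved passage can be traced back to the original file.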

Core Toolchain

Homebrew 5.1.8
Git (Homebrew) 2.54.0
Python 3.14.4
Node.js / npm 26.0.0 / 11.12.1
GitHub CLI 2.92.0
Docker Desktop 4.72.0
mlx-lm 0.31.3

AI Runtimes

Ollama 0.23.1 · MLX + Flash Attention + q8_0 KV cache
LM Studio 0.4.12 · MLX v1.6.0 backend · server port 1234
Open WebUI Docker · localhost:3000 · updated 2026-05-13
mlx-lm 0.31.3 · 139 tok/s on Phi-4 mini
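For a like-for-like throughput number on the Ollama side, the final response from its generate endpoint reports eval_count (tokens generated) and eval_duration (nanoseconds). A sketch of the conversion. The sample values in the usage note are made up to show the arithmetic, not a measurement.

```python
def tokens_per_second(response: dict) -> float:
    """Generation throughput from an Ollama response's eval fields."""
    seconds = response["eval_duration"] / 1e9  # eval_duration is in nanoseconds
    return response["eval_count"] / seconds
```

A response with eval_count 278 and eval_duration 2,000,000,000 works out to 139 tok/s.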

Storage Architecture

Ollama models: /Volumes/OKH-Local/07_Local_LLMs/ollama/models
LM Studio models: /Volumes/OKH-Local/07_Local_LLMs/lm-studio/models
HuggingFace cache: /Volumes/OKH-Local/07_Local_LLMs/huggingface-cache

Key Decisions

Default runtime: Ollama for API/headless, LM Studio for GUI/MLX testing, mlx-lm for max performance
Daily driver models: phi4:14b (speed) · gemma3:27b (quality) · codestral:22b (code)
Model policy: Western-lab only (Meta, Google, Mistral, Microsoft). No Chinese cloud services.
Token governance: Local models → Notion AI → Claude/ChatGPT base → Perplexity/Copilot → Replit (spec must be complete first)
Git workflow: Primary commits from Replit, Claude Code, Codex, Mac Studio. Secondary machine is read-mostly via OneDrive sync.
Directory conventions: All model weights on OKH-Local external NVMe. Repos in OneDrive OKH folder. Workspace file at OKH root.
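The token governance order can be read as an escalation ladder: try the cheapest tier that can do the job, and only reach Replit once the spec is complete. A sketch of that reading. The tier order comes from the list above, while the capable set and the selection logic are assumptions about how the policy would be applied.

```python
from typing import Optional

# Order taken from the token governance entry; cheapest tier first.
TIERS = ["Local models", "Notion AI", "Claude/ChatGPT base",
         "Perplexity/Copilot", "Replit"]


def pick_tier(capable: set, spec_complete: bool = False) -> Optional[str]:
    """Return the first tier in governance order that can handle the task."""
    for tier in TIERS:
        if tier == "Replit" and not spec_complete:
            continue  # Replit is gated on a complete spec
        if tier in capable:
            return tier
    return None
```

pick_tier({"Local models", "Claude/ChatGPT base"}) returns "Local models", so the frontier tier is used only when the local one cannot do the task.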