✅ Baseline complete · 2026-05-12 · Updated 2026-05-30

Mac Studio Local AI Workbench

Name: Mac Studio Local AI Workbench
Author: OverKill Hill P³™

Production-grade personal AI workstation — local-first, governed, documented, and restorable. Built in 10 days on Apple Silicon.

M4 Max · 36GB unified memory · 512GB internal SSD · 1TB OKH-Local NVMe · macOS Sequoia · 10 local models · 11 MCP servers · 1 autonomous agent

✅

BASELINE COMPLETE — Build finished, normalized, benchmarked, and archived on 2026-05-12. System is in a known-good, restorable state. Archive: /Volumes/OKH-Local/05_Research_Vault/mac-studio-setup_DONE_2026-05-12.tar.gz
2026-05-13 post-baseline: Ollama plist corrected, Open WebUI updated to latest, Edge PWA resolved. Project page published.
2026-05-27/28: OpenClaw installed, agent named Larry, MEMORY.md seeded, SearXNG web search live.
2026-05-27 through 2026-05-30: OpenClaw/Larry online, RAG stack deployed, all smoke tests passed, strict benchmark complete, architecture diagrams published. Build declared complete.

Value Framing

What This Is Worth

💡

In May 2026, a Facebook Marketplace listing offered an M1 Mac Studio 32GB with "PageSpace AI pre-installed" for $7,500. PageSpace is open-source and free. An M1 Mac Studio 32GB sells for $600–$900 on the secondary market.

This workbench runs on a Mac Studio M4 Max, three chip generations newer than that M1, with more memory (36GB vs. 32GB) and substantially more compute. It runs 10 local models across three inference runtimes. It connects to 11 live MCP servers. It has a documented operating model, governance policies, restore scripts, verified benchmarks, and an offsite backup strategy.

$2,250

Hardware cost

Software cost — all open-source, free, or covered by existing subscriptions

The $7,500 listing was selling the box. This build is the operating model.

Completion Checklist

Status Board

✓ Hardware unboxed — KVM, Satechi stand, SN7100 NVMe, Logitech Brio MX

✓ macOS Sequoia updated, Apple Watch Auto Unlock configured

✓ Homebrew 5.1.8 installed, PATH configured

✓ Git 2.54.0, Python 3.14.4, Node.js 26.0.0, GitHub CLI 2.92.0

✓ GitHub SSH key (ed25519) configured and verified

✓ Ollama 0.23.1 with MLX acceleration, Flash Attention, q8_0 KV cache

✓ Model storage normalized to /Volumes/OKH-Local/07_Local_LLMs/ollama/models

✓ LM Studio 0.4.12 with MLX backend active

✓ Open WebUI running in Docker on localhost:3000

✓ VSCode 1.119.0 + 43 extensions + Continue.dev 1.2.22

✓ Claude Desktop + 11 MCP servers (Notion, GitHub, PageSpace, Mermaid, Google suite, M365, more)

✓ ChatGPT Desktop, Codex, and ChatGPT Atlas installed

✓ Microsoft Office suite, OneDrive, Edge, Notion, Perplexity installed

✓ mlx-lm 0.31.3 installed — 139 tok/s on Phi-4 mini via direct Apple Silicon inference

✓ HuggingFace cache normalized to /Volumes/OKH-Local/07_Local_LLMs/huggingface-cache

✓ Storage baseline normalized, verified, benchmarked, and archived

✓ Git identity configured, VSCode Settings Sync active across machines

✓ 28 GitHub repos mapped, SSH remotes updated

✓ OneDrive sync active, OKH alias configured in .zprofile

✓ Ollama plist path corrected to canonical location · 2026-05-13

✓ Open WebUI updated to latest — container recreated, v0.9.5 resolved · 2026-05-13

✓ Edge PWA localhost auth resolved — Open WebUI pinned as standalone Edge app · 2026-05-13

✓ OpenClaw 2026.5.26 installed — local AI agent, LaunchAgent configured, gateway on port 18789 · 2026-05-27

✓ Larry persona configured — gemma3:27b primary, 28 skills eligible, SearXNG web search active · 2026-05-28

✓ MEMORY.md seeded — agent context-aware: knows Jamie, the Council of AIs stack, and the BFS firewall · 2026-05-28

✓ RAG stack deployed — nomic-embed-text + Qdrant + Open WebUI · 2026-05-29

✓ Strict prompt benchmark complete — all 6 models, zero outright failures · 2026-05-30

✓ Architecture diagrams published — flowchart, mind map, and architecture-beta · 2026-05-30

Build Log

Timeline

May 3
2026

Hardware planning and research

Did Researched local AI stack options — Ollama vs LM Studio, MLX vs llama.cpp performance, HuggingFace tier requirements, model selection policy (Western-lab only).

Worked Decision: Ollama + LM Studio + Open WebUI stack. HuggingFace free tier confirmed sufficient.

Broke N/A — research phase only.

May 4
2026

Hardware unboxed, OS baseline

Did Unboxed Mac Studio M4 Max, connected to KVM with Dell 34″ widescreen, installed SN7100 into Satechi stand, ran macOS Sequoia updates, configured Apple Watch Auto Unlock, installed Claude Desktop and ChatGPT Desktop.

Worked Apple Watch Auto Unlock working immediately. Claude and ChatGPT signed in — used as setup runbook going forward.

Broke Nothing — clean start.

May 5
2026

Development foundation

Did Installed Homebrew 5.1.8, Git 2.54.0, Python 3.14.4. Configured PATH in .zprofile. Generated GitHub SSH key (ed25519), added as "Mac Studio M4 Max", verified by GitHub's "Hi OKHP3!" greeting.

Worked Full development foundation clean. Homebrew Git correctly overrides Apple Git. SSH auth confirmed.

Broke Duplicate eval lines in .zprofile from paste issue — cleaned to single line.

May 5
2026

Ollama install and first models

Did Installed Ollama 0.23.1 via Homebrew (MLX + mlx-c as auto-dependencies). Renamed external drive to OKH-Local. Configured OLLAMA_MODELS, OLLAMA_FLASH_ATTENTION, OLLAMA_KV_CACHE_TYPE in .zprofile and Homebrew plist. Pulled phi4:14b, gemma3:12b, gemma3:27b, codestral:22b.

Worked Ollama running as Homebrew background service with MLX acceleration. First inference passed.

Broke OLLAMA_MODELS env var not picked up by Homebrew service — required editing homebrew.mxcl.ollama.plist directly. Models initially landed in ~/.ollama — corrected with mv and manifest cleanup.
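The plist fix described above can be sketched in Python with the standard-library plistlib module. This is a minimal illustration, not the script used in the build: the variable names and the models path come from this log, while the values "1" and "q8_0", the sample Label, and the ProgramArguments path are assumptions. It runs against an in-memory plist so nothing on disk is touched.

```python
import plistlib

# Variable names and the models path come from the build log; the values
# "1" and "q8_0" are assumptions based on the settings described there.
OLLAMA_ENV = {
    "OLLAMA_MODELS": "/Volumes/OKH-Local/07_Local_LLMs/ollama/models",
    "OLLAMA_FLASH_ATTENTION": "1",
    "OLLAMA_KV_CACHE_TYPE": "q8_0",
}


def with_env_vars(plist_bytes: bytes, env: dict[str, str]) -> bytes:
    """Return a copy of a launchd plist with `env` merged into EnvironmentVariables."""
    plist = plistlib.loads(plist_bytes)
    merged = dict(plist.get("EnvironmentVariables", {}))
    merged.update(env)
    plist["EnvironmentVariables"] = merged
    return plistlib.dumps(plist)


# Demo on an in-memory plist; the Label and ProgramArguments are placeholders.
sample = plistlib.dumps({
    "Label": "homebrew.mxcl.ollama",
    "ProgramArguments": ["/opt/homebrew/opt/ollama/bin/ollama", "serve"],
})
updated = plistlib.loads(with_env_vars(sample, OLLAMA_ENV))
print(updated["EnvironmentVariables"]["OLLAMA_MODELS"])
```

On a real machine the same function would be applied to the bytes of homebrew.mxcl.ollama.plist, followed by a service restart so launchd picks up the new environment.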

May 6
2026

Model collection and LM Studio

Did Pulled mistral-small3.1:24b and llama3.1:8b. Attempted llama3.3:70b — system froze, required restart, model removed (42GB exceeds 36GB ceiling). Installed LM Studio 0.4.12, confirmed MLX v1.6.0 backend, downloaded Gemma4 E2B, E4B, 26B A4B. Set model loading guardrails to Balanced. Configured LM Studio server on port 1234.

Worked LM Studio MLX backend auto-detected and active. Gemma4 E4B multimodal model running.

Broke llama3.3:70b caused system freeze — 42GB exceeds 36GB ceiling with apps running. Gemma4 31B rejected by guardrails (87GB requirement — correct behavior).

May 7
2026

Open WebUI, VSCode, and MCP foundation

Did Installed Docker Desktop 4.72.0 (required Rosetta update). Deployed Open WebUI container on localhost:3000. Installed VSCode 1.119.0 + Continue.dev v1.2.22. Configured Continue.dev with all local Ollama models (Codestral as primary autocomplete). Installed Node.js 26.0.0 for MCP server support.

Worked Open WebUI auto-detecting Ollama models. Continue.dev configured as local Copilot replacement.

Broke Docker Desktop failed initial start — Rosetta update pending. Resolved after update.

May 8
2026

MCP servers and app ecosystem

Did Configured Claude Desktop claude_desktop_config.json with Notion and GitHub MCP. Used full paths to fix PATH issues for background service. Pulled GitHub MCP via Docker. Verified 11 MCP servers connected in Claude Desktop: Notion, GitHub, PageSpace, Mermaid Chart (enterprise), Google suite (Calendar/Gmail/Drive), Microsoft 365, and additional business tool integrations.

Worked All 11 MCP servers live in Claude Desktop.

Broke Notion and GitHub tokens required rotation during setup — standard credential hygiene after any setup session. GitHub MCP npm package deprecated — switched to Docker-based server. Full path required for both MCP commands (Claude Desktop doesn't inherit shell PATH).
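Because Claude Desktop does not inherit the shell PATH, one way to avoid hand-typed paths is to resolve each command to an absolute path when the config is generated. The sketch below assumes the usual mcpServers layout of claude_desktop_config.json. The package name, the Docker image name, and the demo paths are illustrative assumptions, not values taken from this build, and credentials are deliberately left out.

```python
import json
import shutil
from typing import Callable, Optional

Resolver = Callable[[str], Optional[str]]


def absolute(cmd: str, which: Resolver = shutil.which) -> str:
    """Resolve a command name to an absolute path, failing loudly if missing."""
    path = which(cmd)
    if path is None:
        raise FileNotFoundError(f"{cmd!r} not found on PATH")
    return path


def build_config(which: Resolver = shutil.which) -> dict:
    """Build an mcpServers mapping whose commands are absolute paths."""
    return {
        "mcpServers": {
            # Package and image names are assumptions for illustration.
            # Tokens belong in each server's "env" block and are omitted here.
            "notion": {
                "command": absolute("npx", which),
                "args": ["-y", "@notionhq/notion-mcp-server"],
            },
            "github": {
                "command": absolute("docker", which),
                "args": ["run", "-i", "--rm", "-e", "GITHUB_PERSONAL_ACCESS_TOKEN",
                         "ghcr.io/github/github-mcp-server"],
            },
        }
    }


# Demo with a stand-in resolver so the sketch runs on any machine.
fake_which = {"npx": "/opt/homebrew/bin/npx", "docker": "/usr/local/bin/docker"}.get
print(json.dumps(build_config(fake_which), indent=2))
```

Calling build_config() with no argument uses shutil.which, so it should be run from an interactive shell where PATH is already correct.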

May 9
2026

App ecosystem and Edge PWAs

Did Installed Microsoft Edge, Office suite, OneDrive, Notion Desktop, Perplexity (native), GitHub Desktop. Inventoried all Edge PWAs. Confirmed brand PWAs (AskJamie™, OverKill Hill P³™, Glee-fully™) present. Installed PageSpace MCP.

Worked Full app ecosystem clean. 26 applications in /Applications/. 30+ Edge PWAs. All brand properties pinned as apps.

Broke Several Office apps installed twice (Homebrew + DMG). Cleaned with sudo rm -rf.

May 10
2026

VSCode extension audit across machines

Did Conducted full cross-machine extension audit. Identified Windows-only and redundant extensions. Built MVP lists. Installed 43 extensions on Mac Studio via bulk install command. VSCode Settings Sync enabled — pushing Solarized Light theme across machines.

Worked All 43 extensions installed clean.

Broke Continue.dev flagged Photos Library permissions error — harmless, just noisy.

May 11
2026

Git workspace, OneDrive sync, and repos

Did Configured Git global identity (OKHP3 / noreply email). Set up OneDrive sync. Created okhp3 alias in .zprofile. Updated VSCode workspace file to OKH root. Updated all 28 repo SSH remotes from HTTPS to SSH via batch script. Installed mlx-lm 0.31.3 — verified at 139 tok/s on Phi-4 mini.

Worked Workspace file correct and opening all 28 repos. SSH remotes updated in one pass. mlx-lm functional — 4× faster than Ollama for MLX-compatible models.

Broke Git pull inside OneDrive timing out — OneDrive holding .git/index with file locks during initial sync. Decision: skip for now (low risk — commits come from Replit/Claude Code/Codex/Mac).
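The HTTPS-to-SSH remote conversion is a pure string rewrite, so the core of such a batch script can be sketched as one function. The actual script is not shown in this log; the repository name below is hypothetical, and only the OKHP3 account name comes from the text.

```python
import re

_HTTPS_REMOTE = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def to_ssh_remote(url: str) -> str:
    """Convert a GitHub HTTPS remote to SSH form; leave other URLs untouched."""
    match = _HTTPS_REMOTE.match(url.strip())
    if match is None:
        return url
    owner, repo = match["owner"], match["repo"]
    return f"git@github.com:{owner}/{repo}.git"


# Hypothetical remotes for illustration; "example-repo" is not a real repo name.
for remote in (
    "https://github.com/OKHP3/example-repo.git",
    "https://github.com/OKHP3/example-repo",
    "git@github.com:OKHP3/example-repo.git",
):
    print(remote, "->", to_ssh_remote(remote))
```

In a real pass, the result would be fed to git remote set-url origin inside each repository. Remotes that are already SSH pass through unchanged, which makes the pass safe to rerun.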

May 12
2026

Normalization, benchmarking, and baseline archive

Did Normalized Ollama model storage. Fixed LM Studio nested models/models path from UI misconfiguration. Created compatibility symlinks. Externalized HuggingFace cache, authenticated as okhp3. Created model inventories, benchmark workspace, restore script, and verification script. Benchmarked llama3.1:8b and phi4:14b across 5 tests. Created known-good baseline documentation and archived closure package.

Worked Full storage normalization verified. All 6 Ollama models confirmed. Benchmark smoke tests completed. Archive integrity check passed.

Broke Ollama stale path after move — required service stop/restart with explicit env vars. HuggingFace first token failed (invalid) — second succeeded.
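The log does not say how the archive integrity check was performed. One plausible approach, sketched here as an assumption, is to read every member of the .tar.gz end to end (which fails on truncation or corruption) and record a SHA-256 digest for later comparison. The demo builds a tiny archive in memory instead of touching the real baseline file.

```python
import hashlib
import io
import tarfile


def sha256_of(data: bytes) -> str:
    """Hex digest to store alongside the archive for later comparison."""
    return hashlib.sha256(data).hexdigest()


def archive_members(data: bytes) -> list[str]:
    """Read every member of a .tar.gz fully; raises if the archive is damaged."""
    names = []
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        for member in tar:
            if member.isfile():
                tar.extractfile(member).read()  # forces decompression of the payload
            names.append(member.name)
    return names


# Build a small stand-in archive in memory (file name and content are placeholders).
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    payload = b"known-good baseline\n"
    info = tarfile.TarInfo("README.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
archive = buffer.getvalue()

print(archive_members(archive), sha256_of(archive)[:12])
```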

May 13
2026

Post-baseline hardening and publication

Did Corrected Ollama plist path — the Homebrew launchd service had been pointing at the pre-normalization path via symlink rather than the canonical /Volumes/OKH-Local/07_Local_LLMs/ollama/models. Edited plist directly, restarted service, confirmed all 6 models visible. Updated Open WebUI — stopped and removed old container, pulled latest image, recreated container with original run parameters. The v0.9.5 update banner resolved. Resolved Edge PWA localhost authentication — Open WebUI pinned as standalone Edge app. Confirmed access at localhost:3000 without login prompt. Project page published to overkillhill.com.

Worked All three punch list blockers resolved. Ollama service confirmed running on canonical path. Open WebUI loading cleanly as Edge PWA. Project page live and indexed.

Broke Nothing during this session. Plist correction and container recreation both executed cleanly.

May 27
2026

OpenClaw installation

Did Installed OpenClaw 2026.5.26 via npm. Ran setup and onboard wizards. Configured Ollama local-only mode, LaunchAgent for auto-start at login, session-memory and command-logger hooks. Selected 11 skills during onboard. Disabled iMessage channel (imsg binary not available via npm or Homebrew — requires source build). Resolved context overflow — phi4:14b's 16k window cannot accommodate OpenClaw's workspace system prompt; switched primary model to gemma3:27b (131k window, 9% utilization). Cleared bloated session files from iMessage crash loop. Agent confirmed responding.

Worked OpenClaw 2026.5.26 running as LaunchAgent. gemma3:27b responding cleanly. Control UI accessible at localhost:18789.

Broke phi4:14b context overflow on every message (system prompt overhead exceeds 16k window). iMessage channel crash-looping due to missing imsg binary — disabled.

May 28
2026

Larry comes online

Did Named the agent Larry — modeled after Larry the Lobster from SpongeBob SquarePants. Rewrote IDENTITY.md and SOUL.md with full character context and voice rules. Seeded MEMORY.md from a multi-AI identity consolidation (Claude + ChatGPT + Copilot + Perplexity + Gemini + Notion). Ran openclaw doctor — corrected context windows for all 7 models, raised bootstrap limit to 20,000 characters. Fixed clawhub and mcporter binary paths. Deployed SearXNG in Docker on port 8888, configured as Larry's web search provider. Set tools.profile to full. Confirmed web_search returning results. Eligible skill count: 28.

Worked Larry introduced himself correctly on cold start from workspace files alone. Web search confirmed functional via SearXNG JSON API.

Broke SearXNG returned stale version data on first search query — search engine quality tuning pending (open localhost:8888/preferences to configure preferred engines).
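For reference, a query against the SearXNG JSON API can be sketched with the standard library alone. The port comes from this log. The sketch assumes the JSON output format is enabled in the SearXNG settings, and the helper names are illustrative rather than part of OpenClaw. Only the URL builder runs in the demo, so no server is needed to try it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARXNG_BASE = "http://localhost:8888"  # port taken from the build log


def search_url(query: str, base: str = SEARXNG_BASE) -> str:
    """Build a SearXNG search URL that requests JSON output."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base}/search?{params}"


def top_results(payload: dict, limit: int = 3) -> list[tuple[str, str]]:
    """Extract (title, url) pairs from a SearXNG JSON response."""
    results = payload.get("results", [])[:limit]
    return [(r.get("title", ""), r.get("url", "")) for r in results]


def search(query: str, limit: int = 3) -> list[tuple[str, str]]:
    """Query the local instance; needs a running SearXNG with JSON enabled."""
    with urlopen(search_url(query), timeout=10) as response:
        return top_results(json.load(response), limit)


print(search_url("ollama latest release"))
```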

May 29
2026

RAG stack + LM Studio consolidation

Did Pulled nomic-embed-text embedding model. Deployed Qdrant 1.18.1 in Docker on port 6333. Recreated Open WebUI container with VECTOR_DB=qdrant environment variable. Consolidated LM Studio model storage — 22 models across 3 directories normalized under models/. Configured energy saver — Mac stays awake permanently.

Worked Qdrant returning {"status":"ok"}. Open WebUI wired to Qdrant. All smoke tests passed.
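The smoke tests themselves are not reproduced in this log. A minimal stand-in, assuming that "up" simply means the service accepts a TCP connection on localhost, could probe the ports recorded in this document:

```python
import socket

# Ports as recorded in the build log and the architecture diagram.
SERVICES = {
    "Ollama": 11434,
    "LM Studio": 1234,
    "Open WebUI": 3000,
    "Qdrant": 6333,
    "SearXNG": 8888,
    "OpenClaw": 18789,
}


def port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def smoke_test(services: dict[str, int]) -> dict[str, bool]:
    """Probe each service port once and report which ones answered."""
    return {name: port_open(port) for name, port in services.items()}


for name, up in smoke_test(SERVICES).items():
    print(f"{name:<12} {'up' if up else 'DOWN'}")
```

An open port only shows that something is listening. It does not confirm that a model loads or that Open WebUI can reach Qdrant, so this belongs in front of deeper checks, not in place of them.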

May 30
2026

Benchmarks + architecture diagrams

Did Ran strict prompt benchmark against all 6 Ollama models. Zero outright failures — significant improvement over loose-prompt May 12 baseline. gemma3:12b and gemma3:27b achieved 5/5 clean sweep. Built three architecture diagrams (flowchart, mind map, architecture-beta) covering the full stack including the complete Council of AIs.

Worked All 6 models functional under strict prompting. Diagrams validated in Mermaid Chart Enterprise.

Autonomous Agent Tier

Meet Larry

Larry is an OpenClaw autonomous agent running on the Mac Studio M4 Max. He doesn't wait for a prompt. He runs in the background, starts at login, and handles the zero-cost tier of the Council of AIs workflow: background tasks, Apple Notes, Reminders, file operations, RAG queries, scheduled jobs, and first-pass summarization.

The baseline stack — Ollama, LM Studio, Open WebUI, mlx-lm — is infrastructure. Open WebUI is a chat interface. You open it, you type, you get a response. Larry is something different.

The original Larry the Lobster — SpongeBob's friend from Bikini Bottom — was the lifeguard at Goo Lagoon. He was the only thing standing between the residents and a watery grave. This Larry has a similar mandate: protect the workflow from token waste and busywork.

He introduced himself on first boot with: "Larry's got it."

	Open WebUI	Larry (OpenClaw)
What it is	Chat interface	Autonomous agent
How it works	You send a message	Works in background
Model	Any local Ollama model	gemma3:27b (131k context)
Skills	—	28 eligible (Notes, GitHub, Notion, web search, more)
Starts at	Manual	Login (LaunchAgent)
Best for	Interactive queries	Background tasks, automation

Links: openclaw.ai · artifact repo on GitHub

Local Model Stack

Model Inventory

Model	Format	Size	Runtime	Use
`phi4:14b`	GGUF	9.1 GB	Ollama	Fast reasoning, instruction following. Daily driver.
`gemma3:12b`	GGUF	8.1 GB	Ollama	General purpose, mid-tier
`gemma3:27b`	GGUF	17 GB	Ollama	Flagship general, heavy reasoning. Daily driver (quality).
`codestral:22b`	GGUF	12 GB	Ollama	Code generation, Continue.dev autocomplete
`mistral-small3.1:24b`	GGUF	15 GB	Ollama	General purpose, fast
`llama3.1:8b`	GGUF	4.9 GB	Ollama	Lightweight utility, bulk tasks
`gemma-4-E4B-it`	GGUF Q4_K_M	6.33 GB	LM Studio	Multimodal (image input), fast MoE
`gemma-4-E2B-it`	GGUF	~5 GB	LM Studio	Ultra-light, fastest responses
`gemma-4-26B-A4B-it`	GGUF	~16 GB	LM Studio	MoE, 4B active params, larger knowledge
`Phi-4-mini-instruct-4bit`	MLX 4-bit	2.18 GB	mlx-lm	Direct MLX inference — 139 tok/s on Apple Silicon

Note: llama3.3:70b (42 GB) was attempted and removed — exceeds the 36 GB ceiling with apps running.
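The 36 GB ceiling can be turned into a quick pre-pull check. The sizes below are the Ollama entries from the inventory table plus the removed llama3.3:70b. The 8 GB reserve for macOS, running apps, and KV cache is an assumed figure for illustration, not a measured one.

```python
UNIFIED_MEMORY_GB = 36.0

# Sizes from the model inventory; llama3.3:70b is the model that was removed.
MODELS_GB = {
    "phi4:14b": 9.1,
    "gemma3:12b": 8.1,
    "gemma3:27b": 17.0,
    "codestral:22b": 12.0,
    "mistral-small3.1:24b": 15.0,
    "llama3.1:8b": 4.9,
    "llama3.3:70b": 42.0,
}


def fits(model_gb: float, total_gb: float = UNIFIED_MEMORY_GB,
         reserve_gb: float = 8.0) -> bool:
    """True if the weights fit after reserving memory for the OS, apps, and KV cache.

    reserve_gb is an assumption; tune it to the actual workload.
    """
    return model_gb <= total_gb - reserve_gb


for name, size in MODELS_GB.items():
    print(f"{name:<22} {size:>5.1f} GB  {'ok' if fits(size) else 'too large'}")
```

Even with a reserve of zero, 42 GB cannot fit in 36 GB, which matches the freeze described in the build log.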

Governance

Model Selection Policy

🛡️

Western-lab models only. Every model in this workbench comes from a Western-lab open-weight release: Meta (Llama), Google (Gemma), Mistral AI (Mistral, Codestral), and Microsoft (Phi).

This is a deliberate policy, not a default. Chinese cloud AI services operate under PRC data law. Open-weight models from any lab are architecturally isolated once downloaded — weights are inert files with no network access — but maintaining a clean Western-only boundary is simpler to reason about and easier to defend.

Models excluded: Any cloud-connected Chinese AI service. DeepSeek, Qwen, and similar open-weight releases are technically capable but fall outside this policy's boundary.

Smoke Test Results · 2026-05-12

Benchmark Snapshot

Test	llama3.1:8b	phi4:14b
Exact instruction following	✅ Pass	✅ Pass
Voice transcript cleanup	⚠️ Functional (formatting issue)	✅ Pass
YAML generation	❌ Fail	❌ Fail
Architecture summary	⚠️ Functional (semantic flaw)	⚠️ Partial fail
Mermaid diagram generation	❌ Fail	⚠️ Partial fail

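Conclusion: Neither model is governance-grade for autonomous structured artifact generation under simple prompting. As of this 2026-05-12 snapshot, a strict prompt benchmark pass was recommended before declaring either model inadequate, and gemma3:12b, gemma3:27b, mistral-small3.1:24b, and codestral:22b had not yet been benchmarked. The strict pass across all 6 Ollama models was completed on 2026-05-30 with zero outright failures.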
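A strict-prompt test such as "exact instruction following" can be sketched as a small harness. This is not the benchmark used here: the prompt, the expected text, and the pass criterion are hypothetical. The request shape follows Ollama's /api/generate endpoint, and the demo uses a stand-in generator so it runs without a server.

```python
import json
from typing import Callable
from urllib.request import Request, urlopen

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> dict:
    """Payload for /api/generate with streaming off and temperature 0."""
    return {"model": model, "prompt": prompt, "stream": False,
            "options": {"temperature": 0}}


def exact_match(output: str, expected: str) -> bool:
    """Strict check: the reply must equal the expected text, ignoring outer whitespace."""
    return output.strip() == expected.strip()


def run_case(generate: Callable[[dict], str], model: str,
             prompt: str, expected: str) -> bool:
    """Run one prompt through `generate` and grade it strictly."""
    return exact_match(generate(build_request(model, prompt)), expected)


def ollama_generate(payload: dict) -> str:
    """Send the payload to a local Ollama server and return the response text."""
    request = Request(OLLAMA_GENERATE_URL, data=json.dumps(payload).encode(),
                      headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=120) as response:
        return json.load(response)["response"]


# Offline demo with a stand-in generator; swap in ollama_generate for a real run.
def fake_generate(payload: dict) -> str:
    return "READY\n"


print(run_case(fake_generate, "phi4:14b", "Reply with exactly: READY", "READY"))
```

Passing the generator in as a parameter keeps the grading logic testable offline and lets the same cases run against every model in the inventory.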

Architecture · v0.6

How It's Built — Three Views

Three diagrams showing the same workbench from three angles: traffic flow, capability map, and physical deployment. All rendered live from Mermaid source.

Token Routing — How Requests Flow

Every query enters through one of the four input surfaces. Local models handle zero-cost inference; cloud models are reserved for precision tasks where cost is justified.

%%{init: {'theme': 'dark'}}%%
flowchart TD
    A[["🖥️ Input surfaces\nOpen WebUI · Continue.dev · Claude Desktop · Larry"]] --> B{{"Token router"}}
    B -->|"Zero-cost tier"| C["🦙 Ollama\nphi4:14b · gemma3:27b · codestral:22b\n+ 3 more"]
    B -->|"Zero-cost tier"| D["🎛️ LM Studio\nGemma 4 E4B / E2B / 26B A4B\n+ MLX models"]
    B -->|"Zero-cost tier"| E["⚡ mlx-lm\nPhi-4-mini · 139 tok/s"]
    B -->|"Precision tier"| F["☁️ Cloud frontier\nClaude · ChatGPT · Copilot · Perplexity"]
    C --> G[["📦 Output\nNotebook · GitHub · Notion · File system"]]
    D --> G
    E --> G
    F --> G
    style A fill:#2a2320,stroke:#c46a2c,color:#f5f0eb
    style B fill:#3d2f20,stroke:#c46a2c,color:#f5f0eb
    style C fill:#1a2a1a,stroke:#4ade80,color:#f5f0eb
    style D fill:#1a2a1a,stroke:#4ade80,color:#f5f0eb
    style E fill:#1a2a1a,stroke:#4ade80,color:#f5f0eb
    style F fill:#1e3575,stroke:#7c93d4,color:#f5f0eb
    style G fill:#2a2320,stroke:#c46a2c,color:#f5f0eb

Capability Map — What the Workbench Can Do

The full capability surface: local inference, protocol orchestration, retrieval, autonomous operation, and the external integrations that complete the Council of AIs workflow.

%%{init: {'theme': 'dark', 'themeVariables': {'background': '#1f2020', 'mainBkg': '#1f2020'}}}%%
mindmap
  root((Mac Studio M4 Max))
    Local Inference
      Ollama 0.23.1
        phi4:14b
        gemma3:27b
        codestral:22b
      LM Studio 0.4.12
        Gemma 4 E4B
        Gemma 4 26B A4B
      mlx-lm 0.31.3
        139 tok/s
    Orchestration
      Claude Desktop
        11 MCP servers
      Open WebUI
        Docker localhost:3000
      Continue.dev
        Codestral autocomplete
    Autonomous Agent
      OpenClaw 2026.5.26
        Larry persona
        gemma3:27b primary
        28 skills eligible
        SearXNG web search
    Retrieval (next phase)
      Qdrant vector store
      nomic-embed-text
      Open WebUI Knowledge
    Cloud Tier
      Council of AIs
        Claude
        ChatGPT
        Copilot
        Perplexity

Physical Architecture — Where Everything Lives

Hardware topology: what runs on internal SSD, what runs on the external NVMe, and how all services bind together at runtime.

%%{init: {"theme": "base", "architecture": {"randomize": false}, "themeVariables": {"primaryColor": "#111827", "primaryTextColor": "#e5e7eb", "primaryBorderColor": "#c46a2c", "lineColor": "#c46a2c", "secondaryColor": "#181f26", "secondaryTextColor": "#e5e7eb", "secondaryBorderColor": "#c46a2c", "tertiaryColor": "#1c3a34", "tertiaryTextColor": "#e5e7eb", "tertiaryBorderColor": "#c46a2c", "textColor": "#e5e7eb", "background": "#111827", "mainBkg": "#111827", "nodeBorder": "#c46a2c", "clusterBkg": "#0d1117", "titleColor": "#c46a2c", "edgeLabelBackground": "#181f26", "fontSize": "20px", "fontFamily": "Trebuchet MS, Calibri, sans-serif"}}}%%
%% Theme: OverKill-Architecture
%% Theme ID: overkill-hill
%% Theme Version: 0.2.0
%% Created with: Mermaid Theme Builder by OverKill Hill P³
%% Tool URL: https://overkillhill.com/projects/mermaid-theme-builder/
%% Tool Version: 0.5.0
%% Theme Created: 2026-06-01T22:25:49.838Z
%% Theme Updated: 2026-06-01T22:25:49.838Z
%% Brand source: https://overkillhill.com/
architecture-beta
    group mac_studio["🖥️ Mac Studio M4 Max — 36GB · macOS Sequoia"]
    group okhlocal["💾 OKH-Local 1TB NVMe"] in mac_studio
    group docker_runtime["🐳 Docker Runtime"] in mac_studio
    group ollama_runtime["⚡ Ollama Runtime"] in mac_studio
    group mcp_layer["🔌 MCP Layer — 11 Servers"] in mac_studio
    group cloud_paid["💳 Paid Cloud Tier"]
    group cloud_exec["⚙️ Execution Tier"]
    group cloud_free["🆓 Free Access Tier"]
    service soc(server)["M4 Max SoC 36GB Unified Memory MLX Accelerated"] in mac_studio
    service nvm(disk)["WD Black SN7100 1TB NVMe"] in okhlocal
    service model_store(database)["Model Weights ~272GB · 29 Models"] in okhlocal
    service qdrant_store(database)["Qdrant Storage Vector Index"] in okhlocal
    service ollama(server)["Ollama 0.23.1 Port 11434 · 6 Models"] in ollama_runtime
    service lmstudio(server)["LM Studio 0.4.12 Port 1234 · 22 Models"] in mac_studio
    service mlx(server)["mlx-lm 0.31.3 139 tok/s"] in mac_studio
    service openwebui(internet)["Open WebUI Port 3000"] in docker_runtime
    service qdrant(database)["Qdrant 1.18.1 Port 6333"] in docker_runtime
    service searxng(internet)["SearXNG Port 8888"] in docker_runtime
    service larry(server)["Larry / OpenClaw Port 18789 · 28 Skills"] in mac_studio
    service claude_desktop(internet)["Claude Desktop 11 MCP Servers"] in mcp_layer
    service vscode(internet)["VSCode 1.119 Continue.dev"] in mac_studio
    service claude_pro(internet)["Claude Pro Memory + MCP Hub"] in cloud_paid
    service chatgpt(internet)["ChatGPT Plus GitHub + Ideation"] in cloud_paid
    service perplexity(internet)["Perplexity Pro Citation Research"] in cloud_paid
    service copilot_pro(internet)["Copilot Pro M365 Ecosystem"] in cloud_paid
    service replit(server)["Replit Core Build Engine"] in cloud_exec
    service gh_copilot(internet)["GitHub Copilot Code Review"] in cloud_exec
    service notion(internet)["Notion Business Working Canon"] in cloud_exec
    service gemini(internet)["Gemini Google"] in cloud_free
    service grok(internet)["Grok Real-time"] in cloud_free
    service mistral_free(internet)["Mistral EU Open Weight"] in cloud_free
    soc:R -- L:nvm
    ollama:B -- T:model_store
    lmstudio:B -- T:model_store
    mlx:B -- T:model_store
    qdrant:B -- T:qdrant_store
    openwebui:R -- L:ollama
    openwebui:B -- T:qdrant
    larry:R -- L:ollama
    larry:B -- T:searxng
    vscode:R -- L:ollama
    claude_desktop:B -- T:larry
    claude_pro:R -- L:claude_desktop
    chatgpt:R -- L:replit
    notion:R -- L:replit

Strategic Context

The Operating Model

⚙️

The Mac Studio M4 Max is not a replacement for frontier AI. It is the zero-cost inference tier at the bottom of a deliberate token routing hierarchy.

The Council of AIs workflow pairs Claude and ChatGPT as blind-iterating collaborators in Notion, uses Replit for execution against locked specs, and GitHub as the versioned source of truth. Every part of that workflow has a cost. Local models handle the parts where cost should be zero: bulk processing, first-pass summarization, RAG queries against personal knowledge, and repetitive validation checks.

Cheap tokens think broadly. Expensive tokens act precisely.

This workbench is what makes those economics work on a total subscription budget of $40/month.
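The routing rule described above can be sketched as a lookup: work the text assigns to local models goes to the zero-cost tier, and everything else goes to the cloud. The task labels are hypothetical identifiers invented for this example, and defaulting unknown work to the cloud tier is an assumption, not a documented policy.

```python
from enum import Enum


class Tier(Enum):
    LOCAL = "zero-cost local tier"
    CLOUD = "precision cloud tier"


# Task kinds the text assigns to local models; the string keys are hypothetical labels.
LOCAL_TASKS = frozenset({
    "bulk_processing",
    "first_pass_summarization",
    "rag_query",
    "validation_check",
})


def route(task_kind: str) -> Tier:
    """Send known low-stakes work to local models and everything else to the cloud."""
    return Tier.LOCAL if task_kind in LOCAL_TASKS else Tier.CLOUD


for kind in ("rag_query", "final_spec_review"):
    print(f"{kind} -> {route(kind).value}")
```

Keeping the local allowlist explicit means a new kind of task costs money until someone decides it is safe to run for free, which is the conservative direction for a precision tier.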

What's Next

Next Phase

Execute ChatGPT content update — populate all GitHub repository stub docs

Backup baseline archive to OneDrive and Time Machine

✓ Run strict prompt benchmark (all 6 Ollama models, same 5 tests with tighter prompts) · completed 2026-05-30

✓ Benchmark remaining 4 models: gemma3:12b, gemma3:27b, mistral-small3.1:24b, codestral:22b · completed 2026-05-30

✓ Deploy RAG stack: nomic-embed-text + Qdrant + Open WebUI Knowledge corpus · stack deployed 2026-05-29

🧠

RAG Roadmap — The baseline is complete. The next phase is the retrieval layer.

nomic-embed-text + Qdrant + Open WebUI Knowledge will turn this workbench into a local AI that knows the author's actual work — not just its training data. Personal writing, GitHub repos, conversation archives, and workspace exports will be chunked, embedded, and indexed locally. The model will retrieve before it generates.

This is how a generic local AI becomes a genuine second brain.
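The chunk, embed, index, retrieve loop can be illustrated end to end in a few lines. This is a toy: the bag-of-words embed function stands in for nomic-embed-text, the in-memory list stands in for Qdrant, and the corpus sentences are made up from facts in this page. It only shows the shape of "retrieve before generate".

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


corpus = [
    "Ollama model storage lives on the external NVMe volume.",
    "Larry uses SearXNG as his web search provider.",
    "Qdrant stores the vector index for Open WebUI.",
]
print(retrieve("where is the vector index stored", corpus, k=1))
```

In the deployed stack the retrieved chunks would be prepended to the prompt before the local model generates, so the answer is grounded in the indexed material instead of training data alone.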

Build Your Own

Everything you need to replicate this stack.

🐙 GitHub Repository: Docs, scripts, manifests, and BYOLAI guide
🖥️ Apple Mac Studio: Compare models, the hardware foundation
🦙 Ollama: Local model inference server; start here
💬 Open WebUI: Browser-based chat interface over Ollama
🦞 OpenClaw: Autonomous local AI agent runtime
🎛️ LM Studio: Desktop model workspace and GUI

Quick Facts

Machine: Mac Studio M4 Max (2025)

Memory: 36GB unified memory

Storage: 512GB internal SSD, plus 1TB WD Black SN7100 NVMe (via Satechi Stand Hub, mounted as /Volumes/OKH-Local)

macOS: Sequoia (latest · 2026-05-04)

Build window: 2026-05-03 → 2026-05-12 (10 days)

Status: ✅ Baseline complete

Primary goals: Local LLMs · MCP orchestration · RAG corpus · Council of AIs workflow

Models: 10 local models

MCP servers: 11 active in Claude Desktop

Max throughput: 139 tok/s

North star: Production-grade personal AI workstation (functional, governed, documented, restorable)

Environment Blueprint

Core Toolchain

Homebrew 5.1.8

Git (Homebrew) 2.54.0

Python 3.14.4

Node.js / npm 26.0.0 / 11.12.1

GitHub CLI 2.92.0

Docker Desktop 4.72.0

mlx-lm 0.31.3

AI Runtimes

Ollama 0.23.1 · MLX + Flash Attention + q8_0 KV cache

LM Studio 0.4.12 · MLX v1.6.0 backend · server port 1234

Open WebUI Docker · localhost:3000 · updated 2026-05-13

mlx-lm 0.31.3 · 139 tok/s on Phi-4 mini

Storage Architecture

Ollama models /Volumes/OKH-Local/07_Local_LLMs/ollama/models

LM Studio models /Volumes/OKH-Local/07_Local_LLMs/lm-studio/models

HuggingFace cache /Volumes/OKH-Local/07_Local_LLMs/huggingface-cache

Architecture

Key Decisions

Default runtime: Ollama for API/headless, LM Studio for GUI/MLX testing, mlx-lm for max performance

Daily driver models: phi4:14b (speed) · gemma3:27b (quality) · codestral:22b (code)

Model policy: Western-lab only (Meta, Google, Mistral, Microsoft). No Chinese cloud services.

Token governance: Local models → Notion AI → Claude/ChatGPT base → Perplexity/Copilot → Replit (spec must be complete first)

Git workflow: Primary commits from Replit, Claude Code, Codex, Mac Studio. Secondary machine is read-mostly via OneDrive sync.

Directory conventions: All model weights on OKH-Local external NVMe. Repos in OneDrive OKH folder. Workspace file at OKH root.