Active · May 26, 2026
ScreenBridge
A macOS desktop tool that translates abstract AI instructions ('open VS Code, edit X') into concrete pointer actions on your actual screen — via ScreenCaptureKit + Accessibility APIs + a vision LLM.
- Role: Solo
- Stack: Swift 6.3 · SwiftUI + AppKit · ScreenCaptureKit · AXUIElement · Vision OCR · Anthropic Claude Sonnet 4.6 (vision) · Gemini 2.5 Flash
Project log
Chronological record of troubleshooting, retros, and updates while building this.
Decisions & milestones
Architecture overview — ScreenBridge (jarvis-pc) as of 2026-05-31
Snapshot · May 31, 2026 · 2 min
A native macOS app in Swift 6.3 that captures the monitor the cursor is on, fuses Vision OCR with AXUIElement data, dispatches to a vision LLM (Gemini or Claude, with a local Qwen2.5-VL path under construction), and overlays a red box plus a bubble on that same monitor.
Multi-vendor LLM failover: Gemini 429 auto-swap to Claude
Decision · May 31, 2026 · 4 min
Wrap the two LLM dispatchers so that an HTTP 429 from the primary (Gemini) automatically swaps to the fallback (Claude) inside the same call, keeping the caller unaware of which vendor answered.
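A minimal sketch of that wrapper shape, for illustration only. The names `LLMDispatcher`, `LLMError`, `FailoverDispatcher`, and the two stub types are hypothetical and are not taken from the project; the real dispatchers and error model are not shown in this log.

```swift
import Foundation

// Hypothetical error model: only the rate-limit case (HTTP 429) triggers failover.
enum LLMError: Error {
    case rateLimited
    case other(String)
}

// Hypothetical vendor-neutral interface that callers depend on.
protocol LLMDispatcher: Sendable {
    func analyze(_ prompt: String) async throws -> String
}

// Wraps a primary and a fallback dispatcher behind the same interface.
struct FailoverDispatcher: LLMDispatcher {
    let primary: any LLMDispatcher
    let fallback: any LLMDispatcher

    func analyze(_ prompt: String) async throws -> String {
        do {
            return try await primary.analyze(prompt)
        } catch LLMError.rateLimited {
            // Swap vendors inside the same call; any other error propagates unchanged.
            return try await fallback.analyze(prompt)
        }
    }
}

// Stubs standing in for real vendor clients.
struct AlwaysRateLimited: LLMDispatcher {
    func analyze(_ prompt: String) async throws -> String {
        throw LLMError.rateLimited
    }
}

struct Echo: LLMDispatcher {
    let name: String
    func analyze(_ prompt: String) async throws -> String {
        "\(name): \(prompt)"
    }
}
```

Because `FailoverDispatcher` itself conforms to `LLMDispatcher`, a caller holding `any LLMDispatcher` cannot tell whether it is talking to one vendor or a wrapped pair.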
Build log
Week of May 25, 2026 · 2 entries · 2 Tech retro
Phase 7.3: completion pill + per-session JSON audit log
Tech retro · May 31, 2026 · 2 min
Closed out Phase 7.1 deferred work: a green checkmark completion pill on the HUD and a per-session JSON audit log written to Application Support, completing the 5-layer security architecture.
v0.2 Layer 1 — SecretMasker: 10-pattern regex redact before LLM + audit log
Tech retro · May 31, 2026 · 3 min
First layer of v0.2's 5-layer security architecture: a 286-LOC SecretMasker that redacts API keys, credit cards, and Korean RRNs at the AnalyzeCoordinator boundary — before the instruction reaches the external LLM or hits the audit log on disk.
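As a rough illustration of regex redaction at a boundary like this, the sketch below masks three kinds of secret before text is passed on. The type names and the patterns are assumptions made for the example (including the `sk-` key prefix); the actual SecretMasker uses ten patterns that are not listed here.

```swift
import Foundation

// One redaction rule: a label for the placeholder plus a compiled pattern.
struct MaskRule {
    let label: String
    let regex: NSRegularExpression
}

// Hypothetical stand-in for the project's SecretMasker.
struct SecretMaskerSketch {
    let rules: [MaskRule]

    init() {
        // Illustrative patterns only, not the project's real rule set.
        let patterns: [(String, String)] = [
            ("API_KEY", #"sk-[A-Za-z0-9_-]{20,}"#),      // assumed key format
            ("RRN", #"\b\d{6}-\d{7}\b"#),                // Korean RRN: 6 digits, hyphen, 7 digits
            ("CARD", #"\b(?:\d{4}[ -]?){3}\d{4}\b"#),    // 16 digits in groups of four
        ]
        rules = patterns.map { label, pattern in
            // The patterns are compile-time constants, so a failure here is a programmer error.
            MaskRule(label: label, regex: try! NSRegularExpression(pattern: pattern))
        }
    }

    // Applies every rule in order and returns the redacted text.
    func mask(_ text: String) -> String {
        var result = text
        for rule in rules {
            let range = NSRange(result.startIndex..., in: result)
            result = rule.regex.stringByReplacingMatches(
                in: result,
                options: [],
                range: range,
                withTemplate: "[REDACTED:\(rule.label)]"
            )
        }
        return result
    }
}
```

Masking once at the boundary means the same redacted string can feed both the outgoing LLM request and the on-disk audit log, so neither path sees the raw secret.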