Daeseon Yoo

문제

사용자가 dogfooding 도중 발견:

"어떻게 빨라진거지? 이거... 이런 속도 개선점은 순서대로 반드시 정리해야한다 이런 엔지니어링 인사이트 도출을 원함 내가 찾은게 아니어도 메커니즘은 배워서 앞으로도 써먹어야지"

→ 임시 fix가 아니라 재사용 가능 knowledge 박는 게 가치. 다음 vision LLM 프로젝트 / 다른 사람 프로젝트에서도 first principle 적용 가능해야.

3 옵션:

A. 인라인 comments — 코드 안에 왜 빠른지 박음

B. 프로젝트 README append — README에 latency section 추가

C. ⭐ 별도 playbook 파일 — docs/playbooks/latency-optimization.md

선택 C.

docs/playbooks/latency-optimization.md — 612 lines, 25 trick, 6 dimension:

Trick 8: maxOutputTokens 적당히 (over-allocation = attention budget 낭비)
Trick 9: responseSchema 강제 (free-form 응답 시 retry cost)
Trick 10: Image FIRST + text AFTER (Gemini quirk, sweep cross-cutting Layer 12)
Trick 11: systemInstruction separation (context cache 가능)
Trick 12: exp backoff retry only retryable codes (429/5xx, 4xx fail-fast)

박힌 메타 원칙 3개:

"Measure-first, optimize-second" — log show 같은 persistent log 먼저 박음. optimization은 measured bottleneck에만.
"Vendor pays for what you send" — Gemini formula tokens = w × h / 750. payload reduction은 돈 + 속도 둘 다.
"Perceived latency ≥ actual latency" — Loading message + skeleton + progressive update이 실 latency 안 줄여도 사용자 만족 압도. 50% UX gain.

박는 비용: ~3-4h playbook 작성 + 5 trick 실측 검증
박은 가치: transferable — 다음 vision LLM 프로젝트 / 다른 사람 프로젝트에 복붙 가능. 6개월+ 가치.
Memory entry: engineering-playbooks-index.md — playbook reference + 다음 playbook 후보 list

memory engineering-playbooks-index 박혀있음:

swift-6-strict-concurrency-traps (AXValue, kAX const, implicit self in Logger 등)
llm-prompt-engineering (✗/✓ pair pattern + constraint encoding)
security-best-practices (regex masking, app exclusion, audit log)
macos-permissions-flow (Screen Recording + Accessibility eager startup)
coordinate-system-4-layer (physical / sent / logical / screen-local 변환)

Playbook over comments: transferable knowledge는 별도 파일 박음. 인라인 comment는 코드 맥락에 묶임 → transfer 못 함.
Priority order matter: 25 trick 다 적용 X. top-5 quick wins 먼저, mid 다음, advanced는 측정된 bottleneck에만.
Dimension grouping: trick 통째 list 대신 dimension으로 grouping → 다른 프로젝트에서 해당 dimension만 복사 가능.

04430d72 (2026-05-30 14:42 -0400) — 612 lines added to docs/playbooks/latency-optimization.md + PROJECT_TIMELINE.md 40 lines.