유대선

Why codex was 'command not found' in my own terminal app — but claude wasn't

Prepping a demo, I tried to run codex inside a DalkkakAI pane and got 'command not found' — even though it was installed. The culprit: a leaked npm_config_prefix (from launching the app via pnpm) made nvm refuse to load, so every nvm-global CLI vanished. claude worked only because I'd special-cased it. The fix was to stop polluting the pane's environment, not to special-case codex too.

The moment

I was inside my own app, in a pane, lining up tools for a demo. claude ran. codex didn't:

zsh: command not found: codex

Which is absurd — codex (@openai/codex@0.135.0) is installed, and it runs fine in iTerm. So why does my product, a terminal multiplexer whose whole pitch is "run your own CLIs," fail to find a CLI that's right there?

I scrolled up, and there it was at the top of the shell output:

nvm is not compatible with the "npm_config_prefix" environment variable:
currently set to ".../apps/desktop"

The chain

codex lives in nvm's node bin (~/.nvm/versions/node/<v>/bin). That directory only ends up on PATH when nvm loads. And nvm refuses to load if npm_config_prefix is set — it prints that warning and returns early.

Where did npm_config_prefix come from? From me launching the app with pnpm tauri dev. The package manager exports npm_config_* into the process it spawns. portable-pty's CommandBuilder seeds each pane's environment from the current process env (get_base_env()), so that stray npm_config_prefix — pointing at apps/desktop — rode straight into every pane. nvm saw it, bailed, and never put its node bin on PATH. Every nvm-global CLI quietly disappeared.
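The inheritance step is easy to reproduce outside the app. A minimal sketch, assuming a POSIX shell: the /tmp/example-app value is a hypothetical stand-in for the real apps/desktop path, and plain sh processes stand in for the app and the pane's shell.

```shell
# Stand-in for the package manager: it exports npm_config_prefix before
# spawning the app. The path is a hypothetical placeholder.
export npm_config_prefix="/tmp/example-app"

# Small probe that prints the variable, or UNSET if it is missing.
probe='printf %s "${npm_config_prefix:-UNSET}"'

# A child process (stand-in for the app) inherits the variable untouched.
child_sees=$(sh -c "$probe")

# So does a grandchild (stand-in for the pane's shell): nothing along the
# way drops the variable unless some process removes it explicitly.
grandchild_sees=$(sh -c 'sh -c "$1"' _ "$probe")

echo "child:      $child_sees"
echo "grandchild: $grandchild_sees"
```

Both lines report the exported value, which is the same path the variable took from the launcher into the pane.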

claude survived for a dumb reason: it lives in ~/.local/bin, which my .zshrc adds before the nvm block — and I'd also wired a Claude-specific wrapper dir onto the pane's PATH (so panes can self-summarize). So the one tool I obsess over was triple-covered, and its working state hid the fact that the environment was broken for everything else.

I verified the mechanism instead of trusting the story: set npm_config_prefix and source nvm → the exact warning, verbatim, with the apps/desktop path. Unset it → nvm loads, codex resolves.

The fix — and the two fixes that were wrong first

The most tempting wrong fix: special-case codex too — hardcode its path like I'd done for claude. That's the trap that started all this. It would "work" and leave the actual bug — a poisoned pane environment — waiting for the next tool.

So I went one layer down: strip every inherited npm_config_* from the spawned shell's env (cmd.env_remove) in the Rust PTY code. Obvious. Correct-looking. And it didn't work: the user came back with "still broken."
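The strip itself is simple. In shell terms it amounts to roughly the following; this is a sketch of the idea rather than the app's Rust code. list_npm_config_vars and run_scrubbed are hypothetical helper names, and the exported values are placeholders.

```shell
# Print the name of every npm_config_* variable in the current environment.
# awk's ENVIRON is used instead of parsing `env` output so that multi-line
# values cannot be mistaken for variable names. Names that are not valid
# shell identifiers are skipped.
list_npm_config_vars() {
  awk 'BEGIN { for (k in ENVIRON) if (k ~ /^npm_config_[A-Za-z0-9_]*$/) print k }'
}

# Run a command with those variables removed. The subshell keeps the
# caller's own environment untouched.
run_scrubbed() {
  (
    for v in $(list_npm_config_vars); do
      unset "$v"
    done
    exec "$@"
  )
}

# Placeholder values standing in for what the package manager exported.
export npm_config_prefix="/tmp/example-app"
export npm_config_cache="/tmp/example-cache"

# Count the npm_config_* variables the scrubbed child can still see.
remaining=$(run_scrubbed awk 'BEGIN { n = 0; for (k in ENVIRON) if (k ~ /^npm_config_/) n++; print n }')
echo "npm_config_* vars seen by the child: $remaining"
```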

It failed because I'd patched the wrong process. The pane's shell isn't the bash I was configuring; it's spawned by a long-lived tmux -L dalkkak server that captured its environment once, when it first started, and keeps re-spawning panes from that snapshot. Editing the spawn code never touches a server that's already running. I'd hit this exact trap a day earlier with tmux mouse-mode (reverting the code didn't reset the live server) and walked straight back into it.

The ground truth was one command away:

tmux -L dalkkak show-environment -g | grep npm_config
# npm_config_prefix=/Users/.../apps/desktop   ← there it is, on the server

So the real fix scrubs the tmux server's global environment, not the spawned shell's:

tmux -L dalkkak show-environment -g | grep '^npm_config_' | cut -d= -f1 \
  | while IFS= read -r v; do tmux -L dalkkak set-environment -gu "$v"; done

Panes spawned after that load nvm like a real terminal would, and the user's full PATH — codex and everything else — comes back. (I kept the Rust env_remove too: it cleans a fresh server, and I ran the scrub against the already-running server so existing installs heal without a kill-server.) Generic by design: it fixes the class, not the instance.
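Before running the scrub against a live server, it can help to see exactly which names it would remove. A sketch under stated assumptions: npm_config_names is a hypothetical helper that applies the same grep/cut filter to show-environment -g style text, and the sample below is made-up input in NAME=value form, not real server output.

```shell
# Same filter as the scrub pipeline: read `show-environment -g` style text
# on stdin and print only the variable names that would be unset.
npm_config_names() {
  grep '^npm_config_' | cut -d= -f1
}

# Made-up sample in NAME=value form; real input would come from:
#   tmux -L dalkkak show-environment -g | npm_config_names
sample='HOME=/Users/example
npm_config_prefix=/tmp/example-app
PATH=/usr/bin:/bin
npm_config_user_agent=pnpm'

to_unset=$(printf '%s\n' "$sample" | npm_config_names)

# One line per name; the unquoted expansion is deliberate so printf
# repeats its format for each name.
printf 'would unset: %s\n' $to_unset
```

Running the same filter against the live server after the scrub should print nothing, which is a quick way to confirm the server's global environment is clean.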

The lesson

My subprocess-env rule (RULE #5b) already said "augment PATH, forward the common vars." But it was all addition. The bug was subtraction: a hostile inherited variable that broke the user's own shell setup. Env hygiene is both — add what's missing, and strip what poisons.

The middle one, which cost an extra round-trip: state lives where the long-lived process holds it, not where the code spawns it. A persistent tmux -L server is a daemon with its own frozen environment; patching the spawn path is invisible to it. When a fix "obviously should work" but doesn't, stop coding and interrogate the running thing (show-environment -g) — the answer is usually sitting in the live state, not the source.

And the sharpest one: don't special-case the tool you care about. Optimizing claude made claude work and made me blind. codex was just the canary. Fix the environment, and every tool works — which is the only version of "BYO any CLI" that's actually true.
