Daeseon Yoo
Tech retro · 3 min read · Review needed

PGlite Embedded DB: The Full Story of Concurrency, Corruption, and Build Conflicts, and How They Were Solved

Three traps I hit with PGlite (PostgreSQL compiled to WASM), which I chose so the project would need no external database: concurrent queries, disk corruption, and multi-worker build conflicts. The fixes converged on lazy initialization plus a process guard.

AI version

PGlite was a perfect fit for the goal of "keep PostgreSQL while running zero external servers," but because it is a single-connection WASM build, it set three traps.

Trap 1 — Concurrent queries → Aborted()

This error appeared intermittently while verifying the dev server:

Unhandled Rejection: RuntimeError: Aborted(). Build with -sASSERTIONS for more info.

⚠️ The root cause of this early, intermittent error was never pinned down. The suspect (unverified) was the request handler and a cron job querying the same PGlite instance concurrently. Trusting that hypothesis, I decided to serialize query/exec behind a FIFO mutex and monkey-patched the PGlite methods to do it. That caused a bigger incident.

Trap 2 — The monkey-patch broke init, and the disk was corrupted

Right after the monkey-patch, boot itself died:

An error occurred while loading instrumentation hook: Aborted().
    at async Module.register (instrumentation.ts:22:3)
  • Observed fact: when the monkey-patch was added, boot died, and when it was removed, boot came back.
  • Hypothesis (unverified): if PGlite's exec internally calls query, wrapping both in the same single-tail promise chain would make the inner query wait on the outer exec, which is itself waiting on that query, causing a self-deadlock/abort on the first DDL statement. I did not read the PGlite source to confirm this.
  • Corruption (confirmed by experiment): even after reverting the monkey-patch, boot still aborted, and it only booted normally after rm -rf .data/pglite, which confirms the on-disk data was corrupted. The suspected trigger is the aborted monkey-patched run (not confirmed).
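The self-deadlock hypothesis in the second bullet can be illustrated without PGlite at all. The sketch below uses a hypothetical NestedCallDb whose exec() calls query() internally (an assumption, not verified PGlite behavior) and wraps both methods in one FIFO chain with a single shared tail, the way the monkey-patch did. The inner query() queues behind the outer exec(), which is waiting for that same query(), so exec() never settles.

```typescript
// Hypothetical client whose exec() calls query() internally.
// This is NOT PGlite; whether PGlite's exec does this was never verified.
class NestedCallDb {
  async query(sql: string): Promise<string> {
    return `rows for ${sql}`;
  }
  async exec(sql: string): Promise<string> {
    return this.query(sql); // nested call goes through the (possibly patched) query
  }
}

// The monkey-patch idea: one FIFO chain with a single shared tail for both methods.
function wrapWithSingleTail(db: NestedCallDb): void {
  let tail: Promise<unknown> = Promise.resolve();
  function serialize<T>(fn: () => Promise<T>): Promise<T> {
    const run = tail.then(fn); // wait for every earlier call to finish
    tail = run.catch(() => undefined); // keep the chain usable after a failure
    return run;
  }
  const origQuery = db.query.bind(db);
  const origExec = db.exec.bind(db);
  db.query = (sql) => serialize(() => origQuery(sql));
  db.exec = (sql) => serialize(() => origExec(sql));
}

// exec() queues first; its inner query() then queues behind exec(), which waits on query().
async function demo(): Promise<string> {
  const db = new NestedCallDb();
  wrapWithSingleTail(db);
  const timeout = new Promise<string>((resolve) =>
    setTimeout(() => resolve("timed out: exec never settled"), 100),
  );
  return Promise.race([db.exec("CREATE TABLE t (id int)"), timeout]);
}

demo().then((result) => console.log(result)); // prints "timed out: exec never settled"
```

In this sketch the failure is a silent hang, not an exception. The real incident surfaced as Aborted(), so this only shows how a single-tail wrapper around mutually calling methods can wedge itself, not what PGlite actually did.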

Fix:

  1. Fully revert the monkey-patch (do not touch PGlite internals).
  2. Delete the corrupted DB: rm -rf .data/pglite.
  3. Re-verify: without the monkey-patch, with the corrupted data removed, and with lazy-init and the guard in place, 30 concurrent writes produced 0 Aborted() errors (confirmed from test output).
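The post does not show the test behind step 3. A harness of roughly this shape would produce that kind of result: fire N writes at once with Promise.allSettled and count the rejections whose message contains Aborted(). The Queryable interface, the probe table, and InMemoryDb are hypothetical; against PGlite you would pass the real instance, whose query(sql, params) method should fit this shape.

```typescript
// Assumed minimal client shape; PGlite's query(sql, params) should fit it.
interface Queryable {
  query(sql: string, params?: unknown[]): Promise<unknown>;
}

interface WriteCheckResult {
  ok: number;
  aborted: number;
  otherErrors: number;
}

// Fire n INSERTs at once (hypothetical "probe" table) and classify each outcome.
async function concurrentWriteCheck(db: Queryable, n = 30): Promise<WriteCheckResult> {
  const outcomes = await Promise.allSettled(
    Array.from({ length: n }, (_, i) => db.query("INSERT INTO probe (n) VALUES ($1)", [i])),
  );
  const result: WriteCheckResult = { ok: 0, aborted: 0, otherErrors: 0 };
  for (const o of outcomes) {
    if (o.status === "fulfilled") result.ok++;
    else if (String(o.reason).includes("Aborted()")) result.aborted++;
    else result.otherErrors++;
  }
  return result;
}

// Hypothetical in-memory stand-in so the sketch runs without PGlite installed.
class InMemoryDb implements Queryable {
  rows: unknown[] = [];
  async query(_sql: string, params: unknown[] = []): Promise<unknown> {
    // Random delay so the concurrent calls interleave.
    await new Promise<void>((resolve) => setTimeout(() => resolve(), Math.random() * 5));
    this.rows.push(params[0]);
    return { affectedRows: 1 };
  }
}

concurrentWriteCheck(new InMemoryDb()).then((r) => console.log(r));
// prints { ok: 30, aborted: 0, otherErrors: 0 }
```

Using allSettled rather than Promise.all matters here: a single abort would otherwise reject the whole batch and hide how many of the other writes succeeded.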

⚠️ I am not claiming that "PGlite queues internally, so serialization is unnecessary"; I did not read PGlite's internals. The only observation is the result of step 3 above, and the root cause of the early intermittent error remains undetermined.

Trap 3 — next build multi-worker conflict

Generating static pages using 11 workers ...
Error: PGlite failed to initialize properly
RuntimeError: Aborted().

Verified cause: lib/db/index.ts created the PGlite instance at module import time, so the build's 11 parallel workers opened the same on-disk database concurrently.

Fix: lazy-init. getBundle() creates the instance on the first getDb() call rather than at import time. After this change, the rebuild ran clean.

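```typescript
// globalThis cache (declaration assumed; DbBundle and createBundle are defined elsewhere in lib/db/index.ts)
const globalForDb = globalThis as unknown as { __bk_db?: DbBundle };

// Creates the bundle on the first call instead of at module import time.
function getBundle(): DbBundle {
  if (!globalForDb.__bk_db) globalForDb.__bk_db = createBundle();
  return globalForDb.__bk_db;
}
```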
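To make the import-time versus first-call difference concrete, here is a standalone sketch of the getBundle() pattern with hypothetical stand-ins for DbBundle and createBundle (the real ones are not shown) plus a construction counter. Loading the module opens nothing; the bundle is created once on the first getDb() call and reused afterwards. Keeping it on globalThis is what would let a re-evaluated module (for example after a dev hot reload) find the existing instance instead of opening a second one.

```typescript
// Hypothetical stand-ins: the real DbBundle and createBundle in lib/db/index.ts are not shown.
interface DbBundle {
  id: number;
}
let constructed = 0;
function createBundle(): DbBundle {
  constructed += 1; // the real version would open the PGlite data directory here
  return { id: constructed };
}

// Lazy pattern: cached on globalThis, created on first use.
const globalForDb = globalThis as unknown as { __bk_db?: DbBundle };

function getBundle(): DbBundle {
  if (!globalForDb.__bk_db) globalForDb.__bk_db = createBundle();
  return globalForDb.__bk_db;
}

// Hypothetical accessor; the post only says the instance appears on the first getDb() call.
function getDb(): DbBundle {
  return getBundle();
}

console.log(constructed); // 0: loading the module opened nothing
getDb();
getDb();
console.log(constructed); // 1: created once, then reused
```

With this shape, only code paths that actually call getDb() open the database; a build worker that merely imports the module never touches .data/pglite.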

Unattended-run stability — process guard

To keep a stray async rejection from killing the process during unattended overnight runs, lib/guards.ts registers an unhandledRejection handler. instrumentation.ts imports it dynamically, only in the Node.js runtime, so it stays out of Edge compilation. This separation also silenced the Edge Runtime warning about process.on.
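A minimal sketch of the guard and its Node-only registration. lib/guards.ts is not shown in the post, so installProcessGuards, the globalThis flags, and the log format are hypothetical. register() follows the Next.js instrumentation convention of gating on process.env.NEXT_RUNTIME === "nodejs"; in a real instrumentation.ts the guard module would be loaded with await import("./lib/guards") inside that branch, and it is called directly here only so the sketch runs standalone.

```typescript
// Hypothetical lib/guards.ts contents; the post does not show the real file.
const g = globalThis as unknown as { __bk_guards?: boolean; __bk_guard_hits?: number };

function installProcessGuards(): void {
  if (g.__bk_guards) return; // a re-evaluated module must not stack a second handler
  g.__bk_guards = true;
  g.__bk_guard_hits = 0;
  process.on("unhandledRejection", (reason) => {
    g.__bk_guard_hits = (g.__bk_guard_hits ?? 0) + 1;
    // Log instead of letting Node's default behavior terminate the process.
    console.error("[guard] unhandledRejection:", reason);
  });
}

// instrumentation.ts shape: register() is called in every runtime, so gate on NEXT_RUNTIME.
// The real file would `await import("./lib/guards")` inside this branch so process.on stays
// out of Edge compilation; it is called directly here so the sketch runs standalone.
export async function register(): Promise<void> {
  if (process.env.NEXT_RUNTIME === "nodejs") {
    installProcessGuards();
  }
}

// Demo: simulate the Node.js runtime, then leave a rejection unhandled.
process.env.NEXT_RUNTIME = "nodejs";
register().then(() => {
  void Promise.reject(new Error("stray rejection")); // nobody awaits or catches this
  setTimeout(() => console.log("still alive, guard hits:", g.__bk_guard_hits), 50);
});
```

Logging and continuing is a deliberate trade-off for unattended runs: the process survives, but a swallowed rejection can hide a real bug, so the log line becomes the only signal that something went wrong.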

Commits

  • lazy-init + process guard + Edge separation: e1a924e
  • monkey-patch/turbopack.root experiments: reverted before committing, so no hash

Lessons

  • Do not monkey-patch a WASM library's internal methods.
  • Do not open an embedded DB at module top-level — lazy-init.
  • An aborted run can corrupt an embedded DB's on-disk data (the suspected trigger here); in dev, rm -rf .data/pglite is the recovery.
  • None of these three traps exists with a real PostgreSQL server in production (multiple connections, separate server process). PGlite is a dev-only convenience.

Review needed

No human review on this entry yet.