Tagged
industry-retro
9 posts.
No staging, so every deploy was an incident
A high-throughput factory-floor pipeline had no staging environment that matched production, so every deploy, validated only on local and dev, hit production effectively untested. The differences that local and dev hid surfaced as incidents.
·1 min read
A SELECT That Stopped a Factory
I ran an unverified SELECT against production. It took a library cache lock and stopped equipment at an overseas site, and it took me 15-20 minutes to realize my own session was holding the lock.
·1 min read
Locking down a fleet of handheld devices with MDM
We used an MDM tool to remotely lock a fleet of factory-floor handhelds so only approved apps would run. It was a setup task more than an incident — the only correction is that the lockdown should have been part of the standard provisioning step, and at the time it wasn't.
·1 min read
A save button wired to a fixed column index
A save button in a data grid read from a hard-coded column index. When the column layout was reconfigured, it saved the wrong column.
·1 min read
The TLS Certificate Nobody Was Watching Expired
A TLS certificate on the link between two systems expired without being renewed, and the integration went down in production. Nothing had been watching the expiry date.
·1 min read
When the table said done and the work hadn't happened
A transaction interface to an ERP system marked rows as successful when the downstream operation had actually failed, and its logs were sometimes missing, sometimes duplicated. A short note on what broke and the one line that wasn't in the code.
·1 min read
A protocol change on one side, an integration server that never heard about it
A piece of factory equipment switched its communication mode from push to request-response. The integration server in the middle was never changed to match, so the link broke in production.
·1 min read
Putting a handheld device on a locked factory network
Getting one handheld scanner onto the plant network meant clearing several access-control layers one at a time. None of it was written down anywhere.
·1 min read
Building an Android app outside the network it had to run inside
An Android app for an air-gapped factory network was built on a machine outside that network, then carried in by hand. When it failed inside, there was no way to see why from where the build happened.
·1 min read