Security · April 30, 2026 · 5 min read

OT resilience by architecture, not by heroics

The 2024 outage proved a point the industry keeps relearning: resilience built on agents and persistent state is fragile. Stateless devices reboot to known-good and keep running.

Scylos Security
Research

Operational resilience is usually sold as a process: runbooks, on-call rotations, recovery time objectives. All of that is real, and all of it is heroics: humans compensating for an architecture that fails in ways it did not have to.

Why stateful endpoints fail badly

A persistent OS can be corrupted. A kernel-level agent can ship a bad update to every machine at once. When that happens, recovery means touching each device, and at fleet scale, that is days of downtime and a very long night.

A stateless device removes both failure modes. There is no persistent OS to corrupt and no kernel agent to push a bad update into. The device reboots to known-good and keeps running.
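As a rough illustration of that contrast, here is a toy Python model. It is a sketch under stated assumptions, not a description of how any real product boots: the class names, the `KNOWN_GOOD` identifier and the 100-device fleet are all hypothetical. It only shows why a bad update survives a reboot on a device that persists state and disappears on one that does not.

```python
from dataclasses import dataclass

KNOWN_GOOD = "image-v1"  # hypothetical identifier for the read-only boot image


@dataclass
class StatefulDevice:
    """Toy model: whatever is written to disk survives a reboot."""
    disk: str = KNOWN_GOOD

    def apply_update(self, image: str) -> None:
        self.disk = image  # the update persists on disk

    def reboot(self) -> str:
        return self.disk  # boots whatever is on disk, good or bad


@dataclass
class StatelessDevice:
    """Toy model: runtime changes live in memory and vanish on reboot."""
    running: str = KNOWN_GOOD

    def apply_update(self, image: str) -> None:
        self.running = image  # only in-memory state changes

    def reboot(self) -> str:
        self.running = KNOWN_GOOD  # always comes back on the known-good image
        return self.running


def healthy_after_reboot(devices) -> int:
    """Count devices that come back on the known-good image."""
    return sum(1 for d in devices if d.reboot() == KNOWN_GOOD)


# Hypothetical fleet: the same bad update reaches every device at once.
stateful = [StatefulDevice() for _ in range(100)]
stateless = [StatelessDevice() for _ in range(100)]
for device in stateful + stateless:
    device.apply_update("bad-update")

stateful_ok = healthy_after_reboot(stateful)
stateless_ok = healthy_after_reboot(stateless)
print(stateful_ok, stateless_ok)  # prints "0 100"
```

In this model every stateful device needs someone to restore its disk by hand, while the stateless fleet recovers with a reboot alone.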

Resilience as a property, not a procedure

On a stateless substrate, the worst day of the year becomes a non-event. No fire drill, no truck rolls, no all-hands at 3 a.m. The device does the only thing it knows how to do: boot clean and run authorized work.

That is resilience by architecture. It does not depend on anyone being awake.

See the stateless endpoint on your own hardware.

Flash an idle machine and run your real workloads. No hardware to buy.
