Lately, I’ve been thinking a lot about complex systems and how well the study of systems translates to the data- and creative-driven world of digital media.
Dr Richard Cook produced a list of 18 “truths” about how complex systems break down. While Cook was, I imagine, talking about the healthcare industry (though he has also gone on to talk about the tech industry and specifically the web), we can draw some parallels from his analysis of why complex systems fail:
- Complex systems are intrinsically hazardous systems.
- Complex systems are heavily and successfully defended against failure.
- Catastrophe requires multiple failures – single point failures are not enough.
- Complex systems contain changing mixtures of failures latent within them.
- Complex systems run in degraded mode.
- Catastrophe is always just around the corner.
- Post-accident attribution to a ‘root cause’ is fundamentally wrong.
- Hindsight biases post-accident assessments of human performance.
- Human operators have dual roles: as producers and as defenders against failure.
- All practitioner actions are gambles.
- Actions at the sharp end resolve all ambiguity.
- Human practitioners are the adaptable element of complex systems.
- Human expertise in complex systems is constantly changing.
- Change introduces new forms of failure.
- Views of ‘cause’ limit the effectiveness of defenses against future events.
- Safety is a characteristic of systems and not of their components.
- People continuously create safety.
- Failure free operations require experience with failure.
Going deeper into the subject, in a talk given at Velocity NY 2013, Cook painted a picture of two kinds of systems: systems as imagined and systems as found. Or to put it another way, fantasy versus reality. Cook’s talk – “Resilience In Complex Adaptive Systems: Operating At The Edge Of Failure” – puts forward the view that every system always operates at capacity: it essentially runs at the edge of failure, not performing “extremely well” but always looking towards the next thing to come along.
Systems are in constant flux – both planned and unplanned, reactive and proactive, and across multiple timescales and contexts. For people like Cook – those who make an actual study of systems rather than a blog post paraphrasing the research of others (ahem) – the surprising element of complex systems is not that they fail so often, but that they fail so rarely. The operating point of a system tends naturally to drift towards the “accident boundary” – the “normalisation of deviance”, as Diane Vaughan puts it – and it’s only through applying new rules and learning from failures that the boundary gets pushed back.
Cook’s important conclusion from his and others’ studies is that making systems resilient – no matter what industry or context – is critically dependent on the human element: the operators themselves, and their capacity to anticipate and monitor the system, be conscious of and act on or react to threats to it, and prioritise goals pragmatically to prevent failure.
There are multiple factors pushing us towards the accident boundary in SMEs in the digital space, from resourcing, to finance, to forecasting, to strategy – the list is endless. It is crucially important that we ensure the systems that keep our organisations running are as resilient as possible. That means monitoring our systems, reacting to failures, anticipating when they might occur, learning from our experiences with failure, and adapting accordingly.
This, ultimately, leads us towards creating more resilient systems. But for most of us, that is an ongoing process rather than a fixed destination.