Thursday, April 19, 2012

Cloud and Moral Engineering

[nominal delivery draft, SOURCE Boston 18 April 2012]

Criticality, Rejectionists, Risk Tolerance - Daniel E. Geer, Jr. http://geer.tinho.net/geer.sourceboston.18iv12.txt
[excerpt] Summing up so far, risk is a consequence of dependence. Because of shared dependence, aggregate societal dependence on the Internet is not estimable. If dependencies are not estimable, they will be underestimated. If they are underestimated, they will not be made secure over the long run, only over the short. As the risks become increasingly unlikely to appear, the interval between events will grow longer. As the latency between events grows, the assumption that safety has been achieved will also grow, thus fueling increased dependence in what is now a positive feedback loop.

In the language of statistics, common mode failure comes from under-appreciated mutual dependence. Quoting from NIST's section on redundancy in their "High Integrity Software System Assurance" documentation[6] [public link permission revoked on previous link]:

[R]edundancy is the provision of functional capabilities that
would be unnecessary in a fault-free environment. Redundancy
is necessary, but not sufficient for fault tolerance. ... System
failures occur when faults propagate to the outer boundary of
the system. The goal of fault tolerance is to intercept the
propagation of faults so that failure does not occur, usually
by substituting redundant functions for functions affected by a
particular fault. Occasionally, a fault may affect enough
redundant functions that it is not possible to reliably select
a non-faulty result, and the system will sustain a common-mode
failure. A common-mode failure results from a single fault (or
fault set). Computer systems are vulnerable to common-mode
resource failures if they rely on a single source of power,
cooling, or I/O. A more insidious source of common-mode failures
is a design fault that causes redundant copies of the same
software process to fail under identical conditions.


That last part -- that "A more insidious source of common-mode failures is a design fault that causes redundant copies of the same software process to fail under identical conditions" -- is exactly that which can be masked by complexity precisely because complexity ensures under-appreciated mutual dependence.....[excerpt]
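
A minimal sketch of the point quoted above, assuming a hypothetical three-replica system with a majority voter. None of the names below (make_replica, majority_vote, the squaring function, the wrong answer -1) come from the talk or from NIST; they are illustrative only. The sketch contrasts replicas whose faults are independent with replicas that are identical copies sharing one design fault.

```python
from collections import Counter


def majority_vote(results):
    """Return the value reported by a strict majority of replicas, else None."""
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None


def correct_square(x):
    """Reference behavior every replica is supposed to implement."""
    return x * x


def make_replica(broken_inputs):
    """Hypothetical replica: wrong answer (-1) on any input in broken_inputs."""
    def replica(x):
        return -1 if x in broken_inputs else x * x
    return replica


def run(replicas, x):
    """Feed x to every replica and let the voter pick the result."""
    return majority_vote([r(x) for r in replicas])


# Independent faults: each replica is wrong on a different input,
# so on any single input at most one replica is wrong and is outvoted.
independent = [make_replica({1}), make_replica({2}), make_replica({3})]

# Common-mode design fault: three copies of the same software,
# so all three fail together under identical conditions (input 3).
common_mode = [make_replica({3})] * 3

inputs = range(5)
independent_failures = [x for x in inputs
                        if run(independent, x) != correct_square(x)]
common_mode_failures = [x for x in inputs
                        if run(common_mode, x) != correct_square(x)]

print(independent_failures)  # []
print(common_mode_failures)  # [3]
```

In this toy setup the voter masks every independent fault, but on input 3 the identical copies agree unanimously on the wrong answer, so the redundancy provides no protection and the failure is not even flagged as a disagreement.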
