Fri Apr 8 06:49:41 PDT 2016
Redundancy: Fault model: What fault model should be assumed for analysis of redundancy?
Options:
Option 1: No fault model analysis
Option 2: Single point of failure analysis
Option 3: Multiple-failure modes analysis
Option 4: Common mode failure analysis
Option 5: Cascade failures and interdependency analysis
Option 6: Fault and failure graph analysis
Option A: Hidden faults are accounted for
Option B: Within faults are considered
Option C: Independent faults are considered
Option D: Limited interdependent faults are considered
Option E: All interdependent faults are considered
Decision:
Create your variation on the following table by altering areas per your requirements:
| Low consequence | Medium consequence | High consequence |
High threat | Risk avoidance/deception area - Do not operate here | Single point of failure analysis AND Common mode failure analysis AND Cascade failures and interdependency analysis [Hidden faults are accounted for AND Within faults are considered AND Independent faults are considered AND Limited interdependent faults are considered] | [Single point of failure analysis AND Multiple-failure modes analysis AND Common mode failure analysis AND Cascade failures and interdependency analysis] OR [Fault and failure graph analysis] [Hidden faults are accounted for AND Within faults are considered AND Independent faults are considered AND All interdependent faults are considered] |
Medium threat | Risk avoidance/deception area - Do not operate here | Single point of failure analysis AND Common mode failure analysis [Within faults are considered AND Independent faults are considered AND Limited interdependent faults are considered] | [Single point of failure analysis AND Multiple-failure modes analysis AND Common mode failure analysis AND Cascade failures and interdependency analysis] OR [Fault and failure graph analysis] [Hidden faults are accounted for AND Within faults are considered AND Independent faults are considered AND All interdependent faults are considered] |
Low threat | No fault model | Single point of failure analysis | Treat the threat as at least medium and reassess |
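The table can also be read as a lookup keyed by threat and consequence level. The following is a minimal sketch of that reading, not part of the original decision method. The names DECISION and recommend are hypothetical, and the cell contents are abbreviated with the option codes listed above (1-6 for analysis types, A-E for fault scopes).

```python
# Hypothetical encoding of the decision table above.
# Codes follow the option list: 1-6 are analysis types, A-E are fault scopes.
AVOID = "Risk avoidance/deception area - Do not operate here"
FULL = "(2 AND 3 AND 4 AND 5) OR 6 [A AND B AND C AND E]"

DECISION = {
    ("high", "low"): AVOID,
    ("high", "medium"): "2 AND 4 AND 5 [A AND B AND C AND D]",
    ("high", "high"): FULL,
    ("medium", "low"): AVOID,
    ("medium", "medium"): "2 AND 4 [B AND C AND D]",
    ("medium", "high"): FULL,
    ("low", "low"): "1",
    ("low", "medium"): "2",
    ("low", "high"): "Treat the threat as at least medium and reassess",
}


def recommend(threat, consequence):
    """Return the table cell for a threat level and a consequence level."""
    return DECISION[(threat.lower(), consequence.lower())]
```

An organization altering the table per its own requirements would change the entries of DECISION rather than the lookup itself.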
Fault modeling approaches
Basis:
Fault models are assumptions that form the
basis for analysis and decisions regarding the use of redundancy in
systems. As such, they lie at the heart of any analytical process
involving the need for and use of redundancy.
No fault model:
In many cases, no fault model is used for analytical purposes,
leading to an approach in which judgment and estimates are made
without detailed analysis. In low consequence situations where the
analysis may be more costly than the consequences of faults, or where
system failures are not important enough to justify fault analysis,
this is reasonable, although it leads to less reliable operational
systems.
Single point of failure only analysis:
Most fault analysis is based on identification and selective
elimination of single points of failure, depending on the cost of
mitigation and the consequence of the failure. Methods like fault tree
analysis are used and fault assumptions like "stuck-at", "bridging"
and/or "transient" faults are made for analytical purposes.
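As an illustration of identifying single points of failure, the sketch below fails each component in turn and checks whether the system still works. It is a simplified stand-in for fault tree analysis, not the method itself. The structure format (nested tuples with "AND" and "OR" gates), the function names, and the component names are all assumptions made for the example.

```python
# Hypothetical system description: ("AND", ...) means every child is needed,
# ("OR", ...) means any one child suffices; plain strings are components.
def works(node, failed):
    """Return True if the (sub)system works given a set of failed components."""
    if isinstance(node, str):
        return node not in failed
    gate, *children = node
    results = [works(child, failed) for child in children]
    return all(results) if gate == "AND" else any(results)


def components(node):
    """Collect the names of all components in the structure."""
    if isinstance(node, str):
        return {node}
    return set().union(*(components(child) for child in node[1:]))


def single_points_of_failure(system):
    """Components whose failure alone is enough to bring the system down."""
    return sorted(c for c in components(system) if not works(system, {c}))


# Illustrative system: one router, redundant power, redundant links.
system = ("AND", "router", ("OR", "power_a", "power_b"), ("OR", "link_a", "link_b"))
spofs = single_points_of_failure(system)
```

Here only the router has no redundant alternative. Whether to eliminate it remains the cost-versus-consequence judgment described above.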
Multiple-failure modes analysis:
Analysis of and compensation for multiple failures is essentially
never done for a complete system. However, multiple failure modes are
analyzed for some medium and many high consequence subsystems, such as
select control systems on aircraft. This sort of analysis is usually
limited to specific fault assumptions for specific subsystems and
specific classes of common-mode failures.
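The sketch below suggests why multiple-failure analysis is usually confined to subsystems: the number of fault combinations grows combinatorially with the number of components. It enumerates combinations of up to a chosen number of faults for a small subsystem and keeps the minimal failing sets. The subsystem, its component names, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def failure_sets(parts, works, max_faults):
    """Minimal sets of up to max_faults parts whose joint failure breaks the system."""
    found = []
    for k in range(1, max_faults + 1):
        for combo in combinations(sorted(parts), k):
            failed = set(combo)
            if any(smaller <= failed for smaller in found):
                continue  # a smaller failing set already covers this one
            if not works(failed):
                found.append(failed)
    return found


# Hypothetical subsystem: one controller plus two redundant pairs.
def subsystem_works(failed):
    controller_ok = "ctrl" not in failed
    hydraulics_ok = not {"hyd_a", "hyd_b"} <= failed
    sensors_ok = not {"sen_a", "sen_b"} <= failed
    return controller_ok and hydraulics_ok and sensors_ok


parts = {"ctrl", "hyd_a", "hyd_b", "sen_a", "sen_b"}
cut_sets = failure_sets(parts, subsystem_works, max_faults=2)

# Size of the search space for triple faults in a 1000-component system.
triple_fault_cases = comb(1000, 3)
```

Five components and double faults give a handful of cases to check, while a thousand components and triple faults give well over a hundred million, before any fault assumptions per component are added.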
Common mode failure analysis:
Common mode failures occur when some commonality between
otherwise unrelated components is exercised such that it causes these
otherwise unrelated components to fail simultaneously. For example, a
fiber optic cable in proximity to a copper cable may experience a
common mode failure when the same backhoe cuts both of them. The
unlimited nature of potential commonality between all sets of things
makes it infeasible to anticipate and protect against all common mode
failures, but many such failures may be easily avoided once
identified. For example, redundant communications cables should not
run through common wire runs and should be separated by some distance
associated with the size of holes dug by backhoes.
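Once routes are documented, one simple check for this kind of commonality is to intersect the elements that supposedly redundant paths pass through. The sketch below assumes each route is recorded as a list of named elements; the route contents and the function name are hypothetical.

```python
def shared_elements(route_a, route_b):
    """Elements that two supposedly redundant routes have in common."""
    return set(route_a) & set(route_b)


# Hypothetical cable routes for a primary and a backup connection.
primary = ["bldg_exit_north", "conduit_17", "manhole_4", "carrier_pop"]
backup = ["bldg_exit_south", "conduit_17", "manhole_9", "carrier_pop"]

common = sorted(shared_elements(primary, backup))
```

Any element in the result is a candidate common mode failure. The check only finds commonalities that were recorded in the first place, which reflects the point above that not all of them can be anticipated.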
Cascade failures and interdependency analysis:
Analysis of and compensation for cascade failures is based on
identification of interdependencies that may produce sequences of
events in which dependent systems fail because of failures in systems
they depend upon. This is done recursively until it reaches either the
underlying physics of the world or exhausts the willingness of the
organization to consider further. Generally, analysis may include
{internal and/or external} x {limited | comprehensive} x {recursive to
level} interdependencies.
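The recursive propagation described above can be sketched as a traversal of a dependency map with an optional depth limit, corresponding to "recursive to level". This sketch makes a strong simplifying assumption: a system fails as soon as any one of its dependencies fails, with no redundancy. The map, the system names, and the function name are hypothetical.

```python
from collections import deque


def cascade(depends_on, initial_failures, max_level=None):
    """Return {system: level} for each failed system; level 0 is an initial fault.

    Assumes a system fails when any single dependency fails (no redundancy).
    """
    # Invert the map: for each system, which systems depend on it?
    dependents = {}
    for system, needs in depends_on.items():
        for need in needs:
            dependents.setdefault(need, set()).add(system)

    level = {system: 0 for system in initial_failures}
    queue = deque(initial_failures)
    while queue:
        current = queue.popleft()
        if max_level is not None and level[current] >= max_level:
            continue  # stop following the chain at the chosen depth
        for dependent in sorted(dependents.get(current, ())):
            if dependent not in level:
                level[dependent] = level[current] + 1
                queue.append(dependent)
    return level


# Hypothetical interdependencies.
depends_on = {
    "web_app": ["database", "dns"],
    "database": ["storage", "power"],
    "dns": ["power"],
    "storage": ["power"],
}
full = cascade(depends_on, ["power"])
limited = cascade(depends_on, ["power"], max_level=1)
```

Setting max_level models the point at which the organization stops considering further dependencies; leaving it unset follows every recorded chain to its end.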
Fault and failure graph analysis:
To the extent that more comprehensive understanding is desired,
the generalization of fault modeling and analysis is to consider all
event sequences with potentially serious negative consequences and
model the sequential system behavior in this context. All sequences
include all fault models associated with all components of the
composite and identified numbers of {simultaneous / sequential} events
with timing. Such analysis is, in general, too complex to ever be
thoroughly performed; however, simulation methods are sometimes used to
provide runs through the space at a defined level of granularity,
particularly to compare architectural or design alternatives. This may
be generalized to a more real-time view of model-based situation
anticipation and constraint. This method, if properly undertaken,
includes all of the analytical methods of other analysis techniques,
subsumed into the overall graph approach.
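A simulation run through the event space, used to compare design alternatives, might look like the following sketch. It is a coarse time-stepped model under stated assumptions: each working component fails with a fixed probability per step and returns to service after a fixed repair time. The designs, probabilities, and function names are hypothetical, and a real analysis would use fault models specific to each component.

```python
import random


def simulate(fault_prob, works, steps, repair_time, seed=0):
    """Fraction of time steps in which the system is down.

    fault_prob maps each component to its per-step fault probability.
    """
    rng = random.Random(seed)
    down_until = {c: 0 for c in fault_prob}  # step at which each component is back up
    system_down = 0
    for t in range(steps):
        for component, p in fault_prob.items():
            if down_until[component] <= t and rng.random() < p:
                down_until[component] = t + repair_time
        failed = {c for c in fault_prob if down_until[c] > t}
        if not works(failed):
            system_down += 1
    return system_down / steps


# Design A: a single server. Design B: two servers, either one suffices.
design_a = {"server": 0.01}
design_b = {"server_a": 0.01, "server_b": 0.01}

downtime_a = simulate(design_a, lambda f: "server" not in f,
                      steps=10000, repair_time=5)
downtime_b = simulate(design_b, lambda f: not {"server_a", "server_b"} <= f,
                      steps=10000, repair_time=5)
```

The step size and repair time set the level of granularity. Results from such runs are samples of the space, not a thorough analysis of it.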
Hidden faults are accounted for:
Hidden faults are faults not normally exposed because redundancy
covers them. These faults can lie undetected until a second fault
occurs, leading to a failure from lack of adequate and planned
redundancy. To account for them it is usually necessary to expose them
for testing or otherwise find ways to identify or mitigate them.
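The distinction between a healthy-looking service and a hidden fault can be sketched as follows, assuming each redundant element can be tested individually. The replica names and function names are hypothetical.

```python
def service_ok(replicas):
    """The service appears healthy as long as at least one replica is up."""
    return any(replicas.values())


def hidden_faults(replicas):
    """Replicas that are down while the service as a whole still appears healthy."""
    if not service_ok(replicas):
        return []  # the failure is already visible, so nothing is hidden
    return sorted(name for name, up in replicas.items() if not up)


# Hypothetical state found by testing each replica individually.
state = {"replica_1": True, "replica_2": False, "replica_3": True}
latent = hidden_faults(state)
```

A check of the service alone would report success here. Only the per-replica test exposes the fault that a second failure could turn into an outage.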
Within faults are considered:
Faults within the area being reviewed are in scope. For example,
if an enterprise is being considered, systems and mechanisms within
the enterprise are within this scope.
Independent faults are considered:
Independent and seemingly unrelated faults may combine to cause
failures. Analysis in this case should consider independent faults
such as simultaneous power failures of two completely unrelated systems
with no link between their power sources or mechanisms, that is, pure
coincidence.
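When faults are truly independent, the probability that they coincide in the same interval is the product of their individual probabilities. The sketch below shows that arithmetic; the probabilities are made-up illustrative values and the function name is hypothetical.

```python
from math import prod


def coincident_failure_probability(probabilities):
    """Probability that all listed faults occur in the same interval.

    Valid only if the faults are independent of one another.
    """
    return prod(probabilities)


# Illustrative per-day fault probabilities for two unrelated systems.
p_both = coincident_failure_probability([0.001, 0.002])
```

With these values the coincidence is expected about twice in a million days. If the faults share any cause, the product understates the risk, and the case belongs under common mode or interdependent faults instead.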
Limited interdependent faults are considered:
Interdependent faults, such as cascade failures identified above,
are related but typically not all within the specific scope of the
review. In other words, if an enterprise is being reviewed, external
interdependencies, such as the DNS hierarchy and external power supply,
are considered.
All interdependent faults are considered:
In this case, an attempt is made to be complete in the review of
interdependencies. This ranges from the instantaneous to the
long-term strategic (e.g., the education system is not producing
enough experts, so in 30 years there will not be enough experts in
power systems to operate the regional power grid).
Copyright(c) Fred Cohen, 1988-2015 - All Rights Reserved