Internal Control System

This write-up describes the Internal Control System (ICS) approach to internal controls put forward by Gamma (www.gammassl.co.uk). It is interesting because it was written by smart people who have lots of experience and it makes good sense. It is summarized here, followed by a brief critique.

There are 7 classes of control:

Class	Ability to detect the event and take recovery action	Type
1	Prevents the event, or detects the event as it happens and prevents it from having any impact	Preventive
2	Detects the event and reacts fast enough to fix it well within the time window	Detective
3	Detects the event and just reacts fast enough to fix it within the time window	Detective
4	Detects the event but cannot react fast enough to fix it within the time window	Detective
5	Fails to detect the event but has a partially deployed BCP	Reactive
6	Fails to detect the event but does have a BCP.	Reactive
7	Fails to detect the event and does not have a BCP.	Reactive

Table 1: Control Class Definitions

Each event ej occurs at some time Tej. If the damage that it causes is not fixed in time, that is, if the fix time Tfj is later than some time Twj (where ΔTwj = Twj - Tej is referred to as the time window), the event will cause a loss of business benefit, Ipj (referred to as the impact penalty). The objective of an ICS is to control activities and detect unwanted results. Events it detects are detected at times Tdj (where Tej < Tdj).

Class	Time metric
1	ΔTdj and ΔTfj are very small
2	ΔTdj is sufficiently short for Tfj to be comfortably within ΔTwj
3	Tdj is such that Tfj is close to Twj (i.e. a near-miss)
4	Tdj is too late, Tfj being greater than Twj
5	Tmj is greater than Twj, Tfj follows on soon after
6	Tmj is greater than Twj, there is an appreciable delay before Tfj
7	Tmj is greater than Twj, there is a significant delay before Tfj

Table 2: How the time metrics relate to control class
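
As a rough illustration of how the time metrics in Table 2 separate Classes 1 to 4, here is a minimal Python sketch. It is not part of the ICS write-up. It assumes that the time window is Twj - Tej and that the detection and fix delays are measured from Tej, and the two thresholds (negligible and near_miss) are hypothetical values chosen only to make "very small" and "close to Twj" concrete. Classes 5 to 7 are left out because they depend on the BCP rather than on detection.

```python
from dataclasses import dataclass


@dataclass
class EventTiming:
    t_event: float   # Tej: when the event occurs
    t_detect: float  # Tdj: when the event is detected
    t_fix: float     # Tfj: when the damage is fixed
    t_window: float  # Twj: deadline after which the impact penalty is incurred


def control_class(t: EventTiming, negligible: float = 0.01, near_miss: float = 0.9) -> int:
    """Return the control class (1 to 4) implied by the timings of a detected event.

    negligible and near_miss are hypothetical thresholds, expressed as
    fractions of the time window; the source gives no numeric cut-offs.
    """
    if not (t.t_event <= t.t_detect <= t.t_fix):
        raise ValueError("expected Tej <= Tdj <= Tfj")
    window = t.t_window - t.t_event        # assumed: time window = Twj - Tej
    if window <= 0:
        raise ValueError("expected Twj > Tej")
    detect_delay = t.t_detect - t.t_event  # assumed meaning of the detection delay
    fix_delay = t.t_fix - t.t_event        # assumed meaning of the fix delay
    if fix_delay > window:
        return 4  # fixed too late: the window has expired
    if detect_delay <= negligible * window and fix_delay <= negligible * window:
        return 1  # detected and fixed almost immediately
    if fix_delay >= near_miss * window:
        return 3  # fixed only just inside the window (a near-miss)
    return 2      # fixed comfortably inside the window
```

For example, an event at time 0 that is detected at 5 and fixed at 40, with the window closing at 100, comes out as Class 2 under these assumed thresholds. Moving the thresholds moves the class boundaries, which is exactly where judgment enters.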

Effectiveness Principles and Criteria

Extremes of effectiveness and ineffectiveness

The worst a control can get

Whatever controls it did have, if they did not work you would not find out until it was too late.
Indeed, all the detective controls would be so slow to detect an event that the time window would always expire before the problem could be fixed.
There would be no BCPs. When an incident happened, management would always be unprepared.

The best controls can get

Whatever controls it had, if they did not work you would find out immediately and be able to take appropriate action well within the time window. In fact all of the controls would be fail-safe self-policing procedures.
Indeed, all the detective controls would work so fast that they would be Class 2 non-degradable. The reactive controls would all be Class 5.
The BCPs would be so comprehensive that, when an incident did happen, management would always find that its existing Class 5 BCPs would deal with the problem entirely.

The underlying principles: robustness, speed, and anticipation

The robustness of the ICS in the event of a control failure
The speed at which the ICS can react to events
The ability of the ICS to deal with the unexpected.

Middle Ground

Robustness

The middle ground criterion is:

R1 - There are some self-policing procedures, some of which may be fail-safe.

A stronger criterion is:

R2 - There are some self-policing procedures, at least one of which is fail-safe.

Speed

The middle ground criterion is:

S1 - There is a mixture of Class 2, 3 and even Class 4 detective controls. The Class 2 and 3 controls that are not protected by fail-safe self-policing procedures may degrade to Class 4.

A stronger criterion is:

S2 - There is a majority of Class 2 detective controls, with possibly some Class 3 or even Class 4. The Class 2 and 3 controls that are not protected by fail-safe self-policing procedures may degrade to Class 4.

Anticipation

The middle ground criterion is:

A1 - There is at least one Class 6 BCP dealing with some catastrophe (e.g. fire). Other unexpected events or incidents are dealt with through an ad hoc procedure.

A stronger criterion is:

A2 - There are a variety of BCPs (some of which may be Class 5) dealing with the failure of a control or some catastrophe (e.g. fire). Other unexpected events or incidents are dealt with through an ad hoc procedure.

The ICS Marking Scheme

Determine the category of the ICS by awarding 3 marks for each of R1, S1 and A1 that is met, and 1 extra mark for each one that is exceeded (i.e. the corresponding stronger criterion R2, S2 or A2 is met).

The resulting categorization is:

Well above average (AAA rating) 11 or higher
Above average (A*) 10
Average (A) 9
Below average (B) 6 - 8
Well below average (C) 4 or lower.
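
The marking scheme is simple enough to sketch in a few lines of Python. This is only an illustration, not Gamma's own tooling. It assumes that a criterion that is not met earns 0 marks (the scheme does not say), and the function names and the 0/1/2 level encoding are invented here.

```python
from itertools import product

# Assumed marks per criterion: 0 = not met, 1 = middle ground met (R1/S1/A1),
# 2 = stronger criterion met (R2/S2/A2). The 0-mark case is an assumption.
MARKS = {0: 0, 1: 3, 2: 4}


def ics_score(levels):
    """Total marks for a dict such as {"R": 1, "S": 2, "A": 0}."""
    return sum(MARKS[levels[key]] for key in ("R", "S", "A"))


def ics_rating(score):
    """Map a total score to the categories listed in the marking scheme."""
    if score >= 11:
        return "AAA"
    if score == 10:
        return "A*"
    if score == 9:
        return "A"
    if 6 <= score <= 8:
        return "B"
    if score <= 4:
        return "C"
    raise ValueError(f"score {score} is not covered by the marking scheme")


# Every total that can actually occur under the assumed marks.
REACHABLE = sorted({ics_score(dict(zip("RSA", combo)))
                    for combo in product((0, 1, 2), repeat=3)})
```

Under these assumptions an ICS that meets R1, S1 and A1 but exceeds none of them scores 9 and rates as Average (A). The listed bands skip a score of 5, but with marks of 0, 3 or 4 per criterion a total of 5 cannot arise, so REACHABLE never contains it.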

#	Question
1	Should we be using a preventive control? Ask "Is the cost of using a preventive control less than the sum of cost-to-fix and possible impact penalties for all the events that the preventive control is designed to detect?" If the answer is yes, then there is indeed a case for using a preventive (i.e. Class 1) control.
2	Should we improve the efficiency of our detective controls? To upgrade from Class 4 to Class 3, ask "Is the cost of the upgrade less than the average impact penalty times the number of events?" If the answer is yes, then an upgrade from a Class 4 to a Class 3 control is worthwhile. To upgrade from Class 3 to Class 2, ask "Is the cost of the upgrade less than the average reduction in the cost-to-fix times the number of events?" If the answer is yes, then an upgrade from a Class 3 to a Class 2 control is worthwhile.
3	Should we pre-deploy our BCPs? Ask "Is the cost of pre-deployment over Y years minus the business benefit prior to invocation less than the reduction in impact penalty, minus the loss in business benefit, multiplied by the number of times the BCP might be invoked in that period of Y years?" If the answer is yes, then pre-deployment is worthwhile.
4	Should we have a BCP? Following consideration of the impact penalty and likelihood of occurrence, ask "Is this an acceptable risk?" If the answer is no, then you need a BCP.

Table 4: Determination of cost-effectiveness
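
The first three questions in Table 4 are inequalities, so they can be written down directly. The Python sketch below is one reading of them, not a definitive implementation. All function and parameter names are invented for illustration, and the grouping of terms in the pre-deployment question is an assumption, because the wording of that question can be parsed more than one way. Question 4 is a judgment about acceptable risk and is deliberately not coded.

```python
def preventive_control_worthwhile(control_cost, events):
    """Question 1. events is a list of (cost_to_fix, impact_penalty) pairs,
    one pair per event the preventive control is meant to cover."""
    return control_cost < sum(fix + penalty for fix, penalty in events)


def upgrade_4_to_3_worthwhile(upgrade_cost, avg_impact_penalty, n_events):
    """Question 2, first part: Class 4 to Class 3 avoids the impact penalty."""
    return upgrade_cost < avg_impact_penalty * n_events


def upgrade_3_to_2_worthwhile(upgrade_cost, avg_fix_cost_reduction, n_events):
    """Question 2, second part: Class 3 to Class 2 reduces the cost-to-fix."""
    return upgrade_cost < avg_fix_cost_reduction * n_events


def predeployment_worthwhile(predeploy_cost, benefit_before_invocation,
                             penalty_reduction, benefit_loss, n_invocations):
    """Question 3, over a period of Y years.

    Assumed grouping: (cost - benefit prior to invocation) is compared with
    (reduction in impact penalty - loss in business benefit) * invocations.
    """
    net_cost = predeploy_cost - benefit_before_invocation
    net_gain = (penalty_reduction - benefit_loss) * n_invocations
    return net_cost < net_gain
```

Each function returns True when the answer to the corresponding question is yes. Every input is an estimate that somebody has to supply, and the yes or no answer is only as good as those estimates.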

Events

Events are bad things that cause trouble. Events that are common across many businesses include: Theft, Acts of God, vandals and terrorists, Regular fraud, IT failure, Hacking, Denial of Service attacks, Disclosure, and Breach of the law.

Impacts

Impacts that are common across many businesses include: Customer dissatisfaction, Adverse press coverage, Loss of revenue, Unanticipated costs, Inability to carry out some or all of its business, Loss of the monetary value of buildings and contents, Failure to prosecute, and Court action against an employee or the business itself.

Risk Treatment Plans

Step 1 - identify the events: Name the event and briefly describe it. Start with the standard events described above and augment them with client specific concerns.

Step 2 - identify the assets: Start with a generic list that includes: Buildings and their contents, IT hardware and networks, Infrastructure and application software, Computerized data concerning the organization's business, Paper documents and records concerning the organization's business, and Supporting data, documentation and records.

Step 3 - identify the impacts: start with the standard impacts described above and augment them with client specific impacts as required.

Step 4 - identify the threats: We usually start with a generic list of threat agents that includes such entries as: Fire, flood and other forms of natural disturbance, Power and other utility failure, Customers and suppliers, Disaffected staff, Spies, Thieves, Vandals and terrorists, Hackers, Errors and mistakes.

Step 5 - produce the RTPs: This step is repeated for each event.

1) write down the description of the event and list the assets that are affected. Augment/modify the asset inventory if there is an asset that we wish to refer to that is not already in the list.
2) document the applicable impacts and order them in the priority they are to receive. Record if any are to receive equal priority treatment.
3) list the applicable threats.
4) repeat the steps 5a-5d below until all the impacts have been dealt with. If the impacts are listed in priority order, take them in that order. If two or more have the same priority, take them together.

Step 5a - identify the risks leading to a particular impact (or impacts, if the impacts have the same priority) for known threats.

Step 5b - identify the risks leading to a particular impact for unknown threats: Consider the event and the impact(s), current defenses, and where they place things in the ICS model.

Step 5c - deal with unacceptable residual risks: Find countermeasures to move up the ICS.

Step 5d - optimize the ICS: For items that are too highly mitigated, consider moving them down the ICS.

Step 6 - tidy up: Review the process to make sure nothing has been forgotten.
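
The bookkeeping side of Step 5 can be sketched as a small record type. The Python below is a hypothetical structure, not one defined by ICS: the class names, the field names, and the convention that a lower priority number is handled first are all assumptions. It shows only how impacts might be ordered and how impacts of equal priority might be taken together for steps 5a to 5d. The risk identification itself is not something the code can do.

```python
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class Impact:
    name: str
    priority: int  # assumed convention: 1 is handled first, ties are handled together


@dataclass
class RiskTreatmentPlan:
    event: str
    description: str
    assets: list = field(default_factory=list)
    impacts: list = field(default_factory=list)
    threats: list = field(default_factory=list)

    def impact_batches(self):
        """Return impact names grouped by priority, highest priority first.

        Each batch is what one pass through steps 5a to 5d would work on.
        """
        ordered = sorted(self.impacts, key=lambda impact: impact.priority)
        return [[impact.name for impact in group]
                for _, group in groupby(ordered, key=lambda impact: impact.priority)]


plan = RiskTreatmentPlan(
    event="IT failure",
    description="Loss of a key server",  # illustrative description only
    assets=["IT hardware and networks"],
    impacts=[Impact("Customer dissatisfaction", 2),
             Impact("Loss of revenue", 1),
             Impact("Adverse press coverage", 1)],
    threats=["Power and other utility failure"],
)
```

Here plan.impact_batches() yields one batch holding "Loss of revenue" and "Adverse press coverage", followed by a second batch holding "Customer dissatisfaction".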

Critique

ICS, like so many other methodologies, depends on four fundamentals that seem to be present in all such approaches: (1) "magic happens here", (2) "a skilled expert takes care of it", (3) "for all possible events...", and (4) "getting the right number for this is left as an exercise for the reader".

The criticism I have is that all of this means that real experts must be involved to make reasonable and prudent decisions. While I personally agree with this as a fundamental, it will likely be misapplied by those who know too little, resulting in bad decisions that cause substantial harm. And when I am asked to come in later to look at the situation I will find all sorts of really bad things that were completely missed. At some point we need to come out and say it. Experts are needed to perform this magic and the only way we can tell an expert from anyone else is by having another expert say so.

The third item is the easiest to criticize. How do we get the list of all possible bad events? The ICS folks have helped a great deal by providing a short list of common events (and have completely ignored sequences of events, which tend to be far harder to codify) and telling their readers to augment it with... (magic happens here). They have, by the way, done the same thing for threats (which they have mixed up a bit with other things) and impacts (which is actually a lot more helpful relative to the other helpful items provided).

The fourth item is also pretty easy to criticize. How do we measure all of these times? The time window assumes a step function in damage, while damage tends to involve step functions and linear as well as non-linear losses with time. Indeed there are situation-dependent losses that are far more complex than this simple model implies, and all of the time notions are identified in pretty simplistic ways while they are actually far more complex phenomena. Making decisions that are highly nonlinear (use or do not use this class of defense) based on metrics that are complex and not very well defined means that wrong decisions, with large differences in cost and consequence, will be made. While the idea of embracing time as a metric is a good one, the particular approach is far too simplistic for the reality of the situation. But fear not - because expert judgment will supersede any specifics associated with the metrics anyway.

This of course brings us back, as it inevitably does, to "a skilled expert takes care of it" and "magic happens here". The magic is that an expert who knows what they are doing makes a judgment that makes all the difference in the world. And as if that weren't the worst part of it, the expert will not be evaluable by the outcomes because, unless they really screw up big time, it will be impossible to tell if they made a good judgment or a bad one, and even then, we will only find out if the low-probability high-consequence event takes place. And this will no doubt be met by the claim "nobody could have predicted this" - the claim that we hear so often about airplanes running into buildings, even though these sorts of event sequences were predicted and associated with the threats that ultimately carried them out. They were not and are not mitigated even today.

As "a skilled expert ..." who makes "magic happen here" I know that trying to codify what the folks who wrote ICS and I do is not an easy matter. I only wish that somewhere they would indicate "Here's where a skilled expert makes magic happen" at every point where that applies so that people will know which things to hire us for.