Reduction of system change or operation risk

This is published under the terms of the licence summarized in the footnote.

All free-to-read materials on the Avancier web site are paid for out of income from Avancier’s training courses and methods licences.

If you find them helpful, please spread the word and link to the site in whichever social media you use.


Weak risk management can lead to serious pains.

Some business operations depend on digitised systems that are large and complex beyond most people’s understanding.

Business managers should value reductions in the risk of system change project failure, or system operation failure.

In some fields/domains, there is science behind risk management.

In others, including enterprise and solution architecture, it is more art than science.


A risk management procedure

The nature of architectural risks

Commentary

What about EA in particular?


A risk management procedure

Risk management is about reducing the probability of a forecast risk event and/or recovering from its occurrence as an issue.

Identify and catalogue risks

Obviously, risk management starts with identifying risks.

This is the first way in which risk management falls short of science.

In practice, some risks will not be identified before they appear (surprisingly) as issues.


Risks are commonly listed in a Risks, Assumptions, Issues and Dependencies (RAID) catalogue.

Assumptions and dependencies can turn into risks.

Risks can turn into issues.

A risk event has a description and a potential time period in which it may occur.

Risk events (described as when they happen) might include “key supplier has gone bust” and “the project has overrun the deadline”.

The first is an all-or-nothing event, though you may see it coming.

The second is an event on a continuum that might be subdivided into several risks:

·         Risk of overrun by 0 to 10%

·         Risk of overrun by 10 to 30%

·         Risk of overrun by 30 to 100%

·         Risk of overrun by more than 100%.
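Those bands can be sketched as a simple lookup. This is a minimal, hypothetical illustration (the function name and thresholds just mirror the list above), assuming overrun is measured as a percentage of the planned duration:

```python
def overrun_risk_band(overrun_pct: float) -> str:
    """Map a schedule overrun percentage to one of the four risk bands above."""
    if overrun_pct <= 10:
        return "overrun by 0 to 10%"
    elif overrun_pct <= 30:
        return "overrun by 10 to 30%"
    elif overrun_pct <= 100:
        return "overrun by 30 to 100%"
    else:
        return "overrun by more than 100%"
```

Each band can then be catalogued as a separate risk, with its own probability and impact scores.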

Classify and prioritise risks

A risk manager should strive to define the probability and impact of a risk.

A common way to classify and prioritise risks is to use a grid such as the one below.

A 4*4 or 5*5 matrix might be used, but 3*3 may be as precise as you can be.

[Probability * impact grid: each cell ranks the risk, from very low (low probability, low impact) to very high (high probability, high impact).]

Note that people score:

·         “probability” for one-time risks in system change projects

·         “frequency” for repeatable occurrences in on-going system operations.


The latter applies better to some risks in short-cycle manufacturing and service management.


Scoring risks for probability and impact

Writing down some numbers in a table or grid looks like science.

But this is a second way in which risk management falls short of science.

The numbers are not quantities you can do serious maths on.


Risk = impact * probability is not a scientific formula.

Risk = impact + probability has no more (or less) scientific basis.

Both formulas work equally well for ranking risks into an ordered list.

If you must use maths, then add rather than multiply.

Risk of 3rd party supplier failing (scored as impact + probability)

Impact 3:   3 + 1    3 + 2    3 + 3

Impact 2:   2 + 1    2 + 2    2 + 3

Impact 1:   1 + 1    1 + 2    1 + 3

(probability rises from 1 to 3, left to right)

Scoring impact is a third way in which risk management falls short of science.

In practice, the impact may have at least three dimensions; there can be an impact on Cost, Time and Quality.

Coming up with one number to quantify the combination of these three is an art; it will depend on judgement calls about which matters most.


Ranking risks

The scoring does help you to rank risks from high to low.



Level of risk (impact + probability)

6 Very high (3 + 3)

5 High (3 + 2, 2 + 3)

4 Medium (3 + 1, 2 + 2, 1 + 3)

3 Low (2 + 1, 1 + 2)

2 Very low (1 + 1)


The purpose of ranking risks is to prioritise them for attention to:

·         Definition of mitigation actions to avoid or reduce the risk

·         Definition of recovery or containment actions to minimise the impact when issues arise.
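As a minimal sketch of the additive scoring and ranking described above (the risk entries and scores here are illustrative, not from any real catalogue):

```python
# A toy RAID-style risk list, each scored 1-3 for impact and probability.
risks = [
    {"event": "key supplier goes bust",           "impact": 3, "probability": 1},
    {"event": "project overruns deadline 10-30%", "impact": 2, "probability": 3},
    {"event": "requirements change",              "impact": 2, "probability": 2},
]

# Additive score 2..6 mapped to the levels in the ranking table above.
LEVELS = {6: "Very high", 5: "High", 4: "Medium", 3: "Low", 2: "Very low"}

for risk in risks:
    risk["score"] = risk["impact"] + risk["probability"]  # add rather than multiply
    risk["level"] = LEVELS[risk["score"]]

# Rank from highest to lowest score, to prioritise attention.
for risk in sorted(risks, key=lambda r: r["score"], reverse=True):
    print(f'{risk["level"]:9}  {risk["event"]}')
```

The point of the exercise is only the ordering of the list; the scores themselves are rankings, not quantities.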

Define risk mitigation and containment actions

A risk manager should strive to define risk mitigation and containment procedures (aka controls).

Focus most attention on the highest risks, of course.

Before risk event occurrence, take risk avoidance/mitigation actions

Add, update and delete RAID catalogue entries to reflect changed circumstances.

Look forward, look for clues as to likely events.

Is there a trend? Are things getting worse or better?


In the example, you would want to monitor the supplier's financial health.

You want one or more reports on a regular basis about the supplier's finances.

Reports might be classified for simplicity as red/amber/green.
If the reports turn amber or red, you need to take action.

Perhaps make sure the supplier is paid on time; perhaps start shopping around for a new supplier; etc.
Hopefully, taking avoidance/mitigation actions shifts the risk event probability or impact back to low.
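The red/amber/green trigger described above can be sketched as a lookup from report status to mitigation actions. The statuses and actions here are illustrative assumptions, not a prescribed control set:

```python
# Map a supplier's latest financial-health report to mitigation actions.
# The actions listed are the examples given in the text.
ACTIONS = {
    "green": [],
    "amber": ["make sure the supplier is paid on time"],
    "red":   ["make sure the supplier is paid on time",
              "start shopping around for a new supplier"],
}

def actions_for(report_status: str) -> list[str]:
    """Return the mitigation actions triggered by a RAG-classified report."""
    return ACTIONS[report_status.lower()]
```

A green report triggers nothing; amber and red escalate the response.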

Recognise when a risk event occurs as an issue

In the example, the risk event occurrence is the point in time you find the supplier has gone bust.

That is, the moment the risk turns into an issue in the RAID catalogue.

After risk becomes issue, take risk recovery/containment actions

Now, you need to recover (e.g. engage a new key supplier; start retooling; etc.)

You also need to measure the success of the recovery.

The nature of architectural risks

Risk management in an architecture framework

Our concern is business systems, systemisation and system integration, in information-intensive organisations.

TOGAF is an architecture framework for this.

Phases A to F of TOGAF can be seen as the inception phase of a programme.
Phase G is associated with a PMO running projects in that programme.
Phase H is associated with ITSM operating the implemented systems.
So risk management of TOGAF-style EA is risk management of a programme.


Is this scientific?
Ask a large systems integrator - one who carries out hundreds of IS projects every year.
Do you collect statistics of risk events in risk catalogues - type, frequency and impact/cost?
Do you collect statistics of issues in issue logs - type, frequency and impact/cost?
Do you use the latter to judge the effectiveness of the former?
If you can find one who says yes - and they are willing to share their stats - then you have a source for your science, and perhaps fame and fortune.

Relatively quantifiable risks

The risks facing a business, a system, or a project might include the kind that can be predicted with measurable accuracy from historical records:

·         exchange rates will change in the wrong direction

·         interest rates will change in the wrong direction

·         stock prices will change in the wrong direction

·         hardware components will fail

·         network connections will fail

·         software components will fail.

Relatively unquantifiable risks

However, the risks facing the architects of enterprise systems are usually of a different kind.

They are more particular to a specific business, system, or project and so less quantifiable:

·         requirements will turn out to be unclear or unrealistic

·         requirements will change

·         time estimates will prove inaccurate

·         a key deadline will be missed

·         cost estimates will prove inaccurate

·         the project sponsor will be replaced by one with a different agenda

·         our relationship with the customer will be inadequate

·         third party suppliers will fail to deliver

·         technology vendor assurances will prove unreliable.


Beware that people (encouraged by quality standards like ISO 9000 and CMMI) can hide behind a cloak of procedures and methodology.

Getting a quality system auditor to approve what you are doing doesn't mean you know what you are doing.

Following a risk management method doesn’t mean you are doing risk management well.


Q) Is risk management an objective science anywhere?

Risk management in the power generation and investment banking industries is more science than art, we are told.

“The Wall Street Journal published a statement by one Matthew Rothman, financial economist,

expressing his surprise that financial markets experienced a string of events that “would happen once in 10,000 years”.

A portrait of Mr Rothman accompanying the article reveals that he is considerably younger than 10,000 years;

it is therefore fair to assume he is not drawing his inference from his own empirical experience

but from some theoretical model that produces the risk of rare events, or what he perceives to be rare events.”

Financial Times, October 23 2007. Nassim Nicholas Taleb: author of ‘The Black Swan: The Impact of the Highly Improbable’


Risk management in financial institutions has a special meaning

A lot of mathematics goes into banks’ efforts to preserve a healthy debt-to-equity ratio.

But Taleb (above) later blamed over-reliance of managers on back-room analysts for the credit crunch.

No amount of mathematics can replace the judgements managers must make.


Outside of the special meaning above, risk is defined as the probability and impact of an adverse event occurring.


Q) Is risk management an objective science in IT projects?

Experience suggests risk management in such projects is usually more art than science.

Ways that risk management falls short of science include:

·         Some risks are not identified before they appear, surprisingly, as issues.

·         The numbers in a risk management grid are rankings, not quantities.

·         Scoring impact is naïve: it depends on how you choose to juggle separate impacts on cost, time and quality (and perhaps other factors, such as reputation).

·         Scoring probability is naïve in two ways: some risks are not all or nothing; some risks are likely to surface as issues more than once, perhaps many times.

·         Unless a process is repeated, risks are recorded, and outcomes are measured for future reference, any application of numbers to impact and probability is educated guesswork.


All the issues you have ever faced at work could have been listed in a risk catalogue beforehand. But were they?

·         Often, risk catalogues are naive and out of date.

·         It is difficult-to-impossible to identify all risks.

·         Ignorance prevents all risks being identified.

·         Fear and politics prevent all recognised risks from being documented: e.g. "project manager turns out to be ineffectual".


Managers are often:

·         unwilling to document the true degree of a politically unacceptable risk such as underestimation.

·         unable to quantify the effect of mitigating actions (e.g. more training) on reducing the impact of a risk or issue.

·         inclined to follow culture, politics and personal judgements rather than the questionable mathematics of risk analysis.


You may find your subcontractor shows you a public risk catalogue, and maintains a private version for themselves.


Beware the risky shift phenomenon

“When people are in groups, they make decisions about risk differently from when they are alone.”

Corporate governance codes imply that collective decision making is a way to reduce, or at least control, risky random decisions.

You might think that is obvious, but psychology research suggests the reverse.

An individual is likely to make riskier decisions in a group, since the shared risk makes the individual risk less.


Greater risks are chosen due to a diffusion of responsibility, where emotional bonds decrease anxieties and risk is perceived as shared.

High risk-takers are more confident and hence may persuade others to take greater risks.

Social status in groups is often associated with risk-taking, leading people to avoid a low risk position.

As people pay attention to a possible action, they become more familiar and comfortable with it and hence perceive less risk.

Ref. “The Risky Shift Phenomenon”


So, if you want support for a risky decision, then present it for approval to a board full of Feelers.

In the group, accountability is reduced and each Feeler is reassured by the support of other Feelers.


Q) Can we make risk management more scientific?

Successful risk management depends on the knowledge, experience and skills of the risk identifiers and mitigators.


Applying a bureaucratic risk management procedure (e.g. see the appendix) is one thing; doing it to good effect is another.

How can you be sure you have:

·         Identified the most likely risk events?

·         Accurately quantified probability and/or frequency, and impact?

·         Determined the optimal risk mitigation/avoidance and containment/recovery actions?


If you don’t test risk prediction hypotheses, and collect no evidence, then you aren't trying to be scientific.

Across your range of project types and sizes, you ought to:

1.      Catalogue and classify risk event forecasts - with frequency and impact costs for each risk event and risk class.

2.      Catalogue and classify issues that arise - with frequency and impact costs for each issue and issue class.

3.      Compare risk event forecasts against issue logs

4.      Revise your approach to 1 accordingly

5.      Look for fewer/cheaper issues in 2 and closer matches between 1 and 2.
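Steps 3 to 5 can be sketched as a simple set comparison between forecast risk classes and logged issue classes. The class names and data below are hypothetical:

```python
# Classes of risk forecast at project start vs classes of issue actually logged.
forecast_risks = {"supplier failure", "cost underestimate", "key staff leave"}
logged_issues  = {"cost underestimate", "time underestimate",
                  "difficult customer-supplier relationship"}

predicted_and_occurred = forecast_risks & logged_issues   # forecasts that surfaced
surprises              = logged_issues - forecast_risks   # issues never forecast
false_alarms           = forecast_risks - logged_issues   # forecasts that never surfaced

# A crude hit rate to track across projects: what share of logged issues was forecast.
hit_rate = len(predicted_and_occurred) / len(logged_issues)
```

Tracking the hit rate and the surprise set across many projects is what would turn the catalogue into evidence; a single project tells you little.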


If you do this on a systematic basis, you'll surely discover initial risk predictions are usually flaky, and many issues are never mentioned as risks.

But you surely will do risk management better next time.


Without cross-project metrics, there is no science behind risk management.
But yes, collecting metrics and interpreting them is very difficult.

The more a process is repeated, the more quantifiable the impact and probability of risks.

In bidding for one-off projects, in considering one-off risks, the quantification of impact and probability is educated guesswork.

So, you should collect anecdotal evidence that the risk management method or team adds value.


Q) Have you done any research into risks against issues?

Yes. Three general lessons might be drawn from the somewhat informal surveys I carried out.


First, if there are 100 people on an IS project, by the end of it, there are 100 different views of whether it was a success or failure, and what the issues were.

Second, when you aggregate all reports, you may well find the top three issues are:

(1)   the time needed was underestimated

(2)   the cost was underestimated

(3)   the customer-supplier relationship (however good at the start) was difficult.


Third, none of these were given due weight in the risk catalogue ahead of them turning into issues.

What about EA in particular?


Q: Is a value of an EA team to reduce “technical debt” and “lost opportunity costs”?

These metrics are rather fanciful ways to justify investment, and attaching numbers to them is generally implausible.

Still, managers do like to attach numbers (quantified by somebody else) to a decision, so they are worth bearing in mind.


Q: Is a value of an EA team to improve the planning and performance of change projects?

This is certainly a goal, though difficult to measure (as said in the related slide show).


Q: Is a value of an EA team to reduce operational issues (incidents and problems)?

More the role of IT services management than EA? It is one thing to measure problems in system operation, and another to decide what to do with those measures.

What if measurements suggest your system is over-engineered? Should you discard expensive availability and recovery solutions?


Q: So what are you saying?

The EA team should be skilled in identifying and mitigating risks to the success of system operations and to system change projects.

But remember – in this domain – risk management is as much art as science.

Some risks will be missed, risk reduction has costs as well as benefits, and some risks have to be taken.




The papers on the “Enterprise Architecture” page at contain much advice relating to EA.


Footnote: Creative Commons Attribution-No Derivative Works Licence 2.0     07/02/2015 09:49

Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited:” before the start and include this footnote at the end.


No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this work, not derivative works based upon it.

For more information about the licence, see