Measures of size and complexity
Copyright
2014-17 Graham Berrisford. One of more than 100 papers on the
System Theory page at http://avancier.website. Last updated 07/05/2019 14:37
To master complexity implies knowing how to
define, describe and design it.
The
earlier paper Complexity explored
differences in what people mean by complexity.
· Complexity = state variety?
· Complex = chaotic?
· Complex = unpredictable?
· Complex = complicated?
· Complex = adaptable?
· Complex = self-organising?
This paper is about measuring complexity; it lists some candidate metrics for system complexity.
Contents
Measurements
of process size and complexity
Measurements
of component size and complexity
Darwinian evolution creates complexity.
The most complex systems we know of are the organic machines
that have evolved in biology.
And today’s complex business systems are socio-technical
machines that have evolved over centuries by incrementally extending older
ones.
But we cannot wait for evolution to produce changes we want
now or tomorrow.
To master complexity implies knowing how to
define, describe and design it.
On first thinking about it, one is bound to think that complexity has something to do with:
·
variety in the
parts a thing is made of (differentiation)
·
variety in
relationships between those parts (integration)
So to assess the complexity of thing, we need to know
something of its parts and their interrelationships.
Suppose we couple a hundred different parts (each one to the
next) in a chain; that feels like a relatively simple system.
Suppose, instead, we couple two parts in a hundred different
ways; that feels like a some what more complex
system.
It
seems reasonable to suggest that complexity is a measure
of how the parts of a thing are related.
However,
that is only to scratch the surface of what
complexity can mean.
And
it turns out that different people have very different ideas about it.
As W Ross Ashby said, everything we can see in the real world is infinitely complex.
All we can measure are the properties we select for inclusion in our description of it
And there are scores of complexity measures.
Here are a few complexity measures abstracted from paper that follows.
· Structural complexity = variety = number of states (Ashby).
· Structural complexity = inter-component dependencies (Sessions).
· Maximum structural complexity = components * (components - 1) / 2 (Brooks).
· Procedural (cyclomatic) complexity = number of decisions + 1 (Mcabe).
· System complexity = number of variables and the number and the nature of their interactions required to explain the properties and behavior of that system (Ackoff, 2003).
· Complexification formula: For every 25% increase in the complexity of the problem space, there is a 100% increase in the complexity of the solution space (Glass’s Law).
I have proposed two complexity measures of my own:
· Structural complexity = relationships between component types / component types.
· Behavioral complexity = the number of event/state combinations * the average procedural complexity of the rules applied to them.
There are many more (at least 40) possible measures of complexity.
"The diversity of measures that have been proposed indicates that the notions of complexity that we're trying to get at have many different interacting dimensions and probably can’t be captured by a single measurement scale” ("Complexity: A Guided Tour" Chapter 7 "Defining and Measuring Complexity" Melanie Mitchell.)
We can count the number of steps in a process flow.
But what level of granularity are the process steps? If we count the process steps in an architectural specification, then we have a measure only at that level of abstraction.
To compare the size and complexity of different procedures, we need count process steps at the same level of granularity, perhaps the atomic level.
In both cases, the process steps are not defined to a bottom level of granularity and detail.
In software, the process steps are defined only to the level that can be understood and further detailed by the computer.
In human activity systems, the process steps are defined only to the level that can be understood by people with the right level of education, experience and skill.
Sources:
Thomas McCabe reference to be obtained
Dead link? http://www.aivosto.com/project/help/pm-complexity.html
Cyclomatic complexity as defined by Thomas McCabe as a simple measure of the structural complexity of a procedure, and it gives useful results.
Decisions are conditional statements in a procedure’s control flow.
(In Visual Basic
they are If..Then..Else,
Select Case, For..Next, Do..
The minimum cyclomatic complexity is 1, for a procedure with no decisions.
There is no maximum value.
There are guidelines for the complexity of a procedure.
“The original, usual limit for a maximum acceptable value for cyclomatic complexity is 10.
Other values, such as 15 or 20, have also been suggested.
Regardless of the exact limit, if cyclomatic complexity exceeds 20, you should consider it alarming.
Procedures with a high cyclomatic complexity should be simplified or split into several smaller procedures.”
Anecdote: when I started life as a programmer, we commonly wrote procedures with over 1,000 lines of code and perhaps 100 decisions.
That was natural in the context, which was usually the processing of a large sequential data structure.
On the other hand, our programs contained many subroutines, and today these would surely be factored out into distinct modules (classes if you like).
To measure the size of a component, we can add up all the steps in all its processes.
Then, to measure the size of a system, we can add up all the component sizes.
Alternatively, we can count the number and complexity of the services offered by the component.
We can count the components in an architecture or in a system.
The enterprise
architect of a retail business in the
The level of granularity is unclear.
An information container is not as fine-grained as an entity in a data model, but how much larger is it – on average?
To know the size the system described by an architecture, we need to know the size of the gap between the composite structures in the architectural description (say a process step, or information container), and the atomic unit (the executable statement, the data item).
Ashby said that: “a system is any set of variables which he [the observer] selects”.
“A system's variety V measures the number of possible states it can exhibit, and corresponds to the number of independent binary variables.
But in general, the variables used to describe a system are neither binary nor independent.”
Ashby proposed complexity = variety = the number of possible states a system can exhibit.
There are several difficulties with this
definition of complexity:
First, the measure is incalculably large
for any system with a non-trivial state.
Second, many other measures have been
proposed: e.g. see the measures in the next section.
Third, does the measure apply to the control system, the target system, or the real machine, of which only selected variables are controlled?
“every real machine embodies no less than an infinite number of variables,
all but a few of which must of necessity be ignored.” (Ashby in “Design for a Brain”)
tbd
R is the number of relationships.
C is the number of components.
R/C is very simple measure of complexity that can be applied to any component dependency diagram, entity-relationship model, box-line diagram, or node-arc structure.
For example:
Software is written at very specific level of granularity.
The atomic units of software are data items and lines of code.
Ways to measure the scale of a software system are usually underpinned by measures that involve counting these things.
There has been a huge amount of research into measurement of the functional size of a software application.
See Measuring software productivity for discussion of how the COSMIC FSM method has (by international agreement) replaced function point analysis.
What is equivalent way to measure the functional size of a human activity system?
It is difficult to imagine being able to quantify these things in a consistent way between different businesses and differently skilled human resources, but it might be possible.
Card & Agresti defined system design complexity as composite measure of complexity inside procedures and between them.
It measures the complexity of a system design in terms of procedure calls, parameter passing and data use.
Relative system complexity = average of external complexity + internal complexity for all procedures.
RSYSC = average (SC + DC) over all procedures.
SC Structural complexity (the external complexity) for a procedure equals its fan-out squared = SFOUT2.
SFOUT = Structural fan-out = Number of other procedures called.
DC Data complexity (the local or internal complexity) for a procedure = IOvars / (SFOUT + 1)
IOvars = number of input/output variables for a procedure.
RSYSC
So, system designers should aim for a relatively low RSYSC.
This requires a good balance between procedure division, the structure of the call tree and techniques of data read/write.
Dead link? http://www.aivosto.com/project/help/pm-complexity.html for more details of this and some other measures?
The advice there is that system complexity is originally a design-time metric - is not suitable for the evaluation of how difficult it is to change an existing system.
Source: “The Mythical Man-Month: Essays on Software Engineering” by Fred Brooks."
Fred Brooks stated that "Adding manpower to a late software project makes it later."
In a nutshell, assigning more programmers to a project running behind schedule will make it even later, due to the time required for the new programmers to learn about the project, as well as the increased communication overhead.
When N people have to communicate among themselves (without a hierarchy), as N increases, their output M decreases and can even become negative (i.e. the total work remaining at the end of a day is greater than the total work that had been remaining at the beginning of that day, such as when many bugs are created).
For example: given 50 developers, there are 50(50 − 1) / 2 = 1,225 channels of communication.
Brooks surely understated the problem, because several other variables increase with the size of a system development, and each one reduces productivity.
See More is not better.
We can adapt Fred Brooks formula to show that the
potential complexity of a system’s structural rises disproportionately with the
size of the structural.
For example: given 50 components there are 50(50 − 1) / 2 = 1,225 potential interdependencies between them.
For example: given100 components there are 100(100 − 1) / 2 = 4,450 potential interdependencies between them.
Combine this with our structural complexity formula (R/C), and you can see that adding more components disproportionately increases the potential complexity of a system.
See More is not better.
Source: Glass’s Law comes
from Robert Glass’s Fact and Fallacies About Software Engineering.
He did not discover the law.
He actually described it from a paper by Scott Woodfield,
but Glass did more than anybody to publicize the law.
(MIT reference to be obtained, also Turski and Lehman).
Glass’s Law says for every 25% increase in the complexity of the problem space, there is a 100% increase in the complexity of the solution space.
The formula for structural complexity of R/C (relationships divided components), supports Glass Law.
In practice, as new features are added, the network that is the structure of a software application might well increase in complexity at the rate of Glass’ law.
But you can probably constrain the rate of complexity increase by separation of subsystems, and forcing some hierarchical structure onto subsystem inter-communication.
I have yet to check it out, but white papers by Roger Sessions at http://www.objectwatch.com seems relevant to this point.
A difficulty may be that the interfaces between components developed by different suppliers need to be pretty firm at the beginning.
The general trouble with top-down division into subsystems is this: it is only after you have completed the bottom level design that you realise what the higher-level subsystems should have been.
A SOA (at least, a software SOA) is a design at a defined level of granularity.
We can count
Is it possible to define the optimum numbers here, or a formula in which these numbers are related?
People are working on very large scale SOA systems, and coming to conclusions about the SOA complexity metrics mentioned above.
Let us limit our attention to software systems.
You might hope it would be easy to select the description elements to be counted.
"Requirements", "use cases", “user stories” and "features" are not reliably standardised system description elements.
(Given a rule of thumb of 10 to 20 customer requirements per man year, I never had the courage to use it in real estimating.)
“Lines of code” sounds standardised, but different programmers, tackling different problems, trained differently, produce differently weighted lines of code.
And it is easy to (apparently) boost productivity by writing more lines of code.
All data processing systems can be reduced to actions on elementary data items.
In the COSMIC Functional Size Measurement Method, the basic measure of size is the count of data movements (in and out of the software, to and from persistent storage).
That may be the most scientifically respectable measure we have.
Whether a process or system is designed well or badly has an impact on size.
System A can be over-engineered, over-layered and over-distributed compared with system B.
So, a single use case in system A might involve 10 or 100 times more data item movements than would be countable in the more elegant and economical system B.
Huge variations can exist between programmers coding he same process or system, as reported in
Sources:
“The Psychology of Computer Programming”, Gerry Weinberg
<
http://www.compaid.com/caiinternet/ezine/Schofield-LOCValue.pdf> Joe
Schofield
Gerry Weinberg reported a study in which the slowest programmer was 22 times less productive than the fastest.
Joe Schofield compared the sizes of ten programs witten by more than 20 progammers in Java, C and Visual Basic.
(I gather nine programs were written by the 37 programmers and the tenth program by 23 others.)
He compared them in many ways; the table below shows lines of code typically varied by a factor of between 4 and 22.
Lines of code |
Program 1 |
Program 2 |
Program 3 |
Program 4 |
Program 5 |
Program 6 |
Program 7 |
Program 8 |
Program 9 |
Program 10 |
Smallest
program |
13 |
15 |
20 |
21 |
22 |
23 |
25 |
25 |
27 |
497 |
Largest
program |
289 |
336 |
311 |
383 |
221 |
306 |
190 |
284 |
202 |
2044 |
Small/large
ratio |
22 |
22 |
16 |
18 |
10 |
13 |
8 |
11 |
7 |
4 |
The variation tends to decrease with increasing program size (though the mean and standard deviation do not follow that pattern).
But even for the largest program, the variation was a factor over 4.
Perhaps the variation between program sizes correlates with program complexity.
Moreover, systems A and B may make differing uses of “infrastructure” systems (message brokers, database management systems, operating systems etc).
And what about the number of client devices and server devices the system runs on?
Should we count the infrastructure elements in the size of the system? If not, why not?
Yes, we can measure the scale and complexity of an architecture or a system in various ways.
We can count components and the relationships between them, we can count process steps and decisions.
But it is far from clear there is an agreement about what measures to use and when.
There is a book that makes a thorough mathematical examination of 98 proposed measures for structural intra-modular complexity.
Horst Zuse
(1991) Software Complexity. Measures and Methods. Walter de Gruyter.
References
Ref. 1: “The Mythical Man-Month: Essays on Software Engineering” by Fred Brooks
Ref. 2: “Agile Traceability” in the Library at http://avancier.co.uk
Ref. 3: “Software is not Hardware” in the Library at http://avancier.co.uk
Ref. 4: “When agile becomes fragile” http://billyonopensource.blogspot.com/2008/04/when-agile-becomes-fragile.html
Footnote:
Creative Commons Attribution-No Derivative Works Licence 2.0 07/05/2019 14:37
Attribution: You may copy,
distribute and display this copyrighted work only if you clearly credit
“Avancier Limited: http://avancier.website”
before the start and include this footnote at the end.
No Derivative Works:
You may copy, distribute, display only complete and verbatim copies of this
work, not derivative works based upon it.
For more information about the licence, see http://creativecommons.org