Measures of size and complexity

Copyright 2014-17 Graham Berrisford. One of more than 100 papers on the System Theory page at http://avancier.website. Last updated 07/05/2019 14:37

 

To master complexity implies knowing how to define, describe and design it.

The earlier paper Complexity explored differences in what people mean by complexity.

 

·         Complexity = state variety?

·         Complex = chaotic?

·         Complex = unpredictable?

·         Complex = complicated?

·         Complex = adaptable?

·         Complex = self-organising?

 

This paper is about measuring complexity; it lists some candidate metrics for system complexity.

Contents

The challenge

Measurements of process size and complexity

Measurements of component size and complexity

Software system complexity

Reservations

 

The challenge

Darwinian evolution creates complexity.

The most complex systems we know of are the organic machines that have evolved in biology.

And today’s complex business systems are socio-technical machines that have evolved over centuries by incrementally extending older ones.

But we cannot wait for evolution to produce changes we want now or tomorrow.

To master complexity implies knowing how to define, describe and design it.

 

On first thinking about it, one is bound to think that complexity has something to do with:

·         variety in the parts a thing is made of (differentiation)

·         variety in relationships between those parts (integration)

 

So to assess the complexity of a thing, we need to know something of its parts and their interrelationships.

Suppose we couple a hundred different parts (each one to the next) in a chain; that feels like a relatively simple system.

Suppose, instead, we couple two parts in a hundred different ways; that feels like a somewhat more complex system.

It seems reasonable to suggest that complexity is a measure of how the parts of a thing are related.

However, that is only to scratch the surface of what complexity can mean.

And it turns out that different people have very different ideas about it.

 

As W Ross Ashby said, everything we can see in the real world is infinitely complex.

All we can measure are the properties we select for inclusion in our description of it.

And there are scores of complexity measures.

 

Here are a few complexity measures abstracted from the sections that follow.

·         Structural complexity = variety = number of states (Ashby).

·         Structural complexity = inter-component dependencies (Sessions).

·         Maximum structural complexity = components * (components - 1) / 2  (Brooks).

·         Procedural (cyclomatic) complexity = number of decisions + 1 (McCabe).

·         System complexity = the number of variables and the number and nature of their interactions required to explain the properties and behavior of that system (Ackoff, 2003).

·         Complexification formula: For every 25% increase in the complexity of the problem space, there is a 100% increase in the complexity of the solution space (Glass’s Law).

 

I have proposed two complexity measures of my own:

·         Structural complexity = relationships between component types / component types.

·         Behavioral complexity = the number of event/state combinations * the average procedural complexity of the rules applied to them.
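
To illustrate the arithmetic of the second measure, here is a minimal sketch in Python; the state/event table, the rules and their complexity values are invented purely for the example, not taken from any real system.

# Behavioral complexity, per the formula above:
#   (number of event/state combinations) * (average procedural complexity
#    of the rules applied to those combinations)
# The table below is invented purely to illustrate the calculation.
rules = {
    ("open", "payment received"): 3,   # cyclomatic complexity of the rule
    ("open", "cancelled"):        2,
    ("paid", "dispatched"):       4,
    ("paid", "cancelled"):        5,
}

combinations = len(rules)
average_rule_complexity = sum(rules.values()) / combinations
behavioral_complexity = combinations * average_rule_complexity
print(behavioral_complexity)   # 4 combinations * 3.5 average = 14.0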

 

There are many more (at least 40) possible measures of complexity.

“The diversity of measures that have been proposed indicates that the notions of complexity that we’re trying to get at have many different interacting dimensions and probably can’t be captured by a single measurement scale.” (Melanie Mitchell, “Complexity: A Guided Tour”, Chapter 7, “Defining and Measuring Complexity”.)

Measurements of process size and complexity

Procedure size = number of steps

We can count the number of steps in a process flow.

But what level of granularity are the process steps? If we count the process steps in an architectural specification, then we have a measure only at that level of abstraction.

 

To compare the size and complexity of different procedures, we need to count process steps at the same level of granularity, perhaps the atomic level.

 

  • In software, the atomic process steps are lines of code written by programmers (the code that people have to write and maintain – not all the code in the system as a whole).
  • In a human activity system, the equivalent must be the step we can ask a human to perform without further instruction.

 

In both cases, the process steps are not defined to a bottom level of granularity and detail.

In software, the process steps are defined only to the level that can be understood and further detailed by the computer.

In human activity systems, the process steps are defined only to the level that can be understood by people with the right level of education, experience and skill.

Procedure complexity = number of decisions + 1

Sources:

Thomas McCabe reference to be obtained

Dead link? http://www.aivosto.com/project/help/pm-complexity.html

 

Cyclomatic complexity was defined by Thomas McCabe as a simple measure of the structural complexity of a procedure, and it gives useful results.

 

Decisions are conditional statements in a procedure’s control flow.

(In Visual Basic they are If..Then..Else, Select Case, For..Next, Do..Loop, While..Wend/End While, Catch and When.)

The minimum cyclomatic complexity is 1, for a procedure with no decisions.

There is no maximum value.
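
As an illustration (not from McCabe’s paper or the source above), here is a minimal Python sketch that approximates cyclomatic complexity by counting branching constructs in a function’s syntax tree; the set of constructs counted is an assumption, and a production tool would also handle boolean operators and other branch-creating forms.

import ast

# Rough approximation: cyclomatic complexity = number of decisions + 1,
# counting these branching constructs as decisions (an assumption).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))
    return decisions + 1          # minimum is 1, for a straight-line procedure

example = """
def classify(order):
    if order.total > 1000:            # decision 1
        band = "large"
    elif order.total > 100:           # decision 2
        band = "medium"
    else:
        band = "small"
    for line in order.lines:          # decision 3
        if line.quantity == 0:        # decision 4
            raise ValueError("empty order line")
    return band
"""
print(cyclomatic_complexity(example))   # 5 = 4 decisions + 1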

 

There are guidelines for the complexity of a procedure.

“The original, usual limit for a maximum acceptable value for cyclomatic complexity is 10.

Other values, such as 15 or 20, have also been suggested.

Regardless of the exact limit, if cyclomatic complexity exceeds 20, you should consider it alarming.

Procedures with a high cyclomatic complexity should be simplified or split into several smaller procedures.”

 

Anecdote: when I started life as a programmer, we commonly wrote procedures with over 1,000 lines of code and perhaps 100 decisions.

That was natural in the context, which was usually the processing of a large sequential data structure.

On the other hand, our programs contained many subroutines, and today these would surely be factored out into distinct modules (classes if you like).

 

Measurements of component size and complexity

Structural size

To measure the size of a component, we can add up all the steps in all its processes.

Then, to measure the size of a system, we can add up all the component sizes.

Alternatively, we can count the number and complexity of the services offered by the component.

Structural population

We can count the components in an architecture or in a system.

The enterprise architect of a retail business in the USA tells me that they have a business architecture model with more than 150 business functions and more than 500 information containers.

The level of granularity is unclear.

An information container is not as fine-grained as an entity in a data model, but how much larger is it – on average?

 

To know the size of the system described by an architecture, we need to know the size of the gap between the composite structures in the architectural description (say, a process step or an information container) and the atomic units (the executable statement, the data item).

Structural complexity = variety = number of states (Ashby)

Ashby said that: “a system is any set of variables which he [the observer] selects”. 

“A system's variety V measures the number of possible states it can exhibit, and corresponds to the number of independent binary variables.

But in general, the variables used to describe a system are neither binary nor independent.”

 

Ashby proposed complexity = variety = the number of possible states a system can exhibit.

There are several difficulties with this definition of complexity:

 

First, the measure is incalculably large for any system with a non-trivial state.

 

Second, many other measures have been proposed: e.g. see the measures in the next section.

 

Third, does the measure apply to the control system, the target system, or the real machine, of which only selected variables are controlled?

“Every real machine embodies no less than an infinite number of variables, all but a few of which must of necessity be ignored.” (Ashby in “Design for a Brain”)

Structural complexity = number of dependencies (Sessions)

tbd

Structural complexity = R/C

R is the number of relationships.

C is the number of components.

R/C is a very simple measure of complexity that can be applied to any component dependency diagram, entity-relationship model, box-line diagram, or node-arc structure.

For example:

  • given 50 components and 50 dependencies, the complexity measure is 1.
  • given 50 nodes and 100 arcs, the complexity measure is 2.
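
A minimal sketch of the R/C calculation, assuming the dependency diagram is held as a list of (from, to) pairs; the component names below are invented for illustration.

# R/C: number of relationships divided by number of components,
# applied to a node-arc structure. The pairs below are illustrative only.
dependencies = [
    ("orders", "pricing"), ("orders", "stock"),
    ("pricing", "stock"), ("billing", "orders"),
    ("billing", "pricing"), ("reporting", "orders"),
]

components = {name for pair in dependencies for name in pair}

R = len(dependencies)   # relationships (arcs)
C = len(components)     # components (nodes)
print(f"R = {R}, C = {C}, complexity R/C = {R / C:.1f}")   # 6 / 5 = 1.2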

Functional size

Software is written at a very specific level of granularity.

The atomic units of software are data items and lines of code.

Ways to measure the scale of a software system are usually underpinned by measures that involve counting these things.

  • External view - the data items input to and output from a software component.
  • Internal view - the lines of code within a software component.

 

There has been a huge amount of research into measurement of the functional size of a software application.

See Measuring software productivity for discussion of how the COSMIC FSM method has (by international agreement) replaced function point analysis.

 

What is the equivalent way to measure the functional size of a human activity system?

  • External view - the products and services input to and output from the business.
  • Internal view - the steps in the business procedures followed by its employees.

 

It is difficult to imagine being able to quantify these things in a consistent way between different businesses and differently skilled human resources, but it might be possible.

Relative system complexity (RSYSC)

Card & Agresti defined system design complexity as a composite measure of complexity inside procedures and between them.

It measures the complexity of a system design in terms of procedure calls, parameter passing and data use.

 

Relative system complexity = the average of (external complexity + internal complexity) over all procedures.

 

RSYSC = average (SC + DC) over all procedures.

SC Structural complexity (the external complexity) for a procedure equals its fan-out squared = SFOUT².

SFOUT = Structural fan-out = Number of other procedures called.

DC Data complexity (the local or internal complexity) for a procedure  = IOvars / (SFOUT + 1)

IOvars = number of input/output variables for a procedure.
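
A minimal sketch of that arithmetic, assuming each procedure is described only by its fan-out and its count of input/output variables; the procedure names and numbers below are invented for illustration.

# Relative system complexity (Card & Agresti), per the definitions above:
#   SC = SFOUT squared               (structural, external complexity)
#   DC = IOvars / (SFOUT + 1)        (data, internal complexity)
#   RSYSC = average of (SC + DC) over all procedures
procedures = {
    "validate_order": {"fan_out": 3, "io_vars": 8},
    "price_order":    {"fan_out": 1, "io_vars": 6},
    "store_order":    {"fan_out": 0, "io_vars": 4},
}

def rsysc(procs):
    totals = []
    for p in procs.values():
        sc = p["fan_out"] ** 2
        dc = p["io_vars"] / (p["fan_out"] + 1)
        totals.append(sc + dc)
    return sum(totals) / len(totals)

print(round(rsysc(procedures), 2))   # (11 + 4 + 4) / 3 = 6.33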

 

RSYSC

  • measures the average complexity of procedures in a system.
  • does not depend on the system size.
  • has been found to be a good predictor of the error rate in a system (a high RSYSC predicts lots of errors per lines of code).

 

So, system designers should aim for a relatively low RSYSC.

This requires a good balance between procedure division, the structure of the call tree and techniques of data read/write.

 

Dead link? http://www.aivosto.com/project/help/pm-complexity.html for more details of this and some other measures?

The advice there is that system complexity is originally a design-time metric; it is not suitable for evaluating how difficult it is to change an existing system.

Group intercommunication = n(n − 1) / 2

Source: “The Mythical Man-Month: Essays on Software Engineering” by Fred Brooks.

Fred Brooks stated that "Adding manpower to a late software project makes it later."

In a nutshell, assigning more programmers to a project running behind schedule will make it even later, due to the time required for the new programmers to learn about the project, as well as the increased communication overhead.

 

When N people have to communicate among themselves (without a hierarchy), as N increases, their output M decreases and can even become negative (i.e. the total work remaining at the end of a day is greater than the total work that had been remaining at the beginning of that day, such as when many bugs are created).

For example: given 50 developers, there are 50(50 − 1) / 2 = 1,225 channels of communication.

 

Brooks surely understated the problem, because several other variables increase with the size of a system development, and each one reduces productivity.

See More is not better.

Potential structural complexity = n(n − 1) / 2

We can adapt Fred Brooks’ formula to show that the potential complexity of a system’s structure rises disproportionately with the size of that structure.

For example: given 50 components there are 50(50 − 1) / 2 = 1,225 potential interdependencies between them.

For example: given 100 components there are 100(100 − 1) / 2 = 4,950 potential interdependencies between them.

 

Combine this with our structural complexity formula (R/C), and you can see that adding more components disproportionately increases the potential complexity of a system.

See More is not better.

The relationship of functional size to complexity

Source: Glass’s Law comes from Robert Glass’s “Facts and Fallacies of Software Engineering”.

Glass did not discover the law; he took it from a paper by Scott Woodfield, but he did more than anybody to publicize it.

(MIT reference to be obtained, also Turski and Lehman).

 

Glass’s Law says for every 25% increase in the complexity of the problem space, there is a 100% increase in the complexity of the solution space.

 

The formula for structural complexity, R/C (relationships divided by components), supports Glass’s Law, as the sketch after the list below illustrates.

  • You can increase the number of elements in a flat list of components without increasing its structural complexity.
  • You can increase the number of elements in hierarchical structure with only a little increase in structural complexity (though the increase in top-to-bottom depth has unfortunate side effects).
  • You can’t increase the size of a network structure without greatly increasing the potential structural complexity.
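
To make the contrast concrete, here is a minimal sketch (my own illustration, not taken from Glass or Woodfield) comparing R/C for a chain, a strict hierarchy and a fully connected network as the number of components grows.

# R/C for three idealised structures of n components:
#   chain or hierarchy : n - 1 relationships (one link per added component)
#   full mesh network  : n(n - 1) / 2 relationships (every pair coupled)
def r_over_c(n: int, structure: str) -> float:
    relationships = {
        "chain": n - 1,
        "hierarchy": n - 1,
        "full mesh": n * (n - 1) // 2,
    }[structure]
    return relationships / n

for n in (10, 50, 100):
    print(n,
          round(r_over_c(n, "hierarchy"), 2),   # stays just below 1
          round(r_over_c(n, "full mesh"), 2))   # grows roughly as n / 2

The chain and hierarchy stay close to a complexity of 1 however many components are added, while the fully connected network’s complexity grows roughly in proportion to the number of components.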

 

In practice, as new features are added, the network that is the structure of a software application might well increase in complexity at the rate of Glass’s Law.

But you can probably constrain the rate of complexity increase by separation of subsystems, and forcing some hierarchical structure onto subsystem inter-communication.

 

I have yet to check it out, but white papers by Roger Sessions at http://www.objectwatch.com seem relevant to this point.

A difficulty may be that the interfaces between components developed by different suppliers need to be pretty firm at the beginning.

The general trouble with top-down division into subsystems is this: it is only after you have completed the bottom level design that you realise what the higher-level subsystems should have been.

SOA size and structural complexity metrics

A SOA (at least, a software SOA) is a design at a defined level of granularity.

We can count

  • Lines of code per operation
  • Number of operations per component (or web service)
  • Number of components in a structure.

 

Is it possible to define the optimum numbers here, or a formula in which these numbers are related?

People are working on very large scale SOA systems, and coming to conclusions about the SOA complexity metrics mentioned above.

 

Software system complexity

Let us limit our attention to software systems.

You might hope it would be easy to select the description elements to be counted.

"Requirements", "use cases", “user stories” and "features" are not reliably standardised system description elements.

(There is a rule of thumb of 10 to 20 customer requirements per man-year, but I never had the courage to use it in real estimating.)

“Lines of code” sounds standardised, but different programmers, tackling different problems, trained differently, produce differently weighted lines of code.

And it is easy to (apparently) boost productivity by writing more lines of code.

 

All data processing systems can be reduced to actions on elementary data items.

In the COSMIC Functional Size Measurement Method, the basic measure of size is the count of data movements (in and out of the software, to and from persistent storage).

That may be the most scientifically respectable measure we have.
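
As a rough sketch of how such a count works (an illustration only, not the official COSMIC measurement procedure), one can tally the four COSMIC data-movement types, Entry, Exit, Read and Write, for a functional process; each data movement of a data group counts as one COSMIC function point. The use case and data groups below are invented.

# Rough illustration of a COSMIC-style size count: one point per data movement.
from collections import Counter

# (data group, movement type) pairs for a hypothetical "place order" process.
movements = [
    ("order request",      "Entry"),
    ("customer record",    "Read"),
    ("product prices",     "Read"),
    ("order",              "Write"),
    ("order confirmation", "Exit"),
    ("error message",      "Exit"),
]

print(Counter(kind for _, kind in movements))   # Read and Exit twice, Entry and Write once
print(len(movements), "CFP")                    # 6 CFP: one per data movement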

 

Whether a process or system is designed well or badly has an impact on size.

System A can be over-engineered, over-layered and over-distributed compared with system B.

So, a single use case in system A might involve 10 or 100 times more data item movements than would be countable in the more elegant and economical system B.

 

Huge variations can exist between programmers coding the same process or system, as reported in these sources:

“The Psychology of Computer Programming”, Gerry Weinberg

Joe Schofield, http://www.compaid.com/caiinternet/ezine/Schofield-LOCValue.pdf

 

Gerry Weinberg reported a study in which the slowest programmer was 22 times less productive than the fastest.

Joe Schofield compared the sizes of ten programs written by more than 20 programmers in Java, C and Visual Basic.

(I gather nine programs were written by the 37 programmers and the tenth program by 23 others.)

He compared them in many ways; the table below shows lines of code typically varied by a factor of between 4 and 22.

 

Lines of code:

                  Smallest program   Largest program   Small/large ratio
Program 1                13                289                 22
Program 2                15                336                 22
Program 3                20                311                 16
Program 4                21                383                 18
Program 5                22                221                 10
Program 6                23                306                 13
Program 7                25                190                  8
Program 8                25                284                 11
Program 9                27                202                  7
Program 10              497               2044                  4

 

The variation tends to decrease with increasing program size (though the mean and standard deviation do not follow that pattern).

But even for the largest program, the variation was a factor of more than 4.

Perhaps the variation between program sizes correlates with program complexity.

 

Moreover, systems A and B may make differing uses of “infrastructure” systems (message brokers, database management systems, operating systems etc). 

And what about the number of client devices and server devices the system runs on?

Should we count the infrastructure elements in the size of the system? If not, why not?

Reservations

Yes, we can measure the scale and complexity of an architecture or a system in various ways.

We can count components and the relationships between them, we can count process steps and decisions.

But it is far from clear that there is agreement about which measures to use and when.

 

There is a book that makes a thorough mathematical examination of 98 proposed measures for structural intra-modular complexity.

Horst Zuse (1991) “Software Complexity: Measures and Methods”. Walter de Gruyter, Berlin/New York.

 

 

References

Ref. 1: “The Mythical Man-Month: Essays on Software Engineering” by Fred Brooks

Ref. 2: “Agile Traceability” in the Library at http://avancier.co.uk

Ref. 3: “Software is not Hardware” in the Library at http://avancier.co.uk

Ref. 4: “When agile becomes fragile” http://billyonopensource.blogspot.com/2008/04/when-agile-becomes-fragile.html

 

 

Footnote: Creative Commons Attribution-No Derivative Works Licence 2.0     07/05/2019 14:37

Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.website” before the start and include this footnote at the end.

No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this work, not derivative works based upon it.

For more information about the licence, see http://creativecommons.org