Measures of size and complexity

This is published under the terms of the licence summarized in the footnote.

All free-to-read materials on the Avancier web site are paid for out of income from Avancier’s training courses and methods licences.

If you find them helpful, please spread the word and link to the site in whichever social media you use.

 

As W Ross Ashby said, real machines and operational systems are infinitely complex.

All we can measure are the properties we select for inclusion in our description of a system.

This paper lists some candidate metrics for system complexity.

"The diversity of measures that have been proposed indicates that the notions of complexity that we're trying to get at have many different interacting dimensions and probably can’t be captured by a single measurement scale”

"Complexity: A Guided Tour" Chapter 7 "Defining and  Measuring Complexity" Melanie Mitchell.

 

Here are a few of the measures discussed in this paper.

·         Procedural (cyclomatic) complexity = number of decisions + 1 (McCabe)

·         Structural complexity = variety = number of states (Ashby)

·         Structural complexity = inter-component dependencies (Sessions)

·         Maximum structural complexity = components * (components - 1) / 2  (after Brooks)

·         Structural complexity = relationship types / component types (my own measure)

·         Complexification formula: For every 25% increase in the complexity of the problem space, there is a 100% increase in the complexity of the solution space (Glass’s Law).

Contents

Measuring size

Measurements of process size and complexity

Measurements of component size and complexity

Reservations

 

Measuring size

We can measure the physical size of a system by height or weight or volume.

Measuring the logical size involves counting system elements.

But there is no one definitive measure of an operational system's size.

You can only measure size by reference to the elements in your chosen system description.

And every description is a subjective abstraction from a reality.

To compare the sizes of two human bodies, do you count organs, cells, molecules, atoms or electrons?

Whichever elements you choose to count, you are not counting some other kind of element.

 

To compare systems A and B on size, you have to standardise the elements of their descriptions.

Let us limit our attention to software systems (automated data processing systems).

You might hope it would be easy to select the description elements to be counted.

"Requirements", "use cases", “user stories” and "features" are not reliably standardised system description elements.

(Given a rule of thumb of 10 to 20 customer requirements per man year, I never had the courage to use it in real estimating.)

“Lines of code” sounds standardised, but different programmers, tackling different problems, trained differently, produce differently weighted lines of code.

And it is easy to (apparently) boost productivity by writing more lines of code.

 

All data processing systems can be reduced to actions on elementary data items.

In the COSMIC Functional Size Measurement Method, the basic measure of size is the count of data movements (in and out of the software, to and from persistent storage).

That may be the most scientifically respectable measure we have.
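To illustrate only the counting idea, here is a toy sketch in Python (my own illustration, not the COSMIC method itself, which defines strict rules for identifying functional processes and data movements). COSMIC recognises four kinds of data movement, each worth one COSMIC Function Point (CFP):

# Toy sketch of COSMIC-style counting (assumed example, not the official method).
# The four data movement types are Entry, Exit, Read and Write, 1 CFP each.
movements_per_process = {
    "place order": ["Entry", "Read", "Write", "Exit"],
    "query order": ["Entry", "Read", "Exit"],
}
size_cfp = sum(len(moves) for moves in movements_per_process.values())
print(size_cfp)  # 7 CFP for these two functional processes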

 

Whether a process or system is designed well or badly has an impact on size.

System A can be over-engineered, over-layered and over-distributed compared with system B.

So, a single use case in system A might involve 10 or 100 times more data item movements than would be countable in the more elegant and economical system B.

 

Huge variations can exist between programmers coding the same process or system, as reported in these sources:

“The Psychology of Computer Programming”, Gerry Weinberg

Joe Schofield: http://www.compaid.com/caiinternet/ezine/Schofield-LOCValue.pdf

 

Gerry Weinberg reported a study in which the fastest programmer was 22 times more productive than the slowest.

Joe Schofield compared the sizes of ten programs written by more than 20 programmers in Java, C and Visual Basic.

(I gather nine programs were written by the 37 programmers and the tenth program by 23 others.)

He compared them in many ways; the table below shows lines of code typically varied by a factor of between 4 and 22.

 

Lines of code        Program 1  Program 2  Program 3  Program 4  Program 5  Program 6  Program 7  Program 8  Program 9  Program 10
Smallest program            13         15         20         21         22         23         25         25         27         497
Largest program            289        336        311        383        221        306        190        284        202        2044
Small/large ratio           22         22         16         18         10         13          8         11          7           4

 

The variation tends to decrease with increasing program size (though the mean and standard deviation do not follow that pattern).

But even for the largest program, the variation was a factor of more than 4.

Perhaps the variation between program sizes correlates with program complexity.

 

Moreover, systems A and B may make differing uses of “infrastructure” systems (message brokers, database management systems, operating systems etc). 

And what about the number of client devices and server devices the system runs on?

Should we count the infrastructure elements in the size of the system? If not, why not?

Measurements of process size and complexity

Procedure size = number of steps

We can count the number of steps in a process flow.

But at what level of granularity are the process steps defined? If we count the process steps in an architectural specification, then we have a measure only at that level of abstraction.

 

To compare the size and complexity of different procedures, we need to count process steps at the same level of granularity, perhaps the atomic level.

 

  • In software, the atomic process steps are lines of code written by programmers (the code that people have to write and maintain – not all the code in the system as a whole).
  • In a human activity system, the equivalent must be the step we can ask a human to perform without further instruction.

 

In both cases, the process steps are not defined to a bottom level of granularity and detail.

In software, the process steps are defined only to the level that can be understood and further detailed by the computer.

In human activity systems, the process steps are defined only to the level that can be understood by people with the right level of education, experience and skill.

Procedure complexity = number of decisions + 1

Sources:

Thomas McCabe reference to be obtained

Dead link? http://www.aivosto.com/project/help/pm-complexity.html

 

Cyclomatic complexity was defined by Thomas McCabe as a simple measure of the structural complexity of a procedure, and it gives useful results.

 

Decisions are conditional statements in a procedure’s control flow.

(In Visual Basic they are If..Then..Else, Select Case, For..Next, Do..Loop, While..Wend/End While, Catch and When.)

The minimum cyclomatic complexity is 1, for a procedure with no decisions.

There is no maximum value.

 

There are guidelines for the complexity of a procedure.

“The original, usual limit for a maximum acceptable value for cyclomatic complexity is 10.

Other values, such as 15 or 20, have also been suggested.

Regardless of the exact limit, if cyclomatic complexity exceeds 20, you should consider it alarming.

Procedures with a high cyclomatic complexity should be simplified or split into several smaller procedures.”

 

Anecdote: when I started life as a programmer, we commonly wrote procedures with over 1,000 lines of code and perhaps 100 decisions.

That was natural in the context, which was usually the processing of a large sequential data structure.

On the other hand, our programs contained many subroutines, and today these would surely be factored out into distinct modules (classes if you like).

 

Measurements of component size and complexity

Structural size

To measure the size of a component, we can add up all the steps in all its processes.

Then, to measure the size of a system, we can add up all the component sizes.

Alternatively, we can count the number and complexity of the services offered by the component.

Structural population

We can count the components in an architecture or in a system.

The enterprise architect of a retail business in the USA tells me that they have a business architecture model with more than 150 business functions and more than 500 information containers.

The level of granularity is unclear.

An information container is not as fine-grained as an entity in a data model, but how much larger is it, on average?

 

To know the size of the system described by an architecture, we need to know the size of the gap between the composite structures in the architectural description (say, a process step or an information container) and the atomic units (the executable statement, the data item).

Structural complexity = variety = number of states (Ashby)

tbd

Structural complexity = number of dependencies (Sessions)

tbd

Structural complexity = R/C

R is the number of relationships.

C is the number of components.

R/C is a very simple measure of complexity that can be applied to any component dependency diagram, entity-relationship model, box-line diagram or node-arc structure.

For example:

  • given 50 components and 50 dependencies, the complexity measure is 1.
  • given 50 nodes and 100 arcs, the complexity measure is 2.
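A minimal sketch of the calculation, given any node-arc structure (the representation as name lists and pairs is an assumption for illustration):

def structural_complexity(components, relationships):
    """R/C: number of relationships divided by number of components."""
    return len(relationships) / len(components)

components = ["A", "B", "C", "D"]
relationships = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]
print(structural_complexity(components, relationships))  # 5 / 4 = 1.25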

Functional size

Software is written at a very specific level of granularity.

The atomic units of software are data items and lines of code.

Ways to measure the scale of a software system are usually underpinned by measures that involve counting these things.

  • External view - the data items input to and output from a software component.
  • Internal view - the lines of code within a software component.

 

There has been a huge amount of research into measurement of the functional size of a software application.

See "Measuring software productivity" for discussion of how the COSMIC FSM method has (by international agreement) replaced function point analysis.

 

What is the equivalent way to measure the functional size of a human activity system?

  • External view - the products and services input to and output from the business.
  • Internal view - the steps in the business procedures followed by its employees.

 

It is difficult to imagine being able to quantify these things in a consistent way between different businesses and differently skilled human resources, but it might be possible.

Relative system complexity (RSYSC)

Card & Agresti defined system design complexity as composite measure of complexity inside procedures and between them.

It measures the complexity of a system design in terms of procedure calls, parameter passing and data use.

 

Relative system complexity = average of external complexity + internal complexity for all procedures.

 

RSYSC = average of (SC + DC) over all procedures, where:

SC = structural (external) complexity of a procedure = SFOUT²

SFOUT = structural fan-out = number of other procedures called by the procedure

DC = data (local or internal) complexity of a procedure = IOvars / (SFOUT + 1)

IOvars = number of input/output variables for the procedure
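A minimal sketch of the calculation, assuming each procedure is summarised by its fan-out and its count of input/output variables (the data structure and function name are mine for illustration, not Card and Agresti's):

def rsysc(procedures):
    """Relative system complexity: average of SC + DC over all procedures,
    where SC = SFOUT**2 and DC = IOvars / (SFOUT + 1)."""
    total = 0.0
    for fan_out, io_vars in procedures:
        sc = fan_out ** 2             # structural (external) complexity
        dc = io_vars / (fan_out + 1)  # data (internal) complexity
        total += sc + dc
    return total / len(procedures)

# Each tuple is (SFOUT, IOvars) for one procedure.
print(rsysc([(2, 6), (0, 3), (1, 4)]))  # ((4+2) + (0+3) + (1+2)) / 3 = 4.0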

 

RSYSC

  • measures the average complexity of procedures in a system.
  • does not depend on the system size.
  • has been found to be a good predictor of the error rate in a system (a high RSYSC predicts lots of errors per lines of code).

 

So, system designers should aim for a relatively low RSYSC.

This requires a good balance between procedure division, the structure of the call tree and techniques of data read/write.

 

See http://www.aivosto.com/project/help/pm-complexity.html (dead link?) for more details of this and some other measures.

The advice there is that system complexity is originally a design-time metric; it is not suitable for evaluating how difficult it is to change an existing system.

Group intercommunication = n(n − 1) / 2

Source: “The Mythical Man-Month: Essays on Software Engineering” by Fred Brooks.

Fred Brooks stated that "Adding manpower to a late software project makes it later."

In a nutshell, assigning more programmers to a project running behind schedule will make it even later, due to the time required for the new programmers to learn about the project, as well as the increased communication overhead.

 

When N people have to communicate among themselves (without a hierarchy), then as N increases, their output decreases and can even become negative (i.e. the total work remaining at the end of a day is greater than the total work that had been remaining at the beginning of that day, as when many bugs are created).

For example: given 50 developers, there are 50(50 − 1) / 2 = 1,225 channels of communication.

 

Brooks surely understated the problem, because several other variables increase with the size of a system development, and each one reduces productivity.

See More is not better.

Potential structural complexity = n(n − 1) / 2

We can adapt Fred Brooks' formula to show that the potential complexity of a system's structure rises disproportionately with the size of the structure.

For example: given 50 components there are 50(50 − 1) / 2 = 1,225 potential interdependencies between them.

For example: given 100 components there are 100(100 − 1) / 2 = 4,950 potential interdependencies between them.
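A quick sketch showing how the count of potential links (communication channels between people, or dependencies between components) grows with n:

def potential_links(n):
    """Maximum number of pairwise links among n elements: n(n-1)/2."""
    return n * (n - 1) // 2

for n in (10, 50, 100):
    print(n, potential_links(n))  # 10 -> 45, 50 -> 1225, 100 -> 4950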

 

Combine this with our structural complexity formula (R/C), and you can see that adding more components disproportionately increases the potential complexity of a system.

See More is not better.

The relationship of functional size to complexity

Source: Glass's Law comes from Robert Glass's "Facts and Fallacies of Software Engineering".

He did not discover the law; he reported it from a paper by Scott Woodfield, but Glass did more than anybody to publicise it.

(MIT reference to be obtained, also Turski and Lehman).

 

Glass's Law says that for every 25% increase in the complexity of the problem space, there is a 100% increase in the complexity of the solution space.
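Taken literally, the law implies a power-law relationship between the two; a rough check of the implied exponent (my own reading of the law, not Glass's):

import math

# If a 25% increase in problem complexity doubles solution complexity,
# then solution ~ problem**k where 1.25**k = 2.
k = math.log(2) / math.log(1.25)
print(round(k, 2))  # about 3.11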

 

The structural complexity formula R/C (relationships divided by components) supports Glass's Law.

  • You can increase the number of elements in a flat list of components without increasing structural complexity.
  • You can increase the number of elements in hierarchical structure with only a little increase in structural complexity (though the increase in top-to-bottom depth has unfortunate side effects).
  • You can’t increase the size of a network structure without greatly increasing the potential structural complexity.

 

In practice, as new features are added, the network that is the structure of a software application might well increase in complexity at the rate of Glass's Law.

But you can probably constrain the rate of complexity increase by separating subsystems and forcing some hierarchical structure onto subsystem intercommunication.

 

I have yet to check them out, but white papers by Roger Sessions at http://www.objectwatch.com seem relevant to this point.

A difficulty may be that the interfaces between components developed by different suppliers need to be pretty firm at the beginning.

The general trouble with top-down division into subsystems is this: it is only after you have completed the bottom level design that you realise what the higher-level subsystems should have been.

SOA size and structural complexity metrics

A SOA (at least, a software SOA) is a design at a defined level of granularity.

We can count

  • Lines of code per operation
  • Number of operations per component (or web service)
  • Number of components in a structure.

 

Is it possible to define the optimum numbers here, or a formula in which these numbers are related?

People are working on very large scale SOA systems, and coming to conclusions about the SOA complexity metrics mentioned above. MORE TO SAY HERE.

Reservations

Yes, we can measure the scale and complexity of an architecture or a system in various ways.

We can count components and the relationships between them, we can count process steps and decisions.

But it is far from clear that there is agreement about which measures to use and when.

 

Are you really interested in structural complexity measures?

There is a book that makes a thorough mathematical examination of 98 proposed measures for structural intra-modular complexity.

Horst Zuse (1991) Software Complexity: Measures and Methods. Walter de Gruyter, Berlin - New York.

 

 

References

Ref. 1: “The Mythical Man-Month: Essays on Software Engineering” by Fred Brooks

Ref. 2: “Agile Traceability” in the Library at http://avancier.co.uk

Ref. 3: “Software is not Hardware” in the Library at http://avancier.co.uk

Ref. 4: “When agile becomes fragile” http://billyonopensource.blogspot.com/2008/04/when-agile-becomes-fragile.html

 

 

Footnote: Creative Commons Attribution-No Derivative Works Licence 2.0

Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.website” before the start and include this footnote at the end.

No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this work, not derivative works based upon it.

For more information about the licence, see http://creativecommons.org