Copyright Avancier Limited. All rights reserved.
The paper on abstraction included the table below. It shows four scales in which abstraction works from the bottom up. This paper is about only one of those scales, the one called generalisation.
| Abstraction by | Omission | Composition | Generalisation | Idealisation |
| Abstract | Vacuous | Coarse-grained composite | Universal | Logical Concept |
| | Sketchy | Mid-grained composite | Fairly generic | |
| | Elaborate | Fine-grained composite | Fairly specific | |
| Concrete | Complete | Elementary part | Uniquely bespoke | Physical Material |
An instance of a specialisation is at the same time an instance of all generalisations above it. A specialisation contains its generalisation in some sense. So the full description of a specialisation must be longer than the description of its generalisation. The specialised concept inherits from or extends the generalised concept.
For example, the two subtypes below extend the definition of their super type:
Similarly, the two subtypes below extend the definition of their super type:
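A minimal sketch in Python of a super type extended by two subtypes. The names Party, Person and Organisation, and their attributes, are hypothetical illustrations and are not the paper's own examples.

```python
class Party:
    """Super type: the properties shared by every party (hypothetical example)."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return {"name": self.name}


class Person(Party):
    """Subtype: everything a Party has, plus a date of birth."""

    def __init__(self, name, date_of_birth):
        super().__init__(name)
        self.date_of_birth = date_of_birth

    def describe(self):
        return {**super().describe(), "date_of_birth": self.date_of_birth}


class Organisation(Party):
    """Subtype: everything a Party has, plus a registration number."""

    def __init__(self, name, registration_number):
        super().__init__(name)
        self.registration_number = registration_number

    def describe(self):
        return {**super().describe(), "registration_number": self.registration_number}


person = Person("Ada", "1815-12-10")
# An instance of the specialisation is also an instance of the generalisation.
print(isinstance(person, Person), isinstance(person, Party))  # True True
# The specialised description contains the generic one and is longer.
print(person.describe())
```

Each subtype's description includes every property of the super type and adds its own, which is why the full description of a specialisation is longer than that of its generalisation.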
A generalisation is often an abstract group of things, rather than a concrete composite. There may be instances of the generalisation in the real world, but you cannot address them as a group: they are not co-located, and they have no manager, no owner and no physical shell around them.
Generalisation or specialisation can be done repeatedly to create a multi-level hierarchical structure. For example, consider the Linnaean classification of living things.
This is an abstract composition hierarchy. A higher category in the classification is a composite of lower categories. But more importantly, every category is a generalisation; it defines those properties that are shared by all the specific types below it. Its main purpose is to generalise.
| Taxonomic group | Particular general types | |
| Kingdom | Animalia | |
| Phylum | Chordata | Mollusca |
| Class | Mammalia | Gastropoda |
| Order | Primates | Pulmonata |
| Family | Hominidae | Arionidae |
| Genus | Homo | … |
| Species | Homo sapiens | Black slug |
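The table can be read as two lineages that share their higher categories. The Python sketch below finds the generalisations common to both species. It assumes that Animalia applies to both columns, and it leaves the black slug's genus as None because the table elides it. The function name is hypothetical.

```python
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]

LINEAGES = {
    "Homo sapiens": ["Animalia", "Chordata", "Mammalia", "Primates",
                     "Hominidae", "Homo", "Homo sapiens"],
    # Assumption: Animalia covers both columns. The genus is elided in the
    # table, so it is recorded as None rather than guessed.
    "Black slug": ["Animalia", "Mollusca", "Gastropoda", "Pulmonata",
                   "Arionidae", None, "Black slug"],
}


def shared_generalisations(species_a, species_b):
    """Return the (rank, category) pairs, from the top down, shared by both species."""
    shared = []
    for rank, a, b in zip(RANKS, LINEAGES[species_a], LINEAGES[species_b]):
        if a is None or a != b:
            break  # the lineages diverge here; nothing below is shared
        shared.append((rank, a))
    return shared


print(shared_generalisations("Homo sapiens", "Black slug"))
# [('Kingdom', 'Animalia')]
```

The only properties the two species are guaranteed to share are those defined at the level of the kingdom, the most general category that contains them both.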
Generalisation is fundamental to the highest level of enterprise architecture, where enterprise architects define:
Generalisation in enterprise architecture can be overdone. It can yield vacuous abstractions: structures that are highly reusable yet of little value in each place they are used. The more general the structure, the less it helps in designing a specific solution and the more work is left to the designers. Also, in IT architecture, generalisation is often the enemy of performance. So use it with caution.
The concept of a generic type is also fundamental at the lowest level of data processing system design, where software architects define:
Generalisation in detailed software design can also be overdone. See below.
One might say.
Why is differentiating composition and generalisation so tricky? Is it because the concepts of composite and generalisation relate to the confusingly overlapping concepts of set and type? Wikipedia tells us:
“Set theory is the branch of mathematics that studies sets, which are collections of objects. Set theory, formalized using first-order logic, is the most common foundational system for mathematics. The language of set theory is used in the definitions of nearly all mathematical objects, such as functions. Elementary facts about sets and set membership can be introduced in primary school, along with Venn diagrams, to study collections of commonplace physical objects. Elementary operations such as set union and intersection can be studied in this context. More advanced concepts such as cardinality are a standard part of the undergraduate mathematics curriculum.”
“Type theory is any of several formal systems that can serve as alternatives to naive set theory. In programming language theory, type theory can refer to the design, analysis and study of type systems.… Alonzo Church, inventor of the lambda calculus, developed a higher-order logic commonly called Church's Theory of Types. Church's type theory is a … a typed lambda calculus. …In typed lambda calculi, types play a role similar to that of sets in set theory.”
Although Wikipedia tells us that “types play a role similar to that of sets in set theory”, I gather that set theory and type theory are somehow in competition with each other.
The concept of a type is fundamental to and ubiquitous in data processing system design.
Data type: A type that defines the properties shared by instances of a data item. (E.g. integer, floating-point number (decimal), and alphanumeric string.) A data type constrains the values of a data item. It also defines the processes that can be performed on a data item or larger data structure. Thus, a data type is an interface to a component, a kind of service contract.
“A type system divides values into sets called types — this is called a type assignment — and makes certain program behaviors illegal on the basis of the types that are thus assigned. For example, a type system may classify the value "hello" as a string and the value 5 as a number, and prohibit the programmer from adding "hello" to 5 based on that type assignment. In this type system, the program.” Wikipedia
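The example in the quotation can be tried directly. The sketch below uses Python, which makes this check when the program runs rather than when it is compiled; other languages reject the same operation earlier. The function name add is hypothetical.

```python
def add(a, b):
    """Apply the + operation; which values are legal depends on their types."""
    return a + b


print(add(2, 5))          # two numbers: addition is defined
print(add("hel", "lo"))   # two strings: + means concatenation

try:
    add("hello", 5)       # a string and a number: no such operation
except TypeError as error:
    print("rejected:", type(error).__name__)
```

The type of each value determines which processes may be performed on it, which is the sense in which a data type acts as a kind of service contract.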
However, some (Chris Date for one) argue that type theory applies only to programming language design, at the small scale of universal data types, and that for enterprise database applications set theory provides a better basis for software design. This seems to be another example of how you have to change your view of things as you move up from the small to the large.
In the loosest sense of the term, types appear in many guises as:
· a data type – defining the properties of a data item.
· a set of instances in database design.
· a base class in a class hierarchy in object-oriented software design.
· a subroutine or shared service of some kind.
· a generic process structure - with generic steps like capture, validate, approve, execute, close.
· a generic data structure - with generic entities like place, party, product, property and process.
Types in mathematics are fixed concepts. A study of physics and chemistry will yield some fixed types. And in biology, the hierarchical tree of evolution is a fixed structure, describable in a cladogram. Outside of these hard sciences, types are transient. Types in the natural and business worlds are not fixed. Biologists know the Linnaean hierarchical classification of species is only an imperfect or approximate description of the structure of the biosphere as it is today – and they still tinker with it. Business managers frequently change their mind about how to classify their products and their customers, not so much tinkering as rethinking.
Inheritance is the mechanism in object-oriented programming by which one component type (or class) includes the properties of another. Inheritance is used in two different ways that correspond to two of our four kinds of abstraction.
| Abstraction type | Base class | Inheritor class | Inheritance is used to |
| Idealisation-realisation | Ideal | Real | Implement: provide concrete operations to implement an abstract interface |
| Generalisation-specialisation | Generic | Specific | Reuse: add further specific operations to the generic operations |
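The two rows of the table can be sketched in Python as follows. The class names (Shape, Square, Account, SavingsAccount) are hypothetical and chosen only to illustrate the two uses of inheritance.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Ideal: an abstract interface that declares an operation but does not implement it."""

    @abstractmethod
    def area(self):
        ...


class Square(Shape):
    """Real: provides a concrete operation for the abstract one."""

    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


class Account:
    """Generic: operations that every account has."""

    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount


class SavingsAccount(Account):
    """Specific: reuses deposit() and adds a further operation."""

    def add_interest(self, rate):
        self.balance += self.balance * rate


print(Square(3).area())  # 9

savings = SavingsAccount(100)
savings.deposit(100)       # inherited generic operation
savings.add_interest(0.5)  # added specific operation
print(savings.balance)     # 300.0
```

In the first pair the base class contributes no behaviour, only a contract. In the second pair the base class contributes working behaviour that the inheritor reuses.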
Around 1990, using inheritance to increase reuse was all the rage. Around 1995, OO gurus were tending to the view that using inheritance to extend generic types was less important than using it to reify abstract interfaces. However, this paper is about generalisation rather than idealisation. Generalisation runs into some difficulties.
A reader writes:
Single inheritance among programming language classes offers a poor simulation of abstraction.
Part of the problem is that in real abstraction, you are just reducing the number of assertions you're making about things. If you always boil any type description down to a number of Boolean assertions, you can tell whether A is an abstraction of B by looking to see whether all the statements of B imply statements of A. In the simplest case, A's assertions are a subset of B's.
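The simplest case described above, where A's assertions are a subset of B's, can be sketched in Python. The assertion strings and the function name are hypothetical, and the sketch does not attempt the general case of logical implication.

```python
def is_abstraction_of(a_assertions, b_assertions):
    """True if every assertion made by A is also made by B (subset case only)."""
    return set(a_assertions) <= set(b_assertions)


investment = {"has an owner", "has a return"}
bond = {"has an owner", "has a return", "has a maturity date"}

print(is_abstraction_of(investment, bond))  # True
print(is_abstraction_of(bond, investment))  # False
```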
But in any practical programming language, there are assertions you could mistakenly make about a superclass that aren't true of its subclasses. For example, if in Java class Investment has a method GetReturn(), it would be a mistake to assert that all instances that are members of Investment have that particular GetReturn() method, because there might be an override in a subclass. On the other hand, it would be correct (with Java) to say that all instances have some sort of GetReturn() method; you just can't say much about the results.
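The same point can be sketched in Python rather than Java. The subclass RiskyInvestment and the return values are hypothetical; the method is spelled get_return to follow Python naming conventions.

```python
class Investment:
    def get_return(self):
        return 0.05  # hypothetical flat rate


class RiskyInvestment(Investment):
    def get_return(self):
        return -0.20  # override: different behaviour behind the same name


portfolio = [Investment(), RiskyInvestment()]

# Safe to assert: every member has some get_return() method.
has_method = all(callable(getattr(item, "get_return", None)) for item in portfolio)
print(has_method)  # True

# Not safe to assert: every member uses Investment's own get_return().
uses_base = [type(item).get_return is Investment.get_return for item in portfolio]
print(uses_base)  # [True, False]
```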
On the other hand, in a good discipline of testing, especially in an agile context where functionality is being monotonically added in successive sprints, it should normally be possible to add tests gradually without changing them. So the earlier deliverables conform to an abstraction of the spec conformed to by the later ones.
The author replies. Thanks for your observations on the risk or issue of defining a class hierarchy that isn't a proper generalisation-specialisation structure. My concerns about inheritance from a generalisation are more to do with time (evolution), size and the fuzziness of the real world.
I spent much of the 1990s telling my clients that if they want reuse – certainly in the domain where most enterprise and solution architects work – reuse by delegation of work to shared services is more useful than reuse by inheritance. I wrote several papers on this theme and co-authored a book on it. Google will find them if you type in “Graham Berrisford”.
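A minimal Python sketch of reuse by delegation, under the assumption of a hypothetical shared service (AddressValidator) used by two otherwise unrelated components. None of these names come from the papers mentioned above.

```python
class AddressValidator:
    """Hypothetical shared service: one place where the validation rule lives."""

    def is_valid(self, address):
        return bool(address and address.strip())


class CustomerRegistration:
    """Reuses the validator by delegating to it, not by inheriting from it."""

    def __init__(self, validator):
        self.validator = validator

    def register(self, name, address):
        if not self.validator.is_valid(address):
            raise ValueError("invalid address")
        return {"customer": name, "address": address}


class SupplierOnboarding:
    """A second, unrelated component that delegates to the same service."""

    def __init__(self, validator):
        self.validator = validator

    def onboard(self, name, address):
        if not self.validator.is_valid(address):
            raise ValueError("invalid address")
        return {"supplier": name, "address": address}


shared = AddressValidator()
print(CustomerRegistration(shared).register("Ada", "1 High Street"))
print(SupplierOnboarding(shared).onboard("Acme", "2 Low Road"))
```

The two components share behaviour without sharing a base class, so neither is forced into a common hierarchy just to reuse the validation rule.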
I suspect the set theory versus type theory debate underlies what surfaces in software engineering as the OO-Relational paradigm clash. And I suspect this last is rather too often a self-inflicted wound, caused by people over-engineering class hierarchies into their OO software design, where a more ‘relational’ or set-based structure would prove a more economical and flexible design.
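One possible reading of a set-based structure, sketched in Python: categories are held as data (sets of members) rather than as subclasses, so reclassifying is a data change. The customer names and categories are hypothetical, and this is an interpretation offered for illustration, not a design taken from the paper.

```python
# Categories are sets of members, not classes in a hierarchy.
categories = {
    "key_account": {"acme"},
    "overseas": {"acme", "bolt"},
}


def categories_of(customer):
    """Return the names of every category the customer currently belongs to."""
    return {name for name, members in categories.items() if customer in members}


print(sorted(categories_of("acme")))  # ['key_account', 'overseas']

# The business changes its mind: reclassify by changing set membership.
categories["key_account"].add("bolt")
categories["overseas"].discard("acme")

print(sorted(categories_of("acme")))  # ['key_account']
print(sorted(categories_of("bolt")))  # ['key_account', 'overseas']
```

A customer can belong to several categories at once and can move between them without any code being rewritten, which a fixed class hierarchy does not easily allow.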