This booklet is published under the terms of the licence summarized in footnote 1.
This part catalogues patterns in the relationships between entities. It reveals the rules of thumb and business analysis questions triggered by pattern recognition. It continues from where Part one finishes, starting on ground that is relatively firm and finishing with some more speculative suggestions.
Using patterns and model transformations to get the constraints right.
This paper highlights the importance of getting the constraints right. It introduces a catalogue of standard structural shapes and the notion of entity model transformations. These ideas are developed in later chapters.
The success of a system depends on the relationships between entities being specified correctly. If they are not, then it becomes:
· easier for useless data to get into the system
· harder for programmers to locate the information they need to find.
To prevent the above difficulties from occurring, to sharpen up the act of analysts, and to save time and effort, you can apply some simple quality assurance techniques. One technique is enquiry access path analysis. This means defining the route by which required information is extracted from the model.
Another is pattern analysis, which has the advantage that it can be applied with less detailed knowledge of the required outputs. Fortunately, there are recognisable patterns and questions that lead you to transform an intuitive or poor design into a well-engineered design. These patterns help you to raise the quality of analysis and design work, and thereby improve the quality of the resulting systems.
The figure below shows a pattern called the double-V shape.
The basic pattern can be obscured by intermediate entities.
The figure below includes a double-V shape, even though the
Yes, a Holiday Booking is made to meet the Client
Requirement for the Feature Type that classifies the
There is a quality benefit. It is now impossible for users
to create a Holiday Booking for a Client who has not expressed an interest in
the Feature Type of the
There is a productivity benefit. Programmers do not have to navigate around the model to find the relevant Client Requirement for a Holiday Booking, or sort Holiday Bookings by Client Requirement within Client.
Perhaps, but you cannot rely on any one technique to reveal everything. See the Chapters on <Model transformations> for further examples.
Patterns occur in various kinds of specification, but among the simplest and most widely useful are those that involve the specification of relationships between entities. The figure below is an attempt to summarise and name the entity model shapes that I am most interested in. The arrows show some of the possible transformations.
Our pattern names include: parent-child, V, level, bridge, relation, diamond, triangle, double-V and Y shapes, double and triple Y shapes, tramlines, X shapes and recursive shapes. The Chapters on <Model transformations> show that the transformations in the first row apply to both data models and process models. Other Chapters (not yet collected into a volume) detail the shapes and transformations indicated by arrows on the diagram above.
Many people teach the mechanics of how to document system specifications. Plenty of CASE tools help you with these mechanics; they ensure you get the syntax right; they constrain you to use the proper boxes, symbols and lines. But there are no tools that help you with the semantics - the difficult part - the thinking - the analysis.
The skill of professional analysts lies in recognising patterns in specifications, especially in object and event models. They use standard shapes constructively to build up a large and complex picture. They also use them destructively, to analyse an existing specification, to take it to pieces and question whether a better construction should be put upon these pieces.
The patterns catalogue above is a chart of simple shapes with memorable names. I have listed the questions you should ask about each shape, and the possible transformations that you might need to apply. So I can now teach students:
· the name, meaning and use of a pattern in constructing a specification
· the analysis questions which discovery of the pattern prompts
· the design or redesign work that is necessary depending on the answers.
This new approach means that for the first time I can envisage a CASE tool that helps us with the thinking part of analysis and specification. It will help us to build better quality systems, not just better documented systems.
A pattern on its own isn’t much help. What to do with the pattern that has been recognised? This is the expert knowledge I want to capture. A tool can highlight or report on patterns, and prompt its user to answer specific quality assurance questions.
If a CASE tool is to ask us questions about patterns, it must first have the appropriate pattern recognition functions. To recognise the named patterns, the entity at one end of each relationship must be declared as the ‘parent’, and the other must be the ‘child’. E.g. given a one-to-many relationship I always nominate the ‘one’ end to be the parent. It is the parent-child hierarchy inherent in each relationship that makes the shapes recognisable, whether by a person or by a tool.
For people, always drawing the parent above the child imposes a hierarchical structure that helps us to display the known patterns in an easily recognisable form, corresponding to the shapes in the analysis patterns catalogue.
Of course, patterns are careless of the diagram symbols or the presentation form. As long as each relationship has parent and child ends, a CASE tool can detect a pattern if the model is drawn upside-down, or using different symbols, or written down in the form of text or code.
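As a minimal sketch of this point (not the method used by any particular tool), a model can be written down as text, one (parent, child) pair per relationship, and a shape detected with no diagram at all. The function name `find_v_shapes` and the sample relationships are assumptions made for illustration; a V shape is taken here to mean a child entity with two or more parents.

```python
from collections import defaultdict

def find_v_shapes(relationships):
    """Return {child: sorted parents} for every child with two or more parents.

    `relationships` is an iterable of (parent, child) pairs; drawing
    order and diagram symbols play no part in the detection.
    """
    parents_of = defaultdict(set)
    for parent, child in relationships:
        parents_of[child].add(parent)
    return {
        child: sorted(parents)
        for child, parents in parents_of.items()
        if len(parents) >= 2
    }

# Hypothetical model, written down as text rather than drawn.
model = [
    ("School", "Contract"),
    ("Head Teacher", "Contract"),
    ("School", "Pupil"),
]

print(find_v_shapes(model))
```

Listing the relationships in a different order gives the same result, which is the sense in which a pattern is careless of presentation.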
Mike Burrows has developed a CASE tool called Validator (see <www.asplake.demon.co.uk>) that detects and reports on most of the structural patterns I have catalogued. It asks you analysis questions, and suggests some transformations that may improve your model. See the Chapters on <Model transformations> for further details.
I earlier applied various transformations to a relational database
design from Halpin. Some of the same transformations appear in a classification
developed by Petia Wohed (née Assenova) and Paul Johannesson at
Petia and Paul set out with the intention of making schemas more graphical, to make the rules fully explicit for the purpose of schema integration. Schema integration is a different job from schema design, so I will add comments and guidance from the view point of somebody who designs schemas for enterprise applications.
Also, their modelling language is different from ours. Notable differences are listed below:
attribute or relationship (see below)
attribute or 1:1 relationship
parent-child relationship (from one master object to many child objects)
partial or total
optional or mandatory
total in union
at least one must exist
parent-child relationship with at least one child
Given an entity with an optional group of attributes, you may move the optional attribute group into a subclass where it is mandatory.
Petia and Paul discuss this under ‘transforming partial attributes’. Figure 5a illustrates their example.
Figure 5b shows an example from chapter 4. The entity Paper has attributes that only apply to papers accepted for presentation. So you may move the optional attributes (Total Pages, Total Figures, Total Tables) into a subclass where they are all mandatory.
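The transformation can be sketched in code. The following is an illustration only: the attribute names Total Pages, Total Figures and Total Tables come from the example above, while the class name `AcceptedPaper`, the `title` attribute and the use of Python dataclasses are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Before: the optional group sits on Paper, so nothing stops a
# half-filled group (total_pages set, total_figures missing).
@dataclass
class PaperBefore:
    title: str
    total_pages: Optional[int] = None
    total_figures: Optional[int] = None
    total_tables: Optional[int] = None

# After: Paper keeps only what applies to every paper ...
@dataclass
class Paper:
    title: str

# ... and the group moves to a subclass where every attribute is mandatory.
@dataclass
class AcceptedPaper(Paper):
    total_pages: int
    total_figures: int
    total_tables: int
```

Constructing an `AcceptedPaper` without all three totals raises a `TypeError`, whereas `PaperBefore` accepts any subset of them.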
Chapter 6 suggests that people normally do the reverse in practical system design. They roll up optional data groups into an aggregate entity, partly to reduce the length and complexity of access paths by events and enquiries, and partly for other reasons explored in Later chapters.
Given a parent-child relationship that is optional from the master’s view point, you can make it mandatory by replacing the child by a subclass of itself.
P&P call this ‘transforming non-surjective attributes’. An attribute is ‘surjective’ when each instance of its range (the master entity) is associated with at least one instance of its domain (the child entity). So a surjective attribute is an attribute whose inverse is a mandatory relationship.
Figure 5c illustrates their example.
After the transformation, each object of the parent entity is associated with at least one object of the child entity. In this example, the relationship starts optional at both ends and becomes mandatory at both ends.
Again, people often do the reverse in practical system design. If a superclass has only one subclass, they would roll the data of the subclass up into the superclass, partly to reduce the length and complexity of access paths by events and enquiries, and partly for other reasons explored in Later chapters.
Given optional attributes that are mutually exclusive, and where at least one must exist, you can introduce a generalised attribute, a superclass of the mutually exclusive attributes.
P&P call this ‘transforming partial attributes which are total in union’. Figure 5d illustrates their example.
Figure 5e shows a different convention in database design - to turn the mutually exclusive attributes into mutually exclusive relationships.
Later chapters explore the difference between a ‘class hierarchy’ as in figure 5d and an ‘aggregate’ as in figure 5e.
Given several parent-child relationships that are optional from the parent’s view point, but where at least one must exist, you can make them mandatory by introducing a superclass of the various children.
P&P call this ‘transforming non-surjective attributes which are total in union’. Figure 5f illustrates their example, where a Head Teacher is obliged to take responsibility for at least one course.
This transformation is unusual in practical enterprise application development. The requirement that a parent must have at least one child drawn from different types is not very common.
Designers are likely to apply the reverse transformation, that is, relax an ‘at least one’ constraint after it has been defined, because where a business monitors hundreds or thousands of objects, it is normally easy to come up with counter-examples, valid exceptions to the rule.
P&P call this ‘transforming m-m attributes’. Figure 5g illustrates their example.
This transformation is very common in practical enterprise application development. So common that it is second nature to database system developers, not just because it is required for implementation reasons, but because resolving many-to-many relationships is a valuable step in analysis. See chapter 5 for further discussion.
Given an entity with non-key attributes, you can raise any attribute other than the primary key to become a parent entity connected by a 1:N relationship.
P&P call this ‘transforming lexical attributes’. Figure 5h illustrates their example.
This is very common in practical enterprise application development. But why and when to do this?
Let us focus on a tiny part of the model at the end of chapter 4. Figure 5i shows the non-key attributes of the Room entity raised to become key-only parent entities.
The best kind of analysis pattern prompts ‘Ask of this pattern…’ questions.
E.g. Room Type is constrained (lab,lec,office) and Building is constrained (1...5). You can prevent mistaken classification of a Room under an invalid entity by defining these values as objects of a parent entity.
If no, nobody wants to control the range of values, then don’t make it a parent entity. E.g. say nobody cares too much about what is recorded as an Area. The Area entities are derivable from whatever values happen to be recorded.
E.g. Users might want to control the range of Room Types (lab,lec,office).
If no, or you want to stop users from changing the system’s rules by adding or deleting objects of the class, then define the parent entity in a layer of the design controlled by designers. E.g. you might define Building as a class in the UI layer or a table in the data storage structure.
E.g. you might record the total number of Rooms as an attribute of the Room Type entity. Even a derivable total like this turns the key-only parent entity into an entity like any other.
If no, then you may later treat the key-only parent entity differently from other entities in the data storage structure, perhaps define it as an index rather than a table.
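To illustrate how a key-only parent entity constrains the values of what was an attribute, here is a minimal sketch using Python's built-in sqlite3 module. The Room Type values (lab, lec, office) come from the example above; the table names, the column names and the `add_room` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE room_type (name TEXT PRIMARY KEY);   -- key-only parent entity
    CREATE TABLE room (
        room_no   TEXT PRIMARY KEY,
        room_type TEXT NOT NULL REFERENCES room_type(name)
    );
    INSERT INTO room_type VALUES ('lab'), ('lec'), ('office');
""")

def add_room(room_no, room_type):
    """Insert a Room; return False if its Room Type is not a known parent."""
    try:
        with conn:
            conn.execute("INSERT INTO room VALUES (?, ?)", (room_no, room_type))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_room("R1", "lab"))      # valid classification
print(add_room("R2", "kitchen"))  # rejected: no such Room Type
```

Whether users may add rows to `room_type` themselves is exactly the question of control raised above; the sketch does not settle it.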
Given an entity with an attribute that has a small fixed range of values, you may transform the fixed range into distinct sub entities.
P&P call this ‘transforming attributes with fixed ranges’. Figure 5j illustrates their example.
Given the attribute Room Type (lab, lecture room, office) in the case study in chapter 3, you may transform the fixed range into distinct subclasses.
This transformation is rare in practical enterprise application development, for reasons explored in Later chapters.
Given a class hierarchy in which subclasses share properties in an orthogonal dimension, you can create a class network.
P&P call this ‘transforming to lattice structures’. Figure 5l illustrates their example.
Later chapters explore this transformation in more detail, but a little of the discussion is repeated below.
Figure 5m shows that a data structure in which Class Teacher and Head Teacher inherit from Teacher might be extended to include a subclass that inherits from both Class Teacher and Head Teacher.
Figure 5m shows a diamond-shaped is-a tree in which a Dual Role Teacher entity has been introduced to accommodate the few Teachers that are both Head Teacher and Class Teacher.
· A Dual Role Teacher is a Head Teacher is a Teacher.
· A Dual Role Teacher is a Class Teacher is a Teacher.
Defining a diamond-shaped is-a tree may be a recognised practice in object-oriented languages that support multiple inheritance, but one should be aware that the meaning of the model is ambiguous, in the way described below.
The model does not specify the rule that a Dual Role Teacher is a single Teacher. It might equally well be read to imply that two Teacher objects are needed to instantiate one Dual Role Teacher object.
One way or another, an object-oriented programming environment that allows multiple inheritance must work out that Dual Role Teacher inherits only once from Teacher. But the semantics of the diagram notation don’t tell you this, and we want the conceptual model to act as a specification for relational database programmers as well as object-oriented programmers.
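As an illustration in one such environment (Python, chosen here as an assumption, since the text names no language), the following sketch shows the language resolving the diamond so that Teacher is inherited, and initialised, only once. The counter `instances_initialised` is a hypothetical device added to make that visible.

```python
class Teacher:
    instances_initialised = 0  # counts calls to Teacher.__init__

    def __init__(self, name):
        Teacher.instances_initialised += 1
        self.name = name

class HeadTeacher(Teacher):
    def __init__(self, name):
        super().__init__(name)

class ClassTeacher(Teacher):
    def __init__(self, name):
        super().__init__(name)

class DualRoleTeacher(HeadTeacher, ClassTeacher):
    pass

t = DualRoleTeacher("A. Example")

# The method resolution order lists Teacher once, after both roles.
mro_names = [c.__name__ for c in DualRoleTeacher.__mro__]
print(mro_names)
print(Teacher.instances_initialised)
```

The rule that makes this work lives in the language's method resolution order, not in the diagram, which is the ambiguity described above.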
Diamond shaped structures are discussed further in Part Two.
Where an entity has a list of similar attributes, you can generalise these attributes into a relationship. P&P call this ‘transforming non-unary attributes’. Figure 5n illustrates their example.
Once again, this transformation is not very common in practical system design. Figure 5o shows two more common transformations discussed in Later chapters.
Schema integration v. schema design
Petia has commented as follows.
Petia and Paul are interested in these transformations for the purpose of schema integration. Making things visible makes the process of schema integration easier. If you plan to merge two schemas, you do need to make all the current rules fully explicit.
But note that schema integration is a one-off exercise. You can be confident that the range of a type, the instances of a class, the rules of the business, will not change while you are working.
Building a conceptual entity model for long term use is a different matter. The model has to hold object data for years. It has to survive while objects are created, amended and destroyed, while the ranges of apparently fixed values are altered, while the rules evolve.
This gives the modeller a different perspective. The modeller will try to avoid fixing temporary rules (like a range of subclasses) into the data structure. I tend to avoid creating class hierarchies for this and the other reasons explored in Later chapters.
The important thing is to record the semantics of the problem domain, one way or another. Different diagram drawing conventions lead you to draw different-looking conceptual models.
Some people like to represent every term and every fact in a box of its own. You might specify each fact about an object by drawing a rectangle. You might place a rectangle on each and every line between one named term and another named term. Figure 5p shows the kind of diagram that results from this.
There is no law saying you have to represent every attribute or relationship in a rectangle. Doing this usually creates a diagram that is far too large for practical use.
When you are building an enterprise application with perhaps 2,000 data items, you cannot handle a picture that shows every data item in a box (let alone every value of every data item, as some of the transformations in this chapter lead to).
It is more convenient to roll one entity up to become an attribute of the other. You may do this where there is a 1:1 relationship, or where one entity is a key-only entity, with no attributes of its own.
Figure 5q features both 1:1 relationships and a key-only entity. It can be condensed by rolling the ‘key-only entity’ into the ‘state entity’.
Figures 5q and 5r show that whether a business term becomes an entity or an attribute depends on the perspective of the system’s users.
Colour might seem indisputably to be an attribute. But Colour might easily be an important business entity in a company that manufactures paint.
By the way, figure 5s shows that the term ‘state entity’ comes from one way to classify different kinds of entity in an entity relationship model.
Value that constrains business data
Universal value object
Defined outside the business (colour, month)
created and destroyed by the business (customer, application)
Value that currently applies to objects in the business
derived from values stored in other objects, not a constraint on them
(month of birth)
How relationships prompt the analyst to ask questions.
All the commonly used notations show entities as boxes and relationships as lines between them. I use a diagram notation based on that developed (I think) by Charles Bachman in the 1960s, from which a number of other variants have been derived. It doesn’t matter if you prefer another notation (say, after Chen, or OMT) that expresses the same semantics.
To show the dependence of one object on another, or its independence of other objects, our notation uses a continuous line or a broken line:
· a broken line shows that an object at that end of the relationship can exist without the relationship
· a continuous line shows that an object at that end of the relationship cannot exist without the relationship.
A solid continuous line is a mandatory 1:1 relationship. The objects at either end share the same identity, even though they might be given different keys by a business.
Later chapters show that you may draw a mandatory 1:1 relationship to connect the parallel aspects of an aggregate. But as a rule of thumb, you should assume that nature abhors a symmetrical or non-hierarchical relationship.
In this case, you may discover that a School can exist without a Head Teacher, but not vice-versa.
In a semi-optional 1:1 relationship, the independent entity is called the parent entity of the relationship. The dependent entity is called the child entity of the relationship.
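A semi-optional 1:1 relationship of this kind might be sketched as follows, using Python's built-in sqlite3 module. The table names, column names, sample values and the `try_insert` helper are assumptions made for illustration: NOT NULL makes the child depend on its parent, and UNIQUE keeps the relationship 1:1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE school (name TEXT PRIMARY KEY);             -- parent: independent
    CREATE TABLE head_teacher (
        name   TEXT PRIMARY KEY,
        school TEXT NOT NULL UNIQUE REFERENCES school(name)  -- child: dependent, 1:1
    );
""")

def try_insert(sql, params):
    """Run one insert; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# A School can exist without a Head Teacher ...
ok_school = try_insert("INSERT INTO school VALUES (?)", ("Hill Road",))
ok_head = try_insert("INSERT INTO head_teacher VALUES (?, ?)", ("Ms A", "Hill Road"))
# ... but a Head Teacher cannot exist without a School,
no_school = try_insert("INSERT INTO head_teacher VALUES (?, ?)", ("Mr B", None))
# and a School cannot have a second Head Teacher.
second_head = try_insert("INSERT INTO head_teacher VALUES (?, ?)", ("Mr C", "Hill Road"))
```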
The parent-child nature of relationships helps us to draw an entity model in a structured way, with parents towards the top and children towards the bottom.
This hierarchical structuring gives us opportunities for naming standard shapes, recognising them in specification diagrams, using them to ask questions, and teaching the analysis and design implications.
Figure 6f shows you can test the meaning by trying to write either ‘belongs to’ or ‘is a’ on the child or subclass end of the relationship, and ‘may have’ or ‘may be’ at the top.
What I want is a graphical notation that combines the cardinality rules specified by a database structure notation, with the semantics specified by object-oriented notation. Figure 6g shows a notation you can use to express the different semantics, while retaining the cardinality information.
Figure 6h shows a deep is-a tree.
Figure 6i shows notations you can use to show aggregates and is-a trees with several overlapping children or subclasses. The fact that the lines are dotted at the top means the children or subclasses may not apply.
Figure 6j shows notations you can use to show that the children or subclasses of an aggregate or class hierarchy are mutually exclusive.
Both these diagrams say ‘either one case or the other case’. If you wanted to allow ‘neither case’ as well, then you would draw the top half of the relationships with a dotted line.
In this short section I have entered the territory of object-oriented modelling; see later chapters for much more discussion of aggregates and is-a relationships.
It is usually easiest to start by drawing the entity model without history. The model above will record for a School only the currently employed Head Teacher; it won’t keep a history of past Head Teachers. Let us say we are not interested in this history.
However, if the child object is a group of attributes in 1:1 correspondence (so if one is present then all are) then the semi-optional 1:1 relationship saves you from specifying the rule of 1:1 correspondence between attributes of the group within a larger entity.
An object in this kind of model becomes divided between entities as it progresses through its life. Somebody who starts as a Pupil may later become a Senior Pupil as well. This is an unnecessary elaboration, leading to some redundant design and coding effort. Where the child entity represents merely a later stage in the life of the parent entity, you may roll the two entities into one.
The convention I favour is that objects don’t normally change class.
Some people propose the reverse, that you should create a subclass for each state an object may pass through, showing them as mutually exclusive subclasses in a class hierarchy. But this adds to the design and coding effort. It increases the number of entities in the design and the complexity of coordinating separate objects during an enquiry or update process. If each entity becomes a database table, it slows down performance, since more objects must be retrieved and stored.
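The convention I favour, one entity whose stage of life is held as a state, might be sketched like this. The names `PupilStage`, `stage` and `promote` are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class PupilStage(Enum):
    PUPIL = "pupil"
    SENIOR_PUPIL = "senior pupil"

@dataclass
class Pupil:
    name: str
    stage: PupilStage = PupilStage.PUPIL

    def promote(self):
        """Advance to the later stage; the object keeps its class and identity."""
        self.stage = PupilStage.SENIOR_PUPIL

p = Pupil("Sam")
identity_before = id(p)
p.promote()
```

Only one object is retrieved and stored per Pupil, whatever stage it has reached.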
Later chapters say a great deal more about types and states.
Returning to the first example and first question, you should ask about the objects at both ends of the relationship: Can they exist without it? If yes, you should define the relationship as being optional at one or both ends.
In this case, you may discover that the system has to record both headless Schools and unemployed Head Teachers.
Again, nature abhors a symmetrical or non-hierarchical relationship. There are two ways to introduce a third entity into the picture.
E.g. you may discover that a School and a Head Teacher only become linked via a Contract. You can redraw the entity model in a hierarchical V-shaped structure.
The link entity at the bottom of a V shape acts to constrain the relationship between the two higher entities. It gives the relationship a meaning. It restricts the possible links between objects of the two higher entities; you can only connect objects which are in reality connected by this meaningful relationship.
Remember: the identifier or key of an entity state record is not just a database concept, it is a necessary business concept. It enables you to:
a) distinguish that object from another of the same class and
b) map the entity state record onto a real-world object in the business environment.
Later chapters include analysis questions that are relevant to a 1:1 link entity.
The 1:1 bridge shape gives two child entities a common parent. Use it where entities are additive roles rather than mutually exclusive subclasses.
For example, suppose that you wish to combine two legacy systems, one
from Europe and one from the US, that maintain information about an overlapping
range of stock types. The two systems identify their range of stocks by different
numbering systems. Some European stock types are the same as those in the
In this case you might create an entity that sits over and between the two systems.
The entity model above says ‘either, both or neither’. It says you can instantiate a superobject that has no related subobject. Specifying an ‘either or both’ constraint to exclude ‘neither’ is beyond us here.
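Although the diagram cannot state an ‘either or both’ rule, such a rule is easily written down elsewhere in a specification. The following is a minimal sketch under stated assumptions: the class `StockType`, its attribute names and the function `excludes_neither` are all hypothetical, and the two optional stock numbers stand in for the two child entities of the bridge.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StockType:
    """Common parent sitting over the two legacy systems (names hypothetical)."""
    description: str
    european_stock_no: Optional[str] = None  # child in the European system, if any
    us_stock_no: Optional[str] = None        # child in the US system, if any

    def roles(self):
        """Return which of 'either', 'both' or 'neither' applies to this object."""
        present = [self.european_stock_no is not None, self.us_stock_no is not None]
        return {0: "neither", 1: "either", 2: "both"}[sum(present)]

def excludes_neither(stock_type):
    """The extra 'either or both' rule that the diagram alone does not state."""
    return stock_type.roles() != "neither"
```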
Both the 1:1 V shape and the 1:1 bridge shape are aggregates, and they prompt analysis questions. Aggregates and semi-optional relationships often transform in one of the ways described in Later chapters.
The Bridge shape is akin to one of the Gamma et al. patterns, called Adapter, which is designed to ‘convert the interface of a class into another interface clients expect. Adapter lets classes work together that couldn’t otherwise because of incompatible interfaces.’
A typical object-oriented system records only the current state of transient objects. A typical enterprise application records historical data about long-lived real-world objects. Both the real-world objects and the entity state records are persistent. History and persistence make for 1:N relationships.
The result of asking this question is that the majority of associative relationships in enterprise applications turn out to be 1:N, shown in our notation using a fork:
· a fork on the line shows that at that end of the relationship there may be several objects
· no fork on the line shows that at that end of the relationship there may be no more than one object.
Combining the continuous or broken line with the fork, the notation can show four kinds of 1:N relationship, as illustrated below.
In a 1:N relationship, I call the entity at the ‘one’ end the ‘parent’ entity of the relationship, and the entity at the ‘many’ end the ‘child’ entity of the relationship.
It is often helpful to name a relationship at both ends, as I have done here.
The relationship that exists between Teacher and Pupil is optional at both ends. Not all Teachers manage a class. Not all Pupils are assigned to a class with a class teacher, only the younger ones. So the relationship line is broken at both ends.
You may wonder about introducing School Class into the model. Thankfully, the concept is not recorded in our system, otherwise I would have to worry about confusing ‘Class’ with ‘class’ in our discussion here.
There are at least three more questions you should ask about any 1:N relationship.
· Can a parent object have more than one active child object at once?
· Does a parent object retain historic children as well as active children?
· Can a child swap from one parent to another?
It might be possible to extend the notation to show all the answers in a graphical form. But this way lies madness. If you try to show all constraints on an entity model, you end up with a picture that is so large, so rich in semantics, and so complex in appearance that you cannot use it.
It is better to ask these questions during Event Modelling and object behaviour analysis, and document the answers there, in the diagrams for each persistent entity class and each transient event class.
Of course, you may revise or extend the entity model with new entities or relationships after you have answered the questions.
You may at first include N:N relationships in an entity model of business objects.
Again, nature abhors a symmetrical or non-hierarchical relationship. You may draw explicit N:N relationships in the early stages of a model, but you should always resolve them before completing the specification.
E.g. you may conclude that the relationship is established via a Pupil.
A Pupil is a very concrete entity, but a weak way to associate a School with a Teacher. Not every Teacher is a class Teacher. Not every Pupil is assigned to a class Teacher.
Asking about this case, you might discover the Employment Contract. You can reveal the hidden data by simplifying the N:N relationship into two or more 1:N relationships.
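Resolving the N:N relationship through a link entity can be sketched as follows. This is an illustration only: the `start_date` attribute, the sample data and the two helper functions are assumptions; the point is that every School-to-Teacher link now passes through an Employment Contract that can carry data of its own.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmploymentContract:
    """Link entity: child of one School and of one Teacher."""
    school: str
    teacher: str
    start_date: str  # hypothetical attribute revealed by the link entity

contracts = [
    EmploymentContract("Hill Road", "Ms A", "2001-09-01"),
    EmploymentContract("Hill Road", "Mr B", "2002-01-07"),
    EmploymentContract("Park Lane", "Ms A", "2003-09-01"),
]

def teachers_of(school):
    """Follow School -> Employment Contract -> Teacher (two 1:N steps)."""
    return sorted({c.teacher for c in contracts if c.school == school})

def schools_of(teacher):
    """Follow Teacher -> Employment Contract -> School."""
    return sorted({c.school for c in contracts if c.teacher == teacher})
```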
I will return to discuss how to use the V shape to constrain a system’s behaviour, and some design issues raised by it.
We’ve looked mainly at questions about single relationships. Part Two discusses some of the ways in which relationships may form larger and perhaps more interesting shapes.
This chapter reviews traditional data analysis techniques. As Winston Churchill said in a very different context: ‘It may be unfashionable, it may be unpopular, it may be unpalatable, but it’s the truth.’ Well, it is part of the truth. I add a few analysis questions to be asked during data analysis.
Do not confuse a database view in the UI layer with the data structure of the underlying business. You must decompose aggregate objects displayed in the UI layer for processing inside the system.
Business rules belong to the entities in the underlying application, not the aggregate objects in the UI layer (though these might unfortunately be called ‘business objects’).
Relational data analysis is a good way to reduce the aggregates of data items found on forms, screens or data files, into a set of simple normalised relations. Allow us to equate the concepts of entity and normalised relation for the time being.
Normalisation, a technique used in data analysis, is a fine example of generative patterns, of model transformation by question and answer. It reduces complex data structures to the simple building blocks from which they are made. It reduces unnormalised data in stages through successive normal forms.
The starting point for the example below is the data to be found on a batch of Sale Returns emailed to head office by a salesman.
Table: normalisation of the Sale Returns data. Steps shown: ‘Separate the entity from the repeating group’; ‘Remove attribute whose value depends on part of the key’. Attributes shown: Salesman name *, Product name *, Cust Num *.
Some object-oriented designers dismiss normalisation, because it is entirely data-oriented, but there are many things to be said in favour of it.
For one, it encourages you to think in detail about the users’ requirements for information. Several of the case studies published to illustrate object-oriented methods appear to be complex and challenging – finding the right classes is relatively difficult or mysterious. My study of the case studies suggests they would be easier if the authors had defined some input and output messages at the start, then (dare I say it) applied a little relational data analysis to those inputs and outputs!
For another, the normalisation process depends on the analyst choosing an identifier or key for a data group. The key is underlined in our examples. When a data group is in third normal form, each attribute is ‘determined by’ or ‘dependent on’ the key. Given the value of an object’s key there is only one possible value for any given attribute of that object.
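The idea that a key ‘determines’ an attribute can be checked mechanically against sample data. The sketch below is an illustration under stated assumptions: the function name `is_determined_by`, the attribute names and the sample rows are all hypothetical.

```python
def is_determined_by(rows, key, attribute):
    """True if each value of the key maps to exactly one value of `attribute`."""
    seen = {}
    for row in rows:
        k = tuple(row[name] for name in key)
        # Remember the first value seen for this key; any later
        # disagreement means the key does not determine the attribute.
        if seen.setdefault(k, row[attribute]) != row[attribute]:
            return False
    return True

# Hypothetical sample rows; the attribute names are illustrative only.
rows = [
    {"cust_num": 1, "cust_name": "Acme", "product_name": "Bolt"},
    {"cust_num": 1, "cust_name": "Acme", "product_name": "Nut"},
    {"cust_num": 2, "cust_name": "Zenith", "product_name": "Bolt"},
]

print(is_determined_by(rows, ["cust_num"], "cust_name"))
print(is_determined_by(rows, ["cust_num"], "product_name"))
```

A check like this can only refute a dependency from the data to hand; whether the dependency holds as a business rule is still a question for the users.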
Why is thinking about keys helpful?
Choosing a key may seem merely an implementation decision. Indeed, you might not decide between various possible candidate keys for an entity state record until relatively late in the design process.
But the intention or desire to give an entity state record a key is not just a database concept, it is a business concept. When you choose a key during relational data analysis you are making a statement about the business perspective you are taking of the real world.
Users need a key that will enable them not only to:
• distinguish one object from another of the same class, but also to
• map the entity state record onto a real-world entity in the business environment.
One reason for taking required output reports, or a legacy database, as the source documents for data analysis is that these sources will reveal the things the users already care about enough to have awarded keys.
The key must uniquely identify an object, and not have more than one value for it. In other words, the values of the key must be in 1:1 correspondence with objects of the class.
You may have to choose between several candidate keys. Since users need keys that help them map entity state records onto real-world entities, you should favour natural attributes over artificial identifiers.
If you have to make up a key from a long list of attributes, then so be it. The important thing as far as data analysis is concerned is that you have established the business need for a key.
In the old days, designers might have said to choose numbers over text, short items rather than long ones, and few items rather than many, but the ability of users of a graphical user interface to select objects from lists now saves people from having to type long multi-item keys.
Students normally learn normalisation by completing a table such as the one drawn above. This is a bit like practising scales when you start to play the piano.
Few if any professionals do data analysis the way students are taught to, just as a concert pianist almost never plays a scale during a stage performance.
Professionals use data analysis to place facts into an existing entity model, albeit an informal or provisional entity model. They:
· reconcile input and output data flows with an existing entity model
· refine an informally defined entity model
· reverse engineer entities out of an existing database schema
Data analysis is a technique for both forward and reverse-engineering. Nowadays data analysis is a common way to start reengineering a legacy system; it helps you take advantage of the effort that has already gone into defining the legacy database.
The analogy between analyst and concert pianist is a poor one, because it is possible to teach novice analysts to do data analysis the way the professionals do it. The trick is to focus on the way normalisation reshapes an entity model, on graphical model transformations.
The analyst starts normalisation by choosing an identifier or key for the entire unnormalised data group. This is not always easy. The key should uniquely identify at least one other data item. So the choice of key in figure 6b is a poor one.
Choosing a poor key for unnormalised data has little effect on the entities defined at the end of the analysis, but it dictates the path that normalisation takes, and it leaves you with a key-only entity. This key-only entity may turn out to be redundant, not interesting to the business.
Following the rule that a fork grabs an asterisk - a relationship grabs a foreign key - you can draw the data groups that result from data analysis as entities connected by relationships. So let us repeat the data analysis of the example by following generative patterns.
Figure 6b shows that the standard pattern for the first normalisation step is to drop out a child entity.
Notice the instruction to choose a key for each new child entity. We’ll come back to talk about this in a moment. Consider the example in figure 6b.
If you had chosen Product Name as the key for the unnormalised data in a Sale Return, then Salesman would not end up as an entity on its own. But having chosen Salesman Name as the key, every other data item is immediately removed as repeating data, data that has several values for the key, so you end up with Salesman Name as a key-only entity.
In other cases, you might choose to drop the key-only entity. But in this case, the Salesman is probably an important entity and worthy of record. You may well discover an attribute for Salesman during data analysis of another form, screen, report or file.
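The first normalisation step can be sketched in code. This is an illustration under stated assumptions: only the names Salesman Name and Product Name come from the text; the function `drop_out_child`, the `lines` group and the quantities are invented.

```python
def drop_out_child(group, key_item, repeating_item):
    """First normalisation step: separate the repeating data into a child entity.

    Each child row inherits the parent's key, so the child's key is the
    parent key plus a key chosen for the repeating rows.
    """
    parent = {k: v for k, v in group.items() if k != repeating_item}
    children = [
        {key_item: group[key_item], **row} for row in group[repeating_item]
    ]
    return parent, children


# Hypothetical unnormalised Sale Return, keyed on Salesman Name.
sale_return = {
    "salesman_name": "Brown",
    "lines": [
        {"product_name": "Widget", "quantity": 10},
        {"product_name": "Gadget", "quantity": 4},
    ],
}

salesman, sale_lines = drop_out_child(sale_return, "salesman_name", "lines")
print(salesman)    # {'salesman_name': 'Brown'}  (a key-only entity)
print(sale_lines)
```

Because every other data item repeats for the chosen key, the parent is left with nothing but its key, which is the key-only Salesman entity discussed above.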
A choice between candidate keys often arises when choosing a key for an entity revealed at first normal form. Typically the revealed entity is a link or bridge between two or more entities with simple keys of their own.
choice of key for a
But this is not user-friendly. If the users do not already use
The trouble with a compound key is that objects of the parent entities can only be linked once, by one link object. If you want to allow duplicates, you have to extend the compound with date and time attributes, or with some other qualifying element or sequence number.
Number in these examples is used to extend the range of values provided by
combining the keys of parent entities. All of these hierarchic keys for a
There is another difficulty with the way data analysis is taught. Listing data items in an unnormalised data group means you lose sight of the data structure you started with.
Given a complicated document or file, you may need to divide it at first normal form into a complex structure of parallel and nested repeating groups.
It is impossible to visualise this structure by looking at a list of unnormalised data items. It is much easier if you record the repeating data groups as distinct entities from the outset, or draw boxes around data groups on the original document.
The standard pattern for the second normalisation step is to raise a parent entity with a key that is part of the key of the child.
In general, given any multi-item key, it is well worth asking whether classes exist that are identified by one part of the key alone. See Part Two.
The standard pattern for the third normalisation step is to raise a parent entity and assign the determining attribute(s) as its key, as shown below.
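In code terms, the third step might be sketched as follows. The function `raise_parent` and the Order rows are assumptions; the item names Cust Num and Cust Address are borrowed from the case study discussed later in this chapter, and the values are invented.

```python
def raise_parent(rows, determinant, dependents):
    """Third normalisation step: move the attributes determined by a
    non-key item into a parent entity keyed by that item."""
    parent = {}
    children = []
    for row in rows:
        values = {item: row[item] for item in dependents}
        # Each determinant value must fix one value of each dependent item.
        if parent.setdefault(row[determinant], values) != values:
            raise ValueError(f"{determinant} does not determine {dependents}")
        children.append({k: v for k, v in row.items() if k not in dependents})
    return parent, children


# Hypothetical Order rows in which Cust Address depends on Cust Num.
orders = [
    {"order_num": 1, "cust_num": 7, "cust_address": "1 High St"},
    {"order_num": 2, "cust_num": 7, "cust_address": "1 High St"},
    {"order_num": 3, "cust_num": 9, "cust_address": "5 Mill Rd"},
]

customers, orders_tnf = raise_parent(orders, "cust_num", ["cust_address"])
print(customers)      # {7: {'cust_address': '1 High St'}, 9: {'cust_address': '5 Mill Rd'}}
print(orders_tnf[0])  # {'order_num': 1, 'cust_num': 7}
```

The child keeps the determining attribute as a foreign key. The sketch raises an error if the sample data contradicts the assumed dependency.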
Fourth and fifth normal forms are discussed in Part Two. Boyce-Codd normal form is a variation of third normal form that eliminates possible anomalies where there are several candidate keys which share a common attribute, a complication that need not concern us here.
You will normally merge the end results of various data analysis exercises. You can merge any classes whose objects are in 1:1 correspondence; this is usually indicated by their having the same key.
Two of the classes in our little case study pick up an extra attribute from data analysis of other documents.
Notice that Cust Address has sneaked into the
After data groups have been merged, items will have been brought together in new combinations, so you should apply two tests to ensure that the resulting classes are in third normal form (TNF).
For example, is Cust Address really determined by Cust Num, and best
moved into the Customer class? Or perhaps the address is recorded afresh on
Data analysis is never as easy as presented on a training course, because you have to ask business analysis questions about how data changes over time, and whether historic data has to be remembered.
A relation is an aggregate of several attributes. Chapter 4 showed you might define each attribute as a key-only parent entity in its own right. What I call the ‘relation shape’ is a class with three or more parents.
For example, let us say a Product is a type of Ingredient with a unique combination of four other characteristics, each of them user-defined. You can define each attribute as a class in its own right, as shown below in a fraction of the full model:
Users will define the valid range of each attribute by creating objects of the parent entities. Suppose it turns out that users start to record products in the database that cannot actually exist in practice, products with invalid combinations of size and ingredient. Four solutions might be designed.
A weak solution is to place some kind of security constraint, using a password perhaps, on who is allowed to set up products in the system.
UI layer solution
A weak solution is to code the rules as constraints on data items where they are entered into the system, in the user interface code. This may prove difficult to maintain as the code is added to several data entry screens. A stronger solution is to record the constraints in reusable modules underlying the user interface.
Data services layer solution
A strong solution: specify validation rules applying to data items in a data dictionary attached to the database management system. Unfortunately, few database management systems come with a sufficiently clever data dictionary, one that can apply the rules dynamically to a live system. If you do have a clever enough data dictionary, then think of it as belonging in the Business services layer rather than the Data services layer.
Business services layer solution
A strong and practical solution: record the validation constraints as a cross-reference table, or link class, in the entity model of the Business services layer.
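A minimal sketch of the cross-reference idea, assuming a link class that records which sizes are valid for which ingredients. The names `valid_size_for_ingredient` and `create_product`, and all the sample values, are invented for illustration.

```python
class InvalidProduct(Exception):
    """Raised when a Product breaks a validation constraint."""


# Parent entities: users define the valid range of each attribute.
ingredients = {"Vanilla", "Cocoa"}
sizes = {"1kg", "25kg"}

# Link class: the combinations that can exist in practice (assumed values).
valid_size_for_ingredient = {
    ("Vanilla", "1kg"),
    ("Cocoa", "1kg"),
    ("Cocoa", "25kg"),
}

products = {}


def create_product(name, ingredient, size):
    """Create a Product only if its combination appears in the link class."""
    if ingredient not in ingredients or size not in sizes:
        raise InvalidProduct("unknown ingredient or size")
    if (ingredient, size) not in valid_size_for_ingredient:
        raise InvalidProduct(f"{size} is not valid for {ingredient}")
    products[name] = (ingredient, size)


create_product("P1", "Cocoa", "25kg")        # accepted
# create_product("P2", "Vanilla", "25kg")    # would raise InvalidProduct
```

Because the valid combinations are data rather than code, users can maintain them in the same way as any other objects in the entity model.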
The introduction of a V shape domain above the State Class currently looks like the best design option for most applications.
There are weaknesses in the relational view of the world.
Data-centred, database semantics are not handled
People have attempted to extend the relational model to accommodate business rules. These approaches are sometimes called ‘semantic entity modelling’. The difficulty is that these approaches tend to be so heavily data-centred that you have to think in rather abstract and difficult ways to discover and define the processing logic and processing rules.
In our view, object interaction and object behaviour analysis is a ‘semantic entity modelling’ approach, though you specify the rules on event models rather than the entity model itself. The data and process models are all part of one coherent conceptual model.
Objects are key-oriented rather than type-oriented
Relational theory does not account for mutually exclusive or optional data. I will show how object interaction and behaviour analysis deals with class hierarchies of super and subclasses.
Parents don’t know where their children are
Relational theory suggests that a parent entity should not know about its children. There are no tables or lists, only foreign keys. This minimises data redundancy, but access from parent to child involves a great deal of processing redundancy. This is a big factor in slowing down system performance. Where a database is distributed it is almost inconceivable that a parent entity should not somehow know where its children are.
This is an issue for the Data services layer, and you should not even have to think about it when defining the Business services layer!
How access from parent to child is achieved is a matter for the Data services layer. The database designer may implement a relationship in either relational or network style. In network databases, tables and lists are allowed, especially for storing relationships. Thus, while data redundancy is permitted, access from parent to child involves no redundant processing.
Aggregate objects cannot be stored
You cannot store objects without repeating data in a relational database, because relations must be in first normal form. I will show this is fine and correct for the Business services layer. The Business services layer must separate out the low-level normalised classes, partly for update efficiency and partly so that they can be viewed from many different perspectives.
You may however choose to design and process aggregate objects in the presentation and Data services layers. Some people use the term business object to describe an aggregate object in the UI layer. Once more, be careful not to confuse a database view with the database itself. Business objects in the UI layer must be decomposed for processing in the Business services layer.
To supplement data analysis and overcome the above weaknesses, object interaction and behaviour analysis techniques are needed.
There are two main techniques that complement entity modelling. Both help to validate and improve the entity model.
Chapter 7 introduces event modelling techniques that can be used in the Event Modelling face of the cube. The volume “The Event Modeler” goes into much more detail.
You can specify static and invariant constraints (applied in every case and unchanging) as properties of data. You can specify validation rules governing the ‘domain’ of a data item, and you can fix referential integrity rules by specifying the optionality and cardinality of relationships.
But some constraints are dynamic or changeable, so it is not appropriate to build them routinely into implementation database structures, or even a logical entity model.
E.g. English law lays down a number of constraints governing a wedding event: a marriage must relate two partners, no more, no less; one partner (the husband) is male; one partner (the wife) is female; both partners must be over 18 years of age; a person can have only one marriage at a time; a person can only have marriages in their sex of birth. And there are further preconditions to do with the notice period, the number of witnesses, the residential addresses of the partners, the location of the marriage, and so on.
You need ways to make all constraints explicit, not just referential integrity rules. In general, constraints are assertions about the actions that are possible. You prevent a data item from being entered, or a relationship from being established, by preventing an event from taking place. So you can specify all remaining constraints as preconditions on events.
Chapter 7 includes an illustration.
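As a rough sketch of the idea, some of the wedding constraints listed above can be written as preconditions that are tested before the event is allowed to update any object. The class `Person` and the function names are assumptions, and the rules are coded as the text states them, not as a statement of current law.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    sex: str            # "M" or "F"
    age: int
    married: bool = False


def wedding_preconditions(a, b):
    """Return the list of preconditions that the wedding event would break."""
    failures = []
    if {a.sex, b.sex} != {"M", "F"}:
        failures.append("one partner must be male and one female")
    if a.age <= 18 or b.age <= 18:
        failures.append("both partners must be over 18")
    if a.married or b.married:
        failures.append("a person can have only one marriage at a time")
    return failures


def wedding(a, b):
    """Apply the event only if every precondition holds."""
    failures = wedding_preconditions(a, b)
    if failures:
        raise ValueError("; ".join(failures))
    a.married = b.married = True


ann = Person("Ann", "F", 30)
carl = Person("Carl", "M", 40)
wedding(ann, carl)
print(ann.married, carl.married)   # True True
```

None of these rules is built into the data structure. Each is an assertion attached to the event, so it can be changed without restructuring the entity model.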
An overview of Event Modelling techniques that specify how events hit objects in the entity model.
Level of granularity
Validation of the entity model during analysis
You should validate the classes and relationships by testing that they support all the enquiries that users say they want to make of the required system. In simple cases, programmers can do this by defining an SQL query. At an early stage of analysis, especially for complex cases, it helps to define the enquiry model for the enquiry requirement.
To verify that atomic enquiries within the system functions can get the information they need by accessing entities - to test that every known output data flow (message, report or file) can be derived using the relationships in the Entity model - you can draw every enquiry access path as an enquiry model. This means defining the entry point object (identified by the input data parameters) and the navigation path along relationships to collect the required output information from other objects.
You can redraw the enquiry model from the perspective of the specific enquiry.
Notice that an enquiry process that follows this particular access path will find the same Customer many times.
An enquiry may perform redundant processing, retrieving more entities than are necessary for the required output data flow. If the enquiry is infrequently made, you may assume the output data flow will be sorted and duplicates removed.
However, if the enquiry is a primary system function, triggered many times a day, you may perhaps prefer to refine the Entity model so that no redundant accesses are made, by adding a derivable entity into the Entity model.
The new entity is not entirely a matter of performance optimisation. The fact that users enquire about ‘Customer Interest in Product’ so often shows that this associative entity (derivable though it may be) is a matter of concern to users in running their business.
If you do include such an entity, make sure the text description of the entity starts with the word DERIVABLE, and name the requirements that it is used for. Designers may choose either to store the entity as a database table, or to write relatively complex enquiry processes.
If there is only one route through the Entity model from the entry point, then the access path is obvious from the enquiry model. Otherwise, you have to specify which of several relationships are followed. You can draw arrows to show this.
Or you can draw the enquiry model from the perspective of the specific enquiry.
Note that one entity type may appear twice in an enquiry model playing different roles. One convention is to name the entity role in brackets after the entity name.
Don’t forget management and audit reporting requirements. Wherever you find historical facts are needed on output, you should include historical attributes and relationships in the Entity model. Occasionally, you might be led to include an extra entity, and restructure the entity model accordingly.
Triage in enquiry access path analysis
Only document those enquiry models that are not obvious from the specification of the output data flow and Entity model. Under pressure of time, analyse only the outputs of primary system functions.
An event is like an enquiry except that it updates one or more of the objects it hits.
Events are more complex than enquiries. Events require more careful analysis. You can use object behaviour analysis techniques to analyse events and define the behaviour of each class as a state machine composed of event effects.
The volume ‘Event modelling for enterprise applications’ shows it is not far from an event model diagram to either a procedural program, or to object-oriented programming.
For example, suppose a recruitment consultant wants to discover which Jobs are available for an Applicant. You must find out:
· what Skills the Applicant has,
· what Skill Type each Skill is classified under, and
· what Jobs are available under that Skill Type.
The graphical representation below shows how the relationships provide the path to select the objects that satisfy the enquiry.
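The same access path can be sketched in code. The class names follow the text; the attribute names, the `available` flag and the sample objects are assumptions made for illustration.

```python
# Hypothetical objects. Each Skill is a child of Applicant and of Skill Type.
skills = [
    {"applicant": "Ann", "skill": "Java", "skill_type": "Programming"},
    {"applicant": "Ann", "skill": "SQL", "skill_type": "Database"},
    {"applicant": "Bob", "skill": "Welding", "skill_type": "Fabrication"},
]
# Each Job is a child of Skill Type.
jobs = [
    {"job": "Developer", "skill_type": "Programming", "available": True},
    {"job": "DBA", "skill_type": "Database", "available": False},
    {"job": "Welder", "skill_type": "Fabrication", "available": True},
]


def jobs_for_applicant(applicant):
    """Entry point: Applicant. Path: Applicant -> Skill -> Skill Type -> Job."""
    skill_types = {s["skill_type"] for s in skills if s["applicant"] == applicant}
    return sorted(
        j["job"] for j in jobs
        if j["skill_type"] in skill_types and j["available"]
    )


print(jobs_for_applicant("Ann"))   # ['Developer']
```

Each step of the function follows one relationship in the entity model. If a relationship were missing, the corresponding step could not be written.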
A CASE tool can mechanically convert Figure 7w into Figure 7x.
A CASE tool can mechanically convert Figure 7x into a Jackson-style program structure, with read statements allocated at the correct points.
The ability of an entity model to support an enquiry access path (however it is documented) is very important. The access path tells you which relationships are needed to select objects. It also enables you, in physical design, to design a database which has records, representing classes, stored so as to provide efficient paths for retrieving information.
Our focus is on enterprise applications, but event modelling techniques are useful for other kinds of system - especially process control or embedded systems.
Methods for designing embedded systems normally focus on behaviour and process modelling techniques, and pay little or no attention to the entity model. But embedded systems do have an entity model.
The objects in a process control system may not be numerous or persistent enough to be stored in a database and connected therein by pointers. However, there is an entity model behind the scenes, and if you draw it, the model does tell you something. The relationships specify the paths along which objects may communicate.
How knowledge of patterns such as double-V shapes and diamonds can give productivity and quality benefits, helping people not just to draw entity models, but to get them right.
A class can participate in several relationships, either as parent or child. Fig. 1a shows a School is both the parent of many Pupils, and a child of one Local Authority.
The figure also shows three classes that appear in the shape of a triangle. This shape prompts an analysis question.
Why? First, the long relationship is redundant; it says nothing that is not said without it. Second, it may wrongly permit the end-user to attach the bottom-level child to two different parents via the direct and indirect routes. Later sections discuss the question of double parents in triangles and diamonds.
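A tool could find candidate triangles mechanically. The sketch below assumes the model is held as a set of (parent, child) pairs. The function `find_triangles` is hypothetical, and the long relationship from Local Authority to Pupil is added purely for illustration.

```python
def find_triangles(relationships):
    """Return (top, middle, bottom) triples in which the direct relationship
    top -> bottom runs in parallel with the route top -> middle -> bottom."""
    rels = set(relationships)
    return sorted(
        (top, mid, bottom)
        for (top, mid) in rels
        for (mid2, bottom) in rels
        if mid2 == mid and (top, bottom) in rels
    )


model = {
    ("Local Authority", "School"),
    ("School", "Pupil"),
    ("Local Authority", "Pupil"),   # assumed long relationship
}

print(find_triangles(model))   # [('Local Authority', 'School', 'Pupil')]
```

The result is a list of questions for the analyst, not a list of relationships to delete; some triangles are valid, as later sections show.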
Fig. 1b shows that adding Teacher into the picture creates a diamond shape.
The two sides of the diamond represent what might be called a ‘boundary clash’ between two conflicting ways for low-level objects to be grouped into a batch or collection.
E.g. Fig. 1c shows the Customer Interest in Stock class clusters all the Orders for the same combination of Customer and Stock.
You may discover a key-only relation as a result of applying relational data analysis to an output report. For example, a report of Orders listed by Customer within Stock, or by Stock within Customer, may lead you to specify Customer Interest in Stock as a key-only relation or sorting class.
You may also discover a derivable sorting class when defining an access path to create such a report.
Some database designers are obsessed with removing all derivable data from the entity model, careless of the expense of redundant programming effort and processing time. The three-tier architecture gives us an opportunity to reexamine this assumption.
Entity classes in the Business services layer
If the users’ requirement is for frequent reports that sort Orders by a combination of Customer and Stock, then the derivable sorting class surely belongs in the entity model. You can now specify simple enquiry processes that return the results the users want. You can code these enquiry processes in the Business services layer on the assumption that the derivable sorting class exists.
Soft classes in the Data services layer
What if the derivable sorting class is missing from the data storage structure? Perhaps the database designer rejects it, or you have inherited a legacy system without it?
This can complicate the specification or the coding of enquiries that generate the required reports. Since this complication is caused by the database designers’ requirements, not by users’ requirements, you should hide the complication from Business services layer processes, in the Data abstraction layer to the Data services layer.
The idea is that any enquiry process that wants to read a Customer-Interest-in-Stock object will call the Data abstraction layer to sort through the stored data, manufacture the object and return it. Such objects, manufactured by the Data abstraction layer rather than stored in the data storage structure, might be called ‘soft objects’.
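A minimal sketch of a soft object, assuming the stored data holds only Orders. The function name and the attribute names are invented for illustration.

```python
# Hypothetical stored Orders; no Customer-Interest-in-Stock table exists.
orders = [
    {"order_num": 1, "customer": "C1", "stock": "S1"},
    {"order_num": 2, "customer": "C1", "stock": "S1"},
    {"order_num": 3, "customer": "C2", "stock": "S1"},
]


def customer_interest_in_stock(customer, stock):
    """Data abstraction layer: manufacture the soft object from stored Orders.

    Returns None when no Order exists for the combination.
    """
    matching = [
        o["order_num"] for o in orders
        if o["customer"] == customer and o["stock"] == stock
    ]
    if not matching:
        return None
    return {"customer": customer, "stock": stock, "orders": matching}


print(customer_interest_in_stock("C1", "S1"))
# {'customer': 'C1', 'stock': 'S1', 'orders': [1, 2]}
```

An enquiry process in the Business services layer calls the function as if the object were stored; only the Data abstraction layer knows that it is derived.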
The notion that some derived data rightly belongs in the Business services layer runs against the received wisdom, so I return to soft objects and derived data in later chapters.
Is there any similarity between the two entity models in Fig. 1d?
It is hard for us to spot that these unstructured models are two instances of the same general pattern. Tools don’t care how messy the diagram looks, but people do.
A tool can help us by redrawing the models in a hierarchically structured fashion, placing the parent of each relationship above the child. If the analyst always draws the relationship from the parent to the child, and the tool constrains them to draw one-to-many relationships in this direction, then the tool can easily remember which end of a relationship is parent and which is child.
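One way a tool might decide where to place each class is to give it a level one below its lowest parent. The sketch below assumes the model is a list of (parent, child) pairs with no cycles; the function `hierarchy_levels` is hypothetical, and the long relationship in the example is added for illustration.

```python
def hierarchy_levels(relationships):
    """Assign each class a level so that every parent sits above its children.

    Level 0 is the top of the diagram. Assumes the relationships contain
    no cycles.
    """
    parents = {}
    for parent, child in relationships:
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())
    levels = {}

    def level(cls):
        if cls not in levels:
            # One level below the lowest parent; classes with no parent get 0.
            levels[cls] = 1 + max((level(p) for p in parents[cls]), default=-1)
        return levels[cls]

    for cls in parents:
        level(cls)
    return levels


model = [
    ("Local Authority", "School"),
    ("School", "Pupil"),
    ("Local Authority", "Pupil"),   # assumed long relationship
]

print(hierarchy_levels(model))
```

In this example Pupil lands on level 2 rather than level 1, because it must sit below School as well as below Local Authority.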
The analyst may request: ‘Please reshape my diagram for me in a hierarchical fashion.’ A tool can respond by redrawing the two models as in the Fig. 1e below.
(By the way, algorithms that try to avoid crossing lines become increasingly useless as the complexity of a network diagram grows.)
After a person or a tool has rearranged the diagrams hierarchically, it is much easier to see these are both examples of the double-V shape shown in Fig. 1f.
This shape is a generative pattern that prompts you to ask an analysis question.
The analyst may request: ‘Please highlight or report on any double-V shapes for me.’ A tool might respond by thickening or colouring the questionable relationships, then ask the analyst the following question.
E.g. Fig. 1g shows a book can only be loaned to someone who is a member of a library; and a time sheet must be submitted by an employee within an employment.
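Highlighting such shapes is a simple search for pairs of parents that jointly own two or more children. In the sketch below the function `find_double_vs` is hypothetical; the class names are borrowed from the case study later in this chapter, and the relationship list is an assumed reading of it.

```python
from collections import defaultdict
from itertools import combinations


def find_double_vs(relationships):
    """Return {(parent_a, parent_b): [children]} for every pair of parents
    that jointly own two or more children."""
    parents_of = defaultdict(set)
    for parent, child in relationships:
        parents_of[child].add(parent)
    shared = defaultdict(list)
    for child, parents in parents_of.items():
        for pair in combinations(sorted(parents), 2):
            shared[pair].append(child)
    return {pair: sorted(kids) for pair, kids in shared.items() if len(kids) >= 2}


children = ["Interest in Product", "Enquiry", "Quote", "Purchase Order"]
model = [(parent, child) for child in children for parent in ("Product", "Contact")]

print(find_double_vs(model))
# {('Contact', 'Product'): ['Enquiry', 'Interest in Product', 'Purchase Order', 'Quote']}
```

The search only reports where the question should be asked. Whether the children are in fact related to each other remains a matter for the analyst and the users.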
Hierarchical arrangement makes it easier to see triangles. After asking the earlier question about triangles, you are left with two Y-shape structures.
The classes at the heart of the Y shapes in Fig. 1h represent real-world entities. Users create objects of these classes in order to constrain the creation of objects of the class at the bottom. But there is another kind of Y shape.
The examples so far have revealed two kinds of Y shape. Fig. 1i shows the class at the heart of the Y shape can be either a domain class or a derivable sorting class.
Objects of a domain class are created by users. The domain class at the heart of a Y shape might represent a business entity with attributes of its own (like Membership and Employment in Fig. 1h), or it might be no more than a key-only link class that relates its two parents.
Objects of a derivable sorting class can be derived from the existence of child objects. An example of a derivable sorting class appears as part of the solution to the problem described in the next section. But first, a warning that some of our patterns can appear in disguise.
The basic patterns or shapes can be obscured by intermediate classes. Fig. 1j includes a double-V shape, even though the Job class sits in the middle of one side of one of the two V shapes.
Readers may like to consider ways to resolve this double-V shape for themselves. A possible refinement of the structure appears later in this chapter.
The idea of teaching patterns is that analysts should save money by getting the system right first time. But the patterns are just as useful if you are trying to correct or improve a system that isn’t working correctly. What follows is based closely on a real example.
The business has an enterprise application for recording what it does to meet customers’ needs. The business supplies ingredients to food manufacturers. Ingredients are packaged in various ways, by size, quality and so on, to make distinct products, each with a distinct price. People (‘Contacts’ below) enquire about products. They may be sent a brochure and/or samples. They ask for quotes; they are given prices for specific products. They place purchase orders for a quantity of product at either the current price (an attribute of product) or the price given to them in an earlier quote.
The manager asked for our help. He had already set up a database, using an application generator, to record customers’ orders, and requests for information about products. Fig. 1k shows the structure of the database.
The manager had quickly generated a system to maintain this database, but problems were now being experienced with the quality of the information in it. The problems centred on the multiple-V shape, that is, the four child entities owned by the same two parents, Product and Contact.
The historical record of a contact’s interest in a product was patchy, incomplete and out-of-date. Users forget, or cannot be bothered, to set up an Interest in Product record every time they record an Enquiry, Quote or Purchase Order.
Spotting the multiple-V shape prompts us to ask the question: ‘Is an Interest in Product related to the various possible reasons for that interest?’ Of course it is. Fig. 1l partially resolves the double-V shape by setting up explicit relationships in the data structure.
An Interest in Product record is now created automatically, whenever the detail of an Enquiry, Quote line or Purchase Order Line is recorded for a new combination of Contact and Product.
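A minimal sketch of the automatic creation, shown for Enquiry only. The function `record_enquiry`, the attribute names and the sample values are assumptions.

```python
interests = set()   # Interest in Product, keyed by (contact, product)
enquiries = []


def record_enquiry(contact, product, detail):
    """Record an Enquiry; the parent Interest in Product is created on the
    first use of a Contact and Product combination."""
    interests.add((contact, product))   # no effect if the pair already exists
    enquiries.append({"contact": contact, "product": product, "detail": detail})


record_enquiry("Acme Foods", "Vanilla 1kg", "brochure requested")
record_enquiry("Acme Foods", "Vanilla 1kg", "sample requested")
print(len(interests), len(enquiries))   # 1 2
```

Users no longer have to remember to set up the Interest in Product record, so the historical record cannot fall behind the Enquiries that justify it.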
Note that the model does not match the pattern in Fig. 1i in one way; objects of the new derivable sorting class need not have any children.
Fig. 1m illustrates the transformation described in the volume ‘Introduction to rules and patterns’ whereby you might elaborate the model to show the rule that there must be at least one ‘Reason for Interest’.
The trouble with introducing this rule is that it constrains us never to maintain an object of the class Interest in Product without a reason. The ‘at least one child’ rule is more rigid than is required by this business, so I will relax it again.
There is still a multiple-V shape in the model. The three bottom-level classes are all owned by both Product and Contact parents.
End-users cannot record for historical analysis whether the price they give for an order line is the current price, or the price given on an earlier quote (they have some discretion to price order lines in either way). Also, they lose track of which quotes have been successful, that is, which quotes have resulted in orders. Fig. 1n resolves the multiple-V shape.
Further analysis of the child entities jointly owned by both Product and Contact may lead you to ask: Are users interested in whether a quote line results from an enquiry? or an enquiry led to a quote? or a quote resulted in an order? If so, you might add further relationships to the model. The exclusion arcs show that not all order lines come from quote lines, and not all quote lines stem from enquiries.
A third reported problem in the case study centres on another kind of pattern. The reported problem is that users are recording products in the database that cannot actually exist, products with impossible combinations of size and ingredient. Fig. 1o shows a pattern I call the relation shape.
Fig. 1p introduces a V shape domain class.
The introduction of a V shape domain class above the relation currently looks like the best design option for most applications.
It is important to realise that not all triangular or double-V shapes are bad. It would be a mistake for a tool to automatically remove all such structures from a specification. Below are three cases where a triangle is a valid structure.
Children with optional parents
Fig. 1q shows a triangle that is valid because one of the short indirect relationships is optional at the bottom end.
This case is well-known and has been illustrated by many others. Cases where all relationships are mandatory at the bottom end are more interesting.
Fig. 1r shows a triangle that is valid because there is a current 1:N relationship in parallel with a historic N:N relationship. The current relationship to the link class is monochronous (one at a time); the historic relationship to the link class is polychronous (several at a time).
Some argue the current relationship is redundant because it is a subset of the historic relationship; but removing the current relationship creates redundant processing.
Without it, to find the current Department of an Employee you have to hunt through the historic memberships for the latest one, and then perhaps check that it is still active. This redundant processing is avoided by making the current relationship explicit.
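The difference in processing can be sketched as follows. The structure of the membership records, the dates and the function name are invented for illustration.

```python
from datetime import date

# Historic N:N link class between Employee and Department (assumed layout).
memberships = [
    {"employee": "E1", "department": "Sales",
     "start": date(2001, 1, 1), "end": date(2003, 6, 30)},
    {"employee": "E1", "department": "Finance",
     "start": date(2003, 7, 1), "end": None},
]

# Explicit current 1:N relationship.
current_department = {"E1": "Finance"}


def current_from_history(employee):
    """Hunt through the history for the latest membership, then check that
    it is still active."""
    own = [m for m in memberships if m["employee"] == employee]
    if not own:
        return None
    latest = max(own, key=lambda m: m["start"])
    return latest["department"] if latest["end"] is None else None


print(current_from_history("E1"))   # Finance
print(current_department["E1"])     # Finance
```

Both routes give the same answer, but the first reads every membership of the Employee while the second is a single access.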
Fig. 1s shows a triangle that is valid because the bottom-level child may have two different top-level parents.
Specifying the constraint that parents are the same
To say that a Task can only be done in the same Department that the Employee is contracted to, you should remove the long direct relationship.
Specifying no constraint.
To put no constraint on what Department a Task is done in, you can define the Task as having two Department attributes, one direct and one via Employee.
Specifying the constraint that parents are different.
To say (bizarrely) that a Task can never be done in the same Department the Employee is contracted to - you specify the constraint by defining the Task with two Department attributes (foreign keys inherited by different routes) with the rule that these cannot match each other.
There is another way to specify the last rule - that a Task can never be done in the same Department the Employee is contracted to. Fig. 1u shows that you can introduce a V shape domain class.
Introducing a V shape domain class in this case is an exceedingly clumsy solution, because all but one Department is valid for each Employee. Constraints that exclude a single value from a range are normally specified as a rule restricting the domain of an attribute of a class, as shown on the previous diagram.
However, multi-value constraints are normally better specified in the form of relationships. If there were a range of Departments for which an Employee is allowed to do a Task, then the structure above would be a good specification of this constraint.
You cannot remove any of the relationships in a diamond shape (unlike a triangle shape) without loss of information from the specification. But you should still
Fig. 1v shows that a Fire Appliance can be related to two Counties: the County of the Incident that the appliance is attending, and the County of the Fire Station at which the appliance is based.
Fig. 1w shows the answer to the earlier question. It includes a diamond shape. So how do you specify whether the Interview has only one Skill Type, or may have two?
Specifying the constraint that parents are the same
To say that an Interview can only be arranged for a qualified Applicant who has the same Skill Type as that of the Job, you can define the Interview as having only one Skill Type attribute (the same foreign key inherited by different routes).
Specifying no constraint
To say that there is no rule on whether an Applicant must be qualified for a Job or not, you can define the Interview as having two Skill Type attributes, without any constraint on their values.
Specifying the constraint that parents are different
To say (bizarrely) that an Interview can only be arranged for an unqualified Applicant, you can define the Interview with two Skill Type attributes (foreign keys inherited by different routes) with the rule that these cannot match each other.
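The three options for the Interview can be sketched as one validation function with a switchable rule. This is only an illustration under assumed names (ParentRule, Interview, interview_is_valid); in a real schema the 'same' rule would normally be enforced by holding a single Skill Type attribute, as described above, rather than by comparing two.

```python
from dataclasses import dataclass
from enum import Enum


class ParentRule(Enum):
    SAME = "parents must be the same"
    NONE = "no constraint"
    DIFFERENT = "parents must be different"


@dataclass(frozen=True)
class Interview:
    applicant_skill_type: str  # foreign key inherited via Applicant
    job_skill_type: str        # foreign key inherited via Job


def interview_is_valid(interview, rule):
    """Apply one of the three diamond-shape constraints to an Interview."""
    same = interview.applicant_skill_type == interview.job_skill_type
    if rule is ParentRule.SAME:
        return same
    if rule is ParentRule.DIFFERENT:
        return not same
    return True  # ParentRule.NONE: any combination is accepted


qualified = Interview("Welding", "Welding")
unqualified = Interview("Welding", "Plumbing")
```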
By the way, the classic example of a diamond shape or boundary clash is
the ‘Telegrams problem’ described by
The need for this kind of two-pass serial file processing has been
reduced by the introduction of network databases that can impose many clashing
hierarchical structures on the underlying data. In terms of an entity model,
A later chapter discusses another design issue raised by the diamond shape - the possibility of a process that travels from top to bottom, or vice-versa, via two different routes.
Some relatively advanced techniques for analysing data structures, including reasons to contravene 4th and 5th normal forms by maintaining derivable sorting classes.
You may find, perhaps as a result of relational data analysis, that some classes have compound keys, but there are no parent entities with elements of the key.
Fig. 2a shows V shapes you can generate from the classes Holiday Feature and Client Requirement in a Travel Agency
Fig. 2b shows V shapes you can generate from the classes Patient Admission and Employment Contract in a hospital system.
Fig. 2c shows V shapes you can generate from the classes Task and Course Booking in a personnel system.
Given a three-way compound key, try transforming it into either a double Y shape or a triple Y shape, as shown below.
E.g. Suppose Surgical Operation has a compound key of Patient, Hospital and Surgeon (perhaps date and time ought to be included as well, but I shall gloss over this). You may draw a shape with three simple key classes, and two or three two-way key classes.
Assuming all Surgeons in a Hospital are allowed to operate on all Patients in the Hospital, analysis may reveal the classes shown in Fig. 2d.
Fig. 2e introduces an extra class to model the constraint that Surgeons in a Hospital can only operate on a Patient in the Hospital after the Patient and Surgeon have both signed a consent form.
The three-way key class is necessary. A Surgical Operation records an event in the real world and it has attributes of its own. But some three-way key classes are redundant. They result from data analysis of a poorly designed input or output document, where there ought instead to be two or three two-way keys.
Reducing to fourth normal form means replacing a derivable three-way key class by two classes with two-way keys. Fourth normal form is most easily explained in terms of a pattern.
E.g. Fig. 2f shows that the Suitable Holiday class is merely a product of matching Holidays against Client Requirements. It can be derived at any time, and need not be placed in the model.
By way of contrast, Fig. 2g shows that the Holiday Booking class below is not merely a product of matching Holidays against Client Requirements. It is a record of an event in the real world that users want to remember.
The Suitable Holiday and Holiday Booking classes give rise to a double V shape, resolvable in the normal way, as shown later.
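To make the idea of a derivable three-way key class concrete, here is a small sketch in which Suitable Holiday is computed by joining two two-way key classes. It assumes that the join is on a feature type shared by Holiday Feature and Client Requirement; the data and the function name derive_suitable_holidays are invented for the example.

```python
# Hypothetical two-way key classes, each held as a set of pairs.
holiday_features = {
    ("Alpine Lodge", "Skiing"),
    ("Alpine Lodge", "Sauna"),
    ("Beach Villa", "Sailing"),
}
client_requirements = {
    ("Jones", "Skiing"),
    ("Smith", "Sailing"),
    ("Smith", "Sauna"),
}


def derive_suitable_holidays(features, requirements):
    """Join the two classes on feature type to give (client, holiday, feature) triples."""
    return {
        (client, holiday, feature)
        for holiday, feature in features
        for client, wanted in requirements
        if feature == wanted
    }


suitable = derive_suitable_holidays(holiday_features, client_requirements)
```

Because every triple can be rebuilt from the two parents at any time, storing it adds no information. A Holiday Booking could not be rebuilt this way, which is why it has to stay in the model.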
A double Y shape may be incomplete at the top. Its essence is the V at the bottom - a class with a unique compound of three attributes that appear in parent entities as unique two-way compounds.
E.g. Fig. 2h shows that if each
E.g. Fig. 2i shows that the Suitable Holiday class below is not merely a
product of matching Holidays against Client Requirements. It is constrained
also by the need for the Client to express an interest in the
It can be derived from joining all three parents, and may be discarded from the model.
Designers are familiar with tradeoffs between:
• minimising redundant processing versus minimising redundant data
• simplifying enquiry processes versus simplifying update processes.
Theoreticians tend to advocate the latter option in each case. They say to eliminate all redundant data, including derivable key-only classes, and to minimise update processing. They don’t say these options may conflict with each other. Consider the derivable sorting class called Suitable Holiday in the entity model below.
Since the system is designed to produce reports of Holidays suited to Clients, and reports of Clients suited to Holidays, the derivable sorting class will be useful.
Obviously, it will simplify and speed up enquiry processes. Without it, you will repeatedly have to manufacture Suitable Holiday objects in views of the data structure that users request for presentation. And you might have to account in some way for earlier Holiday Bookings on a Suitable Holiday.
Less clearly, the Suitable Holiday class can also simplify and speed up
update processes. When a Holiday Booking is made, you can more easily check any
history of previous Holiday Bookings. When a Client makes a Holiday Booking,
you can more easily locate and check any Holiday Booking already made for the
same compound of Client and
Overall, it may prove cheaper to maintain Suitable Holiday as a sorting class than to leave it out. This contravenes the established view of physical database design. See the chapter ‘Clashing entity models’ for discussion of how and why you might maintain a derivable sorting class in the entity model rather than the data storage structure.
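The trade-off can be sketched as follows: a little extra work on each update keeps the derivable sorting class ready for enquiries. The class HolidayCatalogue and its methods are assumptions made for illustration, and the sketch handles insertions only; deletions would need matching logic.

```python
from collections import defaultdict


class HolidayCatalogue:
    """Keeps a redundant Suitable Holiday class in step with its two parents."""

    def __init__(self):
        self.holidays_by_feature = defaultdict(set)
        self.clients_by_feature = defaultdict(set)
        self.suitable = set()  # (client, holiday, feature) triples

    def add_holiday_feature(self, holiday, feature):
        self.holidays_by_feature[feature].add(holiday)
        # Extra update work: match the new feature against existing requirements.
        for client in self.clients_by_feature[feature]:
            self.suitable.add((client, holiday, feature))

    def add_client_requirement(self, client, feature):
        self.clients_by_feature[feature].add(client)
        # Extra update work: match the new requirement against existing features.
        for holiday in self.holidays_by_feature[feature]:
            self.suitable.add((client, holiday, feature))

    def holidays_for(self, client):
        """Enquiry reads the maintained class; no join is needed at query time."""
        return sorted({h for c, h, _ in self.suitable if c == client})


catalogue = HolidayCatalogue()
catalogue.add_holiday_feature("Alpine Lodge", "Skiing")
catalogue.add_client_requirement("Jones", "Skiing")
catalogue.add_holiday_feature("Glacier Hut", "Skiing")
jones_holidays = catalogue.holidays_for("Jones")
```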
Where a class has a list of similar attributes, you can generalise these attributes into a relationship. Fig. 2k shows an example drawn from Assenova and Johannesson.
The transformation in Fig. 2k is not very common in practical system design. Fig. 2l shows two more common transformations.
The patterns are discussed separately on the next page.
E.g. consider the three totals recorded in the Paper class in Fig. 2m. You might show the common properties of the three attributes by relating all three attributes to a single domain class. The resulting shape is called Tramlines.
E.g. Fig. 2n shows the transformation of the tramlines in Fig. 2m.
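One possible reading of generalising similar attributes into a relationship is sketched below: three totals held as columns on Paper become three child objects, each classified by a total type. The attribute names (total_pages, total_words, total_figures) and the classes PaperWithTotals and PaperTotal are hypothetical, since the figures do not name the totals, and the figures may show a different arrangement.

```python
from dataclasses import dataclass

TOTAL_TYPES = ("pages", "words", "figures")  # assumed kinds of total


@dataclass
class PaperWithTotals:
    """Before: one attribute per total."""
    title: str
    total_pages: int
    total_words: int
    total_figures: int


@dataclass
class PaperTotal:
    """After: one child object per (paper, total type)."""
    paper: str
    total_type: str
    value: int


def generalise(paper):
    """Turn the list of similar attributes into a list of related objects."""
    return [
        PaperTotal(paper.title, kind, getattr(paper, f"total_{kind}"))
        for kind in TOTAL_TYPES
    ]


rows = generalise(PaperWithTotals("Entity patterns", 12, 4800, 9))
```

A new kind of total then needs only a new total type, not a new attribute on Paper.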
Finally I come to a shape you often see in large business databases - a core entity, surrounded by many parents and many children. Fig. 2o shows this as the X shape.
I call this the X shape (in line with the V, W and Y shapes) but you might better call it a star shape, since it can have many points, perhaps a dozen parents and a dozen children.
This is a rather vaguely-defined shape and a rather vague question. I don’t say how many points the X shape must have before it is likely to reveal significant missing constraints. Nor do I prescribe what to do in response to the question. Further research may reveal further rules of thumb in this area.
Designing an entity model for maintenance, anticipation of amendments.
Surveys tell us that maintenance costs far outstrip initial development costs; 70-30 is a proportion often quoted. Some hold out ‘design for maintenance’ as a primary goal of system development.
Analysis patterns can help you to design for maintenance and facilitate amendments.
But maintainability cannot be the primary goal. Correctness must be the primary goal. You should strive to get the system right this time, not next time. If you don’t strive, you won’t succeed. And if you don’t succeed, you’ll have to spend more on ‘maintenance’ later.
Other surveys tell us it is cheaper to correct errors sooner rather than later. It is obviously much cheaper to revise analysis documentation than program code in a working system. So people have proposed ways of exposing errors in analysis and design as early as possible.
One way is to follow an analysis and design methodology that produces graphical design documentation. Current methodologies have many weaknesses. Above all, they lack effective quality assurance mechanisms. It is no use having paper mountains of analysis and design documentation if nobody can tell whether the documentation is any good or not, and programmers throw most of it away.
Analysis patterns provide a solution to this problem; they provide quality assurance questions.
Another way is to follow the path of ‘iterative development’, rapidly producing prototypes of parts of the required system. Prototyping makes design results more concrete, more visible, so you can more easily see if designers are going in the wrong direction and head them off.
An enterprise application will not be entirely right first time. Some amount of trial and error is necessary. Some amount of iterative development is inevitable. But setting out with the objective of delivering a wrong system, then developing it by trial and error, is likely to add time and costs to the overall project.
Iterative development stretches the costs of development over smaller cycles. In effect, it moves maintenance (which we know to be expensive) into the development phase. Change control and configuration management become bigger issues. So if you iterate more than a couple of times, the overall project will cost more and take longer.
Iterative development runs counter to design for maintenance. Designers who are focussed on the next small increment won’t take a long-term view. The code will grow haphazardly with each iteration into a pile of spaghetti that is hard to maintain. Agilists consequently promote “refactoring”. And of course, good design up front will reduce refactoring costs.
Iterative development encourages low expectations. Designers who think it normal and acceptable to deliver unfinished code will not strive hard enough to get the system right before giving it to users. Designers have an excuse to escape from their responsibility to do their best work.
Is there a credible way to improve on iterative development? Current methodologies are failing us. We lack a methodology that embodies professional expertise about designing for correctness and designing for maintenance.
Specification and design patterns address this problem; they encourage right-first-time design and can reduce maintenance effort.
In one sense there is no such thing as maintenance, there is only further development. The things you have to do in maintenance are the same as you have to do in development.
If it means anything, design for maintenance means designing the current system in a different way from how you would design it if no changes were ever expected.
It is meaningless to design for maintenance per se. Flexibility in every direction is impossible. Changes come from many different angles. You have to decide what changes are likely, and design with those changes in mind.
There are three basic design for maintenance strategies.
Some changes are due to new technology, perhaps a new database management system or new user interface management system. The way to anticipate changes in technology is to isolate, as far as possible, those parts of a system that are technology-specific.
Other changes are due to new user requirements: people changing their mind about the way they want the system to operate. New user requirements may be subdivided into ‘correctness’ requirements and ‘usability’ requirements. The way to anticipate changes in these requirements is to isolate, as far as possible, those parts of a system that are specific to specific kinds of requirement.
You can separate these concerns using the high-level analysis pattern of the 3-schema architecture. This architecture divides an enterprise application into subsystems that isolate different areas you may want to change.
Business services layer: business rules and constraints
Data services layer: database management system
You can anticipate exceptional cases by not constraining the system to accept only normal cases. This can prove counter productive. Some of the tradeoffs are discussed in the next section.
You can generalise the design so that it is easier to accommodate new cases and reconfigure the system with new rules. See the section after next.
After you have specified constraints within the Business services layer of code, you may find you have to relax them to deal with exceptions. Users often submit maintenance requests asking for the freedom to break the normal rules, record unusual cases not previously envisaged.
A natural reaction is to relax the constraints on data entry. This increases the danger of incorrect system usage and gives users the opportunity to screw up the system. Users need a system that constrains data entry and prevents garbage from being stored in the database.
The trouble is that designing for exceptions, rather than reducing the constraints on the current system, can make the system considerably more complex.
Fig. 3a shows the entity model of the Marriage Registration system, introduced in the volume ‘Introduction to rules and patterns’.
One problem might be that changing a Person’s recorded sex would automatically invalidate all previous Marriages of that Person. All historic Marriages for that Person would now be in a state inconsistent with the rules of the system - recorded as being between two people of the same sex.
The solution is to record a Person’s sex of birth separately from their current sex, and apply the validation constraint only to their sex of birth.
This tiny Marriage Registration system has caused much debate in our tutorials, on grounds ranging from design and coding style, to culture and political correctness. Please don’t be offended if I go on to illustrate laws and societies you disapprove of.
An exclusion arc over the relationships implies that the class at the focus of the arc may be divided into subtypes. In this case, the two subtypes are man and woman.
Suppose the Marriage Registration system is bought by a country where sex changes are illegal and unrecognised. You might specify a fixed class hierarchy as in Fig. 3b.
It is normal for an entity state record to belong to many types. You might regard sex and job title as types of a Person.
Types are not normally represented as class hierarchies in the entity model of an enterprise application. One reason is that types often turn out to be additive rather than mutually exclusive - a Person can have more than one job title at once - a bisexual Person might be recorded as having two sexes.
Another reason (the one that applies here) is that with the passage of time, an entity may change its ‘type’ many times. You may reasonably expect that most if not all of an entity’s types can be altered during the life history of an object.
The longer an object persists, the more that a type (even one as fixed in real life as male or female) tends to become a temporary state.
I don’t think of the object as changing class each time one of its types is updated. I think of it as remaining of the same class, but changing its state. Where a type change or state update constrains the future behaviour of an object, this is most naturally specified as a state-transition in the life history. So the type becomes a state variable.
You can specify the cyclical alternation between sex roles as state changes within the state machine of a Person. This state machine will record the current state, the current sex role, but not remember past ones.
Suppose the system is bought by a country where transsexuals are allowed to contract a marriage in their new sex. After a few months, the users submit an amendment request:
“Can we please be allowed to record the exceptional case where, over time, a Person plays both husband and wife roles in different Marriages?”
You might simply erase the exclusion arc constraint, as shown in Fig. 3c.
Is it worth removing the constraint on transsexuals remarrying under the new sex, for the sake of just one or two individuals?
In general, relaxing a constraint may cause more trouble than it saves, by allowing some normal cases to be erroneously recorded as exceptions. You have to trade-off giving the end-users freedom to process rare cases, against specifying constraints that maintain the quality of stored data for the normal cases.
In this case, the danger is slight. A Marriage is still defined as connecting one Person of each sex. So users must change the recorded sex of a Person before they can record a Marriage under their new sex role. It would be difficult to do this by chance, in error.
Again, however, changing a Person’s recorded sex would automatically invalidate all previous Marriages of the same Person. These would now be in a state inconsistent with the rules - recorded as being between two Persons of the same sex.
The previous solution, of recording a Person’s sex at birth separately from their current sex, only works if a Person can only change sex once. The proper solution is to record the life history of a Person’s sexual roles, and attach each Marriage to the period of time that they play a given sex role.
To keep a history of a Person’s sex changes, and record the Marriages contracted within each sex role, you should extend the entity model as in Fig. 3d.
Is it worth enriching the specification to record history? Yes if the user wants to be able to inspect past occurrences of an object’s state. Yes if it helps you maintain the constraints on system behaviour. It is now possible to change a Person’s sex and record new Marriages, without invalidating all their previous Marriages.
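A rough sketch of this history-keeping structure follows. Each Person owns a sequence of sex role periods, and each Marriage is attached to the period current when it was contracted, so a later change adds a period instead of overwriting an attribute. All names (SexRolePeriod, start_period, record_marriage) are assumptions for illustration and are not taken from Fig. 3d.

```python
from dataclasses import dataclass, field


@dataclass
class SexRolePeriod:
    sex: str
    marriages: list = field(default_factory=list)  # Marriages contracted in this period


@dataclass
class Person:
    name: str
    periods: list = field(default_factory=list)

    def start_period(self, sex):
        """Record a new sex role without touching earlier periods."""
        self.periods.append(SexRolePeriod(sex))

    @property
    def current(self):
        return self.periods[-1]


def record_marriage(a, b):
    """The rule is still checked, but against the periods current at the time."""
    if a.current.sex == b.current.sex:
        raise ValueError("a Marriage must connect one Person of each sex")
    marriage = (a.name, b.name)
    a.current.marriages.append(marriage)
    b.current.marriages.append(marriage)
    return marriage


pat, sam, lee = Person("Pat"), Person("Sam"), Person("Lee")
pat.start_period("M")
sam.start_period("F")
lee.start_period("M")
record_marriage(pat, sam)
pat.start_period("F")      # the earlier Marriage stays attached to the first period
record_marriage(pat, lee)
```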
Can you have it both ways? Can you place constraints on the normal cases, yet also give end-users the freedom to process rare cases? Yes, but at considerable expense.
You might design the user interface so that the user is presented by default with the normal case - the possibility of entering a Marriage between two people in their sex at birth. To enter an exceptional case, the user must make a conscious effort to pop up a menu and select an entry for entering an exceptional case - Marriages involving one or more Transsexuals.
Fig. 3e enriches the entity model to show all possible valid types of Marriage as distinct classes.
In Fig. 3e, more than 50% of the design effort is devoted to handling what are likely to be much less than 1% of the cases.
Is it worth enriching the specification to distinguish normal cases from exceptions? You should present the development costs for users to decide.
So far, users must change the recorded sex of a person before they can record a marriage under their new sex role, since a marriage is still defined as having one partner of each sex.
You can anticipate more exceptional cases by giving control over the system’s rules to the users. Fig. 3f generalises the specification so that users can define new kinds of marriage, with new combinations of sexes, or even more than two people.
Fig. 3f is only an illustration, not a serious design. I look further at generalising classes in the next section.
Focusing on the specification of constraints within the Business services layer of code, how do you anticipate changes, design for ease of amendment, ahead of time?
You can anticipate changes in requirements by generalising aspects of the design. There is a trade-off, however. Generalisation can make a system harder to understand, harder to program, and possibly harder to use.
A rigid hierarchy
Imagine a personnel system that records a company’s organisation hierarchy. You might specify an entity model of the kind in Fig. 3g.
A rigid hierarchy is a generative pattern. You should ask: Is it possible for levels of the hierarchy to be omitted?
Suppose you find out the system has to record Companies that don’t have Departments, Divisions that don’t have Departments, and Company Employees who are not allocated to any Division or Department. Fig. 3h shows a more generic model.
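One common way to build such a generic model is a single recursive class in place of the fixed levels. The sketch below assumes that approach; Fig. 3h may differ, and the names OrgUnit, level and path are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrgUnit:
    """One generic class replaces Company, Division and Department."""
    name: str
    level: str                          # e.g. "Company", "Division", "Department"
    parent: Optional["OrgUnit"] = None  # any unit may sit under any other


@dataclass
class Employee:
    name: str
    unit: OrgUnit  # an Employee may be attached at any level


def path(unit):
    """List the unit names from the top of the hierarchy down to this unit."""
    names = []
    while unit is not None:
        names.append(unit.name)
        unit = unit.parent
    return list(reversed(names))


acme = OrgUnit("Acme", "Company")
retail = OrgUnit("Retail", "Division", acme)
tills = OrgUnit("Tills", "Department", retail)
ann = Employee("Ann", tills)
bob = Employee("Bob", acme)  # not allocated to any Division or Department
```

Note what is lost: nothing in the structure stops a Company being placed under a Department, so that rule would have to be added as extra validation code.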
The entity model is smaller and more flexible. On the other hand, the system is harder for designers to work with, and it is more difficult to give users the same kind of usability.
Imagine a vehicle licensing system that records the various reasons why a Person is related to a Vehicle. You might specify an entity model of the kind in Fig. 3i.
But how many other reasons are there to relate a Person to a Vehicle? What about Thief? Damager?
Again, the entity model is smaller and more flexible. On the other hand, the system is harder for designers to work with, and it is more difficult to give users the same kind of usability.
Imagine a simple accounts system that records sales. You might specify an entity model of the kind in Fig. 3k.
The model is clear and specific about what entities are to be recorded in the system. If you translate this model into a data storage structure, then designers can easily write enquiry programs that report on all the sales for one customer, or all the sales of one stock type. Both designers and end-users can readily see what the classes are for, and use them correctly.
But suppose your prime design objective is flexibility. Your brief is to make sure the system can be extended to record new entity types (a Return of Goods, a Salesman), and heaven knows what else.
To accommodate future requirements, you might specify only a few generic
classes: ‘Contact’ instead of Customer and Supplier and ‘Stock Transaction’
This model is more flexible. To record a Salesman, all you need to do is extend the range of ‘types’ allowed for a Contact. You don’t have to change the structure. On the other hand:
• there is more danger of giving the end-users a system that fills up with garbage. It is easy to imagine people mistakenly entering Customers as Suppliers, or Sales as Purchases. Designers will have to work that much harder to constrain how the system is used and give users the same degree of usability.
• the system is no longer so easy for designers to work with. The
programming is more complex. You will have to write extra code to test the
contents of a Stock Transaction object, to find out what subclass it really is
• the system’s performance may be degraded, because events and enquiries that require access to all the objects of a logical class (all Sales), will have to trawl through all the objects of the physical class (all Stock Transactions).
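The drawbacks listed above can be seen in a few lines of code. In this sketch a single generic Stock Transaction class carries a type attribute (here called kind, an assumed name), so every enquiry has to test it and scan every object, and nothing in the structure rejects a mis-typed entry. The sample data is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockTransaction:
    kind: str        # "SALE", "PURCHASE", ... the structure does not restrict this
    contact: str     # a Customer or a Supplier, depending on kind
    stock_type: str
    quantity: int


def sales_for(transactions, customer):
    """Extra code: test each object's kind, trawling through all transactions."""
    return [t for t in transactions if t.kind == "SALE" and t.contact == customer]


transactions = [
    StockTransaction("SALE", "Jones", "Widget", 3),
    StockTransaction("PURCHASE", "Acme Supplies", "Widget", 50),
    # A Sale to Jones keyed in as a Purchase: accepted, then silently missing from reports.
    StockTransaction("PURCHASE", "Jones", "Gadget", 1),
]

jones_sales = sales_for(transactions, "Jones")
```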
It is easy to get carried away with the idea of generalisation and take it too far. Nobody in their right mind would go to the extreme shown in Fig. 3m.
Or would they? There is a real motivation to do this: the cost of ‘data migration’. Data migration is a serious issue in system maintenance and it’s one of the issues I come back to in chapter 6.
Briefly, other chapters suggest you can have your cake and eat it too. You can separate the entity model from the data storage structure. You can code the entity model in the Business services layer on a business rules server - where it can be changed without the need for data migration. You can code the more generalised entity model as the data storage structure on a data server - where it will be sufficiently flexible to reduce the need for data migration. You may find it is not easy to do this using current technology. However, SQL is a natural tool for implementing the Data abstraction layer that is necessary to achieve this separation, and ODBC technology is also helpful.
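As a small illustration of SQL acting as a data abstraction layer, the sketch below stores data in one generalised table and exposes the specific classes of the entity model as views. It uses Python's standard sqlite3 module with an in-memory database; the table, view and column names are assumptions, and a production design would need more than views (for example, rules on updates).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Generalised data storage structure
    CREATE TABLE stock_transaction (
        id         INTEGER PRIMARY KEY,
        kind       TEXT NOT NULL,
        contact    TEXT NOT NULL,
        stock_type TEXT NOT NULL,
        quantity   INTEGER NOT NULL
    );
    -- Specific classes of the entity model, presented as views
    CREATE VIEW sale AS
        SELECT id, contact AS customer, stock_type, quantity
        FROM stock_transaction WHERE kind = 'SALE';
    CREATE VIEW purchase AS
        SELECT id, contact AS supplier, stock_type, quantity
        FROM stock_transaction WHERE kind = 'PURCHASE';
""")
conn.executemany(
    "INSERT INTO stock_transaction (kind, contact, stock_type, quantity) "
    "VALUES (?, ?, ?, ?)",
    [
        ("SALE", "Jones", "Widget", 3),
        ("PURCHASE", "Acme Supplies", "Widget", 50),
        ("SALE", "Jones", "Gadget", 1),
    ],
)

# Enquiry code sees the specific class, not the generic table.
sales_to_jones = conn.execute(
    "SELECT stock_type, quantity FROM sale WHERE customer = ? ORDER BY id",
    ("Jones",),
).fetchall()
```

Adding a new kind of transaction then means adding a view, not migrating stored data.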
Ref. 1: “Software is not Hardware” in the Library at http://avancier.co.uk
Footnote 1: Creative Commons Attribution-No Derivative Works Licence 2.0
Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.co.uk” before the start and include this footnote at the end.
No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this page, not derivative works based upon it. For more information about the licence, see http://creativecommons.org