Business rules in entity models

This booklet is published under the terms of the licence summarized in footnote 1.

 

 

This chapter introduces some of the basic ideas about business rules, because the use of entity models to specify business rules is a theme that runs through many later chapters. It discusses the specification of structural terms and facts; these appear as the names of entities, attributes and relationships in an entity model. It also discusses the specification of structural constraints and derivation rules, which have to be coded into the enterprise application one way or another.

Structural terms

An entity model is built around entity types or classes. Model builders often start by naming entities, then go on to name the structural features of those entities, that is, their attributes and relationships.

Entities

How to begin? You may start by listing the terms used in the business, and consider creating an entity class for each term. Some people say to list the nouns written down in requirements statements. But this is rather trite as an analysis technique. More helpful techniques are suggested below.

Ask about process control requirements

Consider the system as a process control system and name the things the users want to monitor, if not control. E.g. In an order processing system, the users want to stimulate their customers to place orders and to pay for them, to control the creation and completion of orders, and to monitor the stock level of each product type.

Business process modeling can help here, though the data analysis is more to do with considering what it is in the real world that the business seeks to influence, rather than how this influence will be exercised.

Look for business keys

Look for the things that the business already assigns keys to. Business people only give identifying numbers and codes to things they care about and want to keep track of over time. The key of an entity state record is not just a database concept; it is a necessary business concept.

“This point sometimes seems to be lost on object-oriented folks.” Michael Zimmer

A key enables users to:

·       distinguish an object from another of the same class and

·       map an entity state record onto a real-world object in the business environment.

So, you may reasonably start by looking for identifying numbers and codes that are important in the business - customer numbers, product codes, order/invoice numbers and so on.

Specialise generalised abstractions

Analysts can prompt business people to think about their business by starting with some generalised super classes and asking about the specialisations that the business people are interested in.

“Abstractions are useful in discussions with users. Abstractions force users to think more broadly about their business, and can be an aid to reengineering the business.” Michael Zimmer

David Hay tells me his enterprise models feature entity types that are general enough to be recognizable by people in every business he models.

David’s entities | aka
Party |
Product Type | or Item Type, or Asset Type
Product | or Item or Asset
Activity Type | or Procedure
Activity | usually along with Work Order
Contract | or Order or Agreement


Most people recognise broad generalisations such as those. So you can use these to uncover the subtypes that are specific to the business domain you are working in. I like the alliteration of the P words in my scheme below.

Graham’s “P” entities | David’s entities | David Hay’s comments
Party | Party | Party subtypes into Person and Organization.
Person | A subtype of Party |
Partnership | Contract |
Product | Product Type | And Product
Process and Event | Activity Type and Activity | I have to model these for some clients, but not others.
Place / Point In Space | Real Spatial Element | I show subtypes of this in a later chapter.
Point In Time / Date | | Never an entity in my models. But all my intersect entity types have beginning and ending dates.

Trial and error

In practice, experts normally start by guessing a few major entities and naming them in boxes. They then use their expertise to ask the right analysis questions - the questions that will help them to refine their initial guess. I am interested in cataloguing the analysis questions that experts ask.

One of the messages of pattern-based modeling is: Don’t worry too much about getting the entities right to begin with. Looking for patterns will help you to assure quality, and to correct whatever picture you draw to start with.

“How does this relate to the sorts of patterns that others have published?” Michael Zimmer

I am talking about a different kind of pattern, useful for asking questions and making model transformations. Wait and see.

Predicates

Predicates are characteristics that define what an entity is. Close on the heels of naming an entity, you may extend the range of business terms by naming at least some of its predicates. E.g.

Predicates of the customer entity

Customer Number

Name

Country

Telephone Numbers

Predicates are an excellent, if rather data-oriented, way of looking at entities.

“Bob Schmid, in his book Entity modeling for Information Professionals has an original way of describing entity modeling that I think is quite brilliant. Among other things, he discusses ‘Predicates’ early as characteristics of a Class.” David Hay

Predicates are structural features. Some object-oriented designers prefer to define an entity by its behavioral features, its services or operations, but I don’t take that view until I get to discuss rules in behavioural models later.

It is usual to divide the predicates of an entity between attributes and relationships. An attribute becomes a relationship when the attribute is specified separately as an entity in its own right (e.g. country).

Entity | Predicates | Attribute or Relationship | Related entity
Customer | Customer Number | Attribute |
 | Name | Attribute |
 | Country | Relationship | Country
 | Telephone Numbers | Relationship | Telephone Number

The distinction between attributes and relationships is a subtle one. A question that helps to make the distinction clear is the question of uniqueness. Do users care if two objects have the same value for a given attribute? Can they change one value without changing the other?

Users don’t care if two customers have the same name; they can change the spelling of one name without changing the other. So, “name” is only an attribute, not a relationship to a uniquely identifiable entity.

Users care rather more if two customers have the same country; they would not want to change the spelling of a country name without changing it everywhere it appears. So, “country” is more than an attribute, it is a relationship to a uniquely identifiable entity.
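The uniqueness test can be illustrated with a minimal Python sketch. The class and field names here (Country, Customer, country_code and so on) are assumptions made for illustration; they are not taken from any model in this chapter. The name is held as a plain value inside each customer, whereas the country is a shared object that customers refer to.

```python
from dataclasses import dataclass


@dataclass
class Country:
    """A uniquely identifiable entity (hypothetical example)."""
    country_code: str  # business key
    name: str


@dataclass
class Customer:
    customer_number: int  # business key
    name: str             # attribute: a value owned by this customer alone
    country: Country      # relationship: a reference to a shared entity


uk = Country("GB", "United Kingdom")
alice = Customer(1, "A. Smith", uk)
bob = Customer(2, "A. Smith", uk)

# Changing one customer's name leaves the other customer untouched.
alice.name = "A. Smythe"

# Correcting the country's name changes it for every related customer.
uk.name = "United Kingdom of Great Britain and Northern Ireland"
```

After these updates the two customers have different names but still share one country object, so the corrected country name is seen through both of them.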

“What about the situation where for one class there are multiple values of an attribute? The old first normal form issue.” David Hay

I treat first normal form as a policy rather than a rule. There are some cases where a repeating attribute (e.g. telephone numbers above) is reasonably regarded as contained within an entity, rather than turned into a child entity related to the parent entity. I will return to this in the paper “Aggregate entities”.

“Terry Halpin’s ORM is clever in that it treats both entity types and data types as ‘objects’. An attribute in relational-speak becomes a relationship to a data type. Among other things, this means that you can relate an entity type to a data type, and if the datavalue type later turns into an entity type, the changes required to the model are minimal.” David Hay

I explore this pattern and transformation in later chapters, but I usually stick to a more conventional view, closer to what people expect a database schema to look like.

Typical attribute terms

Attribute word | Notes
Description | text description
Memo | narrative, large text blocks
Name | well nigh an identifier, but no uniqueness constraint
Short Name | abbreviation
Number | what people call numbers, may include characters
Locator | map co-ordinates, postal address, email, phone number
Amount | usually currency, could be a Balance
Value | usually currency
Measure | quantity, size, length (not currency)
Sequence |
Date and time |
Date |
Time |
Indicator | short range of values: Boolean (true, false) and longer (yes, no, undecided)
Code | medium range designator: countries, colours, tastes…
Identifier | long range key: name, national insurance number
Image | picture
Video | moving image
Sound | voice, audio
Document |
Executable |

Structural facts

Every attribute or relationship is not just a term; it is also a structural fact. It specifies a relationship between one structural term (an attribute or relationship name) and another structural term (an entity name). Each attribute and relationship is potentially an entity in its own right, and you can view the name of an attribute or relationship as the name of a connection to this other potential entity.

Consider a system to record the pupils and teachers in schools run by local authorities. The boxes in the figure below represent terms used in the business. The lines reflect facts of life - reasons why terms are related in this business. Local Authorities employ Teachers; Pupils attend Schools, and so on.

You might initially represent the facts by connecting entities with wavy lines that don’t specify constraints, but multiplicity constraints will press themselves upon your attention very quickly, and you should capture them as soon as you can.

Discovering terms and facts

Though it may be denounced as heresy by some object-oriented purists, a very good way to discover entities, attributes and relationships is to use relational data analysis to divide relevant data structures (especially legacy databases and required outputs) into normalised relations.

“Craig Larman, in his book on OO, has a half page discussing the idea of normalisation, without once using the term as far as I can see.” Michael Zimmer

I’ll say more about relational data analysis later.

Structural constraints

Terms and facts are fundamental. They come first. But you can’t do much without the constraints; this is where all the useful stuff is.

A constraint is a business rule that limits the way entities are born, change state or die, or limits the values of attributes that are stored. A model with just six or seven entities and relationships might require the specification of 300 constraints.

“I don't question that you have observed this, but I am surprised.” Michael Zimmer

A member of the Business Rules Working Group reported those numbers to me. Don’t forget that constraints include the data type of every data item, and every other precondition for valid processing.

Some constraints can be captured as rules governing the multiplicity of attributes and relationships. On discovering a fact that connects two terms you may immediately ask mathematical questions about the constraints that govern an object at one end of the relationship:

For each end of a relationship, ask about its optionality (can the object exist without the relationship?) and its multiplicity (how many objects at the other end can the object at this end be related to?).
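The two questions can be sketched as a small Python data structure. This is only an illustration: the RelationshipEnd class is hypothetical, and the multiplicities given for "Pupils attend Schools" are assumed for the example rather than stated anywhere in this chapter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelationshipEnd:
    """How many far-end objects one object at this end may be related to."""
    minimum: int            # 0 means the relationship is optional at this end
    maximum: Optional[int]  # None means 'many' (no upper limit)

    @property
    def optional(self) -> bool:
        return self.minimum == 0

    def allows(self, count: int) -> bool:
        """True if being related to `count` far-end objects satisfies the constraint."""
        if count < self.minimum:
            return False
        return self.maximum is None or count <= self.maximum


# Assumed reading of 'Pupils attend Schools':
# a pupil attends exactly one school; a school may have any number of pupils.
pupil_to_school = RelationshipEnd(minimum=1, maximum=1)
school_to_pupil = RelationshipEnd(minimum=0, maximum=None)
```

Keeping the minimum and the maximum as separate fields makes the optionality question visible at the end it belongs to.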

It is a pity that UML hides the optionality of a relationship inside the definition of multiplicity. So you have to look at the far end of the relationship (right across the page sometimes) to see whether it is optional for that entity or not.

Remember technology-independence: the relationship lines in the model show facts about a business, they do not necessarily define how pointers are stored in objects or database records; this is a lower level of specification.

When drawing an entity model, you make no technology or implementation-level decision. You don’t choose between different database management systems, decide which objects store pointers to other objects, choose between pointer chains or indexes, or choose between object-oriented and procedural programming languages.

Getting the semantic constraints on a relationship right is important. Relationships control how the system behaves and performs. They not only constrain the behavior of the system; they also act as message passing or navigation routes between objects of different entities.

“This seems to be the fundamental difference between a class diagram and an entity relationship diagram.” Michael Zimmer

I have to disagree with you. The relationships in an entity model show the possible navigation routes between entities. These turn into message passing routes if you encapsulate the processing of each entity in the form of object-oriented style classes.

“You weren’t forceful enough in your reply to Michael. An entity/relationship diagram is technology independent. A relationship is structural. We don’t care what kind of database structures or processes will be required to implement it.

A relational database implements relationships with foreign keys. (This is why IDEF1X is fundamentally a design technique.) Foreign keys are fundamentally structural. You “navigate” them with joins.

An object-oriented designer implements relationships with program code. A “behavior” for a class may include navigating an association to get information from another class. The navigation may be in both directions, or always in the same direction.” David Hay

I am not so anti-relational as that. For me, the foreign key is merely an alternative representation of a relationship, and I see no harm in that. I find the presence of foreign keys in an entity-attribute definition can help me to define some business rules in a concise way.

Primitive data types

Among the most basic constraints are primitive data types. Somebody, somewhere has to specify the data type of every data item in an enterprise application. How to do this in a technology-independent specification?

In the absence of an internationally-accepted standard, can we give analysts an instantly understandable shared language? Ignoring mathematical and complex number formats, people tend to divide data items into five broad categories.

Data type | Perhaps applicable to attributes of this kind
String | Description, Memo, Name, Short Name, Alpha Number, Locator
Number | Amount, Value, Measure, Sequence
Date and/or time |
Label | Indicator, Code, Identifier, anything defined with a uniqueness constraint
Complex object | Image, Video, Sound, Document, Executable
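One way to make the five categories usable is to hold the table above as a lookup, as in the rough Python sketch below. The names DataCategory, ATTRIBUTE_WORDS and category_of are hypothetical. The table leaves the date and time row empty, so the three words listed against it here are an assumption. Matching on the last words of an attribute name is a naive convention chosen only for illustration.

```python
from enum import Enum


class DataCategory(Enum):
    STRING = "String"
    NUMBER = "Number"
    DATE_TIME = "Date and/or time"
    LABEL = "Label"
    COMPLEX_OBJECT = "Complex object"


# Attribute words grouped by category, following the table above.
ATTRIBUTE_WORDS = {
    DataCategory.STRING: ["Description", "Memo", "Name", "Short Name",
                          "Alpha Number", "Locator"],
    DataCategory.NUMBER: ["Amount", "Value", "Measure", "Sequence"],
    # Assumption: the table gives no words for this row.
    DataCategory.DATE_TIME: ["Date and time", "Date", "Time"],
    DataCategory.LABEL: ["Indicator", "Code", "Identifier"],
    DataCategory.COMPLEX_OBJECT: ["Image", "Video", "Sound", "Document",
                                  "Executable"],
}


def category_of(attribute_name):
    """Return the category whose attribute word ends the name, or None."""
    lowered = attribute_name.lower()
    best_word, best_category = "", None
    for category, words in ATTRIBUTE_WORDS.items():
        for word in words:
            w = word.lower()
            matches = lowered == w or lowered.endswith(" " + w)
            # Prefer the longest matching word, e.g. 'Short Name' over 'Name'.
            if matches and len(w) > len(best_word):
                best_word, best_category = w, category
    return best_category
```

A name that ends in none of the listed words, such as "Customer Number", returns None; a person still has to classify it.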

I don’t mention primitive data types in the examples that follow. But they are important. And before I leave them, which level of model do we declare the primitive data types in?

Technology level? Yes. Primitive data types must appear at the technology level. Each implementation technology provides its own range of data types, or requires that you define them.

Enterprise application level? They do belong in the enterprise application model. You certainly have to define any user-defined data type (say country codes) and derivation rules. And you have to define any non-trivial display formats on outputs (e.g. a date might appear in several formats). For each model you build, you should have a list of the primitive data types. But in practice there are three reasons to exclude primitive data types from a model at this level.

·       First, there is no internationally agreed standard.

·       Second, they will in any case have to be translated into different data types for the given technology.

·       Third, they are relatively trivial; you can surely trust intelligent educated developers to define them, and they have to be involved in system analysis at least to this extent.

Enterprise level? If the enterprise level truly is a model of real-world objects, then it might be argued that data types do not belong there. Data types apply to entity state records rather than to real-world objects.

Structural derivation rules

A derivation rule derives data by some kind of calculation from other data. E.g.

Attribute of Invoice | Derivation rule
InvoiceNumber | = LastInvoiceNumber + 1
AmountDue | = AmountBanked + AmountRemaining
AmountBanked | = AmountDue - AmountRemaining
AmountRemaining | = AmountDue - AmountBanked

Which attributes are stored and which are derived? You can derive any one of the three ‘amount’ attributes from the other two. But you don’t need to specify the derivation rule against all three attributes. Any one of them will do. By convention, specifying a derivation rule against one attribute (say AmountRemaining) implies the other two are stored attributes, not derived.
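As a minimal sketch of that convention, the Python class below specifies the derivation rule against AmountRemaining only, which leaves AmountDue and AmountBanked as stored attributes. The class layout and the sample values are assumptions for illustration; the rule itself is the one given above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Invoice:
    invoice_number: int
    amount_due: Decimal     # stored
    amount_banked: Decimal  # stored

    @property
    def amount_remaining(self) -> Decimal:
        """Derived: AmountRemaining = AmountDue - AmountBanked."""
        return self.amount_due - self.amount_banked


invoice = Invoice(invoice_number=1001,
                  amount_due=Decimal("250.00"),
                  amount_banked=Decimal("100.00"))
remaining = invoice.amount_remaining
```

Because the derived value is calculated on request, it cannot drift out of step with the two stored amounts. That is one possible implementation, not a consequence of the specification.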

The illustration above might be part of an entity model, or it might be part of a database model. These different models need to be considered separately.

Storing derived data in a database

Some people insist that since derived data is redundant, it should never be stored in a database. But refusing to store derived data has led many systems designers into a wasteful excess of redundant processing. There is always a trade-off between update and enquiry efficiency.

In fact, financial institutions (banks, insurance companies and the like) do maintain a good deal of derived data in their databases. Or do you think your bank calculates your account balance every time that you request it, by working through all your transactions since you opened the account, adding all the credits and subtracting all the debits?

So, database designers may decide to store the AmountRemaining, or not. Some technologies allow them to specify this by declaring a derived data item to be ‘actual’ or ‘virtual’. This is one way that behavioral processing operations have crept into the database paradigm.
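The trade-off can be sketched with two hypothetical Python classes, one for each choice. The class names borrow the words ‘virtual’ and ‘actual’ from the paragraph above; everything else, including holding amounts as whole numbers of pence, is an assumption made for illustration.

```python
class VirtualBalanceAccount:
    """Balance is derived on every enquiry from the full transaction history."""

    def __init__(self):
        self.transactions = []  # positive = credit, negative = debit

    def post(self, amount):
        self.transactions.append(amount)

    @property
    def balance(self):
        # Cost grows with the number of transactions ever posted.
        return sum(self.transactions)


class ActualBalanceAccount:
    """Balance is stored and brought up to date on every update."""

    def __init__(self):
        self.transactions = []
        self.balance = 0

    def post(self, amount):
        self.transactions.append(amount)
        self.balance += amount  # a little extra work on each update


virtual, actual = VirtualBalanceAccount(), ActualBalanceAccount()
for amount in (100, -30, 45):
    virtual.post(amount)
    actual.post(amount)
```

Both classes answer a balance enquiry with the same figure. They differ only in whether the work is done at update time or at enquiry time.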

“Sign me up as in favor of representing derived data on the diagram. It is essential in explaining what’s going on. I usually use typography (parentheses, or a leading /) to describe a derived attribute, and the derivation logic itself, of course, has to be documented behind the scenes. You correctly point out that to assert that an attribute is derived in the model says nothing about how that derivation should be implemented. Way back in the early 80’s I used a wonderful dbms that had the concept of a derived field. This was the first time I had encountered this idea and it was wonderful. It made coding whole chunks of the business logic trivial. The only problem was, when we ran a query, the lights would dim. It turns out that in many cases, derived data should be derived when the data are input, not when the query is run. Oh, well.” David Hay

Separating specification from implementation

Derived data is important to users. Ideally, a business rules model will define all data that is important to users, whether it appears in the form of information displayed on a user’s screen, or is used in testing constraints on input events.

The analyst’s job is to define the universe of discourse of system purchasers and users. At least some derived data and derivation rules are part of this universe of discourse, and naturally belong in a business rules model. Analysts should be able to name a derived data item as an attribute, and specify this attribute in the form of a derivation rule, without deciding whether the attribute will be stored and updated in a database, or derived when needed for an enquiry.

E.g. A Business Rules specification should include the AmountRemaining, but need not say whether it will be stored as a data item, or derived when needed by calculation.

The figure below shows a fragment of an order processing system’s specification. It shows that four of the attributes are derived by calculation from other attributes in the model.

This diagram presents four calculations as structural derivation rules, defining them as properties of attributes in the entities of the entity model.

In fact, the specification is incorrect. The four supposedly structural rules are not applied on the Order Registration event, nor on the Order Item Addition event that adds an Order Item to an Order. The business would regard it as a mistake if the rules were applied at these times. The four derivation rules in our example are only fired by an Order Closure event, and only guaranteed to be true just after that event has been completely processed. So these derivation rules belong in a behavior model. See the companion book <The event modeler>.
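The difference between a structural rule and an event-fired rule can be sketched as follows. The four derived attributes of the example are not reproduced here; the single order_value attribute, the Order class and its method names are hypothetical stand-ins chosen only to show when a rule fires.

```python
class Order:
    """Hypothetical order whose value is derived only when the order is closed."""

    def __init__(self):
        # Order Registration event: no derivation rule fires here.
        self.items = []          # list of (quantity, unit_price) pairs
        self.order_value = None  # undefined until the closure event

    def add_item(self, quantity, unit_price):
        """Order Item Addition event: no derivation rule fires here either."""
        self.items.append((quantity, unit_price))

    def close(self):
        """Order Closure event: the derivation rule fires here."""
        self.order_value = sum(q * p for q, p in self.items)


order = Order()
order.add_item(2, 10)
value_before_closure = order.order_value
order.close()
value_after_closure = order.order_value
```

If the rule were written as a structural property of the attribute, it would be expected to hold after every event, which is not what the business wants in this case.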

Some questions and answers

Do we build our entity model for software engineers to use?

Yes. I don't really mind what conceptual models people draw during analysis, and/or for communication with users. It is the hand over to design I worry about. I see too many analysts drawing models for users that have to be completely rebuilt by the designers - and some analysts never realise that.

“I recognize this problem. It's a two-way street. Yes, we modelers should work with designers to make sure that they understand the intentions and implications of the models. But the designers could put more energy into understanding the models as well.

I rarely have problems [with users]. It seems to be the developers who have the most trouble understanding the kind of abstraction that is a model.” David Hay

I am all in favour of discussing models with users. I am wholly against expecting users to validate a model. They are not equipped and don’t attempt to understand an entity model in the way that designers and developers do. Developers worry about the implications for design and code, and that makes them (reasonably in my view) a tougher audience.

“Ah, but the point is that a conceptual model is not supposed to be concerned with whether it can be implemented. It is supposed simply to describe the nature of what is.” David Hay

That’s OK if you model an enterprise per se. I want my models to be used by software engineers coding enterprise applications. For me, a conceptual model is supposed to be logical (technology independent) but it is also supposed to capture specific requirements and, in the end, be codable as the basis of a system that meets those requirements.

Do we define attributes, or hide them behind operations?

A few object-oriented purists do not define attributes; they define only operations. They say ‘encapsulation’ means the attributes should be hidden behind the operations. This is plain silly for most enterprise applications. Do define the attributes.

When you name a boring attribute like CustomerAddress and define it as freely updatable, this is a short-hand way of saying that a CustomerAddress value can be retrieved by an enquiry operation from a Customer object, and overwritten with a new value by an update operation on that object. Spelling out such trivial operations in a specification would be tedious and unhelpful.

Do we store derived attributes as persistent data?

That is a physical design decision. When you name an attribute like AccountBalance and define its derivation rule (say, AccountBalance = Credits - Debits) you are not saying whether AccountBalance will be stored in a database or derived by an enquiry operation. Whether the business fact is implemented in the form of data or process is a design decision.

Can we build an entity model without a behavioral model?

You can, but it really is better to consider both data and processes in parallel. Even if you don’t care to document behavior, an entity model does imply behavior. The attributes in the entity model represent the state data (local variables) of long-running business processes.

“I understand this from your volume “The Event Modeler”, but it is probably not commonly known.” Michael Zimmer

These long-running processes can be represented in the form of state machines. And the relationships in the entity model declare which of these state machines are able to locate and talk to each other.
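A minimal sketch of one such state machine follows. The OrderLifeHistory class, its three events and its state names are assumptions made for illustration. The status attribute plays the part of the state variable, and each event is accepted only in the state that allows it.

```python
class OrderLifeHistory:
    """Hypothetical life history of an order, held as a state machine."""

    # event name -> (state required before the event, state after it)
    TRANSITIONS = {
        "register": (None, "open"),
        "close": ("open", "closed"),
        "pay": ("closed", "paid"),
    }

    def __init__(self):
        self.status = None  # state variable of the long-running process

    def handle(self, event):
        required, result = self.TRANSITIONS[event]
        if self.status != required:
            raise ValueError(
                f"event {event!r} is not valid in state {self.status!r}")
        self.status = result


order = OrderLifeHistory()
order.handle("register")
order.handle("close")
```

Between events the machine is idle and only its state variable needs to persist.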

Do we build the entity model before the behavioral model?

Not necessarily. You might start with some behavioral analysis that identifies the use cases and the events or transactions to be processed. You might even specify the behavioral operations of an entity before its structural attributes. I have done this on process control system examples.

But for enterprise applications, defining the entity model first is natural. A typical enterprise application maintains a large data structure, and most of the operations merely store or retrieve the values of variables in that data structure.

Does an entity model imply a database?

Not necessarily. I have drawn entity models (using patterns in this book) for process control system specification, where the state is merely a few variables stored in memory. But for enterprise applications, the entity model does usually imply a database.

·       A database becomes necessary where there are so many parallel-running state machines you cannot hold all their state data in main store.

·       And a database is practical where almost all state machines are 'asleep' almost all of the time, so most of the data is inactive.

These two conditions pretty much define an enterprise application.

“I see this relates to other papers where you discuss the essential differences between enterprise applications and the embedded systems often used as case studies in the object-oriented world.” Michael Zimmer

Is an entity model exactly the same as a data model?

No. Sometimes the ‘right’ entity model corresponds to a relational database structure; other times it differs. The entity model of a business services layer is not ‘merely’ a data model. It is the structure against which behavior is specified, and operations are coded, just as any object-oriented designer would expect.

Does an entity model allow denormalization?

Yes, in two ways. An entity model allows division of one entity into smaller parallel aspect or role entities.

“This would surely make relational purists have a fit.” Michael Zimmer

Probably. But this kind of denormalization can be useful where an entity has parallel state machines or life histories. And it is essential where stored data is distributed, perhaps as a result of component-based development.

And an entity model also allows aggregation of child entities with a parent entity into an aggregate entity. However, this kind of denormalization is perhaps not as common as you might expect from reading books on object-oriented design, for reasons explored in the later chapter called <Aggregate entities>.

From data model to entity model

An enterprise application is a software system that records the state of entities in the business. The entities in an enterprise application tend to differ from entities in other kinds of application in two ways:

·       there are thousands or millions of them, and

·       they persist while the computer is switched off.

For these reasons, the state of enterprise application entities is usually stored in a database. And these reasons do influence the way you draw entity models. I start here with a traditional database model.

Some object-oriented designers are uncomfortable with the database, and look on it as “the crazy aunt in the attic”. But it is fundamental and they neglect it at their peril. You can always reverse engineer an entity model from a database schema. This chapter starts with a physical database structure and explores ways to represent the business rules in the more conceptual model of classes and relationships that I call an entity model.

Terms and facts appear in software specifications as the names of entities, attributes and relationships. You can document terms and facts on their own, outside the context of a model in which constraints and derivation rules are also documented.

“I would prepare a glossary of business language, even if some of the terms and facts were not part of the model, just because the client will use them in discussions.” Michael Zimmer

But facts tend to disappear, because as soon as you start to draw an entity model, you merge facts with constraints into the form of relationships. Constraints and derivations appear in software specifications as invariant conditions and procedures attached to entities, attributes and relationships.

“Ah ha! As I generalize my models, I remove some business rules. While that sounds dramatic it in fact is not.

As you have observed, three categories of business rules are terms, facts, and derivation rules. Those stay very nicely in my models.

I have specifically excluded the constraints (except for multiplicity, of course). First of all, these are entity models we are creating that describe what can be done. It is not appropriate for them to also try to describe what may or may not be done. That is a different kind of model. Ron Ross tried to lay constraints on top of entity models with his notation, which demonstrated this point, even though it's a terrible notation.” David Hay

Here lie some differences between us. My concern is application-specific entity models rather than generalized models. I want my entity models to specify as many business rules as they can bear. I leave it up to the designers whether these rules are coded in programs or built into a database structure. Second, I don’t believe you! You don’t remove every constraint. Every one-to-many relationship constrains entities at the many end to be related to no more than one entity at the one end.

I believe it is true that the lack of a distinct behavioral model has forced Ron Ross into notation overload. I'm not sure Terry Halpin's Object Role Modeling (ORM) escapes this criticism entirely.

“Object Role Modeling does model many constraints very well. It is a completely different approach to modeling, however, and I haven't had enough experience with it to know how to relate my patterns to it. I don't mind leaving constraints out. These are the things that do change a lot, so I am happy to model them as a separate exercise. I also look forward to the day when there are tools that let us model them (and change the models) separately from the database design effort.” David Hay

I am concerned to model constraints. I propose people should capture invariant constraints in the entity model and capture transient constraints in the behavioral model.

“You make the point that it is shaky linking business rules to data, since data change. That is the basis for your argument that they should be linked to behavior. But if you follow my philosophy, the data structures won't change. While rules may change, the kinds of things they refer to won't.

Most businesses I enter are in an environment of total chaos. When I can show them the relatively simple structures that underlie their business, it gives us all an opportunity to examine what is truly unique about the organization and address it systematically. Every model I create is unique, just for that client. It is only that every model starts with the same underlying structure. By giving them an understanding of what is fundamental, I give them the ability to be truly creative with what is unique.” David Hay

Again, and it is worth repeating so that all readers know what I am talking about: my concern is application-specific entity models, where you have to define the business rules somehow. I am using an entity model to specify constraints. I am happy to leave the database designer to decide whether these constraints will be implemented in the database structure, or coded by programmers. One way or another, constraints do have to be specified and implemented.

A data storage structure

There are various ways to specify constraints by annotations on a data storage structure. I can use a case study to illustrate some ideas in an informal way. I will revisit the concepts and notations in more detail in later chapters.

Figure 4a shows how tables are connected in a relational database structure. A common convention is to underline the primary key for a table, that is, the unique identifier used to distinguish one row in the table (one object of the class) from any other.

Fig. 4a

Where the primary key of one table appears in another table, it is called a foreign key. A foreign key imposes a kind of uniqueness constraint on the table that holds it: an entry in that table cannot be related to more than one entry in the other table (the one where the foreign key appears as a primary key).

You can think of a foreign key, or any attribute that is not a primary key, as a rolled-up relationship. When you list attributes inside a table, you are saying that each attribute has a 1:1 relationship to the table.
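A minimal sketch of a primary key and a foreign key, written in Python with the standard sqlite3 module. The Person and Referee table names come from the case study that follows; the column names, the column types and the helper functions build_db and accepts are assumptions made purely for illustration.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create two illustrative tables linked by a foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE person (
            person_name TEXT NOT NULL PRIMARY KEY   -- primary key
        );
        CREATE TABLE referee (
            -- foreign key: each referee row points at exactly one person row
            person_name TEXT NOT NULL REFERENCES person (person_name),
            paper_num   INTEGER NOT NULL,
            PRIMARY KEY (person_name, paper_num)
        );
        """
    )
    return conn


def accepts(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row is stored, False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

In this sketch a referee row naming a person who is not in the person table is rejected, and so is a second person row with the same name.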

In small examples, you can list the keys and other attributes inside boxes on the diagram. This is impractical where there are hundreds of tables. So most CASE tools provide a documentation scheme to back up the model and help you retrieve the attributes behind a box when you want them.

Relational database tables

I have stolen a case study from Halpin [1995]. Figure 4b shows the tables in a relational database and the relationships between them that are implied by foreign keys.

It is reasonably clear that a single entry in one of these tables (one object of one of these entities) will map onto a thing in the real world that users of the system will understand: a Person, a Paper, a Referee and so on.

Similarly, the value for an attribute in a table will represent a fact about a thing in the real world: a Person’s email address, for example.

Some people view each database table as a business entity class, and each row or entry in a table as the state of a business entity object. But a relational database design is not a full conceptual model. There are many business rules missing from the structure of tables shown above.

Constraints in attribute specification

Constraints on attribute values

The Referee and Author tables both contain an attribute that has only a short, fixed range of values. Figure 4c illustrates a convention that shows the range of values within {} brackets.

Author
    Person Name
    Paper Num
    Presenter {Y,N}

Referee
    Person Name
    Paper Num
    Rating {1…10}

Fig. 4c
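One way a database designer might implement the {} ranges is as CHECK constraints. The sketch below uses Python's sqlite3 module; the column names and types, the NOT NULL choices, the composite primary keys and the helpers new_db and accepts are assumptions, not part of the case study.

```python
import sqlite3

SCHEMA = """
CREATE TABLE author (
    person_name TEXT    NOT NULL,
    paper_num   INTEGER NOT NULL,
    presenter   TEXT    NOT NULL CHECK (presenter IN ('Y', 'N')),  -- Presenter {Y,N}
    PRIMARY KEY (person_name, paper_num)
);
CREATE TABLE referee (
    person_name TEXT    NOT NULL,
    paper_num   INTEGER NOT NULL,
    rating      INTEGER CHECK (rating BETWEEN 1 AND 10),           -- Rating 1 to 10
    PRIMARY KEY (person_name, paper_num)
);
"""


def new_db() -> sqlite3.Connection:
    """Open an in-memory database holding the two illustrative tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def accepts(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row is stored, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

A Presenter value of 'maybe' or a Rating of 11 is refused by the database itself, so no program can store a value outside the declared range.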

Where the range of values for an attribute may include ‘null’ because the fact is missing or unknown, some say the attribute is optional (rather than mandatory), some say it is partial (rather than total). Figure 4d brackets the optional attributes in the Person and Referee tables.

Person
    Person Name
    Affiliation
    o-- Email

Referee
    Person Name
    Paper Num
    o-- Rating {1…10}

Fig. 4d
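In a relational database the mandatory/optional distinction usually comes down to NOT NULL. The sketch below reads Figure 4d as marking Email optional and Affiliation mandatory; that reading, the column names and the helpers new_db, add_person and email_of are assumptions for illustration.

```python
import sqlite3
from typing import Optional


def new_db() -> sqlite3.Connection:
    """Open an in-memory database holding an illustrative Person table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (
            person_name TEXT NOT NULL PRIMARY KEY,
            affiliation TEXT NOT NULL,  -- mandatory: a value must be supplied
            email       TEXT            -- optional: NULL means missing or unknown
        );
        """
    )
    return conn


def add_person(conn: sqlite3.Connection, name: str,
               affiliation: Optional[str], email: Optional[str] = None) -> bool:
    """Return True if the person is stored, False if a constraint rejects the row."""
    try:
        with conn:
            conn.execute("INSERT INTO person VALUES (?, ?, ?)",
                         (name, affiliation, email))
        return True
    except sqlite3.IntegrityError:
        return False


def email_of(conn: sqlite3.Connection, name: str) -> Optional[str]:
    """Return the stored email, or None when the fact is missing."""
    row = conn.execute("SELECT email FROM person WHERE person_name = ?",
                       (name,)).fetchone()
    return row[0] if row else None
```

Here a person may be added without an email, but not without an affiliation.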

In practice, I am lazy about distinguishing mandatory and optional attributes, so you won’t see the ‘o’ symbol everywhere it would be appropriate in my examples.

How to show in the Paper table that three statistical attributes (the total number of pages, figures and tables) are only recorded for papers that are accepted? Figure 4e places an IF condition against the optional list of attributes.

Paper
    Paper Num
    Paper Title
    Paper Status {approved, undecided, accepted}
    Selection (optional attribute group)
        o-- If paper selected
            Total pages
            Total figures
            Total tables

Fig. 4e

The table brackets the group to show that if one total exists, then they all exist.
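The all-or-nothing group and its IF condition can be sketched as a single table-level CHECK. This is one possible encoding, assuming that 'accepted' is the status value the condition tests; the range check on Paper Status is left out for brevity, and the column names and helpers new_db and save_paper are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE paper (
    paper_num     INTEGER PRIMARY KEY,
    paper_title   TEXT NOT NULL,
    paper_status  TEXT NOT NULL,
    total_pages   INTEGER,
    total_figures INTEGER,
    total_tables  INTEGER,
    -- Either no totals are recorded, or the paper is accepted and all three are.
    CHECK (
        (total_pages IS NULL AND total_figures IS NULL AND total_tables IS NULL)
        OR
        (paper_status = 'accepted'
         AND total_pages   IS NOT NULL
         AND total_figures IS NOT NULL
         AND total_tables  IS NOT NULL)
    )
);
"""


def new_db() -> sqlite3.Connection:
    """Open an in-memory database holding the illustrative Paper table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def save_paper(conn, num, title, status, pages=None, figures=None, tables=None) -> bool:
    """Return True if the paper is stored, False if the CHECK rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO paper VALUES (?, ?, ?, ?, ?, ?)",
                         (num, title, status, pages, figures, tables))
        return True
    except sqlite3.IntegrityError:
        return False
```

The constraint lets an accepted paper exist before its totals are recorded, but refuses totals on an undecided paper and refuses a partly filled group.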

Constraints between different attributes’ existence

Figure 4f introduces a three-way optional structure into the list of attributes in the Room table, and employs an IF, IF, ELSE structure.

Room
    Building Num
    Room Num
    Area
    Room type {lab, lec, office}
    Selection (mutually exclusive attributes)
        o-- If Room type = lab
            Total PCs
        o-- If Room type = lec
            Total seats
        o-- Else

Fig. 4f

The table brackets the mutually exclusive options, reflecting the three subtypes recorded as values in the Room Type attribute.
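A rough sketch of the IF, IF, ELSE structure as two CHECK constraints, one per conditional attribute. The composite primary key (Building Num plus Room Num), the column types and the helpers new_db and add_room are assumptions; the figure as extracted does not show which attributes form the key.

```python
import sqlite3

SCHEMA = """
CREATE TABLE room (
    building_num INTEGER NOT NULL,
    room_num     INTEGER NOT NULL,
    area         REAL,
    room_type    TEXT NOT NULL CHECK (room_type IN ('lab', 'lec', 'office')),
    total_pcs    INTEGER,
    total_seats  INTEGER,
    PRIMARY KEY (building_num, room_num),
    CHECK (total_pcs   IS NULL OR room_type = 'lab'),  -- PCs recorded for labs only
    CHECK (total_seats IS NULL OR room_type = 'lec')   -- seats recorded for lecture rooms only
);
"""


def new_db() -> sqlite3.Connection:
    """Open an in-memory database holding the illustrative Room table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_room(conn, building, room, room_type, pcs=None, seats=None) -> bool:
    """Return True if the room is stored, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO room (building_num, room_num, room_type, total_pcs, total_seats) "
                "VALUES (?, ?, ?, ?, ?)",
                (building, room, room_type, pcs, seats),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because a room has exactly one type, the two checks together make Total PCs and Total seats mutually exclusive, and an office can carry neither.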

Attribute roles

The Committee table contains an attribute which appears twice playing different roles. Figure 4g shows the two role names in square brackets. It also shows the second role is optional.

Committee
    Committee Code {Org, Prog}
    Person Name [Chair 1]
    o-- Person Name [Chair 2]
    o-- Budget

Fig. 4g

Since the primary key has only two values (Organisation and Programme), there can be only two entries in this table, two objects of this class.

Constraints between attribute values

How to show the rule that the same person cannot be both chair 1 and chair 2 of the same Committee? Figure 4h specifies the constraint as a statement against the second of the two attributes.

Committee
    Committee Code {Org, Prog}
    Person Name [Chair 1]
    o-- Person Name [Chair 2]
        Not = Person Name [Chair 1]
    o-- Budget

Fig. 4h
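The Committee rules can be sketched as follows: two columns for the two roles of Person Name, a CHECK between them, and a primary key restricted to two values. The column names chair_1 and chair_2, the column types and the helpers new_db and add_committee are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE committee (
    committee_code TEXT NOT NULL PRIMARY KEY
                   CHECK (committee_code IN ('Org', 'Prog')),
    chair_1        TEXT NOT NULL,   -- Person Name [Chair 1]
    chair_2        TEXT,            -- Person Name [Chair 2], optional
    budget         REAL,            -- optional
    CHECK (chair_2 IS NULL OR chair_2 <> chair_1)
);
"""


def new_db() -> sqlite3.Connection:
    """Open an in-memory database holding the illustrative Committee table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_committee(conn, code, chair_1, chair_2=None, budget=None) -> bool:
    """Return True if the committee is stored, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO committee VALUES (?, ?, ?, ?)",
                         (code, chair_1, chair_2, budget))
        return True
    except sqlite3.IntegrityError:
        return False
```

With the key limited to 'Org' and 'Prog', the table can never hold more than two rows, and neither row can name the same person in both chair roles.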

Constraints in relationship specification

So far, all the relationships have been drawn as one-to-many. How to show that a Paper can have no more than one Presentation Slot? Figure 4i shows this by taking the fork off the relationship line.

Fig. 4i
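Taking the fork off the line corresponds, in table terms, to declaring the foreign key unique. The sketch below assumes the foreign key sits in the Presentation Slot table and invents a slot_num key for it; both choices, and the helpers new_db and accepts, are assumptions rather than details from the case study.

```python
import sqlite3


def new_db() -> sqlite3.Connection:
    """Open an in-memory database with Paper and Presentation Slot tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
    conn.executescript(
        """
        CREATE TABLE paper (
            paper_num INTEGER PRIMARY KEY
        );
        CREATE TABLE presentation_slot (
            slot_num  INTEGER PRIMARY KEY,                         -- hypothetical key
            paper_num INTEGER UNIQUE REFERENCES paper (paper_num)  -- at most one slot per paper
        );
        """
    )
    return conn


def accepts(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row is stored, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

A second slot for the same paper is rejected, while slots with no paper yet assigned remain allowed because the column is left nullable.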

Figure 4j shows that the same kind of semi-optional 1:1 relationship can be used to model the situation where the optional component is a subclass (rather than a different kind of thing connected in an aggregate).

Fig. 4j

The difference between an aggregate of different things and a hierarchy of subclasses is not always obvious. Later chapters explore the difference further.

The triangle symbols here are only for the human reader. The relational database designer might implement these is-a relationships using ‘foreign keys’ in the normal way.

Introducing subclasses to impose constraints

The yes-no attribute in Author masks the fact that there are really three reasons why a Person may be related to a Paper. Figure 4k shows these many-to-many relationships as distinct boxes.

Fig. 4k

A multiple V shape (one of the patterns in the volume ‘Patterns in entity modelling’) prompts us to ask about constraints on the relationships between the child or link entities.

Figure 4l reshapes the model to show the constraint that a Presenter must be an Author, but an Author need not be a Presenter.

Fig. 4l

Figure 4m reshapes the model to show the constraint that a Presenter can only be related to a Paper that has been accepted.

Fig. 4m

Figure 4n reshapes the model to replace the two subclasses by an optional relationship from Author to Paper.

Fig. 4n

Figure 4n implies the primary key of Paper, Paper Num, will appear twice in Author, playing two different roles; the second role is optional since an author may not be selected to present their paper.

Author
    Person Name
    Paper Num [author]
    Paper Num [presenter]
        = Paper Num [author]

Figure 4o shows the second Paper Num can be transformed into a yes-no attribute without loss of meaning.

Author
    Person Name
    Paper Num
    Presenter {Y,N}

Fig. 4o

The yes-no attribute in the table I started with implied a second relationship from Author to Paper. The relationship may be obscure, but it does belong in a full conceptual model.

Conditional relationships

How to show that a Room cannot be used to present a Paper if it is an Office? How to show that a Paper cannot have any presenters until it has been accepted? Figure 4p shows both these constraints by writing IF statements on relationship lines in the diagram.

Fig. 4p
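A condition on a relationship spans two tables, so a simple column check cannot express it. One possible implementation is a database trigger, sketched here for the first rule only (no presentations in an office). The table layout, the slot_num key, the trigger name and the helpers new_db and accepts are all assumptions, and the trigger guards inserts only; updates would need a similar trigger.

```python
import sqlite3

SCHEMA = """
CREATE TABLE room (
    room_num  INTEGER PRIMARY KEY,
    room_type TEXT NOT NULL CHECK (room_type IN ('lab', 'lec', 'office'))
);
CREATE TABLE presentation_slot (
    slot_num INTEGER PRIMARY KEY,                          -- hypothetical key
    room_num INTEGER NOT NULL REFERENCES room (room_num)
);
-- IF condition on the relationship: the room must not be an office.
CREATE TRIGGER slot_not_in_office
BEFORE INSERT ON presentation_slot
BEGIN
    SELECT RAISE(ABORT, 'an office cannot be used to present a paper')
    WHERE EXISTS (SELECT 1 FROM room
                  WHERE room_num = NEW.room_num AND room_type = 'office');
END;
"""


def new_db() -> sqlite3.Connection:
    """Open an in-memory database with the tables and the guarding trigger."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def accepts(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row is stored, False if the database refuses it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:  # covers constraint and trigger failures
        return False
```

The second rule, that a Paper has no presenters until it is accepted, could be guarded by a trigger of the same shape on the table that records presenters.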

Rather than write IF statements in the boxes and on the relationships, you can introduce is-a relationships into the data structure. You then attach the attributes and relationships that are specific to a subclass to that subclass, rather than to the superclass where they would be optional or mutually exclusive.

Figure 4q shows the is-a tree more graphically. So you can see more readily that only an Accepted paper can have a Presentation Slot or Presenters selected for it, and only a Presentation room (not an office) can be used for presentations.

Fig. 4q
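In table terms, an is-a relationship can be sketched as a subclass table whose primary key is also a foreign key to the superclass table. The example assumes an accepted_paper table that holds the three totals as mandatory columns; the table name, column names and helpers new_db and accepts are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE paper (
    paper_num   INTEGER PRIMARY KEY,
    paper_title TEXT NOT NULL
);
-- Subclass table: its primary key is also a foreign key to the superclass.
CREATE TABLE accepted_paper (
    paper_num     INTEGER PRIMARY KEY REFERENCES paper (paper_num),
    total_pages   INTEGER NOT NULL,   -- mandatory here, no longer optional
    total_figures INTEGER NOT NULL,
    total_tables  INTEGER NOT NULL
);
"""


def new_db() -> sqlite3.Connection:
    """Open an in-memory database with superclass and subclass tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def accepts(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row is stored, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

The optional group of totals disappears from the Paper table: a paper either has an accepted_paper row with all three totals, or it has none.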

I do not recommend you introduce subclasses like this to define constraints on optional data; I am only showing it is possible.

Remember also, for the moment, the triangle symbols are only for the human reader. The relational database designer might implement these is-a relationships using ‘foreign keys’ in the normal way.

Figure 4r shows how you might extend the original relational database structure to enforce at least some of the business rules.

Fig. 4r

Figure 4r is not meant to be a definitive conceptual model for the case study. There is one more transformation worth illustrating before I leave this example.

Turning attributes into relationships

Starting from an entity model where entities have lots of attributes but few relationships, you can turn all the attributes into parent entities.

You can relate each attribute (other than a primary key) to a key-only parent of the original entity. The key-only parent entity (otherwise known as an operational master entity, a collection entity, a categorising entity, or perhaps a domain entity) stores the valid or actual range of values for the attribute.

Figure 4s shows the kind of diagram that might result from turning attributes into relationships. You may compare it with the diagram on page 381 of Halpin’s book.

Fig. 4s

You need not show all non-key attributes as entities. However, you will want to raise the status of some attributes in this way.

Why and when should you do this? Chapter 5 provides some analysis questions. Briefly, you should consider whether the range of attribute values is controlled by users or by designers, and whether the attribute has properties of its own.
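To make the user-versus-designer question concrete, here is a sketch in which Room type becomes a key-only parent table instead of a fixed range. The table and column names and the helpers new_db and accepts are assumptions. The range of values is now data that users can extend, not a rule fixed in the table definition.

```python
import sqlite3

SCHEMA = """
-- Key-only parent entity: stores the valid range of values for the attribute.
CREATE TABLE room_type (
    room_type TEXT NOT NULL PRIMARY KEY
);
INSERT INTO room_type VALUES ('lab'), ('lec'), ('office');

CREATE TABLE room (
    room_num  INTEGER PRIMARY KEY,
    room_type TEXT NOT NULL REFERENCES room_type (room_type)
);
"""


def new_db() -> sqlite3.Connection:
    """Open an in-memory database with the parent table already populated."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def accepts(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row is stored, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

A room of type 'studio' is rejected until someone adds 'studio' to room_type, after which it is accepted with no change to the table definitions.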

Some teach that a logical model must be simpler than a physical one. So it is worth noting that the conceptual model above is a good deal more complex than the set of physical database tables I started with.

So where does the complexity get coded if it is not in the database structure? Probably in processing rules. You do need to understand how to capture business rules in process specification as well as in database specification.

I will revisit some of the ideas illustrated by Halpin’s case study, and add many more.

 

 

References

Ref. 1:   “Software is not Hardware” in the Library at http://avancier.co.uk

 

Footnote 1: Creative Commons Attribution-No Derivative Works Licence 2.0

Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.co.uk” before the start and include this footnote at the end.

No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this page, not derivative works based upon it. For more information about the licence, see  http://creativecommons.org