This page is published under the terms of the licence summarized in the footnote.
“Aggregation and Composition are one of my biggest bete noirs” page 80 of ‘UML Distilled’ Martin Fowler.
Should we specify a coarse-grained entity/class such that it contains several finer-grained entities/classes?
Or specify only the finer-grained entities/classes? (Leaving composition to be done by processes when needed).
Should we specify an entity/class to have a multi-valued attribute?
Or specify classes to have only single-valued attributes?
These are questions over which object-oriented designers and entity modelers might come to blows.
I have corresponded with an object-oriented guru, James Rumbaugh, about aggregate entities and the related issue of multi-valued attributes.
Some of our correspondence is included in the paper below.
It tends to the conclusion that aggregation is used to optimise a physical design, but adds little or nothing to a purely analytical business rule model.
Aggregation based on kernel entities
Aggregation based on aggregation/composition relationships
The weakness of the aggregation concept
Composition in the business layer
Composition to suppress detail
Enterprise applications typically maintain a persistent data structure, which is describable as a set of related entities. E.g.
Customer ---< Order ---< Order Item >---- Product
Enterprise applications typically consume and produce input/output flows that are aggregate data structures.
A use case/session is usually supported or enabled by a particular data structure displayed at the user interface. E.g.
1. In the "order capture" use case, the customer enters or views an Order-topped data structure: Order ---< Order Item.
2. In the "study product demand" use case, the product manager views a Product-topped data structure: Product ---< Order Item.
The Model-View-Controller design pattern is used by OO programmers to structure the software that supports a use case/session:
The Views represent the HTML pages displayed during the session.
The Controller handles the events and enquiries entered during the session, which either update data in the Model or fetch data to populate a View.
The Model is the data structure relevant to that session. E.g.
1. In Model 1, the Order Item appears as an element in an Order aggregate data structure.
2. In Model 2, the Order Item appears as an element in a Product aggregate data structure.
A user interface naturally represents an I/O data flow in a hierarchy and sequence that aggregates elements in one direction rather than another.
This paper is about whether the underlying Business Model should aggregate the Order Items in any particular direction.
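The two session models can be sketched as views built on demand from one un-aggregated set of Order Items. This is a minimal illustration only: the class, the field names, the product codes and the helper group_items are assumptions made for the example, not part of the case study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderItem:
    order_number: int
    item_number: int
    product_code: str
    quantity: int


# Fine-grained, normalised rows; no aggregation direction is built in.
ORDER_ITEMS = [
    OrderItem(1, 1, "P-APPLE", 3),
    OrderItem(1, 2, "P-PEAR", 1),
    OrderItem(2, 1, "P-APPLE", 5),
]


def group_items(items, key):
    """Build an aggregate view by grouping items under the chosen parent key."""
    view = defaultdict(list)
    for item in items:
        view[key(item)].append(item)
    return dict(view)


# "Order capture" sees Order ---< Order Item.
order_view = group_items(ORDER_ITEMS, key=lambda i: i.order_number)
# "Study product demand" sees Product ---< Order Item.
product_view = group_items(ORDER_ITEMS, key=lambda i: i.product_code)
```

Here the direction of aggregation is chosen by the process that builds the view, not by the stored model.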
Specifying a business rules model involves a certain amount of unavoidable work.
If your model is detailed down to the level of fine-grained normalised entities, then all one-to-many relationships are made explicit in the structure.
If not, then these relationships must still be documented, not in the entity model diagram, but behind it.
People seem to prefer abstracting from a fully normalised entity model, to specify coarse-grained classes.
Some simply want to suppress detail from the diagram, perhaps to facilitate discussion with users.
Others are looking to define fewer, coarser-grained classes for the purpose of defining fewer, coarser-grained operations.
How is this done?
In practice, the grouping of entities into aggregate entities is normally done in one of the following two ways.
A kernel entity is one with a simple primary key recognised by users (say Order).
A characteristic entity is one whose existence depends on a kernel entity (say Order Item).
You can define an aggregate entity (also called Order) composed of the kernel entity with its characteristic entities.
Any entity whose existence depends on the existence of a kernel entity might be grouped into an aggregate with the kernel.
Typically, a dependent class can be recognised by the fact that its primary key is an extension of the kernel entity's key.
For example, in this model:
Customer ---< Order ---< Order Item >---- Product
Order ---< Payment
Suppose the primary keys of the Order Item and Payment classes are formed by extending the Order Number with a further identifier.
Then you might define an aggregate composed of Order, Order Item and Payment.
Coarser-grained classes mean coarser-grained operations.
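The key-extension test for kernel-based aggregation can be sketched as follows. The key attribute names and the function aggregate_for are hypothetical, and the sketch assumes every primary key is recorded as an ordered tuple of attribute names.

```python
# Hypothetical catalogue: entity name -> tuple of primary-key attributes.
PRIMARY_KEYS = {
    "Customer": ("customer_number",),
    "Order": ("order_number",),
    "Order Item": ("order_number", "item_number"),
    "Payment": ("order_number", "payment_number"),
    "Product": ("product_code",),
}


def aggregate_for(kernel, primary_keys):
    """Return the kernel plus every entity whose key extends the kernel's key."""
    kernel_key = primary_keys[kernel]
    members = [kernel]
    for name, key in primary_keys.items():
        # A dependent entity's key starts with the kernel key and adds more.
        extends = len(key) > len(kernel_key) and key[: len(kernel_key)] == kernel_key
        if extends:
            members.append(name)
    return members


order_aggregate = aggregate_for("Order", PRIMARY_KEYS)
```

With the keys assumed above, the Order aggregate gathers Order, Order Item and Payment, while Customer and Product remain aggregates of one.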
UML includes aggregation and composition relations, which may be drawn between two classes.
Aggregation connects a component class to the class that “contains” it in some sense.
Aggregation is a vague concept that can be given more substance by thinking of the associated objects’ state machines.
Do the associated objects share the same lifetime? Even the same state transition diagram?
“An aggregation relationship implies that the object and its owner have the same lifetimes” Gamma et al. ‘Design Patterns’ Addison-Wesley 1995
Consider two classes – parent and child – in which the identity of the child entity is constructed by extending the identity of the parent entity.
· The parent entity - Order - has the primary key Order Number.
· The child entity - Order Item - has the primary key Order Number + Item Number.
Are these two classes? Or one aggregate class that contains the other?
James Rumbaugh: “In this case I would treat them as separate objects.
You want to manipulate them separately and treat the order item as something you can change, with the changes reflected to the order.”
Graham: One might ask: Do you ever invoke an operation on a child object without invoking an operation on its parent?
If yes, then they are better specified as two distinct classes.
E.g. Do you ever enquire on, update or delete an Order Item without accessing its Order?
If yes, it is better to specify Order Item as a distinct class.
Composition is a tighter form of aggregation.
James: Note the concept of "composition" that we have added to the UML.
“A composition is a strong aggregation in which the composite is the sole owner of the part and is responsible for its creation and destruction.
The part cannot exceed the lifetime of the composite.” James Rumbaugh 1998
Graham: Is the concept of composition helpful here?
Does it add much to the relational database principle of referential integrity between child and parent?
The principle that the ‘child cannot exceed the lifetime of the parent’ applies to any child-parent association where:
· the child’s primary key includes the parent’s key, or
· the relationship is mandatory and fixed at the child end.
James: “I’m not sure you can distinguish composition and association so well in the real world, or in an analysis model.
But at the design level I think it is clear.
The composite has responsibility, and sole responsibility, for the memory management of the part.
There will be no conflict over the reference and no danger of dangling pointers if the owner deletes one.
It is a guarantee that there will be no garbage collection problems.
Therefore physical embedding is composition, because the part is allocated and de-allocated with the whole.
But you can use a pointer to memory off the heap, but it may not become independent.
That is the meaning of composition in a practical sense: it doesn't matter if the part is physically part of the whole or stored in a separate block, it is handled the same way.”
Graham: In any case, the UML definition of composition isn’t applicable to this case study.
Because Order Item is owned not only by Order but also by Product, it is a cross-reference between them.
An Order Item cannot exceed the lifetime of its owner Order, but nor can it exceed the lifetime of its owner Product.
(Nor any other fixed mandatory parent, including indirect parents such as the Customer who placed the Order).
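The point that ordinary referential integrity already bounds the lifetime of an Order Item by both of its parents can be sketched with SQLite. The table and column names are assumptions, and the sketch uses SQLite's default rule of rejecting a parent delete while children exist; a cascade rule would be an equally valid policy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE product (product_code TEXT PRIMARY KEY);
    CREATE TABLE "order" (order_number INTEGER PRIMARY KEY);
    CREATE TABLE order_item (
        order_number INTEGER NOT NULL REFERENCES "order" (order_number),
        item_number  INTEGER NOT NULL,
        product_code TEXT    NOT NULL REFERENCES product (product_code),
        PRIMARY KEY (order_number, item_number)
    );
    INSERT INTO product VALUES ('P-APPLE');
    INSERT INTO "order" VALUES (1);
    INSERT INTO order_item VALUES (1, 1, 'P-APPLE');
    """
)


def delete_is_blocked(statement):
    """Return True if the delete is rejected because it would orphan a child row."""
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError:
        return True
    return False


order_blocked = delete_is_blocked('DELETE FROM "order" WHERE order_number = 1')
product_blocked = delete_is_blocked("DELETE FROM product WHERE product_code = 'P-APPLE'")
```

Both deletes are rejected in the same way, so nothing in the schema singles out Order rather than Product as the sole owner of the Order Item.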
There is one more analysis question to be asked of the case study example.
Can an Order include two Order Items for the same Product?
If not, then you might do better to redefine the primary key of the Order Item as a compound of Order Number and Product Type.
And so prevent several items on one Order from requesting the same Product.
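A minimal sketch of that redefined key, again using SQLite with assumed table and column names. The compound primary key of Order Number and Product Type is what rejects the second request for the same Product on one Order; add_item is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE order_item (
        order_number INTEGER NOT NULL,
        product_type TEXT    NOT NULL,
        quantity     INTEGER NOT NULL,
        PRIMARY KEY (order_number, product_type)
    )
    """
)


def add_item(order_number, product_type, quantity):
    """Insert an Order Item; return False if the compound key already exists."""
    try:
        conn.execute(
            "INSERT INTO order_item VALUES (?, ?, ?)",
            (order_number, product_type, quantity),
        )
    except sqlite3.IntegrityError:
        return False
    return True


first = add_item(1, "APPLE", 3)
duplicate = add_item(1, "APPLE", 2)    # same Order, same Product Type: rejected
other_order = add_item(2, "APPLE", 2)  # same Product Type on another Order: accepted
```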
Martin Fowler again: “Aggregation is easy to explain glibly.
The trouble is, there is no single accepted definition of the difference between aggregation and association.
In fact, very few [methodologists] use any kind of formal definition.”
The aggregation concept seems natural where object-oriented programming was first successful, that is, in the handling of graphical user interface objects.
Consider a dialogue box that can be moved across the screen.
All the objects (buttons, fields, whatever) within that dialogue box have no existence before, after or outside the box, and must move with it.
There are three reasons why aggregation relationships between business information classes are far less natural.
1) There is a scale of associations from weak to strong
“Aggregation and acquaintance relationships are easily confused” Gamma et al. ‘Design Patterns’ Addison-Wesley 1995
It is easy to waste time arguing about the distinction.
It seems better to regard association relationships as being placeable on a sliding scale from tight aggregation relationships to loose acquaintance relationships, without any firm dividing line between them.
2) Aggregation and association relationships are normally implemented the same way
They cannot be distinguished in the compile-time structure or the implementation language.
Martin Fowler again: “The cascade delete is often considered to be a defining part of aggregation, but is [clearly applicable to structures that are not aggregations].”
3) Persistence undermines aggregation.
Time destroys compositions.
Time reveals apparently strong aggregation relationships to be weak associations or acquaintance relationships.
Events break up aggregates (the Soviet Union, Yugoslavia and the United Kingdom come to mind).
In short, aggregation is a fragile concept.
Aggregation relationships seem natural in a static unchanging data structure, in a short-lived data structure, in a transient input/output view (like a graphical data structure perhaps).
But where entities persist and change over time, the concept of aggregation, composition or containment is a weak one.
There is a debate about whether an attribute can have plural values.
According to most people’s interpretation of relational theory, an attribute that is plural (e.g. Telephone Numbers) ought to be specified as an independent class of objects.
Applying this ‘first normal form’ principle sometimes leads to a different model from object-oriented design, which allows what might be called an aggregate entity that contains multi-valued attributes.
Customer entity: attribute list
James: I wouldn’t rule out attributes with plural multiplicity, in the situation that the set of values has no independent identity.
Graham: Asking about uniqueness constraints might help.
Do users care enough to prevent an attribute value appearing twice in the list? If they don’t care, then the multi-valued attribute is reasonable.
If they do care, this implies that each attribute value uniquely identifies something (a telephone), and suggests you should normalise it to become a distinct class.
James: I might put it differently.
If the reference is to a telephone as an object, you could change the number and all of the users of the telephone would see the new number.
If it is just a string in each list, then it has no identity.
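The difference between a string in a list and a telephone with identity can be sketched as below. The class names, field names and telephone numbers are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Telephone:
    """A telephone with its own identity; the number is just its current state."""
    number: str


@dataclass
class Customer:
    name: str
    # Multi-valued attribute: plain strings with no identity of their own.
    phone_numbers: list = field(default_factory=list)
    # Association: references to shared Telephone objects.
    telephones: list = field(default_factory=list)


shared = Telephone("020 7946 0001")
alice = Customer("Alice", phone_numbers=["020 7946 0001"], telephones=[shared])
bob = Customer("Bob", phone_numbers=["020 7946 0001"], telephones=[shared])

shared.number = "020 7946 0999"  # the telephone is renumbered once
```

After the renumbering, both customers see the new number through the shared Telephone object, while each list of strings still holds the old value and would have to be corrected customer by customer.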
Graham: There is also a more practical design question. How long is a list of attributes?
A database designer might not like either a long fixed-length list (empty in most cases) or a variable length list.
James: As an implementation issue you would likely make it (Telephone) a class, but that's not the modeling issue.
Graham: Strictly, the designer or implementer’s decision should not influence the builder of the business rules model.
But it is often tempting to align the entity model with the database model.
And in the case of multi-valued attributes, it normally seems convenient and harmless to do so.
It is easy to draw an aggregate entity box around a parent entity and some or all of its children.
But does such composition actually help?
User interface designers find composition useful.
A window or dialogue box can be regarded as an aggregate entity.
The user may trigger operations that act on the whole aggregation, rather than elements of it.
Database designers find composition useful.
Grouping entities into an aggregate entity may imply the clustering of tables into a ‘block’, so as to speed up processes that access data from closely related entities.
Composition does seem a good way to document the relatively uncommon situation where a class has multi-valued attributes.
Such as the list of telephone numbers illustrated earlier.
And composition might also be used in circumstances constrained by the following two rules.
Include only single-parent characteristic entities with the parent in the aggregate entity.
A single-parent characteristic entity is a child that has no relationship other than to its parent.
Do not include any characteristic entity upon which operations can be invoked separately from the parent entity (which implies there is a distinct identifier for the characteristic entity).
But many people are insensitive to these constraints.
They often include link entities, and sometimes even parent entities, in aggregate entities.
They may do this for many reasons, but their reasons are always to do with design rather than analysis.
There is a limit to how much I can conclude from an email correspondence.
But I have the impression that James Rumbaugh also regards most composite classes as artefacts of physical design rather than analysis.
The business rules model is not supposed to be influenced by the design decisions of user interface designers and database designers.
There are only a very few cases where the concept of an aggregate entity seems helpful in a business rule model.
Beyond these cases, specifying composite classes in a business rules model seems inappropriate and unhelpful.
It forces the analyst to make a design decision that need not be made, a decision that often has to be undone later.
Some people need to suppress detail from the entity model diagram, perhaps to facilitate discussion with users.
This is a harmless enough idea.
If the aim is merely to draw a higher-level entity model that suppresses detail from the full one, then a simple procedure will do the job:
Examine every entity that has only one relationship.
If this entity is not of primary importance to the users, then hide it inside its neighbouring entity.
Examine every many-to-many link entity.
If this entity has no attributes or operations of its own, then draw the entity as a relationship line rather than a box.
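The procedure above can be sketched as a single pass over a list of one-to-many relationships. The function summarise, its parameters and the choice of which entities count as primary are assumptions for the example; the sketch also applies both rules once to the original model rather than repeating them until nothing changes.

```python
RELATIONSHIPS = [  # (parent, child) one-to-many pairs from the case study
    ("Customer", "Order"),
    ("Order", "Order Item"),
    ("Product", "Order Item"),
    ("Order", "Payment"),
]
PRIMARY = {"Customer", "Order", "Product"}  # assumed to matter most to users


def summarise(relationships, primary, has_own_features):
    """Apply the two detail-suppression rules to (parent, child) pairs.

    Returns (hidden, lines, kept): hidden maps an absorbed entity to the
    neighbour that absorbs it, lines lists the many-to-many relationship
    lines that replace link entities, and kept lists untouched relationships.
    """
    neighbours, parents = {}, {}
    for parent, child in relationships:
        neighbours.setdefault(parent, []).append(child)
        neighbours.setdefault(child, []).append(parent)
        parents.setdefault(child, []).append(parent)

    # Rule 1: an entity with only one relationship, not of primary importance.
    hidden = {
        entity: linked[0]
        for entity, linked in neighbours.items()
        if len(linked) == 1 and entity not in primary
    }
    # Rule 2: a link entity (two parents, no children) with nothing of its own.
    links = {
        entity: tuple(ps)
        for entity, ps in parents.items()
        if len(ps) == 2
        and len(neighbours[entity]) == 2
        and entity not in has_own_features
    }
    kept = [
        (p, c)
        for p, c in relationships
        if p not in hidden and c not in hidden and c not in links
    ]
    return hidden, sorted(links.values()), kept


hidden, lines, kept = summarise(RELATIONSHIPS, PRIMARY, has_own_features=set())
```

Under these assumptions Payment is hidden inside Order and Order Item collapses to a line between Order and Product. Passing has_own_features={"Order Item"} keeps Order Item as a box, because it then carries attributes or operations of its own.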
Some people are looking to define fewer, coarse-grained classes for the purpose of defining fewer, coarse-grained operations.
However, composite classes or aggregate entities are simply too small for component-based development.
This is the reason why people complain that the granularity of object-oriented design is too small.
The practical way forward is to move towards a component-based development approach where many classes are grouped into a coarse-grained business component, and operations are defined at the level of the business component.
The component-level operations, or business services, correspond to what most pre-object-oriented methods called transactions.
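As a rough sketch of the idea, a business component might hide its fine-grained classes behind a few business services. Everything named here (OrderingComponent, place_order, units_available, the stock figures) is hypothetical, and a real component would delegate the all-or-nothing behaviour to a database transaction rather than to an in-memory check.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_number: int
    customer: str
    items: dict = field(default_factory=dict)  # product code -> quantity


class OrderingComponent:
    """Hypothetical business component: callers see services, not classes."""

    def __init__(self, stock):
        self._stock = dict(stock)  # product code -> units available
        self._orders = {}
        self._next_number = 1

    def place_order(self, customer, lines):
        """Business service: validate everything first, then update everything."""
        short = [code for code, qty in lines.items() if self._stock.get(code, 0) < qty]
        if short:
            raise ValueError(f"insufficient stock for: {', '.join(sorted(short))}")
        order = Order(self._next_number, customer, dict(lines))
        for code, qty in lines.items():
            self._stock[code] -= qty
        self._orders[order.order_number] = order
        self._next_number += 1
        return order.order_number

    def units_available(self, product_code):
        """Business service: enquire on stock without exposing internal classes."""
        return self._stock.get(product_code, 0)


component = OrderingComponent({"P-APPLE": 10, "P-PEAR": 1})
number = component.place_order("Alice", {"P-APPLE": 3, "P-PEAR": 1})
```

One call to place_order touches the order and the stock of every product on it, which is the granularity at which a transaction would traditionally be defined.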
Footnote: Creative Commons Attribution-No Derivative Works Licence 2.0 01/07/2014 21:26
Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.co.uk” before the start and include this footnote at the end.
No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this page, not derivative works based upon it.
For more information about the licence, see http://creativecommons.org