Notes on model-driven analysis (an old paper that I’d like to rewrite)

This page is published under the terms of the licence summarized in footnote 1.

This page contains a somewhat rambling discussion of model-driven analysis, MDA (model-driven architecture) and UML.

On abstraction

On conceptual, logical and physical models

Model-driven architecture transformations

Transformations between logical and physical models

Transformations between conceptual and logical models

We just have to work around UML

Fashions over a quarter century


On abstraction

Models are abstractions. An enterprise may own and maintain millions (even a billion?) lines of software code. We all know how difficult it is to read code and understand its purpose.  Even the most extreme of extreme agile developers agree that we need to maintain some abstract specifications that are more concise than the code.

A never-ending fascination of software engineering lies in the immense variety of answers to three questions:

·       What kind of abstract specification is best?

·       How many levels of abstract specification do we need?

·       How tightly do we maintain the abstract specifications in alignment with the code?

Abstraction involves four basic tools: Omission of detail, Composition, Generalisation and Idealisation. These kinds of abstraction are intertwined in everyday discussion. They are intertwined in many of the concepts and examples that people present as abstractions, without thinking what they really mean by that word. They are explored in depth in other papers; but I want to talk about abstraction just a little here too.

Omission of detail – and views – are necessary

Omission of detail means ignoring some details because they are not interesting, or they are handled by another view, another person or machine. This saves analysts, designers and developers from being overwhelmed by too much information.

A large and complex specification can be divided into views; each view suppresses details of other views. E.g. four City Plan views might show

·       Surface features only: buildings and roads

·       Underground railway lines

·       Local government boundaries

·       Postal district boundaries
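To make this concrete, here is a small Python sketch (the city model and its features are invented for illustration): each view is a pure filter over one underlying model, omitting – not transforming – the features that belong to other views.

```python
# A toy city model: one underlying specification, several views.
# All names here are invented for illustration.
city_model = [
    {"name": "Town Hall",    "kind": "building",        "layer": "surface"},
    {"name": "High Street",  "kind": "road",            "layer": "surface"},
    {"name": "Central Line", "kind": "railway",         "layer": "underground"},
    {"name": "Ward 7",       "kind": "gov_boundary",    "layer": "administrative"},
    {"name": "District N1",  "kind": "postal_boundary", "layer": "administrative"},
]

def view(model, keep):
    """A view omits the features another view would show; it does not transform them."""
    return [feature for feature in model if keep(feature)]

surface_view     = view(city_model, lambda f: f["layer"] == "surface")
underground_view = view(city_model, lambda f: f["kind"] == "railway")

print([f["name"] for f in surface_view])  # ['Town Hall', 'High Street']
```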

Architecture frameworks similarly divide descriptions into several views, several parallel structures, making it easier to understand and change each on its own. They tend to separate business architecture from applications architecture from technology/platform architecture. In higher level views of architecture, we do not have to specify functions that are provided by the platform (say data storage, indexing, sorting, transaction roll back and transaction logging).

A view does not abstract (by composition or generalization) from details in other views; it simply omits them.

Composition is a fundamental tool

Large things are both divided into, and composed from, smaller things. Grouping smaller components and hiding them behind the interface of a larger component enables people to manage large and complex systems. This has been a tenet of most if not all analysis and design methods since the 1970s.

A component or package is notable for abstracting by composition (and thus omission of the details inside the component or package).

 “Composition is the primary way of giving a structure to a software system. It is how I start when I walk onto a site and try and understand a new business.” Chris Britton

Composition means grouping smaller things into fewer, bigger things, with bigger interfaces. The recent wave of component-based development approaches focuses attention on the bigger/higher-level components, rather than the smaller/lower-level objects. They also concentrate attention on component interfaces. Attending to interfaces is one of the things that enterprise architects with 100 databases ought to be doing. I sometimes call an enterprise model that focuses on interfaces between systems a “federal model”.

Generalisation is a dangerous tool

Generalisation means creating a supertype. Two or more specific things are generalised into one more general thing. E.g. Vehicle is a generalisation of Plane and Train.

A generalisation can save us from having to know the differences between subtypes. Specialisation extends the properties of a supertype. Conversely, generalization omits properties related to specific subtypes. The generalization is smaller and simpler. A little generalisation is a good thing in system design. But generalisation is the enemy of performance and programmability in design. It is also the enemy of understanding and agreement about business rules in analysis.
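The Vehicle example can be sketched in code (a minimal illustration; the speed, wingspan and gauge properties are invented). Code written against the generalisation is spared the subtype differences – which is exactly why over-generalisation can hide the business rules that matter.

```python
# Generalisation: Vehicle carries only what Plane and Train have in common.
class Vehicle:
    def __init__(self, max_speed_kmh):
        self.max_speed_kmh = max_speed_kmh  # the common property

class Plane(Vehicle):
    def __init__(self, max_speed_kmh, wingspan_m):
        super().__init__(max_speed_kmh)
        self.wingspan_m = wingspan_m        # specialisation extends the supertype

class Train(Vehicle):
    def __init__(self, max_speed_kmh, gauge_mm):
        super().__init__(max_speed_kmh)
        self.gauge_mm = gauge_mm

def fastest(vehicles):
    # This function need not know the differences between subtypes...
    # ...but equally, it cannot state any rule that depends on them.
    return max(v.max_speed_kmh for v in vehicles)

fleet = [Plane(900, 60.0), Train(300, 1435)]
print(fastest(fleet))  # 900
```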

Idealisation is necessary to form a logical model

Idealisation is a special kind of generalisation: it hides the physical details that vary between implementations. A logical entity model (for example) is an idealisation that saves us from having to know any details specific to a database management system technology. Building an idealised model is a tried and tested way for analysts to develop an understanding of a business. Idealisation typically involves some omission of detail as well as generalisation.

All kinds of abstraction can be hierarchical

Hierarchies are very useful. By successive abstraction from the bottom upwards, you can create a composition hierarchy over any set of items. From your experience of organising the folder structure on your desktop computer, you will recognize that the resulting structure is arbitrary and temporary; it frequently needs restructuring; and many structures are possible.

It is probably evident that all three basic kinds of abstraction can be hierarchical. As Chris Britton wrote to me:

“Whenever people in other disciplines have had to organise massive quantities of data they have ended up with one or more hierarchical classifications. And multiple hierarchies are possible.

Composition can be hierarchical, resulting in groups within groups. Of course, describe a software system using composition and you soon find it is not a strict hierarchy because lower level elements can be shared. Informally speaking, it is a network. Formally speaking, it is a directed acyclic graph.

Omission of detail can be hierarchical. The classic example is maps, where the larger the scale the more minor roads are omitted. We use similar techniques in IT, usually to show the big picture of an existing IT system. The difficulty is deciding what is major and what is minor.

Generalizations too can be hierarchical. The zoologist’s classification (including phyla, genera, families, orders and species) is a hierarchy formed by grouping animals according to common properties. These common properties can be arcane and they do not hide detail.”
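Britton’s point about shared lower-level elements can be sketched (an invented example): as soon as one part has two parents, the composition structure is a directed acyclic graph rather than a strict hierarchy.

```python
# Composition recorded as whole -> parts; the part names are invented.
contains = {
    "billing_system":  ["invoice_module", "shared_logging"],
    "ordering_system": ["order_module", "shared_logging"],  # shared element
    "invoice_module":  [],
    "order_module":    [],
    "shared_logging":  [],
}

def parents_of(part):
    """In a strict hierarchy every part has at most one parent."""
    return [whole for whole, parts in contains.items() if part in parts]

print(parents_of("shared_logging"))  # two parents - a DAG, not a tree
```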

On conceptual, logical and physical models

The PIM principle

Analysts produce a logical specification (platform-independent model) that takes account of physical design constraints, but ignores the programming paradigm and technology.

What form should an abstract specification take? Many believe it should take the form of a model. Before discussing the practical use of models, we ought to consider what a model is, and the curious nature of software systems.

Modeling involves four activities:

·       Analysis: Studying and dividing a thing into parts

·       Abstraction: Selecting parts and features to be modeled

·       Representation: Drawing or recording an image of the thing

·       Pattern recognition: Model viewers recognize the model as sharing the characteristics of the thing.

A model is an abstraction of the real world. A model can represent only a tiny part of the real world. In building a data processing system, that tiny part is the part of the real world we want to monitor, if not control, using software.

The difference between logical and physical models

I'm not convinced there is a satisfactory consensus about the distinction between “logical” and "physical” models. This table shows some different interpretations.

Interpretation | Logical | Physical
Supplier dependence | Supplier independent | Supplier specific
Resource dependence | Technology or material resource independent | Technology or material resource specific
– | Encapsulated behind external service contracts | Internal processes and components
– | Designed for simplicity and integrity | Designed for performance (speed and throughput)

Some more specific interpretations:

Logical

·       Data structure: logical data model

·       Process / module structure: OO class diagram; data flow diagram; application deployment diagram

Physical

·       Data structure: clustering into a block of entity types that are accessed together; next/prior/owner pointers; other features specific to a vendor’s DBMS

·       Process / module structure: language-specific executable instructions (in Java, C++, …); platform-specific invocation instructions (using CICS, WebSphere…); assignment of modules to locations/devices (via CORBA, DCOM…); other features specific to a vendor’s programming environment; plus cache (for response time and throughput), load balancers and clustering (for availability), remote replication (for recoverability), firewalls (for security) and server monitoring (for serviceability)

The difference between conceptual and logical models

I'm not convinced there is a satisfactory consensus about the distinction between “conceptual” and “logical” models. Often, a logical model is an abstract description of the structure or behaviour of a specific computer system, while by contrast a conceptual model is an abstract description of the structure or behaviour of a system of interest, unrelated to any specific computer system.

The notion of a conceptual model is well rooted in data modeling. It is less firmly established in process modeling. You might consider a “domain model”, drawn as a class diagram, as a conceptual model of processes. But in practice, the term conceptual model is more usually attached to a model of human activities – a business process model. (The concept of real world event modeling – discussed elsewhere – seems to be missing from most discussion of conceptual models.)



Conceptual (independent of computer systems altogether):

·       Data model: a conceptual data model (aka domain model) is a model of business terms and concepts, not necessarily related to any specific database – current or intended.

·       Process model: a computation-independent model – a business process model, a model of human activities – drawn without reference to any specific software system – current or intended.

Logical (independent of technology choice and of design optimisation decisions):

·       Data model: a logical data model is idealised by “normalisation” to remove duplication (apart from foreign keys); it is usually a specification for an intended database, or reverse engineered from a current database.

·       Process model: a platform-independent model – typically a class diagram of a specific software system, current or intended – with “hooks” wherever platform services (e.g. the start or rollback of a transaction) are needed.

Physical (technology specific and/or optimised for performance):

·       Data model: a physical data model is the transformation of a logical data model into a database schema (in the language of Oracle, DB2, SQLServer, whatever), with technology-specific detail added, and transformed in whatever way is needed to meet performance requirements (denormalisation, indexes, whatever).

·       Process model: a platform-specific model is the transformation of a platform-independent model into a compilable form – in a programming language, connected to the necessary platform technology (CORBA, transaction management, ODBC, whatever).

The eternal golden braid + 1

In “Gödel, Escher, Bach: an Eternal Golden Braid”, Douglas Hofstadter wrote of the eternal golden braid – a philosophical notion.


Eternal Golden Braid | EA framework | Zachman’s terms | TOGAF’s terms
Descriptions of descriptions | EA meta model | Zachman Framework | “Content framework”
Descriptions of things | Enterprise model | “Descriptive representations” | “Architecture definition doc.”
– | Operational enterprise | “Operations instances” | “Deployed system”

The trouble is that information systems and software systems don’t fit. An operational software system is at once a reality and a description of a reality. It exists, it runs, but it also models a part of the real world that the system’s owners want to monitor, support or direct.


·         The data in a database represents persistent real world objects (e.g. customer, payment, person).

·         The processes and transactions represent transient real world events, things happening in the lives of people or electro-mechanical devices (e.g. invoice, marriage).

·         A transient event can move a persistent object through its lifecycle from one state to the next.
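The third bullet can be sketched in code (a toy example; the Person lifecycle and its events are invented). A transient event is legal only in certain states, and moves the persistent object on to its next state.

```python
# Legal (state, event) -> next state transitions; an invented lifecycle.
LIFECYCLE = {
    ("single", "marriage"): "married",
    ("married", "divorce"): "single",
}

class Person:
    """A persistent real-world object represented as data."""
    def __init__(self, name):
        self.name = name
        self.state = "single"

    def handle(self, event):
        # A transient event moves the object through its lifecycle.
        key = (self.state, event)
        if key not in LIFECYCLE:
            raise ValueError(f"event {event!r} is illegal in state {self.state!r}")
        self.state = LIFECYCLE[key]

p = Person("Ann")
p.handle("marriage")
print(p.state)  # married
```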


So an operational software system occupies a curious position between description and real world. It adds a 4th strand to the eternal golden braid.


Eternal Golden Braid + 1 | Eternal Golden Braid | OMG’s Model-Driven Architecture | Change rate
Descriptions of abstract descriptions | Meta-model of software modeling concepts | MOF (meta-object facility) | Changes never?
Abstract descriptions of an operational system | Software system model (conceptual, logical, physical) | CIM (computation-independent model), PIM (platform-independent model), PSM (platform-specific model) | Changes in response to business rule changes
Operational model of real system | The software system | The software system | Changes in response to discrete events
Operational real system | The entities and activities supported by the system | – | Changes continually
It is tempting to add a third scheme into the picture, the Zachman framework. It is easy to confuse schemes that are subtly different, perhaps quite different.

Traditional model hierarchy | OMG’s Model-Driven Architecture | Zachman framework
Conceptual model | CIM (computation-independent model) | –
Logical model | PIM (platform-independent model) | –
Physical model | PSM (platform-specific model) | –
It is not at all clear that “conceptual”, “computation independent” and “business” should be equated.


I'm not convinced there is a satisfactory consensus about the distinction between “conceptual”, “logical” and "physical” models. There are degrees of system and platform independence, and degrees of optimisation. To that area of uncertainty, one might add questions about whether "type", "class" and "entity" are the same thing or not. After following some OMG discussions of these things, I can report that gurus do not agree.


Model-driven architecture transformations

The OMG’s Model-Driven Architecture features three levels of abstraction:

·       CIM (Computation-Independent Model),

·       PIM (Platform-Independent Model) and

·       PSM (Platform-Specific Model).

There are potentially

·         two reverse engineering transformations: PSM-to-PIM and PIM-to-CIM; and

·         two forward engineering transformations: CIM-to-PIM and PIM-to-PSM.

All four are discussed below.

There is something to be said for a Model-Driven Architecture scheme that is vaguely defined. Interest groups and vendors can use such a scheme as a springboard to invent new ways to be more efficient and effective in the analysis, design and construction of software systems. Even if they only reshape their existing ideas to fit the scheme, they are likely to clarify and elaborate those ideas.

There is value in promoting and discussing Model-Driven Architecture. It is a device to bring together interest groups and vendors from different realms. It encourages cross-fertilisation and new ways of thinking and working. It may help to improve UML and to reinvigorate efforts to define higher level or more universal programming languages. 

This last appears to be what many are focused on.

Nevertheless, pedants and veterans are not satisfied. The pedants amongst us hanker for more clarity in and wider agreement about the definitions of CIM and PIM. The veterans amongst us are wary of people using Model-Driven Architecture to recycle ideas that have not proved successful in the past.

Later chapters set out some reasons why some pedants and veterans are sceptical about Model-Driven Architecture.

Why many professionals don’t like MDA

Let me start by summarising what people in architect classes tell me:

·       People do like reverse engineering from code, because they can erase a lot of stuff from the model to leave a useful abstraction for discussion.

·       People do not like forward engineering from model to code:

o   The model has to be as detailed as the code – so there is no abstraction benefit; developers can work as readily with the code.

o   Forward engineering leads to ugly, unreadable and sometimes inefficient code.

o   Developers still have reasons to look at and work with the code anyway – which breaks the round-trip paradigm.

o   People almost never have the requirement that justifies MDA – to port code from one platform to another.

o   If they do seek portability, they don’t trust the tool vendor to keep step with all possible platforms and upgrades thereof.

o   The enterprise becomes locked into a niche CASE tool, and into developers who are trained in it.

Model-Driven Architecture has to meet two opposing challenges

Taking UML as a starting point, Model-Driven Architecture is expected to meet the challenges both of those wanting more abstraction and of those wanting more detail.

Do we want a Van Gogh?

A software engineer who religiously maintained an abstract specification of his software in the form of UML diagrams concluded:

“Unfortunately, maintaining a very detailed model is really no easier than maintaining the code.  It's very important to work at the correct level of abstraction.  I won't be going back to using forward code generation from a model.”

Even this most earnest of software engineers concluded that higher-level or more abstract specifications are needed. So, Model-Driven Architecture has to meet the challenge of analysts and developers who find UML not abstract enough for their specification purposes.

Or do we want a Canaletto?

At the same time Model-Driven Architecture has to meet the challenge of analysts and designers who want UML to be semantically rich enough to specify all the necessary business rules in detail.

We can’t generate code unless the business rules are specified somehow. This means specifying the pre and post conditions of processes at every level of software engineering: in conditions governing a step-to-step transition in a business process, and in conditions governing the commit/rollback of a database transaction.
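As a sketch of what an executable pre/post condition specification could look like (the contract decorator and the withdraw rule below are invented for illustration, not a UML or MDA notation):

```python
# A home-made design-by-contract decorator; names are invented for illustration.
def contract(pre, post):
    def wrap(fn):
        def inner(*args):
            assert pre(*args), "precondition violated"
            result = fn(*args)
            assert post(result), "postcondition violated"
            return result
        return inner
    return wrap

# Business rule: you may only withdraw a positive amount you actually have,
# and the resulting balance must not be negative.
@contract(pre=lambda balance, amount: amount > 0 and balance >= amount,
          post=lambda new_balance: new_balance >= 0)
def withdraw(balance, amount):
    return balance - amount

print(withdraw(100, 30))  # 70
```

The same condition style could, in principle, govern a step-to-step transition in a business process or a commit/rollback decision in a transaction.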

How should business rules appear in higher levels of Model-Driven Architecture? How can we abstract business rules from the elementary data item level?

How can we build models that meet these apparently conflicting challenges, giving us both the abstraction of a Van Gogh and the detail (comprehensive business rule specification) of a Canaletto?

Forward and reverse engineering transformations

A good way to explore the meaning and usefulness of the three levels of abstraction in Model-Driven Architecture is to consider possible CIM<->PIM<->PSM transformations.

We must recognise that these transformations may never be fully automated. Automated transformations sell tools. Transformations that require human intervention are no less interesting, and they sell training courses.

Of course we look for automated support wherever it is possible. But most of the interesting transformations require some human intervention.

The interest in Model-Driven Architecture centres on our ability to transform a model at one level into a model at a lower level or higher level. This is often called forward engineering or reverse engineering.

·       Reverse engineering is a process of abstraction; it abstracts by omission of detail, generalisation, or composition, or a combination of those three devices.

·       Forward engineering is a process of elaboration; it elaborates by adding detail, specialising or decomposing, or a combination of those three.

Human beings find reverse engineering easier than forward engineering. It is easier to remove detail than to add it. It is easier to generalise from specialisations than to create them. It is easier to group details into a composition than to detail the members of a group.

However you look at it, Model-Driven Architecture implies building and maintaining a lot of models. The challenge is to reconcile the

·       Agilists’ view that code and tests are what matters

·       Model-Driven Architecture view that models are what matters.

We ain’t going to build and maintain all those models if we can’t transform one into another without too much work. Model-Driven Architecture implies four transformations, shown in the table below.

Transformation direction | Transformations
Forward engineering | CIM-to-PIM and PIM-to-PSM
Reverse engineering | PSM-to-PIM and PIM-to-CIM
I will go on to look at the same picture from another angle, first the lower-level transformations and then higher-level transformations.

I will identify problems with the Model-Driven Architecture approach to specifying transformations, and barriers to Model-Driven Architecture adoption. I will promote the importance of understanding where and how persistent data is divided between discrete data stores and where business services can be rolled back. I will suggest that models that are to be completable by business analysts, yet also transformable by forward engineering, must be event-oriented as well as object-oriented.


Transformations between logical and physical models

It seems the main interest and effort in Model-Driven Architecture is in automating the forward engineering transformation of a PIM into PSM. Software engineers think Model-Driven Architecture is about code generation. They see Model-Driven Architecture as first of all about generating PSM level model/code from PIM level model/code. They discuss what a platform is, what is platform-specific and what is not. They discuss degrees of platform-independence. They see writing a PIM as writing in a new and higher-level programming language. They are interested in using UML as a coding vehicle and in the portability of UML specifications between lower CASE or programming tools. They hope UML will become a language in which they can specify the business rules of operations by their pre and post conditions, and in a declarative style rather than procedural style.

PSM-to-PIM reverse engineering

People have been transforming PSMs to PIMs for decades. Take a database schema. Erase some of the DBMS or platform-specific details and you can express the database design as a Bachman diagram. Erase some more detail and you have an entity-attribute-relationship model (aka data model).
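That erasure can be sketched mechanically (a toy Python example; the schema and its “physical” details are invented). The reverse engineering here is abstraction purely by omission of detail.

```python
# A platform-specific description of two tables; all details invented.
psm = {
    "CUSTOMER": {
        "attributes": ["id", "name", "postcode"],
        "relationships": ["places ORDER"],
        "physical": {"tablespace": "TS01", "indexes": ["idx_postcode"]},
    },
    "ORDER": {
        "attributes": ["id", "customer_id", "total"],
        "relationships": [],
        "physical": {"tablespace": "TS02", "indexes": ["idx_customer_id"]},
    },
}

def to_pim(psm):
    """Abstract by omission: drop the platform-specific details, keep the rest."""
    return {entity: {k: v for k, v in spec.items() if k != "physical"}
            for entity, spec in psm.items()}

pim = to_pim(psm)
print(sorted(pim["CUSTOMER"]))  # ['attributes', 'relationships']
```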

Notice there are degrees of platform independence. Abstracting upwards from a PSM, there is not one level of PIM but many.  And even at the first level of abstraction, we may choose to abstract in different directions, so there are potentially many branches as well as many levels of PIM.

Most Model-Driven Architecture enthusiasts are interested in process structures more than data structures. Indeed, some in the data management community are yet to be convinced the OMG or Model-Driven Architecture are relevant to the issues of data management.

Some are interested in abstraction by generalisation of coding languages

Where the PSM is a model of Java or C++ code, then the PIM could employ a more generic OOPL. An even higher level PIM could abstract between a generic OOPL and a procedural language like COBOL. 

But what Model-Driven Architecture may deliver by way of a programming language is likely to be more complete rather than more abstract.

BTW. Suppose the OMG make UML into a supertype programming language, or define a supertype platform or operating system. Would such a supertype help us to build a PIM? Or would it remove the PIM-PSM distinction?

Stephen Mellor is interested in a more complete language

“What UML calls a computationally complete "action language" will have at least the following features:

·         complete separation of object memory access from functional computation. This allows you to re-organise data and control structure without restating the algorithms… critical for Model-Driven Architecture.

·         data and control flow, as opposed to purely sequential logic. This [enables you to] distribute logic across multiple processors on a small scale (e.g. between client and server, or into software and hardware).

·         map functions and expansion regions that let you apply operations across all the elements of a collection in parallel. This … maximizes potential parallelism, again important for distribution, pipelining, and hardware/software co-design.

While not huge linguistic advances, these properties enable translation of complete executable models into any target. In my view, that is the key reason we build models of the more-than-a-picture kind.”

My interest is different

Enterprise architects are less interested in turning UML into a more complete programming language than in building models that abstract by omission of detail. E.g. where the PSM defines all the details of transaction start, commit and rollback processes, the PIM can be a model that is very much simpler, because it includes only hooks for these platform functions.

A PIM posits a platform

A PIM must contain hooks for the transformation to a PSM. So it is not purely platform-independent: it posits the existence of a platform with transaction start, commit and rollback functions. I think this particular postulation (transaction rollback) is vital to the making of a CIM that can be related to a PIM (I will return to this later).
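A sketch of the hooks idea (the Platform stub and the transfer service are invented for illustration): the service logic stays simple because it only marks where the posited platform’s begin/commit/rollback functions are needed.

```python
class Platform:
    """Stand-in for whatever platform the PIM posits."""
    def begin(self):    print("begin transaction")
    def commit(self):   print("commit")
    def rollback(self): print("rollback")

def transfer(platform, accounts, src, dst, amount):
    platform.begin()                      # hook: platform function, not modelled here
    try:
        if accounts[src] < amount:
            raise ValueError("insufficient funds")
        accounts[src] -= amount
        accounts[dst] += amount
        platform.commit()                 # hook
    except Exception:
        platform.rollback()               # hook
        raise

accounts = {"savings": 100, "current": 0}
transfer(Platform(), accounts, "savings", "current", 40)
print(accounts)  # {'savings': 60, 'current': 40}
```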

Aside on true platform-independence

On the one hand, we want to build platform-independent models. On the other, we want those models to be transformable with minimal effort into software systems.

The trouble is that software systems can be implemented in many ways and using many technologies. So, to ease forward engineering, people do in practice model with their chosen technology in mind.

A model that somebody claims to be a PIM may be more tied to a specific platform than the claimant recognises. When building a PIM for coding in C++, does the modeller ask: Would I draw this model the same way if we were to code in Java? And when building a PIM for coding in either C++ or Java, does the modeller ask: Would I draw this model the same way if we were to code in PL/SQL? Or VB.Net?

The meta model underlying UML looks, at its heart, to be a model of an object-oriented programming language. If we want a truly universal modelling language, designed to express a truly platform-independent model, then the meta model might need some revision. I will come back to this later.

PIM-to-PSM forward engineering

It is possible to reverse the abstraction examples discussed above, and to employ a tool that will automate some of the forward engineering elaboration.
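A minimal sketch of such automated elaboration (the entity description and the vendor type mapping are invented; real MDA tools are of course far richer): the generator adds the platform-specific detail that the PIM omits.

```python
# A platform-independent entity description; names invented for illustration.
pim_entity = {"name": "Customer",
              "attributes": [("id", "integer"), ("name", "string")]}

# One possible platform binding: map abstract types to vendor column types.
TYPE_MAP = {"integer": "NUMBER(10)", "string": "VARCHAR2(255)"}

def to_ddl(entity, type_map):
    """Forward engineering by elaboration: emit vendor-flavoured DDL."""
    cols = ",\n  ".join(f"{attr} {type_map[t]}" for attr, t in entity["attributes"])
    return f"CREATE TABLE {entity['name'].upper()} (\n  {cols}\n);"

print(to_ddl(pim_entity, TYPE_MAP))
```

Retargeting the same PIM at another platform means swapping the type map and template – which is the portability claim at the heart of MDA.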

This kind of PIM-to-PSM transformation dominates many people's view of Model-Driven Architecture. So much so that one wag round here renamed Model-Driven Architecture as MDCG (Model-Driven Code Generation).

Allan Kennedy has, in an OMG discussion, defined Model-Driven Architecture thus:

“In a world where Model-Driven Architecture is the dominant development paradigm, all that most developers will work with is a Model-Driven Architecture development environment supporting executable UML as the new whizzy programming language, supplemented by a number of commercially available model compilers for popular platforms.

Platform specialists and software architects will work with tools for building custom model compilers which might even be based on whatever emerges from the current QVT process.

The need for the majority of developers to fill the 'gaps in their IT knowledge' will have been eliminated by the move to "platform-independent" UML as the abstraction level for specifying system behaviour.”

Lower-level code can be generated from a higher-level language. But see <Don’t trust code generation from models>.

Expertise and training needed. A colleague enthusiastically proposed we try a specific Model-Driven Architecture tool/product, then added the rider that “you need to be a Java, J2EE, struts, UML, Model-Driven Architecture and product expert to properly leverage the product.” How do we find or train these people?

Veterans will need a lot of convincing that mainstream projects should use a tool that requires designers to understand all that, the MOF (Meta Object Facility), and tool-specific patterns and transformations.

Veterans, listening to a presentation on Model-Driven Architecture tools, are likely to worry about the potential costs and risks above.

A kind of forward engineering that yields real productivity benefits is based on omission of detail. We don't want developers having to model or write code that a general-purpose machine can produce for them. So we look to automate forward engineering by getting a machine to elaborate: add detail, add generic infrastructure.

e.g. PIM-to-PSM transformers that add in the detail of platform-specific transaction management or database management functions enable us to limit our modelling effort to more business domain-specific concerns.

Having said that, IT veterans have been modelling and coding in ways that assume the support of transaction management and database management functions since about 1980.  So marketing PIM-to-PSM transformation tools on the grounds that they add functions of this kind can raise something of a wry smile.

Other challenges facing Model-Driven Architecture

There are many practical obstacles to successful forward engineering from one level of Model-Driven Architecture to the next. Some are mentioned above. Other barriers to Model-Driven Architecture (mostly suggested to me by Chris Britton) include:

·       Existing systems: current tools generate UML from code - not much help really. Are there tools to reverse engineer PIMs and CIMs from legacy systems?

·       System integration: how to build CIMs and PIMs for message brokers and adapters?

·       Verification: how to verify models that have not yet been implemented?

·       Abstraction from the distributed object paradigm: aren't user requirements essentially event-oriented rather than object-oriented?

·       Aren't outline solution components (subsystems) really rather different from programming level components (DCOM objects, whatever)?

·       Non-functional requirements: these limit what is acceptable by way of results from a PIM-to-PSM transformation

·       Primitive data types: how to define basic or generic constraints on data item values in a CIM or PIM?

Interim conclusions

This chapter has discussed the nature of a PIM, a PSM, and transformations between them. Different people take different views of what Model-Driven Architecture is about. In so far as Model-Driven Architecture is about code generation from models, it remains more dream than reality for most enterprise applications.

In so far as Model-Driven Architecture is about abstracting high-level specifications from the lower levels, it will fail because there is not enough shared understanding about the degree of abstraction or the kind of abstraction that is needed. The next chapters go on to discuss the nature of a CIM, and transformations between it and a PSM.

Don’t trust code generation from models (reprise)

This point is not explored here. But see also the papers on Software Architecture.


Transformations between conceptual and logical models

Brilliant work has been and is being done on executable UML. How relevant is this work to the enterprise models that architects and analysts are asked to build? This chapter discusses the higher-level transformations in MDA.

What is a Computation-Independent Model?

It seems to me that differing groups take differing views of Model-Driven Architecture and have different visions of a CIM.

“An enterprise architect wants to use a coarse-grained CIM to manage a whole enterprise.

A business rules analyst wants to use a CIM to specify some specific business rules in a language read/writable by one or two domain experts.

A CIM for the first person may prove useless to the second, and vice-versa.” Contribution to OMG discussion list

I suspect some of the people who contribute to OMG discussions haven’t always realised they are taking one viewpoint rather than another, and this has led to some misunderstandings.

Software engineers

In my experience, the majority of software engineers on practical software projects have little respect for purely conceptual models. They see conceptual modeling as paralysis by analysis. They would rather analysts specify a logical model that is more directly useful to software system designers. They would hope and expect that a CIM can be transformed into PIM(s) for software system(s). So, their CIM is a very abstract PIM. And given there are degrees of PIMness, there must be degrees of CIMness. As one person's PIM is another person's PSM, so one person's CIM may be another person's PIM.


Some analysts prefer to think of a CIM as conceptual rather than logical. They take a top-down view. They see Model-Driven Architecture as about specifying business process and/or business rules. They see a CIM (perhaps even a PIM) as a specification that analysts (perhaps even users) can read and write.

The analysts’ CIM is a model of a domain or business enterprise that may stand alone, independent of data processing and potential software systems. It is interesting for its own sake. But there are two different schools of thought here.

Business rule modelers

Business rule analysts see the three levels of abstraction in Model-Driven Architecture as being the same as distinctions made in the 1980s between Conceptual, Logical and Physical models of data and processing.

Business rule analysts are analysts with attitude. Some are ex-data modelers. Some are only interested in models at the CIM-level. They see Model-Driven Architecture as about creating business specifications per se, regardless of any current or required software system.

Business rule analysts are not especially interested in using UML as a specification vehicle or in the portability of UML specifications between upper CASE tools. They don’t think much about code generation from their models; or they think it implausible or even misguided.

I have long said that the next wave of analysis and design methodology will be business rules-centric. I have said it for so long that I now begin to doubt it. And the OMG efforts in this area look ponderous. Perhaps one day analysts will specify a UML model, adorn it with business rules and generate a solution coded in Java or whatever. But ways to define rules on diagrams still feel like coding to me, and code generation tools have a poor track record in the industry.

Business process (and workflow) modelers

Those who make or sell a business process or workflow modeling tool may see their tool as providing the vehicle to define a CIM. If they notice the gulf between analysts and software engineers noted above, then they hope to bridge it.

A business process model can be purely conceptual, a model of a domain or business enterprise that stands alone, independent of data processing and potential software systems. But a business process model is basically a flowchart. And you can generate a program to implement the logic documented in a flowchart. At this point the conceptual model of a business turns into a logical model of software.

Sometimes, the steps of a business process involve little coding (simply send an email, or invoke an existing web service), so a generated program does much of what the user wants. But usually, a business process model requires much additional software design because:

·       while the ‘straight-through’ or ‘happy path’ business process is simple, most (say 70%) of the complexity and effort lies in error and exception processing

·       a lot of specification and coding effort is needed to implement the user interface and business services that support the user at each step.


Can we hope to reconcile the analyst’s top-down view (user-friendly declaration of business process and/or business rules) with the software engineer’s bottom-up view (abstraction from the code that must be written)? Past attempts to reconcile these two views have disappointed for various reasons.

Both analysts and engineers view the higher levels of a Model-Driven Architecture as abstractions from the lower levels, but tend to think about abstraction in different ways. It is not clear they share an understanding about the degree of abstraction or the kind of abstraction that is needed.

Analysts work at a level of abstraction that is an order of magnitude higher than engineers. Analysts don’t have the time to work to the same degree of (compilable) precision as engineers.

PIM-to-CIM reverse engineering

How should we abstract from PIM to CIM? What kind of CIM is useful? Does a CIM relate to one specific software system, or to many software systems; is it enterprise-wide?

Beware idle abstraction

The trouble with abstraction is that you can always do it. You can take any two data items or two processes and define one higher-level form.

You can create a higher level form by generalizing from two applications. If one application's PIM includes an EmailAddress (must include an @ sign) and another application's PIM includes TelephoneNumber (must be numeric), then you might define in a CIM a more generic ContactDetails item with a more generic data type.

You can create a higher level form by omitting details in two applications. If one application's PIM includes an orderValue formula that calculates sales tax one way and another application's PIM includes an orderValue formula that calculates sales tax another way, then you might define in a CIM a simpler orderValue calculation that suppresses the detail of tax calculation altogether.

You can create a higher level form by defining a composite that covers two applications. If a CustomerAddress has 3 lines in a regional application, has 5 lines in a global application, and has a structured set of attributes in another application (that uses name, town and postcode for other purposes), then you might define in a CIM a single composite CustomerAddress data item.
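These three moves (generalisation, omission of detail, composition) can be sketched in code. A minimal Python sketch, with every name (is_email, order_value_cim, as_cim_address and the rest) invented purely for illustration:

```python
# Generalisation: two application-level validators ...
def is_email(value: str) -> bool:
    return "@" in value            # must include an @ sign

def is_phone(value: str) -> bool:
    return value.isdigit()         # must be numeric

# ... abstracted in a CIM to a more generic ContactDetails item
def is_contact_detail(value: str) -> bool:
    return is_email(value) or is_phone(value)

# Omission of detail: one PIM's orderValue includes a tax calculation ...
def order_value_app1(net: float) -> float:
    return net * 1.20              # this application adds 20% sales tax

# ... the CIM suppresses the tax detail altogether
def order_value_cim(net: float) -> float:
    return net                     # tax calculation deferred to each PIM

# Composition: 3-line, 5-line and structured addresses folded into
# a single composite CustomerAddress data item
def as_cim_address(lines) -> dict:
    return {"address": " / ".join(str(line) for line in lines)}
```

The danger noted below applies to each function: the abstraction is always constructible, but not always useful.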

One challenge is to create a higher level form that is useful. And a bigger challenge is to create this form before the more elaborate forms are devised.

Beware the million-rule problem

The large enterprise has hundreds of applications and a million business rules. We cannot maintain an enterprise model with that many rules.

This has classically led enterprise ‘data architects’ (really, ‘data abstracters’) to define an abstract data structure containing a few generalised entities such as party, contract, place and event. They define for each entity a few attributes that appear in several applications. They may perhaps define a few business rules associated with those few attributes.

Similarly, enterprise process architects have defined abstract business processes, each with a generalised sequence of business process steps such as register, authorise, process, deliver and close.

In practice, I haven’t found people making good use of such highly generalized enterprise-scale models. How to separate the business rules that are somehow most essential or important from the impossibly vast multitude of necessary business rules? I don't see people successfully grappling with specifying business rules in an enterprise architecture (in the sense I mean enterprise, that is 'enterprise-scale', rather than simply 'business level') other than by being highly selective, by focusing on only a tiny part of the enterprise problem domain.

Beware the reality of loosely-coupled systems

The large enterprise works with many distributed and loosely-coupled systems in which different, perhaps conflicting, business rules apply. The enterprise's business processes have to work despite the fact that data in discrete data stores will be inconsistent.

Surely an enterprise CIM (if it is to be useful for forward engineering into more than one PIM) must acknowledge that consistency cannot be guaranteed across all the discrete business data stores maintained by the enterprise?

Where infrastructure is missing, we must model more. Suppose we cannot roll back a mistaken process across discrete data stores, then we have to design all manner of error handling and undo processing, and our models of the code have to incorporate all this design.

Where infrastructure exists, we can model less. Where we know a transaction can be automatically rolled back, we certainly don’t want to model the error handling and roll back process by hand.  We can/should/must suppress the roll back details from our abstract models.

Surely we can do this only by employing the corresponding abstract concept of a "business service" or "discrete event" in our models?
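The "model more where infrastructure is missing" point can be made concrete. A hypothetical Python sketch, assuming two discrete data stores with no shared transaction manager (the store shapes and the transfer rule are invented): where automatic rollback is missing, the model itself must carry the compensating undo logic.

```python
def transfer(debit_store: dict, credit_store: dict, amount: float) -> bool:
    """Apply one business event across two loosely-coupled stores.

    No shared transaction manager exists, so a failure at the second
    store must be compensated by hand at the first."""
    debit_store["balance"] -= amount
    try:
        if credit_store.get("closed"):
            raise RuntimeError("credit account closed")
        credit_store["balance"] += amount
        return True
    except RuntimeError:
        # Compensating action: manually undo the first update.
        # With a real transaction manager, none of this would be modelled.
        debit_store["balance"] += amount
        return False
```

Where a platform can roll the event back automatically, the try/except block disappears from the model entirely, which is exactly the suppression of detail argued for above.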

CIM-to-PIM forward engineering

A logical CIM posits discrete software system(s). We cannot realistically envisage forward engineering from a purely conceptual CIM. We can however envisage forward engineering from a CIM that abstracts from data processing systems. We can recognise the latter kind of CIM because it will:

·       acknowledge the divisions between data in discrete loosely-coupled data stores

·       define what business services clients invoke or require on each distinct data store, with the pre and post conditions of each business service

·       define what data must persist in each discrete data store for those business services to be completable.

To put it another way: whatever paradigm you follow or platform you use, to build a model that can be transformed into a data processing system, you must answer two requirements-oriented questions:

Q1) what business services do clients invoke or require?

A business service is a unit of work. It is a process that acts on persistent data, or, if the necessary conditions are not met, it does nothing but return/output a failure message. A “client” could be a user, or a user interface, or an I/O program, or an actuator or sensor device.

Q2) what data must persist in a coherent data structure for those business services to be completable?

Every software system of note maintains some persistent data. The data structure could be anything from a handful of state variables representing the state of a few devices in a process control system, to millions of database records representing the orders by customers for products.

The eternal verities hold. In specifying the business rules of software systems, the persistent data structures and the business services on them are fundamental. Whether your coding language is Java or PL/SQL, you will have to specify them.
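The two questions can be illustrated together. A minimal sketch, assuming a hypothetical "place order" business service (the product name and stock rule are invented): the service either updates the persistent data, or, if its preconditions are not met, does nothing but return a failure message.

```python
# Q2: the persistent data structure - here just a dictionary of stock levels
stock = {"widget": 5}

# Q1: a business service - a unit of work with pre and post conditions
def place_order(store: dict, product: str, quantity: int) -> str:
    # Precondition: enough stock must exist; otherwise do nothing
    # to the persistent data, but return a failure message.
    if store.get(product, 0) < quantity:
        return "failure: insufficient stock"
    # Postcondition: stock level is reduced by the ordered quantity.
    store[product] -= quantity
    return "success"
```

Whether the eventual code is Java or PL/SQL, the persistent structure and the service's pre/post conditions are what must be specified.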

Conclusions and remarks

Is CIM-to-PIM-to-PSM a sensible basis for a software development methodology? I fear Model-Driven Architecture has conflated, in one scheme, modelling the real world per se with modelling a data processing system (which is itself a model of the real world). The two are related, but nobody I know in the IT industry looks to define a PIM from a purely conceptual CIM.

In practice, people define a PIM from a statement of data processing system requirements. And these requirements are better expressed in terms of use cases and input and output data structures, rather than in the form of a CIM. Inputs and outputs are an aspect of system theory that Model-Driven Architecture seems, curiously, to have overlooked.

There is much more to be said on this theme, and there are approaches that may prove useful for defining enterprise-scale models. See “The Agile Enterprise”.

We just have to work around UML

Many good modeling languages for analysis and design are currently deprecated in favour of the Unified Modeling Language (UML). Yet UML is not truly universal; it is still rooted in an OO programming paradigm. The use of UML outside of OO design is anything but universal.

·       People model one thing using different UML terms and notations (e.g. they draw a process as an activity diagram, interaction diagram or state chart).

·       People model different things using one UML notation or term (e.g. they overload terms like entity, class, type and operation).

Such local dialects become dangerous when speakers mistakenly believe they all know and talk the same language. And when managers assume that buying a UML drawing tool is a standard or a solution in itself.


Many people ask whether/when/how systems analysts should use a UML tool during what is often called functional specification. After years mulling this over, then months talking about it in Object Management Group (OMG) discussion groups, my observations are as follows.


Some people are really asking: “How can I make our analysts use the UML tool we have bought for programmers?” rather than “How can UML help our analysts?” Systems analysts have no pressing reason to use UML over the alternatives available to them for business process modeling, data modeling and business rule specification.

Analysts should learn about UML basics, and should be ready and willing to use a UML tool when asked. But that doesn’t mean they have to like it. UML is good in parts and for some purposes, but it remains basically a set of notations for object-oriented program design. Somebody has to stand up to the hegemony of object-oriented design gurus, and I will here.

OO developers use UML lightly

Let me quote a developer who reported his experience in a company bulletin board.

“On some previous projects, I went full-scale for using Rose to produce detailed models and generate code from them. When I needed to extend the design, I would always return to the Rose model and make the change there, forward generate the code, and then fill in the details. I worked for several years this way and advocated for it strongly among my peers, although few took me up on the approach, at least in part because of the steepness of the learning curve for Rose with code generation in C++.

The models I produced were useful to me, but far too detailed to be helpful for talking about the design with someone else. I had to produce more simplified views for this, but I was proud of the fact that my models always represented the state of the code.

Unfortunately, maintaining a very detailed model is really no easier than maintaining the code. My models tended to become rigid, even though I was working with UML diagrams. Finally, I have decided that working with too detailed a model is a trap. Basically this is the same trap that we were trying to avoid by working with UML in the first place.

It's very important to work at the correct level of abstraction.”

Trying to complete UML models before coding object-oriented programs, and trying to forward engineer code from UML, have not yet proved popular or helpful to the majority. But using UML in a less formal way can help. Let me quote further from that developer.

 “In my current project, we have taken the simple design and refactoring approach. We develop in three week iterations. We draw the designs we need for each iteration on a whiteboard. When we all understand them well enough, we code them. We refactor continuously, driven by code smells and design considerations. At the end of an iteration, we reverse engineer our code (using Rose) and produce simplified views that we use to inform our designs on the whiteboard in the next iteration.

This seems to have worked very well, and I feel our design is quite good. Partly, this represents that I have more experience as a designer.

However, I also see that this way of working keeps us focused on the right level of detail at the right time. In other words, a white board can be a more effective tool than Rational Rose for working out a high level design, and refactoring using the code can be a very effective tool for improving a design. The reverse engineering works well enough and the model is always up to date. We have a team of three that has been working for 15 months in this way, and the design and code have not become rigid.

I won't be going back to using forward code generation from a model.”

Analysts have gripes about the notation

The UML diagram type I like best is the sequence diagram. It is the diagram type that looks most useful for architecture-level design, and for analysts wanting to specify how a transient event coordinates several persistent entities.

Analysts may well find UML training helps them understand OO designers and programmers, but it can also lead to misunderstandings. It is not the analyst’s job to do OO analysis and design. The analyst does not need to draw a 'class diagram' or a 'type model' of the kind used in OO design, and should not, because software specialists do not want it done for them. UML includes a large number of OO programming terms and concepts that analysts don’t need, some that are confusingly similar to, but different from, analysis concepts, and even one or two concepts that are ambiguous and therefore dangerous, notably the aggregation relationship symbol.

The UML notations are not well designed from a graphical point of view. UML uses a confusing variety of similar arrowhead shapes to show very different concepts.

UML uses the * symbol to mean two very different things, iteration and multi-threading. Surely it is sensible that different concepts are represented using different symbols? There should be an iteration symbol (say *) and a multi-threading symbol (say <). A crow’s foot for multi-threading would distinguish the concepts nicely.
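The distinction the two symbols should capture is easy to show in code. An illustrative Python sketch: iteration processes many items one after another, while multi-threading processes many in parallel, so completion order is not guaranteed.

```python
import threading

def process(item, results):
    results.append(item * 2)

def iterate(items):
    """Iteration: many, one after another - order is preserved."""
    results = []
    for item in items:
        process(item, results)
    return results

def multi_thread(items):
    """Multi-threading: many, in parallel - completion order may vary."""
    results = []
    threads = [threading.Thread(target=process, args=(i, results))
               for i in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Two behaviours this different deserve two symbols.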

UML provides five ways to model a process structure: sequence diagram, collaboration diagram, state chart, activity diagram, and use case. Yet there is no way to model a regular expression.

UML provides four ways to show a set of related subsystems: class diagram, package diagram, component diagram and deployment diagram. Yet there is no symbol for a data flow passing asynchronously from one subsystem to another.

UML provides only one way to model a data structure, and that is a class diagram designed for modeling OOPL code. There is no notation for a data flow. And class diagrams were not designed for data modeling.

There is no hierarchical structure notation, and no notation for documenting a regular expression.

To be universally applicable, the use of <<stereotypes>> is encouraged, meaning most people end up drawing BML (B for bespoke) diagrams.

UML doesn’t support all aspects of use case definition

Analysts can document use cases in a UML tool. But almost all do so in Word, not least because this gives customers and users better access to the specification.

Conventional use case definitions, as defined in UML, contain very little of what is needed to develop code. We are likely to need also:

·       the UI design, its appearance, fields and commands

·       the I/O data structures (an XML schema-like grammar or regular expression notation might be useful for serial data flows)

·       the state data of a user session (which may be stored on the client machine)

·       any state transition constraints on the user session (state charts may be used here)

·       the pre and post conditions of each business service that is invokable.
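Collected together, these items amount to a richer specification record than a conventional use case. A hypothetical sketch of such a record (every field name is invented, not from UML or any method):

```python
from dataclasses import dataclass, field

@dataclass
class UseCaseSpec:
    """A use case plus the supplementary items developers actually need."""
    name: str
    ui_fields: list = field(default_factory=list)          # UI appearance, fields, commands
    io_structures: list = field(default_factory=list)      # I/O data structures (grammars, schemas)
    session_state: dict = field(default_factory=dict)      # state data of the user session
    state_transitions: dict = field(default_factory=dict)  # constraints on session state changes
    services: dict = field(default_factory=dict)           # service name -> (precondition, postcondition)
```

A Word document can carry all of this too, which is one reason analysts stay in Word.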

UML is not what analysts want to use for business rules

Analysts could use OCL to attach expressions to attributes. But none of them do this in practice; they all document constraints and derivations using informal language. Alternatively, analysts could extend UML with ‘Action Semantics’ to specify business rules in a way that may prove compilable into an OOPL, but this is tantamount to programming.
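To see what attaching formal expressions to attributes would involve, here is an OCL-flavoured invariant rendered as ordinary code - a hedged sketch, not OCL itself, and the Order attributes are invented:

```python
# An OCL invariant such as
#   context Order inv: self.deliveryDate >= self.orderDate
# amounts, in code, to a checkable predicate over the attributes.

def invariant_delivery_after_order(order: dict) -> bool:
    return order["delivery_date"] >= order["order_date"]

def check(order: dict, invariants) -> list:
    """Return the names of any violated invariants (empty list = valid)."""
    return [inv.__name__ for inv in invariants if not inv(order)]
```

Writing rules at this level of precision is exactly the "tantamount to programming" that most analysts decline to do.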

There are several reasons why analysts do not work at a level of precision and detail that is compilable into code.

·       the average analyst is different from the average programmer, with different aptitudes, knowledge and skills

·       analysts' specifications should be platform-independent to the extent that they do not tend towards either procedural or OO programming languages

·       one analyst will likely work with four or five programmers, which means the analyst simply has too little time to work at a level close to coding.

It might be argued that the analyst should work down to the code level for the 'important bits', but it makes far more sense for a software specialist to work with the analyst at this point.

UML is weak for data models

Analysts can document a component’s data store and draw any data model as a UML class diagram. But a data model is not a class model, and data model tools tend to be better for recording primary keys, candidate keys, foreign keys and uniqueness constraints, and for forward/reverse engineering to/from databases.

I am taking a stand for the data model as a distinct product - with its own purposes and patterns, and perhaps even some shades of semantics different from an OO structural model.

If the OMG included a distinct data model notation within UML, that could go a long way to restoring the attention that the data model deserves as a thing apart from the OO paradigm.

·       Agilist: “You're preaching to the choir. See It's only a start, but not a bad one. It's tool limitations that I think are the real problem right now. I haven't run into any notation limitations.”

The following limitations of UML as a notation for data models come to mind:

·       In UML, the one or many multiplicity distinction relies on reading numbers rather than seeing graphical symbols. One-to-many relationships are the mainstay of data models (as opposed to OOP class models).

·       In UML, the multiplicity numbers float in space. If your relationship lines are close together (and a kernel entity may have ten or twenty relationships), you cannot tell which numbers belong to which relationships. You should not have to move the line to spot which number is related to it!

·       UML does not distinguish relationship optionality from multiplicity. To know if entity A can exist without a relationship to entity B, you have to look for a zero multiplicity number at the other end of the relationship, which may be way across the diagram in a large data model. And again, optionality is not a graphical symbol on the line itself.

·       UML is careless about primary keys and foreign keys. Giving everything an object id merely ducks the issues analysts need to address in this area (for the sake of users): uniqueness constraints on user-friendly attribute values, rather than persistent fixed identifiers. UML allows this, of course, but I gather UML tools are not designed to help in the way data model tools tend to be.

·       UML uses the * symbol to mean both iteration and multi-threading. Every graduate should be taught and understand the difference between the concepts of iteration (many one after another) and multi-threading (many in parallel). See <Data-structure driven programming works>.

Analysts can document a process’s access path through entities by drawing a UML interaction diagram. But a notation designed to record message passing between objects is not ideal, and there are better ways to document an access path.

Other things analysts may document

Interfaces? Analysts can document a component’s interface using a stereotype of a UML class. But they are usually concerned with a very, very large component, and they usually find writing a service catalogue in Word is adequate to record each service’s name, inputs, outputs, preconditions and postconditions. Again, this gives wider access to the specification.


Distributed processes? This is a fine use for UML. Analysts may well find an interaction diagram helps to specify how a distributed process is implemented by message passing between distributed objects/systems. But attempts to manufacture a need for this (to force the OO paradigm on database processing) by subdividing a coherent database should be resisted for reasons given elsewhere. Also, workflow diagram tools may prove more suitable for the kind of macro-level distributed process that analysts are likely to specify.


State machines? I find it difficult to be dispassionate about these, having spent years of my life drawing the life histories of persistent entities in the form of finite state machines using regular expression diagrams. Analysts can draw UML-style state charts for this, or to record the state transitions in a use case, user interface dialogue or session. State charts are little used in practice, perhaps because people lack a coherent methodology and techniques of the kind SSADM (taught properly) used to provide.


Any analysis diagram? Analysts can document anything they want to in UML by ‘stereotyping’ notations. But while this tool vendor’s device can widen a tool’s usage, recycling one notation for different purposes can hinder communication. Purpose-designed notations and well-structured Word documents often communicate as well or better.

More challenges for UML

How can Model-Driven Architecture meet our two apparently contradictory challenges – to abstract from detail and to comprehensively specify business rules? How to specify business rules in models? At what level of process granularity to specify business rules? How to make UML more helpful to analysts looking to build a PIM or CIM?


Resolving confusions surrounding the three terms Type, Class and Entity, and the many concepts people want to use these terms for, would, I believe, prove enormously helpful to acceptance of UML for use in analysis and design outside of OOP. If we need more terms - please let us have them rather than overload the old terms.


It is tempting to extend the UML meta model by adding a new thing as a subtype of an old thing. But is this wise? Especially if the supertype is an OO programming concept and the new subtype is a logical analysis concept. Or more generally, the supertype has features (perhaps even tacit implications) we do not want in the new thing.

As noted above, UML is missing the concept of the event effect.

Do we want a truly universal modeling language or method? Are we serious about building truly platform-independent models? Do we want to capture requirements in models, and to help systems analysts document what matters? I propose that units of system composition (discrete systems) and business services (discrete events) should be first-class concepts in the UML meta model, not merely stereotypes of 'class' and 'operation'. If the OMG truly wants UML to be a universal modeling language, then making the event-effect concept explicit in the UML meta model might help. This will make for a more complex meta model, since one event effect can involve more than one operation, but it will also make for a more universal one, embracing event orientation and object orientation as equals.


We really would like UML to be improved for systems analysis. UML needs a name for the lowest ‘success unit’ level of process that a systems analyst needs to specify. UML should include the concept of the 'event effect' (as opposed to the one or more operations invoked to achieve that effect). UML needs a notation for data models, to graphically distinguish entity-attribute-relationship models from OOP class and type models. But we don't see these concepts or this notation emerging from the OMG, or being recognised as issues by more than a few people in OMG.

UML is OK. Let us not fight against using it. Let’s just agree it is not ideal or the end of the story.

UML is not a leap forward for systems analysts. The use of UML by analysts remains a project-level, even team-level, choice. Mandating its use would be costly and probably counterproductive. We allow that various notations and tools might be used in practice.

“As for representing whatever view in different forms - well, as Tufte notes, a well-chosen representation goes a long way in describing complex stuff, and so I'm in violent agreement with you about alternative notations for alternative stakeholders - as long as there's a common meaning below the surface of those representations.” Grady Booch


Fashions over a quarter century

Alistair Cockburn (use case and Agile methods guru) spoke at our IT&SW conference in September 2002. He said then that he focuses his training on Agile development principles and people issues, partly because they remain constant in a sea of changing analysis and design techniques.

Afterwards, Alistair and I had a brief and unsatisfactory exchange on the subject of software engineering techniques. While Alistair has every right to focus on soft and informal management principles, I put it to him that people still need education in more traditional analysis and design techniques. And the basic techniques remain little changed since 1980. And it is a pity some are now overlooked in our schools and universities.

A queue of people was forming to talk to Alistair, so we parted without reaching any mutual understanding. On reflection, I can see some places Alistair might have been coming from. And I have been thinking “What has changed over 25 years?”

Some notations for a deliverable change

Perhaps Alistair was talking about notations for analysis and design deliverables. Over the last 30 years, people have used many notations for one deliverable. e.g. There have been many notations for drawing a data model or drawing a state machine.

But switching notations isn't such a big deal. The data and entity life cycle analysis techniques taught in 1980 can be used today with any notation you prefer.

Privately, I am a notation nut. But we all have to use UML or ArchiMate nowadays. I cannot convince UML or ArchiMate fans that our concerns about using UML for data models (e.g. the need to separate relationship optionality from relationship multiplicity) matter. So, I can't say notation is important to other people, and it is in truth less important than other things we have to worry about.

Sometimes the deliverable of a technique changes

Perhaps Alistair was talking about fashions in choosing a deliverable to record the results of one analysis technique. e.g. To record the findings of entity life cycle analysis, I used to document constraints on processing and state changes using a state machine notation (regular expressions, known as Jackson structures in SSADM). I am now more likely to document these rules as pre and post conditions of a business component's services, or in some other form of business rule specification.

But the use of state machine diagrams may one day return to prominence; see www.. (reference to be supplied). And we'll always need a technique to analyse business rules, constraints and state changes, whatever deliverable we document them in.
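An entity life history recorded as a state machine reduces to a table of permitted transitions. A minimal sketch, using the generic register/authorise/deliver/close steps mentioned earlier (the states and events are invented for illustration):

```python
# Permitted (state, event) -> next-state transitions for an Order entity
TRANSITIONS = {
    ("registered", "authorise"): "authorised",
    ("authorised", "deliver"):   "delivered",
    ("delivered",  "close"):     "closed",
}

def fire(state: str, event: str) -> str:
    """Apply an event; an event not permitted in this state is rejected."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event '{event}' not permitted in state '{state}'")
```

The same table can equally be documented as pre and post conditions on a business component's services: the precondition of `deliver` is that the order is in state `authorised`.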

Some techniques replace others

Different analysis and design techniques overlap; they examine the same things from different angles. You rarely need all the techniques, which is one reason why asking everybody to use one technique, or comply with a general standard, can be counterproductive.

Perhaps Alistair was saying that fashion dictates which technique(s) to focus on. e.g. People in OMG and UML circles currently neglect data analysis and entity life history analysis techniques in favor of techniques for drawing class and interaction diagrams.

If Alistair was thinking of the changes mentioned above, then these are mostly superficial. But perhaps Alistair was thinking of something I didn't think of during our brief conversation - of analysis and design techniques that have fallen by the way side over the last 25 years.

I thought it would be interesting to list some old and proven techniques, some dead and decaying techniques, and fun to predict which techniques will fall out of favor in the next 10 years.

Age-old analysis and design principles and artifacts survive

But much remains as it always was. I propose that all the points in the manifesto are as true now as they were twenty or thirty years ago. A large body of tools and techniques survives in good health from around 1980, and is still used and useful today. The core software engineering techniques are not much changed since then. You might, then and today, specify a system by documenting:

·              external entities or actors

·              input and output data flows and messages

·              function definitions or use cases

·              back-end services

·              diagrams showing the interaction between subsystems

·              an entity-attribute-relationship model

·              a three-layer software architecture.

Age-old analysis and design techniques survive

Alistair Cockburn has refined Jacobson's use case analysis technique, but how much has he changed it? Use case analysis still begins and ends with something akin to system test cases or scenarios. And at least one of Alistair's emphases (pinning use cases to users' goals) is familiar to process modeling veterans from the 80s.

Besides use case analysis, three complementary analysis and design techniques have survived for decades, though reinvented by each generation:

·              Structural analysis techniques – using class diagrams or data models

·              Behavioral analysis techniques - using entity life history or state machine models

·              Behavioral design techniques – using interaction diagrams and the like

I recall training courses from the 1970s and 1980s that featured these three techniques. Basic data analysis and entity life history analysis techniques haven't changed since I was taught them by Keith Robinson in a public training course in 1980. His 'advanced systems analysis and design' course featured all three techniques above and postulated the remote object paradigm that remains a distant prospect for most applications.

Most modern methodologies make use of the techniques and deliverables listed above. Important parts of UML were prefigured in the 1970s and 80s. The old-style analysis techniques remain with us, and are not wholly replaced by OO-style techniques. True, inheritance and polymorphism aren’t listed above, but they don't figure much in systems analysis or in architectural level design.


Like most US gurus, Alistair Cockburn seems unaware of process modeling techniques traditionally used in association with data models. The US fashion in 1980s was for top-down decomposition and data flow diagrams, which is why the early OO books bang on about the failure of these process analysis and design methods, and why OO design was such a radical departure for them. The UK fashion was for entity and event models, which makes things worse for us, because OO design can be confusingly both similar and different.

At the end of this review, I think I can still say to Alistair Cockburn that the core analysis and design principles and techniques are not much changed in 25 years. I still recommend the following.

Focus systems analysts on entities and business services

Analysts should define each system's data down to the level of persistent entities and attributes, and record any invariant rules associated with them. They should define processing down to the level of business services, and record any preconditions or postconditions relating to each entity affected by the business service. They should sketch out an access path for any non-trivial business service.
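A minimal sketch of what such a specification amounts to, once made executable. The Account entity, its invariant and the Withdraw service are invented for illustration; the point is only that the analyst's deliverables are the entity, its attributes, an invariant rule, and a service with preconditions and postconditions, nothing about internal code structure:

```python
# Hypothetical sketch: a persistent entity with an invariant rule, and a
# business service specified by preconditions and postconditions.
from dataclasses import dataclass

@dataclass
class Account:                  # persistent entity with attributes
    number: str
    balance: int                # invariant rule: balance >= 0

    def check_invariant(self):
        assert self.balance >= 0, "invariant violated: negative balance"

def withdraw(account, amount):
    """Business service 'Withdraw' on one Account."""
    assert amount > 0                    # precondition
    assert account.balance >= amount     # precondition
    account.balance -= amount            # effect on the entity
    account.check_invariant()            # postcondition: invariant holds
```

How withdraw is implemented inside a component (procedurally, via message-passing objects, or otherwise) is left to detailed design, as the next recommendation argues.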

Don’t ask analysts to build code-level models

Analysts should not define how code is structured within a business service, or any message-passing that takes place within a business service. The translation of entity and business service specifications into code, be it procedural or OO, is a matter of detailed design. OOP classes, their responsibilities and operations are platform/architecture-specific; they should be defined by object-oriented designers and developers, following the structural guidelines of a given software architecture. (In distributed systems, where a business service cannot be rolled back automatically, the analyst may have to get involved in message passing.)

Use event-oriented analysis to discover responsibilities and rules

Business services are fundamental analysis and design artifacts, as important in an object-oriented design as the entity-oriented classes. So, it is a good idea to identify the business services that clients invoke or require, and consider the effects of each business service on the entities in the persistent data structure. This analysis should reveal the responsibilities of entity classes and the business rules that operations must apply to objects.
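To illustrate the walk-through this paragraph describes, here is a hedged sketch (the Place Order service, the Customer and Order entities and the credit-limit rule are all invented): tracing one business service through each entity it affects surfaces both the entity's responsibility and the business rule it must enforce.

```python
# Hypothetical sketch of event-oriented analysis: walking one business
# service ("Place Order") through its effects on each entity it touches.

customers = {"C1": {"credit_limit": 100, "exposure": 0}}
orders = {}

def place_order(order_id, customer_id, value):
    """Business service 'Place Order': effects on Customer and Order."""
    customer = customers[customer_id]
    # Effect on Customer: its responsibility is to police the credit limit.
    assert customer["exposure"] + value <= customer["credit_limit"], \
        "business rule: order would exceed credit limit"
    customer["exposure"] += value
    # Effect on Order: its responsibility is to record its own creation.
    assert order_id not in orders, "business rule: order ids are unique"
    orders[order_id] = {"customer": customer_id,
                        "value": value,
                        "state": "placed"}
```

Each assert corresponds to a business rule discovered by asking what the service does to that entity; the attribute updates correspond to the entity's responsibilities.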







Footnote 1: Creative Commons Attribution-No Derivative Works Licence 2.0

Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited:” before the start and include this footnote at the end.

No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this page, not derivative works based upon it. For more information about the licence, see