Levels of abstraction in enterprise data models
This page is published under the terms of the licence summarized in the footnote.
This paper is about levels of abstraction in data modelling.
It is loosely based on a paper by Eskil Swende, dated March 1, 2010 (Article URL: http://www.tdan.com/view-articles/12655)
© Copyright 1997-2014, The Data Administration Newsletter, LLC -- www.TDAN.com
Eskil’s purpose was to achieve:
· A common standard for data model naming and meaning
· A common platform for the further development of data modeling in Scandinavia
· High quality traceability from abstract data models to physical data models and back again.
This paper outlines the proposed approach, adding some variations and some commentary.
It was partly written with two additional aims in mind:
· To connect enterprise data architecture to micro services or micro apps (fashionable at the time of writing)
· To remind people of the role data models can play in shaping and evaluating applications to be built and bought.
Note that people draw different distinctions between "data" and "information"; but no distinction is made here.
This paper uses the term Business Data Model in place of Eskil’s Business Information Model.
The main reason is for compatibility with other sources, including TOGAF.
Do not worry about this naming difference; life is too short; it has no impact on the concepts or proposal below.
IBM have written up their approach to enterprise data governance in a paper.
This IBM paper describes the "silo system" issues that led to EA.
The ability to utilize data as an enterprise asset is central to every enterprise transformation initiative.
This ability is critical for reusing data consistently throughout the enterprise and deriving actionable information from it.
Accurate and high-quality data must have meaning and value throughout the enterprise and support the processes of the enterprise.
For a variety of reasons, large enterprises tend to manage data at a local level (e.g., for each department and location).
This results in information silos where data is redundantly stored, managed, and processed, each with its own policies and processes, leading to inconsistency.
Making the data available to be used outside legacy and monolithic applications is the first step toward creating “flat” enterprises, in which data is freely and securely accessible to any user with access permission.
The same ambition is known by the Open Group as achieving “boundaryless information flow”.
The same IBM paper describes three challenges and aims.
Using the Internet as the main platform for interaction with customers, partners, and suppliers requires integration of information, functions, and processes that are often trapped in information and organizational silos.
To address these and other daily business challenges, we focused on solving a number of data-specific challenges:
1. Trusted data sources must be established for enterprise-critical data.
This becomes a major challenge when ownership of data is fragmented in many areas.
There are multiple sources for the same data (each with its own unique definition).
Existing data is undocumented and tightly coupled inside monolithic applications, and subject matter experts in these areas are in short supply.
2. Information must be integrated.
This includes the integration of data from multiple domains to support critical business functions in addition to business intelligence initiatives.
The main contributing factors to this challenge are platform dependencies, organizational issues, tight coupling with business processes, and the lack of business data standards and an enterprise data model to define an explicit relationship among entities from multiple domains.
3. Data quality must be improved.
This is a major challenge in any enterprise with a large amount of data in which data fields are not standardized, multiple ways of capturing the same data are used, and data is copied in many places.
The IBM paper goes on to describe the creation of a data governance organisation, processes and tools.
It does not say whether any of the three aims above were met; one has to presume they were.
On data mastering and application integration
The enterprise architect is a champion and architect of system integration and reuse.
EA aims to constrain silo-system-producing solution architecture projects.
The business case for integration and reuse varies according to the context.
Business benefits can include data quality, integrity and security.
Duplication of data entry and data storage is a feature of silo applications.
EA goals include de-duplication of data entry and "single version of the truth."
To these ends, various "master data management" and application integration approaches are possible.
To evaluate an application, there are both process-oriented and data-oriented approaches.
You may do a gap analysis between
· required and provided information system services or use cases
· required and provided data models.
The data-oriented approach has been a standard practice for decades.
The approach is not always applicable or workable, but where it is, it can help you compare scopes almost at a glance.
Comparing two data models is usually quicker and easier than comparing the many use cases or user stories of an application.
It can also help in:
· assessing the potential of the application to support future use cases
· minimising redundancy and overlaps in the data stored across the application portfolio
· indicating where application integration may be beneficial.
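The data-oriented gap analysis can be sketched in a few lines. This is a minimal illustration only, assuming each data model has been reduced to a mapping from entity name to a set of attribute names. The entity and attribute names are invented for the example.

```python
# Hypothetical required and provided data models: entity name -> attribute names.
required = {
    "Customer": {"name", "address", "credit_limit"},
    "Order": {"order_date", "status"},
    "Delivery": {"delivery_date"},
}
provided = {
    "Customer": {"name", "address"},
    "Order": {"order_date", "status", "channel"},
    "Invoice": {"amount"},
}

def gap_analysis(required, provided):
    """Compare two data models entity by entity, then attribute by attribute."""
    missing_entities = sorted(set(required) - set(provided))
    surplus_entities = sorted(set(provided) - set(required))
    # For entities present in both models, list the required attributes that are absent.
    missing_attributes = {
        entity: sorted(required[entity] - provided[entity])
        for entity in sorted(set(required) & set(provided))
        if required[entity] - provided[entity]
    }
    return missing_entities, surplus_entities, missing_attributes

missing_entities, surplus_entities, missing_attributes = gap_analysis(required, provided)
print("Entities required but not provided:", missing_entities)
print("Entities provided but not required:", surplus_entities)
print("Attributes required but not provided:", missing_attributes)
```

In practice the hard part is agreeing that two entities with the same name mean the same thing; the comparison itself is trivial once that is settled.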
Enterprise architecture involves modelling business data and business processes – across the enterprise.
Enterprise data architects are concerned with the content and quality of business data – in stores and in flows.
They must work closely with enterprise application architects.
First, you should try to catalogue the essential elements of data architecture:
· business data entities of interest (at whatever level of granularity you decide)
· data flows passed between internal business applications and/or external entities
· data stores maintained by internal business applications
Then decide what matters to your enquiries and start to draw some tables.
For example, you might want to record
· data entities <accessed via> end user devices or physical channels
· data entities <contained in> external I/O data flows
· data entities <contained in> internal data flows
· data entities <contained in> locations or data stores
· data entities <mastered or copied in> data stores (implies you have a data mastering policy)
· data entities <created or used by> business roles (are they definable?)
· data entities <created or used by> application use cases (are they definable?)
· data entities <created or used by> applications (have you an application portfolio catalogue?)
A CASE tool might help you document the above.
You may have to tweak the meta model of any tool;
ask the vendor if they can do it.
Or else knock up the database you need using Microsoft Access.
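As a rough sketch of such a home-made catalogue, the example below uses Python's built-in sqlite3 module in place of Microsoft Access. The table names, column names and sample rows are all assumptions made for illustration; a real catalogue would follow whatever meta model you settle on.

```python
import sqlite3

# An in-memory database is enough for a sketch; use a file path to keep the catalogue.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_entity (name TEXT PRIMARY KEY);
CREATE TABLE catalogue_link (
    entity       TEXT NOT NULL REFERENCES data_entity(name),
    relationship TEXT NOT NULL,   -- e.g. 'contained in', 'created or used by'
    target_kind  TEXT NOT NULL,   -- e.g. 'data store', 'data flow', 'application'
    target       TEXT NOT NULL,
    PRIMARY KEY (entity, relationship, target_kind, target)
);
""")
conn.executemany("INSERT INTO data_entity VALUES (?)", [("Customer",), ("Order",)])
conn.executemany("INSERT INTO catalogue_link VALUES (?, ?, ?, ?)", [
    ("Customer", "contained in", "data store", "CRM database"),
    ("Customer", "contained in", "data flow", "New order message"),
    ("Customer", "created or used by", "application", "CRM"),
    ("Order", "contained in", "data flow", "New order message"),
    ("Order", "created or used by", "application", "Order entry"),
])

def links_for(entity):
    """Everything the catalogue records about one data entity."""
    return conn.execute(
        "SELECT relationship, target_kind, target FROM catalogue_link "
        "WHERE entity = ? ORDER BY target_kind, target",
        (entity,),
    ).fetchall()

def entities_in(target_kind, target):
    """All data entities linked to one data store, data flow or application."""
    rows = conn.execute(
        "SELECT DISTINCT entity FROM catalogue_link "
        "WHERE target_kind = ? AND target = ? ORDER BY entity",
        (target_kind, target),
    ).fetchall()
    return [row[0] for row in rows]

print(links_for("Customer"))
print(entities_in("data flow", "New order message"))
```

A single generic link table keeps the sketch short. Separate tables per relationship type would allow tighter constraints.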
Then you can consider how to maintain a set of data models that document the data stored by the business, bearing in mind that you will probably need a range of views from abstract to detailed.
The Zachman Framework is a 6 by 6 grid of which Zachman says:
· “Columns show the primitive interrogatives”
· “Rows show reification - the transformation of an abstract idea into an instantiation”
Eskil positions data/information models in rows of the Zachman Framework, primarily in the “What” column as shown below (*)
Data Models placed in the Zachman Framework
· Level 1: The Operating Model
· Level 2: Overall Business Data Model (“what” column) and Overall Business Process Model (“how” column)
· Level 3: Detailed Business Data Models
· Level 4: Logical Data Models
· Level 5: Physical Data Models
A challenge for enterprise data architects is to build and maintain models at different levels of abstraction, with traceability between them.
This means that an abstract model at level 2 can be traced down to models at level 5 and back again.
For data processing systems, data models are usually much more concise than process models.
Populating the “what” column with data models gives a potential for traceability that is hard to achieve in all columns.
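Traceability between levels amounts to keeping explicit links from each model element to the elements that realise it one level down. The sketch below assumes a simple dictionary of such links. The level numbers follow the text (level 2 is the Overall Business Data Model, level 5 the physical data models); the element names are hypothetical.

```python
# Hypothetical trace links: (level, name) -> elements at the next level down.
trace_links = {
    (2, "Customer group"): [(3, "Customer"), (3, "Customer Address")],
    (3, "Customer"): [(4, "CRM.Customer")],
    (3, "Customer Address"): [(4, "CRM.Address")],
    (4, "CRM.Customer"): [(5, "crm_db.T_CUST")],
    (4, "CRM.Address"): [(5, "crm_db.T_CUST")],  # two logical entities, one physical table
}

def trace_down(element):
    """Return every element reachable below the given element, without repeats."""
    found = []
    for child in trace_links.get(element, []):
        for item in [child] + trace_down(child):
            if item not in found:
                found.append(item)
    return found

def trace_up(element):
    """Return every element above the given element, without repeats."""
    found = []
    parents = [p for p, children in trace_links.items() if element in children]
    for parent in parents:
        for item in [parent] + trace_up(parent):
            if item not in found:
                found.append(item)
    return found

print(trace_down((2, "Customer group")))
print(trace_up((5, "crm_db.T_CUST")))
```

Tracing up from the physical table reaches both logical entities and, through them, the single entity group at level 2.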
(* This is not an attempt to apply the Zachman Framework by the book.
For your interest only, Zachman says the rows do not represent levels of decomposition, and he no longer maps the What column to data.)
Top level business managers are accountable for the business architecture.
Enterprise architects are responsible for describing this architecture at a level of abstraction that business executives appreciate.
The idea is to help managers understand the enterprise estate, and evaluate changes to it.
Zachman says EA objectives include integration, reuse, flexibility, and reduced time-to-market.
TOGAF recommends using an “operating model” as a tool for discussing integration and reuse with executives.
(After “Enterprise Architecture as Strategy” by Jeanne W. Ross, Peter Weill and David C. Robertson.)
The approach begins by positioning a business (or part of it) in a quadrant of this grid.
Positioning the “operating model” of core business processes
Positioning the future state of a business in the grid helps to clarify the desire for standardisation and/or integration of its core business processes.
To standardise or integrate business processes requires the business to standardise and integrate its business data and business rules.
The next step is to draw a cartoon that represents your own business’s operating model.
The cartoon represents the interaction of processes and data at a very high level of abstraction.
See the avancier.website slide show on “EA as Strategy” for examples.
The main use of this cartoon is in discussion with business managers.
It may or may not be formal enough for traceability to models in the next level down.
At the highest level of enterprise modelling, enterprise architects can create:
· The Overall Business Data Model (“what” column).
· The Overall Business Process Model (“how” column).
The Overall Business Data Model
Data models are usually placed in the “what” column of the Zachman Framework.
However, a business data model describes entities and events that a particular business needs to monitor and/or direct.
The data represents all entities (e.g. customer, product) and events (e.g. order, delivery) that the business wants to remember.
So, the data may be classified into categories that correspond to columns of the Zachman Framework.
Products & Services
Partners & People
Architects may prioritise some categories for attention over others
The Overall Business Data Model identifies and names the most important entity groups in each category.
Decomposing all the categories might reveal (say) 50 entity groups in a whole business.
Overall Business Data Model: Eskil’s example – Showing 28 entity groups in 4 categories.
An entity group comprises several normalised entities.
An important EA rule is that one entity belongs in one entity group.
(If not, traceability is difficult, and the same descriptive details will be captured more than once).
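The one-entity-one-group rule is easy to check mechanically once entity groups are listed. The following is a minimal sketch, assuming entity groups are held as a mapping from group name to entity names; the groups and entities shown are made up.

```python
from collections import defaultdict

# Hypothetical entity groups: group name -> entities placed in that group.
entity_groups = {
    "Customers": ["Customer", "Customer Contact", "Address"],
    "Suppliers": ["Supplier", "Address"],
    "Products": ["Product", "Price List"],
}

def entities_in_several_groups(groups):
    """Return the entities that break the rule, with the groups that claim them."""
    membership = defaultdict(list)
    for group, entities in groups.items():
        for entity in entities:
            membership[entity].append(group)
    return {entity: found for entity, found in membership.items() if len(found) > 1}

violations = entities_in_several_groups(entity_groups)
print(violations)
```

Here "Address" is claimed by two groups, so it would have to be assigned to one of them (or to a group of its own) before the models can be traced cleanly.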
Overall Business Data Model: Eskil’s example – showing only the most important entities in each entity group.
Generalisations can form useful categories for entity groups. For example:
· Products (products, services)
· Properties (maintained resources, buildings, vehicles, assets)
· Promotions (campaigns, adverts, mailings)
· Processes (transactions, events, orders, payments, applications)
· Places (areas, offices, branches, invoice and delivery addresses)
· Pipes (routes, networks)
· Parties and People (customers, suppliers, organisations, employees)
· Points in time (calendar, dates, times)
· Pounds and Pennies (accounts, budgets, currencies)
· Papers (documents).
The more abstract the data entities, the fewer business rules or constraints can be defined.
At this level, relationships between entities tend to be many-to-many rather than one-to-many.
Overall Business Process Model
Business data is created and used by business processes.
The Overall Business Process Model can be presented in either or both of two forms:
· A Business Function Decomposition Hierarchy (structural view)
· A Top-level Business Process Map (behavioural view).
These things are discussed elsewhere at avancier.website.
Mapping processes to data
Business processes create and use business data.
Enterprise architects are supposed to relate process models to data models.
This is usually done by mapping Business Functions to the Entities or Entity Groups they are responsible for maintaining.
Data modelling languages
UML isn't designed for EA. ArchiMate is supposed to be designed for EA, but doesn't address data well.
You might do better to use a more traditional data modelling notation.
There is no universal model of the real world, independent of business needs.
A good data model describes the particular entities and events that a particular business needs to monitor and/or direct.
So the Overall Business Data Models should be detailed to reflect the data maintained by applications (built or bought) that (now or in future) support core business processes.
There are at least two different approaches to the scoping and elaboration of Detailed Business Data Models.
You may define a Detailed Business Data Model for a group of data entities that are related because:
· They appear in one entity group in the level 2 Overall Business Data Model above (as Eskil’s paper proposes).
· They are created and used by one Business Function at the 2nd or 3rd level of a function decomposition hierarchy.
This paper assumes the first approach, though much of what follows applies equally to the second approach.
Eskil’s example of a Detailed Business Data Model
Eskil’s principle is that a Detailed Business Data Model must be strictly normalized.
And one entity should exist in only one entity group, to simplify traceability up and down between models.
An ideal is that all the entities maintained by one business application should be found in one Detailed Business Data Model.
One Detailed Business Data Model may map to one or more LDMs (of one or more applications).
Enterprise data architecture needs time and resources outside of specific project resources.
Indeed, since EA documents persistent systems, any transient project-oriented models need not be retained.
How many logical data entities in total?
Perhaps 20-50 entity groups? Perhaps 5 to 40 entities in each?
If your enterprise were to require the full SAP ERP database, there could be more than 10,000 normalised entities.
The enterprise data architect should prioritise business data that supports core business processes, and is passed between systems.
How far elaborated by decomposition?
Normalisation involves analysing many-to-many relationships to identify link entities or events of significance.
Even Eskil’s example includes many-to-many relationships, which suggests further normalisation may be possible.
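Resolving a many-to-many relationship can be pictured as replacing it with a link entity and two one-to-many relationships. The sketch below assumes a deliberately simple representation of relationships; the class, the entity names and the "Order Line" link entity are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relationship:
    parent: str                 # the "one" end (either end if many-to-many)
    child: str                  # the "many" end
    many_to_many: bool = False

def resolve(rel, link_entity_names):
    """Replace a many-to-many relationship with a link entity and two one-to-many relationships."""
    if not rel.many_to_many:
        return [rel]
    link = link_entity_names[(rel.parent, rel.child)]
    return [Relationship(rel.parent, link), Relationship(rel.child, link)]

# Hypothetical model fragment: an order can contain many products and vice versa.
model = [
    Relationship("Customer", "Order"),
    Relationship("Order", "Product", many_to_many=True),
]
# The analyst must name each link entity; it usually turns out to be a business event or fact.
link_entity_names = {("Order", "Product"): "Order Line"}

normalised = []
for rel in model:
    normalised.extend(resolve(rel, link_entity_names))
print(normalised)
```

The code cannot decide whether the link entity is significant to the business. That judgement is the analysis the text refers to.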
How far elaborated by specialisation?
Normalisation assumes you can identify instances of entity and event types by means of “primary keys”.
In modelling what you perceive to be the world, it is tempting to invent a primary key, such as party id, person id, or place id.
What if business people don’t use these primary keys, and have no other way to uniquely identify an instance of the entity type you envisage?
What if it cannot, or need never, be recognised that an employee and a customer might be the same person?
If so, there is a question mark against the usefulness of the primary key you invented (person id), and against any normalisation that associates attributes with that key.
Generalisation can merge into one entity type what are better distinguished as different entity types in LDMs and PDMs.
How far partitioned into entity groups?
What if normalising an entity group results in distinct sub groups, with no meaningful or useful relationship between them?
Probably, the entity group should be further divided.
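One way to spot distinct sub-groups is to treat entities as nodes and relationships as edges, then look for connected components. This is a sketch of that idea, not something the original paper prescribes; the entities and relationships are invented.

```python
def sub_groups(entities, relationships):
    """Split entities into sets that are connected by relationships (connected components)."""
    neighbours = {entity: set() for entity in entities}
    for a, b in relationships:
        neighbours[a].add(b)
        neighbours[b].add(a)
    unvisited = set(entities)
    groups = []
    for start in entities:
        if start not in unvisited:
            continue
        # Depth-first walk from this entity collects one connected sub-group.
        group, stack = set(), [start]
        while stack:
            entity = stack.pop()
            if entity in unvisited:
                unvisited.remove(entity)
                group.add(entity)
                stack.extend(neighbours[entity] - group)
        groups.append(group)
    return groups

# Hypothetical entity group whose normalised entities fall into two unrelated clusters.
entities = ["Account", "Budget", "Cost Centre", "Currency", "Exchange Rate"]
relationships = [
    ("Account", "Budget"),
    ("Budget", "Cost Centre"),
    ("Currency", "Exchange Rate"),
]
groups = sub_groups(entities, relationships)
print(groups)
```

More than one component suggests the entity group is a candidate for division, subject to the architect's view of whether the missing relationships are really absent or just undocumented.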
What if there are strong, meaningful and useful relationships between entities in different entity groups?
You can consider either merging closely-related entity groups, or using application integration to maintain the relationships.
See the additional in-practice remarks below.
An LDM (Logical Data Model) usually describes the data maintained (or to be maintained) in one data store, usually associated with one business application.
An LDM should relate to one and only one Detailed Business Data Model.
An LDM is usually normalised, but might be de-normalised to reflect the processing or security needs of an application.
An LDM normally relates to one and only one PDM.
A PDM (Physical Data Model) defines the physical structure of a data store, usually associated with one business application.
The PDM should be mappable (directly or indirectly) to a corresponding LDM.
However, a PDM may be de-normalised to support processing or security requirements, for example by composition and by generalisation.
In a PDM, logical data entities may be divided into smaller physical data entities or aggregated into a larger physical data entity.
Generalisation of a PDM can remove meaning from the data store structure and make mapping to an LDM or Business Data Model difficult.
Could two LDMs or two PDMs contain the same Business Data Model-level entity? See below.
The paper above reflects the proposals in Eskil’s paper – if I understand it correctly.
I have added some in-practice observations above and below.
There has been something of a reaction against large monolithic applications.
People talk about Component-Based Design (CBD) and Service Oriented Architecture (SOA).
They discuss dividing the enterprise’s application portfolio into what
· used to be called business components (1990s)
· may now be called micro services or micro apps
Martin Fowler discusses the characteristics of micro services.
For commentary on these characteristics, read the "Micro services - micro apps" paper on the “Software architecture” page at avancier.website.
The idea is that each application offers a unique set of information system services, or use cases, based on its own data store.
Where needed, applications are integrated, perhaps via middleware of some kind, perhaps simply by REST.
This idea may be realised by assigning part or all of one Detailed Business Data Model to one application.
A business might envision (say) 50 such data-centric applications.
Note that dividing a de-duplicated enterprise architecture model into distinct systems implies
· some duplication in solution architecture level models.
· application integration overheads.
Remember that the enterprise architect is a champion and architect of system integration and reuse.
De-duplication of data entry is a goal of EA.
Eskil’s proposal de-duplicates data definition in Detailed Business Data Models.
This means a data mastering policy can be imposed by assigning the maintenance of some or all the entities in a Detailed Business Data Model to one application.
Ideally, the master version of a specific data entity (or specific attributes of it) should be maintained by one application.
Of course, other applications may need to refer to that same data.
Those applications can hold a copy of the data and (sooner or later) update it from the master, or else hold a pointer to the master.
There are at least eight relevant application integration patterns.
This means PDMs may well contain duplicate data definitions.
And this duplication may appear also in the corresponding LDMs.
The duplication is typically recorded in a data dissemination matrix (in Avancier Methods at least).
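A data dissemination matrix can be as simple as a grid of data entities against applications, with each cell marked as master or copy. The layout below is an assumption made for illustration, not the Avancier Methods format; the entity and application names are invented.

```python
# Hypothetical matrix: entity -> {application: "M" (master) or "C" (copy)}.
dissemination = {
    "Customer": {"CRM": "M", "Billing": "C", "Delivery": "C"},
    "Product": {"Catalogue": "M", "Billing": "C"},
    "Invoice": {"Billing": "M"},
}

def masters(entity):
    """Applications that hold the master version of an entity."""
    return sorted(app for app, role in dissemination[entity].items() if role == "M")

def copies(entity):
    """Applications that hold a copy and must be refreshed from the master."""
    return sorted(app for app, role in dissemination[entity].items() if role == "C")

def render(matrix):
    """Lay the matrix out as plain text, one row per entity, one column per application."""
    apps = sorted({app for cells in matrix.values() for app in cells})
    width = max(len(name) for name in list(matrix) + apps) + 2
    lines = ["".join(name.ljust(width) for name in [""] + apps)]
    for entity, cells in matrix.items():
        row = [entity] + [cells.get(app, "-") for app in apps]
        lines.append("".join(cell.ljust(width) for cell in row))
    return "\n".join(lines)

print(render(dissemination))
# A mastering policy of one master per entity can be checked directly.
print(all(len(masters(entity)) == 1 for entity in dissemination))
```

Each "C" cell implies an integration to be designed, so the matrix doubles as a rough inventory of integration work.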
Generalisation affects traceability between data models at different levels
There can be mis-matches in the degree to which entities are generalised or specialised at different levels.
Generalisation of a Business Data Model can merge into one entity type what are better distinguished as different entity types.
Generalisation of a PDM can remove meaning from the data store structure.
Either of these can make traceability up and down the levels difficult.
Generalisation affects traceability between data and process models
Business terms, facts and rules can be specified in data models or process models, or both.
If data entity types are generalised, then the weight of specification shifts from data models into process models.
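The shift of specification from data to process can be illustrated with a small contrast. In this sketch the specialised types carry their own attributes, whereas the generalised "Party" type carries almost no rules, so a process has to enforce them. All type names, roles and rules here are hypothetical.

```python
from dataclasses import dataclass, field

# Specialised entity types: the data model itself states what each type must hold.
@dataclass
class Customer:
    name: str
    credit_limit: int

@dataclass
class Employee:
    name: str
    salary: int

# Generalised entity type: one structure for any party; the data model says very little.
@dataclass
class Party:
    name: str
    role: str
    attributes: dict = field(default_factory=dict)

# The rule that used to live in the data model now lives in the process.
REQUIRED_BY_ROLE = {"customer": "credit_limit", "employee": "salary"}

def register_party(party):
    """Process-level check that replaces what the specialised types expressed."""
    needed = REQUIRED_BY_ROLE.get(party.role)
    if needed is None:
        raise ValueError(f"unknown role: {party.role}")
    if needed not in party.attributes:
        raise ValueError(f"a {party.role} must have {needed}")
    return party

accepted = register_party(Party("Ann", "customer", {"credit_limit": 500}))
print(accepted)
```

With the specialised types, a customer without a credit limit cannot even be constructed. With the generalised type, every process that creates a party must remember to apply the check.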
Creative Commons Attribution-No Derivative Works Licence 2.0 15/07/2015 18:26
Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.website” before the start and include this footnote at the end.
No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this page, not derivative works based upon it.
For more information about the licence, see http://creativecommons.org