Facts and conclusions from dictionary/repository experience

This page is published under the terms of the licence summarized in the footnote.


How to manage the structure and behaviour of business systems if you don’t know that structure or behaviour?

How to plan changes that are optimal for the enterprise (rather than suboptimal and localized) if you have no enterprise-wide repository of the system estate?

On the other hand, why is building and maintaining an enterprise-wide architecture repository more challenging than EA frameworks and CASE tool vendors tell you?

(CASE = Computer-Aided System Engineering).


It is one thing to populate an architecture repository for a discrete solution or migration planning exercise.

It is an entirely different thing to maintain an EA-wide repository of the kind envisaged by so many gurus since the 1980s.

The paper below was edited from messages posted by David Eddy in an on-line discussion, and posted with his approval.


Our abstract system dictionary/repository

Why detail matters

David’s conclusions

Definitions of some system elements in the dictionary/repository


Our abstract system dictionary/repository

We use a dictionary/repository to record system elements, and the relationships (direct & indirect) between the elements.

We can track more than 50 types of element; only a few are mentioned below by way of example.

Generally speaking, elements are related thus:



·         PROGRAMS have LOGIC and LINKS to other PROGRAMS



For example, one client, a small company, maintains a dictionary/repository describing these system elements.

·         10,000 employees,

·         826 applications,

·         50,000 programs,

·         154,000 JOB Steps,

·         120,000 datasets, and

·         38,000 parameter control cards (comparable to the settings that configure how you want your Mac or Windows machine & applications to behave).


This company records 1.7 million elements, but the meat is in the relationships, which enable fast, accurate research.

A calls B is a direct relationship. A calls B calls C is an indirect relationship.

Defining all such dependencies is complex, especially when deriving them from production code.
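
To make the distinction concrete, here is a minimal sketch of how direct and indirect "calls" relationships might be derived once the direct links have been loaded. The program names, the CALLS mapping and both function names are hypothetical illustrations, not taken from the repository described here.

```python
from collections import deque

# Hypothetical direct "calls" relationships: caller -> set of callees.
CALLS = {
    "A": {"B"},
    "B": {"C"},
    "C": set(),
    "D": {"A"},
}


def direct_callees(program, calls):
    """Programs that `program` calls directly."""
    return set(calls.get(program, set()))


def indirect_callees(program, calls):
    """Programs reachable from `program` but not called by it directly."""
    seen = set()
    queue = deque(calls.get(program, set()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against call cycles
        seen.add(current)
        queue.extend(calls.get(current, set()))
    return seen - direct_callees(program, calls) - {program}
```

Under these assumptions, A has B as a direct callee and C as an indirect one. Deriving the direct links from production code is the hard part; the traversal itself is simple once those links exist.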

Why detail matters

One data element may have several formats.

An individual may provide a Zip code in the 99999 or the 99999 9999 format.

But a US mail order house will likely use the full 99999 9999 99 99 format.

This kind of technical detail gets lost in top-down directions from the bridge.

An officer may think it enough to define "Zip Code" (everyone knows what that is) in a business data model.
But if somebody requests use of Zip Code 99999, and software engineers don’t know to push back and ask for 99999 9999 99 99, that leads to expensive trouble.

Suppose you're posting a million pieces of mail every other week.

Then, to get the best postal discounts, the post office will demand the full 99999 9999 99 99 format.

Most people have no idea Zip Code can be up to 13 digits long.

A good data dictionary / metadata repository will show analysts/programmers there are multiple Zip Code formats.

And then, which is correct for this situation? At least you know what the options are.
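
As an illustration only, a dictionary entry that records several formats for one data element might look like the sketch below. The ZIP_CODE_FORMATS structure and the matching_formats function are hypothetical; the patterns simply mirror the three formats as they are written above, with single spaces as separators.

```python
import re

# Hypothetical dictionary entry: one data element, several recorded formats.
# Each pattern mirrors a format as written in the text (9 = one digit).
ZIP_CODE_FORMATS = {
    "99999": re.compile(r"\d{5}"),
    "99999 9999": re.compile(r"\d{5} \d{4}"),
    "99999 9999 99 99": re.compile(r"\d{5} \d{4} \d{2} \d{2}"),
}


def matching_formats(value, formats=ZIP_CODE_FORMATS):
    """Return the name of every recorded format that the value fully matches."""
    return [name for name, pattern in formats.items() if pattern.fullmatch(value)]
```

A value that matches none of the recorded formats returns an empty list, which is a prompt to ask whether the dictionary is missing a format rather than to reject the value silently.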


Boring? Sure. But big bucks too.

There's a story about a mutual fund system that did not handle Canadian Post Codes.

An important client wasn't getting his statements.

David’s conclusions

Humans are, and must be, removed from documenting the production system estate.

All of the 1.7 million system elements in our dictionary are machine readable. 

Most are static, unchanged for long periods of time, so you need to update the dictionary only when elements change.

If the ideal software configuration tools are not available, then scanning & loading the dictionary can be done in a single job on a weekly basis.
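
One way to update only what has changed is to compare a fingerprint of each scanned element with the one stored at the previous load. The sketch below assumes the dictionary can be reduced to a simple name-to-fingerprint mapping; the function names and that mapping are hypothetical, and a real repository would store far more than a hash.

```python
import hashlib


def fingerprint(source_text):
    """Stable fingerprint of an element's machine-readable content."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def refresh(dictionary, scanned):
    """Update `dictionary` in place from one scan; return the names that changed.

    `dictionary` maps element name -> fingerprint from the previous load.
    `scanned` maps element name -> current content found by this scan.
    """
    changed = []
    for name, text in scanned.items():
        digest = fingerprint(text)
        if dictionary.get(name) != digest:
            dictionary[name] = digest  # new or modified element
            changed.append(name)
    # Elements that have vanished from the estate are dropped.
    for name in set(dictionary) - set(scanned):
        del dictionary[name]
        changed.append(name)
    return sorted(changed)
```

Run weekly in a single job, a scan of this kind touches every element but rewrites only the new, modified or deleted ones, which suits an estate where most elements are static.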

Definitions of some system elements in the dictionary/repository

The client above tracked more than 50 kinds of system element; only a few are defined below (this is dry enough).

While there are similarities from company to company, which elements they wish to track from their production environment does vary.


The digital elements are defined below in as technology-neutral a way as possible.

Remember, this is an example from a mainframe environment, so the terminology reflects this.

Described in the jargon of distributed systems, the terms would be just as dry and opaque.


Application: a fuzzy thing. The boundaries of applications are highly subjective, often squiggly.

Applications tend to be defined by a business-related function: accounts payable, accounts receivable, etc.

Over time, business functions are often mushed together.

People can easily think of a spreadsheet & 500 programs as two applications.

A package application like MS Office and a Fortune 500’s business applications are only remotely related.

Program: the source code (COBOL, Fortran, Assembler, etc.) that programmers use to express business logic.

Load Module: An executable. Source programs are compiled into load modules. The load modules are what runs in a JOBstep.
COPYbook: once confined to defining schemas or record layouts. Today COPYbooks are also used to include procedural code, as practiced in languages such as C & Objective-C.
Dataset: IBM jargon for “file.”

DB2: the well-known relational database from IBM.

DB2 Column: a column/field in a DB2 table.
DB2 Database: a DB2 database. [Sorry about that.] DB2 is IBM’s flagship large company RDBMS. There is a LUW version.
DB2 Plan: a means of organizing DB2 resources. (A DB2 DBA gave me a detailed explanation which went right over my head. Think complex & vital.)
DB2 Table: a collection of columns. A DB2 database will have many tables.

IMS Database: IBM’s flagship database product before DB2.

Introduced in the late 1960s, it grew out of the Apollo moon shot. As of 2008, IMS was still generating $700M annually for IBM.
IMS PSB: (Program Specification Block) Somewhat like a COPY but for IMS structures. (I think).

JOB: a job in JCL (job control language) defines what files, databases & programs will be used in a series of JOBsteps.
JOBstep: one step in a JOB, which may have 100 or more steps.
Parameter Card: configuration data (e.g. a single character D, W, M, Q, etc. for daily, weekly, monthly, & quarterly) telling a program what to do.

Similar to the preferences on a PC that customize how you want your computer to look & behave.
TSO Parm Card: Time Sharing Option Parameter Card, contains settings to configure an online environment.
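
To illustrate the Parameter Card definition above, here is a minimal sketch of a program reading a single-character frequency code. The FREQUENCIES mapping and the read_frequency function are hypothetical; real parameter cards vary widely in layout from one program to another.

```python
# Hypothetical mapping of the single-character codes mentioned above.
FREQUENCIES = {"D": "daily", "W": "weekly", "M": "monthly", "Q": "quarterly"}


def read_frequency(card):
    """Interpret the first non-blank character of a parameter card."""
    code = card.strip()[:1].upper()
    if code not in FREQUENCIES:
        raise ValueError(f"unrecognised parameter card: {card!r}")
    return FREQUENCIES[code]
```

Because one character can change what a program does, recording each card and the programs that read it is part of what makes the relationships in the dictionary useful.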



Copyright conditions

Creative Commons Attribution-No Derivative Works Licence 2.0             10/06/2015 20:50

Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.website” before the start and include this footnote at the end.

No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this page, not derivative works based upon it.

For more information about the licence, see http://creativecommons.org