On maintaining a large persistent data structure

And its impact on agility

The importance of large data structures to business (recap from the paper on OO thinking)

A software system is a discrete event-driven system.

It responds to input events (messages) by performing operations.

It determines what to do by comparing input data against persistent state data (memory).

The memory takes the form of a data structure that relates data types of interest to users of the system.

COBOL was designed for business applications that process large data structures.

It was supposed to be readable (if verbose) and maintainable.

People wrote un-maintainable code – due to the lack of disciplined modular design.

OO programming languages (OOPLs) came to prominence at the end of the 1980s.

OOPLs were designed for applications that process small data structures.

They were supposed to be elegant (non-verbose) and efficient - and maintainable.

People still wrote un-maintainable code – partly due to the difficulty of processing large data structures.

Business applications have to maintain large data structures.

A business database schema might contain hundreds of data types (and in operation, millions of data instances).

Chopping a large data structure up to suit OO thinking creates complexity in messaging and other difficulties.

Difficulties include maintaining data integrity and querying/analysing a whole data set.

The logical structure of persistent business data

In a business application, users do not know or care about a data server’s physical storage technology or location.

But they do care about the logical data structure, since it records entities and events of interest to them.

The logical structure of a business database is an expression of a domain-specific language known to its users.

The terms, concepts and rules of a business domain are embedded in the structure.

E.g. It contains names for business concepts like Customer, Order and Product.

And business rules that constrain the grammar of sentences in which these words are used.

E.g. an Order has a Value, an Order must be placed by a Customer.
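The two example rules above can be written into a schema so that the data server itself enforces them. The following is a minimal sketch, assuming an in-memory SQLite database; the table and column names (customer, customer_order, value, customer_id) are hypothetical and chosen only to mirror the words Customer, Order and Value.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        value       NUMERIC NOT NULL,       -- an Order has a Value
        customer_id INTEGER NOT NULL        -- an Order must be placed by a Customer
                    REFERENCES customer(customer_id)
    );
""")

conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Acme Ltd')")
conn.execute(
    "INSERT INTO customer_order (order_id, value, customer_id) VALUES (10, 250, 1)"
)


def try_insert(sql):
    """Return True if the database accepts the statement, False if a rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# An order with no customer breaks the NOT NULL rule.
no_customer = try_insert(
    "INSERT INTO customer_order (order_id, value, customer_id) VALUES (11, 99, NULL)"
)
# An order for a customer that does not exist breaks the foreign key rule.
unknown_customer = try_insert(
    "INSERT INTO customer_order (order_id, value, customer_id) VALUES (12, 99, 42)"
)
```

Both rejected inserts leave the table holding only the one valid order, so the "grammar" of the domain is policed by the structure rather than by each client.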

The users of the data server that maintain this state data must “know” these business concepts and rules.

They need this knowledge to update the state data, and to make sense of any data structure extracted from it.

The logical data structure of a business database is rarely well described by the kind of model found in OOPL case studies.

Drawing class hierarchies with specialisation relationships between subtypes and super types is rarely a good model.

This is because the persistence and fuzziness of real-world entities turn object subtypes into states or roles of an object, and inheritance relationships into associations.

So, business data is usually better described using one-to-many association relationships between entity types (as in an RDBMS for transaction processing).

Or else as data structures that relate entity types as they appear in the structure of data entry forms (as in a document store).
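To illustrate how a subtype becomes a role, here is a minimal sketch in which "Customer" and "Supplier" are not subclasses but dated roles attached to one persistent entity through a one-to-many association. The class names Party and Role, and the sample dates, are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Role:
    kind: str                    # e.g. "Customer" or "Supplier"
    start: date
    end: Optional[date] = None   # None means the role is still held


@dataclass
class Party:
    name: str
    roles: List[Role] = field(default_factory=list)  # one Party, many Roles

    def current_roles(self, on: date) -> List[str]:
        """List the kinds of role this party holds on the given date."""
        return [
            r.kind
            for r in self.roles
            if r.start <= on and (r.end is None or on < r.end)
        ]


# The same long-lived entity gains and loses roles over time,
# which a fixed class hierarchy (Customer extends Party) cannot express.
acme = Party("Acme Ltd")
acme.roles.append(Role("Customer", date(2001, 1, 1)))
acme.roles.append(Role("Supplier", date(2003, 6, 1), end=date(2005, 6, 1)))
```

Asking for the roles held in 2004 returns both kinds, while asking in 2006 returns only "Customer"; the entity itself never changes type.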

Coupling clients to a server’s internal data structure

Sometimes, a client and server may share a vocabulary for facts remembered in the server’s state.

Meaning that the names the server uses for its internal state variables are made visible in the parameters and responses of operations it performs.

Around 2002, I noticed that among the most successful of our projects were three systems developed using Visual Basic clients and SQL servers.

These systems were built more or less on time and to budget; the users liked them; the programmers found them easy to maintain.

Meanwhile, some projects struggled with radical structure clashes between OOPL code structures on application servers and database structures on data servers.

It seemed to me that (at least sometimes) the time and cost they invested in building and maintaining elaborate data abstraction layers was a self-inflicted wound.

The point here is not to deprecate all data abstraction layers, only to point out that they tend to add complexity.

So, one way to keep a system simple is to minimise data abstraction layers.

Simplicity might be achieved by imposing the vocabulary of a database schema at the user interface.

But it is far better, of course, to impose the vocabulary of a business domain on the database schema.

Decoupling clients from a server’s internal data structure

Alternatively, designers may decouple clients from a server’s internal state.

They insert some kind of data abstraction layer or broker between them.

The state data maintained by a server subsystem may be defined:

· internally, in the form of a physical database schema

· externally, exposed to clients using the terminology of a different, more logical, data structure.

The server has the job of transforming items in one structure into items in the other structure.

In effect, the names the server uses for its internal state variables are synonyms of the names used by clients for the same domain-specific facts.

The server can declare the business rules for an operation with reference to the data types in that logical data structure.
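The transformation described above can be sketched as a pair of functions that translate between a logical record and a physical row. This is only an illustration: the physical column names (cust_nm, tel_no, debt_pence), the choice to store debt as integer pence, and the rule that debt must not be negative are all assumptions.

```python
from decimal import Decimal

# Hypothetical names: logical (client-facing) name -> physical column name.
LOGICAL_TO_PHYSICAL = {
    "Name": "cust_nm",
    "TelephoneNumber": "tel_no",
    "Debt": "debt_pence",
}


def check_rules(customer: dict) -> None:
    """A business rule declared against the logical structure, not the columns."""
    if customer["Debt"] < 0:
        raise ValueError("Debt must not be negative")


def to_physical(customer: dict) -> dict:
    """Translate a logical record into a physical row (names and formats change)."""
    check_rules(customer)
    row = {LOGICAL_TO_PHYSICAL[name]: value for name, value in customer.items()}
    row["debt_pence"] = int(customer["Debt"] * 100)  # Decimal pounds -> integer pence
    return row


def to_logical(row: dict) -> dict:
    """Translate a physical row back into the logical record that clients see."""
    physical_to_logical = {p: l for l, p in LOGICAL_TO_PHYSICAL.items()}
    customer = {physical_to_logical[col]: value for col, value in row.items()}
    customer["Debt"] = Decimal(row["debt_pence"]) / 100  # integer pence -> pounds
    return customer


customer = {
    "Name": "Acme Ltd",
    "TelephoneNumber": "01234 567890",
    "Debt": Decimal("12.50"),
}
row = to_physical(customer)
round_trip = to_logical(row)
```

The round trip returns a record equal to the original, which shows that the layer changes names and formats but carries exactly the same facts in both directions.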

A modern practice is to decouple clients from the physical schema of a remote data server using the OData protocol.

A client asks the remote data server to return a metadata document (an XML schema) that defines the logical data structure it maintains.

Clients create input messages using the data types in that structure, and receive responses expressed using those data types.
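As a rough sketch of what such a client does, the code below builds the two kinds of request URL: one for the service's metadata document and one for a query phrased in the logical vocabulary. The `$metadata` path and the `$filter` query option are OData conventions; the service root, the entity set name Customers and the property name Debt are hypothetical. No network call is made.

```python
from urllib.parse import quote

SERVICE_ROOT = "https://example.com/odata"  # hypothetical service root


def metadata_url(root: str) -> str:
    """URL of the document that describes the logical data structure."""
    return root.rstrip("/") + "/$metadata"


def query_url(root: str, entity_set: str, filter_expr: str) -> str:
    """URL of a query written with logical entity and property names."""
    return f"{root.rstrip('/')}/{entity_set}?$filter={quote(filter_expr)}"


schema_request = metadata_url(SERVICE_ROOT)
# "Customers" and "Debt" are names from the assumed logical structure,
# not from whatever physical tables and columns the server uses.
data_request = query_url(SERVICE_ROOT, "Customers", "Debt gt 100")
```

Nothing in either URL reveals how the server stores the data, yet the client still has to know what a Customer and a Debt are.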

The point here is that the data abstraction layer hides the physical data format of variables rather than their logical meanings.

It decouples clients from the vocabulary a server uses to describe those remembered facts, but not the facts themselves.

OK, a customer’s Name, Address, Telephone Number and Debt may be input, output and stored using different data names and formats.

But each variable has the same meaning, conveys the same information, whatever its data name or format.

Hiding the variables in a database schema from external clients does not prevent them from knowing what those variables mean.

And translating data values from one format to another is an overhead you’d rather not have if you don’t need it.

The impact on agility and desirability of stabilising business data structures

Abstract descriptions are easier to change than concrete realities.

Process code is readily changeable, because (between run times) it is purely abstract description.

So, one can change the structure in which classes of transient objects are related on an app server.

It is much harder to change the structure in which data types of persistent entities are stored by a data server.

The persistent data structure is not readily changeable, because (between run times) it holds concrete data values.

And also, typically, because many clients depend on that data structure, and directly or indirectly they “know” what is in it.
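A small sketch may show why stored values make a structure harder to change. It assumes an in-memory SQLite database holding orders recorded before the rule "an Order must be placed by a Customer" existed; the table names and the placeholder customer id 0 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Version 1: orders were recorded without a customer.
    CREATE TABLE customer_order_v1 (
        order_id INTEGER PRIMARY KEY,
        value    NUMERIC NOT NULL
    );
    INSERT INTO customer_order_v1 VALUES (1, 250), (2, 99);

    -- Version 2: every order must now name a customer.
    CREATE TABLE customer_order_v2 (
        order_id    INTEGER PRIMARY KEY,
        value       NUMERIC NOT NULL,
        customer_id INTEGER NOT NULL
    );
""")


def migrate(customer_id_for_old_orders):
    """Copy the existing orders into the new structure; report whether it worked."""
    try:
        conn.execute(
            "INSERT INTO customer_order_v2 (order_id, value, customer_id) "
            "SELECT order_id, value, ? FROM customer_order_v1",
            (customer_id_for_old_orders,),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Changing the schema text was easy; the stored rows are what resist the change.
naive = migrate(None)    # old rows have no customer, so the new rule rejects them
backfilled = migrate(0)  # works only after someone decides what the old rows mean
```

The second call succeeds only because a placeholder value was invented for the old orders, and in practice that is a business decision rather than a coding one.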

In the earlier paper on agile principles, Kelly’s rule 10 was that “the specifications of hardware must be agreed to well in advance of contracting”.

Similarly, agile development methods recommend stabilising the “infrastructure” in advance of software development.

Where the infrastructure can include not only platform technologies but also any persistent data structure that must be populated and maintained.

Many business applications are built on top of a large persistent data structure that records entities and events of interest to business people.

Agile system development proceeds best when the logical structure of this state data (this memory) is stable.

So, it helps to get the logical data structure as complete and right as possible before coding.

It also helps to implement the data structure as directly as possible, to minimise the complexity and processing overhead of any data abstraction layer.

As OO thinker Craig Larman pointed out, decoupling with no good reason is not time well spent.