On maintaining a large persistent data structure
And its impact on agility
A software system is a discrete event-driven system.
It responds to input events (messages) by performing operations.
It determines what to do by comparing input data against persistent state data (memory).
The memory takes the form of a data structure that relates data types of interest to users of the system.
COBOL was designed for business applications that process large data structures.
It was supposed to be readable (if verbose) and maintainable.
People wrote un-maintainable code – due to the lack of disciplined modular design.
OO programming languages came to prominence at the end of the 1980s.
OOPLs were designed for applications that process small data structures.
They were supposed to be elegant (non-verbose) and efficient - and maintainable.
People still wrote un-maintainable code – partly due to the difficulty of processing large data structures.
Business applications have to maintain large data structures.
A business database schema might contain hundreds of data types (and in operation, millions of data instances).
Chopping a large data structure up to suit OO thinking creates complexity in messaging and other difficulties.
Difficulties include maintaining data integrity and querying/analysing a whole data set.
In a business application, users do not know or care about a data server’s physical storage technology or location.
But they do care about the logical data structure, since it records entities and events of interest to them.
The logical structure of a business database is an expression of a domain-specific language known to its users.
The terms, concepts and rules of a business domain are embedded in the structure.
E.g. it contains names for business concepts like Customer, Order and Product.
It also contains business rules that constrain the grammar of sentences in which these words are used.
E.g. an Order has a Value; an Order must be placed by a Customer.
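As a minimal sketch of how such rules can be embedded in the structure itself, the two example rules above might be declared in a relational schema as follows. The table and column names, the use of SQLite, and the helper try_insert are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical schema: names and column types are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Customer (
    CustomerId INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE "Order" (
    OrderId    INTEGER PRIMARY KEY,
    -- "An Order must be placed by a Customer"
    CustomerId INTEGER NOT NULL REFERENCES Customer(CustomerId),
    -- "An Order has a Value"
    Value      NUMERIC NOT NULL
);
""")

def try_insert(sql: str) -> bool:
    """Return True if the database accepts the statement, False if a rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

conn.execute("INSERT INTO Customer (CustomerId, Name) VALUES (1, 'Acme Ltd')")
conn.execute('INSERT INTO "Order" (OrderId, CustomerId, Value) VALUES (10, 1, 250)')

# An order placed by a customer that does not exist breaks the second rule.
orphan_accepted = try_insert(
    'INSERT INTO "Order" (OrderId, CustomerId, Value) VALUES (11, 99, 50)')
# An order with no value breaks the first rule.
valueless_accepted = try_insert(
    'INSERT INTO "Order" (OrderId, CustomerId) VALUES (12, 1)')
```

Here the words Customer, Order and Value appear in the schema exactly as business people use them, and the constraints refuse any "sentence" that breaks the rules.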
The users of the data server that maintain this state data must “know” these business concepts and rules.
They need this knowledge to update the state data, and to make sense of any data structure extracted from it.
The logical data structure of a business database is rarely well described in the way OOPL case studies suggest.
Drawing class hierarchies with specialisation relationships between subtypes and supertypes is rarely a good model.
This is because the persistence and fuzziness of real-world entities turn object subtypes into states or roles of an object, and inheritance relationships into associations.
So, business data is usually better described using one-to-many association relationships between entity types (as in an RDBMS for transaction processing).
Or else as data structures that relate entity types as they appear in the structure of data entry forms (as in a document store).
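A small sketch may make the point about subtypes becoming roles more concrete. Instead of declaring Customer and Employee as subclasses of a common supertype, a persistent entity is given a one-to-many association with role records that start and end over time. The class names Party and Role, and the dates, are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set

@dataclass
class Role:
    kind: str                    # e.g. "Customer" or "Employee"
    start: date
    end: Optional[date] = None   # None means the role is still held

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day < self.end)

@dataclass
class Party:
    name: str
    roles: List[Role] = field(default_factory=list)  # one-to-many association

    def roles_on(self, day: date) -> Set[str]:
        return {r.kind for r in self.roles if r.active_on(day)}

# The same persistent entity plays different roles at different times,
# which a fixed subclass (class Employee(Person)) could not express.
alice = Party("Alice")
alice.roles.append(Role("Customer", date(2020, 1, 1)))
alice.roles.append(Role("Employee", date(2021, 6, 1), date(2023, 6, 1)))

during = alice.roles_on(date(2022, 1, 1))   # both roles held
after = alice.roles_on(date(2024, 1, 1))    # employment has ended
```

The association from Party to Role maps directly onto a one-to-many relationship between two tables, whereas an inheritance hierarchy would need a further mapping layer.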
Sometimes, a client and server may share a vocabulary for facts remembered in the server’s state.
This means that the names the server uses for its internal state variables are made visible in the parameters and responses of operations it performs.
Around 2002, I noticed that among the most successful of our projects were three systems developed using Visual Basic clients and SQL servers.
These systems were built more or less on time and to budget; the users liked them; the programmers found them easy to maintain.
Meanwhile, some projects struggled with radical structure clashes between OOPL code structures on application servers and database structures on data servers.
It seemed to me that (at least sometimes) the time and cost they invested in building and maintaining elaborate data abstraction layers were a self-inflicted wound.
The point here is not to deprecate all data abstraction layers, only to point out that they tend to add complexity.
So, one way to keep a system simple is to minimise data abstraction layers.
Simplicity might be achieved by imposing the vocabulary of a database schema at the user interface.
But it is infinitely better, of course, to impose the vocabulary of a business domain on the database schema.
Alternatively, designers may decouple clients from a server’s internal state.
They insert some kind of data abstraction layer or broker between them.
The state data maintained by a server subsystem may be defined
· internally, in the form of a physical database schema
· externally, exposed to clients using the terminology of a different, more logical, data structure.
The server has the job of transforming items in one structure into items in the other structure.
In effect, the names the server uses for its internal state variables are synonyms of the names used by clients for the same domain-specific facts.
The server can declare the business rules for an operation with reference to the data types in that logical data structure.
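The transformation between the two structures can be sketched as a simple synonym table plus a format conversion. The physical column names (CUST_NM and so on), the logical names, and the choice to store a debt as whole pence are all assumptions invented for this illustration.

```python
from decimal import Decimal

# Hypothetical synonyms: physical column name -> logical name seen by clients.
PHYSICAL_TO_LOGICAL = {
    "CUST_NM": "Name",
    "CUST_ADDR": "Address",
    "CUST_TEL": "TelephoneNumber",
    "CUST_DEBT_PENCE": "Debt",
}
LOGICAL_TO_PHYSICAL = {v: k for k, v in PHYSICAL_TO_LOGICAL.items()}

def to_logical(row: dict) -> dict:
    """Rename physical columns and convert the debt from pence to pounds."""
    customer = {PHYSICAL_TO_LOGICAL[col]: value for col, value in row.items()}
    customer["Debt"] = Decimal(customer["Debt"]) / 100
    return customer

def to_physical(customer: dict) -> dict:
    """Reverse the mapping: logical names back to columns, pounds back to pence."""
    row = {LOGICAL_TO_PHYSICAL[name]: value for name, value in customer.items()}
    row["CUST_DEBT_PENCE"] = int(row["CUST_DEBT_PENCE"] * 100)
    return row

stored = {
    "CUST_NM": "Acme Ltd",
    "CUST_ADDR": "1 High Street",
    "CUST_TEL": "01234 567890",
    "CUST_DEBT_PENCE": 12550,
}
exposed = to_logical(stored)
round_trip = to_physical(exposed)
```

Note that the names and formats change on the way through, but the facts do not: the same debt is recorded on both sides of the layer.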
A modern practice is to decouple clients from the physical schema of a remote data server using the OData protocol.
A client asks the remote data server to return an XML schema that defines the logical data structure it maintains.
Clients create input messages using the data types in that structure, and receive responses expressed using those data types.
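As a hedged sketch of the client side, the snippet below reads the entity types and properties out of a metadata document of the kind an OData version 4 service publishes at its $metadata address. The sample document, the Sales namespace, and the function name logical_types are invented for illustration; a real client would fetch the document from the service rather than embed it.

```python
import xml.etree.ElementTree as ET

# Invented sample of an OData v4 metadata (CSDL) document.
SAMPLE_METADATA = """
<edmx:Edmx Version="4.0" xmlns:edmx="http://docs.oasis-open.org/odata/ns/edmx">
  <edmx:DataServices>
    <Schema Namespace="Sales" xmlns="http://docs.oasis-open.org/odata/ns/edm">
      <EntityType Name="Customer">
        <Key><PropertyRef Name="CustomerId"/></Key>
        <Property Name="CustomerId" Type="Edm.Int32" Nullable="false"/>
        <Property Name="Name" Type="Edm.String"/>
      </EntityType>
      <EntityType Name="Order">
        <Key><PropertyRef Name="OrderId"/></Key>
        <Property Name="OrderId" Type="Edm.Int32" Nullable="false"/>
        <Property Name="Value" Type="Edm.Decimal"/>
      </EntityType>
    </Schema>
  </edmx:DataServices>
</edmx:Edmx>
""".strip()

EDM = "{http://docs.oasis-open.org/odata/ns/edm}"  # XML namespace of the schema elements

def logical_types(metadata_xml: str) -> dict:
    """Map each entity type name to a dict of its property names and types."""
    root = ET.fromstring(metadata_xml)
    return {
        entity.get("Name"): {
            prop.get("Name"): prop.get("Type")
            for prop in entity.findall(EDM + "Property")
        }
        for entity in root.iter(EDM + "EntityType")
    }

types = logical_types(SAMPLE_METADATA)
```

The client learns only the logical names and types (Customer, Order, Value); nothing in the document reveals how or where the server stores them.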
The point here is that the data abstraction layer hides the physical data format of variables rather than their logical meanings.
It decouples clients from the vocabulary a server uses to describe those remembered facts, but not from the facts themselves.
OK, a customer’s Name, Address, Telephone Number and Debt may be input, output and stored using different data names and formats.
But each variable has the same meaning, conveys the same information, whatever its data name or format.
Hiding the variables in a database schema from external clients does not prevent them from knowing what those variables mean.
And translating data values from one format to another is an overhead you’d rather not have if you don’t need it.
Abstract descriptions are easier to change than concrete realities.
Process code is readily changeable, because (between run times) it is purely abstract description.
So, one can change the structure in which classes of transient objects are related on an app server.
It is much harder to change the structure in which data types of persistent entities are stored by a data server.
The persistent data structure is not readily changeable, because (between run times) it holds concrete data values.
And also, typically, because many clients depend on that data structure, and directly or indirectly they “know” what is in it.
In the earlier paper on agile principles, Kelly’s rule 10 was that “the specifications of hardware must be agreed to well in advance of contracting”.
Similarly, agile development methods recommend stabilising the “infrastructure” in advance of software development.
Here, the infrastructure can include not only platform technologies but also any persistent data structure that must be populated and maintained.
Many business applications are built on top of a large persistent data structure that records entities and events of interest to business people.
Agile system development proceeds best when the logical structure of this state data (this memory) is stable.
So, it helps to get the logical data structure as complete and right as possible before coding.
It also helps to implement the data structure as directly as possible, to minimise the complexity and processing overhead of any data abstraction layer.
As OO thinker Craig Larman pointed out, decoupling with no good reason is not time well spent.