Integrity challenges
This paper is published under the terms and conditions in the footnote.
This paper outlines a number of contrasting principles related to maintaining the integrity of a system’s state.
I've tried to convey what might be called "OO" and/or "Agile" principles with my usual scepticism, and to relate them to the tension between enterprise architecture and agile system development.
Contents
Global integrity v. local agility
Continuous consistency v. eventual consistency
There is a tension between broad-scoped enterprise architecture and narrow-scoped agile system development.
This table contrasts the two in terms of choices faced by people throughout history (after Mark Madsen).
EA tendencies | Agilist tendencies
Top-down | Bottom-up
Authority | Anarchy
Bureaucracy | Autonomy
Control | Creativity
Hierarchy | Network
Consistency | Flexibility
Mark Madsen says of such choices: “in every choice, something is lost and something is gained.”
Usually, a compromise has to be made.
Enterprise architecture looks to standardise, integrate and reuse systems, with a view to minimising:
· issues arising from hand-offs between people and systems
· cases where systems get out of step to the point where dangerous or costly mistakes are made
· elaborate and expensive compensatory processes to undo the effects of inappropriate actions.
Agile system development is fine in many ways.
However, local agile development tends to result in silo systems that are not standardised, not integrated, and do not share common services and resources.
The enterprise architect’s desire for global integrity (data quality, system integration and reuse) guides and constrains local system development.
This overarching governance inhibits agility to some extent.
All kinds of data replication create the possibility that data stored or copied in different places becomes inconsistent.
When and how to detect inconsistency, and when and how to restore consistency, are among the essential challenges of enterprise architecture.
Continuous consistency
One ambition of enterprise architecture is to eliminate issues arising from inconsistent data sources.
Ideally, all related data is updated together, and inconsistencies are not allowed.
In practice, this requires all related data to be closely located, ideally within one database.
Updates to the stored data are then ACID transactions, or atomic state changes.
A transaction moves the stored data from one consistent state to the next consistent state (or else fails).
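This can be sketched with SQLite: a transfer between two accounts either commits as a whole or rolls back, so the total never changes. (The accounts table and transfer rule here are illustrative, not from the paper.)

```python
import sqlite3

# The CHECK constraint expresses a consistency rule the database enforces.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 100)")
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except sqlite3.IntegrityError:
        pass  # the failed transfer leaves both balances untouched

transfer(conn, "a", "b", 30)    # succeeds: moves to the next consistent state
transfer(conn, "a", "b", 500)   # violates the CHECK constraint: rolled back
print(dict(conn.execute("SELECT name, balance FROM account")))
# → {'a': 70, 'b': 130}  — the total is still 200
```

Because the second transfer fails atomically, no reader ever sees a state in which money has left one account but not arrived in the other.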
Eventual consistency
This means that inconsistencies are allowed temporarily, and consistency is restored later.
In practice, temporary inconsistency is forced on a business by the enterprise data architecture, if not by human nature.
Architects then have to understand how that inconsistency will affect how people behave in business processes.
The way humans behave is naturally asynchronous.
A person never stops (frozen, unable to think) waiting for a reply from another person.
People work in different places, based on the information they have locally, and send messages to each other.
Even if they do want a reply to a message, they think about or do something else while they are waiting.
If there are discrepancies between items of information held in different places, then things may have to be patched up later.
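One common way to patch things up later is a reconciliation pass. A minimal sketch, assuming each item of information carries a timestamp and that "last writer wins" is an acceptable merge rule (the function and site names are illustrative):

```python
def reconcile(local, remote):
    """Merge two replicas of {key: (value, timestamp)}: the later write wins."""
    merged = dict(local)
    for key, (value, ts) in remote.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged

# Two offices have worked independently on local copies of the same data.
office_a = {"price": (10, 1), "stock": (5, 3)}
office_b = {"price": (12, 2), "stock": (4, 1)}

print(reconcile(office_a, office_b))
# → {'price': (12, 2), 'stock': (5, 3)}  — the later write wins for each item
```

The merge is symmetric, so both offices converge on the same state whichever order they reconcile in; choosing the right merge rule for the business (last writer wins is not always acceptable) is exactly the extra work eventual consistency creates.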
However, inconsistency makes work for developers.
Google on eventual consistency
“Designing applications to cope with concurrency anomalies in their data is very error prone, time-consuming, and ultimately not worth the performance gains.”
“Developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date.
We think this is an unacceptable burden to place on developers and that consistency problems should be solved at the database level.”
“F1: A Distributed SQL Database That Scales”, Proceedings of the VLDB Endowment, Vol. 6, No. 11, 2013
The CAP triangle refers to:
· Consistency = all nodes see the same data at the same time, so data consumers will always get the latest information.
· Availability = one or more node failures will not stop surviving nodes from working, so your system will always be available.
· Partition tolerance = inter-node communication failures will not stop any node from working.
The CAP theorem can be summarised (this is edited from Wikipedia) thus:
· The proof of the CAP theorem by Gilbert and Lynch is limited
· The theorem sets up a scenario in which
o two conflicting requests arrive at
o components replicated in distinct locations,
o at a time when a link between them has failed.
· The obligation to provide Availability despite Partitioning failures means both components must respond.
· At least one response will necessarily be inconsistent with what implementing a true one-copy replication semantic would have done.
· The researchers then go on to show that other forms of Consistency are achievable, including a property they call Eventual Consistency.
The CAP theorem doesn't rule out achieving consistency in a distributed system, and says nothing about cloud computing or scalability.
However, people use the CAP triangle to explain other things, with implications for how to build systems in practice.
These two patterns may be contrasted with reference to the CAP triangle (if not the CAP theorem).
Remember there is a difference between replicating data between:
· different tiers of one client-server application
· differently structured update (OLTP) and reporting (OLAP) data stores
· data copies in the data storage tier of one client-server application
· different databases at the bottom of different client-server applications.
ACID: Atomic, Consistent, Isolated, Durable
If you want data to be consistent, conformant to rules, correct and up to date, then you may use the ACID pattern:
· RDBMS (ideally a normalised data structure)
· Synchronous write to central disk
· Referential integrity checking
· ACID transactions - rolled back on error.
In a client-server system, the database may be:
· Consistent
· Available enough for business needs
· At the expense of Partition-tolerance.
If a failure prevents access to the database, then server-side components cannot guarantee a consistent response, and may reject the query or command.
BASE: Basic Availability, Soft-state, Eventual consistency
If you are happy with stale data, approximate answers and eventual consistency, then you may use the BASE pattern:
· Divide data between nodes of the network.
· Maximise read availability through replication of the data
· Minimise update response time via asynchronous replication
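The three bullets above can be sketched as a master node that acknowledges writes immediately and replicates them to other nodes asynchronously, so reads from a replica may be stale until replication catches up. (The class and node names are illustrative.)

```python
import collections

class Master:
    """A master node that replicates writes asynchronously (BASE sketch)."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.log = collections.deque()   # pending replication messages

    def write(self, key, value):
        self.data[key] = value           # acknowledge immediately (fast update)
        self.log.append((key, value))    # replicate later, asynchronously

    def replicate(self):
        # In a real system this runs in the background; here we call it by hand.
        while self.log:
            key, value = self.log.popleft()
            for replica in self.replicas:
                replica[key] = value

replica = {}                  # a read-only copy served to data consumers
master = Master([replica])

master.write("stock", 41)
print(replica.get("stock"))   # → None — the replica is still stale
master.replicate()            # replication catches up
print(replica.get("stock"))   # → 41 — eventually consistent
```

The write returns before the replica is updated, which is exactly the trade: fast, available updates in exchange for a window of inconsistency.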
In a client-server system, the higher-level components are usually:
· Available - components respond immediately and continually
· Partition-tolerant - components respond even if a failure prevents access to lower-level resources
· At the expense of perfect Consistency.
Though consistency in the underlying database may still be required.
CAP in a conventional client-server system
Remote client nodes send Queries and Commands to server nodes.
Since the network will fail at some point, P is needed.
P between client and server nodes requires clients to call asynchronously.
A of server nodes is achieved by scaling out the upper-most nodes.
But A of the upper-most server nodes doesn’t guarantee the C of the data returned.
If A matters more, use cached data.
The server node returns its version of the required data immediately, a possibly stale copy of the master data.
Caching data increases the chance of inconsistency.
If C matters more, use master data.
The server node consults the database or other “master” node to get the latest data before returning a response to a client.
Checking data for consistency (given the risk of a node or network outage) decreases the level of availability.
How to optimise the balance between Consistency and Availability?
· Invalidate caches based on time, and/or
· update caches from Events published by the master node.
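Both options can be combined in one small cache: entries expire after a time-to-live, and the master can also push an update event that refreshes an entry immediately. (The class and method names here are illustrative, not a standard API.)

```python
class Cache:
    """A cache refreshed by time-based expiry and by events from the master."""

    def __init__(self, ttl, master):
        self.ttl = ttl
        self.master = master            # authoritative data, e.g. the database
        self.entries = {}               # key -> (value, expiry time)

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is None or now >= entry[1]:          # missing or expired
            value = self.master[key]                  # fall back to master data
            self.entries[key] = (value, now + self.ttl)
            return value
        return entry[0]                               # fast, possibly stale

    def on_update_event(self, key, value, now):
        self.entries[key] = (value, now + self.ttl)   # event-driven refresh

master = {"price": 10}
cache = Cache(ttl=60, master=master)
assert cache.get("price", now=0) == 10
master["price"] = 12                        # master changes; cache is now stale
assert cache.get("price", now=30) == 10     # within TTL: stale read (A over C)
cache.on_update_event("price", 12, now=30)  # master publishes an update event
assert cache.get("price", now=31) == 12     # consistent again, without waiting
```

A short TTL pulls the balance towards Consistency (more trips to the master); a long TTL plus update events pulls it towards Availability while shrinking the window of staleness.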
ACID v BASE
Eric Brewer interpreted CAP as precluding consistency for components in the highly scalable first tier of a modern cloud computing system.
So, CAP means we sacrifice consistency to gain faster responses in a more scalable manner.
It’s harder to design in the fault-tolerant BASE world (compared to the ACID world).
But Brewer says you have no choice if you want very high throughput and concurrency.
Remember global integrity v. local agility
Bear in mind the wider EA problem: replication of data in different databases at the bottom of different client-server applications.
Global integrity may favour data consolidation and continuous Consistency, but this is usually impractical.
Local agility tends to result in data replication, and leaves Eventual Consistency as the most practical option.
Footnote: Creative Commons Attribution-No Derivative Works Licence 2.0 (20/04/2015)
Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.website” before the start and include this footnote at the end.
No Derivative Works: You may copy, distribute and display only complete and verbatim copies of this page, not derivative works based upon it.
For more information about the licence, see http://creativecommons.org