Logical business transactions v. physical database transactions

This paper is published under the terms and conditions in the footnote.

A business transaction shifts a business system’s state from one consistent state to the next.

A database transaction shifts a database state from one consistent state to the next.

This paper is about the impact of dividing one business system’s state between several databases.

Contents

Transactions

Physical database transactions

Logical business transactions

Example 1

Eventual consistency and compensating transactions

Example 2

Nobody needs reliable messaging

But you do need a time out on request-reply queries and commands

There is little or no “fire and forget” in business systems

Transactions

Here, a transaction is a discrete event-triggered process that shifts a system’s state from one consistent state to the next state.

A transaction can succeed, or fail for various reasons.

Physical database transactions

Decades ago, structured design methods distinguished database update transactions from database queries.

· The queries contain nothing but read operations.

· The update transactions apply business rules and maintain stored data to an agreed level of consistency.

“Most business applications can be thought of as a series of transactions.” Martin Fowler.

These server-side transactions are triggered by Commands and Events, and act on stored data.

Different categories of failure can be seen in the error messages or codes returned or published.

For example, the HTTP protocol distinguishes five categories of response, using status codes as below.

1xx Informational (progress reports)

2xx Success

3xx Redirection (the client must take further action, e.g. request a different URL)

4xx Client Error

5xx Server Error

In our business app context, the outcomes of a typical update transaction can be categorised as follows.

1. Success

2. Failure because the Command sender was not authorized to send the command.

3. Failure because the Command message contents are invalid in themselves (e.g. wrong data types, out of range values).

4. Failure because the Command message triggers an operation that detects a reason to reject the message (e.g. integrity errors).

5. Failure for another reason (e.g. bug, out of memory, database connection failure).

Numbers 2 to 4 are failures to meet the logical business preconditions for the Command to succeed.

Number 5 is a failure to meet the physical technological preconditions for the Command to work.
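The five outcome categories can be made explicit in code. The following is a minimal Python sketch, not a prescribed design: the names `Outcome` and `place_order`, the in-memory stock dictionary, and the rule that a sender needs a "customer" role are all hypothetical. The handler checks the logical preconditions (2 to 4) in order and treats any unexpected exception as a physical failure (5).

```python
from enum import Enum


class Outcome(Enum):
    """Outcome categories of an update transaction, numbered as in the list above."""
    SUCCESS = 1
    NOT_AUTHORIZED = 2      # sender was not authorized to send the Command
    INVALID_MESSAGE = 3     # message contents invalid in themselves
    REJECTED_BY_RULE = 4    # operation found a reason to reject (e.g. integrity)
    TECHNICAL_FAILURE = 5   # bug, out of memory, lost connection, ...

    @property
    def is_logical_failure(self) -> bool:
        """True for failures to meet the logical business preconditions (2 to 4)."""
        return self in (Outcome.NOT_AUTHORIZED, Outcome.INVALID_MESSAGE,
                        Outcome.REJECTED_BY_RULE)


def place_order(sender_roles: set, command: dict, stock: dict) -> Outcome:
    """Hypothetical PlaceOrder handler that reports which category it ended in."""
    try:
        if "customer" not in sender_roles:                 # category 2
            return Outcome.NOT_AUTHORIZED
        qty = command.get("quantity")
        if not isinstance(qty, int) or qty <= 0:           # category 3
            return Outcome.INVALID_MESSAGE
        if stock.get(command.get("product_id"), 0) < qty:  # category 4
            return Outcome.REJECTED_BY_RULE
        stock[command["product_id"]] -= qty
        return Outcome.SUCCESS
    except Exception:
        # Anything unexpected counts as a physical/technical failure (category 5).
        return Outcome.TECHNICAL_FAILURE
```

Keeping categories 2 to 4 distinct from category 5 lets a caller tell "the business said no" apart from "the technology failed", which usually calls for a different response (correct the request versus retry later).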

Most database systems come with a transaction manager.

On success, the transaction manager will commit the transaction by changing data values to the new consistent state.

On failure, the transaction manager will roll back any update done so far, leaving the data in the consistent state it was before the transaction started.
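Python's standard `sqlite3` module can illustrate what a transaction manager does. In this sketch the `stock` table, the products and the `ship` function are hypothetical; using the connection as a context manager commits the transaction if the block completes and rolls it back if the block raises, and a `CHECK` constraint plays the part of an integrity rule.

```python
import sqlite3

# In-memory database; the CHECK constraint stands in for an integrity rule.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (product TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO stock VALUES ('widget', 10), ('gadget', 1)")
conn.commit()


def ship(conn, items):
    """Decrement stock for every (product, qty) pair as ONE database transaction."""
    with conn:  # sqlite3: commit if the block completes, roll back if it raises
        for product, qty in items:
            conn.execute("UPDATE stock SET balance = balance - ? WHERE product = ?",
                         (qty, product))


def balances(conn):
    return dict(conn.execute("SELECT product, balance FROM stock"))
```

If a later UPDATE would drive a balance negative, SQLite raises `sqlite3.IntegrityError` and the earlier UPDATEs in the same call are undone as well, so the table is left in the consistent state it had before the transaction started.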

Logical business transactions

An issue for enterprise architects and business people alike is the division and distribution of business data between data stores.

Partly-overlapping data structures are maintained by apps working on data stores at different nodes on the network.

These data stores are maintained by different apps, built at different times for different reasons, using different technologies.

Data may be divided such that (for example) products are stored in one database, and orders for products are stored in another.

Also, different attributes of one entity (say, employee address and employee salary) may be mastered in different locations.

A business transaction shifts a business system’s state from one consistent state to the next.

A database transaction shifts a database state from one consistent state to the next.

Clearly, separation of the business system state across several data stores can create difficulties.

When a Place Order Command is processed in the CRM database, is the product that the customer wants available?

When the Contract Renewal Command is processed, is the contractor still at the address that was copied into the Contract Management system?

There are many approaches to managing a business transaction that acts on separate data stores, including the three below.

Consolidate data stores

If all related data stores can be merged into one, then business transaction and database transaction can be aligned.

Suppose all customer, order and product data is held in one database.

When a customer presses the Place Order command button, the app can check and update both Customer Credit Limit and Product Stock Balance.

This business transaction can be implemented as a database transaction, which is either committed or rolled back by a database transaction manager.
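With everything in one database, the Place Order business transaction maps onto a single database transaction. Below is a minimal sketch using `sqlite3` in autocommit mode (`isolation_level=None`) with explicit `BEGIN IMMEDIATE`, `COMMIT` and `ROLLBACK`; the schema, prices, credit rule and the `OrderRejected` exception are assumptions made for illustration.

```python
import sqlite3


class OrderRejected(Exception):
    """Raised when a business rule rejects the Place Order Command."""


def open_store():
    # isolation_level=None: sqlite3 issues no implicit BEGIN; we control transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE customer (id TEXT PRIMARY KEY, credit_limit INTEGER, balance INTEGER);
        CREATE TABLE product (id TEXT PRIMARY KEY, price INTEGER, stock_balance INTEGER);
        CREATE TABLE orders (customer_id TEXT, product_id TEXT, qty INTEGER);
        INSERT INTO customer VALUES ('C1', 100, 0);
        INSERT INTO product VALUES ('P1', 20, 3);
    """)
    return conn


def place_order(conn, customer_id, product_id, qty):
    """Check credit limit and stock, then update both, as ONE database transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        limit, balance = conn.execute(
            "SELECT credit_limit, balance FROM customer WHERE id = ?",
            (customer_id,)).fetchone()
        price, stock = conn.execute(
            "SELECT price, stock_balance FROM product WHERE id = ?",
            (product_id,)).fetchone()
        cost = price * qty
        if balance + cost > limit:
            raise OrderRejected("credit limit exceeded")
        if stock < qty:
            raise OrderRejected("insufficient stock")
        conn.execute("UPDATE customer SET balance = balance + ? WHERE id = ?",
                     (cost, customer_id))
        conn.execute("UPDATE product SET stock_balance = stock_balance - ? WHERE id = ?",
                     (qty, product_id))
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                     (customer_id, product_id, qty))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # leave the data as it was before BEGIN
        raise
```

`BEGIN IMMEDIATE` acquires SQLite's write lock before the credit and stock checks, so no other connection can change either value between the check and the update.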

Unfortunately, consolidation of data stores takes more time and money than most want to spend, and may be impractical for other reasons.

Distributed transaction manager - continuous state

Some years ago, it was widely assumed the distributed system problem could and would be solved by the use of additional infrastructure/platform system software.

A federated or distributed transaction manager enables an app layer transaction to be committed or rolled back across several database transactions.

But this creates a tight coupling between remote apps.

The increasing distribution of systems, across ever wider networks, makes this a risky solution, because a network connection can fail during a transaction.

This can leave one component in the dark about whether to commit or roll back a transaction (also known as a transaction 'in doubt').
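Distributed transaction managers commonly rely on the two-phase commit protocol (2PC). The toy simulation below, in which all class, method and parameter names are hypothetical, shows how a participant that has voted "yes" in phase 1 and then loses contact with the coordinator is left in doubt: it has promised to commit but does not know the decision.

```python
from enum import Enum


class State(Enum):
    WORKING = "working"
    PREPARED = "prepared"    # voted yes; now bound to wait for the decision
    COMMITTED = "committed"
    ABORTED = "aborted"


class Participant:
    """One resource manager (e.g. one database) taking part in the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = State.WORKING

    def prepare(self):
        # Phase 1: vote. A "yes" is a promise to commit if the coordinator says so.
        if self.can_commit:
            self.state = State.PREPARED
            return True
        self.state = State.ABORTED
        return False

    def decide(self, commit):
        # Phase 2: apply the coordinator's decision.
        self.state = State.COMMITTED if commit else State.ABORTED

    @property
    def in_doubt(self):
        # Prepared but never told the outcome: cannot safely commit or roll back alone.
        return self.state is State.PREPARED


def two_phase_commit(participants, reachable_in_phase2=None):
    """Toy coordinator; reachable_in_phase2 simulates a network failure after voting."""
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    for p in participants:
        if reachable_in_phase2 is None or p.name in reachable_in_phase2:
            p.decide(decision)
    return decision
```

The in-doubt participant typically has to keep its locks until it can learn the outcome, which is one reason the text calls this approach a tight coupling between remote apps.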

Event-Driven Architecture (EDA) – eventual consistency

The solution assumed in the CQRS pattern is that when one node commits, it will publish an Event to say that this has happened.

Each app processes its own transactions and posts the outcome (success or failure) in the form of Events that apps on the same local network will be notified of.

EDA, using “publish/subscribe” and “fire-and-forget” patterns, is sometimes promoted as simplifying systems.

But the truth is that, overall, it creates complexity and extra work for developers.

Remember what Google say on eventual consistency:

“Designing applications to cope with concurrency anomalies in their data is very error prone, time-consuming, and ultimately not worth the performance gains.”

“developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date.

We think this is an unacceptable burden to place on developers and that consistency problems should be solved at the database level”

“F1: A Distributed SQL Database That Scales”, Proceedings of the VLDB Endowment, Vol. 6, No. 11, 2013

Example 1

Suppose related business data is maintained by three app components - Customer, Product and Accounting apps.

The apps are logically inter-dependent, because some business transactions involve all three.

E.g. the Place Order business transaction requires Customer and Product apps to both succeed or both fail.

Suppose the Customer, Product and Accounting apps are decoupled by an EDA, so that none of them knows about the others.

The Customer app publishes Order Placed Events; the Product and Accounting apps subscribe to receive these Events.

The software architect of the Customer app may know nothing of the Product and Accounting apps.

He/she may say that the Customer app does not care what other apps do with the Order Placed Event.

But the enterprise architect cares; else, why make those apps subscribe to the Event?

And what those apps do with the Event can in turn influence the design of the Customer app.

Suppose the business rules require that an Order is successfully processed by both Customer and Product apps.

In other words, there is a logical OrderPlacement transaction that spans the two components.

In an EDA:

· the Customer app processes the PlaceOrder Command

· the Customer app posts an OrderPlaced Event.

· the Product app detects that Event and processes it.

What if the Product app rejects the Event?

The business rules mean the two apps must be coordinated.

The designer may coordinate the apps by designing an overarching control procedure, workflow, or saga.

Or else, the designer may design inter-component communication:

· The Product app posts an OrderRejected Event.

· The Customer app processes the OrderRejected Event.

So, it turns out the Customer app does “care” about what happens after it posts the OrderPlaced Event, since it must know how to recognise the undo Event and perform the appropriate compensating transaction.
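The choreography in Example 1 can be sketched with a hypothetical in-process `EventBus` standing in for messaging middleware. Each app commits its own local change and then publishes an Event; the Customer app must still subscribe to `OrderRejected` so that it can run the compensating transaction. The Event names follow the example; the classes, fields and stock rule are assumptions.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe bus (a stand-in for real middleware)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self.subscribers[event_type]:
            handler(payload)


class CustomerApp:
    def __init__(self, bus):
        self.bus, self.orders = bus, {}
        bus.subscribe("OrderRejected", self.on_order_rejected)  # it does "care"

    def place_order(self, order_id, product_id, qty):
        self.orders[order_id] = "placed"  # local transaction commits first...
        self.bus.publish("OrderPlaced", {"order_id": order_id,   # ...then the Event
                                         "product_id": product_id, "qty": qty})

    def on_order_rejected(self, event):
        self.orders[event["order_id"]] = "cancelled"  # compensating transaction


class ProductApp:
    def __init__(self, bus, stock):
        self.bus, self.stock = bus, stock
        bus.subscribe("OrderPlaced", self.on_order_placed)

    def on_order_placed(self, event):
        if self.stock.get(event["product_id"], 0) >= event["qty"]:
            self.stock[event["product_id"]] -= event["qty"]
        else:
            self.bus.publish("OrderRejected", {"order_id": event["order_id"],
                                               "reason": "insufficient stock"})
```

Dispatch here is synchronous for simplicity; with real middleware the OrderRejected Event could arrive much later, and in the meantime the order would appear "placed".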

Eventual consistency and compensating transactions

What is logically one business transaction has to be implemented one way or another.

If it cannot be implemented as one database transaction, then business people have to think about what happens.

Keeping orders coming in is essential to most online businesses.

So, usually, they accept all orders and then figure out how to fulfil these orders later.

If a Place Order command is processed by the CRM app without knowing there is product available, then the order has to be accepted without certainty it will succeed.

The problem can be mitigated by caching product stock levels in the CRM app, synchronised (eventually) with the Product app.

The CRM cache can be kept reasonably up-to-date by listening to Events about product shipments/stock updates published by the Product or other apps.

Since Events can travel fast (seconds down to milliseconds), the CRM system can be reasonably accurate in predicting whether an order can be accepted.
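A CRM-side cache of stock levels might look like the sketch below. It assumes, hypothetically, that each stock Event carries a per-product sequence number `seq`, so that a late or out-of-order Event cannot overwrite a newer level. `probably_available` is only a prediction, since the Product app remains the master of the stock data.

```python
class StockCache:
    """CRM-side copy of stock levels, kept eventually consistent by Events."""

    def __init__(self):
        self.levels = {}     # product id -> last known stock level
        self.last_seq = {}   # product id -> sequence number of that level

    def on_stock_changed(self, event):
        # Events may arrive late or out of order; keep only the newest per product.
        pid, seq = event["product_id"], event["seq"]
        if seq > self.last_seq.get(pid, -1):
            self.levels[pid] = event["level"]
            self.last_seq[pid] = seq

    def probably_available(self, product_id, qty):
        # A prediction only: the Product app may already hold a different level.
        return self.levels.get(product_id, 0) >= qty
```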

Still, if it does turn out the required product is not available, then additional processes and workarounds must be designed, and the potential customer must be placated in some way.

The company could solve this by informing the customer and offering a discount on the current or following orders as a compensation.

In other words: much can be done on the business side to solve problems not necessarily best solved by software solutions.

Generally speaking, the business transaction is implemented as some kind of workflow: a sequence of distinct database transactions.

And designers have to decide what to do if the business transaction fails at a second or subsequent database transaction.

The question then arises as to whether the logic of the business transaction is best implemented as:
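A business transaction implemented as a workflow of separate database transactions (sometimes called a saga) can be sketched as a list of (action, compensation) pairs. In this minimal Python sketch, `run_saga` is a hypothetical helper, and the step names are illustrative: when a later step fails, it runs the compensating transactions of the steps already completed, in reverse order.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; on failure, undo completed steps.

    Each action and each compensation stands for a separate local database
    transaction, possibly in a different app.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception as exc:
            # A later step failed: compensate earlier steps, most recent first.
            for undo in reversed(completed):
                undo()
            return False, exc
        completed.append(compensate)
    return True, None
```

For example, if "accept order" and "reserve credit" succeed but "allocate stock" fails, the sketch runs "release credit" and then "cancel order". It ignores failures of the compensations themselves, which a real design would also have to handle.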

· A set of rules distributed between components, with each app reacting to Events as they arrive, so that no component “knows” the whole transaction?

· A central workflow, a transaction coordinator or process controller, that orchestrates all the distributed components?

OO purists tend to favour the first approach, partly out of a feeling that software components should collaborate as humans can do, without overarching management.

But biological analogies are questionable, since software is designed to support end-to-end processes in the first place; and the second approach is respectable in theory and practice.

Example 2

In discussion, Bavo de Ridder provided this example:

“The Sabre system (which manages communication between airlines and related companies) is Event-based.

For example, Delta Airlines fires an Event that flight D-1234 has departed from Brussels to New York.

That Event must be processed by the cleaning and catering company in New York, so that the plane will be serviced after arrival.

Delta doesn’t want the responsibility of guaranteeing message delivery (as this would expose Delta to the knowledge of all Event subscribers).

On the other hand that Event must be handled by the catering and cleaning company.

Because if the Event is not processed, Delta’s business continuity is threatened.

Sabre solves this by having a Policy Manager that contains the business rules relating to Event handling.

The Policy Manager – applying a configured business rule - ensures the catering/cleaning company has acknowledged the message.

The pattern they use is called "VPEC"; sadly, there is little documentation about this app pattern.

Most information you find is about its business pattern counterpart, VPEC-T.

I have yet to see an Event-based architecture that does not have some business rules of this kind.

But those architectures rarely cater sufficiently for the rules.

At best, they assume that the reliability features of the messaging middleware take care of this.

In other words, things can and often do go wrong.”
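Because little documentation of VPEC is available, the following is not its implementation. It is only a hypothetical Python sketch of the kind of rule described above: a policy manager that records which subscribers must acknowledge which Event types, and reports acknowledgements that are overdue. The rule table, the deadline and all method names are assumptions; time is passed in explicitly to keep the sketch deterministic.

```python
class PolicyManager:
    """Hypothetical sketch: tracks which subscribers must acknowledge which Events."""

    def __init__(self, rules, deadline_seconds):
        self.rules = rules                 # event type -> subscribers that must ack
        self.deadline = deadline_seconds
        self.pending = {}                  # (event id, subscriber) -> due time

    def published(self, event_id, event_type, now):
        for subscriber in self.rules.get(event_type, ()):
            self.pending[(event_id, subscriber)] = now + self.deadline

    def acknowledged(self, event_id, subscriber):
        self.pending.pop((event_id, subscriber), None)

    def overdue(self, now):
        """Acknowledgements not received in time: business exceptions to act on."""
        return sorted(key for key, due in self.pending.items() if now > due)
```

For the Delta example, a rule such as `{"FlightDeparted": ["catering", "cleaning"]}` would flag the flight if either company failed to acknowledge within the deadline.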

Nobody needs reliable messaging

You can send a letter by registered post and be assured by the post office that they delivered the message to the address.

But this does not mean you know the delivered message was read by the person you sent it to.

Communication between parties over a network employs components that work at different levels of the communication stack.

A component at a higher level delegates details of message transport to components at lower levels.

A component at a lower level of the communication stack cannot do the job of a component in the level above it.

TCP ensures data packets arrive at a node on a network, but does not ensure messages (composed from data packets) arrive at apps.

Middleware products can ensure messages arrive at apps, but cannot ensure those messages are read by the humans who use those apps.

If there is a business rule that a message must arrive, then it belongs in the business logic of the business information systems.

You cannot relegate the responsibility to the messaging layer.
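Putting the rule in business logic typically means that the sender keeps a message outstanding until it gets a business-level reply, and resends if it does not, while the receiver recognises duplicates so that a resend is not processed twice. A minimal sketch, in which `Sender`, `Receiver` and the `transport` callable are hypothetical names and lost replies are simulated by the transport returning `None`:

```python
import uuid


class Receiver:
    """Processes each business message at most once, and always replies."""

    def __init__(self):
        self.processed = {}  # message id -> reply already given

    def handle(self, message):
        if message["id"] in self.processed:  # duplicate caused by a resend
            return self.processed[message["id"]]
        reply = {"id": message["id"], "status": "processed"}  # business work here
        self.processed[message["id"]] = reply
        return reply


class Sender:
    """Keeps each message outstanding until a business-level reply arrives."""

    def __init__(self, transport):
        self.transport = transport   # callable: message -> reply, or None if lost
        self.outstanding = {}

    def send(self, body):
        message = {"id": str(uuid.uuid4()), "body": body}
        self.outstanding[message["id"]] = message
        self.try_deliver(message)
        return message["id"]

    def try_deliver(self, message):
        reply = self.transport(message)
        if reply is not None:
            self.outstanding.pop(reply["id"], None)

    def resend_outstanding(self):
        for message in list(self.outstanding.values()):
            self.try_deliver(message)
```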

Read "Nobody needs reliable messaging" (http://www.infoq.com/articles/no-reliable-messaging) for further explanation.

But you do need a time out on request-reply queries and commands

A golden rule for building robust systems is to wrap any call to a system you are not 100% sure about in a time-out.

In practice, this applies to calling anything not in the same address space … even inter-thread communication.

The question is: who can and will respond to the time out happening?

In most (all?) cases this is the caller, not the messaging middleware.
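In Python, the standard `concurrent.futures` module offers one way to wrap a call in a time-out. In this sketch `slow_remote_call` and `call_with_timeout` are hypothetical names; `Future.result(timeout=...)` raises `concurrent.futures.TimeoutError`, and it is the caller that decides what to do next.

```python
import concurrent.futures
import time


def slow_remote_call(seconds):
    """Stand-in for a call to a system we are not 100% sure about."""
    time.sleep(seconds)
    return "reply"


def call_with_timeout(executor, func, *args, timeout):
    """Return (True, result), or (False, None) if no reply arrives in time."""
    future = executor.submit(func, *args)
    try:
        return True, future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        future.cancel()  # has no effect if the call is already running
        return False, None  # the caller must now retry, compensate or report
```

Note that the time-out frees the caller, not the remote system: a call that is already running carries on, which is one reason the caller's response to a time-out is a business decision.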

In the VPEC pattern, the policy manager can also respond to time outs.

I suspect that in VPEC the policy manager will absorb more and more business logic (read: all logic related to non-happy flows), turning it into a hard-to-manage monolith and a single point of failure.

But the reality is that Events must be handled; it’s never “fire and forget”.

There is little or no “fire and forget” in business systems

In practice, little or no communication is truly or completely "fire-and-forget".

Not least because failures in components that subscribe to an Event tend to rebound on the components that published that Event.

From a business point of view, when an Event is published by an app, it usually must be processed by the apps that subscribe to receive the Event.

This section describes some ways to send a Command or Event.

Request-reply

A client component can invoke a server component to perform a transaction using a synchronous request-reply mechanism.

The client establishes a connection to the server and waits for a reply.

The server replies with a success or failure message; the client resumes on reply.

Fire and forget

The sender sends a Command asynchronously, then gets on with its own business.

The sender assumes the Command arrives at the receiver (or does not care whether it does).

On receiving the Command, the receiver performs a transaction, but does not reply to the Command sender.

When you press the print button in a word processor, you expect it will place a message in a message queue for the printer.

Suppose the printer does not reply to the word processor.

The writers of the word processor may call that a fire-and-forget invocation.

But those writers understand you will or can get a response anyway.

You may see a response if the printer is not connected or is out of paper.

You can visit a shared data space (the print monitor) to see what is going on.

So pressing the print button is not really a fire-and-forget Command as far as you are concerned.

Fire and wait for acknowledgement

The sender sends a Command asynchronously, then waits for acknowledgement of receipt before proceeding with its own business.

On receiving the Command, the receiver replies to acknowledge receipt, but does not report the outcome of the Command.
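Fire and wait for acknowledgement can be sketched with the standard `queue` and `threading` modules. The receiver sets an acknowledgement flag as soon as it takes the Command off its queue, before processing it, so the sender learns of receipt but not of the outcome. The command text, the `upper()` call standing in for processing, and the `(None, None)` sentinel used to stop the receiver are illustrative assumptions.

```python
import queue
import threading

inbox = queue.Queue()


def receiver_loop(results):
    """Hypothetical receiver: acknowledges receipt, then processes the Command."""
    while True:
        command, ack = inbox.get()
        if command is None:  # sentinel: stop the receiver
            break
        ack.set()                        # acknowledge receipt only...
        results.append(command.upper())  # ...then process; outcome is not reported


def fire_and_wait_for_ack(command, timeout):
    """Send asynchronously; return True if receipt is acknowledged within timeout."""
    ack = threading.Event()
    inbox.put((command, ack))
    return ack.wait(timeout)
```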

Fire and be invoked later with a reply

The sender sends a command asynchronously, then gets on with its own business.

However, on sending the message, the sender also creates some kind of proxy, through which the receiver can reply.

Eventually, the receiver replies to the proxy, which in turn invokes the sender.
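In Python, a `concurrent.futures.Future` can play the part of the proxy: the sender submits the command, registers a callback with `add_done_callback`, and carries on; when the receiver's work completes, the future invokes the callback. A minimal sketch, with `receiver` and `Sender` as hypothetical names:

```python
from concurrent.futures import ThreadPoolExecutor


def receiver(command):
    """Stand-in for the receiver's transaction; its return value is the reply."""
    return f"done: {command}"


class Sender:
    def __init__(self):
        self.replies = []

    def on_reply(self, future):
        # Invoked later, via the proxy, once the receiver has replied.
        self.replies.append(future.result())

    def send(self, executor, command):
        proxy = executor.submit(receiver, command)  # the Future acts as the proxy
        proxy.add_done_callback(self.on_reply)
        # ...the sender gets on with its own business here...
```

The callback runs in whichever thread completes the future, so a real sender would need to make `on_reply` safe to call concurrently with its other work.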

Fire and look out for a relevant Event later

The sender sends a command asynchronously, then gets on with its own business.

However, after processing the message, the receiver publishes an Event to report the transaction outcome.

So, the sender can wait for the Event to arrive in some kind of in-tray, perhaps within a defined time span.
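The sender's side of this pattern can be sketched as waiting on an in-tray queue for the Event whose correlation id matches the command, giving up when the time span expires. The `correlation_id` field, the `OrderProcessed` Event and the receiver's rule are assumptions for illustration; the sender would start the command (here, a thread running `receiver`) and get on with its own business before calling `watch_in_tray`.

```python
import queue
import threading
import time


def receiver(command, in_tray):
    """Hypothetical receiver: processes the command, then publishes an outcome Event."""
    outcome = "succeeded" if command["qty"] > 0 else "failed"
    in_tray.put({"type": "OrderProcessed",
                 "correlation_id": command["id"], "outcome": outcome})


def watch_in_tray(in_tray, correlation_id, timeout):
    """Wait up to `timeout` seconds for the Event reporting this command's outcome."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None  # time span expired: the sender must follow up
        try:
            event = in_tray.get(timeout=remaining)
        except queue.Empty:
            return None
        if event.get("correlation_id") == correlation_id:
            return event
        # Events about other transactions are skipped (and discarded) in this sketch.
```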

Footnote: Creative Commons Attribution-No Derivative Works Licence 2.0 23/04/2015 10:57

Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.website” before the start and include this footnote at the end.

No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this page, not derivative works based upon it.

For more information about the licence, see http://creativecommons.org