CQRS and Event Sourcing

This paper is published under the terms and conditions in the footnote.


This paper outlines a number of contrasting principles and patterns.

I've tried to convey what might be called "OO" and/or "Agile" principles with my usual scepticism,

and to relate them to the tension between enterprise architecture and agile system development.


Command-Query responsibility segregation

Segregating data stores as well as processing components

Coordinating components

Smaller simpler components – more complex coordination

Event sourcing v. database transaction processing

Event sourcing

Database transaction processing

Notes on integrating the patterns above

CQRS and update/reporting data store separation

Domain-Driven Design and CQRS

Domain-Driven Design, CQRS and Event Sourcing

Appendix 1: some (reader supplied) code

Appendix 2: disputable (reader supplied) observations on Command v Event


Command-Query responsibility segregation

Greg Young and Udi Dahan (specialists in Event-Driven Architecture and Event Sourcing) took Meyer's Command Query Separation principle up to the level of application component interfaces.

They proposed that application component interfaces should contain only update operations, or only Query operations.

The idea is that splitting writes from reads should make systems faster, more stable, more testable and more maintainable.

So there are:

·         Application components that write state data: they change something without returning data (bar failure messages).

·         Application components that read data: they get something without updating data or causing other side-effects.


The CQRS pattern features three kinds of communication: Queries, Commands and Events.

Typically, a Command or Event is handled by a single component, and starts and commits one physical database transaction.


A Query is sent, usually by a UI component, to retrieve a snapshot of data.

There is usually a response, a Data Transfer Object (DTO) that maps directly to a UI view.






[Figure: a UI component sends a Query to a Data Component]



A Command is how a UI component requests action on data, like Place Order and Register Customer.






[Figure: a UI component sends a Command to an Update Component]


The server-side component features:

Command handlers: check Commands, retrieve data for transactions.

Command operations: perform transaction logic, store data and post events (e.g. that an order has been placed or a customer registered).


An Event is typically published after a transaction is complete to notify all listeners/subscribers of what has changed.

An Event is typically in the past tense: Order Placed, Customer Registered.
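As a sketch, the three kinds of message can be represented as simple immutable data structures. The Python below is purely illustrative (the paper implies no particular language); the class names follow the conventions just described: imperative for Commands, past tense for Events.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical message types for the three kinds of CQRS communication.

@dataclass(frozen=True)
class PlaceOrderCommand:      # imperative: a request that may be denied
    order_id: str
    product_ids: List[str]

@dataclass(frozen=True)
class OrderPlacedEvent:       # past tense: a fact that has already happened
    order_id: str

@dataclass(frozen=True)
class GetOrderQuery:          # side-effect-free request for a snapshot of data
    order_id: str
```

Making the messages immutable reflects that, once sent, a message is a record of a request or a fact, not something to be edited in flight.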








[Figure: an Update Component publishes an Event to an Event subscriber]


What kind of communication mechanism?

This can be seen as a logical (technology-independent) design pattern.

But it is often expected that Commands and Events will be transported by Message Queues or Message Bus.


Segregating data stores as well as processing components

Commands post updates to an Event store.

It contains a time-ordered log of events: the state changes made by successful transactions.

The state of an entity or aggregate - at any time - can be reconstructed from the event log.

Rolling snapshots save you from having to replay from the beginning of time.
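A minimal sketch of an event log with a rolling snapshot, assuming toy credit/debit events (the event tuple and snapshot formats here are my assumptions, not the paper's):

```python
# State at any time is a fold over the time-ordered event log.
def apply(balance, event):
    kind, amount = event
    return balance + amount if kind == "credit" else balance - amount

event_log = [("credit", 100), ("debit", 30), ("credit", 20)]

# Rebuild from the beginning of time...
state = 0
for event in event_log:
    state = apply(state, event)

# ...or from a rolling snapshot plus only the events recorded after it.
snapshot = {"balance": 70, "version": 2}   # state after the first two events
state_from_snapshot = snapshot["balance"]
for event in event_log[snapshot["version"]:]:
    state_from_snapshot = apply(state_from_snapshot, event)

assert state == state_from_snapshot == 90
```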


Queries are directed to a data store called a View Model or Read store.

It stores current entity state data; new states overwrite old states.

It may be de-normalised to match UI views, and cached on the Query server.

It is updated by Events published by Command operations.
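For illustration, a read-store projection might look like the sketch below: an event subscriber folds published Events into a denormalised view that Queries can read directly. The event shapes and names are invented for the example.

```python
read_store = {}   # current state only; new states overwrite old states

# Event subscribers that keep the view model up to date.
def on_customer_registered(event):
    read_store[event["customer_id"]] = {"name": event["name"], "orders": []}

def on_order_placed(event):
    read_store[event["customer_id"]]["orders"].append(event["order_id"])

on_customer_registered({"customer_id": "c1", "name": "Ada"})
on_order_placed({"customer_id": "c1", "order_id": "o1"})

# A Query now reads the pre-joined view directly; no joins at query time.
assert read_store["c1"] == {"name": "Ada", "orders": ["o1"]}
```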


What kind of data store?

Again, this can be seen as a logical (technology-independent) design pattern.

Read stores and Event stores could be relational or any other kind of database.

However, an Event store could well be a message queue or a NoSQL database.


Coordinating components

Events (once posted) can be received and processed by any other component.

A Query component simply consumes Events.

A Command component may post Events that trigger business logic in another update component, which publishes other Events as a result, and so on.










[Figure: Update Component 1 publishes Event 1, which is consumed by a Reporting Component and by Update Component 2; Update Component 2 publishes Event 2, which is consumed by Update Component 3]

Suppose a Command triggers a Customer component to post an Event which is processed by Product and Accounting components.


Does the Customer component need to know which other components read and process that Event?

No. But the business cares about this, and system architects should know.


Does the Customer component need to know if the Event has been processed by any other component?

Perhaps. The business needs to know, and the system architects have to think about why and how two or more components are coordinated by Events.


If the Product component posts an OrderRejected Event, the Customer must process that Event.

So, the customer component does “care” about what happens after it posts the OrderPlaced Event.

It must know how to recognise the undo Event and perform the appropriate compensating transaction.


The designer may coordinate the components by designing inter-component communication, or an overarching control procedure, workflow, or saga.

A saga can be introduced to coordinate applications involved in a long-running process or logical transaction.

In our example, the saga for one logical business transaction links several physical database transactions.
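A toy saga, under invented names, showing how one logical transaction can span two physical transactions, with a compensating transaction restoring consistency when the second step fails:

```python
def place_order(state):        # physical transaction 1
    state["order"] = "placed"

def cancel_order(state):       # compensating transaction for step 1
    state["order"] = "cancelled"

def reserve_stock(state):      # physical transaction 2 (may fail)
    if state.get("stock", 0) < 1:
        raise RuntimeError("out of stock")
    state["stock"] -= 1

def order_saga(state):
    """One logical transaction linking two physical transactions."""
    place_order(state)
    try:
        reserve_stock(state)
    except RuntimeError:
        cancel_order(state)    # undo, rather than roll back, step 1

state = {"stock": 0}
order_saga(state)
assert state["order"] == "cancelled"   # eventual consistency via compensation
```

Note that step 1 is not rolled back in the database sense; it has already committed, so it can only be undone by a further transaction.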


Smaller simpler components – more complex coordination

The smaller and simpler the queries and update transactions, the more complex the coordination.

Complexity appears in coordinating physical transactions to implement the logical transaction.

And in adding compensating transactions to achieve eventual consistency at the end of the logical transaction.


Inconsistency between systems makes work for developers.

Remember what Google say on eventual consistency:

"Designing applications to cope with concurrency anomalies in their data is very error prone, time-consuming, and ultimately not worth the performance gains."

"Developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date. We think this is an unacceptable burden to place on developers and that consistency problems should be solved at the database level."

"F1: A Distributed SQL Database That Scales", Proceedings of the VLDB Endowment, Vol. 6, No. 11, 2013.


CQRS is a pattern for very high availability, very high throughput - huge concurrency.  Surely, most apps are not like that?

And if one data store can support both Command and Query processing, why not?

Event sourcing v. database transaction processing

Entities are things that persist, with continuity of identity (e.g. a bank account with a current balance).

Events are things that happen, which affect persistent entities (e.g. credit and debit transactions).


However, the distinction is blurred, since events can be identified and remembered, just as entities are.

And the current state data of an entity can be seen as a side-effect of an event stream that started when the entity was born.

Event sourcing

The following is partly edited from Martin Fowler’s web site: http://martinfowler.com/eaaDev/EventSourcing.html

The basic idea of event sourcing is that every change to the state of an application is recorded in an event object.

Event objects are stored in the sequence they were applied - for the lifetime of the application state.


The current state of entities (things that persist) can be recovered from the event log (rather than from a conventional database).

In the Model-View-Controller pattern, the Views may retrieve data from the Event log using Query messages.


The key to event sourcing is that all changes to domain objects are initiated by event objects.

A number of facilities can be built on top of the event log:

·         Complete Rebuild: discard the application state completely and rebuild it by re-running the events from the event log on an empty application.

·         Temporal Query: determine the application state at any point in time. Notionally we do this by starting with a blank state and rerunning the events up to a particular time or event.

·         Event Replay: If a past event was incorrect or missing, reverse to before that event and then replay the new event and later events.
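The three facilities can be sketched as replays over a time-stamped event log; the tuple format (time, kind, amount) is an assumption made for the example:

```python
events = [(1, "credit", 100), (2, "debit", 30), (3, "credit", 50)]

def rebuild(log):
    """Complete Rebuild: rerun every event on an empty application state."""
    balance = 0
    for _, kind, amount in log:
        balance += amount if kind == "credit" else -amount
    return balance

def state_at(log, t):
    """Temporal Query: replay only the events up to time t."""
    return rebuild([e for e in log if e[0] <= t])

def replay_from(log, t, corrected_events):
    """Event Replay: keep events before time t, then apply corrected ones."""
    return rebuild([e for e in log if e[0] < t] + corrected_events)

assert rebuild(events) == 120
assert state_at(events, 2) == 70
assert replay_from(events, 2, [(2, "debit", 40), (3, "credit", 50)]) == 110
```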


What about the wider business?

Event replay works for one application in isolation, but what if this application sends/receives data to/from other applications?

If you replay old events in this application, you don’t want to update other applications a second time, or collect data that has changed since you first collected it.

So, replaying events may involve disabling the sending of events to other applications, and relying on data that was stored when the events first happened.


Database transaction processing

A conventional database schema can be extended to store the events (e.g. debits and credits) that affect the persistent entities (e.g. bank accounts).

And note that a database management system records a transaction log.

A transaction log is not the same as an event log, but it does usually support the following operations:

·         Recovery of individual transactions.

·         Rolling a restored database forward to a given point.

·         Transaction replication, database mirroring, and log shipping.

Notes on integrating the patterns above

There are relationships between the three patterns above.

But note you don’t need Domain-Driven Design or CQRS to separate database transactions from database queries, to publish Events, or to use Event Sourcing.

CQRS and update/reporting data store separation

These work together because the CQRS pattern separates Command and Query application components.

This suits separation of the update and reporting data stores, if required.

Domain-Driven Design and CQRS

These work together because both separate the processing of update Commands and Queries.

Queries can be processed in the simplest and most efficient way, say by executing stored procedures on the data store through a thin API layer.

Commands trigger update transactions which act on the data of a Domain Model aggregate - retrieved from the same or different data store.

A Command handler passes a Command to a Command operation on the root entity of an aggregate.

That root entity operation validates the Command and applies it to data within the aggregate.


Commands/transactions and aggregates do not align themselves by accident.

Aggregates are scoped with update transactions in mind, so that most transactions access only data contained inside one aggregate.

Domain-Driven Design, CQRS and Event Sourcing

If you combine DDD, CQRS and Event Sourcing, then:

·         the data store for Queries may be called the Read store

·         the data store for Commands is an Event store.


Event store is a logical name here – it implies a particular kind of logical data model

The data storage technology is whatever you choose, be it relational or non-relational.


Commands are applied to Domain Model data, which must be retrieved from the Event store.

After a Command (say, Debit Account) has been applied to a Domain Model aggregate, the root entity saves one or more Events (say, Account Debited) in the Event store.

(Although it's usually not the aggregate but a repository that is responsible for serializing and de-serializing the aggregate to and from the underlying data stores that 'saves the events'.)


Before a Command can be applied to an aggregate entity in a Domain Model, there are two things to do.


First, assemble the data of the aggregate entity on which the Command will act.

How? There are at least three options.


1.      Hold current state data in the Event store

This turns the Event store into a conventional database (with tables for Accounts, Debits and Credits).


2.      Send a Query to the Read store

This breaks the segregation principle (and there is a risk that Event and Read stores are inconsistent).


3.      Build the current system state (e.g. current account balance) by replaying Events (e.g. credits and debits).

This may sound impractical, but in code it's not really difficult to implement.

The basic idea is that a fresh instance of the aggregate is created in memory.

All events of that aggregate are retrieved from the event store, deserialized, and then re-applied to the aggregate to build its current state in memory.

Then, a method that contains the logic of the Command is executed on the aggregate, so that business rules are validated and changes to the aggregate are made (by raising new events).

Then, the new events are appended to the event store.


To optimize 3, you could store snapshots of the aggregate's state, say once every 20 or 50 events.

Then you only have to retrieve and deserialize the latest snapshot of the aggregate and its 20 to 50 latest events to rebuild its current state in memory.

This is basically a hybrid of options 1 and 3.
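Option 3 with the snapshot optimisation might be sketched as below; the snapshot format, interval and all names are assumptions made for illustration:

```python
class Account:
    """A toy aggregate rebuilt in memory from its event stream."""
    def __init__(self, balance=0, version=0):
        self.balance, self.version = balance, version

    def apply(self, event):
        kind, amount = event
        self.balance += amount if kind == "credit" else -amount
        self.version += 1

def load(event_stream, snapshots):
    # Start from the latest snapshot (balance, version), if any...
    balance, version = snapshots[-1] if snapshots else (0, 0)
    account = Account(balance, version)
    # ...then re-apply only the events recorded after that snapshot.
    for event in event_stream[account.version:]:
        account.apply(event)
    return account

event_stream = [("credit", 5)] * 25
snapshots = [(100, 20)]   # a snapshot was taken after the first 20 events
account = load(event_stream, snapshots)
assert account.balance == 125 and account.version == 25
```

Only the 5 events after the snapshot are deserialized and re-applied, rather than all 25.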


Second, test any preconditions for the Command

A Command (say Debit) usually has to check that the system data is in a valid state for that Command.

Some preconditions test attribute values (e.g. does the Account hold enough money for the Debit?).

Others are known as referential integrity tests (e.g. has the Account been deleted?).
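A sketch of the two kinds of precondition for a hypothetical Debit command (the names, data shapes and rules are invented for the example):

```python
accounts = {"acc-1": {"balance": 50, "deleted": False}}

def debit(account_id, amount):
    account = accounts.get(account_id)
    if account is None or account["deleted"]:
        raise KeyError("account does not exist")   # referential integrity test
    if account["balance"] < amount:
        raise ValueError("insufficient funds")     # attribute-value test
    account["balance"] -= amount

debit("acc-1", 30)
assert accounts["acc-1"]["balance"] == 20

try:
    debit("acc-1", 100)
except ValueError:
    pass   # the Command is rejected: its precondition failed
```

Running these tests on the server side, as part of processing the Command, is what the text below calls in-command validation.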


Again there are at least three validation options:

1.      No validation (which results in data that is inconsistent with rules, though eventual consistency may be achievable later)

2.      Pre-command validation (which leaves a small, perhaps acceptable, risk of inconsistency)

3.      In-command validation (which minimises if not eliminates inconsistency)


Pre-command validation tests could be done before or when a Debit Command is posted by a UI Component.

But then, other Commands could change the Account state before this Debit is processed.

And applying the defensive design principle, the tests should be made again on the server side.


In general, at least some in-command validation is needed.

So in general, testing the state of the system is an integral part of processing a Command.



You don’t need Domain-Driven Design or CQRS to separate database transactions from database queries, to publish Events, or to use Event Sourcing.

Transaction scripts can equally well publish Events (for others to consume) and log Events (for subsequent Query and replay).

Appendix 1: some (reader supplied) code



To prevent broken or illegal commands from being processed, you typically validate a Command before you process it (that is, at the server side, where the Commands are received).

Validation means that you perform simple checks on the content of a Command, like checking whether or not all required data is present and whether or not this data conforms to simple constraints (similar to basic integrity checks on a database column).

These checks should not require any sort of context or other kind of external data to run - they should be executable everywhere.

This allows clients (UI) to validate a message before it is sent to the server and give the user early feedback that, for example, a form with required fields has not yet been completely filled in. It is also a performance improvement, since it prevents invalid data from being sent to the server.

This does, however, mean that valid commands will be validated for correctness both in the client and on the server, but since validation-checks are usually very quick and simple, this rarely is an issue.


The second level of 'validation' concerns the system's state, which is typically maintained and managed by the aggregates we talked about earlier.

This kind of validation can only be executed by actually processing the command and seeing where you end up, asserting data and business rules as you go.

For example, when a new order is placed by the following PlaceOrderCommand:


class PlaceOrderCommand {

    public readonly Guid CustomerId;

    public readonly Guid OrderId;

    public readonly List<Guid> ProductIds;
}



then the first thing that is acquired is the Customer-aggregate, using the specified ID (constructor omitted for simplicity):


public class PlaceOrderCommandHandler {

    private readonly ICustomerRepository _customers;

    public void Handle(PlaceOrderCommand command) {

        var customer = _customers.GetCustomerById(command.CustomerId);
    }
}



If, for any reason, no customer with the specified ID exists, the repository will throw an Exception (implicitly stopping the Command from being executed, and causing the current transaction to be rolled back).

Otherwise, it will return the requested Customer-object.

This Customer-aggregate can then be used to place (create) a new Order-aggregate with the specified products selected:


public class PlaceOrderCommandHandler {

     private readonly ICustomerRepository _customers;

     public void Handle(PlaceOrderCommand command) {

         var customer = _customers.GetCustomerById(command.CustomerId);

         var order = customer.PlaceOrder(command.OrderId, command.ProductIds);
     }
}




The PlaceOrder-method could now check, for example, if this Customer is allowed to place new orders.

There could be a Business Rule, for example, that says that Customers can only place orders when their credit has been checked and asserted.

In the case this Customer is allowed to place new orders, the Customer-object will create a new Order-aggregate using the specified ID and Products.

Finally, this order could be saved into its own repository:


public class PlaceOrderCommandHandler {

     private readonly ICustomerRepository _customers;

     private readonly IOrderRepository _orders;

     public void Handle(PlaceOrderCommand command) {

         var customer = _customers.GetCustomerById(command.CustomerId);

         var order = customer.PlaceOrder(command.OrderId, command.ProductIds);

         _orders.Add(order);   // save the new Order-aggregate (repository method name assumed)
     }
}




After the code completes, the OrderRepository-implementation will flush its changes to the database/event store (containing the new order) and the transaction will commit.

Again, it is not strictly necessary to use an event store for this.

In fact, event stores are rarely a feasible option once all non-functional requirements are considered: they bring great advantages but also introduce extra complexity into the system (code, maintenance, fewer available query capabilities, and thus the need for a read model or other queryable data store).


Appendix 2: disputable (reader supplied) observations on Command v Event

Every message is either a Query, a Command, or an Event - to every sender and receiver.

A component can distinguish between message types by name (PlaceOrderCommand, OrderPlacedEvent, GetOrderDataRequest) and then handle them accordingly.


On a technical level, messages are nothing more than simple data-structures that are typically serialized and deserialized to get them across a network.

The type of the message (Command, Event, request or reply) is deduced from its name.

In OO-languages such as C# and Java, the name of a message is identical to its runtime type (i.e., message 'PlaceOrderCommand' maps to a PlaceOrderCommand class).

The code inside the server-component will then automatically 'know' how to process a message of type PlaceOrderCommand.

In fact, in .NET and Java many frameworks (WCF, MVC, NServiceBus) use reflection on message types to route the message to the appropriate handlers (a method that accepts an instance of that specific message).
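A dynamic language can sketch the same idea without reflection-based frameworks: register one handler per message class and look it up from the runtime type of each incoming message. All names below are invented for the example.

```python
class PlaceOrderCommand:
    def __init__(self, order_id):
        self.order_id = order_id

handlers = {}   # message class -> handler function

def handles(message_type):
    """Decorator that registers a handler for one message type."""
    def register(fn):
        handlers[message_type] = fn
        return fn
    return register

@handles(PlaceOrderCommand)
def handle_place_order(command):
    return f"placing order {command.order_id}"

def dispatch(message):
    # Route on the runtime type, as the reflection-based frameworks do.
    return handlers[type(message)](message)

assert dispatch(PlaceOrderCommand("o1")) == "placing order o1"
```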


Technically speaking, a component only distinguishes between one-way messages (no return value) and two-way messages (with a return value).

Since Commands and Events are both one-way messages, the receiving component doesn't have to be aware which is which.

However, functionally speaking, they are handled differently:

·         Commands are requests that can be denied based on certain criteria (invalidation, security, business rules), whereas

·         Events can only be discarded or ignored.


That makes a big difference in terms of design.

The publisher of an Event does not care if other components react to the Event (it doesn't depend on it).

So basically, no one has to explicitly report whether it successfully handled an Event or not.


Footnote: Creative Commons Attribution-No Derivative Works Licence 2.0           24/04/2015 10:24

Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.website” before the start and include this footnote at the end.

No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this page, not derivative works based upon it.

For more information about the licence, see  http://creativecommons.org