Agile 7 – Commentary on “Agile Architecture in the Digital Age”
Copyright Graham Berrisford. One of several hundred papers at http://avancier.website. Last updated 04/04/2019 17:24
This is a supplement to a series of mostly short papers.
Some comments made below presume the ideas in Agile papers 5 and 6 are understood.
This paper is a commentary on “Agile Architecture in the Digital Age”, a white paper published by The Open Group at www.opengroup.org.
The quotes in blue below are from that paper.
The effectiveness of agile processes is too often jeopardized because the architecture and organizational pre-requisites of agility are neglected.
Since the 1990s, development teams have expressed frustration at things that prevent them using agile methods.
Some frustrations are reasonable, others are naive.
The fact is, some agile processes do not suit some development projects.
Projects should be assessed for suitability before they start (we have a nine-point score chart).
This White Paper proposes a new Architecture Framework (AAF), that meets the needs of the digital enterprise.
It is debatable whether the white paper is about “agile architecture”, or what that means.
It is certainly about design patterns that facilitate the “scaling up” of agile software development, particularly in the business context of a digital enterprise.
It [the white paper] develops a vision that combines in a unique manner:
• Methods for decomposing the system, and the organization that designs it, into loosely-coupled services and autonomous teams
Here “services” are application components or subsystems (not discretely requestable behaviours as in TOGAF and ArchiMate).
These subsystems are loosely-coupled so they can work relatively independently.
But if teams and subsystems were wholly autonomous, there would be no enterprise architecture at all.
They must be integrated to support the wider data integrity and reporting requirements of a business.
And to support the longer, higher level, business processes of the enterprise.
• Alignment mechanisms rooted in business strategy that promote a shared culture
that becomes the glue that keeps empowered organizations from falling apart
The white paper draws on some questionable socio-cultural thinking and analogies.
Intervening in a disorderly (aka complex) situation is one thing; building an orderly (aka complex) system is another.
• Architecture patterns that leverage the latest software innovations
in distributed computing, autonomous systems, data streaming, and artificial intelligence
Some/many of the principles and patterns are decades old.
• Validated learnings from very large enterprises that have started their agile-at-scale journey a few years ago
Most learnings come from FANGS. See discussion of FANGS above.
We observe that current architecture practices and skills come under scrutiny because they are typically anti-patterns of a lean and agile culture.
They are too often perceived to stand in the way of iterative development, Minimum Viable Products (MVPs), and collaboration.
Some of what follows seems a reaction against rigid top-down EA practices of (for example) large banks.
But some of those practices may be needed, e.g. to mitigate the risk of failing to process money properly.
And most of the organisations whose architects attend our training courses have never had such rigid EA practices.
The Architect today will pull on years of tradition, but will have to operate in a different way,
creating new artifacts, learning new skills, and working as members of cross-functional teams.
Architecture needs to create usable assets that resonate with engineering and operations teams.
This White Paper aims to address these issues and start to lay out the new architecture framework that the digital enterprise needs.
A recent McKinsey survey 1 shows that organizational agility is on the rise: “the need for companies to demonstrate agility is top of mind”.
Some agile development gurus side with sociologists in recommending new business organisation structures.
However, the organisation structure is normally the responsibility of business directors and senior managers.
It is unclear how far enterprise, solution and software architects can influence this.
In any case, most of the guidance in the white paper is written from a software development organisation perspective.
Much appears to presume bespoke code rather than COTS packages.
An increasing number of large firms are deploying agile-at-scale frameworks, such as:
- the Scaled Agile Framework® (SAFe®),2
- Large-Scale Scrum (LeSS™),3 or
- the Spotify® Model. 4
Agile software development methods were initially written for use by a single team.
Agile architecture frameworks usually include a selection of regular agile software development principles.
What is meant by architecture, and agile architecture, is not always clear.
Does agile mean the architecture is flexible? Or one architecture will facilitate all agile software development?
And what does scaling up mean?
It sometimes means scaling up a system to handle extreme volumes of transactions/operations.
But it usually means widening the scope of agile methods to several teams working on related systems/subsystems.
Read this paper for some commentary on SAFe.
A paper published in IEEE Software 5 claims that process improvement alone cannot fix the root causes of poor agility:
“Agile practitioners have focused intensely on improving software development processes and not so much on technical health …
We’ve worked with several large organizations in which the application of lean principles produced underwhelming results …
This is because velocity measurement, planning poker, attacking defect backlogs, Kanban cards, pair programming,
or sprint-based planning do little to attack the root cause of problems that are inherently structural.”
Experiences from the field confirm the aforementioned: root causes are not limited to technical health,
they also originate in the organization and culture of the enterprise.
The verbatims below illustrate this:
The quotes below express frustration with EA practices and deprecate them.
Some is frustration at what seems bad EA practice.
Some suggests agile teams can or should be released from oversight and governance.
Some might be read to undermine the whole idea of EA.
• “Legacy methods tend to slow down the initial sprint … Subsequent sprints often change architecture models defined in previous sprints.”
Even agile gurus presume at least a little design up front.
Good EAs don’t prescribe software architecture down to a level likely to be volatile.
They do accept the need to maintain their higher level models in line with changes.
• “If we’re going to have to do a heavy architecture which plans for a year or two or five years into the future
on every one of those experiments, we’re in serious trouble. We cannot be agile.”
Good EAs do not preclude innovative and experimental projects.
They give waivers from enterprise-wide standards for those purposes.
• “... We want to be able to put in the smallest, simplest, minimum viable experiment, prove an assumption,
beef it up if we want to or follow it wherever it goes, pivot and follow it wherever it goes.
That means our entire architecture is going to be emergent based on where we want to go.”
“Our entire architecture” appears to mean the architecture of a single application, rather than enterprise architecture.
“... Ideally that pool of seniors in your team act as a kind of proxy architecture committee,
and we don’t have to go to someone who’s supposedly got the title sitting in an ivory tower,
and has never actually built that thing in the last ten years because they’ve been thinking high-level.”
That is certainly reasonable for low level software architecture decisions.
EA is supposed to be about core business processes and application portfolio-level management.
• “At an application level I think that those architects are a waste of time. I really don’t think they know what they’re talking about nowadays.”
There ought to be a two-way collaboration, educating the EA if need be.
Development teams should explain software architecture decisions, at least those that have an impact at the application portfolio-level.
Good EA has to be pragmatic and flexible – principles are guidelines to be used by experienced architects – not rules.
It requires collaborative behaviour towards a target that may be continually re-factored.
• “Though the architecture discipline is needed, the architect’s role as a squad member is ill-defined …
Squad architects must attend all agile ceremonies, otherwise they are at risk of becoming marginalized.”
Using role titles as in our training courses, it is probably solution architects (rather than EAs) who ought to attend agile ceremonies.
If decisions made there have an impact on EA road maps, then somebody in the team should report that.
It is our firm belief that architecture (the thing) and architecting (the verb) cannot be treated separately.
Put differently, in an agile context technical health and the process dimension go hand-in-hand.
Generally, an architecture is a product of architecting processes.
Here, what is the architecture? The software code? Higher-level design documentation? A set of general principles and patterns?
Enterprises too often have neglected the architecture, organizational, and cultural pre-requisites of agility
(as the primary focus of agile transformations has been the process dimension.)
This is written as though enabling agile software development is the enterprise goal.
The truth is – in some projects, the scope for agile development is limited.
And giving software development teams their head can create the problems EA was invented to solve.
Generally, we see that:
• Enterprise Architects should (re)focus their attention on modularizing monolithic systems, because it is the number one pre-condition for agility
Have enterprise architects stopped discussing this with solution/software architects?
• Existing architecture practices and roles need to evolve to remain relevant in an organization that adopts agile ways of working
Yes to some extent, though TOGAF (e.g.) is already so iterative and flexible it can be applied in an agile way.
• The body of knowledge of architects needs to be completed to meet the needs of the digital enterprise.
Does your enterprise have the needs of a digital enterprise? See discussion of FANGS above.
• Classical architecture governance models lose relevance when shifting from large programs toward multiple autonomous teams.
Perhaps, though the teams in a program (given they share a goal and don’t want to duplicate each other) cannot be wholly autonomous.
This White Paper formulates a vision that has the ambition of solving these problems.
It is based on the diagnostic below:
• When teams are not autonomous enough, it slows down continuous delivery which limits agility
Yes, though autonomous enough varies according to the business need for system integration and integrity.
• To avoid chaos, team autonomy must be balanced by alignment mechanisms that cannot rely on a command-and-control culture that otherwise would get in the way of autonomy.
Yes, though the top-down command and control view of EA is out of date.
• New software architecture patterns deeply influence the evolution of Enterprise Architecture.
To be discussed.
• The digital enterprise needs a new architecture body of knowledge, new processes, and governance practices; architecture roles need to be redefined.
Is your enterprise a digital enterprise? See discussion of FANGS above.
Autonomy and Loose-Coupling
The 2017 State of DevOps Report 6 measured coupling between services and components by capturing whether:
• Respondents could do testing without requiring an integrated environment
• Applications and services could be deployed or released independently of other applications and services on which they depend
They discovered that high-performing teams were more likely to have loosely-coupled architectures than medium and low-performing teams.
That seems self-evident.
Obviously, the more isolated a system is, the easier the system is to develop and maintain.
The more inter-system coupling there is, the harder the systems are to develop and maintain.
One question is whether the coupling is avoidable or not – a distinction not drawn in the white paper.
Another is whether the coupling is logical or physical – a distinction not drawn in the white paper.
Our courses cover a dozen ways to be tightly or loosely coupled – which are significant here?
The 2017 DevOps report confirmed this and verified two new hypotheses:
• Teams that can decide which tools they use do better at continuous delivery – contrast [only tools] mandated by a central group
• In teams with strong IT and organizational performance, the architecture of the system is designed so delivery teams can test, deploy, and change their systems without depending on other teams for additional work, resources, or approvals, and with less back-and-forth communication.
Obviously, for productivity, software teams and subsystems should be relatively decoupled/autonomous.
But physical decoupling is not logical decoupling, and good EA should optimise the degree of decoupling.
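The first coupling measure quoted above (testing without an integrated environment) can be illustrated with a minimal sketch. The names OrderService, PaymentGateway and StubGateway are hypothetical, invented here for illustration; they do not come from the white paper or the DevOps report.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """Hypothetical interface that the order component depends on."""

    def charge(self, account: str, amount: int) -> bool: ...


class OrderService:
    """Component under test; it knows the interface, not the provider."""

    def __init__(self, gateway: PaymentGateway) -> None:
        self._gateway = gateway

    def place_order(self, account: str, amount: int) -> str:
        approved = self._gateway.charge(account, amount)
        return "CONFIRMED" if approved else "REJECTED"


class StubGateway:
    """Test double standing in for the real (remote) payments subsystem."""

    def __init__(self, approve: bool) -> None:
        self.approve = approve
        self.calls = []  # records each (account, amount) request

    def charge(self, account: str, amount: int) -> bool:
        self.calls.append((account, amount))
        return self.approve


if __name__ == "__main__":
    stub = StubGateway(approve=True)
    print(OrderService(stub).place_order("ACC-1", 25))
```

Note the stub removes only the physical dependency at test time. The order component still depends logically on what a payment approval means, which is the distinction drawn above.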
Development team organisation
Academic research by MacCormack et al.8 demonstrates that a relationship exists between
the structure of an organization and the design of the products that the organization produces.
A natural experiment shows that loosely-coupled organizations develop more modular designs than tightly-coupled organizations.
Obviously so; but the optimal degree of coupling is determined by the business context.
The key takeaway is that to architect a loosely-coupled system, it is important to pay attention to the organization that will produce it.
Because the two are congruent, the reverse Conway law 9 suggests that the design of the architecture should influence the design of the organization.
That relationship being established, we will explore how to decompose a software system and the organization that will produce it.
Good EA recognises the software dev/ops organisation structure should reflect the software system structure - and it usually does.
Modularisation into software layers
Traditionally, architecture has focused on layering software systems based on technology concerns such as data access, business logic, application logic, or presentation logic.
Actually, a three-layer software architecture preceded the introduction of two-, three- and four-tier client-server technologies.
(I know because I was teaching it in 1979.)
It was introduced for logical reasons, to separate concerns.
First, to separate the processing of user interface (UI) data structures from the processing of persistent data store structures
Given such a two-layer architecture, data-centric business rules are better applied in the data store layer than in the UI layer.
And for various reasons, those business rules may even better be separated out into a layer of code sitting above the data store.
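The separation of concerns described above might be sketched as follows. This is an illustration only: the class and function names (DataStore, AccountRules, handle_withdraw_form) and the account example are assumptions made here, not taken from any cited source.

```python
class DataStore:
    """Data layer: persistent data structures (an in-memory dict here)."""

    def __init__(self):
        self._balances = {}

    def read_balance(self, account_id):
        return self._balances.get(account_id, 0)

    def write_balance(self, account_id, amount):
        self._balances[account_id] = amount


class AccountRules:
    """Business rules layer: data-centric rules sitting above the data store."""

    def __init__(self, store):
        self._store = store

    def withdraw(self, account_id, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        balance = self._store.read_balance(account_id)
        if amount > balance:
            raise ValueError("insufficient funds")
        self._store.write_balance(account_id, balance - amount)
        return balance - amount


def handle_withdraw_form(rules, form):
    """UI layer: parses the UI data structure and formats the reply.

    It applies no business rules of its own.
    """
    try:
        new_balance = rules.withdraw(form["account"], int(form["amount"]))
    except ValueError as err:
        return f"Error: {err}"
    return f"New balance: {new_balance}"
```

Because the rule about insufficient funds lives in one layer above the data store, a second UI (say, a batch input file) would get the same rule without duplicating it.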
The main benefits are:
• Standardization which could limit technology risks and leverage economies of skills
• A form of modularity because changes in one layer do not impact layers below
Layering has created a generation of architects motivated by learning technologies to increase their market value, and less interested in learning the domain.
The proposal is that software teams should develop vertical slices of an application rather than horizontal layers.
OK, though that doesn’t rule out some team members wanting to specialise in either client-side or server-side programming.
In his seminal book,10 Eric Evans claims that the domain is the main source of software complexity.
In 2002, Martin Fowler observed that Domain-Driven Design is difficult to learn and best reserved for complex systems with “rich domain models”.
And in January 2017, Wikipedia said: “Microsoft recommends that [domain-driven design] be applied only to complex domains.”
(Where I guess rich probably implies substantial use of inheritance.)
Some domains are complex; some are not.
Creating a complex domain model creates a second kind of complexity.
That is, the structure clash between domain layers and data store layers.
He [Evans] has developed a method, Domain-Driven Design (DDD), to address it.
The white paper goes on to promote DDD, though perhaps only at a superficial level.
Domain-Driven Design (DDD) decomposes the domain into sub-domains and contexts.
If done well, the resulting domain architecture defines a set of loosely-coupled services [think subsystems].
DDD focuses the attention on the vertical decomposition of the system.
Before DDD, there was business-component-based design.
Systems were partitioned (aka vertically decomposed) by dividing a persistent data structure so as to separate loosely-coupled “kernel entities”.
The system is decomposed into services that [each] encapsulate a set of homogenous [cohesive?] capabilities.
This usually means the service is a subsystem that acts on a cohesive subset of the persistent data structure.
The same idea underpinned business-component-based design before DDD, and today, it underpins microservices.
The modularity rule applies: cohesion is high within a service, inter-service coupling is low, and implementation details are hidden behind APIs.
Good EA recognises this rule was advanced by Larry Constantine in 1968.
It also recognises that the rule applies with different force at different levels of service (subsystem) granularity.
Each service owns its persistence mechanisms and exposes its functions and features through well-defined interfaces.
Inter-service communication occurs through synchronous or asynchronous interactions.
When communication is asynchronous, messages or events link services through protocols such as the publish-and-subscribe one.
Note: inter-component communication can be synchronous.
Pub-sub and event-driven messaging are not mandatory.
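The two interaction styles can be contrasted in a minimal in-process sketch. The Broker and StockService classes and the topic name are hypothetical; a real system would use a messaging product rather than a Python queue.

```python
from collections import defaultdict, deque


class Broker:
    """Hypothetical in-process broker; delivery is deferred, not immediate."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher returns at once and does not know who is listening.
        self._queue.append((topic, event))

    def deliver_pending(self):
        """Deliver queued events; return the number of handler calls made."""
        delivered = 0
        while self._queue:
            topic, event = self._queue.popleft()
            for handler in self._subscribers[topic]:
                handler(event)
                delivered += 1
        return delivered


class StockService:
    """Hypothetical subsystem reachable in both styles."""

    def __init__(self):
        self.reserved = []

    def reserve(self, order_id):
        # Synchronous request-reply: the caller waits for the answer.
        self.reserved.append(order_id)
        return True

    def on_order_placed(self, event):
        # Asynchronous: invoked later, when the broker delivers the event.
        self.reserved.append(event["order_id"])
```

Either way the stock subsystem does the same work. The choice affects when the caller learns the outcome, not whether the two subsystems depend on each other's data.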
No communication is allowed through database sharing, shared libraries, or other mechanisms.
The choice of tools and development stack is not constrained, which has two upsides:
• Innovation is not slowed down by technology standards that are likely to become obsolete over time
• Teams that can decide which tools they use do better at continuous delivery11
Note: no shared libraries. Does this mean no common subroutines?
A domain-oriented microservice can be seen as a subroutine.
The various clients of that subroutine do not communicate via it directly.
But they may communicate by means of data maintained in its supposedly private data store.
The filtering of this communication via a higher level API does not remove the logical coupling.
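A small sketch of the point about logical coupling, under stated assumptions: CustomerService, CreditControlClient and OrderEntryClient are hypothetical names, and the credit limit rule is invented for illustration.

```python
class CustomerService:
    """Hypothetical microservice; its data store is private behind an API."""

    def __init__(self):
        self._credit_limits = {}  # the supposedly private data store

    def set_credit_limit(self, customer_id, limit):
        self._credit_limits[customer_id] = limit

    def get_credit_limit(self, customer_id):
        return self._credit_limits.get(customer_id, 0)


class CreditControlClient:
    """One client of the service: it writes credit limits."""

    def __init__(self, service):
        self._service = service

    def set_limit(self, customer_id, limit):
        self._service.set_credit_limit(customer_id, limit)


class OrderEntryClient:
    """Another client: it never calls CreditControlClient directly."""

    def __init__(self, service):
        self._service = service

    def accept(self, customer_id, order_value):
        return order_value <= self._service.get_credit_limit(customer_id)
```

The two clients share no library and no database connection, yet a change made by one alters the behaviour of the other. They are coupled by the meaning of the data, whatever the API in between.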
Services can be tested and deployed in isolation and are easily containerized, which helps speed continuous deployment.
Decomposing a domain requires deep domain knowledge to avoid designing services that expose APIs prone to abstraction leaks.
The level of granularity of services can vary.
When services are responsible for a single capability, they are referred to as microservices.
The trouble is that “capabilities” are just as composable and decomposable as “services” – so the rule is vague to the point of vacuity.
“Choosing the right level of responsibility for each service – its scope - is one of the most difficult challenges.”12
Anecdotes suggest many (2014-2018) have decomposed applications into microservices that are too fine-grained.
This slows down processing and pushes complexity into the messaging and middleware.
DDD is now a well-adopted approach to help decompose a system into modular parts.
“Many microservice adopters have turned to Eric Evans’ “Domain-Driven Design” (DDD) approach for a well-established set of processes and practices
that facilitate effective, business-context–friendly modularization of large complex systems.”13
This discussion of DDD ought to distinguish its application at higher and lower levels.
The first is of interest to the enterprise or solution architect.
The second has been deprecated by Microsoft as too complex for most applications.
For all but complex applications, even Martin Fowler recommends “Transaction Scripts” instead.
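A Transaction Script organises the logic as one procedure per business transaction, acting directly on the data store. A minimal sketch follows, assuming a hypothetical account table with id and balance columns; the funds transfer example is an illustration, not taken from the white paper.

```python
import sqlite3


def transfer_funds(conn, from_id, to_id, amount):
    """One procedure handles the whole business transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (from_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"unknown account {from_id}")
        if row[0] < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?",
            (amount, from_id),
        )
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, to_id),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
    transfer_funds(conn, "a", "b", 40)
    print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
```

There is no object model to map onto the tables, so the structure clash mentioned earlier does not arise. The price is that rules shared by several transactions must be factored into common subroutines by hand.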
However, even a good method cannot replace domain expertise,
therefore Martin Fowler14 advises to start with a monolithic implementation and refactor it into microservices when the domain is better understood.
“Eventually the team merged the services back into one monolithic system, giving them time to better understand where the boundaries should exist.
A year later, the team was then able to split the monolithic system apart into microservices, whose boundaries proved to be much more stable.”15
Toward Modular and Empowered Organizations
Let us now look at how enterprises alter their operating models and organizational structures to become more agile and how it is congruent with the evolution of software systems toward modularity.
See the notes in the first section above on the business organisation and on the business change organisation.
The white paper is mostly about the latter, the organisation that is responsible for changing business systems – by software development.
The traditional way of steering change relies on programs and projects staffed from shared pools of resources.
In the new operating model, the focus is shifting toward stable teams with dedicated resources that are responsible for designing, building, and running products or services.
The process is managed by strong product owners, often from the business, who work closely with IT at all stages of the product lifecycle.
All roles are integrated within self-organizing feature teams or squads sometimes regrouped into tribes.16 (Spotify practices)
The project manager role is shifting toward an agile coach role and line managers focus on capability building.
Remember the Spotify model is based on an enterprise whose business is its software product.
In other businesses, a difficulty with the new operating model is balancing supply and demand.
The demand for software change varies over time in one business unit and between business units.
Where there is little or no demand for change, how to maintain a dev/ops team?
A Practical Example
The example business is based on a single monolithic ecommerce application?
OK, but EA practices are designed for businesses that already have (say) 500 to 5,000 business applications.
We will illustrate this with an ecommerce enterprise whose business model is based on sales that last for a few days and are announced only 24 hours in advance.
Brands only appear twice a year … By frustrating demand and putting on time constraints, it aims to create desire and impulsive buying.
The enterprise and the software system are being decomposed into 50+ products managed by autonomous teams;
for example, a payments team and a logistic team whose missions are respectively to create and run the best payments or logistics products.
The current software system which is too inflexible and monolithic is being re-built in an incremental manner using the microservices architecture style.
OK. Sounds a reasonable strategy if the “products” are based on relatively discrete data stores.
See this Microservices paper for discussion of design tradeoffs.
Balancing autonomy with alignment
Much of this section is socio-cultural systems thinking about business organisations.
The white paper blurs the distinction between the business organisation and the business change organisation (EA and software development).
Partly because its references are to giant application-centric businesses (FANGS).
The white paper draws on some questionable socio-cultural analysis and analogies.
Neither a free market economy nor an army is about constructing, maintaining and extending a complex orderly system.
Managing or intervening in a disorderly situation is one thing; building an orderly system is another.
“Ensure that when the bottom spoke the top listened – was one of the challenges we would eventually have to overcome …
Order can emerge from the bottom up, as opposed to being directed, with a plan, from the top down.”17
It can indeed, but the general did not disband the army’s management structure or make his own role redundant.
The separation of decision-making from work characterizes command-and-control thinking.
It keeps managers out of touch with their operations.
A central tenet of this thinking is management by numbers which helps create a simplified and abstracted view of reality.
Good EA works in cooperation with solution architects working on projects.
Shortcomings of Command-and-Control
Setting up a straw man?
This section deprecates a strict top-down command and control style that I have not seen for a decade or more.
Command-and-control thinking is not an effective way of aligning autonomous teams because top-down flawed decisions are likely to clash with autonomous teams.
For example, the Spotify engineering culture is waste-repellent: if it works keep it, otherwise dump it.
At Spotify they skip or dump handoffs, useless meetings, and corporate nonsense.18
In contrast, agile organizations align work with a meaningful purpose … The few on the top provide clear vision, priorities, and missions.
Transparency gives a team access to the information and context it needs to make good decisions.
Well-informed teams are given empowerment and trust.
Access to privileged information is no longer a power source that middle managers leverage to impose their will upon their teams.
Remember, Spotify is one of the FANGS.
Changing the Organizational Model
Shooting at a straw man?
This section is an academic’s sales pitch for delegation of authority.
The shift from command-and-control to agility requires a culture change.
In his book “Reinventing Organizations”19 Frédéric Laloux develops a taxonomy of organizational models.
The author observes that most modern global corporations are the embodiment of, what he calls, the Orange Organization type where the hierarchical structure dominates.
Virtual teams, cross-functional initiatives, and expert staff functions foster the innovative responsiveness that is needed to beat the competition.
The next stage of evolution, the Green Organization, retains the meritocratic hierarchical structure of Orange but pushes most decisions down to frontline workers.
In Green Organizations “a strong, shared culture is the glue that keeps empowered organizations from falling apart.
Frontline employees are trusted to make the right decisions because they are guided by a number of shared values, rather than by a thick book of rules and policies”.
Too many enterprises that deploy agile-at-scale lack the empowerment, strong culture, and shared values that are pre-conditions to agile transformation.
Enterprise Architects who are used to operating in Orange Organizations are often ill-prepared to drive change toward agility.
Architecture governance models that were developed in Orange Organizations get in the way of agile transformation.
Note the fervour of the revolutionary: we must sweep away the old to reach a brave new world.
Dual organisation model
This section copies another suggestion (rather than evidence of its success).
John Kotter has developed a model that describes how traditional organizational hierarchies can shift toward this next stage of organizational evolution.
For most companies, the hierarchy is the singular operating system at the heart of the enterprise.
But the reality is that this system simply is not built for an environment where change has become the norm.
Kotter advocates a new system – a second, more agile, network-like structure that operates in concert with the hierarchy to create what he calls a “dual operating system” – one that allows companies to capitalize on rapid-fire strategic challenges and still make their numbers.
“Accelerate”20 (XLR8) vividly illustrates the five core principles underlying a new network system, the eight accelerators that drive it, and how leaders must create urgency in others through role models.
The analogy in this section is weak, for reasons explained below.
Let us now illustrate the magnitude of change that is required.
General Stanley McChrystal confronted a nimble and agile enemy.
To cope with this, he had to change the culture and operating model of an institution that is used to command-and-control thinking:
“We restructured our force from the ground up on principles of extremely transparent information sharing (what we call “shared consciousness”) and decentralized decision-making authority (“empowered execution”) …
We dissolved the barriers – the walls of our silos and the floors of our hierarchies – that had once made us efficient …
We looked at the behaviors of our smallest units and found ways to extend them to an organization of thousands, spread across three continents.
We became what we called “a team of teams”: a large command that captured at scale the traits of agility normally limited to small teams …”
Reflecting on military history, McChrystal attributed the victory at the Battle of Trafalgar to the organizational culture that Nelson had crafted.
This culture rewards individual initiative and critical thinking, as opposed to simple execution of commands.
Such a cultural change implies that leaders should not make or approve all important decisions:
“The wait for my approval was not resulting in any better decisions …
I came to realize that, in normal cases, I did not add tremendous value, so I changed the process …
The risks of acting too slowly were higher than the risks of letting competent people make judgment calls …
More important, and more surprising, we found that, even as speed increased and we pushed authority further down, the quality of decisions actually went up …”
Systems thinkers speak of top-down command, silos and collaborative teams.
These ideas apply differently in different organisations and situations.
E.g. The US army strives to destroy distributed terrorist targets; the complexity lies in the disorderliness of the situation.
McChrystal concluded top-down command was ineffective in disorder.
He encouraged autonomous teams to collaborate around a shared purpose.
By contrast, a business strives to coordinate distributed silo systems; the complexity lies in the orderliness of the wider enterprise system.
To paraphrase Jacquelin Conway:
“It may be said that silo systems help us develop and deploy each system, thus making that work more efficient.
But dividing the enterprise into silo systems has a negative impact on the efficiency and effectiveness of the enterprise as a system.”
For silo systems, you can read “microservices”; or in an “agile architecture” context you can read “autonomous systems”.
The enterprise wants its end-to-end business processes to work efficiently and effectively.
This only happens if the supposedly “autonomous” subsystems are designed to be orchestrated or choreographed together.
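The difference between orchestrating and choreographing subsystems can be sketched as below. Payments and Logistics echo the teams in the white paper's example; the EventBus class, the function names and the event names are assumptions made here for illustration.

```python
from collections import defaultdict


class Payments:
    def take_payment(self, order):
        order["paid"] = True
        return order


class Logistics:
    def ship(self, order):
        order["shipped"] = True
        return order


def orchestrate_order(order, payments, logistics):
    """Orchestration: one coordinator owns the end-to-end process."""
    payments.take_payment(order)
    logistics.ship(order)
    return order


class EventBus:
    """Hypothetical minimal bus with immediate, in-process dispatch."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def emit(self, topic, payload):
        for handler in list(self._handlers[topic]):
            handler(payload)


def wire_choreography(bus, payments, logistics):
    """Choreography: each subsystem reacts to events; nobody owns the flow."""
    bus.subscribe(
        "order.placed",
        lambda order: bus.emit("payment.taken", payments.take_payment(order)),
    )
    bus.subscribe("payment.taken", logistics.ship)
```

In both cases the end-to-end process (pay, then ship) had to be designed by somebody. Choreography hides that design in the event subscriptions rather than removing the need for it.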
If we accept the hypothesis that command-and-control is not an effective alignment approach to steer an agile organization, what is the alternative?
Leaders at the top need to provide guidance, feedback, and support to their teams.
They need to lead with purpose, which requires strategic clarity.
Business Architecture Patterns
In a seminal paper,21 Michael E. Porter writes: “The essence of strategy is in the activities – choosing to perform activities differently or to perform different activities than rivals.”
It all starts with the definition of strategic positions that can be based on customer needs, customer experience, and/or some product or service mix.
Strategy is about creating a unique position that involves a different set of activities that better meet customer needs while delivering superior experience.
The way these activities are implemented determines costs (operating model view).
The difference between the price customers are willing to pay and costs determines profitability (business model view).
You can read a distillation of Porter’s points related to business process reengineering here https://ebrary.net/18120/management/michael_porter
His value chain consists of all the activities necessary to produce and sell a product or service.
He says that “While operational effectiveness is about achieving excellence in individual activities, or functions, strategy is about combining activities.”
Architecting a business and its corresponding operating model can no longer follow a waterfall process steered in a “top-down” manner.
The Lean Startup book22 has popularized an incremental approach that relies on rapid experimentation and validated learning.
Autonomous teams that are in direct contact with clients are best equipped to define MVPs that are market-tested during rapid learning cycles.
Though autonomous teams are free to experiment, they need guidance.
It is good to “achieve excellence in individual activities”; however, Porter says “strategy is about combining activities”.
He also says: “Positions built on systems of activities are far more sustainable than those built on individual activities.”
This implies attention to how supposedly “autonomous” subsystems are optimally coordinated.
The leadership team needs to define a clear vision which can be translated into a set of missions that are assigned down the organization.
The missions are operationalized by agile teams that are empowered to challenge them if needed.
The learning process, that lean refers to as catch ball,23 provides a powerful alignment mechanism if conducted well.
A recent paper from the MIT Center for Information System Research (CISR) illustrates how a combination of alignment mechanisms helped Spotify avoid chaos while protecting teams’ autonomy:
• Provide distinct goals and objectives to autonomous teams and align teams without introducing layers of hierarchy
• Set up formal sharing mechanisms that synchronize activities as the number of teams grows
• Define architectural standards that facilitate autonomy by ensuring that individual components are compatible
A new class of technology-enabled business model is transforming industries, the platform.
It connects people, organizations, and resources in interactive ecosystems that disrupt incumbents.
Airbnb™, Uber™, Alibaba, or Amazon Marketplace epitomize this disruptive power.
OK, but most businesses are not application-centric FANGS.
Traditional business models were built around products or services which were designed on one end of a pipeline and delivered to clients at the other end.
For sure, many business processes run from end to end.
E.g. the process in a university admissions business runs from the start of an academic year to its end.
There will always be end-to-end processes in applying for a role, in a goods supply chain, in a factory.
When platform-based businesses enter markets dominated by “pipelines”, they enjoy a competitive advantage.
Why? Because pipelines rely on inefficient gatekeepers to manage the flow of value when platforms promote self-service and direct interactions between participants.
For sure, the way of the world is that older businesses are replaced by newer ones.
E.g. we see e-commerce businesses putting retail stores out of business.
The internet has enabled customers to find suppliers without the need for retailers who hold stocks.
The threat to some businesses is that the internet can shorten the supply chain, and make some middlemen redundant.
The threat to others is that internet giants operate at a scale that can make low-volume businesses uncompetitive.
What makes agile software development relevant and useful here?
It becomes relevant when businesses compete on the basis of the applications their customers use to transact business.
But isn’t the real threat the market dominance of FANGS and other internet giants, more than the quality of their apps?
Frankly, the quality of some internet giant apps is low (e.g. LinkedIn sometimes posts messages in the wrong sequence).
A platform can scale and grow more rapidly and efficiently because the traditional gatekeeper is replaced by signals provided by market participants through a platform that acts as a mediator.
Platforms stimulate growth because they expose new supply and unlock new demand.
They also use big/fast data and analytics capabilities to create community feedback loops.24
OK, though how many businesses aim to compete with FANGS in this self-serve community game?
And will community management software become commoditised rather than bespoke?
Platforms need governance which consists of a set of rules concerning who gets to participate in an ecosystem, how to divide the value, and how to resolve conflicts.
Good governance distributes wealth among those who add value in a manner that is perceived as fair.
Governance must pay special attention to externalities.
For example, Airbnb suffered from new rules issued by public authorities wanting to limit externalities such as its negative impact on apartment rental markets.
Because technology is a key enabler, we will now review the software architecture patterns that make digital business models possible.
First, the internet?
We will also briefly introduce another type of platform that Michael A. Cusumano designates as “internal platform”.25
They allow their owners to achieve economic gains by reusing or redeploying assets across families of products.
“Ensure that when the bottom spoke the top listened – was one of the challenges we would eventually have to overcome …
Order can emerge from the bottom up, as opposed to being directed, with a plan, from the top down.”17
Again yes, but the general did not disband the army’s management structure or make his own role redundant.
Surely, collaboration between scores or hundreds of subsystems/teams requires some oversight?
(Note that biological evolution is a poor analogy.
True, complex biological systems have emerged from evolution by chance, but this is not a viable strategy to extend and integrate business systems,
because evolution proceeds very slowly, by tiny unnoticeable changes, and 99.9% of changes in DNA turn out to be harmful.)
Software architecture patterns
“Software is eating the world”, Marc Andreessen.26
The rapid evolution of software technology has fueled the growth of digital business.
Following Internet giants’ lead, some enterprises from the old economy are framing themselves as tech companies;
for example, Banco Bilbao Vizcaya Argentaria (BBVA): “If you want to be a leading bank, you have to be a technology company.”27
Internet giants did succeed at retaining the agility of startups while they grow at a fast pace and operate at a global scale.
They paid special attention to loose-coupling and team autonomy and they learned how to master distributed computing at scale.
Let’s illustrate this with Amazon and Google®.
In 2002, Amazon was facing a complexity barrier. The size of its home page reached 800 MB and it took 8 to 12 hours to compile.
Jeff Bezos issued a mandate that profoundly changed the way software is created and the enterprise is organized.
Steve Yegge has reported this in a post.28
1. “All teams will henceforth expose their data and functionality through service interfaces.
2. Teams must communicate with each other through these interfaces.
3. There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team's data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
4. It doesn't matter what technology they use. HTTP, CORBA, Pub/Sub, custom protocols – doesn’t matter. Bezos doesn’t care.
5. All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.
6. Anyone who doesn't do this will be fired.
7. Thank you; have a nice day!”
By shifting toward modularity and APIs, Amazon became well positioned to open its distribution and logistics capabilities to third-party vendors.
The self-service nature of the platform made it easy for vendors to sell and distribute their products in a frictionless manner. This helped Amazon compete against eBay leveraging a business model which is different.
Note that Bezos allows inter-component communication to be synchronous.
Pub-sub and event-driven messaging are not mandatory.
They can increase complexity in the messaging and the sagas needed to apply compensating transactions.
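To make the mandate concrete, here is a minimal sketch of one team consuming another team's data only through a service interface, using a plain synchronous call. The names (StockService, InMemoryStockService, can_fulfil) are hypothetical and invented for illustration; an in-process call stands in for what would be a call over the network.

```python
from dataclasses import dataclass
from typing import Protocol


class StockService(Protocol):
    """Hypothetical service interface: the only way other teams reach stock data."""

    def units_available(self, sku: str) -> int: ...


@dataclass
class InMemoryStockService:
    """One possible implementation; callers never see the private store."""

    _store: dict[str, int]

    def units_available(self, sku: str) -> int:
        return self._store.get(sku, 0)


def can_fulfil(stock: StockService, sku: str, quantity: int) -> bool:
    """Ordering-team code: a synchronous call through the interface,
    with no direct read of the stock team's data store."""
    return stock.units_available(sku) >= quantity


stock = InMemoryStockService({"book-123": 5})
print(can_fulfil(stock, "book-123", 3))  # True
print(can_fulfil(stock, "book-123", 9))  # False
```

The ordering code depends only on the interface, so the stock team could change its data store without the caller noticing.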
In 2004, Jeffrey Dean and Sanjay Ghemawat from Google published a paper29 that described the MapReduce programming model.
This innovation deeply influenced distributed computing.
MapReduce is based on a functional style that makes it easy to automatically parallelize and execute code on large clusters of commodity machines.
It allows developers without any experience with parallel and distributed computing to easily utilize the resources of a large distributed system.
It is also highly scalable and resilient to hardware or network failures.
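The programming model can be illustrated with the classic word-count example. This is a single-process sketch of the map, shuffle and reduce steps only; the function names are assumptions made here, and the distribution across machines and the fault tolerance that the Google paper describes are left out.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document: str) -> list[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all the values emitted for one key."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(documents: list[str]) -> dict[str, int]:
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["the cat sat", "the dog sat down"])
print(counts["the"], counts["sat"], counts["dog"])  # 2 2 1
```

Because map_phase looks at one document and reduce_phase at one key, a framework is free to run many of each at the same time on different machines.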
Two years later, a team from Google published a paper30 that describes a distributed storage system for managing structured data designed to scale to a very large size:
petabytes of data across thousands of commodity servers.
This system exploits immutability which simplifies concurrency control: “we do not need any synchronization of accesses to the file system when reading from SSTables.
As a result, concurrency control over rows can be implemented very efficiently”.
99% of businesses do not aim to operate at the scale of FANGS like Google.
EA is about the whole application portfolio – perhaps 500 to 5,000 applications.
Most of those apps can be supported by an ordinary DBMS or document store.
Most if not all UK retailers’ web sites could be supported by a regular DBMS (with SSDs).
The vast amount of data Internet giants are gathering is best exploited using Artificial Intelligence (AI).
Analytics algorithms help predict customer behavior and recommend products.
Deep learning technology powers new types of applications that leverage computer vision and natural language processing.
The model Internet giants follow is based on developing many custom solutions to support their own products and services.
They publish many of these internal solutions in white papers that later evolve into open source projects.
The vast majority of leading-edge technology is freely available, including advanced AI libraries such as TensorFlow™.31
Some of the open source projects even include pre-trained deep learning algorithms that simplify the creation of new AI applications;
for example, OpenFace:32 “Free and open source face recognition with deep neural networks …
Please use responsibly! We do not support the use of this project in applications that violate privacy and security.”
This stream of continuous technology innovation profoundly impacts the way software that supports the enterprise is architected.
The rise of automation that ultimately results in the creation of autonomous systems has and will continue to disrupt business and operating models.
We will now focus on a few key software design patterns that should be part of the architect’s body of knowledge.
New Rules of Distributed Computing
The design, development, and operation of distributed systems has always been a difficult endeavour.
In the past, middleware technology based on transaction monitors and two-phased commit protocols was good enough.33
Today, it cannot anymore meet the scalability and availability needs of digital operating models.
OK. But how many businesses aim to operate at the scale and availability of FANGS?
Design for extreme scale is the exception rather than the norm.
And note that IoT is still considered important by only a small minority.
Architects need to understand and take advantage of the paradigm shift toward new distributed computing models that:
• Decompose systems into distributable parts that run concurrently on commodity hardware
• Are horizontally scalable and elastic to varying workloads
• Take full advantage of modern multi-core processors
Surely, only where scaling up is needed?
Architecting distributed systems is about making trade-offs between operational complexity, performance, availability, and consistency.
The CAP theorem states that any networked shared-data system can have at most two of three desirable properties:
• Consistency (C)
• High availability (A)
• Tolerance to network partitions (P)
The trouble with the CAP theorem is that it omits complexity.
Designing for A and P means sacrificing C for a while, if not forever.
But if C is important then you have to design to restore consistency – which adds complexity (see Saga pattern below).
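A toy sketch may help show where the extra complexity comes from. It assumes two replicas that keep accepting writes during a network partition, and a simple "highest version wins" reconciliation rule; the Replica class and the reconcile function are invented for illustration and are not taken from the white paper.

```python
class Replica:
    """A toy key-value replica that stays available while partitioned."""

    def __init__(self) -> None:
        self.data: dict[str, tuple[int, str]] = {}  # key -> (version, value)

    def write(self, key: str, value: str, version: int) -> None:
        self.data[key] = (version, value)

    def read(self, key: str) -> str:
        return self.data[key][1]


def reconcile(a: Replica, b: Replica) -> None:
    """Restore consistency once the partition heals: highest version wins.
    This extra step is design work that the CAP theorem does not mention."""
    for key in set(a.data) | set(b.data):
        winner = max(a.data.get(key, (0, "")), b.data.get(key, (0, "")))
        a.data[key] = b.data[key] = winner


london, paris = Replica(), Replica()
london.write("price", "10", version=1)
reconcile(london, paris)  # both replicas agree

# Network partition: each side accepts writes (A and P), so C is lost for a while.
london.write("price", "12", version=2)
paris.write("price", "11", version=3)
inconsistent = london.read("price") != paris.read("price")

reconcile(london, paris)  # partition heals
print(inconsistent, london.read("price"), paris.read("price"))  # True 11 11
```

Note that this rule silently discards the London write; deciding whether that is acceptable is itself a business design decision.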
Splitting a system into distributable parts gives the ability to scale service capacity, using a larger number of shards to serve more users.
Replication (data or functionality) in more than one location is required to recover from failures, thus contributing to the high availability quality.
Sure, but most business systems operate at a scale that is a small fraction of the scale of FANGS.
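For readers unfamiliar with sharding and replication, here is a minimal sketch of routing a key to a shard and placing a copy on a neighbouring shard. The functions shard_for and replicas_for, the shard count and the placement rule are all assumptions made for illustration; real systems use more elaborate schemes.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard using a stable hash (Python's built-in hash()
    is randomised per process for strings, so it is not suitable here)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def replicas_for(key: str, shard_count: int, copies: int = 2) -> list[int]:
    """Place each key on `copies` consecutive shards so that losing one shard
    loses no data. Assumes copies <= shard_count."""
    first = shard_for(key, shard_count)
    return [(first + i) % shard_count for i in range(copies)]


for customer in ["alice", "bob", "carol"]:
    print(customer, replicas_for(customer, shard_count=4))
```

Adding shards raises capacity, and the second copy supports availability, but both add operational work that a single database does not need.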
Because traditional concurrent programming is error prone, new “immutable” programming models are important.
Reasoning about the possible states of complex objects is difficult.
Reasoning about the state of immutable objects is trivial because they can only be in one state and can be shared safely:
“writing correct concurrent programs is primarily about managing access to a shared, mutable state ...
If an object’s state cannot be modified, these risks and complexities simply go away”.34
Still, the business world moves on (mutates) and the enterprise must know the current state of the entities it monitors and directs.
So if objects are immutable, new objects must be created.
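A short sketch of that point, using a hypothetical Order entity: the object cannot be changed in place, so a change of business state is recorded by creating a new object.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Order:
    """Hypothetical business entity; frozen means fields cannot be reassigned."""

    order_id: str
    status: str


placed = Order(order_id="A-1", status="placed")

try:
    placed.status = "shipped"  # mutation is rejected
except FrozenInstanceError:
    print("cannot modify an immutable order")

# The business moves on, so a new object records the new state.
shipped = replace(placed, status="shipped")
print(placed.status, shipped.status)  # placed shipped
```

Something must still keep track of which object is the current one, so the mutable state has been moved rather than removed.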
Functional Programming (FP) treats computation as the evaluation of mathematical functions.
A pure function is a function which given the same inputs, always returns the same output, and has no side-effects.
Pure functions are completely independent of outside state and, as such, they are immune to entire classes of bugs that have to do with shared mutable state.
Their independent nature also makes them great candidates for parallel processing across many CPUs, and across entire distributed computing clusters.
Because of this, the FP paradigm is used to build large-scale distributed systems and is becoming mainstream.
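The difference between a pure and an impure function can be sketched as follows. The function names and the VAT rate are illustrative assumptions, and a thread pool stands in for the CPUs or cluster nodes mentioned in the quote.

```python
from concurrent.futures import ThreadPoolExecutor

running_total = 0.0


def impure_add_vat(net: float) -> float:
    """Impure: it also changes state outside the function."""
    global running_total
    running_total += net
    return net * 1.2


def add_vat(net: float, rate: float = 0.2) -> float:
    """Pure: the result depends only on the arguments; nothing else changes."""
    return round(net * (1 + rate), 2)


prices = [10.0, 20.0, 30.0]

# Pure functions can be applied in any order, or concurrently, with the same result.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(add_vat, prices))

print(parallel == [add_vat(p) for p in prices])  # True
```

Running impure_add_vat concurrently would need locking around running_total; add_vat needs none.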
Highly distributed computing models create specific challenges on the data side.
For example, microservices which can run in parallel on multiple nodes own their data.
This makes ensuring data consistency a challenge. The Saga pattern solves this problem.
Read this Microservices paper for discussion of design tradeoffs.
New Data Patterns
A Saga is a long-lived transaction35 that can be written as a sequence of transactions that can be interleaved.
All transactions in the sequence complete successfully or compensating transactions are executed to amend a partial execution.
Both the concept of Saga and its implementation are relatively simple, but they have the potential to improve performance significantly.
Sagas are not well described as a data pattern; they are workflows that orchestrate transactions on different data stores.
Again, there are downsides to design for scalability.
Compensating transactions (and the workflows needed to implement them) are not simple – they do add complexity.
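A minimal sketch of the idea, assuming an in-process orchestrator and hypothetical order-handling steps: each step pairs a local transaction with a compensating transaction, and a failure triggers the compensations of the completed steps in reverse order.

```python
from typing import Callable

Step = tuple[Callable[[], None], Callable[[], None]]  # (action, compensation)


def run_saga(steps: list[Step]) -> bool:
    """Run each local transaction in order; on failure, run the compensations
    of the completed steps in reverse order."""
    completed: list[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


log: list[str] = []


def charge_card() -> None:
    raise RuntimeError("card declined")  # simulated failure


ok = run_saga([
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("book courier"), lambda: log.append("cancel courier")),
    (charge_card, lambda: log.append("refund card")),
])
print(ok, log)
# False ['reserve stock', 'book courier', 'cancel courier', 'release stock']
```

Even this sketch ignores the harder questions, such as a compensation that itself fails or an orchestrator that crashes part-way through, which is where the added complexity lies.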
The commentary on the white paper stops here; its further sections address:
• Sharding big data
• Real-time analytics
• Infrastructure as Code
• Architecture Framework (AAF).
1 See www.mckinsey.com/business-functions/organization/our-insights/how-to-create-an-agile-organization.
2 See www.scaledagileframework.com.
3 See https://less.works/less/framework/introduction.html.
4 Scaling Agile @ Spotify with Tribes, Squads, Chapters & Guilds, Henrik Kniberg, Anders Ivarsson, October 2012; refer to: https://blog.crisp.se/wp-content/uploads/2012/11/SpotifyScaling.pdf.
5 Modular Architectures Make You Agile in the Long Run, Dan Sturtevant, IEEE Software, Vol. 35, Issue 1, January/February 2018; refer to: https://ieeexplore.ieee.org/paper/8239949/.
6 See www.ipexpoeurope.com/content/download/10069/143970/file/2017-state-of-devops-report.pdf.
7 Analyst Watch: Water-Scrum-fall is the reality of Agile, Dave West, December 2011; refer to: https://sdtimes.com/agile/analyst-watch-water-scrum-fall-is-the-reality-of-agile/.
8 Exploring the Duality between Product and Organizational Architectures: A Test of the “Mirroring” Hypothesis, Alan MacCormack, John Rusnak, Carliss Baldwin, Harvard Business School Working Paper; refer to: www.hbs.edu/faculty/Publication%20Files/08-039_1861e507-1dc1-4602-85b8-90d71559d85b.pdf.
9 See www.agilealliance.org/resources/sessions/the-reverse-conway-organizational-hacking-for-techies.
10 Domain-Driven Design: Tackling Complexity in the Heart of Software, Eric Evans, Addison Wesley, August 2003.
11 Source: 2017 State of DevOps Report
12 Microservices in Action, Morgan Bruce, Paulo A. Pereira, Manning Publications, 2018.
13 Microservice Architecture: Aligning Principles, Practices, and Culture, Irakli Nadareishvili, Ronnie Mitra, Matt McLarty, Mike Amundsen, O'Reilly Media, 2016.
14 See https://martinfowler.com/articles/microservices.html.
15 Building Microservices, Sam Newman, O’Reilly Media, 2015.
17 Team of Teams: New Rules of Engagement for a Complex World, General Stanley McChrystal, David Silverman, Tantum Collins, Chris Fussell, Portfolio Penguin, 2015.
18 See https://vimeo.com/94950270.
19 Reinventing Organizations: A Guide to Creating Organizations Inspired by the Next Stage of Human Consciousness, Frédéric Laloux, Nelson Parker, 2014.
20 Accelerate: Building Strategic Agility for a Faster-Moving World, John P. Kotter, Harvard Business Review Press, 2014.
21 What is Strategy?, Michael E. Porter, Harvard Business School Press, 1996.
22 The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses, Eric Ries, Random House Audio, 2011.
23 See www.lean.org/lexicon/strategy-deployment.
24 Platform Revolution: How Networked Markets Are Transforming the Economy – and How to Make Them Work for You, Geoffrey G. Parker, Marshall W. Van Alstyne, Sangeet Paul Choudary, W. W. Norton & Company, 2016.
25 Industry Platforms and Ecosystem Innovation, A. Gawer, M. Cusumano, Journal of Product Innovation Management, 2013.
26 See www.wsj.com/articles/SB10001424053111903480904576512250915629460.
27 See www.bbva.com/en/want-leading-bank-technology-company.
28 See http://homepages.dcc.ufmg.br/~mtov/pmcc/modularization.pdf and https://plus.google.com/+RipRowan/posts/eVeouesvaVX.
29 See https://static.googleusercontent.com/media/research.google.com/fr//archive/mapreduce-osdi04.pdf.
30 See http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf.
31 See www.tensorflow.org.
32 See https://cmusatyalab.github.io/openface/.
33 Essential Guide to Object Monitors, Karen Boucher, Fima Katz, Wiley, 1999.
35 SAGAS, Hector Garcia-Molina, Kenneth Salem, Department of Computer Science, Princeton University, 1987.
36 Immutable Infrastructure: Considerations for the Cloud and Distributed Systems, Josh Stella, O’Reilly Media, Inc., 2016.
37 Beyond the Twelve-Factor App, Kevin Hoffman, O'Reilly Media, Inc., 2016.