SOAP versus REST
Copyright 2019. Graham Berrisford. One of about 300 papers at
http://avancier.website. Last updated 20/01/2019 11:46
This paper was composed from notes given to me by students on our architecture classes.
It serves as footnotes to two other popular papers:
· Microservices (downloaded > 10,000 times a year).
RPC evolved, first using Object Request Brokers, and then using SOAP and REST principles.
This paper compares SOAP and REST.
Contents
Defining an interface using the Web Services Description Language
Service registries and catalogues
REST (Representational State Transfer)
Use only the operations in a standard web protocol
Flexibility of representation (data/media type)
Architectural properties of REST
How to convert from SOAP to REST?
If the SOAP / REST choice doesn’t matter, then what does?
"The beginning of wisdom for a computer programmer is to recognise the difference between getting a program to work and getting it right" M.A. Jackson (1975).
What makes a software architecture good or right?
Traditionally, it means meeting requirements more elegantly and economically than alternative designs.
It also means enabling change, which is sometimes contrary to the above.
We can easily agree a software component should be encapsulated.
It should be defined primarily by its input/output interface, by the discrete events it can process and services it can offer.
But that doesn’t get us very far, because we have to make a series of modular design decisions.
· What is the right size and scope of a component?
· How to avoid or minimise duplication between components?
· How to separate or distribute components?
· How to integrate components?
Since the 1970s, the IT industry has continually revisited modular design and integration concepts and principles.
Many architectural styles or patterns have been promoted; e.g. distributed objects (DO), service-oriented architecture (SOA) and REST.
Each architectural style is defined by some core ideas, presumptions and constraints.
At the turn of the last century, SOA was a reaction against the constraints of distributed objects using object request brokers like DCOM.
SOA evangelists advocated a more loosely-coupled kind of modular design and integration style, with opposing features.
Early distributed objects presumptions | Later SOA design presumptions |
Object identifiers | Internet domain names or URIs |
One name space | Several name spaces |
Stateful server objects | Stateless server components |
Reuse by inheritance | Reuse by delegation |
Intelligent domain objects | Intelligent process controllers |
Request-reply invocations | Message/event queues |
Blocking servers | Non-blocking servers |
REST adopts the features listed under SOA in the table above, and adds some more.
In SOA, a service is a component that can be called remotely, across a network, at an endpoint.
A component may act as a client/consumer and/or server/provider of data.
Components typically exchange request and response data in the form of self-describing documents.
Components make few if any assumptions about the technological features of other components.
SOAP is a technology standard for client components making remote procedure calls to server components.
It presumes that clients
· send data in XML documents
· invoke operations in WSDL-defined interfaces
· use the protocol called SOAP over standard internet protocols.
SOAP is used to invoke operations on large application components (using many verbs and a few nouns).
Those server-side components may call other components, and so on.
REST is better seen as a theory that happens to be associated with web technology standards.
It feels less like asking a servant to do something for you and more like asking a servant to act on a remote resource.
A resource is anything that can be named and identified by one or more Uniform Resource Identifiers (URIs).
REST is primarily a set of principles for using web standards to invoke operations acting on remote resources.
It also encourages you to rethink how you structure server-side components/resources.
By the end of the 1990s, loose-coupling had become the mantra of software architects.
Microsoft deprecated the constraints of connecting distributed objects using object request brokers like DCOM.
Instead, they advocated service-oriented architecture (SOA) as a more loosely-coupled kind of modular design and integration style.
Their core ideas might be distilled as:
· Clients send data in XML documents
· Clients invoke operations they find in WSDL-defined interfaces
· Clients use the protocol called SOAP over standard internet protocols, usually HTTP.
SOAP originally stood for Simple Object Access Protocol, but people observed that it was neither simple nor object-oriented.
The SOAP standard is now maintained by the XML Protocol Working Group.
And SOAP no longer stands for anything; it is merely a name.
At the same time, Microsoft introduced an interface definition language called WSDL.
A WSDL-defined interface has two parts.
Logical or abstract part:
· The data types used in request and reply messages, defined in an XML schema
· The signatures of operations (procedures or methods), each composed of name, request and reply messages
Physical or concrete part
· The end point addresses (URIs) where the operations can be found
· The protocols used to access the operations at those addresses.
SOAP defines how to code a remote procedure call using an XML message.
It assumes client components access remote server components by sending XML messages using internet protocols.
This table shows the format of a remote procedure call in SOAP.
Element | Description | Required |
Envelope | Identifies the XML document carried in the message. | Yes |
Header | Contains header information | No |
Body | Contains call and response information | Yes |
Fault | Provides information about errors that occurred while processing the message | No |
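As a minimal sketch of the message format tabulated above, the Python below builds and then reads a SOAP 1.1 request using only the standard library. The envelope namespace is the standard SOAP 1.1 one; the GetPrice operation, its Item parameter and the http://example.com/stock namespace are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical application namespace, for illustration only.
APP_NS = "http://example.com/stock"


def build_request(item):
    """Build a request: Envelope > (optional Header) > Body > call."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")  # optional, left empty
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetPrice")  # hypothetical operation
    ET.SubElement(call, f"{{{APP_NS}}}Item").text = item
    return ET.tostring(envelope, encoding="utf-8")


def read_call(message):
    """Return (operation name, item) found in the Body of a request."""
    root = ET.fromstring(message)
    body = root.find(f"{{{SOAP_NS}}}Body")
    if body is None:
        raise ValueError("a SOAP message must contain a Body")
    call = body[0]
    operation = call.tag.split("}", 1)[1]  # strip the namespace prefix
    item = call.find(f"{{{APP_NS}}}Item").text
    return operation, item


message = build_request("Apples")
print(read_call(message))
```

In practice such a message would be sent as the body of an HTTP POST to the end point address given in the WSDL; the sketch stops short of any network call.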
Typically, the internet protocol is HTTP, but you can instead use SMTP or JMS.
A benefit of using HTTP is that it allows SOAP to tunnel through firewalls and proxies encapsulated in the HTTP traffic.
SOAP is sometimes referred to as WS-SOAP, where WS = Web Services.
Related standards define extensions for security, service location and reliable messaging.
WS-* is SOAP with an extension, such as WS-Security or WS-ReliableMessaging.
Microsoft originally proposed listing services in a registry called UDDI along with links to WSDLs and other metadata.
The idea was that clients can find services in the registry and get more information about them from a service catalogue.
In practice UDDI is not widely used; people just use the web or an internal wiki to find what they want.
Roy T. Fielding, in his Ph.D. thesis, formalised the ideas behind web protocols and invented the term REST.
Representational State Transfer (REST) means a server component represents its state in messages using internet-friendly text data formats, which can include hyperlinks.
REST supports the general principles of SOA with more specific guidance.
It is a set of principles for using web standards to invoke operations acting on remote resources.
It suggests ways to modularise and integrate application components using web standards.
It encourages you to rethink how you structure server-side components/resources.
It takes advantage of hyperlinks and the web protocols used over ubiquitous TCP/IP networks.
REST feels less like asking a servant to do something for you and more like asking a servant to act on a remote resource.
A resource is anything that can be named and identified by one or more Uniform Resource Identifiers (URIs), for example:
· a document or image,
· a temporal service (e.g. "today's weather in Los Angeles")
· a collection of other resources
· a chunk of related information, such as a user profile
· a collection of updates (activities)
· a global user ID (GUID)
· a non-virtual object (e.g. a person), and so on.
Designers commonly implement REST using HTTP and URIs.
But REST is an abstract style that can be implemented using other technologies and in various ways.
A so-called RESTful architecture contains RESTful client components.
Every resource of a software application (Web Service, web site, HTML page, XML document, printer, other physical device, etc.) is named as a distinct web resource.
A client component can only call a server component/resource using the operations available in a standard protocol - usually HTTP.
This decouples distributed components; it means a client component needs minimal information about the server resource.
A so-called REST-compliant architecture contains REST-compliant server components/resources.
A REST-compliant server component/resource can offer only the services named in an internet protocol.
Given there are fewer operations (verbs) per component, there must be more components (nouns).
One may have to divide a large data resource into many smallish elements (many nouns).
Clients must orchestrate or integrate those smallish resources.
This section continues with a brief discussion of REST principles, and a link to a further discussion.
Identification of resources
· Requests identify resources, usually using URIs
· A server represents its state using languages like HTML, XML or JSON (none of which are the server's internal representation)
Manipulation of resources through these representations
· When a client holds a representation of a resource, including any metadata attached, it has enough information to modify or delete the resource.
Self-descriptive messages
· Each message includes enough information to describe how to process the message.
· Responses also explicitly indicate whether they are cacheable or not.
Hypermedia as the engine of application state (HATEOAS).
· Client applications can retrieve a resource representation containing hyperlinks, and use those links to access related resources.
· Clients make state transitions only through actions that are dynamically identified within hypermedia by the server (e.g. by hyperlinks within hypertext).
· A client does not assume that any particular action is available for any particular resource beyond those described in representations previously received from the server.
We humans navigate around web pages by following hyperlinks, without having to remember the URIs.
So, RESTful software navigates around web resources by following hyperlinks.
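The navigation idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real HTTP client: the RESOURCES dictionary stands in for a server, and every URI, link name and field in it is hypothetical. The point is that the client knows only the entry point and the names of the links it wants to follow, never the URIs themselves.

```python
# In-memory stand-in for a server: URI -> representation with hyperlinks.
# All URIs, link names and fields are hypothetical.
RESOURCES = {
    "/": {"links": {"orders": "/orders"}},
    "/orders": {"links": {"first": "/orders/1"}},
    "/orders/1": {"status": "unpaid", "links": {"payment": "/orders/1/payment"}},
    "/orders/1/payment": {"amount": 25, "links": {}},
}


def get(uri):
    """Stand-in for an HTTP GET that returns a representation."""
    return RESOURCES[uri]


def follow(entry_uri, relations):
    """Start at the entry point and follow the named links, one at a time."""
    uri = entry_uri
    representation = get(uri)
    for relation in relations:
        links = representation["links"]
        if relation not in links:
            # The client may only take actions the server has advertised.
            raise LookupError(f"{uri} offers no '{relation}' link")
        uri = links[relation]
        representation = get(uri)
    return uri, representation


uri, payment = follow("/", ["orders", "first", "payment"])
print(uri, payment["amount"])
```

If the server later moves the payment resource to a different URI, this client keeps working as long as the link names stay the same.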
A client component doesn’t have to know anything about the overall structure of the system it is a component in.
It invokes operations on remote resources without knowing what machines they are hosted on.
You could say the name space for an application designed to REST principles is the whole internet.
RESTful clients call services using only internet protocol operations.
A client can access any and every resource using that same general interface.
It uses a resource URI, and one of the operation names defined in a standard internet protocol.
REST-compliant server components
RESTful clients can “parameterise” a request to invoke several operations on one server component, using the same operation name.
But REST-compliant servers perform a limited range of operations, each corresponding to an internet protocol operation.
Interface definition?
REST simplifies the programming by using only the operation names in an API.
But what do those operation names mean?
The trouble is that Roy Fielding did not prescribe the
meanings of internet protocol operations.
This creates the problem that different programmers use them differently.
Some say a WADL (the equivalent of SOAP's WSDL) should be used to describe a REST web service.
Some say a WADL is helpful for development and for testing, such as loading all the service resources into SoapUI (which supports REST testing).
Some say WADL is vapourware.
Some say WADL is not needed if the system is fully RESTful and REST-compliant.
They equate PUT, GET, POST and DELETE operations in HTTP to CRUD operations on data records, but that is an oversimplification.
Surely, a development team, if not an enterprise, needs a standard?
Two different standards are tabulated below.
HTTP interface operation | Uniform meaning |
GET | Read/retrieve a representation of the identified resource |
PUT | Create a newly identified resource OR update an existing identified resource (replace the previous representation). |
POST | Add a new resource subordinate to an identified other/parent resource |
DELETE | Delete the identified resource |
HEAD | Get metadata about the identified resource |
What if a server component does depend on whether the data resource is a collection or an item?
And notice that the PUT and POST operations have different meanings in the second table below.
HTTP operation | The resource is a collection, e.g. http://example.com/resourcelist | The resource is an item or element, e.g. http://example.com/resources/item17 |
GET (read) | List the URIs and perhaps other details of the collection's members. | Retrieve a representation of the addressed member of the collection, expressed in an appropriate Internet media type. |
PUT (update/create) | Replace the entire collection with another collection. | Replace the addressed member of the collection, or if it doesn't exist, create it. |
POST (create) | Create a new entry in the collection. The new entry's URI is assigned automatically and is usually returned by the operation. | Treat the addressed member as a collection in its own right and create a new entry in it. Not generally used, because it is not idempotent. |
DELETE (delete) | Delete the entire collection. | Delete the addressed member of the collection. |
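One way to read the second table is as a dispatch on two things: the HTTP operation, and whether the URI addresses the collection or one of its members. The Python below is a minimal in-memory sketch of that reading, not a web framework; the Collection class, its handle method and the "itemN" identifiers are all hypothetical. POST to a member is deliberately left unsupported, in line with the table's remark that it is not generally used.

```python
import itertools


class Collection:
    """In-memory stand-in for a collection resource and its members."""

    def __init__(self):
        self.members = {}
        self._numbers = itertools.count(1)

    def _new_id(self):
        # The server, not the client, chooses the URI of a POSTed entry.
        while True:
            candidate = f"item{next(self._numbers)}"
            if candidate not in self.members:
                return candidate

    def handle(self, method, member_id=None, data=None):
        if member_id is None:  # the URI addresses the whole collection
            if method == "GET":
                return sorted(self.members)        # list the members
            if method == "PUT":
                self.members = dict(data)          # replace the collection
                return None
            if method == "POST":
                new_id = self._new_id()            # create a new entry
                self.members[new_id] = data
                return new_id
            if method == "DELETE":
                self.members = {}                  # delete the collection
                return None
        else:  # the URI addresses one member
            if method == "GET":
                return self.members[member_id]
            if method == "PUT":
                self.members[member_id] = data     # replace or create
                return None
            if method == "DELETE":
                self.members.pop(member_id, None)
                return None
        raise ValueError(f"unsupported operation: {method}")


resources = Collection()
first = resources.handle("POST", data={"name": "pen"})
resources.handle("PUT", "item17", {"name": "ink"})
print(first, resources.handle("GET"))
```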
Idempotency
An important aspect of REST is the concept that some operations (verbs) are idempotent.
Adding 0 to a number is idempotent - the result is always the same regardless of the number of times you do it.
The PUT method in REST is idempotent, which means the request can be executed an arbitrary number of times, but the result will always be the same as if it had only been done once.
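The difference shows up as soon as a client retries a request. In this small Python sketch (the store, log and function names are hypothetical stand-ins for server state and operations), repeating a PUT leaves one resource in the same state, whereas repeating a POST adds a new entry each time.

```python
store = {}  # resources addressed by URI
log = []    # entries created by POST


def put(uri, representation):
    """Idempotent: repeating the call leaves the same state."""
    store[uri] = representation


def post(entry):
    """Not idempotent: every call adds another entry."""
    log.append(entry)


for _ in range(3):  # e.g. a client retrying after timeouts
    put("/profile", {"name": "Ann"})
    post({"name": "Ann"})

print(len(store), len(log))
```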
Unlike SOAP, there are no formally defined data exchange standards for REST.
XML is used, but JSON is more popular as it is easily human readable and it parses very quickly.
Content negotiation
This means that client and service agree the data format for message content (JSON, XML etc.).
For example, in Ruby 3.0 you’re able to use MIME types to request a representation of a resource in different formats.
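A server-side view of content negotiation can be sketched as follows, assuming the client states its preferences in an HTTP Accept header. This is a simplified illustration in Python rather than a full implementation of the HTTP rules: it honours q-values and the "*/*" wildcard only, the XML rendering does no escaping, and the function names and the default media type are assumptions.

```python
import json


def render_xml(data):
    # Naive rendering for illustration; special characters are not escaped.
    fields = "".join(f"<{key}>{value}</{key}>" for key, value in sorted(data.items()))
    return f"<resource>{fields}</resource>"


# Media types this hypothetical server can produce.
RENDERERS = {
    "application/json": lambda data: json.dumps(data, sort_keys=True),
    "application/xml": render_xml,
}
DEFAULT_TYPE = "application/json"  # assumption: used when the client accepts anything


def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    choices = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        quality = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                quality = float(field[2:])
        choices.append((quality, fields[0]))
    # sorted() is stable, so equal q-values keep the order of the header.
    return [media for _, media in sorted(choices, key=lambda choice: -choice[0])]


def negotiate(accept_header, data):
    """Pick the first acceptable media type that the server supports."""
    for media_type in parse_accept(accept_header):
        if media_type == "*/*":
            media_type = DEFAULT_TYPE
        if media_type in RENDERERS:
            return media_type, RENDERERS[media_type](data)
    raise LookupError("406 Not Acceptable")


print(negotiate("application/xml;q=0.9, application/json", {"id": 17}))
print(negotiate("application/xml", {"id": 17}))
```

The same resource is returned in either format; only the representation changes.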
OData
REST was extended by the Open Data Protocol (OData), initiated by Microsoft in 2007 and standardised by OASIS in 2014.
This standard helps clients access any remote data server wrapped up behind an OData interface.
The remote data server exposes its data structure in the form of an XML schema.
OData defines best practices for clients to obtain that schema and use it to make RESTful invocations.
REST mandates that:
■ A server holds no client context between requests
■ Session state is held in the client, or in a database.
■ The client sends a request when it is ready to transition to a new state.
■ While requests are outstanding, the client is considered to be in transition.
■ The representation returned contains links the client may use to initiate a new state-transition.
A server component should not have to hold state for any of the clients it communicates with - beyond a single request.
The reasons are:
· scalability: it is difficult to scale out server components that maintain client state.
· loose-coupling: the client is not dependent on talking to the same server component in two consecutive requests
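Both reasons can be seen in a small Python sketch of paging through a list. The handler below keeps no memory of the client: each request carries the offset it needs, so consecutive requests can be served by different server instances. The ITEMS data, the handle_request function and the offset/limit parameters are hypothetical.

```python
ITEMS = [f"order{n}" for n in range(1, 8)]  # shared data, e.g. held in a database


def handle_request(offset, limit=3):
    """Stateless handler: everything it needs arrives in the request."""
    page = ITEMS[offset:offset + limit]
    more = offset + limit < len(ITEMS)
    return {"items": page, "next_offset": offset + limit if more else None}


# Two hypothetical server instances; neither remembers the client.
servers = [handle_request, handle_request]

offset, seen, turn = 0, [], 0
while offset is not None:
    response = servers[turn % 2](offset)  # any instance can serve any request
    seen.extend(response["items"])
    offset = response["next_offset"]      # session state lives in the client
    turn += 1

print(turn, seen)
```

Had the server remembered "where this client had got to", the client would have been tied to one instance, and adding instances would have required sharing that memory between them.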
Read the paper "Further discussion of REST principles" for more about the five principles above.
Client–server separation of concerns
Servers and clients can be developed and replaced independently, as long as the interface between them is not altered.
Stateless servers
See above.
Cacheable responses
Responses must define themselves as cacheable, or not, to prevent clients from reusing stale or inappropriate data.
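A minimal sketch of a client that respects this constraint is shown below in Python. It assumes each response states how long it may be reused, in the spirit of HTTP's Cache-Control max-age; the CachingClient class, the origin function and the URIs are hypothetical, and a fake clock is injected so that the example does not depend on real time.

```python
import time


class CachingClient:
    """Reuses a response only while the server said it may be cached."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch   # callable: uri -> (body, max_age_seconds or None)
        self._clock = clock
        self._cache = {}      # uri -> (expires_at, body)
        self.fetches = 0      # how many requests reached the server

    def get(self, uri):
        cached = self._cache.get(uri)
        if cached and cached[0] > self._clock():
            return cached[1]  # still fresh: no request is sent
        body, max_age = self._fetch(uri)
        self.fetches += 1
        if max_age:           # None or 0 means "do not cache"
            self._cache[uri] = (self._clock() + max_age, body)
        return body


now = [0.0]  # fake clock, advanced by hand


def origin(uri):
    # Hypothetical server: prices may be cached for 60 seconds, balances never.
    return f"body of {uri}", 60 if uri.startswith("/prices") else None


client = CachingClient(origin, clock=lambda: now[0])
client.get("/prices")
client.get("/prices")    # served from the cache
client.get("/balance")
client.get("/balance")   # fetched again: not cacheable
now[0] = 61.0
client.get("/prices")    # cached copy is stale, so fetched again
print(client.fetches)
```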
Layered system
A client cannot tell whether it is connected directly to the end server, or to an intermediary along the way.
Intermediary servers may improve system scalability by load balancing and providing shared caches, and may enforce security policies.
Code on demand (optional)
Servers can temporarily extend or customize the functionality of a client by the transfer of executable code (e.g. Java applets and client-side scripts such as JavaScript.)
Is the PlayStation better than the Xbox?
SOAP and REST co-exist.
Mandating either one or the other can create difficulties.
Brief side by side comparison
REST exposes data resources (and operations on them) | SOAP exposes operations - procedures |
REST works with many data formats | SOAP encodes everything in XML. |
REST uses several HTTP verbs (GET, PUT, POST etc) | SOAP uses only POST |
REST assumes peer-to-peer interactions (though often client-server in practice) | SOAP assumes RPC-style client-server interactions |
REST recommends stateless operations | SOAP supports both stateless and stateful operations. |
REST has many endpoints (nouns) with a few standard operations (verbs) | SOAP has few endpoints (nouns) with many operations (verbs). |
How to choose between REST and SOAP?
It depends on the use case, which clients prefer and what skills you have.
A rule of thumb: unless you have a clear reason to use SOAP, use REST.
Best to keep architectural options available, unless there's a compelling reason not to.
(In 2006 Google deprecated SOAP in favour of REST.)
Aspect | REST | SOAP |
Standards and tooling | REST is more web friendly. You can do most of what SOAP can, by hand, without established standards. | The WS-* standards stack is getting complex and has a steep learning curve. But there is strong tooling/IDE support. It supports strong typing. It may be preferred for security, ACID transactions, and reliable messaging. |
Prescriptiveness | REST is not prescriptive, so developers can fling stuff together. | SOAP is very prescriptive; if you don’t do it right it won't work. |
Speed | Typically similar to SOAP but it depends on the use case. A poorly coded REST service on world class infrastructure could be slower than a highly efficient SOAP service on a laptop. | Depends on code design and implementation, as well as servers and infrastructure. |
Scalability | Good for scalability. Statelessness can enable simpler scaling of platforms. | Depends on code design and implementation, as well as servers and infrastructure. |
Caching | Reads in REST can be cached. | In SOAP they cannot be cached. |
Overhead | Minimal overhead on top of HTTP. | SOAP messages are encapsulated in SOAP headers and other WS-* things which must be parsed. |
Network traffic | More chatty, more requests back and forward. | |
Security | You can use HTTPS. But you can rarely guarantee an SSL tunnel from the client to the app server. SSL secures the message on the network, but it is usually decrypted to plain text at the web server. So, can you always trust the server-side environment / infrastructure? | WS-Security ensures security of the message through the outbound firewall to the process on the server handling the inbound SOAP requests. Arguably more secure, or at least more "enterprise level" security focused. Message-level security is in effect until the moment a message has to be in cleartext. That means a SOAP message can be routed around a network securely until it reaches its final destination; that's generally not possible with HTTPS. |
ACID Transactions | No callback, which can make transactions tricky. REST solves this using a “Transaction resource” concept. The server sends the “Transaction resource” back to the client after certain requests. The server component will continue processing. The client can poll for an update on their request using the “Transaction resource” ID. | WS-AtomicTransactions enables a two-phase commit. This often doesn’t make sense over the internet, but may in some enterprise scenarios. Generally, compensating transactions work well enough. E.g. payments from eBay to PayPal use compensating transactions instead of two-phase commit. |
Reliable messaging | Use idempotency. Either use GET until the subsequent processing succeeds as defined by the client, or use PUT followed by a verification GET until both work. | WS-ReliableMessaging is not end-to-end reliable. It covers the transport element but problems might occur during application processing. Read <http://www.infoq.com/articles/no-reliable-messaging> for more information. |
Source <http://stackoverflow.com/questions/1077412/what-is-an-idempotent-operation>.
"Idempotence plays an important role in REST.
If you GET a representation of a REST resource (eg, GET a jpeg image from Flickr), and the operation fails, you can just repeat the GET again and again until the operation succeeds.
To the web service, it doesn't matter how many times the image is 'gotten'.
Likewise, if you use a RESTful web service to update your Twitter account information, you can PUT the new information as many times as it takes in order to get confirmation from the web service.
PUT-ing it a thousand times is the same as PUT-ing it once.
Similarly DELETE-ing a REST resource a thousand times is the same as deleting it once.
Idempotence thus makes it a lot easier to construct a web service that's resilient to communication errors."
"The steps in a compensating transaction must undo the effects of the steps in the original operation.
A compensating transaction might not be able to simply replace the current state with the state the system was in at the start of the operation because this approach could overwrite changes made by other concurrent instances of an application.
Rather, it must be an intelligent process that takes into account any work done by concurrent instances.
This process will usually be application-specific, driven by the nature of the work performed by the original operation.
A common approach to implementing an eventually consistent operation that requires compensation is to use a workflow.
As the original operation proceeds, the system records information about each step and how the work performed by that step can be undone.
If the operation fails at any point, the workflow rewinds back through the steps it has completed and performs the work that reverses each step.
Note that a compensating transaction might not have to undo the work in the exact mirror-opposite order of the original operation, and it may be possible to perform some of the undo steps in parallel."
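The workflow approach described in the quotation can be sketched in Python as follows. This is an illustrative simplification: it undoes the completed steps strictly in reverse order and sequentially, which the quotation notes is not always necessary. The run_with_compensation function and the order-processing steps (reserve, release, charge, refund) are hypothetical.

```python
def run_with_compensation(steps):
    """Run (action, undo) pairs; on failure, undo the completed steps in reverse."""
    completed = []
    try:
        for action, undo in steps:
            action()
            completed.append(undo)  # record how this step can be reversed
    except Exception:
        for undo in reversed(completed):
            undo()
        return False
    return True


state = {"stock": 5, "paid": 0}


def reserve():
    state["stock"] -= 1


def release():
    # Adjusts the current value rather than restoring a snapshot, so that
    # changes made by concurrent operations would not be overwritten.
    state["stock"] += 1


def charge():
    raise RuntimeError("payment declined")


def refund():
    state["paid"] -= 10


ok = run_with_compensation([(reserve, release), (charge, refund)])
print(ok, state)
```

Because the charge step fails before it completes, only the stock reservation is compensated; the refund is never run.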
First, why do you want to do it?
OK, experienced developers don’t like XML: too much syntax, too much overhead.
But that is a weak reason to replace SOAP and XML messages by REST and JSON messages.
It is unwise to convert from SOAP to REST without rethinking the architecture.
Does the client depend on SOAP extensions for security, transaction management or reliable messaging?
Do you need to restructure the system to achieve the same using RESTful clients?
Should you restructure the server-side into REST-compliant servers?
It is easy for developers to cobble a system together without following good practices and standards.
If you want a good REST system architecture, seek advice from a guru/teacher/expert.
Think carefully about the server component.
1. How will the client access the component?
2. What is the component’s interface - the contracted behaviours?
3. How will the component recognise failures, handle them and communicate back to the client?
4. Will the service quality - notably availability and speed - be good enough?
5. Can the component be developed quickly? Is there a sandbox/dev environment for it? Is the contract intuitive, so we don’t need to read a heap of documentation?
6. Can the component complete a business process without requiring the user to do more?
E.g. Don’t just submit an order leaving the user to phone up, or get email notifications of failures, or chase up to check the status of the order.
That is debatable, but a purely RESTful API is saying to any client:
“To work with the data please send the proper HTTP verb to the URIs in our online docs and specify whether you want JSON or XML”.
A purely RESTful API will:
· Honor HTTP Verb Semantics - HTTP GET, PUT, POST, and DELETE
· Support HATEOAS - To prevent tight coupling between the client and the service, truly RESTful APIs provide a discovery-based API. Most of today's APIs do not honour this.
· Utilize HTTP Status Codes - There are over 70 HTTP status codes; how many does your service handle?
· Have Self Descriptive Messages - Where are custom or specific formats declared? (Hint: in the HTTP header.)
· Use Hypermedia Aware Media Types - HTML, XHTML, Atom, SVG (usually ignored; most APIs use JSON / XML).
· Have no version number - In pure REST, version numbers shouldn’t be needed. Most APIs use them though, e.g. /v1/payment.
· Not use static URIs - Most current APIs document precise URIs and return types.
Drawing on http://docs.oracle.com/javaee/6/tutorial/doc/giqsx.html
Resource oriented architecture on Wikipedia - http://en.wikipedia.org/wiki/Resource-oriented_architecture
Common REST Design Pattern - http://architects.dzone.com/news/common-rest-design-pattern
PayPal's HATEOAS compliant REST API - https://developer.paypal.com/docs/integration/direct/paypal-rest-payment-hateoas-links/
SOAP message structure - http://kb.roguewave.com/kb/?View=entry&EntryID=1410&Msg=
WSDL - http://www.practicingsafetechs.com/TechsV1/WSDL/
HATEOAS - http://timelessrepo.com/haters-gonna-hateoas
SOAP vs REST - http://seanmehan.globat.com/blog/2011/06/17/soap-vs-rest/
A declarative, data-retrieval and aggregation gateway for quickly consuming HTTP APIs - http://ql.io
RESTful Web Services discusses many software frameworks which provide some or many features of the ROA.
These include:
· /db <http://www.slashdb.com/> - constructs resource oriented architecture from relational databases
· Django <http://en.wikipedia.org/wiki/Django_(web_framework)>
· TurboGears <http://en.wikipedia.org/wiki/TurboGears>
· Flask <http://flask.pocoo.org/>
· EverRest <http://code.google.com/p/everrest>
· JBoss RESTEasy <http://www.jboss.org/resteasy>
· JBoss Seam <http://en.wikipedia.org/wiki/JBoss_Seam>
· Apache Wink <http://incubator.apache.org/wink>
· Jersey <http://en.wikipedia.org/wiki/Project_Jersey>
· NetKernel <http://en.wikipedia.org/wiki/NetKernel>
· Recess <http://www.recessframework.org/>
· Restlet <http://en.wikipedia.org/wiki/Restlet>
· Ruby on Rails <http://en.wikipedia.org/wiki/Ruby_on_Rails>
· Symfony <http://en.wikipedia.org/wiki/Symfony>
· Yii2 <http://www.yiiframework.com/>