Avancier Methods
Information/Data Architecture (with TOGAF & ArchiMate artefacts)
One of more than 200 papers at http://avancier.website. Copyright Graham Berrisford. Last updated 21/05/2017 22:44
EA emerged in the 1980s to address the need for enterprise-wide analysis and re-architecting of data and processes.
Today, mainstream EA remains about business activities that involve the creation and use of information.
It is about the design and planning of changes to those business activities, and/or to the capture and provision of that information.
Moreover, enterprise architecture (rather than solution architecture) is about doing this at a strategic and cross-organisational level.
To digitise business operations that create and use information, architects must attend to modern standards and technologies.
What follows is the information/data architecture approach in Avancier Methods.
It maps information/data architecture to business architecture and applications architecture.
It groups TOGAF artefacts into views, and includes suggested mappings to ArchiMate.
Bear in mind that ArchiMate is not designed for professional business process modellers or data modellers.
Do “information” and “data” mean the same thing? If not, what is the distinction?
Various distinctions are explained and analysed in the paper “Data and Information”.
Maintaining any of these terminology distinctions (in discussion or in writing) is very difficult.
This paper uses “data” as a catch-all; you can replace some or all occurrences with “information” as you see fit.
Modern data architects speak of being concerned with:
· Data at rest (data stores): their locations, contents and synchronisation.
· Data in motion (data flows): their sources, destinations and contents.
· Data qualities: data types, standards, confidentiality, integrity and availability.
Businesses have to store information for future use, in persistent data stores.
TOGAF uses the term data component rather than data store.
There are two interpretations of “data component”:
· A passive data structure, contained in some kind of data store.
· An active data server (an application component) that provides read/write access to a data structure.
Either way, the data component contains a data structure that can be described in terms of inter-related entities.
Data store view
Enterprise architecture is concerned with actors and activities that create and use stored data/information.
Information System View | TOGAF artefacts | ArchiMate viewpoints |
Data store view | Conceptual Data diagram (Business Data Model) | Information Structure – conceptual level |
| Data Entity/Business Function matrix | |
| Logical Data diagram | |
| Application/Data matrix | |
| Data Entity/Data Component catalog | |
| Data Dissemination diagram | |
| Data Security diagram | |
| Data Lifecycle diagram | |
Migration view | Data Migration diagram | |
Define the core entities and events that the business must remember in order to complete its business processes and provide business services.
The word “core” usually implies data that is central to the conduct of business processes, e.g. Customer, Order and Product Type.
This data is often duplicated in different data stores.
A Conceptual Data diagram (Business Data Model) may be drawn to show relationships between the core entities.
Some find it more practical simply to list the core entities in a business data entity catalogue.
Few if any attributes are specified in this kind of model, which is primarily used to identify data duplication.
Map data entities (in the diagram above) to the business functions that create and use them.
You can cluster activities in a Data Entity/Business Function matrix, for example by data created.
The North West corner method sorts the rows and columns of a matrix by clustering them on a shared cell entry, such as “create”.
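The clustering idea can be sketched in a few lines of Python. This is a simplified illustration, not the full method: the entities, functions and cell entries are hypothetical, and the function `cluster_on` is a name invented here.

```python
# Hypothetical matrix: rows are data entities, columns are business functions.
matrix = {
    "Invoice":  {"Sales": "read",   "Billing": "create", "Marketing": ""},
    "Customer": {"Sales": "read",   "Billing": "read",   "Marketing": "create"},
    "Order":    {"Sales": "create", "Billing": "read",   "Marketing": ""},
}


def cluster_on(matrix, entry="create"):
    """Reorder rows and columns so that cells holding `entry` run down
    the diagonal from the north-west corner."""
    rows = list(matrix)
    columns = list(next(iter(matrix.values())))
    ordered_rows, ordered_cols = [], []
    for col in columns:
        owned = [r for r in rows
                 if matrix[r].get(col) == entry and r not in ordered_rows]
        if owned:
            ordered_cols.append(col)
            ordered_rows.extend(owned)
    # Rows and columns with no `entry` cell keep their original order at the end.
    ordered_cols += [c for c in columns if c not in ordered_cols]
    ordered_rows += [r for r in rows if r not in ordered_rows]
    return ordered_rows, ordered_cols


rows, cols = cluster_on(matrix)
for r in rows:
    print(f"{r:<10}", *(f"{matrix[r][c] or '-':<8}" for c in cols))
```

After sorting, each function sits next to the entities it creates, which makes candidate groupings of functions and data easier to see.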
Define the entities and events that an application must remember in order to provide services to other applications and/or business users.
A Logical Data diagram details information to be stored, usually in one database and/or to enable one application.
A logical data model includes not only entities and relationships, but also each entity type’s primary key and other attributes.
Foreign keys may identify the relationship between different entities.
This model is usually normalised so as to minimise duplication of information.
It defines terms and concepts used in a particular business domain.
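As a rough illustration, a logical data model of this kind can be written down as plain data structures, independent of any database. The classes, the entity names and the `check_model` helper below are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    data_type: str


@dataclass
class Entity:
    name: str
    primary_key: str
    attributes: list = field(default_factory=list)


@dataclass
class Relationship:
    parent: str       # entity on the "one" side
    child: str        # entity on the "many" side
    foreign_key: str  # child attribute that refers to the parent's primary key


def check_model(entities, relationships):
    """Return a list of problems: keys that are not declared as attributes."""
    problems = []
    by_name = {e.name: e for e in entities}
    for e in entities:
        if e.primary_key not in [a.name for a in e.attributes]:
            problems.append(f"{e.name}: primary key {e.primary_key} is not an attribute")
    for r in relationships:
        child = by_name[r.child]
        if r.foreign_key not in [a.name for a in child.attributes]:
            problems.append(f"{r.child}: foreign key {r.foreign_key} is not an attribute")
    return problems


customer = Entity("Customer", "CustomerId",
                  [Attribute("CustomerId", "integer"), Attribute("Name", "string")])
order = Entity("Order", "OrderId",
               [Attribute("OrderId", "integer"), Attribute("CustomerId", "integer"),
                Attribute("OrderDate", "date")])
places = Relationship(parent="Customer", child="Order", foreign_key="CustomerId")

problems = check_model([customer, order], [places])
print(problems or "model is consistent")
```

The customer's name is held once, on Customer; Order refers to it through the foreign key rather than copying it, which is the point of normalisation.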
Map data entities to the applications that create and use them.
An Application/Data matrix can reveal overlaps between data maintained by different applications.
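One way to look for such overlaps is to record the matrix as data and list the entities that more than one application maintains. The application names and the C/R/U/D cell codes below are hypothetical; the sketch treats create, update or delete as "maintains".

```python
from collections import defaultdict

# Hypothetical Application/Data matrix: application -> entity -> operations.
app_data = {
    "CRM":      {"Customer": "CRUD", "Order": "R"},
    "Billing":  {"Customer": "RU", "Invoice": "CRUD", "Order": "R"},
    "Web shop": {"Order": "CRUD", "Customer": "CR"},
}


def maintained_by(app_data):
    """Map each entity to the applications that create, update or delete it."""
    maintainers = defaultdict(list)
    for app, entities in app_data.items():
        for entity, operations in entities.items():
            if set(operations) & set("CUD"):
                maintainers[entity].append(app)
    return dict(maintainers)


overlaps = {entity: apps
            for entity, apps in maintained_by(app_data).items()
            if len(apps) > 1}
print(overlaps)
```

In this made-up portfolio only Customer is maintained in more than one place, so it is the first candidate for a mastering decision.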
Map data entities (in the diagram above) to the data components that hold them.
Data Entity/Data Component catalog
Map data entities to the applications that maintain them, or the data components that hold them.
If a Data Dissemination diagram shows there is duplication, look to define a data mastering policy (master and copy) for the baseline or target application portfolio.
Data Security diagram
Data Lifecycle diagram
Data Migration diagram
A physical data model specifies the schema to be used in a particular database.
It may be denormalized to speed up storage or retrieval.
It may refer to features available in the chosen database management system.
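A small sketch of the difference, using SQLite purely because it ships with Python. The table and column names are hypothetical, and the reporting table is only one of many ways to denormalise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalised tables, one per logical entity.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        order_date  TEXT NOT NULL
    );
    -- Denormalised reporting table: the customer name is copied onto each
    -- order row, trading duplication for retrieval without a join.
    CREATE TABLE order_report (
        order_id      INTEGER PRIMARY KEY,
        order_date    TEXT NOT NULL,
        customer_name TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO customer_order VALUES (10, 1, '2017-05-21')")
conn.execute("""
    INSERT INTO order_report
    SELECT o.order_id, o.order_date, c.name
    FROM customer_order AS o JOIN customer AS c USING (customer_id)
""")
report = conn.execute("SELECT * FROM order_report").fetchall()
print(report)
```

The cost of the faster read is that a change of customer name must now be applied in two places.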
Physical data store forms
Data architects are concerned with the forms of matter and energy in which information is stored.
In theory, data architects can design non-digital stores; in practice most focus on digital ones.
Digital data store forms are changing; magnetic disks are currently being replaced by flash storage.
But then, flash is optimised for the asymmetric use cases of mobile devices, where data is written few times and read many times.
To find out what may replace flash memory, see:
http://www.computerweekly.com/feature/Whats-wrong-with-flash-storage-And-what-will-come-after
Architects have to research physical data storage forms as the need arises.
Data store schema standards
Data architects define the locations and contents of data stores.
TOGAF’s “physical data component” is a vendor/technology specific realisation of a logical data component.
It could be a database, data warehouse, document store, web information server or transaction log.
It has a technology-specific data schema, designed to suit its purpose, for example:
· Transactional database
· Data warehouse
· Document store
· Big data store
This table maps a purely logical data model to some data store schema varieties.
Logical data component | Physical data component (four varieties) |
Logical data model | CODASYL database schema | Relational database schema | XML schema (footnote 2) | OData-compliant web information server |
Entities | Records | Tables | Complex types | Entities |
Attributes | Fields | Columns | Contained elements | Properties |
Relationships | Address pointers | Foreign keys | Contained elements | Navigation properties |
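To make the relational column of this mapping concrete, here is a minimal sketch that turns a tiny logical model into table definitions: entities become tables, attributes become columns, and a relationship becomes a foreign key. The model layout and the function `to_relational_ddl` are assumptions made for illustration, not part of TOGAF or any tool.

```python
# Hypothetical logical model: entities with a key and typed attributes,
# plus (child, parent) relationships.
logical_model = {
    "entities": {
        "Customer": {"key": "CustomerId",
                     "attributes": {"CustomerId": "INTEGER", "Name": "TEXT"}},
        "SalesOrder": {"key": "OrderId",
                       "attributes": {"OrderId": "INTEGER", "OrderDate": "TEXT"}},
    },
    "relationships": [("SalesOrder", "Customer")],  # each order belongs to one customer
}


def to_relational_ddl(model):
    """Entities -> tables, attributes -> columns, relationships -> foreign keys."""
    statements = []
    for name, entity in model["entities"].items():
        columns = [
            f"{attr} {sql_type}" + (" PRIMARY KEY" if attr == entity["key"] else "")
            for attr, sql_type in entity["attributes"].items()
        ]
        for child, parent in model["relationships"]:
            if child == name:
                parent_entity = model["entities"][parent]
                parent_key = parent_entity["key"]
                key_type = parent_entity["attributes"][parent_key]
                columns.append(
                    f"{parent_key} {key_type} REFERENCES {parent}({parent_key})")
        statements.append(f"CREATE TABLE {name} ({', '.join(columns)});")
    return statements


ddl = to_relational_ddl(logical_model)
print("\n".join(ddl))
```

The same logical model could be fed to a different generator to produce records, complex types or OData entity types; only the physical realisation changes.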
Logical data model
TOGAF’s “logical data component” is a logical definition of the data in a data store.
It can be documented as a logical data model; that is, an entity-attribute-relationship model (which can include gen-spec relationships).
Modelling an information/data model in this way can be done independent of computing altogether.
A logical information/data model is a purely logical declaration of business terms and concepts without consideration of any database schema.
You might use a UML tool to draw an information/data model, but to call it a class diagram/model is misleading.
UML class diagrams are for modelling objects that have behaviour.
OData - the data access protocol for a web-based information server.
A modern way to realise logical data components as physical data components is in XML schema, accessed using the OData protocol.
This provides a generic way to organize and describe the data structure of any remote data store as a logical data model.
· An Entity Type (Customer, Employee, etc.) is a data structure type consisting of named and typed Properties and with a key.
· An Entity is an instance of an Entity Type.
· An Entity Key (CustomerId, OrderId etc.) is formed from a subset of Properties of the Entity Type.
· An Association defines a relationship between instances of Entity Types (for example, Employee WorksFor Department).
· An Association can be 1-to-1 or 1-to-many, uni-directional or bi-directional.
· A Navigation Property is a property of an Entity Type bound to a specific association, which can be used to refer to associations of an entity.
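The terms above can be seen in a metadata document. The sketch below parses a small, made-up sample written in the OData version 4 CSDL XML format, in which a navigation property names its target type directly rather than through a separate Association element. The `Demo` namespace, the entity types and the `describe` function are hypothetical.

```python
import xml.etree.ElementTree as ET

EDM = "{http://docs.oasis-open.org/odata/ns/edm}"

# Hypothetical metadata document for a two-entity service.
SAMPLE_CSDL = """
<edmx:Edmx Version="4.0" xmlns:edmx="http://docs.oasis-open.org/odata/ns/edmx">
  <edmx:DataServices>
    <Schema Namespace="Demo" xmlns="http://docs.oasis-open.org/odata/ns/edm">
      <EntityType Name="Employee">
        <Key><PropertyRef Name="EmployeeId"/></Key>
        <Property Name="EmployeeId" Type="Edm.Int32" Nullable="false"/>
        <Property Name="Name" Type="Edm.String"/>
        <NavigationProperty Name="WorksFor" Type="Demo.Department"/>
      </EntityType>
      <EntityType Name="Department">
        <Key><PropertyRef Name="DepartmentId"/></Key>
        <Property Name="DepartmentId" Type="Edm.Int32" Nullable="false"/>
        <NavigationProperty Name="Employees" Type="Collection(Demo.Employee)"/>
      </EntityType>
    </Schema>
  </edmx:DataServices>
</edmx:Edmx>
""".strip()


def describe(csdl_text):
    """Extract entity types with their keys, properties and navigation properties."""
    root = ET.fromstring(csdl_text)
    model = {}
    for entity_type in root.iter(EDM + "EntityType"):
        model[entity_type.get("Name")] = {
            "key": [ref.get("Name")
                    for ref in entity_type.findall(f"{EDM}Key/{EDM}PropertyRef")],
            "properties": {p.get("Name"): p.get("Type")
                           for p in entity_type.findall(EDM + "Property")},
            "navigation": {n.get("Name"): n.get("Type")
                           for n in entity_type.findall(EDM + "NavigationProperty")},
        }
    return model


model = describe(SAMPLE_CSDL)
for name, details in model.items():
    print(name, details)
```

Here the two navigation properties together express a bi-directional, 1-to-many relationship between Department and Employee.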
Microsoft and SAP now expose their data using the OData protocol.
Any client (even a human) can retrieve a logical entity-attribute-relationship model from a web data store using HTTP, then proceed to invoke operations on it using HTTP.
The physical data structure of a remote data server is its own business.
All that matters to a client is that the data server returns a logical data model in reply to a request saying "get metadata".
The client can then proceed to invoke create, read, update and delete operations on entities in that data model.
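The sequence just described might look like this from the client side. The sketch only builds the HTTP requests and does not send them; the service root, the `Customers` entity set and the helper names are hypothetical, and PATCH is used for updates on the assumption of an OData version 4 service.

```python
import json
from urllib.request import Request

SERVICE_ROOT = "https://example.org/odata"  # hypothetical service root


def _json_request(url, payload, method):
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"}, method=method)


def metadata_request():
    # "Get metadata": the reply describes the service's logical data model.
    return Request(f"{SERVICE_ROOT}/$metadata", method="GET")


def create_request(entity_set, payload):
    return _json_request(f"{SERVICE_ROOT}/{entity_set}", payload, "POST")


def read_request(entity_set, key):
    # Integer keys are written as-is; string keys would need single quotes.
    return Request(f"{SERVICE_ROOT}/{entity_set}({key})", method="GET")


def update_request(entity_set, key, changes):
    return _json_request(f"{SERVICE_ROOT}/{entity_set}({key})", changes, "PATCH")


def delete_request(entity_set, key):
    return Request(f"{SERVICE_ROOT}/{entity_set}({key})", method="DELETE")


requests = [
    metadata_request(),
    create_request("Customers", {"Name": "Acme Ltd"}),
    read_request("Customers", 1),
    update_request("Customers", 1, {"Name": "Acme Limited"}),
    delete_request("Customers", 1),
]
for r in requests:
    print(r.get_method(), r.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it; that step is left out because the service here does not exist.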
Centralised and distributed data storage
Data architects are much concerned with the distribution or duplication of data in different data stores.
The choice between hierarchy and anarchy is central to much discussion of sociology and politics.
It is closely related to the choice between centralisation and distribution, which appears also in business, software and data architecture.
“It is not hard to speculate about, if not realize, very large, very complex systems implementations, extending in scope and complexity to encompass an entire enterprise.” John Zachman, 1987
This might be interpreted to imply consolidation of an enterprise’s business data into one large database.
(And it appears SAP pursued this strategy for many years.)
A current fashion is to "distribute data management" as Martin Fowler puts it.
So-called “micro services” (better-called “micro apps”) are based on small data stores.
The idea is to integrate small information systems rather than consolidate them around one data store.
This has advantages and disadvantages, but whether data storage is centralised or distributed the vision of EA remains the same.
That vision is to integrate business activities through sharing of the data they create and use.
Businesses have to move information from one place to another, between business actors and data stores.
Data architects are concerned with the capture and transport of information in data structures.
Data flow view
Enterprise architecture is concerned with actors and activities that send and receive data/information.
The view relates applications to data flows (which can include messages, files and reports) and to data components.
Information System View | TOGAF artefacts | ArchiMate viewpoints |
Data flow view | Application Interaction matrix | Application Cooperation |
| Application Communication diagram | Application Cooperation |
| Interface catalog | |
Application Interaction matrix
Application Communication diagram
See “Applications Architecture”.
Interface catalog
Catalog the data flows that pass between applications, and between human roles and applications.
Human actors do convey much critical business information informally - in ad hoc speech, gestures and drawings.
But architects cannot model ad hoc information; they can only name messages that appear in regular communications.
Data architects can name messages created and used in regular business processes (e.g. enquiry, response, order, invoice, payment).
A Data Flow Catalogue (Interface Catalogue in TOGAF)
Functional attributes |
Flow name | Enquiry | Response | Order |
Trigger | Enquiry | | |
Source | Customer | Sales | Customer |
Destination | Sales | Customer | Sales |
Information | Unstructured | Unstructured | Order details (tbd) |
Non-functional attributes |
Frequency | 1,000/day | 1,000/day | 30/day |
Volume | 500K | | |
Confidentiality | High | High | High |
Integrity | Medium | Medium | High |
Availability | 24/7 | 09.00-18.00 | 24/7 |
Transport mechanisms |
Technology | Web | Telephone | Web |
Protocol | HTTP | HTTPS | |
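A catalogue like the one above can also be held as structured records, so that it can be queried. The sketch below copies the fully populated rows of the table; the `DataFlow` class and its field names are assumptions, and the sparsely filled rows (trigger, volume, protocol) are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    name: str
    source: str
    destination: str
    information: str
    frequency_per_day: int
    confidentiality: str
    integrity: str
    availability: str
    technology: str


catalogue = [
    DataFlow("Enquiry", "Customer", "Sales", "Unstructured",
             1000, "High", "Medium", "24/7", "Web"),
    DataFlow("Response", "Sales", "Customer", "Unstructured",
             1000, "High", "Medium", "09.00-18.00", "Telephone"),
    DataFlow("Order", "Customer", "Sales", "Order details (tbd)",
             30, "High", "High", "24/7", "Web"),
]

# Example queries an architect might run against the catalogue.
inbound_to_sales = [f.name for f in catalogue if f.destination == "Sales"]
daily_messages = sum(f.frequency_per_day for f in catalogue)
high_integrity = [f.name for f in catalogue if f.integrity == "High"]

print(inbound_to_sales, daily_messages, high_integrity)
```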
Data flow definers may name data groups and items in those data structures (e.g. From and To addresses in an email header).
They can name so-called “unstructured” data items to hold ad hoc information (e.g. the message in the body of an email).
Like many such illustrations, this table shows what could be documented rather than what most actually document.
But understanding what is possible in theory is a precursor to deciding what to do in practice.
Physical data flow forms
Data architects are concerned with the forms of matter and energy in which business actors convey information.
At the bottom-most level, physical forms include wires, microwaves and sound waves (human speech).
In theory, data architects can design non-digital data flows.
In practice, data architects mostly focus on business systems in which business information is to be digitised.
Data flow format standards
Data architects are concerned to ensure senders and receivers can create and read data structures.
There are many standard data flow formats, covering:
· Digital audio data, image data, and video data
· Documentation and scripts
· Geospatial data; vector and raster data
· Qualitative data, textual
· Quantitative tabular data, with or without metadata
Data architects have to research standard data formats as the need arises. See footnote 1 for more detail.
Semantic interoperability
The information found in the structure of a data flow or data store is a matter of perspective.
So “semantic interoperability” is a major concern of enterprise data architecture.
Data architects work to ensure the creators and users of a data structure share the same understanding of its contents.
Business data can be structured according to many domain-specific languages – bespoke or standard.
So, data architects have to research standard “canonical data models” as the need arises.
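A canonical data model is often applied by translating each application's own record layout into one agreed form. The following is a minimal sketch under invented assumptions: two hypothetical applications ("CRM" and "Billing"), their field names, and a three-field canonical customer.

```python
def from_crm(record):
    """Translate a hypothetical CRM customer record to the canonical form."""
    return {
        "customer_id": str(record["CustNo"]),
        "name": record["FullName"],
        "country": record["Country"].upper(),
    }


def from_billing(record):
    """Translate a hypothetical Billing account record to the canonical form."""
    return {
        "customer_id": record["account"]["id"],
        "name": f'{record["first"]} {record["last"]}',
        "country": record["country_code"],
    }


crm_view = from_crm({"CustNo": 42, "FullName": "Ada Lovelace", "Country": "gb"})
billing_view = from_billing({"account": {"id": "42"}, "first": "Ada",
                             "last": "Lovelace", "country_code": "GB"})

# Once both are in canonical form, the two descriptions can be compared directly.
print(crm_view == billing_view)
```

The translation functions are where the shared understanding is written down: they state, for instance, that the CRM's "CustNo" and Billing's account "id" mean the same thing.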
Where are input and output data/information flows in TOGAF?
Business input and output flows are identified at the start of the Business Architecture phase B.
These flows can convey materials and/or information.
The B-to-C and B-to-B information content is conveyed through Human-Computer Interfaces and APIs, and in non-digital forms like paper.
The flows are documented in Business Service contracts in the Architecture Requirements Specification.
They may also appear in a Business Service/Function Catalogue and/or Process/Event/Control/Product Catalogue.
Application input and output flows are identified at the start of the IS Architecture phase C.
These A-to-B and A-to-A flows can convey information only.
The information content is conveyed through Human-Computer Interfaces and APIs.
The flows are documented in IS Service contracts in the Architecture Requirements Specification.
They may also appear in an Interface Catalogue and/or Application Use Case Descriptions.
Where are input and output data/information flows in ArchiMate?
Architects are taught to define systems from out to in, starting with the input/output boundary.
They define the external view of a system - hiding details of internal behaviours and structures.
Then, they divide the system into layers and/or subsystems and define each in the same way.
Architects define each system and subsystem (building block or component) by defining its interface(s).
An interface is a collection of services that a system or subsystem makes available to clients.
An interface encapsulates the internal actors/components and processes that implement or realise services.
The ArchiMate modelling language classifies these ideas as shown in the table below.
ArchiMate | Behaviour elements | Active structure elements |
External view | Services | Interfaces |
Internal view | Processes | Actors/Components |
Services are discrete behaviours that clients can request of a system.
Service contracts encapsulate (hide) the necessary internal process flows and actors/components.
Services consume and produce input/output flows that contain data and/or materials.
So, input and output data flows can be named (and detailed if need be) in service contracts.
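The external/internal split can be illustrated in code. In this sketch the abstract class plays the part of the interface (the services a client can request, with their input and output flows named in the signatures), and the concrete class plays the part of the hidden internal component. All names are hypothetical.

```python
from abc import ABC, abstractmethod


class OrderingInterface(ABC):
    """External view: the services clients can request."""

    @abstractmethod
    def place_order(self, customer_id: str, product_type: str, quantity: int) -> dict:
        """Input flow: order details. Output flow: an acknowledgement."""

    @abstractmethod
    def get_order(self, order_id: int) -> dict:
        """Input flow: an order id. Output flow: the stored order details."""


class SalesSystem(OrderingInterface):
    """Internal view: processes and components hidden behind the interface."""

    def __init__(self):
        self._orders = {}
        self._next_id = 1

    def place_order(self, customer_id, product_type, quantity):
        order_id = self._next_id
        self._next_id += 1
        self._orders[order_id] = {
            "customer_id": customer_id,
            "product_type": product_type,
            "quantity": quantity,
        }
        return {"order_id": order_id, "status": "accepted"}

    def get_order(self, order_id):
        return dict(self._orders[order_id])


# A client depends only on the interface, not on how orders are stored.
service: OrderingInterface = SalesSystem()
ack = service.place_order("C42", "Widget", 3)
print(ack, service.get_order(ack["order_id"]))
```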
The table below shows a selection of formats from the list at https://library.uoregon.edu/datamanagement/fileformats.html.
It is drawn from UK Data Archive documentation; some of the data formats may be receding into history.
Popular modern formats include JSON for data flows, and OData for the description of web-accessible data stores.
Digital image data:
· TIFF version 6 uncompressed (.tif)
· JPEG (.jpeg, .jpg)
· PDF (.pdf)
Digital video data:
· MPEG-4 High Profile (.mp4)
· JPEG 2000 (.mj2)
Digital audio data:
· Free Lossless Audio Codec (FLAC) (.flac)
· Waveform Audio Format (WAV) (.wav)
· MPEG-1 Audio Layer 3 (.mp3)
Qualitative data, textual:
· eXtensible Mark-up Language (XML) text according to a Document Type Definition (DTD) or schema (.xml)
· Rich Text Format (.rtf)
· plain text data, ASCII (.txt)
· Hypertext Mark-up Language (HTML) (.html)
· widely-used proprietary formats, e.g. MS Word (.doc/.docx)
Documentation and scripts:
· Open Document Text (.odt)
· Rich Text Format (.rtf)
· HTML (.htm, .html)
· plain text (.txt)
· widely-used proprietary formats, e.g. MS Word (.doc/.docx) or MS Excel (.xls/.xlsx)
· XML marked-up text (.xml) to a DTD or schema, e.g. XHTML 1.0
· PDF (.pdf)
Quantitative tabular data with extensive metadata:
· SPSS portable format (.por)
· delimited text and command ('setup') file (SPSS, Stata, SAS, etc.) containing metadata information
· structured text or mark-up file containing metadata information, e.g. DDI XML file
· MS Access (.mdb/.accdb)
Geospatial data; vector and raster data:
· ESRI Shapefile (essential: .shp, .shx, .dbf; optional: .prj, .sbx, .sbn)
· geo-referenced TIFF (.tif, .tfw)
· CAD data (.dwg)
· tabular GIS attribute data
Quantitative tabular data with minimal metadata:
· comma-separated values (CSV) file (.csv)
· tab-delimited file (.tab), including delimited text of given character set with SQL data definition statements where appropriate
· delimited text of given character set; only characters not present in the data should be used as delimiters (.txt)
· widely-used formats, e.g. MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dBase (.dbf) and OpenDocument Spreadsheet (.ods)
The table below is edited from a table on the IBM web site.
Logical data model | Physical XML schema |
Schema | |
SchemaLocation (XSD file name) | |
TargetNamespace (unless set in the Properties page) | |
Atomic Domain | Simple Type |
Atomic Domain - Name | Name |
Domain Constraint | Facet (FractionDigits, TotalDigits, MaxLength, MinLength, Length, MaxExclusive, MinExclusive, MaxInclusive, MinInclusive, Enumeration, Pattern) |
Entity | Complex Type and Element |
Entity - Name | Name |
Entity - Documentation | Documentation |
Entity - Supertype of Generalization | BaseType of Complex Type |
Entity - Primary Key | Key of Element |
Generalization | See Entity |
Generalization Set | See Entity (with all applicable properties of the generalization set). |
Attribute | Contained Element with Simple Type |
Attribute - Name | Name |
Attribute - Documentation | Documentation |
Attribute - Data Type, Length/Precision, Scale | Type |
Attribute - Primary Key | Key field of containing Element |
Attribute - Entity | Owning Complex Type |
Relationship | Contained Element with Complex Type |
RelationshipEnd | Contained Element with Complex Type |
RelationshipEnd - VerbPhrase | Name |
RelationshipEnd - Cardinality | MinOccurs / MaxOccurs |
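A few rows of this mapping can be tried out with Python's standard XML library. The sketch below turns one entity into a complex type: attributes become contained elements with simple types, and a relationship end becomes a contained element named by its verb phrase, with its cardinality as minOccurs/maxOccurs. The entity, the verb phrase and the function `entity_to_complex_type` are hypothetical, and keys, facets and generalisation are left out.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def entity_to_complex_type(name, attributes, relationships=()):
    """Build an xs:complexType for one logical entity.

    attributes:    mapping of attribute name -> XSD simple type
    relationships: tuples of (verb phrase, target type, min occurs, max occurs)
    """
    complex_type = ET.Element(f"{{{XS}}}complexType", {"name": name})
    sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
    for attr_name, xsd_type in attributes.items():
        ET.SubElement(sequence, f"{{{XS}}}element",
                      {"name": attr_name, "type": xsd_type})
    for verb_phrase, target, min_occurs, max_occurs in relationships:
        ET.SubElement(sequence, f"{{{XS}}}element",
                      {"name": verb_phrase, "type": target,
                       "minOccurs": str(min_occurs), "maxOccurs": str(max_occurs)})
    return complex_type


customer_type = entity_to_complex_type(
    "Customer",
    {"CustomerId": "xs:int", "Name": "xs:string"},
    relationships=[("Places", "Order", 0, "unbounded")],
)
print(ET.tostring(customer_type, encoding="unicode"))
```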