Wisdom, Knowledge, Information and Data (WKID)
Copyright 2017 Graham Berrisford. One of about 300 papers at http://avancier.website. Last updated 25/03/2018 16:34
A role of enterprise architects is to observe and envisage information systems.
So, you might assume it is universally agreed what "information" is; but this is far from the case.
Is your telephone number data, information, or knowledge? Or all three?
Contents
Preface:
What is information? (repeated from system ideas)
Appendix:
Eight other ways people propose to differentiate data from information
“connected with system theory is… communication.
The general notion in communication theory is that of information.” Bertalanffy
Our main interest is in social and business systems in which animate and/or computer actors exchange information.
The Oxford English Dictionary lists more than half a million words.
Consider data, information, knowledge and wisdom; also signal, symbol, description, representation, meaning and model.
Given those ten words, how many clearly distinct concepts are there?
You may have come across something called a WKID triangle, pyramid or hierarchy.
This version is compatible with the information and communication theories that follow.
Wisdom | The ability to respond effectively to knowledge |
Knowledge | Information that is accurate or true enough to be useful |
Information | Meaning created or found in a structure by an actor |
Data | A structure of matter/energy in which information has been created or found |
All communication utilises a structure
The medium for information storage or communication is a matter/energy structure of some kind.
To communicate, animals use sound waves (calls), smells, gestures, etc.
Humans use sound waves, written text, flags, etc.
Computers use electronic signals, radio waves, etc.
Every structure has information potential
There are infinite structures in the matter/energy of the universe.
Some equate structure with information.
Here, we say a structure has information potential to actors.
There is actual information when actors use some information potential to create or obtain a meaning.
There is information potential in the variable | There is actual information when |
angle of the sun’s rays | a human reads the time from the shadow on a sundial; a sunflower perceives the position of the sun and turns to face it |
nerve impulses (electrical charges) | an actor responds by removing its hand from a hot plate |
bending of a bi-metal strip | a thermostat responds by switching a heater on or off |
movements of a honey bee | honey bees dance to communicate a location of pollen |
open or closed state of an office door | actors share a vocabulary in which an open door means “you have permission to enter” |
lengths of dots and dashes (in sound, light, braille…) | actors use Morse code to communicate |
quantity in a number | an actor says 20 in reply to a request for a fact (say, the speed of a bicycle in miles per hour) |
Information is meaningful to its sender and/or receiver
Senders encode meanings in data structures, and receivers decode meanings from them.
The meanings include descriptions, directions, decisions and requests for them.
Descriptions are usually divided into facts (tasty, tall, scary) about things (say, food, friends and enemies) that actors perceive as discretely identifiable.
Information has at least one sender and/or receiver
A sender (a voice crying in the wilderness) may create information in a data structure that no receiver inspects.
A receiver may find some information in a data structure that was not intentionally sent.
E.g. The sun radiates a flow of light towards a rotating earth.
A sunflower finds a direction to turn its face to optimise its energy consumption.
Different actors can find different information in the same data structure
E.g. The sun radiates a flow of light towards a rotating earth.
A sunflower finds a direction to turn its face to optimise its energy consumption.
One man reads the shadow on a sundial as describing the hour of the day.
Another concludes that the sun rotates around the earth; another that the earth spins on its axis.
E.g. the data structure in a DNA molecule may be decoded by a biological cell as instructions for making proteins.
And decoded by a human reader of the genome as carrying a gene for some life-shortening condition.
Neither actor can read and act on the data structure as the other does.
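The point that one data structure can yield different information to different actors can be sketched in a few lines. This is only an illustration under stated assumptions: the two-byte structure and the two reader functions are hypothetical, chosen to show that the meaning comes from the decoding an actor applies rather than from the structure itself.

```python
# One data structure: two bytes. Nothing in the bytes fixes their meaning.
data = bytes([0x48, 0x69])


def read_as_text(structure: bytes) -> str:
    """Hypothetical actor 1: decodes the structure as ASCII text."""
    return structure.decode("ascii")


def read_as_number(structure: bytes) -> int:
    """Hypothetical actor 2: decodes the same structure as a big-endian integer."""
    return int.from_bytes(structure, byteorder="big")


text_meaning = read_as_text(data)      # the greeting "Hi"
number_meaning = read_as_number(data)  # the quantity 18537
```

Neither reader is wrong; each finds the information its own language allows it to find.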
To communicate requires sharing a data structure and a language
First, the data structure of a message must be preserved (a concern of Shannon’s theory).
Second, creators and users must share a language for encoding and decoding that data structure.
Two things can go wrong.
First, the data structure is distorted between sender and receiver.
E.g. Speaker says: “Send reinforcements we are going to advance.”
Listener hears: “Send three and four pence we are going to a dance.”
The intended signal is distorted at some point between sender and receiver.
Shannon’s information theory is about preserving the integrity of a data structure.
Second, creators and users use different languages to encode and decode a data structure.
Or the ambiguity of natural language disables communication.
E.g. Speaker says: “He fed her cat food.”
Listener 1 hears: He fed her cat – food (He fed a woman’s cat some food).
Listener 2 hears: He fed her - cat food (He fed a woman some food that was intended for cats).
Listener 3 hears: He fed - her cat food (He somehow fed the cat food that a woman owned).
In business systems, the presumption is that things do not go wrong.
Data structures are preserved perfectly.
Senders and receivers apply the same language to writing and reading them, or perfect translations are made.
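The two failure modes above can be sketched with a small Morse example. The four letter codes are genuine International Morse values, but the codebook is deliberately partial, and the alternative codebook and the way the signal is distorted are hypothetical choices made for illustration.

```python
# A deliberately partial Morse codebook (these four codes are standard Morse).
MORSE = {"S": "...", "O": "---", "E": ".", "T": "-"}

# A hypothetical second "language" that maps the same symbols to other letters.
OTHER = {"S": "---", "O": "...", "E": "-", "T": "."}


def encode(message: str, codebook: dict) -> str:
    """Sender: encode each letter as a symbol, separated by spaces."""
    return " ".join(codebook[letter] for letter in message)


def decode(signal: str, codebook: dict) -> str:
    """Receiver: map each symbol back to a letter ('?' if unknown)."""
    reverse = {code: letter for letter, code in codebook.items()}
    return "".join(reverse.get(code, "?") for code in signal.split(" "))


signal = encode("SOS", MORSE)

# No failure: the structure is preserved and the language is shared.
received_ok = decode(signal, MORSE)

# Failure 1: the structure is distorted in transit (two dashes are lost).
distorted = signal.replace("---", "-", 1)
received_distorted = decode(distorted, MORSE)

# Failure 2: the structure is intact, but the receiver applies another language.
received_misread = decode(signal, OTHER)
```

In the first failure the receiver decodes faithfully but from a damaged structure; in the second the structure arrives intact but is read with the wrong language.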
Information is a subjective view of a data structure
The information in a data structure depends on senders and/or receivers and the languages they use.
E.g. I leave my office door open.
Case 1: I do it deliberately, to signal that I am open to visitors; you read the door as saying I am open to visitors, and enter my office.
Case 2: I do it by accident, but am open to visitors anyway; you misread the door as saying I am open to visitors, and enter my office.
Case 3: I do it by accident, but am not open to visitors; you misread the door as saying I am open to visitors, and enter my office.
Any meaning created or found in a message or memory structure is information to that actor
An actor can change their mind about the information found in a message.
E.g. I say the swimming pool is warm; you hear and act on that information by diving in.
It turns out the swimming pool is cold, and you now recall the information as a lie.
What a sender considers true, a receiver may consider false, and vice versa.
Knowledge is information that is true enough to be useful
The accuracy or truth of information is a matter of degree.
Knowledge is information that is true enough to be useful (e.g. Newton’s laws of motion).
Sometimes what we say can be tested by measurement of meaning against reality.
But all measurement has a degree of accuracy, and even Newton’s laws of motion are approximations.
Read Information for a longer discussion.
Read Knowledge and truth for exploration of that topic.
The universe may be continuous, but animals decode apparently continuous signals into discrete facts.
How our brains and computers work (at the neuron or electronic level) is not important here.
We make sense of the world by chunking perceptions of the continuous universe into discrete entities (you, me, your lunch, or a bicycle ride).
We describe the world in terms of discrete facts (you are tall, I am old, lunch is tasty, and this bicycle is speeding at 20 mph).
These facts are models or coded representations of things we perceive as discrete entities.
Social system members have two mechanisms for sharing the language, rules and state of a social system.
· Communication - by messages sent from senders to receivers.
· Recording - by storage in a shared memory that all actors can access.
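As a rough sketch of these two mechanisms, the snippet below contrasts a message delivered to one receiver with a record kept in a memory that any actor can read. The actor names, the inbox and the stored fact are hypothetical, introduced only for illustration.

```python
from collections import deque

# Communication: a message travels from a sender to one receiver's inbox.
inbox_of_bob = deque()
inbox_of_bob.append({"sender": "alice", "fact": "the pool is warm"})
message = inbox_of_bob.popleft()  # once read, the message has left the inbox

# Recording: a fact is stored in a shared memory that every actor can access.
shared_memory = {}
shared_memory["pool temperature"] = "warm"
read_by_bob = shared_memory["pool temperature"]
read_by_carol = shared_memory["pool temperature"]  # still there for later readers
```

A message is consumed by its receiver, whereas a record persists for any actor that knows where to look and how to read it.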
Social systems range from the informal to the formal.
Business systems evolved from social systems by the formalisation of social communication.
Gradually, the transactions of government and commerce were standardised.
The language and behavioural rules of a business system, once agreed, are stable for a system generation.
The business system is directed according to those rules, until they are changed in a new system generation.
Data | A structure of matter/energy in which information has been created or found. Any feature or part of a signal that is mappable to a language or data model. |
A structure becomes information when it is mapped to a language.
Honey bees use the language of dance to signal the locations of pollen sources; situation-specific data is found in particular dance movements.
The bees’ language has to be generalised, independent of particular bees, particular dances and particular pollen sources.
Analysing an example signal
The sentence below is one particular physical signal, formed of characters and spaces that you can read on your screen.
“Jack Jones has booked seat 35F in coach E on the fast train to Newcastle Central from London Kings Cross leaving at 18.00 hours tomorrow evening.”
You can surely read all the meanings that I encoded in this signal, because you and I share the same general-purpose language.
The sentence contains at least seven separable facts, which each associate a thing with a property or another thing.
· There is a train to Newcastle Central from London Kings Cross.
· The train leaves at 18.00 tomorrow evening.
· The train is a fast train.
· The train has a coach E.
· Coach E has a seat 35.
· Seat 35 is forward facing.
· Jack has booked a train seat.
The sentence implies some further facts:
· London has a station called Kings Cross.
· Newcastle has a station called Central.
Given some sentences similar to the one above, you can abstract at least ten domain-specific types.
And if you like to generalise further, you can abstract super types from those types.
Note that you can’t classify all things under one super type only, because there is multiple inheritance. E.g. a Station is both a Place and a Passive object.
Particular thing or property | Domain-specific type | More generic type | Even more generic type |
Jack Jones | Customer | Person (or Party) | Entity |
E | Coach | Passive object | Entity |
seat 35 | Seat | Passive object | Entity |
Newcastle Central | Station | Place (or Passive object) | Entity |
London Kings Cross | Station | Place (or Passive object) | Entity |
F (= forward facing) | Seat direction | Property type | Attribute |
Fast train | Train journey duration | Property type | Attribute |
To/from (= station role in journey) | Arrival/Departure Place | Property type | Attribute |
has booked | Booking | Process | Event |
leaving at 18.00 tomorrow evening | Departure time and date | Point in time | Event |
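The remark about multiple inheritance can be sketched as a small class hierarchy. The class names follow the table; treating the generalisation as Python subclassing is an assumption made for illustration, not something the analysis prescribes.

```python
class Entity:
    """The most generic type in the table."""


class Place(Entity):
    """A more generic type: somewhere a journey can start or end."""


class PassiveObject(Entity):
    """A more generic type: a thing that is acted upon."""


class Station(Place, PassiveObject):
    """A domain-specific type with two super types."""

    def __init__(self, name: str) -> None:
        self.name = name


class Seat(PassiveObject):
    """A domain-specific type with only one super type."""


# A particular thing, classified under both super types at once.
newcastle_central = Station("Newcastle Central")
is_place = isinstance(newcastle_central, Place)
is_passive_object = isinstance(newcastle_central, PassiveObject)
```

A single-inheritance tree would force a choice between the two super types, which is the difficulty the note describes.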
Data models
The domain-specific language used in a digital information system is commonly defined in what is known as a data model.
A data model is composed of generalised data types (e.g. Customer, City).
The data types are instantiated as data items (e.g. Jack Jones, London) in particular messages and records created and used by actors.
A data model is a language for communication of information between members of a specific business/social system.
This language stands apart from, and is independent of, particular actors, particular signals, and particular information.
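To make the type/item distinction concrete, here is a minimal sketch of a data model for the booking sentence analysed earlier. The choice of dataclasses, the field names and the way facts are grouped into types are assumptions made for illustration; the relative date "tomorrow" is left out because it cannot be fixed without knowing when the sentence was written.

```python
from dataclasses import dataclass
from datetime import time


# Data types: general, and independent of any particular message or actor.
@dataclass(frozen=True)
class Station:
    city: str
    name: str


@dataclass(frozen=True)
class Seat:
    coach: str
    number: int
    direction: str  # "F" = forward facing


@dataclass(frozen=True)
class Booking:
    customer: str
    seat: Seat
    origin: Station
    destination: Station
    departure: time
    fast: bool


# Data items: the types instantiated for one particular message.
booking = Booking(
    customer="Jack Jones",
    seat=Seat(coach="E", number=35, direction="F"),
    origin=Station(city="London", name="Kings Cross"),
    destination=Station(city="Newcastle", name="Central"),
    departure=time(18, 0),
    fast=True,
)
```

The class definitions play the part of the shared language; the `booking` object is one signal written in that language.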
Knowledge | Information that is accurate or true enough to be useful |
Consider these reasonable uses of the verb “to know”.
1. You know your son’s name
2. You know your son likes ice cream.
3. You know to remove your hand from a hot plate (that knowledge is in your spinal column).
4. You know that removing your hand is an autonomic action (completed before any signal reaches your brain).
5. You know you know all the discrete facts above; that is to say, you are self-aware.
Knowledge is an ambiguous term
French and German languages have different words for the knowledge of recognition and the knowledge of understanding.
Wisdom | The ability to respond effectively to knowledge |
Wisdom ought to help actors and/or the society they live in to flourish.
Is wisdom transient or persistent? Can an elephant be wise?
There is evidence that chimpanzees and elephants have considerable self-awareness.
Surely wisdom is the ability to generate directions in new situations by introspection of remembered information?
So far, we haven’t needed to discuss actors having conscious aims, having free will or making choices.
However, wisdom implies an ability to be consciously introspective, to recall and analyse remembered information.
This WKID table is compatible with the information and communication theories above.
Having | Means |
Wisdom | The ability to respond effectively to knowledge |
Knowledge | Information that is accurate or true enough to be useful |
Information | Meaning created or found in a structure by an actor. Meaning encoded in a signal by a signal creator or decoded from a signal by a signal user. |
Data | A structure of matter/energy in which information has been created or found. Any feature or part of a signal that is mappable to a language or data model. |
The language used to encode/decode signals may be shared by signal senders and receivers, but is independent of them.
So, what about the data/information ambiguity in enterprise architecture?
Enterprise architecture is about business roles and processes that create and use information.
Or should that be data?
Enterprise data architects typically define their domain along these lines.
"Data architecture defines business data in terms of relationships between the following data elements:
· Data stores and data flows created and used by business activities.
· Data structures contained in data stores (usually defined in terms of data entities).
· Data structures contained in data flows (such as messages).
· Data qualities (meta data) including data types, confidentiality, integrity and availability.
Architects may relate these data elements to business activities and to business applications."
However, data architects do also speak of information and information systems.
They use the term "data" variously, sometimes as a synonym for “signal” or for “information”.
And when they speak of data in storage, they are usually thinking of the information, not the raw signals.
Suppose we agree data = physical signals, and information = meanings given to signals by actors?
OK, then which is the correct term, data store or information store? database or information base?
Both are correct, since a database does store signals in a physical form.
And what it stores does represent facts meaningful to actors who read/write those signals.
And so, the data/information terminology ambiguity will persist in EA frameworks.
Asking enterprise architects to distinguish data from information in a disciplined way simply doesn’t work.
It is easier to let people use the terms “data” and “information” interchangeably, choosing whichever suits their audience.
The proposal above: data is any feature or part of a signal that is mappable to a language or data model.
However, many other distinctions have been drawn between data and information.
This section outlines many and diverse ways in which people think to distinguish data from information.
Some distinctions are human-centric, some are about chunking, and some are about idealisation.
Our domain is the use of information in business systems.
But there was information before business, and before humans.
So the first three distinctions seem arrogant, since they elevate humans over machines, humans over animals, and managers over operators.
1. Data is computerised: information is human?
Data architects distinguish data at rest (in stores) and data in motion (in flows).
Some say data flows created or used by computers are data, whereas data flows created or used by humans are information.
And thus, some enterprise architecture sources distinguish between data and information architecture.
But is there a sound basis for treating information as the preserve of humans?
Computers mimic social systems in sharing vocabularies, grammars and rules for reading and acting on signals.
The first business computers played roles formerly played by humans; and computers still act as proxies for humans.
2. Data is binary digits: information is verbal?
This distinction might be seen as a rephrasing of the computer / human distinction above.
But it is misleading, since information is conveyed in other ways than by words.
And humans invented binary digits long before computers were invented.
Ones and zeros are meaningful information to any humans who manipulate them in binary arithmetic.
3. Data is facts used by operators: information is facts aggregated or analysed from operational facts?
This distinction assumes a kind of information stack.
It treats operators as bottom level actors, and managers as the topmost actors.
The distinction is subjective, as can be seen in the fact that one person’s atomic information is another’s summary data.
The total of available stock items can be seen as an atomic fact, or an abstraction from thousands of stock items.
Your current account balance can be seen as an atomic fact, or a total abstracted from thousands of credit and debit transactions.
Every cardinal number (amount of stock available) and value judgement (heavy, good, nearly finished) is both an atomic fact and summary data.
Information is created and used by actors at different levels of management for different reasons.
“Management reports typically aggregate individual cost transactions into a total for (say) a chemical plant in a factory.
This is of no interest to the junior accountant responsible for successful business transactions, but very useful to the factory accountant.
However, the CEO sees even plant totals as bits of data – needs data aggregated to the whole company and trended over time.
Good BI reports let you drill down from very top to very bottom to provide assurance the high level number is accurate.
And they enable analysis of exceptional behaviour or actors, such as, which plant costs the most to run.”
(See “Big Data and Business Intelligence” on the avancier.website for more from Rick Anderson.)
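The claim that one actor's summary is another actor's atomic fact can be sketched with a small aggregation. The plant names and cost figures below are invented for illustration; only the pattern (roll up, compare, drill down) comes from the text.

```python
from collections import defaultdict

# Hypothetical cost transactions as (plant, amount) pairs; all figures invented.
transactions = [
    ("Plant A", 120.0),
    ("Plant A", 80.0),
    ("Plant B", 300.0),
    ("Plant B", 50.0),
    ("Plant C", 40.0),
]

# Factory accountant's view: each plant total is a single fact.
plant_totals = defaultdict(float)
for plant, amount in transactions:
    plant_totals[plant] += amount

# CEO's view: the plant totals are themselves raw inputs to a company total.
company_total = sum(plant_totals.values())

# Analysis of exceptional behaviour: which plant costs the most to run?
costliest_plant = max(plant_totals, key=plant_totals.get)


def drill_down(plant: str) -> list:
    """Return the individual amounts behind one plant's total."""
    return [amount for name, amount in transactions if name == plant]
```

The same number, a plant total, is a summary when seen from below and an input when seen from above, which is why the distinction is relative to the actor.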
These three distinctions are about how signals are divided into facts, or atomic facts are aggregated into complex facts.
4. Data is singular: information is plural?
At your local railway station, you read an ordered list of train departure times on a notice board.
At the same time, you hear one train departure time announced over the Tannoy.
Surely, the single item of news you hear gives you more accurate information than the list on the notice board?
5. Data is the atoms of (molecular) information?
In reply to the question "How good is Jack's diet?" comes the one word answer "Poor."
That one word sentence conveys one atom of information to the questioner.
The atom of information is not the word, but rather the fact that Jack’s diet is poor.
"Jack is short and has a poor diet".
This sentence contains two discrete facts: Jack is short; Jack has a poor diet.
Analysis of many sentences of the same kind may reveal a third fact: Poor diet tends to stunt height.
Words give us a vocabulary for communicating facts about discrete entities and events, in signals.
Surely each discrete fact is information in its own right? To call it data is entirely arbitrary?
6. Data is discrete: information is continuous? (Or the reverse)
To send or perceive information, actors need to make or find some distinguishable variety in the flow or structure of a signal.
Some signals are continuously varying (the angle of the light from the sun).
Some signals are chunked into discrete elements (the light flashes in Morse code).
Information can be extracted from a continuous signal, such as the time of day on a sundial.
The angle of a bi-metal strip is another continuously-varying variable.
So the information read from these signals can be continuous also.
However, actors have evolved to make sense of the world by chunking signals into discrete elements.
Actors that can match new entities and events to a remembered pattern or type have an advantage in life.
Nobody knows how the brain works, but we do know it can store and retrieve discrete facts, e.g. telephone numbers.
And business systems clearly divide information into discrete facts, e.g. names, addresses, and account balances.
They assume people can write, read, remember, consider and act on discrete facts about business entities and events.
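Chunking a continuous signal into a discrete fact can be sketched with the sundial example. The function below is a simplification under stated assumptions: it supposes an equatorial sundial, on which the shadow moves 15 degrees per hour, with the angle measured from solar noon and positive in the afternoon. The function name and sign convention are hypothetical.

```python
import math


def hour_from_shadow(angle_from_noon_deg: float) -> int:
    """Chunk a continuous shadow angle into a discrete hour of the day.

    Assumes an equatorial sundial: 15 degrees per hour, 0 degrees at
    solar noon, negative before noon and positive after.
    """
    return 12 + math.floor(angle_from_noon_deg / 15)


# Infinitely many distinct angles collapse into the same discrete fact.
early_afternoon = hour_from_shadow(45.0)
later_same_hour = hour_from_shadow(59.9)
mid_morning = hour_from_shadow(-20.0)
```

The continuous variable has unlimited information potential; the reader extracts only as many distinctions as its vocabulary of hours allows.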
7. Data is objective: information is subjective? (Or the reverse)
Many modern philosophers and scientists take the view that all is subjective; perception is reality.
The matter and energy of the universe is not directly knowable; it is knowable only in a subjective measurement or description of it.
“If you accept quantum physics at face value then at least one of two dearly held principles from the classical world must give....
One is realism, the idea that every object [particular] has properties [instances of typical characteristics] that exist without you measuring them."
Anil Ananthaswamy “New Scientist” 13 December 2014
“A supposed “reality” that is “outside” of every logical possibility of empirical or logical interaction with “it” can play no direct role in the sciences.
Science can deal only with phenomena, that is to say, only with what can “appear” somehow in experience.
All scientific concepts must somehow be traceable back to phenomenological roots [in the study of subjective experience].”
http://plato.stanford.edu/entries/peirce/index.html
If all is subjective, then how can actors use the matter and energy of the universe to communicate?
Sending and receiving actors must share a common vocabulary and grammar for writing/reading the information in signals.
You might reasonably say the meaning of that information is objective within the bounds of the social system the actors are members of.
8. Data is physical: information is logical?
This equates data with signals, and information with meaning created or used in signals.
It says data (in its energy or matter form) is physical or concrete, whereas information (meanings) is logical or abstract.
Do you distinguish the terms “data” and “information” in one of the many ways listed above? Or another way?
Whatever data/information distinction you prefer, you can test it on the exercises below.
Exercise 1: How would you reword the definition of enterprise data architecture below?
"Data architecture: defines business data in terms of relationships between the following data elements:
· Data stores and data flows created and used by business activities.
· Data structures contained in data stores (usually defined in terms of data entities).
· Data structures contained in data flows (such as messages).
· Data qualities (meta data) including data types, confidentiality, integrity and availability.
Architects may relate these data elements to business activities and to business applications."
Exercise 2: In any sentence below, at any point, does substituting “data” for “information” change the sentence’s meaning?
A human receives information from another human and stores the received information.
A human receives information from a computer and stores the received information.
A computer receives information from a human and stores the received information.
A computer receives information from another computer and stores the received information.
Exercise 3: When and where in the story below do you see your friend’s address as data, information, or knowledge?
You ask your new friend for his address.
He brings his address to mind, and tells it to you.
You copy it into your address book.
Before you visit him, you open the address book and use it to find his house on a map.
Then you use the map to find his house.
Next time you visit, you don't need the address book or map, because you remember the way.
But you get into an argument about what data means, and he doesn't invite you back for ten years.
Now, you have forgotten his address and have to use the address book and map again.
Footnote:
Creative Commons Attribution-No Derivative Works Licence 2.0 18/08/2013 11:18
Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.website” before the start and include this footnote at the end.
No Derivative Works: You may copy, distribute, display only complete and verbatim copies of this page, not derivative works based upon it.
For more information about the licence, see http://creativecommons.org