Big Data + BI
This page is published under the terms of the licence summarised in the footnote.
These notes were written on 29 May 2015 with the help of Rick Anderson.
The proposal, then and now, is that big data should be distinguished from data warehouse data.
Business intelligence (BI) is a separate thing again, and big data + BI is a special case.
All these distinctions can be drawn with little reference to technology.
Three classes of data store may be distinguished.
A business database is a data store of structured and strongly typed business data.
The data is collected for the purposes of monitoring and/or directing individual business entities and events.
Business databases can be made faster by using solid state drives and in-memory storage.
These enable tens of thousands of transactions per second, and greatly speed up enquiries and reports.
A data warehouse holds a large volume of structured and strongly typed business data.
The data is usually collected from business databases.
The data is organised to enable analysis and summary reports for management information, whether enterprise-wide or departmental.
Data warehouses are often optimised for data retrieval by the use of non-relational data structures (column stores, key-value stores etc.).
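The difference between a row layout and a column layout can be sketched in a few lines of Python. This is a minimal illustration of the idea, not how any particular column store is implemented: it assumes a hypothetical sales table with fields `region`, `product` and `amount`, and pivots the rows into one list per column, so that a summary query reads only the column it needs.

```python
from typing import Dict, List

# Hypothetical sales records, held row by row as a business database might store them.
rows: List[dict] = [
    {"region": "North", "product": "Widget", "amount": 120.0},
    {"region": "South", "product": "Gadget", "amount": 80.0},
    {"region": "North", "product": "Gadget", "amount": 50.0},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot row-oriented records into a column-oriented layout (one list per column)."""
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# A summary query touches only the "amount" column, not every field of every row.
total_amount = sum(columns["amount"])
print(total_amount)  # 250.0
```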
Data warehousing practices include cleansing, sorting, transforming and aggregating data.
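These four practices can be sketched as a tiny pipeline in Python. It is a minimal sketch under stated assumptions, not a production ETL tool: the raw records, the `region` and `amount` fields and the cleansing rules (trim and title-case region names, drop records with no amount) are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical raw extract from a business database: inconsistent casing and
# spacing, a missing value, and amounts held as text.
raw = [
    {"region": " north", "amount": "120.50"},
    {"region": "South", "amount": "80"},
    {"region": "NORTH", "amount": None},
    {"region": "south ", "amount": "19.50"},
]


def cleanse(record: dict) -> Optional[dict]:
    """Drop records with no amount and normalise the region name."""
    if record["amount"] is None:
        return None
    return {"region": record["region"].strip().title(), "amount": record["amount"]}


def transform(record: dict) -> dict:
    """Convert the amount from text to a number."""
    return {**record, "amount": float(record["amount"])}


def aggregate(records: List[dict]) -> Dict[str, float]:
    """Total the amounts per region."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)


cleansed = [c for c in map(cleanse, raw) if c is not None]
transformed = sorted((transform(c) for c in cleansed), key=lambda r: r["region"])
summary = aggregate(transformed)
print(summary)  # {'North': 120.5, 'South': 99.5}
```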
Data warehouses are often associated with specific BI tools.
Big data is the kind of data that is not best captured in a traditional business database or data warehouse.
A big data store holds a very large volume of rapidly accumulated and possibly unstructured and/or weakly typed data.
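Weakly typed or partly unstructured data is often interpreted only when it is read (sometimes called “schema on read”), rather than being forced into a fixed schema when it is stored. A minimal Python sketch of that idea, assuming hypothetical JSON-lines sensor events whose fields and types vary from record to record:

```python
import json
from typing import Iterable, Iterator, Tuple

# Hypothetical machine-generated events: each line is meant to be JSON, but
# fields vary between records and types were not enforced when stored.
raw_lines = [
    '{"sensor": "s1", "temp": 21.5}',
    '{"sensor": "s2", "temp": "22", "humidity": 40}',
    '{"sensor": "s3"}',
    'not json at all',
]


def read_temperatures(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Apply a schema at read time: yield (sensor, temperature) pairs that can be interpreted."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # unstructured noise: skip it rather than fail the whole read
        if not isinstance(event, dict) or event.get("temp") is None:
            continue  # no temperature in this record
        try:
            yield event.get("sensor"), float(event["temp"])
        except (TypeError, ValueError):
            continue  # temperature present but not interpretable as a number


readings = list(read_temperatures(raw_lines))
print(readings)  # [('s1', 21.5), ('s2', 22.0)]
```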
The data records natural, human or machine behaviour that has not traditionally been monitored or directed by business systems.
Big data stores are made faster by employing a variety of data structures and query languages.
Big data is often described in terms of the 3Vs.
Volume: many terabytes collected over a short period of time.
Variety: ranging from social media data to the 1.5 billion pieces of data generated per car per grand prix.
Velocity: storing a high volume and variety of data slows down data retrieval, which implies the need for faster technologies.
BI is about analysing large volumes of data to extract useful information from it.
One would think that the more data there is the better, but you can drown in data while thirsting for insight.
BI helps by aggregating, summarising and reporting on data.
It should help people to draw conclusions and take decisions based on facts, rather than drowning in data.
BI can potentially be extracted from all three kinds of data store above.
The trick is to enable large volumes of data to be read in a short time.
All three kinds of data store can be optimised to support queries and reports.
But the larger the volume of data, the more likely it is that you need a data warehouse or big data store.
BI can help you validate raw data.
“Many times a business person has said to me ‘your report must be wrong’.
But when we drill down to the line items we find the business data entry was wrong.
For instance, someone once entered a repair for 10x million Korean Won instead of x thousand, indicating a £1,000,000 repair on a £1,000 container.
It wasn’t picked up until the management report was seen.
So good reporting can reveal data quality issues and help them to be fixed.” Rick Anderson
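Rick's example suggests a simple plausibility rule that a report could apply to line items. The sketch below assumes hypothetical fields `repair_cost_gbp` and `container_value_gbp` (costs already converted to pounds) and a hypothetical rule that a repair costing more than the container is worth deserves a second look; row R2 mirrors the £1,000,000 repair on a £1,000 container.

```python
from typing import List

# Hypothetical repair line items; costs already converted to pounds sterling.
repairs = [
    {"id": "R1", "repair_cost_gbp": 250.0, "container_value_gbp": 1_000.0},
    {"id": "R2", "repair_cost_gbp": 1_000_000.0, "container_value_gbp": 1_000.0},
    {"id": "R3", "repair_cost_gbp": 900.0, "container_value_gbp": 1_200.0},
]


def suspicious_repairs(items: List[dict], max_ratio: float = 1.0) -> List[str]:
    """Return ids of repairs costing more than max_ratio times the container's value.

    Such line items are more likely to be data-entry errors than real repairs,
    so they are flagged for a person to check rather than corrected automatically.
    """
    return [
        item["id"]
        for item in items
        if item["repair_cost_gbp"] > max_ratio * item["container_value_gbp"]
    ]


print(suspicious_repairs(repairs))  # ['R2']
```

Tightening `max_ratio` (say to 0.5) flags more line items for review; the right threshold is a business judgement, not something the code can decide.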
BI can help you visualise data.
Tools can capture or read large volumes of data and display them in dashboards, storyboards etc.
They can pick out (or help people to pick out) trends, clusters, variances and “outliers” in the data sources.
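One common way to pick out outliers is the modified z-score, which measures each value's distance from the median in units of the median absolute deviation (MAD) and, following a widely used rule of thumb, flags scores above 3.5. The Python sketch below is one such technique among many, not the method of any particular BI tool; the daily order counts are hypothetical.

```python
import statistics
from typing import List


def mad_outliers(values: List[float], threshold: float = 3.5) -> List[float]:
    """Flag outliers using the modified z-score based on the median absolute deviation (MAD).

    The median and MAD are robust: a single extreme value barely moves them,
    unlike the mean and standard deviation.
    """
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # at least half the values equal the median, so the score is undefined
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]


# Hypothetical daily order counts with one suspicious spike.
daily_orders = [102, 98, 105, 99, 101, 97, 480]
print(mad_outliers(daily_orders))  # [480]
```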
Big data + BI is used to analyse data captured from human and machine behaviour beyond traditional structured business data.
Data on the use of social media and search (e.g. Twitter and Google) is analysed to target advertisements.
Big data + BI can enable “real-time” business intelligence, rather than intelligence that is only available after an overnight upload to a data warehouse.
The 1.5 billion pieces of data generated per car in a grand prix are used to create a real-time model of each car’s progress.
Note that this model cost Lewis Hamilton the Monaco Grand Prix when it indicated he should pit, allowing him to be overtaken!
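Real-time BI replaces the nightly recalculation with figures that are updated as each reading arrives. A minimal Python sketch, assuming hypothetical timestamped readings (in seconds) that arrive in time order, keeps a rolling average over a sliding time window; the class name and window size are illustrative only.

```python
from collections import deque
from typing import Deque, Tuple


class RollingAverage:
    """Maintain the average of readings received within the last `window_seconds`.

    Each new reading updates the figure immediately, instead of waiting for a
    nightly batch load to recompute it.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._readings: Deque[Tuple[float, float]] = deque()  # (timestamp, value)
        self._total = 0.0

    def add(self, timestamp: float, value: float) -> float:
        """Record a reading (timestamps must arrive in order) and return the current average."""
        self._readings.append((timestamp, value))
        self._total += value
        # Evict readings that have fallen out of the time window.
        while self._readings and self._readings[0][0] <= timestamp - self.window_seconds:
            _, old_value = self._readings.popleft()
            self._total -= old_value
        return self._total / len(self._readings)


avg = RollingAverage(window_seconds=10)
print(avg.add(0, 100.0))   # 100.0
print(avg.add(5, 110.0))   # 105.0
print(avg.add(12, 130.0))  # 120.0 (the reading at t=0 has left the window)
```

The deque keeps each update cheap: new readings are appended at one end and expired readings are removed from the other, so the figure never has to be recomputed from scratch.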
Footnote: Creative Commons Attribution-No Derivative Works Licence 2.0 29/05/2015 15:50
Attribution: You may copy, distribute and display this copyrighted work only if you clearly credit “Avancier Limited: http://avancier.website” before the start and include this footnote at the end.
No Derivative Works: You may copy, distribute and display only complete and verbatim copies of this page, not derivative works based upon it.
For more information about the licence, see http://creativecommons.org