“Big Data” has become a key driver in business project delivery, but do vast reservoirs of data translate into actionable engagement?
Look at the business pages of almost any newspaper or attend a business conference and you are more than likely to come across the terms Big Data, Data Lake and now Data Pond.
Whether it’s from the page, the lectern or the foyer, the subtext is that the conversation is all about data – and lots of it.
The primary business context for these terms relates to an organisation’s ability to achieve the “360 view” of the customer. Once the data is gathered, the objective is to use this vast amount of information to provide the ultimate tailored customer experience.
Big Data offers the potential to engage millions of people simultaneously while targeting each individual in a bespoke manner. The end game of gathering and organising these vast reservoirs of data is to use them to increase customer awareness and engagement to a level not possible before.
To achieve this level of performance and deliver that 360 view of the customer, organisations have turned to new technologies, and the corresponding buzzwords have emerged with them.
This is all great in theory, but let’s take a look at what Big Data, Data Lakes and Data Ponds really are. How do they differ from one another and why are they needed?
Here is the Quay Quick Reference Guide:
Definition: Big Data
Gartner defines Big Data as “high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making”.
To the uninitiated this could be seen as serving the same purpose as the Data Warehouse (a logical representation of clean, structured and vetted data). The major differences are the sheer volume of data, the mix of structured and unstructured data, and the speed (velocity) at which it is collected and utilised.
To give you an idea of the volume of data being collected:
- Facebook alone ingests 500 terabytes of new data every day.
- Walmart handles 1 million transactions every hour.
- A Boeing 737 will generate 240 terabytes of flight data during a single flight.
The variety of data collected includes a mix of structured data, semi-structured data, and unstructured data such as free text, photos, audio and video, along with geospatial data.
Definition: Data Lakes
Gartner refers to Data Lakes in broad terms as “enterprise-wide data management platforms for analysing disparate sources of data in its native format”.
Data Lakes have emerged to address the need for greater agility and accessibility in data analysis, since traditional data warehouses have proved too rigidly structured to manage the volume, velocity and variety of data being collected.
Simply put, instead of placing data in a purpose-built data store or data warehouse, you move it into a Data Lake in its original format, where it is available to everyone. This avoids the cost and time of the transformation required with data warehouses, but it comes with some limitations because the data is stored in its raw form.
Thus the short-term benefit of a Data Lake is avoiding the upfront cost and effort of understanding, translating and categorising the data for structured relational storage (i.e. the Data Warehouse model). The downside is that the data is harder to access because it is kept in a rawer form.
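For readers who like to see the trade-off in concrete terms, here is a minimal sketch in Python. It is an illustration only, not a description of any particular product: the sample events, the column list and the function names are all hypothetical. The warehouse-style load transforms and validates every record up front and rejects whatever does not fit, while the lake-style load stores everything untouched and leaves the interpretation to whoever reads the data later.

```python
import json

# Hypothetical raw events, as they might arrive from different source systems.
RAW_EVENTS = [
    '{"customer_id": 1, "amount": "19.99", "channel": "web"}',
    '{"customer_id": 2, "amount": "5.00"}',
    '{"cust": 3, "note": "called support"}',
]

# Hypothetical fixed structure that the warehouse table expects.
WAREHOUSE_COLUMNS = ("customer_id", "amount", "channel")


def load_into_warehouse(raw_events):
    """Warehouse style: transform and validate up front, reject what does not fit."""
    rows, rejected = [], []
    for line in raw_events:
        record = json.loads(line)
        if all(column in record for column in WAREHOUSE_COLUMNS):
            rows.append(
                (int(record["customer_id"]), float(record["amount"]), record["channel"])
            )
        else:
            rejected.append(line)
    return rows, rejected


def load_into_lake(raw_events):
    """Lake style: keep every event exactly as received, in its original format."""
    return list(raw_events)


def total_spend_from_lake(lake):
    """The effort is deferred: each reader has to interpret the raw data itself."""
    total = 0.0
    for line in lake:
        record = json.loads(line)
        if "amount" in record:
            total += float(record["amount"])
    return round(total, 2)


warehouse_rows, rejected = load_into_warehouse(RAW_EVENTS)
lake = load_into_lake(RAW_EVENTS)
```

In this toy example the warehouse accepts only the one record that matches its structure and turns away the other two, whereas the lake keeps all three. The price is that every question asked of the lake, such as the total spend above, has to cope with missing and inconsistent fields at the time of reading.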
Definition: Data Ponds
With a Data Lake, the responsibility for getting value out of the data reverts to the business user, and technologies such as Hadoop (a platform focussed on the management of Big Data) are being used to assist with this. However, without strong information governance, the risk is that the Data Lake will end up as a mishmash of unconnected reservoirs of data.
Thus we arrive at the third of our latest data terms: the Data Pond, also known as the Data Puddle (in Australia we could also refer to them as Data Billabongs).
Deriving the Value of Big Data
Turning Big Data into a usable and accurate source of information for a 360 view of the customer has been a significant challenge. However, new tools and trends are emerging, such as Data Lakes and Hadoop, which increasingly address organisations’ data requirements.
Accurate, reliable and reusable information with which to engage the end customer needs to become the norm.
Achieving this still requires information governance, a clear vision of the outcomes, and the right guidance and implementation to ensure success, but it cannot be achieved without also embracing the latest data trends and buzzwords.
Next month we will look at why Big Data, once captured, and information management in general have been – and still are – such a challenge for organisations.