Data is everything in business today, providing us with the intelligence to constantly improve products and services, and provide better experiences for customers in an age of massive digital competition.
With so much data now at our fingertips, an increasingly sophisticated arsenal of data management solutions has emerged over several decades, helping us get a better handle on the data we accumulate and use it to make better decisions.
Data warehouse solutions first came into use in the late 1980s, and while they have evolved rapidly since then, their function remains unchanged: to consolidate data from multiple sources and formats in one place, creating a single, consistent source of information across a business. This is achieved via a process known as Extract, Transform, Load (ETL), which we explored in another post.
By consolidating and integrating multiple data sources in this structured way, organisations can use data analytics tools and methods to identify trends and make strategic decisions from historical data.
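To make the consolidation step concrete, here is a minimal ETL sketch in Python. The two sources, their field names and the `orders` table are hypothetical and exist only for illustration; an in-memory SQLite database stands in for the warehouse. It is a sketch under those assumptions, not a description of any particular ETL product.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from a sales system (amounts in pounds).
SALES_CSV = """order_id,customer,amount_gbp
1001,Acme Ltd,250.00
1002,Globex,99.50
"""

# Hypothetical source 2: records from a web shop (amounts in pence).
WEB_ORDERS = [
    {"id": "W-1", "customer_name": "acme ltd", "total_pence": 1200},
    {"id": "W-2", "customer_name": "Initech", "total_pence": 4550},
]


def extract():
    """Read raw records from each source in its native format."""
    sales_rows = list(csv.DictReader(io.StringIO(SALES_CSV)))
    return sales_rows, WEB_ORDERS


def transform(sales_rows, web_rows):
    """Map both formats onto one schema: (source, order_id, customer, amount_pence)."""
    unified = []
    for row in sales_rows:
        pence = round(float(row["amount_gbp"]) * 100)
        unified.append(("sales", row["order_id"], row["customer"].strip().lower(), pence))
    for row in web_rows:
        unified.append(("web", row["id"], row["customer_name"].strip().lower(), int(row["total_pence"])))
    return unified


def load(conn, rows):
    """Write the unified rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "source TEXT, order_id TEXT, customer TEXT, amount_pence INTEGER, "
        "PRIMARY KEY (source, order_id))"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
sales_rows, web_rows = extract()
load(conn, transform(sales_rows, web_rows))

# One consistent view across both sources: total spend per customer.
totals = dict(conn.execute("SELECT customer, SUM(amount_pence) FROM orders GROUP BY customer"))
print(totals)
```

Because customer names are normalised during the transform step, orders for "Acme Ltd" from both systems end up under a single customer in the consolidated view.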
Given the availability and effectiveness of data warehousing alongside data management and analytics tools, businesses have become more reliant on data to make decisions. At the same time, this popularity has led to a need for solutions that provide intelligence in real-time, which has tested the limitations of traditional data warehousing solutions, and led to the development of real-time data warehousing solutions.
Data warehousing vs. Real-time data warehousing
Conventional data warehouses comprise an integrated and historicised collection of data used to make strategic decisions across the business. They consolidate multiple independent data sources to create a single view of the organisation and therefore will provide a picture of the organisation at a certain time in the past, such as the day, week or month in which the data was loaded.
Taking this further, real-time data warehousing meets the rising demand for up-to-the-second information by refreshing the data it stores several times a day. Information stored in a real-time data warehouse therefore gives a much more accurate picture of the organisation's actual situation at the time the data is requested and analysed.
Major differences between traditional data warehouses and real-time data warehouses:
Traditional data warehousing | Real-time data warehousing |
For strategic decisions only | For strategic and tactical decisions |
Historical data updated periodically | Real-time data |
Restrictive reporting used to check existing processes and patterns | Flexible ad hoc reporting and machine modelling to discover new insights |
Results hard to measure | Results measured with effect on operations |
Daily, weekly or monthly data currency is acceptable | Only data available within minutes is acceptable |
Source: arXiv
The purpose of real-time data warehousing is therefore to enable organisations to rapidly access information and react almost immediately to new information, hour by hour.
Continuous updates of this kind, carried out without first shutting down the data warehouse and putting it out of use, are generally not possible with traditional ETL tools. However, that’s not to say traditional data warehousing systems can’t be updated, either with new ETL solutions specialising in real-time data loading or by modifying existing ETL tools to load the warehouse in ‘near real-time’, for example by running a weekly load every day instead.
Applications and benefits of real-time data warehousing
Given the compute resources that real-time data warehousing solutions currently consume, their use tends to be reserved for specific use cases which demand real-time analytics and continuous data reporting, such as querying IoT sensors, analysing sudden spikes or dips in revenue from financial transactions, or exploring buyer behaviour from Customer Relationship Management (CRM) data.
However, there are a number of benefits of real-time data warehousing that apply to any organisation which finds itself relying on data to make decisions regularly, including:
Faster decision-making: Businesses can make decisions quicker based on more current, accurate and consistent data, reducing wait time.
Controlling data load: Smaller, more regular loads comprising only data that has changed (as opposed to the entire data source) can reduce the impact window of larger, less frequent updates – especially important for organisations operating 24/7.
Faster recovery: If data load issues take place and data is made unavailable, there is less time to wait for the next load sequence, so recovery and intervention can take place more quickly.
Better availability: Real-time data warehousing can eliminate batch load windows which require the data warehouse to be dormant and make data sources unavailable for a period of time.
While real-time data warehousing is crucial for businesses whose success relies on up-to-the-minute intelligence, its cost means it is not necessarily a viable investment for every organisation.
Near real-time data warehousing offers a more affordable alternative for businesses which are finding their existing data warehouses limiting and are seeking to move towards those benefits we mentioned above. For example, a load that is usually carried out weekly could be executed once a day, enabling users of the data warehouse to have access to more recent data without making major modifications to the loading process or data model.
While not quite the real-time reality, near real-time data warehousing can offer a low-cost step to levelling up data warehouse solutions and increasing the value of the insights drawn from them.
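The effect of moving from a weekly to a daily load can be put into rough numbers. The sketch below uses a deliberately simple model, which is an assumption rather than anything measured: each load takes a snapshot when it starts, and the data is at its oldest just before the next load becomes available.

```python
from datetime import timedelta


def worst_case_staleness(load_interval, load_duration=timedelta(0)):
    """Oldest the warehouse data can be, just before the next load becomes available."""
    return load_interval + load_duration


weekly = worst_case_staleness(timedelta(weeks=1))
daily = worst_case_staleness(timedelta(days=1))

# Dividing one timedelta by another gives a plain float ratio.
improvement = weekly / daily
print(f"Weekly load: data up to {weekly.days} days old")
print(f"Daily load: data up to {daily.days} day old")
print(f"Worst-case staleness reduced by a factor of {improvement:.0f}")
```

Under this model the same loading process, simply scheduled more often, cuts the worst-case age of the data from seven days to one.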
illumo digital is a specialist in building bespoke data warehousing solutions. Find out more about our services here.