Data Platform Optimisation
Habit Health is New Zealand's largest integrated health provider, offering physiotherapy and rehabilitation, occupational health, EAP services, and fitness clubs. An extensive range of specialised health, fitness, and personal welfare services, along with a comprehensive national footprint, allows Habit Health to optimise the health and well-being of Kiwis at home, in the workplace, or at one of more than 100 locations.
The Challenge
As data volumes grew and business reporting requirements increased and became more complex, the existing data platform solution was no longer fit for purpose. With multiple data sources and tables added incrementally over time, load times grew to over 12 hours, affecting downstream processes and making it harder to recover from load errors, when they occurred, in time for the start of the business day.
This complex process was difficult to maintain, and the duplication of business rules meant differing results across reporting areas.
The Solution
Analysis of the problem led to a plan focused on the most cost-effective solution to tackle the four main logical areas of the ETL process.
1. Data Extraction
Reduce run times on data extraction and transformation
Standardise code and documentation
Change extraction from V1 to V2 of a key API
2. DW Code reorganisation
By creating a staging environment of common repeating logic, we were able to reduce duplication and create one source of truth. This also had the advantage of reducing processing time, as these queries run less often. It also means that future code enhancements can be achieved much faster, with easier testing.
With our custom code generation we were able to quickly locate and consolidate duplicated business logic.
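The staging pattern can be sketched in miniature. In this hypothetical example, a shared business rule (deriving an "active" flag) is applied once in a staging step, and both downstream reporting queries consume the staged result instead of each re-implementing the rule; all names and the rule itself are illustrative, not taken from the actual solution.

```python
def stage_clients(raw_rows):
    """Apply the shared business rule once, in staging."""
    staged = []
    for row in raw_rows:
        staged.append({
            **row,
            # One definition of the rule, instead of one copy per report.
            "is_active": row["status"] in ("ACTIVE", "TRIAL"),
        })
    return staged

def mart_active_count(staged):
    """Downstream report 1: count of active clients, from staged data."""
    return sum(1 for r in staged if r["is_active"])

def mart_active_names(staged):
    """Downstream report 2: names of active clients, from the same staged data."""
    return [r["name"] for r in staged if r["is_active"]]

raw = [
    {"name": "A", "status": "ACTIVE"},
    {"name": "B", "status": "CLOSED"},
    {"name": "C", "status": "TRIAL"},
]
staged = stage_clients(raw)
print(mart_active_count(staged))   # -> 2
print(mart_active_names(staged))   # -> ['A', 'C']
```

Because the rule lives in one place, a future change to the definition of "active" is made once and immediately flows to every report, which is what removes the differing results across reporting areas.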
3. DW Code parallel processing
Previously, each stream of data processing was segmented and connected separately to the data source, duplicating activities and business rules, and the entire process ran as a single end-to-end sequence.
The implementation of new parallel streams made better use of infrastructure, sped up processing, and reduced the re-run work required should an error occur.
Our ADF solution was easily able to execute and monitor each stream.
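The idea behind the parallel streams can be sketched as follows. This is a minimal Python illustration, not the ADF pipeline itself: independent streams that once ran sequentially execute concurrently, so total wall time approaches the slowest stream rather than the sum of all streams, and a failed stream can be re-run on its own. The stream and table names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def load_stream(name, tables):
    """Placeholder for one stream's extract/transform/load work."""
    return f"{name}: loaded {len(tables)} tables"

# Hypothetical independent streams; in ADF each would be a pipeline branch.
streams = {
    "clinical": ["appointments", "treatments"],
    "finance": ["invoices", "payments"],
    "fitness": ["memberships"],
}

# Run all streams concurrently and collect each result by stream name.
with ThreadPoolExecutor(max_workers=len(streams)) as pool:
    futures = {pool.submit(load_stream, n, t): n for n, t in streams.items()}
    results = {name: f.result() for f, name in futures.items()}

print(results["finance"])  # -> 'finance: loaded 2 tables'
```

In ADF terms, the equivalent is executing independent pipeline branches concurrently and monitoring each one, so an error in one stream leaves the others' completed work intact.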
4. Automated Power BI data refresh
An automated process was built in ADF to reload each Power BI report. Each report is automatically reloaded once all of its dependent tables in the data warehouse have been loaded. This takes the guesswork out of scheduling report reloads around the expected end of the data refresh, and has led to fewer data reload failures and earlier availability of reports for the business.
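The dependency-gated refresh logic can be sketched like this. It is a hedged illustration, not the actual ADF implementation: a report is only queued for refresh once every warehouse table it depends on has finished loading. The report names and dependency map are invented for the example.

```python
# Hypothetical report -> warehouse-table dependency map.
REPORT_DEPENDENCIES = {
    "ClinicPerformance": {"appointments", "treatments"},
    "Revenue": {"invoices", "payments"},
}

def reports_ready(loaded_tables):
    """Return reports whose dependent tables have all finished loading."""
    loaded = set(loaded_tables)
    return sorted(
        report
        for report, deps in REPORT_DEPENDENCIES.items()
        if deps <= loaded  # every dependency is in the loaded set
    )

# As tables finish, only reports with all dependencies met are refreshed:
print(reports_ready({"appointments", "invoices"}))                # -> []
print(reports_ready({"appointments", "treatments", "invoices"}))  # -> ['ClinicPerformance']
```

Driving the refresh from actual load completion, rather than a fixed clock time, is what removes the race between the warehouse load finishing and the report refresh starting.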
Outcomes
Reduced nightly running time from 12 hours to 2 hours, lowering cost and enabling faster reprocessing when required.
Ability to scale the Azure database up during processing windows to load data faster, then scale back down sooner.
Reduced code complexity, making enhancements easier and faster.
On-time report availability improved dramatically.