This month we’ve been blogging a lot about data migration to the cloud, in particular migrating from a legacy Netezza data warehouse. We were recently chatting with a client well on their way to making such a migration, and while the process is moving along smoothly, they still experienced a natural moment of hesitation: “Will all this work be worth it?”
Decommissioning legacy systems and applications is a complex initiative for any organization, and it must be based on tangible benefits to the business in order to justify the effort.
Based on past experience helping companies with legacy EDW migrations, including Netezza, Greenplum, Teradata, and others, we can categorize benefits into three core buckets:
- Cost Savings
- Operational Flexibility
- Consumption Improvements
Eliminate buying and hosting expensive proprietary hardware/software
First of all, the annual licensing fees for the proprietary hardware, software, and support of the legacy platform are eliminated. Additional nodes or resources may be needed in the new target environment to support the required applications, but there will still be a material cost saving from removing the heavy cost of proprietary hardware, as well as the ‘vendor lock-in’.
Reduce or eliminate ‘lock-in’ to tightly coupled technologies
With Netezza, tools like DataStage or SSIS are tightly coupled to the platform, which limits your options for ingesting, preparing and integrating data. With ETL tools specifically, the server(s) required to support data integration capabilities within Netezza come at a high cost per user. This cost is largely alleviated by moving to open-source technology, which gives the organization the freedom to choose from a wider array of technologies based on open architecture and API frameworks.
Reduce infrastructure management complexity
Managing Netezza database infrastructure is a highly manual process. Whether the new target is Hadoop or a cloud provider, this management becomes largely automated. Plus, auto-scaling and more efficient resource management are core capabilities of these modern platforms and further reduce the resources required to manage the platform.
Reduce risk associated with platform tuning
As mentioned previously, Hadoop and cloud platforms are horizontally scaling data management systems, and most Hadoop-based tools and resources are managed by a resource-management layer (such as YARN) that scales allocations up as necessary based on consumption patterns. These tuning tasks can be automated as well, which in turn reduces the resources required to manage the platform.
Reduce risk with planning upgrades and downtime
Upgrades to legacy EDWs like Netezza are typically cumbersome processes requiring planned downtime, usually over weekends or after hours. This is especially burdensome when the upgrade is needed to fix an issue or bug that has already caused unplanned downtime. Business units need to consume data at an increasingly aggressive pace, making downtime for upgrades unacceptable. Hadoop/cloud-based platforms largely remove the need to manage these activities, resulting in higher availability.
Reduce times for consumption and resultant performance-related issues
The auto-scaling capability of Hadoop/cloud largely removes the need to limit concurrent queries. Hadoop-based tools like Hive and Spark, with proper deployment and resource management, can alleviate many of the performance-related issues users face with Netezza and other legacy EDWs. In the case of cloud-based databases like BigQuery or Snowflake, query performance is typically high compared to an on-prem RDBMS, allowing business units to operate closer to real time.
Eliminate Data Consumption “Pre-Processing” through automation
With Netezza, or any legacy EDW system, when business users look to consume data for analytics or reporting, their ability to do so is dependent on a number of pre-processing steps. These could include complex mappings or transformations just to create a dataset in the format their reporting application requires. This pre-processing is often cited as taking up around 80% of a business analyst’s time. With modern Hadoop/cloud platforms, pre-processing is usually eliminated via automated data pipelines, built on open-source or platform-native tools specifically to make consumption faster. Automated ingestion/ETL and data quality validation, among other capabilities, are typically handled by the pipeline, making it easier for business consumers to plug in their preferred tools.
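To make the idea of an automated pipeline a little more concrete, here is a minimal sketch in plain Python of the ingest, validate and transform steps described above. It is not tied to any particular platform or tool; the extract, the column names (order_id, region, amount) and the validation rules are all hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical raw extract; column names and values are illustrative only.
RAW_EXTRACT = """order_id,region,amount
1001,east,250.00
1002,,99.50
1003,west,not_a_number
1004,west,75.25
"""


def validate(row):
    """Return a list of data-quality problems found in one row."""
    problems = []
    if not row["region"]:
        problems.append("missing region")
    try:
        float(row["amount"])
    except ValueError:
        problems.append("amount is not numeric")
    return problems


def run_pipeline(raw_text):
    """Ingest, validate, and transform rows; return (clean, rejected)."""
    clean, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        problems = validate(row)
        if problems:
            # Keep rejected rows with their reasons so they can be reviewed.
            rejected.append((row["order_id"], problems))
            continue
        # Transform: cast types and normalise the region code.
        clean.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].upper(),
            "amount": float(row["amount"]),
        })
    return clean, rejected


clean_rows, rejected_rows = run_pipeline(RAW_EXTRACT)
```

In this sketch the analyst only ever sees clean_rows, already typed and normalised, while rejected_rows records which records failed and why. A production pipeline would run on a scheduler and write to the target platform, but the division of labour is the same.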
Increased capability to derive actionable insights, and combine disparate datasets
With Netezza, users are limited in how they consume data. They either need to manually move data to another server for complex analytics or put the onus back on Netezza to process the queries. Hadoop/cloud platforms are purpose-built to bring processing to the data. This means shorter wait times, faster query performance, and an easier way to combine disparate datasets (e.g. structured and unstructured data, streaming and batch, and a wider range of other data types). Open architectures and APIs also allow organizations to adopt a ‘bring your own consumption tool’ methodology, in essence creating a self-service framework for the business.
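As a small illustration of what combining disparate datasets can look like, the sketch below joins a structured customer table (CSV) with semi-structured click events (JSON lines) using only the Python standard library. The data, field names and the combine function are hypothetical; on a real platform the same join would more likely be expressed in a tool such as Hive or Spark.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical structured data: a customer table exported as CSV.
CUSTOMERS_CSV = """customer_id,name
c1,Acme
c2,Globex
"""

# Hypothetical semi-structured data: one JSON click event per line.
CLICK_EVENTS = """{"customer_id": "c1", "page": "/pricing"}
{"customer_id": "c1", "page": "/docs"}
{"customer_id": "c2", "page": "/pricing"}
"""


def combine(customers_csv, events_jsonl):
    """Attach a page-view count from the event log to each customer row."""
    page_views = defaultdict(int)
    for line in events_jsonl.splitlines():
        if line.strip():
            event = json.loads(line)
            page_views[event["customer_id"]] += 1

    combined = []
    for row in csv.DictReader(io.StringIO(customers_csv)):
        combined.append({
            "customer_id": row["customer_id"],
            "name": row["name"],
            # Customers with no events get a count of zero.
            "page_views": page_views.get(row["customer_id"], 0),
        })
    return combined


result = combine(CUSTOMERS_CSV, CLICK_EVENTS)
```

The point of the sketch is that neither dataset has to be reshaped or moved to a separate server before the two can be analysed together.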
We believe in the benefits of adopting modern technologies and have seen our clients’ successes as proof positive. The work involved can be intense, but if done right, the results are immediate and the pay-off is profound.