Big Data Playbook: Data Warehouse Migration to the Cloud

Any conversation regarding digital transformation inevitably comes around to moving enterprise data warehouses to the cloud or to a hybrid model. With the technological advances cloud providers are making, the benefits of housing a data warehouse in the cloud are many: cloud providers offer fast processing, rapid deployment, built-in disaster recovery and, depending on the provider, strong security and governance.

A five-step process

As with any enterprise IT project, moving an existing data warehouse to the cloud isn’t necessarily fast or easy. Although many companies have developed tools and methods to smooth the way, it is definitely not as simple as lifting and shifting from on-premises to the cloud. Data is only one of five critical components of a data warehouse that need special attention and care.

Data warehouse migration is a complex project involving iterative processes to migrate all of the components successfully. Here are the five major components and what you should consider when migrating each of them to the cloud.

1) Schema
Before you can move a data warehouse, you’ll need to migrate the schema that supports the data: the tables and their definitions. Structural elements such as indexes and partitions will likely need to change for the new environment, so consider whether they should be redesigned rather than carried over as-is.
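Much of schema migration is mechanical translation of table definitions. Below is a minimal sketch of rule-based type translation. It assumes a hypothetical `TYPE_MAP` of legacy-to-target types and a hypothetical `translate_table` helper. The type pairs are placeholders rather than verified mappings for any specific platform. Index and partition redesign is deliberately left out, because it needs human judgment.

```python
from dataclasses import dataclass

# HYPOTHETICAL legacy-to-cloud type mapping for illustration only; real pairs
# depend on the source and target platforms and must be checked in their docs.
TYPE_MAP = {
    "BYTEINT": "SMALLINT",
    "VARCHAR": "STRING",
    "DECIMAL": "NUMERIC",
    "DATE": "DATE",
}


@dataclass
class Column:
    name: str
    legacy_type: str  # bare type name without length/precision, e.g. "VARCHAR"


def translate_table(table: str, columns: list[Column]) -> str:
    """Build a CREATE TABLE statement, mapping each legacy type via TYPE_MAP.

    Unmapped types raise an error instead of being guessed, so they surface
    for manual review.
    """
    column_defs = []
    for col in columns:
        target_type = TYPE_MAP.get(col.legacy_type.upper())
        if target_type is None:
            raise ValueError(f"no mapping for {col.legacy_type!r} (column {col.name!r})")
        column_defs.append(f"  {col.name} {target_type}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(column_defs) + "\n);"
```

Failing loudly on unmapped types is a design choice. A silent default such as a generic string type would hide exactly the cases that need review.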

Need to migrate from a legacy system? Check out SHIFT from Next Pathway, which translates legacy schemas to speed up the migration process and reduce errors.

2) Data
Migrating large volumes of data is time-consuming and puts heavy demands on both the network and processing capacity. Before moving anything, map out how long the transfer will take and consider options for accelerating it. You may also need to restructure the data to match the schema changes from the previous step, and transform it as part of the migration. Decide whether to transform it in-stream during the move, or to pre-process it first and then migrate.
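A back-of-the-envelope estimate is often enough to map out how long a transfer will take. The sketch below assumes decimal terabytes. It also uses an assumed `efficiency` factor for the share of nominal bandwidth you actually achieve. `transfer_hours` is a hypothetical helper, and real throughput should be measured on your own network.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Rough wall-clock hours to move data_tb terabytes over a link_gbps link.

    efficiency is an ASSUMED fraction of nominal bandwidth actually achieved
    (protocol overhead, contention, retries); 0.7 is a placeholder default.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    bits = data_tb * 8e12                       # decimal terabytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency
    return bits / effective_bps / 3600          # seconds -> hours
```

For example, `transfer_hours(50, 1)` comes out at roughly 159 hours, or about 6.6 days. A figure like that is what prompts teams to consider faster links, parallel transfers or offline transfer appliances.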

3) Data Pipelines
With any data migration or transformation process, we highly recommend building data pipelines to replace existing ETL (see ETL is the Root of all Data Problems). Building data pipelines helps you understand dependencies so you can design an optimal workflow. It also brings advantages in performance, agility, reusability and maintainability.
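Making dependencies explicit is what lets a workflow be ordered and parallelized. The following minimal sketch uses Python's standard `graphlib` module and assumes hypothetical step names in `pipeline`. It groups steps into waves that could run concurrently and rejects circular dependencies. It illustrates dependency ordering in general, not any particular product's scheduler.

```python
from graphlib import TopologicalSorter

# HYPOTHETICAL pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "load_customers": {"extract_crm"},
    "load_orders": {"extract_erp"},
    "build_sales_mart": {"load_customers", "load_orders"},
    "refresh_dashboards": {"build_sales_mart"},
}


def execution_waves(deps):
    """Group steps into waves; steps within a wave do not depend on each other
    and could run in parallel once the previous wave has finished."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError (a ValueError) on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Here `execution_waves(pipeline)` returns `[['extract_crm', 'extract_erp'], ['load_customers', 'load_orders'], ['build_sales_mart'], ['refresh_dashboards']]`. Both extracts can run side by side, and the mart waits for both loads.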

We created a tool called Cornerstone to help you build data pipelines without using ETL. Consider it as an option for managing your data supply chain.

4) Metadata
Source-to-target metadata is a crucial part of managing a data warehouse: knowing data lineage is critical for proper governance and troubleshooting. You’ll need to determine whether you can export the existing metadata and import it into the new environment, or whether you’ll have to reverse-engineer it or rebuild it from scratch.
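Lineage is easiest to reason about as a graph of source-to-target mappings. This sketch assumes the mappings have already been exported or rebuilt into a hypothetical `LINEAGE` dictionary keyed by target column. It walks that graph to find the original source columns behind a reporting field, which is usually where troubleshooting starts. The column names and the `upstream_sources` helper are illustrative.

```python
from collections import deque

# HYPOTHETICAL source-to-target mappings: target column -> columns it is derived from.
LINEAGE = {
    "mart.sales.revenue": ["staging.orders.amount", "staging.orders.discount"],
    "staging.orders.amount": ["legacy.ORD_HDR.AMT"],
    "staging.orders.discount": ["legacy.ORD_DTL.DISC"],
}


def upstream_sources(column: str, lineage: dict[str, list[str]]) -> set[str]:
    """Return the original source columns feeding `column` (breadth-first walk)."""
    sources, seen, queue = set(), {column}, deque([column])
    while queue:
        current = queue.popleft()
        parents = lineage.get(current, [])
        if not parents and current != column:
            sources.add(current)  # nothing further upstream: treat as an origin
        for parent in parents:
            if parent not in seen:  # guard against revisiting shared or cyclic paths
                seen.add(parent)
                queue.append(parent)
    return sources
```

Running the same query against the legacy and cloud lineage is a simple check that the metadata survived the migration intact.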

5) Users and Applications
The final step is migrating users and applications to the new cloud data warehouse without interrupting business operations. Security settings and access authorizations will need to be updated, and BI and analytics tools will need to be connected to the new warehouse.
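Updating access authorizations is often a translation exercise from legacy roles to the roles defined in the new warehouse. This minimal sketch assumes a hypothetical `ROLE_MAP` and privileges exported as `(role, privilege, object)` tuples. It emits generic `GRANT` statements, although exact syntax varies by platform. Unmapped roles are collected for review rather than granted anything by default.

```python
# HYPOTHETICAL mapping from legacy warehouse roles to roles in the cloud warehouse.
ROLE_MAP = {"DW_READ": "analyst", "DW_ETL": "pipeline_writer"}


def grant_statements(privileges, role_map):
    """Translate (legacy_role, privilege, object) tuples into GRANT statements.

    Returns (grants, unmapped): legacy roles with no mapping are collected for
    manual review instead of being granted anything by default.
    """
    grants, unmapped = [], set()
    for legacy_role, privilege, obj in privileges:
        role = role_map.get(legacy_role)
        if role is None:
            unmapped.add(legacy_role)
            continue
        # Generic SQL shape; the exact GRANT syntax differs between platforms.
        grants.append(f"GRANT {privilege} ON {obj} TO {role};")
    return grants, unmapped
```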

Don’t try to do everything at once
A typical enterprise data warehouse contains a large amount of data covering many business subject areas. A “big bang” migration that moves everything at once almost guarantees failure, so taking incremental steps is key to success, especially when you are making significant design changes.

With an incremental approach, your on-premises data warehouse keeps running while the cloud data warehouse comes online. During this transition phase, you’ll need to keep the data synchronized between the old on-premises data warehouse and the new one in the cloud.
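One common way to keep the two warehouses in sync during the transition is watermark-based incremental loading, where each run copies only the rows modified since the last run. The in-memory sketch below assumes every source row carries an `id` key and an `updated_at` timestamp. `sync_increment` is a hypothetical helper, and a plain dict stands in for the cloud table.

```python
from datetime import datetime


def sync_increment(source_rows, target: dict, watermark: datetime) -> datetime:
    """Upsert rows changed after `watermark` into `target` and return the new watermark.

    ASSUMED row shape: dicts with an 'id' primary key and an 'updated_at' datetime.
    `target` is a dict keyed by id, standing in for the cloud table.
    """
    new_watermark = watermark
    for row in source_rows:
        if row["updated_at"] > watermark:
            target[row["id"]] = dict(row)  # insert new rows, overwrite changed ones
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark
```

A strict `>` comparison can miss rows that share the watermark timestamp but are committed later, and deleted rows are never seen at all. Change data capture is a common alternative when either case matters.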

Cloud migration services to the rescue

The good news is that many tools and services are available, and they can be invaluable when migrating a legacy data warehouse to the cloud. Next Pathway specializes in helping companies bridge legacy systems and the cloud, bringing experience and proprietary accelerators.