What You Need to Know Before You Move Code and Data to the Cloud

Over the past few years, we’ve witnessed the top 3 Cloud providers sprint away from the pack, adding new capabilities at an astonishing rate. Each has its own unique features that lend themselves to specific use cases. However, they all offer the irresistible value proposition of cheaper storage and compute power; no on-premises solution can match the savings the Cloud offers. Hence, the movement of data and code to the Cloud is quickly turning from a trickle into a flood.

However, this migration is turning out to be harder than it sounds. Moving data and code must be carefully planned before execution.

First and foremost, you need to appreciate that migrating data and code from an Enterprise Data Warehouse (EDW) to a Big Data environment (an Enterprise Data Lake) is not a simple “lift and shift” exercise; it truly is an architectural change.

EDW migration to Big Data is a migration from
Legacy Architecture to a Modern Big Data Architecture

The overall migration plan involves 3 elements:

  1. Architectural Assessment & Design
  2. Conversion – Schema, Code & Data
  3. Performance Tuning

1 – Architectural Assessment & Design

An EDW migration (code and data) to a Big Data environment (Cloud or otherwise) is an exercise in migrating from a legacy architecture to a modern one.
We recommend that you design your target architecture with the following considerations:

  • Loading of historical data
  • Preservation of existing data integration patterns and data structures
  • Maintenance of current SLA requirements
  • Flexibility for future requirements
  • Automation of the migration of code and data
  • Cloud-based migration

2 – Conversion: Schema, Code & Data

The conversion phase is of particular significance because of its complexity. This is especially true for code migrations, which typically involve numerous views and built-in functions and thousands of tables.

You need to consider the structure of the statements: some will be simple, others moderate, and some complex. A statement’s complexity drives the effort required to rewrite it in another language (such as Hive or Spark). For example, a developer can convert a simple statement in about one hour (the minimum effort for a task), a moderate statement in two to four hours, while a complex statement can take a couple of days.
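As a rough illustration, the sketch below turns those per-statement estimates into a total conversion effort. The statement counts and the hours assigned to each complexity tier are hypothetical placeholders, not figures from an actual migration.

# Back-of-the-envelope conversion-effort estimate.
# All counts and hours below are hypothetical placeholders.
HOURS_PER_STATEMENT = {
    "simple": 1,     # ~1 hour, the minimum effort for a task
    "moderate": 3,   # midpoint of the 2-4 hour range
    "complex": 16,   # ~2 working days
}

statement_counts = {"simple": 4000, "moderate": 1500, "complex": 300}

total_hours = sum(
    HOURS_PER_STATEMENT[tier] * count
    for tier, count in statement_counts.items()
)

# Assuming ~160 productive hours per developer per month.
print(f"Total effort: {total_hours} hours "
      f"(~{total_hours / 160:.0f} developer-months)")

Even with modest statement counts, the arithmetic quickly makes the case for the automation discussed below.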

Critical success factors of a migration effort include:

  1. Automation – this is key. Migration is a very time-consuming process, so look for tools that automate both the ingestion of the data and the conversion of the code.
  2. Metadata Capture – if you cannot capture metadata, you cannot perform analytics or produce the data lineage required for regulatory reporting; a minimal sketch of such a record follows this list.
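The sketch below shows what a minimal lineage record might look like; the field names are illustrative assumptions, not a prescribed schema.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    """One metadata entry per data movement, kept for audit and lineage."""
    source_system: str   # originating system, e.g. the legacy EDW
    source_object: str   # table or view being migrated
    target_object: str   # landing table in the Data Lake or Cloud
    transformation: str  # description or ID of the logic applied
    record_count: int    # rows moved, for reconciliation
    loaded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# Example: capture one table load so lineage can be reported later.
entry = LineageRecord(
    source_system="legacy_edw",
    source_object="sales.orders",
    target_object="lake.raw_orders",
    transformation="conversion_job_1042",
    record_count=1_250_000,
)
print(entry)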

3 – Performance Tuning

Performance tuning is the work effort to optimize both the code and the target environment.

Performance is often overlooked, probably because optimizing code requires highly technical staff and predicting the performance of code in the target environment is difficult.

Performance tuning draws on several techniques, mainly gathered from years of experience and trial and error. There is no easy “one size fits all” method; you need to select the right approach and a technology partner who can provide the appropriate experience. Otherwise, you risk migrating a lot of code and data only to have it run slowly.
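To give a flavour of what these techniques look like in practice, the PySpark sketch below applies three common optimizations: right-sizing shuffle parallelism, broadcasting a small dimension table, and partitioning output for scan pruning. The paths, column names, and settings are illustrative assumptions; the right values come from profiling the actual workload.

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

# 1. Right-size shuffle parallelism (the default of 200 is rarely optimal).
spark.conf.set("spark.sql.shuffle.partitions", "400")

facts = spark.read.parquet("/lake/raw_orders")    # large fact table
dims = spark.read.parquet("/lake/dim_customers")  # small dimension table

# 2. Broadcast the small table to avoid a shuffle-heavy join.
joined = facts.join(broadcast(dims), on="customer_id")

# 3. Partition output on a common filter column to prune future scans.
joined.write.partitionBy("order_date").mode("overwrite") \
      .parquet("/lake/curated_orders")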

Next Pathway’s Migration Accelerator Toolkit contains
all the necessary components for a successful migration

Our Migration Accelerator Toolkit contains the essential technologies and frameworks required for a successful migration. It includes the following:

Cornerstone

Cornerstone is a fully automated, metadata-driven data platform that handles the ingestion, technical standardization, security, metadata capture, and lineage of data into Big Data environments or the Cloud.

Cornerstone removes the need to manually code or write “ETLs” to move data. It enables users to ingest all types of structured and unstructured data via batch, streaming, and direct-to-database methods, and to land the data in various target formats, entirely driven by metadata.

Through its self-service model, Cornerstone greatly accelerates the time to market for data consumption, without a single line of code being written.
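For readers new to the term, the sketch below illustrates what “metadata-driven” means in general: a declarative spec, rather than hand-written ETL code, tells a single engine what to ingest and where to land it. This is a generic illustration with made-up field names, not Cornerstone’s actual configuration format or API.

# Generic illustration of metadata-driven ingestion (not Cornerstone's API):
# the spec, not hand-written code, determines what is ingested and how.
ingestion_spec = {
    "source": {"type": "jdbc", "table": "sales.orders"},
    "mode": "batch",  # could also be "streaming"
    "target": {"format": "parquet", "path": "/lake/raw/orders"},
    "capture_lineage": True,
}

def ingest(spec: dict) -> None:
    """Interpret the spec; one engine serves every dataset."""
    print(f"Ingesting {spec['source']['table']} "
          f"({spec['mode']}) -> {spec['target']['path']}")
    if spec["capture_lineage"]:
        print("Recording lineage metadata for this load")

ingest(ingestion_spec)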

Fuse

Fuse provides the methodology, business data domain models, and a library of core components that automate the mapping and transformation of source data into business data domains.

Fuse persists business data domains into a physical data model designed for the historical storage of data arriving from multiple operational systems.

Fuse addresses requirements such as conforming data from various source systems, along with auditing, data traceability, loading speed, and resilience to change.
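In general terms, conforming several sources to one business data domain looks like the sketch below: per-source column mappings project each record onto a single canonical model while preserving an audit trail. The mappings and field names are hypothetical; this is not Fuse’s actual interface.

# Generic illustration of source-to-domain conforming (not Fuse's interface).
# Two operational systems with different column names map to one domain model.
CUSTOMER_DOMAIN_MAPPINGS = {
    "crm": {"cust_no": "customer_id", "full_nm": "name"},
    "billing": {"acct_id": "customer_id", "acct_holder": "name"},
}

def conform(source: str, record: dict) -> dict:
    """Rename source-specific fields to the canonical domain model,
    keeping the source tag for auditing and traceability."""
    mapping = CUSTOMER_DOMAIN_MAPPINGS[source]
    conformed = {domain: record[src] for src, domain in mapping.items()}
    conformed["_source_system"] = source  # audit trail
    return conformed

print(conform("crm", {"cust_no": 42, "full_nm": "Ada Lovelace"}))
print(conform("billing", {"acct_id": 42, "acct_holder": "Ada Lovelace"}))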

Shift

Shift is an automated code translation engine that converts legacy code into various target languages to run natively inside Big Data environments, including Hadoop, Spark, and R, and in the Cloud.

Shift can accelerate the translation of up to 80% of the most complex code, and the output is always consistent, ensuring compliance with industry coding standards.

Shift allows organizations to minimize the cost and manual effort of many migration-oriented projects, including decommissioning an enterprise data warehouse and migrating it to a Data Lake, rationalizing source systems, and modernizing legacy ETLs.
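To give a feel for what automated code translation involves, here is a toy, rule-based sketch that rewrites two Teradata idioms (the SEL shorthand and the INDEX function) into their Spark SQL equivalents. A production engine parses the code rather than pattern-matching it; this is a conceptual illustration only, not Shift’s implementation.

import re

# Toy rule-based SQL translation (conceptual only; a real translation
# engine parses the code instead of applying regex patterns).
TERADATA_TO_SPARK_RULES = [
    (re.compile(r"\bSEL\b", re.IGNORECASE), "SELECT"),       # Teradata shorthand
    (re.compile(r"\bINDEX\s*\(", re.IGNORECASE), "INSTR("),  # position function
]

def translate(teradata_sql: str) -> str:
    spark_sql = teradata_sql
    for pattern, replacement in TERADATA_TO_SPARK_RULES:
        spark_sql = pattern.sub(replacement, spark_sql)
    return spark_sql

legacy = "SEL order_id, INDEX(status, 'X') FROM sales.orders"
print(translate(legacy))
# -> SELECT order_id, INSTR(status, 'X') FROM sales.orders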

Performance Playbook

Our Performance Playbook is a collection of techniques to optimize both the migrated code and target environment. It is based on best practices from multiple migration initiatives and years of experience in Big Data tools and technologies.


To learn more about our Migration Accelerator Toolkit, and how we can assist with your migration efforts, visit us at www.nextpathway.com