Building Data Lakes that Last


Establishing an enterprise data lake (EDL) that delivers real value to all stakeholders is not an easy feat. Although stakeholders have different priorities, most are looking for some combination of cost, speed, quality, security, simplicity, support, and, ultimately, tangible benefits.

A typical implementation cycle goes something like this:

  • Excitement: C-level executives mandate becoming a data-driven organization. Some really cool visualizations are created for one pilot area to wow the executives; checkbooks open, and business lines line up to join in.
  • Growing Pains: You quickly hit a wall when the simple pilot doesn't scale, and building a data practice across the enterprise creates new challenges. Business departments and their direct IT support teams are not comfortable moving to an enterprise solution and retiring the department warehouses they've built and come to rely upon.
  • Compliance Concerns: Security, risk, and compliance groups descend, naturally concerned about how you are going to protect all of the enterprise data. Their concerns compound the growing resistance from business units, making a bad situation worse for those tasked with building the new enterprise data lake.
  • Stall-out Phase: Frustration grows and stakeholder support fades. Because of all the early enthusiasm, spend rates are high, but the results are limited. It becomes harder and harder to ignore the elephant in the room: the enterprise is not getting the value it was promised from its data.

If this sounds familiar, you're not alone.

Maybe in your company things played out somewhat better or worse, but almost every company I have talked to has had some variation of this scenario. In the aftermath of the stall-out, the immediate reaction is often to fall back and punt: disband the effort and let each department go back to running its own data.

Punting may be the right play in a football game, but it is still not a good outcome.

The business still needs enterprise data.

Here are the common problems I have seen, and steps to address each:

  1. Assuming an Enterprise Data Lake is just a bigger Data Warehouse: Probably the worst mistake. Legacy warehouses, often one per department, are relatively simple, whereas EDLs involve a complex web of stakeholders across the business, technology, and governance units. Upfront planning is a must before entering this uncharted territory.
  2. Not including department warehouse owners and their vendors in the planning of the EDL: Typically, a new central team is given responsibility for the new lake, but each department runs its own warehouse, often with different vendor products. Resistance is natural and needs to be dealt with head-on. Make sure to include those teams in planning, agree on operating models, and talk honestly about what will be retired.
  3. Not treating this as a business transformation: IT teams can be their own worst enemies. They are often ahead of the business units in understanding the potential of big data and are eager to start. But unless departments agree to the costs and benefits, prioritize the initiative, and invest in training staff to leverage the data, you will not succeed.
  4. Not addressing Risk, Compliance, and Security requirements from the start: EDLs essentially put all your eggs in one basket. Recent headlines involving the likes of Expedia, Target, and Sony are just three examples that understandably have your compliance leaders worried. Recognize up front that you are not going anywhere without these people fully engaged and working with you.
  5. Over-centralizing and over-architecting: The communist "centrally planned economy" didn't work well for a reason, and one big central team doing everything won't work either. This is a huge topic in its own right, but much as shared highways support many different vehicles, some things make sense to share while others are best left to each team.
  6. Not recognizing that needs vary (persona-based approaches): An important but often missed item. At the macro level, everyone will say "I want to be data-driven and leverage analytics to improve my performance." However, drill in and ask what exactly they want to do, and you will find multiple distinct needs. Use this deeper understanding to organize and prioritize your efforts, so that you focus on the needs of most value instead of broadly delivering capabilities no one needs.
  7. Being use-case driven instead of solving world hunger: Companies already generate enormous amounts of data, with more coming all the time. Businesses can dream up endless uses for that data, but don't fall into the trap of trying to capture everything so you can do anything. Instead, be use-case driven: what specifically is a business unit trying to do that it will invest to solve? Focus on that. Start small on your roadmap (with less risky data in less mission-critical areas) and expand over time, with specific use cases at each step to keep teams focused on real needs.

Companies must learn from past mistakes, and recognize and plan for issues before they become an impediment to adoption and a barrier to realizing the true value of a data lake.
