Why Most First-Generation Big Data Projects Will Fail

Data is omnipresent; almost everything we touch or work with produces enormous amounts of it. With all the buzz and hype around big data, analytics, machine learning, and Artificial Intelligence (AI), organizations have started to believe they will fall behind the curve if they do not implement a big data project or an analytics solution. In fact, many large enterprises jumped on the bandwagon early and have already deployed at least part of a broader intelligence-mining solution. However, like any premature implementation, these first-generation projects have not turned out to be perfect. In a research report released in December 2014, Gartner accurately predicted that “In 2017, 60% of big data projects will fail to go beyond pilot, and will be abandoned.” The report also suggests that “Through 2019, more than 50% of data migration projects will exceed budget and timelines, and/or harm the business due to flawed strategy and execution.”

The First Time Is Always the Toughest Challenge

Lack of clarity, overinflated expectations, and a shortage of skilled data scientists to draw meaningful insights were significant contributors to the 60% of projects that crumbled at the pilot stage. But even where those factors have since been addressed, one major hurdle remains in first-wave big data projects: the quality of the data lakes. During the early stages of implementation, organizations focused their time, effort and resources on procuring cutting-edge technology solutions. As a result, the investment in standardizing, securing and preparing the metadata was neglected, or was insufficient. Even today, this critical step is often overlooked in favour of procuring new AI/ML-based solutions. This is especially true in Canada, where large initial investments are being made across the Toronto-Montreal-Edmonton (TOR-MTL-EDM) corridor.

Let’s take a quick look at why big data may be so critical, and what we can learn from the first-generation projects.

Why Big Data and AI Matter

The potential of big data solutions is vast, but it is only realized when a project starts by pointing the system at a specific need or business problem and pursuing meaningful, cost-effective, data-driven insights. These insights can transform various areas of your business operations, including pricing models, market expansion, and operational efficiency. They can also enrich your engagement with multiple stakeholders through product or service innovation, risk monitoring, compliance standards, and more.

Progressive organizations, especially those with deep pockets, are also closely monitoring the possibilities that Artificial Intelligence presents. They are keen to understand how deploying AI can give their businesses a competitive advantage.

Whether it is AI, big data, business analytics, or any other intelligence-mining solution, the idea is simple: organizations combine large pools of data from different sources (for example, CRM, social media and website data) and derive holistic insights that touch multiple functions. These deployments benefit not just profit centers but also support functions such as HR, Risk Management and Marketing. Every business unit can then adopt more accurate and targeted strategies to achieve its business goals.
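As a toy illustration of what combining sources looks like in practice, the sketch below joins a hypothetical CRM extract with hypothetical website-analytics records using Python and pandas. The column names and figures are invented for illustration only.

```python
import pandas as pd

# Hypothetical CRM extract: one row per customer
crm = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "segment": ["enterprise", "smb", "smb"],
    "lifetime_value": [125000.0, 8200.0, 4100.0],
})

# Hypothetical website-analytics extract: one row per visit
web = pd.DataFrame({
    "customer_id": [101, 101, 103],
    "pages_viewed": [12, 7, 3],
})

# Summarize visits per customer, then combine the two sources into one view
visits = web.groupby("customer_id", as_index=False).agg(
    visits=("pages_viewed", "count"),
    avg_pages=("pages_viewed", "mean"),
)
combined = crm.merge(visits, on="customer_id", how="left")

# Marketing, HR or Risk can slice the same joined view for their own questions
print(combined.sort_values("lifetime_value", ascending=False))
```

Each business unit can filter or aggregate the same joined view for its own questions, which is the practical payoff of breaking data out of departmental silos.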

What Went Wrong with the First Wave of Big Data Projects

All big data and AI solutions are supported by massive repositories, known as data lakes, that house data from different sources in a single place. Data lakes knock down the boundaries between departments and enable the sharing of critical information, which helps organizations draw meaningful, actionable insights for creative business strategies. So, what was lacking in the first wave of big data projects?

  • Insufficient Data Quality: Unfortunately, the first generation of data lakes prioritized quantity over quality. Massive amounts of information were added without proper governance measures. Not only did the data lack basic standardization, but organizations did not anticipate its future use and failed to account for its completeness or accuracy. These shortcomings are now showing up when business users try to generate reports that draw on these data lakes. Simply put, it has become a case of garbage in, garbage out.
  • Time-Consuming Data Warehousing Process: Several organizations continue to work with a traditional data warehousing model. While this model supports thousands of concurrent users and performs basic to advanced analytics, it involves a lengthy and complicated process to transform heterogeneous data into the desired outputs. When business users want to extract insights from the organization’s big data environment, they have to define their requirements in a spreadsheet and then have ETL (Extract, Transform and Load) code developed to prep the data for processing (a simplified sketch of such hand-written ETL code follows this list). In addition to being time consuming, this process also increases the users’ dependency on their IT teams.
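To make this concrete, here is a minimal, hypothetical ETL step written in Python with pandas. The file names, column names and cleanup rules are illustrative assumptions only, not a description of any particular organization’s pipeline.

```python
import pandas as pd

# Extract: pull raw records from two hypothetical source extracts
crm_raw = pd.read_csv("crm_export.csv")   # assumed columns: customer_id, country, revenue
web_raw = pd.read_csv("web_export.csv")   # assumed columns: customer_id, sessions

# Transform: the hand-coded standardization step that business users wait on
crm = crm_raw.rename(columns=str.lower)
crm["country"] = crm["country"].str.strip().str.upper()   # normalize country codes
crm = crm.dropna(subset=["customer_id"]).drop_duplicates("customer_id")

web = web_raw.rename(columns=str.lower)
prepped = crm.merge(web, on="customer_id", how="left")
prepped["sessions"] = prepped["sessions"].fillna(0)

# Load: write the prepared output for downstream reporting
prepped.to_csv("prepared/customer_activity.csv", index=False)
```

In a traditional warehouse, nearly every new reporting requirement triggers another round of this coding, testing and scheduling, which is where much of the delay and the dependency on IT comes from.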

Learnings from Version 1.0 of Big Data Projects

Ensuring data integrity is a key step in the deployment and successful execution of any AI or business intelligence solution. Here are some of the crucial takeaways that organizations can implement:

  • Technical Standardization: Clean and standardize data to speed up results and optimize costs. Doing so also provides better control over how information is consumed and interpreted.
  • Secure the Data: Catalogue the data, create governance structures and security protocols, and introduce formal policies controlling the where, who, what, and when of any data being consumed. After all, no organization wants its information to be untraceable or uncontrollable in these modern big data environments.
  • Prepare the Data: Properly prepare the data and its metadata, and ensure that it is complete and accurate before it is ingested into any big data environment. A good estimate of how the organization’s information will be used helps prevent inconsistencies in reports and avoid biased outcomes.
  • Automated Ingestion: Where it is financially and technically feasible, organizations should evaluate and implement automated tools that make the ingestion of structured or unstructured data fast and hassle-free (a minimal validation-and-ingestion sketch follows this list).
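As a rough sketch of the “secure, prepare, then ingest” discipline described above, the snippet below runs a few completeness and standardization checks before a file is allowed to land in the lake. The required columns, file paths and the check_before_ingest helper are hypothetical assumptions, not the API of any specific ingestion tool.

```python
from pathlib import Path

import pandas as pd

REQUIRED_COLUMNS = {"customer_id", "country", "updated_at"}   # illustrative schema

def check_before_ingest(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality problems; an empty list means the batch is safe to ingest."""
    problems = []
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        problems.append(f"missing required columns: {sorted(missing)}")
    else:
        if df["customer_id"].isna().any():
            problems.append("customer_id contains null values")
        if df["customer_id"].duplicated().any():
            problems.append("customer_id contains duplicate values")
    return problems

def ingest(source_file: str, lake_dir: str = "lake/raw") -> None:
    """Validate a source extract and, only if it passes, land it in the data lake."""
    df = pd.read_csv(source_file)
    problems = check_before_ingest(df)
    if problems:
        # Reject the batch instead of letting "garbage in" reach the lake
        raise ValueError(f"{source_file} failed ingestion checks: {problems}")
    target = Path(lake_dir) / (Path(source_file).stem + ".parquet")
    target.parent.mkdir(parents=True, exist_ok=True)
    df.to_parquet(target, index=False)   # assumes a parquet engine such as pyarrow is installed

# Example with a hypothetical file: ingest("exports/crm_2024_06.csv")
```

The point is not the specific checks but the ordering: validation happens before the data reaches the lake, so downstream reports never see the batches that would otherwise produce garbage-in, garbage-out results.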

The starting point of any big data project is a thorough understanding of the business problems that an organization wants to target, or the value it wishes to derive. Just as importantly, it is vital to invest sufficient time and resources to clean, secure and prepare the organization’s data before deploying any big data, analytics or AI project.