The First Hurdle for “Democratized AI”: Self-Service Access to Clean/Governed Data

At Google Cloud Platform’s NEXT’18 conference this past week, it was no surprise that many of the sessions focused on the advanced AI and Machine Learning capabilities that GCP provides to its users. Whether it is integrated into G Suite for “smart email responses” or delivered through BigQuery ML to make it easier to deploy algorithms on any data, AI is embedded into nearly everything Google does. This is not only impressive from a pure innovation perspective, but also trend-setting from a change management point of view, as Google’s intent is to provide all users with access to AI technology.

During one of the keynotes, Fei-Fei Li, Google’s Chief AI Scientist, brought up the concept of “Democratized AI” and Google’s plans to make AI available to everyone. Democratized AI has come up a lot during the current AI revolution, but what does it really mean?
Per Fei-Fei’s most recent blog post:

“AI is empowerment, and we want to democratize that power for everyone and every business…
AI is no longer a niche in the tech world — it’s the differentiator for businesses in every industry. And we’re committed to delivering the tools that will revolutionize them.”

To summarize, Democratized AI is the goal of enabling all people, not just data scientists, to leverage the power of AI to create more unique and value-adding experiences in their field, while removing the barriers to entry.
This utopian AI vision is nice in theory, and Google will most certainly get there at the rate at which it is innovating. However, democratizing AI comes with an array of challenges and barriers that have to be addressed before it becomes the standard at the enterprise level. These include:

  • Access to the right talent to leverage current-state AI tools;
  • Business-Technology alignment to agree on an AI strategy to help solve priority business problems;
  • And most importantly, access to clean and governed data.

The third point is really the first problem organizations should be solving. I’m sure everyone is familiar with (and sick of) the cliché “garbage in, garbage out”, but when it comes to democratizing AI, it should be taken more seriously.

Today’s self-service AI/ML tools make it very easy for anyone to deploy algorithms on their data. But organizations deploying those solutions for enterprise-scale use cases, rather than one-off pilots, need to ensure there is no bias or uncertainty in the outcomes that the AI tools provide.
For instance, when deploying Machine Learning on a set of customer data to ascertain the “next best action” of the customer, the underlying data must be consistent and standardized, without outliers or data quality errors, before you can trust it. Data integrity is as important as the algorithms you deploy on the data.
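To make the point concrete, here is a minimal sketch of the kind of checks that customer data might pass through before any model sees it. Everything in it is an assumption made for illustration: the field names (customer_id, country, monthly_spend), the choice of a median-based outlier test, and the 3.5 cutoff are hypothetical and are not taken from any particular product or dataset.

```python
from statistics import median


def standardize(record):
    """Return a copy of the record with the (hypothetical) country code
    trimmed and upper-cased so that ' ca ' and 'CA' compare as equal."""
    cleaned = dict(record)
    country = cleaned.get("country")
    if isinstance(country, str):
        cleaned["country"] = country.strip().upper()
    return cleaned


def find_issues(records,
                required=("customer_id", "country", "monthly_spend"),
                threshold=3.5):
    """Return (row_index, description) pairs for two kinds of problems:
    missing required fields, and outliers in monthly_spend."""
    issues = []

    # Completeness: every required field must be present and non-empty.
    for i, rec in enumerate(records):
        for field in required:
            if rec.get(field) in (None, ""):
                issues.append((i, f"missing {field}"))

    # Outliers: modified z-score based on the median absolute deviation,
    # which is less distorted by the extreme values it is trying to find.
    spends = [(i, float(r["monthly_spend"]))
              for i, r in enumerate(records)
              if r.get("monthly_spend") not in (None, "")]
    if len(spends) >= 3:
        med = median(v for _, v in spends)
        mad = median(abs(v - med) for _, v in spends)
        if mad > 0:
            for i, v in spends:
                if abs(0.6745 * (v - med) / mad) > threshold:
                    issues.append((i, f"outlier monthly_spend={v}"))
    return issues


if __name__ == "__main__":
    # Hypothetical sample rows, invented for this example.
    sample = [
        {"customer_id": "c1", "country": " ca ", "monthly_spend": 100},
        {"customer_id": "c2", "country": "US", "monthly_spend": 110},
        {"customer_id": "c3", "country": "", "monthly_spend": 9000},
    ]
    for row, problem in find_issues([standardize(r) for r in sample]):
        print(row, problem)
```

Rows that come back with issues can be fixed or set aside before training, so that a “next best action” model is not learning from blanks and data entry errors.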

For conducting a large-scale Anti-Money Laundering (AML) initiative, being able to detect or predict suspicious activity using AI/ML is predicated on the underlying data being standardized and modeled in order to reduce the number of false positive detections.
We’ve all heard that 80% of the effort in most AI projects goes into data preparation, so it’s not surprising that many data scientists are frustrated and jump right into deploying their tools without spending the appropriate time on data preparation.

So, what can organizations do to automate this data preparation process, not only to clean and standardize their data, but also to accelerate getting that data into the hands of data scientists and “citizen” data scientists?

Automation, standardization, and governance are capabilities that are baked into our industry-leading, metadata-driven ingestion engine, Cornerstone®, to help alleviate “the 80%”.

Cornerstone removes the need for developers to manually write code to move data from source to target. In other words, NO ETL! This gets data into centralized locations, like an Enterprise Data Lake, so that anyone can access it in a self-service manner. In fact, our latest release, Cornerstone 4.0, can ingest data directly into native cloud storage services, such as GCP’s Cloud Storage.

Cornerstone also automates the standardization of data into formats that allow users to quickly deploy algorithms without extensive data prep work. Not only that, but Cornerstone secures the data via encryption or tokenization, and enforces user-defined security classifications to ensure users can only see the data they are authorized to see.
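As a generic illustration of the idea (and not a depiction of how Cornerstone itself is implemented), the sketch below shows one simple way tokenization and classification-based access can fit together. The classification labels, the clearance levels, and the field names are all hypothetical, and the token here is a keyed hash (HMAC-SHA256) rather than a vault-backed token, which is a simplification.

```python
import hashlib
import hmac

# Hypothetical security classification for each field.
CLASSIFICATION = {
    "customer_id": "internal",
    "email": "restricted",
    "monthly_spend": "internal",
}

# Hypothetical clearance levels, ordered from lowest to highest.
LEVELS = {"public": 0, "internal": 1, "restricted": 2}


def tokenize(value, key):
    """Replace a value with a deterministic keyed-hash token.
    The same input and key always give the same token, so joins on the
    tokenized column still work without exposing the raw value."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
    return "tok_" + digest.hexdigest()[:16]


def view_for(record, clearance, key):
    """Return a copy of the record in which every field classified above
    the user's clearance has been replaced by its token."""
    visible = {}
    for field, value in record.items():
        # Fields with no classification are treated as restricted,
        # which is the safest default.
        level = CLASSIFICATION.get(field, "restricted")
        if LEVELS[level] <= LEVELS[clearance]:
            visible[field] = value
        else:
            visible[field] = tokenize(value, key)
    return visible


if __name__ == "__main__":
    demo_key = b"demo-key-not-for-production"
    row = {"customer_id": "c1", "email": "a@example.com", "monthly_spend": 100}
    print(view_for(row, "internal", demo_key))
    print(view_for(row, "restricted", demo_key))
```

In this sketch a user with “internal” clearance still gets a usable row for modeling, but the email column arrives as a token instead of the raw address.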

Governance is also central to Cornerstone’s processes, as it captures all the associated metadata and lineage to ensure the data has an audit trail leading back to the point of origination. This is especially important when AI/ML tools are applied to complex enterprise-scale initiatives, such as Fraud or AML, where organizations must stay compliant with regulatory policies.

Companies like Google have solved half of the equation for Democratized AI, but enterprises must solve the other half – access to clean and governed data – before it comes to fruition at an enterprise scale.

Next Pathway Data Pipeline