
Understanding Data Through Metadata

Organizations embarking on Big Data initiatives aim to extract meaningful insights from ever-larger volumes of data from more heterogeneous sources. New technologies have made it easier and cheaper to host large volumes of data and process them at higher rates.

Embarking on Big Data Initiatives

However, the challenge remains to align the understanding of the data accurately across sources, Big Data environments, and data consumers. This is where metadata becomes important.

A clear understanding of the data is critical in preventing your Data Lake from turning into a “data swamp.” Properly understanding data requires understanding its structure, meaning, and operational constraints. Building and maintaining this understanding is the main objective of the Metadata Management discipline.

Types of Metadata

Technical Metadata

Supporting the understanding of data from a structural perspective, technical metadata is used to capture details of physical structure and representation (e.g. in databases, files, or messages) in terms of:

  • type,
  • size,
  • precision,
  • relationships,
  • referential constraints,
  • transformations, and
  • access permissions

In general, technical metadata takes the form of database catalogs, XML schemas, ETL job definitions, and similar artifacts. Data models and dictionaries are design-time representations of the technical metadata that also incorporate business-meaningful descriptions of individual data elements.
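As an illustration of reading technical metadata from a database catalog, the following is a minimal sketch that uses Python's built-in sqlite3 module. The tables (customer, orders), the ColumnMetadata record, and the read_technical_metadata helper are hypothetical names invented for this example. The sketch covers only type, size, precision, and referential constraints; transformations and access permissions would come from other sources, such as ETL job definitions.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnMetadata:
    """One column's technical metadata (hypothetical record layout)."""
    table: str
    name: str
    declared_type: str         # type, size and precision as declared in the DDL
    nullable: bool
    primary_key: bool
    references: Optional[str]  # "table.column" for a referential constraint, else None


def read_technical_metadata(conn: sqlite3.Connection, table: str) -> List[ColumnMetadata]:
    # PRAGMA arguments cannot be bound as parameters, so the table name is
    # interpolated directly; only pass trusted names to this helper.
    fk_rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    # Row layout: (id, seq, referenced_table, from_column, to_column, ...)
    foreign_keys = {row[3]: f"{row[2]}.{row[4]}" for row in fk_rows}

    columns = []
    # Row layout: (cid, name, type, notnull, default_value, pk)
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        f"PRAGMA table_info({table})"
    ):
        columns.append(
            ColumnMetadata(
                table=table,
                name=name,
                declared_type=col_type,
                nullable=not notnull,
                primary_key=bool(pk),
                references=foreign_keys.get(name),
            )
        )
    return columns


# Example schema, assumed purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total       DECIMAL(10,2)
    );
    """
)

orders_columns = read_technical_metadata(conn, "orders")
for column in orders_columns:
    print(column)
```

Other database engines expose the same kind of information through their own catalogs, so the PRAGMA calls here are specific to SQLite rather than a general interface.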

Business Metadata
The source of data element definitions is business metadata, which consists of glossaries of terms. Terms are associated with data elements to convey their meaning. To be effective, glossaries must be more than just simple lists of terms with their definitions. Glossaries need to employ a classification methodology that places terms in taxonomies. This helps ensure that concepts, including their descriptors and relationships, are consistently identified, independent of the lexical constructs of the business definitions.
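To make the difference between a flat list of terms and a classified glossary concrete, here is a minimal sketch in Python. The Term record, the "broader" link used to build the taxonomy, the sample terms, and the element names such as orders.customer_id are all assumptions made for illustration, not a standard glossary model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """A glossary term placed in a taxonomy (hypothetical structure)."""
    name: str
    definition: str
    broader: Optional[str] = None  # parent term in the taxonomy, if any
    data_elements: List[str] = field(default_factory=list)  # associated elements


def taxonomy_path(glossary: Dict[str, Term], name: str) -> List[str]:
    """Return the chain of terms from the taxonomy root down to `name`."""
    path: List[str] = []
    seen = set()
    current: Optional[str] = name
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in taxonomy at {current!r}")
        seen.add(current)
        path.append(current)
        current = glossary[current].broader
    return list(reversed(path))


def terms_for_element(glossary: Dict[str, Term], element: str) -> List[str]:
    """Return the names of all terms associated with a data element."""
    return sorted(t.name for t in glossary.values() if element in t.data_elements)


# Sample glossary, assumed purely for illustration.
glossary = {
    t.name: t
    for t in [
        Term("Party", "A person or organization the business interacts with."),
        Term(
            "Customer",
            "A party that has purchased a product or service.",
            broader="Party",
            data_elements=["customer.id", "orders.customer_id"],
        ),
        Term(
            "Retail Customer",
            "A customer who buys for personal use.",
            broader="Customer",
        ),
    ]
}

retail_path = taxonomy_path(glossary, "Retail Customer")
element_terms = terms_for_element(glossary, "orders.customer_id")
print(" > ".join(retail_path))
print(element_terms)
```

Because each term carries its position in the taxonomy, two differently worded definitions of the same concept can still be recognized as the same concept by where they sit in the hierarchy.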

Operational Metadata
To measure and improve the effectiveness of the data-related processes in the data lake, operational metadata is needed to quantitatively and qualitatively describe the

  • volumes,
  • availability,
  • validity,
  • accuracy,
  • timeliness, and
  • access of the data.
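Some of these measures can be computed directly from a batch of loaded records. The sketch below, in Python, derives volume, validity, and timeliness for one batch. The OperationalMetadata record, the profile function, the loaded_at field, and the validity rule are hypothetical choices for this example. Availability, accuracy, and access typically need other inputs, such as uptime monitoring, reference data, and audit logs, so they are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List


@dataclass
class OperationalMetadata:
    """Operational measures for one batch (hypothetical record layout)."""
    dataset: str
    row_count: int      # volume
    valid_ratio: float  # validity: share of rows passing the rule
    lag: timedelta      # timeliness: measurement time minus newest load time


def profile(
    dataset: str,
    rows: List[Dict],
    is_valid: Callable[[Dict], bool],
    measured_at: datetime,
) -> OperationalMetadata:
    if not rows:
        raise ValueError("cannot profile an empty batch")
    valid_rows = sum(1 for row in rows if is_valid(row))
    newest_load = max(row["loaded_at"] for row in rows)
    return OperationalMetadata(
        dataset=dataset,
        row_count=len(rows),
        valid_ratio=valid_rows / len(rows),
        lag=measured_at - newest_load,
    )


# Sample batch with fixed timestamps, assumed purely for illustration.
def at(hour: int, minute: int) -> datetime:
    return datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc)


rows = [
    {"customer_id": 1, "total": 20.0, "loaded_at": at(11, 0)},
    {"customer_id": 2, "total": 35.5, "loaded_at": at(11, 10)},
    {"customer_id": None, "total": 12.0, "loaded_at": at(11, 20)},
    {"customer_id": 4, "total": 8.25, "loaded_at": at(11, 30)},
]

stats = profile(
    "orders",
    rows,
    is_valid=lambda row: row["customer_id"] is not None and row["total"] >= 0,
    measured_at=at(12, 0),
)
print(stats)
```

Recording such a profile on every load gives a history against which a drop in validity or a growing lag can be noticed.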

Metadata Management Practice

A proper Metadata Management practice needs to include all three perspectives and requires the concerted participation of the business, technology, and operations organizations in order to be successful.

Metadata is at the heart of any successful data project today, and the lack of importance placed on metadata is the reason many first-generation big data initiatives have failed.