12 min.

S01E01 - The Data Catalog is dead, long live the Data Catalog! DatActionable

    • Management

Since the 1990s, IT teams have tried to know where data lives in the information system. For 30 years, they have struggled to demonstrate the value of such a catalog to the business and to secure a recurring budget for it. Like Data Governance itself, this should be a business topic, not an IT topic. Creating value with data comes with responsibilities, starting with knowing that data.
What’s wrong with data catalogs?
The first thing people compare between data catalog solutions is the type or number of connectors available. That is representative of the source of the problem: current data catalogs aim to map the physical tables of the information system at attribute level.
To answer the business question “where is this data?”, data catalogs map all data with precision. It is as if, in order to know which districts the inhabitants of a city live in, one had to keep precise addresses with street number, street, postal code and city. Don't put words in my mouth: I'm not saying that this level of granularity is never necessary, only that it isn't always necessary. And when it is necessary, it is because there is a use case that justifies it.
To continue with this people comparison, the business question is not always (almost never, in fact) where “the” people are. It is more often where “a population” of people is. The same goes for data requests: when we want to create value with data, we don’t necessarily need all the data of one type, only a subset. If we want businesses to ask for data in business wording rather than by database table name, we must have a data catalog ready for this.
Let’s stop this people comparison, because data is not people. One of its superpowers is ubiquity (I’ve heard about a guy with this superpower, but that is another story). To be precise, even data doesn’t really have this superpower: when it is duplicated, its quality, especially its timeliness, changes. Incidentally, this shakes the concept of looking for the source of truth: the business is in fact looking for the right source for the intended purpose. The definition of good data would be data at the right quality … and at the right cost, rather than a quest for the truth.
I’ll finish with a last point, even if it is not really the last one: four is enough to challenge the current principles of data catalogs. How do you know that your data catalog is complete? Since catalogs were initially designed for databases, taking the shortcut that a dataset equals a table, we miss other kinds of datasets: unstructured documents, videos, images, not to mention data hidden in free-text fields. And even for structured data, it is easy to miss some populations of data because they sit in a different information system. This typically happens after mergers and acquisitions, when legacy IT is left running.
To make it short:
* The granularity at which the location of data is identified is not adapted to the business.
* The granularity at which business data is expressed is not adapted to the business.
* The link between the business view and the IT view is not adapted to the business.
* The catalog’s completeness is unknown, so it is not adapted to the business.
How to enable a business-ready data catalog?
Start with a new way to describe data for businesses: create “Business Objects”
It all starts with the business! We need a new way to express data for businesses without going into information system implementation details. The ambition is to meet the following requirements:
* It should be understandable by the business first: both by the business accountable for the data and by the other businesses.
* It should enable the description of populations of data, from a business perspective.
* It should be manageable over time: businesses expect a return on investment for the time spent describing the data.
* It should be efficient: even if we don’t have a precise mapping, we need to know where to look.
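To make these four requirements more concrete, here is a minimal sketch in Python of what such a description could look like. It is only an illustration under my own assumptions, not a vendor model or a reference implementation: the class names (`BusinessObject`, `Population`), the method `where_to_look`, and the sample values ("Customer", "CRM-FR", "Legacy ERP") are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Population:
    """A business-defined subset of a business object (hypothetical model)."""

    name: str
    description: str
    # Coarse location only: names of information systems, not tables or columns.
    systems: tuple[str, ...] = ()


@dataclass
class BusinessObject:
    """Data described in business wording, with an accountable business."""

    name: str
    accountable_business: str
    definition: str
    populations: list[Population] = field(default_factory=list)

    def where_to_look(self, population_name: str) -> tuple[str, ...]:
        """Return the systems to search for a population, without a precise mapping."""
        for population in self.populations:
            if population.name == population_name:
                return population.systems
        raise KeyError(population_name)


# Illustrative values only; none of them come from a real information system.
customer = BusinessObject(
    name="Customer",
    accountable_business="Sales",
    definition="A person or company that has signed at least one contract.",
    populations=[
        Population("Retail customers, France", "Individuals billed in France", ("CRM-FR",)),
        Population("Customers of the acquired entity", "Still managed in legacy IT", ("Legacy ERP",)),
    ],
)

print(customer.where_to_look("Retail customers, France"))
```

The point of the sketch is the granularity: a population points to a system, like a district rather than a street address, and a finer mapping would only be added when a use case justifies it.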
Now forget what vendors put behind “Business Object”, which is, as you may have understood, not
