datActionable

Frederic BERNARD-PAYEN

I have designed and deployed enterprise-level data governance frameworks across large corporations. On datActionable, I share best practices. datactionable.substack.com

Episodes

  1. 16/04/2021

    S01E04 - The fall of the Kingdom of Process and the birth of the Data Federation.

    Business functions now have only one word in mind: Data (usually followed by Artificial Intelligence). However, unless your company sells a data product or a data-based service, the business is not data itself. It can be a tangible item or a digital delivery, like software. Your company is not (or not mainly, or not yet at all) monetising its data; it is selling a product or a service. Let's step back and look at your company as a system, a black box. Your customer has requirements, and your company provides the product fulfilling these requirements. Of course, some data may transit from the customer to your company or the other way around, but it is still a negligible part of the data your company holds. What happens inside is your know-how, and your customer doesn't really care about it, at least as long as your cost, quality and delivery time are not killing the value he was expecting.

    For decades, pushed by standardisation initiatives of the 90s like the ISO 9000 norms, companies have documented their activity through processes and people, with the support of some IT tools. The place of data was limited to some inter-process exchanges of deliverables, closer to paper dossiers than to data as we consider it today. These exchanges were mainly documented along the production chain. What we aim for is to be a data-driven company: data-driven decisions, data-driven reporting, data-driven behaviours. Yes, your organisation will go data centric. Nevertheless, becoming a pure data company is another story. This legacy is there, and pretending it doesn't exist is a mistake. We need to root data in this ecosystem. Yes, the process kingdom is falling. It doesn't mean we need to kill the process.

    Connecting the dots between data catalogue concepts and company processes. By connecting with the quality team, or simply by consulting your internal portal, you should find the referential of processes, looking something like this. Processes, Activities, Tasks: from a pure ISO 9000 perspective, you have sets of interrelated or interacting activities that transform inputs into outputs. Sometimes the outputs take the name of deliverables. I won't deep dive here, but I will insist on the fact that these are conceptual, like the Business Objects and Business Object Views. Nevertheless, in terms of granularity, I have at least some convictions that frame the model I propose:

    * Tasks should be associated with a limited set of roles and be executed in a continuous time frame. Interruptions can occur, but they are not the standard behaviour.
    * Activities group these tasks. An Activity may have a deliverable to which Tasks were contributors.
    * Processes group and sequence the activities. You should have a process owner; most of the time this process owner is attached to the organisation executing the process. It will help deployment if the granularity of your processes permits such an allocation, identifying your data stewards and linking them with their data officer.

    The level of granularity of a process is so high that the only data catalogue concept you can link to it is the Business Object, as defined in Episode 1. It won't give you a lot of information but, at least, it will help to allocate the data sensitivity requirements we have seen in Episode 3. To reach a finer level of granularity, we need to go to the activity level and link activities with the Business Object Views. Activities would create, enrich or simply use the Business Object Views.
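    To fix ideas, here is a minimal sketch of this concept model in Python; the class and field names are my illustrative assumptions, not a prescribed implementation.

    ```python
    from dataclasses import dataclass, field
    from enum import Enum

    class Interaction(Enum):
        """How an Activity or Task touches a Business Object View."""
        CREATE = "create"
        ENRICH = "enrich"
        USE = "use"

    @dataclass
    class Task:
        """Limited roles, executed in a continuous time frame."""
        name: str
        roles: list[str]
        view_interactions: dict[str, Interaction] = field(default_factory=dict)

    @dataclass
    class Activity:
        """Groups Tasks; links to Business Object Views by name."""
        name: str
        tasks: list[Task] = field(default_factory=list)
        view_interactions: dict[str, Interaction] = field(default_factory=dict)

    @dataclass
    class Process:
        """Groups and sequences Activities; links only to Business Objects."""
        name: str
        owner: str                    # the process owner
        activities: list[Activity] = field(default_factory=list)
        business_objects: list[str] = field(default_factory=list)
    ```

    Note the asymmetry: at process level you can only attach Business Objects, while the finer Business Object View links live on Activities and Tasks. That is exactly the granularity the access policies below will rely on.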
    Tasks should interact with Business Object Views too; naturally, these are the ones aggregated at Activity level.

    What to do with the Deliverables of your Business Processes? The last element I mentioned in the process repository was the Deliverables. My opinion is that they are similar in granularity to the Business Object Views. Nevertheless, they were designed long before the start of the company data journey, so it is difficult to have a one-to-one correspondence. I faced several issues when trying to reconcile them:

    * They are often document oriented, mixing several concepts.
    * They have a specific name, as if they were a concept, but are in fact a name given to a state of a concept.
    * We miss some Business Object Views when deliverables are described at process level: it typically hides the inter-activity deliverables.
    * Done in a top-down approach, the vocabulary sometimes differs from the users' vocabulary.

    Depending on your ambition, and on the budget you have, I would suggest two paths:

    * Either do, for now, a simple mapping matrix between deliverables and one, or many, Business Object Views.
    * Or dust off the process referential by replacing all Deliverables with Business Object Views. But it is a long way…

    Completing the view by adding the activities' relationship with IT. A last thing you need to think about is the complementary mapping of Activities or Tasks to IT levels. As the Business Object View points to a Dataset thanks to the Dataset Index, an Activity should point to a (Dataset) Processing. I'm not adding complexity for the pleasure of it… Considering the architectural possibilities of today, analysing an application as an indivisible whole won't give you flexibility of deployment versus data compliance. Let's imagine a simple case with an application manipulating both sensitive and non-sensitive data. If you want to use a Function as a Service platform (like Lambda) to process the non-sensitive data, and you don't have this level of granularity in the description, you will enforce sensitive-data requirements on the Function as a Service. That will lead to extra costs or even a showstopper. With the evolution of privacy laws, I let you imagine the extra constraints you can create. I guess you would now like a cheat sheet with the concepts and their relations. Let me think about that for subscribers!

    Leveraging this connection to rethink the data access and data usage policies. For decades, yes decades, we have been managing groups of users of applications. And we are dying of it. People change jobs within the company, and nobody takes care of removing their access to applications… which we tell ourselves is not a big deal: the person stays in the company, and his access will be removed if he leaves. But some data requirements have changed. Your new job purpose doesn't justify at all the access to personal data you still have through the application you used ten years ago… In parallel, these accesses were given from a pure need-to-know perspective, when you needed to justify the why of everything. This is the opposite of what we try to do: open the data by consent. We try to have a graduated answer to the risks, and some data can be made available to all employees without any risk. So, today, some people don't have access to rather open data they could use to create value, while they have access to data they don't need, which may create risks. We can change that thanks to a shift to a data-centric approach and a mature business process management.

    If you have all of this season's articles in mind, we have all the elements. People are working for a company purpose. People's jobs consist of executing activities and tasks. These activities and tasks consume and produce Datasets corresponding to Business Object Views. It's a no-brainer that those people should be able to access and/or process those data. We need to ensure the data requirements, especially the compliance ones, are known and fulfilled for the activity. That includes fulfilment on IT system security, on the activity itself and on the person executing the activity. Moreover, if no requirement restricts the diffusion of the data within the company, we don't need to restrict access to this activity. The idea is to find how far the dataset can be shared, thanks to the requirements coming with the associated Business Object View:

    * If the Business Object View has no sharing or processing restriction, we have our company open data.
    * If the Business Object View has some sharing restrictions, we try to find the highest level of granularity at which it can be opened (Task? Activity? Process? Group of Processes?) and we call it a "Purpose".

    In the end, anybody working for a Purpose has access to the data necessary for that purpose, plus the company open data.
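    Here is a minimal sketch of that resolution logic, assuming a simple rule structure; names like SharingRule and accessible_views are illustrative, not part of any product.

    ```python
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class SharingRule:
        """Sharing scope of a Business Object View, derived from its requirements."""
        view: str                 # Business Object View name
        purpose: Optional[str]    # None: no restriction, i.e. company open data

    def accessible_views(person_purposes: set[str],
                         rules: list[SharingRule]) -> set[str]:
        """Anybody working for a Purpose gets the views opened to that Purpose,
        plus the company open data."""
        return {r.view for r in rules
                if r.purpose is None or r.purpose in person_purposes}

    rules = [
        SharingRule("Product / name, dimensions", None),           # open data
        SharingRule("Product / production cost", "cost control"),  # purpose-bound
    ]
    assert accessible_views({"cost control"}, rules) == {
        "Product / name, dimensions", "Product / production cost"}
    assert accessible_views(set(), rules) == {"Product / name, dimensions"}
    ```

    The design choice is that access is never granted to a person directly, only to a Purpose; people inherit access by working for that Purpose.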
    New use cases are new company purposes. If somebody has a new use case and needs some datasets limited to a purpose, he can request the corresponding Business Object Views from the Data Officers, who will run an ad-hoc risk analysis. The access may be granted for this new purpose, or not. For a POC it will be temporary; if it becomes an industrial solution, the access grant will have to be documented by linking the new or modified Activity or Task as a consumer of the Business Object View. I let you re-read Episode 2 to define the complementary responsibilities of Data Stewards managing these high-level rights policies, keeping them at business level. These business rules, combined with the attachment of people to purposes, will have to be transformed into technical rules for actual accesses. Here we have some exploration area for automation, and the business case is certainly there when you see how much you spend managing individual people's access rights.

    What is next? That was the season 1 finale. We have an approach to leverage the data knowledge and use it to open the data within the company. It encompasses the whole business knowledge of the data and connects the dots with compliance, processes and IT. It doesn't answer what we will do with these data. We can do a lot of things. If you had discovered uranium, would you have used it for a power plant, weaponry… or a glow-in-the-dark table for your living room? Subscribe now to receive the season 2 premiere directly in your inbox… if the show is renewed ;)

    11 min
  2. 09/04/2021

    S01E03 - The Data Compliance Quest

    I was looking for an introduction for this episode and found this quote: "The rules have changed. There's a fine line between right and wrong. And, somewhere in the shadows, they send us in to find it." It made me smile and, I'm sure, it will make you smile if you get the reference. Don't ask me how my search led me to this text… and let's come back to our topic. Yes, the data rules have changed and they will continue to change in the near future. Data-related regulations like the GDPR have impacted the usage of data in business, up to our private usage. If you are conducting a data ecosystem watch, you may have heard about such coming changes. I will mention the proposed regulations which will impact data, like the Data Governance Act or the Digital Services Act. Of course, depending on your business sector, you may have other laws, regulations and constraints leading to requirements on data. On top of these external requests, your own company requirements also put your data under constraints; I am thinking typically of your internal data security classification.

    Understanding the data compliance complexity. Data compliance requirements usually end up in the security thematic and give you constraints in terms of protection of confidentiality, integrity, availability or traceability. You now face the data openness dilemma: data in jail, or people in jail? If you open all datasets, you will have people in jail or, at least, you will have to pay fines. If you close the datasets too much, putting data in jail, people won't be able to create value with data. Does it sound familiar? Of course, we need to consider as datasets not only well-structured tables in a database: it also encompasses free text in such tables, your office documents or even videos. To achieve this quest, you need to know the risks coming with your data. You need an effective system, when you see the impact. You also need an efficient system, because nobody wants to pay for this enabler, especially having in mind that rules will change: we are talking about recurring costs.

    First observation: a dataset is sensitive for a reason. Let's illustrate this with a fictive dataset: a database table containing the products of the company, with the following information:

    * Name of the product.
    * Recommended retail price.
    * List of ingredients on the package.
    * Colour.
    * Dimensions.
    * Production recipe.
    * Production costs.
    * Minimal acceptable price.
    * Person responsible for production.
    * Some photos of the product.

    Just reading it, you can feel this dataset is sensitive. You can also feel that only some pieces of information are sensitive, not all of them:

    * Some information is public by nature, like the name of the product, the recommended retail price, the dimensions or the colour.
    * For the production costs, it's clear: you don't want them to reach your competition. The same goes for the minimal acceptable price.
    * For the production recipe, it's not so obvious. On your latest product, you want to protect it. For an older product whose patent has expired, not really.
    * With the person responsible for production, you face another sensitivity axis: privacy. You have to protect it even if, from a pure business perspective, before the GDPR, it wasn't really coming with a big risk.
    * With the photos of the product, you potentially open a Pandora's box: are the photos merely illustrative, or can they give away manufacturing secrets?
    I won't deep dive on this last one today… If you consider the dataset only as an indivisible whole, it will be marked as "confidential plus privacy". That will block the data analyst working on ingredients. Annoying, isn't it? It can be even worse if we mark the dataset as "confidential", hiding the privacy topic under the same axis. In that case, if the privacy law changes, we don't even know how we are impacted, and we need to reopen all datasets marked as "confidential" to check.

    Second observation: data compliance requirements are expressed by the business, not by IT. Based on the first observation, the temptation would be to tag each column directly in the dataset instead of the whole dataset. Good try, but bad idea… The first problem, occurring especially in companies with a long IT legacy, is the heterogeneity of your information system. Let's come back to the minimal acceptable price of the example. In one application, this information could be represented by one column: price without tax. In another application, it could be two columns: with and without taxes. The business expresses that the minimal acceptable price is confidential, whatever its representation in the information system. The second problem you will face is that, even within a column, not all records may have the same sensitivity. Remember the example of the production recipe: depending on the patent status, the answer will be different. So you need to tag a column for a set of rows. Ouch. The third problem, which should completely chill you, is the size of the obstacle to overcome. Just do a simple calculation: the number of applications, multiplied by the average number of tables, multiplied by 30 minutes (let's be ambitious about our productivity). You are a big company and you just found half a million hours? 57 years and 14 days? If you have 200 people ready to spend half of their time this year on it, feel free to try. Some will tell me that thanks to Artificial Intelligence we can reduce that time; I'm sure of it. I will answer: why not… if you are able to provide training datasets for all your cases. I'm sure there is a more efficient usage; I'll describe it in the last part.

    Third observation: wherever the data is located, the data compliance requirements are the same. To make matters worse, a dataset is, for good or bad reasons, replicated in your information system. The product table I used in the previous example could be in the customer relationship management system, in the data hub of the sales domain, in the company data lake, but also (partially) replicated in the list of products used for your online store and, of course, in several backups. Can you imagine the rework if you need to update sensitivity tags? Typically when a patent expires, or if the privacy law changes or if, don't ask me why, a regulation impacts the confidentiality of ingredient lists?

    Reducing the data compliance knowledge complexity. If you have read the first episode of this season, I guess you have already connected the dots with the Business Object View concept. If not, I invite you to read or re-read it to get familiar with the concept. By design, this notion of Business Object View is there to fulfil this requirement on data compliance knowledge. By playing with characteristics, business contexts and states, we can express which populations of data are sensitive, and why. The idea is only to document the applicable data compliance requirements and, thus, prepare the feeding of a sensitivity tagging engine. It may look like something like this:

    * The production cost of the product is confidential.
    * The minimal acceptable price of the product is confidential.
    * The production recipe of products which are under patent is confidential.
    * The person responsible for the production of a product is personal data.

    Any population of data matching one of these minimalistic Business Object Views inherits its classification. All other populations of data are not sensitive.
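    As an illustration, here is a minimal sketch of such classification rules feeding a tagging engine; the rule structure and function names are my assumptions, not a reference implementation.

    ```python
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ClassificationRule:
        """A minimalistic Business Object View with its sensitivity requirement."""
        business_object: str
        characteristic: str
        condition: Optional[str]   # business context/state restricting the population
        sensitivity: str

    RULES = [
        ClassificationRule("Product", "production cost", None, "confidential"),
        ClassificationRule("Product", "minimal acceptable price", None, "confidential"),
        ClassificationRule("Product", "production recipe", "under patent", "confidential"),
        ClassificationRule("Product", "person responsible for production", None, "personal data"),
    ]

    def classify(business_object: str, characteristic: str,
                 context: set[str]) -> str:
        """A population of data inherits the classification of the matching rule;
        all other populations are not sensitive."""
        for rule in RULES:
            if (rule.business_object == business_object
                    and rule.characteristic == characteristic
                    and (rule.condition is None or rule.condition in context)):
                return rule.sensitivity
        return "not sensitive"

    assert classify("Product", "production recipe", {"under patent"}) == "confidential"
    assert classify("Product", "production recipe", {"patent expired"}) == "not sensitive"
    assert classify("Product", "colour", set()) == "not sensitive"
    ```

    Note that the rules live at the business level: they say nothing about which columns, in which applications, hold the production cost.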
    Enriching the data compliance knowledge. As you may have understood from reading me, I push Data Governance as the balance between Data Business Value and Data Responsibility. I believe it's also the motto that should drive the enrichment of this business-centric way of consolidating data compliance knowledge. From a value creation perspective, before a dataset is used, the Data Officer will have to ensure its sensitivity has been assessed. By opening the dataset, its population will be documented as a Business Object View and a Dataset Index, and the existing classification rules will be completed if necessary. From a responsibility point of view, thanks to the list of classification rules, Data Officers can identify the most sensitive Business Object Views. Thanks to the Lead Data Architects, we can then focus our effort on the Datasets representing these Business Object Views: finding them, protecting them.

    Accelerating the tagging thanks to artificial intelligence. Before going further, I would like to clarify some vocabulary: data sensitivity classification versus data sensitivity tagging versus data sensitivity labelling. At least the vocabulary I push; feel free to react in the comments.

    Data sensitivity classification: this concept lives at Business Object View level; it is the rule cascading a sensitivity requirement onto some populations of data. When I wrote in the example "The production cost of the product is confidential", it was the classification of this Business Object View. This information changes rather slowly, at the speed of laws and regulations or, sometimes, at the speed of the political-economic context.

    Data sensitivity tagging: this concept lives at dataset level; it represents the application of the classification rule. It takes the form of metadata attached to the dataset. This information changes at a higher speed than the classification, following the lifecycle of the dataset. An example I like is the annual financial report, which is confidential until disclosure and public once published. The classification is almost set in stone, while the tagging of the current annual financial report will change in one second, the day of its publication.

    Data sensitivity labelling: this last concept is also at dataset level. It is a visual representation of some of the tagging, when it is required. Do you remember the latest James Bond and the big "Top Secret" stamp? Do you remember any film with a "subject to GDPR" stamp? You get the idea.
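    Coming back to the annual report example, here is a minimal sketch of the classification/tagging distinction; the function is a hypothetical illustration, not a prescribed mechanism.

    ```python
    from datetime import date

    def tag_annual_report(publication_date: date, today: date) -> str:
        """The classification rule is stable: confidential until disclosure,
        public once published. The tag of a given report flips on publication day."""
        return "public" if today >= publication_date else "confidential"

    assert tag_annual_report(date(2021, 4, 30), date(2021, 4, 29)) == "confidential"
    assert tag_annual_report(date(2021, 4, 30), date(2021, 4, 30)) == "public"
    ```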

    12 min
  3. 02/04/2021

    S01E02 - Data Catalog, the orphan looking for a family

    Previously on datActionable: we have seen the necessity of a data catalog with a new business perspective to govern the data. I insist on this point: the need is driven by the governance of data, not by the management of data. The businesses need to take decisions on the data to be able to create value while ensuring the company's protection. Typically, the business needs to know:

    * where the datasets are, to serve the analytics teams.
    * where the datasets impacted by privacy law are, for current compliance but also to analyse the impact of a change in the law. And privacy is just one criticality axis among others.
    * which datasets are the most critical for the business, to protect them from a loss of confidentiality but also to prioritise them in the disaster recovery plan.
    * which data to put the data quality effort on, at company level, not at local project level.

    I'll propose in this article a pattern of organisation, with elements for both the business side and the IT side. For those reading a lot about data mesh: yes, it is data mesh ready.

    The challenge of onboarding the business. The potential value will be created by the business with data, and the potential loss will be supported by the business. We need them on board. This balance of data valorisation with data responsibility is the main enabler of the transformation to a data-centric company. The question is now how to identify, in the business functions, the holders of this topic, at the scale of the company.

    Redefining the data steward role and completing it. The dream would be to find people able to handle both the data governance and the data management perspectives. And, of course, people with time to spend on these topics. Finding one four-leaf clover is great; finding the dozens you need is another story. If we want to rely on the people in place and their knowledge, my conviction is that we should not ask these experts to change their perimeter drastically. So we must be realistic about the role of the Data Stewards and complete them with another role, more focused on data management; I like to call this latter role Data Architect. I won't give the full view of roles and responsibilities in this article and will focus only on those related to the data catalog. The accountabilities of the respective roles would be the following:

    Data Stewards, who have high business knowledge and good IT acumen, are in charge of:

    * documenting the Business Objects.
    * documenting the restrictions coming with the data, for compliance with external or internal requirements.
    * documenting the business requirements on the data, especially the data quality requirements.

    Data Architects, who have high data management knowledge and good business acumen, are in charge of:

    * documenting the data-related information of applications.
    * documenting the Dataset Index.

    You may have noticed that I make neither of them accountable for the actual usage rights of the data, nor for the decisions on data quality improvement. I reserve these decisions for another role: the Data Officer. We will see that in episode 3.

    Federating Data Stewards and Data Architects. As mentioned just before, the number of people endorsing the Data Steward and Data Architect roles can rise quickly in the company, especially if you want to data-enable people already in place and you have, by design, no full-time people. Orchestrating this network from a single central point can't be efficient: even if we want to break silos, there are clusters of people dealing with each other regularly.
    The idea is to group them by domain, with a domain lead. This domain lead will functionally report to a central data governance body. Here we have the Data Officer role. As for Data Stewards, four-leaf clovers are not everywhere, so I propose a similar approach by introducing a Lead Data Architect role. Like the Data Architect and the Data Steward, the Data Officer and the Lead Data Architect will act as a complementary couple to ensure the construction of the Data Catalog. The Data Officer will push the business perspective, and the Lead Data Architect the IT perspective.

    The Data Governance Manager, facilitator of the Data Offices. By grouping Data Stewards into domains through Data Officers, we manage the size of the community, but it has a drawback: we may have created another type of silo. To avoid this risk, here comes the Data Governance Manager role. This role could be held by the Chief Data Officer, depending on your organisation… or not. But the person in charge must at least report to him. Did I talk about four-leaf clovers? This time it doesn't apply: we are looking for only one person, so it is manageable. Nevertheless, to ensure consistency, I propose to introduce the role of Enterprise Data Architect, especially to manage the Lead Data Architect network. Of course, we are only talking about roles: one person can handle both if it is possible and more convenient in your organisation or, at least, to start the deployment. This new couple will have the accountability to federate the network. It will also have a facilitation role for the Business Objects which are multi-domain. Finally, it will operationally support the Data Officers and Lead Data Architects, acting as experts on Data Governance topics and, thus, preparing the future standardisation of the activity. We now have the federated network principle. We now need to deploy it in the company, and deploy it for the long term.

    How to root those roles in the company? The first temptation is to rely on the organisations. The weakness of organisations is that they change over time, so we need a more stable network to rely on. My thought is that, since our companies have a quality system, the processes are the good entry point.

    Connecting the roles and the people. You should be able to closely match Business Objects, more specifically Business Object Views, with the deliverables of the processes. I say only closely match since, at the time your processes were designed, the culture of your company was based on the triptych People, Process and Technology, rather than on data. In any case, it is what we already have, and it's a good start to work on the first version of your business-oriented Data Catalog. With these principles, here is how to find your network, version one:

    * Data Stewards: they are the voice of the process owners on the data topic. They are nominated by the process owners.
    * Data Architects: we can start with the architects of the applications/products supporting these processes.
    * Data Officers: they close the gap between organisation and process on data. Nominated by the Heads of Business Functions, their scope is defined by the processes under the responsibility of the function. Interestingly, you may already find bridges between organisations there: organisations have changed, not the processes.
    * Lead Data Architects: as peers of the Data Officers, they are located on the IT side supporting the Business Function. Since IT organisations may differ a lot between companies, I can't say more than that.
    * Data Governance Manager: nominated by the Chief Data Officer… or the Chief Data Officer himself, as mentioned before.
    * Enterprise Data Architect: in the Enterprise Architecture team. He can be nominated by the Enterprise Architect; again, it depends on your IT organisation.

    How to manage worldwide deployment? Again, by referring to your quality system, you should find the path. If your company is fully integrated, you may tackle the topic by having delegates of the network in other regions, starting with a Data Governance Manager delegate. By putting a shadow organisation in the regions, with a delegation and not a duplication of roles, you reach two objectives. You gain consistency in your global deployment and therefore ensure the consistency of the catalog. You also get a feedback loop from the other regions: they will contribute to the maturity of the catalog by introducing the local constraints. If you are not so integrated, typically if your merger and acquisition process does not aim to fully integrate everything, you need a new level in the role federation. You won't have the possibility of transversal delegation and may have to deploy a complete network in your divisions and affiliates, starting with a Data Governance Manager. However, consistency remains important for the global picture, and synergies will reduce the global bill. In that situation, you will have to orchestrate your Data Governance Managers, and a new role is necessary. I let you find a name for your company… Global Data Governance Manager? Data Governance Officer?

    Start small, grow fast. You now have the target landscape in mind. How can you start its creation? Start with a simple dictionary. Since the beginning of this article I have been talking about roles. They are obviously not full-time equivalents, and that is not even the target. As I said previously, we don't aim to have full-time Data Steward or Data Architect roles, to ensure the connection with the company legacy and the usual activity. The objective is to avoid having a parallel data organisation, siloed from business as usual. What I suggest to start with is not even to have Data Stewards until you have the basics: a list of potential Business Object names organised in domains. Don't try to be one hundred percent right, don't try to build it with systematic interviews with businesses: simply gather what already exists and publish it. The option I took some years ago was a three-month time-boxed deliverable with one objective: having a list of 350 Business Objects representing the whole company activity. A Business Object Dictionary. As input, I gave access to heterogeneous models coming from IT, access to the processes, access to the intranet for search, access to enterprise architects and access to local organisations already dealing with data. We obtained the first company-wide Business Object Dictionary, a simp

    12 min
  4. 26/03/2021

    S01E01 - The Data Catalog is dead, long live the Data Catalog!

    Since the 90s, IT teams have been aiming to know where the data is in the information system. For 30 years, they have struggled to demonstrate the value of such a catalog to the business and to secure a recurring budget for it. Like Data Governance itself, this should be a business topic, not an IT topic. Creating value with data comes with responsibilities, starting with having knowledge of this data.

    What's wrong with data catalogs? The first element on which data catalog solutions are compared is the type and number of connectors available. It is representative of the source of the problem: current data catalogs are trying to map, at attribute level, the physical tables of the information system. To answer the business question "where is this data", the data catalogs map all data with precision. It is as if, in order to know in which districts the inhabitants of a city live, one had to keep precise addresses with street number, street, postal code and city. Don't put words in my mouth: I'm not saying that this level of granularity isn't necessary, but that it isn't always necessary. And when it is necessary, it is because there is a use case that justifies it. To continue with this people comparison, the business question in that case is almost never where "the" people are; it is rather where "a population" of people is. So do data requests: when we want to create value with data, we don't necessarily need all data of one type but only a subpart. If we want businesses to ask for data with business wording and not database table names, we must have a data catalog ready for this.

    Let's stop this people comparison, because data is not people. One of its superpowers is ubiquity (I've heard about a guy with this superpower, but that's another story). To be precise, even data doesn't fully have this superpower: when it is duplicated, its quality, especially regarding timeliness, changes. By the way, this shakes the concept of looking for the source of truth: the business is in fact looking for the good source for the intended purpose. The definition of good data would be more "data at quality and at cost" than a quest for the truth. I'll finish with a last point, even if it is not really the last one: four is enough to challenge the current principles of data catalogs. How do you know that your data catalogue is complete? Since it was initially designed for databases, with the shortcut that a dataset equals a table, we miss datasets of other natures: unstructured documents, videos, images, not to mention the data hidden in free text areas. And even for structured data, it's easy to miss some populations of data because they sit in a different information system. It typically occurs during merger and acquisition processes that let the IT legacy live. To make it short:

    * The granularity of identification of the location of data is not adapted for business.
    * The granularity of the business data expression is not adapted for business.
    * The link between the business view and the IT view is not adapted for business.
    * The catalogue completeness is unknown, so it is not adapted for business.

    How to enable a business-ready data catalog? Start with a new way to describe data for businesses by creating the "Business Object". It all starts with business! We need a new way to express data for businesses without going into information system implementation details. The ambition is to fulfil the following:

    * It should be understandable first by business: both by the business accountable for the data and by the other businesses.
    * It should enable the description of populations of data, from a business perspective.
    * It should be manageable over time: businesses expect a return on investment on the time spent describing the data.
    * It should be efficient: even if we don't have a precise mapping, we need to know where to look.

    Now forget what vendors put behind "Business Object", which is, as you may have understood, not an IT representation of something. Let's introduce a notion for business, like the name implies. A Business Object starts with a Name and a Description. Nothing revolutionary.

    Business Object Characteristics. The first new notion to add is Characteristics. The word itself is important: characteristic, not attribute. It's mandatory to block the natural tendency to go directly to the fields of a database. As an example, the "Price" of a product can be one characteristic, while its representation in one information system will be 2 attributes (e.g. price and currency), 4 attributes in another one (e.g. price excl. tax, tax rate, currency), or a dozen in a last one managing, for example, discounts by quantity. If we don't take this abstraction step, we drown in details and expert debates. Unfortunately, it can't be that simple, especially to manage data handled by several domains. I propose to also introduce a notion of Set of Characteristics, simply grouping and naming a group of Characteristics. I won't deep dive on this topic today, but you can imagine a business object managed by two successive processes, the first one in charge of scheduling it and the other of executing it.

    Business Object applicable business contexts. The second key notion is there to manage the populations of data, and what makes sense for the businesses are the Applicable Business Contexts. Designing these business contexts requires stepping back and looking at your company: what are its products, how to classify them, are there classifications in the market, how is the company deployed worldwide, etc. This axis is really specific to your company; you may have commonalities between business objects but also specificities. As an example, product range can be one axis, region another, factory sites a last one… and factory sites may make sense only for the manufacturing domain, not for the engineering one. Finally, keep in mind that we are looking for the applicable business contexts for your company. They give the borders of the data populations you have to consider within your company. If I come back to the factory example, we need the list of applicable factories for the company: neither the factories of the competition nor a potential future factory. We look for the currently applicable contexts.

    Business Object business states. As a complementary concept to be used to identify data populations, the Business States are really close to the nature of the business object. Of course, we could have grouped them with the applicable business contexts, but the business state is really specific to each business object, and it really talks to business. Having it separately ensures that the business asks itself the question. It's key to understand we are talking about business states, not technical states. It's neither a "CRUD" (Create, Read, Update, Delete) concept nor the lifecycle of datasets. As an example, the business states of a "Task" could be: identified, designed, planned, executed, verified, closed.

    What do we have with the business object definition? Nothing but the beginning of knowledge! We have laid the foundations for business to describe data in a top (business) down (IT) approach. But we are still far from the data itself. To close the gap between the business and IT views, we now need an intermediary concept that we will call the Business Object View, described just below.

    Continue by describing populations of data with Business Object Views. We have defined a taxonomy to describe both the Business Object itself and the possible populations of data. The wording I have chosen is Business Object View. Again, it has to be considered as a new vocabulary for businesses: it's important not to mix this concept up with the view on a table that your database admin has in mind.

    Defining Business Object Views. The idea is to be able to express a population of data. Thanks to the concepts we have seen before, we can express them, starting with the whole possible population in the company: "a dataset representing all the Characteristics in all Business Contexts and Business States of a considered business object". It's then quite easy to define sub-populations of data by playing with the 3 components of the business object: the characteristics, the business contexts and the states. For example, I'm now able to consider not all the products of the company, but a population: the name and the composition (characteristics) of the product (business object) for the European market (business context) which is currently sold (state).
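    As a recap of this taxonomy, here is a minimal sketch of how these notions could be represented; the field names and values are illustrative assumptions, not a prescribed model.

    ```python
    from dataclasses import dataclass

    @dataclass
    class BusinessObject:
        """A business notion, not an IT representation."""
        name: str
        description: str
        characteristics: set[str]     # e.g. "Price", not price/currency columns
        business_contexts: set[str]   # applicable contexts, e.g. "European market"
        business_states: set[str]     # e.g. identified, designed, planned...

    @dataclass
    class BusinessObjectView:
        """A population of data: a selection on the three components."""
        business_object: BusinessObject
        characteristics: set[str]
        business_contexts: set[str]
        business_states: set[str]

    product = BusinessObject(
        name="Product",
        description="Goods sold by the company",
        characteristics={"name", "composition", "price"},
        business_contexts={"European market", "US market"},
        business_states={"in design", "currently sold", "discontinued"},
    )

    # "The name and the composition of the product for the European market
    # which is currently sold."
    view = BusinessObjectView(product, {"name", "composition"},
                              {"European market"}, {"currently sold"})
    ```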
    The interest is multiple; the Business Object View can be enhanced in several ways, but that requires a dedicated article. We will go further later. For now, let's continue with the data catalog usage.

    Redefine the way to link the business and IT views. I'm sure you have already connected this concept with the IT cartography: a specific information system contains a population of data. The business object view gives a necessary refinement to connect the dots. Necessary, yes; sufficient... no. We need to refine the link itself.

    Defining the Dataset Index. As I mentioned before, data has, almost, the superpower of ubiquity. I say almost because of the change of quality evoked earlier. Yes, the population of data in the authoring tool is the same one you loaded in your data lake (I hope), but they don't have the same freshness. So, if you want to link the business object view and the dataset, you need to specify the quality on the link, a link I propose to call the Dataset Index, by reference to the index of a book: the index that tells you where the information is in the book. Unfortunately, this complementary specification of the index is not sufficient, because of the architecture of the information system. Sometimes, a specific table contains a mix of business objects which are close from a conceptual point of view but really far apart from a business point of view. Simple example, a table containing

    13 min
