Please Rate and Review us on your podcast app of choice!
Get involved with Data Mesh Understanding's free community roundtables and introductions: https://landing.datameshunderstanding.com/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding. Get in touch with Scott on LinkedIn.
Transcript for this episode (link) provided by Starburst. You can download their Data Products for Dummies e-book (info-gated) here and their Data Mesh for Dummies e-book (info-gated) here.
Olga's LinkedIn: https://www.linkedin.com/in/olga-maydanchik-23b3508/
Walter Shewhart - Father of Statistical Quality Control: https://en.wikipedia.org/wiki/Walter_A._Shewhart
William Edwards Deming - Father of Quality Improvement/Control: https://en.wikipedia.org/wiki/W._Edwards_Deming
Larry English - Information Quality Pioneer: https://www.cdomagazine.tech/opinion-analysis/article_da6de4b6-7127-11eb-970e-6bb1aee7a52f.html
Tom Redman - 'The Data Doc': https://www.linkedin.com/in/tomredman/
In this episode, Scott interviewed Olga Maydanchik, an Information Management Practitioner, Educator, and Evangelist.
Some key takeaways/thoughts from Olga's point of view:
- Learn your data quality history. People have been fighting this good fight for 25+ years, and for over a century if you count statistical quality control. Don't needlessly reinvent it :)
- Data literacy is a very important aspect of data quality. If people don't understand the costs of bad data quality, they are far less likely to care about improving it.
- Data quality can be a tricky topic - if you let consumers know that the data quality isn't perfect, they can lose trust. But A) in general, that conversation is getting better/easier to have and B) we _have_ to be able to identify quality as a problem in order to fix it.
- Data quality is NOT a project - it's a continuous process.
- Even now, people find it hard to apply the well-established data quality dimensions. They are a framework for considering/measuring/understanding data quality, not a recipe, so they aren't very helpful to data stewards / data engineers when it comes to writing concrete data quality rules.
- The majority of quality errors are not random; they come from faulty data mappings / bugs in pipelines. Having good quality rules will catch a large percentage of these errors, which can then be fixed in bulk (see the sketch after this list).
- Getting started with data quality doesn't have to be complex or involve lots of tools. It can start with people looking at the data for potential issues and talking to producers. Then you can build a business case for fixing the data to get funding. You have to roll up your sleeves.
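To make the dimensions-vs-rules point above concrete, here is a minimal, hypothetical sketch (not from the episode) of what simple data quality rules might look like in Python/pandas. The DataFrame, column names, and rules are all illustrative assumptions. Note how the systematic error (a faulty mapping that loaded a country code into a state column) shows up as a cluster of rule failures that could be fixed in bulk upstream:

```python
import pandas as pd

# Hypothetical customer records, including a systematic error: a faulty
# mapping loaded the country code 'US' into the 'state' column for one batch.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4, 5],
    "email": ["a@x.com", "b@x.com", None, "not-an-email", "e@x.com"],
    "state": ["NY", "CA", "CA", "US", "US"],
})

# Each rule is a name plus a predicate that flags the *bad* rows.
rules = {
    "customer_id must be unique":
        lambda d: d["customer_id"].duplicated(keep=False),
    "email must be present":
        lambda d: d["email"].isna(),
    "email must look like an email":
        lambda d: ~d["email"].fillna("").str.contains(r"^[^@\s]+@[^@\s]+$"),
    "state must not hold a country code":
        lambda d: d["state"].isin(["US"]),
}

for name, predicate in rules.items():
    bad = df[predicate(df)]
    if not bad.empty:
        # Non-random errors (like the 'US' batch) appear as clusters of
        # failures on one rule, pointing at a single upstream bug to fix.
        print(f"RULE FAILED: {name} ({len(bad)} rows)")
        print(bad.to_string(index=False))
```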
Information
- Frequency: Updated daily
- Published: 15 April 2024 at 03:15 UTC
- Length: 1h 2m
- Episode: 301
- Rating: Clean