A decade ago, Data Governance was still an obscure term on the agenda of only a few organisations. Today, the situation has changed: with over 40 Data Governance and Data Catalog solutions on the market, selecting the right one is no longer straightforward.
With this article, we hope to shed light on how to select the right Data Governance and Data Catalog solution.
If you’re considering a Data Catalog implementation but don’t know where to start, join our live event on September 29th. In this event, we lay out a plug-and-play approach to Data Governance that we have shown to work in practice, not just in theory.
Reminder: A Data Catalog is often presented as a webshop for data. It contains metadata (data about data) describing data sets and APIs, how they are defined, and where to find them. It also plays an integral part in the governance process: it provides features that help ensure Data Quality and compliance, and that make sure only trusted data is used for analysis. Without in-depth knowledge of their data and its metadata, organisations cannot truly safeguard and govern their data.
What caused the acceleration in the number of tools?
Searching for and finding data and data assets such as reports, APIs and data marts was, and still is, a massive problem in many organisations. Data is often redundant and scattered over an ever-increasing number of solutions, and in the average organisation, data volumes are doubling every 18 months.
Furthermore, obtaining the correct data is often even more challenging than finding it. Data Catalogs were introduced to provide insight into where data and data assets are located. When configured properly, they also allow users to consume the data, whether it’s a dataset, an API or a report.
Just like with a Data Governance solution, metadata is the underlying enabler.
The rise of the Data Catalog as a hands-on data solution for end-users prompted many organisations to start considering Data Governance.
5 mistakes when selecting a Data Governance and Data Catalog solution
As longstanding experts in Data Governance and Data Catalog implementations and evaluations, our firm understands what works well and what pitfalls to avoid.
Based on our experience, we’ve listed the five most common mistakes organisations make when selecting Data Governance and Data Catalog solutions.
1) Assuming a Data Catalog and a Data Dictionary are the same
- Ironically, while the essence of Data Governance is to understand and manage (meta)data in all its aspects, there seem to be multiple definitions of the term ‘Data Catalog’.
- A Data Dictionary, which mainly stores and manages technical metadata, is a different solution from a Data Catalog, yet some vendors present the two as the same thing.
- If a vendor labels its entire solution “Data Catalog”, make sure you ask specifically about the Data Dictionary component.
2) Starting with a Data Catalog
- The Data Catalog is the cherry on the cake, to be added after implementing Data Governance (and Data Quality management).
- Many organisations set up a Data Catalog project without implementing Data Governance or Data Quality management and find out later that these are two crucial prerequisites.
3) Not every tool will suit your needs
- With data stewards’ and data owners’ demands increasing, the traditional Data Governance view is no longer sufficient.
- These days, users don’t just want to see the definition, owner, lineage, etc., of a term or concept; they also want to see its quality level. In the financial sector, this is often even a regulatory requirement.
- Furthermore, some tools are targeted at organisations that work with multiple glossaries, others at organisations that work with a single one. Depending on your industry and approach to data, one or the other will be more suitable.
- Make sure you understand your specific technical and business needs and take “future” requirements into account: you will need AI-driven accelerators to keep track of the ever-increasing volumes of (meta)data.
4) No integration between business and technical metadata
- Business metadata (terms, definitions, owners) only becomes useful when it is linked to the technical metadata (tables, columns, reports) behind it. Compare it to Google: if Google didn’t translate your search term and map it to a vast number of existing websites and trillions of keywords, it wouldn’t be successful.
- Many vendors effectively give you Google’s homepage: you can search for (pre-defined) terms, but without the ability to connect those terms to the websites or keywords behind them.
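To make the Google analogy a little more concrete, here is a minimal Python sketch of what “integrating business and technical metadata” can mean. The `GlossaryTerm` and `TechnicalAsset` classes and the `search` function are hypothetical and purely illustrative; they do not represent any vendor’s data model or API. A business term (with its synonyms) is linked to the technical columns that implement it, so a search on the business vocabulary leads to the actual data. A term without links behaves like the “homepage only” situation described above.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalAsset:
    """Technical metadata: where the data physically lives (hypothetical model)."""
    system: str
    table: str
    column: str


@dataclass
class GlossaryTerm:
    """Business metadata: what a concept means and who owns it (hypothetical model)."""
    name: str
    definition: str
    owner: str
    synonyms: set = field(default_factory=set)
    # The link to technical metadata; an empty list means "searchable but not connected".
    assets: list = field(default_factory=list)


def search(glossary, query):
    """Resolve a business search term (or one of its synonyms) to linked technical assets."""
    q = query.strip().lower()
    hits = []
    for term in glossary:
        names = {term.name.lower()} | {s.lower() for s in term.synonyms}
        if q in names:
            hits.extend(term.assets)
    return hits


# Illustrative glossary: one term linked to technical metadata, one not linked.
customer = GlossaryTerm(
    name="Customer",
    definition="A party that has signed at least one contract with us.",
    owner="Head of Sales",
    synonyms={"client"},
    assets=[
        TechnicalAsset("CRM", "accounts", "account_id"),
        TechnicalAsset("DWH", "dim_customer", "customer_key"),
    ],
)
unlinked = GlossaryTerm("Revenue", "Income from sales of goods and services.", "CFO")

# Searching with a synonym still reaches the physical columns.
for asset in search([customer, unlinked], "Client"):
    print(f"{asset.system}.{asset.table}.{asset.column}")

# The unlinked term is found in the glossary but leads nowhere: an empty list.
print(search([customer, unlinked], "revenue"))
```

The first loop prints `CRM.accounts.account_id` and `DWH.dim_customer.customer_key`; the last line prints `[]`. In a real catalog, these links are typically established through scanners, lineage and (AI-assisted) matching rather than maintained by hand.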
5) No integration with your data platform or solution landscape
- Integration with your data platform is crucial, as more and more organisations use it as the starting point for metadata management.
- Some platform solutions can be hard to integrate with other data platforms.
- Furthermore, seamless integration with your data platform and sources is mandatory if you want to implement data-driven integrations, events or pipelines.
Much more could be said, but in our experience, paying attention to these five points helps mitigate the most common challenges organisations face today.
Sign up for our upcoming event
If you found our article helpful and relevant to your situation, we would like to invite you to our upcoming live event, ‘Accelerating Data Governance in 100 Days’, on September 29th, 2022.
We will invite CDOs and other Data Leaders to discuss an iterative approach to Data Governance (and Data Catalogs) that we developed ourselves after seeing so many implementations fail.
We look forward to introducing you to this framework, which has proven effective in delivering multiple wins in a short timeframe.