What makes a data product...?
Physically, data can be provided in many different ways, so the idea of a data product often remains abstract and difficult to grasp. Consider the following examples: which of them can be considered a data product, and which cannot?
Flat files in a data lake
The manual download of an Excel sheet or CSV file
Automatic data queries via API
A view in a database
This question becomes particularly relevant when decentralized data management is to be introduced in an organization. Sooner rather than later, one has to face the question: which of the countless distribution channels for data in our organization qualify as (or should be elevated to) proper data products?
The key point is that the term data product needs to be viewed from a governance perspective. The technology used to deliver the data is irrelevant to this question. Rather, the term conceptualizes a consumer-focused approach to data delivery: data should be provided and maintained so that consumers can find, understand and consume it easily, and so that they can trust its guaranteed quality.
Keeping this in mind, I use the following criteria as a guideline for selecting data sets that qualify as data products:
Read-only access: a data product is by definition read-only. To ensure trust and stability, consumers must not be allowed to change it. Allowing consumers to write into the product creates the risk of corrupting the data or causing side effects.
General consumability: the data product should be consumable by any interested party. This does not mean that anyone should have access by default, but the interface and documentation must support multiple independent consumers beyond a single team or use case. Anyone interested should be able to request access and, once the request is approved, start consuming the data. In particular, this means that APIs, database views etc. tailored to the communication between two applications are most likely not candidates for data products, unless they are nevertheless suitable for consumption by other parties.
Declared ownership: any data product must have a dedicated owner who is responsible for maintaining and improving it over time. A data product without clear ownership will tend to degrade quickly. Furthermore, as with any piece of software, changes to the data product will quite likely become necessary to keep it relevant as the world around it changes.
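To make these three criteria more tangible, here is a minimal Python sketch of how one might screen existing delivery channels against them. The `Candidate` class, its field names and the example channels are purely hypothetical and only serve as illustration; in practice, each flag is the outcome of a governance discussion and not something a script can determine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """A data delivery channel under consideration (hypothetical model)."""
    name: str
    read_only: bool          # consumers cannot modify the data
    general_purpose: bool    # usable by multiple independent consumers
    owner: Optional[str]     # person or team accountable for the product


def unmet_criteria(candidate: Candidate) -> list[str]:
    """Return the criteria the candidate fails; an empty list means all three are met."""
    unmet = []
    if not candidate.read_only:
        unmet.append("read-only access")
    if not candidate.general_purpose:
        unmet.append("general consumability")
    if not candidate.owner:
        unmet.append("declared ownership")
    return unmet


# Invented example channels, not taken from any real system.
candidates = [
    Candidate("sales_view", read_only=True, general_purpose=True, owner="sales-domain-team"),
    Candidate("billing_sync_api", read_only=False, general_purpose=False, owner="billing-team"),
    Candidate("legacy_export.csv", read_only=True, general_purpose=True, owner=None),
]

for candidate in candidates:
    problems = unmet_criteria(candidate)
    status = "candidate" if not problems else "not yet: " + ", ".join(problems)
    print(f"{candidate.name}: {status}")
```

Note that the function reports which criteria are unmet instead of returning a plain yes or no. A channel that fails only on ownership, for example, may be easy to elevate to a data product once an owner is appointed.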
Finally, in order to actually deserve the label data product, a data set must be listed and documented well enough that potential consumers can find, understand and trust it:
Complete documentation: any data product must be listed and documented on the data platform or in the data catalog. Metadata such as schema, refresh interval and data owner must be listed for easy consumption of the data. The documentation should help consumers not only access the data but also make correct and confident use of it.
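As an illustration of what such a listing could contain, the following Python sketch models a catalog entry and checks it for missing metadata. The `CatalogEntry` structure and its field names are assumptions made for this example and do not correspond to any particular catalog tool; the `description` field is also my addition, since the list above only names schema, refresh interval and owner.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata for one data product (hypothetical structure)."""
    name: str
    owner: str = ""
    description: str = ""        # assumed extra field, not named in the text above
    refresh_interval: str = ""   # free text in this sketch, e.g. "daily"
    schema: dict[str, str] = field(default_factory=dict)  # column name -> type


def missing_metadata(entry: CatalogEntry) -> list[str]:
    """List the metadata fields that are still empty for this entry."""
    missing = []
    if not entry.owner.strip():
        missing.append("owner")
    if not entry.description.strip():
        missing.append("description")
    if not entry.refresh_interval.strip():
        missing.append("refresh_interval")
    if not entry.schema:
        missing.append("schema")
    return missing


# Invented example entries.
complete = CatalogEntry(
    name="customer_orders",
    owner="sales-domain-team",
    description="One row per confirmed customer order.",
    refresh_interval="daily",
    schema={"order_id": "string", "order_date": "date", "amount_eur": "decimal"},
)
incomplete = CatalogEntry(
    name="warehouse_stock",
    owner="logistics-team",
    description="Current stock level per article.",
)

print(missing_metadata(complete))
print(missing_metadata(incomplete))
```

A check like this only verifies that the fields are filled in. Whether the documentation actually helps consumers use the data correctly still has to be judged by a human reviewer.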
Framing data sets as products means treating them with the same care and consumer focus as software or APIs. This perspective ensures that data remains useful, trustworthy, and scalable as organizations evolve toward decentralized, domain-driven data architectures. Applying the selection criteria above should leave you with good candidates for your first data products.
Photo credits: the title picture was created with mistral.ai.