1. Data Quality & Observability
Data Quality issues have been around as long as data itself. DQ has gone through phases of improvement, with a reduction in IT oversight and greater business ownership. However, DQ remains the number-one problem in strategic initiatives.
DQ cannot be left on the side or forgotten about; poor data will haunt you. Previously, DQ was divided into two core categories: Technical and Business. The former mainly addressed DQ issues relating to poor source data, poorly designed pipelines, and unmanaged reconciliations. The latter ensured alignment with business DQ rules, such as accuracy of the information and uniqueness within a given timeframe.
Data Observability is a new way of looking at data through a Software Engineering lens. It aims to consolidate the Technical DQ checks and introduce new ways of managing data that reduce the burden of Business DQ checks. This is a welcome change, as depending on business users to resolve issues takes precious time and effort better invested in decision-making.
How should this impact your strategy?
If you are in the middle of a strategic data transformation programme, check whether your incumbent suppliers, vendors, or internal teams are incorporating Data Observability checks (potentially under a different name). If they are not, consider proposing basic checks on data completeness, uniqueness, and accuracy, along with automated resolutions such as triaging and self-healing pipelines. A sketch of what such checks might look like follows below.
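To make this concrete, here is a minimal Python sketch of completeness and uniqueness checks over a batch of data. It assumes pandas DataFrames; the column names and thresholds are hypothetical and would come from your own pipeline standards.

```python
import pandas as pd

# Hypothetical column names and thresholds, for illustration only.
REQUIRED_COLUMNS = ["customer_id", "order_date", "amount"]
COMPLETENESS_THRESHOLD = 0.99  # require at least 99% non-null values

def run_observability_checks(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable issues found in a data batch."""
    issues = []

    # Completeness: flag required columns that are missing or too sparse.
    for col in REQUIRED_COLUMNS:
        if col not in df.columns:
            issues.append(f"missing column: {col}")
            continue
        completeness = df[col].notna().mean()
        if completeness < COMPLETENESS_THRESHOLD:
            issues.append(f"{col}: completeness {completeness:.2%} below threshold")

    # Uniqueness: the key column should not repeat within a batch.
    if "customer_id" in df.columns and df["customer_id"].duplicated().any():
        issues.append("customer_id: duplicate keys in batch")

    return issues
```

A failing batch could then be routed to an automated triage queue or a self-healing step, rather than waiting for a business user to spot the problem downstream.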
2. Data Product & Data as A Product
We’ve talked about Data as an Asset for a long time, and implementations of this idea have been littered with challenges. How do you define an asset? Who manages it? How do they manage it? Data Product thinking now builds on this foundation.
Data Products differ from the Data Asset approach in that the latter was mainly a logical separation of the data. You couldn’t quantify a Data Asset in terms of code, tables, ETLs and so on, which made it hard to answer questions such as “so, which bit of the data do you mean exactly?”
A Data Product, by contrast, is a tangible thing that delivers an end outcome: board metrics and KPIs, a curated combination of your sales data in a data warehouse, or a product catalogue.
Data as a Product, on the other hand, is about the processes and technology around Data Products. Confusing? Let me explain. Having a board metric is a Data Product; how you store, code, distribute, buy, and sell it is what treats that Data Product as a Product. Implementing a Data Product Marketplace is one way of achieving this; a minimal sketch follows below.
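As a rough illustration of the marketplace idea, here is a hypothetical in-memory sketch in Python. Real marketplaces sit on top of catalogues and access controls; the class and field names here are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class DataProduct:
    name: str
    owner: str
    description: str
    location: str  # e.g. a warehouse table or an API endpoint

class Marketplace:
    """A hypothetical in-memory catalogue of published Data Products."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def discover(self, keyword: str) -> list[DataProduct]:
        return [p for p in self._products.values()
                if keyword.lower() in p.description.lower()]

mp = Marketplace()
mp.publish(DataProduct("board_kpis", "finance-team",
                       "Quarterly board metrics and KPIs", "warehouse.kpi.board"))
print([p.name for p in mp.discover("board")])  # ['board_kpis']
```

The point is not the implementation but the operating model: products have named owners, discoverable descriptions, and a known location, which is what turns a dataset into something consumers can buy into.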
How should this impact your strategy?
To treat Data as a Product, certain basic data management and infrastructure requirements must be met. You can’t productise something you don’t trust. Implementing a solid data quality framework and a data ingestion and retention framework, amongst other things, will help ensure you can truly market Data as a Product in your organisation.
3. Active Metadata
I’ve lost count of the number of vendor presentations I’ve sat through that started by talking about Active Metadata. Historically, we have used Metadata for downstream purposes; it just never had a catchy enough name. Now, Metadata that is actively used to drive decisions is a crucial trend on the road to a mature data platform.
Examples include Personally Identifiable Information (PII) tags that are automatically used to mask customer data when it is accessed for analytical purposes, to restrict access, or to alert the end user to a potential DQ issue. The sketch below makes the masking example concrete.
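Here is a minimal Python sketch of tag-driven masking. The tag store and column names are hypothetical; in a real platform, the tags would be read from a data catalogue rather than hard-coded.

```python
# Hypothetical column-level metadata tags; in practice these would be
# fetched from a data catalogue, not hard-coded.
COLUMN_TAGS = {
    "email": {"PII"},
    "full_name": {"PII"},
    "order_total": set(),
}

def mask_for_analytics(row: dict, tags: dict[str, set[str]] = COLUMN_TAGS) -> dict:
    """Mask any column tagged as PII before it reaches analytical users."""
    return {
        col: "***MASKED***" if "PII" in tags.get(col, set()) else value
        for col, value in row.items()
    }

print(mask_for_analytics({"email": "a@b.com", "order_total": 42.0}))
# {'email': '***MASKED***', 'order_total': 42.0}
```

The same tag lookup can gate access entirely or raise a DQ alert; the metadata is "active" because it changes the behaviour of the pipeline rather than merely describing the data.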
How should this impact your strategy?
You may be doing something like this already under a different name. If you are not, this is a true game-changer: automated tagging of DQ issues, triaging of issue resolution, and acting on Privacy and Data Classification tags can create a smooth workflow across your Data Management space. Implement it at a small scale in a focused area like Privacy, and it will pay dividends in automation savings and regulatory compliance.
4. Data Composability
Composability is a trend we are borrowing from our system design friends. Although heavily hyped in the Web3 world, Composability can equally be applied to the way we develop data solutions. Every organisation I know of uses multiple data tools to achieve an end outcome: a data storage layer, a data transfer tool, a data visualisation tool, a data distribution and protection tool, and so on.
Imagine you had core modular packages of data solutions, bundling your code, logic, and actual data. An end user could then re-use such a package to derive decisions without rebuilding everything from scratch (see the sketch after this paragraph). Permissionless innovation is key to Web3, but organisations can capitalise on this trend too, opening innovation up to their end users and supporting Trend 5.
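As a sketch of what such a package might look like, here is a hypothetical Python structure bundling data, transformation logic, and metadata into one reusable unit. The class and field names are invented; a real implementation would add versioning, contracts, and access control.

```python
from dataclasses import dataclass, field
from typing import Callable
import pandas as pd

@dataclass
class DataPackage:
    """A hypothetical modular package bundling data, logic, and metadata."""
    name: str
    data: pd.DataFrame
    transform: Callable[[pd.DataFrame], pd.DataFrame]
    metadata: dict = field(default_factory=dict)

    def materialise(self) -> pd.DataFrame:
        # End users re-run the author's logic instead of rebuilding it.
        return self.transform(self.data)

# An end user composes an existing package into their own analysis.
sales = DataPackage(
    name="monthly_sales",
    data=pd.DataFrame({"region": ["EU", "US"], "revenue": [120, 340]}),
    transform=lambda df: df.groupby("region", as_index=False).sum(),
    metadata={"owner": "sales-data-team"},
)
top_regions = sales.materialise().sort_values("revenue", ascending=False)
```

Because the author's logic travels with the data, the consumer inherits the intended interpretation while remaining free to build further on top of the result.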
I place Composability above re-usability: re-using components means your interpretation of specific information must match that of the component’s author. Composability lets you “Steal Like An Artist”, giving you the flexibility to build on someone else’s pre-work to improve your own decision-making.
How should this impact your strategy?
Composability goes hand in hand with the Data Product approach; you want to avoid every team creating a different version of the same data to answer a similar question. You also want to ensure that new tools introduced into the data stack are easily packageable so that they can be composed. Anything that can’t be packaged or composed will require additional application management overhead and hence may not make the shortlist.
5. Data Democratisation
So, I know I didn’t talk about Data Mesh, but the core concepts of the mesh have been covered by individual trends such as Data Products and Observability. Data Democratisation is an outcome rather than an architectural trend: providing data to end users to help them with their decision-making is a win-win.
But democratising data without proper controls and governance is a recipe for disaster. Trends 1–4 help ensure the outcome of Trend 5 is successful. Having the right quality of data enables sound data composability, implementing Active Metadata practices enforces restrictions on data distribution and management, and publishing and subscribing to Data Products leads to a robust operating model for data democratisation.
How should this impact your strategy?
Don’t just talk about democratising data; write your constitution (operating model) to ensure data is accessed in a controlled fashion. Implement processes and roll out communications and training plans on how you will democratise data. If your strategy currently focuses only on technical teams, think about how to engage business end users.
Conclusion
There is a lot of hype around current trends in the data space, and separating the wheat from the chaff has become difficult. The one test I use to determine the usefulness of a trend is whether it helps move the needle. Does it help you go to market quicker? Make you more efficient? Enable you to mitigate core risks? If so, the trend is worth implementing.
Courtesy: www.towardsdatascience.com