A different way to "bundle" Data Platforms

Mar 9, 2022

Why should we stop focusing on bundling tools in our data stacks?

4 Comments

Mar 17, 2022

Most current approaches automate existing tools that were built on the data management methods of the past. The science of data management is not fully implemented in these tools and approaches. There are better ways to make life easier for implementers by reimagining data modeling and job building in the modern DW world. Data management systems that do not capture "when and what changed in the data" as additional information tend to limit the value of data consolidation. Most modern platforms do not treat "data conformation" as an important element. Several data warehousing principles are ignored in these architectures, so they cannot answer questions such as "what is the % sales increase from Q1 last year to Q1 this year (before and after the re-org)?" One important property of a platform is that data coming from different data sets should always match. If it does not, how is the platform different from a cluster of independent, disjointed systems? A lot still needs to be done in the layers above orchestration. Effective orchestration keeps us sane while doing the actual work.
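To make the "when and what changed" point concrete, here is a minimal sketch, assuming a hypothetical effective-dated mapping of sales reps to regions. The names `REP_REGION_HISTORY`, `SALES`, `region_as_of`, `q1_totals`, and `q1_growth_pct` and all of the figures are invented for illustration; none of them come from the post or from any specific tool. Because each assignment carries a validity interval, the same Q1 year-over-year question can be answered both with the org structure as it was at the time of each sale and with the post-re-org structure applied to all history.

```python
from datetime import date

# Hypothetical effective-dated dimension: (rep, region, valid_from, valid_to).
# valid_to is exclusive; date(9999, 12, 31) marks the currently open row.
REP_REGION_HISTORY = [
    ("alice", "East", date(2020, 1, 1), date(9999, 12, 31)),
    ("bob", "East", date(2020, 1, 1), date(2021, 7, 1)),
    ("bob", "West", date(2021, 7, 1), date(9999, 12, 31)),  # re-org moved bob
    ("carol", "West", date(2020, 1, 1), date(9999, 12, 31)),
]

# Hypothetical sales facts: (rep, sale date, amount).
SALES = [
    ("alice", date(2021, 2, 10), 100.0),
    ("bob", date(2021, 3, 5), 100.0),
    ("carol", date(2021, 1, 20), 100.0),
    ("alice", date(2022, 2, 10), 120.0),
    ("bob", date(2022, 3, 5), 150.0),
    ("carol", date(2022, 1, 20), 110.0),
]


def region_as_of(rep, as_of):
    """Return the region a rep belonged to on the given date."""
    for name, region, valid_from, valid_to in REP_REGION_HISTORY:
        if name == rep and valid_from <= as_of < valid_to:
            return region
    raise LookupError(f"no region recorded for {rep} on {as_of}")


def q1_totals(year, as_of=None):
    """Sum Q1 sales per region for one year.

    as_of=None uses the region valid on each sale date ("as it was").
    Otherwise the org structure valid on as_of is applied to every sale.
    """
    totals = {}
    for rep, sold_on, amount in SALES:
        if sold_on.year == year and sold_on.month <= 3:
            region = region_as_of(rep, as_of or sold_on)
            totals[region] = totals.get(region, 0.0) + amount
    return totals


def q1_growth_pct(prev_year, year, as_of=None):
    """Percentage change in Q1 sales per region between two years."""
    prev = q1_totals(prev_year, as_of)
    curr = q1_totals(year, as_of)
    return {
        region: round((curr[region] - prev[region]) / prev[region] * 100, 1)
        for region in prev
        if region in curr
    }


# Org structure as it was when each sale happened.
print(q1_growth_pct(2021, 2022))
# Post-re-org structure applied to both years.
print(q1_growth_pct(2021, 2022, as_of=date(2022, 3, 31)))
```

With these made-up numbers the first view shows East shrinking and West growing sharply, simply because a rep moved, while the second view compares like with like. A system that stores only the current assignment can produce only the second answer.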

Emmanuel Cassimatis

Mar 10, 2022

That is right, and integration companies have been making money building connectors. It used to be very complicated but has been simplified in recent years, so software can now emerge to unify. The key, however, may be how to convince developers, who are often in love with the few tools they use. Community-led/product-led growth, a private-software type of reach, or both?

Sarah Krasnik Bedell

Mar 9, 2022

Something that I think is important to call out from the practitioner's perspective is budgeting. The data space is so fragmented, and many practitioners have a hard time deciding where their money is best spent if they can't afford all the different types of tools. Additionally, integrating all the tools together is not easy. Understanding where failures occur among all the integration points is something the "data observability" category attempts, but it's still not easy to manage.

Petr Janda

Mar 9, 2022

That's right; budgeting is non-trivial since the whole industry is so much "under construction." I hope that the maturity of both OSS and SaaS offerings will ultimately create flexibility for any team to get most of the stack at an acceptable cost. An alternative way to look at it is that once you start to have a larger data team, say 10-15+, the headcount budget is so significant that it's easier to justify the budget for SaaS tools as long as you can demonstrate reasonable ROI on productivity.

On integrations—I fully subscribe to that pain. There is a lot of potential for Data Observability, but I think we have to go beyond what we do today. A lot of the stack is still somewhat hard to understand and debug as current solutions are pretty narrowly focused on a subset of the tools in the stack.
