9 Comments

I can totally relate to what you describe here ... I have never found the lineage representations in the metadata tools to be very helpful. It feels like gimmickry to convince management, not the solution you need in your engineer's toolbelt to effectively navigate the complexity of your data reality. A search based tool with data lineage operators/verbs is indeed the solution ("search over navigation", right?).

But would it be so simple to dump the dataset in a graphdb like Neo4j and be over with it?

> "A search based tool with data lineage operators/verbs is indeed the solution"

I imagine various experiences. One could be a search bar that allows flexible queries. Another could be a set of predefined graph searches that match workflows embedded in a workflow-optimized interface. Great lineage graph search algorithms would open a lot of options, I believe.

> "But would it be so simple to dump the dataset in a graphdb like Neo4j and be over with it?"

We see a lot more lineage charts because search is a much harder engineering challenge. As you say, you can dump the data into a graph DB, but running a graph DB at scale, making sure queries are performant, and figuring out how to fit search into the interface is a lot of work. But it would be worth the effort.

It is more than a search problem, right? It's not just 'search all tables that ...'; in your examples you mention, e.g., 'give me all upstream sources'. For that you need to traverse the graph.
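
To make the 'give me all upstream sources' case concrete, here is a minimal sketch of that traversal as a breadth-first walk over lineage edges. The table names and the `lineage` mapping are hypothetical, and a real metadata tool would read the edges from its own store rather than a dict.

```python
from collections import deque


def upstream_sources(parents, start):
    """Return every dataset that `start` depends on, directly or indirectly."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Follow each "is built from" edge one hop upstream.
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical lineage: each table maps to the tables it is built from.
lineage = {
    "revenue_dashboard": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["customers_raw", "orders_raw"],
}

print(sorted(upstream_sources(lineage, "revenue_dashboard")))
```

A plain attribute filter never follows edges like this, which is why the query needs traversal and not just search.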

Whereas the map example 'give me groceries close by with at least 4 stars' is only a search problem.

But I agree: fitting the problem to the interface is harder than just saying let's use a graph tech to solve the directional part of the queries.

Yeah, it is indeed a graph traversal problem!

> Whereas the map example 'give me groceries close by with at least 4 stars' is only a search problem.

I assumed path-finding to the shops near me to be a graph traversal too. E.g. traverse the streets to understand what is close + apply other search criteria.

'Close by' is solved simply in direct distance, not road traversal ... I assumed :-)

Thanks for pointing this out, Wim. I assumed that `Close by` is calculated as a travel distance over a graph, rather than as a direct distance 🤔 (https://www.quora.com/How-does-the-algorithm-of-Google-Maps-work).

Nevertheless, we're on the same page: data lineage + graph traversal = 🔥

The algorithms referred to in that Quora question calculate the itinerary from point A to B; they are not run when simply querying for "all grocery stores close by" ... that would simply take too much time. For such queries, the straight-line (bird's-eye) distance is used, which is fast and accurate enough for the result set.
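
As a rough illustration of the "search only" version, here is a sketch that filters shops by great-circle (haversine) distance and star rating, with no road graph involved. The shop records and coordinates are made up, and this is only an assumption about how such a filter could look, not a description of how any real maps product works.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(shops, lat, lon, max_km, min_stars):
    """Names of shops within max_km (straight line) rated at least min_stars."""
    return [
        s["name"]
        for s in shops
        if s["stars"] >= min_stars
        and haversine_km(lat, lon, s["lat"], s["lon"]) <= max_km
    ]


# Hypothetical shops; none of these records come from a real dataset.
shops = [
    {"name": "Shop A", "lat": 50.8500, "lon": 4.3500, "stars": 4.5},
    {"name": "Shop B", "lat": 50.8480, "lon": 4.3600, "stars": 3.2},
    {"name": "Shop C", "lat": 51.2194, "lon": 4.4025, "stars": 4.8},
]

print(nearby(shops, 50.8467, 4.3525, max_km=2.0, min_stars=4.0))
```

Each shop is checked independently, so the query is a filter over records rather than a traversal.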
