Many “Open Source” databases belong to companies: a single point of failure in our supply chain. What happens when the company behind your search engine, vector or NoSQL database folds? Are we all up a creek without a paddle?
Setting aside non-open-source licenses, even Apache-licensed code can mean little with regard to a project’s health if the organizing energy dissipates with the destruction of a company. When committers become scattered to the nine winds, it could all evaporate overnight.
Foundations like the Apache Software Foundation actively work to broaden the set of companies supporting a project. They encourage active participation from consumers and vendors. They manage and measure project health. They essentially protect your supply chain.
However, in the NoSQL database and vector search space, I feel we have a brewing problem. I’m worried.
Instead of being backed by a foundation, single companies build these projects. Many of these companies have shaky foundations - they’re VC-backed, or unprofitable and publicly traded during a time of economic stress. Many companies may not be a going concern 2+ years from now. They might be acquired. Other projects could disappear at the whim of a giant megacorp that loses interest.
For example, OpenSearch and Vespa, in the search space, seem like great projects. I am thrilled by the innovation and energy they bring. However, they are backed and managed by single companies. What happens if, on OpenSearch, Amazon’s return-to-office mandate causes that team to experience an exodus? Or Yahoo decides Vespa is a rounding error in its overall company strategy?
Year to year a single company’s outlook and strategy can vary dramatically. Lately Elastic has put renewed energy into search and retrieval, invigorating underlying Apache projects (alongside peers at other companies). Elastic has innovated quickly trying to keep pace with the vector database world. But this could change in a heartbeat with whatever next Black Swan investor-backed tech thingie pops up.
If I were on those teams, I’d WANT my code to be managed by a foundation if I cared about longevity.
I had thought that the classic VC-backed NoSQL database company was a ZIRP phenomenon: VC money chasing bets when the NoSQL space took off 10+ years ago. But then every database vendor got a shot in the arm from AI and vector search. Everyone and their mother is adding vector retrieval to their thing, trying to become a storage layer for AI.
But would you want to build on that?
Somehow we mitigate this in the cloud layer. We learned to be multi-cloud, using tools like Terraform and Kubernetes to abstract away those resources. Foundations like the Cloud Native Computing Foundation and now Open Terraform exist. We don’t want a single cloud vendor to dominate our infrastructure.
What’s going to happen at the data storage layer?
If I worked on or consumed these projects, I’d want project sustainability over the long haul. I’d push for getting this code into a foundation like the ASF: a proven brand that actively manages project health. I’ve been around long enough to see how the economic sea changes at one company rip asunder our best-laid plans.
In the meantime, as a consumer, I’ll probably prefer projects like Solr or Postgres. Those and other projects you might not hear about much. Projects without dev evangelists.
I don’t care if, technology-wise, they’re not as up to date. The arc of this stuff veers towards commoditization anyway, and I’ll be able to contribute to that future, however unsexy and boring.
In short, I’ll just keep saying, like Indiana Jones: