The Complex Relationship Between Cloud Providers and Open Source
Editor's note: This article first appeared in The New Stack, which had a two-week exclusive on this post.
Cloud providers like Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure have long had a frenemy relationship with the open source community.
On one hand, cloud providers’ Platform-as-a-Service (PaaS) layer has successfully redirected most of the value from software maintainers to themselves. On the other hand, big tech companies have also contributed significantly to the open source ecosystem.
One notable example is the Kubernetes project, which originated at Google. Other examples include LinkedIn’s Apache Kafka, Facebook’s Presto, and Airbnb’s Apache Airflow and Superset.
The relationship between cloud providers and open source maintainers may be on the verge of changing dramatically, so it is worth understanding where things stand right now.
The Platform as a Service Model
The most obvious source of tension is the establishment of PaaS as a major revenue model for cloud providers. In the early days of AWS, its core business was compute and storage.
But there’s a major technical gap between a raw virtual machine and the fully distributed architecture needed to host a significant web service. Over time, AWS realized it needed another layer to ensure its users were successful, largely around the provisioning and lifecycle management of complex software like databases, queues, and job orchestrators.
The clouds relied on open source for battle-tested implementations of this software and exploited its permissive licensing to build extremely profitable lines of business on top of it. This can look nakedly exploitative of open source developers’ effort, but it’s worth understanding why it worked.
There are two main pain points cloud PaaS solves:
- The operational economy of scale
- Distribution of the service
As an engineer, I can immediately see the need for an operational economy of scale. I have seen firsthand how difficult operating these systems can be. Provisioning and maintaining them reliably requires deep experience with engineering fundamentals as well as with the idiosyncrasies of the software itself.
Most organizations lack that expertise in-house and have to pay someone who has it.
In theory, the likes of Amazon or Google are not the only organizations that can offer this sort of service. In fact, the companies behind many open source projects have commercialized them by creating PaaS offerings of their own, like Elastic for Elasticsearch, Confluent for Kafka, MongoDB, and others. This business model has provided vital sustaining capital to those open source projects and is a huge reason for their success.
Distribution is the second huge competitive advantage the clouds have over virtually any other source of software. The harsh reality in most enterprises is that almost every action is heavily constrained by bureaucracy, and any vendor or open source package has to survive tedious scrutiny before it can be used.
Cloud providers have the advantage of an established commercial relationship with virtually every large business on the planet, so the friction of trialing a new service through one of them is orders of magnitude lower, creating a massively powerful sales machine. That competitive risk has proven so great that some open source companies have considered it existential and changed their licensing entirely to counteract the threat, most notably Elastic and MongoDB.
This business model has also had an interesting side effect on the codebases themselves. The spat between AWS and Elastic led to a fork of the project, with AWS maintaining OpenSearch and Elastic retaining control of the original Elasticsearch codebase.
This also happens at a more subterranean level, with services like AWS RDS effectively reengineering major relational databases like Postgres for Aurora (a major advancement in database design) without contributing that innovation back to the open source community. This works because the software is never expected to leave the walled garden of the managed service, but the wider open source ecosystem is neglected as a result.
Cloud Providers Take a Step Forward
When you break down the problem, there are two main issues at play in the relationship between cloud providers and open source.
First, there is the problem of distributing value more equitably between infrastructure providers and open source maintainers: in a way that doesn’t overcompensate providers merely for their ability to cut through corporate bureaucracy, and that lets the software ecosystem become more self-sustaining.
There has been progress toward acknowledging this issue, with AWS once again leading the way by partnering with Grafana Labs to distribute its software under a clear revenue-sharing agreement. That’s a win-win-win: Grafana gets fair compensation for its product, enterprises get to cut through their red tape, and AWS gets another service in its catalog.
But there’s also a technical challenge at the root of all of this: open source software is frequently so complex that a third-party infrastructure provider is needed to provide a decent user experience. If that were no longer true, the dynamics would change materially.
Organizations would not need to carefully choose whom to outsource their infrastructure to just to use standard open source components, so distribution advantages would no longer be game-breaking.
Developers could monetize the value of the software itself rather than its “management.” And users would not have to make compromises around data tenancy in order to rely on the best of the open source ecosystem.
Plural Is Here to Help
At Plural, we believe this is much more achievable than most people realize.
We are not in the world of the early 2010s where the cloud was new and unproven, and managing software on it was a wild west of duct-taped half-solutions.
There’s now an incredibly powerful ecosystem of tools like Kubernetes and Terraform that can automate virtually the entire problem of distributed system management, but few teams have exploited their full potential.
We’ve already packaged over 50 major open source solutions, like Airbyte, Airflow, Prefect, Elasticsearch, Kafka, PostHog, Grafana, and Argo CD, so developers can deploy them on Kubernetes using DevOps best practices. Our platform provides engineers with all the operational tools they would get in a managed offering plus a verified stream of upgrades, all deployed in your own cloud for maximum control and security.