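Why we recommend it: as the name suggests, this project is a collection of all sorts of crawlers. Most of them are written in Python, so anyone currently learning about web crawling can use these projects as a reference. Because it really does cover every kind of crawler, you may come across some odd things in there... Still, there is no doubt that crawlers can collect a great deal of data, which can then be turned into something interesting; for example, writing up a report on Bilibili users is entirely doable.
Today's recommended English article: "5 ways open source software companies make money", by Ajay Kulkarni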
5 ways open source software companies make money
A guide on how to evaluate the long-term sustainability of the business behind any open-source software you are using (or considering working on yourself). Article co-authored by Mike Freedman.
As open source software becomes more and more popular, one of the most common questions we hear is: How do these projects make money?
Or, to put it another way: How do I know that these companies, who are building all this open source software I’m using, won’t go out of business?
And since we’re the co-founders of TimescaleDB, an open source time-series SQL database company, this question can also mean, How do I know that you won’t go out of business?
All are variations of the same straightforward question. But the answer is complex, and quickly evolving.
There is already a cohort of open-source software companies, some of which are public, that are surpassing $100M (or even $1B) in annual revenue: RedHat, Cloudera / Hortonworks (Hadoop), MuleSoft, Automattic (WordPress), Elastic, MongoDB, Acquia (Drupal), Hashicorp, Confluent (Kafka), Databricks (Spark), and more. (Here are over 30 of them.) If we analyze their success, we see a common pattern to how they’ve built sustainable businesses.
In this article we describe that common pattern by sharing 5 business models these open-source software companies use to make money.
If you are adopting open-source software in your own company, or working on an open-source business of your own, or even just considering working for an open-source startup, understanding these business models will help you evaluate not just the software, but also the long-term sustainability of the business behind that software.
But first, there are two requirements every open-source company needs before it can even consider making money.
What every open-source company needs before it can consider making money
Prerequisite #1: Broad adoption
The first prerequisite is broad adoption: the open-source project needs to have a large user base and community.
Broad adoption is necessary because an open-source company can capture only a small amount of the value it creates. To be clear: an open-source company gives most, if not all, of its developed software away for free, and most of its users will never pay for that software.
In fact, most open-source monetization rates (the conversion rate from users to paying customers) are fairly small: often in the low single-digit percentages (if not lower). But given a large enough community, that conversion rate can be enough. This dynamic is one of the drivers behind the economies of scale in the open-source model. In other words, the need to have broad adoption is one of the reasons why there are often category “winners” in open-source.
This need for a large up-front investment in adoption is also why most successful open-source companies today start off as projects in a large company (e.g., Hadoop/HDFS at Yahoo!, Kafka at LinkedIn, Kubernetes at Google), as research areas in academia (e.g., Spark at Berkeley), or as VC-backed startups.
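To make the adoption arithmetic above concrete, here is a minimal sketch. The user counts and the 1% conversion rate are hypothetical inputs chosen only for illustration; they are not figures from this article or from any specific company.

```python
def paying_customers(users, conversion_rate):
    """Expected number of paying customers for a given user base."""
    return round(users * conversion_rate)


# Hypothetical inputs, for illustration only (not data from the article).
conversion_rate = 0.01  # 1%, standing in for "low single-digit percentages"

for users in (1_000, 100_000, 1_000_000):
    customers = paying_customers(users, conversion_rate)
    print(f"{users:>9,} users -> {customers:,} paying customers")
```

With the rate held fixed, the number of paying customers grows only as fast as the user base does, which is why adoption has to come first.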
Prerequisite #2: Primary credibility
The second prerequisite is primary credibility within the community. This is important because it enables the open-source company to build an efficient sales and marketing process, which is especially important given the low monetization rates.
Having “primary credibility” means that anyone who needs help with the software reaches out to the open-source company and not someone else for assistance. Not having this credibility means having to slog it out with others in the market for that attention, leading to a far less efficient business model and lower margins.
The value of primary credibility, which today is often achieved by being the main contributors to the project, can be seen by comparing the market caps, annualized revenue, and multiples of Elastic ($4.59 billion market cap, $160 million revenue, 29x) and MongoDB ($3.97 billion market cap, $155 million revenue, 26x) vs. (pre-merger) Hortonworks ($1.23 billion market cap, $262 million revenue, 5x) and Cloudera ($1.73 billion market cap, $367 million revenue, 5x). (Market caps as of November 27, 2018; revenue numbers are from latest reported fiscal year.)
Because Elastic and MongoDB had primary credibility in their respective communities, they were able to build a much more efficient business model, and capture far more value with less revenue than either Hortonworks or Cloudera, who had to raise more money and fight fiercely over the Hadoop market. (One could even speculate that the need to possess primary credibility was one of the reasons behind the recent Hortonworks/Cloudera merger.)
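The multiples quoted above are simply market cap divided by annual revenue. As a quick check, the following sketch recomputes them from the figures in this article (in millions of USD); the function and variable names are just illustrative.

```python
# Figures as quoted in the article, in millions of USD:
# (market cap as of November 27, 2018, revenue for latest reported fiscal year)
companies = {
    "Elastic": (4590, 160),
    "MongoDB": (3970, 155),
    "Hortonworks": (1230, 262),
    "Cloudera": (1730, 367),
}


def revenue_multiple(market_cap, revenue):
    """Market cap expressed as a multiple of annual revenue."""
    return market_cap / revenue


multiples = {
    name: revenue_multiple(cap, rev) for name, (cap, rev) in companies.items()
}

for name, multiple in multiples.items():
    print(f"{name}: {multiple:.1f}x")
```

Rounded to whole numbers, the results match the 29x, 26x, 5x, and 5x multiples cited above.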
Once an open-source company has broad adoption and primary credibility, it can build a pipeline of companies who need assistance, and start layering in a variety of business models to build a sustainable business.
Now, let’s talk about those business models.
The 5 open-source business models
From analyzing successful open-source companies today, five common business models emerge:
- Support
- Hosting
- Restrictive licensing
- Open-core
- Hybrid licensing
1. Support
The support model, also known as the “RedHat” model, goes like this: sell deployment and integration services, production-oriented “insurance policies”, certified binaries, trainings, bug fixes, etc., to businesses deploying the project in production.
This model becomes limiting over the long term for a few reasons: (1) support often requires a lot of manual work, and so reduces business margins; (2) scaling is hard because support work is often not easily repeatable; (3) it creates perverse incentives on the part of the open-source company, where making the product easier to use cannibalizes support revenue. In fact, this model works best when the project requires complex deployments with sprawling ecosystems, which often goes against building the best user experience.
This model is also notoriously inefficient, typically converting less than 1% of all users into paying customers. This inefficiency should come as no surprise. Open-source software itself is free. In order to feel the need to pay for support, a company needs to rely on the project for mission critical systems. Yet over time, companies that do rely heavily on the project will naturally invest their own engineering efforts to understand the project, reducing the need for external support. So there’s only a small usage window where this model works.
The support model is still where every open-source company starts today. Yet, with all these challenges, and the fact that RedHat is still the only company to build a multi-billion dollar revenue business in open-source in the past 25 years, it’s become clear that open-source companies need better business models than just support.
2. Hosting
Hosting means offering a fully-managed version of your project, so that when users want to try out the project, or even deploy it in production, they can spin up a remote server with the software in just a few clicks, and not have to worry about operating it in steady state (i.e., not worry about backups, downtime, upgrades, etc.).
Given the popularity of the cloud and managed services in general, it should come as no surprise that this has also become a popular model for open-source. Notably, it has become a common way for the public cloud providers (AWS in particular) to monetize open-source projects without giving back to the community, which has led to some complaints and tensions (and the emergence of other models, which we’ll soon discuss).
The hosting-only model can work well. Some companies (e.g. Databricks, Acquia) have been quite successful with it. Yet typically hosting is layered in with a few of the following other models.
3. Restrictive licensing
The restrictive licensing model creates a legal reason for users of open-source software to pay. It does this by providing an open-source license with slightly onerous terms, such that anyone using the software in production is highly incentivized to strike a commercial deal with the vendor. The GPL and AGPL licenses, as well as the newly created Commons Clause (adopted by certain Redis modules), are examples of this model. In particular, AGPL and Commons Clause (as well as the new SSPL launched by MongoDB) are licenses also designed to defend against the public cloud providers.
But this approach has limitations: the GPL-based license restrictions do not restrict unmodified usage, and only apply if one makes modifications and does not want to open-source them; the Commons Clause has some ambiguity in its language, and it remains to be seen how this will play out in the courts. Still, the largest drawback of this approach is that these licenses hurt adoption, often turning off potential users. In particular, there are quite a few large companies that have explicit policies against using restrictive licenses. Because of the inherent friction of this approach, many rule it out, relying on other business models.
4. Open-core
Open-core has quickly emerged as the most popular way for open-source companies to make money. The idea behind open-core is that the majority of the code base is open-source, while a smaller percentage (targeted at production or enterprise users) is proprietary. The proprietary portion may be packaged into separate modules or services that interface with the open-source base, or could be distributed in a forked version of the open-source base.
Typically the proprietary features are ones needed for production deployments and/or at scale. (As an example, for an open-source database, features like monitoring, administration, backup/restore, and clustering are often proprietary.) One benefit here is that it allows the open-source company to license the core with a very permissive license (e.g., Apache 2), while retaining the ability to charge for proprietary features. It also allows open-source companies to defend against free-loading participants (e.g., the public cloud providers) by keeping certain features in the proprietary code base.
The challenge with this model is in balancing the open-source value versus the proprietary: if an open-source company gives away too much, then it gives up the opportunity to make money; but if it gives away too little, then the open-source project effectively becomes “lame-ware” (and the project will likely fail to get broad adoption).
Another challenge is that cleanly separating the open-source from proprietary features in code is sometimes difficult. Even if separating them is easy, maintaining two different code bases can also be challenging from an engineering process perspective: e.g., managing independently versioned releases that might need to interoperate and/or porting code back-and-forth to prevent code divergence over time. And often engineers would rather work in the open-source repo than the “business” repo. Despite these challenges, this model is quite powerful.
5. Hybrid licensing
The hybrid licensing model is the newest one on this list. Initially popularized by CockroachDB (Jan 2017), and later adopted by Elastic (Feb 2018), hybrid licensing takes the open-core approach but improves on it in a few key ways.
What hybrid licensing does is intermingle open-source and proprietary software in the same repository, and then make the code for the entire repo available. That is, the entire repository is “open code” (or “source available”), just not all licensed under an OSI-approved open-source license. Users can choose to use a binary with just the open-source bits (available under an open-source license), or use a binary with both the open-source and proprietary bits (available under the proprietary license). The proprietary licensed binary often will have paid functionality that is off by default, but can be unlocked by purchasing a license key.
The advantages of this approach for an open-source company include all of the ones listed under open-core, plus a few more: (1) having everything in the same code base makes it easier to manage engineering process and development; (2) it enables the entire team to work on the core project; (3) it allows users to upgrade from free to paid in-place, often without downtime (and without needing to interact with a salesperson); (4) it allows external community members to comment on, file issues on, and (if they so choose) contribute to proprietary features using the same workflow they’d normally use for open-source features (e.g., via GitHub).
The largest challenge is also the same as open-core: balancing the quantity and value of open-source vs. proprietary features.
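The license-key unlock described above for hybrid licensing can be sketched roughly as follows. This is an illustration only, not how CockroachDB, Elastic, or any other product actually implements licensing: the feature names, the `VENDOR_SECRET` value, and the `customer:signature` key format are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical signing secret. A real product would more likely use
# asymmetric signatures so the shipped binary never holds the signing key.
VENDOR_SECRET = b"example-secret"

# Hypothetical feature names, for illustration only.
OPEN_SOURCE_FEATURES = {"query", "ingest"}
PAID_FEATURES = {"clustering", "backup"}


def _sign(customer: str) -> str:
    """HMAC-SHA256 signature of the customer name, as a hex string."""
    return hmac.new(VENDOR_SECRET, customer.encode(), hashlib.sha256).hexdigest()


def issue_license_key(customer: str) -> str:
    """Vendor side: produce a key in the assumed 'customer:signature' format."""
    return f"{customer}:{_sign(customer)}"


def is_valid_license(key: str) -> bool:
    """Product side: check that the key carries a valid signature."""
    customer, sep, signature = key.rpartition(":")
    if not sep:
        return False  # no separator, so not in the expected format
    return hmac.compare_digest(signature.encode(), _sign(customer).encode())


def enabled_features(license_key=None):
    """Paid features stay off unless a valid license key is supplied."""
    features = set(OPEN_SOURCE_FEATURES)
    if license_key is not None and is_valid_license(license_key):
        features |= PAID_FEATURES
    return features


print(sorted(enabled_features()))
print(sorted(enabled_features(issue_license_key("acme"))))
```

Because the same binary contains both sets of features, moving from free to paid in this sketch only requires supplying a key, with no reinstall.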
A business model layer cake
We just discussed five common open-source business models, yet there is no one-size-fits-all. Some will find success with a purely managed offering model, e.g., Databricks. Others will have such broad adoption, like SQLite (reportedly billions of installs), that they will be able to support a small core development team with just support and warranties.
That said, most open-source companies will make money using a combination of the five models we discussed, forming a revenue layer cake: for example, by combining support and licensing, or support, hosting, and open-core.
If you adopt open-source software in your company, these are also the various ways you can support an open-source business, and ensure that the software continues to improve and be maintained. So if you do decide to use open-source software, and if there is a company behind the project, please support it.
This post, like another that preceded it, was built on the shoulders of giants. A big thank you to all of the following people who have shared their open-source wisdom and time with me and my co-founder Mike over the past few years: Harry Weller (RIP), Forest Baskett, Greg Papadopoulos, and the rest of the team at NEA; Peter Fenton, Chetan Puttagunta, and Eric Vishria and the rest of the team at Benchmark; Rob Bearden, Shaun Connolly, Herb Cunitz, Mitch Ferguson, Jeff Miller, and the rest of the Hortonworks diaspora; Gaurav Gupta from Elastic; Jay Kreps from Confluent; Spencer Kimball from CockroachDB; and so many, many more. We are honored to have such great peers in our industry.