Apache exec talks prudent open source software project usage
The open source community faces a number of technical and existential threats. A veteran talks open source's role, funding, security and compliance, as well as his favorite projects.
Open source software is a mainstay for enterprise applications. Aside from special cases, such as those for highly regulated businesses, most apps contain open source components. It's more efficient and economical to rely on code from the open source community for many processes within an application than to build all your own.
Any two open source software projects, however, can have vastly different origins and levels of maintenance. Sometimes, an open source project can get so large that it becomes a virtual house of cards, where a piece of software with many consumers has only a few contributors who support and maintain it. Other times, developers will communicate the potential vulnerabilities that exist within open source software projects but leave it up to the organizations that rely on those projects to make the necessary changes to combat them.
And those are not the only challenges that face the community. Technology companies gobble up open source software projects, and some of these vendors consume more than they contribute. So, how will the open source philosophy continue to thrive and the community continue to innovate as it deals with looming technical and existential threats?
David Nalley, executive vice president at The Apache Software Foundation (ASF), believes organizations must put levels of self-governance in place -- or pay a price. As a contributor to and advocate for the open source community for two decades, Nalley believes that enterprise reliance on open source software is ripe for something akin to "the tragedy of the commons," in which the carelessness of many consumers leads to negative, or even disastrous, outcomes.
"You can't assume that just because you're consuming something that it's okay," he said. "Consuming [open source code] without a maintenance plan in there, or a plan for how you're going to be made aware of security issues and how you're going to handle upgrades -- it's setting yourself up for failure."
In this episode of the Test & Release podcast, Nalley discussed how The Apache Software Foundation assesses the viability of its 300-plus ongoing open source software projects, why license tracking is a booming market, and what level of open source contribution he would like to see from tech giants.
Transcript - Apache exec talks prudent open source software project usage
Editor's note: Nalley spoke with site editor David Carty and assistant site editor Ryan Black. The transcript has been lightly edited for clarity and brevity.
Many applications contain open source code: as many as 96% of them, according to some figures out there. Can you explain how that number got to be so high? And do you think that it will continue to increase?
David Nalley: I think it will continue to increase. Back in the '90s and early 2000s, there was a lot of talk about reusing and repurposing code that had already been written. For many folks, that was a pipe dream. They thought that "write it one time, run it anywhere and run it on multiple platforms" was the thing of the future. And the reality is that we're doing a little more loose coupling than perhaps folks originally thought about, but that's where we're at. There's no need to write a web server, there's no need to write a Java application server, because we have multiple of those to choose from that are freely available and of much higher quality than we could probably write ourselves. So my expectation is that as organizations strive to deliver value -- and that's the only reason organizations write software, they want to deliver some value -- the focus is going to be on what actually delivers value, and … that's not writing another web server. So to the degree that those foundational layers help us and basically insulate us from having to do work that doesn't necessarily add value to the stack, I think we're going to continue to reuse code via open source libraries and products that are already there. So I think that trend's certainly increasing.
I'm not surprised by the 96% number at all. Especially the newer the application, the more likely it is to have a very small amount of things that add value over the rest of that foundation.
Is there such a thing as being too reliant on open source code, perhaps if you're in a highly regulated industry or something like that?
Nalley: I think that there is a possibility of being carelessly reliant on open source. If you look at this from an economics perspective, we have a situation that is ripe for the tragedy of the commons, where we have this common framework that is available to us, and people are using it, but not many people are caring for it. Look at [the] Heartbleed [vulnerability]. The world was shocked that there were two folks who basically worked on that part time, and nobody was really paying them for it. The entire world was dependent upon two guys continuing to care about OpenSSL. ... So, I think you can make some bad choices, or you can put yourself in a place where you're reliant on things that no one's caring for anymore. That's certainly a dangerous position to put yourself in. I do think you need to take some care about what kind of open source you're looking at [and] what kind of open source you're consuming, especially if it becomes vital to your business. I don't think we're in a place where you can just say, 'All right, I need a library, I will use this library, consume it blindly and assume that everything will be fine,' especially five or 10 years down the road.
To bring up another prominent example, in the case of Equifax, the company supposedly had thousands of vulnerabilities prior to its data breach. What would you recommend for organizations trying to keep up with open source vulnerabilities? How should they tackle eliminating that backlog, and keeping up with it?
Nalley: I will tell you that it's a very difficult problem to manage. I see folks who sometimes have a dependency list that exceeds 1,000 open source components in a single product. And so that can be a difficult problem to solve. I think that you have to be paying attention to what's coming out from the projects. The ASF has an announce list, where we announce new releases of software all the time, including security releases. There are also tons of resources: Mitre's Common Vulnerabilities and Exposures (CVE) project has announcements; there are Bugtraq and other mailing lists that you can consume security issues from; or you can pay a service to inform you about those issues and provide them to you in a way that's a little more easily consumable. But the bottom line is that you can't assume that, just because you're consuming something, it's okay. There's a constant maintenance issue that you're going to have to deal with for any software.
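To make "paying attention to what's coming out from the projects" concrete, here is a minimal sketch of the core check that announcement lists and paid services help automate: compare a dependency inventory against advisories that name a fixed version. Everything in it is hypothetical (the package names, the `EXAMPLE-` advisory IDs, and the `Advisory`/`find_vulnerable` helpers), and it assumes simple numeric dotted versions; real feeds such as CVE entries carry richer version ranges and need a proper version parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    package: str
    advisory_id: str  # hypothetical identifier, not a real CVE number
    fixed_in: str     # first version that contains the fix


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.4.1' into (2, 4, 1); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def find_vulnerable(deps: dict[str, str], advisories: list[Advisory]) -> list[tuple[str, str, str]]:
    """Return (package, installed_version, advisory_id) for each dependency
    whose installed version is older than the advisory's fixed version."""
    hits = []
    for adv in advisories:
        installed = deps.get(adv.package)
        if installed is not None and parse_version(installed) < parse_version(adv.fixed_in):
            hits.append((adv.package, installed, adv.advisory_id))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical inventory (package -> installed version) and advisory feed.
    inventory = {"web-server-lib": "2.4.1", "xml-parser": "1.9.0", "json-util": "3.0.2"}
    feed = [
        Advisory("web-server-lib", "EXAMPLE-0001", "2.4.3"),
        Advisory("xml-parser", "EXAMPLE-0002", "1.8.5"),
    ]
    for pkg, ver, adv_id in find_vulnerable(inventory, feed):
        print(f"{pkg} {ver} is affected by {adv_id}; upgrade required")
```

Only `web-server-lib` is flagged here, because the installed `xml-parser` is already newer than its fixed version. The hard part Nalley describes is not this comparison but keeping the inventory and the feed current across 1,000 components.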
When you're consuming open source, there needs to be a maintenance plan that goes along with it. And I've seen some organizations, they will do a full procurement process, where they evaluate where the open source software is coming from, how it's updated, how often they should be expecting to update it, but consuming it without a maintenance plan in there or a plan for how you're going to be made aware of security issues and how you're going to handle upgrades, it's setting yourself up for failure.
A lot of the problems that we're coming to see today also concern the philosophy of how fast upgrades are happening and how fast updates are being pushed out into these open source packages. If your own release cycle is six months or a year, and the project you're consuming is pushing out releases every month or every two weeks, you've got a much more difficult problem on your hands. I think that you've got to come away with a plan that matches your code's velocity, or your product's velocity. And how do you line that up with the release cadence of the project that you're consuming?
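A back-of-the-envelope way to size the velocity mismatch: count how many upstream releases ship while one of your own releases is in progress. The helper below is a hypothetical illustration. It assumes the upstream project releases on a fixed interval and approximates six months as 182 days.

```python
def upstream_releases_per_cycle(own_cycle_days: int, upstream_interval_days: int) -> int:
    """Whole upstream releases that ship during one of your release cycles,
    assuming the upstream project releases on a fixed interval."""
    if upstream_interval_days <= 0:
        raise ValueError("upstream_interval_days must be positive")
    return own_cycle_days // upstream_interval_days


# Roughly six months versus a two-week upstream cadence.
print(upstream_releases_per_cycle(182, 14))  # 13
# One year versus a monthly (about 30-day) upstream cadence.
print(upstream_releases_per_cycle(365, 30))  # 12
```

If you pin a dependency at the start of a six-month cycle against a two-week upstream cadence, you ship roughly 13 releases behind. That gap is what a maintenance plan has to budget for.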
You're talking a bit about security, and that reminded me of something else that we were talking about. I know within the past year, we've seen cases where third-party contributions to open source software were actually malicious. An example that comes to mind is a compromised npm package. So I was curious: What do an open source foundation like Apache and its communities do to maintain the openness that's so fundamental to open source projects, while still safeguarding users from compromised code?
Nalley: So there are several things that happen at the ASF.
The first is, we understand that there has to be a viable community around a project if the project is going to have any kind of long-term life. That means it can't be one person. So we have a test immediately that is looking for at least three participants who are actively watching the commit mailing list, who are watching all of the commits flowing into the code base, and who are actually evaluating releases. So we actually have a minimum number of project management committee members [three] necessary to create a release, or for a project to continue operating. For many projects, that's not enough, but that's our bare minimum. When I was a member of the board of directors, one of the things we did every quarter was look at every project at the foundation to make sure that there was still enough vibrancy and life -- that enough people were paying attention to what was going on in the project, so that it wasn't going to be basically unattended. The famous thing about open source is that, given enough eyeballs, all bugs are pretty easy to deal with. But you're actually dependent upon there being enough eyeballs. So, that's the first stage.
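The three-person minimum is easiest to see as a release-vote rule. The sketch below is loosely modeled on the majority-approval rule in the ASF's published release policy: at least three binding +1 votes from project management committee members, and more binding +1 than -1 votes. The `Vote` class and `release_passes` function are hypothetical names, and the policy text itself, not this code, is the authority.

```python
from dataclasses import dataclass

MIN_BINDING_APPROVALS = 3  # the minimum of three described above


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1, 0 or -1
    binding: bool  # True if the voter is a project management committee member


def release_passes(votes: list[Vote]) -> bool:
    """Majority-approval check: at least MIN_BINDING_APPROVALS binding +1 votes
    and more binding +1 votes than binding -1 votes. If someone votes more
    than once, only their last binding vote counts."""
    latest: dict[str, int] = {}
    for vote in votes:
        if vote.binding:
            latest[vote.voter] = vote.value
    plus = sum(1 for value in latest.values() if value == 1)
    minus = sum(1 for value in latest.values() if value == -1)
    return plus >= MIN_BINDING_APPROVALS and plus > minus


votes = [Vote("alice", 1, True), Vote("bob", 1, True), Vote("carol", 1, False)]
print(release_passes(votes))  # False: only two binding +1 votes
votes.append(Vote("dave", 1, True))
print(release_passes(votes))  # True: three binding +1 votes, no binding -1
```

Non-binding votes from contributors are still welcome feedback, but only committee members count toward the threshold. That is why a project that drops below three active members can no longer ship.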
The second [stage] is, we have contributors come in, and we have a multi-tiered system. So when you start contributing to a project, you're going to submit a pull request, or you're going to submit a patch file to the project; someone will review that, and largely that's around technical accuracy of the patch, and [whether] you [are] headed in the same direction as the rest of the project. So, it's actually a pretty big step when you move from submitting patches or submitting pull requests to actually having direct commit access. The projects are actually managed by folks on a project management committee. Those folks are essentially another layer of trust and a feeling that this person is interested in long-term help with the project.
That's worked very well for the ASF, because it allows us, long before there are problems, to say, 'Hey, there's just nobody who cares about this software anymore. We need to signal to the rest of the world that that's a problem.' They should be aware of that.
GitHub recently launched its Sponsors program, which allows developers to receive funding when they contribute to widely used open source software projects. I'm curious how familiar you are with that program, what you think about the program, and maybe what else can be done, more generally, to incentivize developers to contribute and maintain open source projects?
Nalley: I'm aware of the initiative that GitHub has. I don't know that I necessarily have an opinion one way or the other because I've not been the beneficiary, nor have I sent money into it, so I can't speak to it.
One of the interesting things about the ASF is that we don't pay for any code development. We call all the folks who contribute code at the ASF volunteers, because they're volunteering for the foundation, even though they may be employed by someone who actually pays them to work on these open source projects. But the foundation itself does not actually pay anyone to develop code. We care deeply about being neutral, and not picking winners or losers in the marketplace. Our philosophy around that is that if enough people care, then a project will flourish. If people stop caring, we shouldn't be propping it up artificially. That [philosophy] has worked rather well for us.
There are times when a project is clearly going to be relevant for scores of years, and people will continue to care about it and continue to work on it. Then there are times when projects are clearly faltering, because the marketplace has decided, 'We don't want to pay anyone.' We consider it part of the public trust that we signal, 'Hey, nobody's paying attention to this anymore. You shouldn't be using this, or you should at least be aware that nobody's paying attention to it anymore, if you're going to use it.'
I think it's an interesting model in terms of what GitHub is trying to do. I think we'll have to wait and see what the actual outcome is in terms of, does this program make open source more sustainable or not? And I think we won't know that for a while.
On a similar note, I'd also be curious to get your take, in a general sense, on the model in which developers create open source software on large companies' dimes.
Nalley: Certainly there are a lot of large companies who create open source software and pay their employees to contribute to it. I think there's also a large number of companies who pay their employees to contribute to open source software that they didn't originate. I think there's been a history of that both being wildly successful, in many cases, and also being failures, where the company decided that that strategy wasn't working and left and projects ... I'm not necessarily saying the method is wrong, but there are clearly successes and failures that you can point to for each model.
What do you think should be the obligation of some of these big tech companies like Amazon or Google to contribute to some of these projects? I mean, clearly, they see the benefits of this open source code. So should there be a standard of what's expected for them in terms of open source contribution?
Nalley: In an ideal world, I suspect that we would have enlightened self-interest that would have us working on the things that are responsible for us essentially being able to make money, right? So any company, regardless of size, would say, 'Hey, these five or 10 projects are responsible for a lot of my revenue or reducing the amount of money that it costs me to develop a product, and we should see some contribution there.' I don't know that we've achieved that state. I think there's certainly some very self-enlightened companies out there that realize that open source allows them to do amazing things, and has allowed them to get to market very quickly. And they have folks working on [open source projects]. Is it perfect? Absolutely not. We only have to look at a handful of different examples, and we can see plenty of open source projects that are incredibly widely used, that very few people are contributing to.
I don't know that we're ever going to be in a place where we can dictate to a company: 'You consume X amount of open source, so you should also contribute X amount of folks to work on open source projects.' The other problem is that, if you're consuming 1,000 components in a single product, you can't necessarily turn around and contribute 1,000 employees to work on those 1,000 projects. I think there are a number of efforts out there that look at sharing that burden, and those are interesting in that they recognize that there are some core, fundamental open source packages that need attention. Even if those packages are not necessarily the most cutting edge or the most interesting [software projects] today, these efforts work on them because they're so fundamental and foundational to everything that we build on. I see efforts like that coming out of the European [Commission], which seems to be getting folks to contribute to a number of different open source projects.
There's a number of commercial working groups that sit down and compare open source usage, and where they have problems. They start to coalesce around what's important for us to focus on as an industry, and I think those are valuable. I think that's trying to recognize the problem and trying different things to solve it. But I don't know that we have a perfect solution today for that problem.
Software development teams often will use various management tools to keep track of progress, software requirements, issues, et cetera. I was curious what methods or tools they might use to keep track of the conditions that open source licenses stipulate. And, how would they track and manage those requirements, essentially the stuff they need to observe that's laid out in those licenses?
Nalley: There's a cottage industry around license compliance and open source. In some cases, that becomes very complex. We have scores of licenses out there, and they all have slightly varying degrees of requirements that you have to comply with. I'll tell you how, generally, at the Apache Software Foundation, we handle that problem. It's all kept in text files in our source code repositories.
[At the start], we have essentially a list of approved licenses, and those are broken down into two groups: the kind we can incorporate and modify, and the kind we can just consume. We call these Category A and Category B. I'm grossly oversimplifying the process, but if a license is in Category A or Category B, you can have that as a dependency in your product at the Apache Software Foundation and can consume it. Then we keep two text files, called the 'license' file and the 'notice' file. We track the license obligations and provide the required notices in the notice file.
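Because the approved list lives in plain text, the first-pass check is just a lookup. Here is a minimal sketch keyed by SPDX license identifiers. The table is a small illustrative subset in the spirit of the ASF's third-party license policy: permissive licenses such as MIT and BSD in Category A, weak-copyleft licenses such as MPL and EPL in Category B, and GPL in the policy's excluded Category X, a label the interview doesn't mention. Treat the table as an example, not a substitute for the policy. `review_dependencies` is a hypothetical helper, and anything missing from the table goes to a human.

```python
from enum import Enum


class Category(Enum):
    A = "may be incorporated and modified"
    B = "may be consumed but not modified"
    X = "not allowed"


# Illustrative subset only, keyed by SPDX license identifier.
LICENSE_POLICY = {
    "Apache-2.0": Category.A,
    "MIT": Category.A,
    "BSD-3-Clause": Category.A,
    "MPL-2.0": Category.B,
    "EPL-2.0": Category.B,
    "GPL-3.0-only": Category.X,
}


def review_dependencies(deps: dict[str, str]) -> dict[str, list[str]]:
    """Group dependencies (name -> SPDX id) by how the policy table treats them.
    Licenses missing from the table are flagged for manual review."""
    report: dict[str, list[str]] = {"A": [], "B": [], "rejected": [], "unknown": []}
    for name, spdx in sorted(deps.items()):
        category = LICENSE_POLICY.get(spdx)
        if category is None:
            report["unknown"].append(name)
        elif category is Category.X:
            report["rejected"].append(name)
        else:
            report[category.name].append(name)  # "A" or "B"
    return report


print(review_dependencies({"liba": "MIT", "libb": "MPL-2.0", "libc": "GPL-3.0-only"}))
```

Keeping anything more restrictive than the Apache License off the list means the "rejected" bucket does the simplifying Nalley describes next. The commercial tools he mentions add the hard parts this sketch omits: discovering the real license of each transitive dependency and generating the notice text.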
One of the things that we do to make that simple, not just for us but for our downstream consumers, is our standard is that we don't want anything more restrictive than the Apache Software License in the software that we include. That has greatly simplified the process for us, because a lot of the licenses with additional terms are more complicated. They're simply not on our available license list, and that saves us quite a bit of that headache, because those are automatically not part of the options that we have in the place. But there are a number of tools out there from a number of different companies that will tell you all about your license obligations, create an entire manifest for you, if you need that. That's an industry that seems to be flourishing at the moment, because open source compliance can get very convoluted, particularly when you're dealing with somewhere between scores and hundreds of dependencies with different licenses.
Let me ask you quickly, David, we've talked about a lot of things here today. What do you consider the biggest challenge that the open source community will face over the next, we'll say, five years?
Nalley: I see a number of large challenges. Some of them are things like maintenance. I do think there's this ever-growing question about what open source is. There are folks who have seen the tremendous benefit of the open source development methodology, and they don't necessarily see just the benefits of the development methodology; they may also see marketing benefits [or] customer acquisition benefits. At the same time, they may not necessarily want all of the things that come along with being an open source project. And I think defining open source's place in the tech landscape, and what open source is going to be in the coming decades, is part of the challenge that really faces open source, because even in the two decades that I've been involved in open source, that's changed pretty considerably. Where it used to be a bunch of folks who cared deeply about open source ideology or free software ideology, we're seeing that the rest of the world now cares about open source. We see Congress holding investigations that involve open source software, and a decade ago, they probably had no idea that open source existed. I think that figuring out what our place in this world is, and what we're going to do as an industry around open source, is going to be one of the big challenges. There's constantly going to be, I think, this tragedy of the commons that is going to remain a lingering and ever-growing risk to open source. And I think we're going to have to figure out how to deal with that, how to manage that risk, both from the tech industry and the open source software perspective.
Apache has a number of projects under its umbrella: over 300, something along those lines. I'm curious if you have one or two you'd like to mention that have a particularly exciting or vibrant community.
Nalley: My out-front favorite is CloudStack, which is an infrastructure-as-a-service platform. It's one of my favorites because that's where I've done a lot of work. I think there are some amazing projects out there, and I'll call out just a few that I've been impressed by recently. One of the problems that I had when I was a director was that, every quarter, I was reviewing every project at the ASF. And every quarter, it seemed like I was finding a project that I hadn't previously seen.
A couple of really interesting ones [include] Fineract, which is ... open source banking software; Kylin, which is in the big data space and was largely developed by the Chinese open source community; and Beam, which has become a bit of a translation layer that allows a number of different big data streaming and other tools to interoperate. Those are the ones that jump out immediately. One of the fascinating things is getting to read the reports that the projects generate every quarter and seeing how they're dealing with growth, and sometimes how they're dealing with a lack of activity. And, you're right, the ASF has 300 or so projects that call the foundation home, and they vary widely, from very low-level libraries all the way up to end-user tools and developer tools. There are lots of fascinating projects hiding out at apache.org.