Python vulnerability highlights open source security woes

A 15-year-old unpatched vulnerability in a tarfile module for the Python programming language prompted researchers from cybersecurity vendor Trellix to take action.

Remediation efforts for a 15-year-old unpatched Python vulnerability have raised questions around open source security after one company took on the immense task itself.

Cybersecurity vendor Trellix spent the last month releasing fixes for CVE-2007-4559, a Python vulnerability in the programming language's tarfile module that affected more than 300,000 open source repositories. Trellix researcher Kasimir Schulz stumbled upon the bug earlier this year and initially believed it was a new vulnerability. However, Schulz later discovered it was an existing Python vulnerability that had never been patched.

While the flaw was assigned a CVE when it was originally discovered in 2007 and given a medium-severity CVSS score of 6.8, Trellix researchers discovered that it was easier to exploit than initially thought and could lead to code execution, increasing its priority as a threat.
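For readers unfamiliar with this class of bug, the following is a minimal sketch of why an unchecked tar archive can be risky. It is not the Trellix exploit. The helper names `build_archive` and `resolved_targets` are hypothetical and exist only for illustration. The code builds an archive in memory whose single member is named with a leading `..`, then computes where that member would land if its name were joined to the destination folder without any checks. Nothing is extracted to disk.

```python
import io
import os
import tarfile


def build_archive(member_name: str, payload: bytes) -> bytes:
    """Return the bytes of an in-memory tar archive holding one file."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def resolved_targets(archive_bytes: bytes, destination: str) -> list[str]:
    """List where each member would land if its name were trusted as-is."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as archive:
        return [
            os.path.normpath(os.path.join(destination, member.name))
            for member in archive.getmembers()
        ]


# The member name is chosen by whoever created the archive.
archive_bytes = build_archive("../outside.txt", b"attacker-controlled content")
targets = resolved_targets(archive_bytes, os.path.join("sandbox", "extract"))
print(targets)
```

In this sketch the computed target sits one level above the intended `sandbox/extract` folder, which is the essence of a directory traversal.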

The rediscovery of CVE-2007-4559 -- and the struggles to patch it -- also highlighted larger open source security issues for organizations such as the Python Software Foundation (PSF), which rely on volunteers to develop, maintain and patch the software. What happens when a project's volunteers can't reach a consensus on how to handle a reported vulnerability? And what happens when those volunteers depart the project?

How CVE-2007-4559 fell through the cracks

Lars Gustäbel, a former PSF volunteer, was the tarfile maintainer who handled the Python vulnerability report 15 years ago, and he proposed a fix in 2014. However, he left PSF in 2019 amid an ongoing Python tarfile patch discussion that appears to have fallen by the wayside with his departure.

In a public GitHub thread from 2007 that discussed the Python tarfile vulnerability, Gustäbel said that "after careful consideration" he and a fellow PSF maintainer decided the flaw did not warrant treatment as a security issue. Instead of patching, PSF added a warning to the documentation stating that it could be "dangerous to extract archives from untrusted sources."

"In principle, I still stand by that statement," Gustäbel said in an email to TechTarget Editorial. "However, this is no trivial matter, and there are many facets to it."

He provided additional context in a blog post last week, noting that he dismissed the first bug report in 2007 and proposed a patch for discussion in the Python bug tracker in 2014.

"At that time, it seemed to me that this was not the way most of the people wanted the problem to be fixed. The discussion instantly died down, so there was no clear vote and the patch was never implemented," Gustäbel wrote in the blog post. "In 2018, the discussion about the patch was resumed, but due to time constraints I was no longer able to participate. I had increasing difficulty fulfilling my role as tarfile maintainer. Therefore, in 2019 I gave up my position as maintainer."

The GitHub thread shows that Gustäbel's proposed patch received no responses in 2014. While the discussion about the patch resumed in 2018, and several volunteers expressed support for the effort and even made revisions to the patch, participation in the discussion dwindled, and the patch was never released.

Amid inquiries about the status of the patch in 2019, one Python developer replied, "There was progress made as described on this issue, but there is yet work to be done, and no-one seems to be taking this upon themselves at the moment."

While Gustäbel emphasized that he no longer works with PSF and does not speak for the organization and its developers, he wrote that "the claims that there is a security vulnerability in the tarfile module that has been ignored for 15 years are somewhat exaggerated and out of context."

In addition, Gustäbel addressed many concerns from Schulz's report last month that documented higher risks for the Python vulnerability. In his opinion, the flaw "does not show a security vulnerability in the tarfile module but instead in the Spyder IDE," an open source development environment for Python programming. Trellix researchers demonstrated how an attacker could exploit the Python flaw for remote code execution using that environment.

Trellix researchers demonstrate how to exploit the Python vulnerability remotely to compromise an instance of Spyder IDE, an open source development environment for Python programming.

"Both the tarfile and the pickle modules are used in ways they are not supposed to be used and that are strongly discouraged in the documentation," Gustäbel wrote in the blog post.

Despite Trellix's new research on CVE-2007-4559, Gustäbel's proposed patch has yet to be released. The GitHub thread shows no new discussions about the proposed patch since Trellix published its report last month.

It's unclear what action PSF will ultimately take, if any, for the Python vulnerability. Victor Stinner, a Python developer with PSF, told TechTarget Editorial last month that there was a proposal, first introduced in 2017, to "add an option to opt in for more secure behavior," but that it had not been implemented. According to the GitHub thread on the proposed change, the opt-in feature was discussed at length over the last several weeks, but still has not been deployed.

Trellix tackles the issue

When Trellix researchers discovered that 61% of GitHub repositories using the tarfile package were vulnerable to an attack, the cybersecurity company acted by developing its own patches. Douglas McKee, principal engineer and director of vulnerability research at Trellix, said that as of last week the company had submitted just under 11,000 pull requests to repositories, with about 140 patches confirmed as applied.

McKee told TechTarget Editorial that Trellix is on track to reach its initial target of about 60,000 patches by the time the immense task is completed. No additional partners have gotten involved, and McKee emphasized how slow the patching process is, but he did highlight one positive.

"One example of how this is improving the cybersecurity across industries is the acceptance of our pull request by Monai, an open source network for medical artificial intelligence. This framework has been downloaded over 600,000 times and was susceptible to this vulnerability," McKee said.

"Since it is a framework, this directly impacts the supply chain, as applications are built around this code base," he noted. "Our efforts have helped secure this piece of open source technology for medical professionals. We are excited to see the positive impact our efforts will continue to have across multiple industries."

Beyond the conflicting opinions between Trellix and PSF on how to move forward, the Python vulnerability exposed open source security concerns around communication and support. Infosec experts agree this is a complex situation and it's hard to say exactly where responsibility lies.

Josh Bressers, vice president of security at supply chain security vendor Anchore, said that while open source software comes with no guarantees, he thinks there are certain expectations of core technology. In this case, Python is a widely used programming language.

"The Python Software Foundation has a mission statement that includes to 'protect' the Python language, which I suspect everyone would agree this is applicable," Bressers told TechTarget Editorial.

Tim Mackey, principal security strategist at the Synopsys Cybersecurity Research Center, said it's important to recognize that because Python is open source, anyone with the skills to write in C can contribute updates. The fact that no one had contributed speaks to a broader issue with the consumption of open source, he added. That consumption is high: Synopsys published an open source security report in April that determined that while nearly 100% of enterprises use open source software, managing it proves difficult.

"If you're basing the future of your business on free code and aren't contributing back or actively engaging with the development teams creating that free software, then you are implicitly accepting any risks associated with the decisions those authors make," Mackey said in an email to TechTarget Editorial.

Similarly, he said it's important to recognize that open source software foundations such as PSF cannot be compared with commercial software vendors that are obligated to patch any and all flaws. It's easy to blame the foundation, he said, but open source development affords everyone the opportunity to contribute both fixes and new functionality, including the researchers who are drawing attention to the flaw.

"In this case, there is renewed discussion within the GitHub issue, and that discussion shows how hard it is to fix a security issue without creating breaking changes for the users," Mackey said.

Another discrepancy, which Tenable CSO Robert Huber noted, is that some open source projects are better staffed and quicker to respond to issues than others. Contributing factors he highlighted were resources and the level of effort required for a fix, which he pointed out is no different for commercial endeavors.

More importantly, Huber said, when the flaw was initially discovered in 2007, supply chain security wasn't considered to the extent it is today.

Significant software supply chain attacks over the past few years, such as SolarWinds, have heightened concerns for both enterprises and open source projects. Even product launches this month by tech giants such as Google focus on securing the software supply chain.

However, 15 years ago, there were no secure software development lifecycles with software composition analysis, Huber noted, or third-party dependency checks. It's not safe to simply trust open source software, and Huber emphasized how important it is for enterprises to verify such code for themselves.

"The [Trellix] research noted they were looking for the import tarfile -- while this is an indicator a project might be vulnerable, it does not necessarily mean it is. You'd have to evaluate each project in detail to verify," Huber said. "Regardless, it's a great data point to ensure you are practicing good secure software development practices."
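Huber's distinction between an import and an actual exposure can be sketched in code. The example below is an illustrative heuristic, not Trellix's scanning method, and the function name `tarfile_signals` is hypothetical. It parses Python source with the standard `ast` module and reports two separate signals: whether the file imports tarfile, and which lines call a method named `extract` or `extractall`. The second signal is still only a hint, since other modules such as zipfile expose methods with the same names, so each hit would need manual review.

```python
import ast


def tarfile_signals(source: str) -> dict:
    """Report whether source imports tarfile and where extraction calls appear."""
    tree = ast.parse(source)
    imports_tarfile = False
    extraction_lines = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # Matches "import tarfile" and "import tarfile as tf".
            imports_tarfile |= any(alias.name == "tarfile" for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Matches "from tarfile import ...".
            imports_tarfile |= node.module == "tarfile"
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            # Heuristic: any method call named extract or extractall.
            if node.func.attr in {"extract", "extractall"}:
                extraction_lines.append(node.lineno)
    return {
        "imports_tarfile": imports_tarfile,
        "extraction_lines": sorted(extraction_lines),
    }


# Two made-up snippets: one only lists archive contents, one extracts them.
sample_reader = "import tarfile\nnames = tarfile.open('a.tar').getnames()\n"
sample_extractor = "import tarfile\ntarfile.open('a.tar').extractall('out')\n"

reader_report = tarfile_signals(sample_reader)
extractor_report = tarfile_signals(sample_extractor)
print(reader_report)
print(extractor_report)
```

Both snippets import tarfile, but only the second one extracts anything, which is why an import alone is a weak indicator.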

Another potential problem is that the directory traversal flaw might affect more than just the tarfile module. Directory traversal vulnerabilities, which Mackey described as "scenarios where user-supplied or user-modifiable data is used without validation or without boundaries," are nothing new. More importantly, he said, it's easy for developers to write code that contains that type of flaw.
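The validation Mackey describes can be sketched as a boundary check performed before extraction. This is one common mitigation pattern, shown under stated assumptions. It is not necessarily the patch Trellix submitted or the fix under discussion by Python developers. The names `is_within_directory`, `safe_extract` and `make_archive` are hypothetical. The sketch checks member names only and does not inspect symbolic or hard links inside the archive, which a production check would also need to handle.

```python
import io
import os
import tarfile
import tempfile


def is_within_directory(directory: str, target: str) -> bool:
    """Return True if target resolves to a path inside directory."""
    base = os.path.realpath(directory)
    candidate = os.path.realpath(target)
    return os.path.commonpath([base, candidate]) == base


def safe_extract(archive: tarfile.TarFile, destination: str) -> None:
    """Extract only if every member name stays inside destination."""
    for member in archive.getmembers():
        target = os.path.join(destination, member.name)
        if not is_within_directory(destination, target):
            raise ValueError(f"blocked path traversal: {member.name!r}")
    archive.extractall(destination)


def make_archive(member_name: str) -> bytes:
    """Build a one-file tar archive in memory for the demonstration."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        payload = b"example"
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


# Try to extract an archive whose member name points outside the target folder.
with tempfile.TemporaryDirectory() as workdir:
    with tarfile.open(fileobj=io.BytesIO(make_archive("../outside.txt"))) as archive:
        try:
            safe_extract(archive, workdir)
            blocked = False
        except ValueError:
            blocked = True

print("traversal blocked:", blocked)
```

Rejecting the whole archive on the first bad member is a deliberate choice in this sketch. Skipping only the offending member is another option, but it can leave callers with a partial extraction they did not expect.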

On the other hand, Mackey said standard application security tools can help identify this class of threat.

Bressers agreed that the flaw will affect more than just the tarfile module, which is why he believes looking for other affected modules is the right way to fully address the open source security flaw.

"Rather than rush to do something, the Python developers are starting a discussion, which I 100% agree is the right way forward," Bressers said.

If enterprises do implement the Trellix patches, Mackey said, it's important that Python users extensively test their applications to ensure that no breaking changes occur and that no updates to configuration settings are required.

"Given the age of this vulnerability, users of Python in long life span situations should pay particular attention to how user-supplied data might be used by this interface," Mackey said.
