By William Woodruff
This is a joint post with the PyPI maintainers; read their announcement here!
This audit was sponsored by the Open Tech Fund as part of their larger mission to secure critical pieces of internet infrastructure. You can read the full report in our Publications repository.
Late this summer, we performed an audit of Warehouse and cabotage, the codebases that power and deploy PyPI, respectively. Our review uncovered a number of findings that, while not critical, could compromise the integrity and availability of both. These findings reflect a broader trend in large systems: security issues largely correspond to places where services interact, particularly where those services have insufficiently specified or weak contracts.
PyPI is the Python Package Index: the official and primary packaging index and repository for the Python ecosystem. It hosts half a million unique Python packages uploaded by 750,000 unique users and serves over 26 billion downloads every single month. (That’s over three downloads for every human on Earth, each month, every month!)
Consequently, PyPI’s hosted distributions are essentially the ground truth for just about every program written in Python. Moreover, PyPI is extensively mirrored across the globe, including in countries with limited or surveilled internet access.
Before 2018, PyPI was a large and freestanding legacy application with significant technical debt that accumulated over nearly two decades of feature growth. An extensive rewrite was conducted from 2016 to 2018, culminating in the general availability of Warehouse, the current codebase powering PyPI.
Various significant feature enhancements have been performed since then, including the addition of scoped API tokens, TOTP- and WebAuthn-based MFA, organization accounts, secret scanning, and Trusted Publishing.
Under the hood, PyPI is built out of multiple components, including third-party dependencies that are themselves hosted on PyPI. Our audit focused on two of its most central components:
We performed a holistic audit of Warehouse’s codebase, including the relatively small amount of JavaScript served to browser clients. Some particular areas of focus included:
During our review, we uncovered a number of findings that, while not critical, could potentially compromise Warehouse’s availability, integrity, or the integrity of its hosted distributions. We also uncovered a finding that would allow an attacker to disclose ordinarily private account information. Following a post-audit fix review, we believe that each of these findings has been mitigated sufficiently or does not pose an immediate risk to PyPI’s operations.
Findings of interest include:
Our overall evaluation of Warehouse is reflected in our report: Warehouse’s design and development practices are consistent with industry-standard best practices, including the enforcement of ordinarily aspirational practices such as 100% branch coverage, automated quality and security linting, and dependency updates.
Like with Warehouse, our audit of cabotage was holistic. Some particular areas of focus included:
During our review, we uncovered a number of findings that, while not critical, could potentially compromise cabotage’s availability and integrity, as well as the availability and integrity of the containers that it builds and deploys. We also uncovered two findings that could allow an attacker to circumvent ordinary access controls or log filtering mechanisms. Following a post-audit fix review, we believe that these findings have been mitigated sufficiently or do not pose an immediate risk to PyPI’s operations (or other applications deployed through cabotage).
Findings of interest include:
From the report, our overall evaluation is that cabotage’s codebase is not as mature as Warehouse’s. In particular, our evaluation reflects operational deficiencies that are not shared with Warehouse: cabotage has a single active maintainer, has limited available public documentation, does not have a complete unit test suite, and does not use CI/CD system to automatically run tests or evaluate code quality metrics.
Unit testing, automated linting, and code scanning are all necessary components in a secure software development lifecycle. At the same time, as our full report demonstrates, they cannot guarantee the security of a system or design: manual code review remains invaluable for catching interprocedural and systems-level flaws.
We worked closely with the PyPI maintainers and administrators throughout the audit and would like to thank them for sharing their extensive knowledge and expertise, as well as for actively triaging reports submitted to them. In particular, we would like to thank Mike Fiedler, the current PyPI Safety & Security Engineer, for his documentation and triage efforts before, during, and after the engagement period.
*** This is a Security Bloggers Network syndicated blog from Trail of Bits Blog authored by Trail of Bits. Read the original post at: https://blog.trailofbits.com/2023/11/14/our-audit-of-pypi/