skip to navigation
skip to content

Python Wiki

Python Insider Blog

Python 2 or 3?

Help Fund Python

[Python resources in languages other than English]

Non-English Resources

Add an event to this calendar.

Times are shown in UTC/GMT.

Add an event to this calendar.

PEP:480
Title:Surviving a Compromise of PyPI: The Maximum Security Model
Version:e9cf21f0c0a9
Last-Modified:2014-12-08 19:01:14 -0800 (Mon, 08 Dec 2014)
Author:Trishank Karthik Kuppusamy <trishank at nyu.edu>, Vladimir Diaz <vladimir.diaz at nyu.edu>, Donald Stufft <donald at stufft.io>, Justin Cappos <jcappos at nyu.edu>
BDFL-Delegate:Richard Jones <r1chardj0n3s@gmail.com>
Discussions-To:DistUtils mailing list <distutils-sig at python.org>
Status:Draft
Type:Standards Track
Content-Type:text/x-rst
Requires:458
Created:8-Oct-2014

Abstract

Proposed is an extension to PEP 458 that adds support for end-to-end signing and the maximum security model. End-to-end signing allows both PyPI and developers to sign for the distributions that are downloaded by clients. The minimum security model proposed by PEP 458 supports continuous delivery of distributions (because they are signed by online keys), but that model does not protect distributions in the event that PyPI is compromised. In the minimum security model, attackers may sign for malicious distributions by compromising the signing keys stored on PyPI infrastructure. The maximum security model, described in this PEP, retains the benefits of PEP 458 (e.g., immediate availability of distributions that are uploaded to PyPI), but additionally ensures that end-users are not at risk of installing forged software if PyPI is compromised.

This PEP discusses the changes made to PEP 458 but excludes its informational elements to primarily focus on the maximum security model. For example, an overview of The Update Framework or the basic mechanisms in PEP 458 are not covered here. The changes to PEP 458 include modifications to the snapshot process, key compromise analysis, auditing snapshots, and the steps that should be taken in the event of a PyPI compromise. The signing and key management process that PyPI MAY RECOMMEND is discussed but not strictly defined. How the release process should be implemented to manage keys and metadata is left to the implementors of the signing tools. That is, this PEP delineates the expected cryptographic key type and signature format included in metadata that MUST be uploaded by developers in order to support end-to-end verification of distributions.

Rationale

PEP 458 [1] proposes how PyPI should be integrated with The Update Framework (TUF) [2]. It explains how modern package managers like pip can be made more secure, and the types of attacks that can be prevented if PyPI is modified on the server side to include TUF metadata. Package managers can reference the TUF metadata available on PyPI to download distributions more securely.

PEP 458 also describes the metadata layout of the PyPI repository and employs the minimum security model, which supports continuous delivery of projects and uses online cryptographic keys to sign the distributions uploaded by developers. Although the minimum security model guards against most attacks on software updaters [5] [7], such as mix-and-match and extraneous dependencies attacks, it can be improved to support end-to-end signing and to prohibit forged distributions in the event that PyPI is compromised.

The main strength of PEP 458 and the minimum security model is the automated and simplified release process: developers may upload distributions and then have PyPI sign for their distributions. Much of the release process is handled in an automated fashion by online roles and this approach requires storing cryptographic signing keys on the PyPI infrastructure. Unfortunately, cryptographic keys that are stored online are vulnerable to theft. The maximum security model, proposed in this PEP, permits developers to sign for the distributions that they make available to PyPI users, and does not put end-users at risk of downloading malicious distributions if the online keys stored on PyPI infrastructure are compromised.

Threat Model

The threat model assumes the following:

  • Offline keys are safe and securely stored.
  • Attackers can compromise at least one of PyPI's trusted keys that are stored online, and may do so at once or over a period of time.
  • Attackers can respond to client requests.
  • Attackers may control any number of developer keys for projects a client does not want to install.

Attackers are considered successful if they can cause a client to install (or leave installed) something other than the most up-to-date version of the software the client is updating. When an attacker is preventing the installation of updates, the attacker's goal is that clients not realize that anything is wrong.

Definitions

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [13].

This PEP focuses on integrating TUF with PyPI; however, the reader is encouraged to read about TUF's design principles [2]. It is also RECOMMENDED that the reader be familiar with the TUF specification [3], and PEP 458 [1] (which this PEP is extending).

Terms used in this PEP are defined as follows:

  • Projects: Projects are software components that are made available for integration. Projects include Python libraries, frameworks, scripts, plugins, applications, collections of data or other resources, and various combinations thereof. Public Python projects are typically registered on the Python Package Index [4].
  • Releases: Releases are uniquely identified snapshots of a project [4].
  • Distributions: Distributions are the packaged files that are used to publish and distribute a release.
  • Simple index: The HTML page that contains internal links to the distributions of a project [4].
  • Roles: There is one root role in PyPI. There are multiple roles whose responsibilities are delegated to them directly or indirectly by the root role. The term "top-level role" refers to the root role and any role delegated by the root role. Each role has a single metadata file that it is trusted to provide.
  • Metadata: Metadata are files that describe roles, other metadata, and target files.
  • Repository: A repository is a resource comprised of named metadata and target files. Clients request metadata and target files stored on a repository.
  • Consistent snapshot: A set of TUF metadata and PyPI targets that capture the complete state of all projects on PyPI as they existed at some fixed point in time.
  • The snapshot (release) role: In order to prevent confusion due to the different meanings of the term "release" used in PEP 426 [1] and the TUF specification [3], the release role is renamed to the snapshot role.
  • Developer: Either the owner or maintainer of a project who is allowed to update TUF metadata, as well as distribution metadata and files for a given project.
  • Online key: A private cryptographic key that MUST be stored on the PyPI server infrastructure. This usually allows automated signing with the key. An attacker who compromises the PyPI infrastructure will be able to immediately read these keys.
  • Offline key: A private cryptographic key that MUST be stored independent of the PyPI server infrastructure. This prevents automated signing with the key. An attacker who compromises the PyPI infrastructure will not be able to immediately read these keys.
  • Threshold signature scheme: A role can increase its resilience to key compromises by specifying that at least t out of n keys are REQUIRED to sign its metadata. A compromise of t-1 keys is insufficient to compromise the role itself. Saying that a role requires (t, n) keys denotes the threshold signature property.

Maximum Security Model

The maximum security model permits developers to sign their projects and to upload signed metadata to PyPI. If the PyPI infrastructure were compromised, attackers would be unable to serve malicious versions of a claimed project without having access to that project's developer key. Figure 1 depicts the changes made to the metadata layout of the minimum security model, namely that developer roles are now supported and that three new delegated roles exist: claimed, recently-claimed, and unclaimed. The bins role from the minimum security model has been renamed unclaimed and can contain any projects that have not been added to claimed. The unclaimed role functions just as before (i.e., as explained in PEP 458, projects added to this role are signed by PyPI with an online key). Offline keys provided by developers ensure the strength of the maximum security model over the minimum model. Although the minimum security model supports continuous delivery of projects, all projects are signed by an online key. That is, an attacker is able to corrupt packages in the minimum security model, but not in the maximum model, without also compromising a developer's key.

pep-0480-1.png

Figure 1: An overview of the metadata layout in the maximum security model. The maximum security model supports continuous delivery and survivable key compromise.

Projects that are signed by developers and uploaded to PyPI for the first time are added to the recently-claimed role. The recently-claimed role uses an online key, so projects uploaded for the first time are immediately available to clients. After some time has passed, PyPI administrators MAY periodically move (e.g., every month) projects listed in recently-claimed to the claimed role for maximum security. The claimed role uses an offline key, thus projects added to this role cannot be easily forged if PyPI is compromised.

The recently-claimed role is separate from the unclaimed role for usability and efficiency, not security. If new project delegations were prepended to unclaimed metadata, unclaimed would need to be re-downloaded every time a project obtained a key. By separating out new projects, the amount of data retrieved is reduced. From a usability standpoint, it also makes it easier for administrators to see which projects are now claimed. This information is needed when moving keys from recently-claimed to claimed, which is discussed in more detail in the "Producing Consistent Snapshots" section.

End-to-End Signing

End-to-end signing allows both PyPI and developers to sign for the metadata downloaded by clients. PyPI is trusted to make uploaded projects available to clients (PyPI signs the metadata for this part of the process), and developers sign the distributions that they upload to PyPI.

In order to delegate trust to a project, developers are required to submit a public key to PyPI. PyPI takes the project's public key and adds it to parent metadata that PyPI then signs. After the initial trust is established, developers are required to sign distributions that they upload to PyPI using the public key's corresponding private key. The signed TUF metadata that developers upload to PyPI includes information like the distribution's file size and hash, which package managers use to verify distributions that are downloaded.

The practical implications of end-to-end signing is the extra administrative work needed to delegate trust to a project, and the signed metadata that developers MUST upload to PyPI along with the distribution. Specifically, PyPI is expected to periodically sign metadata with an offline key by adding projects to the claimed metadata file and signing it. In contrast, projects are only ever signed with an online key in the minimum security model. End-to-end signing does require manual intervention to delegate trust (i.e., to sign metadata with an offline key), but this is a one-time cost and projects have stronger protections against PyPI compromises thereafter.

Metadata Signatures, Key Management, and Signing Distributions

This section discusses the tools, signature scheme, and signing methods that PyPI MAY recommend to implementors of the signing tools. Developers are expected to use these tools to sign and upload distributions to PyPI. To summarize the RECOMMENDED tools and schemes discussed in the subsections below, developers MAY generate cryptographic keys and sign metadata (with the Ed25519 signature scheme) in some automated fashion, where the metadata includes the information required to verify the authenticity of the distribution. Developers then upload metadata to PyPI, where it will be available for download by package managers such as pip (i.e., package managers that support TUF metadata). The entire process is transparent to the end-users (using a package manager that supports TUF) that download distributions from PyPI.

The first three subsections (Cryptographic Signature Scheme, Cryptographic Key Files, and Key Management) cover the cryptographic components of the developer release process. That is, which key type PyPI supports, how keys may be stored, and how keys may be generated. The two subsections that follow the first three discuss the PyPI modules that SHOULD be modified to support TUF metadata. For example, Twine and Distutils are two projects that SHOULD be modified. Finally, the last subsection goes over the automated key management and signing solution that is RECOMMENDED for the signing tools.

TUF's design is flexible with respect to cryptographic key types, signatures, and signing methods. The tools, modification, and methods discussed in the following sections are RECOMMENDATIONS for the implementors of the signing tools.

Cryptographic Signature Scheme: Ed25519

The package manager (pip) shipped with CPython MUST work on non-CPython interpreters and cannot have dependencies that have to be compiled (i.e., the PyPI+TUF integration MUST NOT require compilation of C extensions in order to verify cryptographic signatures). Verification of signatures MUST be done in Python, and verifying RSA [11] signatures in pure-Python may be impractical due to speed. Therefore, PyPI MAY use the Ed25519 [14] signature scheme.

Ed25519 [12] is a public-key signature system that uses small cryptographic signatures and keys. A pure-Python implementation [15] of the Ed25519 signature scheme is available. Verification of Ed25519 signatures is fast even when performed in Python.

Cryptographic Key Files

The implementation MAY encrypt key files with AES-256-CTR-Mode and strengthen passwords with PBKDF2-HMAC-SHA256 (100K iterations by default, but this may be overridden by the developer). The current Python implementation of TUF can use any cryptographic library (support for PyCA cryptography will be added in the future), may override the default number of PBKDF2 iterations, and the KDF tweaked to taste.

Key Management: miniLock

An easy-to-use key management solution is needed. One solution is to derive a private key from a password so that developers do not have to manage cryptographic key files across multiple computers. miniLock [16] is an example of how this can be done. Developers may view the cryptographic key as a secondary password. miniLock also works well with a signature scheme like Ed25519, which only needs a very small key.

Third-party Upload Tools: Twine

Third-party tools like Twine [17] MAY be modified (if they wish to support distributions that include TUF metadata) to sign and upload developer projects to PyPI. Twine is a utility for interacting with PyPI that uses TLS to upload distributions, and prevents MITM attacks on usernames and passwords.

Distutils

Distutils [18] MAY be modified to sign metadata and to upload signed distributions to PyPI. Distutils comes packaged with CPython and is the most widely-used tool for uploading distributions to PyPI.

Automated Signing Solution

An easy-to-use key management solution is RECOMMENDED for developers. One approach is to generate a cryptographic private key from a user password, akin to miniLock. Although developer signatures can remain optional, this approach may be inadequate due to the great number of potentially unsigned dependencies each distribution may have. If any one of these dependencies is unsigned, it negates any benefit the project gains from signing its own distribution (i.e., attackers would only need to compromise one of the unsigned dependencies to attack end-users). Requiring developers to manually sign distributions and manage keys is expected to render key signing an unused feature.

A default, PyPI-mediated key management and package signing solution that is transparent [19] to developers and does not require a key escrow (sharing of encrypted private keys with PyPI) is RECOMMENDED for the signing tools. Additionally, the signing tools SHOULD circumvent the sharing of private keys across multiple machines of each developer.

The following outlines an automated signing solution that a new developer MAY follow to upload a distribution to PyPI:

  1. Register a PyPI project.
  2. Enter a secondary password (independent of the PyPI user account password).
  3. Optional: Add a new identity to the developer's PyPI user account from a second machine (after a password prompt).
  4. Upload project.

Step 1 is the normal procedure followed by developers to register a PyPI project [20].

Step 2 generates an encrypted key file (private), uploads an Ed25519 public key to PyPI, and signs the TUF metadata that is generated for the distribution.

Optionally adding a new identity from a second machine, by simply entering a password, in step 3 also generates an encrypted private key file and uploads an Ed25519 public key to PyPI. Separate identities MAY be created to allow a developer, or other project maintainers, to sign releases on multiple machines. An existing verified identity (its public key is contained in project metadata or has been uploaded to PyPI) signs for new identities. By default, project metadata has a signature threshold of "1" and other verified identities may create new releases to satisfy the threshold.

Step 4 uploads the distribution file and TUF metadata to PyPI. The "Snapshot Process" section discusses in detail the procedure followed by developers to upload a distribution to PyPI.

Generation of cryptographic files and signatures is transparent to the developers in the default case: developers need not be aware that packages are automatically signed. However, the signing tools should be flexible; a single project key may also be shared between multiple machines if manual key management is preferred (e.g., ssh-copy-id).

The repository [21] and developer [22] TUF tools currently support all of the recommendations previously mentioned, except for the automated signing solution, which SHOULD be added to Distutils, Twine, and other third-party signing tools. The automated signing solution calls available repository tool functions to sign metadata and to generate the cryptographic key files.

Snapshot Process

The snapshot process is fairly simple and SHOULD be automated. The snapshot process MUST keep in memory the latest working set of root, targets, and delegated roles. Every minute or so the snapshot process will sign for this latest working set. (Recall that project transaction processes continuously inform the snapshot process about the latest delegated metadata in a concurrency-safe manner. The snapshot process will actually sign for a copy of the latest working set while the latest working set in memory will be updated with information that is continuously communicated by the project transaction processes.) The snapshot process MUST generate and sign new timestamp metadata that will vouch for the metadata (root, targets, and delegated roles) generated in the previous step. Finally, the snapshot process MUST make available to clients the new timestamp and snapshot metadata representing the latest snapshot.

A claimed or recently-claimed project will need to upload in its transaction to PyPI not just targets (a simple index as well as distributions) but also TUF metadata. The project MAY do so by uploading a ZIP file containing two directories, /metadata/ (containing delegated targets metadata files) and /targets/ (containing targets such as the project simple index and distributions that are signed by the delegated targets metadata).

Whenever the project uploads metadata or targets to PyPI, PyPI SHOULD check the project TUF metadata for at least the following properties:

  • A threshold number of the developers keys registered with PyPI by that project MUST have signed for the delegated targets metadata file that represents the "root" of targets for that project (e.g. metadata/targets/ project.txt).
  • The signatures of delegated targets metadata files MUST be valid.
  • The delegated targets metadata files MUST NOT have expired.
  • The delegated targets metadata MUST be consistent with the targets.
  • A delegator MUST NOT delegate targets that were not delegated to itself by another delegator.
  • A delegatee MUST NOT sign for targets that were not delegated to itself by a delegator.

If PyPI chooses to check the project TUF metadata, then PyPI MAY choose to reject publishing any set of metadata or targets that do not meet these requirements.

PyPI MUST enforce access control by ensuring that each project can only write to the TUF metadata for which it is responsible. It MUST do so by ensuring that project transaction processes write to the correct metadata as well as correct locations within those metadata. For example, a project transaction process for an unclaimed project MUST write to the correct target paths in the correct delegated unclaimed metadata for the targets of the project.

On rare occasions, PyPI MAY wish to extend the TUF metadata format for projects in a backward-incompatible manner. Note that PyPI will NOT be able to automatically rewrite existing TUF metadata on behalf of projects in order to upgrade the metadata to the new backward-incompatible format because this would invalidate the signatures of the metadata as signed by developer keys. Instead, package managers SHOULD be written to recognize and handle multiple incompatible versions of TUF metadata so that claimed and recently-claimed projects could be offered a reasonable time to migrate their metadata to newer but backward-incompatible formats.

If PyPI eventually runs out of disk space to produce a new consistent snapshot, then PyPI MAY then use something like a "mark-and-sweep" algorithm to delete sufficiently outdated consistent snapshots. That is, only outdated metadata like timestamp and snapshot that are no longer used are deleted. Specifically, in order to preserve the latest consistent snapshot, PyPI would walk objects -- beginning from the root (timestamp) -- of the latest consistent snapshot, mark all visited objects, and delete all unmarked objects. The last few consistent snapshots may be preserved in a similar fashion. Deleting a consistent snapshot will cause clients to see nothing except HTTP 404 responses to any request for a target of the deleted consistent snapshot. Clients SHOULD then retry (as before) their requests with the latest consistent snapshot.

All package managers that support TUF metadata MUST be modified to download every metadata and target file (except for timestamp metadata) by including, in the request for the file, the cryptographic hash of the file in the filename. Following the filename convention RECOMMENDED in the next subsection, a request for the file at filename.ext will be transformed to the equivalent request for the file at digest.filename.

Finally, PyPI SHOULD use a transaction log [23] to record project transaction processes and queues so that it will be easier to recover from errors after a server failure.

Producing Consistent Snapshots

PyPI is responsible for updating, depending on the project, either the claimed, recently-claimed, or unclaimed metadata and associated delegated metadata. Every project MUST upload its set of metadata and targets in a single transaction. The uploaded set of files is called the "project transaction." How PyPI MAY validate files in a project transaction is discussed in a later section. The focus of this section is on how PyPI will respond to a project transaction.

Every metadata and target file MUST include in its filename the hex digest [24] of its SHA-256 [25] hash, which PyPI may prepend to filenames after the files have been uploaded. For this PEP, it is RECOMMENDED that PyPI adopt a simple convention of the form: digest.filename, where filename is the original filename without a copy of the hash, and digest is the hex digest of the hash.

When an unclaimed project uploads a new transaction, a project transaction process MUST add all new targets and relevant delegated unclaimed metadata. The project transaction process MUST inform the snapshot process about new delegated unclaimed metadata.

When a recently-claimed project uploads a new transaction, a project transaction process MUST add all new targets and delegated targets metadata for the project. If the project is new, then the project transaction process MUST also add new recently-claimed metadata with the public keys (which MUST be part of the transaction) for the project. recently-claimed projects have a threshold value of "1" set by the transaction process. Finally, the project transaction process MUST inform the snapshot process about new recently-claimed metadata, as well as the current set of delegated targets metadata for the project.

The transaction process for a claimed project is slightly different in that PyPI administrators periodically move (a manual process that MAY occur every two weeks to a month) projects from the recently-claimed role to the claimed role. (Moving a project from recently-claimed to claimed is a manual process because PyPI administrators have to use an offline key to sign the claimed project's distribution.) A project transaction process MUST then add new recently-claimed and claimed metadata to reflect this migration. As is the case for a recently-claimed project, the project transaction process MUST always add all new targets and delegated targets metadata for the claimed project. Finally, the project transaction process MUST inform the consistent snapshot process about new recently-claimed or claimed metadata, as well as the current set of delegated targets metadata for the project.

Project transaction processes SHOULD be automated, except when PyPI administrators move a project from the recently-claimed role to the claimed role. Project transaction processes MUST also be applied atomically: either all metadata and targets -- or none of them -- are added. The project transaction processes and snapshot process SHOULD work concurrently. Finally, project transaction processes SHOULD keep in memory the latest claimed, recently-claimed, and unclaimed metadata so that they will be correctly updated in new consistent snapshots.

The queue MAY be processed concurrently in order of appearance, provided that the following rules are observed:

  1. No pair of project transaction processes may concurrently work on the same project.
  2. No pair of project transaction processes may concurrently work on unclaimed projects that belong to the same delegated unclaimed role.
  3. No pair of project transaction processes may concurrently work on new recently-claimed projects.
  4. No pair of project transaction processes may concurrently work on new claimed projects.
  5. No project transaction process may work on a new claimed project while another project transaction process is working on a new recently-claimed project and vice versa.

These rules MUST be observed to ensure that metadata is not read from or written to inconsistently.

Auditing Snapshots

If a malicious party compromises PyPI, they can sign arbitrary files with any of the online keys. The roles with offline keys (i.e., root and targets) are still protected. To safely recover from a repository compromise, snapshots should be audited to ensure that files are only restored to trusted versions.

When a repository compromise has been detected, the integrity of three types of information must be validated:

  1. If the online keys of the repository have been compromised, they can be revoked by having the targets role sign new metadata, delegated to a new key.
  2. If the role metadata on the repository has been changed, this will impact the metadata that is signed by online keys. Any role information created since the compromise should be discarded. As a result, developers of new projects will need to re-register their projects.
  3. If the packages themselves may have been tampered with, they can be validated using the stored hash information for packages that existed in trusted metadata before the compromise. Also, new distributions that are signed by developers in the claimed role may be safely retained. However, any distributions signed by developers in the recently-claimed or unclaimed roles should be discarded.

In order to safely restore snapshots in the event of a compromise, PyPI SHOULD maintain a small number of its own mirrors to copy PyPI snapshots according to some schedule. The mirroring protocol can be used immediately for this purpose. The mirrors must be secured and isolated such that they are responsible only for mirroring PyPI. The mirrors can be checked against one another to detect accidental or malicious failures.

Another approach is to periodically generate the cryptographic hash of snapshot and tweet it. For example, upon receiving the tweet, a user comes forward with the actual metadata and the repository maintainers are then able to verify metadata's cryptographic hash. Alternatively, PyPI may periodically archive its own versions of snapshot rather than rely on externally provided metadata. In this case, PyPI SHOULD take the cryptographic hash of every package on the repository and store this data on an offline device. If any package hash has changed, this indicates an attack has occurred.

Attacks that serve different versions of metadata or that freeze a version of a package at a specific version can be handled by TUF with techniques such as implicit key revocation and metadata mismatch detection [2]. n

Key Compromise Analysis

This PEP has covered the maximum security model, the TUF roles that should be added to support continuous delivery of distributions, how to generate and sign the metadata of each role, and how to support distributions that have been signed by developers. The remaining sections discuss how PyPI SHOULD audit repository metadata, and the methods PyPI can use to detect and recover from a PyPI compromise.

Table 1 summarizes a few of the attacks possible when a threshold number of private cryptographic keys (belonging to any of the PyPI roles) are compromised. The leftmost column lists the roles (or a combination of roles) that have been compromised, and the columns to the right show whether the compromised roles leaves clients susceptible to malicious updates, freeze attacks, or metadata inconsistency attacks.

Role Compromise Malicious Updates Freeze Attack Metadata Inconsistency Attacks
timetamp NO snapshot and targets or any of the delegated roles need to cooperate YES limited by earliest root, targets, or bin metadata expiry time NO snapshot needs to cooperate
snapshot NO timestamp and targets or any of the delegated roles need to cooperate NO timestamp needs to coorperate NO timestamp needs to cooperate
timestamp AND snapshot NO targets or any of the delegated roles need to cooperate YES limited by earliest root, targets, or bin metadata expiry time YES limited by earliest root, targets, or bin metadata expiry time
targets OR claimed OR recently-claimed OR unclaimed OR project NO timestamp and snapshot need to cooperate NOT APPLICABLE need timestamp and snapshot NOT APPLICABLE need timestamp and snapshot
(timestamp AND snapshot) AND project YES YES limited by earliest root, targets, or bin metadata expiry time YES limited by earliest root, targets, or bin metadata expiry time
(timestamp AND snapshot) AND (recently-claimed OR unclaimed) YES but only of projects not delegated by claimed YES limited by earliest root, targets, claimed, recently-claimed, project, or unclaimed metadata expiry time YES limited by earliest root, targets, claimed, recently-claimed, project, or unclaimed metadata expiry time
(timestamp AND snapshot) AND (targets OR claimed) YES YES limited by earliest root, targets, claimed, recently-claimed, project, or unclaimed metadata expiry time YES limited by earliest root, targets, claimed, recently-claimed, project, or unclaimed metadata expiry time
root YES YES YES

Table 1: Attacks that are possible by compromising certain combinations of role keys. In September 2013 [26], it was shown how the latest version (at the time) of pip was susceptible to these attacks and how TUF could protect users against them [8]. Roles signed by offline keys are in bold.

Note that compromising targets or any delegated role (except for project targets metadata) does not immediately allow an attacker to serve malicious updates. The attacker must also compromise the timestamp and snapshot roles (which are both online and therefore more likely to be compromised). This means that in order to launch any attack, one must not only be able to act as a man-in-the-middle, but also compromise the timestamp key (or compromise the root keys and sign a new timestamp key). To launch any attack other than a freeze attack, one must also compromise the snapshot key. Finally, a compromise of the PyPI infrastructure MAY introduce malicious updates to recently-claimed projects because the keys for these roles are online.

In the Event of a Key Compromise

A key compromise means that a threshold of keys belonging to developers or the roles on PyPI, as well as the PyPI infrastructure, have been compromised and used to sign new metadata on PyPI.

If a threshold number of developer keys of a project have been compromised, the project MUST take the following steps:

  1. The project metadata and targets MUST be restored to the last known good consistent snapshot where the project was not known to be compromised. This can be done by developers repackaging and resigning all targets with the new keys.
  2. The project's metadata MUST have its version numbers incremented, expiry times suitably extended, and signatures renewed.

Whereas PyPI MUST take the following steps:

  1. Revoke the compromised developer keys from the recently-claimed or claimed role. This is done by replacing the compromised developer keys with newly issued developer keys.
  2. A new timestamped consistent snapshot MUST be issued.

If a threshold number of timestamp, snapshot, recently-claimed, or unclaimed keys have been compromised, then PyPI MUST take the following steps:

  1. Revoke the timestamp, snapshot, and targets role keys from the root role. This is done by replacing the compromised timestamp, snapshot, and targets keys with newly issued keys.
  2. Revoke the recently-claimed and unclaimed keys from the targets role by replacing their keys with newly issued keys. Sign the new targets role metadata and discard the new keys (because, as we explained earlier, this increases the security of targets metadata).
  3. Clear all targets or delegations in the recently-claimed role and delete all associated delegated targets metadata. Recently registered projects SHOULD register their developer keys again with PyPI.
  4. All targets of the recently-claimed and unclaimed roles SHOULD be compared with the last known good consistent snapshot where none of the timestamp, snapshot, recently-claimed, or unclaimed keys were known to have been compromised. Added, updated, or deleted targets in the compromised consistent snapshot that do not match the last known good consistent snapshot SHOULD be restored to their previous versions. After ensuring the integrity of all unclaimed targets, the unclaimed metadata MUST be regenerated.
  5. The recently-claimed and unclaimed metadata MUST have their version numbers incremented, expiry times suitably extended, and signatures renewed.
  6. A new timestamped consistent snapshot MUST be issued.

This would preemptively protect all of these roles even though only one of them may have been compromised.

If a threshold number of the targets or claimed keys have been compromised, then there is little that an attacker would be able do without the timestamp and snapshot keys. In this case, PyPI MUST simply revoke the compromised targets or claimed keys by replacing them with new keys in the root and targets roles, respectively.

If a threshold number of the timestamp, snapshot, and claimed keys have been compromised, then PyPI MUST take the following steps in addition to the steps taken when either the timestamp or snapshot keys are compromised:

  1. Revoke the claimed role keys from the targets role and replace them with newly issued keys.
  2. All project targets of the claimed roles SHOULD be compared with the last known good consistent snapshot where none of the timestamp, snapshot, or claimed keys were known to have been compromised. Added, updated, or deleted targets in the compromised consistent snapshot that do not match the last known good consistent snapshot MAY be restored to their previous versions. After ensuring the integrity of all claimed project targets, the claimed metadata MUST be regenerated.
  3. The claimed metadata MUST have their version numbers incremented, expiry times suitably extended, and signatures renewed.

Following these steps would preemptively protect all of these roles even though only one of them may have been compromised.

If a threshold number of root keys have been compromised, then PyPI MUST take the steps taken when the targets role has been compromised. All of the root keys must also be replaced.

It is also RECOMMENDED that PyPI sufficiently document compromises with security bulletins. These security bulletins will be most informative when users of pip-with-TUF are unable to install or update a project because the keys for the timestamp, snapshot, or root roles are no longer valid. Users could then visit the PyPI web site to consult security bulletins that would help to explain why users are no longer able to install or update, and then take action accordingly. When a threshold number of root keys have not been revoked due to a compromise, then new root metadata may be safely updated because a threshold number of existing root keys will be used to sign for the integrity of the new root metadata. TUF clients will be able to verify the integrity of the new root metadata with a threshold number of previously known root keys. This will be the common case. In the worst case, where a threshold number of root keys have been revoked due to a compromise, an end-user may choose to update new root metadata with out-of-band [27] mechanisms.

Appendix A: PyPI Build Farm and End-to-End Signing

PyPI administrators intend to support a central build farm. The PyPI build farm will auto-generate a Wheel [28], for each distribution that is uploaded by developers, on PyPI infrastructure and on supported platforms. Package managers will likely install projects by downloading these PyPI Wheels (which can be installed much faster than source distributions) rather than the source distributions signed by developers. The implications of having a central build farm with end-to-end signing SHOULD be investigated before the maximum security model is implemented.

An issue with a central build farm and end-to-end signing is that developers are unlikely to sign Wheel distributions once they have been generated on PyPI infrastructure. However, generating wheels from source distributions that are signed by developers can still be beneficial, provided that building Wheels is a deterministic process. If deterministic builds are infeasible, developers may delegate trust of these wheels to a PyPI role that signs for wheels with an online key.

References

[1](1, 2, 3) https://www.python.org/dev/peps/pep-0458/
[2](1, 2, 3) https://isis.poly.edu/~jcappos/papers/samuel_tuf_ccs_2010.pdf
[3](1, 2) https://github.com/theupdateframework/tuf/blob/develop/docs/tuf-spec.txt
[4](1, 2, 3) http://www.python.org/dev/peps/pep-0426/
[5]https://github.com/theupdateframework/pip/wiki/Attacks-on-software-repositories
[6]https://mail.python.org/pipermail/distutils-sig/2013-September/022773.html
[7]https://isis.poly.edu/~jcappos/papers/cappos_mirror_ccs_08.pdf
[8]https://mail.python.org/pipermail/distutils-sig/2013-September/022755.html
[9]https://pypi.python.org/security
[10]https://mail.python.org/pipermail/distutils-sig/2013-August/022154.html
[11]https://en.wikipedia.org/wiki/RSA_%28algorithm%29
[12]http://ed25519.cr.yp.to/
[13]http://www.ietf.org/rfc/rfc2119.txt
[14]http://ed25519.cr.yp.to/
[15]https://github.com/pyca/ed25519
[16]https://github.com/kaepora/miniLock#-minilock
[17]https://github.com/pypa/twine
[18]https://docs.python.org/2/distutils/index.html#distutils-index
[19]https://en.wikipedia.org/wiki/Transparency_%28human%E2%80%93computer_interaction%29
[20]https://pypi.python.org/pypi?:action=register_form
[21]https://github.com/theupdateframework/tuf/blob/develop/tuf/README.md
[22]https://github.com/theupdateframework/tuf/blob/develop/tuf/README-developer-tools.md
[23]https://en.wikipedia.org/wiki/Transaction_log
[24]http://docs.python.org/2/library/hashlib.html#hashlib.hash.hexdigest
[25]https://en.wikipedia.org/wiki/SHA-2
[26]https://mail.python.org/pipermail/distutils-sig/2013-September/022755.html
[27]https://en.wikipedia.org/wiki/Out-of-band#Authentication
[28]http://wheel.readthedocs.org/en/latest/

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grants No. CNS-1345049 and CNS-0959138. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

We thank Nick Coghlan, Daniel Holth and the distutils-sig community in general for helping us to think about how to usably and efficiently integrate TUF with PyPI.

Roger Dingledine, Sebastian Hahn, Nick Mathewson, Martin Peck and Justin Samuel helped us to design TUF from its predecessor Thandy of the Tor project.

We appreciate the efforts of Konstantin Andrianov, Geremy Condra, Zane Fisher, Justin Samuel, Tian Tian, Santiago Torres, John Ward, and Yuyu Zheng to develop TUF.