Certificate revocation

In public key cryptography, a certificate may be revoked before it expires, which signals that it is no longer valid. Without revocation, an attacker could exploit a compromised or misissued certificate until expiry. Hence, revocation is an important part of a public key infrastructure. Revocation is performed by the issuing certificate authority, which produces a cryptographically authenticated statement of revocation.

For distributing revocation information to clients, the timeliness of the discovery of revocation (and hence the window for an attacker to exploit a compromised certificate) trades off against resource usage in querying revocation statuses and privacy concerns. If revocation information is unavailable (either due to an accident or an attack), clients must decide whether to fail-hard and treat a certificate as if it is revoked (and so degrade availability) or to fail-soft and treat it as unrevoked (and allow attackers to sidestep revocation).

Due to the cost of revocation checks and the availability impact from potentially-unreliable remote services, Web browsers limit the revocation checks they will perform, and will fail soft where they do. Certificate revocation lists are too bandwidth-costly for routine use, and the Online Certificate Status Protocol presents connection latency and privacy issues. Other schemes have been proposed but have not yet been successfully deployed to enable fail-hard checking.

Glossary of acronyms

ACME: Automatic Certificate Management Environment
CA: certificate authority
CA/B: CA/Browser Forum
CRL: certificate revocation list
CRV: certificate revocation vector
OCSP: Online Certificate Status Protocol
PKI: public key infrastructure
TLS: Transport Layer Security

History

The Heartbleed vulnerability, which was disclosed in 2014, triggered a mass revocation of certificates, as their private keys may have been leaked. GlobalSign revoked over 50% of their issued certificates. StartCom was criticised for issuing free certificates but then charging for revocation.[1]

A 2015 study found an overall revocation rate of 8% for certificates used on the Web,[2] though this may have been elevated due to Heartbleed.[3]

Despite Web security being a priority for most browsers, due to the latency and bandwidth requirements associated with OCSP and CRLs, browsers place limits on checking certificate status.[4] In 2015, Google Chrome only actively checked Extended Validation certificates, no mobile browser performed any validity checks, and no browser fully checked all certificates.[5] Chrome and Firefox perform push-based checks for a small set of domains deemed critical.[6] Browsers show little agreement in corner cases around certificate validity, potentially confusing even experienced users.[7]

The number of certificates in the Web PKI increased massively during the last portion of the 2010s, from 30 million in January 2017 to 434 million in January 2020. A significant factor in this growth is Let's Encrypt providing free domain validated certificates. The size of the potentially-revocable set of certificates places requirements on the scalability of the revocation mechanism.[4]

Chuat et al. (2020) call revocation "notoriously challenging".[8] In 2022, RFC 9325 characterised certificate revocation as an important problem with "no complete and efficient solution". OCSP and OCSP stapling are recommended as the "foundation for a possible solution".[9]

Necessity

Certificate revocation is "an important tool" for dealing with attacks and accidental compromises. RFC 9325 places a normative requirement on TLS implementations to have some means of distrusting certificates.[9] Without revocation, an attacker can use a compromised certificate to impersonate its owner until expiry.[4]

Revocation may not be necessary for certificates of sufficiently short lifetime, roughly on the order of hours to days, which is comparable to the lifetime of an OCSP response. The associated frequent certificate issuance typically requires automation (e.g., the ACME protocol) and may stress other infrastructural elements (e.g., transparency logs).[10] Short-lived certificates also present complications with TLS connection resumption, though not necessarily insuperably.[11]

Procedure

Revocation may be initiated by a certificate holder (who, for example, may know that a private key has been compromised), who informs the CA. The CA then produces and distributes cryptographically authenticated attestations that the certificate has been revoked.[12] The CA/B requirements also allow a CA to autonomously revoke certificates if the CA is aware of a possibility of compromise.[13] Anyone may submit evidence of such a compromise to the CA.[14]

Revocation statuses are not typically preserved and archived for long beyond the certificate's expiry, making research into and auditing of revocation behaviours difficult.[15] One proposal to solve this involves submitting 'postcertificates' to Certificate Transparency logs for each revocation, which would also allow revocation to be performed without action by a CA;[16] an alternative proposal, also based around Certificate Transparency, involves CRVs being sent to the CT logs by CAs.[17]

Informing clients

Considerations

Failure model

If revocation status is unavailable (which may be benign or due to an attack), a client is faced with a dilemma when evaluating a certificate: it may fail-soft and assume that the certificate is still valid; or it may fail-hard and assume that the certificate has been revoked. This is a trade-off between security and availability: failing-soft allows downgrade attacks, while failing-hard allows denial of service (from attacks) or causes unavailability.[18]

An attacker with the ability to present a compromised certificate likely also has the ability to prevent the client performing an online revocation status check; in this case, failing-soft effectively provides no protection at all. Browsers have chosen this arm of the dilemma and preferred availability over security.[19]

Failing-hard can introduce new denial-of-service attack vectors. For example, if clients expect OCSP stapling and fail-hard otherwise, a denial of service against OCSP responders is amplified to a denial of service against all services wishing to use those OCSP responses.[10]
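The dilemma can be made concrete with a minimal sketch. The names `Status`, `Policy`, and `accept_certificate` are illustrative assumptions and do not correspond to any browser's actual implementation; an unavailable status is modelled as `None`.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"


class Policy(Enum):
    FAIL_SOFT = "fail-soft"
    FAIL_HARD = "fail-hard"


def accept_certificate(status: Optional[Status], policy: Policy) -> bool:
    """Return True if the client should accept the certificate.

    `status` is None when revocation information could not be obtained,
    whether through a benign outage or an attacker blocking the check.
    """
    if status is Status.REVOKED:
        return False
    if status is Status.GOOD:
        return True
    # Status unavailable: only the policy decides the outcome.
    return policy is Policy.FAIL_SOFT
```

Under the fail-soft policy, an attacker who can suppress the status check turns every lookup into the `None` case and the certificate is accepted; under fail-hard, the same suppression makes the service unreachable instead.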

Resource usage

There are two scenarios for the evaluation of resource usage: normal conditions, and mass revocation events. Revocation schemes should be efficient under normal conditions, and functional during mass revocation events.[4]

Retrieving revocation information incurs bandwidth and latency costs for clients.[20]

During the 2014 Heartbleed mass revocation event, where revocation rates rose from 1% to 11%, Cloudflare estimated that the bandwidth that GlobalSign used to distribute CRLs could have cost 400,000 USD (equivalent to 510,000 USD in 2023[21]).[22]

Timeliness

If revocation status is not freshly retrieved for every check (e.g., due to caching or periodic retrievals), there is a delay between a certificate being revoked and all clients being guaranteed to be aware of the revocation. This presents a trade-off between latency, efficiency, and security: longer cache times or less-frequent updates use fewer resources and reduce latency but mean that a compromised certificate can be abused for longer.[23]
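The effect of caching on timeliness can be illustrated with a small sketch. The helper `make_cached_checker`, the `fetch` callback, and the one-hour cache lifetime are hypothetical and chosen only for illustration; times are plain numbers of seconds.

```python
from typing import Callable, Optional, Tuple


def make_cached_checker(
    fetch: Callable[[float], bool], max_age: float
) -> Callable[[float], bool]:
    """Wrap a revocation lookup so results are reused for `max_age` seconds."""
    cached: Optional[Tuple[float, bool]] = None  # (time fetched, revoked?)

    def is_revoked(now: float) -> bool:
        nonlocal cached
        if cached is None or now - cached[0] >= max_age:
            cached = (now, fetch(now))  # cache miss or stale entry: query again
        return cached[1]

    return is_revoked


# Hypothetical scenario: the CA revokes the certificate 10 seconds after
# the client's first lookup, and the client caches results for one hour.
REVOKED_AT = 10.0
check = make_cached_checker(lambda now: now >= REVOKED_AT, max_age=3600.0)

print(check(0.0))     # False: not yet revoked
print(check(1800.0))  # False: revoked, but the cached answer is still used
print(check(3600.0))  # True: cache expired, fresh lookup sees the revocation
```

In this scenario the revoked certificate remains usable against this client for almost the full cache lifetime, which is the window that shorter cache times would reduce at the cost of more lookups.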

Privacy

Clients that perform pull-based checks can leak the user's browsing information to third parties, namely the distributor of revocation information.[24]

Auditability

It is desirable to avoid creating a trusted third party in a PKI. If an actor within a scheme is auditable, then clients (or agents acting on behalf of clients) can provably verify that the actor is behaving correctly.[25]

Deployability

A revocation status distribution scheme that places heavy burdens on CAs may not succeed, especially if the CA is unable to derive countervailing benefits from implementation. Reducing the number of parties that must make changes to adopt a scheme also eases deployability: potentially involved are CAs, clients, server administrators, and server software producers. Forwards compatibility is a double-edged sword, in that old clients and servers will not be disrupted by a new scheme, but their users may not realise that they are missing the benefits of revocation.[26]

Architectures

There are three broad architectures for how clients access revocation status: pull-based, where clients retrieve revocation status at validation time; push-based, where clients retrieve revocation status ahead of validation and cache it; and network-assisted, where revocation checking is tightly integrated with the TLS protocol and separate checks may not be needed.[27]

Pull-based checking typically has latency and availability issues. Clients performing pull-based checking will typically cache responses for a short period. A purely pull-based check combined with failing-soft does not add security.[28]

Push-based checking is less bandwidth efficient than pull-based checking, but gains availability and privacy. Different methods may be used on a certificate-by-certificate basis, allowing for fine-tuning the trade-off: both Google Chrome and Mozilla Firefox perform push-based checks on a small set of critical certificates.[28]

Certificate revocation lists

A certificate revocation list (CRL) enumerates revoked certificates. Each CRL is cryptographically authenticated by the issuing CA.[29]

CRLs have scalability issues, and rely on the client having enough network access to download them prior to checking a certificate's status.[9]

A CRL contains information about all of the certificates revoked by a CA, which means distributors and clients must incur transfer costs for information that is likely irrelevant.[30] A 2015 study found that the median certificate had a CRL with size 51 kB, and the largest CRL was 76 MB.[2]

OCSP

The Online Certificate Status Protocol (OCSP) allows clients to interactively ask a server (an OCSP responder) about a certificate's status, receiving a response that is cryptographically authenticated by the issuing CA.[29] It was designed to address issues with CRLs.[30] A typical OCSP response is less than 1 kB.[31]

OCSP suffers from scalability issues. It relies on the client having network access at the time of checking the certificate's revocation status; further, the OCSP responder must be accessible and produce usable responses, or else the check will fail and the client must choose between failing-soft and failing-hard. Many certificate authorities do not publish useful OCSP responses for intermediate certificates.[9]

As requests to the responder are made in response to users' browsing, OCSP responders can learn about the users' browsing, which is a privacy issue. It also introduces latency to connections, as the responder must be queried before a new connection can be used.[18]

A 2018 study found that 1.7% of requests to responders failed at the network level, and a further c. 2% produced unusable OCSP responses, with significant heterogeneity across CAs and client vantage points.[32]

OCSP stapling

OCSP stapling is a TLS extension that allows the server to send an OCSP response to the client, together with the certificate, at connection initiation.[30]

OCSP stapling can solve the operational challenges of OCSP, namely additional network requests causing latency and privacy degradation.[33] However, it can be susceptible to downgrade attacks by an on-path attacker.[9] RFC 7633 defines an extension that embeds in a certificate a requirement that the certificate be presented together with a valid stapled OCSP response.[34] With this extension, stapling can be effective for the case where a certificate has been compromised after proper issuance; however, if a certificate can be misissued without the extension, stapling may not provide any security.[35]
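The interaction between stapling and the must-staple extension can be sketched as client-side logic. This is a simplified model under stated assumptions: the `Certificate` and `StapledResponse` types are hypothetical, and the `signature_valid` flag stands in for real verification of the CA's signature and the response's freshness.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Certificate:
    serial: int
    must_staple: bool  # True if the certificate carries the RFC 7633 extension


@dataclass
class StapledResponse:
    serial: int
    revoked: bool
    signature_valid: bool  # placeholder for signature and freshness checks


def accept(cert: Certificate, stapled: Optional[StapledResponse]) -> bool:
    """Decide whether to accept `cert` given an optional stapled response."""
    if stapled is not None:
        # A staple that fails verification or covers another certificate
        # is treated as grounds for rejection in this sketch.
        if not stapled.signature_valid or stapled.serial != cert.serial:
            return False
        return not stapled.revoked
    # No staple was presented; an on-path attacker may have stripped it.
    # Only a must-staple certificate lets the client reject here.
    return not cert.must_staple
```

The final branch shows the downgrade problem: without the extension, a missing staple is indistinguishable from a server that does not staple, so the client in this sketch accepts.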

Beyond clients and CAs enabling stapling and the must-staple extension, server administrators must also take action to support stapling by regularly retrieving responses and then providing them to clients during the handshake. In 2018, only Firefox supported must-staple, and neither of the two most-used Web servers (Apache httpd and Nginx) implemented OCSP stapling correctly.[36]

CRLite

CRLite provides revocation statuses by a cascade of Bloom filters. A single filter constructed from a list of revoked certificates produces false positives. With an open domain, this is an insuperable problem for revocation checking. However, by using Certificate Transparency to enumerate all unexpired certificates, an exhaustive list of false positives can be produced. This list is then used to construct a second filter, which is consulted only if a certificate matches the first (and hence has a strictly smaller domain). If the second filter does not match, then the first match was a true positive and the certificate has been revoked. However, a match in the second filter may itself be a false positive, necessitating a third filter, and so on. As the universe is finite and the domain of each filter strictly decreases at each step, this procedure produces a finite filter cascade.[37]
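The cascade construction can be sketched as follows. This is an illustrative toy, not CRLite's actual implementation: the filter sizing (8 bits per item), the number of hash functions (3), and the use of salted SHA-256 are assumptions made for the example, and certificates are represented as opaque byte strings.

```python
import hashlib
from typing import Iterator, List, Set


class BloomFilter:
    """A small Bloom filter; parameters are illustrative assumptions."""

    def __init__(self, items: Set[bytes], level: int,
                 bits_per_item: int = 8, hashes: int = 3) -> None:
        self.level = level
        self.hashes = hashes
        self.size = max(8, bits_per_item * len(items))
        self.bits = bytearray((self.size + 7) // 8)
        for item in items:
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        for i in range(self.hashes):
            # Salting by level and hash index keeps the levels independent.
            salt = f"{self.level}:{i}:".encode()
            digest = hashlib.sha256(salt + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


def build_cascade(revoked: Set[bytes], valid: Set[bytes]) -> List[BloomFilter]:
    """Add levels until a filter has no false positives in the known universe."""
    cascade: List[BloomFilter] = []
    include, exclude = set(revoked), set(valid)
    while include:
        bloom = BloomFilter(include, level=len(cascade))
        cascade.append(bloom)
        false_positives = {x for x in exclude if x in bloom}
        # The next level encodes this level's false positives and must
        # exclude the items this level was built from.
        include, exclude = false_positives, include
    return cascade


def is_revoked(cascade: List[BloomFilter], cert: bytes) -> bool:
    """Query the cascade; exact only for certificates in the known universe."""
    for depth, bloom in enumerate(cascade):
        if cert not in bloom:
            # A miss at depth 0, 2, ... means unrevoked; at 1, 3, ... revoked.
            return depth % 2 == 1
    # Matched every level: the last level has no false positives.
    return len(cascade) % 2 == 1
```

A client holding the cascade answers queries locally, and for every certificate that was enumerated when the cascade was built the answer is exact, which is what the enumeration via Certificate Transparency is needed for.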

CRLite allows clients to fail-hard.[23]

The revocation status of all certificates in the Web PKI in January 2017 was estimated to be 10 MB in size when using the Bloom filter cascade, with updates of 580 kB per day. By March 2018, this had grown to 18 MB.[28] In a simulation with 100 million certificates, a 1% daily expiration rate, and a 2% revocation rate, CRLite required an initial provision of 3.1 MB, and then 408 kB per day for updates.[38]

As all clients retrieve the same information, CRLite has no privacy concerns.[23]

CRLite has not yet been widely deployed.[9] It is, however, deployable, only requiring an aggregator to retrieve CRLs from CAs and then provide the filter cascade and updates to it, and for clients to use it; no action from CAs is needed, and nor is any needed from certificate holders.[23] The aggregator does not need to be a trusted third party: the filter cascade can be audited to prove that it accurately reflects the input CRLs.[39] Private CAs also require special handling within CRLite.[40]

Let's Revoke

Let's Revoke uses bit vectors of revocation statuses (called certificate revocation vectors, or CRVs) to allow large numbers of revocation statuses to be efficiently retrieved by clients.[4] CAs generate CRVs for their own certificates, with one CRV per expiration date. CRV maintenance for CAs is linear in the number of certificates issued. CAs must add a new field, a revocation number, to each issued certificate, allowing certificates from a single CA to be identified by a tuple of certificate expiration date and revocation number; this tuple allows a client to efficiently locate a bit giving the identified certificate's status within the CRV. CRVs may be compressed; they are expected to compress very well, as most bits will be unset most of the time. As each CRV is associated with a fixed expiration date, old CRVs can be efficiently discarded. Updates to CRVs are batched, with the update timestamped and signed for distribution to clients.[41] The updates may be in one of three forms, with the optimal choice depending on the revocation rate, allowing for both efficient normal operation and mass revocation events.[42]
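The lookup by expiration date and revocation number can be sketched with a plain bit vector. The `CRVStore` class and its method names are hypothetical, the bit layout and the use of zlib are assumptions made for illustration, and the signing, timestamping, and update formats of the actual scheme are omitted.

```python
import zlib
from datetime import date
from typing import Dict


class CRVStore:
    """Toy per-CA store holding one bit vector per expiration date."""

    def __init__(self) -> None:
        self.vectors: Dict[date, bytearray] = {}

    def revoke(self, expires: date, revocation_number: int) -> None:
        vec = self.vectors.setdefault(expires, bytearray())
        byte_index = revocation_number // 8
        if byte_index >= len(vec):
            # Grow the vector with zero bytes so the bit exists.
            vec.extend(bytes(byte_index + 1 - len(vec)))
        vec[byte_index] |= 1 << (revocation_number % 8)

    def is_revoked(self, expires: date, revocation_number: int) -> bool:
        vec = self.vectors.get(expires)
        byte_index = revocation_number // 8
        if vec is None or byte_index >= len(vec):
            return False  # bits beyond the vector are implicitly unset
        return bool(vec[byte_index] & (1 << (revocation_number % 8)))

    def discard_expired(self, today: date) -> None:
        # A whole vector can be dropped once its expiration date has passed.
        for expires in [d for d in self.vectors if d < today]:
            del self.vectors[expires]

    def compressed(self, expires: date) -> bytes:
        # Mostly-zero vectors compress well with a general-purpose codec.
        return zlib.compress(bytes(self.vectors.get(expires, b"")))
```

Because the status of a certificate is a single bit at a position derived from its revocation number, a lookup requires no search, and dropping an expired date removes all of its certificates at once.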

CRVs are expected to be small enough to enable push-based checking, but more constrained clients may still perform pull-based checks, only accessing select CRVs, or deferring retrieval of CRVs until certificate validation.[43] A client using Let's Revoke with push-based checking is able to fail-hard for any certificate with a revocation number.[23] The privacy impact and availability of Let's Revoke depend on the architecture: if all checks are push-based, then there is no privacy leakage and a reduced vulnerability to denial-of-service or downtime; if pull-based checks are used, however, some information about the user's activities is leaked (in the form of which CRVs are accessed), and the CRVs may be inaccessible at validation time.[23]

In a simulation with 100 million certificates, 1% daily expiration rate, and a 2% revocation rate, Let's Revoke required an initial provision of 2.2 MB, and then 114 kB per day for updates.[44]
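As a rough worked comparison, the simulated figures quoted in this article for CRLite (3.1 MB initially, then 408 kB per day) and for Let's Revoke (2.2 MB initially, then 114 kB per day) can be totalled over a period. The 30-day horizon and the conversion of 1 MB to 1000 kB are assumptions made for this example, and `cumulative_kb` is a hypothetical helper.

```python
def cumulative_kb(initial_kb: int, daily_update_kb: int, days: int) -> int:
    """Total download in kB: initial provision plus one update per day."""
    return initial_kb + daily_update_kb * days


DAYS = 30  # assumed horizon for illustration

crlite_total = cumulative_kb(3100, 408, DAYS)
lets_revoke_total = cumulative_kb(2200, 114, DAYS)

print(crlite_total)       # 15340
print(lets_revoke_total)  # 5620
```

Under these assumptions a client would download about 15.3 MB with CRLite and about 5.6 MB with Let's Revoke over 30 days, with the difference dominated by the daily updates rather than the initial provision.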

Let's Revoke has not yet been widely deployed.[9] Besides client implementations, it requires CAs to make operational changes,[45] and does not provide as much information as CRLs or OCSP (only a bit per certificate for validity); CRLs or OCSP may still be used to supplement Let's Revoke and provide that additional information.[46] Deployment can be performed CA by CA, with clients benefitting from fail-hard behaviour incrementally. Due to the efficiency of CRVs over CRLs and OCSP responses, CAs may be incentivised to deploy Let's Revoke.[45]

Other proposals

Private information retrieval techniques can allay the privacy concern with pull-based checks.[47] Rather than clients performing revocation checks, a middlebox could perform them instead, centralising the cost of revocation checking and amortising it across many connections; the clients then need to dedicate no storage to revocation information.[48] Another proposal involved broadcasting revocation information on FM radio.[37]

References

  1. ^ Durumeric et al. 2014, p. 482.
  2. ^ a b Liu et al. 2015, p. 184.
  3. ^ Liu et al. 2015, p. 187.
  4. ^ a b c d e Smith, Dickinson & Seamons 2020, p. 1.
  5. ^ Liu et al. 2015, p. 190.
  6. ^ Bruhner et al. 2022, p. 2.
  7. ^ Wazan et al. 2017, IV. Conclusion.
  8. ^ Chuat et al. 2020, p. 3.
  9. ^ a b c d e f g Sheffer, Saint-Andre & Fossati 2022, 7.5. Certificate Revocation.
  10. ^ a b Smith, Dickinson & Seamons 2020, p. 4.
  11. ^ Chuat et al. 2020, p. 9-10.
  12. ^ Chung et al. 2018, p. 3.
  13. ^ CA/B 2022, p. 54-55.
  14. ^ CA/B 2022, p. 56.
  15. ^ Korzhitskii & Carlsson 2021, p. 1.
  16. ^ Korzhitskii, Nemec & Carlsson 2022, p. 1.
  17. ^ Leibowitz et al. 2021, p. 7-8.
  18. ^ a b Larisch et al. 2017, p. 542.
  19. ^ Smith, Dickinson & Seamons 2020, p. 2.
  20. ^ Liu et al. 2015, p. 183.
  21. ^ 1634–1699: McCusker, J. J. (1997). How Much Is That in Real Money? A Historical Price Index for Use as a Deflator of Money Values in the Economy of the United States: Addenda et Corrigenda (PDF). American Antiquarian Society. 1700–1799: McCusker, J. J. (1992). How Much Is That in Real Money? A Historical Price Index for Use as a Deflator of Money Values in the Economy of the United States (PDF). American Antiquarian Society. 1800–present: Federal Reserve Bank of Minneapolis. "Consumer Price Index (estimate) 1800–". Retrieved 29 February 2024.
  22. ^ Prince 2014.
  23. ^ a b c d e f Smith, Dickinson & Seamons 2020, p. 10.
  24. ^ Chuat et al. 2020, p. 11.
  25. ^ Larisch et al. 2017, p. 540.
  26. ^ Chuat et al. 2020, p. 11-12.
  27. ^ Smith, Dickinson & Seamons 2020, p. 2-3.
  28. ^ a b c Smith, Dickinson & Seamons 2020, p. 3.
  29. ^ a b Larisch et al. 2017, p. 541.
  30. ^ a b c Liu et al. 2015, p. 185.
  31. ^ Liu et al. 2015, p. 189.
  32. ^ Chung et al. 2018, p. 6-7.
  33. ^ Chung et al. 2018, p. 4.
  34. ^ Hallam-Baker 2015, p. 1.
  35. ^ Hallam-Baker 2015, p. 7.
  36. ^ Chung et al. 2018, p. 2.
  37. ^ a b Larisch et al. 2017, p. 543.
  38. ^ Smith, Dickinson & Seamons 2020, p. 8-10.
  39. ^ Larisch et al. 2017, p. 548-9.
  40. ^ Larisch et al. 2017, p. 548.
  41. ^ Smith, Dickinson & Seamons 2020, p. 4-5.
  42. ^ Smith, Dickinson & Seamons 2020, p. 6.
  43. ^ Smith, Dickinson & Seamons 2020, p. 7-8.
  44. ^ Smith, Dickinson & Seamons 2020, p. 8-9.
  45. ^ a b Smith, Dickinson & Seamons 2020, p. 10-11.
  46. ^ Smith, Dickinson & Seamons 2020, p. 8.
  47. ^ Kogan & Corrigan-Gibbs 2021, p. 875-876.
  48. ^ Szalachowski et al. 2016.

Works cited