Re: CRL for the node Identities

Mike Hearn
 

Unfortunately, the problem is harder than revocation simply being disruptive. Blockchain and revocation/expiry are a new combination, and there are fundamental design questions about what they mean. Perhaps you could invite your corporate security team to take part in this thread? They may have useful ideas to contribute, and I know other users are starting to look at this topic too. It's possible we can find a way forward that works.

The current status is that revocation is supported for TLS certs only. However, the tradeoffs we face for TLS certs are the same as those browser makers face. Browsers universally ignore the PKIX revocation infrastructure because they feel it doesn't work well enough; in other words, on the web revocation is a placebo. It looks like it makes you healthier but in fact does nothing. Google TLS engineer Adam Langley does a great job of explaining why browsers don't support revocation in this essay, again in this one, and in this post he recommends everyone disable revocation checking everywhere. For these reasons we support a "soft fail" mode, and the next revocation-related feature we plan to add is the ability to switch off checking entirely, for node operators who review the arguments and conclude the browser makers are correct. After that we have more work to do on making CRL checking more robust and less risky, but again, it's all for TLS certs only.
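To make the difference between the checking modes concrete, here is a minimal sketch of the decision a node makes when it checks a peer's TLS certificate. The mode names and the `accept_tls_cert` function are hypothetical, not Corda's actual configuration or API, and the CRL is modelled as a plain set of revoked serial numbers (or `None` when it couldn't be downloaded).

```python
from enum import Enum
from typing import Optional, Set


class RevocationMode(Enum):
    # Hypothetical mode names, for illustration only.
    HARD_FAIL = "hard_fail"  # reject if the CRL can't be fetched
    SOFT_FAIL = "soft_fail"  # accept if the CRL can't be fetched
    OFF = "off"              # never consult the CRL


def accept_tls_cert(serial: int, crl: Optional[Set[int]], mode: RevocationMode) -> bool:
    """Return True if a peer's TLS certificate should be accepted.

    `crl` is the set of revoked serial numbers, or None if it couldn't be downloaded.
    """
    if mode is RevocationMode.OFF:
        return True
    if crl is None:
        # Revocation status unknown: only soft fail lets the connection through.
        return mode is RevocationMode.SOFT_FAIL
    return serial not in crl


# A blocked CRL download makes soft fail behave exactly like OFF.
print(accept_tls_cert(42, None, RevocationMode.SOFT_FAIL))  # True
print(accept_tls_cert(42, None, RevocationMode.HARD_FAIL))  # False
```

In soft fail mode, a network attacker who can block the CRL download gets the same result as if checking were switched off entirely, which is the heart of the argument in the essays linked above.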

Pure revocation of non-TLS certificates (keys) raises fundamental product design questions that appear to have no obvious answers, so we have no plans to add support at this time. Expiry is largely similar so I won't deal with it separately. 

On the other hand, what might work is a Corda-specific key change mechanism, designed very, very carefully (i.e. no guarantees can be made about when, or even if, it'd be released). This would not be based on PKIX/X.509 revocation, so I don't know what your security team would think about it; that's why it'd help to get them involved.

What do other platforms do? Public blockchains support neither p2p encryption (TLS) nor revocation or expiry of on-ledger keys. In fact they don't use certificates anywhere. Hyperledger Fabric uses PKIX like we do, but its docs contain this paragraph:

"It is important to note that MSP identities never expire; they can only be revoked by adding them to the appropriate CRLs. Additionally, there is currently no support for enforcing revocation of TLS certificates."

Fabric defines revocation of an identity cert as "not being a member of the organisation anymore", but their docs provide no details about any of the issues we're about to discuss. It's probable they haven't been considered, and thus I'd question whether this feature will ever be useful in practice.

There are 4 questions raised by revocation:

  1. In what situations is a key revoked?
  2. How do nodes learn that a key has been revoked?
  3. What do they do in response?
  4. What happens after that?
In the following discussion "public key" and "certificate" can be treated as synonymous.

For (1) it may seem obvious that you revoke a public key when the accompanying private key has been compromised and is therefore no longer private, but on the web most revocations are administrative. "Revoked" is a binary status, but it gets used for very different use cases. Let's ignore that here and pretend all revocations will be due to key compromise, to keep the rest of the analysis tractable.

(2) turns out to be a much harder problem than it looks, at least in large networks. None of the PKIX revocation protocols were analysed deeply enough when they were designed, so none of them really work. Adam Langley's essays linked above go into detail, so I won't repeat them here.

The blockchain-specific issues start with (3) and (4). Let's look at (3). Revocation is just a warning from the certificate authority that a key shouldn't be trusted anymore, but it's up to individual programs to decide how to handle that. We could:

  (a) Refuse to accept new transactions signed by that key.
  (b) Retroactively refuse to consume states that were created in a tx signed by that key.
  (c) Hide any states that involve that key.
  (d) Initiate some more complex anti-crime procedures, like a decentralised traitor trace envisioned in section 5.5 of the tech white paper.

(a) is unworkable because it would permanently freeze every part of the ledger involving the revoked key. This doesn't correspond to any normal or useful business operation. Now you may be thinking: aha, but what about having a way to replace keys? Well, for now we're just looking at revocation; key replacement is tackled separately below.
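A minimal sketch of why (a) freezes states, assuming the simplest possible ownership rule: a state can only be spent in a transaction signed by its owner's key. The `State` type and `can_spend` function are hypothetical illustrations, not Corda APIs; real Corda states can have multiple participants and arbitrary contract logic.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class State:
    # Hypothetical single-owner state.
    owner: str
    amount: int


def can_spend(state: State, signers: FrozenSet[str], revoked: Set[str]) -> bool:
    # Ownership rule: the owner must sign any transaction consuming the state.
    if state.owner not in signers:
        return False
    # Option (a): reject any transaction carrying a revoked key's signature.
    return not (signers & revoked)


cash = State(owner="alice_key", amount=100)
print(can_spend(cash, frozenset({"alice_key"}), revoked=set()))          # True
print(can_spend(cash, frozenset({"alice_key"}), revoked={"alice_key"}))  # False
```

Once the owner's key is revoked, no set of signers can satisfy both conditions at once, so the state can never move again.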

(b) is unworkable because it'd lead to software showing users that they own assets or are involved in deals that can be changed in some way, only for them to get an error at the last possible moment when the node does the revocation check. If the user isn't in direct control of which states are being picked, e.g. during automatic coin/token selection to craft a spend, then the node will deadlock, because it'll keep trying to build transactions involving revoked keys. Fixing this would create an explosion of complexity in both the node and people's apps as we plumb "visible but unusable" as a concept everywhere. That in turn would imply huge delays to other features we're planning to add.
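The deadlock can be sketched in a few lines. Everything here is hypothetical: a vault is a list of `(state_id, created_by, amount)` tuples, where `created_by` is the key that signed the creating transaction, and `select_coins` is a naive, deterministic, largest-first selector that knows nothing about revocation. The retry loop is bounded only so the sketch terminates; without the bound it would spin forever.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical vault entry: (state_id, created_by, amount), where created_by is
# the key that signed the transaction that created the state.
Coin = Tuple[str, str, int]


def select_coins(vault: List[Coin], target: int) -> List[Coin]:
    """Naive deterministic selection: largest coins first, no revocation awareness."""
    picked: List[Coin] = []
    total = 0
    for coin in sorted(vault, key=lambda c: c[2], reverse=True):
        if total >= target:
            break
        picked.append(coin)
        total += coin[2]
    return picked if total >= target else []


def build_spend(vault: List[Coin], target: int, revoked: Set[str],
                max_attempts: int = 5) -> Optional[List[Coin]]:
    for _ in range(max_attempts):
        picked = select_coins(vault, target)
        if not picked:
            return None  # genuinely insufficient funds
        if not any(created_by in revoked for _, created_by, _ in picked):
            return picked
        # Option (b): the revocation check only fails here, at the last moment.
        # The selector is deterministic, so the retry picks the same coins again.
    return None  # without max_attempts, this loop would never end


vault = [("s1", "bank_key", 80), ("s2", "other_key", 50), ("s3", "other_key", 40)]
print(build_spend(vault, 60, revoked={"bank_key"}))  # None, despite 90 usable
usable = [c for c in vault if c[1] not in {"bank_key"}]
print([c[0] for c in select_coins(usable, 60)])      # ['s2', 's3']
```

The spend only succeeds once revoked states are filtered out before selection, i.e. once the selector (and every other query) is taught about "visible but unusable" states.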

So it'd be highly disruptive to the project and still not do anything useful in a business sense. This leads to Roman's scenario, where you end up with dead states you can never get rid of because someone way back in the chain of custody announced they'd screwed up. Even when a government seizes funds, ledgers are updated to reflect that the government is the new owner, but this approach wouldn't support that.

Also consider what happens if a regulator or oracle key is revoked. All trading on a business network would stop, with no way back except "unrevocation". But you can't "uncompromise" a key, so that'd imply revocation was being used in ways we're not designing for.

(c) solves the node deadlock and app complexity issues, but it would mean your Corda ledger gets out of sync with what you think you own in case of a revocation: huge deals could vanish without you doing anything. That's not acceptable for a system that aims to be a system of record, and again, it doesn't correspond to any actual business process. Counterparties, or counterparties of counterparties, making mistakes doesn't cause stuff to automatically and permanently vanish from your own ledgers, at least not in any existing IT system I'm aware of. Maybe I'm wrong here!
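A minimal sketch of (c), assuming (hypothetically) that each vault entry records the set of keys it "involves", whatever definition of involvement a real implementation would pick. `VaultState` and `visible_balance` are illustrative names only.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class VaultState:
    amount: int
    involved_keys: FrozenSet[str]  # hypothetical: every key this state involves


def visible_balance(vault: List[VaultState], revoked: Set[str]) -> int:
    # Option (c): any state that involves a revoked key is hidden from queries.
    return sum(s.amount for s in vault if not (s.involved_keys & revoked))


vault = [
    VaultState(1_000_000, frozenset({"me", "cpty", "cpty_of_cpty"})),
    VaultState(500, frozenset({"me", "bank"})),
]
print(visible_balance(vault, revoked=set()))             # 1000500
print(visible_balance(vault, revoked={"cpty_of_cpty"}))  # 500
```

The large state drops out of the balance because of a revocation two hops away, with no action by the vault's owner.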

(d) gets us closer to some kind of real-world business process: if your key was compromised and it turns up in use, there should probably be a criminal investigation of some sort. The business may discover their key was stolen only long after it's been abused, so you need some notion of retroactive revocation that alerts node operators, to help in tracing the stolen funds. But it's a huge amount of work and won't happen for years, if ever.
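As a rough illustration of what retroactive alerting might compute, here is a generic taint-propagation sketch. Given a compromised key and the earliest time it may have been stolen, it flags the outputs of every transaction that key signed since then, plus everything later derived from them. This is not the design in section 5.5 of the white paper, and it assumes a single global view of all transactions, which no Corda node has; doing this in a decentralised, privacy-preserving way is where the huge amount of work lies. The `Tx` type and `tainted_states` function are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Tx:
    # Hypothetical, heavily simplified transaction record.
    tx_id: str
    signers: FrozenSet[str]
    time: int                 # e.g. seconds since epoch
    inputs: Tuple[str, ...]   # ids of states consumed
    outputs: Tuple[str, ...]  # ids of states created


def tainted_states(txs: List[Tx], key: str, compromised_since: int) -> Set[str]:
    """Outputs of txs signed by `key` since the suspected theft, plus all descendants."""
    consumers: Dict[str, List[Tx]] = {}
    for tx in txs:
        for state_id in tx.inputs:
            consumers.setdefault(state_id, []).append(tx)

    queue = deque(state_id
                  for tx in txs
                  if key in tx.signers and tx.time >= compromised_since
                  for state_id in tx.outputs)
    tainted: Set[str] = set()
    while queue:
        state_id = queue.popleft()
        if state_id in tainted:
            continue
        tainted.add(state_id)
        for tx in consumers.get(state_id, []):
            queue.extend(tx.outputs)  # anything built from tainted inputs is tainted
    return tainted


txs = [
    Tx("t1", frozenset({"alice"}), 10, (), ("a1",)),
    Tx("t2", frozenset({"alice"}), 20, ("a1",), ("b1",)),  # thief spends a1
    Tx("t3", frozenset({"bob"}), 30, ("b1",), ("c1", "c2")),
    Tx("t4", frozenset({"carol"}), 40, (), ("d1",)),
]
print(sorted(tainted_states(txs, "alice", compromised_since=15)))  # ['b1', 'c1', 'c2']
```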

Now let's look at (4) - "what happens after that?".

You may object that I'm making all this too complicated, and that all we need is a way to not only mark a key as revoked but also signal a replacement key of some kind so trading can continue. PKIX has no support for this, so we'd need a Corda-specific protocol, but OK, let's study it.

There are two approaches I can see to add such a feature:

  (i) The old key signs a data structure containing the new key and distributes it.
  (ii) The zone operator (identity root CA) starts publishing, via the network map, a notification that two certs A and B sharing the same X.500 name should be treated as B replacing A.
Approach (i) is nicely decentralised, but as an attacker, the first thing I'll do when compromising your key is announce a change to my own new key. You can then revoke your old key at your leisure, but it'll accomplish nothing because everyone has moved onto "your" (my) new key. And you can't revoke the new key, because you don't own it. You could sign your own key change message, but it'd come after the attack, not before, and both messages carry a valid signature from the old key, so we have no way to figure out which one was real.
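A sketch of that race under a hypothetical first-seen-wins policy. Signatures are modelled symbolically (a message records which key signed it, and only a holder of that key can create it) rather than with real cryptography; `KeyChange`, `sign_change` and `apply_changes` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class KeyChange:
    old_key: str
    new_key: str
    signed_by: str  # symbolic signature: real code would carry a cryptographic one


def sign_change(held_keys: Set[str], old_key: str, new_key: str) -> KeyChange:
    # Whoever holds the old private key can sign; the verifier can't tell who that is.
    if old_key not in held_keys:
        raise PermissionError("signer does not hold " + old_key)
    return KeyChange(old_key, new_key, signed_by=old_key)


def apply_changes(announcements: List[KeyChange]) -> Dict[str, str]:
    """Approach (i) with a first-seen-wins rule: adopt the first validly signed change."""
    current: Dict[str, str] = {}
    for msg in announcements:
        if msg.signed_by == msg.old_key and msg.old_key not in current:
            current[msg.old_key] = msg.new_key
    return current


thief_keys = {"acme_old"}  # stolen copy of the private key
owner_keys = {"acme_old"}
forged = sign_change(thief_keys, "acme_old", "thief_new")
genuine = sign_change(owner_keys, "acme_old", "acme_new")
print(forged.signed_by == genuine.signed_by)  # True: both equally "valid"
print(apply_changes([forged, genuine]))       # {'acme_old': 'thief_new'}
```

Switching to last-seen-wins doesn't help either: the attacker still holds the old key and can simply announce again.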

Whether approach (ii) works depends heavily on the exact details of the implementation. If the operator can simply announce a new certificate for an identity, that would grant the zone operator arbitrary write access (i.e. root access) to the ledger: it could replace any party's key with one it controls and then sign transactions as that party. I can go into more detail on this if needed. If there's an organisation that can read and write to the ledger at will, then you may as well just pay them to host a big centralised SQL database and simplify everyone's lives. Blockchains don't support any form of identity-based revocation or key change for this reason: it'd make them centralised and therefore pointless.

But maybe we can fix approach (ii) by demanding that any notification published to the network is signed by both the zone operator and key A together? This would prevent the operator from taking unilateral control of the ledger, and avoid the problematic design of PKIX revocation.
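A sketch of the acceptance check this would imply, again with symbolically modelled signatures. `ReplacementNotice`, `accept_notice`, `apply_notice` and the example X.500 name are hypothetical. A notice is accepted only if it names the key currently on record for that X.500 name and carries signatures from both the zone operator and that key.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ReplacementNotice:
    x500_name: str
    old_key: str             # key A
    new_key: str             # key B
    signers: FrozenSet[str]  # symbolic signatures: ids of the keys that signed


def accept_notice(notice: ReplacementNotice, identities: Dict[str, str],
                  operator_key: str) -> bool:
    """Accept only if the zone operator AND the key on record (A) both signed."""
    if identities.get(notice.x500_name) != notice.old_key:
        return False  # A must be the key currently registered for this name
    return {operator_key, notice.old_key} <= notice.signers


def apply_notice(notice: ReplacementNotice, identities: Dict[str, str],
                 operator_key: str) -> bool:
    if not accept_notice(notice, identities, operator_key):
        return False
    identities[notice.x500_name] = notice.new_key  # B replaces A
    return True


name = "O=Acme,L=London,C=GB"
ids = {name: "key_a"}
op_only = ReplacementNotice(name, "key_a", "key_b", frozenset({"zone_op"}))
a_only = ReplacementNotice(name, "key_a", "key_b", frozenset({"key_a"}))
both = ReplacementNotice(name, "key_a", "key_b", frozenset({"zone_op", "key_a"}))
print(apply_notice(op_only, ids, "zone_op"), apply_notice(a_only, ids, "zone_op"))  # False False
print(apply_notice(both, ids, "zone_op"), ids[name])  # True key_b
```

Neither the operator alone nor a thief holding only key A can rebind the identity. The sketch says nothing about making the switch atomic with the notaries, though, or about an attacker who can also get past the operator's ID checks.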

This idea is way under-analysed. Because the replacement needs to be atomic and immediate, it would require new features to be added to the notaries, and that in turn opens the question of what happens if you try to change a notary key (which is compound). It may prove useless if any plausible attack that could compromise a private key would also make it possible to beat the zone operator's ID verification. If security teams insist on support for PKIX as well, then it gets us nowhere. There are likely many other problems and edge cases currently unconsidered.

For these reasons revocation and expiry will most likely never work on any blockchain system, and key changes may or may not arrive any time soon. Protecting your keys well using HSMs is the best recommendation we can give for now.

thanks,
-mike
