Notes on LVI attacks

Mike Hearn

Yesterday an embargo was lifted on another Intel CPU bug, Load Value Injection (LVI). I spent many happy hours understanding the papers; here's a hopefully useful summary.


LVI won't affect Corda users or developers, now or in future. The people who have to do extra work are those who create infrastructure for enclaves, like Intel and the Conclave team here at R3. If you're not a compiler engineer then you can ignore LVI and let other people work on it for you.
Even without any work being done, LVI attacks are at this time a theoretical problem. Some writeups have made this clearer than others.
A new SGX attack can be quickly evaluated by asking a single question: can you extract keys from the quoting enclave? The quoting enclave is written by Intel, and the rest of the SGX infrastructure depends on it because it signs the attestation "quotes" that prove to remote parties which code an enclave is running. If you can steal an encryption key out of this particular enclave, you can break the security of the system completely until Intel issues upgrades to re-seal it.
Prior side-channel attacks on SGX have managed to do this. Intel had to do what they call a "TCB recovery" (TCB stands for trusted computing base), which is a kind of software rollout across the SGX ecosystem that fixes security bugs. But LVI attacks are so difficult to pull off that neither Intel nor the academic research team that found the issue was able to make them work against the real quoting enclave, despite the issue being under embargo for nearly a year. They were able to steal keys out of a deliberately weakened version, but the real thing stumped them completely.
Speculative execution attacks were first announced at the start of 2018. Since then there haven't been any reported uses of the technique in the wild, at least not as far as we know. That's probably because such attacks are hard to pull off even in lab conditions, and LVI is at the extreme end of that. So we should keep these announcements in perspective. The attacks seem to be getting harder and more complicated as Intel hardens the ecosystem, which is the trajectory we would hope to see.


Obviously, LVI attacks may not stay theoretical. Therefore, out of "an abundance of caution" (in their words), Intel has developed mitigations and is initiating a TCB recovery. A TCB recovery means everyone who runs enclaves upgrades their operating system, recompiles their enclaves and hosts, and sometimes reconfigures their server BIOS. In this case that last step isn't necessary.
Some time after the recovery begins, Intel's attestation servers start reporting that non-upgraded machines are no longer trusted, so old versions are phased out. Clients of enclaves can always detect whether the machine was upgraded, so they can choose to stop provisioning non-upgraded enclaves with secret data.
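To make the client side concrete, here's a minimal sketch of the kind of policy a client might apply before provisioning secrets. It assumes the client has already verified an attestation and received a status string. The status names are modeled on those in Intel's Attestation Service (IAS) API, but treat the exact set as an assumption and check the IAS specification. The function `should_provision_secrets` and its `enclave_is_hardened` flag are hypothetical.

```python
from enum import Enum


class QuoteStatus(Enum):
    # Subset of status values modeled on Intel's Attestation Service (IAS) API.
    # The exact list is an assumption; check the IAS specification.
    OK = "OK"
    GROUP_OUT_OF_DATE = "GROUP_OUT_OF_DATE"        # platform TCB is out of date
    CONFIGURATION_NEEDED = "CONFIGURATION_NEEDED"  # e.g. platform settings need changing
    SW_HARDENING_NEEDED = "SW_HARDENING_NEEDED"    # platform current, enclave must be hardened
    CONFIGURATION_AND_SW_HARDENING_NEEDED = "CONFIGURATION_AND_SW_HARDENING_NEEDED"
    GROUP_REVOKED = "GROUP_REVOKED"
    SIGNATURE_INVALID = "SIGNATURE_INVALID"


def should_provision_secrets(status: str, enclave_is_hardened: bool = False) -> bool:
    """Hypothetical client policy for deciding whether to send secrets to an enclave.

    Assumes the attestation has already been verified and yielded `status`.
    """
    try:
        s = QuoteStatus(status)
    except ValueError:
        return False  # unknown status: fail closed
    if s is QuoteStatus.OK:
        return True
    if s is QuoteStatus.SW_HARDENING_NEEDED:
        # The attestation service can't tell whether this enclave binary was
        # built with the mitigations; the client has to know that itself,
        # e.g. from the enclave measurement it expects.
        return enclave_is_hardened
    # Out of date, misconfigured, revoked or invalid: don't provision secrets.
    return False
```

Failing closed on unknown statuses seems the safer default here.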
The LVI documents are exceptionally complex even by the standards of microarchitectural side channel papers. I don't recommend reading them unless you REALLY enjoy x86 assembly. The key thing to know about the fixes is that they essentially give up on speculative execution inside enclaves. This is a powerful move that has big downsides but also a silver lining.
The downside is the obvious one: CPUs speculate because electricity moves slowly. Turn off speculative execution and your enclave has to run at the speed of the slowest electrons. How much slower you go depends a lot on what the enclave does: reported slowdowns range from 50% to 900%, meaning code takes anywhere from 1.5x to 10x as long to run. I'd expect Corda enclaves to be at the upper end of that range. So applying the mitigations effectively rolls us back to the power of late 1990s chips.
This is bad, but we don't think it forces a change of roadmap. We lose some performance in Corda by using Java instead of C++, and there are various other places where we've preferred features or productivity over raw performance. Even with a worst-case 900% speed penalty, enclaves are still much faster than the alternative, zero-knowledge proofs (ZKPs).
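One way to see why a big enclave penalty needn't be fatal is that only the enclave part of a workload slows down. This is basically Amdahl's law: if a fraction f of end-to-end time is spent in enclave code and that part becomes s times slower, overall runtime is multiplied by (1 - f) + f * s. Here's a small sketch of that arithmetic. The `overall_slowdown` function and the 5% figure in the example are hypothetical and purely illustrative, not measurements of any Corda deployment.

```python
def overall_slowdown(enclave_fraction: float, enclave_slowdown_pct: float) -> float:
    """Overall runtime multiplier when only the enclave part of the work slows down.

    enclave_fraction: share of end-to-end time spent in enclave code (0 to 1).
    enclave_slowdown_pct: slowdown of that part, e.g. 900 means "900% slower",
    i.e. it takes 10x as long.
    """
    if not 0.0 <= enclave_fraction <= 1.0:
        raise ValueError("enclave_fraction must be between 0 and 1")
    factor = 1.0 + enclave_slowdown_pct / 100.0
    # Time spent outside the enclave (disk I/O, networking, ...) is unchanged.
    return (1.0 - enclave_fraction) + enclave_fraction * factor


# Hypothetical: 5% of time in enclave code, worst-case 900% penalty.
print(f"{overall_slowdown(0.05, 900):.2f}x")  # 1.45x
```

In this made-up case a tenfold slowdown inside the enclave becomes roughly a 45% slowdown overall, and running enclave work in parallel shrinks its share of wall-clock time further.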

Mitigating the mitigations

The slowdown from the mitigations is itself mitigated by a few different factors:
1. The mitigations can be optimised, most obviously by just switching them off. That will be an option for users who need more performance, so those who judge a theoretical attack not worth the practical costs won't have to pay them. Less obviously, they can be optimised by eliminating inserted LFENCE instructions where it can be proven safe to do so, so smart compilers can win back some of that performance.
2. Contract logic isn't a performance bottleneck in any Corda deployment we know of. Making something slower only matters if it's on your critical path and you can't parallelise it, but contract execution parallelises well, and Corda performance tends to be dominated by disk IOPS, not verify methods.
3. Most of the attacks on SGX we've seen in the past few years are based on abusing speculative execution. Intel's latest advice is to harden enclaves by basically ending speculation inside them. The hit is big but it has a silver lining: it's very likely that future speculation attacks won't work on enclaves that have been hardened in this way. So this move may wind down this line of attacks, which would be nice.
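Point 1 mentions removing inserted LFENCE instructions where that's provably safe. As a rough illustration, here's a toy model of the compiler side: the naive pass puts a fence after every load, so nothing later can run speculatively on a possibly-injected value, and the pruned pass keeps only fences on loads whose value can flow into an address or an indirect jump. Everything here (the `Instr` type, the op names, the straight-line taint analysis) is a hypothetical simplification, not how any real compiler does it; real mitigations also have to deal with control flow, values passed through memory, and `ret` instructions, which load their target from the stack.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str                     # hypothetical op names: "load", "add", "jmp_indirect", "lfence"
    dst: Optional[str] = None   # register written, if any
    srcs: Tuple[str, ...] = ()  # registers read


# Instructions that can leak or be hijacked by a value: a load leaks its
# address through the cache, an indirect jump is steered by its target.
TRANSMITTERS = {"load", "jmp_indirect"}


def harden_all_loads(prog: List[Instr]) -> List[Instr]:
    """Naive hardening: a fence after every load."""
    out: List[Instr] = []
    for ins in prog:
        out.append(ins)
        if ins.op == "load":
            out.append(Instr("lfence"))
    return out


def reaches_transmitter(prog: List[Instr], i: int) -> bool:
    """True if the value loaded by prog[i] can flow into a transmitter.

    Straight-line code only: taint follows register data flow and is
    cleared when a register is overwritten with an untainted value.
    """
    tainted = {prog[i].dst}
    for ins in prog[i + 1:]:
        uses_taint = any(r in tainted for r in ins.srcs)
        if uses_taint and ins.op in TRANSMITTERS:
            return True
        if ins.dst is not None:
            if uses_taint:
                tainted.add(ins.dst)
            else:
                tainted.discard(ins.dst)
        if not tainted:
            return False
    return False


def harden_pruned(prog: List[Instr]) -> List[Instr]:
    """Keep a fence only after loads whose value can reach a transmitter."""
    out: List[Instr] = []
    for i, ins in enumerate(prog):
        out.append(ins)
        if ins.op == "load" and reaches_transmitter(prog, i):
            out.append(Instr("lfence"))
    return out


def count_fences(prog: List[Instr]) -> int:
    return sum(1 for ins in prog if ins.op == "lfence")


prog = [
    Instr("load", "r1", ("rsp",)),     # r1 = *rsp, only used in arithmetic
    Instr("add", "r2", ("r2", "r1")),  # r2 = r2 + r1
    Instr("load", "r3", ("rdi",)),     # r3 = *rdi
    Instr("load", "r4", ("r3",)),      # r4 = *r3: r3 is used as an address
]

print(count_fences(harden_all_loads(prog)))  # 3
print(count_fences(harden_pruned(prog)))     # 1
```

The pruned pass keeps one fence instead of three because only the second load feeds an address. Proving that safely for real code is the hard part, which is why this is compiler work rather than something application developers should need to think about.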

Eventually new silicon will let the mitigations be switched off without risk, and that'll restore some of the lost performance.

Closing thoughts on SGX attacks

The lifecycle of SGX is settling down to look like that of games consoles: a long-term sequence of ever more complex attacks and subsequent fixes, supported by the "TCB recovery" mechanism designed into the platform (I'm using Intel's term here; this is normally called renewability).
Most attacks on SGX we've seen so far abuse speculative execution (SpecEx). This new hardening step won't affect rarer kinds of attacks like voltage glitching. Now that speculation is being put to bed, we should expect researchers to focus their efforts elsewhere. Last year saw Plundervolt, which showed you can glitch an x86 core using undocumented registers. That's convenient, but not the only way to do glitching. This year I'm expecting to see someone attach some wires to the voltage regulator in order to perform "Son of Plundervolt", along with another microcode fix and another TCB recovery.
Intel is attempting something very challenging: making open PC hardware tamperproof, which has never been done before. Another R3er asked me whether AMD is more secure. Sadly, the answer is no.
I'll briefly analyse AMD SEV here as part of explaining why, despite its problems, SGX is still in my view the most viable way to increase DLT privacy.
AMD's closest equivalent feature (SEV) isn't actually usable by Corda for design reasons: SEV protects a VM in such a way that it always has an owner who can see everything inside it. There's no way to do a computation that's protected from everyone at once, which means it can't be used for MPC or hiding DLT transactions. It's really only meant for letting you run a VM in the cloud with encrypted memory.
But there's another problem: SEV has a track record of being completely broken. Here's one example paper:
It says: "We show that the severity of our proposed attacks is very high as no purely software-based mitigations are possible. This effectively renders the SEV technology on current AMD Epyc CPUs useless"
SEV's equivalent of TCB recovery doesn't actually work, because it shipped incomplete: there's no way to block firmware downgrades. It feels as if the team simply ran out of time. Perhaps it's no surprise, then, that when critical security bugs in SEV were reported to AMD, it appears they didn't bother to invoke the recovery procedure at all. So SEV isn't renewable, and it has been hacked in ways that required new CPUs to be bought, multiple times. This doesn't look strong enough at the moment. Hopefully, with AMD's ascendant fortunes, they will be able to staff up the SEV team and future versions will work better.
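Renewability only works if a platform can refuse to go back to firmware with a known bug; otherwise an attacker just reinstalls the old version and runs the old exploit. Here's a minimal sketch of the usual pattern, a minimum version that only ever moves forward. The `FirmwareStore` class is hypothetical and says nothing about how AMD or Intel actually implement this; in real hardware the minimum would need to live in tamper-resistant, non-volatile storage such as fuses.

```python
from typing import Optional


class FirmwareStore:
    """Hypothetical anti-rollback check based on a monotonic minimum version."""

    def __init__(self, min_version: int = 0) -> None:
        # Lowest firmware version this platform will still accept.
        self.min_version = min_version
        self.installed: Optional[int] = None

    def install(self, version: int) -> bool:
        if version < self.min_version:
            return False  # downgrade to an older, possibly vulnerable version refused
        self.installed = version
        self.min_version = version  # ratchet forward; never decreases
        return True


store = FirmwareStore()
print(store.install(3))  # True
print(store.install(2))  # False: rollback blocked
print(store.install(4))  # True
```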
I think this is a big reason why it's usually SGX at the center of these research papers. Partly it's due to how speculation works in the Core microarchitecture, but I think it's mainly that in academia you're rewarded for finding interesting things. Intel's tech is secure enough that breaking it requires the constant development of clever new attacks, and so far they've always been able to fix reported problems. The game is still on and there are reputations to be made. In contrast, SEV was hacked by exploiting bog-standard mistakes of the type that have been common for decades, and then couldn't be fixed. There's little novelty to be found there.