It seems that AMD’s patch for its Zen 1 “Division by zero” bug wasn’t the end-all, be-all the company wanted it to be. While AMD was quick to issue a fix, there’s now the suspicion that it might have been just a bit too quick. According to Michael Larabel of Phoronix, AMD Linux engineer Borislav Petkov has published a new patch that fixes an issue with the original solution (which he also authored). It’s just another data point on how difficult it is to harden hardware against every possible attack vector.
The original bug relates to how Zen 1 handles an integer division by zero in certain circumstances. According to the findings, AMD’s CPU could keep “stale quotient data” from a previous division in its divider even after the operation had fully finished, which could give attackers a window to retrieve sensitive information. The original workaround was to perform a final “dummy division 0/1 before returning from the #DE exception handler”. The idea is simple: whatever old data was still stored would be overwritten once the 0/1 division completed (its result is always, well, zero).
The problem with that solution, Petkov explained, is that by the time the #DE handler runs, speculative execution has already advanced too far. Stale data from AMD’s divider could already have been consumed by younger operations before the dummy division ever takes place. His new patch therefore performs that same division in two scenarios: on every exit to userspace, and before VMRUN (the instruction that enters a virtual machine guest). In his words:
“Initially, it was thought that doing an innocuous division in the #DE handler would take care to prevent any leaking of old data from the divider but by the time the fault is raised, the speculation has already advanced too far and such data could already have been used by younger operations.
Therefore, do the innocuous division on every exit to userspace so that userspace doesn’t see any potentially old data from integer divisions in kernel space.
Do the same before VMRUN too, to protect host data from leaking into the guest too.”
It’s already been a busy month for CPU vulnerabilities, with both AMD and Intel hit by new disclosures. From Intel’s more severe Downfall vulnerability (affecting Skylake through Tiger Lake/Rocket Lake) to AMD’s SQUIP and Inception vulnerabilities (and the now re-fixed “divide by zero” bug), researchers have been hard at work. It still doesn’t compare to the storied days of Meltdown and Spectre, although these new bugs are also speculative execution vulnerabilities. Speculative execution refers to the way modern CPUs run instructions ahead of time, before it is certain they will be needed, so that the results are already available if the program does end up requiring them. While fixes for some of those vulnerabilities have carried (sometimes severe) performance penalties, it’s at least a good sign that AMD’s 0/1 dummy division isn’t expected to add meaningful overhead.
At the same time, it’s heartening to see that, at least in this case, the security patch wasn’t treated as a “set it and forget it” fix. Given the merry-go-round of work that blue team experts have to juggle, this could easily have gone differently. The deficient patch could have been believed to work fully, leaving the door open to further exploitation down the road, with whatever impact that might carry.