First, a recap of the vulnerability
For the first time, the performance-enhancing features of most modern processors (out-of-order superscalar execution, speculative execution with branch prediction, and hardware caching) were discovered to combine, as a unit, into a flaw that allows unauthorized disclosure of information. The fact that this flaw sits at the processor level differentiates it from other vulnerabilities in the sheer number and variety of systems impacted.
These vulnerabilities allow an authenticated attacker with access to a company’s system to execute code that can read data being processed on that system by other processes (Spectre) or by the kernel (Meltdown). This means the attacker must have physical or logical access to the system, or must first exploit a separate vulnerability, to take advantage of these processor-level flaws remotely. Memory controlled by one process is normally inaccessible to other processes; these vulnerabilities circumvent that protection, and exploit code for them is publicly available.
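To make the mechanism concrete, here is a minimal, illustrative C sketch of the pattern behind Spectre variant 1 (bounds check bypass), modeled on the publicly documented proof-of-concept; the array names and sizes are hypothetical, chosen for clarity, not working attack code. The bounds check looks safe, but while the comparison is still resolving, the processor may speculatively execute the body with an out-of-bounds index, and the secret byte it reads leaves a footprint in the cache.

    /* Illustrative Spectre v1 (bounds check bypass) victim pattern.
     * Names and sizes are hypothetical, chosen for clarity. */
    #include <stddef.h>
    #include <stdint.h>

    uint8_t array1[16];            /* small array the attacker can index into */
    size_t  array1_size = 16;
    uint8_t array2[256 * 512];     /* probe array: one cache line per byte value */

    /* With a malicious x, the load from array1[x] reads out-of-bounds
     * memory during speculation; the dependent load into array2 then
     * caches a line whose index encodes the secret byte. */
    void victim_function(size_t x) {
        if (x < array1_size) {
            volatile uint8_t t = array2[array1[x] * 512];
            (void)t;
        }
    }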
This exposure means that passwords, documents, emails, and other data residing on affected systems may be at risk. In a shared services environment, such as many cloud environments, one customer could use the attack to access another customer’s data being processed on the same hardware.
But the patch fixed it, right? Well, sort of
The Meltdown and Spectre vulnerabilities are hardware issues caused by design flaws in the processor chips themselves. Over the past several decades, chipmakers have innovated and tested new designs to make their chips faster and more efficient; those efforts helped deliver the steady performance gains we associate with Moore’s Law.
As scalar, superscalar, and out-of-order processing architectures became unable to meet performance demands, along came branch prediction and speculative execution, which we now know are susceptible to these side-channel attacks. This architecture, combined with the hardware cache subsystem, was adopted by every major chipmaker in the world, leaving virtually every device at risk.
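The cache subsystem is what turns that speculative footprint into readable data. The sketch below, again an illustration only (x86, GCC or Clang, and the hypothetical array2 probe array from the earlier sketch), shows the flush+reload measurement an attacker pairs with the victim pattern: flush every line of the probe array, trigger speculation, then time a reload of each line; the one that returns fast was cached, and its index is the leaked byte.

    /* Illustrative flush+reload timing side of the attack (x86 only). */
    #include <stdint.h>
    #include <x86intrin.h>         /* _mm_clflush, __rdtscp */

    extern uint8_t array2[256 * 512];   /* hypothetical probe array, as above */

    /* Evict every probe line from the cache before triggering speculation. */
    static void flush_probe_array(void) {
        for (int i = 0; i < 256; i++)
            _mm_clflush(&array2[i * 512]);
    }

    /* Time one load; a fast load means that line was cached, i.e. the
     * speculatively executed victim code touched it. */
    static uint64_t probe_time(int guess) {
        unsigned aux;
        volatile uint8_t *p = &array2[guess * 512];
        uint64_t start = __rdtscp(&aux);   /* waits for earlier instructions */
        (void)*p;                          /* the timed load */
        uint64_t end = __rdtscp(&aux);
        return end - start;
    }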
While the patches that have been released and are being implemented do mitigate these vulnerabilities, addressing hardware issues in software always costs performance. The magnitude of that performance impact depends entirely on the environment and workload of the processor and cannot be accurately quantified without a case-specific empirical study. Worse, some of the firmware updates provided by vendors destabilized the very server systems they were meant to protect. So the question remains: how do we get our chips back to the performance level that we planned for, paid for, and relied upon?
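For teams trying to scope that study, a reasonable first step is simply confirming which mitigations a given host is running. As a small sketch, assuming a Linux kernel recent enough (4.15+) to expose its vulnerability report under /sys, the following reads the kernel’s own status lines:

    /* Sketch: print the Linux kernel's Meltdown/Spectre mitigation status.
     * Paths assume Linux; on kernels without the report, files are absent. */
    #include <stdio.h>

    int main(void) {
        const char *files[] = { "meltdown", "spectre_v1", "spectre_v2" };
        char path[128], line[256];
        for (int i = 0; i < 3; i++) {
            snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/vulnerabilities/%s", files[i]);
            FILE *f = fopen(path, "r");
            if (!f) { printf("%s: not reported\n", files[i]); continue; }
            if (fgets(line, sizeof line, f))
                printf("%-10s %s", files[i], line);   /* line ends with \n */
            fclose(f);
        }
        return 0;
    }

Each line reports whether the CPU is affected and which mitigation, if any, is active; benchmarking a representative workload with and without those mitigations is the case-specific empirical study described above.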
Back to the drawing board
The short answer is that we need a new chip architecture. But this is an extremely complex problem that could take serious time to solve. Effectively, these vulnerabilities have forced chip architects and computer scientists around the world to come up with a brand-new chip design that matches or beats today’s performance without relying on the performance-enhancing techniques in use today. And, as it turns out, there is no clear answer. What’s more, as early as February, researchers from NVIDIA and Princeton released a white paper outlining alternative approaches to exploiting these vulnerabilities that any new design will have to account for.
A hardware refresh might just be the beginning
If a new chip architecture is developed and made widely available, likely not before 2019, it may have a massive impact on the software community as well. Because the new design will likely be a revolutionary change rather than an evolutionary one, today’s software ecosystem, including operating systems, compilers, and applications, will have to be updated accordingly. That could mean major rework for software vendors, researchers, and companies that develop their own business applications.
The bottom line
The fact is, business-critical hardware in production today is vulnerable to attack, and patching this hardware will have a material impact on performance, potentially causing major capacity issues. Even if alternative hardware does become available, it will likely require time-intensive, expensive refactoring of software on top of the explicit cost of the hardware refresh. Nor is this limited to on-premises environments; the effect on cloud infrastructure providers and tenants will likely be even larger.
Fun fact: The Raspberry Pi, the small and affordable computer, is among the only hardware not vulnerable to Meltdown and Spectre, because its processors do not use speculative execution.