Beyond Over-Protection: A Targeted Approach to Spectre Mitigation and Performance Optimization

Tiziano Marinaro CISPA Helmholtz Center for Information Security, Saarland University tiziano.marinaro@cispa.de , Pablo Buiras KTH Royal Institute of Technology buiras@kth.se , Andreas Lindner KTH Royal Institute of Technology andili@kth.se , Roberto Guanciale KTH Royal Institute of Technology robertog@kth.se and Hamed Nemati KTH Royal Institute of Technology hnnemati@kth.se

(2023)

Abstract.

Since the advent of Spectre attacks, researchers and practitioners have developed a range of hardware and software measures to counter transient execution attacks. A prime example of such a mitigation is speculative load hardening in LLVM, which protects against leaks by tracking the speculation state and masking values during misspeculation. LLVM relies on static analysis to harden programs using slh, which often results in over-protection and thus performance overhead. We extended an existing side-channel model validation framework, Scam-V, to check the vulnerability of programs to Spectre-PHT attacks and to optimize the protection of programs using the slh approach. We illustrate the efficacy of Scam-V by first demonstrating that it can automatically identify Spectre vulnerabilities in real programs, e.g., fragments of crypto-libraries. We then develop an optimization mechanism that validates the necessity of slh hardening w.r.t. the target platform. Our experiments show that the hardening introduced by LLVM can, in most cases, be significantly improved when the underlying microarchitecture properties are considered.

hardware security, side-channel attacks, countermeasures, Spectre

Copyright: ACM. Journal year: 2023. Conference: Proceedings of the 2024 ACM ASIA Conference on Computer and Communications Security; July 1–5, 2024; Singapore. CCS concepts: Security and privacy → Side-channel analysis and countermeasures.

1. Introduction

Figure 1. Scam-V’s workflow. Modules in bold boxes are those that have been added or modified in this paper.

The past decade has witnessed a surge in the number of microarchitectural attacks that exploit hardware side channels to exfiltrate secret information. The prime example of such attacks is the Spectre attack family (Kocher et al., 2019) which leverages transient (speculative) execution to leak data through caches. To counter transient execution vulnerabilities, researchers and practitioners have developed several analysis techniques as well as hardware and software measures (Hu et al., 2023; Cauligi et al., 2022). Examples include fence instructions, like x86 lfence, which are serializing instructions to mitigate the side effects of speculative execution. Another example of software-based mitigation is Speculative Load Hardening (slh) (Carruth, 2020), which protects against transient execution leakages by tracking the speculation state and masking values during misspeculation.

Unfortunately, to date, almost all existing mitigations against transient execution leakages are either incomplete and miss attacks, or overly conservative and slow (Cauligi et al., 2022). For example, the only mitigations that guarantee complete protection against the Spectre-V1 (-PHT) attack on commodity processors are memory fences, such as lfence, csdb (used in the implementation of the slh pass for AArch64), and dsb+isb. Nevertheless, using too many fences (over-fencing) hinders performance (e.g., inserting a fence at every load or control flow point can incur around 440% overhead (Oleksenko et al., 2018)), while using too few fences (under-fencing) may allow unexpected leakage to occur. There are also tools such as LLVM (Lattner and Adve, 2003) that harden programs against transient leakages, e.g., using slh. However, almost all existing software-based solutions rely on static analysis and follow a conservative approach to hardening, and thus suffer from over-protection. Moreover, different processors implementing the same architecture can have substantially different speculative leakage. This suggests we may need to rethink the way we analyze and protect programs against transient attacks.

Our Approach. In this paper, we present a systematic approach to identifying conditions (code alignment and compiler optimization level) under which programs become vulnerable to Spectre-PHT attacks and finding an optimal hardening scheme to protect programs against potential leakages. Our approach relies on relational testing w.r.t the underlying processor’s real implementation to determine the necessity of the hardening applied by state-of-the-art approaches like the LLVM compiler infrastructure.

We build on an existing platform, Scam-V (Nemati et al., 2020a; Buiras et al., 2021), which automates the validation of abstract side-channel models via relational testing. Fig. 1 shows Scam-V’s workflow. We chose Scam-V because, compared to existing tools like Revizor (Oleksenko et al., 2022), it requires fewer test cases to disclose leakages due to its testing approach and internal optimizations (see Sec. 2.3.1). Scam-V first generates test cases, each consisting of a program and two input states that are in an equivalence relation automatically synthesized w.r.t. the model under test. It then executes the test cases on real hardware and measures the side channel to find cases that invalidate the input model.

The existing implementation of Scam-V is insufficient for our purposes, and several changes were required to enable it to test the information-flow security of real-world programs, e.g., fragments of OpenSSL (Sec. 5). Sections 3 and 4 describe our methodology and our changes to Scam-V’s pipeline. In particular, instead of testing random binaries, we feed Scam-V with slices taken from the binary of the program under test. Moreover, Scam-V uses symbolic execution to synthesize the equivalence relation for the input states, but its existing simple symbolic execution engine could not handle slices taken from real programs. Therefore, we connect Scam-V to the angr framework (Shoshitaishvili et al., 2016); Sec. 4 elaborates on the details of this integration. We have also generalized the refinement technique that Scam-V uses to generate inputs that are more likely to trigger leakages (Buiras et al., 2021). Finally, we have connected Scam-V to LLVM and developed a compiler pass to optimally protect programs against leakages.

Results. Our experiments highlight several interesting results, showing that (i) unexpected microarchitectural details like the alignment of the code executing on the processor (alignment here refers to how the binary instructions are arranged in memory) can affect the leakage behavior of programs; (ii) most of the hardening applied by current approaches is not required in real executions; and (iii) there are transient leakages, like the one presented in (Buiras et al., 2021), that are not mitigated by the implementation of existing mitigations such as LLVM AArch64 slh. The latter case has previously been proved theoretically (Patrignani and Guarnieri, 2021).

LLVM conservatively hardens programs to stop transient execution leakages. For example, in the LLVM AArch64 slh pass, almost every load instruction is hardened. However, our experiments show that the hardening needed to stop potential leakages strictly depends on the hardware platform that programs execute on. That is, a program that leaks on a specific architecture or processor does not necessarily show the same behavior when it executes on a different architecture or core. This behavior is witnessed by a few cases from the Kocher Spectre benchmarks, e.g., Case #10 (Kocher, 2018). While all existing analysis tools and techniques, e.g., (Guarnieri et al., 2021; Cauligi et al., 2020; Mosier et al., 2022a; Daniel et al., 2021; Wang et al., 2021), classify this case as a leaky program and conservatively protect it using fences, we verified that this test case does not induce any leakage on the ARM processors we have tested (Cortex-A53 and -A72), and therefore there is no need for any protection on these cores. Based on this insight, we have built a toolchain to examine the vulnerability of programs w.r.t. the target hardware platform, and to optimally harden them to stop the leakages found.

2. Background

2.1. Side channels

Side channels are unintended information flow paths that are potentially exploitable by a malicious process to exfiltrate secrets from the memory of other processes. A prime example of side-channel vulnerabilities is Spectre attacks. These attacks are characterized by a speculation primitive (Miller, 2018), which enables an instruction that accesses a secret to supply it to a transient transmitter (Yu et al., 2019) that leaks it. A speculation primitive is a hardware mechanism that initiates speculative execution. Examples include control-flow instructions (e.g., conditional or indirect branches (Kocher et al., 2019), return instructions (Koruyeh et al., 2018)), which predict the intended value of the program counter, and loads, which predict the effective addresses of prior unresolved stores (Horn, 2018).

Spectre attacks usually leak data via data caches, which can be measured by an attacker using techniques such as Flush+Reload (Yarom and Falkner, 2014). In Flush+Reload, the attacker flushes cache entries and lets the victim (normally or speculatively) execute a secret-dependent load, after which the attacker measures the reload time. This reload time enables the attacker to determine whether the victim has accessed any of the flushed cache entries during its execution, and subsequently to extract the secret.

Running example. We introduce the example of Fig. 2 to show the main concepts of our approach. The example is a simplified version of Case #10 from the benchmarks used in (Kocher, 2018; Guarnieri et al., 2020; Cauligi et al., 2020). The program consists of two nested conditional statements, where the first one checks the bound of an accessed index within the public array $A$ . The load of the first element from the array $B$ is then decided by the inner if statement by checking whether or not the accessed value in $A$ based on the provided index is equal to $k$ .

When the processor encounters the first if statement, based on its internal state and the pipeline load, it may decide to speculatively execute the read from the array $A$ even when $idx$ is greater or equal to $A\_size$ . This potentially enables an attacker who controls the value of $idx$ and $k$ to get access to sensitive information inferred from the comparison in the inner condition. Specifically, the attacker can infer the value of data loaded in speculation by executing $A[idx]$ when the $B[0]$ shows up in the cache, indicating the condition in the inner if statement holds.

When analyzing the security of a program, it is essential to consider the security level of variables to get a precise understanding of the program leakage potential and decide the specific mitigation which must be applied. For instance, depending on whether $idx$ is deemed private or public, our running example leaks differently. While in the latter case, the leakage happens when the execution reaches $B[0]$ , i.e., the inner conditional expression holds, the former case shows different leakage at the point $A[idx]$ is executed.

2.2. Mitigations of transient execution attacks

Spectre attacks can be mitigated through a combination of hardware and software measures. Using fence instructions or speculative load hardening are examples of such measures.

Fence instructions.

To mitigate Spectre-PHT, ARM architecture includes different fences like dmb (data memory barrier), dsb (data synchronization barrier), and isb (instruction synchronization barrier) instructions. Another AArch64 barrier, which is used by LLVM’s slh pass, is csdb or Consumption of Speculative Data Barrier. csdb restricts speculative execution and data value prediction. Once csdb is executed, no instruction other than branch instructions in program order can be speculatively executed using the results of data value predictions or PSTATE predictions of any instructions appearing before csdb that have not been resolved. This allows for control flow speculation before and after csdb and permits speculative execution of conditional data processing instructions after csdb, as long as they don’t use the results of predictions made before csdb.

Speculative load hardening

An alternative approach is speculative load hardening (slh) (Carruth, 2020), which masks the addresses or values of loads inside a conditional branch with the branch predicate. The idea of slh is to maintain a predicate indicating whether the execution is currently in a mispredicted branch or not. This predicate is then used to “poison” either the outputs (i.e., values) or inputs (i.e., addresses) of load instructions. Since slh limits the amount of speculative execution that can be performed, it also impacts the programs’ performance. However, compared to other mitigations, slh is more efficient, with a speed improvement of approximately 1.77 times. The overhead of load hardening is expected to range from 10% to 50%, with most large applications experiencing an overhead of $\sim$ 30% (Carruth, 2018).

LLVM AArch64 slh

The LLVM slh pass for AArch64 uses taint tracking (blue lines) to find program points that must be protected.

The snippet shows the hardened version of our example using the LLVM slh pass. AArch64 slh reserves the register x16 as the taint register, which contains all-ones when no misspeculation happens, and all-zeros when misspeculation is detected. Mask operations (red lines) are inserted using the and instruction, and misspeculation is tracked using a data-flow conditional select instruction (csel) based on the evaluation of CPU flags, which can be limited with csdb to avoid its speculation. When a conditional branch direction gets misspeculated, the semantics of the inserted csel instruction is such that the taint register will contain all zero bits.
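The masking idea can be illustrated with a small model. The following is a minimal sketch in Python, not LLVM's implementation: the function names (`csel`, `update_taint`, `hardened_load`) and the toy memory are assumptions made for illustration, and only address masking with an all-ones/all-zeros taint value, as described above, is modelled.

```python
MASK64 = (1 << 64) - 1  # all-ones taint: no misspeculation detected


def csel(cond: bool, if_true: int, if_false: int) -> int:
    """Branch-free select, standing in for the AArch64 csel instruction."""
    return if_true if cond else if_false


def update_taint(taint: int, actual_cond: bool, predicted_cond: bool) -> int:
    """Keep the taint on a correct prediction, clear it on a misprediction."""
    return csel(actual_cond == predicted_cond, taint, 0)


def hardened_load(mem: dict, addr: int, taint: int) -> int:
    """Mask the address with the taint before the load (the inserted `and`)."""
    return mem.get(addr & taint, 0)


mem = {0x0: 0, 0x1000: 42}               # toy memory; 0x1000 holds a secret
ok = update_taint(MASK64, True, True)    # branch predicted correctly
bad = update_taint(MASK64, False, True)  # branch mispredicted
print(hardened_load(mem, 0x1000, ok))    # load reaches the real location
print(hardened_load(mem, 0x1000, bad))   # masked address collapses to 0
```

In this model a mispredicted path can still execute the load, but the masked address no longer depends on the secret-dependent input.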

AArch64 slh limitation

The current implementation of the AArch64 slh can only protect against data leaks through memory, and it does not prevent leakages through registers. An example of the latter case is when $idx$ in our running example is labelled as private and the memory load at line 12 in the optimized (-O2) assembly code of Fig. 2 leaks the value of $idx$ .

Besides, slh conservatively hardens programs, and some of the introduced hardenings can safely be removed without affecting the security of the hardened code. For example, in case $idx$ is a public value, the hardening at line 24 in the unoptimized (-O0) assembly code of Fig. 2 is unnecessary and can be safely removed. Optimizing the slh hardening can, in some cases, be done using static analysis, e.g., it can detect that the hardening at line 24 is not necessary when $idx$ is public. Nevertheless, static analysis cannot always help: full optimization requires taking into account the properties of the underlying microarchitecture and must be guided by analyzing the execution traces of programs on real hardware.

2.3. Side channel analysis — Scam-V

In the following, we use $s\in S$ to range over ISA states and we say that two states $s_{1}$ and $s_{2}$ are low-equivalent (i.e., $s_{1}=_{L}s_{2}$ ) iff they agree on every public register and memory location. We also say that two states $s_{1}$ and $s_{2}$ are indistinguishable iff an attacker cannot distinguish executions on real hardware that start from the same microarchitectural state and any states corresponding to $s_{1}$ and $s_{2}$ . Intuitively, a system is free of side channels if low-equivalent states are also indistinguishable. That is, the attacker cannot learn anything from the channel that is not already public.

Since verification requires models, verifying the absence of leakage due to side channels requires a model capturing the channels. However, due to the complexity of modern processors, it is infeasible to explicitly model all the complex and intertwined microarchitectural features like cache hierarchies, cache replacement policies, as well as memory and buses. Abstract observational models, hereafter denoted by $\mathit{M}_{i}$ , tackle this problem by overapproximating attacker capabilities. To this end, the abstract model of the system is extended with a set of possible observations $\mathit{o}\in\mathit{O}{}$ , e.g., cache tag or set index, and a transition relation $\rightarrow\subseteq S\times\mathit{O}\times S$ . For each such model, the observations represent the part of the processor state that may affect the channel at each transition.

For instance, in order to overapproximate the information leakage that may occur in Fig. 2 due to the presence of caches, the processor model may be extended with the observations that are shown in the right column: e.g., the execution of the first instruction of the non-optimised binary may leak the value of stack pointer plus 8. Intuitively, this model, say $\mathit{M}$ , captures that the program execution time depends on the addresses of loads and instructions that are executed based on conditional branches. We say that two states $s,s^{\prime}\in S$ are observationally equivalent (i.e., $s\sim_{M}s^{\prime}$ ), iff for every possible trace $s\xrightarrow{\mathit{o}_{1}}s_{1}\dots\xrightarrow{\mathit{o}_{n}}s_{n}$ of $M$ there is a trace $s^{\prime}\xrightarrow{\mathit{o}_{1}^{\prime}}s^{\prime}_{1}\dots\xrightarrow% {\mathit{o}_{n}^{\prime}}s^{\prime}_{n}$ such that $[\mathit{o}_{1},\dots,\mathit{o}_{n}]=[\mathit{o}_{1}^{\prime},\dots,\mathit{o% }_{n}^{\prime}]$ .

Clearly, we would like models to be sound, which means that observationally equivalent states should lead to executions that cannot be distinguished by an attacker on real hardware.

Definition 2.1 (Soundness).

An observational model $M$ is sound if $s\sim_{M}s^{\prime}$ entails indistinguishability of $s$ and $s^{\prime}$ .

Sound observational models can be regarded as reliable foundations for side-channel analysis, since they allow to demonstrate the absence of side channels by statically proving that low-equivalent states are observationally equivalent. The problem with speculative execution is that observational models that do not take into account transient observations are unsound, and there is no general model of transient observations that covers all possible microarchitectures.

Scam-V (Nemati et al., 2020b, a; Buiras et al., 2021) combines techniques from program verification and fuzzing to perform relational testing and examine the soundness of observation models. At a high level (see Fig. 1), (1) it generates well-formed binary programs and (2) synthesizes a relation that, for a generated program, identifies states that are observationally equivalent according to the model under the validation. Then, (3) an instance of this relation in terms of two input states is generated. Finally, (4) Scam-V runs the generated binary with different input pairs on real hardware and compares the measurements on the side channel of the real processor. Since the generated inputs satisfy the synthesized relation, the soundness of the model would imply that the side-channel data on hardware cannot be distinguished either. Thus, a test case where we can distinguish the two runs on the hardware amounts to a counterexample (a potential side channel) that invalidates the observational model.
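The core of step (4) can be sketched as a search for a pair of states that the model considers equivalent but the hardware measurement separates. The following is a minimal sketch under stated assumptions: `model_obs` and `hw_measure` are hypothetical callables standing in for the observational model and the real side-channel measurement, and states are plain tuples rather than ISA states.

```python
from itertools import combinations


def find_counterexample(states, model_obs, hw_measure):
    """Return a pair the model equates but the measurement distinguishes."""
    for s1, s2 in combinations(states, 2):
        same_in_model = model_obs(s1) == model_obs(s2)
        differ_on_hw = hw_measure(s1) != hw_measure(s2)
        if same_in_model and differ_on_hw:
            return s1, s2  # the model is unsound for this program
    return None


# Toy setting: a state is (index, secret). The model observes only the index,
# while the simulated hardware footprint also depends on one secret bit.
states = [(0, 0), (0, 1), (1, 0)]
model_obs = lambda s: s[0]
hw_measure = lambda s: (s[0], s[1] & 1)
print(find_counterexample(states, model_obs, hw_measure))
```

Scam-V does not enumerate states like this; it synthesizes the relation symbolically and asks a solver for instances. The enumeration is only meant to make the notion of a counterexample concrete.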

Scam-V uses the symbolic execution of programs that are annotated with observations to synthesize the equivalence relation. Symbolic execution is a technique to explore all program execution paths by using symbolic values instead of concrete ones for the inputs. Starting from an initial symbolic state, the execution explores all possible paths and collects the execution effects in a final symbolic state $\sigma\in\Sigma$ for each path. Each symbolic state $\sigma$ consists of a map from variables to symbolic expressions (i.e., where symbols represent initial state variables) and a path condition $p_{\sigma}$ (i.e., a symbolic expression identifying the condition that leads to the execution of that path). Scam-V also maintains a list of symbolic expressions $l_{\sigma}$ , which collects the effects of observable statements that have been encountered. E.g., starting from an initial state which maps each register $x_{i}$ to a symbol $\alpha_{i}$ and the memory to $\alpha_{M}$ , an empty list of observations, and a $true$ path condition, the symbolic execution of the -O2 program of Fig. 2 produces three final states:

• one for $idx\geq A\_size$ , with path condition $\alpha_{0}\geq\alpha_{M}[\alpha_{8}]$ , state mapping $x_{8}\mapsto\alpha_{M}[\alpha_{8}]$ , and observation list $[\alpha_{8};false]$ ,

• one for $idx<A\_size$ and $A[idx]\neq k$ , with path condition $\alpha_{0}<\alpha_{M}[\alpha_{8}]\land\alpha_{M}[\alpha_{0}+\#A]\neq\alpha_{1}$ , state mapping $x_{8}\mapsto\alpha_{M}[\alpha_{0}+\#A]$ , and observation list $[\alpha_{8};true;\alpha_{0}+\#A;false]$ ,

• one for $idx<A\_size$ and $A[idx]=k$ , with path condition $\alpha_{0}<\alpha_{M}[\alpha_{8}]\land\alpha_{M}[\alpha_{0}+\#A]=\alpha_{1}$ , state mapping $x_{8}\mapsto\alpha_{M}[\alpha_{0}+\#A],x_{10}\mapsto\alpha_{M}[\#B]$ , and observation list $[\alpha_{8};true;\alpha_{0}+\#A;true;\#B]$ .

Scam-V uses self composition (Barthe et al., 2004) to compute the observational equivalence relation by imposing equivalence of the symbolic observation list of the final states of the symbolic execution. For the example above, the relation would be as follows:

$$\begin{array}{ll}
 & \alpha_{8}=\alpha_{8}^{\prime}\land(\alpha_{0}\geq\alpha_{M}[\alpha_{8}]\Leftrightarrow\alpha_{0}^{\prime}\geq\alpha_{M}^{\prime}[\alpha_{8}^{\prime}])\land\\
(\alpha_{0}<\alpha_{M}[\alpha_{8}]\Rightarrow & \alpha_{0}=\alpha_{0}^{\prime}\land\\
 & (\alpha_{M}[\alpha_{0}+\#A]\neq\alpha_{1}\Leftrightarrow\alpha_{M}^{\prime}[\alpha_{0}^{\prime}+\#A]\neq\alpha_{1}^{\prime}))
\end{array}$$
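The relation above can also be read as an executable check on two concrete states. The sketch below is an illustration only: the dictionary layout of a state, the register names used as keys, and the constant `A_BASE` (standing in for $\#A$ ) are assumptions, and unmapped memory is read as 0.

```python
A_BASE = 0x100  # hypothetical address standing in for #A


def related(s, t, a_base=A_BASE):
    """Check the synthesized relation on two concrete states s and t."""
    in_bounds_s = s["x0"] < s["mem"].get(s["x8"], 0)
    in_bounds_t = t["x0"] < t["mem"].get(t["x8"], 0)
    # Same address for the A_size load and same outcome of the bounds check.
    if s["x8"] != t["x8"] or in_bounds_s != in_bounds_t:
        return False
    if not in_bounds_s:
        return True  # the implication holds vacuously
    # In bounds: same index and same outcome of the inner comparison.
    if s["x0"] != t["x0"]:
        return False
    differs_s = s["mem"].get(s["x0"] + a_base, 0) != s["x1"]
    differs_t = t["mem"].get(t["x0"] + a_base, 0) != t["x1"]
    return differs_s == differs_t


s = {"x0": 1, "x1": 7, "x8": 0x10, "mem": {0x10: 4, 0x101: 7}}
t = {"x0": 1, "x1": 9, "x8": 0x10, "mem": {0x10: 4, 0x101: 9}}
u = {"x0": 1, "x1": 7, "x8": 0x10, "mem": {0x10: 4, 0x101: 9}}
print(related(s, t), related(s, u))
```

States `s` and `t` hold different values in $A[idx]$ and $k$ but agree on every observation, so they form a valid test pair; `u` takes the other inner branch and is excluded.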

2.3.1. Observation Models Refinement

Using relational testing similar to Scam-V’s approach to validate observational models can lead to state space explosion. This is because an unguided search may explore states that are either too similar to each other, and thus unlikely to invalidate the given model, or that fail to trigger attacker-visible microarchitectural behavior. Observation refinement (Buiras et al., 2021) tackles this problem by adding more fine-grained observations of the system state to capture behaviors we need to exclude.

For a given program, the observation model $\mathit{M}$ partitions the input states into observation equivalence classes. Relevant pairs in such equivalence classes must be tested to validate the soundness of $\mathit{M}$ . To make validation more efficient, the observation refinement suggests further repartitioning the induced equivalence classes using a complementary model $\mathit{M}^{\prime}$ that captures the observations that might arise from the side channel under scrutiny.

For instance, in ARM Cortex-M0, a 32-bit multiplication instruction can take between 1 and 32 clock cycles to complete, depending on the operands being multiplied (Arm Limited, 2013). The number of clock cycles required to execute a multiplication can therefore leak information about the operands, revealing sensitive information about the cryptographic algorithm being executed. Arithmetic operations like multiplication are normally not assigned an observation in standard observational models. However, to check for the existence of the side channel above, one can define a refined observation model which reveals the most significant bits of the multiplication operands.

Having defined a refined observation model, test cases (pairs of states $s$ and $s^{\prime}$ ) are chosen s.t. they are observationally equivalent w.r.t $\mathit{M}$ yet are distinguishable w.r.t $\mathit{M}^{\prime}$ , i.e., $s{}\sim_{\mathit{M}\wedge\neg\mathit{M}^{\prime}}s{}^{\prime}$ . Observation refinement is especially important to facilitate finding transient execution leakages by steering the test case generation toward generating input states that ensure distinguishable cache updates due to misspeculated memory load instructions.
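The selection criterion $\mathit{M}\wedge\neg\mathit{M}^{\prime}$ can be sketched as a filter over candidate pairs. This is a minimal illustration for the multiplication example: the state layout (a pair of 32-bit operands), the choice of the second operand, and the choice of its top 8 bits as the refined observation are all assumptions, not the model used in the paper.

```python
from itertools import combinations


def refined_test_pairs(states, base_obs, refined_obs):
    """Keep pairs that are equivalent under M but distinguishable under M'."""
    return [
        (s1, s2)
        for s1, s2 in combinations(states, 2)
        if base_obs(s1) == base_obs(s2) and refined_obs(s1) != refined_obs(s2)
    ]


# A state is a pair of 32-bit multiplication operands (a, b).
base_obs = lambda s: ()            # M: multiplication raises no observation
refined_obs = lambda s: s[1] >> 24  # M': top 8 bits of the second operand

states = [(3, 0x00000005), (3, 0x01000005), (3, 0x00000009)]
for pair in refined_test_pairs(states, base_obs, refined_obs):
    print(pair)
```

Pairs that differ only in the low bits of the operand are skipped, since under the refined model they are unlikely to produce a timing difference.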

3. Methodology

To efficiently protect against microarchitectural leakages, it’s essential to design a mitigation strategy to ensure that the resulting system meets both the necessary security and performance requirements. In doing that, an obstacle is the lack of documentation or model regarding the speculative leakage of different microarchitectures. We address this problem by using relational testing to identify when a specific program is affected by transient leakages on a specific microarchitecture. This allows us to remove unnecessary mitigation and protect only the parts of the program that are indeed vulnerable on a given processor. The general strategy consists of three tasks: (a) use Scam-V to generate pairs of input states that should trigger potential information leakages in case of speculative behavior; (b) test if the leakage indeed occurs; (c) protect the vulnerable program fragments and remove unnecessary hardening.

3.1. Observation refinement for speculation

In order to drive the generation of test cases, we leverage observation refinement (see Sec. 2.3.1). However, the previous implementation of the refinement technique in Scam-V could only handle a single conditional branch and was insufficient for our tests. Thus, we generalized the observation refinement in Scam-V by introducing a new program transformation that simulates the speculative execution of program instructions. This refinement enables us to observe memory operations that happen in speculation.

Our experiments focus on branch prediction, but the technique is general enough to cover other types of speculation. We achieve refinement by choosing a set of branches that can be misspeculated and composing them with the program. This is in line with the Spectector approach, which uses the always misprediction policy (Guarnieri et al., 2020).

Our program transformation inlines a shadow copy of the program fragment starting from the execution point before the first misspeculation. We statically fix the size of these fragments to cover the largest expected speculative window of the processor. This shadow code (marked with ${}^{\star}$ in Fig. 3) is a copy of the original program fragment, where all selected misspeculating branches have negated conditions—to simulate what can happen in misspeculation. The shadow code also operates over a shadow machine state in order to not affect the non-speculative behavior. All memory operations executed in the shadow code raise a refined observation, indicating that they are possible causes of leakage.

Fig. 3 shows how this transformation works for our running example. In this example, we inline a shadow copy (code snippet between Start and End) of the program fragment starting from the ‘if’ statement with the negated condition. During the execution, we save the current program state at Start, switch to a shadow copy of the state, execute the shadow fragment and collect observations, and restore the normal execution of the program when we reach the execution point marked with End.
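The effect of the transformation on the running example can be sketched as follows. This is a simplified model, not the BIR-level transformation: only the outer branch is selected for misspeculation, the load of $A\_size$ is not observed, and the addresses `a_base` and `b_base` as well as the `"base"`/`"refined"` tags are assumptions made for illustration.

```python
def run_with_shadow(idx, k, a_size, mem, a_base=0x100, b_base=0x200):
    """Run the example with an inlined shadow copy of the outer branch."""
    obs = []
    # Start: the shadow copy runs the branch body under the negated condition.
    if not (idx < a_size):
        shadow_mem = dict(mem)  # shadow state; the normal state is untouched
        addr = a_base + idx
        obs.append(("refined", addr))        # speculative load of A[idx]
        if shadow_mem.get(addr, 0) == k:
            obs.append(("refined", b_base))  # speculative load of B[0]
    # End: normal execution resumes from the saved state.
    in_bounds = idx < a_size
    obs.append(("base", in_bounds))
    if in_bounds:
        addr = a_base + idx
        obs.append(("base", addr))
        hit = mem.get(addr, 0) == k
        obs.append(("base", hit))
        if hit:
            obs.append(("base", b_base))
    return obs


print(run_with_shadow(5, 7, 4, {0x105: 7}))  # out of bounds: shadow loads only
print(run_with_shadow(1, 7, 4, {0x101: 7}))  # in bounds: base observations only
```

For an out-of-bounds index, the only memory observations come from the shadow copy, which is exactly the behavior a misspeculating processor could expose.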

While our new implementation of the refinement technique can deal with more complex cases, e.g., programs with nested branches, it has some limitations, as shown in the snippet below. Since we negate all conditions in the shadow copies, in the example, we end up synthesizing relations that are unsatisfiable for the path that accesses memory in the inner ‘if’ statement’s true branch.

Possible solutions to this problem are to either apply refinement to only the inner ‘if’ statement or implement a heuristic that chooses the right ‘if’ statement (i.e., the one that makes the relation satisfiable) that needs to be refined. For complex scenarios, e.g., with nested branches and functions, similar heuristics could be designed to automatically try different combinations of branches to find optimal refinement settings. We have tried the former solution in our experiments in Sec. 5; e.g., to refine observations of Case #5 when compiled with -O2 enabled.

The refinement involves two observational models: the base model $\mathit{M}$ and its refined counterpart $\mathit{M}^{\prime}$ . We follow the technique of Buiras et al. (Buiras et al., 2021) to optimize the process of computing the observations w.r.t. the two models. Say $l^{\mathit{M}}$ and $l^{\mathit{M}^{\prime}}$ are, respectively, the observation lists obtained by symbolically executing the program under the base and the refined models. As Fig. 3 shows, $l^{\mathit{M}}\subseteq l^{\mathit{M}^{\prime}}$ . Thus, we can implement a projection function $\pi$ that obtains the symbolic observation list of the base model from the observation list of the refined model, i.e., $\pi(l^{\mathit{M}^{\prime}})=l^{\mathit{M}}$ , without the need to run symbolic execution twice.
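A projection of this kind can be sketched as a filter, assuming each observation in the refined list carries a tag that records whether it belongs to the base model. The tagging scheme and the name `project_base` are hypothetical and only illustrate the idea of $\pi$ .

```python
def project_base(refined_list):
    """Drop shadow (refined-only) observations, keeping the base model's list."""
    return [value for tag, value in refined_list if tag == "base"]


# Observation list of one path under the refined model M'.
l_refined = [("refined", 0x105), ("refined", 0x200), ("base", False)]
print(project_base(l_refined))
```

Because the order of the remaining observations is preserved, the projected list equals the one a separate run under the base model would have produced.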

3.2. Branch misprediction

To ensure that misprediction will happen in the desired branch, we must train the branch predictor. The standard observation model to check a program’s vulnerability to transient execution observes both the program counter and the address of memory operations. Therefore, each pair of test input states, i.e., two observationally equivalent states, follows the same execution path and satisfies the same path condition $\mathbf{p}$ . Based on this insight, to generate a training state, it is sufficient to find satisfying assignments for a different path $\mathbf{p^{\prime}}$ in the symbolic execution tree, where $\mathbf{p\neq p^{\prime}}$ .
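Generating a training state then amounts to solving for an input on a different path. The sketch below replaces the solver with a brute-force search over a small domain, which is an assumption made to keep the example self-contained; `path_of` and `training_input` are hypothetical names, and the paths are those of the running example.

```python
from itertools import product


def path_of(idx, k, a, a_size):
    """Label the path the running example takes on a concrete input."""
    if idx >= a_size:
        return "out_of_bounds"
    return "hit" if a[idx] == k else "miss"


def training_input(test_path, a, a_size, domain=range(8)):
    """Return some (idx, k) whose path differs from the path under test."""
    for idx, k in product(domain, repeat=2):
        if path_of(idx, k, a, a_size) != test_path:
            return idx, k
    return None  # no other path is reachable within the domain


A = [3, 1, 4, 1]
# Test states take the out-of-bounds path, so training must be in bounds.
print(training_input("out_of_bounds", A, len(A)))
```

Running the program on the returned input a number of times biases the predictor toward the in-bounds direction before the actual test inputs are executed.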

3.3. Microarchitectural state configuration

The state of microarchitectural features can impact the success of side-channel attacks such as Spectre, as it may affect the speculative execution process. The presence or absence of certain data in the cache, or the prediction made by the branch predictor, can alter the speculative execution path, so that an attack either succeeds or a potential information leakage is missed. For instance, in our running example, the speculation of the nested branch can be affected by the presence of $A[idx]$ in the cache before executing the program. Therefore, tests that only start with an empty cache may not reflect the actual leakage of the processor (see Sec. 5.1.4). Furthermore, we prime the branch predictor state by training it with different training inputs that we generate based on symbolic execution of the test programs.

3.4. Optimizing programs hardening

Given a program fragment to analyze, the base program and the one with shadow observations represent two observational models: the first models the leakage of a non-speculative processor, and the second models the leakage of a processor that can leak information by speculatively executing all shadow instructions. Assuming that the fragment is large enough to fill the processor’s speculative window, the second model represents the worst-case scenario from a defense point of view, where all potential speculative memory operations must be protected.

On a real processor, the worst-case scenario is not usually possible: the processor may not be able to execute all the speculative instructions. For example, peculiarities of the program may prevent some cache misses, and inter-instruction dependencies may limit the ability of the processor to proceed with speculative execution.

We can generate different observational models by removing some of the shadow observations. We say that model $M_{1}$ refines model $M_{2}$ if any pair of states that is $\sim_{M_{2}}$ is also $\sim_{M_{1}}$ . For example, it is easy to show that if model $M_{1}$ is obtained by removing a shadow observation of model $M_{2}$ , then $M_{1}$ refines $M_{2}$ . This forms a lattice of models, where the bottom corresponds to the original program and the top corresponds to the program with all shadow observations. Navigating this lattice can be used to identify which shadow observation (i.e., speculative instruction) causes the leakage and must be protected. We adopt this lattice structure to guide our optimization of slh hardening. Intuitively, for a given program and on a specific processor, there is a hardened copy that is produced conservatively and fully protects against transient leakages. On the opposite side, we have a program that is not hardened and is vulnerable to transient leakages. Between these two, we can find many other partially hardened programs. Our technique is to walk on this lattice, from the fully hardened version toward the unprotected one, to find a version of the program that is hardened with a maximally permissive set of fences, i.e., one where removing any additional hardening would lead to leakage of secret data in some form.
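One simple way to realize such a walk is a greedy descent that tries to drop one hardening point at a time. The sketch below is an assumption about how the walk could be organized, not a description of the actual compiler pass: `leaks` is a hypothetical oracle that stands for running the relational tests on hardware against a partially hardened binary, and hardening points are identified by line numbers.

```python
def minimize_hardening(points, leaks):
    """Greedily drop hardening points while the oracle reports no leakage."""
    kept = set(points)
    for p in sorted(points):
        candidate = kept - {p}
        if not leaks(candidate):  # still secure without p, so p is unnecessary
            kept = candidate
    return kept


# Toy oracle: on this (imaginary) core only the load at line 12 must be masked.
required = {12}
leaks = lambda hardened: not required <= hardened

print(minimize_hardening({12, 24, 30}, leaks))
```

If the oracle is monotone (adding hardening never introduces leakage), the result has the property stated above: removing any remaining point makes the oracle report a leak. It is not guaranteed to be the smallest such set overall.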

3.5. Classification aware tests

We would also like to avoid introducing protections against leakage of variables that are already public. For example, without knowing the classification of variables for the running example, we should consider the load at line 24 of the -O0 compilation potentially insecure, since it may leak the value of $idx$. However, if $idx$ is public, we should avoid generating experiments where the index differs in $s_{1}$ and $s_{2}$, since such experiments are useless: any difference in cache footprint may depend only on public information. Therefore, we allow users to configure the classification of variables and add the constraint $s_{1}=_{L}s_{2}$ to the relation generated by Scam-V. We achieve this by allowing the user to add arbitrary initial observations to the program translated into the BIR language. Since Scam-V generates only pairs of states that are observationally equivalent, adding an initial observation for each public variable restricts Scam-V to generating only pairs of states that are low-equivalent.

4. Implementation

Scam-V (available at https://github.com/FMSecure/HolBA/tree/dev_scamv_spec) is developed as a part of HolBA (Lindner et al., 2019) using HOL4’s meta language SML. We have extended and modified Scam-V’s pipeline implementation to (1) extend coverage of transient execution vulnerabilities; (2) analyze binaries produced by common compilers; (3) tailor countermeasures to specific microarchitectures.

4.1. angr integration

The internal symbolic execution engine of Scam-V does not scale even to mid-size programs (more than ten instructions), and infeasible paths are not pruned in its execution tree. These limitations render Scam-V impractical for analyzing real-world programs; e.g., the Spectre-PHT gadget extracted from OpenSSL in Sec. 5.2 consists of a series of conditional statements that cannot be handled by Scam-V’s symbolic execution. To resolve these issues, we have integrated Scam-V with angr (Shoshitaishvili et al., 2016), a state-of-the-art binary analysis framework. This required several changes, ranging from developing a new interface for communication between Scam-V and angr to modifying Scam-V’s pipeline to work with the symbolic execution tree generated by angr.

4.1.1. BIR to VEX transpilation

To outsource symbolic execution to angr, we have implemented a translation from Scam-V’s intermediate language BIR to VEX, which is a representation used by the angr internal analysis passes. On the other hand, the output of angr symbolic execution is the list of observations and the set of path constraints, which are expressed as Claripy abstract syntax trees (Claripy is a Python library for constraint-based symbolic execution). Therefore, to transfer back the results of the angr symbolic execution, we also had to implement a translation from Claripy in angr into BIR. Interfacing Scam-V and angr further required extending BIR to support missing VEX constructs like bitstring concatenation.

4.1.2. Handling observations in angr

In contrast to BIR, VEX lacks support for observations that we need in our analysis. To compensate for this shortcoming, the translation module from BIR to VEX replaces BIR observations with an angr system call, a feature of angr to modify the symbolic state, handle library calls, etc. We have implemented a specific angr system call handler to process observations. The handler takes as input the state elements we want to observe, like memory address or register number, and updates the list of observations in the symbolic state for the running path.

4.1.3. Simulating speculative execution in angr

We have also used system calls to simulate speculative execution up to a parameterized depth $d$, reminiscent of the processor speculation depth, in the angr symbolic execution. As shown in Fig. 4, we use two system calls to mark where the speculative execution begins and ends. The first system call saves the current state using the global plugin feature of the angr symbolic execution, which maintains the program state across multiple execution paths. Then we jump into the code fragment that is supposed to run speculatively. Having reached the specified depth $d$, we use the second system call to collect the transient observations and the path constraints and then context-switch to the normal execution by restoring the program state prior to the start of speculative execution.

4.1.4. Concretization of memory accesses

angr is a static and dynamic symbolic (a.k.a. concolic) executor based on the Z3 solver. It trades soundness for performance by adopting the partial memory model of Mayhem (Cha et al., 2012) to scale to large codebases. Using this model, all symbolic pointers used in memory store operations are concretized by making a query to Z3. The addresses of load operations, however, are either kept symbolic or concretized, depending on the size of the contiguous interval of their possible values.

Concretization makes symbolic execution more efficient. Yet, to build a generic equivalence relation in the BIR language that can be used to generate multiple test cases by querying an SMT solver, we need a strategy to generalize from concretized memory locations, i.e., to map concrete values back to the corresponding BIR symbolic expressions. For memory addresses, angr keeps a mapping (in the path predicates) from concrete values to the corresponding symbolic expressions, which facilitates this reconstruction.

Nevertheless, the existing angr concretization strategy is insufficient for our analysis. For example, for load operations, angr queries Z3 for the minimum and maximum value of memory addresses under the current path conditions, but our analysis expects one specific valuation of addresses. If the concretization fails, angr marks the memory as unconstrained, which also breaks our analysis. Moreover, since angr uses a just-in-time style strategy to concretize symbolic expressions, if the concretized values invalidate an assertion that comes later on a path, angr prunes the path rather than restarting the concretization. Case #1 in Fig. 5 is an example where concretizations of two memory addresses are consistent with each other, but a following alignment check makes the path unsatisfiable.

Moreover, angr uses a naïve strategy to implement concretization, and it always produces a new value for symbolic expressions it encounters. This, however, usually breaks the consistency of values assigned to a specific expression. Finally, the angr concretization is not collision free, which may cause different symbolic expressions to be mapped to the same value. Case #2 in Fig. 5 is an example showing how x0 and sp registers are mapped to the same value by the angr concretization. Given these problems, to ensure the soundness of our approach, we had to develop an efficient concretization strategy that does not suffer from these limitations.

New concretization strategy: For a symbolic memory address $a_{i}$ and a path constraint $\phi_{i}$ associated with $a_{i}$ , the concretization of $a_{i}$ is performed by submitting a query $q_{i}$ that is constrained with $\phi_{i}$ to the SMT solver. The obtained concrete value $s_{i}$ is assigned to $a_{i}$ and added as a new constraint to the path. This will ensure that solutions for the subsequent queries will be consistent with those obtained in the previous steps. Additional constraints are also provided to ensure that solutions do not collide with each other. More formally, if a solution $s_{i}$ is obtained for a path constraint $\phi_{i}$ associated with memory access $(a_{i},\phi_{i})$ in the sequence of queries submitted to the SMT solver, then $s_{i}$ does not match any previous solution $s_{j}$ obtained for a path constraint $\phi_{j}$ associated with memory access $(a_{j},\phi_{j})$ with $j<i$ .

We repeat the concretization each time a memory access is encountered during the symbolic execution. We do not query the solver for memory addresses that have already undergone concretization. Also, a record of all performed concretizations is maintained in order to retrieve old solutions.

If a solution cannot be found for a query, then the SMT solver will be asked to find a solution for all the memory accesses reached up to that point with one single query. If such a solution is found, the concretization of the memory access for the path that fails is updated accordingly and the symbolic execution is restarted from the beginning. To avoid already explored paths, we add the path constraints of the failed path to the initial state. If no solution is found, the concretization of the program is deemed impossible and the path will be pruned from the execution.
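The following sketch illustrates the incremental, collision-free concretization described above. It is a simplified model under several stated assumptions and not the Scam-V implementation: the SMT solver is replaced by a hypothetical brute-force search (`solve`) over a small integer domain, symbolic addresses and path constraints are plain Python functions over an environment of register values, and the restart of symbolic execution after a failed query is reduced to recomputing all concretizations with a single joint query.

```python
from itertools import product


def solve(variables, constraints, domain):
    """Brute-force stand-in for an SMT query (assumption: small finite domain)."""
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(c(env) for c in constraints):
            return env
    return None


def concretize(accesses, variables, domain):
    """accesses: list of (address_expression, path_constraint) in execution order.

    Returns one concrete address per access, or None if the path must be pruned.
    """
    solved = []   # record of (address_expression, concrete_value)
    path = []     # path constraints accumulated so far
    for addr, phi in accesses:
        path.append(phi)
        # Earlier solutions are pinned so that later queries stay consistent.
        pinned = [lambda e, a=a, v=v: a(e) == v for a, v in solved]
        # The new address must not collide with any earlier solution.
        fresh = [lambda e, a=addr, v=v: a(e) != v for _, v in solved]
        env = solve(variables, path + pinned + fresh, domain)
        if env is not None:
            solved.append((addr, addr(env)))
            continue
        # Fallback: one joint query over all accesses reached so far.
        addrs = [a for a, _ in solved] + [addr]

        def distinct(e, addrs=addrs):
            return len({a(e) for a in addrs}) == len(addrs)

        env = solve(variables, path + [distinct], domain)
        if env is None:
            return None  # concretization impossible: prune the path
        solved = [(a, a(env)) for a in addrs]
    return [v for _, v in solved]


# Hypothetical path: two loads followed by a load guarded by an alignment check.
accesses = [
    (lambda e: e["x0"], lambda e: True),
    (lambda e: e["x1"], lambda e: True),
    (lambda e: e["x1"] + 4, lambda e: e["x1"] % 8 == 0),
]
print(concretize(accesses, ["x0", "x1"], range(32)))  # [0, 8, 12]
```

In this example the first two addresses are initially concretized to 0 and 1; the alignment constraint on the third access contradicts the second choice, so the joint query replaces the earlier solutions instead of pruning the path.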

4.2. Optimizing the LLVM’s slh hardening

Selective SLH

Our slh optimization algorithm takes as input a version of the program that is fully hardened. We enumerate the introduced hardening in the program by changing the LLVM pass. Then Scam-V tries to find hardenings that are not required based on the leakage lattice of Sec. 3.4 when the program executes on a specific platform. Essentially, our algorithm removes the slh hardening (or poisoning) of loads from top to bottom to keep only those that are essential to stop the leakage. Masks introduced by slh are independent of each other, therefore removing fences with their corresponding mask operation does not affect the subsequent fences. Yet, to ensure the soundness of this optimization our algorithm does not remove instructions that perform taint tracking, which is essential for slh. Algo. 1 presents our optimization algorithm. In Algo. 1, “ $\mathit{hasSideChannelLeakage}$ ” invokes Scam-V to test the program under test for the existence of any leakages.

Algorithm 1 Scam-V Selective SLH

Input: program $P$ hardened using LLVM slh

Output: program $P^{\prime}$ with an optimized number of hardenings

1: procedure SelectiveSLH($P$)

2:   $P^{\prime}\leftarrow P$ {Initialize $P^{\prime}$ with the original program}

3:   $EnH\leftarrow\mathit{enumerateHardening}(P^{\prime})$ {List all load hardenings}

4:   for $i\leftarrow 1$ to $\mathit{length}(EnH)$ do

5:     $\mathit{RemoveHrd}(P^{\prime},EnH[i])$ {Remove the $i$-th hardening}

6:     if $\mathit{hasSideChannelLeakage}(P^{\prime})$ then

7:       $\mathit{insertHrd}(P^{\prime},EnH[i])$ {Reinsert the $i$-th hardening}

8:   return $P^{\prime}$
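As an illustration, the greedy loop of Algo. 1 can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions, not the Scam-V implementation: the program is abstracted as the set of hardenings it still contains, and the leakage test is a caller-supplied function standing in for a Scam-V test campaign on the target board. The names `selective_slh`, `toy_oracle`, and the hardening labels are hypothetical.

```python
def selective_slh(hardenings, has_side_channel_leakage):
    """Greedy removal of load hardenings, following the structure of Algo. 1.

    hardenings: ordered list of identifiers of the load hardenings in the
        fully hardened program (top to bottom).
    has_side_channel_leakage: callable that takes the set of hardenings still
        in place and returns True if a leak is observed (hypothetical oracle).
    """
    kept = set(hardenings)              # P' starts as the fully hardened P
    for h in hardenings:
        kept.discard(h)                 # remove the i-th hardening
        if has_side_channel_leakage(kept):
            kept.add(h)                 # leakage observed: reinsert it
    return kept


# Toy oracle: only the hardening of the second load is needed on this platform.
def toy_oracle(kept):
    return "load_B" not in kept


print(selective_slh(["load_A", "load_B", "load_C"], toy_oracle))  # {'load_B'}
```

The number of oracle calls is linear in the number of hardenings, which matters because each call corresponds to a full round of hardware experiments.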

4.3. Experiment setup

We have run our test cases, each consisting of a program and two inputs, under seven different cache configurations, starting with an empty cache in the first iteration to mimic a cold start. However, for the subsequent iterations, we replicate a more realistic execution environment to account for the effects of cache hits and misses by constructing a cache state based on its content after the first iteration and randomly evicting cache lines. This is to ensure that speculative execution does not get trapped in other loads in the mispredicted branch and, with a higher probability, can reach interesting loads that leak secret data.

To ensure the consistency of our results for each cache configuration, we executed each experiment ten times, and we used the same cache state for runs starting from the two inputs. Unless all these ten executions give the same result, the experiment is classified as inconclusive. In order to resolve such cases, we inspect the cache state after each iteration and we keep the count of valid cache lines, which are populated by the program. Based on the collected data, a counterexample happens when a cache line was present in 70% of ten iterations in the cache state of one run, and it never appeared in the cache for the other run. In case all valid cache lines were present in the cache state of both runs at least once, we mark the experiment as conclusive when the total number of cache lines that were present in the cache state of both runs is at least 80% of the total number of all valid cache lines in both runs. We have chosen this threshold based on our statistical analysis to exclude outliers.
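The counterexample rule for repeated executions can be sketched as follows. This is an illustrative reading of the rule, not the code used in our framework: the function name, the data layout (one set of valid cache lines per iteration and per input), and the interpretation of "present in 70%" as "present in at least 70%" of the iterations are assumptions. The 80% rule for conclusive experiments is not modeled.

```python
from collections import Counter


def is_counterexample(run1, run2, threshold_pct=70):
    """run1, run2: lists with one set of valid cache lines per iteration,
    obtained from the two inputs of a test case.

    Returns True if some cache line appears in at least threshold_pct percent
    of the iterations of one run and never in the other (assumed reading).
    """
    n = len(run1)
    assert n == len(run2), "both runs must have the same number of iterations"
    count1 = Counter(line for iteration in run1 for line in set(iteration))
    count2 = Counter(line for iteration in run2 for line in set(iteration))

    def one_sided(a, b):
        # Integer arithmetic avoids floating-point rounding at the threshold.
        return any(cnt * 100 >= threshold_pct * n and b[line] == 0
                   for line, cnt in a.items())

    return one_sided(count1, count2) or one_sided(count2, count1)


# Hypothetical data: line 0x40 is cached in 8 of 10 iterations for input 1 only.
run_a = [{0x40}] * 8 + [set()] * 2
run_b = [set()] * 10
print(is_counterexample(run_a, run_b))  # True
```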

5. Evaluation

Case		#Exp	Cortex-A53											Cortex-A72
Case		#Exp	#C	#I	#SLH	#OpSLH	ExT			SLHExT			OSLHExT	#C	#I	#SLH	#OpSLH	ExT			SLHExT			OSLHExT
01	-O0	500	0	0	6	/	929	928	0.58	932	931.57	0.79	/	241	259	6	0	864	862.29	1.60	948	946	1.29	966	963.57	1.51
01	-O2	500	0	0	4	/	688	688	0.0	904	903.29	0.49	/	477	4	4	1	686	685.42	0.53	962	949.71	6.85	962	943.14	14.36
02	-O0	500	0	0	7	/	920	919.43	0.53	941	940.43	0.53	/	0	20	7	/	976	972.43	2.51	955	952	1.63	/
02	-O2	500	0	0	4	/	688	688	0.0	906	904.86	0.69	/	477	5	4	1	686	685.43	0.53	964	945.43	10.63	974	948	18.64
03	-O0	500	0	0	8	/	1183	1183	0.0	1459	1459	0.0	/	0	0	8	/	1305	1304.71	0.49	1565	1564.57	0.53	/
03	-O2	500	0	0	4	/	917	916.57	0.53	1036	1035.86	0.38	/	28	51	4	1	935	934.14	0.38	1063	1059	3.87	938	937.29	0.76
04	-O0	500	0	0	6	/	929	928.29	0.49	932	930.86	0.69	/	37	52	6	0	866	862	2.24	1063	1058.14	4.45	965	962.71	1.70
04	-O2	156	0	0	4	/	793	792.43	0.53	907	906.43	0.79	/	122	20	4	1	800	799.86	0.38	863	860.57	1.72	977	941.86†	19.19
05	-O0	500*3	0	0	9	/	1272	1271.71	0.49	1437	1436.43	0.53	/	0	0	9	/	1150	1147	2.08	1548	1548	0.0	/
05	-O2	500*2	0	0	4	/	794	793.14	0.38	1043	1042.43	0.53	/	349	245	4	0	820	817	1.91	992	989.86	1.07	850	848.57	1.13
06	-O0	500	0	0	7	/	932	931	0.58	1055	1053.71	0.76	/	0	0	7	/	936	934.86	0.90	1174	1173.29	0.49	/
06	-O2	500	0	0	4	/	793	793	0.0	1048	1047.14	0.38	/	482	0	4	1	819	817.14	1.35	985	982.85	1.21	827	825.43	1.13
07	-O0	500*2	0	0	9	/	1308	1308	0.0	1187	1186.43	0.53	/	330	25	9	0	1336	1335	0.82	1172	1170.71	0.95	1339	1337.14	1.21
07	-O2	500*2	0	0	5	/	784	783.71	0.49	1163	1162.43	0.53	/	498	0	5	1	807	805.71	1.11	1037	1036.57	0.53	1173	1172.14†	0.69
08	-O0	1000	0	0	7	/	912	911.29	0.49	963	961.71	0.76	/	0	43	7	/	871	867.29	1.98	1008	1007	0.58	/
08	-O2	/	/	/	/	/	/			/			/	/	/	/	/	/			/			/
09	-O0	/	/	/	/	/	/			/			/	/	/	/	/	/			/			/
09	-O2	/	/	/	/	/	/			/			/	/	/	/	/	/			/			/
09v2	-O0	/	/	/	/	/	/			/			/	/	/	/	/	/			/			/
09v2	-O2	/	/	/	/	/	/			/			/	/	/	/	/	/			/			/
10	-O0	500*3	0	0	7	/	1173	1173	0.0	1236	1236	0.0	/	0	1	7	/	1065	1063.57	0.79	1185	1184.29	0.49	/
10	-O2	500*3	0	0	4	/	601	601	0.0	965	964.86	0.38	/	0	155	4	/	696	696	0.0	1023	1014.86	7.03	/
11gcc	-O0	500	0	0	18	/	2046	2045.57	0.53	2733	2732.29	0.49	/	0	0	18	/	2162	2160.86	0.69	2688	2687.29	0.95	/
11gcc	-O2	500	0	0	4	/	789	788.43	0.53	1051	1050.86	0.38	/	479	1	4	1	819	816.86	1.57	1045	1044.71	0.49	865	863.14	1.34
11ker	-O0	500*2	0	0	17	/	2183	2182.14	0.38	2760	2760	0.0	/	0	0	17	/	2162	2160.71	0.95	2702	2700.71	0.76	/
11ker	-O2	500	0	0	4	/	688	688	0.0	906	905.43	0.79	/	476	5	4	1	686	685.86	0.38	964	944.57	15.11	966	948.43	12.99
11sub	-O0	500	0	0	22	/	2076	2075.14	0.38	2720	2719.71	0.49	/	0	0	22	/	2060	2059.71	0.49	2565	2563.86	1.21	/
11sub	-O2	500	0	0	4	/	688	688	0.0	906	904.71	0.95	/	476	2	4	1	686	685.43	0.53	962	945.14	12.43	966	945.57	10.10
12	-O0	500	0	0	8	/	937	936.43	0.53	1176	1176	0.0	/	0	0	8	/	907	905.86	0.90	1142	1139.86	2.27	/
12	-O2	500	0	0	4	/	703	703	0.0	834	834	0.0	/	345	80	4	1	812	812	0.0	863	861.29	1.38	938	937†	1.15
13	-O0	500	0	0	8	/	1012	1012	0.0	1434	1434	0.0	/	0	0	8	/	1167	1165.57	1.13	1325	1324.71	0.49	/
13	-O2	500	0	0	4	/	688	688	0.0	905	904.29	0.76	/	477	5	4	1	686	685.28	0.49	958	945.85	11.58	966	939.42	14.46
14	-O0	500	0	0	6	/	929	928.57	0.53	931	930.86	0.38	/	1	5	6	0	866	861.57	2.64	1057	1053.86	2.12	964	961.86	1.95
14	-O2	500	0	0	4	/	793	792.43	0.53	906	905.29	0.49	/	500	0	4	1	800	800	0.0	862	860.57	0.98	960	947.86†	14.15
14v2	-O0	500	0	0	8	/	931	930.71	0.49	1186	1185.86	0.38	/	0	0	8	/	1039	1027.71	6.80	1312	1310.86	0.69	/
14v2	-O2	500	0	0	4	/	792	791.43	0.53	1047	1047	0.0	/	0	0	4	/	819	816.71	1.98	1045	1043	1.41	/
SiSCloak	-O0	500	0	0	3	/	1024	1024	0.0	1145	1144.43	0.53	/	422	0	3	/	/			/			/
SiSCloak	-O2	500	488	0	1	/	579	579	0.0	798	798	0.0	/	488	0	1	/	/			/			/

Table 1. Analysis of collected microbenchmarks (Kocher, 2018; Guarnieri et al., 2020; Cauligi et al., 2020). Abbreviations in the table: Exp: number of experiments (for some cases, the multiplication shows how many times the experiment is repeated with different ways of training the branch predictor), C: counterexamples, I: inconclusive cases, SLH: hardenings inserted by slh, OpSLH: hardenings retained by Scam-V, ExT, SLHExT, and OSLHExT: execution time in CPU cycles of the unprotected, slh-hardened, and optimized slh-hardened program, respectively. Each execution time is given as three values, in this order: the maximum, the average, and the standard deviation. Highlighted rows are discussed in Section 5.1.5.

† Table 1 presents some counterintuitive results, where the optimized code exhibits worse performance than the non-optimized one. Further experiments showed that these results are caused by changes in alignment. In fact, replacing unnecessary hardening with NOP instructions results in performance comparable to the standard slh.

To show the effectiveness of Scam-V in optimally protecting programs against Spectre-PHT, we have conducted several experiments on two widely used ARMv8 boards, Raspberry Pi 3 and 4. First, we have evaluated Scam-V on a suite of benchmarks that are used by Kocher (Kocher, 2018) and others (Guarnieri et al., 2020; Cauligi et al., 2020; Mosier et al., 2022b) to analyze and mitigate transient execution attacks (Cauligi et al., 2020; Mosier et al., 2022c; Guarnieri et al., 2020). Second, we used Scam-V to analyze the vulnerability of the OpenSSL library on AArch64 processors and optimally harden it to stop the leakages found. For this experiment, we have only analyzed fragments of OpenSSL that are reported to be vulnerable to transient execution attacks in related literature (Mosier et al., 2022a). Our experiments run as bare-metal code, i.e., no operating system or other background processes exist. Therefore, our experiments represent the worst-case scenario where no defense is in place and the attacker can inspect the cache state after execution directly using available hardware instructions.

Evaluation boards

Raspberry Pi 3 and 4 use Cortex-A53 and Cortex-A72 processors, resp. Cortex-A53 is an 8-stage pipelined processor with a 2-way superscalar, in-order execution pipeline. Cortex-A72, in contrast, is a 15-stage pipelined ARMv8 core with a 3-way superscalar, out-of-order execution pipeline. Both processors support speculative execution based on control flow prediction. However, while ARM Ltd. declared Cortex-A72 vulnerable to Spectre-PHT, the vulnerability of Cortex-A53 to transient execution attacks was demonstrated only recently (Nemati et al., 2020a). Scam-V uses a special module in TrustZone to run experiments. The module sets up memory types (e.g., the cacheability of memory regions), configures the cache’s initial state, and probes the cache state after the execution of programs. In a real attack scenario, an attacker can use the performance monitor counter (PMC) for timing analysis. We also used the PMC to evaluate the performance of our slh optimization.

5.1. Analyzing Spectre microbenchmarks

We have successfully analyzed 17 variants of Spectre and detected leakage in those that are vulnerable to Spectre-PHT on our evaluation processors. The only exceptions are cases #9 and its variation #9v2, which we could not analyze due to the limitation of our approach (Sec. 5.1.6 elaborates on this). Please note that specification-based testing and the use of observation refinement in Scam-V helped us to reduce the number of required test cases in our experiments.

We used clang with optimization levels -O0 and -O2 to produce the binaries of the microbenchmarks. We repeated the same experiments on both RPi boards to check the vulnerability of the programs on these platforms and to protect them optimally. We identified several cases where compilers unnecessarily hardened programs. This provides opportunities for optimization without sacrificing security. Table 1 summarizes the results of our experiments for the benchmark programs. Our evaluation is done under the assumption that the code is executed standalone and no other code or vector of attacks is allowed. Note that the microbenchmarks represent different versions of Spectre-PHT at the source code level. However, when compiling with optimization -O2 enabled, sometimes the same binaries are produced for different cases, e.g., Case #1 and Case #13.

Timing analysis

We measured time by reading the CPU cycle register of the PMC. To better evaluate the performance, we used inputs that cause programs to take the longest possible path and to go through as much hardening as possible. To minimize the effect of internal hardware noise that causes variations in CPU cycles, we ran the program under test 50000 times and computed the mean value over all iterations. We repeated each measurement seven times and computed the average, the standard deviation, and the maximum value of all the executions.
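The summary statistics reported for each configuration can be reproduced with a short sketch. The seven input values below are hypothetical (each would be the mean cycle count over 50000 runs), the function name `summarize` is illustrative, and the use of the sample standard deviation (dividing by n-1) is an assumption.

```python
import statistics


def summarize(cycle_means):
    """Return (maximum, average, standard deviation) of repeated measurements.

    cycle_means: one mean CPU-cycle count per measurement repetition.
    The sample standard deviation (n-1 denominator) is assumed here.
    """
    return (max(cycle_means),
            statistics.mean(cycle_means),
            statistics.stdev(cycle_means))


# Hypothetical results of seven measurement repetitions.
samples = [929, 929, 929, 929, 928, 928, 928]
maximum, average, std_dev = summarize(samples)
print(maximum, round(average, 2), round(std_dev, 2))  # 929 928.57 0.53
```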

5.1.1. Cortex-A53 experiments

None of the 17 cases from the Kocher benchmarks resulted in leakage on Cortex-A53, i.e., they need no protection. This may be attributed to the absence of register renaming and the short CPU pipeline, which prevent the result of a speculated load instruction from being used in subsequent operations. The only case that induces leakage on Cortex-A53 is the last case in Table 1, which is a variant of SiSCloak presented in (Buiras et al., 2021). Compared to Spectre-PHT, in SiSCloak only the load that leaks the secret (read from array B) is protected by the bound check, and the read from the public array (A) is moved before the if statement.

slh fails to protect against SiSCloak on Cortex-A72 as SiSCloak’s implementation relies on inlined assembly that is not considered in the slh protection model. Thus, we do not proceed with further performance analysis of SiSCloak. On Cortex-A53, slh could prevent the leakage, but this was solely due to the added tracking instructions that fill up the processor’s short speculation pipeline, causing the leaky load to not execute in speculation.

5.1.2. Cortex-A72 experiments

ARM officially confirmed that Cortex-A72 processors are vulnerable to Spectre attacks. However, our experiments highlight interesting findings, which we summarize here. While for most benchmarks compiled with optimizations -O2 enabled, Scam-V identified several counterexamples, it only found a few vulnerable cases when -O0 was used. This suggests that additional operations, including operations on the stack, which are introduced when the compiler optimizations are disabled, may invalidate the transient leakages. Also, Scam-V did not detect any leak for two benchmarks, namely Case #10 and Case #14, when compiled with clang using both -O0 and -O2. Scam-V successfully generated well-formed inputs to exploit transient execution. However, no counterexamples were identified on hardware, even though both cases are considered insecure in the literature (Guarnieri et al., 2020). This observation indicates that the properties of the underlying hardware can significantly influence the leakage potential of the code. This is because different architectures may execute the same high-level code using varying machine instructions and ordering, thereby affecting the programs’ security.

The other interesting case is Case #6. The snippet below shows (left) the unoptimized and unmodified output of clang.

For this version, we could not identify any leaks. However, replacing the jump instruction at address 400030, with a nop makes this program vulnerable to Spectre-PHT and Scam-V identified a few counterexamples. The replaced jump instruction does not change the program control flow and just moves the control to the next instruction in the program order. We conjecture that such jump instructions cause the processor to flush the instruction pipeline.

5.1.3. Effect of compiler optimizations

Except for Cases #1, #4, #7, and #14, all cases compiled with -O0 show no leakage. When a program is compiled with compiler optimizations disabled (i.e., using -O0), the produced assembly includes several unnecessary memory operations. For example, function arguments, such as the value of the index in the running example, are stored on (resp. loaded from) the stack. These additional memory operations can affect speculative execution by filling up the pipeline, which prevents leakage-inducing operations from executing in speculation.

5.1.4. Cache configuration effect

As discussed in Sec. 3.3, the state of microarchitectural features, like the data cache, can impact the success of Spectre attacks. We found evidence of this in our microbenchmarks. As an example, consider Case #1, which leaks data through a memory access like $B[A[idx]]$. When Case #1 is compiled with -O0, all 241 counterexamples found are achievable only when $A[idx]$ is in the cache. If $A[idx]$ does not hit the cache, fetching $A[idx]$ from the main memory causes a delay, which prevents the execution of $B[A[idx]]$ in speculation.

5.1.5. Security and performance

Cases without counterexamples do not need any protection; thus, no hardening optimization was required. For those cases in which Scam-V identified counterexamples, we instead synthesized a minimal set of hardenings needed to protect against data leaks. In particular, we found an improvement in performance in Cases #3, #5, #6, and #11gcc when slh hardening is optimized. Furthermore, we evaluated the security of AArch64 slh and our slh optimization by re-executing the counterexamples found for the unprotected program to ensure that the leakage has been mitigated.

Our experiments highlighted a few cases where our optimization pass removes all protections: Cases #1, #4, #5, #7, and #14. Further analysis revealed that the slh hardening changes the code alignment in memory, which affects speculative leakage. Code alignment may influence a program’s behavior in several ways, such as branch prediction accuracy, memory access patterns, and instruction decoding and dispatching. For example, if a branch instruction crosses cache line boundaries, this might affect branch prediction accuracy. Changing memory access patterns could affect, e.g., data prefetching, thus impacting the cache hit rate. Also, code alignment may increase the latency of instruction decoding and dispatching, potentially impacting speculative execution. Notice that code alignment w.r.t. cache lines is preserved by the majority of ASLR implementations, since randomization is usually done at page-level granularity.

5.1.6. Analysis limitations

Our approach has some limitations:

Branch predictor training

To mount Spectre-PHT, we need at least one conditional branch in the program. One path through this branch is needed to generate an input that trains the branch predictor, and the opposite path is used to generate inputs that exploit the misprediction. In Case #9_1, however, we could get only one path from angr because constant propagation leads to pruning of the other path. Similarly, Scam-V could not test Case #9_2 because of path pruning. The problem arises from using a pointer in the branch condition, which is retrieved from the stack pointer through a memory load. As we constrain every memory access to be within a specific memory region, the condition will never become false, and therefore the path is discarded. Case #8 compiled with -O2 is also excluded from the analysis as it lacks a conditional branch and is therefore not vulnerable to Spectre-PHT.

Misspeculation and observation refinement

The other limitation of Scam-V is associated with applying the observation refinement, as discussed in Sec. 3. By default, misspeculation is expected to trigger at the first conditional branch and to continue at all subsequently encountered conditional branches. Thus, our refinement approach negates all branch conditions to make memory operations that can be executed in misspeculation observable. However, this does not always work (e.g., Case #11* and #13), as some attacks do not follow this pattern. To overcome this limitation and ensure that Scam-V can analyze cases in which potential memory accesses are not always in the opposite branch, we had to decide manually which if-statement conditions must be negated.

Loops in symbolic execution

Handling infinite loops is challenging in symbolic execution. In our experiments, only Case #5 contains an infinite loop, which we handle by performing a one-time unrolling and back-edge cutting of the loop at the LLVM level. For this specific case, one iteration of the loop body was enough to detect the leakage. For all other cases which contain a finite loop, we have performed loop unrolling. The same problem can also be addressed by introducing a precondition to constrain variables involved in the loop condition to a specific range.

5.1.7. More details on Case #1 and #10

Ex.		#Exp	Cortex-A53											Cortex-A72
Ex.		#Exp	#C	#I	#H	#OpH	ExT			HExT			OpHExT	#C	#I	TypeH	#H	#OpH	ExT			HExT			OpHExT
01	-O0	500	0	0	6	/	928	927.57	0.53	932	931	0.82	/	464	34		6	1	863	860.29	1.80	948	946.71	1.38	1042	1024.14	14.71
01	-O2	500	0	0	4	/	688	688	0.0	908	906.14	1.07	/	500	0	aSLH	4	1				866	861.71	2.29	944	937.43	6.45
	-O2	500	0	0	4	/			0.0			1.07	/	500	0	FI	1	/	686	685.71	0.49	999	998.29	0.49	/
10	-O0	500*3	0	0	7	/	936	935.29	0.76	1331	1331	0.0	/	602	0		7	1	915	912.86	1.77	1334	1332.29	1.98	1091	1089.43	1.13
10	-O2	500*3	0	0	4	/	699	698.57	0.53	836	835.86	0.38	/	620	216	aSLH	4	1				1031	1029.57	0.98	803	802.43	0.53
	-O2	500*3	0	0	4	/			0.53			0.38	/	620	216	FI	1	/	733	731.29	1.11	1049	1046.43	1.51	/
SSL_get_shared_sigalgs	-O0	105*5	0	0	35	/	2419	2418.43	0.53	3686	3685.43	0.53	/	0	0		35	/	2362	2361	1.41	3390	3382.14	4.91	/
SSL_get_shared_sigalgs	-O2	85*6	0	0	9	/	1190	1190	0.0	934	933.29	0.49	/	78	216	aSLH	9	1	1020	1018.57	0.98	1094	1093.29	0.49	1090	1089.43	0.53

Table 2. Analysis of specific cases for a private array index (i.e., idx) and Spectre-PHT gadgets from OpenSSL. The TypeH column indicates the type of applied hardening: none (default LLVM slh), aSLH (poisoning on memory addresses), FI (dsb + isb).

We have investigated Case #1 (the prime Spectre-PHT example) and Case #10 (our running example in the paper) in more detail under different (i) compiler optimizations and (ii) when different security labels are assigned to the idx variable. We summarize our findings as follows.

Case #1

When idx is labelled as a public variable and the code is compiled with optimization level -O0, Scam-V detects the leakage and refines the slh hardening. However, all protections are ultimately removed due to the effect of code alignment changes, as discussed in Sec. 5.1.5. When idx is labelled as private and the code is compiled with optimization -O0, Scam-V can also detect the leakage. Subsequently, Scam-V refines the slh protection to retain only the hardening of the vulnerable load. However, this refinement did not result in a performance improvement.

Similarly, with a public idx and the optimization level set to -O2, Scam-V identifies the leakage and refines the slh protection to retain only those hardenings that are essential for the vulnerable load. Notably, our refinement did not improve performance in this case either. Finally, when idx is labelled as private and the code is compiled with -O2 enabled, Scam-V detects the leakage. However, the LLVM slh implementation fails to prevent leakage. As a result, we employed alternative hardening methods for the program: slh on memory addresses (aSLH; we borrowed the terminology from (Zhang et al., 2023)) and fence insertion. Refining aSLH did not improve performance. In the latter case, the speculative barrier formed by dsb and isb increased the execution time.

Case #10

When idx is labelled as a public value, and the code is compiled with optimization level -O0, Scam-V does not identify any leakage. Thus, there is no need to protect the code with slh. However, when idx is labelled as private and the code is compiled with optimization -O0, Scam-V successfully detects the leakage. Subsequently, Scam-V refines the slh protection to retain only the essential hardening for the vulnerable load. Notably, our slh refinement enhances performance in this scenario.

With a public idx and the optimization level set to -O2, Scam-V does not detect any leakage, making hardening unnecessary. Finally, when idx is labelled as private and the code is compiled with -O2, Scam-V identifies the leakage. However, slh does not provide effective protection against it. Therefore, alternative countermeasures, namely refined aSLH and fence insertion, are employed. In this scenario, refining aSLH results in a performance improvement.
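The fence-insertion (FI) countermeasure used for the private-idx, -O2 configurations can be sketched in C as follows. This is a minimal illustration, assuming GCC/Clang inline-assembly syntax; the names `speculation_barrier`, `table1`, `table2` and `gadget_fenced` are hypothetical, and the `sy` option of dsb is an assumption, since the text only states that the barrier consists of dsb followed by isb. On non-AArch64 targets the sketch falls back to lfence (x86-64) or to a plain compiler barrier, the latter of which does not stop hardware speculation.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_LEN 16u

static uint8_t table1[TABLE_LEN];
static uint8_t table2[256 * 64];

/* Speculation barrier placed after the bounds check. */
static inline void speculation_barrier(void) {
#if defined(__aarch64__)
    /* dsb + isb, as in the FI hardening; the "sy" option is an assumption. */
    __asm__ volatile("dsb sy\n\tisb" ::: "memory");
#elif defined(__x86_64__)
    __asm__ volatile("lfence" ::: "memory");
#else
    /* Compiler barrier only: NOT a hardware speculation barrier. */
    __asm__ volatile("" ::: "memory");
#endif
}

/* Bounds-check-bypass gadget with a fence between the check and the loads,
 * so the loads are not issued before the branch condition is resolved. */
uint8_t gadget_fenced(size_t idx) {
    uint8_t out = 0;
    if (idx < TABLE_LEN) {
        speculation_barrier();
        out = table2[(size_t)table1[idx] * 64];
    }
    return out;
}
```

Unlike slh, which adds a few data-dependent masking instructions, the barrier is paid on every execution of the guarded block, which matches the increased execution time reported for FI in Case #1.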

5.2. Analyzing crypto libraries

To show that Scam-V scales to real-world case studies, we used Spectre-PHT gadgets in OpenSSL v3.1.0 that had been discussed in (Mosier et al., 2022b). Among the three vulnerable gadgets, namely EVP_PKEY_asn1_get0, ts_check_status_info and SSL_get_shared_sigalgs, Scam-V identifies only SSL_get_shared_sigalgs (due to a leakage at line #44 in the snippet below) to be vulnerable on actual hardware (see Table 2). The other two did not trigger speculation because the operands of the speculation primitive (i.e., the branch condition) had already been loaded into the cache by the code executed before the if statement. To protect against the leakage in SSL_get_shared_sigalgs, we initially applied the default LLVM slh countermeasure. However, employing slh solely on loaded values did not mitigate the leakage. Subsequently, by applying slh to the memory load addresses, we could stop the leakage.

6. Related work

Several studies have tried to identify and mitigate transient execution attacks. We only discuss a few studies relevant to our results. For a comprehensive list of existing work, we refer the reader to (Xiong and Szefer, 2021; Cauligi et al., 2022).

Detecting Spectre Attacks

Widely used techniques for detecting Spectre-style vulnerabilities include symbolic execution and relational analysis (Guarnieri et al., 2020; Daniel et al., 2021; Nemati et al., 2020a), as well as fuzzing (Oleksenko et al., 2020, 2022), both at the machine code (Wang et al., 2021; Guarnieri et al., 2020; Cauligi et al., 2020; Daniel et al., 2021; Oleksenko et al., 2020; Cheang et al., 2019) and LLVM-IR levels (Wang et al., 2020; Wu and Wang, 2019; Guo et al., 2020). Yet, most existing tools either do not scale well or face qualitative limitations. For example, SpecFuzz (Oleksenko et al., 2020) simulates the execution of code fragments in misspeculated branches and uses input fuzzing to pinpoint programs’ vulnerability to Spectre attacks. However, it does not perform well for nested speculation and inherits the limitations of fuzzing, e.g., input coverage. Scam-V (Nemati et al., 2020a) uses instruction fuzzing and relational testing to synthesize test cases and check the vulnerability of modern processors to Spectre-PHT. A similar approach was used by Revizor (Oleksenko et al., 2022, 2023) to identify Spectre-PHT/STL (Store-to-Load forwarding). However, in contrast to Revizor, Scam-V utilizes observation refinement and symbolic execution to guide input generation and reduce the search space, thus requiring fewer test cases to uncover potential leakages (Buiras et al., 2021). Other symbolic execution-based approaches include Spectector (Guarnieri et al., 2020; Fabian et al., 2022) (detects Spectre-PHT/STL/RSB (Return Stack Buffer)), KLEESpectre (Wang et al., 2020) (detects Spectre-PHT), Pitchfork (Cauligi et al., 2020) (detects Spectre-PHT/STL), and BH (Daniel et al., 2021) (detects Spectre-PHT/STL), all of which have scalability limitations.

Software Mitigations

There is also a growing body of work that (formally) analyzes programs’ vulnerabilities and mitigates leakages using software measures (Barthe et al., 2021; Cauligi et al., 2020; Patrignani and Guarnieri, 2021; Guarnieri et al., 2020; Vassena et al., 2021; Guanciale et al., 2020; Guarnieri et al., 2021; Shivakumar et al., 2022). For example, oo7 (Wang et al., 2021) uses taint tracking to find Spectre-PHT attacks and inserts lfence to stop the leakage. Cauligi et al. (Cauligi et al., 2020) proposed Pitchfork based on the concept of speculative constant-time for speculative execution. However, while their theoretical developments suggest inserting fences to mitigate leakages, Pitchfork does not provide this in practice. InSpectre (Guanciale et al., 2020) offers an operational model to aid in reasoning about countermeasures and transient execution attacks. Patrignani and Guarnieri (Patrignani and Guarnieri, 2021) analyzed the effects of compiler transformations and countermeasures on speculative execution security. They showed that the existing slh mitigation in LLVM is inadequate for stopping Spectre-PHT leakage in programs and proposed a more powerful version of slh to prevent data leaks. Shivakumar et al. (Shivakumar et al., 2022) demonstrated the ineffectiveness of the LLVM primitives in mitigating Spectre-PHT and proposed a new slh variant to address the limitations of the existing LLVM slh mitigation. Blade (Vassena et al., 2021) employs a static type system to detect transient leakage and uses lfences or slh to mitigate the found leaks in constant-time WebAssembly. None of these mitigations is as easily deployable in an existing toolchain as LLVM’s lfence and slh mitigations. Mosier et al. (Mosier et al., 2022a) introduced leakage containment models (LCMs), which are axiomatic security contracts designed to formally model and automatically detect leakages in programs.

While some of these works use static analysis techniques to optimize the number of fences, e.g., (Wang et al., 2021; Mosier et al., 2022a), we are not aware of any work optimizing fence placement by consulting the hardware.

Hardware Mitigations

The research community has also proposed several hardware defenses against Spectre attacks. Hardware-level mitigations can be grouped into two main classes. The first class comprises techniques that hide the effect of speculative access instructions (Yan et al., 2019; Khasawneh et al., 2019; Sakalis et al., 2019; Saileshwar and Qureshi, 2019) by, e.g., introducing a speculative buffer (Yan et al., 2019) or shadow hardware structures to squash microarchitectural state changes if the processor mispredicts (Khasawneh et al., 2019). The second class includes techniques that leverage information flow tracking to block leakages by preventing data forwarding between speculatively executed access and transmitter instructions (Yu et al., 2019; Weisse et al., 2019; Loughlin et al., 2021).

Hardware-Software Co-design

Hardware-based mitigations require significant hardware modifications to deliver the promised security guarantees without sacrificing performance. As an alternative, some works propose a software-hardware co-design approach, e.g., (Taram et al., 2019; Koruyeh et al., 2020; Li et al., 2019). For example, Taram et al. (Taram et al., 2019) proposed context-sensitive fencing, which uses taint tracking to find the optimal location for inserting fences at the decoder level. They also make various speculative barriers available to software.

7. Concluding Remarks

We explored the necessity of hardenings introduced by the LLVM slh pass against Spectre-PHT by taking into account the properties of the underlying microarchitecture. Our experiments highlighted several interesting results. We showed that the vulnerability of programs to Spectre attacks and the required level of protection to stop potential leaks strictly depend on the properties of the underlying processor and the compiler optimization level. Additionally, we showed that there are unexpected factors (e.g., code alignment) that can impact the vulnerability of programs to side-channel attacks.

Scam-V’s current implementation only supports the ARM and RISC-V architectures, but porting it to other architectures such as x86 only requires extending the binary-to-BIR translation module. Moreover, we focused only on Spectre-PHT. Covering other variants such as Spectre-STL mainly requires developing new observation refinement techniques to synthesize an equivalence relation that can be used to generate suitable test cases and training data w.r.t. the variant under test.

Acknowledgments

This work was supported in part by a gift from Intel. We thank the anonymous reviewers for their valuable feedback during the review process.

References

Arm Limited (2013) Arm Limited. 2013. Cortex-M0+ Technical Reference Manual r0p0 (r0p0 ed.). Arm Limited, Cambridge, UK. https://developer.arm.com/documentation/ddi0432/latest/
Barthe et al. (2021) Gilles Barthe, Sunjay Cauligi, Benjamin Grégoire, Adrien Koutsos, Kevin Liao, Tiago Oliveira, Swarn Priya, Tamara Rezk, and Peter Schwabe. 2021. High-Assurance Cryptography in the Spectre Era. In 42nd IEEE Symposium on Security and Privacy, SP 2021, San Francisco, CA, USA, 24-27 May 2021.
Barthe et al. (2004) Gilles Barthe, Pedro R. D’Argenio, and Tamara Rezk. 2004. Secure Information Flow by Self-Composition. In 17th IEEE Computer Security Foundations Workshop, (CSFW-17 2004), 28-30 June 2004, Pacific Grove, CA, USA. 100–114. https://doi.org/10.1109/CSFW.2004.17
Buiras et al. (2021) Pablo Buiras, Hamed Nemati, Andreas Lindner, and Roberto Guanciale. 2021. Validation of Side-Channel Models via Observation Refinement. In MICRO ’21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Greece, October 18-22. https://doi.org/10.1145/3466752.3480130
Carruth (2018) Chandler Carruth. 2018. RFC: Speculative load hardening (a Spectre variant #1 mitigation). https://llvm.org/docs/SpeculativeLoadHardening.html. Accessed October 2022.
Carruth (2020) Chandler Carruth. 2020. Cryptographic software in a post-Spectre world. Talk at the Real World Crypto Symposium. https://chandlerc.blog/talks/2020_post_spectre_crypto/post_spectre_crypto.html. Accessed October 2022.
Cauligi et al. (2020) Sunjay Cauligi, Craig Disselkoen, Klaus v. Gleissenthall, Dean Tullsen, Deian Stefan, Tamara Rezk, and Gilles Barthe. 2020. Constant-Time Foundations for the New Spectre Era. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation.
Cauligi et al. (2022) Sunjay Cauligi, Craig Disselkoen, Daniel Moghimi, Gilles Barthe, and Deian Stefan. 2022. SoK: Practical Foundations for Spectre Defenses. (2022).
Cha et al. (2012) Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert, and David Brumley. 2012. Unleashing Mayhem on Binary Code. In IEEE Symposium on Security and Privacy, SP 2012, 21-23 May 2012, San Francisco, California, USA. 380–394. https://doi.org/10.1109/SP.2012.31
Cheang et al. (2019) Kevin Cheang, Cameron Rasmussen, Sanjit Seshia, and Pramod Subramanyan. 2019. A Formal Approach to Secure Speculation. In 2019 IEEE 32nd Computer Security Foundations Symposium (CSF).
Daniel et al. (2021) Lesly-Ann Daniel, Sébastien Bardin, and Tamara Rezk. 2021. Hunting the Haunter - Efficient Relational Symbolic Execution for Spectre with Haunted RelSE. In 28th Annual Network and Distributed System Security Symposium, NDSS 2021, virtually, February 21-25, 2021.
Fabian et al. (2022) Xaver Fabian, Marco Guarnieri, and Marco Patrignani. 2022. Automatic Detection of Speculative Execution Combinations. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS 2022, Los Angeles, CA, USA, November 7-11, 2022. 965–978. https://doi.org/10.1145/3548606.3560555
Guanciale et al. (2020) Roberto Guanciale, Musard Balliu, and Mads Dam. 2020. InSpectre: Breaking and Fixing Microarchitectural Vulnerabilities by Formal Analysis. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security.
Guarnieri et al. (2020) M. Guarnieri, B. Köpf, J. F. Morales, J. Reineke, and A. Sánchez. 2020. Spectector: Principled Detection of Speculative Information Flows. In 2020 IEEE Symposium on Security and Privacy (SP).
Guarnieri et al. (2021) Marco Guarnieri, Boris Köpf, Jan Reineke, and Pepe Vila. 2021. Hardware-Software Contracts for Secure Speculation. In 2021 IEEE Symposium on Security and Privacy.
Guo et al. (2020) Shengjian Guo, Yueqi Chen, Peng Li, Yueqiang Cheng, Huibo Wang, Meng Wu, and Zhiqiang Zuo. 2020. SpecuSym: Speculative Symbolic Execution for Cache Timing Leak Detection. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering.
Horn (2018) Jann Horn. 2018. Speculative execution, variant 4: Speculative store bypass. https://bugs.chromium.org/p/project-zero/issues/detail?id=1528
Hu et al. (2023) Guangyuan Hu, Zecheng He, and Ruby B. Lee. 2023. SoK: Hardware Defenses Against Speculative Execution Attacks. CoRR abs/2301.03724 (2023). https://doi.org/10.48550/arXiv.2301.03724
Khasawneh et al. (2019) Khaled N. Khasawneh, Esmaeil Mohammadian Koruyeh, Chengyu Song, Dmitry Evtyushkin, Dmitry Ponomarev, and Nael B. Abu-Ghazaleh. 2019. SafeSpec: Banishing the Spectre of a Meltdown with Leakage-Free Speculation. In Proceedings of the 56th Annual Design Automation Conference 2019, DAC 2019, Las Vegas, NV, USA, June 02-06, 2019. 60. https://doi.org/10.1145/3316781.3317903
Kocher (2018) Paul Kocher. 2018. Spectre Mitigations in Microsoft’s C/C++ Compiler. https://www.paulkocher.com/doc/MicrosoftCompilerSpectreMitigation.html.
Kocher et al. (2019) Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. 2019. Spectre Attacks: Exploiting Speculative Execution. In 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019. 1–19. https://doi.org/10.1109/SP.2019.00002
Koruyeh et al. (2018) Esmaeil Mohammadian Koruyeh, Khaled N. Khasawneh, Chengyu Song, and Nael B. Abu-Ghazaleh. 2018. Spectre Returns! Speculation Attacks using the Return Stack Buffer. 12th USENIX Workshop on Offensive Technologies (WOOT) (2018).
Koruyeh et al. (2020) Esmaeil Mohammadian Koruyeh, Shirin Haji Amin Shirazi, Khaled N. Khasawneh, Chengyu Song, and Nael B. Abu-Ghazaleh. 2020. SpecCFI: Mitigating Spectre Attacks using CFI Informed Speculation. In 2020 IEEE Symposium on Security and Privacy, SP 2020, San Francisco, CA, USA, May 18-21, 2020. 39–53. https://doi.org/10.1109/SP40000.2020.00033
Lattner and Adve (2003) Chris Lattner and Vikram Adve. 2003. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. Tech. Report UIUCDCS-R-2003-2380. Computer Science Dept., Univ. of Illinois at Urbana-Champaign.
Li et al. (2019) Peinan Li, Lutan Zhao, Rui Hou, Lixin Zhang, and Dan Meng. 2019. Conditional Speculation: An Effective Approach to Safeguard Out-of-Order Execution Against Spectre Attacks. In 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019, Washington, DC, USA, February 16-20, 2019. 264–276. https://doi.org/10.1109/HPCA.2019.00043
Lindner et al. (2019) Andreas Lindner, Roberto Guanciale, and Roberto Metere. 2019. TrABin: Trustworthy analyses of binaries. Sci. Comput. Program. 174 (2019), 72–89. https://doi.org/10.1016/j.scico.2019.01.001
Loughlin et al. (2021) Kevin Loughlin, Ian Neal, Jiacheng Ma, Elisa Tsai, Ofir Weisse, Satish Narayanasamy, and Baris Kasikci. 2021. DOLMA: Securing Speculation with the Principle of Transient Non-Observability. In 30th USENIX Security Symposium, USENIX Security 2021, August 11-13, 2021. 1397–1414. https://www.usenix.org/conference/usenixsecurity21/presentation/loughlin
Miller (2018) Matt Miller. 2018. Mitigating speculative execution side channel hardware vulnerabilities. https://msrc-blog.microsoft.com/2018/03/15/mitigating-speculative-execution-side-channel-hardware-vulnerabilities/
Mosier et al. (2022a) Nicholas Mosier, Hanna Lachnitt, Hamed Nemati, and Caroline Trippel. 2022a. Axiomatic hardware-software contracts for security. In ISCA 2022: The 49th Annual International Symposium on Computer Architecture, New York, USA, June 18 - 22. https://doi.org/10.1145/3470496.3527412
Mosier et al. (2022b) Nicholas Mosier, Hanna Lachnitt, Hamed Nemati, and Caroline Trippel. 2022b. Clou. https://github.com/nmosier/clou.
Mosier et al. (2022c) Nicholas Mosier, Hamed Nemati, and Caroline Trippel. 2022c. Clou. https://github.com/nmosier/clou
Nemati et al. (2020a) Hamed Nemati, Pablo Buiras, Andreas Lindner, Roberto Guanciale, and Swen Jacobs. 2020a. Validation of Abstract Side-Channel Models for Computer Architectures. In Computer Aided Verification - 32nd International Conference, CAV 2020 Los Angeles, CA, USA, July 21-24. https://doi.org/10.1007/978-3-030-53288-8_12
Nemati et al. (2020b) Hamed Nemati, Andreas Lindner, and Pablo Buiras. 2020b. Scam-V. https://github.com/kth-step/HolBA/tree/dev_scamv
Oleksenko et al. (2022) Oleksii Oleksenko, Christof Fetzer, Boris Köpf, and Mark Silberstein. 2022. Revizor: testing black-box CPUs against speculation contracts. In ASPLOS ’22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022 - 4 March 2022. 226–239. https://doi.org/10.1145/3503222.3507729
Oleksenko et al. (2023) Oleksii Oleksenko, Marco Guarnieri, Boris Köpf, and Mark Silberstein. 2023. Hide and Seek with Spectres: Efficient discovery of speculative information leaks with random testing. CoRR abs/2301.07642 (2023). https://doi.org/10.48550/arXiv.2301.07642
Oleksenko et al. (2018) Oleksii Oleksenko, Bohdan Trach, Tobias Reiher, Mark Silberstein, and Christof Fetzer. 2018. You Shall Not Bypass: Employing data dependencies to prevent Bounds Check Bypass. CoRR abs/1805.08506 (2018). http://arxiv.org/abs/1805.08506
Oleksenko et al. (2020) Oleksii Oleksenko, Bohdan Trach, Mark Silberstein, and Christof Fetzer. 2020. SpecFuzz: Bringing Spectre-type vulnerabilities to the surface. In 29th USENIX Security Symposium (USENIX Security 20).
Patrignani and Guarnieri (2021) Marco Patrignani and Marco Guarnieri. 2021. Exorcising Spectres with Secure Compilers. In CCS ’21: 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, Republic of Korea, November 15 - 19, 2021. 445–461. https://doi.org/10.1145/3460120.3484534
Saileshwar and Qureshi (2019) Gururaj Saileshwar and Moinuddin K. Qureshi. 2019. CleanupSpec: An "Undo" Approach to Safe Speculation. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2019, Columbus, OH, USA, October 12-16, 2019. 73–86. https://doi.org/10.1145/3352460.3358314
Sakalis et al. (2019) Christos Sakalis, Stefanos Kaxiras, Alberto Ros, Alexandra Jimborean, and Magnus Själander. 2019. Efficient invisible speculative execution through selective delay and value prediction. In Proceedings of the 46th International Symposium on Computer Architecture, ISCA 2019, Phoenix, AZ, USA, June 22-26, 2019. 723–735. https://doi.org/10.1145/3307650.3322216
Shivakumar et al. (2022) Basavesh Ammanaghatta Shivakumar, Jack Barnes, Gilles Barthe, Sunjay Cauligi, Chitchanok Chuengsatiansup, Daniel Genkin, Sioli O’Connell, Peter Schwabe, Rui Qi Sim, and Yuval Yarom. 2022. Spectre Declassified: Reading from the Right Place at the Wrong Time. Cryptology ePrint Archive, Paper 2022/426. https://eprint.iacr.org/2022/426
Shoshitaishvili et al. (2016) Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Audrey Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. 2016. SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis. In IEEE Symposium on Security and Privacy.
Taram et al. (2019) Mohammadkazem Taram, Ashish Venkat, and Dean M. Tullsen. 2019. Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2019, Providence, RI, USA, April 13-17, 2019. 395–410. https://doi.org/10.1145/3297858.3304060
Vassena et al. (2021) Marco Vassena, Craig Disselkoen, Klaus von Gleissenthall, Sunjay Cauligi, Rami Gökhan Kıcı, Ranjit Jhala, Dean Tullsen, and Deian Stefan. 2021. Automatically Eliminating Speculative Leaks from Cryptographic Code with Blade. Proc. ACM Program. Lang. (2021).
Wang et al. (2020) Guanhua Wang, Sudipta Chattopadhyay, Arnab Kumar Biswas, Tulika Mitra, and Abhik Roychoudhury. 2020. KLEESpectre: Detecting Information Leakage through Speculative Cache Attacks via Symbolic Execution. ACM Trans. Softw. Eng. Methodol. 29, 3 (2020), 14:1–14:31. https://doi.org/10.1145/3385897
Wang et al. (2021) Guanhua Wang, Sudipta Chattopadhyay, Ivan Gotovchits, Tulika Mitra, and Abhik Roychoudhury. 2021. oo7: Low-Overhead Defense Against Spectre Attacks via Program Analysis. (2021).
Weisse et al. (2019) Ofir Weisse, Ian Neal, Kevin Loughlin, Thomas F. Wenisch, and Baris Kasikci. 2019. NDA: Preventing Speculative Execution Attacks at Their Source. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2019, Columbus, OH, USA, October 12-16, 2019. 572–586. https://doi.org/10.1145/3352460.3358306
Wu and Wang (2019) Meng Wu and Chao Wang. 2019. Abstract Interpretation under Speculative Execution. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation.
Xiong and Szefer (2021) Wenjie Xiong and Jakub Szefer. 2021. Survey of Transient Execution Attacks and Their Mitigations. ACM Comput. Surv. 54, 3 (May 2021).
Yan et al. (2019) Mengjia Yan, Jiho Choi, Dimitrios Skarlatos, Adam Morrison, Christopher W. Fletcher, and Josep Torrellas. 2019. InvisiSpec: Making Speculative Execution Invisible in the Cache Hierarchy (Corrigendum). In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2019, Columbus, OH, USA, October 12-16, 2019. 1076. https://doi.org/10.1145/3352460.3361129
Yarom and Falkner (2014) Yuval Yarom and Katrina Falkner. 2014. Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In Proceedings of the 23rd USENIX Conference on Security Symposium. 719–732.
Yu et al. (2019) Jiyong Yu, Mengjia Yan, Artem Khyzha, Adam Morrison, Josep Torrellas, and Christopher W. Fletcher. 2019. Speculative Taint Tracking (STT): A Comprehensive Protection for Speculatively Accessed Data. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture.
Zhang et al. (2023) Zhiyuan Zhang, Gilles Barthe, Chitchanok Chuengsatiansup, Peter Schwabe, and Yuval Yarom. 2023. Ultimate SLH: Taking Speculative Load Hardening to the Next Level. In 32nd USENIX Security Symposium, USENIX Security 2023, Anaheim, CA, USA, August 9-11, 2023.