License: arXiv.org perpetual non-exclusive license
arXiv:2312.09770v1 [cs.CR] 15 Dec 2023

Beyond Over-Protection: A Targeted Approach to Spectre Mitigation and Performance Optimization

Tiziano Marinaro CISPA Helmholtz Center for Information Security, Saarland University tiziano.marinaro@cispa.de Pablo Buiras KTH Royal Institute of Technology buiras@kth.se Andreas Lindner KTH Royal Institute of Technology andili@kth.se Roberto Guanciale KTH Royal Institute of Technology robertog@kth.se  and  Hamed Nemati KTH Royal Institute of Technology hnnemati@kth.se
Abstract.

Since the advent of Spectre attacks, researchers and practitioners have developed a range of hardware and software measures to counter transient execution attacks. A prime example of such mitigation is speculative load hardening (slh) in LLVM, which protects against leaks by tracking the speculation state and masking values during misspeculation. LLVM relies on static analysis to harden programs with slh, which often results in over-protection and thus incurs unnecessary performance overhead. We extended an existing side-channel model validation framework, Scam-V, to check the vulnerability of programs to Spectre-PHT attacks and to optimize the protection of programs using the slh approach. We illustrate the efficacy of Scam-V by first demonstrating that it can automatically identify Spectre vulnerabilities in real programs, e.g., fragments of crypto-libraries. We then develop an optimization mechanism that validates the necessity of slh hardening w.r.t. the target platform. Our experiments showed that, in most cases, the hardening introduced by LLVM can be significantly improved when the underlying microarchitecture properties are considered.

Keywords: hardware security, side-channel attacks, countermeasures, Spectre
Conference: Proceedings of the 2024 ACM ASIA Conference on Computer and Communications Security; July 1–5, 2024; Singapore.
CCS concepts: Security and privacy → Side-channel analysis and countermeasures

1. Introduction

Figure 1. Scam-V’s workflow. Modules in bold boxes are those that have been added or modified in this paper.

Workflow of the Scam-V pipeline. Step 1: program slicing; step 2: observation augmentation; step 3: symbolic execution in angr; step 4: relation synthesis with projection; step 5: input state generation by SMT solver; step 6: execution on hardware; step 7: side-channel measurement; step 8: leakage check; step 9: program hardening; step 10: hardening optimization.

The past decade has witnessed a surge in the number of microarchitectural attacks that exploit hardware side channels to exfiltrate secret information. The prime example of such attacks is the Spectre attack family (Kocher et al., 2019), which leverages transient (speculative) execution to leak data through caches. To counter transient execution vulnerabilities, researchers and practitioners have developed several analysis techniques as well as hardware and software measures (Hu et al., 2023; Cauligi et al., 2022). Examples include fence instructions, like x86 lfence, which are serializing instructions that mitigate the side effects of speculative execution. Another example of software-based mitigation is Speculative Load Hardening (slh) (Carruth, 2020), which protects against transient execution leakages by tracking the speculation state and masking values during misspeculation.

Unfortunately, to date, almost all existing mitigations against transient execution leakages are either incomplete and miss attacks or overly conservative and slow (Cauligi et al., 2022). For example, the only mitigations that guarantee complete protection against the Spectre-V1 (Spectre-PHT) attack on commodity processors are memory fences, such as lfence, csdb (used in the implementation of the slh pass for AArch64), and dsb+isb. Nevertheless, using too many fences (over-fencing) hinders performance (e.g., inserting a fence at every load or control flow point can incur around 440% overhead (Oleksenko et al., 2018)), while using too few fences (under-fencing) may allow unexpected leakage to occur. There are also tools such as LLVM (Lattner and Adve, 2003) that harden programs against transient leakages, e.g., using slh. However, almost all existing software-based solutions rely on static analysis and follow a conservative approach to harden programs, and thus suffer from over-protection. Moreover, different processors implementing the same architecture can have substantially different speculative leakage. This suggests we may need to rethink the way we analyze and protect programs against transient attacks.

Our Approach. In this paper, we present a systematic approach to identifying the conditions (code alignment and compiler optimization level) under which programs become vulnerable to Spectre-PHT attacks and to finding an optimal hardening scheme that protects programs against potential leakages. Our approach relies on relational testing w.r.t. the underlying processor’s real implementation to determine the necessity of the hardening applied by state-of-the-art approaches like the LLVM compiler infrastructure.

We build on an existing platform, Scam-V (Nemati et al., 2020a; Buiras et al., 2021), which automates the validation of abstract side-channel models via relational testing. Fig. 1 shows Scam-V’s workflow. We chose Scam-V because, compared to existing tools like Revizor (Oleksenko et al., 2022), it requires fewer test cases to disclose leakages due to its testing approach and internal optimizations (see Sec. 2.3.1). Scam-V first generates test cases, each consisting of a program and two input states that are in an equivalence relation automatically synthesized w.r.t. the model under test. Then, Scam-V executes the test cases on real hardware and measures the side channel to find cases that invalidate the input model.

The current implementation of Scam-V is insufficient for our purposes, and several changes were required to enable Scam-V to test the information-flow security of real-world programs, e.g., fragments of OpenSSL (Sec. 5). Sections 3 and 4 describe our methodology and our changes to Scam-V’s pipeline. In particular, instead of testing random binaries, we feed Scam-V with slices taken from the binary of the program under test. Moreover, Scam-V uses symbolic execution to synthesize the equivalence relation for the input states, but its existing simple symbolic execution engine could not handle slices taken from real programs. Therefore, we connect Scam-V to the angr framework (Shoshitaishvili et al., 2016); Sec. 4 elaborates on the details of this integration. We have also generalized the refinement technique that Scam-V uses to generate inputs that are more likely to trigger leakages (Buiras et al., 2021). Finally, we have connected Scam-V to LLVM and developed a compiler pass to optimally protect programs against leakages.

Results. Our experiments highlight several interesting results, showing that (i) unexpected microarchitectural details like the alignment of the code executing on the processor (i.e., how the binary instructions are arranged in memory) can affect the leakage behavior of programs; (ii) most of the hardening applied by current approaches is not required in real executions; and (iii) there are transient leakages, like the one presented in (Buiras et al., 2021), that are not mitigated by existing implementations such as LLVM AArch64 slh (this case has previously been proved theoretically (Patrignani and Guarnieri, 2021)).

LLVM conservatively hardens programs to stop transient execution leakages. For example, in the LLVM AArch64 slh pass, almost every load instruction is hardened. However, our experiments show that the hardening needed to stop potential leakages strictly depends on the hardware platform on which programs execute. That is, a program that leaks on a specific architecture or processor does not necessarily show the same behavior when it executes on a different architecture or core. This behavior is witnessed by a few cases from the Kocher Spectre benchmarks, e.g., Case #10 (Kocher, 2018). While all existing analysis tools and techniques, e.g., (Guarnieri et al., 2021; Cauligi et al., 2020; Mosier et al., 2022a; Daniel et al., 2021; Wang et al., 2021), classify this case as a leaky program and conservatively protect it using fences, we verified that this test case does not induce any leakage on the ARM processors we have tested (Cortex-A53 and -A72), and therefore no protection is needed on these cores. Based on this insight, we have built a toolchain to examine the vulnerability of programs w.r.t. the target hardware platform and to optimally harden them to stop the leakages found.

2. Background

2.1. Side channels

Side channels are unintended information flow paths that are potentially exploitable by a malicious process to exfiltrate secrets from the memory of other processes. A prime example of side-channel vulnerabilities is Spectre attacks. These attacks are characterized by a speculation primitive (Miller, 2018), which enables an instruction that accesses a secret to supply it to a transient transmitter (Yu et al., 2019) that leaks it. A speculation primitive is a hardware mechanism that initiates speculative execution. Examples include control-flow instructions (e.g., conditional or indirect branches (Kocher et al., 2019) and return instructions (Koruyeh et al., 2018)), for which the processor predicts the intended value of the program counter, and loads, for which it predicts the effective addresses of prior unresolved stores (Horn, 2018).

Spectre attacks usually leak data via data caches, which can be measured by an attacker using techniques such as Flush+Reload (Yarom and Falkner, 2014). In Flush+Reload, the attacker flushes cache entries, allows the victim to (normally or speculatively) execute a secret-dependent load, and then measures the reload time. This reload time enables the attacker to determine whether the victim has accessed any of the flushed cache entries during its execution and subsequently to extract the secret.
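The three Flush+Reload phases can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `ToyCache` class, its latency constants, the `victim` function, and the probe layout are all hypothetical and stand in for a real cache and real timing measurements.

```python
class ToyCache:
    """Toy cache: a set of resident line addresses plus made-up latencies."""
    HIT_CYCLES = 10     # assumed latency of a cache hit
    MISS_CYCLES = 200   # assumed latency of a cache miss

    def __init__(self):
        self.lines = set()

    def flush(self, addr):
        self.lines.discard(addr)

    def access(self, addr):
        """Load addr, return the simulated latency, and leave the line cached."""
        latency = self.HIT_CYCLES if addr in self.lines else self.MISS_CYCLES
        self.lines.add(addr)
        return latency


def victim(cache, secret, probe_base, stride=64):
    """Hypothetical victim: performs a single secret-dependent load."""
    cache.access(probe_base + secret * stride)


def flush_reload(cache, run_victim, probe_base, n_lines=16, stride=64, threshold=100):
    """Return the indices of the probe lines that the victim touched."""
    probe = [probe_base + i * stride for i in range(n_lines)]
    for addr in probe:          # 1. flush every probe line
        cache.flush(addr)
    run_victim()                # 2. let the victim execute
    # 3. reload and time: a fast reload means the victim cached that line
    return [i for i, addr in enumerate(probe) if cache.access(addr) < threshold]


cache = ToyCache()
recovered = flush_reload(cache, lambda: victim(cache, 11, 0x1000), 0x1000)
print(recovered)
```

On real hardware the threshold has to be calibrated per machine, and measurements are noisy, so an attacker typically repeats the procedure many times.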

Figure 2. The running example. Benchmark Case #10 from (Kocher, 2018). The running example in C code, in pseudo-assembly with no optimization, and in pseudo-assembly with optimization level O2.

Running example. We introduce the example of Fig. 2 to show the main concepts of our approach. The example is a simplified version of Case #10 from the benchmarks used in (Kocher, 2018; Guarnieri et al., 2020; Cauligi et al., 2020). The program consists of two nested conditional statements, where the outer one checks that the accessed index is within the bounds of the public array $A$. The inner if statement then decides whether the first element of the array $B$ is loaded by checking whether the value of $A$ at the provided index is equal to $k$.

When the processor encounters the first if statement, based on its internal state and the pipeline load, it may decide to speculatively execute the read from the array $A$ even when $\mathit{idx}$ is greater than or equal to $\mathit{A\_size}$. This potentially enables an attacker who controls the values of $\mathit{idx}$ and $k$ to get access to sensitive information inferred from the comparison in the inner condition. Specifically, the attacker can infer the value of the data loaded in speculation by $A[\mathit{idx}]$: if $B[0]$ shows up in the cache, the condition in the inner if statement held.

When analyzing the security of a program, it is essential to consider the security level of variables in order to get a precise understanding of the program’s leakage potential and to decide which specific mitigation must be applied. For instance, our running example leaks differently depending on whether $\mathit{idx}$ is deemed private or public. If $\mathit{idx}$ is public, the leakage happens when the execution reaches $B[0]$, i.e., when the inner conditional expression holds. If $\mathit{idx}$ is private, the leakage already happens at the point where $A[\mathit{idx}]$ is executed.
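The leak in the running example can be made concrete with a toy model of a core that speculates past the bounds check and leaves a cache footprint behind. Everything here is an assumption made for illustration: the function names, the flat `memory` layout (array A followed by one secret byte), and the `mispredict_bounds_check` flag. Whether a given processor actually behaves this way is exactly what has to be tested; as noted in the introduction, the tested Cortex-A53 and Cortex-A72 cores did not leak on this case.

```python
def run_case10(idx, k, memory, a_size, mispredict_bounds_check=False):
    """Return the cache footprint (list of touched array cells) of one run.

    memory[0:a_size] plays the role of the public array A; anything beyond
    it is treated as secret. On a misprediction the body runs transiently:
    its architectural effects are squashed, but the footprint remains.
    """
    touched = []
    in_bounds = idx < a_size
    if in_bounds or mispredict_bounds_check:
        touched.append(("A", idx))        # load of A[idx]
        if memory[idx] == k:
            touched.append(("B", 0))      # load of B[0]
    return touched


def recover_byte(memory, a_size, target_idx):
    """Attacker controlling idx and k: guess k until B[0] appears in the cache."""
    for k in range(256):
        footprint = run_case10(target_idx, k, memory, a_size,
                               mispredict_bounds_check=True)
        if ("B", 0) in footprint:
            return k
    return None


memory = [1, 2, 3, 4, 0x5A]   # A = [1, 2, 3, 4]; memory[4] is a secret byte
print(hex(recover_byte(memory, a_size=4, target_idx=4)))
```

The footprint also shows the two leakage points discussed above: the entry `("A", idx)` depends on `idx` itself, which matters only if `idx` is private, while `("B", 0)` reveals the outcome of the inner comparison.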

2.2. Mitigations of transient execution attacks

Spectre attacks can be mitigated through a combination of hardware and software measures. Using fence instructions or speculative load hardening are examples of such measures.

Fence instructions.

To mitigate Spectre-PHT, the ARM architecture includes different fences, such as the dmb (data memory barrier), dsb (data synchronization barrier), and isb (instruction synchronization barrier) instructions. Another AArch64 barrier, which is used by LLVM’s slh pass, is csdb (Consumption of Speculative Data Barrier). csdb restricts speculative execution and data value prediction. Once csdb is executed, no instruction, other than branch instructions, appearing in program order after the csdb can be speculatively executed using the results of data value predictions or PSTATE predictions of any instructions appearing before the csdb that have not been resolved. This allows for control flow speculation before and after csdb and permits speculative execution of conditional data processing instructions after csdb, as long as they do not use the results of predictions made before csdb.

Speculative load hardening

An alternative approach is speculative load hardening (slh) (Carruth, 2020), which masks the addresses or values of loads inside a conditional branch with the branch predicate. The idea of slh is to maintain a predicate indicating whether or not the execution is currently in a mispredicted branch. This predicate is then used to “poison” either the outputs (i.e., values) or the inputs (i.e., addresses) of load instructions. Since slh limits the amount of speculative execution that can be performed, it also impacts the programs’ performance. However, compared to other mitigations, slh is more efficient, with a speed improvement of approximately 1.77 times. The overhead of load hardening is expected to range from 10% to 50%, with most large applications experiencing an overhead of ~30% (Carruth, 2018).

LLVM AArch64 slh

The LLVM slh pass for AArch64 uses taint tracking (blue lines) to find program points that must be protected.

[Listing omitted: speculative load hardening performed by the LLVM pass on the pseudo-assembly of the running example with no optimization.]

The snippet shows the hardened version of our example using the LLVM slh pass. AArch64 slh reserves the register x16 as the taint register, which contains all-ones when no misspeculation happens, and all-zeros when misspeculation is detected. Mask operations (red lines) are inserted using the and instruction, and misspeculation is tracked using a data-flow conditional select instruction (csel) based on the evaluation of CPU flags, which can be limited with csdb to avoid its speculation. When a conditional branch direction gets misspeculated, the semantics of the inserted csel instruction is such that the taint register will contain all zero bits.
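The interplay of the taint register, csel, and the and mask can be mimicked in a few lines. This is a hedged sketch, not LLVM’s actual output: the function names, the flat `memory` layout, and the `mispredict_bounds_check` flag are assumptions, and the sketch masks the loaded value (masking the address is the other option described above).

```python
ALL_ONES = (1 << 64) - 1   # taint register value when no misspeculation occurred


def csel(condition, if_true, if_false):
    """Data-flow select, standing in for the AArch64 csel instruction."""
    return if_true if condition else if_false


def hardened_case10(idx, k, memory, a_size, mispredict_bounds_check=False):
    """Toy model of the running example with value masking after the load of A[idx]."""
    taint = ALL_ONES                       # plays the role of x16
    touched = []
    in_bounds = idx < a_size
    if in_bounds or mispredict_bounds_check:
        # Re-evaluate the branch condition as data: all zeros if mispredicted.
        taint = csel(in_bounds, taint, 0)
        touched.append(("A", idx))
        value = memory[idx] & taint        # the inserted "and" mask
        if value == k:
            touched.append(("B", 0))
    return touched


secret_a = [1, 2, 3, 4, 0x5A]   # A = [1, 2, 3, 4] followed by one secret byte
secret_b = [1, 2, 3, 4, 0xC3]   # same public data, different secret
same_footprint = all(
    hardened_case10(4, k, secret_a, 4, True) == hardened_case10(4, k, secret_b, 4, True)
    for k in range(256)
)
print(same_footprint)
```

Under misspeculation the comparison always sees zero, so the footprint no longer depends on the secret byte. The load address `("A", idx)` is still observable, which is harmless only as long as `idx` is public.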

AArch64 slh limitation

The current implementation of AArch64 slh only protects against data leaks through memory; it does not prevent leakages through registers. An example of the latter case is when $\mathit{idx}$ in our running example is labelled as private and the memory load at line 12 in the optimized (-O2) assembly code of Fig. 2 leaks the value of $\mathit{idx}$.

Besides, slh conservatively hardens programs, and some of the introduced hardenings can safely be removed without affecting the security of the hardened code. For example, if $\mathit{idx}$ is a public value, the hardening at line 24 in the unoptimized (-O0) assembly code of Fig. 2 is unnecessary and can be safely removed. Optimizing the slh hardening can, in some cases, be done using static analysis, e.g., it can detect that the hardening at line 24 is not necessary when $\mathit{idx}$ is public. Nevertheless, static analysis is not always sufficient: full optimization requires taking into account the properties of the underlying microarchitecture and must be guided by analyzing the execution traces of programs on real hardware.

2.3. Side channel analysis — Scam-V

In the following, we use $s \in S$ to range over ISA states and we say that two states $s_1$ and $s_2$ are low-equivalent (i.e., $s_1 =_L s_2$) iff they agree on every public register and memory location. We also say that two states $s_1$ and $s_2$ are indistinguishable iff an attacker cannot distinguish executions on real hardware that start from the same microarchitectural state and any states corresponding to $s_1$ and $s_2$. Intuitively, a system is free of side channels if low-equivalent states are also indistinguishable. That is, the attacker cannot learn anything from the channel that is not already public.
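As a minimal sketch of these two definitions, states can be represented as dictionaries from location names to values. The representation, the location names, and the `observe` callback (standing in for whatever the attacker can measure on hardware) are assumptions made for this illustration.

```python
def low_equivalent(s1, s2, public):
    """True iff s1 and s2 agree on every public register and memory location."""
    return all(s1[loc] == s2[loc] for loc in public)


def witnesses_side_channel(s1, s2, public, observe):
    """True iff s1 and s2 are low-equivalent but the attacker can tell them apart."""
    return low_equivalent(s1, s2, public) and observe(s1) != observe(s2)


public = {"x0"}                      # x0 is public, x1 is secret
s1 = {"x0": 3, "x1": 7}
s2 = {"x0": 3, "x1": 8}

observe_public_only = lambda s: s["x0"] % 4   # depends on public data only
observe_secret_bit = lambda s: s["x1"] & 1    # depends on one secret bit

print(witnesses_side_channel(s1, s2, public, observe_public_only))
print(witnesses_side_channel(s1, s2, public, observe_secret_bit))
```

A single witness pair is enough to show a side channel, whereas showing its absence requires the property to hold for all low-equivalent pairs.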

Since verification requires models, verifying the absence of leakage due to side channels requires a model capturing the channels. However, due to the complexity of modern processors, it is infeasible to explicitly model all the complex and intertwined microarchitectural features like cache hierarchies, cache replacement policies, as well as memory and buses. Abstract observational models, hereafter denoted by $M_i$, tackle this problem by overapproximating attacker capabilities. To this end, the abstract model of the system is extended with a set of possible observations $o \in O$, e.g., cache tag or set index, and a transition relation $\rightarrow \subseteq S \times O \times S$. For each such model, the observations represent the part of the processor state that may affect the channel at each transition.

For instance, in order to overapproximate the information leakage that may occur in Fig. 2 due to the presence of caches, the processor model may be extended with the observations that are shown in the right column: e.g., the execution of the first instruction of the non-optimised binary may leak the value of the stack pointer plus 8. Intuitively, this model, say $M$, captures that the program execution time depends on the addresses of loads and on the instructions that are executed based on conditional branches. We say that two states $s, s' \in S$ are observationally equivalent (i.e., $s \sim_M s'$) iff for every possible trace $s \xrightarrow{o_1} s_1 \dots \xrightarrow{o_n} s_n$ of $M$ there is a trace $s' \xrightarrow{o_1'} s_1' \dots \xrightarrow{o_n'} s_n'$ such that $[o_1, \dots, o_n] = [o_1', \dots, o_n']$.

Clearly, we would like models to be sound, which means that observationally equivalent states should lead to executions that cannot be distinguished by an attacker on real hardware.

Definition 2.1 (Soundness).

An observational model $M$ is sound if $s \sim_M s'$ entails indistinguishability of $s$ and $s'$.

Sound observational models can be regarded as reliable foundations for side-channel analysis, since they make it possible to demonstrate the absence of side channels by statically proving that low-equivalent states are observationally equivalent. The problem with speculative execution is that observational models that do not take into account transient observations are unsound, and there is no general model of transient observations that covers all possible microarchitectures.

Scam-V (Nemati et al., 2020b, a; Buiras et al., 2021) combines techniques from program verification and fuzzing to perform relational testing and examine the soundness of observation models. At a high level (see Fig. 1), (1) it generates well-formed binary programs and (2) synthesizes a relation that, for a generated program, identifies states that are observationally equivalent according to the model under validation. Then, (3) an instance of this relation, in terms of two input states, is generated. Finally, (4) Scam-V runs the generated binary with different input pairs on real hardware and compares the measurements on the side channel of the real processor. Since the generated inputs satisfy the synthesized relation, the soundness of the model would imply that the side-channel data on hardware cannot be distinguished either. Thus, a test case where we can distinguish the two runs on the hardware amounts to a counterexample (a potential side channel) that invalidates the observational model.
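The core check of relational testing can be sketched as follows. This is an illustrative simplification, not Scam-V’s implementation: Scam-V synthesizes the relation symbolically and asks an SMT solver for input pairs, whereas this sketch enumerates a small, explicitly given set of states. The names `relational_test`, `model_obs`, and `hardware_obs` are hypothetical, and `hardware_obs` stands in for an actual side-channel measurement.

```python
from itertools import combinations


def relational_test(states, model_obs, hardware_obs):
    """Return state pairs that the model deems equivalent but hardware distinguishes.

    model_obs(s) must return a hashable observation trace; hardware_obs(s)
    returns whatever the side-channel measurement yields for state s.
    """
    classes = {}
    for s in states:                       # partition states by model trace
        classes.setdefault(model_obs(s), []).append(s)
    counterexamples = []
    for members in classes.values():       # test pairs inside each class
        for s1, s2 in combinations(members, 2):
            if hardware_obs(s1) != hardware_obs(s2):
                counterexamples.append((s1, s2))
    return counterexamples


# States are (idx, secret) pairs; the model observes only the load address idx.
states = [(idx, secret) for idx in range(2) for secret in range(2)]
model = lambda s: (s[0],)
sound_hardware = lambda s: (s[0],)         # hardware leaks nothing beyond the model
leaky_hardware = lambda s: (s[0], s[1])    # hardware also reveals the secret

print(relational_test(states, model, sound_hardware))
print(relational_test(states, model, leaky_hardware))
```

An empty result does not prove soundness of the model; it only means that no counterexample was found among the tested pairs.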

Scam-V uses the symbolic execution of programs that are annotated with observations to synthesize the equivalence relation. Symbolic execution is a technique to explore all program execution paths by using symbolic values instead of concrete ones for the inputs. Starting from an initial symbolic state, the execution explores all possible paths and collects the execution effects in a final symbolic state $\sigma \in \Sigma$ for each path. Each symbolic state $\sigma$ consists of a map from variables to symbolic expressions (i.e., where symbols represent initial state variables) and a path condition $p_\sigma$ (i.e., a symbolic expression identifying the condition that leads to the execution of that path). Scam-V also maintains a list of symbolic expressions $l_\sigma$, which collects the effects of observable statements that have been encountered. E.g., starting from an initial state which maps each register $x_i$ to a symbol $\alpha_i$ and the memory to $\alpha_M$, an empty list of observations, and a $\mathit{true}$ path condition, the symbolic execution of the -O2 program of Fig. 2 produces three final states:

  • one for $\mathit{idx} \geq \mathit{A\_size}$, with path condition $\alpha_0 \geq \alpha_M[\alpha_8]$, state mapping $x_8 \mapsto \alpha_M[\alpha_8]$, and observation list $[\alpha_8; \mathit{false}]$,

  • one for $\mathit{idx} < \mathit{A\_size}$ and $A[\mathit{idx}] \neq k$, with path condition $\alpha_0 < \alpha_M[\alpha_8] \land \alpha_M[\alpha_0 + \#A] \neq \alpha_1$, state mapping $x_8 \mapsto \alpha_M[\alpha_0 + \#A]$, and observation list $[\alpha_8; \mathit{true}; \alpha_0 + \#A; \mathit{false}]$,

  • one for $\mathit{idx} < \mathit{A\_size}$ and $A[\mathit{idx}] = k$, with path condition $\alpha_0 < \alpha_M[\alpha_8] \land \alpha_M[\alpha_0 + \#A] = \alpha_1$, state mapping $x_8 \mapsto \alpha_M[\alpha_0 + \#A], x_{10} \mapsto \alpha_M[\#B]$, and observation list $[\alpha_8; \mathit{true}; \alpha_0 + \#A; \mathit{true}; \#B]$.

Scam-V uses self-composition (Barthe et al., 2004) to compute the observational equivalence relation by imposing equivalence of the symbolic observation lists of the final states of the symbolic execution. For the example above, the relation would be as follows:

\[
\begin{array}{ll}
 & \alpha_8 = \alpha_8' \land (\alpha_0 \geq \alpha_M[\alpha_8] \Leftrightarrow \alpha_0' \geq \alpha_M'[\alpha_8']) \land \\
(\alpha_0 < \alpha_M[\alpha_8] \Rightarrow & \alpha_0 = \alpha_0' \land \\
 & \alpha_M[\alpha_0 + \#A] \neq \alpha_1 \Leftrightarrow \alpha_M'[\alpha_0' + \#A] \neq \alpha_1')
\end{array}
\]
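As a sanity check, the relation can be compared against direct equality of observation lists on a small finite domain. The sketch below is an assumption-laden illustration: states are tuples `(a0, a1, a8, mem)` standing for the initial values of x0 (idx), x1 (k), x8 (address of A_size), and a tiny four-cell memory, and the base addresses `A_BASE` and `B_BASE` (playing the roles of #A and #B) are arbitrary values chosen for this example.

```python
from itertools import product

A_BASE = 2    # stands for #A (illustrative value)
B_BASE = 10   # stands for #B (illustrative value)


def observations(state):
    """Observation list of the -O2 running example under the model."""
    a0, a1, a8, mem = state
    obs = [a8]                           # load of A_size
    in_bounds = a0 < mem[a8]
    obs.append(in_bounds)                # outcome of the bounds check
    if in_bounds:
        obs.append(a0 + A_BASE)          # load of A[idx]
        equal = mem[a0 + A_BASE] == a1
        obs.append(equal)                # outcome of the inner comparison
        if equal:
            obs.append(B_BASE)           # load of B[0]
    return obs


def relation(s, t):
    """The synthesized relation, evaluated on two concrete states."""
    a0, a1, a8, mem = s
    b0, b1, b8, mem2 = t
    if a8 != b8:
        return False
    if (a0 >= mem[a8]) != (b0 >= mem2[b8]):
        return False
    if a0 < mem[a8]:
        if a0 != b0:
            return False
        if (mem[a0 + A_BASE] != a1) != (mem2[b0 + A_BASE] != b1):
            return False
    return True


# Cells 0-1 hold candidate A_size values, cells 2-3 hold A[0] and A[1].
memories = list(product(range(3), range(3), range(2), range(2)))
states = [(a0, a1, a8, mem)
          for a0 in range(3) for a1 in range(2) for a8 in range(2)
          for mem in memories]

agrees = all(relation(s, t) == (observations(s) == observations(t))
             for s in states for t in states)
print(agrees)
```

The brute-force comparison only covers this small domain; it illustrates what the relation expresses rather than proving it in general.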

2.3.1. Observation Models Refinement

Using relational testing, as in Scam-V’s approach, to validate observational models can lead to state space explosion. This is because an unguided search may explore states that are either too similar to each other, and thus unlikely to invalidate the given model, or that fail to trigger attacker-visible microarchitectural behavior. Observation refinement (Buiras et al., 2021) tackles this problem by adding more fine-grained observations of the system state to capture behaviors we need to exclude.

For a given program, the observation model $M$ partitions the input states into observation equivalence classes. Relevant pairs in such equivalence classes must be tested to validate the soundness of $M$. To make validation more efficient, observation refinement suggests further repartitioning the induced equivalence classes using a complementary model $M'$ that captures the observations that might arise from the side channel under scrutiny.

For instance, in ARM Cortex-M0, a 32-bit multiplication instruction can take between 1 and 32 clock cycles to complete, depending on the operands being multiplied (Arm Limited, 2013). The number of clock cycles required to execute a multiplication can therefore leak information about its operands, revealing sensitive information about the cryptographic algorithm being executed. Arithmetic operations like multiplication are normally not assigned an observation in standard observational models. However, to check for the existence of the side channel above, one can define a refined observation model that reveals the most significant bits of the multiplication operands:

[Image: example of a refined observation of a multiplication instruction that declares as public the most significant bits of its operands, held in registers x1 and x2.]

Having defined a refined observation model, test cases (pairs of states $s$ and $s'$) are chosen such that they are observationally equivalent w.r.t. $M$ yet distinguishable w.r.t. $M'$, i.e., $s \sim_{M \wedge \neg M'} s'$. Observation refinement is especially important for finding transient execution leakages: it steers test case generation toward input states that ensure distinguishable cache updates due to misspeculated memory load instructions.
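The selection criterion can be illustrated with a small sketch. It is a toy model and not Scam-V's implementation: states are hypothetical pairs of register values `(x1, x2)`, the base model `obs_base` emits nothing for a multiplication, and the refined model `obs_refined` exposes the top bit of each operand. The 8-bit width and the enumeration of states are assumptions made to keep the example small; Scam-V obtains such pairs by querying an SMT solver.

```python
from itertools import product

BITS = 8  # toy operand width (assumption; the text discusses 32-bit operands)

def obs_base(state):
    """Base model M: a multiplication produces no observation."""
    return ()

def obs_refined(state):
    """Refined model M': expose the most significant bit of each operand."""
    x1, x2 = state
    return (x1 >> (BITS - 1), x2 >> (BITS - 1))

def candidate_pairs(states):
    """Yield pairs that are M-equivalent but M'-distinguishable."""
    for s, t in product(states, repeat=2):
        if obs_base(s) == obs_base(t) and obs_refined(s) != obs_refined(t):
            yield s, t

# Four toy states whose operands differ in their most significant bits.
states = [(x1, x2) for x1 in (0x01, 0x80) for x2 in (0x02, 0xF0)]
pairs = list(candidate_pairs(states))
```

Every pair in `pairs` is indistinguishable for the base model, so a timing or cache difference measured between its two states would invalidate that model.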

3. Methodology

To protect efficiently against microarchitectural leakages, a mitigation strategy must ensure that the resulting system meets both its security and its performance requirements. One obstacle is the lack of documentation or models of the speculative leakage of different microarchitectures. We address this problem by using relational testing to identify when a specific program is affected by transient leakages on a specific microarchitecture. This allows us to remove unnecessary mitigations and protect only the parts of the program that are indeed vulnerable on a given processor. The general strategy consists of three tasks: (a) use Scam-V to generate pairs of input states that should trigger potential information leakages in case of speculative behavior; (b) test if the leakage indeed occurs; (c) protect the vulnerable program fragments and remove unnecessary hardening.

Figure 3. The running example, in pseudo-assembly with no optimization, instrumented with shadow code and refined observations.

3.1. Observation refinement for speculation

In order to drive the generation of test cases, we leverage observation refinement (see Sec. 2.3.1). However, the previous implementation of the refinement technique in Scam-V could only handle a single conditional branch and was insufficient for our tests. Thus, we generalized the observation refinement in Scam-V by introducing a new program transformation that simulates the speculative execution of program instructions. This refinement enables us to observe memory operations that happen in speculation.

Our experiments focus on branch prediction, but the technique is general enough to cover other types of speculation. We achieve refinement by choosing a set of branches that can be misspeculated and composing them with the program. This is in line with the Spectector approach, which uses the always misprediction policy (Guarnieri et al., 2020).

Our program transformation inlines a shadow copy of the program fragment starting from the execution point before the first misspeculation. We statically fix the size of these fragments to cover the largest expected speculative window of the processor. This shadow code (marked with $\star$ in Fig. 3) is a copy of the original program fragment in which all selected misspeculating branches have negated conditions, simulating what can happen under misspeculation. The shadow code also operates over a shadow machine state so that it does not affect the non-speculative behavior. All memory operations executed in the shadow code raise a refined observation, indicating that they are possible causes of leakage.

Fig. 3 shows how this transformation works for our running example. In this example, we inline a shadow copy (code snippet between Start and End) of the program fragment starting from the ‘if’ statement with the negated condition. During the execution, we save the current program state at Start, switch to a shadow copy of the state, execute the shadow fragment and collect observations, and restore the normal execution of the program when we reach the execution point marked with End.
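A minimal executable sketch of this transformation follows. It is an illustration under stated assumptions rather than the BIR-level implementation: the guarded fragment is modeled as the two dependent loads `v = A[idx]; w = B[v]`, and the memory layout (`A_BASE`, `B_BASE`, `SIZE`), the state representation, and the function names are hypothetical.

```python
from copy import deepcopy

A_BASE, B_BASE, SIZE = 0x100, 0x200, 4  # hypothetical memory layout

# Body of the guarded fragment: v = A[idx]; w = B[v]
LOADS = [("v", A_BASE, "idx"), ("w", B_BASE, "v")]

def in_bounds(state):
    """Branch condition of the running example: idx < size."""
    return state["regs"]["idx"] < SIZE

def exec_loads(state, loads, tag, obs):
    """Execute loads (dst, base, index register), logging each address."""
    for dst, base, idx_reg in loads:
        addr = base + state["regs"][idx_reg]
        obs.append((tag, addr))
        state["regs"][dst] = state["mem"].get(addr, 0)

def run_instrumented(state, window=2):
    """Run the fragment preceded by its shadow copy; return the observations."""
    obs = []
    # Start: switch to a shadow copy of the machine state.
    shadow = deepcopy(state)
    if not in_bounds(shadow):  # negated branch condition
        exec_loads(shadow, LOADS[:window], "shadow", obs)
    # End: the shadow state is dropped and normal execution resumes.
    if in_bounds(state):
        exec_loads(state, LOADS, "base", obs)
    return obs

# Out-of-bounds index: only the shadow code touches memory.
state = {"regs": {"idx": 9}, "mem": {A_BASE + 9: 7}}
trace = run_instrumented(state)
```

For the out-of-bounds input, `trace` contains only shadow observations, and the architectural registers in `state` are left untouched because the shadow code ran on a copy.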

While our new implementation of the refinement technique can deal with more complex cases, e.g., programs with nested branches, it has some limitations, as shown in the snippet below. Since we negate all conditions in the shadow copies, in the example, we end up synthesizing relations that are unsatisfiable for the path that accesses memory in the inner ‘if’ statement’s true branch.

[Image: example of a limitation of the refinement technique, in which negating the condition of the nested branch makes the relation unsatisfiable.]

Possible solutions to this problem are to either apply refinement to only the inner ‘if’ statement or implement a heuristic that chooses the right ‘if’ statement to refine, i.e., the one that makes the relation satisfiable. For complex scenarios, e.g., with nested branches and functions, similar heuristics could be designed to automatically try different combinations of branches to find optimal refinement settings. We have tried the former solution in our experiments in Sec. 5, e.g., to refine observations of Case #5 when compiled with -O2 enabled.

The refinement involves two observational models: the base model $M$ and its refined counterpart $M'$. We follow the technique of Buiras et al. (Buiras et al., 2021) to optimize the computation of the observations w.r.t. the two models. Let $l^{M}$ and $l^{M'}$ be, respectively, the observation lists obtained by symbolically executing the program under the base and the refined models. As Fig. 3 shows, $l^{M} \subseteq l^{M'}$. Thus, we can implement a projection function $\pi$ that obtains the symbolic observation list of the base model from the observation list of the refined model, i.e., $\pi(l^{M'}) = l^{M}$, without running symbolic execution twice.
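One simple way to realize such a projection, assuming each observation carries a tag saying which model emitted it, is a filter over the refined list. The tuple layout and the tags `"base"` and `"shadow"` below are hypothetical and are not taken from Scam-V.

```python
def project(refined_obs):
    """pi: keep only the observations that the base model M also emits."""
    return [o for o in refined_obs if o[0] == "base"]

# Observation list of the refined model M' for one symbolic path.
# Each entry is (tag, kind, symbolic expression).
l_refined = [
    ("base", "pc", "idx < size"),
    ("shadow", "load", "A + idx"),
    ("shadow", "load", "B + mem[A + idx]"),
    ("base", "load", "A + idx"),
]

l_base = project(l_refined)
```

Because the projection preserves order, `l_base` is the list that a separate symbolic execution under the base model would have produced for the same path.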

3.2. Branch misprediction

To ensure that misprediction happens at the desired branch, we must train the branch predictor. The standard observation model for checking a program’s vulnerability to transient execution observes both the program counter and the addresses of memory operations. Therefore, each pair of test input states, i.e., two observationally equivalent states, follows the same execution path and satisfies the same path condition $\mathbf{p}$. Based on this insight, to generate a training state it is sufficient to find a satisfying assignment for a different path $\mathbf{p'}$ in the symbolic execution tree, where $\mathbf{p} \neq \mathbf{p'}$.
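The idea can be sketched for a single bounds check. The example below is a toy under explicit assumptions: the program has one branch `idx < size`, paths are identified by the hypothetical labels `"then"` and `"else"`, and a brute-force search over a small domain stands in for the SMT query on the path condition.

```python
SIZE = 4  # hypothetical array bound of the toy program

def path_of(state):
    """Identify the path taken by the toy program `if idx < SIZE: ...`."""
    return "then" if state["idx"] < SIZE else "else"

def training_state(test_path, domain):
    """Return a state that follows a path different from `test_path`, or None."""
    for idx in domain:
        candidate = {"idx": idx}
        if path_of(candidate) != test_path:
            return candidate
    return None

# The test states take the out-of-bounds path, so the predictor
# is trained with an in-bounds input.
test_state = {"idx": 9}
train = training_state(path_of(test_state), range(16))
```

Running the program on `train` before the test states biases the predictor toward the path that the test states do not take, which is what makes the branch mispredict during the measurement.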

3.3. Microarchitectural state configuration

The state of microarchitectural features can impact the success of side-channel attacks such as Spectre, as it may affect the speculative execution process. The presence or absence of certain data in the cache, or the prediction made by the branch predictor, can alter the speculative execution path, so that an attack either succeeds or a potential information leakage is missed. For instance, in our running example, the speculation of the nested branch can be affected by the presence of $A[idx]$ in the cache before executing the program. Therefore, tests that only start with an empty cache may not reflect the actual leakage of the processor (see Sec. 5.1.4). Furthermore, we have primed the branch predictor state by training it with different training inputs that we generated based on symbolic execution of the test programs.

3.4. Optimizing programs hardening

Given a program fragment to analyze, the base fragment and the one with shadow observations represent two observational models: the former models the leakage of a non-speculative processor, and the latter models the leakage of a processor that can leak information by speculatively executing all shadow instructions. Assuming that the fragment is large enough to fill the processor’s speculative window, the second model represents the worst-case scenario from a defense point of view, where all potential speculative memory operations must be protected.

On a real processor, the worst-case scenario is not usually possible: the processor may not be able to execute all the speculative instructions. For example, peculiarities of the program may prevent some cache misses, and inter-instruction dependencies may limit the ability of the processor to proceed with speculative execution.

We can generate different observational models by removing some of the shadow observations. We say that model $M_1$ refines model $M_2$ if any pair of states that is $\sim_{M_2}$ is also $\sim_{M_1}$. For example, it is easy to show that if model $M_1$ is obtained by removing a shadow observation of model $M_2$, then $M_1$ refines $M_2$. This forms a lattice of models, where the bottom corresponds to the original program and the top corresponds to the program with all shadow observations. Navigating this lattice can be used to identify which shadow observation (i.e., speculative instruction) causes the leakage and must be protected. We adopt this lattice structure to guide our optimization of slh hardening. Intuitively, for a given program and on a specific processor, there is a hardened copy that is produced conservatively and fully protects against transient leakages. On the opposite side, we have a program that is not hardened and is vulnerable to transient leakages. Between these two, we can find many other partially hardened programs. Our technique is to walk on this lattice, from the fully hardened version to the unprotected one, to find a version of the program that is hardened with a maximally permissive set of fences, i.e., removing any additional hardening would lead to leakage of secret data in some form.
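The lattice can be made concrete with a brute-force sketch. The observation functions, their names, and the four toy states below are hypothetical; the only point illustrated is that any pair of states equivalent under the model with all shadow observations remains equivalent under every model obtained by removing some of them.

```python
from itertools import combinations, product

# Hypothetical observations over toy states {"idx": int, "secret": int}.
BASE = {"pc": lambda s: s["idx"] < 4}
SHADOW = {
    "shadow_load_A": lambda s: s["idx"] if s["idx"] >= 4 else None,
    "shadow_load_B": lambda s: s["secret"] if s["idx"] >= 4 else None,
}

def equivalent(model, s, t):
    """Two states are equivalent when every observation of the model agrees."""
    return all(obs(s) == obs(t) for obs in model.values())

def lattice():
    """Yield all models from the base one (bottom) to base plus all shadow observations (top)."""
    names = sorted(SHADOW)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            yield {**BASE, **{n: SHADOW[n] for n in subset}}

states = [{"idx": i, "secret": x} for i, x in product((1, 5), (0, 1))]
top = {**BASE, **SHADOW}

# Equivalence under the top model is preserved by every model in the lattice.
coarser_preserves = all(
    equivalent(m, s, t)
    for m in lattice()
    for s, t in product(states, repeat=2)
    if equivalent(top, s, t)
)
```

With two shadow observations the lattice has four models; walking it from the top means dropping one observation at a time and checking whether the hardware still distinguishes the states that the smaller model considers equivalent.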

3.5. Classification aware tests

We would also like to avoid introducing protections against leakage of variables that are already public. For example, without knowing the classification of variables for the running example, we should consider the load at line 24 of the -O0 compilation potentially insecure, since it may leak the value of $idx$. However, if $idx$ is public, we should avoid generating experiments where the index differs in $s_1$ and $s_2$, since this may lead to useless experiments where the potentially different cache footprint depends on public information. Therefore, we allow users to configure variable classification and add the constraint $s_1 =_L s_2$ to the relation generated by Scam-V. We achieve this by allowing the user to add arbitrary initial observations to the program translated into the BIR language. Since Scam-V generates only pairs of states that are observationally equivalent, adding an initial observation for each public variable restricts Scam-V to generating only pairs of states that are low-equivalent.
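The effect of such initial observations can be shown with a small sketch. The model `observe_base`, the wrapper `with_public`, and the state layout are hypothetical names introduced for illustration; in Scam-V the observations are added to the BIR program instead.

```python
def observe_base(state):
    """Toy base model: observe only the outcome of the bounds check."""
    return (("pc", state["idx"] < 4),)

def with_public(observe, public_vars):
    """Prepend an initial observation of each public variable to a model."""
    def observe_classified(state):
        initial = tuple(("init", v, state[v]) for v in public_vars)
        return initial + observe(state)
    return observe_classified

observe = with_public(observe_base, ["idx"])

s1 = {"idx": 5, "secret": 0}
s2 = {"idx": 6, "secret": 1}  # differs from s1 in the public index
s3 = {"idx": 5, "secret": 1}  # differs from s1 only in the secret
```

Under `observe_base` the states `s1` and `s2` are equivalent and could be paired in an experiment. Once `idx` is declared public they are no longer equivalent, so only pairs such as `s1` and `s3`, which agree on all public variables, remain candidates.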

4. Implementation

Scam-V (available at https://github.com/FMSecure/HolBA/tree/dev_scamv_spec) is developed as part of HolBA (Lindner et al., 2019) using HOL4’s meta language SML. We have extended and modified Scam-V’s pipeline implementation to (1) extend coverage of transient execution vulnerabilities; (2) analyze binaries produced by common compilers; (3) tailor countermeasures to specific microarchitectures.

4.1. angr integration

The internal symbolic execution engine of Scam-V does not scale even to mid-size programs (more than ten instructions), and infeasible paths are not pruned from its execution tree. These limitations render Scam-V impractical for analyzing real-world programs; e.g., the Spectre-PHT gadget extracted from OpenSSL in Sec. 5.2 consists of a series of conditional statements that cannot be handled by Scam-V’s symbolic execution. To resolve these issues, we have integrated Scam-V with angr (Shoshitaishvili et al., 2016), a state-of-the-art binary analysis framework. This required several changes, ranging from developing a new interface for communication between Scam-V and angr to modifying Scam-V’s pipeline to work with the symbolic execution tree generated by angr.

4.1.1. BIR to VEX transpilation

To outsource symbolic execution to angr, we have implemented a translation from Scam-V’s intermediate language BIR to VEX, which is a representation used by the angr internal analysis passes. Conversely, the output of the angr symbolic execution is the list of observations and the set of path constraints, expressed as Claripy abstract syntax trees (Claripy is a Python library for constraint-based symbolic execution). Therefore, to transfer the results of the angr symbolic execution back, we also had to implement a translation from Claripy into BIR. Interfacing Scam-V and angr further required extending BIR to support missing VEX constructs such as bitstring concatenation.

4.1.2. Handling observations in angr

In contrast to BIR, VEX lacks support for observations that we need in our analysis. To compensate for this shortcoming, the translation module from BIR to VEX replaces BIR observations with an angr system call, a feature of angr to modify the symbolic state, handle library calls, etc. We have implemented a specific angr system call handler to process observations. The handler takes as input the state elements we want to observe, like memory address or register number, and updates the list of observations in the symbolic state for the running path.

4.1.3. Simulating speculative execution in angr

We have also used system calls to simulate speculative execution up to a parameterized depth $d$, reminiscent of the processor speculation depth, in the angr symbolic execution. As shown in Fig. 4, we use two system calls to mark where the speculative execution begins and ends. The first system call saves the current state using the global plugin feature of the angr symbolic execution, which maintains the program state across multiple execution paths. Then we jump into the code fragment that is supposed to run speculatively. Having reached the specified depth $d$, we use the second system call to collect the transient observations and the path constraints, and then context-switch to the normal execution by restoring the program state prior to the start of speculative execution.

Figure 4. angr’s system calls marking speculative execution of the running example, shown as an abstract graph of the angr symbolic execution. $S$ denotes the system state.

4.1.4. Concretization of memory accesses

angr is a static and dynamic symbolic (a.k.a. concolic) executor based on the Z3 solver. It trades soundness for performance by adopting the partial memory model of Mayhem (Cha et al., 2012) to scale to large codebases. Using this model, all symbolic pointers used in memory store operations are concretized by making a query to Z3. However, the address of a load operation is either treated as symbolic or concretized, depending on the size of the contiguous interval of its possible values.

Concretization makes symbolic execution more efficient. Yet, to build a generic equivalence relation in the BIR language that can be used to generate multiple test cases by querying an SMT solver, we need a strategy to generalize from concretized memory locations and remap concrete values to the corresponding BIR symbolic expressions. For memory addresses, angr keeps a mapping (in the path predicates) from concrete values to the corresponding symbolic expressions, which facilitates the reconstruction of symbolic expressions.

Nevertheless, the existing angr concretization strategy is insufficient for our analysis. For example, for load operations, angr queries Z3 for the minimum and maximum values of memory addresses under the current path conditions, but our analysis expects one specific valuation of addresses. If the concretization fails, angr marks the memory as unconstrained, which also breaks our analysis. Moreover, since angr uses a just-in-time strategy to concretize symbolic expressions, if the concretized values invalidate an assertion that comes later on a path, angr prunes the path rather than restarting the concretization. Case #1 in Fig. 5 is an example where the concretizations of two memory addresses are consistent with each other, but a subsequent alignment check makes the path unsatisfiable.

Moreover, angr uses a naïve strategy to implement concretization: it always produces a new value for each symbolic expression it encounters, which usually breaks the consistency of the values assigned to a specific expression. Finally, the angr concretization is not collision-free, so different symbolic expressions may be mapped to the same value. Case #2 in Fig. 5 shows how the x0 and sp registers are mapped to the same value by the angr concretization. Given these problems, to ensure the soundness of our approach, we had to develop an efficient concretization strategy that does not suffer from these limitations.

New concretization strategy: For a symbolic memory address $a_i$ and a path constraint $\phi_i$ associated with $a_i$, the concretization of $a_i$ is performed by submitting a query $q_i$ that is constrained with $\phi_i$ to the SMT solver. The obtained concrete value $s_i$ is assigned to $a_i$ and added as a new constraint to the path. This ensures that solutions for subsequent queries are consistent with those obtained in previous steps. Additional constraints are also provided to ensure that solutions do not collide with each other.
More formally, if a solution $s_i$ is obtained for a path constraint $\phi_i$ associated with memory access $(a_i, \phi_i)$ in the sequence of queries submitted to the SMT solver, then $s_i$ does not match any previous solution $s_j$ obtained for a path constraint $\phi_j$ associated with memory access $(a_j, \phi_j)$ with $j < i$.

We repeat the concretization each time a memory access is encountered during the symbolic execution. We do not query the solver for memory addresses that have already undergone concretization. Also, a record of all performed concretizations is maintained in order to retrieve old solutions.
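The following sketch mimics these three properties (consistency with earlier solutions, freedom from collisions, and reuse of recorded solutions) without an SMT solver. It is an illustration only: the class `Concretizer`, the brute-force search over a small address domain, and the constraint signature are assumptions and do not reproduce the angr plugin used in the implementation.

```python
class Concretizer:
    """Toy concretizer; a linear search stands in for the SMT query."""

    def __init__(self, domain):
        self.domain = list(domain)
        self.solutions = {}  # record of concretizations: name -> concrete value

    def concretize(self, name, constraint):
        """Return a concrete value for symbolic address `name`, or None."""
        if name in self.solutions:  # already concretized: no new query
            return self.solutions[name]
        used = set(self.solutions.values())
        for value in self.domain:
            # `constraint` sees earlier solutions, so the result stays
            # consistent with them; `used` rules out collisions.
            if value not in used and constraint(value, self.solutions):
                self.solutions[name] = value
                return value
        return None  # the caller would retry with a joint query or prune

c = Concretizer(range(0x1000, 0x1040, 4))
x0 = c.concretize("x0", lambda v, sol: v % 8 == 0)          # alignment check
sp = c.concretize("sp", lambda v, sol: True)
x0_off = c.concretize("x0+8", lambda v, sol: v == sol["x0"] + 8)
```

Here `sp` cannot take the value already assigned to `x0`, and the address `x0+8` is derived from the recorded value of `x0` rather than from a fresh, unrelated solution.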

Figure 5. Two problems encountered in angr. Case #1: inconsistency of a concretization with a subsequent code assertion; the concretization of x0 causes a later alignment assertion on x0 to fail. Case #2: collision of angr concretizations; two different variables are concretized with two different values, but a subsequent concretization of one of these variables plus an offset collides with the other variable, making the path unsatisfiable.

If a solution cannot be found for a query, then the SMT solver will be asked to find a solution for all the memory accesses reached up to that point with one single query. If such a solution is found, the concretization of the memory access for the path that fails is updated accordingly and the symbolic execution is restarted from the beginning. To avoid already explored paths, we add the path constraints of the failed path to the initial state. If no solution is found, the concretization of the program is deemed impossible and the path will be pruned from the execution.

4.2. Optimizing the LLVM’s slh hardening

Selective SLH

Our slh optimization algorithm takes as input a fully hardened version of the program. We enumerate the hardenings introduced in the program by changing the LLVM pass. Scam-V then tries to find hardenings that are not required, based on the leakage lattice of Sec. 3.4, when the program executes on a specific platform. Essentially, our algorithm removes the slh hardening (or poisoning) of loads from top to bottom and keeps only the hardenings that are essential to stop the leakage. Masks introduced by slh are independent of each other; therefore, removing a fence together with its corresponding mask operation does not affect the subsequent fences. Yet, to ensure the soundness of this optimization, our algorithm does not remove the instructions that perform taint tracking, which is essential for slh. Algo. 1 presents our optimization algorithm. In Algo. 1, $\mathit{hasSideChannelLeakage}$ invokes Scam-V to test the program for the existence of any leakage.

Algorithm 1 Scam-V Selective SLH

Input: program P hardened using LLVM slh

Output: program P’ with an optimized number of hardenings

procedure SelectiveSLH($P$)
   $P' \leftarrow P$ {Initialize $P'$ with the original program}
   $EnH \leftarrow \mathit{enumerateHardening}(P')$ {List all load hardenings}
   for $i \leftarrow 1$ to $\mathit{length}(EnH)$ do
      $\mathit{RemoveHrd}(P', EnH[i])$ {Remove the $i$-th hardening}
      if $\mathit{hasSideChannelLeakage}(P')$ then
         $\mathit{insertHrd}(P', EnH[i])$ {Reinsert the $i$-th hardening}
   return $P'$
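The control flow of Algo. 1 can be exercised with a short Python sketch. The program is abstracted to the list of its load hardenings, and the leakage oracle is a hypothetical stand-in for the Scam-V test campaign; the labels `load@12`, `load@24`, and `load@31` are invented for the example.

```python
def selective_slh(hardenings, has_leakage):
    """Greedily drop hardenings, reinserting any whose removal causes a leak."""
    kept = set(hardenings)
    for h in hardenings:           # visit hardenings from top to bottom
        kept.discard(h)            # RemoveHrd
        if has_leakage(kept):      # hasSideChannelLeakage
            kept.add(h)            # insertHrd
    return [h for h in hardenings if h in kept]

# Hypothetical oracle: on this platform only the load at line 24 leaks
# when it is left unprotected.
def toy_oracle(kept):
    return "load@24" not in kept

optimized = selective_slh(["load@12", "load@24", "load@31"], toy_oracle)
```

The loop invokes the oracle once per hardening, so the cost of the optimization is linear in the number of hardenings, with each oracle call corresponding to a full round of experiments on the target processor.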

4.3. Experiment setup

We have run our test cases, each consisting of a program and two inputs, under seven different cache configurations, starting with an empty cache in the first iteration to mimic a cold start. However, for the subsequent iterations, we replicate a more realistic execution environment to account for the effects of cache hits and misses by constructing a cache state based on its content after the first iteration and randomly evicting cache lines. This is to ensure that speculative execution does not get trapped in other loads in the mispredicted branch and, with a higher probability, can reach interesting loads that leak secret data.

To ensure the consistency of our results for each cache configuration, we executed each experiment ten times, and we used the same cache state for runs starting from the two inputs. Unless all these ten executions give the same result, the experiment is classified as inconclusive. In order to resolve such cases, we inspect the cache state after each iteration and we keep the count of valid cache lines, which are populated by the program. Based on the collected data, a counterexample happens when a cache line was present in 70% of ten iterations in the cache state of one run, and it never appeared in the cache for the other run. In case all valid cache lines were present in the cache state of both runs at least once, we mark the experiment as conclusive when the total number of cache lines that were present in the cache state of both runs is at least 80% of the total number of all valid cache lines in both runs. We have chosen this threshold based on our statistical analysis to exclude outliers.

5. Evaluation

Case #Exp   Cortex-A53   Cortex-A72
#C #I #SLH #OpSLH ExT SLHExT OSLHExT   #C #I #SLH #OpSLH ExT SLHExT OSLHExT
01 -O0 500   0 0 6 / 929   928   0.58 932   931.57   0.79 /   241 259 6 0 864   862.29   1.60 948   946   1.29 966   963.57   1.51
-O2 500   0 0 4 / 688   688   0.0 904   903.29   0.49 /   477 4 4 1 686   685.42   0.53 962   949.71   6.85 962   943.14   14.36
02 -O0 500   0 0 7 / 920   919.43   0.53 941   940.43   0.53 /   0 20 7 / 976   972.43   2.51 955   952   1.63 /
-O2 500   0 0 4 / 688   688   0.0 906   904.86   0.69 /   477 5 4 1 686   685.43   0.53 964   945.43   10.63 974   948   18.64
03 -O0 500   0 0 8 / 1183   1183   0.0 1459   1459   0.0 /   0 0 8 / 1305   1304.71   0.49 1565   1564.57   0.53 /
-O2 500   0 0 4 / 917   916.57   0.53 1036   1035.86   0.38 /   28 51 4 1 935   934.14   0.38 1063   1059   3.87 938   937.29   0.76
04 -O0 500   0 0 6 / 929   928.29   0.49 932   930.86   0.69 /   37 52 6 0 866   862   2.24 1063   1058.14   4.45 965   962.71   1.70
-O2 156   0 0 4 / 793   792.43   0.53 907   906.43   0.79 /   122 20 4 1 800   799.86   0.38 863   860.57   1.72 977   941.86222 Table 1 presents some counterintuitive results, where the optimized code exhibits worse performance than the non-optimized one. Further experiments showed that these results are caused by changes in alignment. In fact, replacing unnecessary hardening with NOP instructions results in performance comparable to the standard slh.   19.19
05 -O0 500*3   0 0 9 / 1272   1271.71   0.49 1437   1436.43   0.53 /   0 0 9 / 1150   1147   2.08 1548   1548   0.0 /
-O2 500*2   0 0 4 / 794   793.14   0.38 1043   1042.43   0.53 /   349 245 4 0 820   817   1.91 992   989.86   1.07 850   848.57   1.13
06 -O0 500   0 0 7 / 932   931   0.58 1055   1053.71   0.76 /   0 0 7 / 936   934.86   0.90 1174   1173.29   0.49 /
-O2 500   0 0 4 / 793   793   0.0 1048   1047.14   0.38 /   482 0 4 1 819   817.14   1.35 985   982.85   1.21 827   825.43   1.13
07 -O0 500*2   0 0 9 / 1308   1308   0.0 1187   1186.43   0.53 /   330 25 9 0 1336   1335   0.82 1172   1170.71   0.95 1339   1337.14   1.21
-O2 500*2   0 0 5 / 784   783.71   0.49 1163   1162.43   0.53 /   498 0 5 1 807   805.71   1.11 1037   1036.57   0.53 1173   1172.14 [2]   0.69
08 -O0 1000   0 0 7 / 912   911.29   0.49 963   961.71   0.76 /   0 43 7 / 871   867.29   1.98 1008   1007   0.58 /
-O2 /   / / / / / / /   / / / / / / /
09 -O0 /   / / / / / / /   / / / / / / /
-O2 /   / / / / / / /   / / / / / / /
09v2 -O0 /   / / / / / / /   / / / / / / /
-O2 /   / / / / / / /   / / / / / / /
10 -O0 500*3   0 0 7 / 1173   1173   0.0 1236   1236   0.0 /   0 1 7 / 1065   1063.57   0.79 1185   1184.29   0.49 /
-O2 500*3   0 0 4 / 601   601   0.0 965   964.86   0.38 /   0 155 4 / 696   696   0.0 1023   1014.86   7.03 /
11gcc -O0 500   0 0 18 / 2046   2045.57   0.53 2733   2732.29   0.49 /   0 0 18 / 2162   2160.86   0.69 2688   2687.29   0.95 /
-O2 500   0 0 4 / 789   788.43   0.53 1051   1050.86   0.38 /   479 1 4 1 819   816.86   1.57 1045   1044.71   0.49 865   863.14   1.34
11ker -O0 500*2   0 0 17 / 2183   2182.14   0.38 2760   2760   0.0 /   0 0 17 / 2162   2160.71   0.95 2702   2700.71   0.76 /
-O2 500   0 0 4 / 688   688   0.0 906   905.43   0.79 /   476 5 4 1 686   685.86   0.38 964   944.57   15.11 966   948.43   12.99
11sub -O0 500   0 0 22 / 2076   2075.14   0.38 2720   2719.71   0.49 /   0 0 22 / 2060   2059.71   0.49 2565   2563.86   1.21 /
-O2 500   0 0 4 / 688   688   0.0 906   904.71   0.95 /   476 2 4 1 686   685.43   0.53 962   945.14   12.43 966   945.57   10.10
12 -O0 500   0 0 8 / 937   936.43   0.53 1176   1176   0.0 /   0 0 8 / 907   905.86   0.90 1142   1139.86   2.27 /
-O2 500   0 0 4 / 703   703   0.0 834   834   0.0 /   345 80 4 1 812   812   0.0 863   861.29   1.38 938   937 [2]   1.15
13 -O0 500   0 0 8 / 1012   1012   0.0 1434   1434   0.0 /   0 0 8 / 1167   1165.57   1.13 1325   1324.71   0.49 /
-O2 500   0 0 4 / 688   688   0.0 905   904.29   0.76 /   477 5 4 1 686   685.28   0.49 958   945.85   11.58 966   939.42   14.46
14 -O0 500   0 0 6 / 929   928.57   0.53 931   930.86   0.38 /   1 5 6 0 866   861.57   2.64 1057   1053.86   2.12 964   961.86   1.95
-O2 500   0 0 4 / 793   792.43   0.53 906   905.29   0.49 /   500 0 4 1 800   800   0.0 862   860.57   0.98 960   947.86 [2]   14.15
14v2 -O0 500   0 0 8 / 931   930.71   0.49 1186   1185.86   0.38 /   0 0 8 / 1039   1027.71   6.80 1312   1310.86   0.69 /
-O2 500   0 0 4 / 792   791.43   0.53 1047   1047   0.0 /   0 0 4 / 819   816.71   1.98 1045   1043   1.41 /
SiSCloak -O0 500   0 0 3 / 1024   1024   0.0 1145   1144.43   0.53 /   422 0 3 / / / /
-O2 500   488 0 1 / 579   579   0.0 798   798   0.0 /   488 0 1 / / / /
Table 1. Analysis of collected microbenchmarks (Kocher, 2018; Guarnieri et al., 2020; Cauligi et al., 2020). Abbreviations in the table: Exp: number of experiments (for some cases, a multiplication shows how many times the experiment is repeated with different ways of training the branch predictor), C: counterexamples, I: inconclusive cases, SLH: hardenings inserted by slh, OpSLH: hardenings retained by Scam-V, (O)SLHExT: (optimized) (SLH) execution time in CPU cycles. Each execution time is given as three values: the maximum, the average, and the standard deviation. Highlighted rows are discussed in Section 5.1.5.

[2] Table 1 presents some counterintuitive results, where the optimized code exhibits worse performance than the non-optimized one. Further experiments showed that these results are caused by changes in alignment. In fact, replacing unnecessary hardening with NOP instructions results in performance comparable to the standard slh.

To show the effectiveness of Scam-V in optimally protecting programs against Spectre-PHT, we conducted several experiments on two widely used ARMv8 boards, Raspberry Pi 3 and 4. First, we evaluated Scam-V on a suite of benchmarks used by Kocher (Kocher, 2018) and others (Guarnieri et al., 2020; Cauligi et al., 2020; Mosier et al., 2022b) to analyze and mitigate transient execution attacks (Cauligi et al., 2020; Mosier et al., 2022c; Guarnieri et al., 2020). Second, we used Scam-V to analyze the vulnerability of the OpenSSL library on AArch64 processors and to optimally harden it against the leakages found. For this experiment, we analyzed only fragments of OpenSSL that are reported to be vulnerable to transient execution attacks in the related literature (Mosier et al., 2022a). Our experiments run as bare-metal code, i.e., without an operating system or other background processes. Therefore, our experiments represent the worst-case scenario, where no defense is in place and the attacker can inspect the cache state directly after execution using available hardware instructions.

Evaluation boards

Raspberry Pi 3 and 4 use Cortex-A53 and Cortex-A72 processors, respectively. Cortex-A53 is an 8-stage pipelined processor with a 2-way superscalar, in-order execution pipeline. Cortex-A72, in contrast, is a 15-stage pipelined ARMv8 core with a 3-way superscalar, out-of-order execution pipeline. Both processors support speculative execution based on control flow prediction. However, while ARM Ltd. declared Cortex-A72 vulnerable to Spectre-PHT, the vulnerability of Cortex-A53 to transient execution attacks was demonstrated only recently (Nemati et al., 2020a). Scam-V uses a special module in TrustZone to run experiments. The module sets up memory types (e.g., the cacheability of memory regions), configures the cache’s initial state, and probes the cache state after the execution of programs. In a real attack scenario, an attacker can use the performance monitor counters (PMC) for timing analysis. We also used the PMC to evaluate the performance of our slh optimization.

5.1. Analyzing Spectre microbenchmarks

We have successfully analyzed 17 variants of Spectre and detected leakage in those that are vulnerable to Spectre-PHT on our evaluation processors. The only exceptions are Case #9 and its variation #9v2, which we could not analyze due to limitations of our approach (Sec. 5.1.6 elaborates on this). Note that specification-based testing and the use of observation refinement in Scam-V helped us reduce the number of test cases required in our experiments.

We used clang with optimization levels -O0 and -O2 to produce the microbenchmark binaries. We repeated the same experiments on both RPi boards to check the vulnerability of the binaries on each platform and to protect them optimally. We identified several cases where the compiler hardened programs unnecessarily, which provides opportunities for optimization without sacrificing security. Table 1 summarizes the results of our experiments for the benchmark programs. Our evaluation assumes that the code is executed standalone and that no other code or attack vector is present. Note that the microbenchmarks represent different versions of Spectre-PHT at the source code level. However, when compiling with -O2, the same binary is sometimes produced for different cases, e.g., Case #1 and Case #13.

Timing analysis

We measured time by reading the CPU cycle register of the PMC. To better evaluate performance, we used inputs that cause programs to take the longest possible path and to pass through as much hardening as possible. To minimize the effect of internal hardware noise that causes variations in CPU cycles, we ran the program under test 50000 times and computed the mean over all iterations. We repeated each measurement seven times and computed the average, the standard deviation, and the maximum value over all executions.
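The aggregation step described above can be sketched in a few lines of Python. This is only an illustration: the function name summarize and the seven input values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics

def summarize(cycle_means):
    """Return (maximum, average, sample standard deviation), rounded to 2 decimals.

    Each element of cycle_means stands for one measurement, i.e. the mean
    cycle count over 50000 runs read from the PMC cycle register.
    """
    return (
        max(cycle_means),
        round(statistics.mean(cycle_means), 2),
        round(statistics.stdev(cycle_means), 2),  # assumption: sample (n-1) estimator
    )

# Hypothetical values for the seven repeated measurements (not raw data from the paper).
measurements = [1047, 1047, 1047, 1047, 1047, 1047, 1048]
print(summarize(measurements))
```

With these hypothetical inputs the sketch yields (1048, 1047.14, 0.38), a tuple of the same form as the execution-time entries reported in Table 1.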

5.1.1. Cortex-A53 experiments

None of the 17 cases from the Kocher benchmarks resulted in leakage on Cortex-A53, i.e., none of them needs protection on this core. This may be attributed to the absence of register renaming and the short CPU pipeline, which prevents the result of a speculated load instruction from being used in subsequent operations. The only case that induces leakage on Cortex-A53 is the last case in Table 1, which is a variant of SiSCloak presented in (Buiras et al., 2021). In contrast to the classic Spectre-PHT gadget, in SiSCloak only the load that leaks the secret (the read from array B) is guarded by the bounds check, while the read from the public array (A) is moved before the if statement:

[Uncaptioned image]

Example of SiSCloak vulnerability, where the first load after a conditional branch is vulnerable to Spectre-PHT attack.

slh fails to protect against SiSCloak on Cortex-A72 as SiSCloak’s implementation relies on inlined assembly that is not considered in the slh protection model. Thus, we do not proceed with further performance analysis of SiSCloak. On Cortex-A53, slh could prevent the leakage, but this was solely due to the added tracking instructions that fill up the processor’s short speculation pipeline, causing the leaky load to not execute in speculation.

5.1.2. Cortex-A72 experiments

ARM officially confirmed that Cortex-A72 processors are vulnerable to Spectre attacks. However, our experiments highlight interesting findings, which we summarize here. While Scam-V identified several counterexamples for most benchmarks compiled with -O2, it found only a few vulnerable cases when -O0 was used. This suggests that additional operations, including operations on the stack, which are introduced when compiler optimizations are disabled, may invalidate the transient leakages. Also, Scam-V did not detect any leak for two benchmarks, namely Case #10 and Case #14v2, when compiled with clang using either -O0 or -O2 (see Table 1). Scam-V successfully generated well-formed inputs to exploit transient execution. However, no counterexamples were identified on hardware, even though both cases are considered insecure in the literature (Guarnieri et al., 2020). This observation indicates that the properties of the underlying hardware can significantly influence the leakage potential of the code. This is because different architectures may execute the same high-level code using varying machine instructions and ordering, thereby affecting the programs’ security.

Another interesting case is Case #6. The snippet below shows, on the left, the unoptimized and unmodified output of clang.

[Uncaptioned image]

Two versions of a snippet of Case 6 in ARM assembly code. The left version illustrates the original code, while the right version shows the same instructions with the replacement of the indirect jump using a NOP instruction.

For this version, we could not identify any leaks. However, replacing the jump instruction at address 400030 with a nop makes this program vulnerable to Spectre-PHT, and Scam-V then identified a few counterexamples. The replaced jump instruction does not change the program control flow; it just moves control to the next instruction in program order. We conjecture that such jump instructions cause the processor to flush the instruction pipeline.

5.1.3. Effect of compiler optimizations

Except for Cases #1, #4, #7, and #14, no case compiled with -O0 shows leakage. When a program is compiled with compiler optimizations disabled (i.e., using -O0), the produced assembly includes several unnecessary memory operations. For example, function arguments, such as the value of the index in the running example, are stored on, and later loaded from, the stack. These additional memory operations can affect speculative execution by filling up the pipeline, so that leakage-inducing operations do not execute in speculation.

5.1.4. Cache configuration effect

As discussed in Sec. 3.3, the state of microarchitectural features, like the data cache, can impact the success of Spectre attacks. We found evidence of this in our microbenchmarks. As an example, consider Case #1, which leaks data through a memory access like B[A[idx]]. When Case #1 is compiled with -O0, all 241 found counterexamples are achievable only when A[idx] is in the cache. If A[idx] does not hit the cache, fetching A[idx] from the main memory causes a delay, which prevents the execution of B[A[idx]] in speculation.
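The dependency argument above can be made concrete with a toy timing model. The sketch below is a deliberate simplification, not a model of the Cortex-A72: the function name and all cycle counts are hypothetical and chosen only for illustration. It encodes a single assumption, namely that the dependent load can issue only after the first load has returned its value and only while the mispredicted branch is still unresolved.

```python
def dependent_load_issues_transiently(first_load_latency, speculation_window):
    """B[A[idx]] needs the value of A[idx], so it can issue only after the
    first load completes; it leaves a cache footprint only if that happens
    before the mispredicted branch is resolved."""
    return first_load_latency < speculation_window

# Hypothetical cycle counts (assumptions, not measurements).
L1_HIT_LATENCY = 4
MAIN_MEMORY_LATENCY = 150
SPECULATION_WINDOW = 30

for label, latency in (("hit", L1_HIT_LATENCY), ("miss", MAIN_MEMORY_LATENCY)):
    issued = dependent_load_issues_transiently(latency, SPECULATION_WINDOW)
    print(f"A[idx] cache {label}: B[A[idx]] issued in speculation = {issued}")
```

Under these assumed numbers, only the cache-hit configuration lets the second load run transiently, which is the qualitative behaviour observed for the -O0 build of Case #1.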

5.1.5. Security and performance

Cases without counterexamples do not need any protection; thus, no hardening optimization was required. For those cases in which Scam-V identified counterexamples, we instead synthesized a minimal set of hardenings needed to protect against data leaks. In particular, we found a performance improvement in Cases #3, #5, #6, and #11gcc when slh hardening is optimized. Furthermore, we evaluated the security of AArch64 slh and of our slh optimization by re-executing the counterexamples found for the unprotected program to ensure that the leakage has been mitigated.
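One simple way to picture the interplay between hardening removal and counterexample replay is a greedy reduction loop. The sketch below is an illustration under stated assumptions and is not claimed to be the procedure implemented in Scam-V: minimize_hardening, leaks_with, and toy_oracle are hypothetical names, and the oracle stands in for rebuilding the binary with a subset of hardenings and replaying all known counterexamples on the target board.

```python
from typing import Callable, FrozenSet, Iterable

def minimize_hardening(
    sites: Iterable[int],
    leaks_with: Callable[[FrozenSet[int]], bool],
) -> FrozenSet[int]:
    """Greedily drop hardening sites whose removal does not reintroduce a leak.

    leaks_with(retained) is a hypothetical oracle: it returns True if any
    known counterexample still leaks when only `retained` sites are hardened.
    """
    retained = frozenset(sites)
    if leaks_with(retained):
        # Full hardening is already insufficient; a different countermeasure is needed.
        raise ValueError("full hardening does not stop the leak")
    for site in sorted(retained):
        candidate = retained - {site}
        if not leaks_with(candidate):
            retained = candidate
    return retained

# Toy oracle: pretend that only the load at site 2 needs protection.
def toy_oracle(retained):
    return 2 not in retained

print(sorted(minimize_hardening(range(4), toy_oracle)))
```

A greedy pass of this kind yields a set from which no single hardening can be removed, which is not necessarily the globally smallest set. Because removing a hardening also shifts the code layout, the oracle has to re-run the counterexamples on hardware after every change rather than reason about each site in isolation.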

Our experiments highlighted a few cases where our optimization pass removes all protections: Cases #1, #4, #5, #7, and #14. Further analysis revealed that slh hardening changes the code alignment in memory, which in turn affects speculative leakage. Code alignment may influence a program’s behavior in several ways, such as branch prediction accuracy, memory access patterns, and instruction decoding and dispatching. For example, if a branch instruction crosses a cache line boundary, this might affect branch prediction accuracy. Changing memory access patterns could affect, e.g., data prefetching, thus impacting the cache hit rate. Also, code alignment may increase the latency of instruction decoding and dispatching, potentially impacting speculative execution. Notice that code alignment w.r.t. cache lines is preserved by the majority of ASLR implementations, since randomization is usually done at page-level granularity.

5.1.6. Analysis limitations

Our approach has some limitations:

Branch predictor training

To mount Spectre-PHT, we need at least one conditional branch in the program. One path is needed to generate inputs that train the branch predictor, and the opposite path is used to generate inputs that exploit the misprediction. In Case #9_1, however, we could get only one path from angr because constant propagation leads to pruning of the other path. Similarly, Scam-V could not test Case #9_2 because of path pruning. The problem arises from using a pointer in the branch condition, which is retrieved from the stack through a memory load. As we constrain every memory access to be within a specific memory region, the condition will never become false, and therefore, the path is discarded. Case #8 compiled with -O2 is also excluded from the analysis as it lacks a conditional branch and is therefore not vulnerable to Spectre-PHT.

[Uncaptioned image]

Two cases that Scam-V cannot analyze due to the limitations of angr symbolic execution. In Case 9 version 1, the branch condition is a variable assigned to zero, whereas in Case 9 version 2, the branch condition is a pointer.

Misspeculation and observation refinement

Another limitation of Scam-V is associated with applying observation refinement, as discussed in Sec. 3. By default, misspeculation is expected to trigger at the first conditional branch and to continue at all subsequently encountered conditional branches. Thus, our refinement approach negates all branch conditions to make observable the memory operations that can be executed in misspeculation. However, this does not always work (e.g., Case #11* and #13), as some attacks do not follow this pattern. To overcome this limitation and ensure that Scam-V can analyze cases in which potential memory accesses are not always in the opposite branch, we had to decide manually which if statements must have their conditions negated.

Loops in symbolic execution

Handling infinite loops is challenging in symbolic execution. In our experiments, only Case #5 contains an infinite loop, which we handle by performing a one-time unrolling and back-edge cutting of the loop at the LLVM level. For this specific case, one iteration of the loop body was enough to detect the leakage. For all other cases which contain a finite loop, we have performed loop unrolling. The same problem can also be addressed by introducing a precondition to constrain variables involved in the loop condition to a specific range.

5.1.7. More details on Case #1 and #10

Ex. #Exp   Cortex-A53   Cortex-A72
#C #I #H #OpH ExT HExT OpHExT   #C #I TypeH #H #OpH ExT HExT OpHExT
01 -O0 500   0 0 6 / 928   927.57   0.53 932   931   0.82 /   464 34 6 1 863   860.29   1.80 948   946.71   1.38 1042   1024.14   14.71
-O2 500   0 0 4 / 688   688   0.0 908   906.14   1.07 /   500 0 aSLH 4 1     866   861.71   2.29 944   937.43   6.45
        FI 1 / 686   685.71   0.49 999   998.29   0.49 /
10 -O0 500*3   0 0 7 / 936   935.29   0.76 1331   1331   0.0 /   602 0 7 1 915   912.86   1.77 1334   1332.29   1.98 1091   1089.43   1.13
-O2 500*3   0 0 4 / 699   698.57   0.53 836   835.86   0.38 /   620 216 aSLH 4 1     1031   1029.57   0.98 803   802.43   0.53
        FI 1 / 733   731.29   1.11 1049   1046.43   1.51 /
SSL_get_shared_sigalgs -O0 105*5   0 0 35 / 2419   2418.43   0.53 3686   3685.43   0.53 /   0 0 35 / 2362   2361   1.41 3390   3382.14   4.91 /
-O2 85*6   0 0 9 / 1190   1190   0.0 934   933.29   0.49 /   78 216 aSLH 9 1 1020   1018.57   0.98 1094   1093.29   0.49 1090   1089.43   0.53
Table 2. Analysis of specific cases for a private array index (i.e., idx) and Spectre-PHT gadgets from OpenSSL. The TypeH column indicates the type of applied hardening: none (default LLVM slh), aSLH (poisoning on memory addresses), FI (dsb+isb).

We have investigated Case #1 (the prime Spectre-PHT example) and Case #10 (our running example in the paper) in more detail under (i) different compiler optimizations and (ii) different security labels assigned to the idx variable. We summarize our findings as follows.

Case #1

When idx is labelled as a public variable and the code is compiled with optimization level -O0, Scam-V detects the leakage and refines the slh hardening. However, all protections are ultimately removed due to the effect of code alignment changes, as discussed in Sec. 5.1.5. When idx is labelled as private and the code is compiled with -O0, Scam-V also detects the leakage. Subsequently, Scam-V refines the slh protection to retain only the hardening of the vulnerable load. However, this refinement did not result in a performance improvement.

Similarly, with a public idx and the optimization level set to -O2, Scam-V identifies the leakage and refines the slh protection to retain only the hardening that is essential for the vulnerable load. Notably, our refinement did not improve performance in this case either. Finally, when idx is labelled as private and the code is compiled with -O2, Scam-V detects the leakage. However, the LLVM slh implementation fails to prevent it. As a result, we employed alternative hardening methods for the program: slh on memory addresses (aSLH; we borrow the terminology from (Zhang et al., 2023)) and fence insertion. Refining aSLH did not improve performance. With fence insertion, the speculative barrier formed by dsb and isb increased the execution time.
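The difference between masking loaded values and masking load addresses can be illustrated with a toy relational model of the gadget B[A[idx]]. The sketch below is an assumption-laden illustration rather than a description of the LLVM pass: the base addresses, the function names, and the convention that the mask is zero under misspeculation are all hypothetical choices of this model. It also encodes one plausible reading, not confirmed by the measurements above, of why value masking may fall short when the secret is the index itself and is already held in a register.

```python
A_BASE, B_BASE = 0x1000, 0x2000  # hypothetical array base addresses

def transient_trace(idx, memory, mode, mask=0):
    """Addresses touched by B[A[idx]] on a mispredicted path.

    mode is "none", "value" (AND the loaded value with mask) or
    "address" (AND each load address with mask). In this model the mask
    is 0 under misspeculation.
    """
    addr_a = A_BASE + idx
    if mode == "address":
        addr_a &= mask
    value = memory.get(addr_a, 0)
    if mode == "value":
        value &= mask
    addr_b = B_BASE + value
    if mode == "address":
        addr_b &= mask
    return [addr_a, addr_b]

def distinguishable(mode, run1, run2):
    """True if two runs that differ only in secret data yield different traces."""
    return transient_trace(mode=mode, **run1) != transient_trace(mode=mode, **run2)

# Public idx, secret memory content: only the out-of-bounds byte differs.
public_idx = (
    {"idx": 0x500, "memory": {0x1500: 7}},
    {"idx": 0x500, "memory": {0x1500: 9}},
)
# Private idx: the index itself is the secret.
private_idx = (
    {"idx": 3, "memory": {}},
    {"idx": 5, "memory": {}},
)

for name, (r1, r2) in (("public idx", public_idx), ("private idx", private_idx)):
    for mode in ("none", "value", "address"):
        print(f"{name:11} {mode:8} leak={distinguishable(mode, r1, r2)}")
```

In this model, value masking hides a secret that is read from memory but leaves the first load address exposed when idx is private, whereas address masking collapses both accesses to the same location.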

Case #10

When idx is labelled as a public value, and the code is compiled with optimization level -O0, Scam-V does not identify any leakage. Thus, there is no need to protect the code with slh. However, when idx is labelled as private and the code is compiled with optimization -O0, Scam-V successfully detects the leakage. Subsequently, Scam-V refines the slh protection to retain only the essential hardening for the vulnerable load. Notably, our slh refinement enhances performance in this scenario.

With a public idx and the optimization level set to -O2, Scam-V does not detect any leakage, making hardening unnecessary. Finally, when idx is labelled as private and the code is compiled with -O2, Scam-V identifies the leakage. However, slh does not stop it. Therefore, alternative countermeasures, namely refined aSLH and fence insertion, are employed. In this scenario, refining aSLH results in a performance improvement.

5.2. Analyzing crypto libraries

To show that Scam-V scales to real-world case studies, we used Spectre-PHT gadgets in OpenSSL v3.1.0 that had been discussed in (Mosier et al., 2022b). Among the three vulnerable gadgets, namely EVP_PKEY_asn1_get0, ts_check_status_info and SSL_get_shared_sigalgs, Scam-V identifies only SSL_get_shared_sigalgs (due to a leakage at line #44 in the snippet below) as vulnerable on actual hardware (see Table 2). The others did not trigger speculation because the operands of the speculation primitive (i.e., the branch condition) had already been loaded into the cache by the code executed before the if statement. To protect against the SSL_get_shared_sigalgs leakage, we initially applied the default LLVM slh countermeasure. However, employing slh solely on loaded values did not mitigate the leakage. Subsequently, by adopting slh on the memory load addresses, we could stop the leakage.

[Uncaptioned image]

Function from OpenSSL v3.1.0 that is vulnerable to a Spectre-PHT attack. It contains a dangerous memory access in which misspeculation could trigger a speculative out-of-bounds load, allowing access to an arbitrary secret.

6. Related work

Several studies have tried to identify and mitigate transient execution attacks. We discuss only a few studies relevant to our results. For a comprehensive list of existing work, we refer the reader to (Xiong and Szefer, 2021; Cauligi et al., 2022).

Detecting Spectre Attacks

Widely used techniques for detecting Spectre-style vulnerabilities include symbolic execution and relational analysis (Guarnieri et al., 2020; Daniel et al., 2021; Nemati et al., 2020a), as well as fuzzing (Oleksenko et al., 2020, 2022), both at the machine code (Wang et al., 2021; Guarnieri et al., 2020; Cauligi et al., 2020; Daniel et al., 2021; Oleksenko et al., 2020; Cheang et al., 2019) and LLVM-IR levels (Wang et al., 2020; Wu and Wang, 2019; Guo et al., 2020). Yet, most existing tools either do not scale well or face qualitative limitations. For example, SpecFuzz (Oleksenko et al., 2020) simulates the execution of code fragments in misspeculated branches and uses input fuzzing to pinpoint programs’ vulnerability to Spectre attacks. However, it does not perform well for nested speculation and inherits limitations of fuzzing, e.g., input coverage. Scam-V (Nemati et al., 2020a) uses instruction fuzzing and relational testing to synthesize test cases and check the vulnerability of modern processors to Spectre-PHT. A similar approach was used by Revizor (Oleksenko et al., 2022, 2023) to identify Spectre-PHT/STL (Store-to-Load forwarding). However, in contrast to Revizor, Scam-V utilizes observation refinement and symbolic execution to guide input generation and reduce the search space, thus requiring fewer test cases to uncover potential leakages (Buiras et al., 2021). Other symbolic execution-based approaches include Spectector (Guarnieri et al., 2020; Fabian et al., 2022) (detects Spectre-PHT/STL/RSB (Return Stack Buffer)), KLEESpec (Wang et al., 2020) (detects Spectre-PHT), Pitchfork (Cauligi et al., 2020) (detects Spectre-PHT/STL), and BH (Daniel et al., 2021) (detects Spectre-PHT/STL), all of which have scalability limitations.

Software Mitigations

There is also a growing body of work that (formally) analyzes programs’ vulnerabilities and mitigates leakages using software measures (Barthe et al., 2021; Cauligi et al., 2020; Patrignani and Guarnieri, 2021; Guarnieri et al., 2020; Vassena et al., 2021; Guanciale et al., 2020; Guarnieri et al., 2021; Shivakumar et al., 2022). For example, oo7 (Wang et al., 2021) uses taint tracking to find Spectre-PHT attacks and inserts lfence to stop the leakage. Cauligi et al. (Cauligi et al., 2020) proposed Pitchfork based on the concept of speculative constant-time for speculative execution. However, while their theoretical developments suggest inserting fences to mitigate leakages, Pitchfork does not provide this in practice. InSpectre (Guanciale et al., 2020) offers an operational model to aid in reasoning about countermeasures and transient execution attacks. Patrignani and Guarnieri (Patrignani and Guarnieri, 2021) analyzed the effects of compiler transformations and countermeasures on speculative execution security. They showed that the existing slh mitigation in LLVM is inadequate for stopping Spectre-PHT leakage in programs and proposed a more powerful version of slh to prevent data leaks. Shivakumar et al. (Shivakumar et al., 2022) demonstrated the ineffectiveness of the LLVM primitives in mitigating Spectre-PHT and proposed a new slh variant to address the limitations of the existing LLVM slh mitigation. Blade (Vassena et al., 2021) employs a static type system to detect transient leakage and uses lfences or slh to mitigate the found leaks in constant-time WebAssembly. Unlike LLVM’s lfence and slh mitigations, none of these mitigations is easily deployable in an existing toolchain. Mosier et al. (Mosier et al., 2022a) introduced leakage containment models (LCMs), which are axiomatic security contracts designed to formally model and automatically detect leakages in programs.

While some of these works use static analysis techniques to optimize the number of fences, e.g., (Wang et al., 2021; Mosier et al., 2022a), we are not aware of any work optimizing fence placement by consulting the hardware.

Hardware Mitigations

The research community has also proposed several hardware defenses against Spectre attacks. Hardware-level mitigations can be grouped into two main classes. The first class comprises techniques that hide the effect of speculative access instructions (Yan et al., 2019; Khasawneh et al., 2019; Sakalis et al., 2019; Saileshwar and Qureshi, 2019) by, e.g., introducing a speculative buffer (Yan et al., 2019) or shadow hardware structures to squash microarchitectural state changes if the processor mispredicts (Khasawneh et al., 2019). The second class includes techniques that leverage information flow tracking to block leakages by preventing data forwarding between speculatively executed access and transmitter instructions (Yu et al., 2019; Weisse et al., 2019; Loughlin et al., 2021).

Hardware-Software Co-design

To deliver the promised security guarantees without sacrificing performance, hardware-based mitigations require significant modifications in hardware. As an alternative, some works propose a software-hardware co-design approach (Taram et al., 2019; Koruyeh et al., 2020; Li et al., 2019). For example, Taram et al. (Taram et al., 2019) proposed the concept of context-sensitive fencing, which uses taint tracking to find the optimal location for inserting fences at the decoder level. They also make various speculative barriers available to software.

7. Concluding Remarks

We explored the necessity of hardenings introduced by the LLVM slh pass against Spectre-PHT by taking into account the properties of the underlying microarchitecture. Our experiments highlighted several interesting results. We showed that the vulnerability of programs to Spectre attacks and the required level of protection to stop potential leaks strictly depend on the properties of the underlying processor and the compiler optimization level. Additionally, we showed that there are unexpected factors (e.g., code alignment) that can impact the vulnerability of programs to side-channel attacks.

Scam-V’s current implementation supports only the ARM and RISC-V architectures, but porting it to other architectures such as x86 just requires extending the binary-to-BIR translation module. Moreover, we focused only on Spectre-PHT. Covering other variants such as Spectre-STL mainly requires developing new observation refinement techniques to synthesize an equivalence relation that can be used to generate suitable test cases and training data w.r.t. the variant under test.

Acknowledgments

This work was supported in part by a gift from Intel. We thank the anonymous reviewers for their valuable feedback during the review process.

References

  • Arm Limited (2013) Arm Limited. 2013. Cortex-M0+ Technical Reference Manual r0p0 (r0p0 ed.). Arm Limited, Cambridge, UK. https://developer.arm.com/documentation/ddi0432/latest/
  • Barthe et al. (2021) Gilles Barthe, Sunjay Cauligi, Benjamin Grégoire, Adrien Koutsos, Kevin Liao, Tiago Oliveira, Swarn Priya, Tamara Rezk, and Peter Schwabe. 2021. High-Assurance Cryptography in the Spectre Era. In 42nd IEEE Symposium on Security and Privacy, SP 2021, San Francisco, CA, USA, 24-27 May 2021.
  • Barthe et al. (2004) Gilles Barthe, Pedro R. D’Argenio, and Tamara Rezk. 2004. Secure Information Flow by Self-Composition. In 17th IEEE Computer Security Foundations Workshop, (CSFW-17 2004), 28-30 June 2004, Pacific Grove, CA, USA. 100–114. https://doi.org/10.1109/CSFW.2004.17
  • Buiras et al. (2021) Pablo Buiras, Hamed Nemati, Andreas Lindner, and Roberto Guanciale. 2021. Validation of Side-Channel Models via Observation Refinement. In MICRO ’21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Greece, October 18-22. https://doi.org/10.1145/3466752.3480130
  • Carruth (2018) Chandler Carruth. 2018. RFC: Speculative load hardening (a Spectre variant #1 mitigation). https://llvm.org/docs/SpeculativeLoadHardening.html. Accessed October 2022.
  • Carruth (2020) Chandler Carruth. 2020. Cryptographic software in a post-Spectre world. Talk at the Real World Crypto Symposium. https://chandlerc.blog/talks/2020_post_spectre_crypto/post_spectre_crypto.html. Accessed October 2022.
  • Cauligi et al. (2020) Sunjay Cauligi, Craig Disselkoen, Klaus v. Gleissenthall, Dean Tullsen, Deian Stefan, Tamara Rezk, and Gilles Barthe. 2020. Constant-Time Foundations for the New Spectre Era. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation.
  • Cauligi et al. (2022) Sunjay Cauligi, Craig Disselkoen, Daniel Moghimi, Gilles Barthe, and Deian Stefan. 2022. SoK: Practical Foundations for Spectre Defenses. (2022).
  • Cha et al. (2012) Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert, and David Brumley. 2012. Unleashing Mayhem on Binary Code. In IEEE Symposium on Security and Privacy, SP 2012, 21-23 May 2012, San Francisco, California, USA. 380–394. https://doi.org/10.1109/SP.2012.31
  • Cheang et al. (2019) Kevin Cheang, Cameron Rasmussen, Sanjit Seshia, and Pramod Subramanyan. 2019. A Formal Approach to Secure Speculation. In 2019 IEEE 32nd Computer Security Foundations Symposium (CSF).
  • Daniel et al. (2021) Lesly-Ann Daniel, Sébastien Bardin, and Tamara Rezk. 2021. Hunting the Haunter - Efficient Relational Symbolic Execution for Spectre with Haunted RelSE. In 28th Annual Network and Distributed System Security Symposium, NDSS 2021, virtually, February 21-25, 2021.
  • Fabian et al. (2022) Xaver Fabian, Marco Guarnieri, and Marco Patrignani. 2022. Automatic Detection of Speculative Execution Combinations. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS 2022, Los Angeles, CA, USA, November 7-11, 2022. 965–978. https://doi.org/10.1145/3548606.3560555
  • Guanciale et al. (2020) Roberto Guanciale, Musard Balliu, and Mads Dam. 2020. InSpectre: Breaking and Fixing Microarchitectural Vulnerabilities by Formal Analysis. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security.
  • Guarnieri et al. (2020) M. Guarnieri, B. Köpf, J. F. Morales, J. Reineke, and A. Sánchez. 2020. Spectector: Principled Detection of Speculative Information Flows. In 2020 IEEE Symposium on Security and Privacy (SP).
  • Guarnieri et al. (2021) Marco Guarnieri, Boris Köpf, Jan Reineke, and Pepe Vila. 2021. Hardware-Software Contracts for Secure Speculation. In 2021 IEEE Symposium on Security and Privacy.
  • Guo et al. (2020) Shengjian Guo, Yueqi Chen, Peng Li, Yueqiang Cheng, Huibo Wang, Meng Wu, and Zhiqiang Zuo. 2020. SpecuSym: Speculative Symbolic Execution for Cache Timing Leak Detection. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering.
  • Horn (2018) Jann Horn. 2018. Speculative execution, variant 4: Speculative store bypass. https://bugs.chromium.org/p/project-zero/issues/detail?id=1528
  • Hu et al. (2023) Guangyuan Hu, Zecheng He, and Ruby B. Lee. 2023. SoK: Hardware Defenses Against Speculative Execution Attacks. CoRR abs/2301.03724 (2023). https://doi.org/10.48550/arXiv.2301.03724
  • Khasawneh et al. (2019) Khaled N. Khasawneh, Esmaeil Mohammadian Koruyeh, Chengyu Song, Dmitry Evtyushkin, Dmitry Ponomarev, and Nael B. Abu-Ghazaleh. 2019. SafeSpec: Banishing the Spectre of a Meltdown with Leakage-Free Speculation. In Proceedings of the 56th Annual Design Automation Conference 2019, DAC 2019, Las Vegas, NV, USA, June 02-06, 2019. 60. https://doi.org/10.1145/3316781.3317903
  • Kocher (2018) Paul Kocher. 2018. Spectre Mitigations in Microsoft’s C/C++ Compiler. https://www.paulkocher.com/doc/MicrosoftCompilerSpectreMitigation.html.
  • Kocher et al. (2019) Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. 2019. Spectre Attacks: Exploiting Speculative Execution. In 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, May 19-23, 2019. 1–19. https://doi.org/10.1109/SP.2019.00002
  • Koruyeh et al. (2018) Esmaeil Mohammadian Koruyeh, Khaled N. Khasawneh, Chengyu Song, and Nael B. Abu-Ghazaleh. 2018. Spectre Returns! Speculation Attacks using the Return Stack Buffer. In 12th USENIX Workshop on Offensive Technologies (WOOT 2018).
  • Koruyeh et al. (2020) Esmaeil Mohammadian Koruyeh, Shirin Haji Amin Shirazi, Khaled N. Khasawneh, Chengyu Song, and Nael B. Abu-Ghazaleh. 2020. SpecCFI: Mitigating Spectre Attacks using CFI Informed Speculation. In 2020 IEEE Symposium on Security and Privacy, SP 2020, San Francisco, CA, USA, May 18-21, 2020. 39–53. https://doi.org/10.1109/SP40000.2020.00033
  • Lattner and Adve (2003) Chris Lattner and Vikram Adve. 2003. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. Tech. Report UIUCDCS-R-2003-2380. Computer Science Dept., Univ. of Illinois at Urbana-Champaign.
  • Li et al. (2019) Peinan Li, Lutan Zhao, Rui Hou, Lixin Zhang, and Dan Meng. 2019. Conditional Speculation: An Effective Approach to Safeguard Out-of-Order Execution Against Spectre Attacks. In 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019, Washington, DC, USA, February 16-20, 2019. 264–276. https://doi.org/10.1109/HPCA.2019.00043
  • Lindner et al. (2019) Andreas Lindner, Roberto Guanciale, and Roberto Metere. 2019. TrABin: Trustworthy analyses of binaries. Sci. Comput. Program. 174 (2019), 72–89. https://doi.org/10.1016/j.scico.2019.01.001
  • Loughlin et al. (2021) Kevin Loughlin, Ian Neal, Jiacheng Ma, Elisa Tsai, Ofir Weisse, Satish Narayanasamy, and Baris Kasikci. 2021. DOLMA: Securing Speculation with the Principle of Transient Non-Observability. In 30th USENIX Security Symposium, USENIX Security 2021, August 11-13, 2021. 1397–1414. https://www.usenix.org/conference/usenixsecurity21/presentation/loughlin
  • Miller (2018) Matt Miller. 2018. Mitigating speculative execution side channel hardware vulnerabilities. https://msrc-blog.microsoft.com/2018/03/15/mitigating-speculative-execution-side-channel-hardware-vulnerabilities/
  • Mosier et al. (2022a) Nicholas Mosier, Hanna Lachnitt, Hamed Nemati, and Caroline Trippel. 2022a. Axiomatic hardware-software contracts for security. In ISCA 2022: The 49th Annual International Symposium on Computer Architecture, New York, USA, June 18 - 22. https://doi.org/10.1145/3470496.3527412
  • Mosier et al. (2022b) Nicholas Mosier, Hanna Lachnitt, Hamed Nemati, and Caroline Trippel. 2022b. Clou. https://github.com/nmosier/clou.
  • Mosier et al. (2022c) Nicholas Mosier, Hamed Nemati, and Caroline Trippel. 2022c. Clou. https://github.com/nmosier/clou
  • Nemati et al. (2020a) Hamed Nemati, Pablo Buiras, Andreas Lindner, Roberto Guanciale, and Swen Jacobs. 2020a. Validation of Abstract Side-Channel Models for Computer Architectures. In Computer Aided Verification - 32nd International Conference, CAV 2020 Los Angeles, CA, USA, July 21-24. https://doi.org/10.1007/978-3-030-53288-8_12
  • Nemati et al. (2020b) Hamed Nemati, Andreas Lindner, and Pablo Buiras. 2020b. Scam-V. https://github.com/kth-step/HolBA/tree/dev_scamv
  • Oleksenko et al. (2022) Oleksii Oleksenko, Christof Fetzer, Boris Köpf, and Mark Silberstein. 2022. Revizor: testing black-box CPUs against speculation contracts. In ASPLOS ’22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022 - 4 March 2022. 226–239. https://doi.org/10.1145/3503222.3507729
  • Oleksenko et al. (2023) Oleksii Oleksenko, Marco Guarnieri, Boris Köpf, and Mark Silberstein. 2023. Hide and Seek with Spectres: Efficient discovery of speculative information leaks with random testing. CoRR abs/2301.07642 (2023). https://doi.org/10.48550/arXiv.2301.07642
  • Oleksenko et al. (2018) Oleksii Oleksenko, Bohdan Trach, Tobias Reiher, Mark Silberstein, and Christof Fetzer. 2018. You Shall Not Bypass: Employing data dependencies to prevent Bounds Check Bypass. CoRR abs/1805.08506 (2018). http://arxiv.org/abs/1805.08506
  • Oleksenko et al. (2020) Oleksii Oleksenko, Bohdan Trach, Mark Silberstein, and Christof Fetzer. 2020. SpecFuzz: Bringing Spectre-type vulnerabilities to the surface. In 29th USENIX Security Symposium (USENIX Security 20).
  • Patrignani and Guarnieri (2021) Marco Patrignani and Marco Guarnieri. 2021. Exorcising Spectres with Secure Compilers. In CCS ’21: 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, Republic of Korea, November 15 - 19, 2021. 445–461. https://doi.org/10.1145/3460120.3484534
  • Saileshwar and Qureshi (2019) Gururaj Saileshwar and Moinuddin K. Qureshi. 2019. CleanupSpec: An “Undo” Approach to Safe Speculation. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2019, Columbus, OH, USA, October 12-16, 2019. 73–86. https://doi.org/10.1145/3352460.3358314
  • Sakalis et al. (2019) Christos Sakalis, Stefanos Kaxiras, Alberto Ros, Alexandra Jimborean, and Magnus Själander. 2019. Efficient invisible speculative execution through selective delay and value prediction. In Proceedings of the 46th International Symposium on Computer Architecture, ISCA 2019, Phoenix, AZ, USA, June 22-26, 2019. 723–735. https://doi.org/10.1145/3307650.3322216
  • Shivakumar et al. (2022) Basavesh Ammanaghatta Shivakumar, Jack Barnes, Gilles Barthe, Sunjay Cauligi, Chitchanok Chuengsatiansup, Daniel Genkin, Sioli O’Connell, Peter Schwabe, Rui Qi Sim, and Yuval Yarom. 2022. Spectre Declassified: Reading from the Right Place at the Wrong Time. Cryptology ePrint Archive, Paper 2022/426. https://eprint.iacr.org/2022/426
  • Shoshitaishvili et al. (2016) Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Audrey Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. 2016. SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis. In IEEE Symposium on Security and Privacy.
  • Taram et al. (2019) Mohammadkazem Taram, Ashish Venkat, and Dean M. Tullsen. 2019. Context-Sensitive Fencing: Securing Speculative Execution via Microcode Customization. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2019, Providence, RI, USA, April 13-17, 2019. 395–410. https://doi.org/10.1145/3297858.3304060
  • Vassena et al. (2021) Marco Vassena, Craig Disselkoen, Klaus von Gleissenthall, Sunjay Cauligi, Rami Gökhan Kıcı, Ranjit Jhala, Dean Tullsen, and Deian Stefan. 2021. Automatically Eliminating Speculative Leaks from Cryptographic Code with Blade. Proc. ACM Program. Lang. (2021).
  • Wang et al. (2020) Guanhua Wang, Sudipta Chattopadhyay, Arnab Kumar Biswas, Tulika Mitra, and Abhik Roychoudhury. 2020. KLEESpectre: Detecting Information Leakage through Speculative Cache Attacks via Symbolic Execution. ACM Trans. Softw. Eng. Methodol. 29, 3 (2020), 14:1–14:31. https://doi.org/10.1145/3385897
  • Wang et al. (2021) Guanhua Wang, Sudipta Chattopadhyay, Ivan Gotovchits, Tulika Mitra, and Abhik Roychoudhury. 2021. oo7: Low-Overhead Defense Against Spectre Attacks via Program Analysis. IEEE Trans. Software Eng. (2021).
  • Weisse et al. (2019) Ofir Weisse, Ian Neal, Kevin Loughlin, Thomas F. Wenisch, and Baris Kasikci. 2019. NDA: Preventing Speculative Execution Attacks at Their Source. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2019, Columbus, OH, USA, October 12-16, 2019. 572–586. https://doi.org/10.1145/3352460.3358306
  • Wu and Wang (2019) Meng Wu and Chao Wang. 2019. Abstract Interpretation under Speculative Execution. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation.
  • Xiong and Szefer (2021) Wenjie Xiong and Jakub Szefer. 2021. Survey of Transient Execution Attacks and Their Mitigations. ACM Comput. Surv. 54, 3 (May 2021).
  • Yan et al. (2019) Mengjia Yan, Jiho Choi, Dimitrios Skarlatos, Adam Morrison, Christopher W. Fletcher, and Josep Torrellas. 2019. InvisiSpec: Making Speculative Execution Invisible in the Cache Hierarchy (Corrigendum). In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2019, Columbus, OH, USA, October 12-16, 2019. 1076. https://doi.org/10.1145/3352460.3361129
  • Yarom and Falkner (2014) Yuval Yarom and Katrina Falkner. 2014. Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In Proceedings of the 23rd USENIX Conference on Security Symposium. 719–732.
  • Yu et al. (2019) Jiyong Yu, Mengjia Yan, Artem Khyzha, Adam Morrison, Josep Torrellas, and Christopher W. Fletcher. 2019. Speculative Taint Tracking (STT): A Comprehensive Protection for Speculatively Accessed Data. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture.
  • Zhang et al. (2023) Zhiyuan Zhang, Gilles Barthe, Chitchanok Chuengsatiansup, Peter Schwabe, and Yuval Yarom. 2023. Ultimate SLH: Taking Speculative Load Hardening to the Next Level. In 32nd USENIX Security Symposium, USENIX Security 2023, Anaheim, CA, USA, August 9-11, 2023.