|
| 1 | +# IR SSA Construction |
| 2 | + |
| 3 | +This document describes how Static Single Assignment (SSA) form is constructed for the Intermediate |
| 4 | +Representation (IR). The SSA form that we use is based on the traditional [SSA](https://en.wikipedia.org/wiki/Static_single_assignment_form) |
| 5 | +commonly used in compilers, with additional extensions to support accesses to aliased memory |
| 6 | +inspired by [ChowCLLS96](https://link.springer.com/chapter/10.1007%2F3-540-61053-7_66). |
| 7 | + |
| 8 | +SSA construction takes as input an instance of the IR, and creates a new instance of the IR that is |
| 9 | +in SSA form. If the input IR is already in SSA form, SSA construction will still recompute SSA form |
| 10 | +from scratch. However, the input SSA information will be taken into account to improve the alias |
| 11 | +analysis that guides the new SSA computation. The current implementation creates three successive |
| 12 | +instances of the IR: |
| 13 | +- *Raw IR* is constructed directly from the original AST. Raw IR does not have any of its memory |
| 14 | +accesses in SSA form. |
| 15 | +- *Unaliased SSA IR* is constructed from Raw IR. It places memory accesses in SSA form only for |
| 16 | +accesses to unescaped local variables that are loaded or stored in their entirety, and as their |
| 17 | +declared type. Accesses to aliased memory are not modeled, nor are accesses to variables that have |
| 18 | +any partial reads or writes. |
| 19 | +- *Aliased SSA IR* is constructed from Unaliased SSA IR. All memory accesses are placed in SSA form, |
| 20 | +including accesses to aliased memory. |
| 21 | + |
| 22 | +Constructing SSA form involves three steps in succession: alias analysis, the memory model, and |
| 23 | +the actual SSA construction itself. Each step is a module that is parameterized on an implementation |
| 24 | +of the previous step, so the memory model and alias analysis modules can be replaced in order to |
| 25 | +provide different analysis heuristics or performance/precision tradeoffs. |
| 26 | + |
| 27 | +## Alias Analysis |
| 28 | +The alias analysis component is responsible for determining two closely related sets of facts about |
| 29 | +the input IR: What memory is being accessed by each memory operand or memory result, and which |
| 30 | +variables "escape" such that the analysis can no longer precisely track all accesses to those |
| 31 | +variables. This information is consumed by the memory model component, but is not consumed directly |
| 32 | +by the actual SSA construction. |
| 33 | + |
| 34 | +The current alias analysis exposes two predicates: |
| 35 | + |
| 36 | +``` |
| 37 | +predicate resultPointsTo(Instruction instr, IRVariable var, IntValue bitOffset); |
| 38 | + |
| 39 | +predicate variableAddressEscapes(IRVariable var); |
| 40 | +``` |
| 41 | + |
| 42 | +The `resultPointsTo` predicate computes, for each `Instruction`, the `IRVariable` that is pointed |
| 43 | +into by the result of that `Instruction`, and the bit offset that the result of the `Instruction` |
| 44 | +points to within that variable. If it cannot prove that the result points into exactly one |
| 45 | +`IRVariable`, then the predicate does not hold. If the result is known to point into a specific |
| 46 | +`IRVariable`, but the offset is unknown, then the predicate will hold, but the `bitOffset` parameter |
| 47 | +will be `Ints::unknown()`. This is useful for cases such as array accesses, where the array index |
| 48 | +may be computed at runtime, but the result is known to point to some element of the array rather |
| 49 | +than to some arbitrary unknown location. |
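The behavior described above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the `Instr` record, its `kind` strings, and the `UNKNOWN` sentinel (standing in for `Ints::unknown()`) are hypothetical simplifications, not the real IR classes or the actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

UNKNOWN = "unknown"  # hypothetical stand-in for Ints::unknown()
Offset = Union[int, str]


@dataclass(frozen=True)
class Instr:
    """Hypothetical, simplified instruction record (not the real IR)."""
    name: str
    kind: str                    # "VariableAddress", "Copy", "PointerAdd", "Load"
    var: Optional[str] = None    # variable whose address is taken
    base: Optional[str] = None   # name of the operand instruction
    bit_delta: Offset = 0        # PointerAdd only; UNKNOWN if computed at runtime


def add_offsets(a: Offset, b: Offset) -> Offset:
    # An unknown offset stays unknown after any further arithmetic.
    if a == UNKNOWN or b == UNKNOWN:
        return UNKNOWN
    return a + b


def result_points_to(instrs: List[Instr]) -> Dict[str, Tuple[str, Offset]]:
    """Map each instruction name to (variable, bit offset) when provable.

    Assumes instructions are listed so that operands precede their uses.
    Instructions with no entry correspond to the predicate not holding.
    """
    facts: Dict[str, Tuple[str, Offset]] = {}
    for i in instrs:
        if i.kind == "VariableAddress":
            facts[i.name] = (i.var, 0)
        elif i.kind == "Copy" and i.base in facts:
            facts[i.name] = facts[i.base]
        elif i.kind == "PointerAdd" and i.base in facts:
            var, off = facts[i.base]
            facts[i.name] = (var, add_offsets(off, i.bit_delta))
        # Anything else (for example a pointer loaded from memory) is untracked.
    return facts
```

In this model, indexing an array with a runtime index yields an entry whose variable is known but whose offset is `UNKNOWN`, while a pointer loaded from memory yields no entry at all.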
| 50 | + |
| 51 | +The `variableAddressEscapes` predicate computes the set of `IRVariable`s whose address "escapes". A |
| 52 | +variable's address escapes if there is a possibility that there exists a memory access somewhere in |
| 53 | +the program that accesses the variable, without that access being modeled by the `resultPointsTo` |
| 54 | +predicate. Common reasons for a variable's address escaping include: |
| 55 | +- The address is assigned into a global variable, heap memory, or some other location where code may |
| 56 | +be able to later dereference the address outside the scope of the `resultPointsTo` analysis. |
| 57 | +- The address is passed as an argument to a function, unless the called function is known not to |
| 58 | +retain that address after it returns. |
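The two escape reasons listed above can be sketched as a simple classification over address uses. The `AddressUse` record, its `kind` strings, and the `callee_retains` flag are assumptions made for illustration; they do not correspond to names in the real analysis.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class AddressUse:
    """Hypothetical record of one use of a variable's address."""
    var: str
    kind: str                    # "store_to_global", "store_to_heap",
                                 # "call_argument", or "local_use"
    callee_retains: bool = True  # conservative default: assume the callee keeps it


# Uses that place the address where untracked code could later dereference it.
ESCAPING_STORES = {"store_to_global", "store_to_heap"}


def variable_address_escapes(uses: Iterable[AddressUse]) -> Set[str]:
    """Return the set of variables whose address escapes under this model."""
    escaped: Set[str] = set()
    for use in uses:
        if use.kind in ESCAPING_STORES:
            escaped.add(use.var)
        elif use.kind == "call_argument" and use.callee_retains:
            escaped.add(use.var)
    return escaped
```

Defaulting `callee_retains` to `True` reflects the conservative reading of the second bullet: an argument escapes unless the callee is known not to retain it.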
| 59 | + |
| 60 | +### Current Implementation |
| 61 | +The current alias analysis implementation can track the pointed-to variable and offset through |
| 62 | +copies, pointer arithmetic, and field offset computations. If the input IR is already in SSA form, |
| 63 | +even an address assigned to a local variable can be tracked. |
| 64 | + |
| 65 | +## Memory Model |
| 66 | +The memory model uses the results of alias analysis to describe the memory location accessed by each |
| 67 | +memory operand or memory result in the function. It exposes two classes and three non-member |
| 68 | +predicates: |
| 69 | + |
| 70 | +``` |
| 71 | +class MemoryLocation { |
| 72 | + VirtualVariable getVirtualVariable(); |
| 73 | +} |
| 74 | + |
| 75 | +class VirtualVariable extends MemoryLocation { |
| 76 | +} |
| 77 | + |
| 78 | +MemoryLocation getResultMemoryLocation(Instruction instr); |
| 79 | + |
| 80 | +MemoryLocation getOperandMemoryLocation(MemoryOperand operand); |
| 81 | + |
| 82 | +Overlap getOverlap(MemoryLocation def, MemoryLocation use); |
| 83 | +``` |
| 84 | + |
| 85 | +A `MemoryLocation` represents the set of bits of memory read by a memory operand or written by a |
| 86 | +memory result. The `getResultMemoryLocation` predicate returns the `MemoryLocation` written by the |
| 87 | +result of the specified `Instruction`, and the `getOperandMemoryLocation` predicate returns the |
| 88 | +`MemoryLocation` read by the specified `MemoryOperand`. From the point of view of the SSA |
| 89 | +construction module, which consumes the memory model, `MemoryLocation` is essentially opaque. The |
| 90 | +memory model can assign `MemoryLocation`s to memory accesses however it wants, as long as the few |
| 91 | +basic constraints outlined later in this section are respected. |
| 92 | + |
| 93 | +The `getOverlap` predicate returns the overlap relationship between a definition of location `def` |
| 94 | +and a use of location `use`. The possible overlap relationships are as follows: |
| 95 | +- `MustExactlyOverlap` - The set of bits written by the definition is identical to the set of bits |
| 96 | +read by the use, *and* the data types of the definition and the use are the same. |
| 97 | +- `MustTotallyOverlap` - Either the set of bits written by the definition is a proper superset of |
| 98 | +the bits read by the use, or the set of bits written by the definition is identical to that of the |
| 99 | +use, but the data type of the definition differs from that of the use. |
| 100 | +- `MayPartiallyOverlap` - Neither of the two relationships above applies, but there may be at least |
| 101 | +one bit written by the definition that is read by the use. `MayPartiallyOverlap` is always a sound |
| 102 | +result, because it is technically correct even if the actual overlap at runtime is exact, total, or |
| 103 | +even no overlap at all. |
| 104 | +- (No result) - The definition does not overlap the use at all. |
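The four cases above can be made concrete with a sketch that classifies overlap from bit ranges and types. The `Loc` record and its fields are hypothetical; a real `MemoryLocation` is opaque to SSA construction, and a memory model is free to compute overlap differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Overlap(Enum):
    MUST_EXACTLY = "MustExactlyOverlap"
    MUST_TOTALLY = "MustTotallyOverlap"
    MAY_PARTIALLY = "MayPartiallyOverlap"


@dataclass(frozen=True)
class Loc:
    """Hypothetical location: a typed bit range; start_bit=None means unknown."""
    start_bit: Optional[int]
    size_bits: int
    type_name: str


def get_overlap(d: Loc, u: Loc) -> Optional[Overlap]:
    """Classify how definition `d` overlaps use `u`; None means no overlap."""
    # With an unknown offset nothing stronger can be proven, so fall back to
    # the result that is always sound.
    if d.start_bit is None or u.start_bit is None:
        return Overlap.MAY_PARTIALLY
    d_end = d.start_bit + d.size_bits
    u_end = u.start_bit + u.size_bits
    if d_end <= u.start_bit or u_end <= d.start_bit:
        return None  # disjoint ranges: "(No result)"
    if (d.start_bit, d_end) == (u.start_bit, u_end):
        # Identical bits: exact only if the types also match.
        if d.type_name == u.type_name:
            return Overlap.MUST_EXACTLY
        return Overlap.MUST_TOTALLY
    if d.start_bit <= u.start_bit and u_end <= d_end:
        return Overlap.MUST_TOTALLY  # definition is a proper superset of the use
    return Overlap.MAY_PARTIALLY
```

Note that the relationship is directional: a definition of a whole struct totally overlaps a use of one field, but a definition of one field only partially overlaps a use of the whole struct.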
| 105 | + |
| 106 | +Each `MemoryLocation` is associated with exactly one `VirtualVariable`. A `VirtualVariable` |
| 107 | +represents a set of `MemoryLocation`s such that any two `MemoryLocation`s that overlap have the same |
| 108 | +`VirtualVariable`. Note that each `VirtualVariable` is itself a `MemoryLocation` that totally |
| 109 | +overlaps each of its member `MemoryLocation`s. `VirtualVariable`s are used in SSA construction to |
| 110 | +separate the problem of matching uses and definitions by partitioning memory locations into groups |
| 111 | +that do not overlap with one another. |
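One way to picture the partitioning constraint is as connected components of the overlap relation. The sketch below uses a union-find structure for this; it is an illustration of the constraint only, and the function name and string-based locations are assumptions, not a description of how any actual memory model assigns `VirtualVariable`s.

```python
from typing import Dict, Iterable, List, Tuple


def partition_locations(
    locations: List[str],
    overlapping_pairs: Iterable[Tuple[str, str]],
) -> Dict[str, str]:
    """Group locations so that any two overlapping locations share a group.

    Returns a map from each location to a representative of its group, which
    plays the role of the virtual variable in this hypothetical model.
    """
    parent = {loc: loc for loc in locations}

    def find(x: str) -> str:
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlapping_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    return {loc: find(loc) for loc in locations}
```

Because groups never overlap one another, uses and definitions can then be matched within each group independently.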
| 112 | + |
| 113 | +### Current Implementation |
| 114 | +#### Unaliased SSA |
| 115 | +The current memory model used to construct Unaliased SSA models only variables that are unescaped, |
| 116 | +and always accessed in their entirety via their declared type. There is one `MemoryLocation` for |
| 117 | +each unescaped `IRVariable`, and each `MemoryLocation` is its own `VirtualVariable`. The overlap |
| 118 | +relationship is simple: Each `MemoryLocation` exactly overlaps itself, and does not overlap any |
| 119 | +other `MemoryLocation`. |
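A minimal sketch of this model, assuming variables are identified by name and that the sets of escaped and partially accessed variables have already been computed (both parameter names are hypothetical):

```python
from typing import Optional, Set


def unaliased_locations(
    variables: Set[str],
    escaped: Set[str],
    partially_accessed: Set[str],
) -> Set[str]:
    """Variables that get a memory location in this simplified model.

    `partially_accessed` stands for variables with any partial access or any
    access that does not use the declared type.
    """
    return {v for v in variables if v not in escaped and v not in partially_accessed}


def unaliased_overlap(def_loc: str, use_loc: str) -> Optional[str]:
    """Each location exactly overlaps itself and nothing else."""
    return "MustExactlyOverlap" if def_loc == use_loc else None


def unaliased_virtual_variable(loc: str) -> str:
    """Each location is its own virtual variable."""
    return loc
```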
| 120 | + |
| 121 | +#### Aliased SSA |
| 122 | +The current memory model used to construct Aliased SSA models every memory access. There are two |
| 123 | +kinds of `MemoryLocation`: |
| 124 | +- `VariableMemoryLocation` represents an access to a known `IRVariable` with a specific type, at a bit |
| 125 | +offset that may or may not be a known constant. `VariableMemoryLocation` represents any access to a |
| 126 | +known `IRVariable` even if that variable's address escapes. |
| 127 | +- `UnknownMemoryLocation` represents an access where the memory being accessed is not known to be part |
| 128 | +of a single specific `IRVariable`. |
| 129 | + |
| 130 | +In addition, there are two different kinds of `VirtualVariable`: |
| 131 | +- `VariableVirtualVariable` represents an `IRVariable` whose address does not escape. The |
| 132 | +`VariableVirtualVariable` is just the `VariableMemoryLocation` that represents an access to the entire |
| 133 | +`IRVariable` with its declared type. |
| 134 | +- `UnknownVirtualVariable` represents all memory that is not covered by a `VariableVirtualVariable`. |
| 135 | +This includes the `UnknownMemoryLocation`, as well as any `VariableMemoryLocation` whose |
| 136 | +`IRVariable`'s address escapes. |
| 137 | + |
| 138 | +The overlap relationship for this model is slightly more complex than that of Unaliased SSA. A |
| 139 | +definition of a `VariableMemoryLocation` overlaps a use of another `VariableMemoryLocation` if both |
| 140 | +locations have the same `IRVariable` and the offset ranges overlap. The overlap kind is determined |
| 141 | +based on the overlap of the offset ranges, and may be any of the three overlap kinds, or no overlap |
| 142 | +at all if the offset ranges are disjoint. A definition of a `VariableMemoryLocation` overlaps a use |
| 143 | +of the `UnknownMemoryLocation` (or vice versa) if and only if the address of the `IRVariable` |
| 144 | +escapes; this is a `MayPartiallyOverlap` relationship. |
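The rules above can be sketched as follows. The `VariableLoc` and `UnknownLoc` records and the string results are hypothetical stand-ins for the real classes. The text does not state what a definition of the unknown location yields for a use of the unknown location; the sketch assumes `MayPartiallyOverlap` there, since that result is always sound.

```python
from dataclasses import dataclass
from typing import Optional, Set, Union


@dataclass(frozen=True)
class VariableLoc:
    """Hypothetical typed access to a known variable; start_bit=None is unknown."""
    var: str
    start_bit: Optional[int]
    size_bits: int
    type_name: str


@dataclass(frozen=True)
class UnknownLoc:
    """Hypothetical access not known to lie within a single variable."""


Loc = Union[VariableLoc, UnknownLoc]


def virtual_variable(loc: Loc, escaped: Set[str]) -> str:
    # Unescaped variables get their own virtual variable; everything else
    # shares the single unknown virtual variable.
    if isinstance(loc, VariableLoc) and loc.var not in escaped:
        return "variable:" + loc.var
    return "unknown"


def _range_overlap(d: VariableLoc, u: VariableLoc) -> Optional[str]:
    if d.start_bit is None or u.start_bit is None:
        return "MayPartiallyOverlap"
    d_end, u_end = d.start_bit + d.size_bits, u.start_bit + u.size_bits
    if d_end <= u.start_bit or u_end <= d.start_bit:
        return None  # disjoint offset ranges
    if (d.start_bit, d_end) == (u.start_bit, u_end):
        return "MustExactlyOverlap" if d.type_name == u.type_name else "MustTotallyOverlap"
    if d.start_bit <= u.start_bit and u_end <= d_end:
        return "MustTotallyOverlap"
    return "MayPartiallyOverlap"


def aliased_overlap(d: Loc, u: Loc, escaped: Set[str]) -> Optional[str]:
    if isinstance(d, VariableLoc) and isinstance(u, VariableLoc):
        return _range_overlap(d, u) if d.var == u.var else None
    if isinstance(d, UnknownLoc) and isinstance(u, UnknownLoc):
        return "MayPartiallyOverlap"  # assumption: not specified in the text
    known = d if isinstance(d, VariableLoc) else u
    return "MayPartiallyOverlap" if known.var in escaped else None
```

In this model an escaped variable and the unknown location share the `"unknown"` virtual variable, which is consistent with the requirement that overlapping locations belong to the same `VirtualVariable`.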