
Commit 82a6629

Merge pull request github#1016 from dave-bartolomeo/dave/PreciseDefs
C++: SSA flow through fields and imprecise defs
2 parents e68dda8 + 7071692 commit 82a6629

23 files changed

+4085
-2463
lines changed

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
1+
# IR SSA Construction
2+
3+
This document describes how Static Single Assignment (SSA) form is constructed for the Intermediate
4+
Representation (IR). The SSA form that we use is based on the traditional [SSA](https://en.wikipedia.org/wiki/Static_single_assignment_form)
5+
commonly used in compilers, with additional extensions to support accesses to aliased memory
6+
inspired by [ChowCLLS96](https://link.springer.com/chapter/10.1007%2F3-540-61053-7_66).
7+
8+
SSA construction takes as input an instance of the IR, and creates a new instance of the IR that is
9+
in SSA form. If the input IR is already in SSA form, SSA construction will still recompute SSA form
10+
from scratch. However, the input SSA information will be taken into account to improve the alias
11+
analysis that guides the new SSA computation. The current implementation creates three successive
12+
instances of the IR:
13+
- *Raw IR* is constructed directly from the original AST. Raw IR does not have any of its memory
14+
accesses in SSA form.
15+
- *Unaliased SSA IR* is constructed from Raw IR. It places memory accesses in SSA form only for
16+
accesses to unescaped local variables that are loaded or stored in their entirety, and as their
17+
declared type. Accesses to aliased memory are not modeled, nor are accesses to variables that have
18+
any partial reads or writes.
19+
- *Aliased SSA IR* is constructed from Unaliased SSA IR. All memory accesses are placed in SSA form,
20+
including accesses to aliased memory.
21+
22+
Constructing SSA form involves three steps in succession: Alias analysis, the memory model, and
23+
the actual SSA construction itself. Each step is a module that is parameterized on an implementation
24+
of the previous step, so the memory model and alias analysis modules can be replaced in order to
25+
provide different analysis heuristics or performance/precision tradeoffs.
26+
27+
## Alias Analysis
28+
The alias analysis component is responsible for determining two closely related sets of facts about
29+
the input IR: What memory is being accessed by each memory operand or memory result, and which
30+
variables "escape" such that the analysis can no longer precisely track all accesses to those
31+
variables. This information is consumed by the memory model component, but is not consumed directly
32+
by the actual SSA construction.
33+
34+
The current alias analysis exposes two predicates:
35+
36+
```
37+
predicate resultPointsTo(Instruction instr, IRVariable var, IntValue bitOffset);
38+
39+
predicate variableAddressEscapes(IRVariable var);
40+
```
41+
42+
The `resultPointsTo` predicate computes, for each `Instruction`, the `IRVariable` that is pointed
43+
into by the result of that `Instruction`, and the bit offset that the result of the `Instruction`
44+
points to within that variable. If it cannot prove that the result points into exactly one
45+
`IRVariable`, then the predicate does not hold. If the result is known to point into a specific
46+
`IRVariable`, but the offset is unknown, then the predicate will hold, but the `bitOffset` parameter
47+
will be `Ints::unknown()`. This is useful for cases including array accesses, where the array index
48+
may be computed at runtime, but it is known that the result points to some element in the array, rather than to some
49+
arbitrary unknown location.
50+
51+
The `variableAddressEscapes` predicate computes the set of `IRVariable`s whose address "escapes". A
52+
variable's address escapes if there is a possibility that there exists a memory access somewhere in
53+
the program that accesses the variable, without that access being modeled by the `resultPointsTo`
54+
predicate. Common reasons for a variable's address escaping include:
55+
- The address is assigned into a global variable, heap memory, or some other location where code may
56+
be able to later dereference the address outside the scope of the `resultPointsTo` analysis.
57+
- The address is passed as an argument to a function, unless the called function is known not to
58+
retain that address after it returns.
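
As a rough illustration of the two escape reasons listed above, the following Python sketch computes an escaped set over a toy list of address uses. The tuple encoding, the `"store"` and `"call_arg"` kinds, and the `non_retaining` set are assumptions made for this example only; they are not part of the IR or of the actual `variableAddressEscapes` implementation.

```python
def escaped_variables(address_uses, non_retaining=frozenset()):
    """Return the variables whose address escapes, in a toy model.

    address_uses: iterable of (variable, kind, target) tuples (hypothetical encoding).
    non_retaining: names of functions assumed not to keep the address after returning.
    """
    escaped = set()
    for var, kind, target in address_uses:
        if kind == "store" and target in ("global", "heap"):
            # The address is written somewhere the analysis cannot follow.
            escaped.add(var)
        elif kind == "call_arg" and target not in non_retaining:
            # The callee might retain the address after it returns.
            escaped.add(var)
        # Any other use (for example a store to a tracked local) is
        # assumed to stay within reach of the points-to analysis.
    return escaped


uses = [
    ("a", "store", "global"),
    ("b", "call_arg", "memset"),
    ("c", "call_arg", "register_callback"),
    ("d", "store", "local"),
]
print(sorted(escaped_variables(uses, non_retaining={"memset"})))  # ['a', 'c']
```

In this model, `b` does not escape only because `memset` was declared non-retaining; dropping it from `non_retaining` would add `b` to the result.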
59+
60+
### Current Implementation
61+
The current alias analysis implementation can track the pointed-to variable and offset through
62+
copies, pointer arithmetic, and field offset computations. If the input IR is already in SSA form,
63+
even an address assigned to a local variable can be tracked.
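
The tracking described above can be sketched in Python over a toy instruction list. This is a minimal model, not the real analysis: the `(result, opcode, *args)` tuples and the opcode names `address_of`, `copy`, and `add_bits` are hypothetical, instructions are assumed to appear in definition-before-use order, and `None` stands in for `Ints::unknown()`.

```python
def result_points_to(instructions):
    """Map each result name to (variable, bit_offset); bit_offset None means unknown.

    Results that cannot be tied to exactly one variable get no entry at all,
    mirroring a predicate that simply does not hold.
    """
    facts = {}
    for result, opcode, *args in instructions:
        if opcode == "address_of":
            facts[result] = (args[0], 0)
        elif opcode == "copy" and args[0] in facts:
            facts[result] = facts[args[0]]
        elif opcode == "add_bits" and args[0] in facts:
            # Models both pointer arithmetic and field offset computations.
            var, offset = facts[args[0]]
            delta = args[1]
            if offset is None or delta is None:
                facts[result] = (var, None)  # same variable, unknown offset
            else:
                facts[result] = (var, offset + delta)
    return facts


toy_ir = [
    ("p", "address_of", "arr"),
    ("q", "copy", "p"),
    ("r", "add_bits", "q", 64),    # constant offset, e.g. a field
    ("s", "add_bits", "r", None),  # runtime index into the array
    ("t", "load", "s"),            # not an address computation
]
facts = result_points_to(toy_ir)
print(facts["r"], facts["s"], "t" in facts)
```

Once an offset becomes unknown it stays unknown through later arithmetic, but the pointed-to variable is still recorded.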
64+
65+
## Memory Model
66+
The memory model uses the results of alias analysis to describe the memory location accessed by each
67+
memory operand or memory result in the function. It exposes two classes and three non-member
68+
predicates:
69+
70+
```
71+
class MemoryLocation {
72+
VirtualVariable getVirtualVariable();
73+
}
74+
75+
class VirtualVariable extends MemoryLocation {
76+
}
77+
78+
MemoryLocation getResultMemoryLocation(Instruction instr);
79+
80+
MemoryLocation getOperandMemoryLocation(MemoryOperand operand);
81+
82+
Overlap getOverlap(MemoryLocation def, MemoryLocation use);
83+
```
84+
85+
A `MemoryLocation` represents the set of bits of memory read by a memory operand or written by a
86+
memory result. The `getResultMemoryLocation` predicate returns the `MemoryLocation` written by the
87+
result of the specified `Instruction`, and the `getOperandMemoryLocation` predicate returns the
88+
`MemoryLocation` read by the specified `MemoryOperand`. From the point of view of the SSA
89+
construction module, which consumes the memory model, `MemoryLocation` is essentially opaque. The
90+
memory model can assign `MemoryLocation`s to memory accesses however it wants, as long as the few
91+
basic constraints outlined later in this section are respected.
92+
93+
The `getOverlap` predicate returns the overlap relationship between a definition of location `def`
94+
and a use of location `use`. The possible overlap relationships are as follows:
95+
- `MustExactlyOverlap` - The set of bits written by the definition is identical to the set of bits
96+
read by the use, *and* the data types of the definition and the use are the same.
97+
- `MustTotallyOverlap` - Either the set of bits written by the definition is a proper superset of
98+
the bits read by the use, or the set of bits written by the definition is identical to that of the
99+
use, but the data type of the definition differs from that of the use.
100+
- `MayPartiallyOverlap` - Neither of the two relationships above applies, but there may be at least
101+
one bit written by the definition that is read by the use. `MayPartiallyOverlap` is always a sound
102+
result, because it is technically correct even if the actual overlap at runtime is exact, total, or
103+
even no overlap at all.
104+
- (No result) - The definition does not overlap the use at all.
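
The four cases above can be sketched as a small Python classifier. This is only a model under simplifying assumptions: each access is a half-open bit range with a known constant offset plus a type name, and the `Access` class and `get_overlap` function are names invented for this example rather than the QL `getOverlap` predicate itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Access:
    start_bit: int  # inclusive
    end_bit: int    # exclusive
    type_name: str


def get_overlap(definition: Access, use: Access) -> Optional[str]:
    """Classify how a definition overlaps a use; None means no overlap."""
    if definition.end_bit <= use.start_bit or use.end_bit <= definition.start_bit:
        return None
    same_bits = (definition.start_bit, definition.end_bit) == (use.start_bit, use.end_bit)
    if same_bits and definition.type_name == use.type_name:
        return "MustExactlyOverlap"
    if definition.start_bit <= use.start_bit and use.end_bit <= definition.end_bit:
        # Proper superset of the bits, or identical bits with a different type.
        return "MustTotallyOverlap"
    return "MayPartiallyOverlap"


print(get_overlap(Access(0, 32, "int"), Access(0, 32, "int")))    # MustExactlyOverlap
print(get_overlap(Access(0, 32, "int"), Access(0, 32, "float")))  # MustTotallyOverlap
print(get_overlap(Access(0, 64, "S"), Access(32, 64, "int")))     # MustTotallyOverlap
print(get_overlap(Access(0, 32, "int"), Access(16, 48, "int")))   # MayPartiallyOverlap
print(get_overlap(Access(0, 32, "int"), Access(32, 64, "int")))   # None
```

Note that the relationship is directional: a definition that writes only part of what the use reads is classified as `MayPartiallyOverlap`, not `MustTotallyOverlap`.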
105+
106+
Each `MemoryLocation` is associated with exactly one `VirtualVariable`. A `VirtualVariable`
107+
represents a set of `MemoryLocation`s such that any two `MemoryLocation`s that overlap have the same
108+
`VirtualVariable`. Note that each `VirtualVariable` is itself a `MemoryLocation` that totally
109+
overlaps each of its member `MemoryLocation`s. `VirtualVariable`s are used in SSA construction to
110+
separate the problem of matching uses and definitions by partitioning memory locations into groups
111+
that do not overlap with one another.
112+
113+
### Current Implementation
114+
#### Unaliased SSA
115+
The current memory model used to construct Unaliased SSA models only variables that are unescaped,
116+
and always accessed in their entirety via their declared type. There is one `MemoryLocation` for
117+
each unescaped `IRVariable`, and each `MemoryLocation` is its own `VirtualVariable`. The overlap
118+
relationship is simple: Each `MemoryLocation` exactly overlaps itself, and does not overlap any
119+
other `MemoryLocation`.
120+
121+
#### Aliased SSA
122+
The current memory model used to construct Aliased SSA models every memory access. There are two
123+
kinds of `MemoryLocation`:
124+
- `VariableMemoryLocation` represents an access to a known `IRVariable` with a specific type, at a bit
125+
offset that may or may not be a known constant. `VariableMemoryLocation` represents any access to a
126+
known `IRVariable` even if that variable's address escapes.
127+
- `UnknownMemoryLocation` represents an access where the memory being accessed is not known to be part
128+
of a single specific `IRVariable`.
129+
130+
In addition, there are two different kinds of `VirtualVariable`:
131+
- `VariableVirtualVariable` represents an `IRVariable` whose address does not escape. The
132+
`VariableVirtualVariable` is just the `VariableMemoryLocation` that represents an access to the entire
133+
`IRVariable` with its declared type.
134+
- `UnknownVirtualVariable` represents all memory that is not covered by a `VariableVirtualVariable`.
135+
This includes the `UnknownMemoryLocation`, as well as any `VariableMemoryLocation` whose
136+
`IRVariable`'s address escapes.
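
A minimal Python sketch of how locations might be grouped under these two kinds of `VirtualVariable` follows. The string labels, the `(variable, description)` tuples, and the use of `None` for the `UnknownMemoryLocation` are assumptions of the example, not the actual QL representation.

```python
UNKNOWN_VIRTUAL_VARIABLE = "UnknownVirtualVariable"


def get_virtual_variable(variable, escaped):
    """Pick the virtual variable for a location on `variable`.

    variable: an IRVariable name, or None for the UnknownMemoryLocation.
    escaped: set of variable names whose address escapes.
    """
    if variable is None or variable in escaped:
        return UNKNOWN_VIRTUAL_VARIABLE
    return f"VariableVirtualVariable({variable})"


def partition(locations, escaped):
    """Group (variable, description) locations by their virtual variable."""
    groups = {}
    for location in locations:
        key = get_virtual_variable(location[0], escaped)
        groups.setdefault(key, []).append(location)
    return groups


locations = [
    ("x", "whole variable"),
    ("x", "bits 0..32"),
    ("y", "whole variable"),
    (None, "unknown memory"),
]
groups = partition(locations, escaped={"y"})
for virtual_variable, members in groups.items():
    print(virtual_variable, members)
```

Here the escaped variable `y` shares a group with unknown memory, so a write through an arbitrary pointer and a read of `y` are matched within the same partition, while accesses to `x` are handled separately.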
137+
138+
The overlap relationship for this model is slightly more complex than that of Unaliased SSA. A
139+
definition of a `VariableMemoryLocation` overlaps a use of another `VariableMemoryLocation` if both
140+
locations have the same `IRVariable` and the offset ranges overlap. The overlap kind is determined
141+
based on the overlap of the offset ranges, and may be any of the three overlap kinds, or no overlap
142+
at all if the offset ranges are disjoint. A definition of a `VariableMemoryLocation` overlaps a use
143+
of the `UnknownMemoryLocation` (or vice versa) if and only if the address of the `IRVariable`
144+
escapes; this is a `MayPartiallyOverlap` relationship.
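
The rules in this paragraph can be modeled with the Python sketch below. Two cases are assumptions of the sketch rather than statements from the text: a definition and use that are both the unknown location, and accesses to the same variable where an offset is unknown. Both are given `MayPartiallyOverlap`, which is always a sound answer. The `VariableLoc` type and `UNKNOWN_LOC` sentinel are likewise invented for the example.

```python
from typing import NamedTuple, Optional


class VariableLoc(NamedTuple):
    var: str
    start_bit: Optional[int]  # None models an unknown offset
    size_bits: int
    type_name: str


UNKNOWN_LOC = object()  # stands in for the UnknownMemoryLocation


def aliased_overlap(definition, use, escaped):
    """Return the overlap kind of a definition and a use, or None for no overlap."""
    def_unknown = definition is UNKNOWN_LOC
    use_unknown = use is UNKNOWN_LOC
    if def_unknown and use_unknown:
        return "MayPartiallyOverlap"  # assumption: conservative answer
    if def_unknown or use_unknown:
        known = use if def_unknown else definition
        return "MayPartiallyOverlap" if known.var in escaped else None
    if definition.var != use.var:
        return None
    if definition.start_bit is None or use.start_bit is None:
        return "MayPartiallyOverlap"  # assumption: unknown offset, same variable
    def_end = definition.start_bit + definition.size_bits
    use_end = use.start_bit + use.size_bits
    if def_end <= use.start_bit or use_end <= definition.start_bit:
        return None  # disjoint offset ranges
    same_bits = (definition.start_bit, def_end) == (use.start_bit, use_end)
    if same_bits and definition.type_name == use.type_name:
        return "MustExactlyOverlap"
    if definition.start_bit <= use.start_bit and use_end <= def_end:
        return "MustTotallyOverlap"
    return "MayPartiallyOverlap"


whole_s = VariableLoc("s", 0, 64, "S")
field_b = VariableLoc("s", 32, 32, "int")
other = VariableLoc("t", 0, 32, "int")
print(aliased_overlap(whole_s, field_b, escaped=set()))      # MustTotallyOverlap
print(aliased_overlap(whole_s, other, escaped=set()))        # None
print(aliased_overlap(UNKNOWN_LOC, other, escaped={"t"}))    # MayPartiallyOverlap
print(aliased_overlap(UNKNOWN_LOC, whole_s, escaped={"t"}))  # None
```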

cpp/ql/src/semmle/code/cpp/ir/implementation/aliased_ssa/Instruction.qll

Lines changed: 12 additions & 6 deletions
@@ -50,11 +50,17 @@ module InstructionSanity {
   /**
    * Holds if instruction `instr` is missing an expected operand with tag `tag`.
    */
-  query predicate missingOperand(Instruction instr, OperandTag tag) {
-    expectsOperand(instr, tag) and
-    not exists(NonPhiOperand operand |
-      operand = instr.getAnOperand() and
-      operand.getOperandTag() = tag
+  query predicate missingOperand(Instruction instr, string message, IRFunction func, string funcText) {
+    exists(OperandTag tag |
+      expectsOperand(instr, tag) and
+      not exists(NonPhiOperand operand |
+        operand = instr.getAnOperand() and
+        operand.getOperandTag() = tag
+      ) and
+      message = "Instruction '" + instr.getOpcode().toString() + "' is missing an expected operand with tag '" +
+        tag.toString() + "' in function '$@'." and
+      func = instr.getEnclosingIRFunction() and
+      funcText = getIdentityString(func.getFunction())
     )
   }
 
@@ -302,7 +308,7 @@ class Instruction extends Construction::TInstruction {
     result = type
   }
 
-  private string getResultTypeString() {
+  string getResultTypeString() {
     exists(string valcat |
       valcat = getValueCategoryString(getResultType().toString()) and
       if (getResultType() instanceof UnknownType and
