Optimization is the process of improving the code generated by the compiler, usually with regard to execution speed but sometimes with respect to code size.
Simple examples: constant folding, eliminating useless expressions (multiply by 1, add 0)
Two considerations:
safety: An optimization should produce code that has the same results as the unoptimized code
profitability: An optimization should be implemented in a compiler only if it will actually improve the generated code.
An optimization that is difficult to perform and yields small improvements is probably not worth implementing
Note that some optimizations that work well in the average case may, under some circumstances, actually produce code with inferior performance (example: heavy use of registers may slow down procedure calls b/c of requirement to save/restore registers).
In decreasing order of "payoff":
Register Allocation: good use of registers is "the most important feature of efficient code." A count of the number of times each variable is accessed in a procedure is made; variable usage inside loops is weighted *10. The variables with the highest usage are assigned to registers, instead of memory, if possible.
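The usage-count heuristic can be sketched as follows. This is a minimal illustration with hypothetical types (real allocators use much more sophisticated techniques, such as graph coloring); an access inside a loop counts 10, any other access counts 1:

```c
/* Hypothetical record of one variable access observed in a procedure. */
struct access { int var; int in_loop; };

/* Score a variable: +1 per access, +10 if the access is inside a loop.
 * The highest-scoring variables become register candidates. */
int usage_score(const struct access *accesses, int n, int var) {
    int score = 0;
    for (int i = 0; i < n; i++)
        if (accesses[i].var == var)
            score += accesses[i].in_loop ? 10 : 1;
    return score;
}
```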
Elimination of unnecessary operations, including:
Dead code elimination:
Don't generate unreachable code:
    const boolean debug = false;
    if (debug) { ... }  // this can be skipped in code generation
Common subexpression elimination: An expression which appears multiple times in source code can be evaluated once and the result saved for use later (unless one of the variables in the subexpression is changed)
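Written out by hand (hypothetical functions), the transformation looks like this; the compiler performs the same rewrite internally:

```c
/* Before CSE: b*c is evaluated twice. */
int before_cse(int a, int b, int c) {
    int x = a + b * c;
    int y = b * c - a;     /* b*c recomputed */
    return x + y;
}

/* After CSE: b*c is computed once and reused, which is safe because
 * neither b nor c changes between the two uses. */
int after_cse(int a, int b, int c) {
    int t = b * c;         /* common subexpression saved in a temporary */
    int x = a + t;
    int y = t - a;
    return x + y;
}
```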
Peephole optimization: In the target code, remove unnecessary jumps (jumps to the following instruction), loads from the same memory location that was just stored, etc.
Loop invariant code motion:
Identifies code inside a loop that computes values which do not change during loop execution, and moves this code outside the loop.
Example (from Wikipedia):
    for (i = 0; i < n; ++i) {
        x = y + z;
        a[i] = 6 * i + x * x;
    }
can be reorganized as follows:
    x = y + z;
    t1 = x * x;
    for (i = 0; i < n; ++i) {
        a[i] = 6 * i + t1;
    }
Replacement of costly operations with less expensive ones:
Strength Reduction: Replace expensive computations with less expensive ones. For example, 2*x is reduced to x << 1 (a shift is faster than a multiply), and x^2 is replaced by x*x.
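A hand-written sketch of these two reductions (assuming x is non-negative, since left-shifting a negative int is undefined behavior in C):

```c
int double_mul(int x)   { return 2 * x; }
int double_shift(int x) { return x << 1; }  /* strength-reduced: shift instead of multiply */
int square(int x)       { return x * x; }   /* x*x instead of a pow(x, 2) call */
```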
Constant Folding: Computation of constant expressions, or expressions in which a variable has a known value, at compile time
Constant Propagation: Determining whether a variable might have a constant value for all or part of a program, and using that knowledge to replace the variable with the constant value in some or all program calculations
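A hand-worked illustration of the two techniques together (hypothetical example): the compiler propagates the known value of n into the expression, then folds the resulting constant arithmetic at compile time:

```c
/* As written: n has a known constant value, so the compiler can
 * propagate n = 10 into the expression and fold 10 * 60 * 60. */
int seconds_as_written(void) {
    int n = 10;             /* known constant */
    return n * 60 * 60;     /* foldable once n is propagated */
}

/* What the optimized code effectively computes. */
int seconds_folded(void) {
    return 36000;           /* 10 * 60 * 60, folded at compile time */
}
```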
Procedure Inlining: Replace a procedure call with the instructions in the procedure body (C++ provides direct hints to the compiler w/ inline keyword)
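A before/after sketch of inlining (hypothetical functions):

```c
static int square_fn(int x) { return x * x; }

/* Before inlining: the call incurs call/return overhead. */
int caller(int x) { return square_fn(x) + 1; }

/* After inlining: the body of square_fn replaces the call. */
int caller_inlined(int x) { return x * x + 1; }
```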
Tail Recursion Removal:
Replace a recursive call at the end of a procedure with a jump to the beginning:
    void countdown(int num) {
        if (num == 0) return;
        print num;
        countdown(num - 1);
    }
changed to
    void countdown(int num) {
    begincountdown:
        if (num == 0) return;
        print num;
        num = num - 1;
        goto begincountdown;
    }
Instruction Selection: Use special instructions on the target machine (block memory moves, indexed memory access, etc.) in place of a sequence of simpler operations. Especially common with compilers for CISC architectures.
Loop Unrolling: The compiler duplicates the body of the loop a few times, in exchange for fewer iterations. This can increase speed (fewer branches; more opportunities for pipelining), at the expense of code size.
Example:
    for (int i = 0; i < 10; i++) {
        printf("%d\n", i);
    }
might be unrolled to something like:
    for (int i = 0; i < 9; i += 3) {
        printf("%d\n", i);
        printf("%d\n", i + 1);
        printf("%d\n", i + 2);
    }
    printf("%d\n", 9);
An intermediate representation is a data structure that represents the program being translated during the translation process.
We've been using a parse tree as our IR.
Often, an IR uses machine-independent instructions, sometimes called three-address code. Each instruction in three-address code is called an atom and consists of an operation and up to three operands. During code generation, the compiler generates one or more instructions for the target architecture from each atom.
Example atom: (ADD, A, B, C)
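One possible (hypothetical) C representation of such an atom:

```c
/* An atom: an opcode plus up to three operands.
 * (ADD, A, B, C) means C := A + B. */
enum opcode { ADD, SUB, MUL, MOV, LABEL, JZ };

struct atom {
    enum opcode op;
    const char *src1, *src2, *dest;   /* unused operands left NULL */
};

/* The example atom (ADD, A, B, C). */
struct atom example = { ADD, "A", "B", "C" };
```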
After semantic analysis is complete, a source optimization step traverses the tree, produces a linear IR, and performs some global optimizations on it. Then, the code generator converts the linear IR to target code. Finally, a target optimization step sifts through the target code, performing local optimizations.
Optimizations can be classified as local or global:
Local optimizations apply to small sections of target code, within a single basic block (ex. removing unnecessary loads/stores; strength reduction)
Global optimizations apply across basic blocks
A basic block is a maximal straight-line segment of code: code sequences that contain no labels (targets for jumps) or jump/call instructions.
First instruction in a program/procedure begins a basic block
Each label (target of a jump) begins a new basic block
Each instruction that follows a jump begins a new basic block
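The three rules above can be sketched as a leader-finding pass over a simplified instruction stream (hypothetical encoding; a real compiler works over its actual instruction list):

```c
/* Simplified instruction kinds for a leader-finding sketch. */
enum kind { PLAIN, LABEL_I, JUMP };

/* Mark each instruction that begins a basic block ("leader"):
 *   rule 1: the first instruction,
 *   rule 2: every label,
 *   rule 3: every instruction that follows a jump. */
void find_leaders(const enum kind *code, int n, int *leader) {
    for (int i = 0; i < n; i++)
        leader[i] = 0;
    if (n > 0)
        leader[0] = 1;                       /* rule 1 */
    for (int i = 0; i < n; i++) {
        if (code[i] == LABEL_I)
            leader[i] = 1;                   /* rule 2 */
        if (code[i] == JUMP && i + 1 < n)
            leader[i + 1] = 1;               /* rule 3 */
    }
}
```

Each basic block then runs from a leader up to (but not including) the next leader.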
Example:
    x := in.readint()
    if x > 0 then
        fact := 1
        loop while not (x = 0)
            fact := fact * x
            x := x - 1
        end loop
        out.writeint(fact)
    end if
The compiler translates this into 3-address code:
    (IN, x)
    (GT, x, 0, t1)
    (JZ, t1, L1)
    (STORE, 1, fact)
    (LABEL, L2)
    (MUL, fact, x, t2)
    (MOV, t2, fact)
    (SUB, x, 1, t3)
    (MOV, t3, x)
    (EQ, x, 0, t4)
    (JZ, t4, L2)
    (OUT, fact)
    (LABEL, L1)
    (HALT)
Now, compute basic blocks
Block 1:
    (IN, x)
    (GT, x, 0, t1)
    (JZ, t1, L1)
Block 2:
    (STORE, 1, fact)
Block 3:
    (LABEL, L2)
    (MUL, fact, x, t2)
    (MOV, t2, fact)
    (SUB, x, 1, t3)
    (MOV, t3, x)
    (EQ, x, 0, t4)
    (JZ, t4, L2)
Block 4:
    (OUT, fact)
Block 5:
    (LABEL, L1)
    (HALT)
After a compiler carves your code up into basic blocks, it can use data flow analysis to determine whether it is safe to perform optimizations such as constant propagation and common subexpression elimination.
To perform data flow analysis, construct a control flow graph as follows:
The nodes of the graph are the basic blocks
The edges are determined by the jumps
Example: Construct control flow graph for example above
A definition is an instruction that alters the value of a variable. Examples:
Assignment to the variable
Subroutine call in which variable is passed by reference
For each basic block, we compute the reaching definitions for each variable. A reaching definition is a definition which can affect the variable's value at the beginning of the block.
Example: Compute reaching definitions for fact and x in the graph
See example pp. 233-5
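Reaching definitions are typically computed with an iterative fixed-point algorithm over the control flow graph. A minimal sketch, using a hypothetical bitset encoding (each definition is one bit; gen[b] holds the definitions created in block b, kill[b] the definitions of the same variables made elsewhere):

```c
#define NBLOCKS 5

/* Iterate the standard data flow equations until nothing changes:
 *   in[b]  = union of out[p] over all predecessors p of b
 *   out[b] = gen[b] | (in[b] & ~kill[b])                     */
void reaching_defs(int pred[NBLOCKS][NBLOCKS],
                   const unsigned gen[], const unsigned kill[],
                   unsigned in[], unsigned out[]) {
    for (int b = 0; b < NBLOCKS; b++) { in[b] = 0; out[b] = gen[b]; }
    int changed = 1;
    while (changed) {                         /* iterate to a fixed point */
        changed = 0;
        for (int b = 0; b < NBLOCKS; b++) {
            unsigned new_in = 0;
            for (int p = 0; p < NBLOCKS; p++)
                if (pred[b][p])               /* p is a predecessor of b */
                    new_in |= out[p];
            unsigned new_out = gen[b] | (new_in & ~kill[b]);
            if (new_in != in[b] || new_out != out[b]) {
                in[b] = new_in;
                out[b] = new_out;
                changed = 1;
            }
        }
    }
}
```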
One approach to implementing constant folding is to write a tree traversal class (like SemanticChecker or CodeGen) that walks the tree and replaces nodes whose value can be computed with a node containing the literal value.
Another approach is to attempt to compute a value for each expression node during semantic analysis, in addition to a type. During code generation, the generator checks whether a value was successfully computed for the node; if so, it uses that value directly rather than generating code to evaluate the expression.
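Either way, the core folding step is the same. A minimal sketch with a hypothetical expression-node type: a bottom-up pass computes a value for each operator node whose operands both have known values:

```c
/* Hypothetical expression node: has_value is set when the node's value
 * could be computed at compile time. */
struct node {
    char op;                     /* '+', '*', or 0 for a literal */
    struct node *left, *right;
    int has_value;               /* 1 if value is known at compile time */
    int value;
};

/* Bottom-up folding pass: try to compute a value for each node. */
void fold(struct node *n) {
    if (n->op == 0) return;                  /* literal: value already set */
    fold(n->left);
    fold(n->right);
    if (n->left->has_value && n->right->has_value) {
        n->has_value = 1;
        n->value = (n->op == '+')
                 ? n->left->value + n->right->value
                 : n->left->value * n->right->value;
    }
}
```

A code generator would then check has_value before emitting code for a node, and emit the literal instead when it is set.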