Optimization is the process of improving the code generated by the compiler, usually with regard to execution speed but sometimes with respect to code size.
Simple examples: constant folding, eliminating useless expressions (multiply by 1, add 0)
Two considerations:
safety: An optimization should produce code that has the same results as the unoptimized code
profitability: An optimization should be implemented in a compiler only if it will actually improve the generated code.
An optimization that is difficult to perform and yields small improvements is probably not worth implementing
Note that some optimizations that work well in the average case may, under some circumstances, actually produce code with inferior performance (example: heavy use of registers may slow down procedure calls b/c of requirement to save/restore registers).
In decreasing order of "payoff":
Register Allocation: good use of registers is "the most important feature of efficient code." A count of the number of times each variable is accessed in a procedure is made; variable usage inside loops is weighted *10. The variables with the highest usage are assigned to registers, instead of memory, if possible.
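The usage-count heuristic can be sketched as follows. This is a minimal illustration with hypothetical types (real allocators use much more sophisticated techniques, such as graph coloring); an access inside a loop counts 10, any other access counts 1:

```c
/* Hypothetical record of one variable access observed in a procedure. */
struct access { int var; int in_loop; };

/* Score a variable: +1 per access, +10 if the access is inside a loop.
 * The highest-scoring variables become register candidates. */
int usage_score(const struct access *accesses, int n, int var) {
    int score = 0;
    for (int i = 0; i < n; i++)
        if (accesses[i].var == var)
            score += accesses[i].in_loop ? 10 : 1;
    return score;
}
```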
Elimination of unnecessary operations, including:
Dead code elimination:
Don't generate unreachable code:
    const boolean debug = false;
    if (debug) { ... }  // this can be skipped in code generation
Common subexpression elimination: An expression which appears multiple times in source code can be evaluated once and the result saved for use later (unless one of the variables in the subexpression is changed)
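Written out by hand (hypothetical functions), the transformation looks like this; the compiler performs the same rewrite internally:

```c
/* Before CSE: b*c is evaluated twice. */
int before_cse(int a, int b, int c) {
    int x = a + b * c;
    int y = b * c - a;     /* b*c recomputed */
    return x + y;
}

/* After CSE: b*c is computed once and reused, which is safe because
 * neither b nor c changes between the two uses. */
int after_cse(int a, int b, int c) {
    int t = b * c;         /* common subexpression saved in a temporary */
    int x = a + t;
    int y = t - a;
    return x + y;
}
```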
Peephole optimization: In the target code, remove unnecessary jumps (jumps to the following instruction), loads from the same memory location that was just stored, etc.
Loop invariant code motion:
Identifies code inside a loop that computes values which do not change during loop execution, and moves this code outside the loop.
Example (from Wikipedia):
    for (i = 0; i < n; ++i) {
        x = y + z;
        a[i] = 6 * i + x * x;
    }
can be reorganized as follows:
    x = y + z;
    t1 = x * x;
    for (i = 0; i < n; ++i) {
        a[i] = 6 * i + t1;
    }
Replacement of costly operations with less expensive ones:
Strength Reduction: Replace expensive computations with less expensive ones. For example, 2*x is reduced to x << 1 (a shift is faster than a multiply), and x^2 is replaced by x*x.
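A hand-written sketch of these two reductions (assuming x is non-negative, since left-shifting a negative int is undefined behavior in C):

```c
int double_mul(int x)   { return 2 * x; }
int double_shift(int x) { return x << 1; }  /* strength-reduced: shift instead of multiply */
int square(int x)       { return x * x; }   /* x*x instead of a pow(x, 2) call */
```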
Constant Folding: Computation of constant expressions, or expressions in which a variable has a known value, at compile time
Constant Propagation: Determining whether a variable might have a constant value for all or part of a program, and using that knowledge to replace the variable with the constant value in some or all program calculations
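A hand-worked illustration of the two techniques together (hypothetical example): the compiler propagates the known value of n into the expression, then folds the resulting constant arithmetic at compile time:

```c
/* As written: n has a known constant value, so the compiler can
 * propagate n = 10 into the expression and fold 10 * 60 * 60. */
int seconds_as_written(void) {
    int n = 10;             /* known constant */
    return n * 60 * 60;     /* foldable once n is propagated */
}

/* What the optimized code effectively computes. */
int seconds_folded(void) {
    return 36000;           /* 10 * 60 * 60, folded at compile time */
}
```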
Procedure Inlining: Replace a procedure call with the instructions in the procedure body (C++ provides direct hints to the compiler w/ inline keyword)
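A before/after sketch of inlining (hypothetical functions):

```c
static int square_fn(int x) { return x * x; }

/* Before inlining: the call incurs call/return overhead. */
int caller(int x) { return square_fn(x) + 1; }

/* After inlining: the body of square_fn replaces the call. */
int caller_inlined(int x) { return x * x + 1; }
```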
Tail Recursion Removal:
Replace a recursive call at the end of a procedure with a jump to the beginning:
    void countdown(int num) {
        if (num == 0) return;
        print num;
        countdown(num - 1);
    }
changed to
    void countdown(int num) {
    begincountdown:
        if (num == 0) return;
        print num;
        num = num - 1;
        goto begincountdown;
    }
Instruction Selection: Use special instructions on the target machine (block memory moves, indexed memory access, etc.) in place of a sequence of simpler operations. Especially common with compilers for CISC architectures.
Loop Unrolling: The compiler duplicates the body of the loop a few times, in exchange for fewer iterations. This can increase speed (fewer branches; more opportunities for pipelining), at the expense of code size.
Example:
    for (int i = 0; i < 10; i++) {
        printf("%d\n", i);
    }
might be unrolled to something like:
    for (int i = 0; i < 9; i += 3) {
        printf("%d\n", i);
        printf("%d\n", i + 1);
        printf("%d\n", i + 2);
    }
    printf("%d\n", 9);
An intermediate representation is a data structure that represents the program being translated during the translation process.
We've been using a parse tree as our IR.
Often, an IR uses machine-independent instructions, sometimes called three-address code. Each instruction in three-address code is called an atom and consists of an operation and up to three operands. During code generation, the compiler generates one or more instructions for the target architecture from each atom.
Example atom: (ADD, A, B, C)
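One possible (hypothetical) C representation of such an atom:

```c
/* An atom: an opcode plus up to three operands.
 * (ADD, A, B, C) means C := A + B. */
enum opcode { ADD, SUB, MUL, MOV, LABEL, JZ };

struct atom {
    enum opcode op;
    const char *src1, *src2, *dest;   /* unused operands left NULL */
};

/* The example atom (ADD, A, B, C). */
struct atom example = { ADD, "A", "B", "C" };
```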
After semantic analysis is complete, a source optimization step traverses the tree, produces a linear IR, and performs some global optimizations on it. Then, the code generator converts the linear IR to target code. Finally, a target optimization step sifts through the target code, performing local optimizations.
Optimizations can be classified as local or global:
Local optimizations apply to small sections of target code, within a single basic block (ex. removing unnecessary loads/stores; strength reduction)
Global optimizations apply across basic blocks
A basic block is a maximal straight-line segment of code: code sequences that contain no labels (targets for jumps) or jump/call instructions.
First instruction in a program/procedure begins a basic block
Each label (target of a jump) begins a new basic block
Each instruction that follows a jump begins a new basic block
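The three rules above can be sketched as a leader-finding pass over a simplified instruction stream (hypothetical encoding; a real compiler works over its actual instruction list):

```c
/* Simplified instruction kinds for a leader-finding sketch. */
enum kind { PLAIN, LABEL_I, JUMP };

/* Mark each instruction that begins a basic block ("leader"):
 *   rule 1: the first instruction,
 *   rule 2: every label,
 *   rule 3: every instruction that follows a jump. */
void find_leaders(const enum kind *code, int n, int *leader) {
    for (int i = 0; i < n; i++)
        leader[i] = 0;
    if (n > 0)
        leader[0] = 1;                       /* rule 1 */
    for (int i = 0; i < n; i++) {
        if (code[i] == LABEL_I)
            leader[i] = 1;                   /* rule 2 */
        if (code[i] == JUMP && i + 1 < n)
            leader[i + 1] = 1;               /* rule 3 */
    }
}
```

Each basic block then runs from a leader up to (but not including) the next leader.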
Example:
    x := in.readint()
    if x > 0 then
        fact := 1
        loop while not (x = 0)
            fact := fact * x
            x := x - 1
        end loop
        out.writeint(fact)
    end if
The compiler translates this into 3-address code:
    (IN, x)
    (GT, x, 0, t1)
    (JZ, t1, L1)
    (STORE, 1, fact)
    (LABEL, L2)
    (MUL, fact, x, t2)
    (MOV, t2, fact)
    (SUB, x, 1, t3)
    (MOV, t3, x)
    (EQ, x, 0, t4)
    (JZ, t4, L2)
    (OUT, fact)
    (LABEL, L1)
    (HALT)
Now, compute basic blocks
Block 1:
    (IN, x)
    (GT, x, 0, t1)
    (JZ, t1, L1)
Block 2:
    (STORE, 1, fact)
Block 3:
    (LABEL, L2)
    (MUL, fact, x, t2)
    (MOV, t2, fact)
    (SUB, x, 1, t3)
    (MOV, t3, x)
    (EQ, x, 0, t4)
    (JZ, t4, L2)
Block 4:
    (OUT, fact)
Block 5:
    (LABEL, L1)
    (HALT)
After a compiler carves your code up into basic blocks, it can use data flow analysis to determine whether it is safe to perform optimizations such as constant propagation and common subexpression elimination.
To perform data flow analysis, construct a control flow graph as follows:
The nodes of the graph are the basic blocks
The edges are determined by the jumps
Example: Construct control flow graph for example above
A definition is an instruction that alters the value of a variable. Examples:
Assignment to the variable
Subroutine call in which variable is passed by reference
For each basic block, we compute the reaching definitions for each variable. A reaching definition is a definition which can affect the variable's value at the beginning of the block.
Example: Compute reaching definitions for fact and x in the graph
See example pp. 233-5
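Reaching definitions are typically computed with an iterative fixed-point algorithm over the control flow graph. A minimal sketch, using a hypothetical bitset encoding (each definition is one bit; gen[b] holds the definitions created in block b, kill[b] the definitions of the same variables made elsewhere):

```c
#define NBLOCKS 5

/* Iterate the standard data flow equations until nothing changes:
 *   in[b]  = union of out[p] over all predecessors p of b
 *   out[b] = gen[b] | (in[b] & ~kill[b])                     */
void reaching_defs(int pred[NBLOCKS][NBLOCKS],
                   const unsigned gen[], const unsigned kill[],
                   unsigned in[], unsigned out[]) {
    for (int b = 0; b < NBLOCKS; b++) { in[b] = 0; out[b] = gen[b]; }
    int changed = 1;
    while (changed) {                         /* iterate to a fixed point */
        changed = 0;
        for (int b = 0; b < NBLOCKS; b++) {
            unsigned new_in = 0;
            for (int p = 0; p < NBLOCKS; p++)
                if (pred[b][p])               /* p is a predecessor of b */
                    new_in |= out[p];
            unsigned new_out = gen[b] | (new_in & ~kill[b]);
            if (new_in != in[b] || new_out != out[b]) {
                in[b] = new_in;
                out[b] = new_out;
                changed = 1;
            }
        }
    }
}
```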
One approach to implementing constant folding is to write a tree traversal class (like SemanticChecker or CodeGen) that walks the tree and replaces nodes whose value can be computed with a node containing the literal value.
Another approach is to attempt to compute a value for each expression node during semantic analysis, in addition to a type. During code generation, the generator checks whether a value was successfully computed for the node; if so, it uses that value directly rather than generating code to evaluate the expression.
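Either way, the core folding step is the same. A minimal sketch with a hypothetical expression-node type: a bottom-up pass computes a value for each operator node whose operands both have known values:

```c
/* Hypothetical expression node: has_value is set when the node's value
 * could be computed at compile time. */
struct node {
    char op;                     /* '+', '*', or 0 for a literal */
    struct node *left, *right;
    int has_value;               /* 1 if value is known at compile time */
    int value;
};

/* Bottom-up folding pass: try to compute a value for each node. */
void fold(struct node *n) {
    if (n->op == 0) return;                  /* literal: value already set */
    fold(n->left);
    fold(n->right);
    if (n->left->has_value && n->right->has_value) {
        n->has_value = 1;
        n->value = (n->op == '+')
                 ? n->left->value + n->right->value
                 : n->left->value * n->right->value;
    }
}
```

A code generator would then check has_value before emitting code for a node, and emit the literal instead when it is set.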