Overview

Ch 3.4: Hardware Reference Counting
Discuss the merits of hardware supports for reference counting.

Ch 3.5: Cyclic Reference Counting
Discuss how to reclaim cyclic data structures.

Ch 3.6: Issues to consider
Discuss why reference counting is chosen.

3.4 Hardware reference counting

  • Goal: Reduction of overhead by reference counting
  • Means: Using two memory
    • Main Memory
    • Reference counting memory (RCM)
ex) Update(R, S)
  1. The CPU executes *R = S;
  2. Memory is updated.
  3. RCM manages reference counts.
  • Merits:
    • The CPU doesn't have to manage reference counts.
    • No synchronization required.
      • Synchronization between client programs and tracing garbage collectors
      • Locks on reference counts
  • But hardware reference counting has not been realized...
    • Because of high development costs
    • And mark-sweep is also required to reclaim cyclic garbages.

3.5 Cyclic Reference Counting

  • Reference counting cannot reclaim cyclic data structures
    • Cyclic data structures are common
    • ex) back-pointer, recursion in functional programming languages
  • Approach:
    • Manpower
      • Burdensome, unsafe
    • Hybrid memory manager
      • Reference counting + mark-sweep
    • Utilization of domain features
      • Functional programming languages
      • Certain programming idioms

※ None of the approaches have been adopted for use by significant systems.

3.5.1 Functional Programming Language

  • Functional programming languages have recursions.
    fun loop() = loop()
  • Features of cyclic data structures in functional programming languages:
    • The circular structure is created all at once;
    • Any use of a subset of the cycle without root is copied;
    • Cycle-closing pointers to the head are tagged as such;
  • These features ensure that:
    • The cycle is treated as a single entity.
    • If the head is deleted, the entire cycle can be reclaimed.

3.5.2 Bobrow's Technique

General Idea

Distinguish internal references in cycle from external references.
  • Internal references are not be counted.
  • External references are counted as references to the cycle as a whole.

Bobrow's Idea

Programmer assigns all cells to a group.
  • Each group is reference counted.
  • Each group is reclaimed as a whole.

Algorithm

Reference count manipulations by Update.

Update(R: ptr, S: obj) =
  T = *R
  gr = group_no(R)
  if gr != group_no(S)    // R and S are in different groups
    increment_groupRC(S)
  if gr != group_no(T)    // R and T are in different groups
    decrement_groupRC(T)
    if groupRC(T) == 0
      reclaim_group(T)
  *R = S

Question

Q1: Update(A, B) RC(group 1) = 1, RC(group 2) = 0
Q2: Update(B, C) RC(group 1) = 1, RC(group 2) = 2

Drawbacks

  • Cannot reclaim sub-groups
  • Cannot reclaim inter-group cycles

Best practice

Each group comprise a single strongly connected component.
  • Minimal size of groups
  • No inter-group cycles
Strongly Connected Component (SCC)

Definition: Set of nodes each of which is reachable from every other node.

ex)
graph LR subgraph scc0 a --> b --> e --> a end subgraph scc1 f --> g --> f end subgraph scc2 c --> d --> h h --> d --> c end b --> c c --> g e --> f

3.5.3 Weak-pointer Algorithms

General Idea

Distinguish closing pointer(weak pointer) from other references.
  • Two invariants:
    • Active nodes are reachable from root via a chain of strong pointers;
    • Strong pointers do not form cycles;

Brownbridge's Idea

Each cell has two reference counters.
  • Strong reference counters (SRC)
  • Weak reference counters (WRC)

Algorithms

New
Allocate a new cell and return a strong pointer.
  • Because allocating new cells cannot create cycles.
New() = 
  newcell = allocate()
  SRC(R) = 1
  return strong(newcell)    // return strong pointer
Update
All copies of pointers are weak.
  • Because copying pointers may create cycles.
    • Conservative approach
    • This may break the following invariant:
      • Active nodes are reachable from root via a chain of strong pointers;
    • delete corrects pointer strengths (See below)
Update(R: ptr, S: obj) = 
  WRC(S) = WRC(S) + 1
  delete(*R)
  *R = S
  weaken(*R)  // weaken pointer R
Delete
Decrement reference counts according to the following rules:
  • Delete weak reference: decrement weak reference counts
  • Delete strong reference
    • If SRC == 0 and WRC == 0: delete cells
    • If SRC == 0 and WRC > 0: correct pointer type
delete(T) = 
  if is_weak(T)                         // (i)
    WRC(T) -= 1
  else
    SRC(T) -= 1
    if SRC(T) == 0 and WRC(T) == 0      // (ii)
      for U in Children(T)
        delete(*U)
      free(T)
    else if SRC(T) == 0 and WRC(T) > 0  // (iii)
      invert_strength(T)
      for U in Children(T)
        suicide(T, *U)
      if SRC(T) == 0
        for U in Children(T)
          delete(*U)
        free(T)
Rule (i): Delete weak reference
  • Do not delete the cell because of the invariant "Active nodes are pointed by strong pointers".
WRC(T) -= 1
Rule (ii): Delete cells
SRC(T) -= 1
if SRC(T) == 0 and WRC(T) == 0
  for U in Children(T)
    delete(*U)
  free(T)
Rule (iii): Correct pointer type
SRC(T) -= 1
if SRC(T) == 0 and WRC(T) > 0
  // T is active
  // or cyclic garbage
  invert_strength(T)
  for U in Children(T)
    suicide(T, *U)
  if SRC(T) == 0
    for U in Children(T)
      delete(*U)
    free(T)
// Adjust pointer's strength
suicide(Start_node, S) = 
  if is_strong(S)
    if S == Start_node
      weaken(S)
    else if SRC(S) > 1
      // loop header candidate
      weaken(S)
    else for T in Children(S)
      suicide(Start_node, *T)

Question

Answer pointer's strengths after suicide(T, S)
Failure Example of Brownbridge's delete
delete discards cyclic structures incorrectly in the following example.
  • Because this algorithm does not take weak external references into consideration.
  • SRC(S) > 1

Solution: Salkild's suicide

Salkild's suicide

suicide should restart if it discovers may-active-loop nodes.
  • If suicide discover a cell with weak pointers but only one strong pointer;
    • Flip all pointers
    • Restart suicide from the cell
Failure Example
Salkild's suicide fails to terminate in the following example.

Solution: Marking scheme
Drawbacks: Poor efficiency, High space overhead(2 ref counts, mark-flag)

Weak pointer implementation

Strength-bit represents strong and weak pointer.
  • Each pointer and object have an strength-bit.
  • If pointer's bit == object's bit, the pointer is strong.
  • If pointer's bit != object's bit, the pointer is weak.
  • Strengthening or Weakening is done by flip pointer's strength bit.
  • Strengthening all weak pointers is done by flip object's strength bit.

3.5.4 Partial Mark-Sweep Algorithms

General Idea

Hybrid method of reference counting and mark-sweep.
  • Mark-sweep reclaims cycles.
  • Reference counting reclaims other cells.
How-to reclaim cycles (Trial Deletion):
  1. Find cells that may be cycle garbage
  2. Traverse these cells and decrement the reference counts
  3. If the reference counts are zero, these cells must be cycle garbage!!

Lins's Algorithm

  • Hybrid method
    • Reference counting reclaims uniquely referenced cells.
    • Mark-sweep reclaims shared cells.
      • Shared cells include cycles.
  • Reclaim cycles lazily
    • Same as mark-sweep
  • Traverse only cyclic garbage
    • Mark-sweep traverses all active cells
Colors (States)
Lins's algorithm colors cells with 4 color.
  • Black: active, liveness
  • Grey: marked (in mark-sweep)
  • White: garbages
  • Purple: may-be-members of cyclic garbages
    • Mark-sweep collector needs to traverse them.
    • Control set holds purple cells.
Deletion
Decrement reference counts and Paint cells:
  • If the reference count is zero
    • Free cells
  • If the reference count is not zero (shared)
    • Paint purple
    • Add the cell into control set
      • Control set holds purple cells.
      • Reclaimed by mark-sweep
delete(T) =
  RC(T) -= 1
  if RC(T) == 0
    color(T) = black          // for reuse (?)
    for U in Children(T)
      delete(*U)
    free(T)
  else if color(T) != purple  // shared
    if control_set is full
      gc_control_set()        // mark-sweep
    color(T) = purple
    push(T, control_set)
New
Allocate new cells painted with black (active).
New() = 
  newcell = allocate()
  RC(newcell) = 1
  color(newcell) = black
  return newcell
Update
Both arguments must be black (active).
  • If these cells are placed in control set, remove them from it.
    • These cells are not garbages.
Update(R: ptr, S: obj) = 
  RC(S) += 1
  color(R) = black
  color(S) = black
  delete(*R)
  *R = S
Three-phase Mark-Sweep
Mark-sweep reclaims garbage from control set:
  1. Get a purple cell from control set
  2. Mark: remove internal reference counts in subgraph
    • Garbages have no external references.
  3. Scan: determine whether cells are active or garbages
  4. Sweep: reclaim garbages
Algorithm:
gc_control_set() =
  S = pop(control_set)
  if color(S) == purple
    mark_gray(S)        // (i)
    scan(S)             // (ii)
    collect_white(S)    // (iii)
  else if control_set != empty
    gc_control_set()
Step-1: Mark (remove internal reference counts)
mark_grey(S) = 
  if color(S) != grey
    color = grey
    for T in Children(S)
      RC(*T) -= 1
      mark_grey(*T)
Step-2: Scan (determine whether cells are active or garbages)
scan(S) = 
  if color(S) == grey
    if RC(S) > 0
      scan_black(S)
    else  // RC(S) == 0
      // garbage
      color(S) = white
      for T in Children(S)
        scan(*T)
// Activate S and Children(S)
scan_black(S) = 
  color(S) = black
  for T in Children(S)
    RC(*T) += 1
    if color(*T) != black
      scan_black(*T)
Step-3: Sweep
collect_white(S) = 
  if color(S) == white
    color(S) = black
    for T in Children(S)
      collect_white(*T)
    free(S)

Question

mark_grey(A)
scan(A)
collect_white(A)
Control set strategies
  • Timing of mark-sweep
    • When control set is full (Lins)
    • When free-list is empty (same as mark-sweep)
    • When the size of free-list drops below a certain size
    • Or some heuristics
  • Data structure of control set
    • Stack
    • Queue
    • Heap-allocated list
    • Bitmap
  • Lins's approach works best when side-effects are rare.
    • Many nodes are uniquely referenced.
      • Reference count reclaims them.
    • Sub-graphs (cycles) are small.
      • Mark-sweep finishes quickly.

Christopher's Algorithm

Christopher's algorithm is a special case of Lin's algorithm.
  • Control set is the entire heap
    • Perform mark-sweep for the entire heap
    • Compared with Lins's, the following are not required:
      • Control set
      • Coloring
  • Features
    • Automatic memory management without any support from compilers
      • Because this algorithm doesn't use coloring.
    • No extra space required
      • Because this algorithm doesn't manage control set
    • Linear sweeping in the entire heap
      • Linear sweeping is faster than tracing graph.
      • Whether Christopher's or Lins's is better depends on:
        • Data structures in question
        • Virtual memory management
        • Heap size
        • etc
  • Three-phase Mark-sweep (Almost same as Lins's)
    1. Mark: remove internal reference counts
    2. Scan: determine active cells using external reference counts
    3. Sweep: reclaim any cells whose reference count is zero

3.6 Issues to consider

Reference counting have four weak points:
  1. The delay to free garbages recursively
  2. The high overhead imposed on pointer operations
  3. The space required for the reference counts
  4. The inability to reclaim garbage cycles

Nevertheless, many programmers often choose to use reference counting.
There may be four reasons for this choice.

Ease of implementation

Reference counting is easier to implement than tracing collectors because:
  • Pointer operations can be replaced by macros
    • e.g., smart pointer in object-oriented languages
  • We don't need to know all the roots of computation
    • The roots are important in tracing collectors

Control, optimization and correctness

Reference counting can provide programmers with total control.
  • Optimization
    • Programmer can apply reference counting only to the objects which manual deallocation is difficult
  • Trade-off between performance and correctness
    • Better performance if we optimize code
    • Worse correctness if we optimize code
      • More difficult maintenance

Garbage collection delay

The overheads of reference count are distributed throughout the computation.
reference count tracing
Pointer operation Slow Fast
Pause No Yes
Immediate reuse optimization
(e.g., destructive re-assignment)
Yes No

Space overhead

Space for a reference count is required in each heap object's header.
The relative space advantage is application dependent:

  • Reference count
    • High overhead for small cells
    • Low overhead for large cells
  • Mark-sweep
    • Some headroom for efficiency (to avoid thrashing)
  • Copying
    • Double address space of mark-sweep

3.7 Notes

  • Reference count was originally developed for Lisp by Gorge Collins[1960]
  • Many studies
    • Reduction of run-time cost
    • Limited size reference counts
    • Hardware reference count
    • Reclaiming cycles
  • Studies in parallel systems
    • Advantages
      • Synchronization between user program and garbage collection threads is not required.
    • Disadvantages
      • Locks on each object's reference count are required.
  • Studies in distributed systems
    • Advantages
      • Short pause time
    • Difficulties
      • Ensuring that count manipulation messages arrive in the right order.

Summary

In this chapter, we discuss solutions to reclaim cyclic data structures in reference count:
  1. Approaches for functional programming languages
  2. Grouping by programmer
  3. Weak pointers which represent cycle-closing pointers
  4. Hybrid approaches of reference count and mark-sweep

However, none of the approaches have been adopted for use by significant systems.
The simplest approach (reference counting + mark-sweep) may be the best.