Overview

Ch 7.5: Inter-generational pointers
Discuss how to address inter-generational pointers.

Ch 7.6: Non-copying generational garbage collection
Discuss about generational mark-sweep garbage collection.

Ch 7.7: Scheduling garbage collections
Discuss how to reduce pause time and increase efficiency.

Ch 7.8: Issues to consider
Discuss the weak points of generational garbage collectors.

7.5 Inter-generational pointers

Generational garbage collector must identify inter-generational pointers because:
  • Minor garbage collection search only the youngest generation.
    → Minor garbage collection may scavenge cells that should be alive by mistake.
The simplest approach: Scan all older generations
  • Advantage
    • No costs to the mutator (user program)
    • Linear scanning
  • Disadvantages:
    • More scanning
    • Worse locality
Shaw and Swanson suggest that:
  • The simplest approach reduce overheads by 1/3 compared with a non-generational two-space copying collector.

Goal of this section: Understand more precise methods of recording inter-generational pointers.

The write-barrier

Inter-generational pointers patterns

Two patterns of inter-generational pointers:
  • Promotion to next generation of objects that contain pointers
    • Minor scavenger can easily detect them.
  • Pointer stores in older generation
    • Pointer stores must be trapped and recorded.
      Write-barrier

Write-barrier implementation

Write-barrier can be implemented by several ways:
  • Software approaches:
    • Compiler supports
      • Record pointers before each pointer writes
      • Compile-time analysis to reduce the number of pointer operation to be instrumented
    • Virtual memory protection mechanisms:
      • Trap by accessing protected pages
      • Page modification dirty bits
  • Hardware approaches:
    • Special purpose hardware

Evaluation factors

Three factors to be considered in software approaches:
  • Write-barrier cost:
    • How the cost to the mutator can be minimized
  • Memory overhead:
    • The space overhead of recording pointer store
  • Collection-time cost:
    • How efficiently old-young pointers can be identified at scavenge time

Percentage of pointer stores

In Lisp:
  • Static frequency
    • Pointer Load: 13~15 %
    • Pointer Store: 4 %
    • Inlining barrier may explode the size of codes.
  • Dynamic frequency
    • Store that need trap: 5 ~ 10 %
      • Non-initializing pointer store
      • 2/3 are stores to objects in the youngest generation
    • Other memory references: 90 ~ 95 %
      • Pointer store to the root set
      • Alloc-and-Initializing store
        • ∵ A new allocated object is always in the youngest generation.

Even though the percentage of pointer stores is small, optimizing the write barrier is critical.
Ex) 1 % pointer store × 10 additional instructions = 10 % overhead

Goal for the rest of this section:
More precise and efficient methods of trapping and recording inter-generational pointers.

Entry tables

Entry Table records inter-generational pointers:
  • Entry: An indirection of an inter-generational pointer.
    • Add timing: When inter-generational pointers are stored
      • If the pointer already points to an entry in entry table: Remove the old entry
      • Other approaches (my understanding):
        • Overwrite the old entry
          • + An entry can be reused
            → Small size of entry table
          • − More checking in write-barrier → High costs of write-barrier
            • e.g., If there are more than 2 generations:
              • If gen. of old-contents == gen. of new-contents: Overwrite the entry
              • If gen. of old-contenst != gen. of new-contents: Remove the entry
    • Points-to relations: (See example below)
      • An old-pointer points-to the entry in entry table
      • The entry in entry table points-to a young-object.
    • At collection time: Search its entry table rather than every older generation.
Properties
  • Advantage
    • No need to search all older generations
      • Only need to search its entry table
  • Disadvantage
    • Duplicate references to a single object
      → The scanning cost becomes high.
      • Reason for duplication:
        • Multiple old-pointers may points-to a single object
        • Add a new entry at each old-pointer stores
    • The cost of trapping pointer stores and following indirections is very high
      • Trap a pointer store.
      • The pointer is over-written to a new entry.
      • The new entry points to the original target.

Result: Most modern generational garbage collection schemes don't use Entry Table.

Question

Q. Answer the points-to relations after executing the following code.
1
2
3
4
p = &o;
p = &o;
p = &o;
q = &o;

Remembered sets

Remembered set records inter-generational pointers:
  • A bit in each object header
    • Represent whether the object is a member of remembered set.
  • Entry: Record the address of an old-object which has inter-generational pointers.
    • Add timing: A new entry is added if:
      • An Old-to-young-pointers are stored, and
      • The object's header is not set
        → No duplicate entries
    • At collection time: Scan objects with bit set

Question

Q. Answer the points-to relations after executing the following code.
1
2
3
4
p = &o;
p = &o;
p = &o;
q = &o;

Properties

  • Write-barrier costs:
    • + No indirection of pointers
    • − High overhead of store checking
      • Lower than costs of entry table but still high
  • Collection-time costs:
    • + No duplicate entries
      • Better than entry table
    • − High overhead of scanning large objects
Slot recording
Remembered sets records the address of the slot(pointer) rather than the object:
  • Purpose: To avoid the high overhead of scanning large objects
  • Two problems:
    • "Allow duplicate entries" or "A bit in each pointers"
    • Multiple entries if the object has multiple pointers
      • → Increase remembered sets size
Appel's remembered set
  • Write-barrier: Instructions emitted by compiler
  • Entries of remembered set:
    • A new entry is added at each pointer store
      → Duplicate entries
  • Collection-time:
    1. Filter the remembered set by:
      • whether the pointer is inter-generational and
      • whether the pointer is unique.
    2. Scavenge younger generation
  • Target: Standard ML of New Jersey (SML/NJ)
    • Two benefits:
      • Non-initializing pointer stores are rare.
        • → Rare inter-generational pointers
      • The policy of en masse promotion
        • → All the entries can be discarded after each minor collection.

Sequential Store Buffers

Sequential Store Buffer (SSB) temporarily store may-be-inter-generational pointers.
  • Each entry records pointer stores
    • No check
      • + Low overhead of the write-barrier
      • − Duplicate entries
    • Slot recording
      • + High-precision recording
  • At SSB overflow or Collection-time:
    • Construct the remembered set from SSB.
Remembered set records unique inter-generational pointers.
  • Circular Hash Table using linear hashing
    • + No duplicate entries
    • + Dynamically sized
    • − High overhead of constructing remembered sets

Costs

  • Write-barrier costs: Low
    • ∵ Only push to SSB
  • Collection-time costs: High
    • ∵ Construction of remembered set from SSB

Question

p = &o;
p = &o;
p = &o;
q = &o;
Q1. Answer the SSB contents after executing the following code.
Q2. Answer the remembered set contents after constructing it.

Page marking with hardware support

Store recording vs. Marking

Store recording: Record each pointer store.
  • + Low memory overhead
  • − Duplicate entries
Marking:
  • + No duplicate entries
  • − High memory overhead
    • Solution: Coarser granularity to mark
The Symbolics 3600 (Lisp machine) provides page marking with hardware support:
  • Garbage Collector Page Table(GCPT): Record a bit per physical page
  • Ephemeral Space Reference Table(ESRT): Record bits per swapped-out page

Garbage Collector Page Table (GCPT)

GCPT records the physical pages where the pointer write was performed.
  • Each entry has a bit of the corresponding physical pages
    • The bit is set when a inter-generational pointer was stored in the page.
    • The bit is set whether a young-old pointer or a old-young pointer
  • Handling of swapped-out pages
    • Ephemeral Space Reference Table (ESRT)
  • Collection-time:
    • Scan the entire physical pages with bit set.

Ephemeral Space Reference Table (ESRT)

ESRT records swapped-out pages and generations referenced by the page.
  • Each entry has a bit-mask with a bit for each generation referenced by the swapped-out page
    • When a page was swapped out,
      1. its GCPT bit was cleared
      2. its ESRT entry was updated

Properties

  • Advantages:
    • Low overhead
      • ∵ Hardware support
    • One generation can be scavenged independently
      • If one generation is scavenged,
        1. Scan the physical pages with bit set in GCPT
        2. Scan the swapped-out pages with corresponding bit set in ESRT
      • ∵ GCPT and ESRT records young-old pointers too.
      • Most other approaches must collect all younger generations when they collect an older one.
  • Disadvantages:
    • Need specialized hardware
    • Small page size required
      • ∵ Scan the entire page with bit set

Page marking with virtual memory support

  • 3600's hardware support: Not available on many machine
  • Virtual memory machinery + Dirty bit: Available on many machine
    • Dirty bit can be used as pointer write marker !?

Dirty bit in page marking GC

The purpose of the dirty bit in virtual memory and in GC are different:
  • Virtual memory: Dirty bit determine whether a page needs to be written back to disk
  • Page marking GC: Dirty bit determine whether there has been pointer writes in interval
    • Garbage collection interval: Period from the last GC to next GC.
    • A copying GC wants to know about interval because:
      • Objects may be moved(copied)
      • The dirty bits have no sense
Page marking GC uses three dirty bits to archive the purpose:
  • Dirty bit:
    • Role: To track page modifications in interval
    • How to set:
      • At start of interval: Clear all Dirty bits
      • At modification: Set by hardware
  • Old_Dirty bit:
    • Role: To save past Dirty bits
      • To make virtual memory work
        • Original_Dirty = DirtyOld_Dirty
    • How to set:
      • Before Dirty bits are cleared: Save them to Old_Dirty bits
  • Dirty_on_Disk bit:
    • Role: To track modifications of swapped-out pages
    • How to set:
      • Before page swapping: Set by interception

Problems

  • Dirty bits handling
    • This system requires a way to get and clear dirty bits.
    • Approaches
      • Kernel modification: Add the new system calls
      • Write-protection of pages:
        1. Write-protect all pages
        2. Write-fault handler set a dirty bit for the page
        3. Clear all dirty bits after garbage collection
  • A coarse write-barrier
    • Large page size in modern systems
      • → Need to scan the entire pages
    • Dirty bits records any modification of the page
      • Not only generational-pointer stores

Card marking

Page markings are too coarse.
→ More fine-grained markings are desireable.

Word marking

Word marking records pointer writes at word granularity.

Naive approach

Prepare a very very big bitmap!!
→ High memory overhead

Sobalvarro's approach
Prepare a bitmap per segment:
  • Segment: 64 kilobytes address space
  • Modification Bit Table (MBT):
    • A bitmap per segment
    • A single MBT is shared by segments which need not be recorded:
      • The youngest generation
      • Unscanned segments (No data)
      • Only non-pointer data
  • Segment modification cache:
    • A cache which records modified segments
    • Garbage collectors only need to scan the segments in the cache.
Costs
  • Write-barrier
    • 10 instructions
    • 3 registers

Word marking's write-barrier is too fine-grained and has high overhead.
Page marking's write-barrier is too coarse-grained and has to scan the entire page.
→ The compromise between word and page marking is card marking.

Card marking

  • Card: Unit of division of heap
    • Card size is flexible.
      • Smaller size: Reduce collection-time scanning
      • Bigger size: Reduce card table size
  • Card table: A byte map where each entry records cards with pointer writes
    • Byte map can reduce the cost of write-barrier
    • ∵ Bit manipulations require several instructions (e.g., shift)

Question

Q. Answer the contents of card table after executing the following code.
p = &o;
p = &o;
p = &o;
q = &o;
Costs
  • Write-barrier
    • 3 instructions or
    • 2 instructions by optimization
      • Mark the object's address instead of the pointer address.
      • Collection-time costs are increased.
  • Collection-time
    • Proportional to the number and size of cards marked
      • + No duplicate entry
  • Memory overhead: Small
    • e.g., 128-byte card → + 1/128 overhead
Problem: Objects across multiple cards
Small card size or/and Big objects cause cards that cannot be scanned from the beginning.
  • Crossing Map: A byte map which indicates the cards that cannot be scanned from the beginning
    • Same size as the card table
    • At garbage collection time:
      1. Check whether the page is dirty from card table.
      2. Check whether the page is scannable from crossing map.
      3. If not, garbage collector skip back until it finds a scannable page.
Some considerations:
  • Card size
    • Larger card size: Increase page faults by skipping back
    • Smaller card size: Increase the size of the card table
  • Large objects handling
    • Large unscannable objects can be stored separately in a large object area

Remembered sets or cards?

Remembered Set vs. Card Marking

Remembered Set + SSB Card marking
Write-barrier cost 2 instructions
(SSB may overflow)
2 instructions
Granularity of recording Pointer
(Duplicate entries in SSB)
Card
Factors of collection-time cost The number of stores performed between scavenges The number of dirty pages
(Dirty pages must be scanned at every scavenge)

Hybrid approach

  • Write-barrier: Card marking
    • + No overflow
    • + No duplicate entry
  • Collection time: Construct remembered set from card table
    • + Unchanged dirty pages are not scanned each time.
      • ∵ Inter-generational pointers are stored in remembered set, and Card table are cleared.
  • Result: A significant improvement over:
    • Pure remembered sets
    • Pure card marking with optimal card size
    • Virtual memory techniques

7.6 Non-copying generational garbage collection

Zorn's Generational Mark-sweep Collector

  • Memory Layout
    • Four generations
    • Each generation contains:
      • A mark bitmap
      • A fixed-size-object region
        • Contains objects of a single size
      • A variable-sized-object region
        • Contains objects that don't fit in the fixed-size-object area
  • Collection Methods
    • A fixed-size-object region
      • 3 times: Mark-and-deferred-sweep garbage collection
      • 1 time: Copying with em masse promotion
    • A variable-sized-object region
      • Two space copying collector
Results: Copying vs. Mark-Sweep generational garbage collection
Copying Mark-sweep
Total CPU overhead Low Middle
Memory overhead High
(∵ Semi-space Copying)
Low
(- 30~40%)
Cache miss rate Middle
(Compaction may give no advantage)
Low
(∵ Mark bitmap)
Promotion rate Low High
(∵ em masse promotion)

The oldest generation handling

The oldest generations have to be treated differently in a copying collector:
  • Younger generation: Copy from younger to the next generation.
  • The oldest generation: Two-space copying
If the cost of two semi-spaces is too high, other methods are required:
  • Mark-sweep
    • + No semi-space required
    • − Memory fragmentation
      • → High cost of moving tenured objects to the oldest generation
      • If tenured objects are rare, the cost may be acceptable.

7.7 Scheduling garbage collections

The aims of generational garbage collection:
  • Fewer pause time
  • Fewer collection time
  • Good locality
    • ∵ Collect only small part

→ Scheduling garbage collection

Good GC Timing

Two possible strategies:
  • Hide collections
    • At midnight
    • When the machine is idle
    • When the program is not busy
      • During compute-bound periods
      • When the user is presented to interact but does not do so
        • e.g., Emacs text editor
  • Efficient collections
    • The ends of compute-bound periods
    • The ends of user interaction
    • At the local minima of stack height
      • Approximate approach: Trigger a collection when the stack height drops below a certain point
      • Wilson's approach[1991]:
        1. Rewrite return address to specific routine address
        2. The routine determine whether to call the collector based on:
          • The amount of memory currently available
          • The height of the stack

Key objects

Idea

Treat key objects as a representative of data structure:
  • Root node of tree
  • Head of list

Handling Key Objects

At promotion:
  • Key objects: Retained in generational scheme
  • Other objects: Out of generational scheme
    • Placed in Keyed area
    • The collector does not scan them.

Problems

  • How key objects are to be identified
    • Hints from programmer
    • Direct references from stack
  • Dangling pointers by collecting clusters by mistake
    • Mature object spaces

Mature object spaces

Mature object space

Mature object space: Keyed area + Remembered set

  • Memory Layout
    • Multiple areas:
      • Store one cluster
    • A remembered set per area
      • Record references from outside the area
  • Collection
    • Timing: When the remembered set is empty
      • When there are no references to objects in the area from outside
    • Target: All objects in the area

Problem: Too large clusters to fit in an area
Solution: A dynamic sized approach using trains and carriages

Train + Carriage approach

Change fixed size areas to dynamic sized trains:
  • Train: A linked list of carriages
    • Contains a single cluster
  • Carriage: A fixed size area
    • Space to store a cluster
    • Remembered set to record references from outside the carriage
  • Collection of carriage
    • Timing: At each collection
    • Target: A single carriage
    • Process: Four-step process (See below)
Four-step process

Choose a single From-carriage for collection.

Move any object that is referenced from outside into a fresh train (A).

Scan the moved objects in the usual copying collector way (B).

Move From-carriage objects referenced from other trains to those trains (P).
Move From-carriage objects referenced from other carriages to the last carriage in this train (X).

Scavenge From-carriage.

If there are no external references to the train, scavenge it.

Good Features
  • Incremental collection: The number of bytes moved at each collection is bounded
  • Clustering and Compaction:
    • Nodes of the same data structure are copied into the same train
  • Less dependence: No support of hardware or virtual memory mechanism

7.8 Issues to consider

Generational garbage collection has been successful because of:
  • Short pause time for minor collection
  • Good locality
    • ∵ By concentrating allocation and collection effort on the youngest gen.
  • Small overall cost of garbage collection
    • ∵ By delaying collection of long-lived objects

However, generational garbage collection is not a universal panacea.

Bad cases

  • Large root sets
    • Bad effect: Long pause time
    • Causes
      • Very large programs
      • Very large number of global variables pointing into the heap
      • Very deep stacks
    • Solutions of deep stacks:
      • The write-barrier to local variables
        • + The stack of root sets are reduced to pointers pointing into the heap
        • − Increase the cost of the barrier
      • 'High water mark' of stacks (in en masse promotion)
        • + The stack of root sets are reduced to an area above the 'mark'
        • − Need to replace some return address with the address of a 'mark-shifting' procedure
  • Long object lifetimes
    • Bad effects
      • Increase promotion rates
        • More frequent major collections
      • Worse spatial locality
        • Poorer paging behavior
        • Increase cache misses
  • Frequent pointer writes into older generations
    • Bad effects
      • Increase the cost of the write-barrier
    • Causes
      • Large arrays of heap
        • Long lifetimes
          • Promoted to older generations
        • All slot updates must be trapped by the write-barrier
          • Considered as root sets of the next minor collection

Summary

Contents of this slide:
  • Handling of inter-generational pointers
    • Remembered set: Record inter-generational pointers
    • Card marking: Mark cards where pointer writes were made
  • Generational mark-sweep GC
    • Mark-seep may be better in some cases
    • The oldest generation may be handled by mark-sweep
  • Scheduling GC
    • Hide collection
    • Efficient collection
  • Special handling of tree or list
    • Key objects: Root node of tree, Head of list
      • Handled in generational scheme
    • Other objects
      • Handled out of generational scheme
      • Remembered set: To scavenge safely
      • Train + Carriage: To handle large clusters

Generational techniques have been very successful and are now in widespread in use.