Garbage Collection Ch.7-b
-- Generational Garbage Collection
Contents
Overview
Ch 7.5: Inter-generational pointers
Discuss how to address inter-generational pointers.
Ch 7.6: Non-copying generational garbage collection
Discuss about generational mark-sweep garbage collection.
Ch 7.7: Scheduling garbage collections
Discuss how to reduce pause time and increase efficiency.
Ch 7.8: Issues to consider
Discuss the weak points of generational garbage collectors.
7.5 Inter-generational pointers
Generational garbage collector must identify inter-generational pointers because:-
Minor garbage collection search only the youngest generation.
→ Minor garbage collection may scavenge cells that should be alive by mistake.
-
Advantage
- No costs to the mutator (user program)
- Linear scanning
-
Disadvantages:
- More scanning
- Worse locality
- The simplest approach reduce overheads by 1/3 compared with a non-generational two-space copying collector.
Goal of this section: Understand more precise methods of recording inter-generational pointers.
The write-barrier
Inter-generational pointers patterns
Two patterns of inter-generational pointers:-
Promotion to next generation of objects that contain pointers
- Minor scavenger can easily detect them.
-
Pointer stores in older generation
-
Pointer stores must be trapped and recorded.
→ Write-barrier
-
Pointer stores must be trapped and recorded.
Write-barrier implementation
Write-barrier can be implemented by several ways:-
Software approaches:
-
Compiler supports
- Record pointers before each pointer writes
- Compile-time analysis to reduce the number of pointer operation to be instrumented
-
Virtual memory protection mechanisms:
- Trap by accessing protected pages
- Page modification dirty bits
-
Compiler supports
-
Hardware approaches:
- Special purpose hardware
Evaluation factors
Three factors to be considered in software approaches:-
Write-barrier cost:
- How the cost to the mutator can be minimized
-
Memory overhead:
- The space overhead of recording pointer store
-
Collection-time cost:
- How efficiently old-young pointers can be identified at scavenge time
Percentage of pointer stores
In Lisp:-
Static frequency
- Pointer Load: 13~15 %
- Pointer Store: 4 %
- Inlining barrier may explode the size of codes.
-
Dynamic frequency
-
Store that need trap: 5 ~ 10 %
- Non-initializing pointer store
- 2/3 are stores to objects in the youngest generation
-
Other memory references: 90 ~ 95 %
- Pointer store to the root set
-
Alloc-and-Initializing store
- ∵ A new allocated object is always in the youngest generation.
-
Store that need trap: 5 ~ 10 %
Even though the percentage of pointer stores is small, optimizing the write barrier is critical.
Ex) 1 % pointer store × 10 additional instructions = 10 % overhead
Goal for the rest of this section:
More precise and efficient methods of trapping and recording inter-generational pointers.
Entry tables
Entry Table records inter-generational pointers:-
Entry: An indirection of an inter-generational pointer.
-
Add timing: When inter-generational pointers are stored
- If the pointer already points to an entry in entry table: Remove the old entry
-
Other approaches (my understanding):
-
Overwrite the old entry
-
+ An entry can be reused
→ Small size of entry table -
− More checking in write-barrier
→ High costs of write-barrier
-
e.g., If there are more than 2 generations:
- If gen. of old-contents == gen. of new-contents: Overwrite the entry
- If gen. of old-contenst != gen. of new-contents: Remove the entry
-
e.g., If there are more than 2 generations:
-
+ An entry can be reused
-
Overwrite the old entry
-
Points-to relations: (See example below)
- An old-pointer points-to the entry in entry table
- The entry in entry table points-to a young-object.
- At collection time: Search its entry table rather than every older generation.
-
Add timing: When inter-generational pointers are stored
Properties
-
Advantage
-
No need to search all older generations
- Only need to search its entry table
-
No need to search all older generations
-
Disadvantage
-
Duplicate references to a single object
→ The scanning cost becomes high.-
Reason for duplication:
- Multiple old-pointers may points-to a single object
Add a new entry at each old-pointer stores
-
Reason for duplication:
-
The cost of trapping pointer stores and following indirections is very high
- Trap a pointer store.
- The pointer is over-written to a new entry.
- The new entry points to the original target.
-
Duplicate references to a single object
Result: Most modern generational garbage collection schemes don't use Entry Table.
Question
Q. Answer the points-to relations after executing the following code.
1
2
3
4
p = &o;
p = &o;
p = &o;
q = &o;
Remembered sets
Remembered set records inter-generational pointers:-
A bit in each object header
- Represent whether the object is a member of remembered set.
-
Entry: Record the address of an old-object which has inter-generational pointers.
-
Add timing: A new entry is added if:
- An Old-to-young-pointers are stored, and
-
The object's header is not set
→ No duplicate entries
- At collection time: Scan objects with bit set
-
Add timing: A new entry is added if:
Question
Q. Answer the points-to relations after executing the following code.
1
2
3
4
p = &o;
p = &o;
p = &o;
q = &o;
Properties
-
Write-barrier costs:
- + No indirection of pointers
-
− High overhead of store checking
- Lower than costs of entry table but still high
-
Collection-time costs:
-
+ No duplicate entries
- Better than entry table
- − High overhead of scanning large objects
-
+ No duplicate entries
Slot recording
Remembered sets records the address of the slot(pointer) rather than the object:- Purpose: To avoid the high overhead of scanning large objects
-
Two problems:
- "Allow duplicate entries" or "A bit in each pointers"
-
Multiple entries if the object has multiple pointers
- → Increase remembered sets size
Appel's remembered set
- Write-barrier: Instructions emitted by compiler
-
Entries of remembered set:
-
A new entry is added at each pointer store
→ Duplicate entries
-
A new entry is added at each pointer store
-
Collection-time:
-
Filter the remembered set by:
- whether the pointer is inter-generational and
- whether the pointer is unique.
- Scavenge younger generation
-
Filter the remembered set by:
-
Target: Standard ML of New Jersey (SML/NJ)
-
Two benefits:
-
Non-initializing pointer stores are rare.
- → Rare inter-generational pointers
-
The policy of en masse promotion
- → All the entries can be discarded after each minor collection.
-
Non-initializing pointer stores are rare.
-
Two benefits:
Sequential Store Buffers
Sequential Store Buffer (SSB) temporarily store may-be-inter-generational pointers.-
Each entry records pointer stores
-
No check
- + Low overhead of the write-barrier
- − Duplicate entries
-
Slot recording
- + High-precision recording
-
No check
-
At SSB overflow or Collection-time:
- Construct the remembered set from SSB.
-
Circular Hash Table using linear hashing
- + No duplicate entries
- + Dynamically sized
- − High overhead of constructing remembered sets
Costs
-
Write-barrier costs: Low
- ∵ Only push to SSB
-
Collection-time costs: High
- ∵ Construction of remembered set from SSB
Question
p = &o;
p = &o;
p = &o;
q = &o;
Q1. Answer the SSB contents after executing the following code.
Q2. Answer the remembered set contents after constructing it.
Page marking with hardware support
Store recording vs. Marking
Store recording: Record each pointer store.- + Low memory overhead
- − Duplicate entries
- + No duplicate entries
-
− High memory overhead
- Solution: Coarser granularity to mark
- Garbage Collector Page Table(GCPT): Record a bit per physical page
- Ephemeral Space Reference Table(ESRT): Record bits per swapped-out page
Garbage Collector Page Table (GCPT)
GCPT records the physical pages where the pointer write was performed.-
Each entry has a bit of the corresponding physical pages
- The bit is set when a inter-generational pointer was stored in the page.
- The bit is set whether a young-old pointer or a old-young pointer
-
Handling of swapped-out pages
- Ephemeral Space Reference Table (ESRT)
-
Collection-time:
- Scan the entire physical pages with bit set.
Ephemeral Space Reference Table (ESRT)
ESRT records swapped-out pages and generations referenced by the page.-
Each entry has a bit-mask with a bit for each generation referenced by the swapped-out page
-
When a page was swapped out,
- its GCPT bit was cleared
- its ESRT entry was updated
-
When a page was swapped out,
Properties
-
Advantages:
-
Low overhead
- ∵ Hardware support
-
One generation can be scavenged independently
-
If one generation is scavenged,
- Scan the physical pages with bit set in GCPT
- Scan the swapped-out pages with corresponding bit set in ESRT
- ∵ GCPT and ESRT records young-old pointers too.
- Most other approaches must collect all younger generations when they collect an older one.
-
If one generation is scavenged,
-
Low overhead
-
Disadvantages:
- Need specialized hardware
-
Small page size required
- ∵ Scan the entire page with bit set
Page marking with virtual memory support
- 3600's hardware support: Not available on many machine
-
Virtual memory machinery + Dirty bit: Available on many machine
- → Dirty bit can be used as pointer write marker !?
Dirty bit in page marking GC
The purpose of the dirty bit in virtual memory and in GC are different:- Virtual memory: Dirty bit determine whether a page needs to be written back to disk
-
Page marking GC: Dirty bit determine whether there has been pointer writes in interval
- Garbage collection interval: Period from the last GC to next GC.
-
A copying GC wants to know about interval because:
- Objects may be moved(copied)
- The dirty bits have no sense
-
Dirty
bit:- Role: To track page modifications in interval
-
How to set:
- At start of interval: Clear all
Dirty
bits - At modification: Set by hardware
- At start of interval: Clear all
-
Old_Dirty
bit:-
Role: To save past
Dirty
bits-
To make virtual memory work
Original_Dirty
=Dirty
∨Old_Dirty
-
To make virtual memory work
-
How to set:
- Before
Dirty
bits are cleared: Save them toOld_Dirty
bits
- Before
-
Role: To save past
-
Dirty_on_Disk
bit:- Role: To track modifications of swapped-out pages
-
How to set:
- Before page swapping: Set by interception
Problems
-
Dirty bits handling
- This system requires a way to get and clear dirty bits.
-
Approaches
- Kernel modification: Add the new system calls
-
Write-protection of pages:
- Write-protect all pages
- Write-fault handler set a dirty bit for the page
- Clear all dirty bits after garbage collection
-
A coarse write-barrier
-
Large page size in modern systems
- → Need to scan the entire pages
-
Dirty bits records any modification of the page
- Not only generational-pointer stores
-
Large page size in modern systems
Card marking
Page markings are too coarse.
→ More fine-grained markings are desireable.
Word marking
Word marking records pointer writes at word granularity.
Naive approach
Prepare a very very big bitmap!!
→ High memory overhead
Sobalvarro's approach
Prepare a bitmap per segment:- Segment: 64 kilobytes address space
-
Modification Bit Table (MBT):
- A bitmap per segment
-
A single MBT is shared by segments which need not be recorded:
- The youngest generation
- Unscanned segments (No data)
- Only non-pointer data
-
Segment modification cache:
- A cache which records modified segments
- Garbage collectors only need to scan the segments in the cache.
Costs
-
Write-barrier
- 10 instructions
- 3 registers
Word marking's write-barrier is too fine-grained and has high overhead.
Page marking's write-barrier is too coarse-grained and has to scan the entire page.
→ The compromise between word and page marking is card marking.
Card marking
-
Card: Unit of division of heap
-
Card size is flexible.
- Smaller size: Reduce collection-time scanning
- Bigger size: Reduce card table size
-
Card size is flexible.
-
Card table: A byte map where each entry records cards with pointer writes
- Byte map can reduce the cost of write-barrier
- ∵ Bit manipulations require several instructions (e.g., shift)
Question
Q. Answer the contents of card table after executing the following code.
p = &o;
p = &o;
p = &o;
q = &o;
Costs
-
Write-barrier
- 3 instructions or
-
2 instructions by optimization
- Mark the object's address instead of the pointer address.
- Collection-time costs are increased.
-
Collection-time
-
Proportional to the number and size of cards marked
- + No duplicate entry
-
Proportional to the number and size of cards marked
-
Memory overhead: Small
- e.g., 128-byte card → + 1/128 overhead
Problem: Objects across multiple cards
Small card size or/and Big objects cause cards that cannot be scanned from the beginning.-
Crossing Map: A byte map which indicates the cards that cannot be scanned from the beginning
- Same size as the card table
-
At garbage collection time:
- Check whether the page is dirty from card table.
- Check whether the page is scannable from crossing map.
- If not, garbage collector skip back until it finds a scannable page.
-
Card size
- Larger card size: Increase page faults by skipping back
- Smaller card size: Increase the size of the card table
-
Large objects handling
- Large unscannable objects can be stored separately in a large object area
Remembered sets or cards?
Remembered Set vs. Card Marking
Remembered Set + SSB | Card marking | |
---|---|---|
Write-barrier cost | 2 instructions (SSB may overflow) |
2 instructions |
Granularity of recording | Pointer (Duplicate entries in SSB) |
Card |
Factors of collection-time cost | The number of stores performed between scavenges | The number of dirty pages (Dirty pages must be scanned at every scavenge) |
Hybrid approach
-
Write-barrier: Card marking
- + No overflow
- + No duplicate entry
-
Collection time: Construct remembered set from card table
-
+ Unchanged dirty pages are not scanned each time.
- ∵ Inter-generational pointers are stored in remembered set, and Card table are cleared.
-
+ Unchanged dirty pages are not scanned each time.
-
Result: A significant improvement over:
- Pure remembered sets
- Pure card marking with optimal card size
- Virtual memory techniques
7.6 Non-copying generational garbage collection
Zorn's Generational Mark-sweep Collector
-
Memory Layout
- Four generations
-
Each generation contains:
- A mark bitmap
-
A fixed-size-object region
- Contains objects of a single size
-
A variable-sized-object region
- Contains objects that don't fit in the fixed-size-object area
-
Collection Methods
-
A fixed-size-object region
- 3 times: Mark-and-deferred-sweep garbage collection
- 1 time: Copying with em masse promotion
-
A variable-sized-object region
- Two space copying collector
-
A fixed-size-object region
Results: Copying vs. Mark-Sweep generational garbage collection
Copying | Mark-sweep | |
---|---|---|
Total CPU overhead | Low | Middle |
Memory overhead | High (∵ Semi-space Copying) |
Low (- 30~40%) |
Cache miss rate | Middle (Compaction may give no advantage) |
Low (∵ Mark bitmap) |
Promotion rate | Low | High (∵ em masse promotion) |
The oldest generation handling
The oldest generations have to be treated differently in a copying collector:- Younger generation: Copy from younger to the next generation.
- The oldest generation: Two-space copying
-
Mark-sweep
- + No semi-space required
-
− Memory fragmentation
- → High cost of moving tenured objects to the oldest generation
- If tenured objects are rare, the cost may be acceptable.
7.7 Scheduling garbage collections
The aims of generational garbage collection:- Fewer pause time
- Fewer collection time
-
Good locality
- ∵ Collect only small part
→ Scheduling garbage collection
Good GC Timing
Two possible strategies:-
Hide collections
- At midnight
- When the machine is idle
-
When the program is not busy
- During compute-bound periods
-
When the user is presented to interact but does not do so
- e.g., Emacs text editor
-
Efficient collections
- The ends of compute-bound periods
- The ends of user interaction
-
At the local minima of stack height
- Approximate approach: Trigger a collection when the stack height drops below a certain point
-
Wilson's approach[1991]:
- Rewrite return address to specific routine address
-
The routine determine whether to call the collector based on:
- The amount of memory currently available
- The height of the stack
Key objects
Idea
Treat key objects as a representative of data structure:- Root node of tree
- Head of list
Handling Key Objects
At promotion:- Key objects: Retained in generational scheme
-
Other objects: Out of generational scheme
- Placed in Keyed area
- The collector does not scan them.
Problems
-
How key objects are to be identified
- Hints from programmer
- Direct references from stack
-
Dangling pointers by collecting clusters by mistake
- → Mature object spaces
Mature object spaces
Mature object space
Mature object space: Keyed area + Remembered set
-
Memory Layout
-
Multiple areas:
- Store one cluster
-
A remembered set per area
- Record references from outside the area
-
Multiple areas:
-
Collection
-
Timing: When the remembered set is empty
- When there are no references to objects in the area from outside
- Target: All objects in the area
-
Timing: When the remembered set is empty
Problem: Too large clusters to fit in an area
Solution: A dynamic sized approach using trains and carriages
Train + Carriage approach
Change fixed size areas to dynamic sized trains:-
Train: A linked list of carriages
- Contains a single cluster
-
Carriage: A fixed size area
- Space to store a cluster
- Remembered set to record references from outside the carriage
-
Collection of carriage
- Timing: At each collection
- Target: A single carriage
- Process: Four-step process (See below)
Four-step process
Choose a single From-carriage for collection.
Move any object that is referenced from outside into a fresh train (A
).
Scan the moved objects in the usual copying collector way (B
).
Move From-carriage objects referenced from other trains to those trains (P
).
Move From-carriage objects referenced from other carriages to the last carriage in this train (X
).
Scavenge From-carriage.
If there are no external references to the train, scavenge it.
Good Features
- Incremental collection: The number of bytes moved at each collection is bounded
-
Clustering and Compaction:
- Nodes of the same data structure are copied into the same train
- Less dependence: No support of hardware or virtual memory mechanism
7.8 Issues to consider
Generational garbage collection has been successful because of:- Short pause time for minor collection
-
Good locality
- ∵ By concentrating allocation and collection effort on the youngest gen.
-
Small overall cost of garbage collection
- ∵ By delaying collection of long-lived objects
However, generational garbage collection is not a universal panacea.
Bad cases
-
Large root sets
- Bad effect: Long pause time
-
Causes
- Very large programs
- Very large number of global variables pointing into the heap
- Very deep stacks
-
Solutions of deep stacks:
-
The write-barrier to local variables
- + The stack of root sets are reduced to pointers pointing into the heap
- − Increase the cost of the barrier
-
'High water mark' of stacks (in en masse promotion)
- + The stack of root sets are reduced to an area above the 'mark'
- − Need to replace some return address with the address of a 'mark-shifting' procedure
-
The write-barrier to local variables
-
Long object lifetimes
-
Bad effects
-
Increase promotion rates
- More frequent major collections
-
Worse spatial locality
- Poorer paging behavior
- Increase cache misses
-
Increase promotion rates
-
Bad effects
-
Frequent pointer writes into older generations
-
Bad effects
- Increase the cost of the write-barrier
-
Causes
-
Large arrays of heap
-
Long lifetimes
- Promoted to older generations
-
All slot updates must be trapped by the write-barrier
- Considered as root sets of the next minor collection
-
Long lifetimes
-
Large arrays of heap
-
Bad effects
Summary
Contents of this slide:-
Handling of inter-generational pointers
- Remembered set: Record inter-generational pointers
- Card marking: Mark cards where pointer writes were made
-
Generational mark-sweep GC
- Mark-seep may be better in some cases
- The oldest generation may be handled by mark-sweep
-
Scheduling GC
- Hide collection
- Efficient collection
-
Special handling of tree or list
-
Key objects: Root node of tree, Head of list
- Handled in generational scheme
-
Other objects
- Handled out of generational scheme
- Remembered set: To scavenge safely
- Train + Carriage: To handle large clusters
-
Key objects: Root node of tree, Head of list
Generational techniques have been very successful and are now in widespread in use.