DCSIMG
PDC 2009 Day 2: Future of Garbage Collection - All Your Base Are Belong To Us

All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details

PDC 2009 Day 2: Future of Garbage Collection

Patrick Dussud described the variants of garbage collection that exist in the world today – specifically, reference counting and tracing GC.

GCs are measured by the speed of the allocator, the GC time overhead, the pause time (latency), the working set, and multi-core scaling.

Patrick described the general architecture of the GC, and said that he wants to focus on GC policies and mechanisms and not on the actual implementation of root scanning, thread suspension, virtual memory management and similar traits.

Evolution of GC

GC v1 – workstation and server GC, concurrent GC, scalable GC for servers. Assumption is that on workstation GC there are few allocations but latency is critical; on servers there’s uniform concurrent workload and latency is tolerable.

GC v2 – no significant changes to policies or mechanisms, only implementation.

GC v3 – low latency mode API, GC notifications API for full GCs.

GC v4 – on workstation: background GC allows gen0 and gen1 GCs while in the middle of a concurrent gen2 GC (two collections occurring at the same time). The assumption that allocation activity is moderate. The effect is reduced pause time in full GC for all workloads.

Future Trends

There are more cores, more memory (especially considering 64-bit operating systems becoming more prevalent, so address spaces grow), and virtualization.

As far as the speed of the allocation and GC time are concerned, there are no worries with these trends. Pause time for large heaps, however, is very bad; and so is the working set for a given workload.

Background GC needs more feedback; the same approach can be brought to the server GC flavor.

Real-time (fully concurrent, no-stop) GC is not likely because of the effect of read barriers required to implement it. The performance hit is immense. [See my post on write barriers, which was my first post on this blog!]

The problem with working set is that the LOH isn’t compacting (for reasons of refraining from doing large copies) and it seems that some workloads would benefit from LOH compaction.

Background GC isn’t compacting either, and fragmentation occurs. In background GC, if there is a low-memory condition then a full blocking GC occurs. In the future it seems that partial compaction for background GC is likely – pick a badly fragmented area and compact it while doing an ephemeral GC.

Hardware assistance is another option – it was done in the 80s but was too slow (TI Explorer, Symbolic Ivory chips), and now that transistors are cheaper it seems that implementing read barriers (mechanisms to intercept memory dereferences) in hardware would give the most value to implement fully concurrent GC.

Operating system assistance is another direction, and the CLR GC team has cooperated with the Windows team in the past. There is good research on avoiding paged out object scanning, which is a major problem if a full GC occurs and there’s paging. This can be somewhat improved by having more integration with the Windows memory manager so that paging is delayed for GC segments, and enabling better virtualization support, e.g. idle VMs can use less working set. [See my Non Paged CLR Host open source project for more information.]

Some other ideas – because tracing GC is worthwhile when survival is low, but in gen2 it’s usually high. Therefore, it would be possible to maintain a (inaccurate) reference count on gen2 objects, allowing to reclaim some of them without performing tracing GC (there’s research literature on the subject).

[Shameless plug: If some of the discussion of GC was incomprehensible to you, the .NET Performance course at Sela (that I teach) will give you a comprehensive overview of the .NET tracing GC including all latest features incorporated in CLR 4.0, and a vast amount of additional topics.]

Comments

No Comments

Leave a Comment

(required) 

(required) 

(optional)

(required) 


Enter the numbers above: