A Simple Garbage Collector For C++
Throughout the history of computing, there has been an ongoing debate concerning
the best way to manage the use of dynamically allocated memory. Dynamically
allocated memory is memory that is obtained during runtime from the heap, which
is a region of free memory that is available for program use. The heap is also
commonly referred to as free store or dynamic memory. Dynamic allocation is
important because it enables a program to obtain, use, release, and then reuse
memory during execution. Because nearly all real-world programs use dynamic
allocation in some form, the way it is managed has a profound effect on the
architecture and performance of programs.
In general, there are two ways that dynamic memory is handled. The first is the
manual approach, in which the programmer must explicitly release unused memory
in order to make it available for reuse. The second relies on an automated
approach, commonly referred to as garbage collection, in which memory is
automatically recycled when it is no longer needed. There are advantages and
disadvantages to both approaches, and the favored strategy has shifted between the
two over time.
C++ uses the manual approach to managing dynamic memory. Garbage collection
is the mechanism employed by Java and C#. Given that Java and C# are newer
languages, the current trend in computer language design seems to be toward
garbage collection. This does not mean, however, that the C++ programmer is left
on the “wrong side of history.” Because of the power built into C++, it is possible
—even easy—to create a garbage collector for C++. Thus, the C++ programmer
can have the best of both worlds: manual control of dynamic allocation when
needed and automatic garbage collection when desired.
This chapter develops a complete garbage collection subsystem for C++. At the
outset, it is important to understand that the garbage collector does not replace
C++’s built-in approach to dynamic allocation. Rather, it supplements it. Thus, both
the manual and garbage collection systems can be used within the same program.
Aside from being a useful (and fascinating) piece of code in itself, a garbage
collector was chosen for the first example in this book because it clearly shows the
unsurpassed power of C++. Through the use of template classes, operator
overloading, and C++’s inherent ability to handle the low-level elements upon
which the computer operates, such as memory addresses, it is possible to
transparently add a core feature to C++. For most other languages, changing the
way that dynamic allocation is handled would require a change to the compiler
itself. However, because of the unparalleled power that C++ gives the programmer,
this task can be accomplished at the source code level.
The garbage collector also shows how a new type can be defined and fully
integrated into the C++ programming environment. Such type extensibility is a key
component of C++, and it’s one that is often overlooked. Finally, the garbage
collector testifies to C++’s ability to “get close to the machine” because it
manipulates and manages pointers. Unlike some other languages that prevent
access to the low-level details, C++ lets the programmer get as close to the
hardware as necessary.
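With the manual approach, dynamic allocation is a two-step operation: memory is
allocated with new and later released with delete. For example:
some_object *p = new some_object;  // step 1: allocate from the heap
// ... use the object through p ...
delete p;                          // step 2: release it explicitly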
In general, each use of new must be balanced by a matching delete. If delete is not
used, the memory is not released, even if that memory is no longer needed by your
program.
Garbage collection differs from the manual approach in one key way: it automates
the release of unused memory. Therefore, with garbage collection, dynamic
allocation is a one-step operation. For example, in Java and C#, memory is
allocated for use by new, but it is never explicitly freed by your program. Instead,
the garbage collector runs periodically, looking for pieces of memory to which no
other object points. When no other object points to a piece of dynamic memory, it
means that there is no program element using that memory. When it finds a piece
of unused memory, it frees it. Thus, in a garbage collection system, there is no
delete operator and no need for one.
At first glance, the inherent simplicity of garbage collection makes it seem like the
obvious choice for managing dynamic memory. In fact, one might question why
the manual method is used at all, especially by a language as sophisticated as C++.
However, in the case of dynamic allocation, first impressions prove deceptive
because both approaches involve a set of trade-offs. Which approach is most
appropriate is decided by the application. The following sections describe some of
the issues involved.
Other problems that can occur with C++’s manual approach include the premature
releasing of memory that is still in use, and the accidental freeing of the same
memory twice. Both of these errors can lead to serious trouble. Unfortunately, they
may not show any immediate symptoms, making them hard to find.
There are several different ways to implement garbage collection, each offering
different performance characteristics. However, all garbage collection systems
share a set of common attributes that can be compared against the manual
approach. The main advantages of garbage collection are simplicity and safety. In a
garbage collection environment, you explicitly allocate memory via new, but you
never explicitly free it. Instead, unused memory is automatically recycled. Thus, it
is not possible to forget to release an object or to release an object prematurely.
This simplifies programming and prevents an entire class of problems.
Furthermore, it is not possible to accidentally free dynamically allocated memory
twice. Thus, garbage collection provides an easy-to-use, reliable solution that
removes these common sources of memory management errors.
Unfortunately, the simplicity and safety of garbage collection come at a price. The
first cost is the overhead incurred by the garbage collection mechanism. All
garbage collection schemes consume some CPU cycles because the reclamation of
unused memory is not a cost-free process. This overhead does not occur with the
manual approach.
A second cost is loss of control over when an object is destroyed. Unlike the
manual approach, in which an object is destroyed (and its destructor called) at a
known point in time—when a delete statement is executed on that object—garbage
collection does not have such a hard and fast rule. Instead, when garbage collection
is used, an object is not destroyed until the collector runs and recycles the object,
which may not occur until some arbitrary time in the future. For example, the
collector might not run until the amount of free memory drops below a certain
point. Furthermore, it is not always possible to know the order in which objects
will be destroyed by the garbage collector. In some cases, the inability to know
precisely when an object is destroyed can cause trouble because it also means that
your program can’t know precisely when the destructor for a dynamically allocated
object is called.
For garbage collection systems that run as a background task, this loss of control
can escalate into a potentially more serious problem for some types of applications
because it introduces what is essentially nondeterministic behavior into a program.
A garbage collector that executes in the background reclaims unused memory at
times that are, for all practical purposes, unknowable. For example, the collector
will usually run only when free CPU time is available. Because this might vary
from one program run to the next, from one computer to the next, or from one
operating system to the next, the precise point in program execution at which the
garbage collector executes is effectively nondeterministic. This is not a problem
for many programs, but it can cause havoc with real-time applications in which the
unexpected allocation of CPU cycles to the garbage collector could cause an event
to be missed.
Although opposites, the two approaches are not mutually exclusive. They can
coexist. Thus, it is possible for the C++ programmer to have access to both
approaches, choosing the proper method for the task at hand. All one needs to do is
create a garbage collector for C++, and this is the subject of the rest of this chapter.
Because C++ is a rich and powerful language, there are many different ways to
implement a garbage collector. One obvious, but limited, approach is to create a
garbage collector base class, which is then inherited by classes that want to use
garbage collection. This would enable you to implement garbage collection on a
class-by-class basis. This solution is, unfortunately, too narrow to be satisfying.
A better solution is one in which the garbage collector can be used with any type of
dynamically allocated object. To provide such a solution, the garbage collector
must: