Ghostscript/GhostPDL Memory Management Overview (Draft)


Ghostscript/GhostPDL memory management is based around what we term "allocators". Those are built around the gs_memory_t object.

There are different types of allocator (covered below) and there can be multiple instances of a given allocator type in existence in a Ghostscript/GhostPDL context.

This can be one of the more confusing aspects of Ghostscript's memory managment. From a Postscript point of view, however, some of those instances make some
sense - as covered below.

"Levels" of Allocator

Heap Allocator
Starting at the "lowest" level, we have the "heap allocator" (base/gsmalloc.c). The heap allocator is slightly special in that there should only be one instance
of it in existence for a given Ghostscript/GhostPDL context.

The heap allocator, as the name suggests, is the interface between Ghostscript/GhostPDL and the "system" memory management (usually the libc malloc/free API).
It is a relatively simple allocator which does no caching - it allocates blocks when asked for them, and frees them they are returned to it.

The heap allocator does track all the memory it allocates, thus guaranteeing that when it is shut down, all its memory can be freed.

The heap allocator can, optionally, be "wrapped" in a "retrying" memory (a build time option). That requires the calling code to supply callbacks that, in the
event that a memory request cannot be satisfied, the callback is called, allowing the calling code the option of releasing memory, and having the allocator retry
the allocation. That code is currently unused, and (likely) not tested for some time.


Clump Allocator
The clump allocator (base/gsalloc.c) is the allocator which does most of the "real" work in Ghostscript/GhostPDL. It is also the allocator to which the garbage
collector can attached when the Postscript interpreter is in use.

A little history: this allocator used to management memory in units called "chunks". Much more recently, a new allocator (covered later) was added
and named the "chunk" alocator (which used a different "chunk" for its units of memory). This caused considerable confusion, so after much
discussion, it was decided to change the name of the units used in gsalloc.c to "clump" and leave the newer allocator with the name "chunk".

As the name suggests, the clump allocator manages memory in clumps which generally default to 4k in size (the default is a run-time option). Clumps
can, however, be larger or smaller depending on the call-time requirement.

The clump allocator works by requesting a clump from the lower level allocator (normally, but not always the heap allocator - see the section on the chunk
allocator), and portioning out blocks of memory from the clump until it is full, then requests another and so on. When a clump is empty, it is released back to
the lower level allocator.

There are some subtleties about how the clump allocator works with the Postscript interpreter, in particular to do with save/restore operations, which will be
covered later.


Chunk Allocator
Lastly, there is the "chunk allocator" (gsmchunk.c) is a "wrapper", caching allocator. This was originally written to wrap around the heap allocator
because we had a customer on an embedded platform whose stdlib malloc/free were *extremely* slow, but we use it in other places now, too - like in the
fapi code, and the jpeg stream code. In the FAPI (Font API) code, it's done for performance. In the jpeg stream code, it protects us from possible memory
leaks in libjpeg.

The chunk allocator basically hangs onto all the memory it has requested from its "target" allocator, until specifically told to free it - usually, by being shut
down.

The chunk allocator should *never* be used to wrap a garbage collected allocator, except in *very* special circumstances which, so far, have never arisen. Basically,
the only way it can work safely is if the chunk allocator never lived past a return to the Postscript interpreter, because once control passes out to the Postscript
interpreter, a garbage collection run can happen, and that would result in memory in use by the chunk allocator potentially being freed, or moved around. That could be
changed, if the need ever arose, but it hasn't yet.

Multiple Allocator Instances
As is possibly implied by the description, the chunk allocator can have several instances in existence at a given time.

It is, perhaps, less immediately obvious that multiple clump allocators can exist, mainly for the use of the Postscript interpreter.

For Postscript, the following instances of the clump allocator are used:

1) Global (Postscript global VM, subject to garbage collection)
2) Global Stable (Postscript global VM *not* subject to save/restore, subject to garbage collection)
3) Local (Postscript local VM, subject to garbage collection)
4) Local Stable (Postscript local VM *not* subject to save/restore, subject to garbage collection)
5) System (system memory is Postscript memory, outside normal global/local, subject to garbage collection)
6) System Stable (basically, as above!)

There will also exist:
7) Thread Safe Memory (guaranteed to always be thread safe)
8) non_gc_memory (guaranteed non-garbage collected memory)
9) heap allocator

the final three may, or may not, point to the same underlying allocator.

All ultimately call the heap allocator to get memory from the system.

Each allocator can also be called to allocate "immovable" memory - a pointless distinction for non-gc allocators, but with gc allocators,
it prevents the garbage collector from relocating (covered later) the object (but is not very space efficient, so we try not to use it too much).

Important things to note from the above: Global and Local are subject to save/restore as required by the Red Book (so, Global only for the outermost
VM state), and garbage collection. Global Stable, Local Stable, System (and System Stable) are not subject to save/restore operators, but are
subject to garbage collection.

Objects allocated as "immovable" are still subject to the same save/restore and garbage collection rules as outline in the above paragraph.


Postscript save/restore
And this is where the first subtlety of the clump allocator appears. The basic design of the clump allocator is that, within a clump, memory is
allocated "bottom up": so we start at the lowest address in the clump, and work upwards (we use a hybrid first/best fit approach, with the
emphasis on first fit, for performance reasons). The exception to that rule is for Postscript strings, which are allocated "top down" in a
clump. The reason for that is streamlining garbage collection of strings.

This is only covering the memory management aspects of save/restore, there's stuff required to keep everything in sync, but this is already
expanding into quite a tome.

Now, as mentioned earlier, although we have a default clump size, that most clumps are created to that size, clumps can be (within reason) any
size. And the way we record the size, and track the free memory in a clump is by using a set of top/bottom and free top / free bottom pointers (rather
than a pointer and a size value).

The naive way to do save/restore from a memory management point of view would be to start a new clump on every save, and discard that and every subsequent
clump on a restore. But, there are enough files out that do multiple saves/restores with little or nothing between that allocating a whole 4k for each would be
terribly wasteful.

What we do is start a new clump (called in an "inner clump"), but without allocating any new memory. The top/bottom pointers for the new chunk are
the free top / free bottom pointers of the existing chunk, then we "hide" the current chunk, and make the new one current. When we do a restore, we
throw away all chunks newer than the "inner" clump, then throw away the inner clump, and un-hide the original clump on which the inner clump was based.
Basically, restoring the memory manager state to how it was when the save was done.

Obviously, that takes care of an awful lot of "garbage collection" of Postscript VM with fairly minimal effort.

Garbage Collector
The Ghostscript garbage collector (garbager) is fairly conventional mark & sweep garbager - of the type that Peter Deutsch rather pioneered.
Conceptually, it is very simple: it first traces through all the memory it knows about (starting from the "roots" that have been registered with
it), and marking every object that is accessible from from those roots. Once that is done, it goes back "sweeping" up all the non-marked objects,
and making them into free memory.

Where the GS garbager goes a bit further is that it implements "compacting" memory, too. In other words, when there are gaps between marked objects,
it will shift regular objects down, and string objects up to fill in those gaps, creating as much contiguous "in use memory", and as much contiguous free
memory as possible - this has a few benefits: other than the obvious (larger blocks of free memory making allocation more likely to succeed and quicker), it
makes subsequent mark operations in the garbager faster for that block of objects.

The complication of that is that, when you move an object, you have to go find all the places that reference it, and tell them it's moved. We only move objects within
a clump, we never move objects to another clump. There are three slightly different ways objects get marked, and this is where the fact
that Postscript strings are allocated top down in the clump becomes important. Postscript strings are allocated top down, and ideally contiguously in the clump - i.e. there is no
memory management metadata overhead for strings. So we don't need to mark every single string individually, but we can mark the range(s) of memory covered by one or more
strings - ideally, just one range, but there can be holes, necessitating multiple ranges. The main reason we can get away with this is because, although Postscript
strings are composite objects, they can only contain atomic objects - unlike dictionaries and arrays, which can contain other dictionaries and/or arrays, and
strings, and numbers, etc, etc. Strings can only contain 8 bit numbers, so we know we're not going to have to trace "deeper" than a string object.

Postscript ref objects can also be allocated in contiguous blocks, without each ref having its own memory manager overhead, so Postscript ref objects have their
own metadata, with size and attributes in the refs. The garbager mark is part of the ref attributes.

For objects other than strings and refs, included in the memory manager metadata is an entry for object marking and relocing. The garbager will use that.

Once we get past the "conceptually simple" bit, things get more complicated! The garbager isn't magic (although it has been described as black magic!), it can't magically know
about the internal structure of composite objects (structures). So, the way it works is that, when you create an object type, you also create a "struct descriptor", with
two, possibly three "methods" to be used by the garbager: enum_ptrs, reloc_ptrs and clear_marks. These are also stored in the memory management metadata as function
pointers. Objects defined with a struct descriptor should be allocated with the gs_alloc_struc(_immovable) call, which will correctly setup the memory management
metadata using that struct descriptor.


The emum_ptrs method is used, as the name suggests, to enumerate the pointers to other garbage collected objects in the object. Basically, when the garbager reaches an object,
it calls the object's enum_ptrs method (from the memory manager header) multiple times, with an incrementing index, and the enum_ptrs method returns the pointer for that index.
Generally, it's conventional to work "downwards": i.e. index 0 is the pointer closes to the top of the struct definition, and so on - but that's not an inherent requirement
(we'd appreciate sticking to it, however!). Once the pointer for that index has been returned to the garbager, it marks the target object as "in use", and enumerates any
pointers it contains.

The garbager is not recursive, as there is too much chance of object trees being deep enough to cause C stack overflow. Instead, it maintains a "mark stack", which is th
stack of objects traced through to get to the current level. There is an initial mark stack declared on the C stack, which is (I think) 100 levels deep, and when that overflows,
we'll allocate larger a mark stack from the heap. If we cannot allocate more from the heap, we make repeated calls using the 100 levels deep stack - that can be very slow, but will
succeed eventually.


Then we have the reloc_ptrs method. Unlike enum_ptrs, reloc_ptrs is only called once per object per garbager run, and doesn't take an index parameter. For each pointer in the structure,
it calls (generally via a macro) a garbager routine which takes the old pointer address, and returns the new one. The details of how the relocing is calculated is complicated and not
currently in my brain-cache, so if you get to needing to understand the details, I'll need to work through it in a debugger, and remind myself.


Finally, clear_marks is used very rarely, and it's not entirely clear to me under what circumstances it's required. If I ever have to know, I'll work it out. I think it is
for cases where non-Postscript ref objects contain pointers to Postscript refs.


Although not restricted to the garbage collector, struct descriptors can also, optionally include a "finalize" method, which the memory manager will call when instances of that
object type are freed (either explicitly, or with the garbager). These allow any extra memory cleanup required by that object to be carried out, before the object itself is destroyed.

There are a series of macros defined to make defining struct descriptors and the garbager methods easier - but in truth, they tend to make the code more opaque, and harder to
write, and even harder to understand.

An important point to note about the garbage collector: an clump allocator with the garbager attached can effectly request that the garbage collector be run. But it is up to the Postscript interpreter to actually initiate garbage collection.


-- Chris Liddell - 2017-03-31

Comments

TODO: example showing structure definition, structure descriptor and gc methods definitions.

Edit | Attach | Watch | Print version | History: r2 < r1 | Backlinks | Raw View | Raw edit | More topic actions...
Topic revision: r1 - 2017-03-31 - ChrisLiddell
 
  • Edit
  • Attach
This site is powered by the TWiki collaboration platform Powered by PerlCopyright 2014 Artifex Software Inc