Clozure CL Documentation
Introduction to Clozure CL
Clozure CL is a fast, mature, open source Common Lisp implementation that runs on
Linux, Mac OS X, FreeBSD, and Windows. Clozure CL was forked from Macintosh
Common Lisp (MCL) in 1998, and its development has been entirely separate ever since.
At the time of the fork, the new Lisp was named OpenMCL. Subsequently,
Clozure renamed its Lisp to Clozure CL, partly because its ancestor MCL had been released
as open source. Clozure thought it might be confusing for users if there were two
independent open-source projects with such similar names. The new name also reflects
Clozure CL's current status as the flagship product of Clozure Associates.
Furthermore, the new name refers to Clozure CL's ancestry: in its early years, MCL was
known as Coral Common Lisp, or “CCL”. For years the package that contains most of
Clozure CL's implementation-specific symbols has been named “CCL”, an acronym that
once stood for the name of the Lisp product. It seems fitting that “CCL” once again stands
for the name of the product.
Some commands and source files may still refer to “OpenMCL” instead of Clozure CL.
Clozure CL compiles to native code and supports multithreading using native OS threads.
It includes a foreign-function interface, and supports both Lisp code that calls external
code, and external code that calls Lisp code. Clozure CL can create standalone executables
on all supported platforms.
On Mac OS X, Clozure CL supports building GUI applications that use OS X's native Cocoa
frameworks, and the OS X distributions include an IDE written with Cocoa, and distributed
with complete sources.
Notable features of Clozure CL include:
- Fast execution speed, competitive with other Common Lisp implementations on most benchmarks.
- Full native OS threads on all platforms. Threads are automatically distributed across multiple cores. The API includes support for shared memory, locking, and blocking for OS operations such as I/O.
- An IDE on Mac OS X, fully integrated with the Macintosh window system and User Interface standards.
- Excellent debugging facilities. The names of all local variables are available in a backtrace.
- Many extensions, including: files mapped to Common Lisp vectors for fast file I/O; thread-local hash tables and streams to eliminate locking overhead; cons hashing support; and much more.
Although it's an open-source project, available free of charge under a liberal license,
Clozure CL is also a fully-supported product of Clozure Associates. Clozure continues to
extend, improve, and develop Clozure CL in response to customer and user needs, and
offers full support and development services for Clozure CL.
Installing
Running Clozure CL
The Init File
Command Line Options
Running Clozure CL as a Mac Application
Installing
After following the download instructions, you should have a directory on your system
named ccl. This directory is called the ccl directory.
Clozure CL is made up of two parts: the lisp kernel, and a heap image. When the lisp kernel
starts up, it locates the heap image, maps it into memory, and starts running the lisp code
contained in the image. In the ccl directory, you will find pre-built lisp kernel executables
and heap images for your platform.
The names used for the lisp kernel on the various platforms are listed in the table below.
The heap images have the same basename as the corresponding lisp kernel, but a .image
suffix. Thus, the image name for armcl would be armcl.image.
Platform             Kernel
Linux x86, x86-64    lx86cl, lx86cl64
By default, the lisp kernel will look for a heap image with an appropriate name in the same
directory that the lisp kernel itself is in. Thus, it is possible to start Clozure CL simply by
running ./lx86cl64 (or whatever the appropriate binary is called) directly from the ccl
directory.
If the lisp kernel binary does not work, you may need to recompile it on your local system.
See Building the Kernel.
Running Clozure CL
If you always run Clozure CL from Emacs, it is sufficient to use the full pathname of the lisp
kernel binary directly. That is, in your Emacs init file, you could write something like
(setq inferior-lisp-program "/path/to/ccl/lx86cl64") or make the
equivalent changes to slime-lisp-implementations.
It can also be handy to run Clozure CL straight from a terminal prompt. In the scripts/
directory of the ccl directory, there are two files named ccl and ccl64. Copy these files
into /usr/local/bin or some other directory that is on your path, and then edit them so
that the value of CCL_DEFAULT_DIRECTORY is your ccl directory. You can then start up
the lisp by typing ccl or ccl64.
You may wish to install scripts/ccl64 with the name ccl if you use the 64-bit lisp
more often. If you want the 32-bit lisp to be available as well, you can install scripts/ccl as
ccl32. Note that there is nothing magical about these scripts. You should feel free to edit
them as desired.
The Init File
By default, Clozure CL will look for a file named ccl-init.lisp in your home directory,
and load it upon startup. On Unix systems, it will also look for .ccl-init.lisp.
If you wish, you can compile your init file, and Clozure CL will load the compiled version if
it is newer than the corresponding source file. In other words, Clozure CL loads your init
file with (load "home:ccl-init").
Because the init file is loaded the same way as normal Lisp code is, you can put anything
you want in it. For example, you can change the working directory, and load code that you
use frequently.
To suppress the loading of this init-file, invoke Clozure CL with the --no-init (or -n)
option.
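As a hypothetical illustration, an init file that changes the default directory and loads a frequently used file might look like the sketch below. The directory src/my-project/ and the file my-utils.lisp are placeholders. The sketch uses only standard Common Lisp, so it changes *default-pathname-defaults* (the directory against which relative pathnames are merged) rather than the process's OS working directory.

```lisp
;;; ~/ccl-init.lisp: a hypothetical example init file.
;;; The directory and file names below are placeholders.

;; Make relative pathnames resolve against a project directory.
;; This sets the Lisp-level default, not the OS working directory.
(setf *default-pathname-defaults*
      (merge-pathnames "src/my-project/" (user-homedir-pathname)))

;; Load a file of frequently used utilities, but only if it exists.
(let ((utils (merge-pathnames "my-utils.lisp")))
  (when (probe-file utils)
    (load utils)))
```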
Command Line Options
When using Clozure CL from the command line, the following options may be used to
modify its behavior. The exact set of Clozure CL command-line arguments may vary per
platform and may change over time. The definitive list of command line options may be
retrieved by using the --help option.
-h, --help
Provides a definitive (if somewhat terse) summary of the command line options
accepted by the Clozure CL implementation and then exits.
-V, --version
Prints the version of Clozure CL then exits. The version string is the same value that
is returned by lisp-implementation-version.
-n, --no-init
If this option is given, the init file is not loaded. This is useful if Clozure CL is being
invoked by a shell script that should not be affected by whatever customizations a
user might have in place.
-l, --load filename
Loads the file specified by filename.
-e, --eval form
An expression is read (via read-from-string) from the string form and evaluated.
If form contains shell metacharacters, it may be necessary to escape or quote them to
prevent the shell from interpreting them.
-T, --set-lisp-heap-gc-threshold n
Sets the lisp heap's GC threshold (the value returned by lisp-heap-gc-threshold) to n.
-Q, --quiet
Suppresses printing of heralds and prompts when the --batch command line option
is specified.
-R, --heap-reserve n
Reserves n bytes for heap expansion. The default depends on the particular platform
in use (see Heap space allocation).
-S, --stack-size n
Sets the size of the initial control stack to n (see Thread Stack Sizes).
-Z, --thread-stack-size n
Sets the size of the first thread's stack to n (see Thread Stack Sizes).
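-b, --batch
Execute in batch mode. End-of-file from *standard-input* causes Clozure CL to exit, as do
attempts to enter a break loop.
--no-sigtrap
An obscure option for running under GDB.
-I, --image-name image-name
Specifies the image name for the kernel to load. Defaults to the kernel name with the
suffix .image appended.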
The --load and --eval options can each be provided multiple times. They're executed in
the order specified on the command line, after the init file (if there is one) is loaded and
before the toplevel read-eval-print loop is entered.
Finally, any arguments following the pseudo-argument -- are not processed, and are made
available to Lisp as the value of *unprocessed-command-line-arguments*.
Running Clozure CL as a Mac Application
If you want to run Clozure CL as a double-clickable Macintosh application, you can do that.
A version of Clozure CL is available from the Mac App Store if you would like to obtain it
from there. Alternatively, you can build the IDE yourself: please see Building the IDE.
Currently, it's not possible to use the Mac App Store version of Clozure CL as a
command-line program.
Building Definitions
Reasons for Building
Kernel Build Prerequisites
Building Everything
Building the Kernel
Building the Heap Image
Compile Lisp Source Code
Create a Bootstrapping Image
Building a full image from a bootstrapping image
Building Definitions
The following terms are used in subsequent sections; it may be helpful to refer to these
definitions.
A fasl file is the file produced by compile-file. These files store the machine code
associated with function definitions and the external representation of other lisp objects in
a compact, machine-readable form. Short for “FASt Loading”. Clozure CL uses different
pathname types (extensions) to name fasl files on different platforms; see Platform-specific
filename conventions for the details.
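A quick way to see which fasl pathname type the running Lisp uses is to ask compile-file-pathname; on 64-bit x86 Linux, for example, Clozure CL uses lx64fsl. The sketch below uses only standard Common Lisp. It writes a small source file into a directory you supply, compiles it, and loads the resulting fasl. The file name fasl-demo.lisp and the functions fasl-type, compile-and-load-demo, and fasl-demo-answer are hypothetical.

```lisp
(defun fasl-type ()
  "Return the pathname type (extension) that COMPILE-FILE uses for fasl files."
  (pathname-type (compile-file-pathname "example.lisp")))

(defun compile-and-load-demo (directory)
  "Write a tiny source file into DIRECTORY, compile it to a fasl, load the fasl,
and return the fasl's pathname and the value of the function it defines."
  (let ((source (merge-pathnames "fasl-demo.lisp" directory)))
    (with-open-file (out source :direction :output :if-exists :supersede)
      (write-line "(defun fasl-demo-answer () (* 6 7))" out))
    ;; COMPILE-FILE returns the truename of the fasl it wrote, or NIL on failure.
    (let ((fasl (compile-file source)))
      (unless fasl
        (error "Compiling ~A failed." source))
      (load fasl)
      (values fasl (funcall 'fasl-demo-answer)))))
```

For example, (compile-and-load-demo #p"/tmp/") leaves fasl-demo.lisp and its fasl side by side in /tmp/.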
The lisp kernel is a C program with a fair amount of platform-specific assembly language
code. The lisp kernel provides runtime support for lisp code, such as garbage collection,
memory allocation, exception handling, and so forth. When the lisp kernel starts, it maps
the heap image into memory and transfers control to compiled lisp code that the image
contains.
A heap image is a file that can be quickly mapped into a process's address space.
Conceptually, it's not too different from an executable file or shared library in the OS's
native format (ELF or Mach-O/dyld format); for historical reasons, Clozure CL's own heap
images are in their own (fairly simple) format. A full heap image contains all of the code
and data that comprise Clozure CL. The default image file name is the name of the lisp
kernel with a .image suffix. See Platform-specific filename conventions.
A bootstrapping image (see Create a Bootstrapping Image) is a minimal heap image used in
the process of building Clozure CL itself. The bootstrapping image contains just enough
code to load the rest of Clozure CL from fasl files. The function rebuild-ccl
automatically creates a bootstrapping image as part of its work.
Each supported platform (and possibly a few as-yet-unsupported ones) has a uniquely
named subdirectory of ccl/lisp-kernel/; each such kernel build directory contains a
Makefile and may contain some auxiliary files (linker scripts, etc.) that are used to build
the lisp kernel on a particular platform. The platform-specific name of the kernel build
directory is described in Platform-specific filename conventions.
Reasons for Building
At a given time, there are generally two versions of Clozure CL that you might want to use
(and therefore might want to build from source).
The first of these is the current release branch. Fixes for serious bugs are sometimes
checked into the release branch, and you might want to update from Subversion and
rebuild in order to pick up those fixes.
You may also be interested in building the development version of Clozure CL, which is
often called the trunk. The trunk may contain both interesting new features and interesting
new bugs. See http://trac.clozure.com/ccl/wiki for information about how to check out a
copy of the trunk.
Kernel Build Prerequisites
In order to build the lisp kernel, you must have installed a C compiler and its associated
tools (such as the assembler, as, and the linker, ld). Additionally, the lisp kernel build
process uses m4; this program is often not installed by default, so make sure you have it.
The lisp kernel makefiles generally assume that you are using GNU make.
Building Everything
Given that you now have everything you need, start Clozure CL without your init file (the
-n option) and call rebuild-ccl to bring your Lisp system completely up to date:
$ ccl -n
? (ccl:rebuild-ccl :full t)
That call to rebuild-ccl performs the following steps:
1. Deletes all fasl files and other object files in the ccl directory tree.
2. Does (compile-ccl t) in the running lisp, to generate fasl files from the lisp sources.
3. Does (xload-level-0 :force) in the running lisp. This compiles the lisp files in the ccl:level-0; directory and then creates a special bootstrapping image from the compiled fasl files.
4. Runs an external process that runs make in the current platform's kernel build directory to create a new kernel. This step can only work if the C compiler and related tools are installed; see Kernel Build Prerequisites.
5. Runs another external process, which causes the newly compiled lisp kernel to load the new bootstrapping image. The bootstrapping image then loads the rest of the fasl files, and a new copy of the platform's full heap image is then saved.
When all goes well, this all happens without user intervention and with some simple
progress messages. If anything goes wrong during execution of either of the external
processes, the process output is displayed as part of a lisp error message.
rebuild-ccl is essentially just a shortcut for running all the individual steps involved in
rebuilding the system. You can also execute these steps individually, as described below.
Building the Kernel
Rebuilding the lisp kernel is straightforward. Consult the table Platform-specific filename
conventions to determine the name of the lisp kernel build directory you want. Then,
change to that directory and run make. Suppose you wanted to build the lisp kernel for
64-bit FreeBSD/x86. We find that the name of the lisp kernel build directory is
freebsdx8664, so we do the following:
$ cd ccl/lisp-kernel/freebsdx8664
$ make clean
$ make
The most common cause of a failed build is that m4 is not installed. If you see
m4: command not found, then you should install m4.
On Mac OS X, make sure that you have installed the command-line developer tools with
xcode-select --install. If you get an error message saying that the file
sys/signal.h cannot be found, this is a sign that you need to do this. You need to do this
even if you have Xcode already installed.
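Building the Heap Image
Typically, rebuild-ccl is used to rebuild all of Clozure CL. In special cases, you might
want to exercise more control over how the heap image is built. These cases typically arise
only when developing Clozure CL itself.
Building a heap image by hand involves three steps:
1. Compile the lisp source code into fasl files (see Compile Lisp Source Code).
2. Create a bootstrapping image (see Create a Bootstrapping Image).
3. Start up the lisp kernel and tell it to load the bootstrapping image you just created.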
Compile Lisp Source Code
Calling:
? (ccl:compile-ccl)
at the lisp prompt recompiles any lisp source files whose fasl files are out-of-date with
respect to those sources; (ccl:compile-ccl t) forces recompilation of all files.
ccl:compile-ccl reloads newly-compiled versions of some files; ccl:xcompile-ccl
is analogous, but skips this reloading step.
Unless there are bootstrapping considerations involved, it usually doesn't matter whether
these files are reloaded after they're recompiled.
Calling compile-ccl or xcompile-ccl in an environment where fasl files don't yet exist
may produce warnings to that effect whenever files are required during compilation;
those warnings can be safely ignored. Depending on the maturity of the Clozure CL release,
calling compile-ccl or xcompile-ccl may also produce several warnings about
undefined functions, etc. They should be cleaned up at some point.
Create a Bootstrapping Image
The bootstrapping image isn't provided in Clozure CL distributions. It can be built from the
source code provided in distributions (using a lisp image and kernel provided in those
distributions) using the procedure described below.
The bootstrapping image is built by invoking a special utility inside a running Clozure CL
heap image to load files contained in the ccl/level-0 directory. The bootstrapping
image loads several dozen fasl files. After it's done so, it saves a heap image via
save-application. This process is called cross-dumping.
Given a source distribution, a lisp kernel, and a heap image, one can produce a
bootstrapping image by first invoking Clozure CL from the shell:
$ ccl
? (ccl:xload-level-0)
This function compiles the lisp sources in the ccl/level-0 directory as needed, and then
loads the resulting fasl files into a simulated lisp heap contained in data structures inside
the running lisp. It then writes this data to disk as a bootstrapping image and displays the
pathname of the newly-written image on the terminal.
xload-level-0 should be called whenever your existing boot image is out-of-date with
respect to the source files in ccl:level-0;.
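Building a full image from a bootstrapping image
To build a full image from a bootstrapping image, just invoke the kernel and tell it to load
the bootstrapping image (as reported by xload-level-0). For example, suppose you are
using 64-bit Mac OS X, where the kernel is named dx86cl64. From the ccl directory you
would run:
$ ./dx86cl64 --image-name <boot-image>
where <boot-image> is the bootstrapping image name that xload-level-0 reported.
Other platforms use analogous steps: use the appropriate platform-specific name for the
lisp kernel, and use the name of the boot image as reported by xload-level-0.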
This process will load a few dozen fasl files, printing a message as each file is loaded. If all
of these files successfully load, the lisp will print a prompt. You should be able to do
essentially everything in that environment that you can in the environment provided by a
full heap image. If everything went well, you can save that image using
save-application:
? (ccl:save-application "new.image")
The name new.image can be whatever you want. You may wish to use the default image
name for your platform; see Platform-specific filename conventions.
If things go wrong in the early stages of the loading sequence, errors are often difficult to
debug; until a fair amount of code (CLOS, the CL condition system, streams, the reader, the
read-eval-print loop) is loaded, it's generally not possible for the lisp to report an error.
Errors that occur during these early stages (“the cold load”) sometimes cause the lisp
kernel debugger to be invoked; it's primitive, but can sometimes help one to get oriented.
Introduction
Memory-mapped Files
Static Variables
Saving Applications
Concatenating FASL Files
Floating Point Numbers
Code Coverage
Overview
Limitations
Usage
Functions and Variables
Interpreting Code Coloring
Other Extensions
Introduction
Memory-mapped Files
In release 1.2 and later, Clozure CL supports memory-mapped files. On operating systems
that support memory-mapped files (including Mac OS X, Linux, and FreeBSD), the
operating system can arrange for a range of virtual memory addresses to refer to the
contents of an open file. As long as the file remains open, programs can read values from
the file by reading addresses in the mapped range.
Using memory-mapped files may in some cases be more efficient than reading the contents
of a file into a data structure in memory.
Without memory-mapped files, a common idiom for reading the contents of files might be
something like this:
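A representative version of that idiom, in standard Common Lisp, opens the file as a stream of octets, allocates a vector as long as the file, and fills it with read-sequence. The function name read-file-into-vector is hypothetical.

```lisp
(defun read-file-into-vector (pathname)
  "Return a fresh (unsigned-byte 8) vector containing every byte of PATHNAME."
  (with-open-file (stream pathname :direction :input
                                   :element-type '(unsigned-byte 8))
    ;; FILE-LENGTH is measured in stream elements, here octets.
    (let ((vector (make-array (file-length stream)
                              :element-type '(unsigned-byte 8))))
      ;; Copy the whole file into the newly allocated vector.
      (read-sequence vector stream)
      vector)))
```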
Using a memory-mapped file produces the same result: like the example above, it returns a
vector whose contents are the same as the contents of the file. The difference is that the
example above creates a new vector in memory and copies the file's contents into it, whereas
a memory-mapped file arranges for the vector's elements to refer directly to the file's
contents on disk, without first copying them into memory.
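map-file-to-ivector pathname element-type [Function]
The map-file-to-ivector function tries to open the file at pathname for reading. If it
can, the function maps the file's contents to a range of virtual addresses. If that mapping
succeeds, it returns a read-only vector whose element-type is given by element-type, and
whose contents are the contents of the memory-mapped file.
pathname
The pathname of the file to be memory-mapped.
element-type
The element-type of the vector to be created, specified as a type specifier that names a
subtype of either signed-byte or unsigned-byte.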
Because of alignment issues, the mapped file's contents start a few bytes (4 bytes on 32-bit
platforms, 8 bytes on 64-bit platforms) into the vector. The displaced array returned by
map-file-to-ivector hides this overhead, but it's usually more efficient to operate on
the underlying simple 1-dimensional array. Given a displaced array (like the value returned
by map-file-to-ivector), the function array-displacement returns the underlying
array and the displacement index in elements.
Currently, Clozure CL supports only read operations on memory-mapped files. If you try to
change the contents of an array returned by map-file-to-ivector, Clozure CL signals a
memory error.
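The sketch below puts these pieces together in Clozure CL. It maps a file as a vector of (unsigned-byte 8), uses array-displacement to reach the underlying simple vector and the offset at which the file's data begins, and sums the bytes without modifying them. It then releases the mapping with ccl:unmap-ivector, which undoes the mapping and closes the file. The function name sum-file-bytes is hypothetical.

```lisp
(defun sum-file-bytes (pathname)
  "Return the sum of all octets in PATHNAME, read through a memory-mapped vector."
  (let ((mapped (ccl:map-file-to-ivector pathname '(unsigned-byte 8))))
    (unwind-protect
         ;; MAPPED is a displaced array; ARRAY-DISPLACEMENT returns the
         ;; underlying simple vector and the index where the file's data starts.
         (multiple-value-bind (data offset) (array-displacement mapped)
           (loop for i from offset below (+ offset (length mapped))
                 sum (aref data i)))
      ;; Release the mapping (and the open file) even if an error occurs.
      (ccl:unmap-ivector mapped))))
```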
Static Variables
Clozure CL supports the definition of static variables, whose values are the same across
threads, and which may not be dynamically bound. The value of a static variable is thus the
same across all threads; changing the value in one thread changes it for all threads.
Attempting to dynamically rebind a static variable (for instance, by using LET, or using the
variable name as a parameter in a LAMBDA form) signals an error. Static variables are
shared global resources; a dynamic binding is private to a single thread.
Static variables therefore provide a simple way to share mutable state across threads. They
also provide a simple way to introduce race conditions and obscure bugs into your code,
since every thread reads and writes the same instance of a given static variable. You must
take care, therefore, in how you change the values of static variables, and use normal
multithreaded programming techniques, such as locks or semaphores, to protect against
race conditions.
In Clozure CL, access to a static variable is usually faster than access to a special variable
that has not been declared static.
defstatic var value &optional doc-string [Macro]
Proclaims the variable special, assigns the variable the supplied value, and assigns the
docstring to the variable's variable documentation. Marks the variable static, preventing
any attempt to dynamically rebind it. Any attempt to dynamically rebind var signals an
error.
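The following sketch shows the pattern recommended above, in Clozure CL. A static counter is shared by several threads, and every update is protected by a lock. The names *hit-count*, *hit-lock*, record-hit, and count-hits-in-threads are hypothetical. The threading calls (process-run-function, make-lock, with-lock-grabbed, and the semaphore functions) are Clozure CL's own.

```lisp
;; A counter shared by every thread, and the lock that protects it.
(ccl:defstatic *hit-count* 0 "Total number of hits recorded by all threads.")
(ccl:defstatic *hit-lock* (ccl:make-lock "hit-count lock"))

(defun record-hit ()
  "Increment the shared counter; the lock keeps concurrent INCFs from being lost."
  (ccl:with-lock-grabbed (*hit-lock*)
    (incf *hit-count*)))

(defun count-hits-in-threads (nthreads hits-per-thread)
  "Start NTHREADS threads that each call RECORD-HIT HITS-PER-THREAD times,
wait for all of them to finish, and return the shared total."
  (ccl:with-lock-grabbed (*hit-lock*)
    (setf *hit-count* 0))
  (let ((done (ccl:make-semaphore)))
    (dotimes (i nthreads)
      (ccl:process-run-function (format nil "worker ~D" i)
                                (lambda ()
                                  (dotimes (j hits-per-thread)
                                    (record-hit))
                                  (ccl:signal-semaphore done))))
    ;; Each worker signals DONE exactly once, so wait NTHREADS times.
    (dotimes (i nthreads)
      (ccl:wait-on-semaphore done))
    *hit-count*))

;; Note: (let ((*hit-count* 0)) ...) would signal an error, because a
;; static variable cannot be dynamically rebound.
```

Because every increment happens while the lock is held, (count-hits-in-threads 4 1000) should return 4000. Without the lock, concurrent increments could be lost.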
Saving Applications
Clozure CL consists of a small executable called the lisp kernel, which implements the very
lowest level features of the Lisp system, and a heap image, which contains the in-memory
representation of most of the Lisp system, including functions, data structures, variables,
and so on. When you start Clozure CL, you are launching the kernel, which then locates
and reads an image file, restoring the archived image in memory. Once the image is fully
restored, the Lisp system is running.
Using save-application, you can create a file that contains a modified image, one that
includes any changes you've made to the running Lisp system. If you later pass your image
file to the Clozure CL kernel as a command-line parameter, it then loads your image file
instead of its default one, and Clozure CL starts up with your modifications.
If this scenario seems to you like a convenient way to create an application, that's just as
intended. You can create an application by modifying the running Lisp until it does what
you want, then use save-application to preserve your changes and later load them for
use.
In fact, you can go further than that. You can replace Clozure CL's toplevel function with
your own, and then, when the image is loaded, the Lisp system immediately performs your
tasks rather than the default tasks that make it a Lisp development system. If you save an
image in which you have done this, the resulting Lisp system is your tool rather than a Lisp
development system.
You can go a step further still. You can tell save-application to prepend the Lisp
kernel to the image file. Doing this makes the resulting image into a self-contained
executable binary. When you run the resulting file, the Lisp kernel immediately loads the
attached image file and runs your saved system. The Lisp system that starts up can have
any behavior you choose. It can be a Lisp development system, but with your
customizations; or it can immediately perform some task of your design, making it a
specialized tool rather than a general development system.
In other words, you can develop any application you like by interactively modifying Clozure
CL until it does what you want, then using save-application to preserve your changes
in an executable image.
On Mac OS X, Clozure CL also supports an object type called macptr, which is the type of
pointers into the foreign (Mac OS) heap. Examples of commonly-used macptr objects are
Cocoa windows and other dynamically-allocated Mac OS system objects.
Because a macptr object is a pointer into a foreign heap that exists for the lifetime of the
running Lisp process, and because a saved image is used by loading it into a brand new
Lisp process, saved macptr objects cannot be relied on to point to the same things when
reconstituted from a saved image. In fact, a restored macptr object might point to
anything at all: for example, an arbitrary location in the middle of a block of code, or a
completely nonexistent virtual address.
A macptr object that points to the address 0 is not converted when an image is saved,
because address 0 can always be relied upon to refer to the same thing.
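save-application filename &key toplevel-function init-file error-handler application-class clear-clos-caches purify impurify mode prepend-kernel native [Function]
filename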
The pathname of the file to be created when Clozure CL saves the application.
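toplevel-function
The function (of no arguments) to run when the saved image starts up, in place of the
default toplevel. If this parameter is not supplied, Clozure CL uses its default toplevel.
The default toplevel runs the read-eval-print loop.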
init-file
The pathname of a Lisp file to be loaded when the image starts up. You can place
initialization expressions in this file, and use it to customize the behavior of the Lisp
system when it starts up.
error-handler
The error-handling mode for the saved image. The supplied value determines what
happens when an error is not handled by the saved image. Valid values are :quit
(Lisp exits with an error message); :quit-quietly (Lisp exits without an error
message); or :listener (Lisp enters a break loop, enabling you to debug the
problem by interacting in a listener). If you don't supply this parameter, the saved
image uses the default error handler (:listener).
application-class
The CLOS class that represents the saved Lisp application. Normally you don't need
to supply this parameter; save-application uses the class
ccl:lisp-development-system. In some cases you may choose to create a custom
application class; in that case, pass the name of the class as the value for this
parameter.
clear-clos-caches
If true, ensures that CLOS caches are emptied before saving the image. Normally you
don't need to supply this parameter, but if for some reason you want to ensure the
CLOS caches are clear when the image starts up, you can pass any true value.
purify
When true, calls (in effect) purify before saving the heap image. This moves certain
objects that are unlikely to become garbage to a special memory area that is not
scanned by the GC (since it is expected that the GC wouldn't find anything to collect).
impurify
If true, calls (in effect) impurify before saving the heap image. (If both :impurify
and :purify are true, first impurify is done, and then purify.)
impurify moves objects in certain special memory areas into the regular dynamic
heap, where they will be scanned by the GC.
mode
The file mode (Unix permission bits) with which the output file is created.
prepend-kernel
Specifies the file to prepend to the saved heap image. A value of t means to prepend
the lisp kernel binary that the lisp started with. Otherwise, the value of
:prepend-kernel should be a pathname designator for the file to be prepended.
If the prepended file is executable, its execute mode bits will be copied to the output
file.
This argument can be used to prepend any kind of file to the saved heap image. This
can be useful in some special cases.
native
If true, saves the image as a native (ELF, Mach-O, PE) shared library. (On platforms
where this isn't yet supported, a warning is issued and the option is ignored.)
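As a sketch of the executable-building scenario described earlier, the following defines a toplevel function and saves the running lisp as a standalone executable. The names hello-main and build-hello-executable, and whatever output path you pass, are hypothetical. Because save-application quits the lisp, build-hello-executable is normally the last thing you call in a session, typically a fresh one started with -n.

```lisp
(defun hello-main ()
  "Hypothetical toplevel function: runs instead of the read-eval-print loop."
  (format t "Hello from a saved Clozure CL image!~%")
  (finish-output)
  (ccl:quit 0))

(defun build-hello-executable (path)
  "Save the running lisp as a self-contained executable at PATH.
SAVE-APPLICATION quits this lisp once the file has been written."
  (ccl:save-application path
                        :toplevel-function #'hello-main
                        :error-handler :quit   ; exit with a message on unhandled errors
                        :prepend-kernel t))    ; prepend the kernel this lisp started with
```

After ? (build-hello-executable "hello") returns control to the shell, running ./hello should print the greeting and exit.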
*save-exit-functions* [Variable]
This variable contains a list of 0-argument functions that will be called before saving a
heap image. Users may add functions to this list as needed.
*restore-lisp-functions* [Variable]
This variable contains a list of 0-argument functions that will be called after restoring a
saved heap image. Users may add functions to this list as needed.
*lisp-cleanup-functions* [Variable]
This variable contains a list of 0-argument functions that will be called before quitting the
lisp.
Note that save-application quits the lisp, so any functions on this list will be invoked
before saving a heap image as well.
*lisp-startup-functions* [Variable]
This variable contains a list of 0-argument functions that will be called after starting the
lisp.
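One way to use these hooks is to keep per-session state out of a saved image and recompute it when the image starts. In the sketch below, *session-start*, note-session-start, and forget-session-start are hypothetical names. The hook lists hold zero-argument function designators, as described above.

```lisp
(defvar *session-start* nil
  "Universal time at which the current lisp session (re)started, or NIL.")

(defun note-session-start ()
  "Record when this session started; runs after a saved image is restored."
  (setf *session-start* (get-universal-time)))

(defun forget-session-start ()
  "Clear session-specific state so it is not written into the saved image."
  (setf *session-start* nil))

;; PUSHNEW keeps the hooks from being added twice if this file is reloaded.
(pushnew 'note-session-start ccl::*restore-lisp-functions*)
(pushnew 'forget-session-start ccl::*save-exit-functions*)

;; Record the start of the current session as well.
(note-session-start)
```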
Concatenating FASL Files
Multiple fasl files can be concatenated into a single file. The single file might be easier to
distribute or install, and loading it may be slightly faster than loading the individual files
(since it avoids the overhead of opening and closing each file in succession).
fasl-concatenate output-file fasl-files &key if-exists [Function]
This function reads the fasl files specified by the list fasl-files and combines them into a
single fasl file named output-file. The :if-exists keyword argument is interpreted as in
the standard function open.
Loading the concatenated fasl file has the same effect as loading the individual input fasl files
in the specified order.
The pathname-type of the output file and of each input file defaults to the current
platform's fasl file type (see Platform-specific filename conventions). If any of the input
files has a different type an error will be signaled, but fasl-concatenate doesn't
otherwise try too hard to verify that the input files are real fasl files for the current
platform.
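The sketch below compiles a list of source files and combines the resulting fasls with ccl:fasl-concatenate. It gives the output file the platform's fasl type explicitly, so the combined file can be loaded by its full name. The function name compile-and-concatenate and the file names in the usage note are hypothetical.

```lisp
(defun compile-and-concatenate (sources output)
  "Compile each file in SOURCES, concatenate the fasls (in order) into OUTPUT,
and return the pathname of the combined fasl file."
  (let ((fasls (mapcar (lambda (source)
                         ;; COMPILE-FILE returns NIL if it could not produce a fasl.
                         (or (compile-file source)
                             (error "Compiling ~A failed." source)))
                       sources))
        ;; Give OUTPUT this platform's fasl type (for example, lx64fsl).
        (combined (merge-pathnames output (compile-file-pathname "x.lisp"))))
    (ccl:fasl-concatenate combined fasls :if-exists :supersede)
    combined))
```

For example, (load (compile-and-concatenate '("a.lisp" "b.lisp") "combined")) should have the same effect as loading the two compiled files one after the other.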
Floating Point Numbers
In Clozure CL, the Common Lisp types short-float and single-float are implemented as
IEEE single precision values; double-float and long-float are IEEE double precision values.
On 64-bit platforms, single-floats are immediate values (like fixnums and characters).
Floating-point exceptions are generally enabled and detected. By default, threads start up
with the overflow, division-by-zero, and invalid exceptions enabled, and with the rounding
mode set to nearest. The functions set-fpu-mode and get-fpu-mode provide user control over
floating-point behavior.
get-fpu-mode &optional mode [Function]
Return the state of exception-enable and rounding-mode control flags for the current
thread.
When called without the optional mode argument, this function returns a plist of
keyword/value pairs which describe the floating point exception-enable and
rounding-mode flags for the current thread.
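rounding-mode
One of :nearest, :zero, :positive, or :negative.
overflow, underflow, division-by-zero, invalid, inexact
If true, the corresponding floating-point exception is signaled; if nil, it is masked.
set-fpu-mode &key rounding-mode overflow underflow division-by-zero invalid inexact [Function]
Set the state of exception-enable and rounding-mode control flags for the current thread.
rounding-mode
One of :nearest, :zero, :positive, or :negative.
Sets the current thread's exception-enable and rounding-mode control flags to the
indicated values for arguments that are supplied, and preserves the values associated with
those that aren't supplied.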
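Because these flags are per-thread state, a careful pattern is to save the current mode, change only what you need, and restore the saved mode afterwards. The sketch below masks the division-by-zero exception around a single division; the function name divide-without-trapping is hypothetical. It assumes that the keyword/value plist returned by get-fpu-mode uses the same keywords that set-fpu-mode accepts, so the plist can be passed back with apply.

```lisp
(defun divide-without-trapping (x y)
  "Divide X by Y with the division-by-zero exception masked for the current
thread, restoring the thread's previous FPU mode afterwards."
  (let ((saved (ccl:get-fpu-mode)))        ; plist of keyword/value pairs
    (unwind-protect
         (progn
           (ccl:set-fpu-mode :division-by-zero nil)
           (/ x y))
      ;; Assumes the plist's keys match SET-FPU-MODE's keyword arguments.
      (apply #'ccl:set-fpu-mode saved))))
```

With the exception masked, (divide-without-trapping 1d0 0d0) should return an IEEE double-float infinity instead of signaling division-by-zero.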
Code Coverage
Overview
In Clozure CL 1.4 and later, code coverage provides information about which paths through
generated code have been executed and which haven't. For each source form, it can report
one of three possible outcomes:
Not executed: This form was never entered.
Partly covered: This form was entered, and some parts were executed and some
weren't.
Fully covered: Every bit of code generated from this form was executed.
Limitations
While the information gathered for coverage of generated code is complete and precise, the
mapping back to source forms is of necessity heuristic, and depends a great deal on the
behavior of macros and the path of the source forms through compiler transforms. Source
information is not recorded for variables, which further limits the source mapping. In
practice, there is often enough information scattered about a partially covered function to
figure out which logical path through the code was taken and which wasn't. If that doesn't
work, you can try disassembling to see which parts of the compiled code were not executed:
in the disassembled code there will be references to #<CODE-NOTE [xxx] ...> where xxx is
NIL if the code that follows was never executed and non-NIL if it was.
Sometimes the situation can be improved by modifying macros to try to preserve more of
the input forms, rather than destructuring and rebuilding them.
Because the code coverage information is associated with compiled functions, code
coverage information is not available for load-time toplevel expressions. You can work
around this by creating a function and calling it. That is, instead of
(progn
  (do-this)
  (setq that ...)
  ...)
do:
(defun init-this-and-that ()
  (do-this)
  (setq that ...)
  ...)
(init-this-and-that)
Then you can see the coverage information in the definition of init-this-and-that.
Usage
In order to gather code coverage information, you first have to recompile all your code to
include code coverage instrumentation. Compiling files will generate code coverage
instrumentation if ccl:*compile-code-coverage* is true:
(setq ccl:*compile-code-coverage* t)
(recompile-all-your-files)
The compilation process will be many times slower than normal, and the fasl files will be
many times bigger.
When you execute functions loaded from instrumented fasl files, they will record coverage
information every time they are executed. You can examine that information by calling
ccl:report-coverage or ccl:coverage-statistics.
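The following minimal end-to-end sketch works in Clozure CL. It compiles and loads some files with instrumentation, binding ccl:*compile-code-coverage* with let rather than setting it globally so that only these files are instrumented. It then clears previous results with ccl:reset-coverage, runs your code, and writes an HTML report. The function name measure-coverage, its arguments, and the report location are hypothetical.

```lisp
(defun measure-coverage (files thunk report-path)
  "Compile and load FILES with coverage instrumentation, call THUNK (for
example, a function that runs your tests), and write a coverage report
whose index file is REPORT-PATH."
  (let ((ccl:*compile-code-coverage* t))   ; instrument only these files
    (dolist (file files)
      (load (compile-file file))))
  (ccl:reset-coverage)                    ; start from a clean slate
  (funcall thunk)
  (ensure-directories-exist report-path)
  (ccl:report-coverage report-path))
```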
While recording coverage, you can collect incremental coverage deltas between any two
points in time. You might do this while running a test suite, to record the coverage for each
test, for example:
(ccl:reset-incremental-coverage)
(loop with coverage = (make-hash-table)
for test in (tests-to-run)
do (run-test test)
do (setf (gethash test coverage) (ccl:get-incremental-coverage))
finally (return coverage))
This creates a hash table mapping each test to a representation of all coverage recorded
while running that test. This hash table can then be passed to ccl:report-coverage,
ccl:incremental-coverage-svn-matches or ccl:incremental-coverage-source-matches.
output-file
html
If non-nil (the default), this will generate an HTML report, consisting of an index file
in output-file and, in the same directory, one html file for each instrumented source
file that has been loaded in the current session.
tags
If non-nil, this should be a hash table mapping arbitrary keys (tags) to incremental
coverage deltas. The HTML report will show a list of tags, and allow selection of an
arbitrary subset of them to show the coloring and statistics for coverage by that
subset.
external-format
statistics
If you've loaded foo.lx64fsl and bar.lx64fsl, and have run some tests, you could do
(report-coverage "/my/dir/coverage/report.html")
reset-coverage [Function]
clear-coverage [Function]
Gets rid of the information about which instrumented files have been loaded, so
Saves all coverage info in a file, so you can restore the coverage state later. This allows you
to combine multiple runs or continue in a later session. Equivalent to
(ccl:write-coverage-to-file (ccl:get-coverage) pathname).
Restores the coverage data previously saved with ccl:save-coverage-in-file, for the set of
instrumented fasls that were loaded both at save and restore time. That is, coverage info is
only restored for files that have been loaded in this session. For example, if in a previous
session you had loaded "foo.lx86fsl" and then saved the coverage info, in this session you
must load the same "foo.lx86fsl" before calling restore-coverage-from-file in
order to retrieve the stored coverage info for "foo". Equivalent to
(ccl:restore-coverage (ccl:read-coverage-from-file pathname)).
get-coverage [Function]
Returns a snapshot of the current coverage data. A snapshot is a copy of the current
coverage state. It can be saved in a file with ccl:write-coverage-to-file, reinstated
back as the current state with ccl:restore-coverage, or combined with other
snapshots with ccl:combine-coverage.
Takes a list of coverage snapshots and returns a new coverage snapshot representing a
union of all the coverage data.
Saves the coverage snapshot in a file. The snapshot can be loaded back with
ccl:read-coverage-from-file or loaded and restored with ccl:restore-coverage-from-file.
Note that the file created is actually a lisp source file and can be compiled for
faster loading.
Returns the snapshot saved in pathname. Doesn't affect the current coverage state.
pathname can be the file previously created with ccl:write-coverage-to-file or
ccl:save-coverage-in-file, or it can be the name of the fasl created from compiling
such a file.
coverage-statistics [Function]
coverage-source-file
coverage-expressions-total
coverage-expressions-entered
the number of source expressions that have been entered (i.e. at least partially
covered)
coverage-expressions-covered
coverage-unreached-branches
the number of conditionals with one branch taken and one not taken
coverage-code-forms-total
the total number of code forms. A code form is an expression in the final stage of
compilation, after all macroexpansion and compiler transforms and simplification
coverage-code-forms-covered
coverage-functions-total
coverage-functions-fully-covered
coverage-functions-partly-covered
coverage-functions-not-entered
reset-incremental-coverage [Function]
Marks a starting point for recording incremental coverage. Note that calling this function
does not affect regular coverage data (whereas calling ccl:reset-coverage resets
incremental coverage as well).
Returns the delta of coverage since the last reset of incremental coverage. If reset is true
(the default), it also resets incremental coverage now, so that the next call to
get-incremental-coverage will return the delta from this point.
Incremental coverage deltas are represented differently than the full coverage snapshots
returned by functions such as ccl:get-coverage. Incremental coverage uses an
abbreviated format and is missing some of the information in a full snapshot, and therefore
cannot be passed to functions documented to accept a snapshot, only to functions
specifically documented to accept incremental coverage deltas.
collection
sources
A list of pathnames and/or source-notes, the latter representing a range within a file.
Given a hash table collection whose values are incremental coverage deltas, return a list
of all keys corresponding to those deltas that intersect any region in sources.
For example, if the deltas represent tests, then the returned value is a list of all tests that
cover some part of the source regions.
Find incremental coverage deltas matching changes from a particular Subversion revision.
collection
directory
revision
The revision to compare to the working directory, an integer or another value whose
printed representation is suitable for passing as the --revision argument to svn.
Given a hash table collection whose values are incremental coverage deltas, return a list
of all keys corresponding to those deltas that intersect any changed source in directory
since revision revision in Subversion.
For example, if the deltas represent tests, then the returned value is a list of all tests that
might be affected by the changes.
*compile-code-coverage* [Variable]
When true, instrument functions being compiled to collect code coverage information.
This variable controls whether functions are instrumented for code coverage. Files
compiled while this variable is true will contain code coverage instrumentation.
without-compiling-code-coverage [Macro]
This macro arranges for body not to record internal details of code coverage: body is
considered totally covered if it is entered at all. The Common Lisp macros ASSERT and
CHECK-TYPE use this macro.
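A user-defined checking macro can follow the same pattern as ASSERT and CHECK-TYPE, so that its error branch, which is rarely taken, does not show up as uncovered code in every caller. The following is a minimal sketch. check-positive is a hypothetical macro, not part of Clozure CL, and the code assumes ccl:without-compiling-code-coverage accepts an ordinary body of forms.

```lisp
(defmacro check-positive (form)
  "Hypothetical checking macro: return the value of FORM, or signal an
error if it is not a positive real number. Like ASSERT and CHECK-TYPE,
the check is wrapped so that it counts as fully covered once entered."
  (let ((value (gensym "VALUE")))
    `(ccl:without-compiling-code-coverage
       ;; Evaluate FORM exactly once.
       (let ((,value ,form))
         (unless (and (realp ,value) (plusp ,value))
           (error "~S is not positive: ~S" ',form ,value))
         ,value))))
```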
The source coloring is applied from outside in. So, for example, if you have
(outer-form ... (inner-form ...) ...)
first the whole outer form is painted with whatever color expresses the outer form
coverage, and then the inner form color is replaced with whatever color expresses the inner
form coverage. One consequence of this approach is that every part of the outer form that is
not specifically inside some executable inner form will have the outer form's coverage
color. If the syntax of the outer form involves some non-executable forms, or forms that do not
have coverage info of their own for whatever reason, then they will just inherit the color of
the outer form, because they don't get repainted with a color of their own.
This approach can be confusing in the case of symbols. As noted in the
Limitations section, coverage information is not recorded for variables; hence the coloring
of a variable does not convey information about whether the variable was evaluated or not.
That information is not available, and the variable just inherits the color of the form that
contains it.
Other Extensions
In Clozure CL, the cleanup forms of unwind-protect are always executed as if they were
wrapped with without-interrupts. To allow interrupts, use with-interrupts-enabled.
*quit-on-eof* [Variable]
When true, exit the top-level immediately upon receipt of an EOF. If *quit-on-eof* is
nil (which is the default), ignore the EOF.
Note, though, that an internally-defined number of consecutive EOFs will exit lisp anyway.
Change to the directory specified by dir, which may be a namestring or a pathname object.
Show information about the process proc, or all processes if proc is not specified.
Break Loops
*break-on-warnings* [Variable]
This variable was removed from ANSI CL. The rationale was that the same effect may be
achieved with (setq *break-on-signals* 'warning).
*break-on-errors* [Variable]
When true (the default), lisp will enter a break loop when an error is signaled.
*show-restarts-on-break* [Variable]
When true, automatically print the available restarts before entering a break loop.
Trace
Clozure CL's tracing facility is invoked by an extended version of the Common Lisp trace
macro. Extensions allow tracing of methods, as well as finer control over tracing actions.
The trace macro encapsulates the functions named by specs, causing trace actions to take
place on entry and exit from each function. The default actions print a message on function
entry and exit. Keyword/value options can be used to specify changes in the default
behavior.
A spec is either a symbol that is the name of a function, or an expression of the form (setf
symbol), or a specific method of a generic function in the form (:method gf-name
{qualifier}* ({specializer}*)), where a specializer can be the name of a class or
an EQL specializer.
A spec can also be a string naming a package, or equivalently a list (:package package-name),
in order to request that all functions in the package be traced.
By default, trace prints a message to *trace-output* showing the arguments on entry and
values on exit. Options specified
as key/value pairs can be used to modify this behavior. Options preceding the function
specs apply to all the functions being traced. Options specified along with a spec apply to
that spec only and override any global options. The following options are supported:
:methods {T | nil}
If true, and if applied to a spec naming a generic function, arranges to trace all the
methods of the generic function in addition to the generic function itself.
Inhibits all trace actions unless the current invocation of the function being traced is
inside one of the outside-specs, i.e., unless a function named by one of the outside-specs
is currently on the stack. outside-spec can name a function, a method, or a
package, as above.
Evaluates form whenever the function being traced is about to be entered, and
inhibits all trace actions if form returns nil. The form may reference the lexical
variable ccl::args, which is a list of the arguments in this call. :condition is just
a synonym for :if, though if both are specified, both must return non-nil.
:before-if form
Evaluates form whenever the function being traced is about to be entered, and
inhibits the entry trace actions if form returns nil. The form may reference the lexical
variable ccl::args, which is a list of the arguments in this call. If both :if and
:before-if are specified, both must return non-nil in order for the before entry
actions to happen.
:after-if form
Evaluates form whenever the function being traced has just exited, and inhibits the
exit trace actions if form returns nil. The form may reference the lexical variable
ccl::vals, which is a list of values returned by this call. If both :if and
:after-if are specified, both must return non-nil in order for the after exit actions
to happen.
:print-before form
Evaluates form whenever the function being traced is about to be entered, and prints
the result before printing the standard entry message. The form may reference the
lexical variable ccl::args, which is a list of the arguments in this call. To see
multiple forms, use values: :print-before (values (one-thing)
(another-thing)).
:print-after form
Evaluates form whenever the function being traced has just exited, and prints the
result after printing the standard exit message. The form may reference the lexical
variable ccl::vals, which is a list of values returned by this call. To see multiple
forms, use values: :print-after (values (one-thing) (another-thing)).
:print form
:eval-before form
Evaluates form whenever the function being traced is about to be entered. The form
may reference the lexical variable ccl::args, which is a list of the arguments in this
call.
:eval-after form
Evaluates form whenever the function being traced has just exited. The form may reference
the lexical variable ccl::vals, which is a list of values returned by this call.
:eval form
:break-before form
Evaluates form whenever the function being traced is about to be entered, and if the
result is non-nil, enters a debugger break loop. The form may reference the lexical
variable ccl::args, which is a list of the arguments in this call.
:break-after form
Evaluates form whenever the function being traced has just exited, and if the result is
non-nil, enters a debugger break loop. The form may reference the lexical variable
ccl::vals, which is a list of values returned by this call.
:break form
:backtrace-before form
Evaluates form whenever the function being traced is about to be entered. The form
may reference the lexical variable ccl::args, which is a list of the arguments in this
call. The value returned by form is interpreted as follows:
nil
does nothing
:detailed
(:detailed integer)
integer
anything else
:backtrace form
Note that, unlike the other combined options, :backtrace is equivalent to :backtrace-before
only, not both before and after, since it's usually not helpful to print the
same backtrace both before and after the function call.
:backtrace-after form
Evaluates form whenever the function being traced has just exited. The form may
reference the lexical variable ccl::vals, which is a list of values returned by this
call. The value returned by form is interpreted as follows:
nil
does nothing
:detailed
(:detailed integer)
integer
anything else
:before action
specifies the action to be taken just before the traced function is entered. action is one
of:
:print
The default, prints a short indented message showing the function name and
the invocation arguments
:break
:backtrace
function
:after action
specifies the action to be taken just after the traced function exits. action is one of:
:print
The default, prints a short indented message showing the function name and
the returned values
:break
:backtrace
function
Any other value is interpreted as a function to call on exit instead of printing the
standard exit message. It is called with its first argument being the name of the
function being traced, the remaining arguments being all the values returned by
the function being traced, and ccl:*trace-level* bound to the current nesting
level of trace actions.
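Several of the options above can be combined on a single spec. The following sketch traces a hypothetical function halve only when its argument is even, and uses :eval-after to log the values of each traced call into a hypothetical variable *halve-results*. It assumes evaluation in CL-USER and relies on ccl::args and ccl::vals being bound as described above.

```lisp
(defvar *halve-results* '()
  "Hypothetical log of the value lists recorded by the trace of HALVE.")

(defun halve (n)
  "Return the quotient and remainder of N divided by 2."
  (floor n 2))

;; Trace HALVE only when its first argument is even (:if inhibits all
;; trace actions otherwise). On each traced exit, push CCL::VALS, the
;; list of values returned by this call, onto the log.
(trace (halve :if (evenp (first ccl::args))
              :eval-after (push ccl::vals *halve-results*)))
```

With this in place, (halve 7) runs silently and (halve 8) prints the default entry and exit messages and records (4 0). (untrace halve) removes the tracing.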
ccl:*trace-level* [Variable]
Variable bound to the current nesting level during execution of before and after trace
actions. The default printing actions use it to determine the amount of indentation.
ccl:*trace-max-indent* [Variable]
The default before and after print actions will not indent by more than the value of
*trace-max-indent* regardless of the current trace level.
This is a functional version of the TRACE macro. spec and keywords are as for TRACE,
except that all arguments are evaluated.
ccl:*trace-print-level* [Variable]
The default print actions bind *print-level* to this value while printing. Note that this
rebinding is only in effect during the default entry and exit messages. It does not apply to
printing of :print-before/:print-after forms or any explicit printing done by user
code.
ccl:*trace-print-length* [Variable]
The default print actions bind *print-length* to this value while printing. Note that
this rebinding is only in effect during the default entry and exit messages. It does not apply
to printing of :print-before/:print-after forms or any explicit printing done by
user code.
ccl:*trace-bar-frequency* [Variable]
By default, this is nil. If non-nil, it should be an integer n, and the default entry and exit
messages will print a | instead of a space every n levels of indentation.
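Binding these variables around a call is a simple way to keep trace output short when arguments are large. The following sketch traces a hypothetical function sum-list and captures the default trace messages with ccl:*trace-print-length* bound to 3, so that a long argument list is printed with a trailing "...". The helper traced-output is also hypothetical.

```lisp
(defun sum-list (list)
  "Return the sum of the numbers in LIST."
  (reduce #'+ list))

(trace sum-list)

(defun traced-output (thunk &key (print-length 3))
  "Hypothetical helper: call THUNK and return what the default trace
messages printed to *TRACE-OUTPUT*, with lists truncated after
PRINT-LENGTH elements."
  (let ((ccl:*trace-print-length* print-length))
    (with-output-to-string (*trace-output*)
      (funcall thunk))))
```

As noted above, the binding affects only the default entry and exit messages, not :print-before or :print-after output.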
Advising
The advise macro can be thought of as a more general version of trace. It allows code
that you specify to run before, after, or around a given function, for the purpose of
changing the behavior of the function. Each piece of added code is called a piece of advice.
Each piece of advice has a unique name, so that you can have multiple pieces of advice on
the same function, including multiple :before, :after, and :around pieces of advice.
The :name and :when keywords serve to identify the piece of advice. A later call to
advise with the same values of :name and :when will replace the existing piece of advice;
a call with different values will not.
Add a piece of advice to the function or method specified by spec according to form.
spec
A specification of the function on which to put the advice. This is either a symbol that
is the name of a function or generic function, or an expression of the form (setf
symbol), or a specific method of a generic function in the form (:method symbol
{qualifiers} (specializer {specializer})).
form
A form to execute before, after, or around the advised function. The form can refer to
the variable arglist that is bound to the arguments with which the advised function
was called. You can exit from form with (return).
name
when
An argument that specifies when the piece of advice is run. There are three allowable
values. The default is :before, which specifies that form is executed before the
advised function is called. Other possible values are :after, which specifies that
form is executed after the advised function is called, and :around, which specifies
that form is executed around the call to the advised function. Use (:do-it) within
form to indicate invocation of the original definition.
The function foo, already defined, does something with a list of numbers. The following
code uses a piece of advice to make foo return zero if any of its arguments is not a number.
Using :around advice, you can do the following:
(advise foo (if (some #'(lambda (n) (not (numberp n))) arglist)
                0
                (:do-it))
        :when :around :name :zero-if-not-nums)
To do the same thing with a piece of :before advice:
(advise foo (if (some #'(lambda (n) (not (numberp n))) arglist)
                (return 0))
        :when :before :name :zero-if-not-nums)
Remove the piece or pieces of advice matching spec, when, and name.
The unadvise macro removes the piece or pieces of advice matching spec, when, and
name. When the value of spec is t and the values of when and name are nil, unadvise
removes every piece of advice; when spec is t, the argument when is nil, and name is
non-nil, unadvise removes all pieces of advice with the given name.
Return a list of the pieces of advice matching spec, when, and name.
The advisedp macro returns a list of existing pieces of advice that match spec, when, and
name. When the value of spec is t and the values of when and name are nil, advisedp
returns all existing pieces of advice.
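The following sketch puts advise, advisedp and unadvise together on a hypothetical function sum-numbers. It assumes evaluation in CL-USER, which in Clozure CL uses the CCL package, so that advise, advisedp, unadvise and the variable arglist are all accessible without a package prefix.

```lisp
(defun sum-numbers (&rest numbers)
  "Hypothetical function to advise: add up NUMBERS."
  (apply #'+ numbers))

;; :around advice, as in the example above: return 0 when any argument
;; is not a number; otherwise (:do-it) runs the original definition.
(advise sum-numbers
        (if (some (lambda (n) (not (numberp n))) arglist)
            0
            (:do-it))
        :when :around :name :zero-if-not-nums)
```

While the advice is in place, (sum-numbers 1 'x 3) returns 0 instead of signaling an error. (advisedp sum-numbers :when :around :name :zero-if-not-nums) returns the matching piece of advice, and (unadvise sum-numbers :when :around :name :zero-if-not-nums) restores the original behavior.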
Watched Objects
As of release 1.4, Clozure CL provides a way for lisp objects to be watched so that a
condition will be signaled when a thread attempts to write to the watched object. For a
certain class of bugs (someone is changing this value, but I don't know who), this can be
extremely helpful.
object
The WATCH function arranges for the specified object to be monitored for writes. This is
accomplished by copying the object to its own set of virtual memory pages, which are then
write-protected. This protection is enforced by the computer's memory-management
hardware; the write-protection does not slow down reads at all.
When called with no arguments, WATCH returns a freshly-consed list of the objects
currently being watched.
WATCH returns NIL if the object cannot be watched (typically because the object is in a
static or pure memory area).
WATCH operates at a fairly low level; it is not possible to avoid the details of the internal
representation of objects. Nevertheless, as a convenience, WATCHing a standard-instance,
a hash-table, or a multi-dimensional or non-simple CL array will watch the underlying
slot-vector, hash-table-vector, or data-vector, respectively.
WATCH operates on cons cells, not lists. In order to watch a chain of cons cells, each cons
cell must be watched individually. Because each watched cons cell takes up its own
virtual memory page (4 Kbytes), it's only feasible to watch relatively short lists.
If a memory-allocated object isn't a cons cell, then it is a vector-like object called a uvector.
A uvector is a memory-allocated lisp object whose first word is a header that describes the
object's type and the number of elements that it contains.
Some CL objects, like strings and other simple vectors, map in a straightforward way onto
the uvector representation. It is easy to understand what happens in such cases. The
uvector index corresponds directly to the vector index:
one of those "slots" contains the data that will be changed when the object is written to.
As mentioned above, watch knows about arrays, hash-tables, and standard-instances, and
will automatically watch the appropriate data-containing element.
? (defclass foo ()
(slot-a slot-b slot-c))
#<STANDARD-CLASS FOO>
? (defvar *a-foo* (make-instance 'foo))
*A-FOO*
? (watch *a-foo*)
#<SLOT-VECTOR #xDB00D>
;;; Note that WATCH has watched the internal slot-vector object
? (setf (slot-value *a-foo* 'slot-a) 'foo)
> Error: Write to watched uvector #<SLOT-VECTOR #xDB00D> at index 1
> Faulting instruction: (movq (% rsi) (@ -5 (% r8) (% rdi)))
> While executing: %MAYBE-STD-SETF-SLOT-VALUE-USING-CLASS, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
Looking at a backtrace would presumably show what object and slot name were written.
Note that even though the write was to slot-a, the uvector index was 1 (not 0). This is
because the first element of a slot-vector is a pointer to the instance that owns the slots. We
can retrieve that to look at the object that was modified:
The UNWATCH function ensures that the specified object is in normal, non-monitored
memory. If the object is not currently being watched, UNWATCH does nothing and returns
NIL. Otherwise, the newly unwatched object is returned.
write-to-watched-object [Condition]
This condition is signaled when a watched object is written to. There are three slots of
interest:
object
offset
The byte offset from the tagged object pointer to the address of the write.
instruction
A few restarts are provided: one will skip over the faulting write instruction and proceed;
another offers to unwatch the object and continue.
There is also an emulate restart. In some common cases, the faulting write instruction can
be emulated, enabling the write to be performed without having to unwatch the object (and
therefore let other threads potentially write to it). If the faulting instruction isn't
recognized, the emulate restart will not be offered.
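Besides using the restarts interactively, a program can handle the condition itself. The following minimal sketch watches a hypothetical vector *watched-vector*, catches the write with handler-case (which unwinds, so the write is simply not performed), and then unwatches the vector so that the same write succeeds. It assumes a platform where WATCH is supported, since WATCH returns NIL otherwise. The condition class is written here as ccl::write-to-watched-object.

```lisp
(defvar *watched-vector* (make-array 4 :initial-element 0)
  "A hypothetical simple vector to watch.")

(defun try-write-watched ()
  "Store 42 at index 2 of *WATCHED-VECTOR*. Return the condition that was
signaled if the vector is being watched, or NIL if the write went through."
  (handler-case (progn (setf (svref *watched-vector* 2) 42) nil)
    (ccl::write-to-watched-object (c) c)))
```

Typical use: call (ccl:watch *watched-vector*), then (try-write-watched) returns the condition and the element stays 0. After (ccl:unwatch *watched-vector*), the same call returns NIL and stores 42.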
Notes
Although some care has been taken to minimize potential problems arising from watching
and unwatching objects from multiple threads, there may well be subtle race conditions
present that could cause bad behavior.
For example, suppose that a thread attempts to write to a watched object. This causes the
operating system to generate an exception. The lisp kernel figures out what the exception
is, and calls back into lisp to signal the write-to-watched-object condition and perhaps
handle the error.
Now, as soon as lisp code starts running again (for the callback), it's possible that some other
thread could unwatch the very watched object that caused the exception, perhaps before we
even have a chance to signal the condition, much less respond to it.
Having the object unwatched out from underneath a handler may at least confuse it, if not
cause deeper trouble. Use caution with unwatch.
Examples
Here are a couple more examples in addition to the above examples of watching a string
and a standard-instance.
Fancy arrays
In this case, uvector index in the report is the row-major index of the element that was
written to.
Hash tables
Hash tables are surprisingly complicated. The representation of a hash table includes an
element called a hash-table-vector. The keys and values of the elements are stored pairwise
in this vector.
One problem with trying to monitor hash tables for writes is that the underlying
hash-table-vector is replaced with an entirely new one when the hash table is rehashed. A
previously-watched hash-table-vector will not be used by the hash table after
rehashing, and writes to the new vector will not be caught.
Lists
? (watch-list *l*)
((1 2 3) (2 3) (3))
? (setf (nth 2 *l*) 'foo)
> Error: Write to the CAR of watched cons cell (3)
> Faulting instruction: (movq (% rsi) (@ 5 (% rdi)))
> While executing: %SETNTH, in process listener(1).
> Type :POP to abort, :R for a list of available restarts.
> Type :? for other options.
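The transcript above relies on a helper watch-list and a list *l* whose definitions are not shown. Its return value, the list of successive tails, suggests that watch-list simply watches each cons cell in turn. The following is a minimal sketch under that assumption. unwatch-list is a hypothetical counterpart added for cleanup, and the code assumes a platform where WATCH is supported.

```lisp
(defun watch-list (list)
  "Watch each cons cell of LIST; return the list of watched tails."
  (maplist #'ccl:watch list))

(defun unwatch-list (list)
  "Return every cons cell of LIST to normal, unwatched memory."
  (mapl #'ccl:unwatch list)
  list)

(defvar *l* (list 1 2 3))
```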
Memory
heap-utilization &key (stream *debug-io*) (gc-first t) area unit (sort :size) classes
start threshold [Function]
This function walks the lisp heap, collects information about the objects stored on the
heap, and prints a report of the results to the stream specified by the keyword argument
:stream. It shows the number of objects of each type, the sum of their logical sizes (the
size of the data part of the object) and the sum of their physical sizes (the total size as
computed by object-direct-size).
If :gc-first is true (the default), heap-utilization does a full gc before scanning the
heap.
If :classes is true, objects are classified by class rather than just basic type.
The keyword argument :area can be used to restrict the walk to one memory area or a list
of areas. Some possible values are :dynamic, :static, :managed-static, and
:readonly. By default, all areas (including stacks) are examined.
By default, sizes are shown in bytes. The keyword argument :unit can be :kb, :mb, or
:gb to show sizes in those units.
If threshold is non-nil, it should be a number between 0 and 1. All types whose share of the
heap is less than threshold will be lumped together in an “All Others” line rather than being
listed individually.
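To keep a report for later comparison, the output can be captured in a string. The following sketch wraps the call using only the keyword arguments described above. The wrapper heap-report and its default values are hypothetical choices.

```lisp
(defun heap-report (&key (unit :kb) (threshold 0.01))
  "Hypothetical wrapper: return the heap-utilization report as a string.
Skips the initial full GC, shows sizes in UNIT, and lumps types whose
share of the heap is below THRESHOLD into an \"All Others\" line."
  (with-output-to-string (s)
    (ccl:heap-utilization :stream s
                          :gc-first nil
                          :unit unit
                          :threshold threshold)))
```

Passing :gc-first nil makes the report cheaper, but it then also counts garbage that a full GC would have reclaimed.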
This function returns the size of thing in bytes, including any headers and alignment
overhead. It does not descend into an object's components.
%address-of [Function]
This function returns the address of thing as an integer. If thing is a fixnum, thing is simply
returned.
Note that there are types other than fixnums that are represented as immediate values
rather than heap-allocated objects. On various platforms, these might include characters
and single-floats, and possibly other values. The %address-of function will return fairly
useless values for such objects.
The value returned by %address-of is only useful for debugging, since the GC may run at
any time and may move objects around in memory, thereby changing their addresses.
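The following is a small sketch of both functions, written with the ccl:: package prefix. The helper size-and-address is hypothetical. A 100-byte vector should report at least 100 bytes, since the physical size includes the header and alignment. The address is a debugging aid only, for the reason given above.

```lisp
(defun size-and-address (thing)
  "Hypothetical helper: return THING's physical size in bytes (headers and
alignment included, components not followed) and its current address.
The address may change whenever the GC moves THING."
  (values (ccl::object-direct-size thing)
          (ccl::%address-of thing)))
```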
Disassemble
*disassemble-verbose* [Variable]
Source Notes
Source locations are recorded in source-note objects, which have accessors
source-note-filename, source-note-start-pos, source-note-end-pos, and
source-note-text.
The start and end positions are file positions and not character positions. The text will be
nil unless source recording was on at read time. If the original source file is still available,
ensure-source-note-text will force the missing source text to be read from the file.
Source notes are associated with definitions (via record-source-file) and also stored
in function objects (including anonymous and local functions). The former can be retrieved
via find-definition-sources, and the latter via function-source-note.
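The following sketch shows the round trip from a compiled file back to a source range. It writes a one-definition file, compiles and loads it with ccl:*save-source-locations* bound to t, and fetches the function's source note. The helper source-note-demo, the function source-note-demo-fn, and the /tmp path are hypothetical, and the path assumes a Unix-like system.

```lisp
(defun source-note-demo (&optional (path "/tmp/ccl-source-note-demo.lisp"))
  "Hypothetical helper: compile and load a one-definition file with source
locations saved, then return the source note of the function it defines."
  (with-open-file (s path :direction :output :if-exists :supersede)
    (write-line "(defun source-note-demo-fn (x) (1+ x))" s))
  ;; Keep source location information for this compilation.
  (let ((ccl:*save-source-locations* t))
    (load (compile-file path)))
  (ccl:function-source-note (fdefinition 'source-note-demo-fn)))
```

The start and end positions of the returned note delimit the defun form in the file. Remember that they are file positions, not character positions.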
*record-source-file* [Variable]
*save-source-locations* [Variable]
nil
Do not record source location information. Filename information for definitions will
still be saved if *record-source-file* is true.
t
Store source location information, including the original source text, for function
objects and definitions.
:no-text
Store source location information, but do not store a copy of the original source text.
This is an optimization useful for compiling files that are not expected to change.
*record-pc-mapping* [Variable]
function-source-note f [Function]
Return the starting file position (not character position) for the thing described by
source-note.
Return the ending file position (not character position) for the thing described by
source-note.
Read the source text from the original file if it is not already present in source-note.
Characters
External Formats
Line Termination Keywords
Character Encodings
Encoding Problems
Byte Order Marks
Selected Character Encodings
Encoding and Decoding Strings
All characters and strings in Clozure CL fully support Unicode by using UTF-32. There is
only one character type and one string type in Clozure CL. There has been a lot of
discussion about this decision which can be found by searching the openmcl-devel archives
at http://clozure.com/pipermail/openmcl-devel/ . Suffice it to say that we decided that the
simplicity and speed advantages of only supporting UTF-32 outweigh the space
disadvantage.
Characters
There is one character type in Clozure CL. All characters are base-chars.
char-code-limit is now #x110000, which means that all Unicode characters can be
directly represented. As of Unicode 5.0, only about 100,000 of 1,114,112 possible
char-codes are actually defined. The function code-char knows that certain ranges of
code values (notably the UTF-16 surrogates, #xd800-#xdfff) will never be valid character
codes and will return nil for arguments in that range, but may return a non-nil value (an
undefined/non-standard character object) for other unassigned code values.
Characters with codes in the range #xa0-#x7ff also have symbolic names. These are the
names from the Unicode standard with spaces replaced by underscores. So
#\Greek_Capital_Letter_Epsilon can be used to refer to the character whose
char-code is #x395. To see the complete list of supported character names, look just
below the definition for ccl::register-character-name in ccl:level-1;l1-reader.lisp.
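The following forms, evaluated in Clozure CL, illustrate the three facts above. Each expected value follows directly from the text: the code limit of #x110000, nil for a surrogate code, and the code #x395 for the named Greek letter.

```lisp
;; Every Unicode code point is below char-code-limit.
char-code-limit                               ; → 1114112, i.e. #x110000

;; A surrogate code point is never a valid character code.
(code-char #xd800)                            ; → NIL

;; Characters in #xa0-#x7ff can be named by their Unicode names,
;; with spaces replaced by underscores.
(char-code #\Greek_Capital_Letter_Epsilon)    ; → 917, i.e. #x395
```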
External Formats
The standard functions open, load, and compile-file all accept an :external-format
keyword argument. The value of :external-format can be :default (the
default value), a line termination keyword (see Line Termination Keywords), a character
encoding keyword (see Character Encodings), an external-format object created using
make-external-format, or a plist with the keys :domain, :character-encoding
and :line-termination. If the argument is a plist, the result of
(apply #'make-external-format argument) will be used.
Note that the set of keywords used to denote character-encodings and the set of
keywords used to denote line-termination conventions are disjoint: a keyword denotes at
most a character encoding or a line termination convention, but never both.
EXTERNAL-FORMATs are objects (structures) with two read-only fields that can be
accessed via the functions external-format-line-termination and
external-format-character-encoding.
ccl:*default-external-format* [Variable]
The initial value of this variable in Clozure CL is :unix, which is equivalent to
(:line-termination :unix), among other things.
ccl:*default-line-termination* [Variable]
Either creates a new external format object, or returns an existing one with the same
specified slot values.
domain
This is used to indicate where the external format is to be used. Its value can be
almost anything. It defaults to NIL. There are two domains that have a pre-defined
meaning in Clozure CL: :file indicates encoding for a file in the file system and
:socket indicates i/o to/from a socket. The value of domain affects the default
values for character-encoding and line-termination.
character-encoding
A keyword that specifies the character encoding for the external format (see Character
Encodings). Defaults to :default, which means that if domain is :file, the value of
the variable *default-file-character-encoding* is used, and if domain is :socket,
the value of the variable *default-socket-character-encoding* is used. The
initial value of both of these variables is NIL, which means the :iso-8859-1
encoding.
line-termination
external-format
Despite the function's name, it doesn't necessarily create a new, unique external-format
object: two calls to make-external-format with the same arguments made in the same
dynamic environment return the same (eq) object.
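The interning behavior described above can be sketched in portable Common Lisp. This is a minimal illustration under stated assumptions, not Clozure CL's implementation. The names ext-format, %make-ext-format and find-or-make-ext-format are hypothetical. The domain argument and its effect on defaults are omitted, and the cache ignores the dynamic environment. Because both slots are read-only, one shared instance can safely be handed to every caller, which is what makes returning the same (eq) object reasonable.

```lisp
;;; Portable sketch only; EXT-FORMAT, %MAKE-EXT-FORMAT and
;;; FIND-OR-MAKE-EXT-FORMAT are hypothetical names, not part of Clozure CL.
(defstruct (ext-format
            (:constructor %make-ext-format (character-encoding line-termination)))
  ;; Two read-only slots, mirroring the two fields described in the text.
  (character-encoding nil :read-only t)
  (line-termination nil :read-only t))

(defvar *ext-formats* (make-hash-table :test #'equal)
  "Cache of formats already created, keyed on their slot values.")

(defun find-or-make-ext-format (&key (character-encoding :default)
                                     (line-termination :default))
  "Return the cached format with these slot values, creating it if needed.
The DOMAIN argument and the dynamic environment are ignored in this sketch."
  (let ((key (list character-encoding line-termination)))
    (or (gethash key *ext-formats*)
        (setf (gethash key *ext-formats*)
              (%make-ext-format character-encoding line-termination)))))
```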
Line termination keywords specify which characters mark the end of a line.
On input, the external line-termination characters are replaced by #\Newline; on
output, each #\Newline is converted to the external line-termination characters.
keyword      character(s)
:unix        #\Linefeed
:macos       #\Return
:cr          #\Return
:crlf        #\Return #\Linefeed
:cp/m        #\Return #\Linefeed
:msdos       #\Return #\Linefeed
:dos         #\Return #\Linefeed
:windows     #\Return #\Linefeed
:inferred    see below
:unicode     #\Line_Separator
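The translation described by the table can be sketched in portable Common Lisp. The functions below are hypothetical helpers, not part of Clozure CL. They cover the fixed keywords in the table but not :inferred, and they work on whole strings rather than on streams.

```lisp
;;; Portable sketch; these helpers are hypothetical, not Clozure CL functions.
(defun line-terminator-string (keyword)
  "Characters that end a line for KEYWORD (the :inferred case is not handled)."
  (ecase keyword
    (:unix (string #\Linefeed))
    ((:macos :cr) (string #\Return))
    ((:crlf :cp/m :msdos :dos :windows) (coerce '(#\Return #\Linefeed) 'string))
    (:unicode (string (code-char #x2028)))))   ; U+2028 LINE SEPARATOR

(defun decode-line-terminations (text keyword)
  "Input direction: replace each external terminator in TEXT with #\\Newline."
  (let ((term (line-terminator-string keyword)))
    (with-output-to-string (out)
      (loop with start = 0
            for pos = (search term text :start2 start)
            do (write-string text out :start start :end (or pos (length text)))
               (if pos
                   (progn (write-char #\Newline out)
                          (setf start (+ pos (length term))))
                   (loop-finish))))))

(defun encode-line-terminations (text keyword)
  "Output direction: replace each #\\Newline in TEXT with the external terminator."
  (let ((term (line-terminator-string keyword)))
    (with-output-to-string (out)
      (loop for ch across text
            do (if (char= ch #\Newline)
                   (write-string term out)
                   (write-char ch out))))))
```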
Character Encodings
Internally, all characters and strings in Clozure CL are in UTF-32. Externally, files or socket
streams may encode characters in a wide variety of ways. The International Organization
for Standardization, widely known as ISO, defines many of these character encodings.
Clozure CL implements some of these encodings as detailed below. These encodings are
part of the specification of external formats (see External Formats). When reading from a
stream, characters are converted from the specified external character encoding to UTF-32.
When writing to a stream, characters are converted from UTF-32 to the specified character
encoding.
describe-character-encodings [Function]
Encoding Problems
The presence of a replacement character typically indicates that something got lost in
translation: either data was not encoded properly or there was a bug in the decoding
process.
The endianness of a character encoding is sometimes explicit, and sometimes not. For
example, :utf-16be indicates big-endian, but :utf-16 does not specify endianness. A
byte order mark is a special character that may appear at the beginning of a stream of
encoded characters to specify the endianness of a multi-byte character encoding. (It may
also be used with UTF-8 character encodings, where it is simply used to indicate that the
encoding is UTF-8.)
Clozure CL writes a byte order mark as the first character of a file or socket stream when
the endianness of the character encoding is not explicit. Clozure CL also expects a byte
order mark on input from streams where the endianness is not explicit. If a byte order
mark is missing from input data, that data is assumed to be in big-endian order.
A byte order mark from a UTF-8 encoded input stream is not treated specially and just
appears as a normal character from the input stream. It is probably a good idea to skip over
this character.
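The byte-order-mark rules above can be sketched as follows for UTF-16. Both helpers are hypothetical and only illustrate the stated behavior. The first reads the endianness from an octet vector and falls back to big-endian when no mark is present. The second drops a leading U+FEFF that a UTF-8 stream delivered as an ordinary character.

```lisp
;;; Portable sketch; UTF-16-ENDIANNESS and SKIP-UTF-8-BOM are hypothetical.
(defun utf-16-endianness (octets)
  "Return the byte order of OCTETS and the number of byte-order-mark octets to skip."
  (let ((b0 (and (> (length octets) 0) (aref octets 0)))
        (b1 (and (> (length octets) 1) (aref octets 1))))
    (cond ((and (eql b0 #xFE) (eql b1 #xFF)) (values :big-endian 2))
          ((and (eql b0 #xFF) (eql b1 #xFE)) (values :little-endian 2))
          ;; No byte order mark: assume big-endian, as described above.
          (t (values :big-endian 0)))))

(defun skip-utf-8-bom (string)
  "Drop a leading U+FEFF, which a UTF-8 stream returns as a normal character."
  (if (and (plusp (length string))
           (char= (char string 0) (code-char #xFEFF)))
      (subseq string 1)
      string))
```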
A few commonly used encodings are described here. For the complete list, call
describe-character-encodings. Most encodings have aliases; for example, the encoding named
:iso-8859-1 can also be referred to by the names :latin1 and :ibm819, among
others. Where possible, the keywordized name of an encoding is equivalent to the preferred
MIME charset name (and the aliases are all registered IANA charset names).
:utf-8
Clozure CL uses this encoding for *terminal-io* and for all streams whose
EXTERNAL-FORMAT isn't explicitly specified. The default for *terminal-io* can
be set via the -K command-line argument (see Command Line Options).
:iso-8859-1
An 8-bit, fixed-width character encoding in which all character codes map to their
Unicode equivalents. Intended to support most characters used in most Western
European languages.
ISO-8859-1 covers just the first 256 Unicode code points, of which the first 128 are
equivalent to US-ASCII. This should be essentially equivalent to the behavior of earlier
versions of Clozure CL, which supported only 8-bit characters, but it may not be optimal
for users working in a particular locale.
:us-ascii
A 7-bit, fixed-width character encoding in which all character codes map to their
Unicode equivalents.
:utf-16
Clozure CL provides functions to encode and decode strings to and from vectors of type
(simple-array (unsigned-byte 8)).
Decodes the octets in vector (or the subsequence of it delimited by start and end) into a
string according to external-format.
If string is supplied, output will be written into it. It must be large enough to hold the
decoded characters. If string is not supplied, a new string will be allocated to hold the
decoded characters.
Returns, as multiple values, the decoded string and the position in vector where the
decoding ended.
Sequences of octets in vector that cannot be decoded into characters according to
external-format will be decoded as #\Replacement_Character.
Encodes string (or the substring of it delimited by start and end) into vector according to
external-format. It returns, as multiple values, the vector of octets containing the encoded
data and an integer that specifies the offset into the vector where the encoded data ends.
When use-byte-order-mark is true, a byte-order mark will be included in the encoded data.
If vector-offset is supplied, data will be written into the output vector starting at that offset.
Characters in string that cannot be encoded into external-format will be replaced with an
encoding-dependent replacement character (either #\Replacement_Character or
#\Sub) before being encoded and written into the output vector.
When use-byte-order-mark is true, the returned size will include the space needed for a
byte-order mark.
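The calling conventions above can be illustrated with a portable sketch for ISO-8859-1, where each octet maps directly to a code point. These conventions are the optional start and end bounds, the optional preallocated output, and the two return values. latin1-decode and latin1-encode are hypothetical names, not Clozure CL's functions. The sketch assumes #\Sub (code 26) as the replacement for characters above code 255.

```lisp
;;; Portable sketch of the documented protocol; LATIN1-DECODE and LATIN1-ENCODE
;;; are hypothetical and support only ISO-8859-1.
(defun latin1-decode (vector &key (start 0) (end (length vector)) string)
  "Decode VECTOR[START,END) into STRING (or a fresh string).
Return the string and the position in VECTOR where decoding ended."
  (let ((out (or string (make-string (- end start)))))
    (loop for i from start below end
          for j from 0
          do (setf (char out j) (code-char (aref vector i))))
    (values out end)))

(defun latin1-encode (string &key (start 0) (end (length string))
                                  vector (vector-offset 0))
  "Encode STRING[START,END) into VECTOR starting at VECTOR-OFFSET.
Characters above code 255 are replaced by #\\Sub (code 26), an assumption of
this sketch. Return the octet vector and the offset where the data ends."
  (let* ((n (- end start))
         (out (or vector
                  (make-array (+ vector-offset n) :element-type '(unsigned-byte 8)))))
    (loop for i from start below end
          for j from vector-offset
          for code = (char-code (char string i))
          do (setf (aref out j) (if (< code 256) code 26)))
    (values out (+ vector-offset n))))
```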
Pathnames
Pathname Expansion
Predefined Logical Hosts
Pathname Namestrings
Pathnames
Pathname Expansion
Leading tilde (~) characters in physical pathname namestrings are expanded in the way
that most shells do: “~user/...” can be used to refer to an absolute pathname rooted at
the home directory of the user named “user”, and “~/...” can be used to refer to an
absolute pathname rooted at the home directory of the current user.
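A portable sketch of the “~/” case follows. expand-home-tilde is a hypothetical helper. It handles only the current user's home directory. The “~user” form needs an operating-system lookup of the named user's home directory, so it is left unchanged here.

```lisp
;;; Portable sketch; EXPAND-HOME-TILDE is hypothetical and handles only "~/".
(defun expand-home-tilde (namestring
                          &optional (home (namestring (user-homedir-pathname))))
  "If NAMESTRING starts with \"~/\", replace that prefix with HOME (ending in /)."
  (if (and (>= (length namestring) 2)
           (string= "~/" namestring :end2 2))
      (concatenate 'string home (subseq namestring 2))
      namestring))
```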
Predefined Logical Hosts
Clozure CL sets up logical pathname translations for two logical hosts: ccl and home.
The ccl logical host is meant to refer to the ccl directory. Clozure CL uses it for a variety
of purposes, including locating Clozure CL source code, require and provide, accessing
foreign function information, and the Clozure CL build process. It is set to the value of
the environment variable CCL_DEFAULT_DIRECTORY, if that variable exists. Otherwise, it is
set to the directory containing the heap image file.
Pathname Namestrings
When generating a namestring from a pathname object (as happens, for example, when
printing a pathname), Clozure CL tries to avoid some potential ambiguity by escaping
characters that might otherwise be used to separate pathname components. The character
used to quote or escape the separators is a backslash on Unix systems and a #\> character
on Windows. So, for example, the namestring “a\\.b.c” (written here as a Lisp string
literal) has name “a.b” and type “c”, whereas “a.b\\.c” has name “a” and type “b.c”.
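The splitting rule can be sketched as a small parser. It treats the escape character as quoting the next character and splits name from type at the last unescaped dot. split-name-and-type is hypothetical and uses the Unix backslash escape by default. It is not Clozure CL's namestring parser.

```lisp
;;; Portable sketch; SPLIT-NAME-AND-TYPE is hypothetical.
(defun split-name-and-type (component &key (escape #\\))
  "Return the unescaped name and type of COMPONENT as two values.
The type is whatever follows the last unescaped dot, or NIL if there is none."
  (let ((out (make-string-output-stream))
        (count 0)        ; characters written to OUT so far
        (last-dot nil)   ; index in OUT of the last unescaped dot
        (escaped nil))   ; true right after an escape character
    (loop for ch across component
          do (cond (escaped
                    (write-char ch out)
                    (incf count)
                    (setf escaped nil))
                   ((char= ch escape)
                    (setf escaped t))
                   (t
                    (when (char= ch #\.)
                      (setf last-dot count))
                    (write-char ch out)
                    (incf count))))
    (let ((plain (get-output-stream-string out)))
      (if last-dot
          (values (subseq plain 0 last-dot) (subseq plain (1+ last-dot)))
          (values plain nil)))))
```

For instance, (split-name-and-type "a\\.b.c") returns "a.b" and "c", matching the first example in the text.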
To get a native namestring suitable for passing to an operating system command, use the
function native-translated-namestring.
This function returns a namestring that represents a pathname using the native
conventions of the operating system. Any quoting or escaping of special characters will be
removed.
Lisp strings are not interchangeable with C strings. Clozure CL provides a reasonably
straightforward way to translate a Lisp native namestring into a C-style string suitable for
passing to a foreign function.
For example, one might use this macro in the following way:
Various operating systems have different conventions for how they expect native pathname
strings to be encoded. Darwin expects them to be decomposed UTF-8. The Unicode variants
of Windows file-handling functions expect UTF-16. Other systems just treat them as
opaque byte sequences. This macro ensures that the correct encoding is used, whatever the
host operating system.
pathname-encoding-name [Function]
:files
:directories
:all
If true, includes files and directories whose names start with a dot character in the
output. (But note that entries named “.” or “..” are never included.) Defaults to t.
:follow-links
If true, includes the truenames of symbolic or hard links in the output; if false,
includes the link filenames without attempting to resolve them. Defaults to t.
Note that legacy HFS alias files are treated as plain files.
:test
A function of one argument (a pathname) which should return true if the pathname
should be included in the output.
:include-emacs-lockfiles
This function looks up the value of the environment variable denoted by the string name
and returns its value as a string. If there is no such environment variable, then nil is
returned.
This function sets the operating system environment variable denoted by the string name
to the string value. If the environment variable is successfully set, 0 is returned. Otherwise,
a platform-specific integer error code is returned.
This function deletes the operating system environment variable denoted by the string
name.
Wait for the signal with signal number sig to be received, or until duration seconds have
elapsed. If duration is nil, wait for an indeterminate very long time (many years).
If sig is outside the range of valid signals, or reserved by Clozure CL for its own use, an
error is signaled. An error is always signaled on Windows systems.
Cleanly exit from lisp. If exit is a value of type (signed-byte 32), that value will be
passed to the C library function _exit() as the status code. A value of nil is treated as a
zero.
Alternatively, exit may be a function of no arguments. This function will be called instead
of _exit() to exit the lisp.
*command-line-argument-list* [Variable]
A list of strings decoded from the argument vector passed to the lisp process (as argv[])
by the operating system. The foreign C strings are assumed to be UTF-8 encoded.
*unprocessed-command-line-arguments* [Variable]
A list of strings that denotes the command-line arguments that remain after the lisp has
processed and removed arguments that it interprets itself.
Overview
Sockets Dictionary
Overview
IPv6 is supported by the :internet6 address family. Applications should use the
resolve-address function to translate host and port specifications to socket addresses.
While host and port numbers can still be dealt with separately, it is preferable to use
ccl::socket-address instances to specify socket endpoints, which gives unified parsing
and printing of string representations.
All symbols mentioned in this chapter are exported from the CCL package. As of version
0.13, these symbols are additionally exported from the OPENMCL-SOCKET package.
Clozure CL supports three types of sockets: TCP sockets, UDP sockets, and Unix-domain
sockets. This should be enough for all but the most esoteric network situations. All sockets
are created by make-socket. The type of socket depends on the arguments to it, as
follows:
tcp-stream
file-socket-stream
listener-socket
A passive socket used to listen for incoming TCP/IP connections on a particular port.
A listener-socket is not a stream. It doesn't support I/O. It can only be used to create
new tcp-streams by accept-connection. Created by (make-socket :type :stream
:connect :passive ...)
file-listener-socket
A passive socket used to listen for incoming UNIX domain connections named by a
file in the local filesystem. A file-listener-socket is not a stream. It doesn't support I/O.
It can only be used to create new file-socket-streams by accept-connection. Created by
(make-socket :address-family :file :type :stream :connect :passive ...)
udp-socket
Sockets Dictionary
Make a socket.
address-family
The address/protocol family of this socket. Currently, :internet (the default), meaning
IPv4, :internet6, meaning IPv6, and :file, referring to UNIX domain addresses, are
supported.
type
connect
This argument is only relevant to sockets of type :stream. One of :active (the default)
to request a connected stream (a tcp-stream or file-socket-stream), or :passive to
request a file or TCP listener socket.
eol
This argument is currently ignored (it is accepted for compatibility with Franz
Allegro).
format
One of :text (the default), :binary, or :bivalent. This argument is ignored for :stream
sockets for now, as :stream sockets are currently always bivalent (i.e. they support
both character and byte I/O). For :datagram sockets, this argument is ignored (the
format of a datagram socket is always :binary).
remote
For TCP streams, it specifies the socket address to connect to, specified as a
socket-address instance. Ignored for listener sockets. For UDP sockets, it can be used to
specify a default address for subsequent calls to send-to or receive-from.
remote-host
For TCP streams, it specifies the host to connect to (in any format acceptable to
resolve-address). Ignored for listener sockets. For UDP sockets, it can be used to
specify a default host for subsequent calls to send-to or receive-from.
remote-port
For TCP streams, it specifies the port to connect to (in any format acceptable to
resolve-address). Ignored for listener sockets. For UDP sockets, it can be used to
specify a default port for subsequent calls to send-to or receive-from.
remote-filename
For file-socket streams, it specifies the name of a file in the local filesystem (e.g., NOT
mounted via NFS, AFP, SMB, ...) which names and controls access to a UNIX-domain
socket.
local-address
Allows you to specify a local address for a listener or UDP socket, for the case where
you want to restrict connections to those coming to a specific local address for
security reasons.
local-host
Allows you to specify a local host address for a listener or UDP socket, for the case
where you want to restrict connections to those coming to a specific local address for
security reasons.
local-port
Specify a local port for a socket. Most useful for listener sockets, where it is the port
on which the socket will listen for connections.
local-filename
For file-listener-sockets, specifies the name of a file in the local filesystem which is
used to name a UNIX-domain socket. The actual filesystem file should not previously
exist when the file-listener-socket is created; its parent directory should exist and be
writable by the caller. The file used to name the socket will be deleted when the
file-listener-socket is closed.
keepalive
reuse-address
If true, allows the reuse of local ports in listener sockets, overriding some TCP/IP
protocol specifications. You will need this if you are debugging a server.
nodelay
If true, disables Nagle's algorithm, which tries to minimize TCP packet fragmentation
by introducing transmission delays in the absence of replies. Try setting this if you
are using a protocol which involves sending a steady stream of data with no replies
and are seeing significant degradations in throughput.
broadcast
linger
If specified and non-nil, should be the number of seconds the OS is allowed to wait
for data to be pushed through when a close is done. Only relevant for TCP sockets.
backlog
For a listener socket, specifies the number of connections which can be pending but
not accepted. The default is 5, which is also the maximum on some operating
systems.
input-timeout
The number of seconds before an input operation times out. Must be a real number
between zero and one million. If an input operation takes longer than the specified
number of seconds, an input-timeout error is signalled. (see Stream Timeouts and
Deadlines)
output-timeout
The number of seconds before an output operation times out. Must be a real number
between zero and one million. If an output operation takes longer than the specified
number of seconds, an output-timeout error is signalled. (see Stream Timeouts
and Deadlines)
connect-timeout
The number of seconds before a connection attempt times out. [TODO: what are
acceptable values?] If a connection attempt takes longer than the specified number of
seconds, a socket-error is signalled. This can be useful if the specified interval is
shorter than the interval that the OS's socket layer imposes, which is sometimes a
minute or two.
auto-close
When non-nil, any resulting socket stream will be closed when the GC can prove that
the stream is unreferenced. This is done via CCL's termination mechanism [TODO
add xref].
deadline
Creates and returns a new socket. For :passive sockets, the :local-address, :local-port or
:local-filename arguments are required, depending on the type of the socket. For :active
sockets, either the :remote-address, the :remote-host and :remote-port, or the
:remote-filename arguments must be present, depending on the socket type.
Extracts the first connection on the queue of pending connections for socket, accepts it (i.e.
completes the connection startup protocol), and returns a new tcp-stream or
file-socket-stream representing the newly established connection. The tcp stream inherits
any properties of the listener socket that are relevant (e.g. :keepalive, :nodelay, and so
forth.) The original listener socket continues to be open listening for more connections, so
you can call accept-connection on it again.
If :wait is t and there are no connections waiting to be accepted, the function waits until
one arrives. If :wait is nil and no connection is pending, accept-connection returns nil
immediately.
host
Specification of the host, as a string. This can be either a host name such as
“clozure.com” or any of the literal address forms accepted by getaddrinfo().
port
Specification of the port. This can be either a service name such as “http” or a port
number.
socket-type
Service type for port lookups, can be either :stream for TCP services or :datagram for
UDP. Defaults to :stream.
connect
address-family
Specifies the address family that should be returned, can be specified as either
:internet or :internet6. If it is specified, only addresses of that family are returned.
numeric-host-p
If this argument is true, no host name lookups will be performed for the host address.
A numeric address literal must be passed in this case.
numeric-port-p
If this argument is true, no service name lookups will be performed for the port
address. A numeric port number must be passed in this case.
singlep
If this argument is set to a true value, which is the default, only the first matching
address is returned. If it is passed as NIL, all matching addresses are returned as a
list.
errorp
If this argument is set to a true value, which is the default, an error is signalled if the
given host and/or port combination did not yield any matches. If it is passed as NIL,
the function returns NIL if no addresses matched the supplied arguments.
Converts dotted, which should be a dotted quad string such as “192.168.0.1”, into an integer
representation. If :errorp is true, an error is signaled if dotted is invalid. Otherwise, nil
is returned.
This function converts ipaddr, an integer representing an IPv4 host, into a dotted quad
string. If :values is true, instead of a dotted quad string, it returns the four octets of the
address as multiple values.
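Converting between a dotted quad and its 32-bit integer form is plain arithmetic on four octets: a.b.c.d denotes a*2^24 + b*2^16 + c*2^8 + d. The hypothetical functions below sketch that conversion portably, including the :errorp and :values behavior described above. They are illustrations, not the Clozure CL definitions.

```lisp
;;; Portable sketch; DOTTED-TO-INTEGER and INTEGER-TO-DOTTED are hypothetical.
(defun dotted-to-integer (dotted &key (errorp t))
  "Parse a dotted quad such as \"192.168.0.1\" into a 32-bit unsigned integer."
  (flet ((fail ()
           (if errorp
               (error "Invalid dotted quad: ~S" dotted)
               (return-from dotted-to-integer nil))))
    (let ((parts '()) (start 0))
      ;; Split DOTTED at each #\. into a list of substrings.
      (loop for dot = (position #\. dotted :start start)
            do (push (subseq dotted start dot) parts)
               (if dot (setf start (1+ dot)) (loop-finish)))
      (setf parts (nreverse parts))
      (unless (= (length parts) 4) (fail))
      ;; Shift each octet into place, most significant first.
      (let ((result 0))
        (dolist (p parts result)
          (let ((octet (and (plusp (length p))
                            (every #'digit-char-p p)
                            (parse-integer p))))
            (unless (and octet (<= octet 255)) (fail))
            (setf result (+ (ash result 8) octet))))))))

(defun integer-to-dotted (ipaddr &key values)
  "Return IPADDR as a dotted quad string, or as four octet values if VALUES is true."
  (let ((octets (list (ldb (byte 8 24) ipaddr) (ldb (byte 8 16) ipaddr)
                      (ldb (byte 8 8) ipaddr) (ldb (byte 8 0) ipaddr))))
    (if values
        (values-list octets)
        (format nil "~{~D~^.~}" octets))))
```

For example, “192.168.0.1” corresponds to 3232235521, the same value used as an example for lookup-hostname.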
Converts ipaddr, a 32-bit unsigned integer, into a host name string. The keyword argument
:ignore-cache is ignored (it is accepted for compatibility with Franz Allegro CL).
Converts host-spec into a 32-bit unsigned IPv4 address. IPv6-enabled applications should
use the resolve-address function instead.
Acceptable formats for host-spec include a host name string such as “www.clozure.com”, a
dotted address string such as “192.168.0.1”, or a 32-bit unsigned IPv4 address such as
3232235521.
Finds the numeric port number for the specified port and protocol. Port can be a string
such as “http”, a symbol such as :http, or a port number. Note that strings are
case-sensitive. Symbol names are converted to lower-case before lookup. Protocol must be
one of “tcp” or “udp”.
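The normalization rules for port specifications can be sketched as follows. sketch-lookup-port and *sketch-services* are hypothetical. A real lookup consults the system's services database, while this sketch uses a tiny hard-coded table. It shows the documented rules: integers pass through, symbol names are lower-cased, and string lookups are case-sensitive.

```lisp
;;; Portable sketch; *SKETCH-SERVICES* stands in for the system services database.
(defparameter *sketch-services*
  '((("http" "tcp") . 80) (("https" "tcp") . 443) (("domain" "udp") . 53)))

(defun sketch-lookup-port (port protocol)
  "Return the numeric port for PORT (integer, string or symbol) under PROTOCOL."
  (assert (member protocol '("tcp" "udp") :test #'string=) (protocol)
          "PROTOCOL must be \"tcp\" or \"udp\", not ~S" protocol)
  (etypecase port
    (integer port)
    ;; Symbol names are lower-cased before lookup.
    (symbol (sketch-lookup-port (string-downcase (symbol-name port)) protocol))
    ;; EQUAL compares strings case-sensitively.
    (string (or (cdr (assoc (list port protocol) *sketch-services* :test #'equal))
                (error "Unknown service ~S/~A" port protocol)))))
```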
receive-from (socket udp-socket) size code &key buffer extract offset want-socket-address-p
[Function]
Reads a UDP packet from socket. If no packets are available, waits for a packet to arrive.
3. The 32-bit unsigned IPv4 address or the 16-byte IPv6 address of the sender of the
data
socket
size
Maximum number of bytes to read. If the packet is larger than this, any extra bytes
are discarded.
buffer
If specified, must be an octet vector which will be used to read in the data. If not
specified, a new buffer will be created (of type determined by socket-format).
extract
If true, the subsequence of the buffer corresponding only to the data read in is
extracted and returned as the first value. If false (the default) the original buffer is
returned even if it is only partially filled.
offset
Specifies the start offset into the buffer at which data is to be stored. The default is 0.
want-socket-address-p
Indicates that the address of the sender of the data should be returned as a
ccl::socket-address instance rather than as separate host and port values.
send-to (socket udp-socket) buffer size code &key remote remote-host remote-port
offset [Function]
socket
buffer
size
remote
remote-host
The host to send the packet to, in any format acceptable to lookup-hostname. The
default is the remote host specified in the call to make-socket.
remote-port
The port to send the packet to, in any format acceptable to lookup-port. The default is
the remote port specified in the call to make-socket.
offset
Shuts down part of a bidirectional connection represented by socket. Typically socket will
be a tcp-stream. One situation where this can be useful is when you need to read responses
after sending an end-of-file signal. The keyword argument :direction may be either
:input (to disallow further input) or :output (to disallow further output).
Returns the native OS's representation of socket, or nil if the socket is closed. On Unix,
this will be a file descriptor. Note that it is rather dangerous to mess around with
tcp-stream file descriptors, as there is all sorts of buffering and asynchronous I/O going on
above the OS level. listener-socket and udp-socket file descriptors are safer to mess with
directly as there is less magic going on.
Returns the 32-bit unsigned IPv4 address of the remote host, or nil if socket is not
connected.
Returns the remote port number of socket, or NIL if socket is not connected.
Returns 32-bit unsigned IPv4 address or the 16 byte IPv6 address of the local host for
socket.
Returns :internet, :internet6 or :file, as appropriate for thing, which should be a socket or
socket-address.
Returns the host portion of the given socket-address. For :internet addresses, this is a
32-bit integer. For :internet6 addresses, a vector of 16 bytes is returned. For :file
addresses, it is the file name string.
This function returns the port number of the given socket-address. This function is
available only for :internet and :internet6 socket addresses.
This function examines socket and returns :active for a tcp-stream, :passive for a
listener-socket, and nil for a udp-socket.
socket-address [Class]
This class is a representation of a socket endpoint address. Instances of this class are used
to encapsulate the host and port of an IP socket endpoint or the filename of a file socket.
They can be created by applications from a possibly symbolic address representation by the
resolve-address function.
socket-error [Class]
Returns a symbol representing the socket error code contained in socket-error. It will be
one of :address-in-use, :connection-aborted, :no-buffer-space,
:connection-timed-out, :connection-refused, :host-unreachable,
:host-down, :network-down, :address-not-available, :network-reset,
Returns a string describing the context where the error socket-error happened. On Linux,
this is the name of the system call which returned the error.
Close socket, releasing the operating system resources associated with it. Normally, any
pending buffered I/O will be finished up before closing, but if :abort is t, any pending
I/O will be aborted. Note that for listener and udp sockets, there is never any buffered I/O
to clean up, so the value of :abort is effectively ignored.
Threads Overview
(Intentionally) Missing Functionality
Implementation Decisions and Open Questions
Thread Stack Sizes
As of August 2003:
Porting Code from the Old Thread Model
Background Terminal Input
Overview
An example
A more elaborate example.
Summary
The Threads which Clozure CL Uses for Its Own Purposes
Threads Dictionary
Threads Overview
Wherever possible, I'll try to use the term "thread" to denote a lisp thread, even though
many of the functions in the API have the word "process" in their name. A lisp-process is a
lisp object (of type CCL:PROCESS) which is used to control and communicate with an
underlying native thread. Sometimes, the distinction between these two (quite different)
objects can be blurred; other times, it's important to maintain.
Lisp threads share the same address space, but maintain their own execution context
(stacks and registers) and their own dynamic binding context.
Under Clozure CL's cooperative scheduling model, it was possible (via the use of the
CCL:WITHOUT-INTERRUPTS construct) to defer handling of the periodic interrupt that
invoked the lisp scheduler; it was not uncommon to use WITHOUT-INTERRUPTS to gain
safe, exclusive access to global data structures. In some code (including much of Clozure
CL itself) this idiom was very common: it was (justifiably) believed to be an efficient way of
inhibiting the execution of other threads for a short period of time.
The timer interrupt that drove the cooperative scheduler was only able to
(pseudo-)preempt lisp code: if any thread called a blocking OS I/O function, no other
thread could be scheduled until that thread resumed execution of lisp code. Lisp library
functions were generally attuned to this constraint, and did a complicated mixture of
polling and "timed blocking" in an attempt to work around it. Needless to say, this code is
complicated and less efficient than it might be; it meant that the lisp was a little busier than
it should have been when it was "doing nothing" (waiting for I/O to be possible.)
For a variety of reasons - better utilization of CPU resources on single and multiprocessor
systems and better integration with the OS in general - threads in Clozure CL 0.14 and later
are preemptively scheduled. In this model, lisp threads are native threads and all
scheduling decisions involving them are made by the OS kernel. (Those decisions might
involve scheduling multiple lisp threads simultaneously on multiple processors on SMP
systems.) This change has a number of subtle effects:
it is possible for two (or more) lisp threads to be executing simultaneously, possibly
trying to access and/or modify the same data structures. Such access really should
have been coordinated through the use of synchronization objects regardless of the
scheduling model in effect; preemptively scheduled threads increase the chance of
things going wrong at the wrong time and do not offer lightweight alternatives to the
use of those synchronization objects.
there is no simple and efficient way to "inhibit the scheduler" or otherwise gain
exclusive access to the entire CPU.
There are a variety of simple and efficient ways to synchronize access to particular
data structures.
As a broad generalization: code that's been aggressively tuned to the constraints of the
cooperative scheduler may need to be redesigned to work well with the preemptive
scheduler (and code written to run under Clozure CL's interface to the native scheduler
may be less portable to other CL implementations, many of which offer a cooperative
scheduler and an API similar to that of Clozure CL versions before 0.14). At the same
time, there's a large overlap in functionality in the two scheduling models, and it'll
hopefully be possible to write interesting and useful MP code that's largely independent
of the underlying scheduling details.
Much of the functionality described above is similar to that provided by Clozure CL's
cooperative scheduler, some other parts of which make no sense in a native threads
implementation.
There were a number of primitives for maintaining process queues; that's now the
OS's job.
When you use MAKE-PROCESS to create a thread, you can specify a stack size. Clozure CL
does not impose a limit on the stack size you choose, but there is some evidence that
choosing a stack size larger than the operating system's limit can cause excessive paging
activity, at least on some operating systems.
The maximum stack size is operating-system-dependent. You can use shell commands to
determine what it is on your platform. In bash, use "ulimit -s -H" to find the limit; in tcsh,
use "limit -h s".
This issue does not affect programs that create threads using the default stack size, which
you can do either by specifying no value for the :stack-size argument to MAKE-PROCESS,
or by specifying the value CCL::*default-control-stack-size*.
If your program creates threads with a specified stack size, and that size is larger than the
OS-specified limit, you may want to consider reducing the stack size in order to avoid
possible excessive paging activity.
As of August 2003:
It has traditionally been possible to reset and enable a process that's "exhausted". (As
used here, the term "exhausted" means that the process's initial function has run and
returned and the underlying native thread has been deallocated.) One of the principal
uses of PROCESS-RESET is to "recycle" threads; enabling an exhausted process
involves creating a new native thread (and stacks and synchronization objects and
...), and this is the sort of overhead that such a recycling scheme is seeking to avoid. It
might be worth trying to tighten things up and declare that it's an error to apply
PROCESS-ENABLE to an exhausted thread (and to make PROCESS-ENABLE detect
this error.)
When native threads that aren't created by Clozure CL first call into lisp, a "foreign
process" is created, and that process is given its own set of initial bindings and set up
to look mostly like a process that had been created by MAKE-PROCESS. The life cycle
of a foreign process is certainly different from that of a lisp-created one: it doesn't
make sense to reset/preset/enable a foreign process, and attempts to perform these
operations should be detected and treated as errors.
Older versions of Clozure CL used what are often called "user-mode threads", a less
versatile threading model which does not require specific support from the operating
system. This section discusses how to port code which was written for that model.
It's hard to give step-by-step instructions; there are certainly a few things that one should
look at carefully:
I've only seen one case where a process's "run reasons" were used to communicate
information as well as to control execution; I don't think that this is a common idiom,
but may be mistaken about that.
It's certainly possible that programs written for cooperatively scheduled lisps that
have run reliably for a long time have done so by accident: resource-contention issues
tend to be timing-sensitive, and decoupling thread scheduling from lisp program
execution affects timing. I know that there is or was code in both Clozure CL and
commercial MCL that was written under the explicit assumption that certain
sequences of open-coded operations were uninterruptible; it's certainly possible that
the same assumptions have been made (explicitly or otherwise) by application
developers.
Overview
Unless and until Clozure CL provides alternatives (via window streams, telnet streams, or
some other mechanism) all lisp processes share a common *TERMINAL-IO* stream (and
therefore share *DEBUG-IO*, *QUERY-IO*, and other standard and internal interactive
streams.)
It's anticipated that most lisp processes other than the "Initial" process run mostly in the
background. If a background process writes to the output side of *TERMINAL-IO*, that
may be a little messy and a little confusing to the user, but it shouldn't really be
catastrophic. All I/O to Clozure CL's buffered streams goes through a locking mechanism that
prevents the worst kinds of resource-contention problems.
Although the problems associated with terminal output from multiple processes may be
mostly cosmetic, the question of which process receives input from the terminal is likely to
be a great deal more important. The stream locking mechanisms can make a confusing
situation even worse: competing processes may "steal" terminal input from each other
unless locks are held longer than they otherwise need to be, and holding locks longer than
necessary (as when a process is merely waiting for input to become available on an
underlying file descriptor) keeps every other process away from the stream in the meantime.
Even if background processes rarely need to intentionally read input from the terminal,
they may still need to do so in response to errors or other unanticipated situations. There
are tradeoffs involved in any solution to this problem. The protocol described below allows
background processes which follow it to reliably prompt for and receive terminal input.
Background processes which attempt to receive terminal input without following this
protocol will likely hang indefinitely while attempting to do so. That's certainly a harsh
tradeoff, but since attempts to read terminal input without following this protocol only
worked some of the time anyway, it doesn't seem to be an unreasonable one.
In the solution described here (and introduced in Clozure CL 0.9), the internal stream used
to provide terminal input is always locked by some process (the "owning" process.) The
initial process (the process that typically runs the read-eval-print loop) owns that stream
when it's first created. By using the macro WITH-TERMINAL-INPUT, background
processes can temporarily obtain ownership of the terminal and relinquish ownership to
the previous owner when they're done with it.
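A minimal sketch of a background thread that follows this protocol. The helper name ask-for-number is hypothetical (not part of Clozure CL), and the sketch assumes the default terminal-only listener, where the user yields the terminal with the :Y command described in the dictionary below.

```lisp
;; Hypothetical helper (not part of Clozure CL): prompt for and READ one
;; Lisp object from the terminal while this thread owns terminal input.
(defun ask-for-number (prompt)
  (ccl:with-terminal-input
    ;; Inside the body this thread owns terminal input; on exit, ownership
    ;; is relinquished to the previous owner.
    (format *terminal-io* "~&~A " prompt)
    (force-output *terminal-io*)
    (read *terminal-io*)))

;; From the listener, start it in the background:
;;   (ccl:process-run-function "asker" #'ask-for-number "Enter a number:")
;; then type (:y N) at the ? prompt, where N is the asker's process number.
```

Code that calls READ on *TERMINAL-IO* from a background thread without this wrapper is exactly the kind of code the protocol is meant to replace.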
An example
?
;;
;; Process sleeper(1) needs access to terminal input.
;;
(This example was run under ILISP, which often gets confused if one tries to enter input
when "point" doesn't follow a prompt. Entering a "simple" expression at this point, such as
the () below, gets it back in sync; that's otherwise not relevant to this example.)
()
NIL
? (:y 1)
;;
;; process sleeper(1) now controls terminal input
;;
> Break in process sleeper(1): broken
> While executing: #<Anonymous Function #x3063B276>
> Type :GO to continue, :POP to abort.
> If continued: Return from BREAK.
Type :? for other options.
1 > :b
(30C38E30) : 0 "Anonymous Function #x3063B276" 52
(30C38E40) : 1 "Anonymous Function #x304984A6" 376
(30C38E90) : 2 "RUN-PROCESS-INITIAL-FORM" 340
(30C38EE0) : 3 "%RUN-STACK-GROUP-FUNCTION" 768
1 > :pop
;;
;; control of terminal input restored to process Initial(0)
;;
?
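The transcript does not show the forms that created its background threads. One way to provoke a similar announcement, assuming the default NIL value of *request-terminal-input-via-break*, is a thread that sleeps and then calls BREAK; start-sleeper is a hypothetical name, and this is a sketch rather than the session's actual input.

```lisp
;; Hypothetical helper: start a background thread that sleeps for SECONDS,
;; then calls BREAK. The break loop needs terminal input, so the thread
;; announces that it needs the terminal and waits for (:y N).
(defun start-sleeper (seconds &optional (name "sleeper"))
  (ccl:process-run-function name
                            (lambda ()
                              (sleep seconds)
                              (break "broken"))))
```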
If a background process ("A") needs access to the terminal input stream and that stream is
owned by another background process ("B"), process "A" announces that fact, then waits
until the initial process regains control.
? ;;
;; Process sleep-5(2) needs access to terminal input.
;;
()
NIL
? (:y 2)
;;
;; process sleep-5(2) now controls terminal input
;;
> Break in process sleep-5(2): quicker
> While executing: #<Anonymous Function #x3063CFDE>
> Type :GO to continue, :POP to abort.
> If continued: Return from BREAK.
Type :? for other options.
1 > ;; Process sleep-60(1) will need terminal access when
;; the initial process regains control of it.
;;
()
NIL
1 > :pop
;;
;; Process sleep-60(1) needs access to terminal input.
;;
;;
;; control of terminal input restored to process Initial(0)
;;
? (:y 1)
;;
;; process sleep-60(1) now controls terminal input
;;
> Break in process sleep-60(1): Huh?
> While executing: #<Anonymous Function #x3063BE5E>
> Type :GO to continue, :POP to abort.
> If continued: Return from BREAK.
Type :? for other options.
1 > :pop
;;
;; control of terminal input restored to process Initial(0)
;;
Summary
The longer-term fix would probably involve using network or window-system streams to
give each process unique instances of *TERMINAL-IO*.
Existing code that attempts to read from *TERMINAL-IO* from a background process will
need to be changed to use WITH-TERMINAL-INPUT. Since that code was probably not
working reliably in previous versions of Clozure CL, this requirement doesn't seem to be
too onerous.
Note that WITH-TERMINAL-INPUT both requests ownership of the terminal input stream
and promises to restore that ownership to the initial process when it's done with it. An ad
hoc use of READ or READ-CHAR doesn't make this promise; this is the rationale for the
restriction on the :Y command.
In the terminal-only environment, Clozure CL starts out with two lisp-level threads:
? :proc
1 : -> listener [Active]
0 : Initial [Active]
If you look at a running Clozure CL with a debugging tool, such as GDB, or Apple's Thread
Viewer.app, you'll see an additional kernel-level thread on Darwin; this is used by the Mach
exception-handling mechanism.
The initial thread, conveniently named "initial", is the one that was created by the
operating system when it launched Clozure CL. It maps the heap image into memory, does
some Lisp-level initialization, and, when the Cocoa IDE isn't being used, creates the thread
"listener", which runs the top-level loop that reads input, evaluates it, and prints the result.
After the listener thread is created, the initial thread does "housekeeping": it sits in a loop,
sleeping most of the time and waking up occasionally to do "periodic tasks". These tasks
include forcing output on specified interactive streams, checking for and handling
control-C interrupts, etc. Currently, those tasks also include polling for the exit status of
external processes and handling some kinds of I/O to and from those processes.
In this environment, the initial thread does these "housekeeping" activities as necessary,
until ccl:quit is called; quitting interrupts the initial thread, which then ends all other
threads in as orderly a fashion as possible and calls the C function #_exit.
The Cocoa features use more threads. Adding a Cocoa listener creates two threads:
? :proc
3 : -> Listener [Active]
2 : housekeeping [Active]
1 : listener [Active]
0 : Initial [Active]
The Cocoa event loop has to run in the initial thread; when the event loop starts up, it
creates a new thread to do the "housekeeping" tasks which the initial thread would do in
the terminal-only mode. The initial thread then becomes the one to receive all Cocoa events
from the window server; it's the only thread which can.
It also creates one "Listener" (capital-L) thread for each listener window; each such thread
lasts as long as its window does. So, if you open a second listener, you'll see five threads
altogether:
? :proc
4 : -> Listener-2 [Active]
3 : Listener [Active]
2 : housekeeping [Active]
1 : listener [Active]
0 : Initial [Active]
Unix signals, such as SIGINT (control-C), invoke a handler installed by the Lisp kernel.
Although the OS doesn't make any specific guarantee about which thread will receive the
signal, in practice, it seems to be the initial thread. The handler just sets a flag and returns;
the housekeeping thread (which may be the initial thread, if Cocoa's not being used) will
check for the flag and take whatever action is appropriate to the signal.
In the case of SIGINT, the action is to make the interrupted thread enter a break loop.
When there's more than one Lisp listener active, it's not always clear which thread that
should be, since it really depends on the user's intentions, which there's no way to divine
programmatically. To make its best guess, the handler first checks whether the value of
ccl:*interactive-abort-process* is a thread, and, if so, uses it. If that fails, it chooses the
thread which currently "owns" the default terminal input stream; see the discussion of
background terminal input above.
This thread-per-window scheme makes many things simpler, including the process of
entering a "recursive command loop" in commands like "Incremental Search Forward", etc.
(It might be possible to handle all Hemlock commands in the Cocoa event thread, but these
"recursive command loops" would have to maintain a lot of context/state information;
threads are a straightforward way of maintaining that information.)
Currently (August 2004), when a dedicated thread needs to alter the contents of the buffer
or the selection, it does so by invoking methods in the initial thread, for synchronization
purposes, but this is probably overkill and will likely be replaced by a more efficient scheme
in the future.
The per-window thread could probably take more responsibility for drawing and handling
the screen than it currently does; something needs to be done to buffer screen updates a
bit better in some cases: you don't need to see everything that happens during something
like indentation; you do need to see the results...
When Hemlock is being used, listener windows are editor windows, so in addition to each
"Listener" thread, you should also see a thread which handles Hemlock command
processing.
The Cocoa runtime may make additional threads in certain special situations; these threads
usually don't run lisp code, and rarely if ever run much of it.
Threads Dictionary
all-processes [Function]
Returns a fresh list of all lisp processes (threads) known to Clozure CL as of the precise
instant it's called. Since other threads can create and kill threads at any time, there's no
way to get a perfectly accurate list of all threads.
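A small sketch of using the result; current-process-names is a hypothetical helper. Because the list is a snapshot, threads may appear or disappear immediately after the call.

```lisp
;; Hypothetical helper: names of the threads that exist at the moment of the call.
(defun current-process-names ()
  (mapcar #'ccl:process-name (ccl:all-processes)))
```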
make-process [Function]
name
persistent
priority
class
initargs
stack-size
the size, in bytes, of the newly-created process's control stack; used for foreign
function calls and to save function return address context. The default is
CCL:*DEFAULT-CONTROL-STACK-SIZE*.
vstack-size
the size, in bytes, of the newly-created process's value stack; used for lisp function
arguments, local variables, and other stack-allocated lisp objects. The default is
CCL:*DEFAULT-VALUE-STACK-SIZE*.
tstack-size
the size, in bytes, of the newly-created process's temp stack; used for the allocation of
dynamic-extent objects. The default is CCL:*DEFAULT-TEMP-STACK-SIZE*.
use-standard-initial-bindings
when true, the global "standard initial bindings" are put into effect in the new thread
before any bindings specified by :initial-bindings are. See DEF-STANDARD-INITIAL-BINDING.
The default is t.
initial-bindings
an alist of (symbol . valueform) pairs, which can be used to initialize special variable
bindings in the new thread. Each valueform is used to compute the value of a new
binding of symbol in the execution environment of the newly-created thread. The
default is nil.
process
Creates and returns a new lisp process (thread) with the specified attributes. process will
not begin execution immediately; it will need to be preset (given an initial function to run,
as by process-preset) and enabled (allowed to execute, as by process-enable)
before it's able to actually do anything.
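A minimal sketch of the three steps (create, preset, enable). The name start-summer is hypothetical, and ccl:join-process (described at the end of this dictionary) is used only to collect the result.

```lisp
;; Hypothetical helper: create a thread, give it an initial function, let it run.
(defun start-summer (n)
  (let ((process (ccl:make-process "summer")))   ; created, but not yet runnable
    ;; Preset: when enabled, the thread will apply this function to N.
    (ccl:process-preset process
                        (lambda (limit) (loop for i below limit sum i))
                        n)
    (ccl:process-enable process)                  ; now it starts executing
    process))

;; (ccl:join-process (start-summer 10)) returns 45, the sum of 0..9.
```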
process-suspend [Function]
process
result
T if process had been runnable and is now suspended; NIL otherwise. That is, T if
process's process-suspend-count transitioned from 0 to 1.
Suspends process, preventing it from running, and stopping it if it was already running.
This is a fairly expensive operation, because it involves a few calls to the OS. It also risks
creating deadlock if used improperly, for instance, if the process being suspended owns a
lock or other resource which another process will wait for.
A process can't suspend itself, though this once worked and this documentation has
claimed that it did.
process-resume, process-suspend-count
process-resume [Function]
process
result
T if process had been suspended and is now runnable; NIL otherwise. That is, T if
process's process-suspend-count transitioned from 1 to 0.
Undoes the effect of a previous call to process-suspend; if all such calls are undone,
makes the process runnable. Has no effect if the process is not suspended. What
process-resume actually does is decrement the process-suspend-count of process, to a
minimum of 0.
process-suspend, process-suspend-count
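process-suspend-count [Function]
process
result
Returns the number of "outstanding" process-suspend calls on process, or NIL if
process has expired.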
An "outstanding" process-suspend call is one which has not yet been reversed by a call
to process-resume. A process expires when its initial function returns, although it may
later be reset.
process-suspend, process-resume
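A sketch that exercises process-suspend, process-resume and process-suspend-count together, using the return values documented above. The name suspend-resume-demo is hypothetical; the semaphore only makes sure the thread is running before it is suspended.

```lisp
;; Hypothetical demo: suspend and resume a looping thread, recording the
;; return values and the suspend count along the way.
(defun suspend-resume-demo ()
  (let* ((started (ccl:make-semaphore))
         (p (ccl:process-run-function
             "looper"
             (lambda ()
               (ccl:signal-semaphore started)
               (loop (sleep 0.05))))))
    (ccl:wait-on-semaphore started)           ; thread is now running
    (unwind-protect
         (list (ccl:process-suspend p)        ; T: count went from 0 to 1
               (ccl:process-suspend-count p)  ; 1
               (ccl:process-resume p)         ; T: count went from 1 to 0
               (ccl:process-suspend-count p)) ; 0
      ;; The looper never returns on its own, so kill it.
      (ccl:process-kill p))))
```

Suspending a thread that holds a lock can deadlock other threads; the looper here holds none.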
process-preset [Function]
process
function
args
result
undefined.
process-enable [Function]
process
timeout
a time interval in seconds. May be any non-negative real number the floor of which
fits in 32 bits. The default is 1.
result
undefined.
Tries to begin the execution of process. An error is signaled if process has never been
process-preset. Otherwise, process invokes its initial function.
It would be nice to have more discussion of what it means to synchronize with the process.
process-run-function [Function]
process-specifier
name
function
persistent
priority
ignored.
class
initargs
stack-size
vstack-size
tstack-size
process
Creates a lisp process (thread) via make-process, presets it via process-preset, and
enables it via process-enable. This means that process will immediately begin to
execute. process-run-function is the simplest way to create and run a process.
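A minimal sketch; add-in-background is a hypothetical helper. It assumes, as the process-specifier argument suggests, that the first argument may be either a name string or a list of make-process keyword arguments.

```lisp
;; Hypothetical helper: compute (+ A B) in a new thread and wait for the result.
(defun add-in-background (a b)
  ;; The specifier may be just a name string ("adder") or, as here,
  ;; a list of keyword arguments for make-process.
  (let ((p (ccl:process-run-function '(:name "adder") #'+ a b)))
    ;; The thread is already running; wait for it and return its value.
    (ccl:join-process p)))
```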
process-interrupt [Function]
Arranges for the target process to invoke a specified function at some point in the near
future, and then return to what it was doing.
process
function
a function.
args
result
Arranges for process to apply function to args at some point in the near future
(interrupting whatever process was doing.) If function returns normally, process resumes
execution at the point at which it was interrupted.
If the interrupted thread is blocking in a system call, that system call is aborted by the
signal and the interrupt is handled on return.
It is still difficult to reliably interrupt arbitrary foreign code (that may be stateful or
otherwise non-reentrant); the interrupt request is handled when such foreign code returns
to or enters lisp.
without-interrupts
It would probably be better for result to always be NIL, since the present behavior is
inconsistent.
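A sketch of interrupting a thread that is blocked in SLEEP. The name interrupt-demo is hypothetical; the semaphores make sure the worker is running before the interrupt is sent and let the caller know when the interrupt function has run.

```lisp
;; Hypothetical demo: interrupt a sleeping thread, record which thread ran
;; the interrupt function, then let the thread finish normally.
(defun interrupt-demo ()
  (let* ((ready (ccl:make-semaphore))
         (handled (ccl:make-semaphore))
         (seen nil)
         (worker (ccl:process-run-function
                  "napper"
                  (lambda ()
                    (ccl:signal-semaphore ready)
                    (sleep 1)                 ; blocking call the interrupt cuts into
                    :done))))
    (ccl:wait-on-semaphore ready)             ; don't interrupt before it runs
    (ccl:process-interrupt worker
                           (lambda ()
                             ;; Runs in WORKER, not in the caller.
                             (setf seen (ccl:process-name ccl:*current-process*))
                             (ccl:signal-semaphore handled)))
    (ccl:timed-wait-on-semaphore handled 5)   ; give the interrupt time to run
    ;; The worker resumes where it was interrupted and returns :DONE.
    (values seen (ccl:join-process worker))))
```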
*current-process* [Variable]
Bound separately in each process, to that process itself. It may be used when lisp code
needs to find out what process it is executing in. It should not be set by user code.
process-reset [Function]
This function causes process to cleanly exit from any ongoing computation and enter a
state where it can be process-preset.
The kill-option argument is for internal use only and should not be specified by user code.
There is in general no way to know precisely when process has completed the act of
resetting or killing itself; a process which has either entered the limbo of the reset state or
exited has few ways of communicating either fact.
The function process-enable can reliably determine when a process has entered the
limbo of the reset state, but can't predict how long the clean exit from ongoing computation
might take: that depends on the behavior of unwind-protect cleanup forms, and of the
OS scheduler.
Resetting a process other than *current-process* involves the use of the function
process-interrupt.
process-reset-and-enable [Function]
Reset and enable the specified process, which must not be the current process.
process
result
undefined.
process-reset, process-enable
process-kill [Function]
Causes process to cleanly exit from any ongoing computation, and then exit. Note that
unwind-protect cleanup forms will be run with interrupts disabled.
process-abort [Function]
process
condition
If condition is non-NIL, process-abort does not consider any handlers which are
explicitly bound to conditions other than condition.
process-reset, process-kill
*ticks-per-second* [Variable]
A positive integer.
The clock resolution of the OS scheduler. Currently, both LinuxPPC and DarwinPPC yield
an initial value of 100.
This value is ordinarily of marginal interest at best, but, for backward compatibility, some
functions accept timeout values expressed in "ticks". This value gives the number of ticks
per second.
process-wait-with-timeout
process-whostate [Function]
process
whostate
This information is primarily for the benefit of debugging tools. whostate is a terse report
on what process is doing, or not doing, and why.
This should arguably be SETFable, but doesn't seem to ever have been.
process-allow-schedule [Function]
Advises the OS scheduler that the current thread has nothing useful to do and that it
should try to find some other thread to schedule in its place. There's almost always a better
alternative, such as waiting for some specific event to occur. For example, you could use a
lock or semaphore.
This is a holdover from the days of cooperative multitasking. All modern general-purpose
operating systems use preemptive multitasking.
process-wait [Function]
Causes the current lisp process (thread) to wait for a given predicate to return true.
whostate
a string, which will be the value of process-whostate while the process is waiting.
function
args
result
NIL.
Causes the current lisp process (thread) to repeatedly apply function to args until the call
returns a true result, then returns NIL. After each failed call, yields the CPU as if by
process-allow-schedule.
As with process-allow-schedule, it's almost always more efficient to wait for some
specific event to occur; this isn't exactly busy-waiting, but the OS scheduler can do a better
job of scheduling if it's given the relevant information. For example, you could use a lock or
semaphore.
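For comparison with the semaphore recommendation, a minimal sketch of process-wait polling a flag. The variable *ready* and the function wait-until-ready are hypothetical.

```lisp
(defvar *ready* nil
  "Hypothetical flag, set to true by some other thread.")

(defun wait-until-ready ()
  ;; The whostate string is what process-whostate reports while this
  ;; thread waits. The predicate is re-tested with yields in between.
  (ccl:process-wait "Waiting for *ready*" (lambda () *ready*))
  :ready)
```

A semaphore signalled by the thread that sets the flag would let the OS scheduler block this thread outright instead of re-testing the predicate.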
process-wait-with-timeout [Function]
Causes the current thread to wait for a given predicate to return true, or for a timeout to
expire.
whostate
a string, which will be the value of process-whostate while the process is waiting.
ticks
function
args
result
If ticks is NIL, behaves exactly like process-wait, except for returning T. Otherwise,
function will be tested repeatedly, in the same kind of test/yield loop as in process-wait
until either function returns true, or the duration ticks has been exceeded.
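Since ticks are an older unit, a small wrapper that takes seconds can be convenient. This is a sketch with a hypothetical name; it assumes, though the text above does not state it for the non-NIL case, that the call returns NIL when the timeout expires and a true value when the predicate succeeds.

```lisp
;; Hypothetical helper: like process-wait-with-timeout, but takes seconds.
;; *ticks-per-second* converts seconds to the tick units the function expects.
(defun wait-for-seconds (seconds predicate)
  (ccl:process-wait-with-timeout "Waiting with timeout"
                                 (round (* seconds ccl:*ticks-per-second*))
                                 predicate))
```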
without-interrupts [Macro]
body
an implicit progn.
result
process-interrupt
with-interrupts-enabled [Macro]
body
an implicit progn.
result
make-lock [Function]
Creates and returns a lock object, which can be used for synchronization between threads.
name
any lisp object; saved as part of lock. Typically a string or symbol which may appear
in the process-whostates of threads which are waiting for lock.
lock
Creates and returns a lock object, which can be used to synchronize access to some shared
resource. lock is initially in a "free" state; a lock can also be "owned" by a thread.
with-lock-grabbed [Macro]
Waits until a given lock can be obtained, then evaluates its body with the lock held.
lock
body
an implicit progn.
result
Waits until lock is either free or owned by the calling thread, then executes body with the
lock owned by the calling thread. If lock was free when with-lock-grabbed was called,
it is restored to a free state after body is executed.
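A minimal sketch of protecting a shared counter with make-lock and with-lock-grabbed. The variables and functions are hypothetical; without the lock, concurrent INCFs could lose updates.

```lisp
(defvar *counter-lock* (ccl:make-lock "counter"))
(defvar *counter* 0)

;; Increment the shared counter TIMES times, holding the lock for each update.
(defun bump-counter (times)
  (loop repeat times
        do (ccl:with-lock-grabbed (*counter-lock*)
             (incf *counter*))))

;; Run NTHREADS threads that each bump the counter TIMES times, wait for
;; all of them, and return the final count.
(defun parallel-bump (nthreads times)
  (setf *counter* 0)
  (let ((workers (loop for i below nthreads
                       collect (ccl:process-run-function
                                (format nil "bumper-~D" i)
                                #'bump-counter times))))
    (mapc #'ccl:join-process workers)
    *counter*))
```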
lock
lock
try-lock [Function]
Obtains the given lock, but only if it is not necessary to wait for it.
lock
result
Tests whether lock can be obtained without blocking - that is, either lock is already free, or
it is already owned by *current-process*. If it can, causes it to be owned by the calling
lisp process (thread) and returns T. Otherwise, the lock is already owned by another thread
and cannot be obtained without blocking; NIL is returned in this case.
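A sketch of the usual pattern: a successful try-lock must be paired with a release. The helper call-if-unlocked is hypothetical; it uses ccl:release-lock to give the lock back.

```lisp
;; Hypothetical helper: run THUNK with LOCK held if the lock is free (or
;; already ours); otherwise return :BUSY at once instead of waiting.
(defun call-if-unlocked (lock thunk)
  (if (ccl:try-lock lock)
      (unwind-protect (funcall thunk)
        ;; try-lock took ownership, so release it ourselves.
        (ccl:release-lock lock))
      :busy))
```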
make-read-write-lock [Function]
Creates and returns a read-write lock, which can be used for synchronization between
threads.
read-write-lock
There probably should be some way to atomically "promote" a reader, making it a writer
without releasing the lock, which could otherwise cause delay.
with-read-lock [Macro]
Waits until a given lock is available for read-only access, then evaluates its body with the
lock held.
read-write-lock
body
an implicit progn.
result
with-write-lock [Macro]
Waits until the given lock is available for write access, then executes its body with the lock
held.
read-write-lock
body
an implicit progn.
result
Waits until read-write-lock has no readers and no writer other than *current-process*,
then ensures that *current-process* is the writer of it. With the lock held,
executes body.
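A minimal sketch of a table that many threads may read concurrently while writers get exclusive access. The names are hypothetical.

```lisp
(defvar *table-lock* (ccl:make-read-write-lock))
(defvar *table* (make-hash-table :test #'equal))

;; Readers can hold the lock at the same time as each other.
(defun table-get (key)
  (ccl:with-read-lock (*table-lock*)
    (gethash key *table*)))

;; A writer waits until there are no readers and no other writer.
(defun table-put (key value)
  (ccl:with-write-lock (*table-lock*)
    (setf (gethash key *table*) value)))
```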
make-semaphore [Function]
Creates and returns a semaphore, which can be used for synchronization between threads.
semaphore
signal-semaphore [Function]
semaphore
result
wait-on-semaphore [Function]
Waits until the given semaphore has a positive count which can be atomically
decremented.
semaphore
result
Waits until semaphore has a positive count that can be atomically decremented; this will
succeed exactly once for each corresponding call to SIGNAL-SEMAPHORE.
timed-wait-on-semaphore [Function]
Waits until the given semaphore has a positive count which can be atomically
decremented, or until a timeout expires.
semaphore
timeout
a time interval in seconds. May be any non-negative real number the floor of which
fits in 32 bits. The default is 1.
result
Waits until semaphore has a positive count that can be atomically decremented, or until
the duration timeout has elapsed.
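A minimal sketch of handing a value from one thread to another with a semaphore; handoff is a hypothetical name. The test also assumes that timed-wait-on-semaphore returns NIL when the timeout elapses without a signal.

```lisp
;; Hypothetical helper: a producer thread fills BOX, then signals; the
;; caller's wait succeeds once for that one signal.
(defun handoff (value)
  (let* ((sem (ccl:make-semaphore))
         (box nil)
         (producer (ccl:process-run-function
                    "producer"
                    (lambda ()
                      (setf box value)
                      (ccl:signal-semaphore sem)))))
    (ccl:wait-on-semaphore sem)
    (ccl:join-process producer)
    box))

;; Waiting on a semaphore that nobody signals gives up after about 0.1 s:
;;   (ccl:timed-wait-on-semaphore (ccl:make-semaphore) 0.1)
```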
process-input-wait [Function]
fd
timeout
Wait until input is available on fd. This uses the select() system call, and is generally a
fairly efficient way of blocking while waiting for input. More accurately,
process-input-wait waits until it's possible to read from fd without blocking, or until timeout, if
it is not NIL, has been exceeded.
Note that it's possible to read without blocking if the file is at its end - although, of course,
the read will return zero bytes.
process-output-wait [Function]
fd
timeout
Wait until output is possible on fd or until timeout, if it is not NIL, has been exceeded. This
uses the select() system call, and is generally a fairly efficient way of blocking while
waiting to output.
with-terminal-input [Macro]
Executes its body in an environment with exclusive read access to the terminal.
body
an implicit progn.
result
*request-terminal-input-via-break* [Variable]
A boolean. The initial value is NIL.
Controls how attempts to obtain ownership of terminal input are made. When NIL, a
message is printed on *TERMINAL-IO*; it's expected that the user will later yield control
of the terminal via the :Y toplevel command. When T, a BREAK condition is signaled in the
owning process; continuing from the break loop will yield the terminal to the requesting
process (unless the :Y command was already used to do so in the break loop.)
(:y p) [Toplevel Command]
p
a lisp process (thread), designated either by an integer which matches its
process-serial-number, or by a string which is equal to its process-name.
:Y is a toplevel command, not a function. As such, it can only be used interactively, and
only from the initial process.
The command yields control of terminal input to the process p, which must have used
with-terminal-input to request access to the terminal input stream.
join-process [Function]
Waits for a specified process to complete and returns the values that that process's initial
function returned.
process
default
values
The values returned by the specified process's initial function if that function returns,
or the value of the default argument, otherwise.
Waits for the specified process to terminate. If the process terminates "normally" (if its
initial function returns), returns the values that that initial function returns. If the process
does not terminate normally (e.g., if it's terminated via process-kill) and a default
argument is provided, returns the value of that default argument. If the process doesn't
terminate normally and no default argument is provided, signals an error.
A process can't successfully join itself, and only one process can successfully receive
notification of another process's termination.
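A minimal sketch of joining a thread that was killed, so that its initial function never returns; kill-and-join is a hypothetical name. The thread is joined exactly once, since only one process can successfully receive notification of its termination.

```lisp
;; Hypothetical demo: kill a thread that never returns on its own, then
;; join it with a default value.
(defun kill-and-join ()
  (let* ((started (ccl:make-semaphore))
         (p (ccl:process-run-function
             "forever"
             (lambda ()
               (ccl:signal-semaphore started)
               (loop (sleep 1))))))
    (ccl:wait-on-semaphore started)   ; make sure it is running first
    (ccl:process-kill p)
    ;; The initial function never returned, so the default is returned.
    (ccl:join-process p :killed)))
```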
Brief Examples
Specifying And Using Foreign Types
Type Annotations
Foreign Types as Classes
Syntax of Foreign Type Specifiers
Foreign Function Calls
Overview
Return Conventions for C Structures
Referencing and Using Foreign Memory Addresses
Overview
Foreign-Memory-Addresses Dictionary
The Interface Database
Overview
Other issues:
Using Interface Directories
Overview
Creating new interface directories
Using Shared Libraries
Overview
Limitations and known bugs
Darwin Notes
The Interface Translator
Overview
Details: rebuilding the CDB databases, step by step
Case-sensitivity of foreign names in Clozure CL
Overview
Foreign constant and function names
Foreign type, record, and field names
Examples
Reading Foreign Names
Tutorial: Using Basic Calls and Types
Acknowledgement
Tutorial: Allocating Foreign Data on the Lisp Heap
Acknowledgement
The Foreign-Function-Interface Dictionary
Clozure CL can call external code that uses the platform's C ABI. C++ functions cannot be
called directly. All of the mechanisms used to do this are ultimately based on the function
%ff-call, but the #_ reader macro and the external-call macro are typically more
convenient and easier to use.
There is also a mechanism to let external code call into lisp. This is useful when you want to
pass a lisp function as a callback function. (See defcallback.)
Brief Examples
A few examples may be useful before discussing the FFI facilities in detail.
Most functions that are in the system C library are included in the interface database that
comes with Clozure CL. The reader macro #_ consults that database for information about
the foreign code, and makes foreign functions easy to use.
? (#_getpid)
2845
? (#_log2 2048d0)
11.0D0
? (with-encoded-cstrs :ascii ((s "Yow!"))
(#_write 1 s 4))
These same calls can be made with the slightly lower-level external-call.
Note that with external-call, it is necessary to specify the foreign types of the
arguments and return values. The #_ reader macro uses the interface database to
determine that the function getpid (for example) returns a pid_t value, and it is
therefore not necessary to specify it explicitly.
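A sketch of the same three calls made via external-call, spelling out argument and return types with the keywords listed under Foreign Function Calls. It assumes pid_t is a signed 32-bit integer and a 64-bit platform, where size_t and ssize_t are 64 bits wide.

```lisp
;; getpid(): no arguments; pid_t assumed to be a signed 32-bit integer.
(ccl:external-call "getpid" :signed-fullword)

;; log2(): one C double argument, returns a C double.
(ccl:external-call "log2" :double-float 2048d0 :double-float)

;; write(1, s, 4): int fd, char * buffer, size_t count; returns ssize_t.
;; The doubleword keywords for size_t/ssize_t assume a 64-bit platform.
(ccl:with-encoded-cstrs :ascii ((s "Yow!"))
  (ccl:external-call "write"
                     :signed-fullword 1
                     :address s
                     :unsigned-doubleword 4
                     :signed-doubleword))
```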
Finally, just to frighten pets and small children, here is a fairly complicated example
(adapted from a man page for getpwuid_r):
Clozure CL provides a fairly rich language for defining and specifying foreign data types
(this language is derived from CMUCL's “alien type” system.)
In practice, most foreign type definitions are introduced into Clozure CL via its interface
database, though it is also possible to define foreign types interactively and/or
programmatically.
Clozure CL's foreign type system is “evolving” (a polite word for not-quite-complete): there
are some inconsistencies involving package usage, for instance. Symbols used in foreign
type specifiers should be keywords, but this convention isn't always enforced.
Foreign type, record, and field names are case-sensitive; Clozure CL uses some escaping
conventions to allow keywords to be used to denote these names.
Type Annotations
As of version 1.2, Clozure CL supports annotating the types of foreign pointers on Mac OS
X. Forms that create pointers to foreign memory (that is, MACPTRs) store with the MACPTR
object a type annotation that identifies the foreign type of the object pointed to. Calling
PRINT-OBJECT on a MACPTR attempts to print information about the identified foreign
type, including whether it was allocated on the heap or the stack, and whether it's
scheduled for automatic reclamation by the garbage collector.
Support for type annotation is not yet complete. In particular, some uses of PREF and
SLOT-VALUE do not yet take type annotations into account, and neither do DESCRIBE and
INSPECT.
Some types of foreign pointers take advantage of the support for type annotations, and
pointers of these types can be treated as instances of known classes. Specifically, a pointer
to an :<NSR>ect is recognized as an instance of the built-in class NS:NS-RECT, a pointer
to an :<NSS>ize is treated as an instance of NS:NS-SIZE, a pointer to an :<NSP>oint is
recognized as an instance of NS:NS-POINT, and a pointer to an :<NSR>ange is recognized
as an instance of NS:NS-RANGE.
A few more obscure structure types also support this mechanism, and it's possible that a
future version will support user definition of similar type mappings.
This support for foreign types as classes provides the following conveniences for each
supported type:
a foreign type name is created and treated as an alias for the corresponding type. As
an example, the name :NS-RECT is a name for the type that corresponds to
NS:NS-RECT, and you can use :NS-RECT as a type designator in rlet forms to
specify a structure of type NS-RECT.
the class is integrated into the type system so that (TYPEP R 'NS:NS-RECT) is
implemented with fair efficiency.
inlined accessors and SETF inverses are defined for the structure type's fields. In the
case of an :<NSR>ect, for example, the fields in question are the fields of the
embedded point and size, so that NS:NS-RECT-X, NS:NS-RECT-Y,
NS:NS-RECT-WIDTH, NS:NS-RECT-HEIGHT and SETF inverses are defined. The accessors and
setter functions typecheck their arguments and the setters handle coercion to the
appropriate type of CGFLOAT where applicable.
(NS:INIT-NS-SIZE s w h)
is roughly equivalent to
(SETF (NS:NS-SIZE-WIDTH s) w
(NS:NS-SIZE-HEIGHT s) h)
(ns:ns-make-point x y)
is functionally equivalent to
a macro is defined which, like RLET, stack-allocates an instance of the foreign record
type, optionally initializes that instance, and executes a body of code with a variable
bound to that instance.
For example,
Some foreign types are builtin: keywords denote primitive, builtin types such as the
IEEE-double-float type (denoted :DOUBLE-FLOAT), in much the same way as certain
symbols (CONS, FIXNUM, etc.) define primitive CL types.
Constructors such as :SIGNED and :UNSIGNED can be used to denote signed and
unsigned integer subtypes (analogous to the CL type specifiers SIGNED-BYTE and
UNSIGNED-BYTE.) :SIGNED is shorthand for (:SIGNED 32) and :UNSIGNED is
shorthand for (:UNSIGNED 32).
Aliases for other (perhaps more complicated) types can be defined via
ccl:def-foreign-type (sort of like cl:deftype or the C typedef facility). The type :char is
defined as an alias for (:SIGNED 8) on some platforms, and as (:UNSIGNED 8) on others.
The construct (:struct name) can be used to refer to a named structure type; (:UNION
name) can be used to refer to a named union type. It isn't necessary to enumerate a
structure or union type's fields in order to refer to the type.
If X is a valid foreign type reference, then (:* X) denotes the foreign type “pointer to
X”. By convention, (:* T) denotes an anonymous pointer type, vaguely equivalent to
void * in C.
If a fieldlist is a list of lists, each of whose CAR is a foreign field name (keyword) and
whose CADR is a foreign type specifier, then (:STRUCT name ,@fieldlist) is a
definition of the structure type name, and (:UNION name ,@fieldlist) is a definition
of the union type name. Note that it's necessary to define a structure or union type in
order to include that type in a structure, union, or array, but only necessary to refer to
a structure or union type in order to define a type alias or a pointer type.
If X is a defined foreign type, then (:array X &rest dims) denotes the foreign type
“array of X”. Although multiple array dimensions are allowed by the :array
constructor, only single-dimensioned arrays are (at all) well-supported in Clozure CL.
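A minimal sketch of these specifiers in use. The record name :my-point and the function my-point-sum are hypothetical. The sketch assumes that ccl:def-foreign-type accepts NIL as its name when the type specifier is itself a named structure definition, and that rlet and pref accept a (:struct ...) specifier and :record.field accessors.

```lisp
;; Define a named structure type with two signed 32-bit fields; NIL as the
;; name means the (:struct ...) definition itself is what gets defined.
(ccl:def-foreign-type nil
  (:struct :my-point
    (:x (:signed 32))
    (:y (:signed 32))))

;; Stack-allocate one instance, set both fields, and read them back.
(defun my-point-sum (x y)
  (ccl:rlet ((p (:struct :my-point)))
    (setf (ccl:pref p :my-point.x) x
          (ccl:pref p :my-point.y) y)
    (+ (ccl:pref p :my-point.x) (ccl:pref p :my-point.y))))
```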
Overview
Clozure CL provides a number of constructs for calling foreign functions from Lisp code, all
of which are based on the function %ff-call. In many cases, Clozure CL's interface
translator provides information about the foreign function's entrypoint name and
argument and return types; this enables the use of the #_ reader macro (described below),
which is usually nicer to use than the other constructs.
Clozure CL also provides a mechanism for defining callbacks: lisp functions which can be
called from foreign code.
There's no supported way to directly pass lisp data to foreign functions: scalar lisp data
must be coerced to an equivalent foreign representation, and lisp arrays (notably strings)
must be copied to non-GCed memory.
The types of foreign argument and return values in foreign function calls and callbacks can
be specified by any of the following keywords:
:UNSIGNED-BYTE
:SIGNED-BYTE
:UNSIGNED-HALFWORD
:SIGNED-HALFWORD
:UNSIGNED-FULLWORD
:SIGNED-FULLWORD
:UNSIGNED-DOUBLEWORD
:SIGNED-DOUBLEWORD
:SINGLE-FLOAT
:DOUBLE-FLOAT
:ADDRESS
:VOID or NIL: not valid as an argument type specifier; specifies that there is no
meaningful return value.
On some platforms, a small positive integer N can also be used as an argument specifier; it
indicates that the corresponding argument is a pointer to an N-word structure or union
which should be passed by value to the foreign function. Exactly which foreign structures
are passed by value and how is very dependent on the Application Binary Interface (ABI) of
the platform; unless you're very familiar with ABI details (some of which are quite
baroque), it's often easier to let higher-level constructs deal with these details.
PowerPC machine instructions are always aligned on 32-bit boundaries, so the two least
significant bits of the first instruction (the entrypoint) of a foreign function are always 0.
Clozure CL often represents an entrypoint address as a fixnum that's binary-equivalent to
the entrypoint address: if E is an entrypoint address expressed as a signed 32-bit integer,
then (ash E -2) is an equivalent fixnum representation of that address. An entrypoint
address can also be encapsulated in a MACPTR, but that's somewhat less efficient.
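The arithmetic behind this fixnum representation can be checked in C. The sketch assumes a 32-bit Lisp that stores a fixnum F as the machine word F × 4 (two zero tag bits). That assumption is what makes the fixnum "binary-equivalent" to the entrypoint. The function names are hypothetical, and 0x000CFDF8 is simply one of the entry addresses printed later in the tutorial.

```c
#include <assert.h>
#include <stdint.h>

/* (ash E -2): exact, because a 4-byte-aligned entry address has its
   two low bits clear. Division is used instead of a right shift so
   negative addresses behave portably. */
int32_t entrypoint_to_fixnum(int32_t entry)
{
    assert(entry % 4 == 0);      /* PowerPC instruction alignment */
    return entry / 4;
}

/* Machine word holding fixnum F under the assumed tagging scheme:
   F shifted left two bits, i.e. the original entry address again. */
int32_t fixnum_machine_word(int32_t fixnum)
{
    return fixnum * 4;
}
```

Converting an address to a fixnum and back returns the same bits. No separate box is needed, which is why this encoding is cheaper than a MACPTR.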
Although it's possible to use fixnums or macptrs to represent entrypoint addresses, it's
somewhat cumbersome to do so. Clozure CL can cache the addresses of named external
functions in structure-like objects of type CCL:EXTERNAL-ENTRY-POINT (sometimes
abbreviated as EEP). Through the use of LOAD-TIME-VALUE, compiled lisp functions are
able to reference EEPs as constants; the use of an indirection allows the Clozure CL runtime
system to ensure that the EEP's address is current and correct.
Exactly how a C function that's defined to return a foreign structure does so is dependent
on the ABI (and on the size and composition of the structure/union in many cases.)
Overview
Basics
For a variety of technical reasons, it isn't generally possible to directly reference arbitrary
absolute addresses (such as those returned by the C library function malloc(), for instance)
in Clozure CL. In Clozure CL (and in MCL), such addresses need to be encapsulated in
objects of type CCL:MACPTR; one can think of a MACPTR as being a specialized type of
structure whose sole purpose is to provide a way of referring to an underlying raw address.
It's sometimes convenient to blur the distinction between a MACPTR and the address it
represents; it's sometimes necessary to maintain that distinction. It's important to
remember that a MACPTR is (generally) a first-class Lisp object in the same sense that a
CONS cell is: it'll get GCed when it's no longer possible to reference it. The lifetime of a
MACPTR doesn't generally have anything to do with the lifetime of the block of memory its
address points to.
It might be tempting to ask “How does one obtain the address encapsulated by a macptr?”.
The answer to that question is that one doesn't do that (and there's no way to do that):
addresses aren't first-class objects, and there's no way to refer to one.
Two MACPTRs that encapsulate the same address are EQL to each other.
There are a small number of ways to directly create a MACPTR (and there's a fair amount
of syntactic sugar built on top of those primitives.) These primitives will be discussed in
greater detail below, but they include:
Creating a MACPTR with a specified address, usually via the function CCL:%INT-TO-PTR.
Referencing the return value of a foreign function call that's specified to return
an address.
One consequence of the use of MACPTR objects to encapsulate foreign addresses is that
(naively) every reference to a foreign address causes a MACPTR to be allocated. Consider:
(defun get-next-event ()
  "get the next event from a hypothetical window system"
  (loop
    (let* ((event (#_get_next_window_system_event))) ; via an FF-CALL
      (unless (null-event-p event)
        (handle-event event)))))
Every iteration of that loop allocates a new MACPTR. A version that allocates only one:
(defun get-next-event ()
  (let* ((event (%int-to-ptr 0))) ; create a MACPTR with address 0
    (loop
      (%setf-macptr event (#_get_next_window_system_event)) ; re-use it
      (unless (null-event-p event)
        (handle-event event)))))
That version's a bit more realistic: it allocates a single MACPTR outside of the loop, then
changes its address to point to the current address of the hypothetical event structure on
each loop iteration. If there are a million loop iterations per call to GET-NEXT-EVENT,
we're allocating a million times fewer MACPTRs per call; that sounds like a Good Thing.
An Even Better Thing would be to advise the compiler that the initial value (the null
macptr) bound to the variable event has dynamic extent (that value won't be referenced
once control leaves the extent of the binding of that variable.) Common Lisp allows us to
make such an assertion via a dynamic-extent declaration; Clozure CL's compiler can
recognize the primitive macptr-creating operation involved and can replace it with an
equivalent operation that stack-allocates the macptr object. If we're not worried about the
cost of allocating that macptr on every iteration (the cost is small and there's no hidden GC
cost), we could move the binding back inside the loop:
(defun get-next-event ()
  (loop
    (let* ((event (%null-ptr))) ; (%NULL-PTR) is shorthand for (%INT-TO-PTR 0)
      (declare (dynamic-extent event))
      (%setf-macptr event (#_get_next_window_system_event))
      (unless (null-event-p event)
        (handle-event event)))))
The WITH-MACPTRS macro expresses this more concisely by binding event to a stack-allocated MACPTR:
(defun get-next-event ()
  (loop
    (with-macptrs ((event (#_get_next_window_system_event)))
      (unless (null-event-p event)
        (handle-event event)))))
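The allocation trade-off between these versions can be modeled in C. In this toy sketch, a hypothetical struct macptr_box stands in for a MACPTR, and a counter records heap allocations. next_event_address() is a made-up substitute for #_get_next_window_system_event. None of these are Clozure CL internals.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy model: a heap-allocated box holding a raw address (a MACPTR). */
struct macptr_box {
    void *address;
};

static size_t boxes_allocated;          /* heap boxes created so far */

static struct macptr_box *new_box(void *address)
{
    struct macptr_box *box = malloc(sizeof *box);
    if (box != NULL) {
        box->address = address;
        boxes_allocated++;
    }
    return box;
}

/* Hypothetical stand-in for #_get_next_window_system_event. */
static void *next_event_address(int i)
{
    static char events[4];
    return &events[i % 4];
}

/* Like the first GET-NEXT-EVENT: a fresh box on every iteration. */
size_t run_naive(int iterations)
{
    boxes_allocated = 0;
    for (int i = 0; i < iterations; i++) {
        struct macptr_box *event = new_box(next_event_address(i));
        free(event);                    /* Lisp would leave this to the GC */
    }
    return boxes_allocated;
}

/* Like the %SETF-MACPTR version: one box, re-pointed each time. */
size_t run_reused(int iterations)
{
    boxes_allocated = 0;
    struct macptr_box *event = new_box(NULL);
    if (event == NULL)
        return 0;
    for (int i = 0; i < iterations; i++)
        event->address = next_event_address(i);
    free(event);
    return boxes_allocated;
}

/* Like the DYNAMIC-EXTENT / WITH-MACPTRS versions: the box is on the stack. */
size_t run_stack(int iterations)
{
    boxes_allocated = 0;
    for (int i = 0; i < iterations; i++) {
        struct macptr_box event = { next_event_address(i) };
        (void)event.address;            /* a real loop would handle the event */
    }
    return boxes_allocated;
}
```

Over 1000 iterations, run_naive() makes 1000 heap allocations, run_reused() makes one, and run_stack() makes none. These counts mirror the three GET-NEXT-EVENT variants.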
Fairly often, the blocks of foreign memory (obtained by malloc or something similar) have
well-defined lifetimes (they can safely be freed at some point when it's known that they're
no longer needed and it's known that they're no longer referenced.) A common idiom
is to bind a variable P to a block obtained from malloc, use the block inside an
UNWIND-PROTECT, and free it in the cleanup form.
That's not unreasonable, but it's fairly expensive for a number of reasons: foreign
function calls are themselves fairly expensive (as is UNWIND-PROTECT), and most
library routines for allocating and deallocating foreign memory (things like malloc and
free) can be fairly expensive in their own right.
In this idiom, both the MACPTR P and the block of memory that's being
allocated and freed have dynamic extent and are therefore good candidates for stack
allocation. Clozure CL provides the %STACK-BLOCK macro, which executes a body of code
with one or more variables bound to stack-allocated MACPTRs which encapsulate the
addresses of stack-allocated blocks of foreign memory. Using %STACK-BLOCK, the idiom
needs no explicit allocation, deallocation, or UNWIND-PROTECT, which makes it a bit
more efficient and a bit more concise than the malloc/free version.
%STACK-BLOCK is used as the basis for slightly higher-level things like RLET. (See
FIXTHIS for more information about RLET.)
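In C terms, the difference between the malloc/free idiom and %STACK-BLOCK resembles the difference between a heap buffer and an automatic array. In this sketch, fill_and_sum() is a hypothetical consumer of the block, and the 1024-byte size is arbitrary.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical consumer of a scratch block: fill it, then sum it. */
static int fill_and_sum(unsigned char *block, size_t size)
{
    memset(block, 1, size);
    int sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += block[i];
    return sum;
}

/* malloc/free idiom: heap allocation plus explicit cleanup
   (the role UNWIND-PROTECT plays around #_malloc / #_free in Lisp). */
int with_heap_block(void)
{
    unsigned char *p = malloc(1024);
    if (p == NULL)
        return -1;
    int result = fill_and_sum(p, 1024);
    free(p);
    return result;
}

/* %STACK-BLOCK analogue: automatic storage that disappears when the
   function returns, so there is nothing to free. */
int with_stack_block(void)
{
    unsigned char p[1024];
    return fill_and_sum(p, sizeof p);
}
```

Both functions compute the same result. The stack version avoids two library calls and any cleanup. It is only correct while the block is not used after the function returns, which is the same dynamic-extent promise %STACK-BLOCK relies on.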
Caveats
Reading from, writing to, allocating, and freeing foreign memory are all potentially
dangerous operations; this is no less true when these operations are performed in Clozure
CL than when they're done in C or some other lower-level language. In addition,
destructive operations on Lisp objects can be dangerous, as can stack allocation if it's abused (if
DYNAMIC-EXTENT declarations are violated.) Correct use of the constructs and
primitives described here is reliable and safe; even slightly incorrect use of these constructs
and primitives can crash Clozure CL.
Foreign-Memory-Addresses Dictionary
Unless otherwise noted, all of the symbols mentioned below are exported from the CCL
package.
References and returns the unsigned 8-bit byte at the effective byte address formed by
adding offset to the address encapsulated by ptr.
Like %get-unsigned-byte above, but returns the signed 8-bit byte at the computed
address.
References and returns the unsigned 16-bit word at the effective byte address formed by
adding offset to the address encapsulated by ptr.
Like %get-unsigned-word above, but returns the signed 16-bit word at the computed
address.
References and returns the unsigned 32-bit long word at the effective byte address formed
by adding offset to the address encapsulated by ptr.
Like %get-unsigned-long above, but returns the signed 32-bit long word at the
computed address.
References and returns the unsigned 64-bit long long word at the effective byte address
formed by adding offset to the address encapsulated by ptr.
Like %%get-unsigned-longlong above, but returns the signed 64-bit long long word at
the computed address.
Returns a macptr encapsulating the address found at the effective byte address formed by
adding offset to the address represented by ptr.
Returns the single-float found at the effective byte address formed by adding offset to
the address represented by ptr.
Returns the double-float found at the effective byte address formed by adding offset to
the address represented by ptr.
All of the memory reference primitives described above can be used with setf.
References and returns the bit at bit offset bit-offset from the address encapsulated by ptr.
(Bit 0 at a given address is the most significant bit of the byte at that address.) Can be
used with SETF.
References and returns an unsigned integer composed from the width bits found bit-offset
bits from the address encapsulated by ptr. The least significant bit of the result is the value
of (%get-bit ptr (1- (+ bit-offset width))). Can be used with setf.
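The semantics of several of these accessors can be written out in C. The functions below are hypothetical mirrors of the unsigned-byte, signed-byte, unsigned-word, bit, and bitfield accessors described above. They follow the most-significant-bit-first bit numbering. They sketch the documented behavior and are not Clozure CL's implementation.

```c
#include <stdint.h>
#include <string.h>

/* Unsigned 8-bit byte at ptr + offset. */
uint8_t get_unsigned_byte(const void *ptr, long offset)
{
    return *((const uint8_t *)ptr + offset);
}

/* Same byte, reinterpreted as a two's-complement signed value. */
int8_t get_signed_byte(const void *ptr, long offset)
{
    int8_t value;
    memcpy(&value, (const uint8_t *)ptr + offset, 1);
    return value;
}

/* Unsigned 16-bit word in native byte order; memcpy avoids
   alignment faults on strict-alignment hardware. */
uint16_t get_unsigned_word(const void *ptr, long offset)
{
    uint16_t value;
    memcpy(&value, (const uint8_t *)ptr + offset, sizeof value);
    return value;
}

/* Bit 0 at an address is the most significant bit of that byte. */
unsigned get_bit(const void *ptr, unsigned long bit_offset)
{
    uint8_t byte = get_unsigned_byte(ptr, (long)(bit_offset / 8));
    return (byte >> (7 - bit_offset % 8)) & 1u;
}

/* WIDTH bits starting BIT_OFFSET bits in; the last bit read,
   (1- (+ bit-offset width)), ends up as the least significant bit. */
unsigned long get_bitfield(const void *ptr, unsigned long bit_offset,
                           unsigned width)
{
    unsigned long result = 0;
    for (unsigned i = 0; i < width; i++)
        result = (result << 1) | get_bit(ptr, bit_offset + i);
    return result;
}
```

For the bytes #xC8 #xA5, the unsigned byte at offset 0 is 200 and the signed byte is -56. The 8-bit field starting at bit offset 4 is #x8A.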
Creates and returns a macptr whose address is the address of ptr plus delta. The idiom
(%inc-ptr ptr 0) is sometimes used to copy a macptr, that is, to create a new macptr
encapsulating the same address as ptr.
%null-ptr [Macro]
+null-ptr+ [Variable]
This function returns true if ptr is a macptr that encapsulates the address 0. It returns nil
if ptr encapsulates some other address.
Causes dest-ptr to encapsulate the same address that src-ptr does, then returns the
updated dest-ptr.
Destructively modifies ptr by adding delta to the address it encapsulates. Returns ptr.
Allocates a block of foreign memory (via malloc) of length (1+ (length string)). It
then copies string to this block and appends a trailing nul byte. Returns a macptr to the
block.
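The description above corresponds roughly to the following C. make_cstring_copy is a hypothetical name. Because the block comes from malloc, it is foreign, non-GCed memory and must eventually be freed.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate LENGTH + 1 bytes, copy LENGTH characters, append a NUL.
   Returns NULL if malloc fails. The caller owns the block. */
char *make_cstring_copy(const char *chars, size_t length)
{
    char *block = malloc(length + 1);
    if (block == NULL)
        return NULL;
    memcpy(block, chars, length);
    block[length] = '\0';
    return block;
}
```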
The encoding-name should be a keyword that names a character encoding. Each foreign
string is encoded according to the named encoding. Each foreign string has dynamic
extent.
Note that with-encoded-cstrs does not automatically prepend byte-order marks to the
encoded strings; additionally, the size of the terminating #\NUL character depends on the
number of octets per code unit in the encoding.
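Here is what "no byte-order mark" and a wider terminator mean in memory. The C sketch below encodes an ASCII string as UTF-16LE. The choice of UTF-16LE and the encode_ascii_utf16le name are assumptions for illustration. Restricting input to ASCII keeps the encoding to exactly one code unit per character.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode an ASCII string as UTF-16LE into OUT.
   No byte-order mark is written, and the terminating NUL is one code
   unit, i.e. two octets. Returns the number of octets written
   (terminator included), or 0 if the input is not ASCII or OUT_SIZE
   is too small. */
size_t encode_ascii_utf16le(const char *s, uint8_t *out, size_t out_size)
{
    size_t n = 0;
    for (; s[n] != '\0'; n++) {
        unsigned char c = (unsigned char)s[n];
        if (c > 0x7F)
            return 0;                     /* outside this sketch's scope */
        if (2 * (n + 1) + 2 > out_size)
            return 0;                     /* no room for char + NUL */
        out[2 * n]     = c;               /* low-order octet first */
        out[2 * n + 1] = 0;               /* high-order octet of an ASCII unit */
    }
    if (2 * n + 2 > out_size)
        return 0;
    out[2 * n]     = 0;                   /* two-octet NUL terminator */
    out[2 * n + 1] = 0;
    return 2 * n + 2;
}
```

Encoding "Hi" produces the six octets 48 00 69 00 00 00. There is no BOM, and the NUL takes two octets.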
The expression
is functionally equivalent to
Returns a lisp string of length length, whose contents are initialized from the bytes at ptr.
Creates and returns a lisp string from len octets pointed to by ptr decoded according to
encoding-name.
Overview
Clozure CL uses a set of database files which contain foreign type, record, constant, and
function definitions derived from the operating system's header files, be that Linux or
Darwin. An archive containing these database files (and the shell scripts which were used
in their creation) is available; see the Distributions page for information about obtaining
current interface database files.
In both cases (#$foo and #_foo), the symbol foo is interned in the OS package. The #$ reader macro has the
side effect of defining foo as a constant (as if via DEFCONSTANT); the #_ reader macro
has the side effect of defining foo as a macro which will expand into an EXTERNAL-CALL
form.
It's important to remember that the side-effect happens when the form containing the
reader macro is read. Macroexpansion functions that expand into forms which contain
instances of those reader macros don't do what one might think that they do, unless the
macros are expanded in the same lisp session as the reader macro was read in.
In addition, references to foreign type, structure/union, and field names (when used in the
RREF/PREF and RLET macros) will cause these database files to be consulted.
Since the Clozure CL sources contain instances of these reader macros (and references to
foreign record types and fields), compiling Clozure CL from those sources depends on the
ability to find and use the interface database files (see Building the Heap Image).
Other issues:
Clozure CL now preserves the case of external symbols in its database files. See
Case-sensitivity of foreign names in Clozure CL for information about case in foreign
symbol names.
The Linux databases are derived from a somewhat arbitrary set of Linux header files.
Linux is enough of a moving target that it may be difficult to define a standard,
reference set of interfaces from which to derive a standard, reference set of database
files. This seems to be less of an issue with Darwin and FreeBSD.
For information about building the database files, see The Interface Translator.
Overview
headers/
headers/gl/
headers/gl/C/
headers/gl/C/populate.sh
headers/gl/constants.cdb
headers/gl/functions.cdb
headers/gl/records.cdb
headers/gl/objc-classes.cdb
headers/gl/objc-methods.cdb
headers/gl/types.cdb
headers/gnome/
headers/gnome/C/
headers/gnome/C/populate.sh
headers/gnome/constants.cdb
headers/gnome/functions.cdb
headers/gnome/records.cdb
headers/gnome/objc-classes.cdb
headers/gnome/objc-methods.cdb
headers/gnome/types.cdb
headers/gtk/
headers/gtk/C/
headers/gtk/C/populate.sh
headers/gtk/constants.cdb
headers/gtk/functions.cdb
headers/gtk/records.cdb
headers/gtk/objc-classes.cdb
headers/gtk/objc-methods.cdb
headers/gtk/types.cdb
headers/libc/
headers/libc/C/
headers/libc/C/populate.sh
headers/libc/constants.cdb
headers/libc/functions.cdb
headers/libc/records.cdb
headers/libc/objc-classes.cdb
headers/libc/objc-methods.cdb
headers/libc/types.cdb
That is, the interface databases are organized as a set of parallel subdirectories, each with a lowercase name and each of which
contains a set of 6 database files and a C subdirectory which contains a shell script used in
the database creation process.
As one might assume, the database files in each of these subdirectories contain foreign
type, constant, and function definitions (as well as Objective-C class and method info) that
correspond (roughly) to the information contained in the header files associated with a
“-dev” package in a Linux distribution. libc corresponds pretty closely to the interfaces
associated with glibc/libc6 header files, gl corresponds to an openGL+GLUT development
package, gtk and gnome contain interface information from the GTK+1.2 and GNOME
libraries, respectively.
To see the precise set of .h files used to generate the database files in a given interface
directory, consult the corresponding populate.sh shell script (in the interface directory's
C subdirectory.)
The intent is that this initial set can be augmented to meet local needs, and that this can be
done in a fairly incremental fashion: one needn't have unrelated header files installed in
order to generate interface databases for a package of interest.
Hopefully, this scheme will also make it easier to distribute patches and bug fixes.
Clozure CL maintains a list of directories; when looking for a foreign type, constant,
function, or record definition, it'll consult the database files in each directory on that list.
Initially, the list contains an entry for the libc interface directory. Clozure CL needs to be
explicitly told to look in other interface directories should it need to do so.
This example refers to "ccl:headers;", which is appropriate for LinuxPPC. The procedure's
analogous under Darwin, where the "ccl:darwin-headers;" directory would be used instead.
To create a new interface directory, "foo", and a set of database files in that directory:
4. Edit the file created above, using the "populate.sh" files in the distribution as
guidelines.
#!/bin/sh
h-to-ffi.sh `foo-config -cflags` /usr/include/foo/foo.h
Refer to The Interface Translator for information about running the interface translator
and .ffi parser.
Assuming that all went well, there should now be .cdb files in "ccl:headers;foo;". You can
then do
? (use-interface-dir :foo)
whenever you need to access the foreign type information in those database files.
Overview
A small number of shared libraries (including libc, libm, libdl under Linux, and the
"system" library under Darwin) are opened by the lisp kernel and can't be closed.
Clozure CL keeps track of all libraries that have been opened in a lisp session. When a
saved application is first started, an attempt is made to reopen all libraries that were open
when the image was saved, and an attempt is made to resolve all entry points that had been
referenced when the image was saved. Either of these attempts can fail "quietly", leaving
some entry points in an unresolved state.
Linux shared libraries can be referred to either by a string which describes their full
pathname or by their soname, a shorter string that can be defined when the library is
created. The dynamic linker mechanisms used in Linux make it possible (through a series
of filesystem links and other means) to refer to a library via several names; the library's
soname is often the most appropriate identifier.
Sonames are often less version-specific than other names for libraries; a program that
refers to a library by the name "libc.so.6" is more portable than one which refers to
"libc-2.1.3.so" or to "libc-2.2.3.so", even though the latter two names might each be
platform-specific aliases of the first.
All of the global symbols described below are exported from the CCL package.
The underlying functionality has a poor notion of dependency; it's not always possible
to open libraries that depend on unopened libraries, but it's possible to close libraries
on which other libraries depend. It may be possible to generate more explicit
dependency information by parsing the output of the Linux ldd and ldconfig
programs.
Darwin Notes
"dylibs" (which often have the extension ".dylib") are primarily intended to be linked
against at compile/link time. They can be loaded dynamically, but can't be unloaded.
Accordingly, OPEN-SHARED-LIBRARY can be used to open a .dylib-style
library; calling CLOSE-SHARED-LIBRARY on the result of such a call produces a
warning, and has no other effect. It appears that (due to an OS bug) attempts to open
.dylib shared-libraries that are already open can cause memory corruption unless the
full pathname of the .dylib file is specified on the first and all subsequent calls.
Thanks to Michael Klingbeil for getting both kinds of Darwin shared libraries working in
Clozure CL.
Overview
Clozure CL uses an interface translation system based on the FFIGEN system, which is
described at this page. The interface translator makes the constant, type, structure, and
function definitions in a set of C-language header files available to lisp code.
The basic idea of the FFIGEN scheme is to use the C compiler's frontend and parser to
translate .h files into semantically equivalent .ffi files, which represent the definitions from
the headers using a syntax based on S-expressions. Lisp code can then concentrate on the
.ffi representation, without having to concern itself with the semantics of header file
inclusion or the arcana of C parsing.
The original FFIGEN system used a modified version of the LCC C compiler to produce .ffi
files. Since many OS header files contain GCC-specific constructs, Clozure CL's translation
system uses a modified version of GCC (called, somewhat confusingly, ffigen.)
A component shell script called h-to-ffi.sh reads a specified .h file (and optional
preprocessor arguments) and writes a (hopefully) equivalent .ffi file to standard output,
calling the ffigen program with appropriate arguments.
For each interface directory (see FIXTHIS) subdir distributed with Clozure CL, a shell
script (distributed with Clozure CL as "ccl:headers;subdir;C;populate.sh", or in some other
platform-specific headers directory) calls h-to-ffi.sh on a large number of the header files
in /usr/include (or some other system header path) and creates a parallel directory tree in
"ccl:headers;subdir;C;system;header;path;" (or "ccl:darwin-headers;subdir;C;system;header;path;",
etc.), populating that directory with .ffi files.
The CDB databases are used by the #$ and #_ reader macros and are used in the
expansion of RREF, RLET, and related macros.
1. Ensure that the FFIGEN program is installed. See the "README" file generated
during the FFIGEN build process for specific installation instructions. This example
assumes LinuxPPC; for other platforms, substitute the appropriate headers directory.
? (require "PARSE-FFI")
PARSE-FFI
? (ccl::parse-standard-ffi-files :SUBDIR)
;;; lots of output ... after a while, shiny new .cdb files
;;; appear in "ccl:headers;subdir;"
Overview
As of release 0.11, Clozure CL addresses the fact that foreign type, constant, record, field,
and function names are case-sensitive and provides mechanisms to refer to these names via
lisp symbols.
Previous versions of Clozure CL have tried to ignore that fact, under the belief that case
conflicts were rare and that many users (and implementors) would prefer not to deal with
case-related issues. The fact that some information in the interface databases was
incomplete or inaccessible because of this policy made it clearer that the policy was
untenable. I can't claim that the approach described here is aesthetically pleasing, but I can
honestly say that it's less unpleasant than other approaches that I'd thought of. I'd be
interested to hear alternate proposals.
The issues described here have to do with how lisp symbols are used to denote foreign
functions, constants, types, records, and fields. It doesn't affect how other lisp objects are
sometimes used to denote foreign objects. For instance, the first argument to the
EXTERNAL-CALL macros is now and has always been a case-sensitive string.
The primary way of referring to foreign constant and function names in Clozure CL is via
the #$ and #_ reader macros. These reader macro functions each read a symbol into the
"OS" package, look up its constant or function definition in the interface database, and
assign the value of the constant to the symbol or install a macroexpansion function on the
symbol.
In order to observe case-sensitivity, the reader-macros now read the symbol with
(READTABLE-CASE :PRESERVE) in effect.
This means that it's necessary to type the foreign constant or function name in correct case,
but it isn't necessary to use any special escaping constructs when writing the variable
name. For instance:
Constructs like RLET expect a foreign type or record name to be denoted by a symbol
(typically a keyword); RREF (and PREF) expect an "accessor" form, typically a keyword
formed by concatenating a foreign type or record name with a sequence of one or more
foreign field names, separated by dots. These names are interned by the reader as other
lisp symbols are, with an arbitrary value of READTABLE-CASE in effect (typically
:UPCASE.) It seems like it would be very tedious to force users to manually escape (via
vertical bar or backslash syntax) all lowercase characters in symbols used to specify foreign
type, record, and field names (especially given that many traditional POSIX structure, type,
and field names are entirely lowercase.)
The approach taken by Clozure CL is to allow the symbols (keywords) used to denote
foreign type, record, and field names to contain angle brackets (< and >). Such symbols are
translated to foreign names via the following set of conventions:
All instances of < and > in the symbol's pname are balanced and don't nest.
Any alphabetic characters in the symbol's pname that aren't enclosed in angle
brackets are treated as lower-case, regardless of the value of READTABLE-CASE and
regardless of the case in which they were written.
Alphabetic characters that appear within angle brackets are mapped to upper-case,
again regardless of how they were written or interned.
There may be many ways of "escaping" (with angle brackets) sequences of upper-case and
non-lower-case characters in a symbol used to denote a foreign name. When translating in
the other direction, Clozure CL always escapes the longest sequence that starts with an
upper-case character and doesn't contain a lower-case character.
It's often preferable to use this canonical form of a foreign type name.
Older POSIX code tends to use lower-case exclusively for type, record, and field names;
there are only a few cases in the Clozure CL sources where mixed-case names need to be
escaped.
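The translation rules can be sketched in C. Both function names are hypothetical. The first applies the three conventions above to a symbol's pname. The second produces the canonical escaping by bracketing each longest run that begins with an upper-case letter and contains no lower-case letter. Both assume the C locale for isupper()/islower().

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Lisp pname -> foreign name. Letters outside <...> become lower-case,
   letters inside become upper-case, and the brackets are dropped.
   Assumes balanced, non-nested brackets. Returns 0, or -1 if OUT is
   too small. */
int lisp_pname_to_foreign(const char *pname, char *out, size_t out_size)
{
    /* Brackets are dropped, so the result is never longer than PNAME;
       conservatively require room for all of PNAME plus a NUL. */
    if (strlen(pname) + 1 > out_size)
        return -1;
    int inside = 0;
    size_t n = 0;
    for (const char *p = pname; *p != '\0'; p++) {
        if (*p == '<') { inside = 1; continue; }
        if (*p == '>') { inside = 0; continue; }
        unsigned char c = (unsigned char)*p;
        out[n++] = (char)(inside ? toupper(c) : tolower(c));
    }
    out[n] = '\0';
    return 0;
}

/* Foreign name -> canonical pname. Returns 0, or -1 if OUT is too small. */
int foreign_to_lisp_pname(const char *name, char *out, size_t out_size)
{
    size_t n = 0;
    if (out_size == 0)
        return -1;
    const char *p = name;
    while (*p != '\0') {
        if (isupper((unsigned char)*p)) {
            /* Longest run that starts upper-case and has no lower-case letter. */
            const char *end = p;
            while (*end != '\0' && !islower((unsigned char)*end))
                end++;
            size_t run = (size_t)(end - p);
            if (n + run + 2 >= out_size)   /* '<' + run + '>' + NUL */
                return -1;
            out[n++] = '<';
            while (p < end)
                out[n++] = *p++;
            out[n++] = '>';
        } else {
            if (n + 1 >= out_size)
                return -1;
            out[n++] = *p++;
        }
    }
    out[n] = '\0';
    return 0;
}
```

For example, the pname <NSR>ECT (what an upcasing reader would intern for :<NSR>ect) translates to NSRect. In the other direction, NSRect escapes canonically to <NSR>ect. An all-lower-case POSIX name such as timeval needs no brackets at all.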
Examples
Clozure CL provides several reader macros to make it more convenient to handle foreign
type, function, variable, and constant names. Each of these reader macros reads symbols
preserving the case of the source text, and selects an appropriate package in which to
intern the resulting symbol. These reader macros are especially useful when your Lisp code
interacts extensively with a foreign library; for example, when using Mac OS X's Cocoa
frameworks.
These reader macros include #_ to read foreign function names, #& to read foreign variable
names, #$ to read foreign constant names, #/ to read the names of foreign Objective-C
methods, and #> to read keywords that can be used as the names of types, records, and
accessors.
All of these reader macros preserve the case of the text that they read; beyond that
similarity, each performs some additional work, unique to each reader macro, to create
symbols suitable for a particular use. For example, the function, variable, and constant
reader macros intern the resulting symbol in the os package of the running platform, but
the reader macro for Objective-C method names interns symbols in the nextstep-functions
package.
You are likely to see these reader macros used extensively in Lisp code that works with
foreign libraries; for example, Clozure CL IDE code, which defines numerous Objective-C
classes and methods, uses these reader macros extensively.
For more detailed descriptions of each of these reader macros, see the
Foreign-Function-Interface Dictionary section.
This tutorial is meant to cover the basics of Clozure CL for calling external C functions and
passing data back and forth. These basics will provide the foundation for more advanced
techniques which will allow access to the various external libraries and toolkits.
The first step is to start with a simple C dynamic library in order to observe what is
actually passing between Clozure CL and C. So, some C code is in order:
Create the file typetest.c, and put the following code into it:
#include <stdio.h>
void
void_void_test(void)
{
    printf("Entered %s:\n", __FUNCTION__);
    printf("Exited %s:\n", __FUNCTION__);
    fflush(stdout);
}
signed char
sc_sc_test(signed char data)
{
    printf("Entered %s:\n", __FUNCTION__);
    printf("Data In: %d\n", (signed int)data);
    printf("Exited %s:\n", __FUNCTION__);
    fflush(stdout);
    return data;
}
unsigned char
uc_uc_test(unsigned char data)
{
    printf("Entered %s:\n", __FUNCTION__);
    printf("Data In: %d\n", (signed int)data);
    printf("Exited %s:\n", __FUNCTION__);
    fflush(stdout);
    return data;
}
This defines three functions. If you're familiar with C, notice that there's no main(),
because we're just building a library, not an executable.
The function void_void_test() doesn't take any parameters, and doesn't return
anything, but it prints two lines to let us know it was called. sc_sc_test() takes a signed
char as a parameter, prints it, and returns it. uc_uc_test() does the same thing, but with
an unsigned char. Their purpose is just to prove to us that we really can call C functions,
pass them values, and get values back from them.
This code is compiled into a dynamic library on OS X 10.3.4 with the command:
gcc -dynamiclib -o libtypetest.dylib typetest.c -install_name ./libtypetest.dylib
Users of 64-bit platforms may need to pass options such as "-m64" to gcc, may
need to give the output library a different extension (such as ".so"), and may
need to use slightly different values for other options in order to create an
equivalent test library.
The -dynamiclib tells gcc that we will be compiling this into a dynamic library and not an
executable binary program. The output filename is "libtypetest.dylib". Notice that we chose
a name which follows the normal OS X convention, being in the form "libXXXXX.dylib", so
that other programs can link to the library. Clozure CL doesn't need it to be this way, but it
is a good idea to adhere to existing conventions.
The -install_name flag is primarily used when building OS X "bundles". In this case, we are
not using it, so we put a placeholder into it, "./libtypetest.dylib". If we wanted to use
typetest in a bundle, the -install_name argument would be a relative path from some
"current" directory.
After creating this library, the first step is to tell Clozure CL to open the dynamic library.
This is done by calling open-shared-library.
? (open-shared-library "/Users/andewl/openmcl/libtypetest.dylib")
#<SHLIB /Users/andewl/openmcl/libtypetest.dylib #x638EF3E>
You should use an absolute path here; using a relative one, such as just "libtypetest.dylib",
would appear to work, but there are subtle problems which occur after reloading it. See the
Darwin notes on open-shared-library for details. It would be a bad idea anyway, because software should never
rely on its starting directory being anything in particular.
This command returns a reference to the opened shared library, and Clozure CL also adds
it to the global variable ccl::*shared-libraries*:
? ccl::*shared-libraries*
(#<SHLIB /Users/andewl/openmcl/libtypetest.dylib #x638EF3E>
#<SHLIB /usr/lib/libSystem.B.dylib #x606179E>)
Before we call anything, let's check that the individual functions can actually be found by
the system. We don't have to do this, but it helps to know how to find out whether this is
the problem, when something goes wrong. We use external:
? (external "_void_void_test")
#<EXTERNAL-ENTRY-POINT "_void_void_test" (#x000CFDF8) /Users/andewl
? (external "_sc_sc_test")
#<EXTERNAL-ENTRY-POINT "_sc_sc_test" (#x000CFE50) /Users/andewl/ope
? (external "_uc_uc_test")
#<EXTERNAL-ENTRY-POINT "_uc_uc_test" (#x000CFED4) /Users/andewl/ope
Notice that the actual function names have been "mangled" by the C linker. The first
function was named "void_void_test" in typetest.c, but in libtypetest.dylib, it has an
underscore (a "_" symbol) before it: "_void_void_test". So, this is the name which you
have to use. The mangling - the way the name is changed - may be different for other
operating systems or other versions, so you need to "just know" how it's done...
Also, pay particular attention to the fact that a hexadecimal value appears in the
EXTERNAL-ENTRY-POINT. (#x000CFDF8, for example - but what it is doesn't matter.)
These hex numbers mean that the entry point has been resolved to an address. Functions which aren't
found will not have a hex number. For example:
? (external "functiondoesnotexist")
#<EXTERNAL-ENTRY-POINT "functiondoesnotexist" {unresolved} #x638E3
The "unresolved" tells us that Clozure CL wasn't able to find this function, which means
you would get an error, "Can't resolve foreign symbol," if you tried to call it.
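Outside Lisp, the same "resolved or unresolved" question can be asked with the POSIX dlopen()/dlsym() interface. This sketch assumes a glibc-based Linux system, where "libc.so.6" is the C library's soname. Linking may also need -ldl on older glibc versions. entry_point_resolves is a hypothetical helper. dlsym() expects the name as written in C source, and on Darwin it supplies the leading underscore itself. That differs from the underscore-prefixed names passed to external above.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if NAME resolves in the library opened by SONAME,
   0 if it does not (the "unresolved" case), and -1 if the library
   itself cannot be opened. */
int entry_point_resolves(const char *soname, const char *name)
{
    void *lib = dlopen(soname, RTLD_NOW);
    if (lib == NULL)
        return -1;
    void *address = dlsym(lib, name);   /* NULL: no such symbol */
    dlclose(lib);
    return address != NULL;
}
```

A call such as entry_point_resolves("libc.so.6", "strlen") should report a resolved symbol. Looking up "functiondoesnotexist" should not, just as external reports {unresolved}.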
These external function references also are stored in a hash table which is accessible
through a global variable, ccl::*eeps*.
At this point, we are ready to try our first external function call:
We used external-call, which is the normal mechanism for accessing externally linked code. The
"_void_void_test" is the mangled name of the external function. The :void refers to the
return type of the function.
The next step is to try passing a value to C, and getting one back:
The first :signed-byte gives the type of the first argument, and then -128 gives the value to
pass for it. The second :signed-byte gives the return type. The return type is always given
by the last argument to external-call.
Everything looks good. Now, let's try a number outside the range which fits in one byte:
Hmmmm. A little odd. Let's look at the unsigned stuff to see how it reacts:
That looks okay. Now, let's go outside the valid range again:
Since a signed byte can only hold values from -128 through 127, and an unsigned one can
only hold values from 0 through 255, any number outside that range gets "clipped": only
the low eight bits of it are used.
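The clipping rule can be stated precisely in C. This sketch assumes that the low eight bits of a signed byte are read as a two's-complement value. It uses plain arithmetic rather than casts, so the result doesn't depend on implementation-defined conversions. The function names are hypothetical.

```c
/* Keep only the low eight bits: the result is in 0 .. 255. */
int clip_to_unsigned_byte(long value)
{
    return (int)(((value % 256) + 256) % 256);
}

/* Same eight bits, read as a two's-complement byte: -128 .. 127. */
int clip_to_signed_byte(long value)
{
    int low8 = clip_to_unsigned_byte(value);
    return low8 > 127 ? low8 - 256 : low8;
}
```

As an unsigned byte, 256 becomes 0 and -1 becomes 255. As a signed byte, 128 becomes -128 and 255 becomes -1.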
What is important to remember is that external function calls have very few safety checks.
Data outside the valid range for its type will silently do very strange things; pointers
outside the valid range can very well crash the system.
That's it for our first example library. If you're still following along, let's add some more C
code to look at the rest of the primitive types. Then we'll need to recompile the dynamic
library, load it again, and then we can see what happens.
int
si_si_test(int data)
{
    printf("Entered %s:\n", __FUNCTION__);
    printf("Data In: %d\n", data);
    printf("Exited %s:\n", __FUNCTION__);
    fflush(stdout);
    return data;
}
long
sl_sl_test(long data)
{
    printf("Entered %s:\n", __FUNCTION__);
    printf("Data In: %ld\n", data);
    printf("Exited %s:\n", __FUNCTION__);
    fflush(stdout);
    return data;
}
long long
sll_sll_test(long long data)
{
    printf("Entered %s:\n", __FUNCTION__);
    printf("Data In: %lld\n", data);
    printf("Exited %s:\n", __FUNCTION__);
    fflush(stdout);
    return data;
}
float
f_f_test(float data)
{
    printf("Entered %s:\n", __FUNCTION__);
    printf("Data In: %e\n", data);
    printf("Exited %s:\n", __FUNCTION__);
    fflush(stdout);
    return data;
}
double
d_d_test(double data)
{
    printf("Entered %s:\n", __FUNCTION__);
    printf("Data In: %e\n", data);
    printf("Exited %s:\n", __FUNCTION__);
    fflush(stdout);
    return data;
}
The command line to compile the dynamic library is the same as before:
Now, restart Clozure CL. This step is required because Clozure CL cannot close and reload
a dynamic library on OS X.
? (open-shared-library "/Users/andewl/openmcl/libtypetest.dylib")
#<SHLIB /Users/andewl/openmcl/libtypetest.dylib #x638EF3E>
? (external-call "_sll_sll_test"
:signed-doubleword -973891578912 :signed-doubleword)
Entered sll_sll_test:
Data In: -973891578912
Exited sll_sll_test:
-973891578912
Okay, everything seems to be acting as expected. However, just to remind you that most of
this stuff has no safety net, here's what happens if somebody mistakes sl_sl_test() for
sll_sll_test(), thinking that a long is actually a doubleword:
? (external-call "_sl_sl_test"
:signed-doubleword -973891578912 :signed-doubleword)
Entered sl_sl_test:
Data In: -227
Exited sl_sl_test:
-974957576192
Ouch. The C function changes the value with no warning that something is wrong. Even
worse, the value it passes back to Clozure CL, -974957576192, looks plausible next to the
original -973891578912, so it is easy to miss that something went wrong.
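The two numbers in this transcript are consistent with sl_sl_test() receiving only the high-order 32 bits of the 64-bit argument. That is an inference from the output, not something the transcript states. It fits a 32-bit, big-endian calling convention in which a 64-bit integer occupies two registers, high word first. The C sketch below (helper names are illustrative) splits a 64-bit value into its signed 32-bit halves so you can check the arithmetic.

```c
#include <stdint.h>

/* Reinterpret 32 bits as a signed 32-bit number without relying on
   implementation-defined narrowing conversions. */
static int64_t as_signed32(uint32_t bits)
{
    return bits >= 0x80000000u ? (int64_t)bits - 0x100000000LL : (int64_t)bits;
}

/* High-order 32 bits of a 64-bit value, as a signed number. */
static int64_t high_word(int64_t value)
{
    return as_signed32((uint32_t)((uint64_t)value >> 32));
}

/* Low-order 32 bits of a 64-bit value, as a signed number. */
static int64_t low_word(int64_t value)
{
    return as_signed32((uint32_t)((uint64_t)value & 0xFFFFFFFFu));
}
```

With these helpers, high_word(-973891578912) is -227, the "Data In" value printed above. The value handed back to Lisp, -974957576192, is exactly -227 * 2^32: the same high word with a zero low word.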
? (open-shared-library "/Users/andewl/openmcl/libtypetest.dylib")
#<SHLIB /Users/andewl/openmcl/libtypetest.dylib #x638EF3E>
Notice that the number ends with "...e+11" for the single-float and "...d+290" for the
double-float. Lisp has both of these float types itself, and the exponent marker (d instead
of e) is how you tell the reader which one to create. If you tried to pass :double-float 1.0e2
to external-call, Lisp would notice and signal a type error, because 1.0e2 is read as a
single-float by default. Don't get the :double-float keyword itself wrong, though, because
then there's no protection.
Congratulations! You now know how to call external C functions from within Clozure CL,
and pass numbers back and forth. Now that the basic mechanics of calling and passing
work, the next step is to examine how to pass more complex data structures around.
Not every foreign function is so marvelously easy to use as the ones we saw in the last
section. Some functions require you to allocate a C struct, fill it with your own information,
and pass in a pointer to that struct. Some of them require you to allocate an empty struct
that they will fill in so that you can read the information out of it.
There are generally two ways to allocate foreign data. The first way is to allocate it on the
stack; the RLET macro is one way to do this. This is analogous to using automatic variables
in C. In the jargon of Common Lisp, data allocated this way is said to have dynamic extent.
The other way is to heap-allocate the foreign data. This is analogous to calling malloc in C.
Again in the jargon of Common Lisp, heap-allocated data is said to have indefinite extent.
If a function heap-allocates some data, that data remains valid even after the function itself
exits. This is useful for data which may need to be passed between multiple C calls or
multiple threads. Also, some data may be too large to copy multiple times or may be too
large to allocate on the stack.
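In C terms, the two kinds of extent look like this. The sketch below uses illustrative function names. It copies data into an automatic array, which disappears when the function returns, and into a malloc'd block, which outlives the call and must eventually be passed to free.

```c
#include <stdlib.h>
#include <string.h>

/* Dynamic extent: buffer lives on the stack and is gone when this returns,
   so only a value computed from it (here, the sum) may escape. */
static int sum_with_stack_buffer(const int *src, size_t n)
{
    int buffer[16];
    int sum = 0;

    if (n > 16)
        n = 16;                         /* stack space is limited */
    memcpy(buffer, src, n * sizeof buffer[0]);
    for (size_t i = 0; i < n; i++)
        sum += buffer[i];
    return sum;
}

/* Indefinite extent: the block stays valid after this returns, and the
   caller becomes responsible for calling free() on it. */
static int *copy_to_heap(const int *src, size_t n)
{
    int *block = malloc(n * sizeof *block);

    if (block != NULL)
        memcpy(block, src, n * sizeof *block);
    return block;
}
```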
The big disadvantage to allocating data on the heap is that it must be explicitly
deallocated: you need to "free" it when you're done with it. Normal Lisp objects, even those
with indefinite extent, are deallocated by the garbage collector when it can prove that
they're no longer referenced. Foreign data, though, is outside the GC's ken: it has no way to
know whether a blob of foreign data is still referenced by foreign code or not. It is thus up
to the programmer to manage it manually, just as one does in C with malloc and free.
What that means is that, if you allocate something and then lose track of the pointer to it,
there's no way to ever free that memory. That's what's called a memory leak, and if your
program leaks enough memory it will eventually use up all of it! So, you need to be careful
to not lose your pointers.
That disadvantage, though, is also an advantage for using foreign functions. Since the
garbage collector doesn't know about this memory, it will never move it around. External C
code needs this guarantee because, unlike Lisp code, it cannot follow an object to its new
location after the garbage collector moves it. If you allocate data manually, you can pass it
to foreign code and know that the data will stay where it is, whatever that code does with it,
until you deallocate it. Of course, you'd better be sure the foreign code is finished with it
before you do. Otherwise, your program will be unstable and might crash sometime in the
future, and you'll have trouble figuring out what caused the crash, because nothing will point
back and say "you deallocated this too soon."
As in the last tutorial, our first step is to create a local dynamic library in order to help
show what is actually going on between Clozure CL and C. So, create the file ptrtest.c, with
the following code:
#include <stdio.h>

void
reverse_int_ptr_ptrtest(int **ptrs)
{
  reverse_int_ptr_array(ptrs, 2);
  reverse_int_array(*(ptrs+0), 4);
  reverse_int_array(*(ptrs+1), 4);
}
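The listing calls two helpers, reverse_int_array() and reverse_int_ptr_array(), whose definitions are not shown. A minimal sketch of them follows. The parameter types are inferred from the call sites and from the later description of reverse_int_array() as an in-place reversal, so treat them as assumptions. In ptrtest.c they must appear above reverse_int_ptr_ptrtest(), or be declared there, for the calls to compile.

```c
/* Reverse count ints in place, swapping from both ends toward the middle. */
void
reverse_int_array(int *data, unsigned int count)
{
    for (unsigned int i = 0; i < count / 2; i++) {
        int tmp = data[i];
        data[i] = data[count - 1 - i];
        data[count - 1 - i] = tmp;
    }
}

/* Reverse count int pointers in place; the ints they point to are untouched. */
void
reverse_int_ptr_array(int **ptrs, unsigned int count)
{
    for (unsigned int i = 0; i < count / 2; i++) {
        int *tmp = ptrs[i];
        ptrs[i] = ptrs[count - 1 - i];
        ptrs[count - 1 - i] = tmp;
    }
}
```

Reversing {6, 4, 7} with reverse_int_array() gives {7, 4, 6}. This is the same change that appears in the Lisp transcript later in this section.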
The function make-heap-ivector is the primary tool for allocating objects in heap
memory. It allocates a fixed-size Clozure CL object in heap memory. It returns both an
array reference, which can be used directly from Clozure CL, and a macptr, which can be
used to access the underlying memory directly. For example:
? a
#(1396 2578 97862649)
? ap
#<A Mac Pointer #x10217C>
It's important to realize that the contents of the ivector we've just created haven't been
initialized, so their values are unpredictable, and you should be sure not to read from them
before you set them, to avoid confusing results.
At this point, a references an object which works just like a normal array. You can read
any element of it with the standard aref function, and set elements by combining aref with setf.
As noted above, the ivector's contents haven't been initialized, so that's the next order of
business:
? a
#(1396 2578 97862649)
? (aref a 2)
97862649
? (setf (aref a 0) 3)
3
? (setf (aref a 1) 4)
4
? (setf (aref a 2) 5)
5
? a
#(3 4 5)
? (setq *byte-length-of-long* 4)
4
? (%get-signed-long ap (* 2 *byte-length-of-long*))
5
? (%get-signed-long ap (* 0 *byte-length-of-long*))
3
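The second argument to %get-signed-long is a byte offset from the pointer, not an element index. That is why the transcript multiplies the index by *byte-length-of-long*. The C sketch below performs the same read; the helper name is illustrative. It copies four bytes starting at a byte offset, using memcpy so that the access is well-defined regardless of alignment. Like the transcript, it assumes the elements are 32-bit integers.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit signed integer located offset bytes past base. */
static int32_t get_signed_long(const void *base, size_t offset)
{
    int32_t value;

    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}
```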
So far, there is nothing about this object that could not be done much better with standard
Lisp. However, the macptr can be used to pass this chunk of memory off to a C function.
? a
#(6 4 7)
? ap
#<A Mac Pointer #x10217C>
? a
#(7 4 6)
? ap
#<A Mac Pointer #x10217C>
The array gets passed correctly to the C function, reverse_int_array. The C function
reverses the contents of the array in-place; that is, it doesn't make a new array, just keeps
the same one and reverses what's in it. Finally, the C function passes control back to
Clozure CL. Since the allocated array memory has been directly modified, Clozure CL
reflects those changes directly in the array as well.
There is one final bit of housekeeping to deal with. Before moving on, the memory needs to
be deallocated:
? (dispose-heap-ivector a ap)
NIL
When do you call dispose-heap-ivector? Anytime after you know the ivector will
never be used again, but no sooner. If you have a lot of ivectors, say, in a hash table, you
need to make sure that when whatever you were doing with the hash table is done, those
ivectors all get freed. Unless there's still something somewhere else which refers to them, of
course! Exactly what strategy to take depends on the situation, so just try to keep things
The simplest situation is when you have things set up so that a Lisp object "encapsulates" a
pointer to foreign data, taking care of all the details of using it. In this case, you don't want
those two things to have different lifetimes: You want to make sure your Lisp object exists
as long as the foreign data does, and no longer; and you want to make sure the foreign data
doesn't get deallocated while your Lisp object still refers to it.
If you're willing to accept a few limitations, you can make this easy. First, you can't let
foreign code keep a permanent pointer to the memory; it has to always finish what it's
doing, then return, and not refer to that memory again. Second, you can't let any Lisp code
that isn't part of your encapsulating "wrapper" refer to the pointer directly. Third, nothing,
either foreign code or Lisp code, should explicitly deallocate the memory.
If you can make sure all of these are true, you can at least ensure that the foreign pointer is
deallocated when the encapsulating object is about to become garbage, by using Clozure
CL's nonstandard "termination" mechanism, which is essentially the same as what Java
and other languages call "finalization".
Termination is a way of asking the garbage collector to let you know when it's about to
destroy an object which isn't used anymore. Before destroying the object, it calls a function
which you write, called a terminator.
So, you can use termination to find out when a particular macptr is about to become
garbage. That's not quite as helpful as it might seem: It's not exactly the same thing as
knowing that the block of memory it points to is unreferenced. For example, there could be
another macptr somewhere to the same block; or, if it's a struct, there could be a macptr
to one of its fields. Most problematically, if the address of that memory has been passed to
foreign code, it's sometimes hard to know whether that code has kept the pointer. Most
foreign functions don't, but it's not hard to think of exceptions.
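You can use code such as this to make all this happen:
;; MY-WRAPPER is a hypothetical name for the encapsulating class; it holds
;; the IVECTOR and MACPTR returned by make-heap-ivector.
(defmethod ccl:terminate ((object my-wrapper))
  (with-slots (ivector macptr) object
    (when ivector
      (dispose-heap-ivector ivector macptr)
      (setq ivector nil
            macptr nil))))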
The ccl:terminate method will be called on some arbitrary thread sometime (hopefully
soon) after the GC has decided that there are no strong references to an object which has
been the argument of a ccl:terminate-when-unreachable call.
If it makes sense to say that the foreign object should live as long as there's Lisp code that
references it (through the encapsulating object) and no longer, this is one way of doing
that.
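The same single-owner rule can be written in C. In the sketch below, a hypothetical heap_buffer struct is the only reference to a malloc'd block. Its dispose function follows the same pattern as the (when ivector ...) code above: it frees the block only if it is still present and then clears the pointer, so a second call does nothing.

```c
#include <stdlib.h>

/* Hypothetical wrapper: the only reference to its foreign block. */
struct heap_buffer {
    int *data;
    size_t count;
};

/* Allocate the block; returns 0 on success, -1 if malloc fails. */
static int heap_buffer_init(struct heap_buffer *buf, size_t count)
{
    buf->data = malloc(count * sizeof *buf->data);
    buf->count = buf->data != NULL ? count : 0;
    return buf->data != NULL ? 0 : -1;
}

/* Free the block at most once, then clear the fields so that a repeated
   call is harmless. */
static void heap_buffer_dispose(struct heap_buffer *buf)
{
    if (buf->data != NULL) {
        free(buf->data);
        buf->data = NULL;
        buf->count = 0;
    }
}
```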
Now we've covered passing basic types back and forth with C, and we've done the same
with pointers. You may think this is all... but we've only done pointers to basic types. Join
us next time for pointers... to pointers.
#_ [Reader Macro]
Reads a symbol from the current input stream, with *PACKAGE* bound to the "OS"
package and with readtable-case preserved.
Does a lookup on that symbol in the Clozure CL interface database, signalling an error if no
foreign function information can be found for the symbol in any active interface directory.
Notes the foreign function information, including the foreign function's return type, the
number and type of the foreign function's required arguments, and an indication of
whether or not the function accepts additional arguments (via e.g., the "varargs"
mechanism in C).
Defines a macroexpansion function on the symbol, which expands macro calls involving the
symbol into EXTERNAL-CALL forms where foreign argument type specifiers for required
arguments and the return value specifier are provided from the information in the database.
The effect of these steps is that it's possible to call foreign functions that take fixed
numbers of arguments by simply providing argument values, as in:
(#_isatty fd)
(#_read fd buf n)
and to call foreign functions that take variable numbers of arguments by specifying the
types of non-required args, as in:
You can query whether a given name is defined in the interface databases by appending the
'?' character to the reader macro; for example:
CL-USER> #_?printf
T
CL-USER> #_?foo
NIL
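Explicit types are needed for the non-required arguments because a C function that takes a variable number of arguments cannot discover their types at run time. It has to be told, the way printf is told by its format string. In the sketch below, sum_ints is a hypothetical function that reads every extra argument with va_arg as an int, so the caller alone is responsible for passing ints.

```c
#include <stdarg.h>

/* Sum count extra arguments, each of which the caller must pass as an int.
   Nothing checks this: passing a double here would be undefined behavior. */
static int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```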
In Clozure CL 1.2 and later, the #& reader macro can be used to access foreign variables;
this functionality depends on the presence of "vars.cdb" files in the interface database. The
current behavior of the #& reader macro is to:
Read a symbol from the current input stream, with *PACKAGE* bound to the "OS" package
and with readtable-case preserved.
Use that symbol's pname to access the Clozure CL interface database, signalling an error if
no appropriate foreign variable information can be found with that name in any active
interface directory.
Use type information recorded in the database to construct a form which can be used to
access the foreign variable, and return that form.
Please note that the set of foreign variables declared in header files may or may not match
the set of foreign variables exported from libraries (we're generally talking about C and
Unix here ...). When they do match, the form constructed by the #& reader macro manages
the details of resolving and tracking changes to the foreign variable's address.
Future extensions (via prefix arguments to the reader macro) may offer additional
behavior; it might be convenient (for instance) to be able to access the address of a foreign
variable without dereferencing that address.
On LinuxPPC,
? #&stderr
returns a pointer to the stdio error stream ("stderr" is a macro under OSX/Darwin).
? #&sys_errlist
You can query whether a given name is defined in the interface databases by appending the
'?' character to the reader macro; for example:
CL-USER> #&?sys_errlist
T
CL-USER> #&?foo
NIL
#$ [Reader Macro]
In Clozure CL 0.14.2 and later, the #$ reader macro can be used to access foreign
constants; this functionality depends on the presence of "constants.cdb" files in the
interface database. The current behavior of the #$ reader macro is to:
Read a symbol from the current input stream, with *PACKAGE* bound to the "OS" package
and with readtable-case preserved.
Use that symbol's pname to access the Clozure CL interface database, signalling an error if
no appropriate foreign constant information can be found with that name in any active
interface directory.
Use type information recorded in the database to construct a form which can be used to
access the foreign constant, and return that form.
Please note that the set of foreign constants declared in header files may or may not match
the set of foreign constants exported from libraries. When they do match, the form
constructed by the #$ reader macro manages the details of resolving and tracking changes
to the foreign constant's address.
You can query whether a given name is defined in the interface databases by appending the
'?' character to the reader macro; for example:
CL-USER> #$?SO_KEEPALIVE
T
CL-USER> #$?foo
NIL
#/ [Reader Macro]
In Clozure CL 1.2 and later, the #/ reader macro can be used to access foreign functions on
the Darwin platform. The current behavior of the #/ reader macro is to:
Read a symbol from the current input stream, with *PACKAGE* bound to the
"NEXTSTEP-FUNCTIONS" package, with readtable-case preserved, and with any colons
included.
Do limited sanity-checking on the resulting symbol; for example, any name that contains at
least one colon is required also to end with a colon, to conform to Objective-C method-
naming conventions.
Export the resulting symbol from the "NEXTSTEP-FUNCTIONS" package and return it.
A symbol read using this macro can be used as an operand in most places where an
Objective-C message name can be used, such as in the (OBJC:@SELECTOR ...) construct.
Please note: the reader macro is not rigorous about enforcing Objective-C method-naming
conventions. Despite the simple checking done by the reader macro, it may still be possible
to use it to construct invalid names.
The dispatching function knows how to call declared Objective-C methods defined on the message.
In many cases, all methods have the same foreign type signature, and the dispatching
function merely passes any arguments that it receives to a function that does an
Objective-C message send with the indicated foreign argument and return types. In other
cases, where different Objective-C messages have different type signatures, the dispatching
function tries to choose a function that handles the right type signature based on the class
of the dispatching function's first argument.
The argument and result coercion that the bridge has traditionally supported is supported
by the new mechanism (e.g., :<BOOL> arguments can be specified as lisp booleans and
:<BOOL> results are returned as lisp boolean values, and an argument value of NIL is
coerced to a null pointer if the corresponding argument type is :ID).
Some Objective-C methods accept variable numbers of arguments; the foreign types of
non-required arguments are determined by the lisp types of those arguments (e.g., integers
are passed as integers, floats as floats, pointers as pointers, record types by reference.)
Examples:
;;; or by
(#/frame my-window)
In Clozure CL 1.2 and later, the #> reader macro reads the following text as a keyword,
preserving the case of the text. For example:
CL-USER> #>FooBar
:<F>OO<B>AR
The resulting keyword can be used as the name of foreign types, records, and accessors.
Stops using a shared library, informing the operating system that it can be unloaded if
appropriate.
library
either an object of type SHLIB, or a string which designates one by its so-name.
completely
Proclaims name to be a special variable; sets its value to a MACPTR which, when called by
foreign code, calls a lisp function which expects foreign arguments of the specified types
and which returns a foreign value of the specified result type. Any argument variables
which correspond to foreign arguments of type :ADDRESS are bound to stack-allocated
MACPTRs.
If name is already a callback function pointer, its value is not changed; instead, it's
arranged that an updated version of the lisp callback function will be called. This feature
allows for callback functions to be redefined incrementally, just like Lisp functions are.
name
arg-type-specifier
var
A symbol (lisp variable), which will be bound to a value of the specified type.
body
A sequence of lisp forms, which should return a value which can be coerced to the
specified result-type.
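From the foreign side, a callback is simply a function pointer with a fixed C signature. The sketch below hands such a pointer to the standard C function qsort, which calls it back repeatedly; compare_ints and sort_ints are illustrative names. A Lisp callback standing in for compare_ints would receive both arguments as :address values, which, as described above, are bound to stack-allocated MACPTRs.

```c
#include <stdlib.h>

/* Callback with the signature qsort expects: two pointers to elements. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);   /* avoids the overflow risk of x - y */
}

/* qsort plays the role of the foreign code: it calls back through
   compare_ints whenever it needs to order two elements. */
static void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```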
If name is non-nil, defines name to be an alias for the foreign type specified by foreign-
type-spec. If foreign-type-spec is a named structure or union type, additionally defines
that structure or union type.
Note that there are two separate namespaces for foreign type names: one for the names of
ordinary types and one for the names of structs and unions. Which one name refers to
depends on foreign-type-spec in the obvious manner.
name
foreign-type-spec
name
entry
Tries to resolve the entry point to a memory address, and identify the containing library.
Be aware that under Darwin, external functions which are callable from C have
underscores prepended to their names, as in "_fopen".
name
arg-type-specifier
arg
result-type-specifier
entrypoint
A fixnum or macptr
arg-type-keyword
arg
result-type-keyword
Calls the foreign function at address entrypoint passing the values of each arg as a foreign
argument of type indicated by the corresponding arg-type-specifier. Returns the foreign
function result (coerced to a Lisp object of type indicated by result-type-specifier), or NIL if
result-type-specifier is :VOID or NIL.
entrypoint
A fixnum or MACPTR
arg-type-specifier
arg
result-type-specifier
This function tries to resolve the address of the foreign symbol name (a lisp string). If
successful, it returns that address encapsulated in a macptr; otherwise, it returns nil.
This function tries to resolve the address of the foreign symbol name (a lisp string). If
successful, it returns a fixnum representation of that address. Otherwise, it returns nil.
Free the foreign memory pointed to by ptr by invoking the standard C function free(). If
ptr is a gcable pointer (such as an object returned from ccl::make-gcable-record),
then free first informs the garbage collector that the foreign memory has been deallocated
before actually calling free().
make-heap-ivector allocates an ivector in foreign memory. The GC will never move this
vector, and will in fact not pay any attention to it at all. The returned pointer to it can
therefore be passed safely to foreign code.
element-count
A positive integer.
element-type
A type specifier.
vector
macptr
size
Allocates a block of foreign memory suitable to hold the foreign type described by typespec,
in the same manner as make-record. In addition, ccl::make-gcable-record marks
the returned object gcable: in other words, it informs the garbage collector that it may
reclaim the object when it becomes unreachable.
When using gcable pointers, it's important to remember the distinction between a macptr
object (which is a lisp object, more or less like any other) and the block of foreign memory
that the macptr object points to. If a gcable macptr object is the only thing in the world
(lisp world or foreign world) that references the underlying block of foreign memory, then
freeing the foreign memory when it becomes impossible to reference it is convenient and
sane. If other lisp macptrs reference the underlying block of foreign memory or if the
address of that foreign memory is passed to and retained by foreign code, having the GC
free the memory may have unpleasant consequences if those other references are used.
Take care, therefore, not to create a gcable record unless you are sure that the returned
macptr will be the only reference to the allocated memory that will ever be used.
typespec
A foreign type specifier, or a keyword which is used as the name of a foreign struct or
union.
initforms
If the type denoted by typespec is scalar, a single value appropriate for that type;
otherwise, a list of alternating field names and values appropriate for the types of
those fields.
result
Expands into code which allocates and initializes an instance of the type denoted by
typespec, on the foreign heap. The record is allocated using the C function ccl::malloc,
and the user of make-record must explicitly call the function free to deallocate the
record, when it is no longer needed.
If initforms is provided, its value or values are used in the initialization. When the type is a
scalar, initforms is either a single value which can be coerced to that type, or no value, in
which case binary 0 is used. When the type is a struct, initforms is a list, giving field
names and the values for each. Each field is treated in the same way as a scalar is: If a value
for it is given, it must be coercible to the field's type; if not, binary 0 is used.
When the type is an array, initforms may not be provided, because make-record cannot
initialize its values. make-record is also unable to initialize fields of a struct which are
themselves structs. The user of make-record should set these values by another means.
A possibly-significant limitation is that it must be possible to find the foreign type at the
time the macro is expanded; make-record signals an error if this is not the case.
typespec
A foreign type specifier, or a keyword which is used as the name of a foreign struct or
union.
initforms
If the type denoted by typespec is scalar, a single value appropriate for that type;
otherwise, a list of alternating field names and values appropriate for the types of
those fields.
result
It is inconvenient that make-record is a macro, because this means that typespec cannot
be a variable; it must be an immediate value.
If it weren't for this requirement, make-record could be a function. However, that would
mean that any stand-alone application using it would have to include a copy of the
interface database (see The Interface Database), which is undesirable because it's large.
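As a rough C analogy for make-record, the sketch below allocates a record on the heap, sets the fields it was given, and leaves every other field as binary 0. Using calloc for the zero fill is an assumption of the sketch, not a claim about how make-record is implemented; point and make_point are hypothetical names. As with make-record, the caller must release the result with free.

```c
#include <stdlib.h>

/* Hypothetical record type. */
struct point {
    int x;
    int y;
    int z;
};

/* Allocate a point on the heap, set x and y, and leave z as binary 0.
   calloc zero-fills the block, standing in for the binary-0 default. */
static struct point *make_point(int x, int y)
{
    struct point *p = calloc(1, sizeof *p);

    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}
```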
Asks the operating system to load a shared library for Clozure CL to use.
name
library
If the library denoted by name can be loaded by the operating system, returns an object of
type SHLIB that describes the library; if the library is already open, increments a reference
count. If the library can't be loaded, signals a SIMPLE-ERROR which contains an often-
cryptic message from the operating system.
;;; Grovel around, curse, and try to find out where "gdk_thread
;;; might be defined. Then try again:
? (open-shared-library "libgdk.so")
#<SHLIB libgdk.so #x3046DBB6>
? (open-shared-library "libgtk.so")
#<SHLIB libgtk.so #x3046DC86>
? (external "gtk_main")
#<EXTERNAL-ENTRY-POINT "gtk_main" (#x012C3004) libgtk.so #x3046
? (close-shared-library "libgtk.so")
T
? (close-shared-library "libgdk.so")
T
? (external "gtk_main")
#<EXTERNAL-ENTRY-POINT "gtk_main" {unresolved} libgtk.so #x3046
Does the SHLIB still get returned if the library is already open?
References an instance of a foreign type (or a component of a foreign type) accessible via
ptr.
Expands into code which references the indicated scalar type or component, or returns a
pointer to a composite type.
ptr
a MACPTR.
accessor-form
a keyword which names a foreign type or record, as described in Foreign type, record,
and field names.
Tries to resolve the address of the external-entry-point eep and returns a fixnum
representation of that address if successful; else signals an error.
eep
Executes body in an environment in which each var is bound to a macptr encapsulating the
address of a stack-allocated foreign memory block, allocated and initialized from typespec
and initforms as per make-record. Returns whatever values body returns.
var
typespec
initforms
This macro is just like rlet, except that the stack-allocated foreign memory is zeroed.
The termination mechanism is a way to have the garbage collector run a function right
before an object is about to become garbage. It is very similar to the finalization
mechanism which Java has. It is not standard Common Lisp, although other Lisp
implementations have similar features. It is useful when there is some sort of special
cleanup, deallocation, or releasing of resources which needs to happen when a certain
object is no longer being used.
When the garbage collector discovers that an object is no longer referred to anywhere in
the program, it deallocates that object, freeing its memory. However, if terminate-
when-unreachable has been called on the object at any time, the garbage collector first
invokes the generic function terminate, passing it the object as a parameter.
object
A CLOS object of a class for which there exists a method of the generic function
terminate.
(defclass resource-wrapper ()
((resource :accessor resource)))
Tells Clozure CL to remove the interface directory denoted by dir-id from the list of
interface directories which are consulted for foreign type and function information.
Returns T if the directory was on the search list, NIL otherwise.
dir-id
Tells Clozure CL to add the interface directory denoted by dir-id to the list of interface
directories which it consults for foreign type and function information. Arranges that that
directory is searched before any others.
Note that use-interface-dir merely adds an entry to a search list. If the named
directory doesn't exist in the file system or doesn't contain a set of database files, a runtime
error may occur when Clozure CL tries to open some database file in that directory, and it
will try to open such a database file whenever it needs to find any foreign type or function
information. unuse-interface-dir may come in handy in that case.
dir-id
Using the :GTK interface directory makes available information on foreign types, functions,
and constants. It's generally necessary to load foreign libraries before actually calling the
foreign code, which for GTK can be done like this:
(load-gtk-libraries)
(#_gtk_widget_destroy w)
Overview
Examples
Limitations and known bugs
External-Program Dictionary
Overview
Clozure CL provides primitives to run external Unix programs, to select and connect Lisp
streams to their input and output sources, to (optionally) wait for their completion and to
check their execution and exit status.
All of the global symbols described below are exported from the CCL package.
This implementation is modeled on - and uses some code from - similar facilities in
CMUCL.
Examples
These last examples will only produce output if Clozure CL's current directory contains
.lisp files, of course.
Clozure CL and the external process may get confused about who owns which
streams when input, output, or error are specified as T and wait is specified as NIL.
External processes that need to talk to a terminal device may not work properly; the
environment (SLIME, ILISP) under which Clozure CL is run can affect this.
External-Program Dictionary
program
args
A list of simple-strings
wait
pty
This option is accepted but currently ignored; it's intended to make it easier to run
external programs that need to interact with a terminal device.
sharing
Sets a specific sharing mode (see Additional keywords for OPEN and
MAKE-SOCKET) for any streams created within RUN-PROGRAM when INPUT,
OUTPUT or ERROR are requested to be a :STREAM.
input
Selects the input source used by the EXTERNAL-PROCESS. May be any of the
following:
NIL Specifies that a null input stream (e.g., /dev/null) should be used.
T Specifies that the EXTERNAL-PROCESS should use the input source with
which Clozure CL was invoked.
:STREAM Creates a Lisp stream opened for character output. Any data written
to this stream (accessible as the EXTERNAL-PROCESS-INPUT-STREAM of the
EXTERNAL-PROCESS object) appears as input to the external process.
A stream. Specifies that the lisp stream should provide input to the
EXTERNAL-PROCESS.
if-input-does-not-exist
If the input argument specifies a file name, this argument is used as the
if-does-not-exist argument to OPEN when that file is opened.
output
Specifies where standard output from the external process should be sent. Analogous
to input above.
if-output-exists
error
Specifies where error output from the external process should be sent. In addition to
the values allowed for output, the keyword :OUTPUT can be used to indicate that
error output should be sent where standard output goes.
if-error-exists
Analogous to if-output-exists.
status-hook
external-format
The external format (see External Formats) for all of the streams (input, output, and
error) used to communicate with the external process.
env
New OS environment variable bindings for the external process. By default the
external process inherits the environment of the running Lisp process. Env is an
association list with elements (<Environment Variable Name> . <Value>). Name and
value are case sensitive strings. See setenv.
silently-ignore-catastrophic-failures
Runs the specified program in an external (Unix) process, returning an object of type
EXTERNAL-PROCESS if successful.
The implementation involves a lisp process/thread which monitors the status of this
external process and arranges for the standard I/O descriptors for the external process to
be connected to the specified lisp streams. Since this may require the monitoring thread to
do I/O on lisp streams in some cases, streams provided as the values of the :INPUT,
:OUTPUT, and :ERROR arguments should not be private to some other lisp thread.
Sends signal number sig to the external process proc (which would have been returned by
run-program). Typically, it would only be useful to call this function if the proc was
created with :wait nil.
However, if error-if-exited is nil, and the attempt to signal the external process fails
because the external process has already exited, the function will return nil rather than
signaling an error.
Returns the operating system process ID assigned to the external-process object proc.
Returns the lisp stream which is used to write input to the external-process object proc, if it
has one. This will be the stream created when the :input argument to run-program is
specified as :stream.
Returns the lisp stream which is used to read output from the external-process object proc,
if there is one. This is the stream created when the :output argument to run-program is
specified as :stream.
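As a sketch of the :output :stream case, assuming /bin/echo is available: the hypothetical helper below starts the process without waiting and reads the first line of its output through external-process-output-stream.

```lisp
;; Hypothetical helper: start /bin/echo without waiting, then read its
;; output through the stream RUN-PROGRAM created for :OUTPUT :STREAM.
(defun first-line-from-echo ()
  (let* ((proc (ccl:run-program "/bin/echo" '("one" "two")
                                :wait nil
                                :output :stream))
         (out (ccl:external-process-output-stream proc)))
    (unwind-protect
         ;; Return "" instead of signaling an error if there is no output.
         (read-line out nil "")
      (close out))))
```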
Returns the stream which is used to read error output from a given OS subprocess, if there
is one. This is the stream created when the :error argument to run-program is specified
as :stream.
Returns, as multiple values, a keyword denoting the status of the external process proc
(one of :running, :stopped, :signaled, or :exited), and the exit code or terminating
signal if the first value is other than :running.
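The functions above can be combined to stop a child process. This is a sketch, assuming /bin/sleep exists and that signal number 15 is SIGTERM (true on Linux and macOS); the helper name and the polling limit of roughly five seconds are arbitrary choices.

```lisp
;; Hypothetical helper: start a long-running child, terminate it with a
;; signal, and return its final status (keyword and code, as two values).
(defun terminate-sleeping-child ()
  (let ((proc (ccl:run-program "/bin/sleep" '("60") :wait nil)))
    (format t "~&child pid: ~D~%" (ccl:external-process-id proc))
    ;; 15 is SIGTERM on Linux and macOS.
    (ccl:signal-external-process proc 15)
    ;; Give the monitoring thread up to about 5 seconds to notice the exit.
    (loop for i below 100
          while (eq (ccl:external-process-status proc) :running)
          do (sleep 0.05))
    (ccl:external-process-status proc)))
```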
Stream Extensions
Stream External Format
Additional keywords for OPEN and MAKE-SOCKET
Basic Versus Fundamental Streams
Stream Timeouts and Deadlines
Open File Streams
Creating Your Own Stream Classes with Gray Streams
Lisp Standard Streams and OS Standard Streams
Stream Extensions
open and make-socket have each been extended to take the additional keyword
arguments: :CLASS, :SHARING, and :BASIC.
:CLASS
A symbol that names the desired class of the stream. The specified class must inherit
from FILE-STREAM for open.
:SHARING
Specifies how a stream can be used by multiple threads. The possible values are:
:PRIVATE, :LOCK and :EXTERNAL. :PRIVATE is the default. NIL is also accepted as
a synonym for :EXTERNAL.
:PRIVATE
Specifies that the stream can only be accessed by the thread that first tries to do
I/O to it; that thread becomes the "owner" of the stream and is not necessarily
the same thread as the one which created the stream. This is the default. (There
was some discussion on openmcl-devel about the idea of "transferring
ownership" of a stream; this has not yet been implemented.) Attempts to do I/O
on a stream with :PRIVATE sharing from a thread other than the stream's
owner yield an error.
:LOCK
Specifies that all access to the stream requires the calling thread to obtain a lock.
There are separate "read" and "write" locks for IO streams. This makes it
possible, for instance, for one thread to read from such a stream while another
thread writes to it. (See also make-read-write-lock, with-read-lock, and
with-write-lock.)
:EXTERNAL
:BASIC
A boolean that indicates whether the stream is a basic stream (an instance of
CCL::BASIC-STREAM) or a Gray stream (an instance of FUNDAMENTAL-STREAM); a true
value selects a basic stream (see Basic Versus Fundamental Streams). Defaults to T.
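A sketch of the extended keywords in use, assuming /tmp is writable. with-open-file passes its options through to open, so :sharing and :basic can be given there. The helper name is hypothetical, and the type test relies on the internal class name CCL::BASIC-STREAM mentioned above.

```lisp
;; Hypothetical helper: open PATH for output with the extended OPEN
;; keywords and report whether the result is a basic stream.
(defun open-with-extended-keywords (path)
  (with-open-file (s path :direction :output
                          :if-exists :supersede
                          :sharing :lock   ; every access takes a lock
                          :basic t)        ; request a basic (non-Gray) stream
    (write-line "hello" s)
    (typep s 'ccl::basic-stream)))
```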
Gray streams (see Creating Your Own Stream Classes with Gray Streams) all inherit from
FUNDAMENTAL-STREAM whereas basic streams inherit from CCL::BASIC-STREAM. The
tradeoff between FUNDAMENTAL and BASIC streams is entirely between flexibility and
performance, potential or actual. I/O primitives can recognize BASIC-STREAMs and
exploit knowledge of implementation details. FUNDAMENTAL stream classes can be
subclassed and extended in a standard way (the Gray streams protocol).
For existing stream classes (FILE-STREAMs, SOCKETs, and the internal CCL::FD-STREAM
classes used to implement file streams and sockets), a lot of code can be shared
between the FUNDAMENTAL and BASIC implementations. The biggest difference should
be that the shared code can be reached from I/O primitives like READ-CHAR without going
through some steps that are there to support generality and extensibility; skipping those
steps when that support isn't needed can improve I/O performance.
A simple loop reading 2M characters from a text file runs about 10X faster when the file is
opened with the new defaults (:SHARING :PRIVATE :BASIC T) than it did before these
changes were made. That sounds good, until one realizes that the "equivalent" C loop can
be about 10X faster still ...
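One way to reproduce a comparison like this is to time the same READ-CHAR loop on a basic stream and on a Gray (fundamental) stream. This is a sketch, not the benchmark the figures above came from; the helper name and file paths are hypothetical.

```lisp
;; Hypothetical helper: count the characters in PATH by calling READ-CHAR
;; in a loop, on either a basic stream or a Gray (fundamental) stream.
(defun count-chars (path &key (basic t))
  (with-open-file (s path :basic basic)
    (loop for c = (read-char s nil nil)
          while c
          count t)))

;; Compare the two kinds of stream on the same (large) file, e.g.:
;; (time (count-chars "/tmp/big.txt" :basic t))
;; (time (count-chars "/tmp/big.txt" :basic nil))
```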
A stream that is associated with a file descriptor has attributes and accessors:
stream-input-timeout, stream-output-timeout, and stream-deadline. All three
accessors have corresponding setf methods. stream-input-timeout and
stream-output-timeout are specified in seconds and can be any positive real number
less than one million. When a timeout is set and the corresponding I/O operation takes
longer than the specified interval, an error is signalled. The error is INPUT-TIMEOUT for
input and OUTPUT-TIMEOUT for output. STREAM-DEADLINE specifies an absolute time in
internal time units. If an I/O operation on the stream does not complete before the
deadline, a COMMUNICATION-DEADLINE-EXPIRED error is signalled. A deadline takes
precedence over any input/output timeouts that may be set.
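A sketch of an input timeout, assuming /bin/sleep exists and that the stream returned by external-process-output-stream is file-descriptor based, so the accessors above apply. The condition is written as CCL::INPUT-TIMEOUT on the assumption that the INPUT-TIMEOUT error mentioned above is named by a symbol in the CCL package; the helper name is hypothetical.

```lisp
;; Hypothetical helper: read from a pipe whose writer stays silent,
;; giving up after a one-second input timeout.
(defun read-with-timeout ()
  (let* ((proc (ccl:run-program "/bin/sleep" '("5") :wait nil :output :stream))
         (out (ccl:external-process-output-stream proc)))
    (setf (ccl:stream-input-timeout out) 1)   ; seconds
    (unwind-protect
         (handler-case (read-char out nil :eof)
           ;; Assumed condition name (see the lead-in).
           (ccl::input-timeout () :timed-out))
      ;; Clean up: stop the child (15 = SIGTERM) and close our end.
      (ccl:signal-external-process proc 15)
      (close out))))
```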
Clozure CL maintains a list of open file streams. This helps to ensure that streams get
closed in an orderly manner when the lisp exits. The following thread-safe functions
manage this list.
open-file-streams [Function]
Adds file-stream to the internal list of open file streams that is returned by open-file-
streams. This function is thread-safe. It will usually only be called from custom stream
code when a file-stream is created.
Removes file-stream from the internal list of open file streams that is returned by
open-file-streams. This function is thread-safe. It will usually only be called from
custom stream code when a file-stream is closed.
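A quick check of the list, as a sketch: it assumes that an ordinary stream returned by open is registered on the list automatically, which is what the list's purpose (closing streams when the lisp exits) suggests. The helper name and path are hypothetical.

```lisp
;; Hypothetical helper: is a freshly opened file stream on the list
;; that OPEN-FILE-STREAMS returns?
(defun stream-tracked-p (path)
  (with-open-file (s path :direction :output :if-exists :supersede)
    (and (member s (ccl:open-file-streams)) t)))
```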
Overview
This section is still being written and revised, because it is woefully incomplete. The
dictionary section currently only lists a couple functions. Caveat lector.
Gray streams are an extension to Common Lisp. They were proposed for standardization
by David Gray (the astute reader now understands their name) quite some years ago, but
not accepted, because they had not been tried sufficiently to find conceptual problems with
them.
They have since been implemented by quite a few modern Lisp implementations. However,
they do indeed have some inadequacies, and each implementation has addressed these in
different ways. The situation today is that it's difficult to even find out how to get started
using Gray streams. This is why standards are important.
Here's a list of some classes which you might wish for your new stream class to inherit
from:
All of these are defined in ccl/level-1/l1-streams.lisp, except for the ccl:file-* ones, which
are in ccl/level-1/l1-sysio.lisp.
According to the original Gray streams proposal, you should inherit from the most specific
of the fundamental-* classes which applies. Using Clozure CL, though, if you want
buffering for better performance (and unless you know of some reason you wouldn't, you
do), you should instead inherit from the appropriate ccl::buffered-* class. The buffering you
get this way is exactly the same as the buffering which is used on ordinary, non-Gray
streams, and force-output will work properly on it.
Notice the -mixin suffix in the names of all the ccl::buffered-* classes? The suffix means
that the class is not "complete" by itself; you still need to inherit from a fundamental-*
stream, even if you also inherit from a *-mixin stream. You might consider making your
own class like this. (Oddly, though, the ccl::buffered-* classes do themselves inherit from
the fundamental-* streams.)
If you want to be able to create an instance of your class with the :class argument to (open)
and (with-open-file), you should make it inherit from one of the file-* classes. If you do
this, it's not necessary to inherit from any of the other classes (though it won't hurt
anything), since the file-* classes already do.
When you inherit from the file-* classes, you can use (call-next-method) in any of your
methods to get the standard behavior. This is especially useful if you want to create a class
which performs some simple filtering operation, such as changing everything to uppercase
or to a different character encoding. If you do this, you will definitely need to specialize
ccl::select-stream-class. Your method on ccl::select-stream-class should accept an instance
of the class, but pay no attention to its contents, and return a symbol naming the class to
actually be instantiated.
If you need to make your functionality generic across all the different types of stream,
probably the best way to implement it is to make it a mixin, define classes with all the
variants of input, output, io, character, and binary, which inherit both from your mixin and
from the appropriate other class, then define a method on ccl::select-stream-class which
chooses from among those classes.
Note that some of these classes are internal to the CCL package. If you try to inherit from
them without the ccl:: prefix, you'll get a possibly confusing error that calls them
"forward-referenced classes". That just means you used the wrong symbol, so add the
prefix.
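To make the uppercase-filtering idea concrete without the file-* machinery, here is a minimal, unbuffered Gray stream that wraps another stream. It is a sketch that assumes the Gray generic functions stream-write-char and stream-line-column are available from the CCL package; as noted above, a real class would normally also inherit from a ccl::buffered-* mixin for performance. The class and function names are hypothetical.

```lisp
;; Hypothetical unbuffered Gray output stream that upcases every character
;; and forwards it to a target stream.
(defclass upcase-output-stream (ccl:fundamental-character-output-stream)
  ((target :initarg :target :reader upcase-target)))

;; The one method a Gray character output stream must provide.
(defmethod ccl:stream-write-char ((s upcase-output-stream) char)
  (write-char (char-upcase char) (upcase-target s))
  char)

;; Column tracking: NIL means "unknown".
(defmethod ccl:stream-line-column ((s upcase-output-stream))
  nil)

;; Hypothetical helper: WRITE-STRING falls back to STREAM-WRITE-CHAR
;; for each character, so the whole string comes out upcased.
(defun shout (string)
  (with-output-to-string (out)
    (write-string string (make-instance 'upcase-output-stream :target out))))
```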
Here's a list of some generic functions which you might wish to specialize for your new
stream class, and which ought to be documented at some point.
The following functions are standard parts of Common Lisp, but behave in special ways
with regard to Gray streams.
Specifically, (open) and (with-open-file) accept a new keyword argument, :class, which may
be a symbol naming a class; the class itself; or an instance of it. The class so given must be
a subtype of 'stream, and an instance of it with no particular contents will be passed to
ccl::select-stream-class to determine what class to actually instantiate.
The following are standard, and do not behave specially with regard to Gray streams, but
probably should.
stream-external-format
Overview
The "Gray Streams" API is based on an informal proposal that was made before ANSI CL
adopted the READ-SEQUENCE and WRITE-SEQUENCE functions; as such, there is no
"standard" way for the author of a Gray stream class to improve the performance of these
functions by exploiting knowledge of the stream's internals (e.g., the buffering mechanism
it uses.)
Notes
Example
Multibyte I/O
All heap-allocated objects in Clozure CL that cannot contain pointers to lisp objects are
represented as ivectors. Clozure CL provides two low-level functions, described below, to
efficiently transfer data between buffered streams and ivectors. There's some overlap in
functionality between these functions and the ANSI CL READ-SEQUENCE and
WRITE-SEQUENCE functions.
As used here, the term "octet" means roughly the same thing as the term "8-bit byte". The
functions described below transfer a specified sequence of octets between a buffered
stream and an ivector, and don't really concern themselves with higher-level issues (like
whether that octet sequence is within bounds or how it relates to the logical contents of the
ivector.) For these reasons, these functions are generally less safe and more flexible than
their ANSI counterparts.
Should try to read up to count elements from stream into the list list, returning the number
of elements actually read (which may be less than count in case of a premature end-of-file.)
stream
list
count
Write the first count elements of list to stream. The return value of this method is ignored.
stream
list
count
Read successive elements from stream into vector, starting at element start (inclusive) and
continuing through element end (exclusive.) Should return the index of the vector element
beyond the last one stored into, which may be less than end in case of premature
end-of-file.
stream
vector
start
end
Should try to write successive elements of vector to stream, starting at element start
(inclusive) and continuing through element end (exclusive.)
stream
vector
start
end
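A sketch of a Gray stream that specializes stream-read-vector, so that read-sequence can copy a whole range at once instead of calling stream-read-byte once per element. It assumes that read-sequence dispatches to stream-read-vector for vectors, as the entries above describe; the class, slot, and accessor names are hypothetical.

```lisp
;; Hypothetical Gray binary input stream that serves octets from a vector.
(defclass octet-vector-input-stream (ccl:fundamental-binary-input-stream)
  ((data :initarg :data :reader ovis-data)
   (pos :initform 0 :accessor ovis-pos)))

;; Per-element protocol: return the next octet, or :EOF when exhausted.
(defmethod ccl:stream-read-byte ((s octet-vector-input-stream))
  (let ((data (ovis-data s))
        (pos (ovis-pos s)))
    (if (< pos (length data))
        (prog1 (aref data pos)
          (incf (ovis-pos s)))
        :eof)))

;; Bulk protocol: fill VECTOR from START (inclusive) toward END (exclusive)
;; with one REPLACE, and return the index just past the last element
;; stored (less than END if the data runs out first).
(defmethod ccl:stream-read-vector ((s octet-vector-input-stream) vector start end)
  (let* ((data (ovis-data s))
         (pos (ovis-pos s))
         (n (min (- end start) (- (length data) pos))))
    (replace vector data :start1 start :end1 (+ start n) :start2 pos)
    (incf (ovis-pos s) n)
    (+ start n)))
```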
a stream.
direction
fd
Returns the file descriptor associated with s in the direction given by direction. It is
necessary to specify direction because the input and output file descriptors may be
different; the most common case is when one of them has been redirected by the Unix
shell.
Reads up to max-octets octets from stream into ivector, storing them at start-octet. Returns
the number of octets actually read.
stream
ivector
Any ivector.
start-octet
A non-negative integer.
max-octets
A non-negative integer. The return value may be less than the value of this parameter
if EOF was encountered.
Writes max-octets octets to stream from ivector, starting at start-octet. Returns max-octets.
stream
ivector
Any ivector.
start-octet
A non-negative integer.
max-octets
A non-negative integer.
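The text above omits the two function names; this sketch assumes they are ccl:stream-write-ivector and ccl:stream-read-ivector, with the argument order documented here (stream, ivector, start-octet, max-octets). It round-trips a vector of (unsigned-byte 32) through a binary file. The octets are written in native byte order, so the file is only meaningful to a reader on the same kind of machine. The path and helper name are hypothetical.

```lisp
;; Hypothetical helper: write a (UNSIGNED-BYTE 32) vector to PATH as raw
;; octets, then read the octets back into a fresh vector of the same type.
(defun ivector-round-trip (path)
  (let* ((out-vec (make-array 4 :element-type '(unsigned-byte 32)
                                :initial-contents '(1 2 3 #xDEADBEEF)))
         (in-vec (make-array 4 :element-type '(unsigned-byte 32)
                               :initial-element 0))
         (nbytes (* 4 (length out-vec))))   ; 4 octets per element
    (with-open-file (s path :direction :output :if-exists :supersede
                            :element-type '(unsigned-byte 8))
      (ccl:stream-write-ivector s out-vec 0 nbytes))
    (with-open-file (s path :element-type '(unsigned-byte 8))
      ;; Returns the number of octets actually read (fewer on early EOF).
      (assert (= nbytes (ccl:stream-read-ivector s in-vec 0 nbytes))))
    in-vec))
```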
In normal interactive usage, the input and output sides of the bidirectional stream
*terminal-io* are hooked up to the operating system's standard input and standard
output. The lisp streams *standard-input*, *standard-output*, and *error-output* are
synonym streams for *terminal-io*.
In batch mode, this arrangement is modified slightly. The lisp streams *standard-input*,
*standard-output*, and *error-output* correspond directly to the
operating system's standard input, standard output, and standard error. If the lisp can
determine that it has access to an operating system tty, then *terminal-io* will be
hooked up to that. Otherwise, the input and output streams of *terminal-io* will
correspond to the operating system's standard input and standard output.
Overview
File-system case
Line Termination Characters
Single-precision trig & transcendental functions
Shared libraries
Cocoa Programming in Clozure CL
The Command Line and the Window System
Writing (and reading) Cocoa code
Overview
The documentation and whatever experience you may have in using Clozure CL under
Linux should also apply to using it under Darwin/MacOS X and FreeBSD. There are some
differences between the platforms, and these differences are sometimes exposed in the
implementation.
File-system case
Darwin and MacOS X use HFS+ file systems by default; HFS+ file systems are usually
case-insensitive. Most of Clozure CL's filesystem and pathname code assumes that the
underlying filesystem is case-sensitive; this assumption extends to functions like EQUAL,
which assumes that #p"FOO" and #p"foo" denote different, un-EQUAL filenames. Since
Darwin/MacOS X can also use UFS and NFS filesystems, the opposite assumption would
be no more correct than the one that's currently made.
Whatever the best solution to this problem turns out to be, there are some practical
considerations. Doing:
? (save-application "DPPCCL")
on 32-bit DarwinPPC has the unfortunate side-effect of trying to overwrite the Darwin
Clozure CL kernel, "dppccl", on a case-insensitive filesystem.
To work around this, the Darwin Clozure CL kernel expects the default heap image file
name to be the kernel's own filename with the string ".image" appended, so the idiom
would be:
? (save-application "dppccl.image")
Clozure CL follows the Unix line-termination convention (lines end with #\Linefeed) on
both Darwin and LinuxPPC, but offers some support for reading and writing files that use
other conventions (including traditional MacOS conventions) as well.
This support (and anything like it) is by nature heuristic: it can successfully hide the
distinction between newline conventions much of the time, but could mistakenly change
the meaning of otherwise correct programs (typically when files contain both #\Return and
#\Linefeed characters or when files contain mixtures of text and binary data.) Because of
this concern, the default settings of some of the variables that control newline translation
and interpretation are somewhat conservative.
Although the issue of multiple newline conventions primarily affects MacOSX users, the
functionality described here is available under LinuxPPC as well (and may occasionally be
useful there.)
None of this addresses issues related to the third newline convention ("CRLF") in
widespread use (since that convention isn't native to any platform on which Clozure CL
currently runs). If Clozure CL is ever ported to such a platform, that issue might be
revisited.
Note that some MacOS programs (including some versions of commercial MCL) may use
HFS file type information to recognize TEXT and other file types and so may fail to
recognize files created with Clozure CL or other Darwin applications (regardless of line
termination issues.)
Unless otherwise noted, the symbols mentioned in this documentation are exported from
the CCL package.
Despite what Darwin's man pages say, early versions of its math library (up to and
including at least OS X 10.2, Jaguar) don't implement single-precision variants of the
transcendental and trig functions (#_sinf, #_atanf, etc.). Clozure CL worked around this by
coercing single-precision args to double-precision, calling the double-precision version of
the math library function, and coercing the result back to a SINGLE-FLOAT. These steps
can introduce rounding errors (and potentially overflow conditions) that might not be
present or as severe if true 32-bit variants were available.
Shared libraries
Cocoa is one of Apple's APIs for GUI programming; for most purposes, development is
considerably faster with Cocoa than with the alternatives. You should have a little
familiarity with it, to better understand this section.
A small sample Cocoa program can be invoked by evaluating (REQUIRE 'TINY) and then
(CCL::TINY-SETUP). This program provides a simple example of using several of the
bridge's capabilities.
The Tiny demo creates Cocoa objects dynamically, at runtime, which is always an option.
However, for large applications, it is usually more convenient to create your objects with
Apple Interface Builder, and store them in .nib files to be loaded when needed. Both
approaches can be freely mixed in a single program.
The syntax of the constructs used to define Cocoa classes and methods has changed a bit (it
was never documented outside of the source code and never too well documented at all),
largely as the result of functionality offered by Randall Beer's bridge; the "standard
name-mapping conventions" referenced below are described in his CocoaBridgeDoc.txt
file, as are the constructs used to invoke ("send messages to") Cocoa methods.
All of the symbols described below are currently internal to the CCL package.
The Cocoa API is broken into several pieces. The Application Kit, affectionately called
AppKit, is the one which deals with window management, drawing, and handling events.
AppKit really wants all of these things (event handling, window creation, and drawing) to
take place on a "distinguished thread".
Apple has published some guidelines which discuss these issues in some detail; see the
Apple Multithreading Documentation, and in particular the guidelines on Using the
Application Kit from Multiple Threads. The upshot is that there can sometimes be
unexpected behavior when objects are created in threads other than the distinguished
event thread; e.g., the event thread sometimes starts performing operations on objects that
haven't been fully initialized.
Each thread in the Cocoa runtime system is expected to maintain a current "autorelease
pool" (an instance of the NSAutoreleasePool class); newly created objects are often added
to the current autorelease pool (via the -autorelease method), and periodically the current
autorelease pool is sent a "-release" message, which causes it to send "-release" messages to
all of the objects that have been added to it.
If the current thread doesn't have a current autorelease pool, the attempt to autorelease
any object will result in a severe-looking warning being written via NSLog. The event
thread maintains an autorelease pool (it releases the current pool after each event is
processed and creates a new one for the next event), so code that only runs in that thread
should never provoke any of these severe-looking NSLog messages.
To try to suppress these messages (and still participate in the Cocoa memory management
scheme), each listener thread (the initial listener and any created via the "New Listener"
command in the IDE) is given a default autorelease pool; there are REPL colon-commands
for manipulating the current listener's "toplevel autorelease pool".
In the current scheme, every time that Cocoa calls lisp code, a lisp error handler is
established which maps any lisp conditions to ObjC exceptions and arranges that this
exception is raised when the callback to lisp returns. Whenever lisp code invokes a Cocoa
method, it does so with an ObjC exception handler in place; this handler maps ObjC
exceptions to lisp conditions and signals those conditions.
Any unhandled lisp error or ObjC exception that occurs during the execution of the
distinguished event thread's event loop causes a message to be NSLog'ed and the event
loop to (try to) continue execution. Any error that occurs in other threads is handled at the
point of the outermost Cocoa method invocation. (Note that the error is not necessarily
"handled" in the dynamic context in which it occurs.)
Both of these behaviors could possibly be improved; both of them seem to be substantial
improvements over previous behaviors (where, for instance, a misspelled message name
typically terminated the application.)
Acknowledgement
The Cocoa bridge was originally developed and generously contributed by Randall Beer.
You may have noticed that (require "COCOA") takes a long time to load. It is possible to
avoid this by saving a Lisp heap image which has everything already loaded. There is an
example file which allows you to do this, "ccl/examples/cocoa-application.lisp", by
producing a double-clickable application which runs your program. First, load your own
program. Then, do:
? (require "COCOA-APPLICATION")
When it finishes, you should be able to double-click the Clozure CL icon in the ccl
directory, to quickly start your program.
The OS may have already decided that Clozure CL.app isn't a valid executable bundle, and
therefore won't let you double-click it. If this happens to you, you can force it to reconsider
by updating the last-modified time of the bundle. In Terminal, from the ccl directory:
touch "Clozure CL.app"
When an image which had contained ObjC classes (which are also CLOS classes) is
re-launched, those classes are "revived": all preexisting classes have their addresses
updated destructively, so that existing subclass/superclass/metaclass relationships are
maintained. It's not possible (and may never be) to preserve foreign instances across
SAVE-APPLICATION. (It may be the case that NSArchiver and NSCoder and related
classes offer some approximation of that.)
Recommended Reading
These are top-level pages pertaining to Cocoa in Apple's Mac OS X Developer Library.
If you are unfamiliar with Cocoa, these links are good places to start.
This is one of the two most important Cocoa references; it covers all of the basics,
except for GUI programming. This is a reference, not a tutorial.
This is the other very important Cocoa reference; it covers GUI programming with
Cocoa / Application Kit Framework in considerable depth. This is a reference, not a
tutorial.
This is the top page for Mac OS X developer documentation. Go here to find the
documentation on any other Mac OS X API. Also go here if you need general
guidance about OS X, Carbon, Cocoa, Core Foundation, or Objective-C.
Operating-System Dictionary
class-name
a string which denotes an existing class name, or a symbol which can be mapped to
such a string via the standard name-mapping conventions for class names
Used to refer to a known ObjC class by name. (Via the use of LOAD-TIME-VALUE, the results
of a class-name -> class lookup are cached.)
objc:@class is obsolete as of late 2004, because find-class now works on ObjC classes. It
is described here only because some old code still uses it.
string
name-and-result-type
either an Objective-C message name, for methods that return a value of type :ID, or a
list containing an Objective-C message name and a foreign type specifier for methods
with a different foreign result type.
receiver-arg-and-class
a two-element list whose first element is a variable name and whose second element
is the Lisp name of an Objective-C class or metaclass. The receiver variable name can
be any bindable lisp variable name, but SELF might be a reasonable choice. The
receiver variable is declared to be "unsettable"; i.e., it is an error to try to change the
value of the receiver in the body of the method definition.
other-args
either variable names (denoting parameters of type :ID) or 2-element lists whose
first element is a variable name and whose second element is a foreign type specifier.
For a detailed description of the features and restrictions of the OBJC:DEFMETHOD macro,
see the section Using objc:defmethod.
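A sketch of objc:defmethod using each argument form described above, for Mac OS X only. It assumes the bridge can be loaded with (require "OBJC-SUPPORT") (the text elsewhere uses "COCOA", which also loads it), that ns:ns-object with the ns:+ns-object metaclass names NSObject, and that forms are evaluated one at a time so the NS package exists before later forms are read. The class adder, the selector addInt:to:, and the helper add-via-objc are hypothetical.

```lisp
;; Load the Objective-C bridge (assumed module name; "COCOA" also loads it).
(require "OBJC-SUPPORT")

;; Hypothetical ObjC class defined from Lisp.
(defclass adder (ns:ns-object)
  ()
  (:metaclass ns:+ns-object))

;; name-and-result-type:   (#/addInt:to: :int)  -> returns a C int
;; receiver-arg-and-class: (self adder)
;; other-args:             (a :int) (b :int)    -> two int parameters
(objc:defmethod (#/addInt:to: :int) ((self adder) (a :int) (b :int))
  (+ a b))

;; Hypothetical helper: allocate an instance, send it the message,
;; and release the instance afterwards.
(defun add-via-objc (x y)
  (let ((obj (make-instance 'adder)))
    (unwind-protect (#/addInt:to: obj x y)
      (#/release obj))))
```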
selector
either a string which represents the name of the selector or a list which describes the
method's return type, selector components, and argument types (see below.) If the
first form is used, then the first form in the body must be a list which describes the
selector's argument types and return value type, as per DEFCALLBACK.
class-name
either a string which names an existing ObjC class or a symbol which can map to such a
string via the standard name-mapping conventions for class names. (Note that the
"canonical" lisp class name is such a symbol.)
Defines an ObjC-callable method which implements the specified message selector for
instances of the existing ObjC class class-name.
As per DEFINE-OBJC-METHOD
a list of alternating keywords and variable/type specifiers, where the set of keywords
can be mapped to a selector string for a parameterized method according to the
standard name-mapping conventions for method selectors and each variable/type-
specifier is either a variable name (denoting a value of type :ID) or a list whose CAR is
a variable name and whose CADR is the corresponding argument's foreign type
specifier.
ccl:*alternate-line-terminator* [Variable]
This variable is currently only used by the standard reader macro function for #\;
(single-line comments); that function reads successive characters until EOF, a #\NewLine
is read, or a character EQL to the value of *alternate-line-terminator* is read. In Clozure
CL for Darwin, the value of this variable is initially #\Return ; in Clozure CL for other OSes,
it's initially NIL.
Their default treatment by the #\; reader macro is the primary way in which #\Return and
#\Linefeed differ syntactically; by extending the #\; reader macro to (conditionally) treat
#\Return as a comment-terminator, that distinction is eliminated. This seems to make
LOAD and COMPILE-FILE insensitive to line-termination issues in many cases. It could
fail in the (hopefully rare) case where a LF-terminated (Unix) text file contains embedded
#\Return characters, and this mechanism isn't adequate to handle cases where newlines
are embedded in string constants or other tokens (and presumably should be translated
from an external convention to the internal one): it doesn't change what READ-CHAR or
READ-LINE "see", and that may be necessary to handle some more complicated cases.
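A small demonstration of the variable, reading a comment terminated by a bare #\Return (the classic MacOS line ending). With the variable bound to #\Return the reader sees the form after the comment; with NIL the whole string is one comment, so read-from-string reaches end of file. The helper name is hypothetical.

```lisp
;; Hypothetical helper: read the first form from a string in which a
;; single-line comment is terminated by #\Return instead of #\Newline.
(defun read-after-cr-comment (terminator)
  (let ((ccl:*alternate-line-terminator* terminator))
    ;; Return :EOF instead of signaling an error if no form is found.
    (read-from-string (format nil "; a comment~C(+ 1 2)" #\Return)
                      nil :eof)))
```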
ccl::ns-lisp-string [Class]
NS:NS-STRING
:string
This class implements the interface of an NSString, which means that it can be passed to
any Cocoa or Core Foundation function which expects one.
The string itself is stored on the Lisp heap, which means that its memory management is
automatic. However, the ns-lisp-string object itself is a foreign object (that is, it has an objc
metaclass), and resides on the foreign heap. Therefore, it is necessary to explicitly free it,
by sending a dealloc message.
You can create an ns-lisp-string with make-instance, just like any normal Lisp class:
? (defvar *the-string*
    (make-instance 'ccl::ns-lisp-string
                   :string "Hello, Cocoa."))
When you are done with the string, you must explicitly deallocate it:
? (ccl::send *the-string* 'dealloc)
You may wish to use an unwind-protect form to ensure that this happens:
(let (*the-string*)
  (unwind-protect (progn (setq *the-string*
                               (make-instance 'ccl::ns-lisp-string
                                              :string "Hello, Cocoa."))
                         (format t "~&The string is ~D characters long.~%"
                                 (ccl::send *the-string* 'length)))
    (when *the-string*
      (ccl::send *the-string* 'dealloc))))
Release 0.10 or later of Clozure CL uses a different memory management scheme than
previous versions did. Those earlier versions would allocate a block of memory (of specified
size) at startup and would allocate lisp objects within that block. When that block filled
with live (non-GCed) objects, the lisp would signal a "heap full" condition. The heap size
imposed a limit on the size of the largest object that could be allocated.
The new strategy involves reserving a very large (2GB on DarwinPPC32, 1GB on LinuxPPC,
"very large" on 64-bit implementations) block at startup and consuming (and
relinquishing) its contents as the size of the live lisp heap data grows and shrinks. After the
initial heap image loads and after each full GC, the lisp kernel will try to ensure that a
specified amount (the "lisp-heap-gc-threshold") of free memory is available. The initial
value of this kernel variable is 16MB on 32-bit implementations and 32MB on 64-bit
implementations; it can be manipulated from Lisp (see below.)
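The text refers to functions for manipulating this threshold; this sketch assumes they are ccl:lisp-heap-gc-threshold, ccl:set-lisp-heap-gc-threshold, and ccl:use-lisp-heap-gc-threshold (the last making the new value take effect without waiting for the next full GC). The 64MB default and the helper name are arbitrary.

```lisp
;; Hypothetical helper: call THUNK with a larger free-space target,
;; restoring the previous threshold afterwards.
(defun call-with-larger-gc-threshold (thunk &optional (bytes (* 64 1024 1024)))
  (let ((old (ccl:lisp-heap-gc-threshold)))
    (unwind-protect
         (progn
           (ccl:set-lisp-heap-gc-threshold bytes)
           ;; Apply the new threshold now rather than after the next full GC.
           (ccl:use-lisp-heap-gc-threshold)
           (funcall thunk))
      ;; Put the original threshold back.
      (ccl:set-lisp-heap-gc-threshold old)
      (ccl:use-lisp-heap-gc-threshold))))
```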
The large reserved memory block consumes very little in the way of system resources;
memory that's actually committed to the lisp heap (live data and the "threshold" area
where allocation takes place) consumes finite resources (physical memory and swap
space). The lisp's consumption of those resources is proportional to its actual memory
usage, which is generally a good thing.
This scheme is much more flexible than the old one, but it may also increase the possibility
that those resources can become exhausted. Neither the new scheme nor the old handles
that situation gracefully; under the old scheme, a program that consumes lots of memory
may have run into an artificial limit on heap size before exhausting virtual memory.
The -R or --heap-reserve command-line option can be used to limit the size of the reserved
block and therefore bound heap expansion. Running
would provide an execution environment that's very similar to that provided by earlier
Clozure CL versions.
Ephemeral GC
For many programs, the following observations are true to a very large degree:
1. Most heap-allocated objects have very short lifetimes ("are ephemeral"): they become
inaccessible soon after they're created.
2. Most non-ephemeral objects have very long lifetimes: it's rarely productive for the GC
to consider reclaiming them, since it's rarely able to do so. (An object that has
survived a large number of GCs is likely to survive the next one. That's not always
true of course, but it's a reasonable heuristic.)
3. It's relatively rare for an old object to be destructively modified (via SETF) so that it
points to a new one, therefore most references to newly-created objects can be found
in the stacks and registers of active threads. It's not generally necessary to scan the
entire heap to find references to new objects (or to prove that such references don't
exist), though it is necessary to keep track of the (hopefully exceptional) cases where
old objects are modified to point at new ones.
and disruptive, and minimizing the frequency (and sometimes the duration) of these
pauses is probably the EGC's primary goal (though there may be other benefits, such as
increased locality of reference and better paging behavior.) The EGC generally leads to
slightly longer execution times (and slightly higher, amortized GC time), but there are cases
where it can improve overall performance as well; the nature and degree of its impact on
performance is highly application-dependent.
Most EGC strategies (including the one employed by Clozure CL) logically or physically
divide memory into one or more areas of relatively young objects ("generations") and one
or more areas of old objects. Objects that have survived one or more GCs as members of a
young generation are promoted (or "tenured") into an older generation, where they may or
may not survive long enough to be promoted to the next generation and eventually may
become "old" objects that can only be reclaimed if a full GC proves that there are no live
references to them. This filtering process isn't perfect - a certain amount of premature
tenuring may take place - but it usually works very well in practice.
It's important to note that, although a GC of the youngest generation is typically very fast
(perhaps a few milliseconds on a modern CPU, depending on various factors), Clozure CL's
EGC is not concurrent and doesn't offer realtime guarantees.
Clozure CL's EGC maintains three ephemeral generations; all newly created objects are
created as members of the youngest generation. Each generation has an associated
threshold, which indicates the number of bytes in it and all younger generations that can be
allocated before a GC is triggered. These GCs will involve the target generation and all
younger ones (and may therefore cause some premature tenuring); since the older
generations have larger thresholds, they're GCed less frequently and most short-lived
objects that make it into an older generation tend not to survive there very long.
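The threshold rule can be expressed as a small function. This is a sketch of one reading of the description above, assuming per-generation byte counts are available and that a generation is due for collection once the bytes in it and all younger generations exceed its threshold; the function name and the example thresholds are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N_EPHEMERAL 3   /* Clozure CL maintains three ephemeral generations */

/* allocated[g] : bytes currently counted against generation g (0 = youngest).
   threshold[g] : bytes that may accumulate in generation g and all younger
                  generations before a GC of g (and everything younger) is due.
   Returns the oldest generation whose threshold has been exceeded, or -1 if
   no ephemeral GC is needed.  A GC of generation g also collects 0..g-1. */
int generation_to_collect(const size_t allocated[N_EPHEMERAL],
                          const size_t threshold[N_EPHEMERAL])
{
    int target = -1;
    size_t cumulative = 0;
    assert(allocated != NULL && threshold != NULL);
    for (int g = 0; g < N_EPHEMERAL; g++) {
        cumulative += allocated[g];   /* bytes in g and all younger generations */
        if (cumulative > threshold[g])
            target = g;
    }
    return target;
}
```

Choosing the oldest exceeded generation matches the statement that a GC of a generation also involves all younger ones; because older thresholds are larger, those collections happen less often.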
The EGC can be enabled or disabled under program control; under some circumstances, it
may be enabled but inactive (because a full GC is imminent.) Since it may be hard to know
or predict the consing behavior of other threads, the distinction between the "active" and
"inactive" state isn't very meaningful, especially when native threads are involved.
Many programs reach near stasis in terms of the amount of logical memory that's in use
after full GC (or run for long periods of time in a nearly static state), so the logical address
range used for consing after the Nth full GC is likely to be nearly or entirely identical to the
address range used after the (N+1)th full GC.
By default (and traditionally in Clozure CL), the GC's policy is to "release" the pages in this
address range: to advise the virtual memory system that the pages contain garbage, so that
any physical pages associated with them don't need to be swapped out to disk before being
reused, and to (re-)map the logical address range so that the pages will be zero-filled by the
virtual memory system when they're next accessed. This policy is intended to reduce the
load on the VM system and keep Clozure CL's working set to a minimum.
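The "release" policy maps naturally onto POSIX memory-mapping calls. The sketch below re-maps a range as fresh anonymous memory with `MAP_FIXED`, which discards the old contents and makes each page zero-filled on its next access. It is an illustration only: the text does not say which system calls Clozure CL uses, `release_pages` is a hypothetical name, and `MAP_ANONYMOUS`, though widely available, is not part of base POSIX.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Release a page-aligned address range: replace it with a fresh anonymous
   mapping at the same address.  The old contents are dropped (never written
   to swap), and each page reads as zeros the next time it is touched. */
int release_pages(void *start, size_t len)
{
    assert(start != NULL && len > 0);
    void *p = mmap(start, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```

A "retaining" policy would simply skip this call and reuse the pages as they are, avoiding the zero-fill faults when the same range is consed into again right away.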
For some programs (especially those that cons at a very high rate), the default policy may
be less than ideal: releasing pages that are going to be needed almost immediately - and
zero-fill-faulting them back in, lazily - incurs unnecessary overhead. (There's a false
economy associated with minimizing the size of the working set if it's just going to shoot
back up again until the next GC.) A policy of "retaining" pages between GCs might work
better in such an environment.
Functions described below give the user some control over this behavior. An adaptive,
feedback-mediated approach might yield a better solution.
SAVE-APPLICATION identifies code vectors and the pnames of interned symbols and
copies these objects to a "pure" area of the image file it creates. (The "pure" area accounts
for most of what the ROOM function reports as "static" space.)
When the resulting image file is loaded, the pure area of the file is memory-mapped
with read-only access. Code and pure data are paged in from the image file as needed (and
don't compete for global virtual memory resources with other memory areas.)
Code-vectors and interned symbol pnames are immutable: it is an error to try to change
the contents of such an object. Previously, that error would have manifested itself in some
random way. In the new scheme, it'll manifest itself as an "unhandled exception" error in
the Lisp kernel. The kernel could probably be made to detect a spurious, accidental write to
read-only space and signal a lisp error in that case, but it doesn't yet do so.
The image file should be opened and/or mapped in some mode which disallows writing to
the memory-mapped regions of the file from other processes. I'm not sure how to do
that; writing to the file when it's mapped by Clozure CL can have unpredictable and
unpleasant results. SAVE-APPLICATION will delete its output file's directory entry and
create a new file; one may need to exercise care when using file system utilities (like tar, for
instance) that might overwrite an existing image file.
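The difference between replacing an image file and overwriting it in place can be demonstrated with POSIX calls. In this sketch (function names hypothetical), a process maps a file read-only; the file is then replaced by unlinking the old directory entry and creating a new file, as SAVE-APPLICATION is described as doing. The existing mapping keeps referring to the old, now-unlinked file, so what it sees is unaffected. Writing into the existing file instead, as some archive utilities do, may change what the mapping sees.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map an entire file read-only, as the pure area of an image is mapped.
   Returns NULL on failure; *len receives the file size. */
const char *map_file_readonly(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return NULL; }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    if (p == MAP_FAILED) return NULL;
    *len = (size_t)st.st_size;
    return p;
}

/* Write a new file by first removing the old directory entry, then creating
   a brand-new file.  Existing mappings of the old file are left alone. */
int replace_file(const char *path, const void *data, size_t len)
{
    if (unlink(path) != 0 && errno != ENOENT) return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) return -1;
    ssize_t n = write(fd, data, len);
    int rc = close(fd);
    return (n == (ssize_t)len && rc == 0) ? 0 : -1;
}
```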
Weak References
In general, a "weak reference" is a reference to an object which does not prevent the object
from being garbage-collected. For example, suppose that you want to keep a list of all the
objects of a certain type. If you don't take special steps, the fact that you have a list of them
will mean that the objects are always "live", because you can always reference them
through the list. Therefore, they will never be garbage-collected, and their memory will
never be reclaimed, even if they are referenced nowhere else in the program. If you don't
want this behavior, you need weak references.
Clozure CL supports weak references with two kinds of objects: weak hash tables and
populations.
Weak hash tables are created with the standard Common Lisp function make-hash-table,
which is extended to accept the keyword argument :weak. Hash tables may be
weak with respect to either their keys or their values. To make a hash table with weak keys,
invoke make-hash-table with the option :weak t, or, equivalently, :weak :key. To make
one with weak values, use :weak :value. When the key is weak, the equality test must be
#'eq (because it wouldn't make sense otherwise).
When garbage-collection occurs, key-value pairs are removed from the hash table if there
are no non-weak references to the weak element of the pair (key or value).
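The removal rule for weak-key tables can be sketched as a sweep that runs after the GC's mark phase. The entry layout, the `marked` array, and `sweep_weak_keys` are hypothetical stand-ins; a real implementation also has to handle weak-value tables and rehashing, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical weak-key table entry.  Keys are object ids; marked[id] is the
   result of a mark phase that did NOT trace through weak table keys. */
typedef struct {
    int key;     /* object id of the weakly held key; -1 = empty slot */
    int value;   /* object id of the associated value */
} WeakEntry;

/* Clear every entry whose key was not reached through any non-weak
   reference.  Returns the number of entries removed. */
size_t sweep_weak_keys(WeakEntry *table, size_t n, const bool *marked)
{
    size_t removed = 0;
    assert((table != NULL && marked != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (table[i].key >= 0 && !marked[table[i].key]) {
            table[i].key = -1;
            table[i].value = -1;
            removed++;
        }
    }
    return removed;
}
```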
In general, weak-key hash tables are useful when you want to use the hash to store some
extra information about the objects you look up in it, while weak-value hash tables are
useful when you want to use the hash as an index for looking up objects.
If you are experimenting with weak references interactively, remember that an object is not
dead if it was returned by one of the last three interactively-evaluated expressions, because
of the variables *, **, and ***. The easy workaround is to evaluate some meaningless
expression before invoking gc, to get the object out of the REPL variables.
make-population &key type initial-contents [Function]
type
initial-contents
population-contents population [Function]
Returns the list encapsulated in population. Note that as long as there is a direct
(non-weak) reference to this list, it will not be modified by the garbage collector. Therefore
it is safe to traverse the list, and even modify it, just like any other list. If you want
the elements to become garbage-collectable again, you must stop referring to the list
directly.
(setf population-contents) contents population [Function]
Sets the list encapsulated in population to contents. Contents is not copied; it is used
directly.
Garbage-Collection Dictionary
gc [Function]
lisp-heap-gc-threshold [Function]
Returns the value of the kernel variable that specifies the amount of free space to leave in
the heap after full GC.
set-lisp-heap-gc-threshold new-threshold [Function]
new-threshold
Sets the value of the kernel variable that specifies the amount of free space to leave in the
heap after full GC to new-threshold, which should be a non-negative fixnum. Returns the value
of that kernel variable (which may be somewhat larger than what was specified).
use-lisp-heap-gc-threshold [Function]
Tries to grow or shrink lisp's heap space, so that the free space is (approximately) equal to
the current heap threshold. Returns NIL.
egc arg [Function]
arg
a generalized boolean
Enables the EGC if arg is non-nil, disables the EGC otherwise. Returns the previous
enabled status. Although this function is thread-safe (in the sense that calls to it are
serialized), it doesn't make a whole lot of sense to be turning the EGC on and off from
multiple threads ...
egc-enabled-p [Function]
Returns T if the EGC was enabled at the time of the call, NIL otherwise.
egc-active-p [Function]
Returns T if the EGC was active at the time of the call, NIL otherwise. Since this is
generally a volatile piece of information, it's not clear whether this function serves a useful
purpose when native threads are involved.
egc-configuration [Function]
Returns, as multiple values, the sizes in kilobytes of the thresholds associated with the
youngest ephemeral generation, the middle ephemeral generation, and the oldest
ephemeral generation.
configure-egc generation-0-size generation-1-size generation-2-size [Function]
generation-0-size
generation-1-size
generation-2-size
Puts the indicated threshold sizes in effect. Each threshold indicates the total size that may
be allocated in that and all younger generations before a GC is triggered. Disables EGC
while setting the values. (The provided threshold sizes are rounded up to a multiple of
64 KB in Clozure CL 0.14 and to a multiple of 32 KB in earlier versions.)
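The rounding described for configure-egc is ordinary round-up-to-a-multiple arithmetic. A sketch in C, working in bytes, with a hypothetical helper name; it assumes the granule is a power of two, which holds for both 64 KB and 32 KB.

```c
#include <assert.h>
#include <stddef.h>

/* Round a requested threshold (in bytes) up to the next multiple of
   `granule`, which must be a power of two (64 KB or 32 KB above). */
size_t round_up_threshold(size_t bytes, size_t granule)
{
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    return (bytes + granule - 1) & ~(granule - 1);
}
```

For example, a 100 KB request becomes 128 KB under either granule, while an exact multiple such as 64 KB is left unchanged.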
gc-retain-pages arg [Function]
arg
a generalized boolean
Tries to influence the GC to retain/recycle the pages allocated between GCs if arg is true,
and to release them otherwise. This is generally a tradeoff between paging and other VM
considerations.
gc-retaining-pages [Function]
Returns T if the GC tries to retain pages between full GCs and NIL if it's trying to release
them to improve VM paging performance.
Fixnums on 32-bit systems are 30 bits long, and cover the interval [-536870912,
536870911]. Fixnums on 64-bit systems are 61 bits long, and cover the interval
[-1152921504606846976, 1152921504606846975] (see Tagging scheme).
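Both ranges follow from the usual two's-complement formula: an n-bit fixnum covers -2^(n-1) through 2^(n-1) - 1. The helpers below (hypothetical names) compute both bounds so the numbers above can be checked.

```c
#include <assert.h>
#include <stdint.h>

/* Bounds of an n-bit two's-complement integer: -(2^(n-1)) .. 2^(n-1) - 1. */
int64_t fixnum_min(int bits)
{
    assert(bits >= 2 && bits <= 63);
    return -((int64_t)1 << (bits - 1));
}

int64_t fixnum_max(int bits)
{
    assert(bits >= 2 && bits <= 63);
    return ((int64_t)1 << (bits - 1)) - 1;
}
```

With bits = 30 this gives 2^29 = 536870912, and with bits = 61 it gives 2^60 = 1152921504606846976, matching the intervals above.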
Clozure CL's threads are "native" (meaning that they're scheduled and controlled by the
operating system.) Most of the implications of this are discussed elsewhere; this section
tries to describe how threads look from the lisp kernel's perspective (and especially from
the GC's point of view.)
Clozure CL's runtime system tries to use machine-level exception mechanisms (conditional
traps when available, illegal instructions, memory access protection in some cases) to
detect and handle exceptional situations. These situations include some TYPE-ERRORs
and PROGRAM-ERRORs (notably wrong-number-of-args errors), and also include cases
like "not being able to allocate memory without GCing or obtaining more memory from the
OS." The general idea is that it's usually faster to pay (very occasional) exception-
processing overhead and figure out what's going on in an exception handler than it is to
maintain enough state and context to handle an exceptional case via a lighter-weight
mechanism when that exceptional case (by definition) rarely occurs.
Some emulated execution environments (the Rosetta PPC emulator on x86 versions of Mac
OS X) don't provide accurate exception information to exception handling functions.
Clozure CL can't run in such environments.
When a lisp thread is first created (or when a thread created by foreign code first calls back
to lisp), a data structure called a Thread Context Record (or TCR) is allocated and
initialized. On modern versions of Linux and FreeBSD, the allocation actually happens via
a set of thread-local-storage ABI extensions, so a thread's TCR is created when the thread is
created and dies when the thread dies. (The World's Most Advanced Operating System, as
Apple's marketing literature refers to Darwin, is not very advanced in this regard, and I
know of no reason to assume that advances will be made in this area anytime soon.)
A TCR contains a few dozen fields (and is therefore a few hundred bytes in size.) The fields
are mostly thread-specific information about the thread's stacks' locations and sizes,
information about the underlying (POSIX) thread, and information about the thread's
dynamic binding history and pending CATCH/UNWIND-PROTECTs. Some of this
information could be kept in individual machine registers while the thread is running (and
the PPC - which has more registers available - keeps a few things in registers that the
X86-64 has to access via the TCR), but it's important to remember that the information is
thread-specific and can't (for instance) be kept in a fixed global memory location.
When lisp code is running, the current thread's TCR is kept in a register. On PPC
platforms, a general-purpose register is used; on x86-64, an (otherwise nearly useless)
segment register works well, since using it avoids tying up a more generally useful
general-purpose register for this purpose.
The address of a TCR is aligned in memory in such a way that a FIXNUM can be used to
represent it. The lisp function CCL::%CURRENT-TCR returns the calling thread's TCR as a
fixnum; the actual value of the TCR's address is 4 (on 32-bit platforms) or 8 (on 64-bit
platforms) times the value of this fixnum.
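Since 30-bit fixnums fill a 32-bit word with 2 bits to spare and 61-bit fixnums fill a 64-bit word with 3, a suitably aligned address can be read as a fixnum whose value is the address divided by 4 or 8. The sketch below assumes fixnums occupy the word shifted left by those spare bits with zero tag bits; the function and constant names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Spare low bits in a fixnum's machine word: 32 - 30 and 64 - 61. */
enum { FIXNUM_SHIFT_32 = 2, FIXNUM_SHIFT_64 = 3 };

/* Read an aligned address's bits as a fixnum: its value is address >> shift. */
uint64_t tcr_address_to_fixnum_value(uint64_t address, int shift)
{
    assert((address & ((UINT64_C(1) << shift) - 1)) == 0);  /* must be aligned */
    return address >> shift;
}

/* Recover the TCR's address from the fixnum value seen by lisp code. */
uint64_t fixnum_value_to_tcr_address(uint64_t value, int shift)
{
    return value << shift;
}
```

The machine word itself never changes; only its interpretation does, which is why no allocation or boxing is needed to hand the TCR to lisp.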
When the lisp kernel initializes a new TCR, it's added to a global list maintained by the
kernel; when a thread exits, its TCR is removed from this list.
When a thread calls foreign code, lisp stack pointers are saved in its TCR, lisp registers (at
least those whose value should be preserved across the call) are saved on the thread's value
stack, and (on x86-64) RSP is switched to the control stack. A field in the TCR (tcr.valence)
is then set to indicate that the thread is running foreign code, foreign argument registers
are loaded from a frame on the foreign stack, and the foreign function is called. (That's a
little oversimplified and possibly inaccurate, but the important things to note are that the
thread "stops following lisp stack and register usage conventions" and that it advertises the
fact that it's done so.) Similar transitions in a thread's state ("valence") occur when it enters
or exits an exception handler (which is sort of an OS/hardware-mandated foreign function
call where the OS thoughtfully saves the thread's register state for it beforehand.)
Unix-like OSes tend to refer to exceptions as "signals"; the same general mechanism
("signal handling") is used to process both asynchronous OS-level events (such as the result
of the keyboard driver noticing that ^C or ^Z has been pressed) and synchronous
hardware-level events (like trying to execute an illegal instruction or access protected
memory.) It makes some sense to defer ("block") handling of asynchronous signals so that
some critical code sequences complete without interruption; since it's generally not
possible for a thread to proceed after a synchronous exception unless and until its state is
modified by an exception handler, it makes no sense to talk about blocking synchronous
signals (though some OSes will let you do so and doing so can have mysterious effects.)
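Deferring asynchronous signals around a critical sequence is done with the signal mask. This POSIX sketch blocks a signal, raises it inside the "critical" region to show that it stays pending, then restores the mask, at which point the signal is delivered. SIGUSR1 stands in for an asynchronous event such as ^C, the function names are hypothetical, and the sketch assumes a single-threaded process (threaded code would use `pthread_sigmask`).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t async_events = 0;

static void on_async_signal(int signo) { (void)signo; async_events++; }

int install_async_handler(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_async_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Returns 1 if `signo`, raised inside the critical section, stayed pending
   and undelivered there; 0 otherwise; -1 on error. */
int raise_inside_critical_section(int signo)
{
    sigset_t block, old, pending;
    assert(signo > 0);
    sigemptyset(&block);
    sigaddset(&block, signo);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0) return -1;

    /* critical code sequence: must complete without interruption */
    raise(signo);
    int deferred = sigpending(&pending) == 0 &&
                   sigismember(&pending, signo) == 1 &&
                   async_events == 0;

    /* restoring the mask delivers the pending signal before returning */
    if (sigprocmask(SIG_SETMASK, &old, NULL) != 0) return -1;
    return deferred;
}
```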
On OSX/Darwin, the POSIX signal handling facilities coexist with lower-level Mach-based
exception handling facilities. Unfortunately, the way that this is implemented interacts
poorly with debugging tools: GDB will generally stop whenever the target program
encounters a Mach-level exception and offers no way to proceed from that point (and let
the program's POSIX signal handler try to handle the exception); Apple's CrashReporter
program has had a similar issue and, depending on how it's configured, may bombard the
user with alert dialogs which falsely claim that an application has crashed (when in fact the
application in question has routinely handled a routine exception.) On Darwin/OSX,
Clozure CL uses Mach thread-level exception handling facilities which run before GDB or
CrashReporter get a chance to confuse themselves; Clozure CL's Mach exception handling
tries to force the thread which received a synchronous exception to invoke a signal
handling function ("as if" signal handling worked more usefully under Darwin.) Mach
exception handlers run in a dedicated thread (which basically does nothing but wait for
exception messages from the lisp kernel, obtain and modify information about the state of
threads in which exceptions have occurred, and reply to the exception messages with an
indication that the exception has been handled.) The reply from a thread-level exception
handler keeps the exception from being reported to GDB or CrashReporter and avoids the
problems related to those programs. Since Clozure CL's Mach exception handler doesn't
claim to handle debugging-related exceptions (from breakpoints or single-step operations),
it's possible to use GDB to debug Clozure CL.
On platforms where signal handling and debugging don't get in each other's way, a signal
handler is entered with all signals blocked. (This behavior is specified in the call to the
sigaction() function which established the signal handler.) The signal handler receives
three arguments from the OS kernel; the first is an integer that identifies the signal, the
second is a pointer to an object of type "siginfo_t", which may or may not contain a few
fields that would help to identify the cause of the exception, and the third argument is a
pointer to a data structure (called a "ucontext" or something similar), which contains
machine-dependent information about the state of the thread at the time that the
exception/signal occurred. While asynchronous signals are blocked, the signal handler
stores the pointer to its third argument (the "signal context") in a field in the current
thread's TCR, sets some bits in another TCR field to indicate that the thread is now waiting
to handle an exception, unblocks asynchronous signals, and waits for a global exception
lock that serializes exception processing.
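The handler interface described above is the POSIX `SA_SIGINFO` form of `sigaction()`. In this sketch, filling `sa_mask` makes the handler run with all blockable signals blocked, and the handler records what it received in a struct standing in for the TCR fields. `struct fake_tcr` and the function names are hypothetical, and SIGUSR2 is used because a real synchronous fault cannot be resumed safely without editing the signal context.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the TCR fields the handler updates. */
struct fake_tcr {
    volatile sig_atomic_t last_signo;      /* first handler argument */
    volatile sig_atomic_t had_context;     /* third argument was non-NULL */
    volatile sig_atomic_t others_blocked;  /* SIGUSR1 was blocked in the handler */
};
static struct fake_tcr current_tcr;

/* Three-argument handler: signal number, siginfo_t pointer, and a pointer to
   the machine-dependent context (a ucontext_t on most Unix-like systems). */
static void exception_handler(int signo, siginfo_t *info, void *context)
{
    sigset_t blocked;
    (void)info;
    current_tcr.last_signo = signo;
    current_tcr.had_context = (context != NULL);
    current_tcr.others_blocked =
        sigprocmask(SIG_BLOCK, NULL, &blocked) == 0 &&
        sigismember(&blocked, SIGUSR1) == 1;
}

int install_exception_handler(int signo)
{
    struct sigaction sa;
    assert(signo != SIGUSR1);          /* SIGUSR1 serves as the "other" signal */
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = exception_handler;
    sa.sa_flags = SA_SIGINFO;
    sigfillset(&sa.sa_mask);           /* enter the handler with all signals blocked */
    return sigaction(signo, &sa, NULL);
}
```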
On Darwin, the Mach exception thread creates a signal context (and maybe a siginfo_t
structure), stores the signal context in the thread's TCR, sets the TCR field which describes
the thread's state, and arranges that the thread resume execution at its signal handling
function (with a signal number, possibly NULL siginfo_t, and signal context as arguments.)
When the thread resumes, it waits for the global exception lock.
On x86-64 platforms where signal handling can be used to handle synchronous exceptions,
there's an additional complication: the OS kernel ordinarily allocates the signal context and
siginfo structures on the stack of the thread that received the signal; in practice, that means
"wherever RSP is pointing." Clozure CL's Register and stack usage conventions require that
the thread's value stack (where RSP is usually pointing while lisp code is running) contain
only "nodes" (properly tagged lisp objects), and scribbling a signal context all over the
value stack would violate this requirement. To maintain consistency, the sigaltstack()
mechanism is used to cause the signal to be delivered on (and the signal context and siginfo
to be allocated on) a special stack area (the last few pages of the thread's control stack, in
practice). When the signal handler runs, it (carefully) copies the signal context and siginfo
to the thread's control stack and makes RSP point into that stack before invoking the "real"
signal handler. The effect of this hack is that the "real" signal handler always runs on the
thread's control stack.
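The `sigaltstack()` arrangement can be demonstrated in isolation. The sketch installs a dedicated signal stack, registers a handler with `SA_ONSTACK`, and checks that the handler's frame really lies inside that area. The buffer size, the names, and the use of SIGUSR1 are assumptions, and the copying step (moving the context to the control stack and switching RSP) is not shown.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

#define ALT_STACK_SIZE (64 * 1024)   /* generous; avoids relying on SIGSTKSZ */
static char alt_stack[ALT_STACK_SIZE];
static volatile sig_atomic_t ran_on_alt_stack = 0;

static void on_signal(int signo)
{
    char probe;                      /* lives in the handler's own frame */
    uintptr_t p = (uintptr_t)&probe;
    (void)signo;
    ran_on_alt_stack = p >= (uintptr_t)alt_stack &&
                       p <  (uintptr_t)alt_stack + ALT_STACK_SIZE;
}

/* Deliver `signo` on a dedicated stack area, so the kernel never writes
   its signal frame onto whatever stack RSP happened to point at. */
int install_on_alt_stack(int signo)
{
    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sa.sa_flags = SA_ONSTACK;
    sigfillset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```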
Once the exception handler has obtained the global exception lock, it uses the values of the
signal number, siginfo_t, and signal context arguments to determine the (logical) cause of
the exception. Some exceptions may be caused by factors that should generate lisp errors
or other serious conditions (such as stack overflow); if this is the case, the kernel code may release
the global exception lock and call out to lisp code. (The lisp code in question may need to
repeat some of the exception decoding process; in particular, it needs to be able to interpret
register values in the signal context that it receives as an argument.)
In some cases, the lisp kernel exception handler may not be able to recover from the
exception (this is currently true of some types of memory-access fault and is also true of
traps or illegal instructions that occur during foreign code execution). In such cases, the
kernel exception handler reports the exception as "unhandled", and the kernel debugger is
invoked.
If the kernel exception handler identifies the exception's cause as being a transient out-of-
memory condition (indicating that the current thread needs more memory to cons in), it
tries to make that memory available. In some cases, doing so involves invoking the GC.
The signal handler for the asynchronous "suspend" signal is entered with all asynchronous
signals blocked. It saves its signal-context argument in a TCR slot, raises the TCR's
"suspend" semaphore, then waits on the TCR's "resume" semaphore.
The GC thread has access to the signal contexts of all TCRs (including its own) at the time
when the thread received an exception or acknowledged a request to suspend itself. This
information (and information about stack areas in the TCR itself) allows the GC to identify
the "stack locations and register contents" that are elements of the GC's root set.
PC-lusering
It's not quite accurate to say that Clozure CL's compiler and runtime follow precise stack
and register usage conventions at all times; there are a few exceptions:
On both PPC and x86-64 platforms, consing isn't fully atomic. It takes at least a few
instructions to allocate an object in memory (and slap a header on it if necessary); if a
thread is interrupted in the middle of that instruction sequence, the new object may
or may not have been created or fully initialized at the point in time that the interrupt
occurred. (There are actually a few different states of partial initialization.)
On the PPC, the common act of building a lisp control stack frame involves allocating
a four-word frame and storing three register values into that frame. (The fourth word
- the back pointer to the previous frame - is automatically set when the frame is
allocated.) The previous contents of those three words are unknown (there might
have been a foreign stack frame at the same address a few instructions earlier), so
interrupting a thread that's in the process of initializing a PPC control stack frame
isn't GC-safe.
There are similar problems with the initialization of temp stack frames on the PPC.
(Allocation and initialization doesn't happen atomically, and the newly allocated
stack memory may have undefined contents.)
This works because (a) many of the troublesome instruction sequences are PPC-specific
and it's relatively easy to partially disassemble the instructions surrounding the interrupted
thread's PC on the PPC and (b) those instruction sequences are heavily stylized and
intended to be easily recognized.
Overview
The set of live, reachable lisp objects basically form the nodes of a (usually large) graph,
with edges from each node A to any other objects (nodes) that object A references.
Some nodes in this graph can never have outgoing edges: an array with a specialized
numeric or character type usually represents its elements in some (possibly more compact)
specialized way. Some nodes may refer to lisp objects that are never allocated in memory
(FIXNUMs, CHARACTERs, SINGLE-FLOATs on 64-bit platforms, etc.) This latter class of
objects are sometimes called "immediates", but that's a little confusing because the term
"immediate" is sometimes used to refer to things that can never be part of the big
connectivity graph (e.g., the "raw" bits that make up a floating-point value, foreign address,
or numeric value that needs to be used - at least fleetingly - in compiled code.)
For the GC to be able to build the connectivity graph reliably, it's necessary for it to be able
to reliably tell (a) whether or not a "potential root" - the contents of a machine register or
stack location - is in fact a node and (b) for any node, whether it may have components that
refer to other nodes.
There's no reliable way to answer the first question on stock hardware. (If everything were a
node, as might be the case on specially microcoded "lisp machine" hardware, it wouldn't
even need to be asked.) Since there's no way to just look at a machine word (the contents of
a machine register or stack location) and tell whether or not it's a node or just some
random non-node value, we have to either adopt and enforce strict conventions on register
and stack usage or tolerate ambiguity.
Once we've decided that a given machine word is a node, a Tagging scheme describes how
the node's value and type are encoded in that machine word.
So far, most of this discussion has treated things from the GC's very low-level perspective.
From a much higher point of view, lisp functions accept nodes as arguments, return nodes
as values, and (usually) perform some operations on those arguments in order to produce
those results. (In many cases, the operations in question involve raw non-node values.)
Higher-level parts of the lisp type system (functions like TYPE-OF and CLASS-OF, etc.)
depend on the Tagging scheme.
On the PPC, there's a third case (besides "node" and "immediate" values). As discussed
below, a node that denotes a memory-allocated lisp object is a biased (tagged) pointer to
that object; it's not generally possible to point into some composite (multi-element)
object (such a pointer would not be a node, and the GC would have no way to update the
pointer if it were to move the underlying object.)
Such a pointer ("into" the interior of a heap-allocated object) is often called a locative; the
cases where locatives are allowed in Clozure CL mostly involve the behavior of function call
and return instructions. (To be technically accurate, the other case also arises on x86-64,
but that case isn't as user-visible.)
On the PowerPC (both PPC32 and PPC64), all machine instructions are 32 bits wide and all
instruction words are allocated on 32-bit boundaries. In PPC Clozure CL, a
CODE-VECTOR is a specialized type of vector-like object; its elements are 32-bit PPC
machine instructions. A CODE-VECTOR is an attribute of a FUNCTION object; a function
call involves accessing the function's code-vector and jumping to the address of its first
instruction.
As each instruction in the code vector sequentially executes, the hardware program counter
(PC) register advances to the address of the next instruction (a locative into the code
vector); since PPC instructions are always 32 bits wide and aligned on 32-bit boundaries,
the low two bits of the PC are always 0. If the function executes a call (simple call
instructions have the mnemonic "bl" on the PPC, which stands for "branch and link"), the
address of the next instruction (also a word-aligned locative into a code-vector) is copied
into the special-purpose PPC "link register" (lr); a function returns to its caller via a
"branch to link register" (blr) instruction. Some cases of function call and return might also
use the PPC's "count register" (ctr), and if either the lr or ctr needs to be stored in memory
it needs to first be copied to a general-purpose register.
Clozure CL's GC understands that certain registers contain these special "pc-locatives"
(locatives that point into CODE-VECTOR objects); it contains special support for finding
the containing CODE-VECTOR object and for adjusting all of these "pc-locatives" if the
containing object is moved in memory. The first part of that operation (finding the
containing object) is possible and practical on the PPC because of architectural artifacts
(fixed-width instructions and arcana of instruction encoding.) It's not possible on x86-64,
but fortunately not necessary either (though the second part, adjusting the PC/RIP when
the containing object moves, is both necessary and simple).
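Adjusting a pc-locative when its code vector moves amounts to preserving the offset into the object. This sketch assumes the GC already knows the containing object's old base address and size (the part that is only practical to discover on the PPC); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A pc-locative points into a code vector.  When the GC moves that code
   vector from old_base to new_base, the locative keeps the same offset. */
uintptr_t relocate_pc_locative(uintptr_t pc, uintptr_t old_base,
                               uintptr_t old_size, uintptr_t new_base)
{
    assert(pc >= old_base && pc < old_base + old_size);  /* must point inside */
    return new_base + (pc - old_base);
}
```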
Stack conventions
On both PPC and X86 platforms, each lisp thread uses 3 stacks; the ways in which these
stacks are used differ between the PPC and X86.
A "control stack". On both platforms, this is "the stack" used by foreign code. On the
PPC, it consists of a linked list of frames where the first word in each frame points to
the first word in the previous frame (and the outermost frame points to 0.) Some
frames on a PPC control stack are lisp frames; lisp frames are always 4 words in size
and contain (in addition to the back pointer to the previous frame) the calling
function (a node), the return address (a "locative" into the calling function's
code-vector), and the value to which the value-stack pointer (see below) should be
restored on function exit. On the PPC, the GC has to look at control-stack frames,
identify which of those frames are lisp frames, and treat the contents of the saved
function slot as a node (and handle the return address locative specially.) On x86-64,
the control stack is used for dynamic-extent allocation of immediate objects. Since
the control stack never contains nodes on x86-64, the GC ignores it on that platform.
Alignment of the control stack follows the ABI conventions of the platform (at least at
any point in time where foreign code could run.) On PPC, the r1 register always
points to the top of the current thread's control stack; on x86-64, the RSP register
points to the top of the current thread's control stack when the thread is running
foreign code and the address of the top of the control stack is kept in the thread's TCR
(see The Thread Context Record) when not running foreign code. The control stack
"grows down."
A "value stack". On both platforms, all values on the value stack are nodes (including
"tagged return addresses" on x86-64.) The value stack is always aligned to the native
word size; objects are always pushed on the value stack using atomic instructions
("stwu"/"stdu" on PPC, "push" on x86-64), so the contents of the value stack between
its bottom and top are always unambiguously nodes; the compiler usually tries to pop
or discard nodes from the value stack as soon as possible after their last use (as soon
as they may have become garbage.) On x86-64, the RSP register addresses the top of
the value stack when running lisp code; that address is saved in the TCR when
running foreign code. On the PPC, a dedicated register (VSP, currently r15) is used to
address the top of the value stack when running lisp code, and the VSP value is saved
in the TCR when running foreign code. The value stack grows down.
A "temp stack". The temp stack consists of a linked list of frames, each of which
points to the previous temp stack frame. The number of native machine words in
each temp stack frame is always even, so the temp stack is aligned on a two-word (64-
or 128-bit) boundary. The temp stack is used for dynamic-extent objects on both
platforms; on the PPC, it's used for essentially all such objects (regardless of whether
or not the objects contain nodes); on the x86-64, immediate dynamic-extent objects
(strings, foreign pointers, etc.) are allocated on the control stack and only
Register conventions
The ultimate definition of register partitioning is hardwired into the GC in functions like
"mark_xp()" and "forward_xp()", which process the values of some of the registers in an
exception frame as nodes and may give some sort of special treatment to other register
values they encounter there.
x86-64
The RAX, RCX, and RDX registers are used as the implicit operands and results of
some extended-precision multiply and divide instructions which generally involve
non-node values; since their use in these instructions means that they can't be
guaranteed to contain node values at all times, it's natural to put these registers in the
"immediate" set. RAX is generally given the symbolic name "imm0", RDX is given the
symbolic name "imm1" and RCX is given the symbolic name "imm2"; you may see
these names in disassembled code, usually in operations involving type checking,
array indexing, and foreign memory and function access.
RSP and RBP have dedicated functionality dictated by the hardware and calling
conventions.
11 "node" registers.
All other registers (RBX, RSI, RDI, and R8-R15) are asserted to contain node values
at (almost) all times; legacy "string" operations that implicitly use RSI and/or RDI
are not used.
x86-32
ESP and EBP have dedicated functionality dictated by the hardware and calling
conventions.
5 "node" registers.
The remaining registers (EBX, ECX, EDX, ESI, EDI) normally contain node values.
As on x86-64, string instructions that implicitly use ESI and EDI are not used.
There are times when this default partitioning scheme is inadequate. As mentioned in the
x86-64 section, there are instructions like the extended-precision MUL and DIV which
require the use of EAX and EDX. We therefore need a way to change this partitioning at
run-time.
Two schemes are employed. The first uses a mask in the TCR that contains a bit for each
register. If the bit is set, the register is interpreted by the GC as a node register; if it's clear,
the register is treated as an immediate register. The second scheme uses the direction flag
in the EFLAGS register. If DF is set, EDX is treated as an immediate register. (We don't use
the string instructions, so DF isn't otherwise used.)
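The x86-32 partitioning rules can be sketched as a predicate the GC might apply to each saved register. Register numbers follow the standard x86 encoding order and DF is bit 10 of EFLAGS; the function name, the mask layout, and the assumption that a set DF overrides the mask for EDX are hypothetical, since the text does not say how the two schemes combine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 register numbers in their standard instruction-encoding order. */
enum { X86_EAX, X86_ECX, X86_EDX, X86_EBX, X86_ESP, X86_EBP, X86_ESI, X86_EDI };

#define EFLAGS_DF (UINT32_C(1) << 10)   /* direction flag */

/* node_mask models the per-thread mask in the TCR (bit set = node register).
   If DF is set in eflags, EDX is treated as an immediate register. */
bool register_holds_node(int reg, uint32_t node_mask, uint32_t eflags)
{
    assert(reg >= X86_EAX && reg <= X86_EDI);
    if (reg == X86_EDX && (eflags & EFLAGS_DF))
        return false;
    return (node_mask >> reg) & 1u;
}
```

With the default mask (EBX, ECX, EDX, ESI, EDI set), code about to use EDX for a MUL or DIV result can set DF instead of rewriting the mask.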
PPC
6 "immediate" registers.
Registers r3-r8 are given the symbolic names imm0-imm5. As a RISC architecture
with simpler addressing modes, the PPC probably uses immediate registers a bit
more often than the CISC x86-64 does, but they're generally used for the same sort of
things (type checking, array indexing, FFI, etc.)
9 dedicated registers
r0 (symbolic name rzero) always contains the value 0 when running lisp code.
When r0 is used as the base register in a memory address, the hardware reads it as 0
regardless of its contents; keeping the value 0 there as well is sometimes convenient and
avoids asymmetry.
r2 is used to hold the current thread's TCR on ppc64 systems; it's not used on
ppc32.
r9 and r10 (symbolic names allocptr and allocbase) are used to do per-thread
memory allocation.
r11 (symbolic name nargs) contains the number of function arguments on entry
and the number of return values in multiple-value returning constructs. It's not
used more generally as either a node or immediate register because of the way
that certain trap instruction encodings are interpreted.
r12 (symbolic name tsp) holds the top of the current thread's temp stack.
r13 is used to hold the TCR on PPC32 systems; it's not used on PPC64.
r14 (symbolic name loc-pc) is used to copy "pc-locative" values between main
memory and special-purpose PPC registers (LR and CTR) used in function-call
and return instructions.
r15 (symbolic name vsp) addresses the top of the current thread's value stack.
lr and ctr are PPC branch-unit registers used in function call and return
instructions; they're always treated as "pc-locatives", which precludes the use of
the ctr in some PPC looping constructs.
17 "node" registers
Tagging scheme
Clozure CL always allocates lisp objects on double-node (64-bit for 32-bit platforms,
128-bit for 64-bit platforms) boundaries; this means that the low 3 bits (32-bit lisp) or 4 bits
(64-bit lisp) are always 0 and are therefore redundant (we only really need to know the
upper 29 or 60 bits in order to identify the aligned object address.) The extra bits in a lisp
node can be used to encode at least some information about the node's type, and the other
29/60 bits represent either an immediate value or a doublenode-aligned memory address.
The low 3 or 4 bits of a node are called the node's "tag bits", and the conventions used to
encode type information in those tag bits are called a "tagging scheme."
It might be possible to use the same tagging scheme on all platforms (at least on all
platforms with the same word size and/or the same number of available tag bits), but there
are often some strong reasons for not doing so. These arguments tend to be very machine-
specific: sometimes, there are fairly obvious machine-dependent tricks that can be
exploited to make common operations on some types of tagged objects faster; other times,
there are architectural restrictions that make it impractical to use certain tags for certain
types. (On PPC64, the "ld" (load doubleword) and "std" (store doubleword) instructions -
which load and store a GPR operand at the effective address formed by adding the value of
another GPR operand and a 16-bit constant operand - require that the low two bits of that
constant operand be 0. Since such instructions would typically be used to access the fields
of things like CONS cells and structures, it's desirable that the tags chosen for CONS
cells and structures allow the use of these instructions as opposed to more expensive
alternatives.)
One tagging trick that works well on all architectures is to use a tag
of 0 for FIXNUMs: a fixnum basically encodes its value shifted left a few bits and keeps
those low bits clear. FIXNUM addition, subtraction, and binary logical operations can
operate directly on the node operands, addition and subtraction can exploit
hardware-based overflow detection, and (in the absence of overflow) the hardware result of
those operations is a node (fixnum). Some other slightly-less-common operations may
require a few extra instructions, but arithmetic operations on FIXNUMs should be as
cheap as possible and using a tag of zero for FIXNUMs helps to ensure that it will be.
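To make the fixnum-tag argument concrete, here is a small C sketch assuming 64-bit nodes whose low 3 bits are clear for fixnums. `box_fixnum`, `fixnum_add`, and the other names are hypothetical, and the overflow check uses the GCC/Clang builtin `__builtin_add_overflow` to stand in for the hardware overflow flag. The tagged operands are added and ANDed directly; no untagging is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIXNUM_SHIFT 3          /* M = 3: low 3 bits of a fixnum are 0 */

typedef uint64_t node;          /* one tagged 64-bit lisp word */

static node box_fixnum(int64_t v)   { return (node)v << FIXNUM_SHIFT; }
/* Relies on GCC/Clang's arithmetic right shift of negative values. */
static int64_t unbox_fixnum(node n) { return (int64_t)n >> FIXNUM_SHIFT; }
static bool fixnum_p(node n)        { return (n & ((1u << FIXNUM_SHIFT) - 1)) == 0; }

/* Add two fixnums without untagging them.  Returns false on overflow,
   where a real lisp would fall back to creating a bignum. */
static bool fixnum_add(node a, node b, node *result)
{
    int64_t sum;
    if (__builtin_add_overflow((int64_t)a, (int64_t)b, &sum))
        return false;
    *result = (node)sum;        /* low bits are still 0: a valid fixnum */
    return true;
}

/* Bitwise AND of two fixnums is also a fixnum, again with no untagging. */
static node fixnum_logand(node a, node b) { return a & b; }
```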
If we have N available tag bits (N = 3 for 32-bit Clozure CL and N = 4 for 64-bit Clozure
CL), this way of representing fixnums with the low M bits forced to 0 works as long as
M <= N. The smaller we make M, the larger the magnitudes of MOST-POSITIVE-FIXNUM and
MOST-NEGATIVE-FIXNUM become; the larger we make M, the more distinct non-FIXNUM
tags become available. A reasonable compromise is to choose M = N-1; this basically yields
two distinct FIXNUM tags (one for even fixnums, one for odd fixnums), gives 30-bit fixnums
on 32-bit platforms and 61-bit fixnums on 64-bit platforms, and leaves us with 6 or 14 tags
to encode other types.
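The tradeoff can be checked with a little arithmetic. This C sketch (the `budget` function is hypothetical) computes, for a word size W, N tag bits, and M clear fixnum bits, the fixnum width W - M, the range from -2^(W-M-1) to 2^(W-M-1) - 1, and the 2^N - 2^(N-M) tags left for other types.

```c
#include <assert.h>
#include <stdint.h>

/* Consequences of choosing fixnums with the low M of N tag bits clear. */
struct tag_budget {
    int fixnum_bits;        /* W - M */
    int64_t most_positive;  /* 2^(W-M-1) - 1 */
    int64_t most_negative;  /* -2^(W-M-1) */
    int fixnum_tags;        /* 2^(N-M) tag values all denote fixnums */
    int other_tags;         /* 2^N - 2^(N-M) left for everything else */
};

static struct tag_budget budget(int word_bits, int n, int m)
{
    struct tag_budget b;
    assert(m >= 1 && m <= n && word_bits <= 64);
    b.fixnum_bits = word_bits - m;
    b.most_positive = (INT64_C(1) << (b.fixnum_bits - 1)) - 1;
    b.most_negative = -(INT64_C(1) << (b.fixnum_bits - 1));
    b.fixnum_tags = 1 << (n - m);
    b.other_tags = (1 << n) - b.fixnum_tags;
    return b;
}
```

With M = N - 1, `budget(32, 3, 2)` gives 30-bit fixnums and 6 other tags, and `budget(64, 4, 3)` gives 61-bit fixnums and 14 other tags, matching the figures above.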
Once we get past the assignment of FIXNUM tags, things quickly devolve into machine-
dependencies. We can fairly easily see that we can't directly tag all other primitive lisp
object types with only 6 or 14 available tag values; the details of how types are encoded vary
between the ppc32, ppc64, and x86-64 implementations, but there are some general
common principles:
CONS cells always contain exactly 2 elements and are usually fairly common. It
therefore makes sense to give CONS cells their own tag. Unlike the fixnum case,
where a tag value of 0 had positive implications, there doesn't seem to be any
advantage to using any particular value. (A long time ago, in the case of 68K MCL,
the CONS tag and the order of CAR and CDR in memory were chosen to allow
smaller, cheaper addressing modes to be used to "cdr down a list." That's not a factor
on ppc or x86-64, but all versions of Clozure CL still store the CDR of a CONS cell
first in memory. It doesn't matter, but doing it the way that the host system did made
bootstrapping to a new target system a little easier.)
Any way you look at it, NIL is a bit ... unusual. NIL is both a SYMBOL and a LIST (as
well as being a canonical truth value and probably a few other things.) Its role as a
LIST is probably much more important to most programs than its role as a SYMBOL
is: LISTP has to be true of NIL and primitives like CAR and CDR do LISTP implicitly
when safe and want that operation to be fast. There are several possible approaches
to this problem; Clozure CL uses two of them. On PPC32 and X86-64, NIL is basically
a weird CONS cell that straddles two doublenodes; the tag of NIL is unique and
congruent modulo 4 (modulo 8 on 64-bit) with the tag used for CONS cells. LISTP is
therefore true of any node whose low 2 (or 3) bits contain the appropriate tag value
(it's not otherwise necessary to special-case NIL.) SYMBOL accessors
(SYMBOL-NAME, SYMBOL-VALUE, SYMBOL-PLIST ..) -do- have to special-case
NIL (and access the components of an internal proxy symbol.) On PPC64 (where
architectural restrictions dictate the set of tags that can be used to access fixed
components of an object), that approach wasn't practical. NIL is just a distinguished
SYMBOL, and it just happens to be the case that its pname slot and values slot are at
the same offsets from a tagged pointer as a CONS cell's CDR and CAR would be.
NIL's pname is set to NIL (SYMBOL-NAME checks for this and returns the string
"NIL"), and LISTP (and therefore safe CAR and CDR) has to check for (OR NULL
CONSP). At least in the case of CAR and CDR, the fact that the PPC has multiple
condition-code fields keeps that extra test from being prohibitively expensive. On
IA-32, we can't afford to dedicate a tag to NIL. NIL is therefore just a distinguished
CONS cell, and we have to explicitly check for a NIL argument in CONSP/RPLACA/RPLACD.
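A sketch of why a shared low-bit pattern makes LISTP cheap on a 32-bit lisp with 3 tag bits. The CONS tag of 1 comes from the PPC32 allocation example in this document; the NIL tag of 5 is an assumption chosen only because it is unique and congruent to 1 modulo 4, and fixnums are assumed to use tags 0 and 4 (M = 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t node;          /* a 32-bit lisp word with 3 tag bits */

#define TAG_MASK 7u
#define TAG_CONS 1u             /* PPC32 CONS tag, per the allocation example */
#define TAG_NIL  5u             /* assumed: unique, and 5 % 4 == TAG_CONS % 4 */

static bool consp(node x) { return (x & TAG_MASK) == TAG_CONS; }
static bool nullp(node x) { return (x & TAG_MASK) == TAG_NIL; }

/* Only the low 2 bits are needed: CONS (001) and NIL (101) agree there,
   while fixnums (000, 100) and the other tags (x10, x11) do not. */
static bool listp(node x) { return (x & 3u) == (TAG_CONS & 3u); }
```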
Some objects are immediate (but not FIXNUMs). This is true of CHARACTERs and,
on 64-bit platforms, SINGLE-FLOATs. It's also true of some nodes used in the
runtime system (special values used to indicate unbound variables and slots, for
instance.) On 64-bit platforms, SINGLE-FLOATs have their own unique tag (making
them a little easier to recognize); on all platforms, CHARACTERs share a tag with
other immediate objects (unbound markers) but are easy to recognize (by looking at
several of their low bits.) The GC treats any node with an immediate tag (and any
node with a fixnum tag) as a leaf.
Heap Allocation
When the Clozure CL kernel first starts up, a large contiguous chunk of the process's
address space is mapped as "anonymous, no access" memory. ("Large" means different
things in different contexts; on LinuxPPC32, it means "about 1 gigabyte", on DarwinPPC32,
it means "about 2 gigabytes", and on current 64-bit platforms it ranges from 128 to 512
gigabytes, depending on OS. These values are both defaults and upper limits; the
--heap-reserve argument can be used to try to reserve less than the default.)
Reserving address space that can't (yet) be read or written to doesn't cost much; in
particular, it doesn't require that corresponding swap space or physical memory be
available. Marking the address range as being "mapped" helps to ensure that other things
(results from random calls to malloc(), dynamically loaded shared libraries) won't be
allocated in this region that lisp has reserved for its own heap growth.
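On a POSIX system, reserving address space this way and later making part of it usable can be sketched with `mmap` and `mprotect`. This is an illustration, not the kernel's actual code: `reserve_heap` and `commit_heap` are hypothetical names, and the flags shown (`MAP_ANONYMOUS`, `MAP_NORESERVE`) are the Linux/glibc spellings.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* expose MAP_ANONYMOUS and MAP_NORESERVE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Reserve `bytes` of address space with no access permissions.  Nothing
   is readable or writable yet, and no swap or RAM is committed. */
static void *reserve_heap(size_t bytes)
{
    void *p = mmap(NULL, bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Make the first `bytes` of a reservation readable and writable, roughly
   analogous to mapping the initial image plus the GC threshold read-write. */
static int commit_heap(void *base, size_t bytes)
{
    return mprotect(base, bytes, PROT_READ | PROT_WRITE);
}
```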
A small portion (around 1/32 on 32-bit platforms and 1/64 on 64-bit platforms) of that
large chunk of address space is reserved for GC data structures. Memory pages reserved for
these data structures are mapped read-write as pages are made writable in the main
portion of the heap.
The initial heap image is mapped into this reserved address space and an additional
(LISP-HEAP-GC-THRESHOLD) bytes are mapped read-write. GC data structures grow to match
the amount of GC-able memory in the initial image plus the gc threshold, and control is
transferred to lisp code. Inevitably, that code spoils everything and starts consing; there are
basically three layers of memory allocation that can go on.
Each lisp thread has a private "reserved memory segment"; when a thread starts up, its
reserved memory segment is empty. PPC ports maintain the highest unallocated address
and the lowest allocatable address in the current segment in registers when running lisp
code; on x86-64, these values are maintained in the current thread's TCR. (An "empty"
heap segment is one whose high pointer and low pointer are equal.) When a thread is not
in the middle of allocating something, the low 3 or 4 bits of the high and low pointers are
clear (the pointers are doublenode-aligned.)
A thread tries to allocate an object whose physical size in bytes is X and whose tag is Y by:
On PPC32, where the size of a CONS cell is 8 bytes and the tag of a CONS cell is 1, machine
code which sets the arg_z register to the result of doing (CONS arg_y arg_z) looks like:
On x86-64, the idea's similar but the implementation is different. The high and low
pointers to the current thread's reserved segment are kept in the TCR, which is addressed
by the gs segment register. An x86-64 CONS cell is 16 bytes wide and has a tag of 3; we
canonically use the temp0 register to initialize the object
If we don't take the trap (if allocating 8-16 bytes doesn't exhaust the thread's reserved
memory segment), that's a fairly short and simple instruction sequence. If we do take the
trap, we'll have to do some additional work in order to get a new segment for the current
thread.
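The allocation can be sketched in C under the assumption (consistent with the description above) that a thread allocates by moving its segment's high pointer down by X - Y, so the result already carries tag Y, trapping if that would pass the low pointer, and then clearing the tag bits of the high pointer. The `segment` struct and `alloc_tagged` are hypothetical; returning 0 stands in for the trap, and the sizes match the x86-64 CONS cell (16 bytes, tag 3).

```c
#include <assert.h>
#include <stdint.h>

#define DNODE_MASK ((uintptr_t)15)   /* 64-bit: objects are 16-byte aligned */
#define CONS_SIZE  16u               /* x86-64 CONS cell size, per the text */
#define TAG_CONS   3u                /* x86-64 CONS tag, per the text */

/* Hypothetical per-thread segment: free space lies between low and high. */
struct segment { uintptr_t high, low; };

/* Allocate `size` bytes tagged with `tag`.  Returns the tagged pointer,
   or 0 where the real system would trap into the kernel for a new segment. */
static uintptr_t alloc_tagged(struct segment *seg, uintptr_t size, uintptr_t tag)
{
    /* Would moving high down by (size - tag) leave it below low?
       Written as a difference so the comparison cannot wrap around. */
    if (seg->high - seg->low < size - tag)
        return 0;
    uintptr_t tagged = seg->high - (size - tag);
    /* The caller would initialize the object through `tagged` here. */
    seg->high = tagged & ~DNODE_MASK;   /* clear the tag bits again */
    return tagged;
}
```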
After the lisp image is first mapped into memory - and after each full GC - the lisp kernel
ensures that (LISP-HEAP-GC-THRESHOLD) additional bytes beyond the current end of the
heap are mapped read-write.
If a thread traps while trying to allocate memory, the thread goes through the usual
exception-handling protocol (to ensure that any other thread that GCs "sees" the state of
the trapping thread and to serialize exception handling.) When the exception handler runs,
it determines the nature and size of the failed allocation and tries to complete the
allocation on the thread's behalf (and leave it with a reasonably large thread-specific
memory segment so that the next small allocation is unlikely to trap.)
Depending on the size of the requested segment allocation, the number of segment
allocations that have occurred since the last GC, and the EGC and GC thresholds, the
segment allocation trap handler may invoke a full or ephemeral GC before returning a new
segment. It's worth noting that the [E]GC is triggered based on the number and size of
these segments that have been allocated since the last GC; it doesn't have much to do with
how "full" each of those per-thread segments is. It's possible for a large number of threads
to do fairly incidental memory allocation and trigger the GC as a result; avoiding this
involves tuning the per-thread allocation quantum and the GC/EGC thresholds
appropriately.
Heap growth
All OSes on which Clozure CL currently runs use an "overcommit" memory allocation
strategy by default (though some of them provide ways of overriding that default.) What
this means in general is that the OS doesn't necessarily ensure that backing store is
available when asked to map pages as read-write; it'll often return a success indicator from
the mapping attempt (mapping the pages as "zero-fill, copy-on-write"), and only try to
allocate the backing store (swap space and/or physical memory) when non-zero contents
are written to the pages.
It -sounds- like it'd be better to have the mmap() call fail immediately, but it's actually a
complicated issue. (It's possible that other applications will stop using some backing store
before lisp code actually touches the pages that need it, for instance.) It's also not
guaranteed that lisp code would be able to "cleanly" signal an out-of-memory condition if
lisp is ... out of memory.
I don't know that I've ever seen an abrupt out-of-memory failure that wasn't preceded by
several minutes of excessive paging activity. The most expedient course in cases like this is
to either (a) use less memory or (b) get more memory; it's generally hard to use memory
that you don't have.
GC details
The GC uses a Mark/Compact algorithm; its execution time is essentially a function of the
amount of live data in the heap. (The somewhat better-known Mark/Sweep algorithms
don't compact the live data but instead traverse the garbage to rebuild free-lists; their
execution time is therefore a function of the total heap size.)
As mentioned in Heap Allocation, two auxiliary data structures (proportional to the size of
the lisp heap) are maintained. These are
1. the markbits bitvector, which contains a bit for every doublenode in the dynamic
heap (plus a few extra words for alignment and so that sub-bitvectors can start on
word boundaries.)
2. the relocation table, which contains a native word for every 32 or 64 doublenodes in
the dynamic heap, plus an extra word used to keep track of the end of the heap.
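The sizes of these two structures account for the 1/32 and 1/64 figures given in Heap Allocation. In this hypothetical C helper, the markbits use one bit per doublenode (1/64 of the heap on 32-bit, 1/128 on 64-bit) and the relocation table uses one word per markbits word's worth of doublenodes (another 1/64 or 1/128), plus one extra word; alignment padding is ignored.

```c
#include <assert.h>
#include <stddef.h>

struct gc_overhead { size_t markbit_bytes, reloc_bytes; };

/* Bookkeeping bytes needed for a heap of `heap_bytes` on a platform whose
   native word is `word_bytes` (4 or 8) bytes. */
static struct gc_overhead gc_overhead_for(size_t heap_bytes, size_t word_bytes)
{
    size_t dnode = 2 * word_bytes;           /* 8- or 16-byte doublenodes */
    size_t dnodes = heap_bytes / dnode;
    size_t bits_per_word = 8 * word_bytes;   /* 32 or 64 doublenodes per entry */
    struct gc_overhead o;
    o.markbit_bytes = dnodes / 8;            /* one bit per doublenode */
    /* One word per bits_per_word doublenodes, plus the end-of-heap word. */
    o.reloc_bytes = (dnodes / bits_per_word + 1) * word_bytes;
    return o;
}
```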
Mark phase
Each doublenode in the dynamic heap has a corresponding bit in the markbits vector. (For
any doublenode in the heap, the index of its mark bit is determined by subtracting the
address of the start of the heap from the address of the object and dividing the result by 8
or 16.) The GC knows the markbit index of the free pointer, so determining that the
markbit index of a doubleword address is between the start of the heap and the free pointer
can be done with a single unsigned comparison.
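A C sketch of the index computation and the single-comparison range check, assuming 16-byte doublenodes; the function names and the bit order within each markbits word are assumptions. Subtracting the heap start as an unsigned quantity makes addresses below the heap wrap around to huge indices, so one `<` test against the free pointer's index rejects both out-of-range cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DNODE_SHIFT 4   /* log2 of a 16-byte doublenode (3 on 32-bit) */

/* Mark-bit index of the doublenode containing `addr`; any tag bits
   in a tagged pointer are shifted away. */
static uintptr_t markbit_index(uintptr_t addr, uintptr_t heap_start)
{
    return (addr - heap_start) >> DNODE_SHIFT;  /* wraps if addr < heap_start */
}

/* One unsigned comparison rejects addresses below the heap (their index
   wrapped to a huge value) and addresses at or above the free pointer. */
static bool in_dynamic_heap(uintptr_t addr, uintptr_t heap_start, uintptr_t free_ptr)
{
    return markbit_index(addr, heap_start) < markbit_index(free_ptr, heap_start);
}

/* Hypothetical bit order: bit i % 64 of word i / 64. */
static void set_markbit(uint64_t *bits, uintptr_t i)
{
    bits[i / 64] |= UINT64_C(1) << (i % 64);
}

static bool markbit_set(const uint64_t *bits, uintptr_t i)
{
    return ((bits[i / 64] >> (i % 64)) & 1u) != 0;
}
```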
The markbits of all doublenodes in the dynamic heap are zeroed before the mark phase
begins. An object is marked if the markbits of all of its constituent doublewords are set and
unmarked otherwise; setting an object's markbits involves setting the corresponding
markbits of all constituent doublenodes in the object.
The mark phase traverses each root. If the tag of the value of the root indicates that it's a
non-immediate node whose address lies in the lisp heap, then:
4. If the object is a cons cell, recursively mark its car and cdr.
Marking an object thus involves ensuring that its mark bits are set and then recursively
marking any pointers contained within the object if the object was originally unmarked. If
this recursive step was implemented in the obvious manner, marking an object would take
stack space proportional to the length of the pointer chain from some root to that object.
Rather than storing that pointer chain implicitly on the stack (in a series of recursive calls
to the mark subroutine), the Clozure CL marker uses a mixture of recursion and a technique
called link inversion to store the pointer chain in the objects themselves. (Recursion tends
to be simpler and faster; if a recursive step notes that stack space is becoming limited, the
link-inversion technique is used.)
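Link inversion is the classic Deutsch-Schorr-Waite idea: while descending, each visited object's field temporarily points back at its parent, so the path back to the root lives in the heap rather than on the stack. The sketch below shows only that technique, on toy two-field cells with an extra `flag` byte recording which field holds the back-pointer; both the cell layout and the flag are assumptions, and the Clozure CL marker uses recursion first and falls back to link inversion only when stack space runs low.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cons-like cell; NULL stands for any non-heap (leaf) value. */
typedef struct cell {
    struct cell *car, *cdr;
    unsigned char mark;   /* set once the cell has been reached */
    unsigned char flag;   /* 0: back-pointer in car, 1: back-pointer in cdr */
} cell;

/* Mark everything reachable from root using O(1) extra space. */
static void mark_link_inversion(cell *root)
{
    cell *prev = NULL;    /* head of the reversed chain back to the root */
    cell *cur = root;
    for (;;) {
        /* Descend through car fields, inverting each link as we go. */
        while (cur != NULL && !cur->mark) {
            cell *next = cur->car;
            cur->mark = 1;
            cur->flag = 0;
            cur->car = prev;
            prev = cur;
            cur = next;
        }
        /* Climb past cells whose car and cdr are both finished. */
        while (prev != NULL && prev->flag) {
            cell *up = prev->cdr;
            prev->cdr = cur;          /* restore the cdr */
            cur = prev;
            prev = up;
        }
        if (prev == NULL)
            return;                   /* chain is empty: back at the root */
        /* prev's car is finished: restore it, then explore its cdr. */
        {
            cell *up = prev->car;
            prev->car = cur;
            prev->flag = 1;           /* back-pointer moves into the cdr */
            cur = prev->cdr;
            prev->cdr = up;
        }
    }
}
```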
1. To support a feature called GCTWA (an acronym that I believe comes from
MACLISP, where it stood for "Garbage Collection of Truly Worthless Atoms"), the
vector that contains the internal symbols of the current package is marked on entry to
the mark phase, but the symbols themselves are not marked at this time. Near the
end of the mark phase, symbols referenced from this vector which are not otherwise
marked are marked if and only if they're somehow distinguishable from newly
created symbols (by virtue of their having function bindings, value bindings, plists, or
other attributes.)
2. Pools have their first element set to NIL before any other elements are marked.
3. All hash tables have certain fields (used to cache previous results) invalidated.
4. Weak Hash Tables and other weak objects are put on a linked list as they're
encountered; their contents are only retained if there are other (non-weak) references
to them.
At the end of the mark phase, the markbits of all objects that are transitively reachable
from the roots are set and all other markbits are clear.
Relocation phase
The forwarding address of a doublenode in the dynamic heap is (<its current address> -
(size_of_doublenode * <the number of unmarked markbits that precede it>)) or
alternately (<the base of the heap> + (size_of_doublenode * <the number of marked
markbits that precede it >)). Rather than count the number of preceding markbits each
time, the relocation table is used to precompute an approximation of the forwarding
addresses for all doublewords. Given this approximate address and a pointer into the
markbits vector, it's relatively easy to compute the exact forwarding address.
The relocation table contains the forwarding addresses of each pagelet, where a pagelet is
256 bytes (or 32 doublenodes). The forwarding address of the first pagelet is the base of the
heap. The forwarding address of the second pagelet is the sum of the forwarding address of
the first and 8 bytes for each mark bit set in the first 32-bit word in the markbits table. The
last entry in the relocation table contains the forwarding address that the freepointer would
have, i.e., the new value of the freepointer after compaction.
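Building the relocation table and using it to compute a forwarding address can be sketched as follows for a 32-bit heap (8-byte doublenodes, 32 doublenodes per pagelet, one 32-bit markbits word per pagelet). The function names, and the assumption that bit k of markbits word i describes doublenode 32i + k, are illustrative. This version always counts the set markbits that precede the object within its pagelet; the forwarding phase described in the text may instead count the set markbits that follow the object, depending on which relocation entry is nearer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DNODE_BYTES 8u          /* 32-bit platform: 8-byte doublenodes */
#define DNODES_PER_PAGELET 32u  /* one 32-bit markbits word per pagelet */

static unsigned popcount32(uint32_t w)
{
    unsigned n = 0;
    for (; w != 0; w &= w - 1)  /* clear the lowest set bit each time */
        n++;
    return n;
}

/* reloc needs npagelets + 1 entries; the last is the new free pointer. */
static void build_relocation_table(const uint32_t *markbits, size_t npagelets,
                                   uint32_t heap_base, uint32_t *reloc)
{
    reloc[0] = heap_base;
    for (size_t i = 0; i < npagelets; i++)
        reloc[i + 1] = reloc[i] + DNODE_BYTES * popcount32(markbits[i]);
}

/* Forwarding address of the (marked) doublenode at heap index `dnode`. */
static uint32_t forwarding_address(const uint32_t *markbits, const uint32_t *reloc,
                                   size_t dnode)
{
    size_t page = dnode / DNODES_PER_PAGELET;
    unsigned bit = (unsigned)(dnode % DNODES_PER_PAGELET);
    uint32_t preceding = markbits[page] & ((UINT32_C(1) << bit) - 1);
    return reloc[page] + DNODE_BYTES * popcount32(preceding);
}
```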
In many programs, old objects rarely become garbage and new objects often do. When
building the relocation table, the relocation phase notes the address of the first unmarked
object in the dynamic heap. Only the area of the heap between the first unmarked object
and the freepointer needs to be compacted; only pointers to this area will need to be
forwarded (the forwarding address of all other pointers to the dynamic heap is the address
of that pointer.) Often, the first unmarked object is much nearer the free pointer than it is
to the base of the heap.
Forwarding phase
The forwarding phase traverses all roots and the "old" part of the dynamic heap (the part
between the base of the heap and the first unmarked object.) All references to objects
whose address is between the first unmarked object and the free pointer are updated to
point to the address the object will have after compaction by using the relocation table and
the markbits vector and interpolating.
The relocation table entry for the pagelet nearest the object is found. If the pagelet's
address is less than the object's address, the number of set markbits that precede the object
on the pagelet is used to determine the object's address; otherwise, the number of set
markbits that follow the object on the pagelet is used.
Since forwarding views the heap as a set of doublewords, locatives are (mostly) treated like
any other pointers. (The basic difference is that locatives may appear to be tagged as
fixnums, in which case they're treated as word-aligned pointers into the object.)
If the forward phase changes the address of any hash table key in a hash table that hashes
by address (e.g., an EQ hash table), it sets a bit in the hash table's header. The hash table
code will rehash the hash table's contents if it tries to do a lookup on a key in such a table.
Profiling reveals that about half of the total time spent in the GC is spent in the subroutine
which determines a pointer's forwarding address. Exploiting GCC-specific idioms,
hand-coding the routine, and inlining calls to it could all be expected to improve GC
performance.
Compact phase
The compact phase compacts the area between the first unmarked object and the
freepointer so that it contains only marked objects. While doing so, it forwards any
pointers it finds in the objects it copies.
When the compact phase is finished, so is the GC (more or less): the free pointer and some
other data structures are updated and control returns to the exception handler that invoked
the GC. If sufficient memory has been freed to satisfy any allocation request that may have
triggered the GC, the exception handler returns; otherwise, a "seriously low on memory"
condition is signaled, possibly after releasing a small emergency pool of memory.
The ephemeral GC
In the Clozure CL memory management scheme, the relative age of two objects in the
dynamic heap can be determined by their addresses: if addresses X and Y are both
addresses in the dynamic heap, X is younger than Y (X was created more recently than Y) if
it is nearer to the free pointer (and farther from the base of the heap) than Y.
An ephemeral GC relies on two observations:
1. Most objects that have already survived several GCs are unlikely to ever become
garbage.
2. Old objects can only point to newer objects as the result of a destructive modification
(e.g., via SETF.)
By concentrating its efforts on (frequently and quickly) reclaiming newly created garbage,
an ephemeral collector hopes to postpone the more costly full GC as long as possible. It's
important to note that most programs create some long-lived garbage, so an EGC can't
typically eliminate the need for full GC.
An EGC views each object in the heap as belonging to exactly one generation; generations
are sets of objects that are related to each other by age: some generation is the youngest,
some the oldest, and there's an age relationship between any intervening generations.
Objects are typically assigned to the youngest generation when first allocated; any object
that has survived some number of GCs in its current generation is promoted (or tenured)
into an older generation.
When a generation is GCed, the roots consist of the stacks, registers, and global variables as
always and also of any pointers to objects in that generation from other generations. To
avoid the need to scan those (often large) other generations looking for such
intergenerational references, the runtime system must note all such intergenerational
references at the point where they're created (via SETF). (This is sometimes called "The
Write Barrier": all assignments which might result in intergenerational references must be
noted, as if the other generations were write-protected). The set of pointers that may
contain intergenerational references is sometimes called the remembered set.
In Clozure CL's EGC, the heap is organized exactly the same as otherwise; "generations"
are merely structures which contain pointers to regions of the heap (which is already
ordered by age.) When a generation needs to be GCed, any younger generation is
incorporated into it; all objects which survive a GC of a given generation are promoted into
the next older generation. The only intergenerational references that can exist are therefore
those where an old object is modified to contain a pointer to a new object.
The EGC uses exactly the same code as the full GC. When a given GC is "ephemeral",
the "base of the heap" used to determine an object's markbit address is the base of the
generation being collected;
the markbits vector is actually a pointer into the middle of the global markbits table;
preceding entries in this table are used to note doubleword addresses in older
generations that (may) contain intergenerational references;
some steps (notably GCTWA and the handling of weak objects) are not performed;
the intergenerational references table is used to find additional roots for the mark
and forward phases. If a bit is set in the intergenerational references table, that
means that the corresponding doubleword (in some "old" generation, in some
"earlier" part of the heap) may have had a pointer to an object in a younger
generation stored into it.
With one exception (the implicit setfs that occur on entry to and exit from the binding of a
special variable), all setfs that might introduce an intergenerational reference must be
memoized. Note that the implicit setfs that occur when initializing an object - as in the case
of a call to cons or vector - can't introduce intergenerational references, since the newly
created object is always younger than the objects used to initialize it. It's always safe to
push any cons cell or gvector locative onto the memo stack; it's never safe to push anything
else.
old locations are stored into, although some of them may have been stored into many
times. The routine that scans the memoization buffer does a lot of work and usually does it
fairly often; it uses a simple, brute-force method but might run faster if it was smarter
about recognizing addresses that it'd already seen.
When the EGC mark and forward phases scan the intergenerational reference bits, they can
clear any bits that denote doublewords that definitely do not contain intergenerational
references.
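The memoization scheme can be sketched in C: a write barrier performs the store and records the location, and a later brute-force scan sets a bit for each recorded old location that now points into the young generation. Everything here (`egc_state`, the fixed-size buffer, the bit layout, and the simple address-range test for "young") is a hypothetical simplification; as with the real table, a set bit only means the doubleword may contain an intergenerational reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DNODE_SHIFT 4       /* 64-bit: 16-byte doublenodes */
#define MEMO_CAPACITY 1024

/* Hypothetical EGC bookkeeping. */
struct egc_state {
    uintptr_t heap_base;    /* start of the dynamic heap */
    uintptr_t young_base;   /* start of the generation being collected */
    uintptr_t memo[MEMO_CAPACITY];
    size_t nmemo;
    uint64_t refbits[64];   /* one bit per old doublenode (toy-sized) */
};

/* Write barrier: do the store, then remember the location. */
static void barriered_store(struct egc_state *egc, uintptr_t *slot, uintptr_t value)
{
    *slot = value;
    assert(egc->nmemo < MEMO_CAPACITY);   /* a real system would flush here */
    egc->memo[egc->nmemo++] = (uintptr_t)slot;
}

/* Brute-force scan: flag each memoized old location that now points into
   the young generation, then empty the buffer. */
static void scan_memo_buffer(struct egc_state *egc, uintptr_t heap_end)
{
    for (size_t i = 0; i < egc->nmemo; i++) {
        uintptr_t loc = egc->memo[i];
        uintptr_t value = *(uintptr_t *)loc;
        bool old_loc = loc >= egc->heap_base && loc < egc->young_base;
        bool young_val = value >= egc->young_base && value < heap_end;
        if (old_loc && young_val) {
            uintptr_t dnode = (loc - egc->heap_base) >> DNODE_SHIFT;
            assert(dnode / 64 < sizeof egc->refbits / sizeof egc->refbits[0]);
            egc->refbits[dnode / 64] |= UINT64_C(1) << (dnode % 64);
        }
    }
    egc->nmemo = 0;
}
```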
Fasl files
The Clozure CL Fasl format is forked from the old MCL Fasl format; there are a few
differences, but they are minor. The name "nfasload" comes from the fact that this is the
so-called "new" Fasl system, which was true in 1986 or so.
A Fasl file begins with a "file header", which contains version information and a count of
the following "blocks". There's typically only one "block" per Fasl file. The blocks are part of
a mechanism for combining multiple logical files into a single physical file, in order to
simplify the distribution of precompiled programs.
Each block begins with a header for itself, which just describes the size of the data that
follows.
The data in each block is treated as a simple stream of bytes, which define a bytecode
program. The actual bytecodes, "fasl operators", are defined in xdump/faslenv.lisp. The
descriptions in the source file are terse, but, according to Gary, "probably accurate".
Some of the operators are used to create a per-block "object table", which is a vector used
to keep track of previously-loaded objects and simplify references to them. When the table
is created, an index associated with it is set to zero; this is analogous to an array
fill-pointer, and allows the table to be treated like a stack.
The low seven bits of each bytecode are used to specify the fasl operator; currently, about
fifty operators are defined. The high bit, when set, indicates that the result of the
operation should be pushed onto the object table.
Most bytecodes are followed by operands; the operand data is byte-aligned. How many
operands there are, and their type, depend on the bytecode. Operands can be indices into
the object table, immediate values, or some combination of these.
An exception is the bytecode #xFF, which has the symbolic name ccl::$faslend; it is used to
mark the end of the block.
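Decoding a single fasl bytecode byte, as described above, can be sketched in C. The names are hypothetical and no specific operator numbers from xdump/faslenv.lisp are assumed; treating #xFF as not requesting a push is an assumption based on its being described as an exception. The object table is modeled as a fixed-size array with a fill index, used like a stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FASL_OP_MASK   0x7Fu   /* low seven bits: which fasl operator */
#define FASL_PUSH_FLAG 0x80u   /* high bit: push the result on the object table */
#define FASL_END       0xFFu   /* ccl::$faslend: marks the end of a block */

struct fasl_opcode {
    unsigned op;    /* operator number, 0..127 */
    bool push;      /* push the result onto the object table? */
    bool end;       /* is this $faslend? */
};

static struct fasl_opcode decode_fasl_byte(uint8_t byte)
{
    struct fasl_opcode d;
    d.end = (byte == FASL_END);
    d.op = byte & FASL_OP_MASK;
    d.push = !d.end && (byte & FASL_PUSH_FLAG) != 0;  /* assumed: $faslend pushes nothing */
    return d;
}

#define OBJECT_TABLE_SIZE 256

/* Per-block object table, used like a vector with a fill pointer. */
struct object_table {
    const void *objs[OBJECT_TABLE_SIZE];
    unsigned fill;  /* starts at zero when the table is created */
};

/* Store obj at the fill index and return that index (like VECTOR-PUSH). */
static unsigned table_push(struct object_table *t, const void *obj)
{
    assert(t->fill < OBJECT_TABLE_SIZE);
    t->objs[t->fill] = obj;
    return t->fill++;
}
```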
In most cases, pointers to instances of Objective-C classes are recognized as such; the
recognition is (and probably always will be) slightly heuristic. Basically, any pointer that
passes basic sanity checks and whose first word is a pointer to a known ObjC class is
considered to be an instance of that class; the Objective-C runtime system would reach the
same conclusion.
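The basic heuristic can be sketched in C: after a cheap sanity check, read the pointer's first word and see whether it matches a known class object. `class_registry` and `guess_objc_class` are hypothetical, and this sketch simply assumes the pointer is readable; a real check would have to guard against dereferencing unmapped memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of known class objects, compared by address. */
struct class_registry {
    const void *const *classes;
    size_t count;
};

/* Basic sanity checks: non-null and word-aligned. */
static bool plausible_pointer(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a != 0 && (a % sizeof(void *)) == 0;
}

/* Return the known class whose address is stored in p's first word, or
   NULL if p doesn't look like an instance of any known class. */
static const void *guess_objc_class(const void *p, const struct class_registry *reg)
{
    if (!plausible_pointer(p))
        return NULL;
    const void *first_word = *(const void *const *)p;  /* assumes p is readable */
    for (size_t i = 0; i < reg->count; i++)
        if (reg->classes[i] == first_word)
            return first_word;
    return NULL;
}
```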
It's certainly possible that a random pointer to an arbitrary memory address could look
enough like an ObjC instance to fool the lisp runtime system, and it's possible for the
memory that a pointer addresses to change, so that something that had been a true ObjC
instance (or had looked a lot like one) no longer is one (possibly because it has been
deallocated.)
In the first case, we can improve the heuristics substantially: we can make stronger
assertions that a particular pointer is really "of type :ID" when it's a parameter to a
function declared to take such a pointer as an argument or a similarly declared function
result; we can be more confident of something we obtained via SLOT-VALUE of a slot
defined to be of type :ID than if we just dug a pointer out of memory somewhere.
The second case is a little more subtle: ObjC memory management is based on a reference-
counting scheme, and it's possible for an object to ... cease to be an object while lisp is still
referencing it. If we don't want to deal with this possibility (and we don't), we'll basically
have to ensure that the object is not deallocated while lisp is still thinking of it as a
first-class object. There's some support for this in the case of objects created with
MAKE-INSTANCE, but we may need to give similar treatment to foreign objects that are
introduced to the lisp runtime in other ways (as function arguments, return values,
SLOT-VALUE results, etc. as well as those instances that are created under lisp control.)
This doesn't all work yet (in fact, not much of it works yet); in practice, this has not yet
been as much of a problem as anticipated, but that may be because existing Cocoa code
deals primarily with relatively long-lived objects such as windows, views, menus, etc.
Recommended Reading
Cocoa Documentation
This is the top page for all of Apple's documentation on Cocoa. If you are unfamiliar
with Cocoa, it is a good place to start.
This is one of the two most important Cocoa references; it covers all of the basics,
except for GUI programming. This is a reference, not a tutorial.
This section is a placeholder, added as of August 2004. The full text is being written, and
As it's distributed, Clozure CL starts up with *PACKAGE* set to the CL-USER package and
with most predefined functions and methods protected against accidental redefinition. The
package setting is of course a requirement of ANSI CL, and the protection of predefined
functions and methods is intended to catch certain types of programming errors
(accidentally redefining a CL or CCL function) before those errors have a chance to do
much damage.
These settings may make using Clozure CL to develop Clozure CL a bit awkward, because
much of that process assumes that the CCL package is current, and a
primary purpose of Clozure CL development is to redefine some predefined, builtin
functions. The standard, "routine" ways of building Clozure CL from sources (see ) -
COMPILE-CCL, XCOMPILE-CCL, and XLOAD-LEVEL-0 - bind *PACKAGE* to the "CCL"
package and enable the redefinition of predefined functions; the symbols COMPILE-CCL,
XCOMPILE-CCL, and XLOAD-LEVEL-0 are additionally now exported from the "CCL"
package.
Some other (more ad-hoc) ways of doing development on Clozure CL (compiling and/or
loading individual files, incrementally redefining individual functions) may be awkward
unless one reverts to the mode of operation which was traditionally offered in Clozure CL.
Some Clozure CL source files, especially those that comprise the bootstrapping image
sources and the first few files in the "cold load" sequence, are compiled and loaded in the
"CCL" package but don't contain (IN-PACKAGE "CCL") forms, since IN-PACKAGE doesn't
work until later in the cold load sequence.
The terms "user" and "development" are otherwise very generic; here they're intended to
draw the distinction between "using" Clozure CL and "developing" it.
The initial environment from which Clozure CL images are saved is one where
(SET-USER-ENVIRONMENT T) has just been called; in previous versions, it was effectively as
if (SET-DEVELOPMENT-ENVIRONMENT T) had just been called.
Hopefully, most users of Clozure CL can safely ignore these issues most of the time. Note
that doing (SET-USER-ENVIRONMENT T) after loading one's own code (or 3rd-party
code) into Clozure CL would protect that code (as well as Clozure CL's) from accidental
redefinition; that may be useful in some cases.
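A minimal sketch of bracketing a session of work on Clozure CL itself. It assumes that SET-DEVELOPMENT-ENVIRONMENT and SET-USER-ENVIRONMENT are reachable with a CCL:: prefix and accept the optional T argument shown above; the macro name WITH-DEVELOPMENT-MODE is hypothetical. The CCL-specific calls are read only under Clozure CL (#+ccl), so elsewhere the macro simply runs its body.

```lisp
(defmacro with-development-mode (&body body)
  "Run BODY in development mode, then switch back to user mode.
WITH-DEVELOPMENT-MODE is a hypothetical helper, not part of Clozure CL."
  `(unwind-protect
        (progn
          ;; Assumed: enter development mode, the state older versions
          ;; effectively started in (per the text above).
          #+ccl (ccl::set-development-environment t)
          ,@body)
     ;; Return to user mode.  Per the text above, passing T after loading
     ;; code also protects that code against accidental redefinition.
     #+ccl (ccl::set-user-environment t)))
```

For example, `(with-development-mode (load "my-patch.lisp"))` would load a (hypothetical) patch file and then restore the protected user environment even if the load signals an error.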
As you may have noticed, it's not a perfect world; it's rare that the cause (attempting to
reference the CDR of -1, and therefore accessing unmapped memory near location 0) of
this effect (an "Unhandled exception ..." message) is so obvious.
The addresses printed in the message above aren't very useful unless you're debugging the
kernel with GDB (and they're often very useful if you are.)
Aside from causing an exception that the lisp kernel doesn't know how to handle, one can
also enter the kernel debugger (more) deliberately:
? (classify 0)
Bug in Clozure CL system code:
I give up. How could this happen ?
? for help
[12345] Clozure CL kernel debugger:
CCL::BUG isn't quite the right tool for this example (a call to BREAK or PRINT might do a
better job of clearing up the mystery), but it's sometimes helpful when those other tools
can't be used. The lisp error system notices, for instance, if attempts to signal errors
themselves cause errors to be signaled; this sort of thing can happen if CLOS or the I/O
system are broken or missing. After some small number of recursive errors, the error
system gives up and calls CCL::BUG.
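CLASSIFY's definition isn't shown in this section. A sketch consistent with the transcript above (only zero reaches the last clause, which calls CCL::BUG with the message shown) might look like this. The "Greater"/"Less" return values are made up for illustration, and the #-ccl branch exists only so the sketch loads in other implementations.

```lisp
(defun classify (n)
  "Return a string describing the sign of the real number N (illustrative sketch)."
  (cond ((> n 0) "Greater")
        ((< n 0) "Less")
        (t
         ;; Only zero gets here.  Under Clozure CL, CCL::BUG enters the
         ;; lisp kernel debugger with this message.
         #+ccl (ccl::bug "I give up. How could this happen ?")
         #-ccl (error "I give up. How could this happen ?"))))
```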
Entering '?' at the kernel debugger prompt lists the available commands (see the (classify2 0) transcript below).
CCL::BUG just does an FF-CALL into the lisp kernel. If the kernel debugger was invoked
because of an unhandled exception (such as an illegal memory reference), the OS kernel
saves the machine state ("context") in a data structure for us, and in that case some
additional options can be used to display the contents of the registers at the point of the
exception. Another function, CCL::DBG, causes a special exception to be generated and
enters the lisp kernel debugger with a non-null "context":
? (classify2 0)
Lisp Breakpoint
While executing: #<Function CLASSIFY2 #x08476cfe>
? for help
[12345] Clozure CL kernel debugger: ?
(G) Set specified GPR to new value
(A) Advance the program counter by one instruction (use with caution!)
(D) Describe the current exception in greater detail
(R) Show raw GPR/SPR register values
(L) Show Lisp values of tagged registers
(F) Show FPU registers
(S) Find and describe symbol matching specified name
(B) Show backtrace
(X) Exit from this debugger, asserting that any exception was handled
(P) Propagate the exception to another handler (debugger or OS)
(K) Kill Clozure CL process
(?) Show this help
CCL::DBG takes an argument, whose value is copied into the register that Clozure CL uses
to return a function's primary value (arg_z, which is r23 on the PowerPC). If we were to
choose the (L) option at this point, we'd see a display like:
rnil = 0x01836015
nargs = 0
r16 (fn) = #<Function CLASSIFY2 #x30379386>
r23 (arg_z) = 0
r22 (arg_y) = 0
r21 (arg_x) = 0
r20 (temp0) = #<26-element vector subtag = 2F @#x303793ee>
r19 (temp1/next_method_context) = 6393788
From this we can conclude that the problematic argument to CLASSIFY2 was 0 (see
r23/arg_z), and that I need to work on a better example.
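CLASSIFY2 isn't defined in this section either. A sketch consistent with the session above might differ from CLASSIFY only in its last clause, passing the offending argument to CCL::DBG, which is why the (L) display shows arg_z = 0. The return strings are illustrative, and the #-ccl branch (a plain BREAK) is only a stand-in for other implementations.

```lisp
(defun classify2 (n)
  "Like CLASSIFY, but enter the kernel debugger with a non-null context."
  (cond ((> n 0) "Greater")
        ((< n 0) "Less")
        (t
         ;; CCL::DBG copies its argument (here N, which is 0) into arg_z
         ;; before entering the lisp kernel debugger.
         #+ccl (ccl::dbg n)
         #-ccl (break "CLASSIFY2 called with ~S" n))))
```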
The R option shows the values of the ALU (and PPC branch unit) registers in hex; the F
option shows the values of the FPU registers.
The (B) option shows a raw stack backtrace; it'll try to identify foreign functions as well as
lisp functions. (Foreign function names are guesses based on the nearest preceding
exported symbol.)
If you ever unexpectedly find yourself in the "lisp kernel debugger", the output of the (L)
and (B) options is often the most helpful thing to include in a bug report.
Overview
It's now possible to use AltiVec instructions in PPC LAP (assembler) functions.
The lisp kernel detects the presence or absence of AltiVec and preserves AltiVec state on
lisp thread switch and in response to exceptions, but the implementation doesn't otherwise
use vector operations.
This document doesn't document PPC LAP programming in general. Ideally, there would
be some document that did.
This document does explain AltiVec register-usage conventions in Clozure CL and explains
the use of some lap macros that help to enforce those conventions.
All of the global symbols described below are exported from the CCL package. Note that lap
macro names, ppc instruction names, and (in most cases) register names are treated as
strings, so this only applies to functions and global variable names.
Much of the Clozure CL support for AltiVec LAP programming is based on work
contributed to MCL by Shannon Spires.
Clozure CL LAP functions that use AltiVec instructions must interoperate with each other
and with C functions; that fact suggests that they follow C AltiVec register usage
conventions. (vr0-vr1 scratch, vr2-vr13 parameters/return value, vr14-vr19 temporaries,
vr20-vr31 callee-save non-volatile registers.)
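The register-usage convention above can be summarized as a lookup function. This is an illustrative sketch: ALTIVEC-REGISTER-ROLE and REGISTERS-TO-SAVE are hypothetical names, not part of Clozure CL. The second function picks out the callee-save registers that, as discussed below, a function must save before assigning to them.

```lisp
(defun altivec-register-role (n)
  "Classify vector register vrN according to the C AltiVec conventions."
  (check-type n (integer 0 31))
  (cond ((<= n 1)  :scratch)        ; vr0-vr1
        ((<= n 13) :parameter)      ; vr2-vr13, including the return value
        ((<= n 19) :temporary)      ; vr14-vr19
        (t         :non-volatile))) ; vr20-vr31, callee-save

(defun registers-to-save (reglist)
  "Return the members of REGLIST that must be saved and restored."
  (remove-if-not (lambda (n) (eq (altivec-register-role n) :non-volatile))
                 reglist))
```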
The EABI (Embedded Application Binary Interface) used in LinuxPPC doesn't ascribe
particular significance to the vrsave special-purpose register; on other platforms (notably
MacOS), it's used as a bitmap which indicates to system-level code which vector registers
contain meaningful values.
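To make the bitmap concrete: in the PowerPC ABIs that use VRSAVE, the most significant bit of the 32-bit register corresponds to vr0 and the least significant bit to vr31. The sketch below (VRSAVE-MASK is a hypothetical helper, not what WITH-ALTIVEC-REGISTERS actually expands into) computes the value that marks a list of vector registers as live.

```lisp
(defun vrsave-mask (registers)
  "Return the 32-bit VRSAVE value marking each vrN in REGISTERS as live.
vr0 maps to the most significant bit, vr31 to the least significant."
  (let ((mask 0))
    (dolist (n registers mask)
      (check-type n (integer 0 31))
      (setf mask (logior mask (ash #x80000000 (- n)))))))
```

For example, a function that returns a value in vr2 after using vr20 needs both bits: `(vrsave-mask '(2 20))` is `#x20000800`.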
The WITH-ALTIVEC-REGISTERS lap macro generates code that saves, updates, and
restores VRSAVE on platforms where this is required (as indicated by the value of the
special variable *ALTIVEC-LAPMACROS-MAINTAIN-VRSAVE-P*) and ignores VRSAVE on
platforms that don't require it to be maintained.
On all PPC platforms, it's necessary to save any non-volatile vector registers (vr20 .. vr31)
before assigning to them and to restore such registers before returning to the caller.
On platforms that require that VRSAVE be maintained, it's not necessary to mention the
"use" of vector registers that are used as incoming parameters. It's not incorrect to mention
their use in a WITH-ALTIVEC-REGISTERS form, but it may be unnecessary in many
interesting cases. One can likewise assume that the caller of any function that returns a
vector value in vr2 has already set the appropriate bit in VRSAVE to indicate that this
register is live. One could therefore write a leaf function that added the bytes in vr3 and vr2
and returned the result in vr2 as:
When vector registers that aren't incoming parameters are used in a LAP function,
WITH-ALTIVEC-REGISTERS takes care of maintaining VRSAVE and of saving/restoring
any non-volatile vector registers:
AltiVec registers are not preserved by CATCH and UNWIND-PROTECT. Since AltiVec is
only accessible from LAP in Clozure CL and since LAP functions rarely use high-level
control structures, this should rarely be a problem in practice.
LAP functions that use non-volatile vector registers and that call (Lisp ?) code which may
use CATCH or UNWIND-PROTECT should save those vector registers before such a call
and restore them on return. This is one of the intended uses of the WITH-VECTOR-BUFFER
lap macro.
Development-Mode Dictionary
*warn-if-redefine-kernel* [Variable]
When true, attempts to redefine (via DEFUN or DEFMETHOD) functions and methods
that are marked as being "predefined" signal continuable errors.
Note that these are CERRORs, not warnings, and that no lisp functions or methods have
been defined in the kernel in MCL or Clozure CL since 1987 or so.
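Because these checks signal continuable errors, code that deliberately patches predefined functions can either answer each CERROR interactively or turn the check off for the duration of a load. A minimal sketch of the latter, assuming the variable is reachable as CCL::*WARN-IF-REDEFINE-KERNEL*; the helper name LOAD-PATCH is hypothetical, and outside Clozure CL the binding is simply omitted.

```lisp
(defun load-patch (pathname)
  "Load PATHNAME with the predefined-function redefinition check disabled."
  ;; The special binding lasts only for the dynamic extent of LOAD, so the
  ;; protection is back in force as soon as the patch has been loaded.
  (let (#+ccl (ccl::*warn-if-redefine-kernel* nil))
    (load pathname)))
```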
*altivec-available* [Variable]
This variable is initialized each time an Clozure CL session starts based on information
provided by the lisp kernel. Its value is true if AltiVec is present and false otherwise. This
variable shouldn't be set by user code.
altivec-available-p [Function]
*altivec-lapmacros-maintain-vrsave-p* [Variable]
Intended to control the expansion of certain lap macros. Initialized to NIL on LinuxPPC;
initialized to T on platforms (such as MacOS X/Darwin) that require that the VRSAVE SPR
contain a bitmask of active vector registers at all times.
reglist
body
WITH-ALTIVEC-REGISTERS generates code which saves any non-volatile vector registers that
appear in reglist, executes body, and restores the saved non-volatile vector registers (and,
if *altivec-lapmacros-maintain-vrsave-p* is true, restores VRSAVE as well). Uses the IMM0
register (r3) as a temporary.
base
The general-purpose register (GPR) that will point to the buffer.
n
An integer between 1 and 254, inclusive. (Should typically be much, much closer to 1.)
Specifies the size of the buffer, in 16-byte units.
body
WITH-VECTOR-BUFFER generates code which allocates a 16-byte aligned buffer large
enough to contain N vector registers; the GPR base points to the lowest address of this
buffer. After processing body, the buffer will be deallocated. The body should preserve the
value of base as long as it needs to reference the buffer. It's intended that base be used as a
base register in stvx and lvx instructions within the body.
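The arithmetic behind such a buffer is simple: N vector registers need 16 × N bytes, and the base address must be a multiple of 16, because stvx and lvx ignore the low four bits of the effective address. The helpers below are a hypothetical illustration of that arithmetic, not the macro's actual expansion.

```lisp
(defun vector-buffer-size (n)
  "Bytes needed to hold N 16-byte vector registers."
  (check-type n (integer 1 254))
  (* 16 n))

(defun align-to-16 (address)
  "Round ADDRESS up to the next multiple of 16."
  (logandc2 (+ address 15) 15))

(defun vector-slot-offset (i)
  "Byte offset, from the buffer's base, of the Ith (0-based) saved register."
  (* 16 i))
```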
oprofile is a system-level profiler that's available for most modern Linux distributions.
Use of oprofile and its companion programs isn't really documented here; what is
described is a way of generating symbolic information that enables profiling summaries
generated by the opreport program to identify lisp functions meaningfully.
Modern Linux uses the "ELF" (Executable and Linking Format) object file format; the
oprofile tools can associate symbolic names with addresses in a memory-mapped file if that
file appears to be an ELF object file and if it contains ELF symbol information that
describes those memory regions. So, the general idea is to make a lisp heap image that
looks enough like an ELF shared library to fool the oprofile tools (we don't actually load
heap images via ELF dynamic linking technology, but we can make it look like we did.)
Prerequisites
oprofile itself, which is almost certainly available via your distribution's package
management system if not already preinstalled.
libelf, which provides utilities for reading and writing ELF files (and is likewise
likely preinstalled or readily installable.) Somewhat confusingly, there are two libelf
implementations in widespread use on Linux, and different distributions refer to
them by different names (they may be available as part of an 'elfutils' package.) The
oprofile interface was designed to work with a libelf implementation whose version
number is currently around 147; the other (incompatible) libelf implementation has a
version number around 0.8. It may be necessary to install the corresponding
development package (-dev or -devel, usually) in order to actually be able to use the
libelf shared library.
In order to create a lisp heap image which can be used for oprofile-based profiling, we
need to:
1. Load the code that we want to profile.
2. Generate a file that contains ELF symbol information describing the names and
addresses of all lisp functions.
? (require "ELF")
"ELF"
("ELF")
? (ccl::write-elf-symbols-to-file "home:elf-symbols")
3. Generate a lisp heap image in which the ELF symbols generated in the previous step
are prepended.
? (save-application "somewhere/image-for-profiling"
:prepend-kernel "home:elf-symbols")
When that image is run with oprofile active, any lisp code that oprofile samples in it will
be identified "symbolically" by opreport.
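The two REPL steps above can be wrapped in a single function, called after the code to be profiled has been loaded. This is a sketch: SAVE-IMAGE-FOR-PROFILING is a hypothetical name, and it uses only the calls shown above (REQUIRE "ELF", CCL::WRITE-ELF-SYMBOLS-TO-FILE, and SAVE-APPLICATION with :PREPEND-KERNEL).

```lisp
(defun save-image-for-profiling (image-path
                                 &optional (symbols-path "home:elf-symbols"))
  "Write ELF symbol information for all lisp functions to SYMBOLS-PATH, then
save a heap image at IMAGE-PATH with that file prepended.  Load the code to
be profiled before calling this."
  #+ccl
  (progn
    (require "ELF")
    (ccl::write-elf-symbols-to-file symbols-path)
    ;; SAVE-APPLICATION writes the image and ends this lisp session.
    (ccl:save-application image-path :prepend-kernel symbols-path))
  #-ccl
  (error "SAVE-IMAGE-FOR-PROFILING needs Clozure CL (~A, ~A)."
         image-path symbols-path))
```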
Example
;;; Define some lisp functions that we want to profile and save
;;; a profiling-enabled image. In this case, we just want to
;;; define the FACT (factorial) function, to keep things simple.
? (defun fact (n) (if (zerop n) 1 (* n (fact (1- n)))))
FACT
? (require "ELF")
"ELF"
("ELF")
? (ccl::write-elf-symbols-to-file "home:elf-symbols")
"home:elf-symbols"
? (save-application "home:profiled-ccl" :prepend-kernel "home:elf-symbols")
;;; Set up oprofile with (mostly) default arguments. This example was
;;; run on a Fedora 8 system where an uncompressed 'vmlinux' kernel
;;; image isn't readily available.
;;; Note that use of 'opcontrol' generally requires root access, e.g.,
;;; 'sudo' or equivalent:
Issues
So far, no one has been able to get the oprofile/opreport options that are supposed to
generate call-stack information to produce meaningful call-stack information.
As of a few months ago, there was an attempt to provide symbol info for oprofile/opreport
"on the fly", e.g., for use in JIT compilation or other incremental compilation scenarios.
That's obviously more nearly The Right Thing, but it might be a while before that
experimental code makes it into widespread use.
Apple's CHUD package provides libraries, kernel extensions, and a set of graphical and
command-line programs that can be used to measure many aspects of application and
system performance.
Prerequisites
Apple's CHUD tools have been distributed with the last several Xcode releases. One way to
determine whether or not the tools are installed is to run:
$ /usr/bin/shark -v
Shark can only properly identify functions that are defined in a shared library that's loaded
by the target application. (Any other functions will be identified by a hex address described
as being in an "Unknown Library"; the hex address is generally somewhat near the actual
function, but it's determined heuristically and isn't always accurate.)
For this reason, it's desirable to load the code that you wish to profile in one lisp session,
save a native (Mach-O library) image, and invoke Shark in a new session which uses that
native image. (It may also be useful to load the CHUD-METERING module, which defines
CHUD:METER and friends.)
Usage synopsis
Evaluate (chud:meter form), where form is the code to be profiled; a few seconds after the
result is returned, a file whose name is of the form "session_nnn.mshark" will open in
Shark.app.
The first time that CHUD:METER is used in a lisp session, it'll do a few things to prepare for
subsequent profiling sessions. Those things include:
creating a directory to store files that are related to using the CHUD tools in this lisp
session. This directory is created in the user's home directory and has a name of the
form:
profiling-session-<lisp-kernel>-<pid>_<mm>-<dd>-<yyyy>_<h>.<m>.<s>
running the shark program ("/usr/bin/shark") and waiting until it's ready to receive
signals that control its operation.
This startup activity typically takes a few seconds; after it's been completed, subsequent use
of CHUD:METER doesn't involve that overhead. (See the discussion of :RESET below.)
After any startup activity is complete, CHUD:METER arranges to send a "start profiling"
signal to the running shark program, executes the form, sends a "stop profiling" signal to
the shark program, and reads its diagnostic output, looking for the name of the ".mshark"
file it produces. If it's able to find this filename, it arranges for "Shark.app" to open it.
Profiling "configurations"
By default, profiling uses "time based" sampling to periodically interrupt the lisp process
and note the value of the program counter and at least a few levels of call history.
This is known as "the default configuration"; it's possible to use items on the "Config"
menu in the Shark application to create alternate configurations which provide different
kinds of profiling parameters and to save these configurations in files for subsequent reuse.
(The set of things that CHUD knows how to monitor is large and interesting.)
You can use alternate profiling configurations (created and "exported" via Shark.app) with
CHUD:METER, but the interface is a little awkward.
Reference
chud:*shark-config-file* [Variable]
When non-null, this should be the pathname of an alternate profiling configuration file
created by the "Config Editor" in Shark.app.
CHUD:METER executes FORM (an arbitrary lisp form) and returns whatever result(s) it returns, with
CHUD profiling enabled during the form's execution. Tries to determine the name of the
session file (*.mshark) to which the shark program wrote profiling data and opens this file
in the Shark application.
Arguments:
debug-output
reset
when non-nil, terminates any running instance of the shark program created by
previous invocations of CHUD:METER in this lisp session, generates a new .spatch
file (describing the names and addresses of lisp functions), and starts a new instance
of the shark program; if CHUD:*SHARK-CONFIG-FILE* is non-NIL when this new
instance is started, that instance is told to use the specified config file for profiling (in
lieu of the default profiling configuration.)
Acknowledgement
Both Dan Knapp and Hamilton Link have posted similar CHUD interfaces to
openmcl-devel in the past; Hamilton's also reported bugs in the spatch mechanism to
CHUD developers (and gotten those bugs fixed.)
Clozure CL supports most of the semi-standard metaobject protocol (MOP) for CLOS, as
defined in chapters 5 and 6 of “The Art Of The Metaobject Protocol”, (Kiczales et al, MIT
Press 1991, ISBN 0-262-61074-4); this specification is also available online at
http://www.alu.org/mop/index.html.
All of the symbols defined in the MOP specification (whether implemented or not) are
exported from the ccl package and from the openmcl-mop package.
construct status
accessor-method-slot-definition +
add-dependent +
add-direct-method +
add-direct-subclass +
add-method +
class-default-initargs +
class-direct-default-initargs +
class-direct-slots +
class-direct-subclasses +
class-direct-superclasses +
class-finalized-p +
class-prototype +
class-slots +
compute-applicable-methods -
compute-applicable-methods-using-classes -
compute-class-precedence-list +
compute-direct-initargs +
compute-discriminating-function -
compute-effective-method +
compute-effective-slot-definition +
compute-slots +
direct-slot-definition-class +
effective-slot-definition-class +
ensure-class +
ensure-class-using-class +
ensure-generic-function-using-class +
eql-specializer-object +
extract-lambda-list +
extract-specializer-names +
finalize-inheritance +
find-method-combination +
funcallable-standard-instance-access +
generic-function-argument-precedence-order +
generic-function-declarations +
generic-function-lambda-list +
generic-function-method-class +
generic-function-method-combination +
generic-function-methods +
generic-function-name +
intern-eql-specializer +
make-method-lambda -
map-dependents +
method-function +
method-generic-function +
method-lambda-list +
method-qualifiers +
method-specializers +
reader-method-class +
remove-dependent +
remove-direct-method +
remove-direct-subclass +
remove-method +
set-funcallable-instance-function -
slot-boundp-using-class +
slot-definition-allocation +
slot-definition-initargs +
slot-definition-initform +
slot-definition-initfunction +
slot-definition-location +
slot-definition-name +
slot-definition-readers +
slot-definition-type +
slot-definition-writers +
slot-makunbound-using-class +
slot-value-using-class +
specializer-direct-generic-functions +
specializer-direct-methods +
standard-instance-access +
update-dependent +
validate-superclass +
writer-method-class +
Note that those generic functions whose status is “-” in the table above deal with the
internals of generic function dispatch and method invocation (the “generic function
invocation protocol”). Method functions are implemented a bit differently in Clozure CL
from what the MOP expects, and it's not yet clear if or how this subprotocol can be
well-supported.
Those constructs that are marked as “+” in the table above are nominally implemented as
the MOP document specifies (deviations from the specification should be considered bugs;
please report them as such.) Note that some CLOS implementations in widespread use
(e.g., PCL) implement some things (ENSURE-CLASS-USING-CLASS comes to mind) a bit
differently from what the MOP specifies.
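As a small example of the introspective ("+") part of the protocol, the sketch below lists a class's effective slots and direct superclasses. Per the text, Clozure CL exports these symbols from the OPENMCL-MOP package; the SB-MOP alternative is included only so the sketch also runs under SBCL. The package MOP-DEMO and the classes POINT and POINT-3D are hypothetical.

```lisp
(defpackage #:mop-demo
  (:use #:cl #+ccl #:openmcl-mop #+sbcl #:sb-mop))

(in-package #:mop-demo)

(defclass point () ((x :initarg :x) (y :initarg :y)))
(defclass point-3d (point) ((z :initarg :z)))

(defun ensure-finalized (class)
  "Finalize CLASS (and, first, its superclasses) if not already done."
  (dolist (super (class-direct-superclasses class))
    (ensure-finalized super))
  (unless (class-finalized-p class)
    (finalize-inheritance class))
  class)

(defun slot-names (class-name)
  "Names of all slots, direct and inherited, of the class named CLASS-NAME."
  ;; CLASS-SLOTS is only meaningful once the class has been finalized.
  (let ((class (ensure-finalized (find-class class-name))))
    (mapcar #'slot-definition-name (class-slots class))))

(defun direct-superclass-names (class-name)
  (mapcar #'class-name (class-direct-superclasses (find-class class-name))))

(in-package #:cl-user)
```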
Concurrency issues
The entire CLOS class and generic function hierarchy is effectively a (large, complicated)
shared data structure; it's not generally practical for a thread to request exclusive access to
all of CLOS, and the effects of volitional modification of the CLOS hierarchy (via class
redefinition, change-class, etc.) in a multithreaded environment aren't always tractable.
Native threads exacerbate this problem (in that they increase the opportunities for
concurrent modification and access.) The implementation should try to ensure that a
thread's view of any subset of the CLOS hierarchy is consistent (to the extent that that's
possible) and should try to ensure that incidental modifications of the hierarchy (cache
updates, etc.) happen atomically; it's not generally possible for the implementation to
guarantee that a thread's view of things is correct and current.
If you are loading code and defining classes in the most usual way, which is to say, via the
compiler, using only a single thread, these issues are probably not going to affect you
much.
If, however, you are making finicky changes to the class hierarchy while you're running
multiple threads which manipulate objects related to each other, more care is required.
Before doing such a thing, you should know what you're doing and already be aware of
what precautions to take, without being told. That said, if you do it, you should seriously
consider what your application's critical data is, and use locks for critical code sections.
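A minimal sketch of that advice: one lock guards both the redefinition of a class and the code that touches its instances, so no thread observes an instance mid-update. CCL:MAKE-LOCK and CCL:WITH-LOCK-GRABBED are Clozure CL's lock primitives; the SBCL branch is only there so the sketch runs elsewhere. *SCHEMA-LOCK*, ACCOUNT, and the helper functions are hypothetical, and the lock only helps if every access goes through it.

```lisp
(defvar *schema-lock*
  #+ccl (ccl:make-lock "schema")
  #+sbcl (sb-thread:make-mutex :name "schema")
  #-(or ccl sbcl) nil
  "Guards redefinition of ACCOUNT and access to its instances.")

(defmacro with-schema-lock (&body body)
  "Run BODY while holding *SCHEMA-LOCK* (recursive in both branches)."
  #+ccl `(ccl:with-lock-grabbed (*schema-lock*) ,@body)
  #+sbcl `(sb-thread:with-recursive-lock (*schema-lock*) ,@body)
  #-(or ccl sbcl) `(progn ,@body))

(defclass account ()
  ((balance :initform 0 :accessor balance)))

(defun deposit (account amount)
  (with-schema-lock
    (incf (balance account) amount)))

(defun add-owner-slot ()
  "Redefine ACCOUNT while no other thread is inside WITH-SCHEMA-LOCK."
  (with-schema-lock
    (defclass account ()
      ((balance :initform 0 :accessor balance)
       (owner :initform nil :accessor owner)))))
```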
Changes in 1.2
Using Objective-C Classes
Instantiating Objective-C Objects
Calling Objective-C Methods
Type Coercion for Objective-C Method Calls
Methods which Return Structures
Variable-Arity Messages
Optimization
Defining Objective-C Classes
Defining classes with foreign slots
Defining classes with Lisp slots
Defining Objective-C Methods
Using define-objc-method
Using objc:defmethod
Method Redefinition Constraints
Loading Frameworks
How Objective-C Names are Mapped to Lisp Symbols
Mac OS X APIs use a language called Objective-C, which is approximately C with some
object-oriented extensions modeled on Smalltalk. The Objective-C bridge makes it possible
to work with Objective-C objects and classes from Lisp, and to define classes in Lisp which
can be used by Objective-C.
The ultimate purpose of the Objective-C and Cocoa bridges is to make Cocoa (the standard
user-interface framework on Mac OS X) as easy as possible to use from Clozure CL, in
order to support the development of GUI applications and IDEs on Mac OS X (and on any
platform that supports Objective-C, such as GNUStep). The eventual goal, which is much
The current release provides Lisp-like syntax and naming conventions for the basic
Objective-C operations, with automatic type processing and messages checked for validity
at compile-time. It also provides some convenience facilities for working with Cocoa.
Changes in 1.2
Version 1.2 of Clozure CL exports most of the useful symbols described in this chapter; in
previous releases, most of them were private in the CCL package.
There are several new reader macros that make it much more convenient than before to
refer to several classes of symbols used with the Objective-C bridge. For a full description
of these reader-macros, see the Foreign-Function-Interface Dictionary, especially the
entries at the beginning, describing reader macros.
As in previous releases, 32-bit versions of Clozure CL use 32-bit floats and integers in data
structures that describe geometry, font sizes and metrics, and so on. 64-bit versions of
Clozure CL use 64-bit values where appropriate.
The Objective-C bridge defines the type NS:CGFLOAT as the Lisp type of the preferred
floating-point type on the current platform, and defines the constant NS:+CGFLOAT+. On
DarwinPPC32, the foreign types :cgfloat, :<NSUI>nteger, and :<NSI>nteger are
defined by the Objective-C bridge (as 32-bit float, 32-bit unsigned integer, and 32-bit
signed integer, respectively); these types are defined as 64-bit variants in the 64-bit
interfaces.
Every Objective-C class is now properly named, either with a name exported from the NS
package (in the case of a predefined class declared in the interface files) or with the name
provided in the DEFCLASS form (with :METACLASS NS:+NS-OBJECT) which defines the
class from Lisp. The class's Lisp name is now proclaimed to be a "static" variable (as if by
DEFSTATIC, as described in the "Static Variables" section) and given the class object as its
value. In other words, (find-class 'ns:ns-window) and the bare symbol ns:ns-window
evaluate to the same class object. (Since it's not legal to bind a "static" variable, it may be
necessary to rename some things so that unrelated variables whose names coincidentally
conflict with Objective-C class names don't do so.)
The class of most standard CLOS classes is named STANDARD-CLASS. In the Objective-C
object model, each class is an instance of a (usually unique) metaclass, which is itself an
instance of a "base" metaclass (often the metaclass of the class named "NSObject".) So, the
Objective-C class named "NSWindow" and the Objective-C class "NSArray" are (sole)
instances of their distinct metaclasses whose names are also "NSWindow" and "NSArray",
respectively. (In the Objective-C world, it's much more common and useful to specialize
class behavior such as instance allocation.)
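For comparison, ordinary CLOS classes share a single metaclass, as this portable snippet shows (WIDGET, GADGET, and METACLASS-NAME are hypothetical names). Under the bridge, the corresponding expression for a Cocoa class should instead yield that class's own metaclass, following the naming described just below. That case appears only as a comment, since it requires the COCOA module.

```lisp
(defclass widget () ())
(defclass gadget () ())

(defun metaclass-name (class-name)
  "Name of the metaclass of the class named CLASS-NAME."
  (class-name (class-of (find-class class-name))))

;; Both ordinary classes are instances of STANDARD-CLASS.  With the COCOA
;; module loaded in Clozure CL, (class-name (class-of ns:ns-window)) would,
;; per the text, be the distinct metaclass name NS:+NS-WINDOW.
```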
When Clozure CL first loads foreign libraries containing Objective-C classes, it identifies
the classes they contain. The foreign class name, such as "NSWindow", is mapped to an
external symbol in the "NS" package via the bridge's translation rules, such as
NS:NS-WINDOW. A similar transformation happens to the metaclass name, with a "+"
prepended, yielding something like NS:+NS-WINDOW.
These classes are integrated into CLOS such that the metaclass is an instance of the class
OBJC:OBJC-METACLASS and the class is an instance of the metaclass.
SLOT-DESCRIPTION metaobjects are created for each instance variable, and the class and
metaclass go through something very similar to the "standard" CLOS class initialization
protocol (with a difference being that these classes have already been allocated.)
Performing all this initialization, which is done when you (require "COCOA"), currently
takes several seconds; it could conceivably be sped up some, but it's never likely to be fast.
When the process is complete, CLOS is aware of several hundred new Objective-C classes
and their metaclasses. Clozure CL's runtime system can reliably recognize MACPTRs to
Objective-C classes as being CLASS objects, and can (fairly reliably but heuristically)
recognize instances of those classes (though there are complicating factors here; see
below.) SLOT-VALUE can be used to access (and, with care, set) instance variables in
Objective-C instances. To see this, do:
? (require "COCOA")
and, after waiting a bit longer for a Cocoa listener window to appear, activate that Cocoa
listener and do:
This sends a message asking for the key window, which is the window that has the input
focus (often the frontmost), and then describes it. As we can see, NS:NS-WINDOWs have
lots of interesting slots.
#<NS-CF-NUMBER 42 (#x85962210)>
It's worth looking at how this would be done if you were writing in Objective C:
Allocating an instance of an Objective-C class involves sending the class an "alloc" message,
and then using those initargs that don't correspond to slot initargs as the "init" message to
be sent to the newly-allocated instance. So, the example above could have been done more
verbosely as:
That setq is important; this is a case where init decides to replace the object and return the
new one, instead of modifying the existing one. In fact, if you leave out the setq and then
try to view the value of *N*, Clozure CL will freeze. There's little reason to ever do it this
way; this is just to show what's going on.
You've seen that an Objective-C initialization method doesn't have to return the same
object it was passed. In fact, it doesn't have to return any object at all; in this case, the
initialization fails and make-instance returns nil.
In some special cases, such as loading an ns:ns-window-controller from a .nib file, it may
be necessary for you to pass the instance itself as one of the parameters to the initialization
method. It goes like this:
? (defvar *controller*
(make-instance 'ns:ns-window-controller))
*CONTROLLER*
? (setq *controller*
(ccl::send *controller*
:init-with-window-nib-name #@"DataWindow"
:owner *controller*))
#<NS-WINDOW-CONTROLLER <NSWindowController: 0x1fb520> (#x1FB520)>
This example calls (make-instance) with no initargs. When you do this, the object is only
allocated, and not initialized. It then sends the "init" message to do the initialization by
hand.
There is an alternative API for instantiating Objective-C classes. You can call
OBJC:MAKE-OBJC-INSTANCE, passing it the name of the Objective-C class as a string. In
previous releases, OBJC:MAKE-OBJC-INSTANCE could be more efficient than
OBJC:MAKE-INSTANCE in cases where the class did not define any Lisp slots; this is no
longer the case. You can now regard OBJC:MAKE-OBJC-INSTANCE as completely equivalent
to OBJC:MAKE-INSTANCE, except that you can pass a string for the classname, which may
be convenient in the case that the classname is in some way unusual.
In Objective-C, methods are called "messages", and there's a special syntax to send a
message to an object:
[w alphaValue]
[w setAlphaValue: 0.5]
[v mouse: p inRect: r]
The first line sends the message "alphaValue" to the object w, with no parameters. The
second line sends the message "setAlphaValue:", with the parameter 0.5. The third line
sends the message "mouse:inRect:" (yes, all one long word) with the parameters p and r.
In Lisp, the same three messages are sent with SEND:
(send w 'alpha-value)
(send w :set-alpha-value 0.5)
(send v :mouse p :in-rect r)
Notice that when a method has no parameters, its name is an ordinary symbol (it doesn't
matter what package the symbol is in, as only its name is checked). When a method has
parameters, each part of its name is a keyword, and the keywords alternate with the values.
These two lines break those rules, and both will result in error messages:
(send w :alpha-value)
(send w 'set-alpha-value 0.5)
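The rules above can be made concrete with a small, hypothetical function that computes the Objective-C selector implied by the arguments of a SEND form (everything after the receiver) and rejects both malformed calls. Its name mapping is deliberately simplified (split on hyphens, capitalize each word after the first); the bridge's real translation rules are more elaborate, and SEND-SELECTOR and its helpers are not part of Clozure CL.

```lisp
(defun split-on-hyphens (string)
  "Split STRING at hyphens, e.g. \"set-alpha-value\" => (\"set\" \"alpha\" \"value\")."
  (loop for start = 0 then (1+ end)
        for end = (position #\- string :start start)
        collect (subseq string start end)
        while end))

(defun lisp-name->objc (symbol)
  "SET-ALPHA-VALUE => \"setAlphaValue\" (simplified mapping)."
  (let ((words (split-on-hyphens (string-downcase (symbol-name symbol)))))
    (format nil "~A~{~@(~A~)~}" (first words) (rest words))))

(defun send-selector (args)
  "Return the selector named by ARGS, the part of a SEND form after the receiver."
  (cond
    ;; No parameters: a single non-keyword symbol (its package is irrelevant).
    ((and (= (length args) 1)
          (symbolp (first args))
          (not (keywordp (first args))))
     (lisp-name->objc (first args)))
    ;; Parameters: keywords alternating with values.
    ((and args
          (evenp (length args))
          (loop for (key nil) on args by #'cddr always (keywordp key)))
     (format nil "~{~A:~}"
             (loop for (key nil) on args by #'cddr
                   collect (lisp-name->objc key))))
    (t (error "Malformed SEND arguments: ~S" args))))
```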
Instead of (send), you can also invoke (send-super), with the same interface. It has roughly
the same purpose as CLOS's (call-next-method); when you use (send-super), the message
is handled by the superclass. This can be used to get at the original implementation of a
method when it is shadowed by a method in your subclass.
Clozure CL's FFI handles many common conversions between Lisp and foreign data, such
as unboxing floating-point args and boxing floating-point results. The bridge adds a few
more automatic conversions:
NIL is equivalent to (%NULL-PTR) for any message argument that requires a pointer.
Some Cocoa methods return small structures, such as those used to represent points, rects,
sizes and ranges. When writing in Objective C, the compiler hides the implementation
details. Unfortunately, in Lisp we must be slightly more aware of them.
Methods which return structures are called in a special way; the caller allocates space for
the result, and passes a pointer to it as an extra argument to the method. This is called a
Structure Return, or STRET. Don't look at me; I don't name these things.
Here's a simple use of this in Objective C. The first line sends the "bounds" message to v1,
which returns a rectangle. The second line sends the "setBounds" message to v2, passing
that same rectangle as a parameter.
In Lisp, we must explicitly allocate the memory, which is done most easily and safely with
rlet. We do it like this:
The rlet allocates the storage (but doesn't initialize it), and makes sure that it will be
deallocated when we're done. It binds the variable r to refer to it. The call to send/stret
is just like an ordinary call to send, except that r is passed as an extra, first parameter. The
third line, which calls send, does not need to do anything special, because there's nothing
complicated about passing a structure as a parameter.
In order to make STRETs easier to use, the bridge provides two conveniences.
First, you can use the macros slet and slet* to allocate and initialize local variables to
foreign structures in one step. The example above could have been written more tersely as:
Second, when one call to send is made inside another, the inner one has an implicit slet
around it. So, one could in fact just write:
There are also several pseudo-functions provided for convenience by the Objective-C
compiler, to make objects of specific types. The following are currently supported by the
bridge: NS-MAKE-POINT, NS-MAKE-RANGE, NS-MAKE-RECT, and NS-MAKE-SIZE.
However, since these aren't real functions, a call like the following won't work:
To extract fields from these objects, there are also some convenience macros:
NS-MAX-RANGE, NS-MIN-X, NS-MIN-Y, NS-MAX-X, NS-MAX-Y, NS-MID-X, NS-MID-Y,
NS-HEIGHT, and NS-WIDTH.
Note that there is also a send-super/stret for use within methods. Like send-super,
it ignores any shadowing methods in a subclass, and calls the version of a method which
belongs to its superclass.
Variable-Arity Messages
There are a few messages in Cocoa which take variable numbers of arguments. Perhaps the
most common examples involve formatted strings:
Note that it's necessary to specify the foreign types of the variables (in this example,
:double-float), because the compiler has no general way of knowing these types. (You might
think that it could parse the format string, but this would only work for format strings
which are not determined at runtime.)
Because the Objective-C runtime system does not provide any information on which
messages are variable arity, they must be explicitly declared. The standard variable arity
messages in Cocoa are predeclared by the bridge. If you need to declare a new variable arity
message, use (DEFINE-VARIABLE-ARITY-MESSAGE "myVariableArityMessage:").
Optimization
The bridge works fairly hard to optimize message sends when it has enough information to
do so. There are two cases in which it does so; in either case, a message send should be
nearly as efficient as when writing in Objective C.
The first case is when both the message and the receiver's class are known at compile-time.
In general, the only way the receiver's class is known is if you declare it, which you can do
with either a DECLARE or a THE form. For example:
Note that there is no way in Objective-C to name the class of a class. Thus the bridge
provides a declaration, @METACLASS. The type of an instance of "NSColor" is ns:ns-color.
The type of the class "NSColor" is (@metaclass ns:ns-color):
The other case that allows optimization is when only the message is known at
compile-time, but its type signature is unique. Of the more than 6000 messages currently
provided by Cocoa, only about 50 of them have nonunique type signatures.
An example of a message with a type signature that is not unique is SET. It returns VOID
for NSColor, but ID for NSSet. In order to optimize sends of messages with nonunique type
signatures, the class of the receiver must be declared at compile-time.
When the receiver's class is unknown, the bridge's ability to optimize relies on a
type-signature table which it maintains. When first loaded, the bridge initializes this table
by scanning every method of every Objective-C class. When new methods are defined later,
the table must be updated. This happens automatically when you define methods in Lisp.
After any other major change, such as loading an external framework, you should rebuild
the table:
? (update-type-signatures)
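The decision described above can be pictured as a lookup in a table keyed by selector. The following portable Common Lisp sketch is a toy model under stated assumptions, not the bridge's implementation: *SIGNATURES*, NOTE-METHOD and OPTIMIZABLE-P are hypothetical names, and a signature is simplified to a plain list of foreign types.

```lisp
;;; Toy model of a selector -> type-signature table (hypothetical names).
;;; The bridge builds its real table by scanning every method of every
;;; Objective-C class; here we record a few entries by hand.
(defvar *signatures* (make-hash-table :test #'equal)
  "Maps a selector string to the list of distinct signatures seen for it.")

(defun note-method (selector signature)
  "Record SIGNATURE, a list of foreign types, as one signature of SELECTOR."
  (pushnew signature (gethash selector *signatures*) :test #'equal))

(defun optimizable-p (selector &optional declared-class)
  "True if a send of SELECTOR could be optimized at compile time: either
the receiver's class is declared, or SELECTOR has exactly one signature."
  (or (not (null declared-class))
      (= 1 (length (gethash selector *signatures*)))))

;; Data from the text: "set" returns VOID for NSColor but ID for NSSet.
(note-method "set" '(:void))
(note-method "set" '(:id))
;; A selector with a single known signature.
(note-method "center" '(:void))
```

In this model, rebuilding the table after loading a framework would mean calling NOTE-METHOD again for each new method, which is roughly the job the text assigns to (update-type-signatures).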
Because send and its relatives send-super, send/stret, and send-super/stret are
macros, they cannot be funcalled, applied, or passed as arguments to functions.
To work around this, there are function equivalents to them: %send, %send-super,
%send/stret, and %send-super/stret. However, these functions should be used only
when the macros will not do, because they are unable to optimize.
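This restriction is a general property of Common Lisp macros rather than something specific to the bridge. The sketch below shows the pattern of pairing a macro with a function twin; MY-SEND, %MY-SEND and SEND-TO-ALL are hypothetical stand-ins that merely build a list describing a message and do not talk to the Objective-C runtime.

```lisp
;;; Hypothetical stand-ins for SEND and %SEND: they only build a list
;;; describing the message instead of sending anything.
(defun %my-send (receiver selector &rest args)
  "Function version: works with FUNCALL, APPLY and MAPCAR."
  (list* :to receiver :message selector args))

(defmacro my-send (receiver selector &rest args)
  "Macro version.  A real macro could inspect SELECTOR at compile time."
  `(%my-send ,receiver ,selector ,@args))

;; (apply #'my-send ...) is an error, because MY-SEND names a macro.
;; The function twin can be used wherever a function object is needed:
(defun send-to-all (receivers selector &rest args)
  "Send the same message to each receiver, using APPLY on the function."
  (mapcar (lambda (receiver) (apply #'%my-send receiver selector args))
          receivers))
```

The trade-off mirrors the one in the text: the macro form sees its arguments at compile time, while the function form only sees values at run time and so has nothing to optimize with.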
You can define your own foreign classes, which can then be passed to foreign functions; the
methods which you implement in Lisp will be made available to the foreign code as
callbacks.
You can also define subclasses of existing classes, implementing your subclass in Lisp even
though the parent class was in Objective C. One such subclass is CCL::NS-LISP-STRING. It
is also particularly useful to make subclasses of NS-WINDOW-CONTROLLER.
We can use the MOP to define new Objective-C classes, but we have to do something a little
funny: the :METACLASS that we'd want to use in a DEFCLASS option generally doesn't
exist until we've created the class (recall that Objective-C classes have, for the sake of
argument, unique and private metaclasses.) We can sort of sleaze our way around this by
specifying a known Objective-C metaclass object name as the value of the DEFCLASS
:METACLASS option; the metaclass of the root class NS:NS-OBJECT, NS:+NS-OBJECT,
makes a good choice. To make a subclass of NS:NS-WINDOW (that, for simplicity's sake,
doesn't define any new slots), we could do:
That'll create a new Objective-C class named EXAMPLE-WINDOW whose metaclass is the
class named +EXAMPLE-WINDOW. The class will be an object of type OBJC:OBJC-CLASS,
and the metaclass will be of type OBJC:OBJC-METACLASS. EXAMPLE-WINDOW
will be a subclass of NS-WINDOW.
The value of the :FOREIGN-TYPE initarg should be a foreign type specifier. For example, if
we wanted (for some reason) to define a subclass of NS:NS-WINDOW that kept track of the
number of key events it had received (and needed an instance variable to keep that
information in), we could say:
Foreign slots are always SLOT-BOUNDP, and the initform above is redundant: foreign
slots are initialized to binary 0.
In Objective-C, unlike in CLOS, every method belongs to some particular class. This is
probably not a strange concept to you, because C++ and Java do the same thing. When you
use Lisp to define Objective-C methods, it is only possible to define methods belonging to
Objective-C classes which have been defined in Lisp.
You can use either of two different macros to define methods on Objective-C classes.
define-objc-method accepts a two-element list containing a message selector name
and a class name, and a body. objc:defmethod superficially resembles the normal CLOS
defmethod, but creates methods on Objective-C classes with the same restrictions as
those created by define-objc-method.
Using define-objc-method
The return type of this method is the foreign type :id, which is used for all Objective-C
objects. The name of the method is get-window. The body of the method is the single line
(window self). The variable self is bound, within the body, to the instance that is
receiving the message. The call to window uses the CLOS accessor to get the value of the
window field.
Here's an example that takes a parameter. Notice that the name of the method without a
parameter was an ordinary symbol, but with a parameter, it's a keyword:
To Objective-C code that uses the class, the name of this method is
initWithMultiplier:. The name of the parameter is multiplier, and its type is
:int. The body of the method does some meaningless things. Then it returns self,
because this is an initialization method.
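;; Excerpt: the end of the initWithMultiplier: method body. The start of
;; the definition is not shown; ADDEND is not a parameter, so it is bound
;; outside this excerpt.
  (dotimes (i 100)
    (setf (aref (data self) i)
          (+ (* i multiplier)
             addend)))
  self)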
Here is a method that does not return any value, a so-called "void method". Where our
other methods said :id, this one says :void for the return type:
This method would be called takeAction: in Objective-C. The convention for methods
that are going to be used as Cocoa actions is that they take one parameter, which is the
object responsible for triggering the action. However, this method doesn't actually need to
use that parameter, so it explicitly ignores it to avoid a compiler warning. As promised, the
method doesn't return any value.
There is also an alternate syntax, illustrated here. The following two method definitions are
equivalent:
(define-objc-method ("applicationShouldTerminate:"
                     "LispApplicationDelegate")
    (:id sender :<BOOL>)
  (declare (ignore sender))
  nil)

(define-objc-method ((:<BOOL>
                      :application-should-terminate sender)
                     lisp-application-delegate)
  (declare (ignore sender))
  nil)
Using objc:defmethod
Its syntax is
(OBJC:DEFMETHOD name-and-result-type
    ((receiver-arg-and-class) &rest other-args)
  &body body)
other-args are either variable names (denoting parameters of type :ID) or 2-element
lists whose first element is a variable name and whose second element is a foreign type
specifier.
Arguments that wind up as some pointer type other than :ID (e.g. pointers, records passed
by value) are represented as typed foreign pointers, so that the higher-level, type-checking
accessors can be used on arguments of type :ns-rect, :ns-point, and so on.
Within the body of methods defined via OBJC:DEFMETHOD, the local function
CL:CALL-NEXT-METHOD is defined. It isn't quite as general as CL:CALL-NEXT-METHOD is when
used in a CLOS method, but it has some of the same semantics. It accepts as many
arguments as are present in the containing method's other-args list and invokes the version
of the containing method that would have been invoked on instances of the receiver's
class's superclass, with the receiver and the other provided arguments. (The idiom of passing
the current method's arguments to the next method is common enough that the
CALL-NEXT-METHOD in OBJC:DEFMETHODs should probably do this if it receives no
arguments.)
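For comparison, here is ordinary CLOS behavior in portable Common Lisp, where CALL-NEXT-METHOD with no arguments reuses the current arguments. The classes VIEW and BORDERED-VIEW and the generic function DESCRIBE-FRAME are hypothetical; the comments note how the OBJC:DEFMETHOD version described above differs.

```lisp
;;; Plain CLOS, for comparison with OBJC:DEFMETHOD's local CALL-NEXT-METHOD.
;;; All names here are hypothetical.
(defclass view () ())
(defclass bordered-view (view) ())

(defgeneric describe-frame (view width height))

(defmethod describe-frame ((v view) width height)
  (list :frame width height))

(defmethod describe-frame ((v bordered-view) width height)
  (declare (ignorable width height))
  ;; In CLOS, CALL-NEXT-METHOD with no arguments reuses V, WIDTH and HEIGHT.
  ;; Explicit arguments would have to include the receiver:
  ;;   (call-next-method v width height)
  ;; The OBJC:DEFMETHOD version instead takes only the other-args, e.g.
  ;;   (call-next-method width height)
  ;; and, per the text, does not supply them itself when called with none.
  (list* :border (call-next-method)))
```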
A method defined via OBJC:DEFMETHOD that returns a structure "by value" can do so by
returning a record created via MAKE-GCABLE-RECORD, by returning the value returned via
CALL-NEXT-METHOD, or by other similar means. Behind the scenes, there may be a
pre-allocated instance of the record type (used to support native structure-return
conventions), and any value returned by the method body will be copied to this internal
record instance. Within the body of a method defined with OBJC:DEFMETHOD that's
declared to return a structure type, the local macro OBJC:RETURNING-FOREIGN-STRUCT
can be used to access the internal structure. For example:
If the OBJC:DEFMETHOD creates a new method, then it displays a message to that effect.
These messages may be helpful in catching errors in the names of method definitions. In
addition, if an OBJC:DEFMETHOD form redefines a method in a way that changes its type
signature, Clozure CL signals a continuable error.
Objective C was not designed, as Lisp was, with runtime redefinition in mind. So, there are
a few constraints on how and when you can replace the definition of an Objective C
method. Currently, if you break these rules, nothing will collapse, but the behavior will be
confusing; so don't.
Objective C methods can be redefined at runtime, but their signatures shouldn't change.
That is, the types of the arguments and the return type have to stay the same. The reason
for this is that changing the signature changes the selector which is used to call the method.
When a method has already been defined in one class, and you define it in a subclass,
shadowing the original method, they must both have the same type signature. There is no
such constraint, though, if the two classes aren't related and the methods just happen to
have the same name.
Loading Frameworks
Loading a framework means opening the shared libraries and processing any declarations
so that Clozure CL can subsequently call its entry points and use its data structures.
Clozure CL provides the function OBJC:LOAD-FRAMEWORK for this purpose.
Assuming that interface databases for the named frameworks exist on the standard search
path, OBJC:LOAD-FRAMEWORK finds and initializes the framework bundle by searching OS
X's standard framework search paths. Loading the named framework may create new
Objective-C classes and methods, add foreign type descriptions and entry points, and
adjust Clozure CL's dispatch functions.
If interface databases don't exist for a framework you want to use, you will need to create
them. For more information about creating interface databases, see Creating new interface
directories.
There is a standard set of naming conventions for Cocoa classes, messages, etc. As long as
they are followed, the bridge is fairly good at automatically translating between Objective-C
and Lisp names.
To see how a given Objective-C or Lisp name will be translated by the bridge, you can use
the following functions:
Of course, there will always be exceptions to any naming convention. Please tell us on the
mailing lists if you come across any name translation problems that seem to be bugs.
Otherwise, the bridge provides two ways of dealing with exceptions:
First, you can pass a string as the class name of MAKE-OBJC-INSTANCE and as the
message to SEND. These strings will be directly interpreted as Objective-C names, with no
translation. This is useful for a one-time exception. For example:
(ccl::make-objc-instance "WiErDclass")
(ccl::send o "WiErDmEsSaGe:WithARG:" x y)
Alternatively, you can define a special translation rule for your exception. This is useful for
an exceptional name that you need to use throughout your code. Some examples:
The normal rule in Objective-C names is that each word begins with a capital letter (except
possibly the first). Using this rule literally, "NSWindow" would be translated as
N-S-WINDOW, which seems wrong. "NS" is a special word in Objective-C that should not
be broken at each capital letter. Likewise "URL", "PDF", "OpenGL", etc. Most common
special words used in Cocoa are already defined in the bridge, but you can define new ones
as follows:
(ccl::define-special-objc-word "QuickDraw")
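The splitting rule can be modeled in a few lines of portable Common Lisp. This is a simplified sketch, not the bridge's translator: *SPECIAL-WORDS*, SPECIAL-WORD-AT, OBJC-NAME-TO-LISP and ADD-SPECIAL-WORD are hypothetical, and the bridge's own rules may differ in details such as digits or adjacent special words.

```lisp
;;; Simplified model of Objective-C -> Lisp name translation (hypothetical
;;; names).  The word list is a small sample of the special words above.
(defvar *special-words* (list "NS" "URL" "PDF" "OpenGL"))

(defun special-word-at (name start)
  "Return the special word that occurs in NAME at START, or NIL."
  (find-if (lambda (word)
             (let ((end (+ start (length word))))
               (and (<= end (length name))
                    (string= word name :start2 start :end2 end))))
           *special-words*))

(defun objc-name-to-lisp (name)
  "Split NAME before each capital letter, keeping special words whole."
  (let ((words '())
        (start 0))
    (loop while (< start (length name))
          do (let* ((special (special-word-at name start))
                    (end (if special
                             (+ start (length special))
                             ;; Otherwise the word runs to the next capital.
                             (or (position-if #'upper-case-p name
                                              :start (1+ start))
                                 (length name)))))
               (push (string-upcase (subseq name start end)) words)
               (setf start end)))
    (format nil "~{~A~^-~}" (nreverse words))))

(defun add-special-word (word)
  "Toy analogue of defining a new special word."
  (pushnew word *special-words* :test #'string=))
```

With an empty word list the sketch produces N-S-WINDOW for "NSWindow", the literal result the text calls wrong; ADD-SPECIAL-WORD plays the role that ccl::define-special-objc-word plays for the bridge.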
Note that message keywords in a SEND such as (SEND V :MOUSE P :IN-RECT R) may
look like the keyword arguments in a Lisp function call, but they really aren't. All keywords
must be present and the order is significant. Neither (:IN-RECT :MOUSE) nor (:MOUSE)
translates to "mouse:inRect:".
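Because the keywords are joined in order into a single selector string, their order matters and none may be omitted. The sketch below models that joining step in portable Common Lisp; LISP-WORD-TO-CAMEL and LISP-MESSAGE-TO-SELECTOR are hypothetical names, and the sketch covers only keyword messages, not argument-less messages such as bounds.

```lisp
;;; Simplified model of turning message keywords into a selector string
;;; (hypothetical names, not the bridge's own functions).
(defun lisp-word-to-camel (keyword)
  "Turn :IN-RECT into \"inRect\"."
  (let* ((s (string-downcase (symbol-name keyword)))
         ;; Split at hyphens: "in-rect" -> ("in" "rect").
         (parts (loop for start = 0 then (1+ end)
                      for end = (position #\- s :start start)
                      collect (subseq s start end)
                      while end)))
    ;; Keep the first part, capitalize the rest.
    (format nil "~A~{~@(~A~)~}" (first parts) (rest parts))))

(defun lisp-message-to-selector (keywords)
  "Turn (:MOUSE :IN-RECT) into \"mouse:inRect:\"."
  (format nil "~{~A:~}" (mapcar #'lisp-word-to-camel keywords)))
```

Reordering or dropping keywords yields a different string, such as "inRect:mouse:" or "mouse:", which names a different message.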
Introduction
Building the IDE
Running the IDE
IDE Features
Editor Windows
The Lisp Menu
The Tools Menu
The Inspector Window
IDE Sources
The Application Builder
Running the Application Builder From the Command Line
Introduction
Clozure CL ships with the complete source code for an integrated development
environment written using Cocoa on Mac OS X. This chapter describes how to build and
use that environment, referred to hereafter simply as "the IDE".
The IDE provides a programmable text editor, listener windows, an inspector for Lisp data
structures, and a means of easily building a Cocoa application in Lisp. In addition, its
source code provides an example of a fairly complex Cocoa application written in Lisp.
The current version of the IDE has seen the addition of numerous features and many
bugfixes. Although it's by no means a finished product, we hope it will prove more useful
than previous versions, and we plan additional work on the IDE for future releases.
2. Run ccl from the shell. The easiest way to do this is generally to execute the ccl or
ccl64 command.
For example, assuming that the Clozure CL distribution is installed in "/usr/local/ccl", the
Clozure CL compiles and loads the various subsystems that make up the IDE, then
constructs a Cocoa application bundle named "Clozure CL.app" and saves the Lisp image
into it. Normally Clozure CL creates the application bundle in the root directory of the
Clozure CL distribution.
Running the IDE
After it has been built, you can run the "Clozure CL.app" application normally, by
double-clicking its icon. When launched, the IDE initially displays a single listener window that
you can use to interact with Lisp. You can type Lisp expressions for evaluation at the
prompt in the listener window. You can also use Hemlock editing commands to edit the
text of expressions in the listener window.
IDE Features
Editor Windows
You can open an editor window either by choosing Open from the File menu and then
selecting a text file, or by choosing New from the File menu. You can also evaluate the
expression (ed) in the listener window; in that case Clozure CL creates a new window as if
you had chosen New from the File menu.
Editor windows implement Hemlock editing commands. You can use all the editing and
customization features of Hemlock within any editor window (including listener windows).
The Lisp Menu
The Lisp menu provides several commands for interacting with the running Lisp session,
in addition to the ways you can interact with it by evaluating expressions. You can evaluate
a selected range of text in any editing buffer. You can compile and load the contents of
editor windows (please note that in the current version, Clozure CL compiles and loads the
contents of the file associated with an editor window; that means that if you try to load or
compile a window that has not been saved to a file, the result is an error).
You can interrupt computations, trigger breaks, and select restarts from the Lisp menu.
You can also display a backtrace or open the Inspector window.
At the bottom of the Lisp menu is an item entitled "Check for Updates". If your copy of
Clozure CL came from the Clozure Subversion server (which is the preferred source), and if
your internet connection is working, then you can select this menu item to check for
updates to your copy of Clozure CL.
When you select "Check for Updates", Clozure CL uses the svn program to query the
Clozure Subversion repository and determine whether new updates to Clozure CL are
available. (This means that on Mac OS X versions earlier than 10.5, you must ensure that
the Subversion client software is installed before using the "Check for Updates" feature. See
the wikiHow page on installing Subversion for more information.) If updates are available,
Clozure CL automatically downloads and installs them. After a successful download,
Clozure CL rebuilds itself, and then rebuilds the IDE on the newly-rebuilt Lisp. Once this
process is finished, you should quit the running IDE and start the newly built one (which
will be in the same place that the old one was).
Normally, Clozure CL can install updates and rebuild itself without any problems.
Occasionally, an unforeseen problem (such as a network outage, or a hardware failure)
might interrupt the self-rebuilding process, and leave your copy of Clozure CL unusable. If
you are expecting to update your copy of Clozure CL frequently, it might be prudent to keep
a backup copy of your working environment ready in case of such situations. You can also
always obtain a full, fresh copy of Clozure CL from Clozure's repository.
The Tools Menu
The Tools menu provides access to the Apropos and Processes windows. The Apropos
window searches the running Lisp image for symbols that match any text you enter. You
can use the Apropos window to quickly find function names and other useful symbols. The
Processes window lists all threads running in the current Lisp session. If you double-click a
process entry, Clozure CL opens an Inspector window on that process.
The Inspector Window
The Inspector window displays information about a Lisp value. The information displayed
varies from the very simple, in the case of a simple data value such as a character, to the
complex, in the case of structured data such as lists or CLOS objects. The left-hand column
of the window's display shows the names of the object's attributes; the right-hand column
shows the values associated with those attributes. You can inspect the values in the
right-hand column by double-clicking them.
Inspecting a value in the right-hand column changes the Inspector window to display the
double-clicked object. You can quickly navigate the fields of structured data this way,
inspecting objects and the objects that they refer to. Navigation buttons at the top left of
the window enable you to retrace your steps, backing up to return to previously-viewed
objects, and going forward again to objects you navigated into previously.
You can change the contents of a structured object by evaluating expressions in a listener
window. The refresh button (marked with a curved arrow) updates the display of the
Inspector window, enabling you to quickly see the results of changing a data structure.
IDE Sources
Clozure CL builds the IDE from sources in the "objc-bridge" and "cocoa-ide" directories in
the Clozure CL distribution. The IDE as a whole is a relatively complicated application, and
is probably not the best place to look when you are first trying to understand how to build
Cocoa applications. For that, you might benefit more from the examples in the
"examples/cocoa/" directory. Once you are familiar with those examples, though, and have
some experience building your own application features using Cocoa and the Objective-C
bridge, you might browse through the IDE sources to see how it implements its features.
The search path for Clozure CL's REQUIRE feature includes the "objc-bridge" and
"cocoa-ide" directories. You can load features defined in these directories by using
REQUIRE. For example, if you want to use the Cocoa features of Clozure CL from a terminal
session (or from an Emacs session using SLIME or ILISP), you can evaluate (require
:cocoa).
The Application Builder
One important feature of the IDE currently has no Cocoa user interface: the application
builder. The application builder constructs a Cocoa application bundle that runs a Lisp
image when double-clicked. You can use the application builder to create Cocoa
applications in Lisp. These applications are exactly like Cocoa applications created with
Xcode and Objective-C, except that they are written in Lisp.
To make the application builder available, evaluate the expression
(require :build-application). Clozure CL loads the required subsystems, if necessary.
name
named "MyApplication.app".
type-string
Specifies the type of bundle to create. You should normally never need to change the
default value, which Mac OS X uses to identify application bundles.
creator-string
Specifies the creator code, which uniquely identifies the application under Mac OS X.
The default creator code is that of Clozure CL. For more information about reserving
and assigning creator codes, see Apple's developer page on the topic.
directory
copy-ide-resources
Whether to copy the resource files from the IDE's application bundle. By default,
BUILD-APPLICATION copies nibfiles and other resources from the IDE to the newly-
created application bundle. This option is often useful when you are developing a new
application, because it enables your built application to have a fully-functional user
interface even before you have finished designing one. By default, the application
uses the application menu and other UI elements of the IDE until you specify
otherwise. Once your application's UI is fully implemented, you may choose to pass
NIL for the value of this parameter, in which case the IDE resources are not copied
into your application bundle.
info-plist
A user-supplied NSDictionary object that defines the contents of the Info.plist file to
be written to the application bundle. The default value is NIL, which specifies that the
Info.plist from the IDE is to be used if copy-ide-resources is true, and a new
dictionary created with default values is to be used otherwise. You can create a
suitable NSDictionary object using the function make-info-dict. For details on the
parameters to this function, see its definition in "ccl/cocoa-ide/builder-utilities.lisp".
nibfiles
A list of pathnames, where each pathname identifies a nibfile created with Apple's
InterfaceBuilder application. BUILD-APPLICATION copies each nibfile into the
appropriate place in the application bundle, enabling the application to load
user-interface elements from them as needed. It is safest to provide full pathnames to
the nibfiles in the list. Each nibfile must be in ".nib" format, not ".xib" format, in
order that the application can load it.
main-nib-name
The name of the nibfile to load initially when launching. The user-interface defined in
this nibfile becomes the application's main interface. You must supply the name of a
suitable nibfile for this parameter, or the resulting application uses the Clozure CL
user interface.
application-class
The name of the application's CLOS class. The default value is the class provided by
Clozure CL for graphical applications. Supply the name of your application class if
you implement one. If not, Clozure CL uses the default class.
toplevel-function
The toplevel function that runs when the application launches. Normally the default
value, which is Clozure CL's toplevel, works well, but in some cases you may wish to
customize the behavior of the application's toplevel. The best source of information
about writing your own toplevel is the Clozure CL source code, especially the
implementations of TOPLEVEL-FUNCTION in "ccl/level-1/l1-application.lisp".
The work needed to produce a running Cocoa application is minimal. In fact, if you
supply BUILD-APPLICATION with a valid nibfile and pathnames, it builds a running
Cocoa application that displays your UI. It doesn't need you to write any code at all to do
this. Of course, the resulting application doesn't do anything apart from displaying the UI
defined in the nibfile. If you want your UI to accomplish anything, you need to write the
code to handle its events. But the path to a running application with your UI in it is very
short indeed.
Please note that BUILD-APPLICATION is a work in progress. It can easily build a working
Cocoa application, but it still has limitations that may in some cases prove inconvenient.
For example, in the current version it provides no easy way to specify an application
delegate different from the default. If you find the current limitations of
BUILD-APPLICATION too restrictive, and want to try extending it for your use, you can find the
source code for it in "ccl/cocoa-ide/build-application.lisp". You can see the default values
used to populate the "Info.plist" file in "ccl/cocoa-ide/builder-utilities.lisp".
For more information on how to use BUILD-APPLICATION, see the Currency Converter
example in "ccl/examples/cocoa/currency-converter/".
Running the Application Builder From the Command Line
It's possible to automate use of the application builder by running a call to
CCL:BUILD-APPLICATION from the terminal command line. For example, the following command,
entered at a shell prompt in Mac OS X's Terminal window, builds a working copy of the
Clozure CL environment called "Foo.app":
You can use the same method to automate building your Lisp/Cocoa applications. Clozure
CL handles each Lisp expression passed with a -e argument in order, so you can simply
evaluate a sequence of Lisp expressions as in the above example to build your application,
ending with a call to CCL:BUILD-APPLICATION. The call to CCL:BUILD-APPLICATION
can process all the same arguments as if you evaluated it in a Listener window in the
Clozure CL IDE.
Building a substantial Cocoa application (rather than just reproducing the Lisp
environment using defaults, as is done in the above example) is likely to involve a relatively
complicated sequence of loading source files and perhaps evaluating Lisp forms. You might
be best served to place your command line in a shell script that you can more easily edit
and test.
One potentially complicated issue concerns loading all your Lisp source files in the right
order. You might consider using ASDF to define and load a system that includes all the
parts of your application before calling CCL:BUILD-APPLICATION. ASDF is "another
system-definition facility", a sort of make for Lisp, and is included in the Clozure CL
distribution. You can read more about ASDF at the ASDF home page.
Alternatively, you could use the standard features of Common Lisp to load your
application's files in the proper order.
Introduction
Representation of Text
Lines
Marks
Regions
Buffers
The Current Buffer
Buffer Functions
Modelines
Altering and Searching Text
Altering Text
Text Predicates
Kill Ring
Active Regions
Searching and Replacing
The Current Environment
Different Scopes
Shadowing
Hemlock Variables
Variable Names
Variable Functions
Hooks
Commands
Introduction
Key-events
Introduction
Hemlock is the text editor used in Clozure CL. It was originally based on the CMU Hemlock
editor, but has since diverged from it in various ways. We continue to call the editor part of
our IDE Hemlock to give credit where credit is due, but we make no attempt at source or
API compatibility with the original Hemlock.
Like the code, this documentation is based on the original Hemlock documentation,
modified as necessary.
Hemlock follows in the tradition of Emacs-compatible editors, with a rich set of extensible
commands. This document describes the API for implementing new commands. The basic
editor consists of a set of Lisp utility functions for manipulating buffers and the other data
structures of the editor. All user level commands are written in terms of these functions. To
find out how to define commands see Commands.
Representation of Text
In Hemlock, text is represented as a sequence of lines. Newline characters are never stored
but are implicit between lines. The implicit newline character is treated as the single
character #\Newline by the text primitives.
Text is broken into lines when it is first introduced into Hemlock. Text enters Hemlock
from the outside world in two ways: reading a file, or pasting text from the system
clipboard. Hemlock uses heuristics (which should be documented here!) to decide what
newline convention to use to convert the incoming text into its internal representation as a
sequence of lines. Similarly it uses heuristics (which should be documented here!) to
convert the internal representation into a string with embedded newlines in order to write
a file or paste a region into the clipboard.
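A minimal sketch of this representation, assuming plain #\Newline (LF) separators only; Hemlock's actual conversion heuristics are not documented here, so the sketch does not attempt them. STRING-TO-LINES and LINES-TO-STRING are hypothetical helpers, not Hemlock functions.

```lisp
;;; Text as a sequence of lines, with newlines implicit between lines.
;;; Assumes LF-only input; hypothetical helper names.
(defun string-to-lines (string)
  "Split STRING at each #\\Newline.  The newlines are not stored."
  (loop for start = 0 then (1+ end)
        for end = (position #\Newline string :start start)
        collect (subseq string start end)
        while end))

(defun lines-to-string (lines)
  "Join LINES, restoring the implicit newline between adjacent lines."
  (format nil "~{~A~^~%~}" lines))
```

A trailing newline produces a final empty line, and an empty string is one empty line, so converting to lines and back reproduces the original text exactly.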
Lines
Given a line, this function returns as a simple string the characters in the line. This is
setf'able to set the line-string to any string that does not contain newline characters. It is an
error to destructively modify the result of line-string or to destructively modify any string
after the line-string of some line has been set to that string.
Given a line, line-previous returns the previous line or nil if there is no previous line.
Similarly, line-next returns the line following line or nil.
This function returns the buffer which contains this line. A line need not be associated
with any buffer; in that case line-buffer returns nil.
This function returns the number of characters in the line. This excludes the newline
character at the end.
This function returns the character at position index within line. It is an error for index to
be greater than the length of the line or less than zero. If index is equal to the length of the
line, this returns a #\newline character.
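The rules for line length and character access, including the implicit newline at index = length, can be illustrated with a toy structure. TOY-LINE and its functions are hypothetical and only mimic the behavior described above; they are not Hemlock's implementation.

```lisp
;;; Toy model of a Hemlock line (hypothetical names, for illustration).
(defstruct toy-line
  (chars "" :type string)   ; the line's characters, without a newline
  (previous nil)            ; previous toy-line, or NIL
  (next nil)                ; next toy-line, or NIL
  (buffer nil))             ; owning buffer, or NIL if not in one

(defun toy-line-length (line)
  "Number of characters, excluding the implicit newline."
  (length (toy-line-chars line)))

(defun toy-line-character (line index)
  "Character at INDEX; at INDEX = length, the implicit #\\Newline."
  (let ((len (toy-line-length line)))
    (cond ((or (< index 0) (> index len))
           (error "Index ~D out of range for a line of length ~D." index len))
          ((= index len) #\Newline)
          (t (char (toy-line-chars line) index)))))
```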
This function returns the property-list for line. setf, getf, putf and remf can be used to
change properties. This is typically used in conjunction with line-signature to cache
information about the line's contents.
This function returns an object that serves as a signature for a line's contents. It is
guaranteed that any modification of text on the line will result in the signature changing so
that it is not eql to any previous value. The signature may change even when the text
remains unmodified, but this does not happen often.
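The caching idiom that the property list and the signature support can be sketched with a toy structure rather than real Hemlock lines. SIGNED-LINE, SET-LINE-TEXT and CACHED-INDENTATION are hypothetical; a freshly allocated cons serves as the signature, because a new cons is never EQL to any existing object.

```lisp
;;; Caching a computed value keyed by a line's signature (hypothetical names).
(defstruct signed-line
  (text "" :type string)
  (signature (list :signature))   ; fresh cons per line
  (plist '()))

(defun set-line-text (line new-text)
  "Modify LINE's text and give it a new signature."
  (setf (signed-line-text line) new-text
        (signed-line-signature line) (list :signature)))

(defvar *recomputations* 0
  "How many times CACHED-INDENTATION actually did the work.")

(defun cached-indentation (line)
  "Number of leading spaces, cached in LINE's plist with its signature."
  (let ((entry (getf (signed-line-plist line) :indentation)))
    (if (and entry (eql (car entry) (signed-line-signature line)))
        (cdr entry)                     ; signature unchanged: reuse
        (let* ((text (signed-line-text line))
               (value (or (position-if-not (lambda (c) (char= c #\Space)) text)
                          (length text))))
          (incf *recomputations*)
          (setf (getf (signed-line-plist line) :indentation)
                (cons (signed-line-signature line) value))
          value))))
```

If the signature occasionally changes without a text change, as the text allows, the only cost is an extra recomputation; correctness does not depend on it.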
Marks
A mark indicates a specific position within the text represented by a line and a character
position within that line. Although a mark is sometimes loosely referred to as pointing to
some character, it in fact points between characters. If the charpos is zero, the previous
character is the newline character separating the previous line from the mark's line. If the
charpos is equal to the number of characters in the line, the next character is the newline
character separating the current line from the next. If the mark's line has no previous line,
a mark with charpos of zero has no previous character; if the mark's line has no next line, a
mark with charpos equal to the length of the line has no next character.
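These previous/next character rules can be checked against a toy buffer modeled as a vector of line strings, with a mark given by a line index and a charpos. TOY-PREVIOUS-CHARACTER and TOY-NEXT-CHARACTER are hypothetical functions that encode the rules in this paragraph; they are not Hemlock's previous-character and next-character.

```lisp
;;; LINES is a vector of strings; a mark is (LINE-INDEX, CHARPOS).
;;; Hypothetical names, for illustration only.
(defun toy-previous-character (lines line-index charpos)
  (cond ((> charpos 0) (char (aref lines line-index) (1- charpos)))
        ((> line-index 0) #\Newline)   ; newline ending the previous line
        (t nil)))                      ; start of buffer: no previous char

(defun toy-next-character (lines line-index charpos)
  (let ((line (aref lines line-index)))
    (cond ((< charpos (length line)) (char line charpos))
          ((< line-index (1- (length lines))) #\Newline) ; newline to next line
          (t nil))))                   ; end of buffer: no next char
```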
This section discusses the very basic operations involving marks, but a lot of Hemlock
programming is built on altering some text at a mark. For more extended uses of marks see
Altering and Searching Text.
Kinds of Marks
A mark may have one of two lifetimes: temporary or permanent. Permanent marks remain
valid after arbitrary operations on the text; temporary marks do not. Temporary marks are
used because less bookkeeping overhead is involved in their creation and use. If a
temporary mark is used after the text it points to has been modified, the results will be
unpredictable. Permanent marks continue to point between the same two characters
regardless of insertions and deletions made before or after them.
There are two different kinds of permanent marks which differ only in their behavior when
text is inserted at the position of the mark; text is inserted to the left of a left-inserting
mark and to the right of a right-inserting mark.
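As a rough illustration of the two permanent kinds, this sketch treats positions as integer offsets into the text and computes where a mark ends up after an insertion. toy-adjust-mark is hypothetical; Hemlock does this bookkeeping itself for permanent marks.

```lisp
;;; Toy model (hypothetical name): positions are integer offsets into the text.
(defun toy-adjust-mark (mark-pos kind insert-pos insert-length)
  "Return the new position of a mark of KIND (:LEFT-INSERTING or
:RIGHT-INSERTING) after INSERT-LENGTH characters are inserted at INSERT-POS."
  (cond ((< mark-pos insert-pos) mark-pos)                  ; text went after the mark
        ((> mark-pos insert-pos) (+ mark-pos insert-length)) ; text went before it
        ;; Insertion exactly at the mark: the kind decides.
        ((eq kind :left-inserting) (+ mark-pos insert-length)) ; text lands to its left
        ((eq kind :right-inserting) mark-pos)                  ; text lands to its right
        (t (error "Unknown mark kind ~S" kind))))
```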
Mark Functions
This function returns the character position in the line of the character after mark, i.e. the
number of characters before the mark in the mark's line.
This function returns the character position in the buffer of the character after the mark,
i.e. the number of characters before the mark in the mark's buffer.
This function returns the character immediately before (after) the position of the mark, or
nil if there is no previous (next) character. These characters may be set with setf when they
exist; the setf methods for these forms signal errors when there is no previous or next
character.
Making Marks
This function returns a mark object that points to the charpos'th character of the line. Kind
is the kind of mark to create, one of :temporary, :left-inserting, or
:right-inserting. The default is :temporary.
This function returns a new mark pointing to the same position and of the same kind, or of
kind kind if it is supplied.
This function deletes mark. Delete any permanent marks when you are finished using them.
This macro binds to each variable mark a mark of kind kind, which defaults to
:temporary, pointing to the same position as markpos. On exit from the scope the
mark is deleted. The value of the last form is the value returned.
Moving Marks
These functions destructively modify marks to point to new positions. Other sections of
this document describe mark moving routines specific to higher level text forms than
characters and lines, such as words, sentences, paragraphs, Lisp forms, etc.
This function changes the mark to point to the given character position on the line line.
Line defaults to mark's line.
This function changes the mark to point to the given character position in the buffer.
This function moves mark to the same position as the mark new-position and returns it.
This function changes mark to point to the beginning or the end of line and returns it. Line
defaults to mark's line.
These functions change mark to point to the beginning or end of buffer, which defaults to
the buffer mark currently points into. If buffer is unsupplied, then it is an error for mark to
be disassociated from any buffer.
These functions change mark to point one character before or after the current position. If
there is no character before/after the current position, then they return nil and leave mark
unmodified.
This function changes mark to point n characters after (n before if n is negative) the
current position. If there are fewer than n characters after (before) the mark, then this
returns nil and mark is unmodified.
This function changes mark to point n lines after (n before if n is negative) the current
position. The character position of the resulting mark is (min (line-length resulting-line)
(mark-charpos mark)) if charpos is unspecified, or (min (line-length resulting-line)
charpos) if it is. As with character-offset, if there are not n lines then nil is returned and
mark is not modified.
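The clamping rule for moving by lines can be sketched with plain data. Here a mark is a (line-index . charpos) cons and the text is a vector of line strings, both assumptions; unlike the real function, the hypothetical toy-line-offset returns a fresh cons instead of destructively modifying the mark.

```lisp
;;; Toy model (hypothetical name): LINES is a vector of line strings and a
;;; mark is a (line-index . charpos) cons.  Returns a new cons, or NIL.
(defun toy-line-offset (lines mark n &optional charpos)
  "Return a position N lines away from MARK, or NIL if there are not N
lines in that direction."
  (let ((target (+ (car mark) n)))
    (when (< -1 target (length lines))
      (let ((len (length (aref lines target))))
        ;; Clamp to the target line's length, as described above.
        (cons target (min len (or charpos (cdr mark))))))))
```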
Regions
A region is simply a pair of marks: a starting mark and an ending mark. The text in a
region consists of the characters following the starting mark and preceding the ending
mark (keep in mind that a mark points between characters on a line, not at them). By
modifying the starting or ending mark in a region it is possible to produce regions with a
start and end which are out of order or even in different buffers. The use of such regions is
undefined and may result in arbitrarily bad behavior.
Region Functions
This function returns a region constructed from the marks start and end. It is an error for
the marks to point to non-contiguous lines or for start to come after end.
make-empty-region [Function]
This function returns a region with start and end marks pointing to the start of one empty
line. The start mark is a :right-inserting mark, and the end is a :left-inserting
mark.
This function returns a region containing a copy of the text in the specified region. The
resulting region is completely disjoint from region with respect to data references
(marks, lines, text, etc.).
These functions coerce regions to Lisp strings and vice versa. Within the string, lines are
delimited by newline characters.
This function returns a region containing all the characters on line. The first mark is
:right-inserting and the last is :left-inserting.
This function returns as multiple-values the starting and ending marks of region.
This function sets the start and end of region to start and end. It is an error for start to be
after or in a different buffer from end.
This function returns the number of lines in the region, first and last lines inclusive. A
newline is associated with the line it follows, thus a region containing some number of
non-newline characters followed by one newline is one line, but if a newline were added at
the beginning, it would be two lines.
This function returns the number of characters in a given region. This counts line breaks as
one character.
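Stated in terms of the string that region-to-string would return, the counting rules above amount to the following sketch. The toy- functions are hypothetical, and treating an empty region as one line is an assumption the text does not settle.

```lisp
;;; Toy model (hypothetical names) working on a region's text as a string.
(defun toy-count-lines (region-text)
  "Count lines, attaching each newline to the line it ends."
  (let ((newlines (count #\Newline region-text)))
    (if (and (plusp (length region-text))
             (char= (char region-text (1- (length region-text))) #\Newline))
        newlines          ; the final newline closes the last line
        (1+ newlines))))  ; trailing text (or an empty region, by assumption) is one more line

(defun toy-count-characters (region-text)
  "Each line break is a single #\\Newline, so this is just the length."
  (length region-text))
```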
Buffers
1. A name.
2. A piece of text.
6. Some variables.
8. A collection of Modes.
Because of the way Hemlock is currently integrated with Cocoa, all modifications to buffer
contents must take place in the GUI thread. Hemlock commands always run in the GUI
thread, so most of the time you do not need to worry about it. If you are running code in
another thread that needs to modify a buffer, you should perform that action using
gui::execute-in-gui or gui::queue-for-gui.
There are no intrinsic limitations on examining buffers from any thread; however,
Hemlock currently does no locking, so you risk seeing the buffer in an inconsistent state if
you look at it outside the GUI thread.
Hemlock has the concept of the "current buffer". The current buffer is defined during
Hemlock commands as the buffer of the hemlock view that received the key events that
invoked the command. Many hemlock functions operate on the current buffer rather than
taking an explicit buffer argument. In effect, the current buffer is an implicit argument to
many text manipulation functions.
current-buffer [Function]
Returns the current buffer, which, during command execution, is the buffer that is the
target of the command.
current-point [Function]
This function returns the buffer-point of the current buffer. This is such a common idiom
in commands that it is defined despite its trivial implementation.
current-point-collapsing-selection [Function]
This function returns the buffer-point of the current buffer, after first deactivating any
active region.
current-point-extending-selection [Function]
This function returns the buffer-point of the current buffer, after first making sure there is
an active region: if the region is already active, it keeps it active; otherwise it establishes a
new (empty) region at point.
current-point-for-insertion [Function]
This function checks to see if the current buffer can be modified at its current point, and
errors if not. Otherwise, it deletes the current selection if any, and returns the current
point.
current-point-for-deletion [Function]
This function checks to see if the current buffer can be modified at its current point and
errors if not. Otherwise, if there is a selection in the current buffer, it deletes it and returns
NIL. If there is no selection, it returns the current point.
current-point-unless-selection [Function]
This function checks to see if the current buffer can be modified at its current point and
errors if not. Otherwise, if there is a selection in the current buffer, it returns NIL. If there is
no selection, it returns the current point.
current-mark [Function]
This function returns the top of the current buffer's mark stack. There always is at least one
mark at the beginning of the buffer's region, and all marks returned are right-inserting.
pop-buffer-mark [Function]
This function pops the current buffer's mark stack, returning the mark. If the stack
becomes empty, this pushes a new mark on the stack pointing to the buffer's start. This
always deactivates the current region (see Active Regions).
This function pushes mark onto the current buffer's mark stack, ensuring that the mark is
right-inserting. If mark does not point into the current buffer, this signals an error.
Optionally, the current region is made active, but this never deactivates the current region
(see Active Regions). Mark is returned.
This function pushes a new mark onto the mark stack, at the position of mark. It's
equivalent to calling push-buffer-mark on (copy-mark mark).
all-buffers [Function]
This function returns a list of all the buffer objects made with make-buffer.
*buffer-names* [Variable]
This variable holds a string-table mapping the name of a buffer to the corresponding buffer
object.
Buffer Functions
make-buffer creates and returns a buffer with the given name. If a buffer named name
already exists, nil is returned. Modes is a list of modes which should be in effect in the
buffer, major mode first, followed by any minor modes. If this is omitted then the buffer is
created with the list of modes contained in "Default Modes". Modeline-fields is a list of
modeline-field objects (see the Modelines section) which may be nil. delete-hook is a list of
delete hooks specific to this buffer, and delete-buffer invokes these along with Delete
Buffer Hook.
Buffers created with make-buffer are entered into the list (all-buffers), and their names are
inserted into the string-table *buffer-names*. When a buffer is created the hook Make
Buffer Hook is invoked with the new buffer.
buffer-name returns the name, which is a string, of the given buffer. The corresponding setf
method invokes Buffer Name Hook with buffer and the new name and then sets the
buffer's name. When the user supplies a name for which a buffer already exists, the setf
method signals an error.
Returns the buffer's region. Note this is the region that contains all the text in a buffer, as
opposed to the hemlock-interface:current-region.
buffer-pathname returns the pathname of the file associated with the given buffer, or nil if
it has no associated file. This is the truename of the file as of the most recent time it was
read or written. There is a setf form to change the pathname. When the pathname is
changed the hook Buffer Pathname Hook is invoked with the buffer and new value.
Returns the write date for the file associated with the buffer in universal time format.
When the buffer-pathname is set, use setf to set this to the corresponding write date,
or to nil if the date is unknown or there is no file.
Returns the mark which is the current location within buffer. To move the point, use
hemlock-interface:move-mark or hemlock-interface:move-to-position.
This function returns the top of buffer's mark stack. There always is at least one mark at
the beginning of buffer's region, and all marks returned are right-inserting.
These functions return the start and end marks, respectively, of buffer's region.
This function returns t if you can modify the buffer, nil if you cannot. If a buffer is not
writable, then any attempt to alter text in the buffer results in an error. There is a setf
method to change this value. The setf method invokes the functions in Buffer Writable
Hook on the buffer and new value before storing the new value.
buffer-modified returns t if the buffer has been modified, nil if it hasn't. This attribute is set
whenever a text-altering operation is performed on a buffer. There is a setf method to
change this value. The setf method invokes the functions in Buffer Modified Hook with the
buffer whenever the value of the modified flag changes.
This macro executes forms with buffer's writable status set. After forms execute, this resets
the buffer's writable and modified status.
This function returns an arbitrary number which reflects the buffer's current signature.
The result is eql to a previous result if and only if the buffer has not been modified between
the calls.
This function returns a string-table containing the names of the buffer's local variables.
This function returns the list of the names of the modes active in buffer. The major mode is
first, followed by any minor modes. See the Modes chapter.
This function returns the list of buffer specific functions delete-buffer invokes when
deleting a buffer. This is setf-able.
Modelines
A Buffer may specify a modeline, a line of text which is displayed across the bottom of a
view to indicate status information. Modelines are described by a list of modeline-field
objects which have individual update functions and are optionally fixed-width. These have
an eql name for convenience in referencing and updating, but the name must be unique for
all created modeline-field objects. All modeline-field functions must take a buffer as an
argument and return a string. When displaying a modeline-field with a specified width, the
result of the update function is either truncated or padded on the right to meet the
constraint.
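The truncate-or-pad rule for fixed-width fields can be sketched as a small string function. toy-fit-modeline-text is hypothetical and only models the behavior described above; Hemlock performs this fitting itself when it displays a field.

```lisp
;;; Toy model (hypothetical name) of fitting a field's text to its width.
(defun toy-fit-modeline-text (text width)
  "Truncate TEXT to WIDTH or pad it with spaces on the right.  WIDTH NIL
means a variable-width field, so TEXT is returned unchanged."
  (cond ((null width) text)
        ((> (length text) width) (subseq text 0 width))
        (t (concatenate 'string text
                        (make-string (- width (length text))
                                     :initial-element #\Space)))))
```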
Whenever one of the following changes occurs, all of a buffer's modeline fields are updated:
A buffer is renamed.
The policy is that whenever one of these changes occurs, it is guaranteed that the modeline
will be updated before the next trip through redisplay. Furthermore, since the system
cannot know what modeline-field objects the user has added whose update functions rely
on these values, or how he has changed Default Modeline Fields, we must update all the
fields.
The user should note that modelines can be updated at any time, so update functions
should be careful to avoid needless delays (for example, waiting for a local area network to
determine information).
This function returns a modeline-field object with name, width, and function. Width
defaults to nil meaning that the field is variable width; otherwise, the programmer must
supply this as a positive integer. Function must take a buffer as an argument and return a
string. If name already names a modeline-field object, then this signals an error.
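A toy registry can illustrate the uniqueness and width rules described above. The structure, variable, and function names are hypothetical, and the keyword-argument interface is an assumption made for the sketch.

```lisp
;;; Toy registry (hypothetical names) modelling the rules described above:
;;; field names must be unique, and WIDTH is NIL or a positive integer.
(defstruct toy-modeline-field name width function)

(defvar *toy-modeline-fields* (make-hash-table :test #'eql))

(defun toy-make-modeline-field (&key name width function)
  (when (gethash name *toy-modeline-fields*)
    (error "A modeline field named ~S already exists." name))
  (unless (or (null width) (and (integerp width) (plusp width)))
    (error "WIDTH must be NIL or a positive integer, not ~S." width))
  (setf (gethash name *toy-modeline-fields*)
        (make-toy-modeline-field :name name :width width
                                 :function function)))
```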
This function returns the name field of a modeline-field object. If this is set with setf, and
the new name already names a modeline-field, then the setf method signals an error.
This returns the modeline-field object named name. If none exists, this returns nil.
Returns the function called when updating the modeline-field. When this is set with setf,
the setf method updates modeline-field for all views on all buffers that contain the given
field, so the next trip through redisplay will reflect the change. All modeline-field functions
must take a buffer as an argument and return a string.
Returns a copy of the list of buffer's modeline-field objects. This list can be destructively
modified without affecting display of buffer's modeline, but modifying any particular field's
components (for example, width or function) causes the changes to be reflected the next
trip through redisplay in every modeline display that uses the modified modeline-field.
When this is set with setf, the setf method updates all modeline-fields on all views
on the buffer, so the next trip through redisplay will reflect the change.
Arranges for the modeline display to be updated with the latest values at the end of the
current command.
A note on marks and text alteration: :temporary marks are invalid after any change has
been made to the buffer the mark points to; it is an error to use a temporary mark after
such a change has been made.
If text is deleted which has permanent marks pointing into it then they are left pointing to
the position where the text was.
Altering Text
Like insert-region, inserts the region at the mark's position, destroying the source
region. This must be used with caution, since if anyone else can refer to the source region
bad things will happen. In particular, one should make sure the region is not linked into
any existing buffer. If region is empty, and mark is in some buffer, then Hemlock leaves
buffer-modified of mark's buffer unaffected.
This deletes n characters after the mark (or -n before if n is negative). If n characters after
(or -n before) the mark do not exist, then this returns nil; otherwise, it returns t. If n is
zero, and mark is in some buffer, then Hemlock leaves buffer-modified of mark's buffer
unaffected.
This deletes region. This is faster than delete-and-save-region (below) because no lines are
copied. If region is empty and contained in some buffer's buffer-region, then Hemlock
leaves buffer-modified of the buffer unaffected.
This deletes region and returns a region containing the original region's text. If region is
empty and contained in some buffer's buffer-region, then Hemlock leaves buffer-modified
of the buffer unaffected. In this case, this returns a distinct empty region.
Destructively modifies region by replacing the text of each line with the result of the
application of function to a string containing that text. Function must obey the following
restrictions:
3. The return value may not be destructively modified after it is returned from function.
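The per-line filtering can be modelled on a list of strings. This sketch copies each line before and after calling the function, so it tolerates filter functions that would break the restrictions above; a caller of the real function cannot rely on such copying, which is why function must obey them. toy-filter-lines is hypothetical.

```lisp
;;; Toy model (hypothetical name): LINES is a list of line strings.
(defun toy-filter-lines (function lines)
  "Return new lines, each the result of calling FUNCTION on a fresh copy
of the corresponding line's text."
  (mapcar (lambda (line)
            ;; Copy on the way in so FUNCTION cannot alter the original text,
            ;; and on the way out so later changes to FUNCTION's result
            ;; cannot reach the stored line.
            (copy-seq (funcall function (copy-seq line))))
          lines))
```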
Text Predicates
Returns t if the mark points before the first character in a line, nil otherwise.
Returns t if the mark points after the last character in a line and before the newline, nil
otherwise.
empty-line-p mark [Function]
Returns t if the line which mark points to contains no characters.
Returns t if line contains only characters with a Whitespace attribute of 1. See the
Character Attributes chapter for discussion of character attributes.
These functions test if all the characters preceding or following mark on the line it is on
have a Whitespace attribute of 1.
Returns t if mark1 and mark2 point to the same line, or nil otherwise.
These predicates test the relative ordering of two marks in a piece of text, that is a mark is
mark> another if it points to a position after it. An error is signalled if the marks do not
point into the same buffer, except that for such marks mark= is always false and mark/= is
always true.
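For marks within one buffer, the ordering compares lines first and charpos second. This sketch assumes a position is a (line-number . charpos) cons with line numbers increasing toward the end of the buffer; the toy- predicates are hypothetical stand-ins for mark<, mark=, and mark>.

```lisp
;;; Toy model (hypothetical names): a position is (line-number . charpos).
(defun toy-mark< (a b)
  "True when A is strictly before B: an earlier line, or the same line
with a smaller charpos."
  (or (< (car a) (car b))
      (and (= (car a) (car b)) (< (cdr a) (cdr b)))))

(defun toy-mark= (a b)
  (and (= (car a) (car b)) (= (cdr a) (cdr b))))

(defun toy-mark> (a b) (toy-mark< b a))
```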
These predicates test the ordering of line1 and line2. An error is signalled if the lines are
not in the same buffer.
This function returns t if line1 and line2 are in the same buffer, nil otherwise.
first-line-p returns t if there is no line before the line mark is on, and nil otherwise.
Last-line-p similarly tests whether there is no line after mark.
Kill Ring
There is a global ring of regions deleted from buffers. Some commands save affected
regions on the kill ring before performing modifications. You should consider making the
command undoable instead, but saving to the kill ring is a simple, if less satisfactory, way
to let the user recover.
This kills region, saving it in the kill ring. Current-type is either :kill-forward or
:kill-backward. When the hemlock-interface:last-command-type is one of these, this
adds region to the end (for :kill-forward) or the beginning (for :kill-backward) of the top
of the kill ring. The result of calling this is undoable using the command Undo (see the
Hemlock User's Manual). This sets last-command-type to current-type, and it interacts
with kill-characters.
kill-characters kills count characters after mark if count is positive, otherwise before mark
if count is negative. When count is greater than or equal to Character Deletion Threshold,
the killed characters are saved on the kill ring. This may be called multiple times
contiguously (that is, without hemlock-interface:last-command-type being set) to
accumulate an effective count for purposes of comparison with the threshold.
This sets last-command-type, and it interacts with kill-region. When this adds a new region
to the kill ring, it sets last-command-type to :kill-forward (if count is positive) or
:kill-backward (if count is negative). When last-command-type is :kill-forward or
:kill-backward, this adds the killed characters to the beginning (if count is negative) or the
end (if count is positive) of the top of the kill ring, and it sets last-command-type as if it
added a new region to the kill ring. When the kill ring is unaffected, this sets
last-command-type to :char-kill-forward or :char-kill-backward depending on whether
count is positive or negative, respectively.
This returns mark if it deletes characters. If there are not count characters in the
appropriate direction, this returns nil.
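The ring-growing rules for kill-characters can be sketched as a pure function over a list of strings with the top entry first. This is a model, not Hemlock's implementation: the savep flag stands in for the Character Deletion Threshold test, the count accumulated across contiguous calls is not modelled, and all names are hypothetical.

```lisp
;;; Toy model (hypothetical names) of how KILL-CHARACTERS treats the kill
;;; ring.  RING is a list of strings, top entry first.  SAVEP stands in for
;;; the Character Deletion Threshold test; the count accumulated across
;;; contiguous calls is not modelled.
(defun toy-kill-characters (ring last-type count text savep)
  "Return the new ring and the new last-command-type as two values."
  (let ((direction (if (plusp count) :kill-forward :kill-backward)))
    (cond ((member last-type '(:kill-forward :kill-backward))
           ;; Grow the top entry: append for positive COUNT, prepend for negative.
           (values (cons (if (plusp count)
                             (concatenate 'string (first ring) text)
                             (concatenate 'string text (first ring)))
                         (rest ring))
                   direction))
          (savep
           ;; Enough characters: save them as a new top entry.
           (values (cons text ring) direction))
          (t
           ;; Kill ring unaffected: only record the character-kill direction.
           (values ring (if (plusp count)
                            :char-kill-forward
                            :char-kill-backward))))))
```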
Active Regions
Every buffer has a mark stack and a mark known as the point, where most text altering
nominally occurs. The text between the top of the mark stack (the current-mark) and the
current buffer's point (the current-point) is what is known as the current-region. Certain
commands signal errors when the user tries to operate on the current-region without its
having been activated. If the user turns off this feature, then the current-region is
effectively always active.
When writing a command that marks a region of text, the programmer should make sure to
activate the region. This typically occurs naturally from the primitives that you use to mark
regions, but sometimes you must explicitly activate the region. These commands should be
written this way, so they do not require the user to separately mark an area and then
activate it. Commands that modify regions do not have to worry about deactivating the
region since modifying a buffer automatically deactivates the region. Commands that insert
text often activate the region ephemerally; that is, the region is active for the immediately
following command, allowing the user to delete the inserted region, fill it, and so on.
Once a marking command makes the region active, it remains active until:
When this variable is non-nil, some primitives signal an editor-error if the region is not
active. This may be set to nil for more traditional Emacs region semantics.
*ephemerally-active-command-types* [Variable]
This is a list of Command Types, and its initial value is the list of :ephemerally-active and
:unkill. When the previous command's type is one of these, the current-region is active for
the currently executing command only, regardless of whether it does something to
deactivate the region. However, the current command may activate the region for future
commands. :ephemerally-active is a default command type that may be used to
ephemerally activate the region, and :unkill is the type used by two commands, Un-kill and
Rotate Kill Ring (what users typically think of as C-y and M-y).
activate-region [Function]
deactivate-region [Function]
region-active-p [Function]
Returns whether the current-region is active, including ephemerally. This ignores Active
Regions Enabled.
check-region-active [Function]
This signals an editor-error when active regions are enabled, and the current-region is not
active.
This returns a region formed with current-mark and current-point, optionally signaling an
editor-error if the current region is not active. Error-if-not-active defaults to t. Each call
returns a distinct region object. Depending on deactivate-region (defaults to t), fetching the
current region deactivates it. Hemlock primitives are free to modify text regardless of
whether the region is active, so a command that checks for this can deactivate the region
whenever it is convenient.
Before using any of these functions to do a character search, look at Character Attributes.
They provide a facility similar to the syntax table in real Emacs. Syntax tables are a
powerful, general, and efficient mechanism for assigning meanings to characters in various
modes.
Kind specifies the kind of search pattern to make, and pattern is an object which specifies what to
search for. The interpretation of pattern depends on the kind of pattern being made.
Currently defined kinds of search pattern are:
:string-insensitive
:string-sensitive
:character
:not-character
:test
Finds a character which satisfies the function pattern. Hemlock makes no guarantee about
how or in what order this function is applied, so it should depend only on its argument
and should have no side effects.
:test-not
:any
:not-any
get-search-pattern interfaces to a default search string and pattern that search and
replace commands can use. These commands then share a default when prompting for
what to search or replace, and save on consing a search pattern each time they execute.
This uses Default Search Kind (see the Hemlock User's Manual) when updating the pattern
object.
*last-search-string* [Variable]
Find the next match of search-pattern starting at mark. If a match is found then mark is
altered to point before the matched text and the number of characters matched is returned.
If no match is found then nil is returned and mark is not modified.
Replace n matches of search-pattern with the string replacement starting at mark. If n is nil
(the default) then replace all matches. A mark pointing before the last replacement done is
returned.
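To illustrate the find and replace contracts, here is a sketch of a :string-insensitive pattern over a single string, with integer positions standing in for marks. Both toy- functions are hypothetical; unlike the real replace function, toy-replace-pattern returns the new text rather than a mark.

```lisp
;;; Toy models (hypothetical names): integer positions stand in for marks.
(defun toy-find-pattern (text pattern start)
  "Return the start of the next case-insensitive match of PATTERN at or
after START, plus the match length, or NIL when there is no match."
  (let ((pos (search pattern text :start2 start :test #'char-equal)))
    (when pos
      (values pos (length pattern)))))

(defun toy-replace-pattern (text pattern replacement start &optional n)
  "Replace up to N matches after START (all of them when N is NIL) and
return the resulting string."
  (assert (plusp (length pattern)) () "PATTERN must not be empty.")
  (with-output-to-string (out)
    (write-string text out :end start)  ; text before START is untouched
    (loop with pos = start
          with done = 0
          for match = (and (or (null n) (< done n))
                           (search pattern text :start2 pos :test #'char-equal))
          while match
          do (write-string text out :start pos :end match)
             (write-string replacement out)
             (setf pos (+ match (length pattern)))
             (incf done)
          finally (write-string text out :start pos))))
```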
Different Scopes
In Hemlock the "current" values of variables, key bindings and character-attributes depend
on the current buffer and the modes active in it. There are three possible scopes for
Hemlock values:
buffer local
The value is present only if the buffer it is local to is the current buffer.
mode local
The value is present only when the mode it is local to is active in the current buffer.
global
The value is always present unless shadowed by a buffer or mode local value.
Shadowing
It is possible that there are different values for the same thing in different scopes. For
example, there might be a global binding for a given variable and also a local binding in the
current buffer. Whenever there is a conflict, shadowing occurs, permitting only one of the
values to be visible in the current environment.
The process of resolving such a conflict can be described as a search down a list of places
where the value might be defined, returning the first value found. The order for the search
is as follows:
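1. Buffer local values in the current buffer.
2. Mode local values in the minor modes of the current buffer, in order from the highest
precedence mode to the lowest precedence mode. The order of minor modes with
equal precedences is undefined.
3. Mode local values in the major mode of the current buffer.
4. Global values.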
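The shadowing search can be sketched as a lookup that stops at the first scope holding a value. This sketch assumes each scope is an alist of string names to values, that buffer local values are searched first and the major mode after the minor modes, and that the minor-mode list is already sorted by precedence; toy-lookup is hypothetical.

```lisp
;;; Toy model (hypothetical name): each scope is an alist of (name . value),
;;; and MINOR-MODES is a list of such alists sorted from highest to lowest
;;; precedence.  Names compare case-insensitively.
(defun toy-lookup (name &key buffer-locals minor-modes major-mode globals)
  "Return the first value found for NAME and the scope it came from."
  (flet ((try (alist) (assoc name alist :test #'string-equal)))
    (let ((hit (try buffer-locals)))
      (when hit (return-from toy-lookup (values (cdr hit) :buffer))))
    (dolist (mode minor-modes)
      (let ((hit (try mode)))
        (when hit (return-from toy-lookup (values (cdr hit) :minor-mode)))))
    (let ((hit (try major-mode)))
      (when hit (return-from toy-lookup (values (cdr hit) :major-mode))))
    (let ((hit (try globals)))
      (when hit (values (cdr hit) :global)))))
```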
Hemlock Variables
Hemlock implements a system of variables separate from normal Lisp variables for the
following reasons:
1. Hemlock has different scoping rules which are useful in an editor. Hemlock variables
can be local to a buffer or a mode.
2. Hemlock variables have hooks, lists of functions called when someone sets the
variable. See variable-value for the arguments Hemlock passes to these hook
functions.
Variable Names
To the user, a variable name is a case-insensitive string. This string is referred to as the
string name of the variable. A string name is conventionally composed of words separated
by spaces.
In Lisp code a variable name is a symbol. The name of this symbol is created by replacing
any spaces in the string name with hyphens. This symbol name is always interned in the
Hemlock package.
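The naming rule can be sketched as below. A package created here stands in for the Hemlock package, toy-string-to-variable is hypothetical rather than Hemlock's own converter, and upcasing the name is an assumption that makes differently cased string names map to the same symbol.

```lisp
;;; Sketch (hypothetical names): TOY-HEMLOCK stands in for the Hemlock
;;; package.  Upcasing is an assumption that makes differently cased
;;; string names map to the same symbol.
(defpackage :toy-hemlock (:use))

(defun toy-string-to-variable (string-name)
  "Replace spaces with hyphens and intern the result in TOY-HEMLOCK."
  (values (intern (string-upcase (substitute #\- #\Space string-name))
                  :toy-hemlock)))
```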
*global-variable-names* [Variable]
This variable holds a string-table of the names of all the global Hemlock variables. The
value of each entry is the symbol name of the variable.
current-variable-tables [Function]
This function returns a list of variable tables currently established, globally, in the current
buffer, and by the modes of the current-buffer. This list is suitable for use with
prompt-for-variable.
Variable Functions
This function defines a Hemlock variable. Functions that take a variable name signal an
error when the variable is undefined.
string-name
documentation
mode, buffer
If buffer is supplied, the variable is local to that buffer. If mode is supplied, it is local
to that mode. If neither is supplied, it is global.
value
This is the initial value for the variable, which defaults to nil.
hooks
This is the initial list of functions to call when someone sets the variable's value.
These functions execute before Hemlock establishes the new value. See variable-value
for the arguments passed to the hook functions.
If a variable with the same name already exists in the same place, then defhvar sets its
hooks and value from hooks and value if the user supplies these keywords.
This function returns the value of a Hemlock variable in some place. The following values
for kind are defined:
:current
Return the value present in the current environment, taking into consideration any
mode or buffer local variables. This is the default.
:global
:mode
:buffer
When set with setf, Hemlock sets the value of the specified variable and invokes the
functions in its hook list with name, kind, where, and the new value.
These functions return the documentation, hooks and string name of a Hemlock variable.
The kind and where arguments are the same as for variable-value. The documentation and
hook list may be set using setf.
This function converts a string into the corresponding variable symbol name. String need not be uppercase.
These macros get and set the current value of the Hemlock variable name. Name is not
evaluated. There is a setf form for value.
This macro is very similar to let in effect; within its scope each of the Hemlock variables var
have the respective values, but after the scope is exited by any means the binding is
removed. This does not cause any hooks to be invoked. The value of the last form is
returned.
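A toy model of this scoped binding uses a hash table in place of Hemlock's variable storage. The macro evaluates all new values before installing any of them, as let does, restores the old values with unwind-protect on any exit, and runs no hooks. *toy-hvars* and toy-hlet are hypothetical.

```lisp
;;; Toy model (hypothetical names): a hash table stands in for Hemlock's
;;; variable storage.  No hooks are run, matching the description above.
(defvar *toy-hvars* (make-hash-table :test #'eq))

(defmacro toy-hlet (bindings &body body)
  "Give each (VAR VALUE) binding effect for the extent of BODY, like LET,
then restore the previous values however BODY is exited."
  (let ((vars (mapcar #'first bindings))
        (temps (mapcar (lambda (binding)
                         (declare (ignore binding))
                         (gensym "VALUE"))
                       bindings))
        (saved (gensym "SAVED")))
    `(let ((,saved (list ,@(mapcar (lambda (var)
                                     `(cons ',var (gethash ',var *toy-hvars*)))
                                   vars)))
           ;; Evaluate every new value before installing any, as LET does.
           ,@(mapcar (lambda (temp binding) `(,temp ,(second binding)))
                     temps bindings))
       (unwind-protect
            (progn
              ,@(mapcar (lambda (var temp)
                          `(setf (gethash ',var *toy-hvars*) ,temp))
                        vars temps)
              ,@body)
         ;; Runs on normal and non-local exits alike.
         (dolist (entry ,saved)
           (setf (gethash (car entry) *toy-hvars*) (cdr entry)))))))
```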
Returns t if name is defined as a Hemlock variable in the place specified by kind and where,
or nil otherwise.
delete-variable makes the Hemlock variable name no longer defined in the specified place.
Kind and where have the same meanings as they do for variable-value, except that :current
is not available, and the default for kind is :global.
An error will be signaled if no such variable exists. The hook Delete Variable Hook is
invoked with the same arguments before the variable is deleted.
Hooks
Hemlock actions such as setting variables, changing buffers, changing windows, turning
modes on and off, etc., often have hooks associated with them. A hook is a list of functions
called before the system performs the action. The manual describes the object-specific
hooks with the rest of the operations defined on these objects.
Often hooks are stored in Hemlock variables, Delete Buffer Hook and Set Window Hook
for example. This leads to a minor point of confusion because these variables have hooks
that the system executes when someone changes their values. These hook functions
Hemlock invokes when someone sets a variable are an example of a hook stored in an
object instead of a Hemlock variable. These are all hooks for editor activity, but Hemlock
keeps them in different kinds of locations. This is why some of the routines in this section
have a special interpretation of the hook place argument.
These macros add or remove a hook function in some place. If hook-fun already exists in
place, adding it again has no effect. If place is a symbol, then it is a Hemlock variable; otherwise, it
is a generalized variable or storage location. Here are two examples:
This macro calls all the functions in place. If place is a symbol, then it is a Hemlock
variable; otherwise, it is a generalized variable.
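For the generalized-variable case, the three hook operations can be sketched with standard macros. These toy- macros are hypothetical and do not handle the case where place is a symbol naming a Hemlock variable.

```lisp
;;; Toy versions (hypothetical names) for a generalized place only; the
;;; case where PLACE is a symbol naming a Hemlock variable is not modelled.
(defmacro toy-add-hook (place hook-fun)
  "Add HOOK-FUN to the list stored in PLACE unless it is already there."
  `(pushnew ,hook-fun ,place))

(defmacro toy-remove-hook (place hook-fun)
  "Remove HOOK-FUN from PLACE.  PLACE's subforms are evaluated twice,
which is harmless for simple places."
  `(setf ,place (remove ,hook-fun ,place)))

(defmacro toy-invoke-hook (place &rest args)
  "Call every function in PLACE with ARGS, in list order."
  (let ((fn (gensym "FN")))
    `(dolist (,fn ,place)
       (funcall ,fn ,@args))))
```

Storing function names (symbols) rather than function objects keeps the duplicate check in toy-add-hook predictable, since pushnew compares with eql.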
Commands
Introduction
The way that the user tells Hemlock to do something is by invoking a command.
Commands have three attributes:
name
A command's name provides a way to refer to it. Command names are usually
capitalized words separated by spaces, such as Forward Word.
documentation
function
Defining Commands
*command-names* [Variable]
Defines a new command named name, with command documentation documentation and
function function. If :transparent-p is true, the command becomes transparent. The
command is entered in the string-table hemlock-interface:*command-names*, with
the command object as its value. Normally command implementors will use the
defcommand macro, but this permits access to the command definition mechanism at a
lower level, which is occasionally useful.
Returns the documentation, function, or name for command. These may be set with setf.
Command Documentation
Command documentation may also be a function of one argument. The function is called
with either :short or :full, indicating that the function should return a short documentation
string or do something to document the command fully.
The command interpreter is the functionality invoked by the event handler to process
key-events from the keyboard and dispatch to different commands on the basis of what the
user types. When the command interpreter executes a command, we say it invokes the
command. The command interpreter also provides facilities for communication between
successively invoked commands, such as a last command type register. It also takes care of
resetting communication mechanisms, clearing the echo area, displaying partial keys typed
slowly by the user, and so on.
The command interpreter invokes the function in this variable whenever someone aborts a
command (for example, if someone called editor-error).
Editor Input
The canonical representation of editor input is a key-event structure. Users can bind
commands to keys, which are non-empty sequences of key-events. A key-event consists of
an identifying token known as a keysym and a field of bits representing modifiers. Users
define keysym names by supplying names that reflect the legends on their keyboard's keys.
Users define modifier names similarly, but the system chooses the bit and mask for
recognizing the modifier. You can use keysym and modifier names to textually specify
key-events and Hemlock keys in a #k syntax. The following are some examples:
#k"C-u"
#k"Control-u"
#k"c-m-z"
#k"control-x meta-d"
#k"a"
#k"A"
#k"Linefeed"
This is convenient for use within code and in init files containing bind-key calls.
The #k syntax is delimited by double quotes. Within the double quotes, spaces separate
multiple key-events. A single key-event optionally starts with modifier names terminated
by hyphens. Modifier names are alphabetic sequences of characters which the system uses
case-insensitively. Following modifiers is a keysym name, which is case-insensitive if it
consists of multiple characters, but if the name consists of only a single character, then it is
case-sensitive.
You can escape special characters (hyphen, double quote, open angle bracket, close angle
bracket, and space) with a backslash, and you can specify a backslash by typing two
backslashes in a row. You can use angle brackets to enclose a keysym name with many special
characters in it. Between angle brackets appearing in a keysym name position, there are
only two special characters: the closing angle bracket and backslash.
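As a rough illustration of these rules, here is a hypothetical parser, toy-parse-k, for the text between the double quotes. It assumes a tiny modifier table containing only Control (C) and Meta (M), represents each key-event as a list of sorted modifier keywords and a keysym name, handles backslash escapes, and does not implement the angle-bracket form. It is not Hemlock's #k reader macro.

```lisp
;;; Hypothetical #k-style parser; not Hemlock's reader macro.
;;; Angle-bracket keysym names are not supported.

(defparameter *toy-modifier-names*
  '(("c" . :control) ("control" . :control)
    ("m" . :meta) ("meta" . :meta))
  "Assumed modifier table; a real Hemlock defines its own modifier names.")

(defun toy-modifier (name)
  "Map a modifier name, case-insensitively, to a keyword."
  (let ((entry (assoc name *toy-modifier-names* :test #'string-equal)))
    (if entry
        (cdr entry)
        (error "Unknown modifier name ~S" name))))

(defun toy-key-event (parts)
  "PARTS are the hyphen-separated pieces of one key-event: modifier names
followed by a keysym name.  Return (MODIFIERS KEYSYM-NAME)."
  (let ((keysym (car (last parts))))
    (list (sort (mapcar #'toy-modifier (butlast parts)) #'string<)
          ;; Multi-character keysym names are case-insensitive, so fold them;
          ;; a single-character name keeps its case.
          (if (= (length keysym) 1) keysym (string-downcase keysym)))))

(defun toy-parse-k (string)
  "Parse STRING into a list of key-events.  Spaces separate key-events,
hyphens end modifier names, and a backslash makes the next character literal."
  (let ((events '())
        (parts '())
        (buffer (make-string-output-stream))
        (escaped nil))
    (flet ((end-part ()
             (push (get-output-stream-string buffer) parts))
           (end-event ()
             (push (toy-key-event (reverse parts)) events)
             (setf parts '())))
      (loop for ch across string
            do (cond (escaped (write-char ch buffer) (setf escaped nil))
                     ((char= ch #\\) (setf escaped t))
                     ((char= ch #\-) (end-part))
                     ((char= ch #\Space) (end-part) (end-event))
                     (t (write-char ch buffer))))
      (end-part)
      (end-event)
      (nreverse events))))
```

With this model, (toy-parse-k "C-u") and (toy-parse-k "Control-u") produce the same key-event, while "a" and "A" stay distinct.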
The command interpreter determines which command to invoke on the basis of key
bindings. A key binding is an association between a command and a sequence of
key-events. A sequence of key-events is called a key and is represented by a single
key-event or a sequence (list or vector) of key-events.
Since key bindings may be local to a mode or buffer, the current environment determines
the set of key bindings in effect at any given time. When the command interpreter tries to
find the binding for a key, it first checks if there is a local binding in the current buffer,
then if there is a binding in each of the minor modes and the major mode for the current
buffer, and finally checks to see if there is a global binding. If no binding is found, then the
command interpreter beeps or flashes the screen to indicate this.
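The search order can be modeled with plain association lists. In this sketch, which is not Hemlock's data structure, keys are lists of strings standing in for key-events and the command names are made up; the first table that contains the key wins, and NIL stands for an unbound key (where the command interpreter would beep).

```lisp
;;; Hypothetical model of the binding search order; not Hemlock's code.
;;; Each table is an alist from a key (a list of key-event names) to a
;;; command name.

(defun toy-find-binding (key buffer-table minor-mode-tables
                         major-mode-table global-table)
  "Return the command bound to KEY, or NIL if it is unbound.
MINOR-MODE-TABLES holds one table per active minor mode, in search order."
  (loop for table in (append (list buffer-table)
                             minor-mode-tables
                             (list major-mode-table global-table))
        for entry = (assoc key table :test #'equal)
        when entry
          return (cdr entry)))
```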
This function associates command name and key in some environment. Key is either a
key-event or a sequence of key-events. There are three possible values of kind:
:global
Make a global key binding.
:mode
Make a mode specific key binding in the mode whose name is where.
:buffer
Make a binding which is local to the buffer where.
This processes key for key translations before establishing the binding.
If the key is some prefix of a key binding which already exists in the specified place, then
the new one will override the old one, effectively deleting it.
This function returns a list of the places where command is bound. A place is specified as a
list of the key (always a vector), the kind of binding, and where (either the mode or buffer
to which the binding is local, or nil if it is global).
This function removes the binding of key in some place. Key is either a key-event or a
sequence of key-events. Kind is the kind of binding to delete, one of :global (the default),
:mode, or :buffer. If kind is :mode, where is the mode name, and if kind is :buffer, then
where is the buffer.
This processes key for key translations before deleting the binding.
This function returns the command bound to key, returning nil if it is unbound. Key is
either a key-event or a sequence of key-events. If key is an initial subsequence of some
keys, then this returns the keyword :prefix. There are four cases of kind:
:current
Return the current binding of key using the current buffer's search list. If there are
any transparent key bindings for key, then they are returned in a list as a second
value.
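:global
:mode
:buffer
Return the global binding of key, or its binding in the mode or buffer specified by where,
respectively.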
This processes key for key translations before looking for any binding.
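A minimal sketch of the lookup rule for a single place, including the :prefix case, again using lists of strings as stand-ins for key-events. toy-get-command is a hypothetical name; it does not implement the :current search list or key translation.

```lisp
;;; Hypothetical model of looking a key up in one binding table; not
;;; Hemlock's implementation.  BINDINGS is an alist from keys (lists of
;;; key-event names) to command names.

(defun toy-get-command (key bindings)
  "Return the command bound to KEY, :PREFIX if KEY is a proper initial
subsequence of some bound key, or NIL if KEY is unbound."
  (let ((entry (assoc key bindings :test #'equal)))
    (cond (entry (cdr entry))
          ((some (lambda (binding)
                   (let ((bound (car binding)))
                     (and (< (length key) (length bound))
                          (equal key (subseq bound 0 (length key))))))
                 bindings)
           :prefix)
          (t nil))))
```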
This function maps over the key bindings in some place. For each binding, this passes
function the key and the command bound to it. Kind and where are the same as in
bind-key. The key is not guaranteed to remain valid after a given iteration.
Key Translation
Key translation is a process that the command interpreter applies to keys before doing
anything else. There are two kinds of key translations: substitution and bit-prefix. In either
case, the command interpreter translates a key when a specified key-event sequence
appears in a key.
In a substitution translation, the system replaces the matched subsequence with another
key-event sequence. Key translation is not recursively applied to the substituted
key-events.
In a bit-prefix translation, the system removes the matched subsequence and effectively
sets the specified bits in the next key-event in the key.
This form is setf-able and allows users to register key translations that the command
interpreter will use as users type key-events.
This function returns the key translation for key, returning nil if there is none. Key is either
a key-event or a sequence of key-events. If key is a prefix of a translation, then this returns
:prefix.
A key translation is either a key or modifier specification. The bits translations have a list
form: (:bits {bit-name}*).
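Both kinds of translation can be sketched as one left-to-right pass over a complete key. This is a simplified, hypothetical model: Hemlock translates incrementally as key-events are typed, while toy-translate-key takes a whole key at once. Key-events are lists of sorted modifier keywords and a keysym name, each translation entry is a list of a non-empty pattern and either a replacement key-event list or a (:bits modifier...) list, and the Escape and Linefeed translations in the test are examples, not Hemlock defaults.

```lisp
;;; Hypothetical batch model of key translation; Hemlock itself translates
;;; incrementally as key-events are typed.

(defun toy-ke (keysym &rest modifiers)
  "Make a toy key-event: a list of sorted, unique MODIFIERS and KEYSYM."
  (list (sort (remove-duplicates (copy-list modifiers)) #'string<) keysym))

(defun toy-match-translation (key translations)
  "First entry of TRANSLATIONS whose (non-empty) pattern begins KEY."
  (find-if (lambda (entry)
             (let ((pattern (first entry)))
               (and (<= (length pattern) (length key))
                    (equal pattern (subseq key 0 (length pattern))))))
           translations))

(defun toy-translate-key (key translations)
  "Apply substitution and bit-prefix TRANSLATIONS to KEY, a list of
key-events.  Each entry is (PATTERN TRANSLATION), where TRANSLATION is a
replacement list of key-events or a list (:BITS modifier...)."
  (let ((result '())
        (pending-bits '()))
    (loop while key
          do (let ((entry (toy-match-translation key translations)))
               (if (null entry)
                   ;; No translation starts here: copy one key-event,
                   ;; adding any bits left by a bit-prefix translation.
                   (destructuring-bind (modifiers keysym) (pop key)
                     (push (apply #'toy-ke keysym (append pending-bits modifiers))
                           result)
                     (setf pending-bits '()))
                   (let ((translation (second entry)))
                     (setf key (nthcdr (length (first entry)) key))
                     (if (eq (first translation) :bits)
                         ;; Bit-prefix: drop the match, remember its bits.
                         (setf pending-bits (append (rest translation) pending-bits))
                         ;; Substitution: emit the replacement as is; it is
                         ;; not translated again.
                         (dolist (event translation)
                           (push event result)))))))
    (nreverse result)))
```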
Key bindings local to a mode may be transparent. A transparent key binding does not
shadow less local key bindings, but rather indicates that the bound command should be
invoked before the first normal key binding. Transparent key bindings are primarily useful
for implementing minor modes such as auto fill and word abbreviation. There may be
several transparent key bindings for a given key, in which case all of the transparent
commands are invoked in the order they were found. If there is no normal key binding for a
key typed, then the command interpreter acts as though the key is unbound even if there
are transparent key bindings.
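The invocation rule can be stated as a small function. Here found is an assumed list of (command . transparent-p) pairs for one key, ordered from most local to least local, and the sketch assumes that only the transparent bindings found before the first normal binding are run. The command names in the test are illustrative only.

```lisp
;;; Hypothetical model of transparent key bindings; not Hemlock's code.

(defun toy-commands-to-invoke (found)
  "FOUND is a list of (COMMAND . TRANSPARENT-P) pairs for one key, most
local first.  Return the transparent commands seen before the first normal
binding, in the order found, followed by that normal command.  Return NIL
when there is no normal binding, since the key then acts as unbound."
  (let ((transparent '()))
    (dolist (binding found nil)
      (if (cdr binding)
          (push (car binding) transparent)
          (return (append (reverse transparent)
                          (list (car binding))))))))
```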
Command Types
In many editors the behavior of a command depends on the kind of command invoked
before it. Hemlock provides a mechanism to support this known as command type.
last-command-type [Function]
This returns the command type of the last command invoked. If this is set with setf, the
supplied value becomes the value of last-command-type until the next command
completes. If the previous command did not set last-command-type, then its value is nil.
Normally a command type is a keyword. The command type is not cleared after a
command is invoked due to a transparent key binding.
Command Arguments
There are three ways in which a command may be invoked: It may be bound to a key which
has been typed, it may be invoked as an extended command, or it may be called as a Lisp
function. Ideally commands should be written in such a way that they will behave sensibly
no matter which way they are invoked. The functions which implement commands must
obey certain conventions about argument passing if the command is to function properly.
Whenever a command is invoked it is passed as its first argument what is known as the
prefix argument. The prefix argument is always either an integer or nil. When a command
uses this value it is usually as a repeat count, or some conceptually similar function.
prefix-argument [Function]
This function returns the current value of the prefix argument. When set with setf, the new
value becomes the prefix argument for the next command. If the prefix argument is not set
by the previous command then the prefix argument for a command is nil. The prefix
argument is not cleared after a command is invoked due to a transparent key binding.
Lisp Arguments
It is often desirable to call commands from Lisp code, in which case arguments which
would otherwise be prompted for are passed as optional arguments following the prefix
argument. A command should prompt for any arguments not supplied.
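The calling convention can be shown with an ordinary Lisp function. greet-command and *toy-prompt-function* are hypothetical; the latter stands in for a Hemlock prompting function so that the sketch runs outside the editor. The prefix argument is used as a repeat count, and the name is prompted for only when the caller did not supply it.

```lisp
;;; Hypothetical command function illustrating the argument conventions;
;;; *TOY-PROMPT-FUNCTION* stands in for a real Hemlock prompting function.

(defvar *toy-prompt-function*
  (lambda (prompt)
    (declare (ignore prompt))
    "world")
  "Called with a prompt string when an argument was not supplied.")

(defun greet-command (p &optional (name nil name-supplied-p))
  "P is the prefix argument (an integer or NIL), used as a repeat count.
NAME would otherwise be prompted for, so Lisp callers may pass it directly."
  (let* ((name (if name-supplied-p
                   name
                   (funcall *toy-prompt-function* "Name: ")))
         (greeting (format nil "Hello, ~A!" name)))
    (format nil "~{~A~^ ~}"
            (make-list (or p 1) :initial-element greeting))))
```

Checking name-supplied-p rather than testing name for nil lets a Lisp caller pass nil explicitly without triggering a prompt.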
Modes
A mode is a collection of Hemlock values which may be present in the current environment
depending on the editing task at hand. An example of a typical mode is Lisp, for editing
Lisp code.
Mode Hooks
When a mode is added to or removed from a buffer, its mode hook is invoked. The hook
functions take two arguments, the buffer involved and t if the mode is being added or nil if
it is being removed. Mode hooks are typically used to make a mode do something
additional to what it usually does. One might, for example, make a Text mode hook that
turns on Auto Fill mode whenever Text mode is turned on in a buffer.
There are two kinds of modes, major modes and minor modes. A buffer always has exactly
one major mode, but it may have any number of minor modes. Major modes may have
mode character attributes while minor modes may not.
A major mode is usually used to change the environment in some major way, such as to
install special commands for editing some language. Minor modes generally change some
small attribute of the environment, such as whether lines are automatically broken when
they get too long. A minor mode should work regardless of what major mode and minor
modes are in effect.
*mode-names* [Variable]
This is a useful command to bind in modes that wish to shadow global bindings by making
them effectively illegal. Also, although less likely, minor modes may shadow major mode
bindings with this. This command calls editor-error.
Mode Functions
This function defines a new mode named name, and enters it in
hemlock-interface:*mode-names*. If major-p is supplied and is not nil then the mode is a
major mode; otherwise it is a minor mode.
Setup-function and cleanup-function are functions which are invoked with the buffer
affected, after the mode is turned on, and before it is turned off, respectively. These
functions typically are used to make buffer-local key or variable bindings and to remove
them when the mode is turned off.
Precedence is only meaningful for a minor mode. The precedence of a minor mode
determines the order in which it appears in a buffer's list of modes. When searching for
values in the current environment, minor modes are searched in order, so the precedence of
a minor mode determines which value is found when there are several definitions.
Transparent-p determines whether key bindings local to the defined mode are transparent.
Transparent key bindings are invoked in addition to the first normal key binding found
rather than shadowing less local key bindings.
Documentation is some introductory text about the mode. Commands such as Describe
Mode use this.
This function returns the documentation for the mode named name.
buffer-major-mode returns the name of buffer's major mode. The major mode may be
changed with setf; then Buffer Major Mode Hook is invoked with buffer and the new mode.
buffer-minor-mode returns t if the minor mode name is active in buffer, nil otherwise. A
minor mode may be turned on or off by using setf; then Buffer Minor Mode Hook is
invoked with buffer, name and the new value.
Returns t if name is the name of a major mode, or nil if it is the name of a minor mode. It is
an error for name not to be the name of a mode.
Character Attributes
Introduction
Character attributes provide a global database of information about characters. This facility
is similar to, but more general than, the syntax tables of other editors such as Emacs. For
example, you should use character attributes for commands that need information
regarding whether a character is whitespace or not. Use character attributes for these
reasons:
1. If this information is all in one place, then it is easy to change the behavior of the
editor by changing the syntax table, much easier than it would be if character
constants were wired into commands.
2. The syntax table primitives are probably faster than anything that can be written
above the primitive level.
Note that an essential part of the character attribute scheme is that character attributes are
global and are there for the user to change. Information about characters which is internal
to some set of commands (and which the user should not know about) should not be
maintained as a character attribute. For such uses various character searching abilities are
provided by the function hemlock-interface:find-pattern.
As for Hemlock variables, character attributes have a user visible string name, but are
referred to in Lisp code as a symbol. The string name, which is typically composed of
capitalized words separated by spaces, is translated into a keyword by replacing all spaces
with hyphens and interning this string in the keyword package. The attribute named "Ada
Syntax" would thus become :ada-syntax.
*character-attribute-names* [Variable]
Whenever a character attribute is defined, its name is entered in this string-table, with the
corresponding keyword as the value.
This function defines a new character attribute with name, a string. Character attribute
operations take attribute arguments as a keyword whose name is name uppercased with
spaces replaced by hyphens.
Type, which defaults to (mod 2), specifies what type the values of the character attribute
are. Values of a character attribute may be of any type which may be specified to
make-array. Initial-value (default 0) is the value which all characters will initially have for
this attribute.
character-attribute returns the value of attribute for character. This signals an error if
attribute is undefined.
setf will set a character's attributes. This setf method invokes the functions in Character
Attribute Hook on the attribute and character before it makes the change.
If character is nil, then the value of the attribute for the beginning or end of the buffer can
be accessed or set. The buffer beginning and end thus become a sort of fictitious character,
which simplifies the use of character attributes in many cases.
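A character attribute can be pictured as a vector indexed by character code, plus one extra slot for the fictitious buffer-end character nil. The sketch below is hypothetical and is not how Hemlock stores attributes; it follows the (mod 2) default type and 0 initial value, runs hook functions before the change (as Character Attribute Hook does), and only covers character codes below 256.

```lisp
;;; Hypothetical model of one character attribute; not Hemlock's code.
;;; Only characters with codes below 256 are supported.

(defstruct (toy-attribute (:constructor %make-toy-attribute))
  name
  values       ; vector indexed by CHAR-CODE
  end-value    ; value for the fictitious character NIL (buffer ends)
  (hooks '())) ; functions of (attribute character new-value)

(defun make-toy-attribute (name &key (type '(mod 2)) (initial-value 0))
  (%make-toy-attribute
   :name name
   :values (make-array 256 :element-type type :initial-element initial-value)
   :end-value initial-value))

(defun toy-character-attribute (attribute character)
  "Value of ATTRIBUTE for CHARACTER, or for the buffer ends if CHARACTER is NIL."
  (if character
      (aref (toy-attribute-values attribute) (char-code character))
      (toy-attribute-end-value attribute)))

(defun (setf toy-character-attribute) (new-value attribute character)
  ;; Run the hooks before making the change.
  (dolist (hook (toy-attribute-hooks attribute))
    (funcall hook attribute character new-value))
  (if character
      (setf (aref (toy-attribute-values attribute) (char-code character))
            new-value)
      (setf (toy-attribute-end-value attribute) new-value)))
```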
This function returns t if symbol is the name of a character attribute, nil otherwise.
This function establishes value as the value of character's attribute attribute when in the
mode mode. Mode must be the name of a major mode. Shadow Attribute Hook is invoked
with the same arguments when this function is called. If the value for an attribute is set
while the value is shadowed, then only the shadowed value is affected, not the global one.
Make the value of attribute for character no longer be shadowed in mode. Unshadow
Attribute Hook is invoked with the same arguments when this function is called.
These functions find the next (or previous) character with some value for the character
attribute attribute starting at mark. They pass test one argument, the value of attribute for
the character tested. If the test succeeds, then these routines modify mark to point before
(after for reverse-find-attribute) the character which satisfied the test. If no characters
satisfy the test, then these return nil, and mark remains unmodified. Test defaults to
#'not-zerop. There is no guarantee that the test is applied in any particular fashion, so it
should have no side effects and depend only on its argument.
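The mark-positioning rule can be illustrated on a string, with an index standing in for a mark. In this hypothetical sketch, the forward search returns the index of the matching character (a mark there sits before it) and the reverse search returns the index just after it. The whitespace table and the function names are assumptions, and only character codes below 256 are supported.

```lisp
;;; Hypothetical string-based model of find-attribute and
;;; reverse-find-attribute; an index stands in for a mark.

(defun make-toy-whitespace-table ()
  "A (mod 2) table indexed by CHAR-CODE with 1 for space, tab and newline."
  (let ((table (make-array 256 :element-type '(mod 2) :initial-element 0)))
    (dolist (ch (list #\Space #\Tab #\Newline) table)
      (setf (aref table (char-code ch)) 1))))

(defun toy-find-attribute (string start table
                           &optional (test (lambda (v) (not (zerop v)))))
  "Index of the first character at or after START whose table value
satisfies TEST, i.e. where a mark before that character would go; or NIL."
  (position-if (lambda (ch) (funcall test (aref table (char-code ch))))
               string :start start))

(defun toy-reverse-find-attribute (string end table
                                   &optional (test (lambda (v) (not (zerop v)))))
  "Search backward from END; return the index just after the matching
character, where the mark would be left; or NIL."
  (let ((pos (position-if (lambda (ch) (funcall test (aref table (char-code ch))))
                          string :end end :from-end t)))
    (and pos (1+ pos))))
```

Passing #'zerop as the test turns the same search into "find the next non-whitespace character".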
It is often useful to use the character attribute mechanism as an abstract interface to other
information about characters which in fact is stored elsewhere. For example, some
implementation of Hemlock might decide to define a Print Representation attribute which
controls how a character is displayed on an output device.
To make this easy to do, each attribute has a list of hook functions which are invoked with
the attribute, character and new value whenever the current value changes for any reason.
Return the current hook list for attribute. This may be set with setf. The
hemlock-interface:add-hook and hemlock-interface:remove-hook macros should be
used to manipulate these lists.
"Whitespace"
"Word Delimiter"
A value of 1 indicates the character separates words (see the English Text Buffers
section).
"Space"
This is like Whitespace, but it should not include Newline. Hemlock uses this
primarily for handling indentation on a line.
"Sentence Terminator"
A value of 1 indicates these characters terminate sentences (see the English Text
Buffers section).
"Paragraph Delimiter"
A value of 1 indicates these characters delimit paragraphs when they begin a line (see
the English Text Buffers section).
"Page Delimiter"
A value of 1 indicates this character separates Logical Pages when it begins a line.
"Lisp Syntax"
:space These characters act like whitespace and should not include Newline.
:prefix This is a character that is a part of any form it precedes; for example, the single
quote, '.
:comment This is the character that makes a comment of the rest of the line, such as the
semicolon (;).
Views
A hemlock-view represents the GUI object(s) used to display the contents of a buffer.
Conceptually it consists of a text buffer, a modeline for semi-permanent status info, an
echo area for transient status info, and a text input area for reading prompted input.
(Currently the last two are conflated, i.e. text input happens in the echo area).
The API for working with hemlock-views is not fully defined yet. If you need to work with
views beyond what's listed here, you will probably need to dig into the sources and find some
internal functions to call.
current-view [Function]
current-view returns the hemlock view which is the target of the currently executing
command. This is usually the frontmost hemlock window in the current application.
View Functions
Cursor Positions
This function returns the X position at which mark would be displayed, supposing its line
was displayed on an infinitely wide screen. This takes into consideration strange characters
such as tabs.
Redisplay
The display of the buffer contents on the screen is updated at the end of each command.
The following function can be used to control the scroll position of the buffer in the view.
Normally, after a command that changes the contents of the buffer or the selection (i.e. the
active region), the event handler repositions the view so that the selection is visible,
scrolling the buffer as necessary. Calling this function tells the system to not do that, and
instead to position the buffer in a particular way. how can be one of the following:
:center-selection
This causes the selection (or the point) to be centered in the visible area. what is
ignored.
:page-up
This causes the previous page of the buffer to be shown. what is ignored.
:page-down
This causes the next page of the buffer to be shown. what is ignored.
:lines-up
This causes what previous lines to be scrolled in at the top. what must be an integer.
:lines-down
This causes what next lines to be scrolled in at the bottom. what must be an integer.
:line
This causes the line containing what to be scrolled to the top of the view. what must
be a mark.
Logical Key-Events
Introduction
Some commands, such as Emacs query replace, read key-events directly from the keyboard
instead of using the command interpreter. To encourage consistency between these commands
and to make them portable and easy to customize, there is a mechanism for defining logical
key-events.
A logical key-event is a keyword which stands for some set of key-events. The system
globally interprets these key-events as indicators of a particular action. For example, the :help
logical key-event represents the set of key-events that request help in a given Hemlock
implementation. This mapping is a many-to-many mapping, not one-to-one, so a given
logical key-event may have multiple corresponding actual key-events. Also, any key-event
may represent different logical key-events.
*logical-key-event-names* [Variable]
This variable holds a string-table mapping all logical key-event names to the keyword
identifying the logical key-event.
This function defines a new logical key-event with name string-name. Logical key-event
operations take logical key-event arguments as a keyword whose name is string-name
uppercased with spaces replaced by hyphens.
This function returns the list of key-events representing the logical key-event keyword.
These functions return the string name and documentation given to
define-logical-key-event for logical key-event keyword.
This function returns t if key-event is the logical key-event keyword. This is setf-able,
establishing or disestablishing key-events as particular logical key-events. It is an error for
keyword to be an undefined logical key-event.
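The many-to-many mapping and the setf-able predicate can be modeled with a hash table from keywords to key-event lists. All names here are hypothetical, key-events are represented by strings, and binding one key-event to two logical key-events in the test only illustrates that the mapping need not be one-to-one.

```lisp
;;; Hypothetical model of logical key-events; not Hemlock's implementation.
;;; Key-events are represented by strings.

(defvar *toy-logical-key-events* (make-hash-table :test #'eq)
  "Maps a logical key-event keyword to the list of key-events it stands for.")

(defun toy-define-logical-key-event (keyword)
  "Define KEYWORD as a logical key-event with no key-events yet."
  (unless (nth-value 1 (gethash keyword *toy-logical-key-events*))
    (setf (gethash keyword *toy-logical-key-events*) '()))
  keyword)

(defun toy-events-for (keyword)
  "Key-events for KEYWORD; signal an error if KEYWORD is undefined."
  (multiple-value-bind (events present-p)
      (gethash keyword *toy-logical-key-events*)
    (unless present-p
      (error "~S is not a defined logical key-event." keyword))
    events))

(defun toy-logical-key-event-p (key-event keyword)
  (and (member key-event (toy-events-for keyword) :test #'equal) t))

(defun (setf toy-logical-key-event-p) (value key-event keyword)
  ;; A true VALUE establishes KEY-EVENT as KEYWORD; NIL disestablishes it.
  (let ((events (toy-events-for keyword)))
    (setf (gethash keyword *toy-logical-key-events*)
          (if value
              (adjoin key-event events :test #'equal)
              (remove key-event events :test #'equal))))
  value)
```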
There are many default logical key-events, some of which are used by functions
documented in this manual. If a command wants to read a single key-event that
fits one of these descriptions, then the key-event read should be compared to the
corresponding logical key-event instead of explicitly mentioning the particular key-event in
the code. In many cases you can use the hemlock-interface:command-case macro. It
makes logical key-events easy to use and takes care of prompting and displaying help
messages.
:abort Indicates the prompter should terminate its activity without performing any
closing actions of convenience, for example.
:yes Indicates the prompter should take the action under consideration.
:no Indicates the prompter should NOT take the action under consideration.
:do-all Indicates the prompter should repeat the action under consideration as many
times as possible.
:do-once Indicates the prompter should execute the action under consideration once
and then exit.
:confirm Indicates the prompter should take any input provided or use the default if
the user entered nothing.
:quote Indicates the prompter should take the following key-event as itself without
any sort of command interpretation.
1. The key-event concerned represents a general class of actions, and several commands
may want to take a similar action of this type.
2. The exact key-event a command implementor chooses may generate violent taste
disputes among users, and then the users can trivially change the command in their
init files.
Hemlock provides a number of facilities for displaying information and prompting the user
for it. Most of these work through a small area displayed at the bottom of the screen, called
the Echo Area.
clear-echo-area [Function]
Displays a message in the echo area, replacing previous contents if any. loud-message is
like message, but it also beeps.
beep [Function]
Prompting Functions
Prompting functions can be used to obtain short one-line input from the user.
Cocoa note: Because of implementation restrictions, only one buffer at a time is allowed to
read prompted input. If a prompting function is invoked while a prompting operation is
already in effect in another buffer, the attempt fails, telling the user "Buffer xxx is already
waiting for input".
:must-exist
If :must-exist has a non-nil value then the user is prompted until a valid response is
obtained. If :must-exist is nil then return as a string whatever is input. The default is
t.
:default
If null input is given when the user is prompted then this value is returned. If no
default is given then some input must be given before anything interesting will
happen.
:default-string
If a :default is given then this is a string to be printed to indicate what the default is.
The default is some representation of the value for :default, for example for a buffer it
is the name of the buffer.
:prompt
:help
This is similar to :prompt, except that it is displayed when the help command is typed
during input.
This may also be a function. When called with no arguments, it should either
return a string which is the help text or perform some action to help the user,
returning nil.
Prompts with completion for a buffer name and returns the corresponding buffer. If
must-exist is nil, then it returns the input string if it is not a buffer name. This refuses to
accept the empty string as input when :default and :default-string are nil. :default-string
may be used to supply a default buffer name when :default is nil, but when :must-exist is
non-nil, it must name an already existing buffer.
This function prompts for a key-event returning immediately when the user types the next
key-event. hemlock-interface:command-case is more useful for most purposes.
When appropriate, use Logical Key-Events.
This function prompts for a key, a vector of key-events, suitable for passing to any of the
functions that manipulate key bindings. If must-exist is true, then the key must be bound
in the current environment, and the command currently bound is returned as the second
value.
This function prompts for an acceptable filename. "Acceptable" means that it is a legal
filename, and it exists if must-exist is non-nil. prompt-for-file returns a Common Lisp
pathname. If the file exists as entered, then this returns it, otherwise it is merged with
default as by merge-pathnames.
This function prompts for a keyword with completion, using the string tables in the list
string-tables. If must-exist is non-nil, then the result must be an unambiguous prefix of a
string in one of the string-tables, and this returns the complete string even if only a prefix of
the full string was typed. In addition, this returns the value of the corresponding entry in
the string table as the second value.
If must-exist is nil, then this function returns the string exactly as entered. The difference
between prompt-for-keyword with must-exist nil and prompt-for-string is that the user may
complete the input using the Complete Parse and Complete Field commands.
This prompts for logical key-events :Y or :N, returning t or nil without waiting for
confirmation. When the user types a confirmation key, this returns default if it is supplied.
If must-exist is nil, this returns whatever key-event the user first types; however, if the user
types one of the above key-events, this returns t or nil. This is analogous to the Common
Lisp function y-or-n-p.
This macro is analogous to the Common Lisp case macro. Commands such as Help use this
to get a key-event, translate it to a character, and then to dispatch on the character to some
case. In addition to character dispatching, this supports Logical Key-Events by using the
input key-event directly without translating it to a character. Since the description of this
macro is rather complex, first consider the following example:
command-case prompts for a key-event and then executes the code in the first branch with
a logical key-event or a character (called tags) matching the input. Each character must be
a standard-character, one that satisfies the Common Lisp standard-char-p predicate, and
the dispatching mechanism compares the input key-event to any character tags by
mapping the key-event to a character with ext:key-event-char. If the tag is a logical
key-event, then the search for an appropriate case compares the key-event read with the
tag using logical-key-event-p.
All uses of command-case have two default cases, :help and :abort. You can override these
easily by specifying your own branches that include these logical key-event tags. The :help
branch displays in a pop-up window a description of the valid responses using the
variously specified help strings. The :abort branch signals an editor-error.
The key/value arguments control the prompting. The following are valid values:
:help
The default :help case displays this string in a pop-up window. In addition it formats
a description of the valid input including each case's help string.
:prompt
:bind
This specifies a variable to which the prompting mechanism binds the input
key-event. Any case may reference this variable. If you wish to know what character
corresponds to the key-event, use key-event-char.
Instead of specifying a tag or list of tags, you may use t. This becomes the default branch,
and its forms execute if no other branch is taken, including the default :help and :abort
cases. This option has no helpstring, and the default :help case does not describe the
default branch. Every command-case has a default branch; if none is specified, the macro
includes one that beeps and reprompts (see below).
Within the body of command-case, there is a defined reprompt macro. It causes the
prompting mechanism and dispatching mechanism to immediately repeat without further
execution in the current branch.
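The branch-selection rule can be sketched as a function that leaves out prompting, the built-in :help and :abort cases, and reprompting. In this hypothetical model, key-events are plain characters, the event-char argument stands in for key-event-char, logical-p stands in for logical-key-event-p, and a branch is a (tags . result) pair. The t branch is considered only after every other branch has failed, wherever it appears.

```lisp
;;; Hypothetical model of command-case branch selection only: no
;;; prompting, no built-in :help/:abort cases, no reprompting.

(defun toy-choose-branch (key-event branches
                          &key (event-char #'identity)
                               (logical-p (constantly nil)))
  "BRANCHES is a list of (TAGS . RESULT).  TAGS is a character, a logical
key-event keyword, a list of those, or T for the default branch.  Return
the RESULT of the first matching branch, else the default branch's RESULT,
else NIL (where command-case would beep and reprompt)."
  (let ((default nil))
    (dolist (branch branches (and default (cdr default)))
      (let ((tags (car branch)))
        (cond ((eq tags t)
               (setf default branch))
              ((some (lambda (tag)
                       (if (characterp tag)
                           ;; Character tags compare against the event's character.
                           (eql tag (funcall event-char key-event))
                           ;; Keyword tags are logical key-events.
                           (funcall logical-p key-event tag)))
                     (if (listp tags) tags (list tags)))
               (return (cdr branch))))))))
```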
If this variable is true, then an attempt to complete a parse which is ambiguous will result
in a "beep".
This function enters a mode reading input from the user and echoing it in the echo area,
and returns a value when done. The input is managed by commands bound in "Echo Area"
mode on the buffer associated with the echo area. The following keyword arguments are
accepted:
:verification-function
This is invoked by the "Confirm Parse" command. It does most of the work when
parsing prompted input. Confirm Parse calls it with one argument, which is the string
that the user typed so far. The function should return a list of values which are to be
the result of the recursive edit, or nil indicating that the parse failed. In order to
return zero values, a non-nil second value may be returned along with a nil first
value.
:string-tables
:value-must-exist
This is referred to by the verification function, and possibly some of the commands.
:default
The string representing the default object when prompting the user. Confirm Parse
supplies this to the parse verification function when the user input is empty.
:default-string
When prompting the user, if :default is not specified, Hemlock displays this string as
a representation of the default object; for example, when prompting for a buffer, this
argument would be a default buffer name.
:type
The kind of parse, e.g. :file, :keyword, :string. This tells the completion commands
how to do completion, with :string disabling completion.
:prompt
:help
The help string or function being used for the current parse.
These are some of the Echo Area commands that coordinate with the prompting routines.
Hemlock binds other commands specific to the Echo Area, but they are uninteresting to
mention here, such as deleting to the beginning of the line or deleting backwards a word.
help on parse (bound to home, c-_ in echo area mode) [Hemlock Command]
This attempts to complete the current region. It signals an editor-error if the input is
ambiguous or incorrect.
Similar to Complete Keyword, but only attempts to complete up to and including the
first character in the keyword with a non-zero :parse-field-separator attribute. If there is no
field separator then attempt to complete the entire keyword. If it is not a keyword parse
then just self-insert.
Call the verification function with the current input. If it returns a non-nil value then that is
returned as the value of the parse. A parse may return a nil value if the verification function
returns a non-nil second value.
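The success and failure protocol of a verification function can be sketched outside the editor. toy-confirm-parse and toy-verify-integer are hypothetical: the first returns a success flag plus the list of result values, substituting the default when the input is empty, and the second accepts only integers.

```lisp
;;; Hypothetical model of how Confirm Parse uses a verification function.

(defun toy-verify-integer (string)
  "Example verification function: a list holding the parsed integer, or
NIL if STRING is not an integer."
  (let ((n (ignore-errors (parse-integer string))))
    (and n (list n))))

(defun toy-confirm-parse (input verify &optional default)
  "Call VERIFY on INPUT (or on DEFAULT when INPUT is empty).  Return two
values: T and the list of result values on success, or NIL and NIL."
  (let ((string (if (and (zerop (length input)) default) default input)))
    (multiple-value-bind (result zero-values-p) (funcall verify string)
      (cond (result (values t result))
            ;; NIL plus a non-NIL second value means "succeed with no values".
            (zero-values-p (values t '()))
            (t (values nil '()))))))
```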
Files
This chapter discusses ways to read and write files at various levels: at marks, into
regions, and into buffers. It also covers automatic mechanisms that affect the state of
buffers in which files are read.
The user specifies file options with a special syntax on the first line of a file. If the first line
contains the string "-*-", then Hemlock interprets the text between the first such
occurrence and the second, which must be contained in one line, as a list of "option: value"
pairs separated by semicolons. The following is a typical example:
;;; -*- Mode: Lisp, Editor; Package: Hemlock -*-
See the Hemlock User's Manual for more details and predefined options.
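To make the "option: value" syntax concrete, here is a minimal sketch of how such a line could be split into pairs. It is not Hemlock's parser: the functions split-at-char, trim-blanks, and parse-file-options-line are hypothetical, and the sketch only extracts the pairs; it does not dispatch them to option handlers.

```lisp
;;; Hypothetical sketch (not Hemlock's parser): extract the "option: value"
;;; pairs between the first two "-*-" markers on a line.
(defun split-at-char (char string)
  "Split STRING at every occurrence of CHAR, keeping empty pieces."
  (loop for start = 0 then (1+ end)
        for end = (position char string :start start)
        collect (subseq string start end)
        while end))

(defun trim-blanks (string)
  (string-trim '(#\Space #\Tab) string))

(defun parse-file-options-line (line)
  "Return a list of (OPTION . VALUE) strings, or NIL if LINE has no -*- section."
  (let* ((open-pos (search "-*-" line))
         (close-pos (and open-pos (search "-*-" line :start2 (+ open-pos 3)))))
    (when close-pos
      ;; Pairs are separated by semicolons; a pair without a colon is ignored.
      (loop for pair in (split-at-char #\; (subseq line (+ open-pos 3) close-pos))
            for colon = (position #\: pair)
            when colon
              collect (cons (trim-blanks (subseq pair 0 colon))
                            (trim-blanks (subseq pair (1+ colon))))))))
```

For the line `;;; -*- Mode: Lisp, Editor; Package: Hemlock -*-` this sketch yields `(("Mode" . "Lisp, Editor") ("Package" . "Hemlock"))`; note that the value of Mode may itself contain commas.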
File type hooks are executed when Hemlock reads a file into a buffer based on the type of
the pathname. When the user specifies a Mode file option that turns on a major mode,
Hemlock ignores type hooks. This mechanism is mostly used as a simple means for turning
on some appropriate default major mode.
This defines a new file option with the string name name. Buffer and value specify variable
names for the buffer and the option value string, and forms are evaluated with these
bound.
This defines some code that process-file-options (below) executes when the file options fail
to set a major mode. This associates each type, a string, in type-list with a routine that
binds buffer to the buffer the file is in and type to the type of the pathname.
This checks for file options in buffer and invokes handlers if there are any. Pathname
defaults to buffer's pathname but may be nil. If there is no Mode file option that specifies a
major mode, and pathname has a type, then this tries to invoke the appropriate file type
hook. read-buffer-file calls this.
There is no good way to uniquely identify buffer names and pathnames. However, Hemlock
has one way of mapping pathnames to buffer names that should be used for consistency
among customizations and primitives. Independent of this, Hemlock provides a means for
consistently generating prompting defaults when asking the user for pathnames.
This returns Buffer Pathname if it is bound. If it is not bound, and buffer's name is
composed solely of alphanumeric characters, then this returns a pathname formed from buffer's
name. If buffer's name has other characters in it, then this returns the value of Last Resort
Pathname Defaults Function called on buffer.
File Groups
Common Lisp pathnames are used by the file primitives. For probing, checking write dates,
and so forth, all of the Common Lisp file functions are available.
This function writes the contents of region to the file named by pathname. This writes
region using a stream as if it were opened with :if-exists supplied as :rename-and-delete.
When keep-backup, which defaults to the value of Keep Backup Files, is non-nil, this opens
the stream as if :if-exists were :rename. If append is non-nil, this writes the file as if it were
opened with :if-exists supplied as :append.
This signals an error if both append and keep-backup are supplied as non-nil.
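The :if-exists policy above can be expressed with the standard Common Lisp open keywords. The following is a minimal sketch, not Hemlock's write-file: the names write-file-if-exists and write-string-to-file are hypothetical, and keep-backup simply defaults to nil here instead of to the value of Keep Backup Files.

```lisp
;;; Hypothetical sketch of the :IF-EXISTS choice described for WRITE-FILE.
(defun write-file-if-exists (&key keep-backup append)
  "Return the :IF-EXISTS keyword matching KEEP-BACKUP and APPEND."
  (cond ((and keep-backup append)
         (error "KEEP-BACKUP and APPEND cannot both be non-NIL."))
        (append :append)                  ; add to the end of the file
        (keep-backup :rename)             ; keep the old file as a backup
        (t :rename-and-delete)))          ; replace the old file

(defun write-string-to-file (string pathname &key keep-backup append)
  "Write STRING to PATHNAME, choosing :IF-EXISTS as above."
  (with-open-file (stream pathname
                          :direction :output
                          :if-exists (write-file-if-exists
                                      :keep-backup keep-backup
                                      :append append)
                          :if-does-not-exist :create)
    (write-string string stream))
  pathname)
```

Keeping the policy in a separate function makes the "both non-nil is an error" rule easy to check without touching the file system.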
add newline at eof on writing file (initial value :ask-user) [Hemlock Variable]
write-buffer-file writes buffer to the file named by pathname including the following:
It consults Add Newline at EOF on Writing File (see Hemlock User's Manual for
possible values) and interacts with the user if necessary.
It sets Pathname Defaults, and after using write-file, marks buffer unmodified.
Write File Hook is a list of functions that take the newly written buffer as an argument.
read-buffer-file deletes buffer's region and uses read-file to read pathname into it,
including the following:
It sets buffer's write date to the file's write date if the file exists; otherwise, it
messages that this is a new file and sets buffer's write date to nil.
It sets buffer's pathname to the result of probing pathname if the file exists;
otherwise, this function sets buffer's pathname to the result of merging pathname
with default-directory.
Read File Hook is a list of functions that take two arguments: the buffer read into and
whether the file existed (t if so).
This chapter is sort of a catch all for any functions and variables which concern Hemlock's
interaction with the outside world.
ed &optional x [Function]
This is a standard Common Lisp function. If x is supplied and is a string or pathname, the file
specified by x is visited in a hemlock view (opening a new window if necessary, otherwise
bringing an existing window with the file to the front), and the hemlock view object is the
return value from the function.
If x is a symbol or a setf function name, it attempts to edit the definition of the name. In
this last case, the function returns without waiting for the operation to complete (for
example, it might put up a non-modal dialog asking the user to select one of multiple
definitions) and hence the return value is always NIL.
Keyboard Input
*key-event-history* [Variable]
This is a Hemlock ring buffer that holds the last 60 key-events received.
last-key-event-typed [Function]
This function returns the last key-event the user typed to invoke the current command.
last-char-typed [Function]
This function returns the character corresponding to the last key event typed.
Hemlock Streams
It is possible to create streams which output to or get input from a buffer. This mechanism
is quite powerful and permits easy interfacing of Hemlock to Lisp.
Note that operations on these streams operate directly on buffers, therefore they have the
same restrictions as described here for interacting with buffers from outside of the GUI
thread.
This function returns a stream that inserts at mark all output directed to it. It works best if
mark is a left-inserting mark. Buffered controls whether the stream is buffered or not, and
its valid values are the following keywords:
:none
:line
The buffer is flushed whenever a newline is written or when it is explicitly done with
force-output.
:full
The stream is only brought up to date when it is explicitly done with force-output.
This function returns a stream from which the text in region can be read.
While evaluating forms, binds var to a stream which returns input from region.
During the evaluation of the forms, binds var to a stream which inserts output at the
permanent mark. Buffered has the same meaning as for make-hemlock-output-stream.
This macro executes forms in a context with var bound to a stream. Hemlock collects
output to this stream and tries to pop up a display of the appropriate height containing all
the output. When height is supplied, Hemlock creates the pop-up display immediately,
forcing output on line breaks. This is useful for displaying information of temporary
interest.
Hemlock commands are executed from an event handler in the initial Cocoa thread. They
are executed within a ccl::with-standard-abort-handling form, which means cl:abort,
ccl:abort-break, ccl:throw-cancel will abort the current command only and exit the event
handler in an orderly fashion.
In addition, for now, lisp errors during command execution dump a backtrace in the
system console and are otherwise handled as if by handle-lisp-errors below, which
means it is not possible to debug errors at the point of the error. Once Clozure CL has
better support for debugging errors in the initial Cocoa thread, better Hemlock error
handling will be provided that will allow for some way to debug.
This function is called to report minor errors to the user. These are errors that a normal
user could encounter in the course of editing, such as a search failing or an attempt to
delete past the end of the buffer. This function simply aborts the current command. Any
args specified are used to format an error message to be placed in the echo area. This
function never returns.
Within the body of this macro any Lisp errors that occur are handled by displaying an error
message in a dialog and aborting the current command, leaving the error text in the echo
area. This macro should be wrapped around code which may get an error due to some
action of the user, for example, evaluating code fragments on behalf of, and supplied
by, the user.
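Outside the IDE, the same "report the error instead of debugging it" idea can be approximated with handler-case. This is a minimal sketch, not Hemlock's handle-lisp-errors: the macro name with-reported-errors is hypothetical, and instead of showing a dialog and aborting the command, it returns nil plus the error text as a second value.

```lisp
;;; Hypothetical stand-in for HANDLE-LISP-ERRORS outside the IDE: a Lisp
;;; error in BODY becomes (VALUES NIL message) instead of entering the
;;; debugger.  Only the primary value of BODY is returned on success.
(defmacro with-reported-errors (&body body)
  `(handler-case (values (progn ,@body) nil)
     (error (condition)
       (values nil (princ-to-string condition)))))
```

For example, `(with-reported-errors (error "bad input ~A" 42))` returns nil and "bad input 42", which a caller could then display in the echo area.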
Definition Editing
Hemlock provides commands for finding the definition of a function or variable and
placing the user at the definition in a buffer. A function is provided to allow invoking this
functionality outside of Hemlock. Note that this function is unusual in that it is safe to
call outside of the command interpreter; in fact, it can be called from any thread.
This function tries to find the definition of name, create or activate the window containing
it, and scroll the view to show the definition. If there are multiple definitions available, the
user is given a choice of which one to use. This function may return before the operation is
complete.
Event Scheduling
Miscellaneous
This evaluates forms inside handle-lisp-errors. It also binds *package* to the package
named by Current Package if it is non-nil. Use this when evaluating Lisp code on behalf of
the user.
This iterates over alphabetic characters in Common Lisp binding var to each character in
order as specified under character relations in Common Lisp the Language. Kind is one
of :lower, :upper, or :both. When the user supplies :both, lowercase characters are
processed first.
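The iteration order can be illustrated with a small portable macro. This is a hypothetical re-implementation for illustration only (do-alpha-chars-sketch is not a Hemlock name), and it covers only the var and kind arguments described above.

```lisp
;;; Hypothetical sketch of the iteration order: a-z for :LOWER, A-Z for
;;; :UPPER, and a-z followed by A-Z for :BOTH.
(defmacro do-alpha-chars-sketch ((var kind) &body body)
  "Run BODY with VAR bound to each letter selected by KIND."
  (let ((k (gensym "KIND")))
    `(let ((,k ,kind))
       (dolist (,var (append (when (member ,k '(:lower :both))
                               (coerce "abcdefghijklmnopqrstuvwxyz" 'list))
                             (when (member ,k '(:upper :both))
                               (coerce "ABCDEFGHIJKLMNOPQRSTUVWXYZ" 'list))))
         ,@body))))
```

With :both, the first 26 bindings are the lowercase letters and the next 26 are the uppercase ones.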
This chapter discusses primitives that operate on higher level text forms than characters
and words. For English text, there are functions that know about sentence and paragraph
structures, and for Lisp sources, there are functions that understand this language. This
chapter also describes mechanisms for organizing file sections into logical pages and for
Indenting Text
The value of this variable determines how indentation is done, and it is a function which is
passed a mark as its argument. The function should indent the line that the mark points to.
The function may move the mark around on the line. The mark will be :left-inserting. The
default simply inserts a tab character at the mark. A function for Lisp mode probably
moves the mark to the beginning of the line, deletes horizontal whitespace, and computes
some appropriate indentation for Lisp code.
Indent with Tabs should be true if indenting should use tabs whenever possible. If nil, the
default, it only uses spaces. Spaces per Tab defines the size of a tab.
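The effect of Indent with Tabs and Spaces per Tab on the whitespace an indent function inserts can be sketched as follows. The helper indentation-string is hypothetical, takes the two variables as plain arguments, and assumes indentation starts at column 0; the default of 8 spaces per tab is an assumption of this sketch.

```lisp
;;; Hypothetical helper: the whitespace needed to reach COLUMN from
;;; column 0, using tabs where possible when USE-TABS is true.
(defun indentation-string (column &key (use-tabs nil) (spaces-per-tab 8))
  (if use-tabs
      ;; As many whole tabs as fit, then spaces for the remainder.
      (multiple-value-bind (tabs spaces) (floor column spaces-per-tab)
        (concatenate 'string
                     (make-string tabs :initial-element #\Tab)
                     (make-string spaces :initial-element #\Space)))
      (make-string column :initial-element #\Space)))
```

For instance, column 10 with tabs enabled and 8 spaces per tab becomes one tab followed by two spaces.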
indent-region invokes the value of Indent Function on every line of region. indent-region-
for-commands uses indent-region but first saves the region for the Undo command.
This deletes all characters on either side of mark with a Space attribute (see System
Defined Character Attributes) of 1.
Hemlock bases its Lisp primitives on parsing a block of the buffer and annotating lines as
to what kind of Lisp syntax occurs on the line or what kind of form a mark might be in (for
example, string, comment, list, etc.). These do not work well if the block of parsed forms is
exceeded when moving marks around these forms, but the block that gets parsed is
somewhat programmable.
There is also a notion of a top level form which this documentation often uses
synonymously with defun, meaning a Lisp form occurring in a source file delimited by
parentheses with the opening parenthesis at the beginning of some line. The names of the
functions include this inconsistency.
pre-command-parse-check calls Parse Start Function and Parse End Function on mark to
get two marks. It then parses all the lines between the marks including the complete lines
they point into. When for-sure is non-nil, this parses the area regardless of any cached
information about the lines. Every command that uses the following routines calls this
before doing so.
The default values of the start and end variables use Minimum Lines Parsed, Maximum
Lines Parsed, and Defun Parse Goal to determine how big a region to parse. These two
functions always include at least the minimum number of lines before and after the mark
passed to them. They try to include Defun Parse Goal number of top level forms before and
after the mark passed to them, but these functions never return marks that include more than
the maximum number of lines before or after the mark passed to them.
This tries to move mark count forms forward if positive or -count forms backwards if
negative. Mark is always moved. If there were enough forms in the appropriate direction,
this returns mark, otherwise nil.
This tries to move mark count top level forms forward if positive or -count top level forms
backwards if negative. If there were enough top level forms in the appropriate direction,
this returns mark, otherwise nil. Mark is moved only if this is successful.
This moves mark1 and mark2 to the beginning and end, respectively, of the current or next
top level form. Mark1 is used as a reference to start looking. The marks may be altered even
if unsuccessful. If successful, return mark2, else nil. Mark2 is left at the beginning of the
line following the top level form if possible, but if the last line has text after the closing
parenthesis, this leaves the mark immediately after the form.
This returns a region around the current or next defun with respect to mark. Mark is not
used to form the region. If there is no appropriate top level form, this signals an editor-
error. This calls pre-command-parse-check first.
These return, respectively, whether mark is inside a top level form or at the beginning of a
line immediately before a character whose Lisp Syntax (see System Defined Character
Respectively, these move mark immediately past a character whose Lisp Syntax (see
System Defined Character Attributes) value is :closing-paren or immediately before a
character whose Lisp Syntax value is :opening-paren.
This returns t or nil depending on whether the character indicated by mark is a valid spot.
When forwardp is non-nil, this uses the character after mark; otherwise, it uses the character
before mark. Valid spots exclude commented text, the insides of strings, and quoted characters.
This defines the function with name to have count special arguments. indent-for-lisp, the
value of "Indent Function" in Lisp mode, uses this to specially indent these
arguments. For example, do has two, with-open-file has one, etc. There are many of these
defined by the system including definitions for special Hemlock forms. Name is a simple-
string, case insensitive and purely textual (that is, not read by the Lisp reader); therefore,
"with-a-mumble" is distinct from "mumble:with-a-mumble".
This section describes some routines that understand basic English language forms.
This moves mark count words forward (if positive) or backwards (if negative). If mark is in
the middle of a word, that counts as one. If there were count (-count if negative) words in
the appropriate direction, this returns mark, otherwise nil. This always moves mark. A
word lies between two characters whose Word Delimiter attribute value is 1 (see System
Defined Character Attributes).
This moves mark count sentences forward (if positive) or backwards (if negative). If mark
is in the middle of a sentence, that counts as one. If there were count (-count if negative)
sentences in the appropriate direction, this returns mark, otherwise nil. This always moves
mark.
A sentence ends with a character whose Sentence Terminator attribute is 1 followed by two
spaces, a newline, or the end of the buffer. The terminating character is optionally followed
by any number of characters whose Sentence Closing Char attribute is 1. A sentence begins
after a previous sentence ends, at the beginning of a paragraph, or at the beginning of the
buffer.
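The sentence-ending rule can be checked on a plain string. In this minimal sketch, sentence-end-p is hypothetical, and the two character lists are assumed stand-ins for the Sentence Terminator and Sentence Closing Char attributes; Hemlock reads those attributes from its character attribute tables instead.

```lisp
(defparameter *sentence-terminators* '(#\. #\? #\!)
  "Assumed stand-in for characters whose Sentence Terminator attribute is 1.")
(defparameter *sentence-closers* '(#\" #\' #\) #\])
  "Assumed stand-in for characters whose Sentence Closing Char attribute is 1.")

(defun sentence-end-p (text index)
  "True if the terminator at INDEX in TEXT ends a sentence."
  (when (member (char text index) *sentence-terminators*)
    (let* ((len (length text))
           ;; Skip any closing characters after the terminator.
           (pos (or (position-if-not (lambda (c) (member c *sentence-closers*))
                                     text :start (1+ index))
                    len)))
      (or (= pos len)                              ; end of buffer
          (char= (char text pos) #\Newline)        ; newline
          (and (< (1+ pos) len)                    ; two spaces
               (char= (char text pos) #\Space)
               (char= (char text (1+ pos)) #\Space))))))
```

Under this rule, the period in "Mr. Smith" does not end a sentence because only one space follows it, while a closing quote after a period does not prevent a sentence end.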
This moves mark count paragraphs forward (if positive) or backwards (if negative). If mark
is in the middle of a paragraph, that counts as one. If there were count (-count if negative)
paragraphs in the appropriate direction, this returns mark, otherwise nil. This only moves
mark if there were enough paragraphs.
Paragraph Delimiter Function holds a function that takes a mark, typically at the beginning
of a line, and returns whether or not the current line should break the paragraph. default-
para-delim-function returns t if the next character, the first on the line, has a Paragraph
Delimiter attribute value of 1. This is typically a space, for an indented paragraph, or a
newline, for a block style. Some modes require a more complicated determinant; for
example, Scribe mode adds some characters to the set and special-cases certain formatting
commands.
Prefix defaults to "Fill Prefix", and the right prefix is necessary to correctly skip
paragraphs. If prefix is non-nil, and a line begins with prefix, then the scanning process
skips the prefix before invoking the Paragraph Delimiter Function. Note, when scanning
for paragraph bounds, and prefix is non-nil, lines are potentially part of the paragraph
regardless of whether they contain the prefix; only the result of invoking the delimiter
function matters.
The programmer should be aware of an idiom for finding the end of the current paragraph.
Assume paragraphp is the result of moving mark one paragraph, then the following
correctly determines whether there actually is a current paragraph:
(or paragraphp
    (and (last-line-p mark)
         (end-line-p mark)
         (not (blank-line-p (mark-line mark)))))
In this example, mark is at the end of the last paragraph in the buffer, and there is no final
newline character in the buffer. paragraph-offset would have returned nil because it could not
skip any paragraphs: mark was already at the end of the current and last paragraph. However,
you have still found a current paragraph on which to operate. mark-paragraph understands
this problem.
This marks the next or current paragraph, setting mark1 to the beginning and mark2 to the
end. This uses "Fill Prefix". Mark1 is always on the first line of the paragraph,
regardless of whether the previous line is blank. Mark2 is typically at the beginning of the
line after the line the paragraph ends on. On success, this returns mark2. If this cannot find
a paragraph, then the marks are left unmoved, and nil is returned.
Logical Pages
Filling
Filling is an operation on text that breaks long lines at word boundaries before a given
column and merges shorter lines together in an attempt to make each line roughly the
specified length. This is different from justification which tries to add whitespace in
awkward places to make each line exactly the same length. Hemlock's filling optionally
inserts a specified string at the beginning of each line. Also, it eliminates extra whitespace
between lines and words, but it knows two spaces follow sentences.
These variables hold the default values of the prefix and column arguments to Hemlock's
filling primitives. If Fill Prefix is nil, then there is no fill prefix.
This deletes any blank lines in region and fills it according to prefix and column. Prefix and
column default to Fill Prefix and Fill Column.
This finds paragraphs within region and fills them with fill-region. This ignores blank lines
between paragraphs. Prefix and column default to Fill Prefix and Fill Column.
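The filling behavior described above can be sketched as a greedy line packer over a string. This is not Hemlock's fill-region, which operates on regions and marks: split-words, gap-after, and fill-text are hypothetical names, the column default of 70 is arbitrary, and a sentence end is approximated as a word ending in a period, question mark, or exclamation point.

```lisp
(defun split-words (text)
  "Return the runs of non-whitespace characters in TEXT."
  (let ((words '()) (start nil))
    (dotimes (i (length text))
      (if (member (char text i) '(#\Space #\Tab #\Newline))
          (when start
            (push (subseq text start i) words)
            (setf start nil))
          (unless start (setf start i))))
    (when start (push (subseq text start) words))
    (nreverse words)))

(defun gap-after (line)
  "Two spaces after a sentence-ending word, otherwise one."
  (if (member (char line (1- (length line))) '(#\. #\? #\!)) "  " " "))

(defun fill-text (text &key (prefix "") (column 70))
  "Greedily pack the words of TEXT into lines of at most COLUMN characters,
each starting with PREFIX.  Extra whitespace and blank lines are dropped."
  (let ((lines '()) (line nil))
    (dolist (word (split-words text))
      (let ((candidate (if line
                           (concatenate 'string line (gap-after line) word)
                           (concatenate 'string prefix word))))
        (cond ((and line (> (length candidate) column))
               (push line lines)                 ; line is full: start a new one
               (setf line (concatenate 'string prefix word)))
              (t (setf line candidate)))))
    (when line (push line lines))
    (nreverse lines)))
```

A single word longer than the column still gets a line of its own, since breaking only happens at word boundaries.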
Utilities
This chapter describes a number of utilities for manipulating some types of objects
Hemlock uses to record information. String-tables are used to store names of variables,
commands, modes, and buffers. Ring lists can be used to provide a kill ring, recent
command history, or other user-visible features.
String-table Functions
String tables are similar to Common Lisp hash tables in that they associate a value with an
object. There are a few useful differences: in a string table the key is always a case
insensitive string, and primitives are provided to facilitate keyword completion and
recognition. Any type of string may be added to a string table, but the string table functions
always return simple-strings.
A string entry in one of these tables may be thought of as being separated into fields or
keywords. The interface provides keyword completion and recognition which is primarily
used to implement some Echo Area commands. These routines perform a prefix match on
a field-by-field basis allowing the ambiguous specification of earlier fields while going on to
enter later fields. While string tables may use any string-char as a separator, the use of
characters other than space may make the Echo Area commands fail or work unexpectedly.
make-string-table [Function]
This function creates an empty string table that uses separator as the character, which
must be a string-char, that distinguishes fields. Initial-contents specifies an initial set of
strings and their values in the form of a dotted a-list, for example:
'(("Global" . t) ("Mode" . t) ("Buffer" . t))
delete-string removes any entry for string from the string-table table, returning t if there
was an entry. clrstring removes all entries from table.
This function returns as multiple values, first the value corresponding to the string if it is
found and nil if it isn't, and second t if it is found and nil if it isn't.
This may be set with setf to add a new entry or to store a new value for a string. It is an
error to try to insert a string with more than one field separator character occurring
contiguously.
This function completes string as far as possible over the list of tables, returning five
values. It is an error for tables to have different separator characters. The five return values
are as follows:
:none
:complete
The completion is a valid entry, but other valid completions exist too. This occurs
when the supplied string is an entry as well as an initial substring of another entry.
:unique
:ambiguous
The completion is invalid; get-string would return nil and nil if given the returned
string.
The value of the string when the completion is :unique or :complete, otherwise nil.
An index, or nil, into the completion returned, indicating where the addition of a
single field to string ends. The command Complete Field uses this when the
completion contains the addition to string of more than one field.
An index to the separator following the first ambiguous field when the completion is
:ambiguous or :complete, otherwise nil.
find-ambiguous returns a list in alphabetical order of all the strings in table matching
string. This considers an entry as matching if each field in string, taken in order, is an
initial substring of the entry's fields; entry may have fields remaining.
find-containing is similar, but it ignores the order of the fields in string, returning all
strings in table matching any permutation of the fields in string.
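The field-by-field matching rule used by find-ambiguous can be sketched over a plain list of strings. This is a hypothetical illustration, not a real string table: split-fields, field-prefix-p, fields-match-p, and find-ambiguous-sketch are invented names, matching is case-insensitive, and #\Space is assumed as the separator.

```lisp
(defun split-fields (string &optional (separator #\Space))
  (loop for start = 0 then (1+ end)
        for end = (position separator string :start start)
        collect (subseq string start end)
        while end))

(defun field-prefix-p (prefix field)
  "True if PREFIX is a case-insensitive initial substring of FIELD."
  (and (<= (length prefix) (length field))
       (string-equal prefix field :end2 (length prefix))))

(defun fields-match-p (input entry)
  "Each field of INPUT, in order, prefixes the corresponding ENTRY field;
ENTRY may have fields remaining."
  (let ((in (split-fields input))
        (en (split-fields entry)))
    (and (<= (length in) (length en))
         (every #'field-prefix-p in en))))

(defun find-ambiguous-sketch (input entries)
  "All ENTRIES matching INPUT, in alphabetical order."
  (sort (copy-list (remove-if-not (lambda (e) (fields-match-p input e)) entries))
        #'string-lessp))
```

With entries "Fill Column", "Fill Prefix", "Find File", and "Set Fill Prefix", the input "fi p" matches only "Fill Prefix", while "f" matches the first three.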
This macro iterates over the strings in table in alphabetical order. On each iteration, it
binds string-var to an entry's string and value-var to an entry's value.
Ring Functions
There are various purposes in an editor for which a ring of values can be used, so Hemlock
provides a general ring buffer type. It is used for maintaining a ring of killed regions, a ring
of marks, or a ring of command strings which various modes and commands maintain as a
history mechanism.
Makes an empty ring object capable of holding up to length Lisp objects. Delete-function is
a function that each object is passed to before it falls off the end. Length must be greater
than zero.
Returns as multiple-values the number of elements which ring currently holds and the
maximum number of elements which it may hold.
Returns the index'th item in the ring, where zero is the index of the most recently pushed.
Pushes object into ring, possibly causing the oldest item to go away.
Removes the most recently pushed object from ring and returns it. If the ring contains no
elements then an error is signalled.
With a positive offset, rotates ring forward that many times. In a forward rotation the index
of each element is reduced by one, except the one which initially had a zero index, which is
made the last element. A negative offset rotates the ring the other way.
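The ring semantics above (bounded size, a delete function for items that fall off, index 0 as the most recent item, rotation) can be sketched with a list kept newest-first. This is not Hemlock's ring implementation; every sketch-ring name is hypothetical, and rotation here only moves the items currently held.

```lisp
(defstruct (sketch-ring (:constructor %make-sketch-ring))
  (items '() :type list)                 ; most recently pushed first
  (capacity 1 :type (integer 1))
  (delete-function #'identity :type function))

(defun make-sketch-ring (size &optional (delete-function #'identity))
  (unless (plusp size) (error "Ring size must be greater than zero."))
  (%make-sketch-ring :capacity size :delete-function delete-function))

(defun sketch-ring-length (ring)
  "Current number of items and maximum number of items."
  (values (length (sketch-ring-items ring)) (sketch-ring-capacity ring)))

(defun sketch-ring-ref (ring index)
  (nth index (sketch-ring-items ring)))

(defun sketch-ring-push (object ring)
  (push object (sketch-ring-items ring))
  ;; If the ring overflows, the oldest item falls off the end.
  (when (> (length (sketch-ring-items ring)) (sketch-ring-capacity ring))
    (let ((oldest (car (last (sketch-ring-items ring)))))
      (setf (sketch-ring-items ring) (butlast (sketch-ring-items ring)))
      (funcall (sketch-ring-delete-function ring) oldest)))
  object)

(defun sketch-ring-pop (ring)
  (if (sketch-ring-items ring)
      (pop (sketch-ring-items ring))
      (error "Ring is empty.")))

(defun rotate-sketch-ring (ring offset)
  "Positive OFFSET moves index 0 to the end OFFSET times; negative reverses."
  (let* ((items (sketch-ring-items ring))
         (n (length items)))
    (when (plusp n)
      (let ((k (mod offset n)))
        (setf (sketch-ring-items ring)
              (append (nthcdr k items) (subseq items 0 k)))))
    ring))
```

After pushing a, b, and c into a ring of size 2, a is passed to the delete function, c is at index 0, and rotating forward once puts b at index 0.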
Undoing commands
Miscellaneous
This chapter is somewhat of a catch-all for comments and features that don't fit well
anywhere else.
Key-events
Introduction
The canonical representation of editor input is a key-event structure. Users can bind
commands to keys, which are non-empty sequences of key-events. A key-event consists of
an identifying token known as a keysym and a field of bits representing modifiers. Users
define keysyms by supplying names that reflect the legends on their keyboard's keys. Users
define modifier names similarly, but the system chooses the bit and mask for recognizing
the modifier. You can use keysym and modifier names to textually specify key-events and
Hemlock keys in a #k syntax. The following are some examples:
#k"C-u"
#k"Control-u"
#k"c-m-z"
#k"control-x meta-d"
#k"a"
#k"A"
#k"Linefeed"
This is convenient for use within code and in init files containing bind-key calls.
The #k syntax is delimited by double quotes, but the system parses the contents rather
than reading it as a Common Lisp string. Within the double quotes, spaces separate
multiple key-events. A single key-event optionally starts with modifier names terminated
by hyphens. Modifier names are alphabetic sequences of characters which the system uses
You can escape special characters (hyphen, double quote, open angle bracket, close angle
bracket, and space) with a backslash, and you can specify a backslash by using two
contiguously. You can use angle brackets to enclose a keysym name with many special
characters in it. Between angle brackets appearing in a keysym name position, there are
only two special characters, the closing angle bracket and backslash.
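The tokenizing rules for the #k syntax (spaces between key-events, hyphen-terminated modifier names, backslash escapes, angle-bracketed keysym names) can be sketched as follows. This is a hypothetical illustration, not Hemlock's reader: parse-key-string returns one (modifier-names . keysym-name) pair per key-event and does not check that any name is actually defined.

```lisp
(defun parse-key-string (string)
  "Split the contents of a #k string into (MODIFIER-NAMES . KEYSYM-NAME) pairs."
  (let ((events '()) (mods '())
        (token (make-string-output-stream))
        (i 0) (n (length string)))
    (flet ((finish-event ()
             (let ((name (get-output-stream-string token)))
               (when (or mods (plusp (length name)))
                 (push (cons (reverse mods) name) events))
               (setf mods '()))))
      (loop while (< i n)
            do (let ((c (char string i)))
                 (cond ((char= c #\\)             ; backslash escapes next char
                        (incf i)
                        (when (< i n) (write-char (char string i) token)))
                       ((char= c #\<)             ; <...> encloses a keysym name
                        (incf i)
                        (loop while (and (< i n) (char/= (char string i) #\>))
                              do (when (char= (char string i) #\\) (incf i))
                                 (when (< i n) (write-char (char string i) token))
                                 (incf i)))
                       ((char= c #\-)             ; hyphen ends a modifier name
                        (push (get-output-stream-string token) mods))
                       ((char= c #\Space)         ; space separates key-events
                        (finish-event))
                       (t (write-char c token))))
               (incf i))
      (finish-event))
    (nreverse events)))
```

Under these rules, "c-m-z" yields the modifiers "c" and "m" with keysym "z", and "C-\-" (a Lisp string containing a backslash) denotes Control plus a literal hyphen.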
Interface
Keysym can be any object, but generally it is either an integer representing the window-
system code for the event, or a keyword which allows the mapping of the keysym to its code
to be defined separately.
Defines the window-system code for the key event which in Hemlock is represented by
keysym.
This function defines keysym named name for key-events representing mouse click events.
Shifted-bit is a defined modifier name that translate-mouse-key-event sets in the key-event
it returns whenever the shift bit is set in an incoming event.
This function returns the keysym named name. If name is unknown, this returns nil.
This function returns the list of all names for keysym. If keysym is undefined, this returns
nil.
This returns the preferred name for keysym, how it is typically printed. If keysym is
undefined, this returns nil.
This establishes long-name and short-name as modifier names for purposes of specifying
key-events in #k syntax. The names are case-insensitive strings. If either name is already
defined, this signals an error.
The system defines the following default modifiers (first the long name, then the short
name):
"Hyper", "H"
"Super", "S"
"Meta", "M"
"Control", "C"
"Shift", "Shift"
"Lock", "Lock"
*all-modifier-names* [Variable]
This function returns bits suitable for make-key-event from the supplied modifier-names.
If any name is undefined, this signals an error.
This function returns a mask for modifier-name. This mask is suitable for use with
key-event-bits. If modifier-name is undefined, this signals an error.
This returns a list of key-event modifier names, one for each modifier set in bits.
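The relationship between modifier names, masks, and bits described above can be sketched with a small table. All names here (define-modifier, modifier-mask, make-modifier-bits, bits-modifier-names) are hypothetical, and assigning bits in definition order is an assumption of this sketch; the real system chooses its own bits.

```lisp
(defparameter *modifiers* '()
  "Hypothetical table of (LONG-NAME SHORT-NAME MASK) entries.")

(defun modifier-entry (name)
  (find-if (lambda (entry)
             (or (string-equal name (first entry))
                 (string-equal name (second entry))))
           *modifiers*))

(defun define-modifier (long-name short-name)
  "Give a new modifier the next free bit; error if either name is taken."
  (when (or (modifier-entry long-name) (modifier-entry short-name))
    (error "Modifier name ~A or ~A is already defined." long-name short-name))
  (let ((mask (ash 1 (length *modifiers*))))
    (setf *modifiers*
          (append *modifiers* (list (list long-name short-name mask))))
    mask))

(defun modifier-mask (name)
  (let ((entry (modifier-entry name)))
    (unless entry (error "~S is not a defined modifier name." name))
    (third entry)))

(defun make-modifier-bits (&rest names)
  (reduce #'logior names :key #'modifier-mask :initial-value 0))

(defun bits-modifier-names (bits)
  (loop for (long nil mask) in *modifiers*
        when (logtest bits mask) collect long))

;; The default modifiers listed above, in that order.
(dolist (names '(("Hyper" "H") ("Super" "S") ("Meta" "M")
                 ("Control" "C") ("Shift" "Shift") ("Lock" "Lock")))
  (apply #'define-modifier names))
```

Because names are compared with string-equal, "C", "control", and "Control" all select the same mask, matching the case-insensitive names described above.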
This function returns a key-event described by object with bits. Object is one of keysym,
string, or key-event. When object is a key-event, this uses key-event-keysym. You can form
bits with make-key-event-bits or key-event-modifier-mask.
This function returns the key-event associated with character. You can associate a
key-event with a character by setf-ing this form.
This function returns the character associated with key-event. You can associate a
character with a key-event by setf'ing this form. By default, the system translates key-events
in some implementation-dependent way for text insertion; for example, under an ASCII
system, the key-event #k"C-h", as well as #k"backspace", would map to the Common Lisp
character that causes a backspace.
This function returns whether key-event has the bit set named by bit-name. This signals an
error if bit-name is undefined.
This macro evaluates each form with var bound to a key-event representing an alphabetic
character. Kind is one of :lower, :upper, or :both, and this binds var to each key-event in
order a-z A-Z. When :both is specified, this processes lowercase letters first.
? (require "PTY")
? (ccl::disable-tty-local-modes 0 #$ICANON)
T
will turn off "input canonicalization" on file descriptor 0, which is at least part of what you
need to do here. This disables the #$ICANON mode, which tells the OS not to do any
line-buffering or line-editing. Of course, this only has any effect in situations where the OS
ever does that, which means when stdin is a TTY or PTY.
(where the first READ-CHAR consumes the newline, which isn't really necessary to make
the reader happy anymore.) So, you can do:
? (read-char)
#\Space
(where there's a space after the close-paren) without having to type a newline.
I'm using the graphics demos. Why doesn't the menubar change?
When you interact with text-only Clozure CL, you're either in Terminal or in Emacs,
running Clozure CL as a subprocess. When you load Cocoa or the graphical environment,
the subprocess does some tricky things that turn it into a full-fledged Application, as far as
the OS is concerned.
So, it gets its own icon in the dock, and its own menubar, and so on. It can be confusing,
because standard input and output will still be connected to Terminal or Emacs, so you can
still type commands to Clozure CL from there. To see the menubar you loaded, or the
windows you opened, just click on the Clozure CL icon in the dock.
I'm using Slime and Cocoa. Why doesn't *standard-output* seem to work?
This comes up if you're using the Slime interface to run Clozure CL under Emacs, and you
are doing Cocoa programming which involves printing to *standard-output*. It seems as
though the output goes nowhere; no error is reported, but it doesn't appear in the
*slime-repl* buffer.
For the most part, this is only relevant when you are trying to insert debug code into your
event handlers. The SLIME listener runs in a thread where the standard stream variables
(like *STANDARD-OUTPUT* and *TERMINAL-IO*) are bound to the stream used to
communicate with Emacs; the Cocoa event thread has its own bindings of these standard
stream variables, and output to these streams goes to the *inferior-lisp* buffer instead.
Look for it there.
application bundle
ccl directory
The directory containing Clozure CL's source code and interface databases. The ccl
logical host should refer to this directory.
Cocoa
code point
A value in the Unicode code space; that is, a non-negative integer below
char-code-limit (#x110000).
creator code
displaced array
An array with no storage of its own for elements, which points to the storage of
another array, called its target. Reading or writing the elements of the displaced array
returns or changes the contents of the target.
fasl file
A file containing compiled lisp code that the Lisp is able to quickly load and use. A
"fast-load" file.
heap image
The in-memory state of a running Lisp system, containing functions, data structures,
variables, and so on. Also, a file containing archived versions of these data in a format
that can be loaded and reconstituted by the lisp kernel. A working Clozure CL
system consists of the kernel and a heap image.
Hemlock
A text editor, written in Common Lisp, similar in features to Emacs. Hemlock was
originally developed as part of CMU Common Lisp. A portable version of Hemlock is
built into the Clozure CL IDE.
IDE
InterfaceBuilder
An application supplied by Apple with their developer tools that can be used to
lisp kernel
The binary executable program that implements the lowest levels of the Lisp system.
A working Clozure CL system consists of the kernel and a heap image.
listener window
memory-mapped file
A file whose contents are accessible as a range of memory addresses. Some operating
systems support this feature, in which the virtual memory subsystem arranges for a
range of virtual memory addresses to point to the contents of an open file. Programs
can then gain access to the file's contents by operating on memory addresses in that
range. Access to the file's contents is valid only as long as the file remains open.
nibfile
REPL
s-expression
The simplest, most general element of Lisp syntax. An s-expression may be an atom
(such as a symbol, integer, or string), or it may be a list of s-expressions.
special variable
static variable
In Clozure CL, a variable whose value is shared across all threads, and which may not
be dynamically rebound. Changing a static variable's value in one thread causes all
threads to see the new value. Attempting to dynamically rebind the variable (for
instance, by using LET, or using the variable name as a parameter in a LAMBDA form)
signals an error.
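A sketch of the behavior described above, using the Clozure CL-specific DEFSTATIC, PROCESS-RUN-FUNCTION and JOIN-PROCESS from the CCL package (all three appear in the symbol index). The variable and function names are hypothetical. The exact error condition and its message when rebinding is attempted are assumptions, so the sketch only checks that some error is signaled.

```lisp
;; Clozure CL-specific: define a static (global, non-rebindable) variable.
(ccl:defstatic *request-count* 0)

(defun bump-request-count ()
  (incf *request-count*))

;; Run BUMP-REQUEST-COUNT in a separate thread and wait for it to finish.
;; Because the value is shared by all threads, this thread now sees 1.
(ccl:join-process (ccl:process-run-function "bumper" #'bump-request-count))

(defun rebinding-signals-error-p ()
  "Return T if trying to rebind *REQUEST-COUNT* with LET signals an error."
  (handler-case
      (progn (eval '(let ((*request-count* 100)) *request-count*))
             nil)
    (error () t)))
```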
toplevel function
The function executed by Lisp automatically once its startup is complete. Clozure
CL's default toplevel is the interactive read-eval-print loop that you normally use to
interact with Lisp. You can, however, replace the toplevel with a function of your own
design, changing Clozure CL from a Lisp development system into some tool of your
making.
type-specifier
An expression that denotes a type. Type specifiers may be symbols (such as CONS and
STRING), or they may be more complex s-expressions (such as (UNSIGNED-BYTE 8)).
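Type specifiers are accepted wherever Common Lisp expects a type, for example by the standard TYPEP and SUBTYPEP functions. The sketch below tests values against both symbol and compound type specifiers.

```lisp
(typep '(1 . 2) 'cons)                   ; => T    (symbol type specifier)
(typep "abc" 'string)                    ; => T
(typep 200 '(unsigned-byte 8))           ; => T    (compound type specifier)
(typep 256 '(unsigned-byte 8))           ; => NIL  (256 needs 9 bits)
(subtypep '(unsigned-byte 8) 'integer)   ; => T, T
```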
Exported Symbols
%%get-signed-longlong
%address-of %%get-unsigned-longlong
%ff-call %copy-float
%get-cstring %get-byte
%get-fixnum %get-double-float
%get-ptr %get-long
%get-signed-long %get-signed-byte
%get-single-float %get-signed-word
%get-unsigned-long %get-unsigned-byte
%get-word %get-unsigned-word
%incf-ptr %inc-ptr
%null-ptr %int-to-ptr
%ptr-eql %null-ptr-p
%reference-external-entry-point %ptr-to-int
%setf-macptr %set-toplevel
%str-from-ptr %stack-block
%word-to-int %vstack-block
accept-connection abort-break
add-auto-flush-stream accessor-method-slot-definition
add-direct-method add-dependent
add-feature add-direct-subclass
advise add-method
all-processes advisedp
application-init-file application-error
arglist apply-in-frame
assq arglist-string
base-string-p augment-environment
bitp bignump
caller-functions byte-length
catch-cancel cancel-terminate-when-unreachable
class-direct-default-initargs class-default-initargs
class-direct-subclasses class-direct-slots
class-finalized-p class-direct-superclasses
class-precedence-list class-own-wrapper
class-slots class-prototype
clear-clos-caches clear-all-gf-caches
clear-gf-cache clear-coverage
clear-open-file-streams clear-lock-acquisition-status
close-core clear-semaphore-notification-status
collect-heap-utilization close-shared-library
compile-ccl combine-coverage
compiler-let compile-user-function
compiler-macroexpand-1 compiler-macroexpand
compiler-warning-source-note compiler-warning-function-name
compute-applicable-methods-using-classes compute-applicable-methods
compute-default-initargs compute-class-precedence-list
compute-effective-slot-definition compute-effective-method
configure-egc compute-slots
copy-file constant-symbol-p
copy-instance copy-from-core
core-b core-all-processes
core-cdr core-car
core-find-class core-consp
core-find-process-for-id core-find-package
core-functionp core-find-symbol
core-hash-table-count core-gethash
core-instance-class core-heap-utilization
core-keyword-package core-instance-p
core-lfun-bits core-l
core-list core-lfun-name
core-map-symbols core-listp
core-nullp core-nth-immediate
core-object-typecode-type core-object-type-key
core-package-names core-package-name
core-print-call-history core-print
core-q core-process-name
core-symbol-name core-string=
core-symbol-plist core-symbol-package
core-symbolp core-symbol-value
core-uvector-p core-type-string
core-uvsize core-uvref
core-uvtypep core-uvtype
count-characters-in-octet-vector core-w
coverage-code-forms-total coverage-code-forms-covered
coverage-expressions-entered coverage-expressions-covered
coverage-functions-fully-covered coverage-expressions-total
coverage-functions-partly-covered coverage-functions-not-entered
coverage-source-file coverage-functions-total
coverage-unreached-branches coverage-statistics
create-directory cpu-count
create-interfaces create-file
current-directory current-compiler-policy
current-process-allocation-quantum current-file-compiler-policy
cwd current-time-in-nanoseconds
declaration-information dbg-form
def-foreign-type decode-string-from-octets
default-allocation-quantum def-load-pointers
defglobal defcallback
define-character-encoding define-callback
define-declaration define-character-encoding-alias
define-setf-method define-definition-type
defloadvar definition-type-name
defstaticvar defstatic
delq delete-directory
describe-character-encodings describe-character-encoding
directory-pathname-p direct-slot-definition-class
displaced-array-p directoryp
dotted-to-ipaddr dispose-heap-ivector
dparef dovector
edit-definition-p drain-termination-queue
egc effective-slot-definition-class
egc-configuration egc-active-p
enclose egc-enabled-p
ensure-class encode-string-to-octets
ensure-generic-function-using-class ensure-class-using-class
ensure-source-note-text ensure-simple-string
event-ticks eql-specializer-object
external extended-char-p
external-format-character-encoding external-call
external-process-error-stream external-format-line-termination
external-process-input-stream external-process-id
external-process-status external-process-output-stream
extract-specializer-names extract-lambda-list
fasl-concatenate false
finalize-inheritance ff-call
find-method-combination find-definition-sources
find-source-note-at-pc find-referencers
foreign-symbol-address fixnump
frame-function foreign-symbol-entry
frame-supplied-arguments frame-named-variables
free-static-conses free
full-pathname fset
function-args funcallable-standard-instance-access
function-name function-information
gc function-source-note
gc-retaining-pages gc-retain-pages
gc-verbose-p gc-verbose
gctime gccounts
generic-function-declarations generic-function-argument-precedence-order
generic-function-method-class generic-function-lambda-list
generic-function-methods generic-function-method-combination
get-character-encoding generic-function-name
get-encoded-string get-coverage
get-fpu-mode get-foreign-namestring
get-incremental-coverage get-gc-notification-threshold
get-setf-method get-output-stream-vector
get-string-from-user get-setf-method-multiple-value
grab-lock getenv
heap-allocation-allowed-p hash-table-weak-p
idom-heap-utilization heap-utilization
incremental-coverage-source-matches include
init-list-default incremental-coverage-svn-matches
ipaddr-to-dotted intern-eql-specializer
join-process ipaddr-to-hostname
lisp-heap-gc-threshold let-globally
local-filename list-character-encodings
local-port local-host
lock-acquisition-status local-socket-address
lookup-character-encoding lock-name
lookup-port lookup-hostname
mac-default-directory lsh
macroexpand-all macptrp
make-heap-ivector make-external-format
make-lock-acquisition make-lock
make-process make-population
make-record make-read-write-lock
make-semaphore-notification make-semaphore
make-truncating-string-stream make-socket
make-vector-output-stream make-vector-input-stream
map-core-areas map-call-frames
map-file-to-ivector map-dependents
map-heap-objects map-file-to-octet-vector
method-exists-p memq
method-generic-function method-function
method-name method-lambda-list
method-specializers method-qualifiers
native-to-pathname name-of
neq native-translated-namestring
nfunction new-compiler-policy
nremove note-open-file-stream
object-direct-size nstring-studlify
open-file-streams open-core
optimize-generic-function-dispatching open-shared-library
paref override-one-method-one-arg-dcode
parse-proc-maps parse-macro
parse-unsigned-integer parse-signed-integer
pointerp pathname-encoding-name
population-type population-contents
print-call-history pref
process-abort proc-maps-diff
process-allow-schedule process-allocation-quantum
process-enable process-creation-time
process-initial-form process-exhausted-p
process-interrupt process-input-wait
process-kill-issued process-kill
process-output-wait process-name
process-preset process-plist
process-reset process-priority
process-resume process-reset-and-enable
process-serial-number process-run-function
process-suspend-count process-suspend
process-total-run-time process-termination-semaphore
process-wait-with-timeout process-wait
proclaimed-special-p process-whostate
pui-stream psi-stream
ratiop quit
reader-method-class read-coverage-from-file
receive-from rebuild-ccl
release-lock record-source-file
remote-host remote-filename
remote-socket-address remote-port
remove-character-encoding-alias remove-auto-flush-stream
remove-direct-method remove-dependent
remove-feature remove-direct-subclass
remove-open-file-stream remove-method
report-compiler-warning repl-function-name
require-type report-coverage
reset-coverage reserved-static-conses
resolve-address reset-incremental-coverage
restore-coverage-from-file restore-coverage
rletz rlet
run-program rref
save-coverage save-application
select-item-from-list save-coverage-in-file
send-to semaphore-notification-status
set-current-file-compiler-policy set-current-compiler-policy
set-event-ticks set-development-environment
set-funcallable-instance-function set-fpu-mode
set-lisp-heap-gc-threshold set-gc-notification-threshold
setenv set-user-environment
shutdown setf-function-spec-name
signal-semaphore signal-external-process
slot-boundp-using-class signed-integer-to-binary
slot-definition-documentation slot-definition-allocation
slot-definition-initform slot-definition-initargs
slot-definition-location slot-definition-initfunction
slot-definition-readers slot-definition-name
slot-definition-writers slot-definition-type
slot-value-using-class slot-makunbound-using-class
socket-address-host socket-address-family
socket-address-port socket-address-path
socket-creation-error-code socket-connect
socket-creation-error-situation socket-creation-error-identifier
socket-error-code socket-error
socket-error-situation socket-error-identifier
socket-os-fd socket-format
source-note-end-pos socket-type
source-note-p source-note-filename
source-note-text source-note-start-pos
special-form-p sparef
specializer-direct-methods specializer-direct-generic-functions
static-cons standard-instance-access
stream-clear-input stream-advance-to-column
stream-deadline stream-clear-output
stream-direction stream-device
stream-force-output stream-eofp
stream-input-timeout stream-fresh-line
stream-line-length stream-line-column
stream-output-timeout stream-listen
stream-read-byte stream-peek-char
stream-read-char-no-hang stream-read-char
stream-read-line stream-read-ivector
stream-read-vector stream-read-list
stream-terpri stream-start-line-p
stream-write-byte stream-unread-char
stream-write-ivector stream-write-char
stream-write-string stream-write-list
string-size-in-octets stream-write-vector
structurep structure-typep
target-fasl-version symbol-value-in-process
terminate temp-pathname
termination-function terminate-when-unreachable
throw-cancel test-ccl
toplevel timed-wait-on-semaphore
toplevel-loop toplevel-function
transitive-referencers trace-function
try-lock true
tyo tyi
unadvise type-specifier-p
unmap-ivector uncompile-function
unsetenv unmap-octet-vector
untyi unsigned-integer-to-binary
unwatch unuse-interface-dir
update-dependent update-ccl
use-lisp-heap-gc-threshold use-interface-dir
uvref uvectorp
validate-superclass uvsize
wait-for-signal variable-information
watch wait-on-semaphore
whitespacep weak-gc-method
with-decoding-problems-as-errors with-cstrs
with-encoding-problems-as-errors with-encoded-cstrs
with-input-from-vector with-filename-cstrs
with-interrupts-enabled with-input-timeout
with-macptrs with-lock-grabbed
with-output-timeout with-open-socket
with-pointer-to-ivector with-output-to-vector
with-string-vector with-read-lock
with-write-lock with-terminal-input
without-duplicate-definition-warnings without-compiling-code-coverage
write-coverage-to-file without-interrupts
xcompile-ccl writer-method-class
*.fasl-pathname* xload-level-0
*alternate-line-terminator* *.lisp-pathname*
*application* *always-eval-user-defvars*
*backtrace-format* *autoload-lisp-package*
*backtrace-print-length* *backtrace-on-break*
*backtrace-show-internal-frames* *backtrace-print-level*
*break-loop-when-uninterruptable* *break-hook*
*break-on-warnings* *break-on-errors*
*command-line-argument-list* *check-call-next-method-with-args*
*compile-definitions* *compile-code-coverage*
*default-external-format* *current-process*
*default-line-termination* *default-file-character-encoding*
*disassemble-verbose* *default-socket-character-encoding*
*enable-automatic-termination* *elements-per-buffer*
*error-print-length* *error-print-circle*
*fasl-save-definitions* *error-print-level*
*fasl-save-local-symbols* *fasl-save-doc-strings*
*host-page-size* *heap-image-name*
*lisp-cleanup-functions* *ignore-extra-close-parenthesis*
*listener-indent* *lisp-startup-functions*
*load-preserves-optimization-settings* *listener-prompt-format*
*long-site-name* *loading-file-source-file*
*merge-compiler-warnings* *make-package-use-defaults*
*module-search-path* *module-provider-functions*
*pending-gc-notification-hook* *pathname-translations-pathname*
*print-simple-bit-vector* *print-abbreviate-quote*
*print-string-length* *print-simple-vector*
*quit-interrupt-hook* *print-structure*
*record-pc-mapping* *quit-on-eof*
*report-time-function* *record-source-file*
*restore-lisp-functions* *resident-editor-hook*
*save-definitions* *save-arglist-info*
*save-exit-functions* *save-doc-strings*
*save-source-locations* *save-local-symbols*
*short-site-name* *select-interactive-process-hook*
*signal-printing-errors* *show-restarts-on-break*
*terminal-character-encoding-name* *svn-program*
*top-error-frame* *ticks-per-second*
*trace-level* *trace-bar-frequency*
*trace-print-length* *trace-max-indent*
*trust-paths-from-environment* *trace-print-level*
*vector-output-stream-default-initial-allocation* *unprocessed-command-line-arguments*
*warn-if-redefine-kernel* *warn-if-redefine*
@ +null-ptr+