Strong Static Type Checking in Lisp
Strong Static Type Checking in Lisp
Robert L. Akers
Abstract
This thesis presents a type system which supports the strong static type checking of programs developed
in an applicative subset of the Common Lisp language. The Lisp subset is augmented with a guard
construct for function definitions, which allows type restrictions to be placed on the arguments. Guards
support the analysis and safe use of partial functions, like CAR, which are well-defined only for
arguments of a certain type.
A language of type descriptors characterizes the type domain. Descriptors are composed into function
signatures which characterize the guard and which map various combinations of actual parameter types to
possible result types. From a base of signatures for a small collection of primitive functions, the system
infers signatures for newly submitted functions.
The system includes a type inference algorithm which handles constructs beyond the constraints of
ML-style systems. Most notable are the free use of CONS to construct objects of undeclared type and the
use of IF forms whose two result components have unrelated types, resulting in ad hoc polymorphism.
Accordingly, the type descriptor language accommodates disjunction, unrestricted CONS, recursive type
forms, and ad hoc polymorphic function signatures. As with traditional type inference systems,
unification is a central algorithm, but the richness of our type language complicates many component
algorithms, including unification. Special treatment is given to recognizer functions, which are predicates
determining whether an object is of a particular type. Type inference in this setting is undecidable, so the
algorithm is heuristic and is not complete.
The semantics of the system are in terms of a function which determines whether values satisfy
descriptors with respect to a binding of type variables. The soundness of each signature emitted by the
system is validated by a signature checker, whose properties are formally specified with respect to the
formal semantics and proven to hold. The checker algorithm is substantially simpler than the inference
algorithm, as it need not perform operations such as discovering closed recursive forms. Thus, its proof is
both more straightforward to construct and easier to validate than a direct proof of the inference algorithm
would be.
Type Checking for Common Lisp 1
Introduction
Chapter 1
INTRODUCTION
Types in a programming language are a means of organizing the representation of program objects at a
level of abstraction appropriate to the language. As such, they provide a means for subjecting data
representation issues to distinct analysis. Inconsistent views of the representation can be isolated as type
errors. A common belief is that a very large portion of programming errors are manifested as type errors.
Early exposure of type errors reduces the cost of software development.
For the purposes of this discussion, I will adopt the following terminology:
• A type error occurs when an operation is applied to an operand of an incompatible type, or
when the type of a construct does not match that expected in its context.
• Static type checking is type checking performed by a compiler or pre-processor.
• Dynamic type checking is type checking performed while the program is running.
• A type system is a collection of rules for assigning type expressions to the various parts of a
program.
• A type checker implements a type system.
• A language is strongly typed if its compiler or another pre-processor can guarantee that the
programs it accepts will execute without type errors.1
• Polymorphic languages are those in which some values and variables may have more than
one type, and polymorphic functions are functions whose operands can have more than one
type.
• Polymorphic types are types whose operations are applicable to operands of more than one
type. [Cardelli 85]
• Type inference is the problem of determining the type of a language construct from the way
it is used.
1
These definitions, adopted by Aho, Sethi, and Ullman [Aho 86], differ from some others. For example, Horowitz [Horowitz 84]
says that a language is strongly typed if the type of all expressions is known at compile time. Cardelli and Wegner [Cardelli 85]
apply Horowitz’s definition of strongly typed to the term statically typed, and accept Aho’s definition of strong typing.
2 Type Checking for Common Lisp
Introduction
Common Lisp [Steele 90] is often referred to as an untyped programming language.2 To the contrary,
Lisp has a number of well-documented data types, including characters, various kinds of numbers, arrays,
and CONS-es. But rather than suggesting the absence of types in Lisp, the claim that Lisp is untyped
generally refers to one of two notions:
• There is little static type checking at function definition time or at compile time.
• In the vast universe of structures which can be composed with CONS, the most explicit
typing normally ascribed is simply that the structure is a CONS, i.e., it is an ordered pair
with a CAR (head) and a CDR (tail).
Although some Lisp compilers will do perfunctory type checking when constant values are embedded in a
program, there are generally no type restrictions on variables, and thus no type checking on variables is
possible until run-time, when the variables are bound to specific values. The inability to type-check
CONS structures is especially problematic, partly because of the ease with which arbitrary structures can
be created. A major deterrent to static type checking in Lisp is the presence of destructive operations on
CONS structures. If allowed in its full generality, this upsets the locality of type analysis and makes static
type checking infeasible. It would be possible to allow destructive updates which preserve the type of the
updated object, but even a determination that the type was being preserved would require a search of the
program context, since a function higher in the call tree might have a more restrictive view of the type
than is necessary in the local context.
The lack of a strong static type discipline hampers Common Lisp in several respects. Software engineers
highly value the early detection of errors, recognizing the high cost of correction later in the software
cycle. The lack of type checking and the lack of thorough typing of structured objects in Common Lisp is
a serious deterrent to the employment of Lisp. In its lack of type checking and type annotation, the
problem of maintaining type consistency in Lisp is much like the same problem in Backus’ original FP
language [Backus 78], and it has been noted that misunderstandings of data structures are the most
common errors in FP programming [Pozefsky 78]. If anything, the open-ended flexibility of Lisp makes
the problem even more serious than in FP.
Furthermore, the dynamic type checking complicates compilation and degrades run-time performance of
compiled programs. If a compiler is to generate code that supports diagnosis and debugging of type
errors, it must generate code that checks for type errors prior to any operation which may expose them.
The normal execution of this code incurs substantial performance overhead, which contributes to the
reputation of Lisp as being inefficient. To mitigate this inefficiency, Common Lisp [Steele 90] gives
compilers the option of offering as many as four safety levels. The KCL compiler [Yuasa 85], for
example, may be instructed to generate code without type checks that "runs" fast but behaves without
accountability when errors occur. Clearly, a better solution, which might be provided by strong static type
checking, is generation of fast code combined with the assurance by static analysis that type errors will
not occur. The program would then have the same behavior with respect to input and output, whether or
not type checking code were generated.
A third motivation lies in support of automated formal reasoning about Lisp programs. Typically,
reasoning about total functions is much more straightforward than dealing with partial functions. The
Boyer & Moore theorem prover Nqthm [Boyer & Moore 88], for example, implements a logic bearing a
clear resemblance to a functional subset of Lisp, but with the important distinction that all functions, in
particular functions like CAR, are total and default to standard values in cases where Lisp functions would
2
Cardelli [Cardelli 89] refers to Lisp as a type-free language, meaning that there is run-time typing, but no static typing.
Type Checking for Common Lisp 3
Introduction
break. If the simplicity of having a domain of total functions were to carry over to the analysis of
Common Lisp functions, some mechanism would be necessary to guarantee that function calls like (CAR
3) could never occur in functions admitted to the logic. Static type analysis is one obvious way to
approach this problem.
Fortunately, a functional subset of Common Lisp may lend itself to this kind of analysis. Consider a
subset which is purely applicative (with no destructive list operators) and is thereby referentially
transparent.3With such a subset, Hindley-Milner style type inference may be possible, as it is in
ML [Hindley 69, Milner 78]. Lisp, however, is much more flexible than ML-style languages in ways that
will complicate the straightforward classical inference algorithms. The crux of this thesis is to deal with
these complications in developing a strong static polymorphic type checking system for an applicative
subset of Common Lisp.
A desirable side effect of the effort deals with the clarity of Lisp programs. Pre- and
post-conditions [Hoare 69], in addition to being useful tools for formal reasoning about imperative
programs, are valuable program documentation. They provide a specification of the domain and range of
a function which a user may use both as an aid to writing the function and as a guide in determining when
an existing function may be properly utilized. Type signatures are weak pre- and post-conditions. Since
our strong static typing discipline results in type signatures for functions, this expressive benefit can be
added to Lisp functions where it has previously been missing.
This effort targets a subset of Common Lisp which is applicative, and hence has relatively uncomplicated
semantics, yet which presents some type checking problems which are not faced by the family of ML-
style languages. What about this subset, then, lies outside the well traveled territory of ML and its
descendents?
ML supports parametric polymorphism of functions (a term originated by Strachey [Strachey 67]) which
means that functions may work normally on an infinite number of types, all of which have a given
common structure. Lisp functions, however, may exhibit both parametric polymorphism and ad hoc
polymorphism (also due to Strachey), whereby functions may work on objects of unrelated types, perhaps
by executing different code. Furthermore, the various objects returned by a given Lisp function may be of
completely unrelated types. One may view all Lisp conditional forms as variations of IF. The IF
expression was invented by McCarthy [McCarthy 62] apparently to support the explicit definition of
recursive functions, but it may be used in a more general manner to achieve ad hoc polymorphism. The
two result arms of Lisp IF forms may have completely different types, and both must be considered
options for the result type. By contrast, the result arms of IF forms in ML-style languages must have the
same type, or at least unifiable types. This feature of Lisp would throw a significant monkey wrench in
ML type inference by limiting the use of classical unification, which is ubiquitous in the inference
algorithm.
Another equally critical distinction is that Lisp offers relative flexibility in the composition of structures.
The ML family offers only restricted versions of structured types: lists, all of whose elements must be of
the same type; tuples, which are distinct from lists and whose elements are of specified type; and
3
Referential transparency is the property that any expression can be replaced by any other expression which has the same value
without affecting the value of any larger expression of which it is a part. Conventionally stated, "Equals can be substituted for
equals."
4 Type Checking for Common Lisp
Introduction
explicitly prescribed structures like the algebraic types of Miranda [Turner 85]. Declarations of algebraic
types are required and specify unique constructor and selector functions. There is no commonality among
these structured types. They are constructed differently, represented differently, and are not comparable.
All are strictly typed. A representation of the type structure of Miranda, for example, in terms of a
grammar for type descriptors, might be (where italicized identifiers are type variables in the type
descriptor grammar and *, **, ... are generic type variables in the Miranda language):
<type> ::= NUM | CHAR | BOOL | (*LIST <type>) |
(*TUPLE t1, .., tn), n >= 2 |
t1 -> t2 {function types} |
*, **, ... | UNDEF | <algebraic> | <type variable>
Algebraic types, described with the <algebraic> descriptor, are like tuples in that they can be used to
construct mixed structures, but only if the algebraic type is previously declared. Algebraic objects are
tagged in the concrete representation.
Lisp CONS structures, on the other hand, are arbitrarily irregular. Lists, all manner of trees, and arbitrary
tuples are all constructed with the binary CONS function. S-expressions may be viewed as having
multiple types, in the sense that a singleton list may also be considered to be a singleton tuple or a brutally
pruned tree. Even the predefined value NIL is overloaded, since it is routinely viewed as both a (negative)
Boolean value and as an empty list. Dealing with this flexibility raises significant complications in many
respects of the type inference problem, including canonicalization of type information, unification of type
descriptors, and dealing with a multiplicity of potential recognizer functions.
A grammar for type descriptors for a functional Common Lisp might be:
<descriptor> ::= <simple descriptor> | <variable> | *EMPTY |
*UNIVERSAL | (*CONS <descriptor> <descriptor> ) |
(*OR <descriptor>* ) | <rec descriptor>
This is, in fact, the grammar employed in the function signature checker described in this thesis. The
*CONS descriptor constructs a binary object from arbitrary pieces. The *OR descriptor can be used to
characterize ad hoc polymorphism of function types. These characterize two of the factors which will
most distinguish Lisp type inference from the classical algorithms. *REC descriptors, which allow for
arbitrary recursive structures using the *CONS and *OR constructs, extend the impact of these
distinctions.
Type Checking for Common Lisp 5
Introduction
Yet another characteristic which will distinguish a Lisp type system from an ML-style system is the
structure of function signatures. A function signature typically maps a list of parameter types to a result
type. For instance, the type of a binary integer addition function can be expressed:
(INTEGER INTEGER) -> INTEGER
ML-style function types need only include a single expression of this form, even when they are
polymorphic, since their polymorphism is parametric. For example, an ML function extracting the first
element of a list would have the signature:
(* list) -> *
where the "*" is a type parameter which can be instantiated with any ML type. But the ad hoc
polymorphism of Common Lisp functions means that often no single expression of this form can capture a
function signature. Differently typed parameters can produce results of unrelated type. A schema for
Lisp signatures can accommodate ad hoc polymorphism by including multiple components, or segments.
For example, a modified CAR function which defaults to 0 if the parameter is something other than a
CONS might have the signature:
(*CONS &1 *UNIVERSAL) -> &1
Multiple-segment signatures complicate the type inference problem for Common Lisp and further
distinguish it from ML-style type inference.
1.3 Synopsis
This thesis presents a type system which supports the strong static type checking of programs developed
in a simple subset of the Common Lisp language. A language of type descriptors characterizes the type
domain. Descriptors are composed into function signatures which characterize the type requirements
placed on actual parameters to the functions, and which map various combinations of actual parameter
types to possible result types. From a base of signatures for a small collection of primitive functions, the
system infers signatures for newly submitted functions. The signature inferred for each new function is
shown to be sound with respect to a formal semantics for the type system by application of a signature
checker whose properties are formally specified and proven to hold.
The language subset is limited to functions of fixed arity, where the only forms allowed in the body are
references to the formal parameters, IF forms of arity three, calls of previously defined functions, literals
(of type character, integer, rational, T, NIL, or string), and quoted forms where the quoted object is either
a literal, a symbol, or a CONS of any such objects (recursively). All data objects are assumed to be of
finite size. Although there is no formal requirement that functions always terminate, the inference
algorithm was designed with terminating functions in mind, and submission of non-terminating functions
may result in weak results.
We augment the Common Lisp subset with a guard construct, which may be included as part of a function
definition. A guard is a predicate on the function arguments which essentially restricts the domain of the
function. The function is well-defined on some arguments only if the guard evaluates to a non-NIL value.
Typically, a guard can characterize type restrictions on the parameters which will guarantee the absence of
type errors in the function body.
6 Type Checking for Common Lisp
Introduction
The type system includes a type inference algorithm in the tradition of ML. But it handles constructs
which violate the constraints of ML and its successors, most notably the free use of CONS to construct
objects of undeclared type and the use of IF forms whose two result arms have objects of totally unrelated
type, resulting in ad hoc polymorphism. Accordingly, in addition to the primitive types of Lisp, the type
descriptor language includes type variables and forms for disjunction, unrestricted use of CONS, and a
similarly unrestricted recursive type constructor. As with ML type systems, unification is the central
algorithm. But because of the richness of the type language, and since unification is performed as part of
a process of discovering closed form descriptors for recursively constructed data structures, the unification
algorithm is vastly more complicated.
The type descriptor language is used to express type signatures for functions. Among other things, a
signature includes a guard descriptor,which is a list of descriptors whose length is equal to the arity of the
function, and whose elements each characterize the type requirements on the corresponding function
argument. The signature also includes a collection of segments, where a segment is a list of argument
type descriptors paired with a result descriptor. For any values satisfying the guard of a function, there
exists a segment for the function such that the values satisfy the argument descriptors in the segment, and
the result of evaluating the function on those values satisfies the result descriptor. The system maintains a
database of signatures for all existing functions. Given a new function, the inference system attempts to
generate a signature. When it succeeds, it adds it to the database. When the function’s guard can be
completely captured within the type system, and when the same is true for all the functions in its call tree,
we will show that the signature is correct, that whenever the function’s guard is satisfied by some actual
parameters (i.e., the guard evaluates to a non-NIL value), evaluating the function call will not result in a
guard violation, and furthermore that to establish that the function’s guard is satisfied, it is sufficient to
show that its guard descriptors are satisfied by the actual parameters.
Typically, guards, as well as predicates used as IF tests, include applications of type recognizer functions
to formal parameters. A recognizer is a Boolean-valued, single-argument function of a certain form which
determines whether its argument has a particular type within the type system. Some recognizer functions,
such as INTEGERP, CHARACTERP, and CONSP, are in the primitive subset. But others may be
submitted as new functions to recognize objects of arbitraily complex type. Recognizer functions are
spotted by the inference tool and given special treatment, as they can play an important role in inferring
precise function signatures. In particular, a guard can be totally captured in the type system if it is
expressed as a conjunction of recognizer calls on distinct formal parameters. In this case, we say the
guard descriptor is complete.
The semantics of the system are in terms of a function which interprets descriptors and values with respect
to a binding of type variables. This function determines whether a value satisfies a descriptor under the
binding. The soundness of the system is guaranteed by a result checker, which has been proved correct.
The checker validates each signature emitted by the inference system. The checker algorithm is
substantially simpler than the inference algorithm, as it need not perform operations such as finding closed
recursive forms and negating descriptors. Thus, its proof is both simpler to construct and easier to
validate than a direct proof of the inference algorithm would be.
The inference system is not shown to be complete, as the general problem of type inference in this domain
is undecidable. Neither is the algorithm proven to terminate, though a collection of safeguards have
enabled it to withstand rigorous testing without an instance of non-termination.
The rest of this document is structured as follows. The next chapter is a survey of previous work related
to the topic. Chapter 3 is an overview of the complete system, including an introductory explanation of
function signatures and their derivation. Chapter 4 is a detailed guide to the type inference algorithm.
Type Checking for Common Lisp 7
Introduction
Chapter 5 presents the formal semantics of the type system. Chapter 6 is a detailed presentation of the
algorithm used to check the validity of function signatures generated by the inference algorithm. Chapter
7 is a proof of soundness for the checker described in Chapter 6. Since the proof is voluminous, many
significant components are incorporated by reference from a proof appendix. Chapter 8 describes the
performance of the system on an extensive test suite. Chapter 9 discusses future work on the topic and
makes concluding remarks. Other appendices include a description of the initial state of the system,
discussion of some alternate semantic models which were explored, and the Lisp code implementing some
selected functions in the system.
Several other supplementary materials are available via anonymous ftp at the time of writing. These
include the source code implementing the entire system, the postscript for this document, and a log of tests
conducted on the system. Appendix H describes everything that is available and gives instructions on
how to retrieve it.
8 Type Checking for Common Lisp
Introduction
Type Checking for Common Lisp 9
Background and Related Work
Chapter 2
BACKGROUND AND RELATED WORK
In this section, we will examine the various fundamental characterizations of type relevant to the body of
work on automatic type inference. Then we will survey the various related threads of research into
recursive types, functional programming languages, polymorphism, and type inference, with particular
emphasis on the lineage of ML. Next we will look at previous efforts to bring static type checking to Lisp
systems in particular, and finish with references to the relationship between formal type checking systems
and optimizing compilers.
The term type has roots in mathematics which significantly predate computers. Bertrand Russell [Russell
08] developed a theory of types to deal with paradoxes involving self-referencing sets. His notion of type
was "the range of significance of a propositional function, i.e, the collection of arguments for which the
said function has values." Church’s "A Formulation of the Simple Theory of Types" [Church 40] partially
incorporated the calculus of lambda conversion into a theory of types adapted from Russell’s original
theory. Hoare and Allison [Hoare 72] give a very readable presentation of the resolution of Russell’s
paradox with a typed system.
Though types have been manifest in programming languages since the inception of "high level" languages
like FORTRAN in the late 1950’s, for years their status was more intuitive than formal. The primary
motivation was to distinguish different classes of data, partly because of their different storage
requirements and partly because of their different semantic properties. The underlying intuition is that a
type characterizes a set of values, presumably with some semantic or structural commonality, which can
be assumed by a variable or expression. Hoare [Hoare 73] characterized a type with a grammar for the
language of constant expressions in the set, involving only basic constructor functions for the type. His
work was in the spirit of types as sets of values, but his formal, abstract approach filtered concrete
representation issues away and placed types in a more algebraic setting.
The Simula 67 language [Dahl 66] was the first appearance of the notion of classes, which bundle a set of
values with the primitive operations over the value set. The goal of the Simula mechanism was not the
enrichment of type systems, however; the first reference in the literature with this mindset may be due to
John Laski [Laski 68]. J. H. Morris also recognized the significance of classes in providing type
abstraction, and expanded on it in his work. Morris [Morris 73] maintained that the specification of a type
should include not only a set of values, but also a collection of basic operations for manipulating the
objects. This packaging of operations with types has become a central feature in the conventional style of
data abstraction supported in many programming languages. Examples of this approach include
10 Type Checking for Common Lisp
Background and Related Work
SML [MacQueen 84a], Ada [DoD 83], Miranda [Turner 85], and Quest [Cardelli 89].4 Object-oriented
programming languages also take Morris’ approach, though in a significantly different style.
Straying even further from the simple notion of types as sets of values, in the Russell language [Demers
78, Demers 80a] "a data type is a set of named operations that provide an interpretation of the values of a
single universal value space common to all types." Demers and Donahue were following the work of
Dana Scott, who defined types as retracts of a universal domain [Scott 76]. This treatment allows data
types to be treated as first class values. Russell accomplishes static type checking and has a general
parameterization mechanism supporting parameterized types, types as parameters, and a disciplined form
of self-application. Any construction in the language can be parameterized with respect to any of its free
identifiers using call-by-value semantics. The type system includes a calculus of signatures, which are
type descriptors associated with every identifier and expression. Demers and Donahue use this uniformity
to argue for "type completeness" as a language design principle [Demers 80b]. One impact of the
universal value space on the type system is that the notion of "error" becomes one of "misinterpretation"
of data caused by applying an operation inappropriate for that type of object. An excellent general
discussion of the Russell approach to types and type checking is presented in [Donahue 85].
MacQueen and Sethi [MacQueen 82] developed a model of types as ideals, or sets with an object domain
which satisfies certain closure conditions. Essentially, these conditions are that the structure characterized
by types is preserved when going "downward" to approximations and "upward" to least upper bounds of
consistent sets of values. Their formulation follows Shamir and Wadge [Shamir 77]5 and Milner [Milner
78]. Within their theory, polymorphic functions are expressed as conjunctions of types. They introduce
higher order types, or kinds, to help formulate operations for constructing new types from existing ones.
MacQueen, Sethi, and Plotkin [MacQueen 84b] further developed this work by developing a metric
structure on their sets to establish the existence and uniqueness of solutions of most recursive type
equations. The set of admissible type expressions are those which are contractive in their algebra.
In contrast, Leivant contends that a type is a structural condition on data objects [Leivant 83a]. Such a
condition can be conveyed fully by syntactic expressions in a type discipline which simply encode the
ways in which data objects are allowed to interact.
Type theorists are also exploring other approaches to the notion of type as it applies to programming.
Focusing on abstract data types, Goguen et al. [Goguen 78] define a type as an isomorphism class of an
initial (many-sorted) algebra6 in some category. Thereby they have a schema for discussing the algebraic
properties of a type at a high level of abstraction, while maintaining the ability to map this level of
abstraction to an isomorphic implementation. Kamin [Kamin 80] identifies a type with the isomorphism
class of a final algebra, whereby every element of a type is the sole member of its equivalence class with
respect to all the operations on the type, and thus the type is maximally compact. In Martin-Lof’s
constructive type theory, types correspond to propositions in a formal logic [Martin-Lof 79]. This
approach reduces logic to type theory, and it defines a constructive logic as opposed to the conventional
classical logic. The rules of logic are derived from the definitions and rules for types, and the result is a
natural deduction system which is strong enough to formalize constructive mathematics and which allows
types to be specified with extremely fine granularity. This view is also described by Constable [Constable
4
None of these languages apply this packaging uniformly over their type systems, however. Rather, they provide it as a special
feature for defining abstract data types; their primitive types are in the straightforward ALGOL/Pascal tradition.
5
Shamir and Wadge were among the first to formulate a semantics of data types wherein types were themselves first class objects
in the domain of data objects.
6
An algebra here is a family of sets with a collection of operations among them.
Type Checking for Common Lisp 11
Background and Related Work
80] and used in his PL/CV project and in the Nuprl environment [Constable 86].
Steensgaard-Madsen [Steensgaard-Madsen 89] proposes that a type is a set of functions defined in the
lambda calculus. All these efforts are motivated by factors not particularly related to the concerns of type
checking Lisp programs. They are nevertheless interesting in that they carry the potential benefit of type
systems much further into the semantic realm than the more syntactic purposes of guaranteeing
representation compatibility and supporting modularity.
Definition of recursive data types has been long supported in programming languages. Pointer types,
popularized in ALGOL68 [VanWijngaarden 69], PL/I [Beech 70], and Pascal [Wirth 71] support user-
defined recursive structures using pointers. A pointer mechanism is also implicit in the CONSes of
LISP [McCarthy 62].
Several formalizations of algebrae for expressing recursive types are in the literature.
McCarthy [McCarthy 63] and Hoare [Hoare 71, Hoare 73] each gave general algebraic specifications of
generalized recursive type definition schema and used their schema to help define sets of Lisp primitive
functions. Lewis and Rosen [Lewis 73] also formulated a view of recursive data types, one which allowed
them to resolve, for example, whether two types expressed in different terms within a type system in fact
represent the same type.
The most significant advances in type inference and polymorphism have come from a lineage of work
which culminated with ML, but which continues to this day. With foundations in the lambda calculus, the
type system and type inference mechanism supporting the ML language are milestones in the history of
programming language development. Because of its success, the ML type system has been employed
with minor variations in a number of more recent languages. The next section surveys the ML lineage,
including its roots in the study of lambda calculus and the continuing development of its ideas in
succeeding efforts. It is followed by sections covering other significant work on polymorphic type
systems and type inference algorithms. For more general background, Hudak [Hudak 90] provides an
excellent review of the functional programming paradigm, its history, and its underpinnings in the lambda
calculus.
J. H. Morris laid important theoretical groundwork for dealing with parametric polymorphism in his
thesis [Morris 68] on the lambda calculus. Although he did not specify a polymorphic type system, he
recognized the type inference problem and showed how a valid type assignment could be found for a
lambda calculus term by solving a set of simultaneous linear equations. His axiomatic description of
formal rules allowed type inference to be studied separately from the underlying intuition that types are
sets of values. His algorithm was the basis for the work of Milner and established a relationship between
type inference systems and functional programming which has been developed in most succeeding
functional language systems. In later work [Morris 73], Morris proposed a type system including
modules, which were an improvement over Simula’s classes [Dahl 66]. Though he was working primarily
to develop and strengthen notions of type abstraction, he observed that his type system provided a limited
form of polymorphism.
Reynolds [Reynolds 74] formalized a notion of type structure similar to Morris’, introducing an extension
12 Type Checking for Common Lisp
Background and Related Work
of the typed lambda calculus which supported user defined types and polymorphic functions. The key
idea was to introduce types as values and to allow them to be passed as arguments to functions and be
bound to type variables. This provided an explicit form of polymorphism in terms of type parameters, and
became known as the polymorphic or second order typed lambda calculus.7 The use of type parameters
was incorporated into the programming languages Alphard [Wulf 78], Russell [Demers 79], and
CLU [Liskov 76, Liskov 78, Liskov 81], but without the full generality of Reynolds’ system, since types
in those languages are identified with sets of operations, and thus the polymorphism is limited to the
operations used.
Hindley [Hindley 69] developed a method for deriving the principal type scheme of an object in
combinatory logic. A type scheme discovered by an algorithm is principal if it is the most general type
scheme which can be inferred for the expression from the rules and axioms of the type system. Principal
type schemes support the notion of polymorphic type, though Hindley did not use that terminology.
Hindley also noticed that the unification algorithm developed by Robinson [Robinson 65] is applicable to
the type inference problem.
ML [Gordon 79, Milner 84] is the landmark programming language for utilizing type inference in the
presence of parametric polymorphism. ML types can contain type variables that are instantiated to
different types in different contexts. Hence, it is possible to write functions which can operate uniformly
over objects of different type, so long as there is sufficient commonality among the objects to support the
operation. Milner [Milner 78] described the theory of polymorphism and type inference around which
ML was built. He demonstrated a type inference algorithm which accepts any well-typed ML program
and determines its most general type signature. Furthermore, he used the formal semantics of the ML
language to show that any program which is accepted by the type checking algorithm is guaranteed to be
free of type errors. Consequently, the implementation need not tag program objects for the purpose of
dynamic type checking.8 Since all types can be deduced by the system, the user does not need to provide
type declarations or type signatures on program objects.
The Hindley-Milner type system is based on an extension of the pure typed lambda calculus. It may seem
strange to say that a language which requires no type signatures has the typed lambda calculus as its
abstract model. But in fact, the only ML expressions which pass the type checker are those for which
types can be supplied to obtain valid typed lambda calculus expressions. Thus, the type inference
mechanism allows type structure to be automatically recovered from a program in which it may be
missing. Good, pragmatic discussions on Hindley-Milner style type inference are provided by Damas and
Milner [Damas 82], Cardelli [Cardelli 84a], and Peter Hancock [Hancock 87], the latter in terms of
Miranda. A clear discussion of the Hindley-Milner system contrasted with Reynolds’ polymorphic
lambda-calculus is given by Giannini [Giannini 85].
Hindley’s and Milner’s use of an extension to the pure calculus is motivated by the premise that the pure
typed lambda calculus is not sufficiently expressive. Fortune et al. [Fortune 83] showed that every term in
the calculus can be shown to be strongly normalizable, meaning that there exists a normal form and an
algorithm for reducing expressions to their normal form. Consequently, the family of functions which can
be computed (the primitive recursive functions) is too restricted, especially compared to those definable in
7
This calculus was invented independently by Jean-Yves Girard [Girard 72].
8
Moreover, Appel [Appel 89] asserts that run-time tags are not necessary for statically-typed polymorphic languages at all. Aside
from type checking, he asserts that tags are needed only for garbage collection, and he proposes a garbage collection algorithm
which makes use of type information which the compiler can provide. Though no such garbage collector has been implemented, he
used some reasonable assumptions to estimate that the performance of such a garbage collector would be comparable to that of a
conventional system and better in some scenarios.
Type Checking for Common Lisp 13
Background and Related Work
the second order typed lambda calculus. The pure calculus lacks the power of lambda-definability, thus
self-applicative forms, such as Church’s Y-combinator, which was used to prove his fixpoint theorem,
cannot be type checked. The fact that the pure typed lambda calculus has no general fixpoint operator can
be treated by adding a domain of constants to the typed lambda calculus, by including in the domain a
constant fixpoint operator for every type, and by adding a conversion rule for each of the fixpoint
operators. Polymorphism can then be achieved in the calculus by adding a domain of type variables and
extending the domain of types accordingly. But if the calculus includes type variables, there can be
problems with type checking. Type checking in this calculus is a variant of the problem of partial
polymorphic type inference, which Boehm showed to be undecidable [Boehm 85]. By placing some
judicious restrictions on expressions, Milner and Hindley were able to support polymorphism while
achieving decidable type inference. By not including explicit type variables in the programming language
(only in the algebra of types), they sidestepped the problems later examined by Boehm. One restriction
they impose is that the use of polymorphism is limited to the scope in which it is defined. Another is that
self-application is not allowed. Though these seem to be fundamental limitations, the class of
polymorphic functions which are allowed is quite substantial and sufficient to make the language quite
useful.
Kanellakis and Mitchell [Kanellakis 89] have recently made the disquieting observations that:
• The length of the principal type expression for an ML expression of length n is doubly
exponential in n.
• The unification of two type expressions for the ML core language is PSPACE hard.
• It is PSPACE hard to determine whether a given core ML expression is typable.
These observations contrast with the common belief, based on practical experience with ML type systems,
that ML typing is efficient and does not slow down the process of compilation. We can only speculate
that typical experience with ML type systems is with types of relatively small size or types not containing
problematic constructs, so the combinatorial explosion is of limited scale.
A number of languages have been developed in the tradition of ML, among them HOPE [Burstall 80],
Miranda [Turner 85], and Haskell [Hudak 90]. While ML contained some non-applicative constructs, all
of these descendants are functional languages. A goal of HOPE was to consolidate a number of well-
understood features of functional languages. In style and theory, it owes considerable debt to Landin and
his ISWIM language [Landin 66]. HOPE requires type signature declarations on all function definitions,
but allows polymorphism and does complete compile-time type checking. Its most significant new
contribution was the capability for a user to declare his own concrete and abstract data types and to
destructure them by pattern matching. This feature, not included in the original ML, was later
incorporated into SML, or Standard ML [Milner 84]. Miranda is the most recent of a series of languages
developed by David Turner. Preceded by SASL [Turner 76] and KRC [Turner 81] and inheriting many of
their concepts, Miranda provides nice syntactic sugar for functional programming structures and
emphasizes the use of higher order functions and lazy evaluation. It is the first commercially available
functional programming system. Haskell is a general purpose functional programming language which
attempts to consolidate the functional programming research of the 1980’s. Additionally, it provides tools
such as a module facility, a well defined I/O system, and a rich set of primitives to support larger scale
programming projects. Haskell’s approach to data abstraction treats algebraic specification and
information hiding as orthogonal concepts.
ML-style type inference has also been adapted to existing programming languages. Mycroft and O’Keefe,
for instance, developed a polymorphic type system for Prolog [Mycroft 84], and Dietrich and Hagl
modified and extended that type system to handle explicit subtype relationships [Dietrich 88].
Meertens [Meertens 83] gives an interesting account of a type inference system employed to support a
14 Type Checking for Common Lisp
Background and Related Work
non-applicative language, B. The inference algorithm is essentially Milner’s, but the organization of the
algorithm is such that it supports type checking by increments corresponding to minor changes in the
program. Meertens’ paper provides an excellent detailed description of how Robinson’s unification
algorithm is employed9, how mutually recursive definitions are treated, and how non-termination is
detected. Meertens claims a stronger sense of completeness than that demonstrated by Damas and Milner,
in that the B algorithm assigns types when the ML algorithm cannot, but this, he admits, can be attributed
to the absence of infinite types and procedure parameters in B.
Although ML is not itself an applicative language, neither is it the first non-applicative language to be
supported by automatic type inference. Henderson [Henderson 77], presented a method for type checking
several imperative programming language constructs which bore a resemblence to the ML style of type
checking. He specified a language which was free of type signatures and which supported procedure
parameters as well as a small but interesting set of conventional program structuring operations. Then he
gave type equations for the supported operations which restrict their operands and which could then be
composed to determine the type consistency of a program in the language.
The classical unification algorithm, which is sufficient to support ML type inference, does not support
several data forms which are critical to the work on Common Lisp. Specifically, these are disjunction,
negation, and recursive structures. These limitations have also been encountered by the computational
linguistics community in pursuit of stronger grammar processors. Kay employed unification in his
Unification Grammar [Kay 79] for processing feature structures, a common linguistic representation.
Karttunen [Karttunen 84] observed that most grammar formalisms for features at that time had the same
inability to deal with negation, disjunction, and cyclic structures in unification and generalization. In
particular, he was concerned about negation and disjunction as being quite important to the task, and he
proposed a method for supporting them using negative and positive constraints (without mentioning a
solution for recursive forms). In doing so, he pointed out that performance problems persisted.
The complexity of unification with disjunction is indeed formidable. Kasper and Rounds [Kasper
86] [Rounds 86] presented a model for describing feature structures and used it to prove that the
unification of disjunctive feature structures was an NP-complete problem. Kasper [Kasper 87] went on to
provide an algorithm which performed reasonably well in the average case. He did this by canonicalizing
the structures prior to unification, then tailoring the algorithm to consider the most inherently expensive
cases only after other methods had been exhausted. Unfortunately, his canonicalization would not be
possible on recursive forms containing disjunction.
Aiken and Wimmers [Aiken 92] place the problem of merging forms containing variables, conjunction,
disjunction, and negation in a different setting, that of solving system of set constraints, and exhibit an
algorithm for doing so.
9
The unification algorithm cited is credited to Boyer and Moore [Boyer 72], who showed how unification could be performed
more efficiently by using a stack of binding environments rather than by using explicit term substitution as formulated by Robinson.
Type Checking for Common Lisp 15
Background and Related Work
When type coercions are added to the ML type system, the ML style type schemes are no longer adequate
to characterize all the possible typings of a given term. Mitchell [Mitchell 84] addresses this special
problem in a theoretical paper by adding to the system of type schemas a model of coercion based on set
containment. His model is strong enough to support type inference in an ML-style language with
automatic type coercion, but the algorithms for doing so are not given in the paper.
The approach to types in the object-oriented programming paradigm has evolved from notions of
inheritance and abstract data types. Type inference has not received a great deal of attention in the
object-oriented programming world, but in a recent paper, Wand [Wand 87] modeled hierarchical objects
as extensible records10 in an adaptation of an approach by Cardelli [Cardelli 84b, Cardelli 89]. Then, in
the tradition of Milner [Milner 78] and Cardelli [Cardelli 84a], he reduced the type inference problem for
his language to the unification problem.
Mishra and Reddy [Mishra 85] have a system whose capabilities are as similar to those of the work
described in this report as any we have found. Their type language is quite similar to ours, using a "fix"
form analogous to our *REC descriptor to represent recursive structures, and including type variables,
conjunction, and disjunction. Their system also discovers undeclared types, unlike ML-derivative systems
and existing Lisp type inference systems. Their type variables allow them to capture parametric
polymorphism in the signatures for functions. But they do not capture the notion of ad hoc
polymorphism, inherent in the nature of Lisp.
Fuh and Mishra [Fuh 88] present another type inference system which supports the notion of subtypes.
They prove a principal type schema property within their type system and demonstrate a complete
inference algorithm. One of the things that distinguishes their algorithm is that their MATCH algorithm,
which is the counterpart to the unification algorithm in ML-style systems, derives a unifying substitution
from a set of (subtyping) coercion relationships rather than from a set of equations expressing type
constraints.
Although overloading is not directly relevant to the Common Lisp work, ad hoc polymorphism and
overloading are related concepts and are sometimes considered to be equivalent. Methods for dealing
with overloading are an active area of research. There seems to be no single conventional way for treating
overloading in type checking systems. Some efforts for bringing a more uniform discipline to overloading
have focused on the commonality in the objects handled by the overloaded operator. Kaes [Kaes 88]
proposed that all overloaded symbols be declared in advance, and that all symbols of the same name have
a common polymorphic type. Wadler and Blott take a similar approach in the Haskell language [Wadler
89]. They propose the notion of type classes, which capture and characterize collections of overloaded
operators in a consistent way, and require that any given symbol may belong to only one class.
Others have dealt with overloading and/or type parameterization by placing various restrictions on the
language being supported. Cormack and Wright [Cormack 90] incorporate type inference, implicit
parameter binding, and overload resolution to do type-dependent parameter inference. Although they
handle higher-order polymorphic functions, they require all function abstractions to be explicitly typed,
and type application is restricted to simple types. Boehm [Boehm 85] showed Reynolds’ formalism of
polymorphic functions to be problematic when the supplying of type parameters was optional, proving
10
For a discussion of record type extensions, see Wirth [Wirth 88].
16 Type Checking for Common Lisp
Background and Related Work
that the type inference problem is undecidable for a simple extension of the Girard-Reynolds
polymorphically typed lambda calculus. In Boehm’s extension, type bindings not specified are to be
inferred by the type system. His characterization disallows the inference of type abstractions and requires
that the positions of omitted type parameters be explicitly marked, and his proof of undecidability relies
on these restrictions. In later work [Boehm 89], Boehm provided a set of rules which require that type
parameters are inferred only when functions are applied (and not when functions are passed unapplied to
higher order functions) and that functions are uncurried so that type parameters are applied in conjunction
with other parameters which depend on them. O’Toole [O’Toole 89] outlines a language supporting
explicit conversions between the explicit type abstractions of second-order polymorphic lambda calculus
and ML-style polytypes. Where an explicit conversion is performed, type inference is done; elsewhere,
type applications are required to be explicit. Ada [DoD 83] supports implicit operation parameters to
generic functions and packages, but only after type parameters have been explicitly instantiated.
After McCarthy’s introduction of Lisp, which will be discussed later, the next major development in the
functional programming paradigm was Peter Landin’s work in the mid-1960’s. He discussed how to
mechanically evaluate lambda calculus expressions through an abstract machine called the SEDC
machine [Landin 64] and formally defined a subset of ALGOL 60 in terms of the lambda calculus [Landin
65]. His ISWIM language [Landin 66] embodied a number of syntactic ideas (infix notation, LET and
WHERE constructs, simultaneous and mutually recursive definitions, and the offside rule) and semantic
principles, including referential transparency, which are significant in functional languages to this day.
One of the first block-structured programming languages with a significantly extended notion of type was
EL1 [Wegbreit 74]. Types (or modes in EL1 terminology) were first class objects in the data space. The
language supported mode-generating routines and generic routines. One of the kinds of mode was a union
of modes. Functions whose parameters were of union mode were compiled so that each invocation
"froze" the mode of its parameters for the duration of its activation. Another unusual facility was the
ability for users to specify conversions among all modes. The type system of EL1 was quite adventurous
for its day, so much so that it was viewed to be overly general and to provide the user with a bewildering
excess of capability. Moreover, the flexibility of the type system has been alleged to defeat the goals of
strong static type checking [Giannini 85].
John Backus’ landmark 1978 Turing Award Lecture [Backus 78] provided much of the impetus for the
development of functional programming. Backus’ language, FP, was built upon a fixed set of combining
forms called functional forms rather than upon the lambda calculus, which Backus claimed provided too
much freedom for programmers’ good. ⊥ (bottom) is a valid FP object, so all FP functions can be said to
be total, and FP did not originally support any type checking. But programmers usually wish to avoid ⊥,
and as a result commonly had to code explicit checks to validate the form of arguments. To alleviate the
need for this tedium, Guttag, Horning, and Williams added a type system to FP [Guttag 81]. For each FP
function, the type system ascribes a domain predicate and a target, or result, type. Target types are treated
as "type transformers" in order to accommodate combination of the types of polymorphic functions. A
function is said to be fully typed with respect to the type system if the domain predicate and the target type
depend only on the type of the argument rather than on particular values within the type. Thus, while the
plus function is fully typed, division is not, since the domain predicate depends not only on the type of the
argument, but also on whether the second component of the argument is zero. A program is type safe if
the domain predicate is T and type erroneous if the domain predicate is F. Division is neither type safe nor
type erroneous; its type is data dependent and is safe for a set of data objects. The type system also
supports user definition of abstract types. The definition includes a name for the type, a representation of
a class of objects, an invariant over objects in the representation which captures the notion of a valid
Type Checking for Common Lisp 17
Background and Related Work
value, and a list of functions which implement the operations available on objects of the type. The
representation is visible only to the implementation functions. Recognizer functions are defined
implicitly, as well as functions for mapping abstract values to values in the representation and vice versa.
Geoffrey Frank proposed an automatic type checking system for FP [Frank 81] which was to be built
around a kernel algebra specified in his paper. The type checker would essentially do type inference by
composing the domain and range specifications of functions called in the definition of a functional form.
Takuya Katayama [Katayama 84] proposed an approach to type inference for FP programs based on the
notion that there exists a type domain which is a projection of the data domain in which we normally think
of programs computing. Any computation in the data domain has an image, or reduced computation in
the type domain. Katayama produces relations in the type domain as the images of functions and
procedures in the data domain and then performs type inference with an algebra of relations. His
approach was exploratory when published, particularly with respect to recursive functions. It appears to
amount simply to a rephrasing of the same computations which occur in Milner-style type inference,
though in very different terms.
Fairbairn’s Ponder system [Fairbairn 82, Fairbairn 86] provides full type checking for a richer type system
than that offered by Hindley-Milner. In particular, it supports polymorphic functional arguments11 and a
notion of domain strong enough to recognize within the type system, for example, that reciprocal works
only on a non-zero argument. Although no primitive types are defined, type generators may be defined
for the purpose of declaring types. Generators may be parameterized, providing a powerful tool for
developing polymorphic types. A special generator rectype is employed for generating recursive types.
Restrictions built into rectype ensure that all types declared with it are finite cycles rather than infinite
trees.
Leivant applied the notions of type structure he formulated in [Leivant 83a] to the type inference problem,
developing a general algebraic formulation of Milner’s inference algorithm [Leivant 83b]. In this paper,
he also formulated a different type inference algorithm which accommodated coercion and overloading
and yet another which handled abstraction and general type quantification. The author knows of no
instance where Leivant’s work has been incorporated into any programming language or environment.
Abadi, Cardelli, Pierce and Plotkin [Abadi 89] developed a system which supports a type called dynamic,
whose purpose is to deal with persistent objects existing outside the program being type checked. Their
goal was to support maximal static type checking of programs, but to have a means within the same
discipline for doing dynamic type checking of objects which are out of reach of the static type checker.
Any such object is stored with a tag which is some representation of the type of the object. Objects thus
tagged have the type "dynamic". After being read, they may be examined by a running program with a
"typecase" form which case splits on the indicated type and controls flow accordingly. Their paper also
presents some preliminary thoughts on dealing with polymorphism within the dynamic construct.
Another style of polymorphism is Coppo’s conjunctive discipline [Coppo 80a, Coppo 80b]. It is powerful
enough to allow the characterization of various classes of lambda-calculus terms, but is impractical for
programming languages because type checking is unsolvable.
11
For example, a Ponder function could have a formal parameter whose type is (∀ T. T -> T), so that the actual parameter supplied
could be a function whose type is T1 -> T1.
18 Type Checking for Common Lisp
Background and Related Work
Lisp has remained by far the most popular language with a largely applicative basis, particularly in the
western hemisphere. Emerging from its roots as an exploratory language for non-numeric
computation [McCarthy 62] used primarily in artificial intelligence and theorem proving, it has found
significant application in such disparate areas as systems programming [Symbolics 88], numerical
applications [Mathlab 82, Mathlab 83], text editing [Stallman 77], and image processing, and has recently
been enlarged to support object-oriented programming [Cointe 88, Steele 90]. Plagued for years by a lack
of standardization which discouraged software portability, the Lisp community has consolidated itself in
support of Common Lisp [Steele 84, Steele 90]. The success of this effort and the continued development
and support of Lisp-based products and research seems to guarantee the language’s continued popularity,
despite claims that newer languages rest on more sophisticated and attractive foundations and impose
greater discipline.
Cartwright’s thesis [Cartwright 76] was an early attempt to implement a strong static type checker for
Lisp. Not surprisingly, Cartwright restricted his attention to a small subset of Lisp with functional
semantics. Moreover, some primitives within his subset had more restrictive definitions than their Lisp
counterparts. His base subset contained no primitive constructors (not even CONS), but his type system
provided a disciplined facility for declaring constructions, thereby allowing users to build up various
abstractions. His type system was essentially the one proposed by McCarthy [McCarthy 63]. Cartwright
required all identifiers and functions to have user-furnished type signatures, much like any ALGOL-like
language. In its original formulation, his type system would have been statically strong, but he found this
too restrictive and relaxed his static checks, thus requiring run-time checks when static checks are not
satisfied. He attempted no type inference and incorporated no notion of polymorphism.
Boyer and Moore have developed automated support tools for formal reasoning about Lisp-like logics.
Their Computational Logic [Boyer 75, Boyer & Moore 79, Boyer & Moore 88] bears a strong
resemblance to an applicative Common Lisp. In its original form, it allowed definition of only total
functions, but with the addition of V&C$, the logic allows the definition of non-terminating functions.
Within the logic, there is a mechanism, the shell principle, for declaring new classes of recursively
structured objects. Primitive types such as natural numbers, lists, and litatoms (words) are defined as
shells in the base logic. The theorem prover which manipulates the logic, Nqthm, does some limited type
inference, but only in terms of known shell "types". The logic allows for arbitrary CONS structures, but
does not provide specific type support for them.
Boyer and Moore are carrying many of their methods into a new logic and theorem prover, Acl2 [Boyer
90], which is under development. In many ways Acl2 resembles Nqthm, but the logic (and the
implementation language) is an applicative subset of Common Lisp. Guards on function parameters,
reminiscent of those on Verdi functions in the EVES environment [Pase 89] address the partial nature of
many Lisp functions. If a program passes the Acl2 admissibility tests and if all its guard predicates are
verified on all function calls, the program may be safely compiled by a standard Lisp compiler with a low
safety setting, thus producing fast code without fear of breakage due to type errors. But type reasoning in
Acl2 is only slightly more powerful than in Nqthm, and the need to verify that all function calls satisfy the
guards of the called functions puts much greater pressure on the type system.
McPhee [McPhee 89] did a preliminary investigation of type inference for Lisp, which led to the
implementation of a simple tool in Miranda. But the small set of Lisp functions which were supported,
including CONS and IF, were assumed to obey the same restrictions as their Miranda counterparts, and
thus his work did not break any new ground.
Type Checking for Common Lisp 19
Background and Related Work
Kaplan and Ullman [Kaplan 80] used a lattice-theoretic model of types and developed a flowgraph model
of programs to support type inference for dynamically type-checked languages. A flowgraph is a directed
graph whose nodes are associated with one or more assignment statements. The types of all expressions
in a program are analyzed by repeated application of "forward" and "backward" analysis with respect to
pre-defined lattices of types until a fixed point is reached for all type assignments. Forward analysis uses
the types of subexpressions to help determine the type of an expression. Backward analysis uses context
information to help determine the type of an expression, i.e., knowledge of the type of an expression is
used to refine the types of its subexpressions.
This approach is largely the basis of the TICL type inference system for Common Lisp [Ma 90]. The
system uses repeated application of forward analysis with respect to a collection of partial type lattices for
Common Lisp types to derive type annotations for use by the compiler. Since the type lattices must be
fixed, user types such as those declared with DEFSTRUCT may not be included. The system, however,
does include an ad hoc type LIST-OF to provide some support for a particular kind of CONS structure.
Type inconsistency is resolved by assignment of a most general type, enabling the inference process to
continue. Since the goal of the system is to improve run-time performance, the type inference system is
tailored to process only types for which declarations offer significant speed-up for most compilers. Thus,
list-processing operations and general CONS structures are largely ignored in favor of accurate
assignments of various types, like the numerics, likely to be specifically supported at the machine level.
Kaplan and Ullman’s work is also the basis of the Nimble type inference system for Common Lisp [Baker
90]. As with TICL, the type lattices it employs are primarily focused on numeric types, and the system
does not appear to deal with structured types in any direct sense. Both the Nimble and the TICL systems
achieve their pragmatic goals of improving run-time performance of compiled Lisp code.
Baker has also developed a decision procedure for the Common Lisp SUBTYPEP predicate [Baker 92],
utilizing a specific lattice structure suitable for performing type inference. A complete implementation of
SUBTYPEP can assist a compiler in making optimal storage allocation decisions and in removing some
type-checking code.
A problem limits the effectiveness of the lattice-theoretic models when confronting ad hoc polymorphic
functions. When a form can return disparate types, the type inferred for the form is the least upper bound
of all the possibilities. Commonly, this forces the assignment to the universal type, which is of no use
except to enable the inference process to continue. Disjunctive types would provide greater specificity,
but these are not compatible with the lattice approach in systems styled after Kaplan and Ullman.
Johnson [Johnson ] uses a technique he calls "type flow analysis" to represent the type-level structure of
Lisp programs. Though his report does not present much technical detail, he does claim to deal with the
problems of ad hoc polymorphism and heterogeneous structures.
The fundamental relationship between static type checking and the generation of fast executable code for
programs is simple: checks that can be performed statically need not be performed at runtime. A program
running without dynamic type checking code will do its job "faster" than an otherwise identical program
executing those checks. This relationship is a significant motivation for most of the static checks which
compilers perform.
Efforts employing mechanically assisted deduction systems and verification to enable compilers to
eliminate dynamic type checking code date back to the Euclid project [Lampson 76]. As designed,
20 Type Checking for Common Lisp
Background and Related Work
Euclid’s approach is that a legal program is not allowed to have run-time exceptions. The compiler has
the responsibility of proving that exceptions cannot occur. If it fails, it is obliged to create a legality
assertion, which can be submitted to proof by other static means or be checked at run-time. If an
unvalidated assertion is false at run-time, the program is by definition not a legal program.12
Implied, but not directly expressed in Cartwright’s thesis on Typed Lisp [Cartwright 76] is the notion that
where his static type system is able to determine type correctness, the compiler does not need to generate
type checking code. Cartwright alluded to this notion, though, by noting that when he relaxed the rules in
his static type checker, he created the requirement that "in an actual TYPED LISP implementation,
run-time type checking should be done in all those cases where standard parse-time rules are violated. If
the user formally proves that a certain run-time error can never occur, then that particular check can be
safely eliminated."
In his Phd thesis, McHugh [McHugh 83] explored the application of formal reasoning to compiled code
optimization in Gypsy, particularly with respect to the runtime detection of type errors. [Good 86a]. He
modified the verification condition generation process in the Gypsy Verification Environment [Good
86b, Young 90] to generate theorems which, if proved, guaranteed the absense of certain run-time type
errors. The fact that these theorems were proved was then communicated to the compiler [SmithL 80]
which then suppressed the generation of run-time error checking code. Unfortunately, the Gypsy
Verification Environment no longer supports a compiler, although the generation of verification
conditions for proving the absence of type errors has been re-implemented and preserved [Akers 86].
Boyer and Moore’s work with Acl2, previously mentioned, falls squarely into this realm as well. They
use the mechanical assistance of their theorem prover to validate conformance to domain restrictions
expressed as guards. Accepted functions may then be compiled at a low safety setting, i.e., without
runtime type checks. Unfortunately, the assistance provided in their early developmental system was not
sufficient to provide this validation automatically in many routine situations.
A different approach to efficient run-time type checking is used in the Symbolics Lisp machines. [Moon
85] The system employs a hardware co-processor to perform run-time type checking at the same time it is
doing data operations, so the main processor incurs minimal extra run-time overhead.
12
This notion of illegal programs is somewhat distasteful, as the compiler abdicates on assigning a meaning to a program which it
compiles without complaint. A similar notion pervades the definitions of Ada [DoD 83] and Common Lisp [Steele 90].
Type Checking for Common Lisp 21
Overview
Chapter 3
OVERVIEW
The purpose of this chapter is to provide an overview of the entire type inference system. We will first
provide some orientation to function signatures and the type inference process. Then, in a bit more detail,
we will examine each of the principal constituents of the system and see how they fit together. This
should provide the reader enough background and perspective to approach the chapters that follow, which
cover the formal theory and each of the system’s components in great detail.
The subset of Lisp which is supported by the inference system is quite simple. It is limited to functions of
fixed arity, where the only forms allowed in the body are:
• references to formal parameters,
• literals of type character, integer, rational, T, NIL, or string,
• quoted forms, where the quoted object is either a literal, a symbol, or a CONS of any such
objects (recursively),
• IF forms of arity three, and
• recursive function calls or calls to previously accepted functions.
All functions are assumed to terminate for all inputs, and all data objects are of finite size, i.e., there can
be no circularly linked lists. Mutually recursive functions are not allowed. Function names are not
allowed to begin with the characters "$", "*", "%", or "!".
In the initial system state, only a small set of native functions are defined. These functions are CONS,
CAR, CDR, BINARY-+, UNARY--, BINARY-*, UNARY-/, <, EQUAL, CONSP, INTEGERP,
RATIONALP, STRINGP, CHARACTERP, SYMBOLP, NULL, DENOMINATOR, NUMERATOR,
SYMBOL-NAME, and SYMBOL-PACKAGE-NAME. Most of these are native Lisp functions.
BINARY-+, UNARY--, BINARY-*, and UNARY-/ are just the binary and unary versions of the n-ary
Lisp functions +, -, *, and /. We call our native functions subrs.
All functions are purely applicative. There is no global data. Since there are no "destructive" functions,
i.e., functions like RPLACA which destructively alter data, no data structure can be created where a
pointer points to a CONS cell higher in the structure. Thus, no CONS structure contains an infinite chain
of pointers.
Type restrictions are present in the domain of many Lisp functions. CAR, for instance, requires its
argument to be a CONS or NIL, and application of CAR to any non-NIL atom will cause a runtime error.
22 Type Checking for Common Lisp
Overview
Type restrictions, if analyzed statically, tend to percolate through an application. To characterize them
and to allow for restricting the domain of a function, we augment the Common Lisp subset by allowing a
guard to be included as part of a function definition.
Definition: A guard is a component of a function definition
which is a Lisp predicate on the function arguments. The guard
must be satisfied, i.e., it must evaluate to a non-NIL value,
in order for the function to be well-defined on those arguments.
Syntactically, it follows the formal argument list, and is of
the form of a Lisp compiler directive:
Guards can characterize type restrictions on the parameters which will guarantee the absence of type
errors in the function body. Properly employed, they can be of great utility in conjunction with the
inference system. When the type inference algorithm analyzes the body of a function, it assumes the
guard is satisfied and makes judgements about the types of the arguments accordingly.
The primary task of the inference system is to infer a type signature for a function from the function text
and from a database of signatures for previously defined functions. Previously defined function signatures
allow the type checking process to scale up as new functions are introduced. The signatures are in terms
of type descriptors, and manipulation of type descriptors is the primary occupation of the inference
system. The grammar defining the language of type descriptors used in our signatures is as follows:
<descriptor> ::= <simple descriptor> | <variable> | *EMPTY |
*UNIVERSAL | (*CONS <descriptor> <descriptor> ) |
(*OR <descriptor>* ) | <rec descriptor>
13
This syntax is identical to that used in Boyer and Moore’s Acl2 system [Boyer 90].
Type Checking for Common Lisp 23
Overview
As stated in the introduction, a signature has two primary components,14a guard descriptor, which is a
vector of variable-free descriptors characterizing the guard, and a collection of segments mapping
parameter types to result types. The guard descriptor has the same length as the arity of the function, and
each of its descriptors characterize the type requirements placed on the corresponding argument. Each
segment in the signature is of the form
(td1 .. tdn) -> td
where n is the arity of the function. We interpret a segment to mean that if the descriptors (td1 .. tdn)
characterize the function’s arguments, then the result may be (but is not necessarily) characterized by the
descriptor td. The meaning of the signature as a whole is that if the function’s guard, evaluated on some
arguments, yields a non-NIL value, then there is some segment in the signature which characterizes both
the function’s arguments and the value produced by evaluating the function on those arguments.
An important notion pertaining to a signature is that of completeness of its guard descriptors. A guard
descriptor exists that is complete with respect to a guard if the guard is expressed as a conjunction of
recognizer calls on distinct formal parameters. When a guard descriptor is complete, we will show it is
satisfied by the actual parameters if and only if the guard expression evaluates to a non-NIL value on the
parameters. When a function’s guard is complete, and the guards of all the functions in its hereditary call
tree are also complete, then we will show that if the actual parameters satisfy the guard descriptors, they
will also satisfy the real guard (i.e., cause it to evaluate to a non-NIL value), and the evaluation of the
function call will not cause a guard violation.
CAR takes a single argument, so its guard is a singleton vector. The guard descriptor requires the
argument to be either the value NIL or a CONS of any two objects. (*UNIVERSAL is a descriptor
satisfied by any value whatsoever.) There are two segments. The first says that, given an argument which
is a CONS, the function may return the same value which was the first element of the CONS. The second
says that, given NIL, the function may return NIL. When we say the guard is complete, we mean the
guard descriptors completely capture the actual guard predicate, which in this case is (where X is the
formal parameter name):
(CAR-ABLE X)
14
It has other components which we will discuss later.
24 Type Checking for Common Lisp
Overview
An example of a guard which is not complete is (EQUAL X 3), since the corresponding descriptor vector
would be ($INTEGER), but some values satisfying this vector (4, for instance) would not satisfy the
guard. When we say "All called functions complete", we mean that the guards of every function in the
call tree of the definition are complete.15 When we say the recognizer descriptor is NIL, we mean this
function is not a recognizer. If it were, this entry would contain the descriptor which characterizes the
objects for which the function returns the value T. The other elements of this signature we can disregard
for now. We will explore all these notions much more fully in later chapters.
The initial state of the system includes the signatures for all the native functions. The signatures for these
functions are given in Appendix A.
In the process of inferring a signature for a new function, the inference tool attempts to validate that the
guards of all called functions are satisfied, at least within the scope of the type system. This is both a
means and an end for the system. It is a means because the tool makes a conservative judgement, when it
cannot demonstrate that the guard descriptors are satisfied, that the real guard expression will not be
satisfied either. If the real guard is not satisfied, there is no guarantee that the use of the signature is
sound. Thus, when the tool fails to show the guard descriptors are satisfied, it declares failure and aborts
the analysis. But guard verification is also a goal for the system, because in the cases where the guard
descriptors are known to be complete for a function, then the tool can guarantee that if it does not detect a
guard violation on a function call, the real guard will be satisfied by any possible parameters.
Furthermore, if the guard descriptors are also complete for all the functions in the call tree of the called
function, then the tool’s failure to detect a guard violation signifies that the guards for all functions called
in the course of evaluating the original function call will also be satisfied.
A recognizer function is a Boolean function of one argument and of a certain form which determines
whether its argument conforms to a variable-free type descriptor. Conversely, for any variable-free type
descriptor, one could define a corresponding recognizer function. A recognizer must have no guard (or a
vacuous guard, T), so it is defined for all actual parameter values.
One reason recognizers are important to the type system is that if a call to a recognizer is used as a test in
an IF expression, optimal modifications to the type context of the IF may be made to support the analysis
of the THEN and the ELSE arms. Another reason is that, if a function guard is expressed as a conjunction
of recognizer calls on distinct parameters, we know that the guard descriptors are complete. The entire
inference system is monotonically conservative, so the descriptors computed to characterize the value of
an expression may be too inclusive, but they can never be too small. In the case of a complete guard (or
any complete type predicate), the fit is perfect. The significance of a complete guard is that a list of actual
parameter values satisfies a complete guard expression if and only if they satisfy the guard descriptors
computed by the inference system. If the system determines that the (possibly too inclusive) descriptors
characterizing the actual parameters represent a subset of the values represented by the guard descriptors,
then the real function guard will evaluate to a non-NIL value.
Given a new function, if the inference algorithm determines the function is a recognizer, it will employ
special techniques to infer its signature, techniques which will lead to the perfect signature we seek.
15
There are two pairs of completeness tags, the latter preceded with the notation "TC". We shall see later that the first set of tags
signifies a mere conjecture, but the TC tags, more stringently defined, carry a guarantee with respect to the evaluation of guard
expressions.
Type Checking for Common Lisp 25
Overview
The inference system operates in two phases when a new function is submitted. First, the inference
algorithm makes a good heuristic guess at the signature. Then, the checker algorithm, which is formally
specified and proven sound, validates the signature.
The system makes certain assumptions about functions being submitted to it, mainly that they lie within
the language subset supported by the system, that the only non-recursive function calls within the function
definition are to functions already in the system database, and that, given any arguments satisfying the
function guard, the evaluation of the function body will terminate.
There are several significant algorithms within the main inference algorithm. TYPE-PREDICATE-P
takes a form, determines if it is a type predicate, and if so, returns (among other things) a vector of
descriptors characterizing what can be deduced about the types of the variables in the environment when
the predicate evaluates to a non-NIL value, and whether any information is available when the predicate
evaluates to NIL. DESCRIPTOR-FROM-FNDEF takes a function, determines if it is a recognizer
function, and if so, returns the descriptor which characterizes the objects for which the function returns
T. The composition of DERIVE-EQUATIONS and SOLVE-EQUATIONS, given a form, returns a guard
and the segments which become the main elements of its signature.
When a new function is submitted to the inference algorithm, the following events transpire. If there is no
guard, or if the guard is T, DESCRIPTOR-FROM-FNDEF examines the function to determine if it is a
recognizer. If so, it generates what we hope is a perfect signature for the function, meaning that there are
two segments, one whose result descriptor is $T and the other $NIL, where the respective argument
descriptors characterize no values in common, but together characterize the entire universe of values.
If there is a guard, it is first checked to ensure that it does not call the function recursively. Then, the
inference algorithm invokes DERIVE-EQUATIONS and SOLVE-EQUATIONS on the guard form to
ensure there are no detectable guard violations. If there are, the entire analysis fails, and the function is
rejected. If none are found, then TYPE-PREDICATE-P is invoked to construct the guard vector for the
function.
Then, using this guard vector as an assumption on the types of the function arguments (or using a guard
vector composed of *UNIVERSAL descriptors if there was no guard form), we invoke DERIVE-
EQUATIONS and SOLVE-EQUATIONS on the function body to generate the function signature and,
while in the course of doing so, attempt to validate all the guards of functions called within the body. If a
guard violation is detected, the function is rejected.
If the function is not rejected, SOLVE-EQUATIONS returns a signature for it. But the algorithms which
generated the signature are not trusted algorithms. They have not been formally modelled at all, much
less shown to be consistent with the semantics of the type system. So we can think of them as heuristic
algorithms which have suggested a signature for the function which must now be validated in order to be
trusted. This validation is performed by the signature checker, which has been rigorously formalized and
proven to be sound with respect to the formal semantics of the type system.
The inference algorithm just outlined is discussed in much greater detail in Chapter 4.
26 Type Checking for Common Lisp
Overview
We have spoken frequently of values "satisfying" descriptors. This is the fundamental notion in the
formal semantics. Roughly speaking, a value satisfies a descriptor if it is a member of the set of values
represented by the descriptor. But with type variables in the descriptor language, a descriptor in isolation
cannot necessarily prescribe a set.
The notion of type variables employed in the formal semantics is that a variable represents a single
arbitrary Lisp value. While this is not the conventional notion of type variable appearing in most type
systems, it is perfectly suited to deriving maximal information from the signatures of certain very
important functions, like CAR, CDR, and CONS, whose argument values (or specific components
thereof) appear intact in the result. Thus, the following segment, from the signature of CAR,
(((*CONS &1 *UNIVERSAL)) -> &1)
signifies that the very value which is the first element of a CONS actual parameter appears as the result
value.
When we speak of a value satisfying a descriptor containing type variables, we mean the value is a
member of the set prescribed by the descriptor under a binding of each of its type variables to particular
Lisp values. The set prescribed by the descriptor (*CONS &1 $INTEGER) under the binding ((&1. 10))
is the set of all CONS values whose CAR is the value 10 and whose CDR is any integer.
Variables are only interesting when the same variable appears more than once in a descriptor or descriptor
list, as in the descriptor list formed by the above-mentioned segment for CAR. In order for a list of values
to satisfy such a list of descriptors, the very same value must appear in every position occupied by a given
type variable. So the value list (("foo" . 0) "foo") satisfies the descriptor list
((*CONS &1 *UNIVERSAL) &1)
A consequence of this interpretation of variables is that they are of very limited use inside *REC
descriptors. The situation suggested by a descriptor like
(*REC FOO (*OR $NIL (*CONS &1 (*RECUR FOO))))
is that we have a list, all of whose values are identical. We believe such structures occur infrequently, and
the formal semantics does not support placing type variables in positions where the variable would appear
replicated within a structure if the descriptor were opened up several times. A different kind of variable
would be appropriate here, one which could be instantiated with an arbitrary descriptor, but work on the
system has not yet progressed to the point of implementing such a thing. This could, in fact, be a very
useful extension. On the other hand, under our semantic model, there is no reason not to allow type
variables to occur in a non-replicating position of a *REC descriptor, for example
(*REC BAR (*OR &2 (*CONS $INTEGER (*RECUR BAR))))
The formal semantics are captured by a recursive function INTERP-SIMPLE, which takes as parameters a
list of descriptors, a list of values of the same length as the descriptor list, and a binding that maps all the
type variables in the descriptor to Lisp values. It returns T or NIL, depending on whether the values
satisfy the descriptors under the binding. This function is presented and discussed in detail in Chapter 5,
along with a discussion motivating this approach.
Type Checking for Common Lisp 27
Overview
The semantics of a function signature is captured in the following definitions, which formulate the key
soundness criterion for every signature in the system database. This is what the signature checker
attempts to verify for every signature emitted by the inference algorithm. Here, "I" is an abbreviation for
INTERP-SIMPLE. "E" is a Lisp evaluator which is defined in Section 5.4.
Definition: A world is a collection of Lisp function definitions
including all the functions called within any of the definitions.
Definition: fooworld,clock
WELL-FORMED-SIGNATURE-1 just checks that arities are consistent, checks the well-formedness of
all the descriptors, and checks that the guard descriptors are variable-free. It is defined in Appendix G.2.
BREAK-OUT-OF-TIMEP and BREAK-GUARD-VIOLATIONP are defined along with E in Section 5.4.
Paraphrased, VALID-FS says that a signature in fs is valid when, if the guards are complete for all the
functions in the call tree of foo, if the guard evaluates to a non-NIL value when applied to some actual
parameter values, and if clock is large enough to allow complete evaluation of the function call, then
evaluating the function on those parameters will not result in a guard violation, and under some binding of
type variables, the parameter values and the function result satisfy some segment in the signature. The
predicate VALID-FS, which is a precondition of many of the most significant lemmas we will consider, is
the iteration of this validity determination over all the functions in the database.
This seems a good place for a remark about the convention we use for the character case of identifiers in
this document. In general, one may view the entire document as being case insensitive (with the obvious
exception of characters within Lisp strings). For reasons having solely to do with the aesthestics of the
type faces used here, we often use lower-case in text set apart as examples, as in the definitions just above.
Upper case, however, is sometimes used in examples when depicting output from the system. Within
normal prose paragraphs, we will use upper case for functions defined in the system and for forms
generated by it, to set them apart from normal text.
Another convention used throughout the document is the omission of the Lisp quote character "’" from
forms which are obviously quoted forms. In the few instances where this could lead to any
misunderstanding, we take care to explain exactly what forms are being handled.
The job of the signature checker is to validate each signature emitted by the inference algorithm. For a
recursive function, one may think of each iteration of the inference algorithm as producing a signature
which is a better approximation of the function graph than that produced by the previous iteration. In this
sense, the main inference algorithm produces a signature by iterating n times its process of passing over
the function body and producing segments, and the checker produces the next approximation by making
the n+1st traversal over the body. If the validation succeeds, the checker’s signature is a better
approximation of the function graph, but we prefer to store and use the inference algorithm signature
because it is almost always in a much more compact and efficient form, and any improvement in accuracy
embodied in the checker signature is usually marginal.
Since the checker is to be validated with a proof of soundness, it is important that it be simple enough for
the proof to be tractable, and in fact it is much simpler in concept than the inference algorithm. This
simplicity is possible largely because the checker assumes that the signature it is given is correct, in that it
is sound with respect to the semantics of the type system. Descriptor negation, represented explicitly with
Type Checking for Common Lisp 29
Overview
a *NOT form in the descriptor language supported in the inference algorithm, has been completely
factored out in the final signature, so the checker need not deal with *NOT forms at all. Closed *REC
forms have already been derived and constructed from analysis of the recursion in the function text,
relieving a substantial burden on the checker. Moreover, the checker makes only minimal efforts to
produce concise results. This allows a combinatoric explosion in the number of segments produced and
impairs the runtime performance of the checker. But the tricky simplifications necessary to maintain a
minimal representation would have required tricky formal justifications and would have made the checker
signature less amenable to formal validation. The benefit of a clean formal algorithm clearly outweighs
the desire for performance. Though the checker is by no means a simple algorithm, the semantics of its
operations are clean enough to be tractable in proof. Furthermore, the performance penalty occurs only
once for any given function, because the checker is not iterative and because the segments it generates are
discarded after the validation process is complete.
It may seem paradoxical that the checker can derive a trusted signature, when one of its chief assumptions
is the validity of the signature produced by the inference system, which we do not trust in any formal
sense. But a formal validation of this approach is presented in Section 7.2. The key to this approach is a
notion of descriptor containment. Descriptor containment is basically a subset notion, augmented to
accommodate the presence of type variables. If the checker accepts a heuristic guess at a signature and
then uses a trusted algorithm to produce a new signature, and if it can use a trusted algorithm to
demonstrate that each segment of the new signature is contained in some segment of the original, then we
can show that the original, heuristically derived signature is valid.
In the course of its analysis, the checker attempts to perform guard verification on function calls.
Definition: Guard verification on a function call is the
determination that the function’s guard will be satisfied for
any possible actual parameter values.
The checker’s strategy is to determine whether the actual parameter descriptors are contained in the guard
descriptors for the called function. If it cannot do so, it signals an error and aborts the computation, since
to continue would be to proceed on an unvalidated assumption. Containment by itself does not guarantee
guard satisfaction in the general case, since the guard descriptor may be more inclusive than the guard
itself. But a finding of containment is sufficient to proceed with the analysis. But if a given guard
descriptor is complete, i.e., if the guard is a composition of recognizer function calls on distinct
parameters, the checker’s guard validation does guarantee that the real guard will be satisfied, and then we
say the guard is verified. Furthermore, if all the guard descriptors in the hereditary call tree of the called
function are also complete, then the checker’s verification of the called function’s guard is sufficient to
demonstrate that the guards will be satisfied for all the functions called in the course of evaluation of the
call being checked.
Though the inference system will attempt to generate a signature even when this completeness property
does not hold, no claims are made that the resulting signature is valid. This is because the use of the
signatures of subsidiary functions is sound only when the satisfaction of their guards is known, and in the
absence of complete guards, the inference system by itself cannot make this determination.
The inference system, including both the type inference algorithm and the checker, is implemented in
Common Lisp and runs on the Symbolics [Symbolics 88] and on Sun workstations under akcl [Yuasa 85].
Aside from a very top-level function, which stores accumulated results in the database and accepts new
30 Type Checking for Common Lisp
Overview
There are two top level goals which must be proven to demonstrate the soundness of the inference system.
TC-SIGNATURE is the top level function of the checker, and the first and most significant is Lemma
TC-SIGNATURE-OK, the conjecture that a signature validated by the checker is indeed correct with
respect to the formal semantics of the type system.
Lemma TC-SIGNATURE-OK
Paraphrased, it states that if the database of function signatures is sound, if the computation of a signature
did not abort with a *GUARD-VIOLATION result, and if the guards for all the functions in the call tree
of foo are complete, then the checker adds a valid signature for foo to the database, in the sense of validity
previously explained. This lemma is the top level goal of the large proof which is described in most of
Chapter 7 and Appendix B.
The other principal lemma justifies our claim regarding guard verification. If a guard descriptor is
complete, i.e., if the guard is a conjunction of recognizer calls on distinct formal parameters, and if the
actual parameters of a function call satisfy descriptors which are contained in the guard descriptors, then
we know the guard expression evaluated on those parameters will yield a value of T.
Lemma GUARD-COMPLETE
16
Actually, this is a not perfectly accurate. There are three special variables which are used as counters for generating unique
gensyms. The counters are updated as side effects of calling the gensym routines. There is no technical reason why any of these
counters need to be implemented non-applicatively. They were implemented as specials only for expediency.
Type Checking for Common Lisp 31
Overview
(and
H1 (valid-fs fs world clock)
H2 (I (argtd1 .. argtdn) (arg1 .. argn) b)
H3 (contained-in-interface (*dlist argtd1 .. argtdn)
(*dlist rtd1 .. rtdn))
H4 (not (break-out-of-timep
(E (and (Ra a1) .. (Ra an))
1 n
((a1 . arg1) .. (an . argn))
world
clock))) )
=>
(equal (E (and (Ra a ) .. (Ra an))
1 1 n
((a1 . arg1) .. (an . argn))
world
clock)
t)
CONTAINED-IN-INTERFACE is the top level function of the containment algorithm, which is presented
in Section 6.8. We present the proof of Lemma GUARD-COMPLETE in Section 7.3.1.
Recall that SYMBOLP is a recognizer function which returns T when given a symbol, NIL otherwise.
The descriptor characterizing a symbol is (*OR $NIL $NON-T-NIL-SYMBOL $T).
Since there is one parameter and no guard form, the guard descriptor is (*UNIVERSAL), which is, of
course, complete. No guard violations are detected, since CONSP, SYMBOLP, SYMBOL-ALISTP, and
NULL all have (*UNIVERSAL) guards, and the calls to CAR and CDR are protected by the IF test which
ensures their argument is a CONS. DESCRIPTOR-FROM-FNDEF determines that the body of the
function satisfies the requirements of a recognizer function, and so it returns the descriptor that
characterizes the values for which the function returns a T result:
(*REC SYMBOL-LISTP
(*OR $NIL
(*CONS (*OR $NIL $NON-T-NIL-SYMBOL $T)
(*RECUR SYMBOL-LISTP))))
32 Type Checking for Common Lisp
Overview
The algorithm constructs a segment which maps this argument to $T. Then, it negates and canonicalizes
this descriptor into a *REC descriptor characterizing its complement (the !REC1 descriptor below), and
this descriptor becomes the argument in a segment which maps it to a $NIL result. The segments for the
function, then, are:
(((*REC SYMBOL-LISTP
(*OR $NIL
(*CONS (*OR $NIL $NON-T-NIL-SYMBOL $T)
(*RECUR SYMBOL-LISTP)))))
-> $T)
(((*REC !REC1
(*OR $CHARACTER
$INTEGER
$NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL
$STRING
$T
(*CONS *UNIVERSAL (*RECUR !REC1))
(*CONS (*OR $CHARACTER
$INTEGER
$NON-INTEGER-RATIONAL
$STRING
(*CONS *UNIVERSAL *UNIVERSAL))
*UNIVERSAL))))
-> $NIL))
An aside on !REC notation: When the inference system generates a new *REC descriptor, it supplies a
unique label for the descriptor. In the exposition in this thesis, we use the notation !RECn, where n is
some positive integer. The name generator maintains a counter, which is appended to the characters
"!REC" in forming a new symbol, thus ensuring the uniqueness of each label. In fact, though, the counter
is reset as each new function is submitted. In order to guarantee uniqueness, then, the label generator also
appends the name of the function being analyzed to the *REC name. Since this additional verbage would
clutter the exposition, we do not display the function name component of *REC names.
Next, this signature is tentatively added to the database of function signatures, and the function is
submitted to the checker for validation. The checker produces the following segments. We will clearly
see the combinatoric explosion to which we referred above.
(((*CONS $NIL
(*REC SYMBOL-LISTP
(*OR $NIL (*CONS (*OR $NIL $NON-T-NIL-SYMBOL $T)
(*RECUR SYMBOL-LISTP))))))
-> $T)
(((*CONS $NIL
(*REC !REC1
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T
(*CONS *UNIVERSAL (*RECUR !REC1))
(*CONS (*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL
$STRING (*CONS *UNIVERSAL *UNIVERSAL))
*UNIVERSAL)))))
-> $NIL)
(((*CONS $NON-T-NIL-SYMBOL
(*REC SYMBOL-LISTP
(*OR $NIL (*CONS (*OR $NIL $NON-T-NIL-SYMBOL $T)
(*RECUR SYMBOL-LISTP))))))
-> $T)
(((*CONS $NON-T-NIL-SYMBOL
(*REC !REC1
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T
(*CONS *UNIVERSAL (*RECUR !REC1))
(*CONS (*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL
Type Checking for Common Lisp 33
Overview
Loosely speaking, each of these segments represents one of the disjuncts produced from a single
unfolding of the *REC forms in the original segments. The containment test ensures that each of them is
contained within one of the original descriptors. In this case, it is easy to see which one contains each,
since the signature contains only one segment with a $T result and one with a $NIL result.
The checker also validates that the guard is (*UNIVERSAL), that this guard descriptor is complete, that
the original guard is contained in the guard computed by the checker, that the guards of all functions
called in the body are complete, and that the function is a recognizer.
The guard is a cumbersome expression which ensures that the argument is such that CDR is defined on it
and CAR is defined on its CDR. (It is not a simple conjunction of CONSP tests because CAR and CDR
each return NIL when given NIL as an argument.) The guard is itself free of guard violations, and the tool
produces the guard descriptor:
((*OR $NIL
(*CONS *UNIVERSAL (*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL)))))
This guard descriptor is complete. Since the function has a guard, it fails one of the first tests for being a
recognizer, so the tools employs DERIVE-EQUATIONS and SOLVE-EQUATIONS to compute its
34 Type Checking for Common Lisp
Overview
segments. No guard violations are found in the body, thanks to the assumption provided by the guard.
The segments produced are:
(((*OR $NIL (*CONS *UNIVERSAL $NIL))) -> $NIL)
(((*CONS *UNIVERSAL (*CONS &1 *UNIVERSAL))) -> &1))
Thus, if the argument is a CONS whose CDR is a CONS, the result is the same value as the CAR of the
CDR. Otherwise, the result is NIL.
The checker generates an identical guard descriptor, but the checker has a more stringent test for guard
completeness than the inference tool. The checker requires that guard be a conjunction of recognizer calls
on distinct parameters. Had the guard expression been made the body of another unguarded function, that
function would have qualified as a recognizer. Then, the guard for CADR could have been simply a call
to this new function on the parameter X, and the checker would have judged the guard complete.
The first two segments are contained in the first original segment, and the third segment is contained in
the second original. Thus, the signature is validated.
Type Checking for Common Lisp 35
The Implementation of the Inference Algorithm
Chapter 4
THE IMPLEMENTATION OF THE INFERENCE ALGORITHM
This chapter is an informal synopsis of the type inference algorithm, intended to provide insight into all
the operations undertaken by the implementation.
This discussion will bootstrap itself, in a sense. First, we will mention some key functions and describe
them just well enough to then follow with a high-level overview of the algorithm. Then we devote the
bulk of the chapter to a very detailed discussion of the most critical component algorithms.
Keep in mind that this algorithm is purely heuristic, in that its purpose is to suggest a function signature.
We make no formal claims to its validity or to the soundness of the operations performed within the
algorithm. Formal assurance of the correctness of the signature is left to the checker algorithm.
Fortunately, we define our subset of Common Lisp so that we can make some handy assumptions about
the function we are given.
1. A new function may call only itself and functions which exist in the database.
2. The only variables are those in the parameter list.
3. Quoted constants must be objects whose types are known to the type system. For instance,
complex numbers are not handled.
Furthermore, it was not our intention to support type inference for non-terminating functions. Though
there is no specific prohibition of such functions, the algorithm may be prone to generating weak results
or to diverge, thus yielding no result at all.
The algorithm as it stands is the result of an evolving prototype effort. Its development was arbitrarily
halted at a point where it was judged to be generating sufficiently interesting results. As such, there are
many improvements which could be made to the algorithm, and some are suggested within this chapter.
36 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
Here is a brief description of several important functions which will be mentioned in the overview of the
algorithm below. Details and some parameters are omitted to keep the big picture clearer. Several of
these functions will be explained in greater detail later.
UNIFY-DESCRIPTORS (DESCRIPTOR1 DESCRIPTOR2 ...)
If one thinks of descriptors as terms which, under some arbitrary substitution of ground type
descriptors for type variables, represent sets of data objects, then one may think of UNIFY-
DESCRIPTORS as returning a descriptor characterizing the intersection of those two sets under any
common substitution. UNIFY-DESCRIPTORS returns a descriptor which represents the common ground
between DESCRIPTOR1 and DESCRIPTOR2, annotated with substitutions mapping type variables to
descriptors.
GUARD-VIOLATION-ON-FNCALL (FORMALS ACTUALS)
FORMALS and ACTUALS are lists of descriptors. FORMALS is the descriptor list corresponding
to the guard of the function being called. ACTUALS are the descriptors of the actual parameters.
GUARD-VIOLATION-ON-FNCALL returns T if the ACTUALS descriptors fail to satisfy the
FORMALS descriptors. This judgement is made essentially by unifying the FORMALS and ACTUALS
and seeing that the result is isomorphic to ACTUALS, thus ensuring FORMALS placed no additional
constraint on ACTUALS.
CANONICALIZE-DESCRIPTOR (DESCRIPTOR)
This function employs a number of heuristics to massage a descriptor into a canonical form. There is
not always a single unique canonicalization computed by this function for different descriptors
characterizing the same set of values.
TYPE-PREDICATE-P (FORM ARGLIST RECOGNIZERS TYPE-ALIST)
Given a Lisp expression FORM, a list of names of formal parameters the function ARGLIST, a list of
known RECOGNIZERS from the system database, and a TYPE-ALIST characterizing the types of the
variables in the environment, TYPE-PREDICATE-P determines what information can be determined
about the parameters by evaluating the form.
PREPASS (FORM FNNAME FUNCTION-SIGNATURES RECOGNIZERS)
PREPASS takes a Lisp expression FORM and the system database and returns an IF-normalized
form, which will always evaluate to the same value as FORM in any environment. IF-normalization
consists mainly of transforming the test form in the IF so that it always evaluates to either T or NIL. It
also reduces an IF form to its THEN or ELSE arms if it can deduce trivially that the test always evaluates
to T or NIL, respectively.
At the highest level, the algorithm does the following. When a function is submitted, if it has no guard (or
the guard is T), we check its PREPASS-ed body to see if it is a recognizer function (See Section 4.4.5
below). If it is, we compute the variable-free type descriptor which characterizes what it recognizes.
Then we construct the signature which maps that descriptor to $T and its negation to $NIL. We store this
signature in the database, noting that the function’s guard descriptor is (*UNIVERSAL), that the guard is
complete, and that it recognizes forms conforming to its descriptor.
If the function is not a recognizer, we check the PREPASS-ed guard to ensure it has no guard violations
itself, then use TYPE-PREDICATE-P to construct a type alist which captures what is implied about the
Type Checking for Common Lisp 37
The Implementation of the Inference Algorithm
actual parameters when the guard expression evaluates to a non-NIL value. If the real guard is stronger
than the type alist (for example, if the guard were (EQUAL X 3), the type alist would just say that X is an
INTEGER), we note that the guard is not complete. Then, using the type alist as a starting point, we
construct a representation of the function body which is in the form of a table with one entry for each
subexpression in the body, ordered by Lisp evaluation order. This table, described later, will be the basis
for the computation of the function signature.
Next we traverse this table, computing the types of each subexpression. For each subexpression, we
produce a list of segments, each mapping a list of descriptors characterizing the types of the variables in
the environment to a descriptor characterizing the result. There can be more than one segment for a given
subexpression because different variable types may produce different result types. Ultimately, the set of
segments for the outermost subexpression, i.e., the function body, will become the segments of the
signature for the function.
How are these segments produced, then? Suppose we are analyzing a function call. We have already
computed the segments for each of its arguments. First we take the cross product of all the segments for
all the arguments. This gives us a pattern of argument descriptors for every possible combination we have
uncovered. Now, we fetch the signature for the called function from the system database. The signature
is composed of a guard vector of the same arity as the parameter list, and a collection of segments. For
each actual parameter argument pattern, we first "unify", using UNIFY-DESCRIPTORS, that pattern
against the pattern representing the guard of the called function. If the result of this unification indicates
that the guard descriptor further restricts the argument types, we report a failure to verify guards,
terminate the computation, and make no addition to the system database. Otherwise, we unify each
argument pattern against the argument pattern in each segment of the signature for the called function.
For each segment for which this unification is successful, i.e., for which a non-empty result is produced,
we are rewarded with a unifying substitution, and this substitution is applied to the result descriptor for the
matching segment. This, when canonicalized, gives us a result type which characterizes a possible result
corresponding to our argument pattern. We map our argument pattern to this result descriptor to produce
a segment. Since there may be multiple formal argument patterns for which the unification is successful,
we may get multiple segments for any given actual argument pattern. For each actual parameter
descriptor pattern from the cross product, we perform this unification with each segment for the called
function, collecting all the results to form the set of segments for this function call.
A problem, of course, is what to do about recursive function calls? The type inference process just
described depends on the existence of a signature for every called function, and we are only in the process
of computing the signature for the function at hand. Initially, we have nothing whatsoever with which to
work. We address this problem by doing an iterative approximation of the signature for the function. An
iteration amounts to one pass through the entire table representing the function. During the first pass, we
have no segments to match against recursive function calls, so we know nothing about the results of those
calls. However, this does not prevent us from producing a first approximation of our segments on this
pass. On the second pass, we can use this approximation to produce a refined approximation. Eventually,
we hope that the approximation we produce will be equivalent to the one we produced on the previous
pass. When this happens, the algorithm has stabilized. We canonicalize our segments into a suitable
form, and we are finished. We store the segments in the system database, along with the representation of
the function guard, a tag which says whether the guard is complete, and a tag which says the function is
not a recognizer (since we would not have resorted to this algorithm if it were).
Ensuring that we have reached stability is non-trivial, however. The function is recursively composing
structures which contribute to the result. If we proceed naively, these structures will just grow with each
iteration, our segments will reflect that growth, and nothing will ever stabilize. Therefore, we employ a
38 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
technique for determining when these structures are taking on a form which can be described with a
closed recursive descriptor. With such a closed recursive descriptor, the expansion and folding which
occurs during an iteration of the algorithm will produce the same result we had in the previous pass, thus
leading to stabilization.
In some situations, our heuristics may not be strong enough to close our recursive forms in such a way
that stability can be reached. To prevent the algorithm from running away, we simply place a limit on the
number of iterations it will attempt before giving up.
The following is the grammar for type descriptors which are handled by the inference algorithm.
<descriptor> ::= <simple descriptor> | <variable> | *EMPTY |
(*CONS <descriptor> <descriptor> ) |
(*OR <descriptor>* ) | (*NOT <descriptor> ) |
<rec descriptor> | <and descriptor> |
<fix descriptor> | *MERGE-FIX-POINT |
**RECUR-MARKER**
A <recur descriptor> is further constrained so that when it takes the form (*RECUR <rec name>), the
<rec name> must be identical to that associated with the immediately containing *REC form. Note that
the grammar for <recur descriptor> is just the grammar for <descriptor> extended with the *RECUR form
and modified to be internally recursive.
*ISO-RECUR is a transitory form which can replace (*RECUR <rec name>) in a small algorithm which
determines if two *REC descriptors are isomorphic.
*DLIST is simply a notation for packaging a list of simple descriptors into a single form for submission to
Type Checking for Common Lisp 39
The Implementation of the Inference Algorithm
In an *AND form, the <descriptor> is a constraint on the descriptor of the <rec arm>. The <rec arm> is
some kind of recursive form, either a <recursive arm> or a <descriptor> which contains a *REC form.
<and descriptor> ::= (*AND <descriptor> <recursive arm> )
A <subst descriptor> is a marked descriptor characterized by the embedded <annotated descriptor> but
indicating that its place is held in common by the indicated variable.
<subst descriptor> ::= (*SUBST <variable> <annotated descriptor>)
Before we delve into the higher-level algorithms which produce signatures for recognizers and for general
functions, it is useful to understand several very important utility functions which help implement the
algebra of type descriptors. These are UNIFY-DESCRIPTORS, which finds the common ground between
two descriptors, CANONICALIZE-DESCRIPTOR, which performs canonicalization, PREPASS, which
performs IF-normalization, and TYPE-PREDICATE-P, which extracts type information from a certain
important class of descriptors. Then we can move on to RECOGNIZERP and DESCRIPTOR-FROM-
FNDEF, which determine if a function is a recognizer and derive its associated descriptor. Finally we will
look at DERIVE-EQUATIONS, SOLVE-EQUATIONS, and INFER-SIGNATURE, which compute type
40 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
4.4.1 UNIFY-DESCRIPTORS
Think for the moment of descriptors as terms which, under some arbitrary substitution mapping type
variables to ground type descriptors, represent sets of data objects. With this view, one can think of
UNIFY-DESCRIPTORS as returning the intersection of those two sets under any common substitution.
Given two descriptors, DESCRIPTOR1 and DESCRIPTOR2, UNIFY-DESCRIPTORS returns a
descriptor which represents the common ground between DESCRIPTOR1 and DESCRIPTOR2, plus a
substitution list mapping type variables to descriptors which embodies one part of the unification of the
descriptors. We say "one part of" because in the presence of disjunction in the descriptor language,
descriptor unification also performs an intersection operation not typically associated with classical
unification. For example:
(UNIFY-DESCRIPTORS ’(*OR $CHARACTER $INTEGER) ’(*OR $INTEGER $NIL))
=
($INTEGER . NIL)
Here, though no variables appear, the unifier still must find the common ground between the two
disjunctions, and this is essentially an intersection problem. An example which combines the tasks of
finding an appropriate substitution and taking an intersection is:
(UNIFY-DESCRIPTORS ’(*CONS &1 $NIL) ’(*OR $INTEGER (*CONS $INTEGER &2)))
=
((*CONS $INTEGER $NIL) . ((&1 . $INTEGER) (&2 . $NIL)))
Here the result must be a *CONS, since that is the only possibility allowed by DESCRIPTOR1.
DESCRIPTOR2 requires that, if the object it represents is a CONS, the CAR be an $INTEGER, and
DESCRIPTOR1 requires that the CDR be $NIL. Hence, the pair in the result indicates that the resulting
descriptor must be (*CONS $INTEGER $NIL), and we obtained this result in part by instantiating the
variables according to the substitution ((&1 . $INTEGER) (&2 . $NIL)).
UNIFY-DESCRIPTORS also returns a second mapping list called a restriction list, which is of no
particular interest to the external caller (unless it is exploited -- see Section 4.4.7), but which is important
internally in dealing with variables mapped to disjunctive forms. If a form within one of the input
descriptors is marked with an annotation which associates it with a type variable, and if that subform is
later restricted in the course of unification, then an entry is made in the restrictions list which maps the
variable to its restricted form.
to indicate the form of a result returned by UNIFY-DESCRIPTORS, and the extraction functions TERM,
SUBSTS, and RESTRICTIONS to access the pieces.
UNIFY-DESCRIPTORS takes some parameters which we have not yet mentioned. Each of them
safeguard against particular kinds of infinite recursion.
1. OPEN-NOT-REC is a flag used only to regulate whether CANONICALIZE-NOT-
DESCRIPTOR opens up *REC descriptors which occur within a *NOT form.
2. STACK-RECS is a stack of pairs of *REC descriptors. Every time UNIFY-DESCRIPTORS
is called where both DESCRIPTOR1 and DESCRIPTOR2 are *REC descriptors, the pair is
pushed on the stack. If on a deeper recursive call, the same pair is encountered, we have
Type Checking for Common Lisp 41
The Implementation of the Inference Algorithm
found a fixed point in the recursive descent, and we can construct a closed *REC form to
represent the result.
3. TERM-RECS is a stack of pairs of descriptors. In each pair, one is a *REC and the other is
not. Whenever one descriptor argument to UNIFY-DESCRIPTORS is a *REC and the
other not, the pair is put on this stack, and if it is encountered again on recursion, the
algorithm fails, returning the *EMPTY descriptor. This is explained below.
To illustrate the action of UNIFY-DESCRIPTORS, we state rules providing generic examples of results it
returns. A rule is of the form
(UNIFY-DESCRIPTORS D1 D2) => <<TD SUBSTS RESTRICTIONS>>
where D1 and D2 are various forms representing the kinds of descriptors to which the rule applies, and the
triple represents, in terms of D1 and D2, the result produced by the unification. In some cases, we supply
additional remarks to help clarify the adequacy of the result. UNIFY-DESCRIPTORS is commutative, so
we will not bother to present a result with the arguments reversed. To imagine a soundness argument for
the result, one might imagine a containment relation for descriptors and then question whether the result
descriptor is contained in both the arguments.
By convention in our rules, let the names D, D1, D2, ... represent type descriptors. The SUBSTS and
RESTRICTIONS lists have elements of the form (&i . <descriptor>).
Rule:
(UNIFY-DESCRIPTORS D D) => <<D NIL NIL>>
Rule:
(UNIFY-DESCRIPTORS *EMPTY D) => <<*EMPTY NIL NIL>>
Rule:
(UNIFY-DESCRIPTORS D *UNIVERSAL) => <<D NIL NIL>>
In the following rule, the *SUBST form is the annotation mentioned above which associates a variable
with a component of the result descriptor. The form of a *SUBST is (*SUBST &i <descriptor>), where
&i is the variable. The set of values represented by the *SUBST is the set represented by its <descriptor>.
Thus, unification consists of unifying against the <descriptor>, and then, if this unification results in a
narrowing of <descriptor>, noting in the restrictions list that the variable is restricted accordingly.
REBUILD-SUBST-FORM is the function which does this, modifying the result of the unification by
adding to the restriction list a restriction on the *SUBST variable whenever the *SUBST form is restricted
by the unification.
Rule:
(UNIFY-DESCRIPTORS (*SUBST &i D1) D2) =>
where unified-form = (UNIFY-DESCRIPTORS D1 D2)
if TERM (unified-form) = *EMPTY
then <<*EMPTY NIL NIL>>
else (REBUILD-SUBST-FORM (*SUBST &i D1) unified-form)
Rule:
(UNIFY-DESCRIPTORS &i (*OR .. &i ..)) => <<&i NIL NIL>>
Rule:
Under the condition that D contains an occurrence of the variable &i
nested within both an *OR and a *CONS,
(UNIFY-DESCRIPTORS &i D) =>
42 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
The preceding rule illustrates a different treatment of variables than we will see in the formal semantics
and in the checker algorithm. In the inference algorithm, variables serve a dual purpose, indicating both
the sharing of data values and the need for common type instantiation. Instances of the latter are cleaned
up at the end of the inference process. In hindsight, we believe these purposes should have been
separated, treated differently, and preserved in the formal semantics.
Rule:
Under the condition that D contains an occurrence of the variable &i,
but where D is neither an *OR with &i as one of its disjuncts nor a
form where the variable &i is nested within both an *OR and a *CONS,
(UNIFY-DESCRIPTORS &i D) => <<*EMPTY NIL NIL>>
This covers only the case where D contains &i within a *CONS but not within an *OR. If we created a
*REC in the spirit of the previous case, it would be a non-terminating structure.
Rule:
Unifying a descriptor D with an *OR descriptor which has D as one
of its disjuncts yields D intact with no substitutions necessary.
Otherwise, the result is the disjunction of the unification of D with
each of the disjuncts, with each unification yielding its own
substitution.
Rule:
Where &i does not occur in D
(UNIFY-DESCRIPTOR &i D) =>
<<(*SUBST &i D) ((&i . (*SUBST &i D))) NIL>>
Rule:
(UNIFY-DESCRIPTORS (*REC <rec-name> (.. (*RECUR <rec-name>) ..)) D)
=>
(UNIFY-DESCRIPTORS
(.. (*REC <rec-name> (.. (*RECUR <rec-name>) ..)) ..) D)
This is to say that when we unify a *REC form with another form, we
just open the *REC and unify.
When unifying two non-identical *REC descriptors, there is a danger of getting lost in a non-terminating
recursion. UNIFY-DESCRIPTORS does a nice trick to prevent this and to reach closure. When it is
asked to unify two *REC descriptors, it places the pair on a stack before proceeding. This stack is the
STACK-RECS parameter mentioned above. Then, as usual, it opens one up and unifies against the other,
which results in the other being opened up, so that their internals get unified. This constructs a new,
unified descriptor as it descends, and sooner or later, we will reach a point where the two *REC forms,
which reappear nested with every unfolding, once again are to be unified with each other. If we just
continued unfolding, the algorithm would never terminate, but before unifying two *RECS, UNIFY-
DESCRIPTORS checks the stack to see if they have already been encountered. If so, we have reached a
Type Checking for Common Lisp 43
The Implementation of the Inference Algorithm
fixed point in the recursion. We simply create a new *REC label <recname>, declare the result of the
nested unification to be (*RECUR <recname>), and when we finally unwind our way to the top, wrap a
(*REC <recname> .. ) around the accumulated result. For example, unifying the descriptors
(*REC R1 (*OR $NIL (*CONS (*OR $INTEGER $CHARACTER) (*RECUR R1))))
(*REC R2 (*OR $NIL $T (*CONS *UNIVERSAL (*CONS $INTEGER (*RECUR R2)))))
Another pitfall is the possibility of infinite recursion when each descriptor argument contains a *REC
form, but the forms unwind out of sync with one another. For example, consider the descriptors:
(*REC FOO (*OR $NIL (*CONS *UNIVERSAL
(*CONS *UNIVERSAL (*RECUR FOO))))
(*CONS *UNIVERSAL
(*REC BAR (*OR $NIL (*CONS *UNIVERSAL
(*CONS *UNIVERSAL (*RECUR BAR))))
The *REC forms are opened up alternately, leapfrogging one another in an endless unwinding. This
unwinding does not get spotted under the STACK-RECS regimen because at no point do we attempt to
unify two *REC forms head to head. So we employ the TERM-RECS mechanism, mentioned above, to
prevent this case from running away. TERM-RECS is a stack of pairs of descriptors, where in each pair
one is a *REC and the other is not. If on recursion we encounter arguments which match pairs already on
the TERM-RECS stack, we return *EMPTY as the result, since any object satisfying both descriptors
would be impossible to construct. In the example above, it would have to be a list whose length is both
even and odd.
Rule:
(UNIFY-DESCRIPTORS D (*NOT D)) => <<*EMPTY NIL NIL>>
Canonicalization complements a simple descriptor and forces the *NOT inside a structure, as we shall see.
This reduces unification of *NOT descriptors to unification on other forms. The rare exception is where
the canonicalization of the *NOT form produces no change. This occurs only when D1 is a *RECUR
form, and this peculiarity occurs only during recognizer analysis, which is explained below, along with
the treatment of *AND and *RECUR forms.
Rule:
Where D1 is not a *RECUR descriptor,
(UNIFY-DESCRIPTORS (*NOT D1) D2) =>
(UNIFY-DESCRIPTORS (CANONICALIZE-DESCRIPTOR (*NOT D1)) D2)
Rule:
Where D is not an *OR descriptor:
(UNIFY-DESCRIPTORS (*NOT (*RECUR <rec-name>)) D) =>
<<(*AND (*NOT (*RECUR <rec-name>)) D) NIL NIL>>
otherwise
(UNIFY-DESCRIPTORS (*NOT (*RECUR <rec-name>)) (*OR D1 .. DN)) =>
(*OR <<(*AND (*NOT (*RECUR <rec-name>)) D1) NIL NIL>>
..
<<(*AND (*NOT (*RECUR <rec-name>)) DN) NIL NIL>>)
this when both descriptors are composed solely of type variables, instances of the simple types
$CHARACTER, $INTEGER, $NIL, $NON-INTEGER-RATIONAL, $NON-T-NIL-SYMBOL,
$STRING, or $T, or *CONS or *DLIST descriptors whose components are also suitable descriptors.
UNIFY takes two such descriptors and returns either *failure if unification fails or a unifying substitution
list. Thus, we have the rule:
Rule:
Where D1 and D2 are suitable for classical unification:
(UNIFY-DESCRIPTORS D1 D2) =>
let unify-result = (UNIFY D1 D2)
if unify-result = *failure
(*EMPTY NIL NIL)
<<(APPLY-SUBSTS unify-result D1) unify-result NIL>>
Now consider the general unification of two *CONS descriptors. Since this is a complex operation, we
decompose it into several principal functions, expressed as follows:
Rule:
Where D1 is of the form (*CONS D1-CAR D1-CDR)
and D2 is of the form (*CONS D2-CAR D2-CDR)
(UNIFY-DESCRIPTORS D1 D2) =>
let unify-cars = (UNIFY-DESCRIPTORS D1-CAR D2-CAR)
if (EQUAL (TERM unify-cars) *EMPTY)
then <<*EMPTY NIL NIL>>
else if unify-cars is of the form (*OR u-form1 .. u-formn)
then (*OR (UNIFY-CDR-UNIFIED-FORMS u-form1 D1 D2)
...
(UNIFY-CDR-UNIFIED-FORMS u-formn D1 D2))
else (UNIFY-CDR-UNIFIED-FORMS unify-cars D1 D2)
where
UNIFY-CDR-UNIFIED-FORMS (unify-cars D1 D2) =
let unify-cdrs =
(UNIFY-DESCRIPTORS
(APPLY-RESTRICTIONS (RESTRICTIONS unify-cars)
(APPLY-SUBSTS (SUBSTS unify-cars)) D1-CDR)
(APPLY-RESTRICTIONS (RESTRICTIONS unify-cars)
(APPLY-SUBSTS (SUBSTS unify-cars)) D2-CDR))
if (EQUAL (TERM unify-cdrs) *EMPTY)
then <<*EMPTY NIL NIL>>
else if unify-cars is of the form (*OR u-form1 .. u-formn)
then (*OR (MAKE-CONS-UNIFIED-FORM u-form1 unify-cdrs)
...
(MAKE-CONS-UNIFIED-FORM u-formn unify-cdrs))
else (MAKE-CONS-UNIFIED-FORM unify-cars unify-cdrs)
where
MAKE-CONS-UNIFIED-FORM (unify-cars unify-cdrs) =
<<(*CONS (APPLY-RESTRICTIONS
(RESTRICTIONS unify-cdrs)
(APPLY-SUBSTS (SUBSTS unify-cdrs) (TERM unify-cars)))
(TERM unify-cdrs))
(COMPOSE-SUBSTS
(APPLY-RESTRICTIONS (RESTRICTIONS unify-cdrs) (SUBSTS unify-cars))
(SUBSTS unify-cdrs))
(COMPOSE-RESTRICTIONS
(APPLY-RESTRICTIONS (RESTRICTIONS unify-cdrs)
(APPLY-SUBSTS (SUBSTS unify-cdrs)
(RESTRICTIONS unify-cars)))
(RESTRICTIONS unify-cdrs))>>
The use of the SUBSTS lists when unifying CONSes is essentially the same as with classical unification.
The substitutions from the CAR are applied to the CDRs before unifying, and the substitutions from the
Type Checking for Common Lisp 45
The Implementation of the Inference Algorithm
CARs and CDRs are merged to become the substitutions for the CONS.
The RESTRICTIONS list is used to deal with the presence of disjunction in the descriptor language. If
one of the CARs is a variable, we return a *SUBST form,
(*SUBST <variable> <descriptor>)
for unify-cars, as already discussed. The value set represented by a *SUBST is just that of the
<descriptor>, but the form is annotated with the variable for future reference. We carry the <descriptor>
form into the unification of the CDRs (through substitution for the variable in the CDRs). If, in unifying
the CDRs, this form is further constrained, we must reflect this constraint everywhere the variable once
occurred. A restriction is formed every time such a constraint occurs. As we construct the *CONS result
from its unified CARs and CDRs, we apply these restrictions wherever there was a *SUBST form for the
variable in question. This can be viewed simply as an elaborate way to ensure that the substitutions
discovered in the unification are uniformly applied throughout the result.
Rule:
Unification of lists of descriptors, denoted (*DLIST D1 D2 .. DN) works
exactly like the unification of *CONS descriptors. We treat the list
(*DLIST D1 D2 .. DN), for unification purposes, as if it were of the
form (*CONS D1 (*CONS D2 .. DN)). Actually, the *DLIST construction is
the general case, and *CONS is the special case, but for the purposes of
explanation, *CONS was enough to bear. *DLIST unification is only
meaningful when both arguments are *DLISTs and have the same arity.
Rule:
Where D3 is one of the simple descriptors $CHARACTER, $INTEGER, $NIL,
$NON-INTEGER-RATIONAL, $NON-T-NIL-SYMBOL, $STRING, or $T,
(UNIFY-DESCRIPTORS (*CONS D1 D2) D3) => <<*EMPTY NIL NIL>>
The following peculiar operation occurs only during the analysis of recognizer functions, as this is the
only time *AND forms are built, and they are subsequently filtered away.
Rule:
(UNIFY-DESCRIPTORS (*AND D1 (*RECUR <rec-name>)) D2) =>
let uform = (UNIFY-DESCRIPTORS D1 D2)
if (TERM uform) = *EMPTY
<<*EMPTY NIL NIL>>
if uform is of the form (*OR uform1 .. uformn)
then (*OR <<(*AND (TERM uform1) (*RECUR <rec-name>))
(SUBSTS uform1)
(RESTRICTIONS uform1)>>
...
<<(*AND (TERM uformn) (*RECUR <rec-name>))
(SUBSTS uformn)
(RESTRICTIONS uformn)>>)
else <<(*AND (TERM uform) (*RECUR <rec-name>))
(SUBSTS uform)
(RESTRICTIONS uform)>>
Here is one place where the *AND forms are built during the analysis of recognizers. Again, this only
happens during that analysis, because any other time a (*RECUR <rec-name>) only occurs within a
*REC, since we always open up *REC forms by replacing the nested *RECUR form with the *REC form.
Rule:
(UNIFY-DESCRIPTORS (*RECUR <rec-name>) D) =>
if D is of the form (*OR D1 .. DN)
then (*OR (*AND D1 (*RECUR <rec-name>))
46 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
...
(*AND DN (*RECUR <rec-name>)))
else (*AND D (*RECUR <rec-name>))
4.4.2 CANONICALIZE-DESCRIPTOR
CANONICALIZE-DESCRIPTOR attempts to put type descriptors into canonical form. There is not
always a single unique canonicalization computed by this function for different descriptors characterizing
the same particular set of values. We simply employ a collection of reductions which tend toward
canonicalization. Since these reductions amount to heuristic choices, we may not discover an optimal
form. For example, two equivalent descriptors which do not canonicalize to the same form are:
(*REC FOO1 (*CONS $INTEGER (*OR $NIL (*RECUR FOO1))))
and
(*CONS $INTEGER (*REC FOO2 (*OR $NIL (*CONS $INTEGER (*RECUR FOO2)))))
Of course, since recursive descriptors always recur from a point which is nested within both an *OR and a
*CONS, there is no reason we could not force the top level form within the *REC to be definitely one or
the other. Consider the first descriptor. If we required an *OR at the top level, we would force the outer
(*CONS $INTEGER .. ) outside the *REC and also inside the *OR around the *RECUR, and we would
get a form equal to the second. But the simple fact is that in the current implementation, we do not do
this. A nice upgrade to the implementation would be to install this canonicalization. Many other such
omissions could be found and possibly implemented.
where the left hand side is a descriptor whose form generically indicates the form of a descriptor which is
transformed by the rule, and the right hand side gives the form of the result, usually in terms of descriptors
appearing within the form on the left. Let D, D1, D2, and D3 represent arbitrary type descriptors. Let R1,
R2, and R3 represent arbitary, but distinct names associated with *REC descriptors. Let D-REC, D-
REC1, and D-REC2 be descriptors which contain *RECUR forms, useful, of course, only inside *REC
descriptors. Finally, note that *OR is an n-ary descriptor. But for purposes of expressing these rules,
without loss of generality we can consider it to be ternary, binary, unary, or 0-ary as convenient.
Canonicalization Rule 1:
(*OR $CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T (*CONS *UNIVERSAL *UNIVERSAL))
=> *UNIVERSAL
Canonicalization Rule 2:
(*OR D *EMPTY) => (*OR D)
The rule above also connotes (*OR D1 *EMPTY D2) => (*OR D1 D2), etc.
Canonicalization Rule 3:
(*OR D) => D
Canonicalization Rule 4:
(*OR D1 D2) => (*OR D2 D1)
There is a function DESCRIPTOR-ORDER (D1 D2) which defines a complete partial order on
Type Checking for Common Lisp 47
The Implementation of the Inference Algorithm
Canonicalization Rule 6:
(*OR D1 D1 D2) => (*OR D1 D2)
Canonicalization Rule 7:
(*OR D1 (*REC R1 (*OR D1 D2 D-REC)))
=>
(*OR (*REC R1 (*OR D1 D2 D-REC)))
Canonicalization Rule 8:
(*OR (*CONS D1 D2) (*CONS D3 D2))
=>
(*OR (*CONS (*OR D1 D3) D2))
Canonicalization Rule 9:
(*OR (*CONS D1 D2) (*CONS D1 D3))
=>
(*OR (*CONS D1 (*OR D2 D3)))
Canonicalization Rule 31 shows how the expansion of a *REC descriptor can canonicalize to the *REC
descriptor itself. For example,
(*OR $NIL (*CONS $INTEGER
(*REC INT-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INT-LISTP))))))
folds to:
(*REC INT-LISTP (*OR $NIL (*CONS $INTEGER (*RECUR INT-LISTP))))
=>
(*REC R2 (*NOT (OPEN-REC-DESCRIPTOR (*REC R1 D-REC))))
/ (((*NOT (*REC R1 D-REC)) . (*RECUR R2)))
There is a lot of information in the rule above. Read the form on the right hand side as a *REC form with
some new label R2 different from R1, whose body is the *not of the body of R1 opened up once, and with
the (*RECUR R2) marker replacing the original descriptor. For example, consider the following as our
left hand side.
(*NOT (*REC INT-LISTP (*OR $NIL (*CONS $INTEGER (*RECUR INT-LISTP)))))
Then we partially canonicalize the *NOT, but without opening up the *rec again:
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL $STRING
$T
(*CONS *UNIVERSAL
(*NOT (*REC INT-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INT-LISTP))))))
(*CONS (*OR $CHARACTER $NIL $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL *UNIVERSAL))
*UNIVERSAL))
But since this form represents the same set of objects as the original form, and since that original form is
embedded within it, we have discovered a new recursive form. We give it a name (NOT-INT-LISTP for
the sake of exposition), replace the embedded occurrence with the *RECUR point, and encapsulate in a
*REC:
(*REC NOT-INT-LISTP
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T
(*CONS *UNIVERSAL (*RECUR NOT-INT-LISTP)
(*CONS (*OR $CHARACTER $NIL $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T
(*CONS *UNIVERSAL *UNIVERSAL))
*UNIVERSAL)))
*FIX is a descriptor form used in SOLVE-EQUATIONS (See Section 4.4.7) to find closed forms
constructed through function recursion. Though the canonicalization of *FIX forms is not quite of the
same flavor as other canonicalizations, it is implemented in CANONICALIZE-DESCRIPTOR anyway.
*FIX forms are the mechanism used by the algorithm to discover closed recursive descriptors for forms
50 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
which are constructed on recursive function calls. One may view *FIX as having the same meaning as
*REC, with the *DLIST form representing the label. But the *FIX form signals that a certain variety of
heuristics should be employed. This canonicalization rule is very similar to Rule 31, and almost the
inverse of OPEN-REC-DESCRIPTOR, the difference being that DN is only an approximation of
(... (*FIX (*DLIST D1 .. DM) DN) ...)
When the inference tool reaches stability and we employ this rule, it is a perfect approximation. That is
the essence of the iterative stabilization algorithm.
By Rule 35, we mean that if the *RECUR form is a top-level disjunct of the body, we eliminate it, since it
serves no purpose.
Canonicalization Rule 36:
Where D does not contain (*RECUR R),
(*REC R D) => D
Rule 37 represents any case where the body of a *REC is just a nest of *CONS descriptors with no *OR
providing a terminating disjunct.
4.4.3 PREPASS
PREPASS takes a Lisp expression FORM and the system database and returns an IF-normalized form,
which will always evaluate to the same value as FORM in any environment. IF-normalization consists
mainly of transforming the test form in the IF so that it always evaluates to either T or NIL. Since IF
splits on NIL vs. non-NIL values in the test, a non-NIL value serves the same purpose as a T in an IF test.
If a test form can evaluate to a value other than T or NIL, PREPASS simply wraps (NULL (NULL .. ))
around the test. Otherwise, it leaves the test alone, and prepasses the THEN and ELSE arms recursively.
As an optimization, PREPASS also checks whether an IF test will always return NIL. If so, it simplifies
the IF to its prepassed ELSE arm. Likewise for the THEN arm if it trivially sees a test returning non-NIL.
The inference algorithm performs IF-normalization prior to processing function guards and bodies.
Though IF-normalization does not affect the value produced by evaluation of the IF, it can make the
difference in deciding whether a form can be treated as a type predicate.
The inference algorithm gives special consideration to forms which it considers to be type predicates, and
in particular to forms which conform to the requirements of a recognizer function body. The algorithm
exploits the properties of type predicates to make simplifying assumptions which allow different
techniques to be used in formulating signatures. These techniques can produce very specific signatures,
so it is worthwhile to apply them whenever possible. One of the requirements in recognizer bodies is that
IF tests always yield Boolean results.
A recognizer function in essence is a unary function which determines whether its argument conforms to a
descriptor in the type language. For any variable-free type descriptor, one could define a perfect
recognizer function. One of the criteria for a recognizer function is that it always returns either T or NIL.
Type Checking for Common Lisp 51
The Implementation of the Inference Algorithm
One reason recognizers are important to the type system is that if a call to a recognizer is used as a test in
an IF expression, optimal modifications to the type context of the IF may be made to support the analysis
of the THEN and the ELSE arms. Moreover, this significance scales up, since one criterion for the body
of a recognizer is that its IF tests are recognizer calls or equivalent predicates. IF-normalization, then, can
help transform a function body into a form suitable for treatment as a recognizer.
The tool does not treat (CDR X) as a type predicate, since it can return a non-Boolean value. Therefore,
the algorithm will not optimally refine the type assumptions about X when considering (FOO X) and
(BAR X). However, for heuristic reasons, it will treat
(IF (NULL (NULL (CDR X))) (FOO X) (BAR X))
4.4.4 TYPE-PREDICATE-P
The function TYPE-PREDICATE-P is the principal function which determines whether a function
qualifies as a recognizer and which computes the descriptor for the objects it recognizes. It is also used to
derive a guard descriptor from a function guard and to examine IF tests for type information which can be
merged into the type contexts for the THEN and ELSE arms.
Given a Lisp expression FORM, a list of names of formal parameters the function ARGLIST, a list of
known RECOGNIZERS from the system database, and a TYPE-ALIST which associates variable names
to descriptors characterizing their types in the environment of FORM, this complex function returns either
NIL or a five-tuple:
(ALIST NEGATABLE UNSATISFIED-GUARDS BOOLEANP COMPLETE)
where
• ALIST is of the same form as the argument TYPE-ALIST and conveys the type information
provided when FORM evaluates to a non-NIL value. For example, given the form
(IF (INTEGERP X) (CHARACTERP Y) NIL)
• NEGATABLE is either T or NIL and indicates whether the type information in ALIST can
be negated if the FORM evaluates to NIL. In the case where FORM is an IF test we might
wish to negate its information in the else arm. This is not always possible, with one trivial
example being (EQUAL X "abc"). The ALIST produced for this form is ((X . $STRING)).
But it is not true that the negation of FORM indicates that X is not a string. It could be a
non-string or any string except "abc". Therefore, we say that the ALIST for this form is not
negatable. For another example, (IF (INTEGERP X) (NULL Y) NIL) produces the ALIST
((X . $INTEGER) (Y . $NIL)). This is not negatable, since the type language does not have
a notion of negating an ALIST.17
17
In practice, what we do not support is negation of an ALIST of length other than one. This amounts to a shortcoming in the
implementation, since it means, for example, that we cannot provide additional type information in the ELSE arm of an IF where a
form like (IF (INTEGERP X) (NULL Y) NIL) was the test. A nice upgrade would be to express the result as a disjunctive normal
form, so we could accommodate negation.
52 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
Let us now examine how TYPE-PREDICATE-P makes its judgements, using examples to illustrate.
These examples will give the flavor of the heuristics used. This is just a sampling, which places emphasis
on where the algorithm yields information. Often, it is just as interesting why it fails to yield information,
but we will not dwell on that. The notation
<form> <arglist>
-> NIL or <ALIST NEGATABLE UNSATISFIED-GUARDS BOOLEANP COMPLETE>
produces NIL or a 5-tuple, and follow with an explanation if warranted. The TYPE-ALIST argument is
significant only in determining the presence of guard violations. <arglist>s will be given by example or
omission, to avoid copious notation. Assume any identifiers X, Y, ... appearing in the form are in the
arglist. In expressing these schema, we will assume no guard violations are detected. These rules are
presented as canonical examples, so functions like INTEGERP, CONSP, and TRUE-LISTP are used to
exemplify treatment of recognizer functions, and FOO represents a non-recognizer.
T (X) -> (((X . *UNIVERSAL)) T NIL T T))
NIL (X) -> (((X . *EMPTY)) T NIL T T)
T (X Y) -> (((X . *UNIVERSAL) (Y . *UNIVERSAL)) NIL NIL T T)
NIL (X Y) -> (((X . *EMPTY) (Y . *EMPTY)) NIL NIL T T)
T recognizes all objects, NIL rejects all objects. If the arglist is of length greater than one, the result is not
negatable.
<atom> (..) -> NIL
When the form is an EQUAL, one argument is some nest of destructors (CAR or CDR) applied to a
parameter from ARGLIST, and the other argument is a literal (a quoted form or an constant of one of the
primitive types), we build the *CONS structure appropriate to the destructors and embed the descriptor for
the literal. Under the assumption that there is no guard violation, i.e., when the type-alist guarantees the
*CONS structure is adequate, the resulting descriptor is negatable if the literal is T or NIL, otherwise not.
Note the following curiosity. Whereas the (EQUAL (CDR X) T) example above yielded
(X . (*CONS *UNIVERSAL $T)), the same predicate, using NIL instead of T, yields
(EQUAL (CDR X) NIL)) (..) ->
(((X *OR $NIL (*CONS *UNIVERSAL $NIL))) T NIL T T)
This is, of course, because (CDR X) is NIL when X is NIL or X is a CONS whose CDR is NIL.
(INTEGERP X) (..) -> (((X . $INTEGER)) T NIL T T)
(INTEGERP (CDR X)) -> (((X . (*CONS *UNIVERSAL $INTEGER))) T NIL T T)
If the form is a call to a recognizer function on some component of a formal, we build the *CONS
structure appropriate to the destructors and embed the descriptor associated with the recognizer function.
The descriptor is negatable when there is no guard violation found because recognizer calls produce
negatable results. Recognizers always return Boolean values and are complete.
(NOT (INTEGERP X)) ->
(((X . (*OR $CHARACTER $NIL $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL *UNIVERSAL)))
T NIL T T)
We give special treatment to the NULL function and its equivalents (NOT, for instance). In these cases,
like the one above, we do not require the argument to be a series of destructors applied to a formal. It is
sufficient for the argument to be a negatable type predicate. If it is, we can just negate the descriptor for
the nested form. If it is not (in a case like (NULL (CDR X)), we can still try the general test for
recognizers, given below. This, in fact, is the base case for applications of a series of NULL surrogates to
a destructed formal. This special treatment may seem like an arcane special case, but remember that
PREPASS doubly negates expressions appearing in IF tests if they are not recognizer calls.
When we encounter an IF form, we recur on its test and on its THEN and ELSE arms. If the test was a
type predicate, we merge its results into the type-alist via unification for the recursive call on the THEN
arm. If the test was also negatable, we negate it and merge for the ELSE arm. Then we try a number of
special cases, as follows:
54 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
If evaluating this IF form returns T by way of the THEN arm, we know both X and Y are CONSes, but if
we return T by way of the ELSE arm, we know only that X is an integer. So if this form returns non-NIL,
we know something about X, but not necessarily anything about Y. Moreover, since this form may return
NIL when X is a CONS but Y is not, we cannot negate what little we know. And since we capture only
information about X, but a non-NIL result may require something of Y, we cannot say the result is
complete, either.
(IF (CONSP X) (TRUE-LISTP X) T) ->
(((X . (*OR $CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T
(*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL
(*RECUR TRUE-LISTP)))))))
T NIL T T)
Evaluation of this form will succeed whenever X is a non-CONS or it is both a CONS and a TRUE-
LISTP, and fail otherwise, which makes it negatable and complete. The result descriptor for X is the *OR
of the ones yielded by the two recognizer calls.
(IF (IF (CONSP X) (CONSP Y) NIL) (TRUE-LISTP X) T) -> NIL
This could succeed solely because Y is a non-CONS, but it could also succeed for reasons not relating to
Y. We deduce nothing.
(IF (CONSP X) (INTEGERP Y) NIL) ->
(((X . (*CONS *UNIVERSAL *UNIVERSAL)) (Y . $INTEGER)) NIL NIL T T)
This will succeed only when X is a CONS and Y is an integer. However, it can fail if either is not true, so
we cannot negate the result. The result is complete, however, because the alist is sufficient to guarantee a
positive result.
(IF (CONSP X) (FOO X) NIL) ->
(((X . (*CONS *UNIVERSAL *UNIVERSAL))) NIL NIL NIL NIL)
Here, suppose that what FOO returns yields no particular type information about X. This can only succeed
if X is a CONS, but it can fail for any reason. So, the result is neither negatable nor complete.
(IF (CONSP X) NIL (IF (INTEGERP X) (INTEGERP Y) NIL)) ->
(((X . $INTEGER) (Y . $INTEGER)) NIL NIL T T)
This form returns a non-NIL value only if X and Y are integers. Moreover, this is complete. But we
know nothing if we fail because we could fail either when X is a non-integer or Y is a non-integer, and we
do not have a means for negating an alist.
(IF (NOT (CONSP X)) NIL (IF (INTEGERP (CAR X)) (INTEGERP Y) NIL)) ->
(((X . (*CONS $INTEGER *UNIVERSAL)) (Y . $INTEGER)) NIL NIL T T)
If this form returns non-NIL, we know the negatable IF test failed, but we know nothing more.
(IF (CONSP X)
Type Checking for Common Lisp 55
The Implementation of the Inference Algorithm
If this form returns non-NIL, we know that X is either a character or a CONS whose CAR is an integer,
but we know nothing about Y or Z. We could succeed when Y is an integer or when Z is a character, but
we cannot relate these facts in a type alist.
(IF (EQUAL (CAR X) 3) (CONSP (CDR X)) (INTEGERP (CAR (CDR X)))) ->
(((X . (*CONS $INTEGER (*CONS *UNIVERSAL *UNIVERSAL)))) NIL NIL T NIL)
If our IF test form is not negatable, then we do not try to merge any information from the ELSE arm. We
cannot negate our descriptor.
(IF (FOO X) (CONSP X) NIL) ->
(((X . (*CONS *UNIVERSAL *UNIVERSAL))) NIL NIL T NIL)
FOO gives us no type information, but since the ELSE is NIL, the only way we can succeed is if X is a
CONS. We do not try to negate, since FOO alone could cause the result to be NIL.
RECOGNIZERP is the predicate which says whether a function may be classified as a recognizer. It is
essentially a combination of a call to TYPE-PREDICATE-P on the body of a function and a set of
additional checks. A recognizer must have an argument list of length one, and it must either have no
guard or a guard of T. Its body must be a type predicate, i.e., TYPE-PREDICATE-P must return an alist.
Furthermore, the alist must be of length one and be negatable. There may be no guard violations. Finally,
the result must be Boolean (T or NIL). If a function passes these constraints, we know that the check for
guard violations in TYPE-PREDICATE-P is strong enough to guarantee the guards of all called functions
are verified. If all this is true, RECOGNIZERP returns the descriptor for its parameter from the ALIST
from TYPE-PREDICATE-P.
RECOGNIZERP sets up its call to TYPE-PREDICATE-P in a way which can produce a result which is a
little irregular in the type language. TYPE-PREDICATE-P looks up function names in the
RECOGNIZERS table to determine when a called function is a recognizer. If it is, TYPE-PREDICATE-P
extracts from the table the descriptor associated with the function. RECOGNIZERP allows TYPE-
PREDICATE-P to assume the function whose body it is analyzing is a recognizer, since, if every other
attribute of the function body is consistent with the notion of a recognizer, the recursive call will be, too.
But, since we are only in the process of computing the descriptor for the function, what we can store as
the descriptor characterizing the recognizer is limited. In fact, we store the partial descriptor
(*RECUR <fnname>), which is the recursion point in a *REC descriptor labelled <fnname>. Thus, when
we encounter a recursive function call in the body, we embed this *RECUR form at the appropriate point
in the descriptor being composed by TYPE-PREDICATE-P. Upon exit from TYPE-PREDICATE-P, we
might have a descriptor with a *RECUR point, but no encapsulating *REC. For example, when analyzing
the function TRUE-LISTP:
(DEFUN TRUE-LISTP (X)
(IF (NULL X) (NULL X) (IF (CONSP X) (TRUE-LISTP (CDR X)) NIL)))
We close this form by simply wrapping *REC around it with the TRUE-LISTP label:
56 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
The (CONSP (CDR X)) test governing the recursive call yields type information about the recursive
parameter which amounts to a constraint on what the recursive call will recognize. Normally, TYPE-
PREDICATE-P would use descriptor unification to merge the information from the IF test
((X . (*CONS *UNIVERSAL (*CONS *UNIVERSAL *UNIVERSAL))))
But (*RECUR FOO) is not meaningful in the absense of an enclosing *REC form, and at the point where
we would do this unification, we are not yet ready to encapsulate with a *REC.
The answer to this quandary is to defer the merge until we do have a *REC form. We allow TYPE-
PREDICATE-P to construct a special form which indicates a deferred unification.18 We call this form
*AND. TYPE-PREDICATE-P will produce the following alist entry for the body of foo.
(X . (*OR $NIL
(*CONS *UNIVERSAL
(*AND (*CONS *UNIVERSAL *UNIVERSAL) (*RECUR FOO)))))
The essence of *AND validation is to unify the constraint with the unwinding of the *RECUR form, and
to let this result take the place of the *AND form in the *REC descriptor. But this is a deceptive
simplification of the task, for a number of reasons. For one thing, the constraint may close off the
recursion, so we have to be prepared to canonicalize our "*REC" form to a non-recursive one. (In fact,
this is what will happen with our FOO example above.) But more problematically, the unified result may
itself have an *AND form in it from expanding the *RECUR to the containing *REC, which of course has
an *AND nested in it. Furthermore, the unified form or its constraint may be different than before, the
result of the constraint actually restricting the possible values from the unfolded *RECUR. Thus, we have
a stabilization problem.
18
Actually, the form is constructed in UNIFY-DESCRIPTORS when we attempt to unify a *RECUR form with something else, a
scenario which would never occur if the *RECUR were nested in a *REC.
Type Checking for Common Lisp 57
The Implementation of the Inference Algorithm
The process is essentially as follows. If the unification produces a descriptor which is just the recursive
expansion of the one we started with, we contract the expansion, stripping the *AND constraint, and we
are done. If the unification produces something different, then if the result contains an *AND, we
*AND-validate it again, in hopes of finding a new *REC descriptor to embed within the expansion of the
old one. If there is no *AND in the unified form, we are done, and we just embed the result in the form
where the *AND was originally.
Here are some examples of calls to VALIDATE-AND. The first is from our FOO example above. Here
the constraint can only be fulfilled if the recursion does not terminate. Therefore, the *AND form
collapses to *EMPTY, and so does its containing *CONS. Since $NIL includes no (*RECUR FOO), we
drop the *REC, leaving only $NIL.
(VALIDATE-AND
’(*REC FOO (*OR $NIL
(*CONS *UNIVERSAL
(*AND (*CONS *UNIVERSAL *UNIVERSAL)
(*RECUR FOO)))))
’FOO)
=
$NIL
(VALIDATE-AND’s second argument is just a coding artifact of no interest in this discussion.) In the
following, the constraint does not allow NIL, so we force the $NIL outside the recursive form.
(VALIDATE-AND ’(*REC FOO
(*OR $INTEGER $NIL
(*CONS $INTEGER
(*AND (*OR $INTEGER (*CONS *UNIVERSAL *UNIVERSAL))
(*RECUR FOO)))))
’FOO)
=
(*OR $NIL (*REC !REC1 (*OR $INTEGER (*CONS $INTEGER (*RECUR !REC1)))))
In the next example, the *CONS constraint is disallowed, but the $INTEGER is valid, albeit non-
recursive.
(VALIDATE-AND ’(*REC FOO
(*OR $INTEGER $NIL
(*CONS $INTEGER
(*OR (*AND $INTEGER (*RECUR FOO))
(*AND (*CONS *UNIVERSAL (*CONS *UNIVERSAL *UNIVERSAL))
(*RECUR FOO))))))
’FOO)
=
(*OR $INTEGER $NIL (*CONS $INTEGER $INTEGER))
The processing of *AND forms makes possible the accurate signification of many ordinary functions. For
instance, the RECOGNIZERP function applied to:
(DEFUN PROPER-CONSP (X)
(IF (CONSP X)
(IF (CONSP (CDR X)) (PROPER-CONSP (CDR X)) (EQUAL (CDR X) NIL))
NIL))
4.4.6 DERIVE-EQUATIONS
Perhaps a better name for this function would have been SCHEMATIZE-BODY. DERIVE-
EQUATIONS transforms the body of the function into a table, with one entry in the table for each
subexpression in the body. Each entry is of the form
(<form> <marker> <type alist> <method>)
where <form> is the Lisp form being analyzed, the <marker> is a gensym used for table lookup, the
<type alist> characterizes what we know about the types of the variables in the context of the
subexpression, and the method is a schema which provides direction in computing the type of the
subexpression.
We perform some type inference computations while constructing this table. Whenever an IF is
encountered, we extract what type information we can from the predicate, using TYPE-PREDICATE-P,
and merge it into distinct type alists for the THEN and ELSE arms, utilizing UNIFY-DESCRIPTORS. Of
course, we can merge into the ELSE arm only if TYPE-PREDICATE-P tells us its type alist is negatable.
The methods are schematic aids to the computation of the result type for each expression. They vary for
different kinds of expressions. For identifiers, the method is simply the type descriptor for the variable in
the type-alist. For quoted forms, the method is the descriptor directly derived from the form, using the
function DESCRIPTOR-FROM-QUOTED-FORM. For IF forms, the method is a list of length two which
contains the index markers pointing to the table entries for the forms in the THEN arm and the ELSE arm
of the IF. For function calls, the method is a list of length two whose first element is a list of index
markers pointing to the entries for its actual argument forms, and whose second element is a new type
variable which will represent the result type for the function call when we "solve the equation"
represented by this table entry.
DERIVE-EQUATIONS produces the following table representing the body of this function.
foo
Each subexpression has a unique marker. Notice that the type-alist for the second occurrence of X (the
one marked with *MARKER-4) maps X to the descriptor
(*OR $CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T)
rather than the variable in the outermost type-alist. This is because this occurrence of X is governed by
the (ATOM X) test with a positive result. Thus, X is an atom, and we can rule out any possibility of it
being a *CONS. The opposite is true in the third occurrence (*MARKER-5). Since we are in the ELSE
arm, we know the atom test failed, so the type-alist entry for X has been refined to be
(*CONS *UNIVERSAL *UNIVERSAL).
Notice also that the method for the first X, &4, is just the type-alist entry for X. In the method for
(ATOM X), i.e., ((*MARKER-2) &5), *MARKER-2 points to the argument, which is the X from the first
table entry. &5 is a new variable which will represent the type of (ATOM X). In the method for the IF
form, (*MARKER-4 *MARKER-7), *MARKER-4 points to the table entry for the THEN arm, X, and
*MARKER-7 to the entry for the ELSE arm, (LAST (CDR X)).
When we discuss SOLVE-EQUATIONS below, we shall see how this table is employed as a basis for the
type computation.
4.4.7 SOLVE-EQUATIONS
The highest level view of SOLVE-EQUATIONS is as follows. We traverse the table of EQUATIONS,
computing the type(s) of each subexpression. For each subexpression, we produce a list of "segments",
where a segment maps the types of the variables in the environment to a result type. The form of a
segment is:
(<desc1> .. <descn>) -> <desc>
where <desc1> .. <descn> represent the types of the variables at this point (as opposed to the types of
actual parameters in the subexpression function call), and <desc> represents a possible type of the
subexpression, assuming the variable types. There can be more than one segment for a given
subexpression because different combinations of variable types may produce different result types.
Ultimately, the set of segments for the outermost subexpression, i.e., the function body, will become the
segments of the signature for the function.
As SOLVE-EQUATIONS passes through the equations, it builds a table of results, one entry for each
equation. An entry in this table is the marker for the equation CONS-ed onto the segments computed for
the equation. The markers serve as keys for looking up previously computed segments. Let us call this
table PARTIAL-SOLUTION.
Recall the form of an equation, given in the previous section. How we solve an equation depends on the
method stored as a component of the equation. Some cases are easy. If the expression is a variable, the
method is simply the descriptor for the variable in the type-alist saved in the equation. The segment
produced simply maps the variable types from the type alist to the descriptor for this variable, which was
conveniently stored as the method. There is one exception to this rule. If the descriptor for the variable is
an *OR descriptor, we produce one segment for each disjunct of the *OR. This gives us much greater
specificity in our results. Thus, if we encounter a reference to Y in a context where our type-alist is
((X . (*OR $INTEGER $NIL)) (Y . &1))
If the expression is a literal (a quoted form or a base value from a primitive type), the method is the
descriptor describing the literal. SOLVE-EQUATIONS produces a segment mapping the argment
descriptors to the literal descriptor.
If the expression is an IF expression, the method is a list of two markers. The first marker points to the
segments for the THEN arm in PARTIAL-SOLUTION, and the second marker points to the segments for
the ELSE arm. These, of course, have already been computed because the table is in Lisp evaluation
order (as if IF were strict). The segments for the IF are simply the appended segments for the THEN and
ELSE.
The most interesting case is the function call. We have already computed the segments for each of its
arguments. We look these segments up by using the markers in the method for this function call. Recall,
a function call method is a list of length two whose CAR is a list of the same arity as the parameter list for
the function being called, where the elements of the list are markers pointing to the segments for each
actual argument in the PARTIAL-SOLUTION. We need to formulate all the possible distinct type-alists
Type Checking for Common Lisp 61
The Implementation of the Inference Algorithm
provided by these segments. This is essentially a cross-product operation, which is necessary because the
contexts in the various segments do not necessarily match. For example, where there are two variables, X
and Y, in the environment, the segments for the first argument might be:
(((*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
-> $INTEGER)
(($NIL
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
-> $NIL)
(Read this to say that if X is a non-empty integer list and Y is an integer list, the first argument may be an
integer. If X is NIL and Y is an integer list, the first argument may be NIL.) For the second argument, we
might have:
(((*REC INTEGER-LISTP (*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))
(*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))))
-> $INTEGER)
(((*REC INTEGER-LISTP (*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))
$NIL)
-> $NIL)
We need to factor these segments into one another to get the distinct type environments which will
produce the various argument patterns to our function call. As usual, UNIFY-DESCRIPTORS is the
combinator which is used to merge and factor the environments. Each of the contexts from the segments
for the first actual argument are unified pairwise with each of the contexts from the segments for the
second argument. The result of each such unification is paired with the list formed by the respective result
descriptors of the original segments, to form what we will call an arg-solution. Each arg-solution maps a
context of type descriptors for the variables in the environment to the types computed for the actual
parameters of the function call. In our example, the environments, and the arg-solutions they produce,
are:
(((*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
(*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))))
-> ($INTEGER $INTEGER))
(((*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
$NIL)
-> ($INTEGER $NIL))
(($NIL
(*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))))
-> ($NIL $INTEGER))
Collectively, these arg-solutions represent all the type scenarios to be treated on the function call.
62 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
Now that we know which argument patterns we will submit to our function, we are ready to see what
result each may yield. We take each arg-solution in sequence, performing the process described below.
The first task is to see if we can detect the possibility of a guard violation on the called function for our
argument pattern. Another way to phrase the question is to see whether the guard would further constrain
what we know about our parameters. If it would, then we will judge we have a guard violation.
Following this line of reasoning, we note that descriptor unification is a constraining operation. We apply
UNIFY-DESCRIPTORS to the function guard, fetched from the database, and our argument pattern. If
the resulting descriptor is identical to our original argument pattern, then we have satisfied the guard.
Since we dwell in a world including type variables, we do not require the result be identical, only that it be
isomorphic, i.e., identical modulo a 1-1 and onto substitution of variables for variables. If UNIFY-
DESCRIPTORS returns an *OR of possible unified forms, each must be isomorphic. If we do not have
this isomorphism, we declare a guard violation and halt the computation.19
Recall once again that the fact we declare a guard violation does not necessarily mean the violation can
occur. It merely means that the inference system cannot guarantee that it will not. Some other kind of
proof may be in order to verify the guard. Conversely, the fact that we do not signal a guard violation
does not mean that one will not occur. Our claim that there is no guard violation is only solid when the
guard descriptor we have stored for the called function is complete. Even this claim is qualified. Since
this algorithm is not formalized in any sense, we cannot make any formal argument to back up the claim.
With guard verification out of the way for our arg-solution, we can proceed to derive its possible result
types. The second element in the method for our equation was a type variable which represents the result
type of the function call. We construct a list of descriptors consisting of the argument descriptors from
our arg-solution with our result variable added to the end. Then we fetch the segments for the function
being called and massage each segment into a similar list, replacing all variables in the segment with fresh
ones. We pair each of our arg-solution vectors with each of the segment vectors, applying UNIFY-
DESCRIPTORS. UNIFY-DESCRIPTORS returns *EMPTY if an argument pattern will not unify with
the descriptors representing the formal arguments for any segment. This means simply that the segment is
irrelevant to our pattern, indicating that it corresponds to a control path through the called function which
will not be followed on arguments of the type we are supplying. If we get a non-*EMPTY result,
however, our result variable will have been bound to a type descriptor which constitutes a projection of
our arguments through the segment to its result. For example, consider calling the function CAR with an
argument of type
(*CONS (*OR $INTEGER $NON-INTEGER-RATIONAL) *UNIVERSAL)
Let us suppose our result variable is &2. The signature segments for CAR are:
($NIL) -> $NIL
(*CONS &1 *UNIVERSAL) -> &1
with ($NIL $NIL), we get *EMPTY, because $NIL will not unify with our *CONS. But in the second
segment, unifying against
19
This isomorphism test is somewhat awkward. A more natural formulation would result from extending the containment
algorithm used in the signature checker to support the full language of type descriptors used in the inference algorithm.
Type Checking for Common Lisp 63
The Implementation of the Inference Algorithm
with (*OR $INTEGER $NON-INTEGER-RATIONAL) having been substituted for our result variable
&2. Thus, we know that our call to CAR can have a result of type
(*OR $INTEGER $NON-INTEGER-RATIONAL). We construct a segment which maps the types of the
variables in our current environment, i.e., the types from the first part of our arg-solution, to our computed
result type. We get one such segment for each segment we match from the called function’s signature (in
this case, there was just one match). We perform this operation for each of the arg-solutions from our
original list. Finally, we tag our collection of segments with the marker for this subexpression,
performing some canonicalization to reduce the number of segments, and add the result to PARTIAL-
SOLUTION, for later reference.
Of course, things are not as simple as in the case just described. There is the question of dealing with
recursive function calls, which we will address later. But there is another significant detail to add to the
account of the non-recursive call. It can be illuminated by a variation on our previous example. Suppose
in our environment, the variable X was a list of integers, i.e.,
(*REC INTEGER-LISTP (*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))
For simplicity, let us say X is the only variable in the environment, and we are examining the call
(CAR X). Our arg solution, from the X subexpression entry in PARTIAL-SOLUTION, is just:
(((*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
i.e, in an environment where X is an integer list, the expression X produces an integer list. The guard for
CAR,
(*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL))
is satisfied. Now, we unify against the segments for CAR. Both match. If we simply followed the
procedure just explained, we would wind up with the segments:
((*REC INTEGER-LISTP (*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
-> $NIL
and
((*REC INTEGER-LISTP (*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
-> $INTEGER
Clearly, this is not optimal, as the real result will be NIL only if X is NIL, and will be an integer only if X
is a non-empty integer list. We would rather have the segments:
($NIL) -> $NIL
((*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))))
-> $INTEGER
64 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
We can think of this as being a kind of feedback from the argument expressions to the type binding
environment for our variables. This feedback does not always occur. The arguments to a function call
may or may not be variables (or variable components). They may just be some expression which is
computed from some other expression, etc., with the variables appearing deeply buried, if at all. The
problem is determining whether and how a match of different segments for the called function reflects a
type distinction in a local variable. We can examine the subexpression arguments and determine some
circumstances when such a split on segments clearly does reflect this distinction. Specifically, we give
special treatment to arguments which are variables or components of variables, accessed by a nest of
applications of CAR and CDR to the variable.
A previously mentioned feature of UNIFY-DESCRIPTORS is the key. Recall, a restrictions list is one of
the forms returned by UNIFY-DESCRIPTORS. If a form within one of the input descriptors is marked
with an annotation which associates it with a type variable, and if that subform is restricted in the course
of unification, then an entry is made in the restrictions list which maps the variable to its restricted form.
For example, if we evaluate:
(UNIFY-DESCRIPTORS ’(*CONS *UNIVERSAL (*SUBST &1 (*OR $INTEGER $NIL)))
’(*CONS $INTEGER $NIL)
T)
where ((&1 . $NIL)) is our restriction list. This signifies that (*OR $INTEGER $NIL), which was
annotated as being associated with &1, was restricted to $NIL.
We exploit this feature when matching segments by annotating, with a *SUBST employing a fresh type
variable, any descriptor denoting the type of a Lisp variable or a variable component. Furthermore, we
maintain a data structure which enables us to locate, within the descriptor for the variable, which
component of the variable was passed in the parameter list. Then, if the unification returns with a
restriction on that variable, when we form the segment mapping our variable types to our result type, we
replace the appropriate component of the variable type descriptor with the restriction imposed from the
unification. This enables us, in the example which yielded imprecise results with the naive
implementation, to formulate exactly the segments we desire.
As currently implemented, this technique only works with nests of applications of CAR and CDR.20 We
can think of these functions as being "destructors". A nice upgrade to the implementation would be to
develop a generalized notion of destructor functions, so that new ones can be added to the class as they are
defined and, in turn, receive the benefit of this hook in the implementation.
20
Actually it works with a collection of functions defined in an extended base of Common Lisp, such as CADR, CDDR, etc., and
FIRST, SECOND, etc. which are defined to be CAR and CDR nests. This collection is passed to SOLVE-EQUATIONS as the
formal parameter UNARY-DESTRUCTOR-FN-MAP.
Type Checking for Common Lisp 65
The Implementation of the Inference Algorithm
and the guard for BAR, which we will assume is complete, is:
((*OR $INTEGER $NIL *NON-INTEGER-RATIONAL)
(*OR $INTEGER $NIL *NON-INTEGER-RATIONAL))
Now we encounter (CAR X). Since X is the only argument, and its entry in PARTIAL-SOLUTION
contains only one segment, we produce only one arg-solution,
(((*REC INTEGER-LISTP (*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))
(*REC INTEGER-LISTP (*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))
((*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))))
We grab the result from this arg-solution and use it to construct the form to match against the segments for
CAR. We use the *SUBST form to annotate the type representing the occurrence of X. We use the
*DLIST descriptor form to signify that a list of descriptors is being given to DUNIFY-DESCRIPTORS.
The arguments to DUNIFY-DESCRIPTORS are:
(*DLIST (*SUBST &3 (*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER
(*RECUR INTEGER-LISTP)))))
&2)
where the latter is from the first segment in the signature of CAR. &2 in the first argument is the variable
representing the result type. UNIFY-DESCRIPTORS returns:
((*DLIST (*SUBST &3 $NIL) $NIL) ((&2 . $NIL)) ((&3 . $NIL)))
66 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
Thus, our result variable &2 has been replaced with a result descriptor, $NIL, and the restriction on &3
tells us that we need to restrict X to $NIL when we construct our segment for this computation.
Unifying against the other segment for CAR goes much the same way, and the segments we finally
produce for (CAR X) are:
((*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
(*REC INTEGER-LISTP (*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
-> $INTEGER
($NIL
(*REC INTEGER-LISTP (*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
-> $NIL)
These also happen to be the segments for (CAR Y). So what are the potential argument patterns we
present to BAR? We take the segments for the two arguments to BAR, (CAR X) and (CAR Y), and
perform our cross-product computation to determine the set of arg-solutions:
((((*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
(*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))))
($INTEGER $INTEGER))
(((*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
$NIL)
($INTEGER $NIL))
(($NIL
(*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))))
($NIL $INTEGER))
Thus, we will present four patterns to match against each of BAR’s segments: 1) two integers, 2) an
integer and NIL, 3) NIL and an integer, and 4) two NILs. In each case, both arguments will be annotated
as being components of their respective variables, X and Y, and we will maintain a data structure which
will enable us to reflect these patterns back into the type-alist entries for X and Y. Each argument pattern
satisfies the guard for BAR, i.e., the guard, when unified with the argument pattern, does not restrict it.
($INTEGER $INTEGER) matches only against the first segment, producing the result type
(*OR $INTEGER *NON-INTEGER-RATIONAL). We map the context from this arg-solution to this
Type Checking for Common Lisp 67
The Implementation of the Inference Algorithm
Moving along to our next arg-solution, ($INTEGER $NIL) matches only the third segment, producing the
result type $NIL, etc. Our final collection of segments for (BAR (CAR X) (CAR Y)) is:
(((*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
(*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))))
-> (*OR $INTEGER $NON-INTEGER-RATIONAL))
(((*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
$NIL)
-> $NIL)
(($NIL
(*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))))
-> $NIL)
Since this is the outermost form of the body, and since this is a non-recursive function, we are done. We
are confident that FEEDBACK has no guard violations, since its guard is complete and the guards for
INTEGER-LISTP, CAR, CDR, and BAR are complete, and we detected no guard violations in either the
guard21 or the body. Its signature is as above.
So how do we deal with recursive functions? Our computation of result types of subexpressions depends
on the existence of a signature for the called function, and for a recursive function call we are only in the
process of computing that signature. Indeed, we have no hint of the signature initially. To deal with this
lack of information, we employ an algorithm which computes an iterative approximation of the signature,
where one iteration consists of making one pass through our table of equations.
To set the context for a more detailed discussion, recall the overview presented earlier:
During the first pass, we have no segments to match against recursive function calls, so we know nothing
about the results of those calls. However, this does not prevent us from producing a first approximation of
our segments on this pass. On the second pass, we can use this approximation to produce a refined
approximation. Eventually, we hope that the approximation we produce will be equivalent to the one we
produced on the previous pass. When this happens, the algorithm has stabilized. We canonicalize our
segments into a suitable form, and we are finished.
Ensuring that we have reached stability is non-trivial, however. The function is recursively composing
21
We did not illustrate how we established the absence of guard violations in the guard for FEEDBACK, but the process is
identical to that for checking it in the function body. If the guard is a negatable type predicate, the checks in TYPE-PREDICATE-P
are sufficient. Otherwise, the regimen for checking general function bodies is applied.
68 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
structures which contribute to the result. If we proceed naively, these structures will just grow with each
iteration, our segments will reflect that growth, and nothing will ever stabilize. Therefore, we employ a
technique for determining when these structures are taking on a form which can be described with a closed
recursive descriptor. With such a closed recursive descriptor, the expansion and folding which occurs
during an iteration of the algorithm will produce the same result we had in the previous pass, thus leading to
stabilization.
We use a new descriptor form, the *FIX descriptor, as the basis for this technique. The form of a *FIX
descriptor is:
(*FIX (*DLIST <desc1> .. <descn>) <fix body form> )
where <fix body form> is an arbitrary descriptor, possibly containing another *FIX descriptor, or more
importantly, yet another new form:
(*FIX-RECUR (*DLIST <desc1> .. <descn>))
embedded within it. The interpretation of a *FIX descriptor is the same as the interpretation of a *REC
descriptor, with (*DLIST <desc1> .. <descn>) playing the role of the <recname> and *FIX-RECUR
playing the role of *RECUR.
We construct a *FIX form as a result descriptor for any recursive function call. The descriptors used to
fill in the (*DLIST <desc1> .. <descn>) are the descriptors for the argument types, one for each actual
parameter. The body is the disjunction (see below) of the result types computed from unifying with all the
segments from our current approximation of the signature. We call the segments of our most recent
approximation the working segments. On our first pass, when we have no working segments, the
<fix body form> will be the descriptor *EMPTY. On later passes, the disjunction may include references
to other *FIX descriptors. This would indicate that the result from a previous recursive call appears in the
result for the current one.
We employ the *DLIST in a *FIX form as we do the label in a *REC descriptor. It represents the point of
recursion in the form, and if we are in a form with multiple *FIXes, we will know which *FIX to reflect
back to. Having *FIX be distinct from *REC allows us to employ some special-purpose heuristics, and
the information in the (*DLIST <desc1> .. <descn>) will be critical to figuring out when to close off the
developing recursive form, as we shall see.
When we construct a new *FIX form, we attempt to make an important canonicalization with respect to
embedded *FIX forms. (See Canonicalization Rule 34 in Section 4.4.2.) If we have an embedded *FIX
form with the same descriptors in its *DLIST component as in the outer *FIX, we judge that we have
discovered a recursion point in our descriptor. An example of such a form is:
(*FIX (*DLIST (*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
(*OR $NIL
(*CONS $NON-T-NIL-SYMBOL
(*FIX (*DLIST
(*REC INTEGER-LISTP
(*OR $NIL
(*CONS $INTEGER
(*RECUR INTEGER-LISTP)))))
*EMPTY))))
Notice that the <fix body form>s do not match in any way. But, the outer <fix body form> is always at
least as good an approximation of the result of a recursive function call as the inner <fix body form>,
when the arguments are typed according to the *DLIST. Thus, the inner <fix body form> is not
Type Checking for Common Lisp 69
The Implementation of the Inference Algorithm
particularly helpful. Moreover, this is an effective point at which to form a closure from our developing
descriptor, so that it does not just grow indefinitely. We replace the embedded *FIX form with:
(*FIX-RECUR (*DLIST (*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))))
Merely finding an embedded *FIX form with a matching *DLIST does not mean we are done. The next
approximation may be better than this one. On the other hand, the next approximation may be the same as
this one, which would indicate we are stabilizing. We continue iterating until the entire collection of
segments for our function body is identical to the previous iteration’s working segments. When it is, we
declare victory, transform our *FIX descriptors to *REC descriptors, perform some canonicalizations, and
emerge with a signature for the recursive function.
A TRUE-LISTP is a CONS structure whose last CDR is NIL, and is characterized by the descriptor:
(*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
We proceed as with a non-recursive function until we reach the recursive call. At this point, our type alist
is
((X . (*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
(Y . &8))
and for Y:
(((*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
&8)
-> &8)
70 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
Since the contexts in these two segments match, we have only one arg-solution, mapping the context to
the argument list
((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
&8)
There are no segments for APPEND with which to match, so our best-guess result type is *EMPTY. Our
segment, then, mapping the types of X and Y to our *FIX form, is:
(((*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
&8) ->
(*FIX (*DLIST (*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
*UNIVERSAL)
*EMPTY))
Notice that we replaced the variable &8 with *UNIVERSAL in the *DLIST. This is because an artifact of
the implementation can cause a variable in one iteration to appear with a different name in a succeeding
iteration. Thus, leaving the variable in the *FIX *DLIST could cause the folding operation we perform at
the end of each iteration through the equations to fail. Converting the variable to *UNIVERSAL
circumvents this problem.
Having constructed this partial solution for our recursive call, we complete the first iteration through the
equations and produce the first pass working segments:
((($NIL &8) -> &8)
(((*CONS &10 (*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
*UNIVERSAL) ->
(*CONS &10
(*FIX
(*DLIST (*REC TRUE-LISTP
(*OR $NIL
(*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
*UNIVERSAL)
*EMPTY))))
The first segment came from the (NULL X) branch of the function, where we return Y. The second
segment is from the other branch, where we CONS the CAR of X, represented by the type variable &10,
onto the result of the recursive call, our *FIX descriptor.
On our next iteration, when we hit the recursive call, and find that our argument pattern matches both
these segments. The body of our *FIX descriptor is therefore an *OR whose first disjunct is the variable
representing Y and whose second is the *CONS with the nested *FIX from the second segment. Thus, we
construct the following *FIX descriptor for the value returned on the recursive call.
(*FIX (*DLIST (*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
*UNIVERSAL)
(*OR &8
(*CONS &10
(*FIX
(*DLIST (*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL
(*RECUR TRUE-LISTP))))
*UNIVERSAL)
*EMPTY))))
Type Checking for Common Lisp 71
The Implementation of the Inference Algorithm
Now we notice that within this *FIX we have another *FIX with the same parameter *DLIST (but a less
precise body). This is our signal to replace the nested *FIX by a *FIX-RECUR, giving us:
(*FIX (*DLIST (*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
*UNIVERSAL)
(*OR &8
(*CONS &10
(*FIX-RECUR
(*DLIST (*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL
(*RECUR TRUE-LISTP))))
*UNIVERSAL)))))
On our next pass, our recursive call initially yields the segment:
(*FIX
(*DLIST (*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
*UNIVERSAL)
(*OR &8
(*CONS
&10
(*FIX
(*DLIST (*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL
(*RECUR TRUE-LISTP))))
*UNIVERSAL)
(*OR &8
(*CONS &10
(*FIX-RECUR
(*DLIST (*REC TRUE-LISTP
(*OR $NIL
(*CONS *UNIVERSAL
(*RECUR TRUE-LISTP))))
*UNIVERSAL))))))))
Again, we find a nested *FIX with the same *DLIST as the outer one, so we replace it with the
*FIX-RECUR form to get:
(*FIX (*DLIST (*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
*UNIVERSAL)
(*OR &8
72 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
(*CONS &10
(*FIX-RECUR
(*DLIST (*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL
(*RECUR TRUE-LISTP))))
*UNIVERSAL)))))
This is the same segment we got on the previous iteration. It should come as no surprise, then, that the
new working segments we get when we complete this iteration are the same as the ones for the previous
iteration. This signals that we have achieved stability.
Now all that remains is to convert our *FIX descriptor into a *REC descriptor. This is a straightforward
process of generating a new rec-name, replacing our *FIX-RECUR with a *RECUR with that rec-name
label, replacing the *FIX label with *REC, and replacing the *DLIST with the rec-name, and (sadly)
replacing any variables which appear in replicating components of the new *REC descriptor with
*UNIVERSAL. This replacement is necessary because a variable replicated in the repeated opening of a
*REC descriptor, if interpreted in the formal semantics, INTERP-SIMPLE, would signify a replicated
value, and this is clearly not what is signified here. But we say the replacement with *UNIVERSAL is
sad because we could preserve some additional information if we replaced the variable instead with a
different kind of variable, one which could be instantiated with another type descriptor. For APPEND (as
we shall see), such a variable would allow the signature for APPEND to be such that if we called
APPEND with objects which were both integer lists, we could deduce that the result was an integer list.
As it is, we will know only that the result terminates in an integer list, but the specific knowledge about
the types of the elements of the first argument will be lost in the result. An excellent enhancement to the
tool would be addition of this new kind of variable to the formal semantic model.
But, regrets aside, SOLVE-EQUATIONS yields the segments of our final signature:
(($NIL &8) -> &8)
Just as with a *REC descriptor, a *FIX descriptor with no *FIX-RECUR is equivalent to its body. This
allows *FIX forms with no recursion to be reduced to simple forms. Thus, with a recursive function like:
This same property also lets us contain the treatment of recursive calls where the arguments are grounded,
non-recursive forms. For example,
(DEFUN FOO (X)
(IF (EQUAL X 3)
3
(IF (NULL X)
NIL
Type Checking for Common Lisp 73
The Implementation of the Inference Algorithm
(IF (ATOM X)
(FOO NIL)
(IF (EQUAL (CDR X) #\a)
(CONS (FOO (CAR X)) (FOO (CDR X)))
(CONS (FOO (CAR X)) (FOO 3)))))))
In this function, the only recursive calls on recursively-typed arguments are the instances of
(FOO (CAR X)), and these are the only ones which engender *REC forms in the result.
One intuitive question which might arise in evaluating an algorithm like this is consideration of where it
loses information. One notable place is the point when, after matching our argument pattern against the
working segments on a recursive function call, we generate a single segment whose result type is the
disjunction of the result types from the matching segments. In the non-recursive case, were we to match
more than one segment, we would produce more than one segment, possibly splitting on the types of the
context variables. For recursive calls, we prefer to leave the context intact and produce a single, merged
result. Although we are losing some specificity, this merging has a very sensible connection to recursive
types. Functions composing a recursive result typically split their actions into some base case, in which
they return a ground term, and a recursive case, in which they compose upon the result of a recursive call.
This structure is typically mirrored in recursive type descriptors, which contain a disjunction where some
disjuncts are non-recursive and others contain embedded *RECUR points. By forming our disjunctive
descriptors when we match multiple segments on a recursive call, we are letting the function structure
guide our construction of recursive descriptors.
An unfortunate aspect of the choice to take the disjunction in this manner is that, for some functions, it is a
very poor strategy. It is most effective for functions which are building recursive structures as they
traverse their input, like APPEND, and for functions which return a result whose type is not highly
dependent on the input type structure, like LEN. For functions, like recognizer functions, which return
simple objects whose type is highly dependent on the input type structure, this method results in analysis
which probes only one level into input parameter types. For instance, consider the TRUE-LISTP
function, modified to return ’OK and 0 instead of T and NIL.
(DEFUN SAD-BUT-TRUE-LISTP (X)
(IF (NULL X) ’OK (IF (CONSP X) (SAD-BUT-TRUE-LISTP (CDR X)) 0)))
74 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
Clearly, this is less satisfactory than the perfect segments we get for TRUE-LISTP itself, the result of its
treatment as a recognizer:
(((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
-> $T)
(((*REC !REC1
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL (*RECUR !REC1)))))
-> $NIL)
This is simply another case where it would be possible to refine the algorithm to get more accurate results.
A nice enhancement would be to make some heuristic choice about whether we would try to focus on
building *FIX forms in the context portion of the segments rather than in the result type. If this were
possible, then we could perhaps produce as strong a signature in the general case as we do for recognizers.
As it stands, given the limits on time spent developing and implementing the algorithm, it seemed a
practical choice to let the general algorithm focus on building *FIX forms in the result type, since a large
portion of the functions for which we might wish to focus on the arguments are, in fact, recognizers, and
we have very effective methods for dealing with them.
4.4.8 INFER-SIGNATURE
INFER-SIGNATURE is the top-level function in the type inference algorithm. Given a function and the
database of signatures, it returns a new signature, consisting of the guard descriptors, a flag saying
whether the guard is complete, a flag saying whether all functions in the call tree of the function have
complete guards, the list of segments for the function, and an item which, if the function is a recognizer, is
the descriptor characterizing the type which the function recognizes, NIL otherwise.
If the function is not a recognizer, INFER-SIGNATURE makes sure the guard does not call the function
recursively, since this would be ill-formed, and then invokes PREPASS on both the guard and the body.
Next it invokes DERIVE-EQUATIONS and SOLVE-EQUATIONS on the guard form to ensure there are
no guard violations in the guard. Either a guard violation or a recursive call in the guard results in failure.
If neither is detected, INFER-SIGNATURE invokes TYPE-PREDICATE-P on the prepassed guard form
to construct the vector of descriptors characterizing the guard. The interpretation of this vector is that if
the guard evaluates to a non-NIL value, the descriptors will characterize the parameters, and if the guard
descriptors are complete, satisfaction of the guard descriptors by the actual parameter descriptors
guarantees satisfaction of the actual guard. Again, there is no formal argument behind this claim, though
the checker will attempt to establish its own version when its time comes.
These descriptors are then used as the initial type-alist for invoking DERIVE-EQUATIONS and SOLVE-
EQUATIONS on the prepassed function body. Unless that algorithm detects a guard violation, it will
Type Checking for Common Lisp 75
The Implementation of the Inference Algorithm
produce the segments which will then be stored in the signature, along with the guard descriptors, the flag
(computed by TYPE-PREDICATE-P) saying whether the guard was complete, the flag easily computed
by checking the guard-complete flags for all functions called in the body, and the NIL value for the
recognizerp flag.
76 Type Checking for Common Lisp
The Implementation of the Inference Algorithm
Type Checking for Common Lisp 77
The Formal Semantic Model
Chapter 5
THE FORMAL SEMANTIC MODEL
This chapter gives a bottom-up construction of the formal basis for the specification and proof. We start
by defining the type descriptor language which will be formalized, and by giving well-formedness
predicates for type descriptors. Then we present the well-formedness predicates for signatures. Next, we
provide a simple Lisp evaluator, E, which is used in the specification and proof to provide the value of a
Lisp form in a given binding environment. Then, we give a formal semantics for type descriptors and
finally a full statement of the formal semantics of function signatures. The presentation is designed to
reveal concepts distinctly and to motivate an understanding of the choices contributing to the chosen
model. At the end of the chapter, there are pointers to discussions of several of the very different semantic
models which were considered before settling on this one.
Before embarking on the formal treatment, we will attempt to characterize the style of what is to follow
and to provide some motivation for the choices which led to this style.
An applicative subset of Common Lisp is the language in which the formal semantics is cast. It is also the
language used in the implementation of the algorithm, and it is therefore a comfortable language in which
to cast functions of the algorithm for formal analysis.
A carefully chosen applicative subset of Common Lisp could be used as a completely formal logic [Boyer
90]. The evaluator E, which we will introduce later in this chapter and which provides a semantics for the
Lisp subset supported by the type system, is one possible semantic basis for using Common Lisp in this
manner.
But the proof we will present is not a formal proof. Our use of Common Lisp as the vernacular for the
formal analysis is not completely formal. For example, non-total functions are used in the discussion.
Most of the functions introduced in this chapter which are a part of the formal semantics are stated with
guards, in the same style used in the supported Lisp subset, so that the statement of their proper domain is
formal. But later on when we introduce functions representing components of the checker algorithm, we
do not provide guards. Rather, we accompany the definitions with textual accounts of what is assumed
about the function arguments. Moreover, two functions introduced in the formal semantics as SUBRs are
defined in English rather than formally. These are WELL-FORMED-WORLD, which is a guard on one
of the arguments to E, and E-SUBRP, which is the subsidiary of E which evaluates functions which are in
the initial library of primitive functions. Another informal aspect of the analysis is that some of the
functions representing components of the algorithm do not terminate on all arguments. This is pointed out
where appropriate, but the ramifications of having non-terminating functions in a formal setting are not
78 Type Checking for Common Lisp
The Formal Semantic Model
elaborated, beyond claiming that we have only partial correctness proofs. Finally, we have not been
explicit about the machinations of Lisp as a logic supporting the type system, as we have with the
machinations of E as an interpreter for the language supported by the type system. In all these senses, the
reader is expected to fill in the gaps, which is certainly not characteristic of formal proof.
Yet we go on to conduct the proofs with great rigor, exploring cases exhaustively and writing our
accounts in extensive detail, in a manner which is more consistent with styles of formal proof. Though we
have tried to emphasize important points textually, the reader still risks having his view of the proof
obscured by detail. So by not treating Common Lisp as a formal logic, but by presenting our proof in
rigorous detail, we seem to be sacrificing both the expressive ease of informal proof and the absolute
certainty of formal proof. What motivated this choice of style?
Common Lisp linguistically unifies the discourse of the entire problem. As a powerfully expressive
programming language, it was well-suited to developing the prototype implementation, which was an
invaluable exploratory tool in developing the algorithms. And as previously stated, Common Lisp can be
used as a formal logic, and it is uniquely suited to seamless discourse about Common Lisp functions. The
researcher’s personal comfort with Lisp played no small factor; the type inference problem itself was quite
enough with which to grapple.
When we embarked on the proof, we were genuinely uncertain of the correctness of the algorithm. Since
the checker is a computational algorithm with a great deal of case analysis and sometimes tricky detail,
there was reason to suspect errors, though we had no particular areas of suspicion. Only by extensive, if
not exhaustive, exploration of the details could we raise our level of assurance of the soundness of the
algorithm. This level of concern was borne out in several instances. In one case, a significant error in the
checker’s unification algorithm was overlooked in an apparently plausible proof sketch laid out at the
level of a typical informal proof. Only by later fleshing out the proof with rigorous detail did we uncover
the error. In another case involving the containment algorithm, an early draft of the proof addressed a
particular detail insufficiently to satisfy one reader. Had the proof not already been expressed in rigorous
detail, arguably the point in question might not have become apparent to the reader. Though in the end no
error was found in the algorithm, further exploration of the problem revealed questions which merited
careful consideration. In short, regardless of the level of formality, exhaustive rigor was necessary to give
us the level of confidence in the algorithm which we sought. And since this thesis was to represent a
complete account of the work rather than a summary, we felt an obligation to present the work performed
in essentially the style with which it was conducted.
While adopting this level of rigor, why did we not take the next step to formality? The humble graduate
student feels himself to be extremely fallible. Given the nature of the system being proved, we arguably
believe that the only sure path to a formal proof would be with mechanized assistance. Without it, the
proof would suffer from the tedium of formality compounding the tedium of rigor. With the human
tendency in the face of mountains of detail to overlook cases, and more importantly, to lose track of the
central ideas being questioned, we feared that attempting what would amount to a human-generated
mechanical proof would border on folly. Yet to have embarked on a mechanically assisted proof, perhaps
with Nqthm [Boyer & Moore 88], seemed no less a folly. One of the precepts of mechanical proof as it is
practiced with current technology is that the human needs to have the conceptual proof firmly in mind
before seeking mechanized assurance. As previously stated, this was clearly not the case. And we could
see that the level of effort necessary to reach this point and then execute a mechanical proof would well
exceed the time available.
In essence, the level of assurance we received by using this style of proof was greater than we could have
achieved with a high level, informal proof, and the commitment of effort was significantly more modest
Type Checking for Common Lisp 79
The Formal Semantic Model
Moreover, it is far from clear that this was the appropriate point in this research effort to conduct a
mechanical proof. The system as it stands occupies one stage in its evolutionary development. We knew
at the time we embarked on the proof that there were many ways we might wish to later enhance the
system. Our goal was to establish a solid proof of concept which both established the validity of the type
system as currently developed and which laid the groundwork for later formal proof. From this basis we
could decide upon the next commitment, choosing from options like making further enhancements to the
system in support of its current scope, enlarging its scope to include other language constructs, and
re-implementing the system in the context of a more general proof environment. In any case, we thought
it prudent to hold off the major commitment of performing a mechanized formal proof.
Nevertheless, one of the ultimate goals of this effort, which has not yet been achieved, is a formal,
mechanically checked proof of the soundness of this system, possibly with Common Lisp as the formal
logic. As we sought, we have built a significant bridge to this end. Beyond a high-level, informal proof
of the system, we have produced an extensive, very detailed, albeit informal sketch of a formal proof.
With it, we can better evaluate the feasibility of performing a formal, mechanized proof, and the task of
ultimately constructing such a proof on this basis will be much less daunting than upon the basis of a
high-level, informal proof.
The following is the grammar for type descriptors with which we will be concerned in our semantic model
and in the checker.22
<descriptor> ::= <simple descriptor> | <variable> | *EMPTY |
*UNIVERSAL | (*CONS <descriptor> <descriptor> ) |
(*OR <descriptor>* ) | <rec descriptor>
Furthermore, <recur descriptor> is constrained so that when it takes the form (*RECUR <rec name>), the
<rec name> must be identical to that associated with the immediately containing *REC form. The
grammar for <recur descriptor> is just the grammar for <descriptor> extended with the *RECUR form.
We sometimes call the <recur descriptor> inside a <rec descriptor> the body of the <rec descriptor>.
Type variables may only appear within those portions of a *REC descriptor which are not replicated
22
Here we do not need to deal with the transient forms generated by the inference tool and relieved internally, i.e., <and
descriptor>, <not descriptor>, <subst descriptor>, *MERGE-FIX-POINT, <fix descriptor>, <fix-recur descriptor>, *ISO-RECUR,
*FREE-TYPE-VAR, and **RECUR-MARKER**. The inference tool descriptor grammar is in Section 4.3.
80 Type Checking for Common Lisp
The Formal Semantic Model
within any disjunct when the descriptor is opened. (See the definition of OPEN-REC-DESCRIPTOR-
ABSOLUTE in Section 5.5.) The following definition will help explain the well-formedness requirement
below.
Definition: Consider a normal form for a *REC descriptor in which
the body is an *OR descriptor, and no *OR occurs within any of
its disjuncts unless it is part of a nested *REC form. I.e.,
all disjunction in the body of the *REC is raised to the top.23
A replicating component, or replicating disjunct of a *REC
descriptor is any disjunct of this top-level *OR which
contains a *RECUR for the *REC. A terminating component
or terminating disjunct is any disjunct which does not
contain a *RECUR for the *REC.
This descriptor is already in *OR-lifted form. $NIL is a terminating disjunct, since it contains no
(*RECUR FOO) form. (*CONS &2 (*RECUR FOO)) is a replicating disjunct. Now consider
(*REC BAR (*OR &1 (*CONS (*OR $INTEGER $T) (*RECUR BAR))))
&1 is a terminating disjunct, and the two *CONS forms are replicating disjuncts.
A well-formedness requirement for descriptors must hold for any descriptors considered formally or
manipulated by the checker. The top-level function defining the well-formedness predicate is WELL-
FORMED-DESCRIPTOR.
Aside from conformance with the grammar above, the following properties must hold:
1. All *REC descriptors with the same name must be identical.
2. Any *RECUR form must have the same label as the nearest enclosing *REC.
3. Any *RECUR form must be nested within both an *OR and a *CONS within the *REC
body. (A *RECUR within an *OR but not within a *CONS is redundant. A *RECUR
within a *CONS but not within an *OR would represent an infinite structure. A *RECUR
within neither an *OR or a *CONS would be vacuous.)
4. A type variable may not appear within a replicating component of a *REC.
As we shall see when we define the semantics for descriptors, little of practical value is lost by imposing
this final requirement.
23
The lifting of *OR in the body is accomplished by repeated application of the reverse of two canonicalization rules:
Rule 8: (*OR (*CONS d1 d2) (*CONS d3 d2)) => (*CONS (*OR d1 d3) d2)
Rule 9: (*OR (*CONS d1 d2) (*CONS d1 d3)) => (*CONS d1 (*OR d2 d3))
Rules 8 and 9 can be applied in reverse because their proofs show their two sides to be equivalent. These rules are given in Section
6.7, and their proofs are presented in Appendix B.6.
Type Checking for Common Lisp 81
The Formal Semantic Model
The function WELL-FORMED-DESCRIPTOR and its subsidiaries capture these notions of well-
definedness. They are given in Appendix G.1.
A naive notion of the semantics of descriptors is that a descriptor defines a set of Lisp values, and a
particular value satisfies the descriptor if it is a member of the set. But a descriptor containing type
variables cannot define a set in the absence of some interpretation for variables. The notion of variables
embedded in our formal semantics is that a type variable represents an arbitrary type containing a single
value. Then the set of values we seek is defined by both the descriptor in question plus a binding list
which maps each variable in the descriptor to some Lisp value. We will explore this more fully as we
develop the semantic model.
Functions which check the well-formedness of a type signature returned by the inference tool are given in
Appendix G.2. The top level function is WELL-FORMED-SIGNATURE.
The state consists of the signatures of a set of function symbols, each associated with a definition. We
will frequently refer to the state by the name FS in the exposition and proof of soundness. In the
implementation code it goes by the name FUNCTION-SIGNATURES. The state is well formed if each
of its signatures is well formed. The function WELL-FORMED-STATE-ASSUMING-ARITY, given in
Appendix G.3, checks the state, mapping over it with WELL-FORMED-SIGNATURE.
Occasionally in the specification and proof, we will call upon an evaluator to provide the value of a Lisp
form in some environment. Here we define that evaluator, which provides a simple semantics for the
evaluation of Lisp expressions. Let the environment be:
ENV: A list of pairs, where the CAR of each pair is a variable
name and the CDR is a value in our Lisp universe. This list
represents the bindings of variables to values in the environment.
Our interpreter is a function E, defined below. E takes a FORM which is a well-formed Lisp form whose
variables are bound in ENV, an environment ENV (as above), a WORLD (as above), and a non-negative
integer value CLOCK. It can return either a value in the data space of our Lisp subset or one of two
values not in that data space: one signifies a break due to the clock being exhausted, while the other
signifies a break due to a guard not being satisfied. We present a statement of the proper domain of E
with a guard form in the same style as that of the Lisp dialect defined by E. The functions referenced in
the guard are presented in Appendix G.4. For comment on this style, see the discussion in Section 5.1.
(DEFUN E (FORM ENV WORLD CLOCK)
(DECLARE (XARGS :GUARD (AND (WELL-FORMED-ENV ENV)
(WELL-FORMED-WORLD WORLD)
(WELL-FORMED-CLOCK CLOCK)
(WELL-FORMED-FORM FORM ENV WORLD))))
(IF (< CLOCK 1)
(BREAK-OUT-OF-TIME)
(IF (AND (CONSP FORM) (ASSOC (CAR FORM) WORLD))
;; Function call
(LET ((ACTUALS (MAP-E (CDR FORM) ENV WORLD CLOCK)))
(IF (BREAKP ACTUALS)
ACTUALS
(LET ((FNCALL-ENV
(PAIRLIS (CADR (ASSOC (CAR FORM) WORLD))
ACTUALS)))
;; Evaluate the guard
(LET ((GUARD-VAL (E (CADDR (ASSOC (CAR FORM) WORLD))
FNCALL-ENV
WORLD
CLOCK)))
(IF (BREAKP GUARD-VAL)
GUARD-VAL
(IF (EQUAL GUARD-VAL NIL)
(BREAK-GUARD-VIOLATION)
(IF (SUBRP (CAR FORM))
(E-SUBRP (CAR FORM) ACTUALS WORLD)
(E (CADDDR (ASSOC (CAR FORM) WORLD))
FNCALL-ENV
WORLD
(- CLOCK 1)))))))))
(IF (AND (CONSP FORM) (EQL (CAR FORM) ’IF))
(LET ((TEST-VAL (E (CADR FORM) ENV WORLD CLOCK)))
(IF (BREAKP TEST-VAL)
TEST-VAL
(IF (EQUAL TEST-VAL NIL)
(E (CADDDR FORM) ENV WORLD CLOCK)
(E (CADDR FORM) ENV WORLD CLOCK))))
(IF (LITERALP FORM)
FORM
(IF (AND (CONSP FORM) (EQUAL (CAR FORM) ’QUOTE))
(CADR FORM)
;; FORM is a variable
(CDR (ASSOC FORM ENV))))))))
REST-VALS
(CONS NEXT-VAL REST-VALS)))))))
(DEFUN BREAK-OUT-OF-TIME ()
;; The floating point number .1 represents an out-of-time break.
.1)
(DEFUN BREAK-GUARD-VIOLATION ()
;; The floating point number .2 represents a guard violation.
.2)
E is a total function. This claim rests on the fact that a termination measure on E goes down on all
recursive calls. This termination measure is defined by a three-tuple composed of CLOCK, the size of the
world necessary to evaluate a term, and the size of the term (the length of its flattened structure). These
are given in order of decreasing significance. The count goes down whenever the clock goes down (i.e.,
on entry to the body of a function). The second count element, the world size, is of use only when we
evaluate the guard in conjunction with a function call. In this case, CLOCK does not go down, and the
size of the guard term may be larger than the size of the function call term. But since the guard form
never calls the function in which it appears, we may remove this function from the world for purposes of
evaluating the guard. Hence this count goes down on the recursive call to evaluate a guard. For all other
recursive calls, the size of the term goes down.
A trivial observation is that, given two calls to E which are identical except for the value of clock, if
neither call returns (break-out-of-time), both calls will return the same value.
84 Type Checking for Common Lisp
The Formal Semantic Model
Here, we will expose the notions in the semantic model in two phases, first providing a model for
variable-free descriptors, and then extending the model to encompass type variables.
As mentioned earlier, a naive notion of the semantics of descriptors is that a descriptor defines a set of
Lisp values, and that a value satisfies a descriptor if it is a member of the associated set. This notion can
be easily cast as a predicate SATISFIES-1, which determines whether a value is in the set defined by a
variable-free descriptor. (SATISFIES-1 TD V) returns T if the value V is a member of the set defined by
the descriptor TD, and NIL otherwise. We will later define a more general interpreter function which
handles variables, but for now, we factor the problem to illustrate the significance of variables.
(DEFUN SATISFIES-1 (DESCRIPTOR VALUE)
(CASE DESCRIPTOR
(*EMPTY NIL)
(*UNIVERSAL T)
($NIL (NULL VALUE))
($T (EQUAL VALUE T))
($NON-T-NIL-SYMBOL
(AND (SYMBOLP VALUE) (NOT (NULL VALUE)) (NOT (EQUAL VALUE T))))
($INTEGER (INTEGERP VALUE))
($NON-INTEGER-RATIONAL
(AND (RATIONALP VALUE) (NOT (INTEGERP VALUE))))
($STRING (STRINGP VALUE))
($CHARACTER (CHARACTERP VALUE))
(OTHERWISE
(CASE (CAR DESCRIPTOR)
(*CONS (AND (CONSP VALUE)
(SATISFIES-1 (CADR DESCRIPTOR) (CAR VALUE))
(SATISFIES-1 (CADDR DESCRIPTOR) (CDR VALUE))))
(*OR (SATISFIES-OR (CDR DESCRIPTOR) VALUE))
;; OPEN-REC-DESCRIPTOR-ABSOLUTE opens a *REC descriptor.
;; For example,
;; (OPEN-REC-DESCRIPTOR-ABSOLUTE
;; ’(*REC TL (*OR $NIL (*CONS *UNIVERSAL (*RECUR TL))))
;; = (*OR $NIL (*CONS *UNIVERSAL
;; (*REC TL (*OR $NIL (*CONS *UNIVERSAL
;; (*RECUR TL))))))
(*REC
(SATISFIES-1 (OPEN-REC-DESCRIPTOR-ABSOLUTE DESCRIPTOR) VALUE))
(OTHERWISE NIL))))))
SATISFIES handles lists of descriptors and values, which will come into play in our semantics for
function segments. SATISFIES-1 handles a single descriptor and value. Thus,
(SATISFIES-1
(*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
(2 4 "FOO"))
= T
Type Checking for Common Lisp 85
The Formal Semantic Model
(SATISFIES-1
(*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
(A . B))
= NIL
But variables play an important role in the semantics of descriptors and function signatures. The notion of
variables seems well-suited to describing, for instance, that the type of the result of the CAR function,
applied to a CONS object, is the same as the type of the first value in the ordered pair defined by the
CONS. The interpretation of multiple occurrences of a variable as type identity, rather than value identity,
threatened to seriously complicate the formal semantics of signatures. (See Appendix E for the discussion
of a semantic model considered but abandoned.) We were unable to provide a model of type variables
which, in the presence of disjunction, would provide the kind of specificity we required for handling
direct data transfer through a signature (as with CAR and CONS) and yet provide an instantiation
capability which would support the appearance of a variable in a replicating component of a *REC. In the
latter case, we certainly did not want to abide by the constraint that every value associated with the type
variable had to be EQUAL. We chose for the time being to abandon the use of type variables in
replicating components and to adopt a model that multiple occurrences of a type variable represent
multiple occurrences of a single data value. A type variable, then, may have only a single-point
instantiation.
At first glance, it would seem that a variable appearing in a *REC descriptor could be replicated on each
recursion, thus requiring under these semantics that each occurrence represent the same value. This does
not seem useful in most circumstances. We might like a variable in a *REC descriptor to represent
multiple occurrences of some particular type, to be instantiated later. But in our implementation, we have
restricted *REC descriptors so that variables may appear only in non-replicating disjuncts, so replication
on unfolding does not occur. The notion of a variable which can be instantiated with a type is not present,
though we discuss elsewhere (See Section 8.3) that one potentially valuable extension to the tool would be
a new class of variables specifically for this purpose in *REC descriptors.
We can use our SATISFIES-1 function as part of a semantics for descriptors and function segments
including variables. Doing so may further illuminate the intuition behind our type variables. Consider the
type descriptor (*CONS &1 &1). We can factor the satisfaction problem for descriptors with variables
into two parts. For a value X to satisfy (*CONS &1 &1), we would require
(SATISFIES-1 (*CONS *UNIVERSAL *UNIVERSAL) X)
But we would also require the CAR and CDR of X to be equal. So the expanded predicate for satisfying
(*CONS &1 &1) is:
(AND (CONSP X)
(SATISFIES-1 *UNIVERSAL (CAR X))
(SATISFIES-1 *UNIVERSAL (CDR X))
(EQUAL (CAR X) (CDR X))
Thus, to treat descriptors containing variables, we transform them all to *UNIVERSAL for submission to
the SATISFIES-1 predicate, but then we add the requirement that all data objects whose types are
represented by the same type variable within a descriptor or segment must be equal.
But proceeding in this way can become unwieldy. Consider the APPEND function.
(DEFUN APPEND (X Y)
(DECLARE (XARGS :GUARD (TRUE-LISTP X)))
(IF (NULL X)
Y
86 Type Checking for Common Lisp
The Formal Semantic Model
Segments:
($NIL &2) -> &2
Using SATISFIES, the interpretation of these segments is as follows. First, if the values x and y satisfy
the real guard of the function, then they also satisfy the guard descriptors.
for all Lisp values X and Y,
(TRUE-LISTP X)
->
( (SATISFIES (*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
X)
and
(SATISFIES *UNIVERSAL Y) )
Second, if the values x and y satisfy the real guard of the function, then one of the segments is appropriate.
The (SATISFIES *UNIVERSAL ..) forms are unnecessary, but reflect the earlier discussion that a
variable is transformed to *UNIVERSAL with the additional equality tests added. (Here we use
"(APPEND X Y)" to denote the value returned by APPEND for the values X and Y.)
for all Lisp values X and Y,
(TRUE-LISTP X)
->
( ((SATISFIES $NIL X) and (SATISFIES *UNIVERSAL Y) and
(SATISFIES *UNIVERSAL (APPEND X Y)) and (EQUAL Y (APPEND X Y)))
or
((SATISFIES
(*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
X)
and
(SATISFIES *UNIVERSAL Y)
and
(SATISFIES
(*CONS *UNIVERSAL
(*REC !REC4
(*OR *UNIVERSAL (*CONS *UNIVERSAL (*RECUR !REC4)))))
(APPEND X Y))
and
(EQUAL (CAR X) (CAR (APPEND X Y)))
and
(EQUAL Y <<some CDR of (APPEND X Y)>>)) )
The notion of <<some CDR of (APPEND X Y)>> obviously needs better treatment. To deal with it, we
employ a recursively defined interpreter which can handle both the type predicate issues and the equality
issues in tandem. This interpreter, defined by the functions INTERP-SIMPLE and INTERP-SIMPLE-1,
is the one upon which we base the formal semantics for our system. Given a well-formed descriptor, a
value, and a binding mapping type variables to Lisp values, INTERP-SIMPLE-1 returns T if the value
Type Checking for Common Lisp 87
The Formal Semantic Model
satisfies the descriptor under the bindings, and NIL otherwise. INTERP-SIMPLE takes a list of
descriptors, a list (of the same length) of values, and a binding. It applies INTERP-SIMPLE-1 to
corresponding descriptor-value pairs, all under the same binding. All must return T in order for INTERP-
SIMPLE to return T.
Essentially, the rules embodied in the interpreter are as follows. Any value satisfies *UNIVERSAL, and
no value satisfies *EMPTY. Any value of the appropriate primitive type satisfies a primitive descriptor.
A value satisfies a type variable if the variable is bound to the identical value in the bindings. A value
satisfies a *CONS descriptor if it is a CONS whose CAR satisfies the first argument to the *CONS under
the bindings and whose CDR satisfies the second argument. A value satisfies an *OR if it satisfies one of
the disjuncts under the bindings. A value satisfies a *REC if it satisfies the body of the *REC with the
*REC substituted for the *RECUR form. This allows the interpreter to descend as deeply into the value’s
structure as necessary, opening up the *REC descriptor as it goes. All the preceding requirements are
checked by INTERP-SIMPLE-1. INTERP-SIMPLE is employed to check that lists of values satisfy lists
of descriptors under some bindings, assuming the lists are of the same length. It could have been
integrated with INTERP-SIMPLE-1, but since lists of descriptors never appear within a descriptor,
removing it from INTERP-SIMPLE-1 simplifies case analysis in some proofs.
The functions mentioned in the guard are defined just below. As previously discussed in Section 5.1, we
give the guards on our semantic interpreter functions in the same style we implement with our E function,
so as to specify their proper domain. The definitions of the functions called in the guard appear in
Appendix G.5.
(DEFUN INTERP-SIMPLE (DESCRIPTORS VALUES BINDINGS)
;; Where X, Y, and (APPEND X Y) denote values (We use (APPEND X Y)
;; to suggest a typical application.)
;; (INTERP-SIMPLE ’($NIL &2 &2)
;; ’(X Y (APPEND X Y))
;; ’((&2 . Y)))
;; performs the following test:
;; (AND (NULL X)
;; T
;; (EQUAL Y Y)
;; T
;; (EQUAL (APPEND X Y) Y)
(DECLARE
(XARGS :GUARD (AND (ALL-WELL-FORMED-DESCRIPTORS DESCRIPTORS)
(ALL-WELL-FORMED-VALUES VALUES)
(WELL-FORMED-BINDINGS BINDINGS DESCRIPTORS))))
(IF (NULL DESCRIPTORS)
T
(AND (INTERP-SIMPLE-1 (CAR DESCRIPTORS) (CAR VALUES) BINDINGS)
(INTERP-SIMPLE (CDR DESCRIPTORS) (CDR VALUES) BINDINGS))))
(OTHERWISE
(IF (VARIABLE-NAMEP DESCRIPTOR)
(AND (ASSOC DESCRIPTOR BINDINGS)
(EQUAL VALUE (CDR (ASSOC DESCRIPTOR BINDINGS))))
(CASE (CAR DESCRIPTOR)
(*CONS
(IF (CONSP VALUE)
(AND (INTERP-SIMPLE-1 (CADR DESCRIPTOR)
(CAR VALUE)
BINDINGS)
(INTERP-SIMPLE-1 (CADDR DESCRIPTOR)
(CDR VALUE)
BINDINGS))
NIL))
(*OR (INTERP-SIMPLE-OR (CDR DESCRIPTOR) VALUE BINDINGS))
;; OPEN-REC-DESCRIPTOR-ABSOLUTE opens a *REC descriptor.
;; For example,
;; (OPEN-REC-DESCRIPTOR-ABSOLUTE
;; ’(*REC TL (*OR $NIL (*CONS *UNIVERSAL (*RECUR TL))))
;; = (*OR $NIL
;; (*CONS *UNIVERSAL
;; (*REC TL (*OR $NIL (*CONS *UNIVERSAL
;; (*RECUR TL))))))
(*REC (INTERP-SIMPLE-1
(OPEN-REC-DESCRIPTOR-ABSOLUTE DESCRIPTOR)
VALUE
BINDINGS))
(OTHERWISE NIL))))))
The (VARIABLE-NAMEP DESCRIPTOR) case, which was not present in our SATISFIES-1 function,
establishes the relationship between the variable, the value, and the binding. Note that when no variables
appear in (TD1i..TDni),
(SATISFIES (TD1i..TDni) (V1..Vn))
is equivalent to
(INTERP-SIMPLE (TD1i..TDni) (V1..Vn) NIL)
This is the case with guard descriptors and with the segments of recognizer functions, which are always
variable-free.
Before giving the formal semantics for a signature, let us introduce a notational convention which will
help keep the exposition clean.
Definition: fooworld,clock
(E (foo a1 .. an)
((a1 . arg1) .. (an . argn))
world
clock)
The formal semantics for a signature is as follows. Here, as in all our other lemmas and proofs, "I" is a
shorthand notation representing INTERP-SIMPLE when its second and third arguments are lists,
INTERP-SIMPLE-1 when these arguments are a single descriptor and value.
Definition: GOOD-SIGNATUREP (FNDEF GUARD SEGMENTS WORLD CLOCK)
That is, if the guard of foo is satisfied by the values of the actual parameters arg1 .. argn and there is
adequate clock to completely evaluate the function call, then the result of the evaluation will not be a
guard-violation break, and there is some segment for which, under some binding of all its type variables to
Lisp values, the actual parameter values satisfy the formal parameter descriptors under the binding and the
result of evaluating the function foo on those actual values satisfies the result descriptor under the same
binding.
Thus, the signature of a function FOO in FS can be soundly utilized only if the guard descriptors of all the
functions called in FOO, including FOO itself, are complete. The guard descriptors will be complete if
the guard is a conjunction of recognizer calls on disjoint formal parameters.
When a new function is presented to the type inference system, the system’s essential task is to formulate
a signature for which the VALID-FS predicate holds.
In Section 5.7.1, we discuss an alternate semantic model we considered which utilized signatures of the
same form but provided a different interpretation, in which all, rather than some, segments whose left
hand side was satisfied by the actual parameters yielded a correct type for the result.
For an example of our semantics, consider the CDDR function, defined as follows.
(DEFUN CDDR (X)
(DECLARE (XARGS :GUARD (CDDR-ABLE X)))
(CDR (CDR X)))
where
Signature segments:
((*OR $NIL (*CONS *UNIVERSAL $NIL))) -> $NIL
((*CONS *UNIVERSAL (*CONS *UNIVERSAL &1))) -> &1
Given a value X satisfying the guard, and enough clock to complete the evaluation, either X satisfies
(*OR $NIL (*CONS *UNIVERSAL $NIL))
and there is some value which is both the CDR of the CDR of X and the value of (CDDR X).
24
A simple example will help motivate a presentation of the formal significance of the other information
computed and stored by the inference tool. Consider a function:
(DEFUN FOO (X)
(DECLARE (XARGS :GUARD (CONSP X)))
(IF (CONSP (CAR X))
(CONS "CAR is a CONS" NIL)
(CONS 0 (CDR X))))
Given this function, the inference tool stores the following information.
Function: FOO
Guard computed by the tool:
((*CONS *UNIVERSAL *UNIVERSAL))
Guard complete: T
All called functions complete: T
TC Guard:
((*CONS *UNIVERSAL *UNIVERSAL))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*CONS (*CONS *UNIVERSAL *UNIVERSAL) *UNIVERSAL))
-> (*CONS $STRING $NIL))
(((*CONS (*OR $CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T)
(*FREE-TYPE-VAR 1.)))
-> (*CONS $INTEGER (*FREE-TYPE-VAR 1.)))
TC segments contained in Segments: T
Recognizer descriptor:
NIL
TC validates recognizer: NIL
Signature is certified sound: T
We have already discussed the guard descriptors and segments. Let us discuss the formal aspects of the
others in sequence.
We have said informally that a guard descriptor is complete if any value satisfying the guard descriptor
will satisfy the real guard. This is the sense of the "Guard complete" flag, but this particular flag
represents a statement made by the untrusted inference tool, so it carries no formal weight and must be
viewed as no more than an interesting conjecture. Similarly, the "All called functions complete" flag
represents the inference algorithm’s statement that all the functions called in a function body have
complete guard descriptors, and this carries no formal value.
By contrast, the "TC Guard complete" flag means that the guard complies with the following definition:
Definition: The guard descriptors for a function are complete
if the guard in the function definition is a conjunction of calls
to recognizer functions (see below) on distinct formal parameters.
Note: When a guard is of this form, then its descriptor will be
such that for any actual parameter values, the guard will
evaluate to a non-NIL Lisp value (given sufficient clock)
if and only if the values satisfy the guard descriptor
under interpretation by INTERP-SIMPLE. (Any binding will
do for this interpretation, since the guard descriptor is
variable-free.) This property will be established by the
proofs of lemmas GUARD-COMPLETE and TC-INFER-SIGNATURE-GUARD-OK
in Chapter 7.
92 Type Checking for Common Lisp
The Formal Semantic Model
"TC All called functions complete" means that all functions called in either the guard or function body,
and all the functions called in those functions, and so forth throughout their hereditary call trees have this
same property. These simple claims have been validated by the trusted algorithm, and by the reasoning
embodied in Lemma GUARD-COMPLETE, presented originally in Section 3.6.1, define the conditions
under which we know that the checker’s failure to detect a guard violation guarantees that the actual guard
will evaluate to a non-NIL value, specifically T.
The "TC Guard Replaced by Tool Guard" flag indicates that the checker opted to replace the guard it
computed itself with the guard originally computed by the inference tool. This is only done when three
tests are met. First, the guards must be different. (Else, why bother?) Second, the containment algorithm
must determine that the checker guard is contained in the inference tool guard. Third, the inference tool
guard must be contained in the checker guard. This mutual containment guarantees that the two guard
forms are equivalent. The reason for preferring the inference tool guard is a presumption that it is in a
nicer canonical form. This replacement rarely occurs in practice, but on rare occasions it allows for a
more compact representation of the guard.
The "TC segments contained in Segments" flag indicates the finding by TC-SIGNATURE that under
some binding, each of the checker segments is contained in some inference tool segment. This is a critical
hypothesis in Lemma TC-SIGNATURE-OK, proved in Section 7.3.3.
Definition: We say a function is a recognizer if its definition
possesses all the following properties:
1. It has exactly one formal parameter.
2. Its guard is T, either explicitly or by omission, and therefore its guard
descriptor is (*UNIVERSAL).
3. It returns only T or NIL, and there exists for it a correct signature
containing exactly two variable-free segments, one with a result type of $T
and the other with a result type of $NIL, such that the argument descriptors
of the segments characterize disjoint sets.25
4. The guards of all functions hereditarily in its call tree are complete.
The "Recognizer descriptor" item, when non-NIL, indicates that the inference algorithm considers this
function to be a recognizer, and the value stored here is a descriptor characterizing the values for which
the function returns T. As it is another judgement by the inference algorithm, it has no formal
significance. However, the "TC validates recognizer" flag, stored by the checker, indicates that the
checker agrees the function is a recognizer.
Finally, the "Signature is certified sound" flag indicates that the signature can be trusted, i.e., the guard is
complete, the segments have passed the containment test, if the function has a recognizer descriptor, the
checker validated its status as a recognizer, and every function in the hereditary call tree of this one also
meets these same tests. This flag is never used except in computing the value of the flag for subsequently
submitted functions, and it plays no role in the formal analysis. It is simply provided as a convenience for
reading output, as its value summarizes all checks a reader would need to make in order to see that a
signature has been certified by the tool.
25
When the system admits a function as a recognizer, it establishes this disjointness by computation, unifying the argument
descriptors and checking that the result is *EMPTY. The formal justification that this test is adequate is simple and is illustrated in
the proof of Lemma RECOGNIZER-SEGMENTS-COMPLETE in Appendix B.2.
Type Checking for Common Lisp 93
The Formal Semantic Model
A number of semantic models were explored before arriving at the one finally adopted. Perhaps more
than any other single phenomenon, the difficulty of finding a model which suited the nature of the
problem gives evidence to the general difficulty of the problem itself, its statement, its solution, and its
proof of soundness. In this section we present several of the alternatives which received significant
thought. Perhaps by doing so we can shed further light on the problem itself. Comparison and contrast
may lead to a better understanding of the solution presented in the rest of the thesis.
For the sake of discussion here, I call the semantic model we finally adopted "OR semantics", the intuition
being that for at least one, but not necessarily all, of the segments for which the actual parameters satisfy
the left hand side (or parameter descriptors), the function result will satisfy the right hand side. This
differs from "AND semantics", which would claim that for any segment whose left hand side is satisfied
by the actuals, the right hand side would be satisfied by the result. AND semantics is the interpretation
normally used in conjunction with the "->" in type notation.
The reason OR semantics was chosen over AND semantics has to do with the nature of Lisp evaluation
and with how it affects the implementation of the type inference algorithm. Consider the function:
(DEFUN AND-OR-SEMANTICS-EXAMPLE (X)
(IF (EQUAL X 3)
10
(IF (EQUAL X 5)
’FOO
T)))
We might wish for signature segments which would adhere to AND semantics:
($INTEGER) -> (*OR $INTEGER $NON-T-NIL-SYMBOL $T)
But producing a signature like this goes against the grain of an inference algorithm which will analyze the
function in Lisp’s recursive descent evaluation order. Ultimately, the results returned by the function will
be computed at the tips of its IF structure. At each tip we have a context of the types of the formal
variables, as determined by the guard and by the various IF tests through which control has passed, both in
the function being analyzed and in functions called in the body, as captured in their signatures. This
suggests that the orientation of the algorithm would be around generating a segment or a collection of
segments at these tips. Since there can be a test like (EQUAL X 3), which is satisfied by some integers
and not others, an $INTEGER parameter could factor into more than one of these tips.
One segment is produced for each tip of the IF tree, using the type contexts provided by the accumulated
IF tests governing the tip. This signature is not a good one under AND semantics, since it would suggest
94 Type Checking for Common Lisp
The Formal Semantic Model
that if the argument were an integer, the result would be simultaneously an integer, a non-t-nil-symbol,
and T. Given a preference for generating segments, at least initially, by recursive descent as described, we
would need to significantly transform this signature for AND semantics in order to preserve much useful
information. Ideally, we would transform it to the previously stated signature. This would require an
elaborate factorization of the signature amounting to an inversion. We might, for instance, find a common
descriptor among some of the left hand sides (like $INTEGER, in this case), and form new segments, first
removing that common descriptor from all left hand sides where it occurs and then adding a segment
mapping it to the disjunction of all the result types of the modified segments. In general, such
manipulations could lead to loss of information. It seems much more straightforward to define a
semantics which more neatly matches the results of a recursive descent derivation, i.e., OR semantics.
The INTERP-SIMPLE model of semantics is of an abstract flavor because of the existential quantifier
over vectors of values. Within the quantifier, though, is INTERP-SIMPLE, which is a constructive
function. But we considered an even more constructive semantics which would have broadened the
semantic function INTERP-SIMPLE to a new function INTERP which would encompass the quantifier.
Given a list of type descriptors and a list (of the same arity) of values, INTERP would return either FAIL
or a list of bindings lists, each of which would map type variables to Lisp expressions and would exhibit
possible choices for the existential quantifier over values in the quantifier semantics.
Where we denote that if (INTERP (TD1i..TDni TDi) (O1..On (F O1..On)))
does not fail, it returns bindings i,1 .. bi,m,
(GUARD V1 .. Vn)
->
(INTERP-SIMPLE (GD1..GDn) (V1 .. VN) NIL)
and
for some i, 1 ≤ i ≤ k,
(INTERP (TD1i..TDni TDi) (O1..On (F O1..On))) is not FAIL and
for some j, 1 ≤ j ≤ m
(INTERP-SIMPLE (TD1i..TDni TDi)
(V1..Vn (F V1..Vn))
bi,j)
That is, if the guard is satisfied by the actual parameters, for some segment, the guard descriptors are
satisfied by the parameters, INTERP returns a non-empty list of substitution lists mapping the variables in
the segment to a representation of Lisp values, under which the actual parameters satisfy the parameter
descriptors in the segment and the result value satisfies the result descriptor.
To illustrate, consider again the CADR function. This interpretation means that if X satisfies the guard
descriptor
(*OR $NIL (*CONS *UNIVERSAL (*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL)))),
then INTERP will return a singleton list containing a singleton substitution, i.e.,
(((&20 . (CAR (CDR X))))). Thus, the "values" suggested by INTERP would be symbolic values, stated
in terms of the variables in the formal parameter list.
This model was rejected in favor of the INTERP-SIMPLE semantics because this model placed the
complexity of finding satisfactory vectors of values in the semantic definition functions, rather than in the
inference system. The latter seems much preferable, since the discovery process is very complex and
could involve heuristics which can be ignored formally, so long as the results they yield are checked
formally. Thus, the semantic function INTERP-SIMPLE needs only to perform this check.
Type Checking for Common Lisp 95
The Formal Semantic Model
We explored several other semantic models before settling on the INTERP-SIMPLE model. In none of
the three alternate models mentioned below do we utilize the notion of value equality as the significance
of multiple occurrences of type variables. Rather, the variable represents some unspecified set of data
objects (or an unspecified recognizer function), uniformly over all occurrences, which is to say a variable
may be instantiated by another type descriptor.
The first alternate model is described in terms of sets, and appears in Appendix C, titled "A Set-Based
Semantics". At the end of this appendix is a somewhat extended version of the discussion of OR
semantics versus AND semantics. Also included are proofs of several properties which would have
contributed significantly to the proof of a system based on the model. This model was discarded because
of a desire for a more computational style of semantics.
Another semantic model was posed in the vernacular of functions and meta-functions. Each descriptor
corresponds to a recognizer function with a parameter for the value to be checked for satisfaction of the
descriptor. A descriptor containing type variables corresponds to a function with one extra parameter for
each type variable. These parameters are to be instantiated with functions corresponding to the type
descriptors instantiating the variables. This direction was abandoned because its meta-theoretic approach
posed more potential problems than we would encounter with a more straightforwardly computational
model. In particular, the semantic definition itself generates function definitions, a notion which could
become problematic. This direction was abandoned because its meta-theoretic approach was not, in the
end, sufficiently computational for our taste, despite its functional trappings. It is described in the
Appendix D, titled "A Function-Based Semantics".
The third model represented a final attempt to develop a computational style semantics in which type
variables could be instantiated with other type descriptors of the general form. It is described in Appendix
E, titled "A Semantics with Composed Substitutions". The difficulties encountered with this specification
provided the impetus for moving to the semantics of variables which allowed for their instantiation only
by Lisp values, or, more precisely, by types representing single values. This appendix shows, perhaps
more clearly than the others, how we stumbled along in the process of developing the formal semantics.
All three of these models, when abandoned, were works in progress. As such, they are without doubt
imperfect. They are provided both to show the various specification approaches attempted and to
illustrate the difficulty of arriving at an approach which was appropriate to the problem and viable for
proof of the algorithm.
96 Type Checking for Common Lisp
The Formal Semantic Model
Type Checking for Common Lisp 97
The Signature Checker
Chapter 6
THE SIGNATURE CHECKER
This chapter describes the type checker algorithm, which validates signatures produced by the inference
algorithm. The inference algorithm performs a number of operations which would be very difficult to
specify and is not formally verified in any sense. In contrast, the checker employs relatively fewer and
simpler concepts and is designed with verifiability in mind. Yet it is by no means a simple algorithm. A
thorough understanding of the proof of its correctness will require an understanding of the description
provided here.
In this chapter we first present an overview of the checker and its task. The remaining sections are
devoted to describing each of the significant algorithms within the checker. With each, we present the
lemmas providing a top level specification of the algorithm, along with a detailed description of the
algorithm. In order, we will discuss the top level of the checker algorithm TC-SIGNATURE, the central
recursive algorithm in the checker TC-INFER, the unification algorithm, the descriptor canonicalization
algorithm, and the containment algorithm.
Throughout this chapter keep in mind that the "inference algorithm" is the unverified heuristic algorithm,
described in Chapter 4, which generates a function signature, and the "checker algorithm", described here,
is the one whose soundness we will formally prove.
Let us first repeat the two lemmas which together make up the top level specification for the checker. The
first and most important specifies that its validation of the inference tool signature is sound. Here, as in all
our other lemmas and proofs, "I" is a shorthand notation representing INTERP-SIMPLE when its second
and third arguments are lists, INTERP-SIMPLE-1 when these arguments are a single descriptor and value.
Lemma TC-SIGNATURE-OK
The next lemma justifies our claim about guard verification, which is that if a function’s guard is an IF
form representation of a conjunction of recognizer calls on distinct formal parameters, and if the checker
does not detect a guard violation on a call to the function, then the real guard for the function, applied to
the actual parameters of the call, will evaluate to T.
Lemma GUARD-COMPLETE
(and
H1 (valid-fs fs world clock)
H2 (I (argtd1 .. argtdn) (arg1 .. argn) b)
H3 (contained-in-interface (*dlist argtd1 .. argtdn)
(*dlist rtd1 .. rtdn))
H4 (not (break-out-of-timep
(E (and (Ra a1) .. (Ra an))
1 n
((a1 . arg1) .. (an . argn))
world
clock))) )
=>
(equal (E (and (Ra a ) .. (Ra an))
1 1 n
((a1 . arg1) .. (an . argn))
world
clock)
t)
6.2 Overview
The job of the signature checker is to validate the result produced by the inference algorithm. In essence,
the checker performs the n+1st iteration of the iterative approximation algorithm whose first n iterations
were performed by the main inference algorithm. But fortunately, its task is vastly simpler than that of the
inference algorithm, for several reasons. Principal among these,
• The checker operates on a type descriptor language which is smaller than the one
manipulated in the inference algorithm.
• The checker does not need to discover any closed recursive forms.
• The checker neither needs nor attempts any special tricks to handle recognizer functions.
For comparison of the descriptor languages the two algorithms handle, see their respective BNF grammars
in Section 5.2 and Section 4.3. The checker needs to handle only the core language composed of
primitive descriptors, *EMPTY, *UNIVERSAL, variables, and constructions from *OR, *CONS, and
*REC. A number of other forms are employed in the inference algorithm in the search for closed form
recursive descriptors. Most notable among these, because of the difficulties they would introduce, are:
• *NOT, the descriptor which prescribes the complement of the type prescribed by its
argument,
• *FIX and *FIX-RECUR, the descriptors corresponding to a recursive form under
consideration for closure, and
• *AND, a descriptor which, in a limited context, prescribes the set of values corresponding to
the intersection of the value sets corresponding to its arguments.
Moreover, the checker adheres strictly to the singleton semantics for variables, which the inference
algorithm bends somewhat in the search for closed recursive forms. By the time the inference algorithm
has completed its work, all these interim forms have been discarded. This is fortunate, because even their
formal specification, much less proofs about their manipulation, could be very difficult. By the time the
checker receives a signature, it is expressed entirely in the core descriptor language, which is cleanly
defined with the semantic definition functions.
After the inference tool computes a signature for a function and stores it in the database, the checker
makes a single pass over the function, performing its own version of the type inference algorithm to get a
new signature. In does so under the assumption that the signature segments which were formulated for
the function by the inference algorithm are correct, under a rationale discussed briefly below. Since the
guard contains no recursive calls to the function in question, the checker can formulate its own guard
descriptors, using only verified signatures. It need not and does not make any assumptions about the
correctness of the guard descriptors derived by the inference algorithm. It uses its own guard descriptors
to formulate the initial type assumptions for analyzing the function body and to do guard checking on
recursive calls.
After obtaining its own signature, the checker then performs a crucial test, using the predicate
CONTAINED-IN-INTERFACE. It checks that for each segment in the checker signature, some binding
of type variables to values exists such that under the binding, the segment is contained in some segment
from the inference tool signature. If this is the case, and if the guards are complete for this function and
every function in its call tree, then we consider the inference tool signature validated. Since the checker
algorithm and the CONTAINED-IN algorithm are proven correct, if the signature passes these tests, we
have full assurance of the correctness of the signature.
It is counterintuitive to consider using, for a recursive function, a signature we do not trust as the basis for
deducing a signature which we do trust, since it seems that if the initial signature is incorrect, we are
100 Type Checking for Common Lisp
The Signature Checker
predicating our entire analysis on a false assumption. But imagine the graph of a recursive function as
being the union of the graphs of an infinite number of non-recursive functions, with the first being the
empty function, the second being the ordered pairs formed by cases where the function terminates without
recursion, the next being cases where it terminates with one recursion, and so forth. For a given function
f, call these non-recursive functions ⊥, f1, f2, .. fn. Each function definition fi is non-recursive in the
sense that it is formed by replacing the recursive calls of f with calls to fi−1. An inductive argument,
explained in detail in Section 7.2, can be used to demonstrate that if a containment relation holds between
the signatures for fn+1 and fn, then the signature for fn is a sound approximation for the graph of f. The
inference algorithm computes the signature for fn, and the checker computes the signature for fn+1.
If the containment relation holds, it means that the checker signature is a better approximation of the real
function graph than the inference tool signature. Why, then, do we use the inference tool signature as our
permanent result? The reason is that the inference tool signature is in a relatively compact form. By
contrast, the checker signature usually reflects an expansion of cases with respect to the base signature.
For each segment in the base signature, there may be many segments in the checker signature. Moreover,
the type descriptors in the checker signature may be larger, reflecting, for example, the opening of a *REC
descriptor. Furthermore, there may be a combinatorial explosion in the factoring of possible cases among
the formal parameter descriptors. We sacrifice compactness in the checker signature for simplicity in the
checker algorithm, where the soundness and the clarity of the code and its proof of correctness are
primary requirements. So while the checker signature may be marginally more accurate than the original,
we would pay a severe price in case explosion if we used it rather than the inference tool signature as our
permanent reference. Since our proof validates the soundness of both, we choose the latter for the sake of
efficiency as the basis for further inference computations.
The signature checker, like all the rest of the inference system implemention (save the highest level
interactive function, which modifies the system state as functions are added) is written as a composition of
purely applicative Common Lisp functions.
We will factor the detailed discussion of the checker algorithm into components, corresponding to its
principal functional composition. Each component will be discussed in a separate section. The top level
of the checker, invoked by calling TC-SIGNATURE, is discussed first. The top level function for
deriving the checker’s signature from the function text and the inference algorithm’s signature is TC-
INFER-SIGNATURE. The principal recursive function of the checker algorithm, the one which actually
traverses the function guard and body, is TC-INFER. This central algorithm will be discussed and
specified next. As with the inference algorithm, one of the main internal algorithms of the checker is its
unification algorithm for type descriptors. We will outline how unification occurs and give the formal
specification for the unifier, whose top level function is DUNIFY-DESCRIPTORS-INTERFACE. Next,
we will discuss the checker’s type descriptor canonicalization routines, which are employed throughout
the checker. Finally, we will discuss the containment algorithm, whose top level function is
CONTAINED-IN-INTERFACE. This algorithm provides the test which validates the correctness of our
signature.
TC-SIGNATURE, the top level function of the checker, essentially solicits and packages the results of the
checker’s analysis. The top level functions in the checker do the following.
1. Initially, they transform both the guard and the function body, using the function
TC-PREPASS. TC-PREPASS normalizes IF forms by transforming them so the IF test (the
first argument of the IF) always returns either T or NIL, and by performing the obvious
simplification if the resulting predicate form is either of the constants T or NIL. Typically,
Type Checking for Common Lisp 101
The Signature Checker
normalization consists of wrapping two calls to NULL around the test, if necessary. For
technical reasons, the inference tool can be more precise when IF tests always return T or
NIL. A simple example giving intuition about why this is true is when a formal parameter X
appears as a test. Suppose in the context of the IF, X has a type which includes $NIL and
some other possibilities. For example,
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
If X were left bare, barring the special treatment of IF which would complicate the
semantics of the checker, the segment for the IF test would be the single segment whose
result is the type of X. As we shall see later in the discussion of the checker algorithm, the
type context would not be refined for X in the THEN and ELSE arms. But transforming the
IF test to (NULL (NULL X)) tricks the checker into refining the context as a result of
applying the segments for NULL. The formal soundness property of TC-PREPASS, which
is stated in Lemma TC-PREPASS-OK and proved in Appendix B.1, is that in any
environment and for any form, evaluating the prepassed form in the environment produces
the same value as evaluating the original form in the environment. The definition of
TC-PREPASS appears in Appendix G.6.
2. We determine whether the guard descriptors are complete. This is an entirely syntactic test
on the guard expression, requiring only the knowledge of what functions in the world are
known to be recognizer functions. A recognizer call is either a call to a recognizer function
(See Section 5.6) or an equality test of the form (EQUAL <param> <value>) or
(EQUAL <value> <param>), where <param> is one of the formal parameters and <value> is
either T or NIL (the only singleton primitive types). The guard descriptors will be complete
if the guard is an IF form representation of the conjunction of recognizer calls on the formal
arguments, with no argument subject to more than one such call. For example, the IF form
representation of
(AND (INTEGERP X) (CONSP Y) (CHARACTERP Z))
is:
(IF (INTEGERP X) (IF (CONSP Y) (CHARACTERP Z) NIL) NIL)
Though this restriction in form may on the surface seem more onerous than that employed in
the inference algorithm, it is usually the case that a user could transform a guard conforming
to the inference algorithm regimen to one which suits the checker’s rules.
3. We determine whether all the functions called in the body have complete guards. Stored
with each function in the database is a flag which indicates whether its guard is complete
according to the checker regimen above.26 So this check is just a lookup on all called
functions.
4. Regardless of whether the guard is complete, we derive the vector of guard descriptors. We
do this by first obtaining segments for the guard by calling TC-INFER on the prepassed
guard form with an abstract association list (or alist) simply associating a new type variable
with each formal parameter, and with a concrete alist associating *UNIVERSAL with each
formal. (Section 6.5 will describe the significance of these alists and clarify some of the
following.) If a guard violation is found in the guard itself, we note this and do not attempt
to validate the function segments. But in the normal case, we next gather all the maximal
segments for which the result descriptor is anything other than $NIL, replace all occurrences
of singleton variables (i.e., a variable which appears only once within a segment) with
*UNIVERSAL, abort the computation if there are any remaining variables, and then form a
vector of guard descriptors, one for each argument, by *OR-ing the contexts of the
qualifying segments together and canonicalizing.
26
The flag for the current function is passed in as a parameter, since it has not yet been stored in the system database.
102 Type Checking for Common Lisp
The Signature Checker
5. If no guard violation is found in the guard, we derive the checker segments for the function
body by calling TC-INFER with the body, the same abstract alist as with the guard, and a
concrete alist pairwise associating each parameter with its guard descriptor, or if the guard
descriptor is *UNIVERSAL, with the variable from the abstract alist. If a guard violation is
detected in this analysis, we simply note such as the result. Otherwise, we return the list of
maximal segments computed by TC-INFER. (For a discussion of maximal segments, see
Section 6.5.
6. The inference algorithm also computed and stored a guard descriptor. In rare cases, the two
guard descriptors are equivalent but different. In these cases, more often than not, the
inference algorithm descriptor is in a nicer form, meaning that it may not contain some
redundancy that the checker guard descriptor has. So there is an advantage in using the
inference algorithm guard as the "official" one, in the sense that it is the one which will be
used for all subsequent purposes. If the two guard descriptors are different, the checker
determines whether they are equivalent by administering the containment test (described
below in Section 6.8) in both directions. If each is contained in the other, they may be used
interchangeably, so the tool chooses to use the inference algorithm guard. If either test fails,
the checker’s guard will be the one used for all subsequent purposes.
7. We administer the containment test for the checker segments, as described above. If
containment is not determined, we declare the signature to be unsound, and do not bother to
proceed.
8. If the segments are properly contained, then we check to see if the inference algorithm noted
that the function being analyzed is a recognizer. If so, the checker validates this fact by
doing the following tests on the now-validated segments in the inference tool signature.
a. There must be no type variables in the segments.
b. There are two segments, one with a $T result and the other with a $NIL result.
c. We unify the argument descriptors of the two segments, and the result must be
*EMPTY, signifying that there is no overlap of the domains for the two segments.
9. Finally, we return a structure which encapsulates all we have learned, with flags denoting
the results of the various tests just performed. It is a 7-tuple, containing the checker guard,
the guard containment flag, the guard complete flag, the ALL-CALLED-FUNCTIONS-
COMPLETE flag, the checker segments, the segments contained flag, and the flag denoting
whether the function was validated as a recognizer. (These flags were defined in Section
5.6.)
The interpretation of these results is that if all the above flags (excepting possibly the RECOGNIZERP
flag) are T, then the signature is perfect, in the sense that it fully satisfies the formal specification
GOOD-SIGNATUREP for function signatures. As such, it may be employed as part of a valid fs to make
sound inferences in deriving the signatures of succeeding functions.
The formal specification of the top level of the checker is given by Lemmas TC-SIGNATURE-OK and
GUARD-COMPLETE, which were presented in Section 6.1.
In E, our model of evaluation for our Lisp subset, on every function call, we first bind the formal
parameters to the actual parameter values, then evaluate the guard form of the called function in that
binding environment. If the guard evaluates to NIL, a guard violation error occurs. If the guard evaluates
to a non-NIL value, the body of the function is evaluated, and the value returned from the function call is
the result returned from that evaluation. The native functions in our dialect are defined as subrs, meaning
that if the guard evaluates to a non-NIL value, the value returned is computed by a table lookup, and no
Type Checking for Common Lisp 103
The Signature Checker
Imagine imposing a requirement on new function definitions that this property is preserved, i.e., that if the
guard for a function is non-NIL when evaluated on some actual parameters, then we are assured the
function body will evaluate without error and return a value. Thus, roughly speaking, the function guard
safely captures the domain restriction of a partial function. The purpose of guards in our Lisp subset,
then, is to provide a predicate for specifying conditions on the arguments to a function which will
guarantee that, if the function is called with arguments satisfying the predicate, evaluation of the function
will proceed without error and produce a value.
The notion of verifying a guard is one of discharging the proof obligations necessary to demonstrate for a
given function call in a function definition, that for any possible values of the definition’s parameters, the
guard form for that function will evaluate to a non-NIL value. There is one such obligation for each
function call appearing in the definition of the function. The obligation says that the actual parameters
satisfy the called function’s guard under the assumptions derived from the IF tests governing the function
call. For function calls in a function body (as opposed to function calls appearing in the guard form), we
can also assume that the function’s guard evaluated to a non-NIL value.
Assurance of the guard property would require a regimen where part of the process of defining a new
function is discharging the guard proof obligation for each function call it contains. Together with a proof
of termination, this would be sufficient to have the assurance we seek. This is because for functions
called non-recursively, all guard proof obligations have already been discharged, so by satisfying the
guard of each such function, we have satisfied the assumptions made in those proof obligations. For the
recursive calls, the same argument applies, but the justification involves a computational induction on the
arguments. This is essentially the strategy employed in the developmental Acl2 system [Boyer 90].
Using the evaluation model defined by our evaluator function E (in Section 5.4), we can prove that guard
validation regimen just described is sufficient to guarantee the no-error property we hypothesized.
Lemma GUARD-COMPLETE provides the basis for the inference system’s role in performing guard
verification, which is ultimately stated in Lemma TC-SIGNATURE-OK. Whenever the guard of a
function is stated as a conjunction of recognizer calls on distinct parameters, and likewise throughout the
call tree of the function, then our type inference system, by generating a signature without detecting a
guard violation, satisfies all the guard proof obligations for our new function.
We could imagine our system accomplishing the verification of guard proof obligations by acting in one
of at least two different roles. One way would be to view the inference system as generating and relieving
each proof obligation implicitly as it analyzes a new function definition, as it does now. The other would
be for some external proof obligation generator to formulate the proof obligation as an implication (cast as
an IF form), and then to use the inference tool to derive a typing for that form whose result would be
disjoint from $NIL. (Since recognizers are Boolean, the result would be $T.)
The stylistic constraint imposed on guards as a requirement for completeness is consistent with the
capabilities of the type system. It does not allow for relations between the types of various arguments,
such as
(OR (INTEGERP X) (SYMBOL-ALISTP Y))
27
Alternatively, they could have been defined in terms of "sub-primitive" functions which are guaranteed to evaluate without error
and return a value whenever the guard is satisfied. This strategy might be used for an implementation of our Lisp subset.
104 Type Checking for Common Lisp
The Signature Checker
or
(MEMBER X Y)
as such restrictions make a link between the types of two variables which cannot be represented by the
guard vector. It does not allow for guard expressions which depend on non-type-theoretic information,
such as (PRIMEP X).28 It also rules out some guard expressions where our guard descriptor is
sufficiently precise, but in these cases, the requisite predicate could be re-phrased as a conjunction of
recognizer calls. There are unlimited examples of this phenomenon, but to illustrate, the guard
(AND (CONSP X) (TRUE-LISTP X))
which would be rejected as not being complete because it is a conjunction of recognizer calls on
non-distinct parameters, could be recast as a function call
(PROPER-CONSP X)
Another example is the guard (NOT (CONSP X)), which could also be encapsulated as a function which
would qualify as a recognizer. So, many guards which do not satisfy the type system’s requirements for
guard completeness can be transformed into satisfactory guards via proper encapsulation.
The central function in the type checker is TC-INFER. TC-INFER takes four arguments, a Lisp FORM
being analyzed, an abstract type alist ABS-ALIST, a concrete type alist CONC-ALIST, and the database
of function signatures, FS (FUNCTION-SIGNATURES in the code). It returns a list of 3-tuples, where
each tuple is composed of a minimal segment, a concrete type alist, and a maximal segment.29 Both
minimal and maximal segments are of the same form, i.e., if the function we are checking has n
arguments, each is an (n+1)-tuple of type descriptors, but can carry slightly different information. The
concrete type alist has n entries and characterizes the types of the Lisp formals when the segment is a
possible choice. It is sometimes more restrictive than the CONC-ALIST parameter, as we shall see later.
An ABS-ALIST associates each formal parameter with a descriptor, where this descriptor is typically a
type variable or a structured descriptor with type variables embedded. A CONC-ALIST associates each
formal with a descriptor which is typically biased toward descriptors other than variables (though
variables are not excluded). Similarly, minimal segments are biased toward variables, and maximal
segments biased toward other descriptors.
The need for variables is to capture the transfer of a particular piece of data from the parameters to the
result, as with the application of a function like CAR. But often the advantage of variables is lost, for
instance when their manipulation would force them into the recursive depths of *REC descriptors, where
28
It is not sufficient for the argument to be an integer, and the type descriptor language could not capture the characteristic that a
given integer is a prime number.
29
The terms "minimal" and "maximal" are somewhat arbitrarily chosen. The sense of it is that the maximal segments carry the
maximum information about type structure, while the minimal segments have a bias toward the use of type variables, emphasizing
value sharing rather than type structure.
Type Checking for Common Lisp 105
The Signature Checker
they are not allowed. Thus, it is frequently more desirable to carry the best information we can without
variables, as with the concrete alist and the maximal segment. Were it not for a weakness in the type
descriptor language, we might be able to satisfy both needs simultaneously. If the descriptor language had
a construct for denoting that an object in a particular position is both identified with a type variable and
characterized by another kind of descriptor [for example (*BOTH &1 (*CONS $INTEGER $NIL))], then
there would be no need for the dichotomy of representation in the checker. But attempts to support this
kind of construct were aborted because of the complexity it introduced into an already very complicated
system. Given that there is no blended construct, the checker compensates by carrying multiple
representations of both the formal parameters of the function and the segments produced by the checker.
The formal specification for TC-INFER is in terms of the function INTERP-SIMPLE, which establishes
the consistency of Lisp values with type descriptors under a binding environment for the type variables in
the descriptor. Essentially, the specification is that for any values supplied for the variables in the
environment (i.e., those in the alists), if we have a valid database of signatures, a valid ABS-ALIST, and a
valid CONC-ALIST, if the checker does not detect a guard violation, the guards of all the functions in the
call tree of the form are complete, and clock is sufficient to allow a full evaluation of the form in the
binding environment, then evaluation of the form will not result in a guard violation, and TC-INFER
produces a valid list of 3-tuples. An ABS-ALIST or CONC-ALIST is valid if for each variable in the
environment, the descriptor associated with the variable correctly characterizes the value of the variable,
as witnessed by INTERP-SIMPLE. A list of 3-tuples is valid if there is at least one tuple in the list such
that the CONC-ALIST and both the minimal and maximal segments are valid with respect to the values of
the variables in the environment and the value produced by evaluating the form in that environment.
Segment validity is also defined in terms of INTERP-SIMPLE. The segments in the conclusion may
contain variables not in the original hypotheses, since variables can be imported with the segments for a
called function. Thus, the type variable binding used by INTERP-SIMPLE in the conclusion may be
different from the one in the hypotheses, but it is an extension of the original, meaning that all the
individual variable-value pairs in the original are also in the extended binding. Here, as in all our other
lemmas and proofs, "I" is a shorthand notation representing INTERP-SIMPLE when its second and third
arguments are lists, INTERP-SIMPLE-1 when these arguments are a single descriptor and value.
Lemma TC-INFER-OK
For any Lisp form form, function signature list fs, Lisp world world
including all functions hereditarily in the call tree of form,
for any non-negative integer clock, type variable bindings b,
Lisp values arg1 .. argm,
Lisp variables a1 .. am,
binding environment env of the form ((a1 . arg1) .. (am . argm))
where a1 .. am include all the free variables in form,
ABS-ALIST of the form ((a1 . tda1) .. (am . tdam)),
CONC-ALIST of the form ((a1 . tdc1) .. (am . tdcm)),
and denoting
(tc-infer form abs-alist conc-alist fs)
by
(((mintd1,1 .. mintd1,m mintd1)
((a1 . tdconc1,1) .. (am . tdconc1,m))
(maxtd1,1 .. maxtd1,m maxtd1))
..
((mintdn,1 .. mintdn,m mintdn)
((a1 . tdconcn,1) .. (am . tdconcn,m))
(maxtdn,1 .. maxtdn,m maxtdn)))
Note:
H1 establishes that the signatures in the system state fs are valid.
H2 establishes that the abs-arglist is valid.
H3 establishes that the conc-arglist is valid.
H4 establishes that no guard violations are detected in the
course of analyzing form.
H5 establishes that the guards of all functions in the call tree
of form are complete.
H6 establishes that the evaluation of form terminates without
exhausting the clock.
The Lisp dialect which the inference system supports is simple. There are only four different kinds of
forms: variables, quoted forms (including self-quoting forms like integers, strings, T, and NIL), IF forms
of arity three, and function calls. This section will describe how TC-INFER handles each of these.
Recall that TC-INFER takes as arguments one of these Lisp forms, an ABS-ALIST and a CONC-ALIST,
both association lists mapping the variables in the environment to type descriptors, the name of the
function in which the form occurs, the guard descriptor for that function, and the system state, FS. (FS is
the name used to refer to the state in the exposition and proof. FUNCTION-SIGNATURES is the name
used in the implementation code.) The function name and guard descriptor are used only to perform
guard verification on recursive calls; the name allows recognition that a function call is recursive, and the
guard must be provided as a parameter because the guard computed by the checker for the current
function is not in FS. TC-INFER returns a list of 3-tuples, where each tuple is composed of a minimal
segment, a concrete alist, and a maximal segment:
(((tdai,1..tdai,n) -> tdai) << the minimal segment >>
((v1 . tdv1) .. (vn . tdvn)) << the concrete alist >>
((tdci,1..tdci,n) -> tdci) << the maximal segment >>
(The "->" is included as a reading aid. It is not actually part of the data structure.) The concrete alist
represents the type context in which the segments were computed.
Variable occurrences are easy. We simply look up the type in the type alists. We return a list of a single
3-tuple, as follows. The minimal segment’s context component is the list of the types of the parameters,
Type Checking for Common Lisp 107
The Signature Checker
drawn from ABS-ALIST, and its result component is the type associated with the variable in question,
also taken from ABS-ALIST. The concrete alist in the tuple is the same as the input CONC-ALIST. The
maximal segment’s context component is the list of parameter types, taken from the CONC-ALIST, and
the result is the type associated with the variable in CONC-ALIST. For example,
(TC-INFER ’Y ; FORM
’((X . &1) (Y . &2)) ; ABS-ALIST
’((X . $INTEGER) (Y . (*OR $NIL $T))) ; CONC-ALIST
’FOO ; FNNAME
’(*UNIVERSAL *UNIVERSAL) ; GUARD
FS)
=
((((&1 &2) -> &2) << the minimal segment >>
((X . $INTEGER) (Y . (*OR $NIL $T))) << the concrete alist >>
(($INTEGER (*OR $NIL $T)) -> (*OR $NIL $T)))) << the maximal segment >>
Thus, both segments map objects of the types indicated by the respective type alists to the binding of Y in
those alists.
Quoted forms are also straightforward. As recognized by a simple predicate, a quoted form is either an
integer (as determined by the Lisp function INTEGERP), T, NIL, a rational literal as determined by the
Lisp function RATIONALP, a character (CHARACTERP), a string (STRINGP), a keyword
(KEYWORDP), or a CONS whose CAR is the atom QUOTE. The descriptor is constructed by the
function DESCRIPTOR-FROM-QUOTE, whose definition appears in Appendix G.7. The result returned
by TC-INFER is constructed exactly as with the identifier case, except that the descriptor constructed by
DESCRIPTOR-FROM-QUOTE appears in the result component of both segments instead of the type
descriptor extracted from the type alists. Lemma DESCRIPTOR-FROM-QUOTE-OK provides the
soundness specification for DESCRIPTOR-FROM-QUOTE.
Not surprisingly, function calls are the most involved case. First TC-INFER calls itself recursively on all
the function arguments, using the same type alists. If *GUARD-VIOLATION is returned for any of them,
it returns *GUARD-VIOLATION.
Next, it checks for a guard violation on the function being called. From the recursive calls of TC-INFER
noted above, each actual parameter has associated with it a list of triples, each of which may have a
different result type. For each parameter, all these result types are combined using *OR, and the result
canonicalized. A vector, of the arity of the argument list and representing the possible types of the
arguments, is constructed from these results. All variables in the vector are transformed to
*UNIVERSAL. Note that both this universalization and the *OR combination are operations which, if
anything, enlarge the collection of values represented in the descriptors. Having thus prepared this vector
representing a conservatively large set of possible values for the actual parameters, we employ the
containment relation, calling CONTAINED-IN-INTERFACE, to ascertain if our actual parameter vector
is contained in the vector of guard descriptors computed for the function. If not, we are unable to
demonstrate that a guard violation will not occur, so we return the atom *GUARD-VIOLATION,
effectively terminating the inference process.
Now comes the task of generating appropriate segments. The general idea is to consider each possible
type configuration for the actual parameters in conjunction with each segment for the called function to
determine a set of possible results for each compatible combination. The operation which performs this
joint consideration is descriptor unification. We unify two vectors, one containing information about the
actual parameters and calling context, and the other containing information from the function signature.
The values represented in the vectors are the values of the variables in the context, the values of the actual
parameters, and the value of the result.
108 Type Checking for Common Lisp
The Signature Checker
Finding each possible type configuration for the actuals is a complex operation when there is more than
one actual parameter, since each parameter may be associated with multiple segments, and each segment
may have a different context description, i.e., each segment may be the result of a different type
configuration for the variables in the environment. The contexts derived for one parameter are not likely
to neatly match the contexts for another parameter. So to determine the collection of possible contexts,
we perform a cross product operation.
The top level function of the cross product operation is TC-MAKE-ARG-CROSS-PRODUCT. It has a
single parameter, ARG-RESULTS, which is a list whose length is the arity of the function being called,
and whose elements are the lists of 3-tuples returned by TC-INFER on the respective arguments. The
operation takes place in two phases, where first we re-organize the data in ARG-RESULTS to prepare it
for the second phase, the combining operation.
Everything is best illustrated with an abbreviated example, in which we use the following notation. c1..c6
represent minimal segment contexts, each being a vector of descriptors of length equal to the number of
formal parameters in the function being analyzed (not the function being called). r1..r6 represent the
corresponding segment result types. The alistij are the concrete alists from the 3-tuples. The maxsegij are
the maximal segments, which are ignored for the formation of not only the cross product but even for the
value of TC-INFER on a function call. The alistij-rhs are vectors of descriptors composed of the right
hand sides of the alists alistij. In our example, the function being called has three arguments, the first
argument produced three 3-tuples, the second argument just one, and the third argument two. Thus, we
have
ARG-RESULTS = ((((c1 -> r1) alist11 maxseg11)
((c2 -> r2) alist12 maxseg12)
((c3 -> r3) alist13 maxseg13))
(((c4 -> r4) alist21 maxseg21))
(((c5 -> r5) alist31 maxseg31)
((c6 -> r6) alist32 maxseg32)))
The re-organization of ARG-RESULTS produces the following structure, a list of length equal to the
number of (order-preserving) combinations of segments in ARG-RESULTS, whose elements have arity of
the function being called (in this case 3).
(((c1 (r1) alist11-rhs) (c4 (r4) alist21-rhs) (c5 (r5) alist31-rhs))
((c1 (r1) alist11-rhs) (c4 (r4) alist21-rhs) (c6 (r6) alist32-rhs))
((c2 (r2) alist12-rhs) (c4 (r4) alist21-rhs) (c5 (r5) alist31-rhs))
((c2 (r2) alist12-rhs) (c4 (r4) alist21-rhs) (c6 (r6) alist32-rhs))
((c3 (r3) alist13-rhs) (c4 (r4) alist21-rhs) (c5 (r5) alist31-rhs))
((c3 (r3) alist13-rhs) (c4 (r4) alist21-rhs) (c6 (r6) alist32-rhs)))
The second phase of the cross product operation is to take each triple and smash it into one form, using
DUNIFY-DESCRIPTORS-INTERFACE (described in Section 6.6), rejecting empties and duplicates, so
we return a list whose elements are all distinct and of the form
((c -> (ri rj rk)) alist)
where c is the unification of the contexts from the triple, (ri rj rk) is the list of result values from the triple,
and alist is the unification of the conc-alists.
For example, the first triple in the intermediate structure yields a list of results of the form
((c -> (r1 r4 r5)) alist-rhs)
where c is a unification of c1, c4, and c5, and alist-rhs is a unification of alist11-rhs, alist21-rhs, and
alist31-rhs. There is a list of such forms because if the unification of c1, c4, and c5, and the alists returns
Type Checking for Common Lisp 109
The Signature Checker
an *OR, we construct a result for each disjunct, which in general means we returns a list of results. If the
unification is not an *OR, the list is of length one. Since unification is a binary operation, the unification
of the triple occurs pairwise. First we unify
(c1 (r1 *universal) alist11-rhs)
and
(c4 (*universal r4) alist21-rhs)
(Actually, we flatten these structure and prefix them each with *DLIST, so the unification can be handled
as a simple unification of descriptor lists. Notice the padding of the second element of each piece, since
each ri represents a distinct actual parameter in our function call. Then each result is further padded and
unified with
(c5 (*universal *universal r5) alist31-rhs))
Thus, the grand result of the cross product operation is a profile of possible types for our actual
parameters, including the types of the Lisp variables in the context under each scenario. The result is a list
of forms which represent all the combinations of type descriptors characterizing the types of variables in
the environment and the types of the actual parameters to our function call, along with the concrete alist
appropriate to each combination. In the sense that unification is a narrowing operation which excludes
from its result pieces of the world which are not common to both parameters, the parts which are excluded
here represent combinations of variable and parameter types which cannot possibly occur.
Having taken the cross product, the next phase of handling a function call is to unify each result with each
of the signature segments for the function being called. This will allow us to project a type for the result
of the function call. The unifier, given two vectors of length i, will produce a (possibly empty) list of i-ary
vectors. Our vectors correspond to the following configuration:
(vtd1..vtdn ptd1..ptdm rtd)
Now consider the two vectors that we unify to produce such configurations. The first vector, which
characterizes information from the calling context, is specifically of the form:
(ctd1..ctdn atd1..atdm *universal)
The key function which constructs these vectors, unifies them, and composes the resulting 3-tuples is
TC-MATCH-ACTUAL-HALF-SEG-TO-FORMAL-SEG.
After doing this unification, we are close to having our results for the function call. Given our unification
result, we need to compose a list of 3-tuples for our final result. (Remember that we perform the above
unification once for each pairing of cross product results and signature segments, so there may be many of
these cases to consider. All the resulting 3-tuples will be gathered into a single list to be returned by
TC-INFER.) Each of the single unified vectors from the segment-matching unification will ultimately
produce a list of result 3-tuples. This is because one final unification needs to be performed in order to
construct the maximal segment. For each of these vectors, however, we already have the contents of the
minimal segment and the concrete alist. The minimal segment is composed directly from the segment-
matching vector. Its context is the first n descriptors in the vector, i.e., the ones corresponding to the
variables in the environment. The result descriptor is taken from the final descriptor in the vector. The
concrete alist returned with each tuple is the one from the cross product element which contributed to this
result.
Computing the maximal segment is what requires the last unification. We need to inject the type
information from the concrete alist of the 3-tuple into its minimal segment. So we unify the (flattened)
minimal segment with a vector formed by the right hand sides of that concrete alist and a *UNIVERSAL
descriptor, the latter corresponding to the result type. If the unification produces no results, this means
that the bindings of the identifiers in the CONC-ALIST are incompatible with this particular result from
the segment-matching unification, and so no 3-tuple is returned for this combination. If multiple results
are produced, we make a 3-tuple for each one. The maximal segment is constructed directly from the
vector returned by this final unification.
Obviously, given the right descriptors in the setting of a function call, quite a case explosion can occur.
Even though the checker’s operations on function calls are fairly complex, they would be much more
complicated if steps were taken to minimize the propogation of cases. This is the price of keeping the
algorithm "simple". The payoff is in relative clarity of the code and the soundness proof.
An concrete example showing the steps described above for a function call appears just below in Section
6.5.3.
Having dispensed with function calls, we need only cover IF expressions to finish the discussion of
TC-INFER. TC-INFER’s handling of IF expressions is relatively straightforward. First, it calls itself
recursively on the first argument of the IF, the test form, with the same alists provided for the IF. Here, as
with any other point in the computation, if the guards fail to be validated for any function call, *GUARD-
VIOLATION is returned and percolates back to the top level call.
The analysis of the IF test returns the usual list of 3-tuples. Each element of this list provides a context for
considering the THEN and ELSE forms. For each result, if the result type (as characterized by the result
component of the maximal segment) can possibly be something other than NIL, then we call TC-INFER
recursively on the THEN form, with the same ABS-ALIST and with a CONC-ALIST formed from the
context descriptors of the maximal segment. Likewise, if the result type can possilby be NIL, then we call
TC-INFER recursively on the ELSE form. All the resulting segments, from these recursive calls for each
context provided by the TEST form, are appended to form the result for the IF form.
Type Checking for Common Lisp 111
The Signature Checker
This section contains a pedagogical example detailing the steps which TC-INFER takes in handling a
function call. A full, general discussion of the algorithm appears in Section 6.5.2 above. The best way to
read this section would likely be to follow the description in that section along with each illustrated step in
the example below.
Consider that our database already includes the following collection of functions, whose signatures, as
computed by the inference system, are provided. Recognizer descriptors are shown where appropriate.
Note that we use the "->" in this section to separate the left and right hand sides of a segment. This is for
descriptive purposes only. The "->" does not appear in any concrete representations of segments.
(DEFUN BAR-Y-GUARDP (Y)
(IF (INTEGERP Y) T (CHARACTERP Y)))
Function: BAR-Y-GUARDP
TC Guard:
(*UNIVERSAL)
Segments:
(((*OR $CHARACTER $INTEGER)) -> $T)
(((*OR $NIL $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL $STRING $T
(*CONS *UNIVERSAL *UNIVERSAL)))
-> $NIL)
Recognizer descriptor:
(*OR $CHARACTER $INTEGER)
(DEFUN BAR (X Y)
(DECLARE (XARGS :GUARD (IF (TRUE-LISTP X) (BAR-Y-GUARDP Y) NIL)))
(IF (NULL X)
NIL
(CONS Y (BAR (CDR X) Y))))
Function: BAR
TC Guard:
((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
(*OR $CHARACTER $INTEGER))
Segments:
(($NIL (*OR $CHARACTER $INTEGER)) -> $NIL)
(((*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$CHARACTER)
-> (*CONS $CHARACTER
(*REC !REC8 (*OR $NIL (*CONS $CHARACTER (*RECUR !REC8))))))
(((*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$INTEGER)
-> (*CONS $INTEGER
(*REC !REC9 (*OR $NIL (*CONS $INTEGER (*RECUR !REC9))))))
(DEFUN BIM (X Y)
(DECLARE (XARGS :GUARD (TRUE-LISTP X)))
(IF (NULL X) NIL Y))
Function: BIM
TC Guard:
((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
*UNIVERSAL)
Segments:
(($NIL *UNIVERSAL) -> $NIL)
(((*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
(*FREE-TYPE-VAR 1.))
112 Type Checking for Common Lisp
The Signature Checker
Function: FOO-Y-GUARDP
TC Guard:
(*UNIVERSAL)
Segments:
(((*OR $CHARACTER $INTEGER $NIL)) -> $T)
(((*OR $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL $STRING $T
(*CONS *UNIVERSAL *UNIVERSAL)))
-> $NIL)
Recognizer descriptor:
(*OR $CHARACTER $INTEGER $NIL)
(DEFUN FOO (X Y)
(DECLARE (XARGS :GUARD (IF (TRUE-LISTP X) (FOO-Y-GUARDP Y) NIL)))
(IF (NULL X)
NIL
(IF (INTEGERP (CAR X))
(CONS (CAR X) Y)
(IF (CHARACTERP (CAR X))
(CONS (CAR X) NIL)
NIL))))
Function: FOO
TC Guard:
((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
(*OR $CHARACTER $INTEGER $NIL))
Segments:
(((*OR $NIL
(*CONS (*OR $NIL $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL $STRING
$T (*CONS *UNIVERSAL *UNIVERSAL))
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))))
(*OR $CHARACTER $INTEGER $NIL))
-> $NIL)
(((*CONS $INTEGER
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$CHARACTER)
-> (*CONS $INTEGER $CHARACTER))
(((*CONS $INTEGER
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$INTEGER)
-> (*CONS $INTEGER $INTEGER))
(((*CONS $INTEGER
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$NIL)
-> (*CONS $INTEGER $NIL))
(((*CONS $CHARACTER
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
(*OR $CHARACTER $INTEGER $NIL))
-> (*CONS $CHARACTER $NIL))
The guard establishes the following concrete type alist at the point of the call to FOO.
((X . (*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
(Y . (*OR $CHARACTER $INTEGER)))
The 3-tuples produced for (BAR X Y) contain the following segments. (For each tuple, the minimal and
maximal segments are the same, and the CONC-ALIST is the same as the type alist above.
(($NIL $CHARACTER) -> $NIL)
(($NIL $INTEGER) -> $NIL)
(((*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$CHARACTER)
-> (*CONS $CHARACTER
(*REC !REC8 (*OR $NIL (*CONS $CHARACTER (*RECUR !REC8))))))
(((*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$INTEGER)
-> (*CONS $INTEGER
(*REC !REC9 (*OR $NIL (*CONS $INTEGER (*RECUR !REC9))))))
For (BIM X Y), the tool generates three 3-tuples, whose minimal segments are:
(($NIL &2) -> $NIL)
(((*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
&2)
-> &2)
(((*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
&2)
-> &2)
The latter segment is repeated because it occurs in conjunction with two different CONC-ALISTS and
two different maximal segments. The corresponding maximal segments are:
(($NIL (*OR $CHARACTER $INTEGER)) -> $NIL))
(((*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$CHARACTER)
-> $CHARACTER)
(((*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$INTEGER)
-> $INTEGER)
First we perform the guard verification step. We combine, using *OR, all the result types from the
maximal segments for (BAR X Y) to give a conservative type for the first argment, and likewise for
(BIM X Y) and the second argument. Thus, the type constructed for the first argument is:
(*OR $NIL
(*CONS $CHARACTER
(*REC !REC8 (*OR $NIL (*CONS $CHARACTER (*RECUR !REC8)))))
(*CONS $INTEGER
(*REC !REC9 (*OR $NIL (*CONS $INTEGER (*RECUR !REC9))))))
114 Type Checking for Common Lisp
The Signature Checker
The containment algorithm determines that the descriptors for the two arguments are contained in the
guard descriptor for FOO, so we have passed the guard verification test.
Next we use TC-MAKE-ARG-CROSS-PRODUCT to find all possible combinations of types for the first
and second arguments, so they can be used to match against the segments for FOO. TC-MAKE-ARG-
CROSS-PRODUCT takes the minimal segments and CONC-ALISTs for the arguments and forms the
cross product of all their possible combinations, and then uses unification to merge each combination into
a single schema. The form of each such schema is a list of three lists. The first list gives descriptors for
the variables in the environment; the second list gives descriptors for each of the arguments to the function
call; and the third list is the right hand sides for the combined CONC-ALIST. (In our example, the
CONC-ALIST component will not be very interesting. Since all the incoming CONC-ALISTs are
identical, the resulting CONC-ALISTs will also be identical with the originals. But this is not always the
case; IF forms in the arguments could result in different CONC-ALISTs on the input segments.) The
result of the cross product operation in our example is:
((($NIL $CHARACTER) ; types of X and Y
($NIL $NIL) ; types of (BAR X Y) and (BIM X Y)
((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
(*OR $CHARACTER $INTEGER))) ; CONC-ALIST for X and Y
(($NIL $INTEGER) ; types of X and Y
($NIL $NIL) ; types of (BAR X Y) and (BIM X Y)
((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
(*OR $CHARACTER $INTEGER)))
(((*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$CHARACTER) ; types of X and Y
((*CONS $CHARACTER ; types of (BAR X Y) and (BIM X Y)
(*REC !REC8 (*OR $NIL (*CONS $CHARACTER (*RECUR !REC8)))))
$CHARACTER)
((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
(*OR $CHARACTER $INTEGER)))
(((*CONS *UNIVERSAL
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$INTEGER) ; types of X and Y
((*CONS $INTEGER
(*REC !REC9 (*OR $NIL (*CONS $INTEGER (*RECUR !REC9)))))
$INTEGER) ; types of (BAR X Y) and (BIM X Y)
((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
(*OR $CHARACTER $INTEGER))))
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$CHARACTER)
-> (*CONS $INTEGER $CHARACTER))
(((*CONS $INTEGER
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$INTEGER)
-> (*CONS $INTEGER $INTEGER))
(((*CONS $INTEGER
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$NIL)
-> (*CONS $INTEGER $NIL))
(((*CONS $CHARACTER
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
(*OR $CHARACTER $INTEGER $NIL))
-> (*CONS $CHARACTER $NIL))
TC-INFER unifies each cross product result with each segment for FOO in a grand unification which
involves the type of each variable in the environment, the type of each parameter, and the type of the
result of the call to FOO. Since there is no descriptor corresponding to TC-INFER-EXAMPLE’s variable
types in the segments for FOO, and since there is no descriptor for the result of FOO in the cross product,
both schema are padded accordingly with *UNIVERSAL. Let us consider the unification performed for
the first cross product element and the first segment of FOO. The cross product element is:
(($NIL $CHARACTER) ; types of X and Y
($NIL $NIL) ; types of (BAR X Y) and (BIM X Y)
((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
(*OR $CHARACTER $INTEGER)))
The first two cross product element components are combined into a single list and extended with
*UNIVERSAL, the latter representing the result of the call to FOO, giving the descriptor list:
(*DLIST $NIL $CHARACTER $NIL $NIL *UNIVERSAL)
These two descriptor lists are then unified, using DUNIFY-DESCRIPTORS-INTERFACE, giving the
result:
(*DLIST $NIL $CHARACTER $NIL $NIL $NIL)
Since the immediate goal is to form a segment mapping the types of X and Y to the type of the result of
the function call, we are no longer interested in the types of the function parameters in this result, so we
ignore them. The minimal segment formed from the remnant is:
(($NIL $CHARACTER) -> $NIL)
Note that had the result of the unification been *EMPTY, this unification would have resulted in zero
segments. Had the result been an *OR of *DLISTs, the result would have been one minimal segment for
each disjunct.
The final step is to form the maximal segment by unifying the minimal segment with a descriptor list
composed of the descriptors from the CONC-ALIST provided with this cross product element, padded
with an extra element for the result type. I.e., we unify the descriptor from the minimal segment:
(*DLIST $NIL $CHARACTER $NIL)
So the maximal segment is identical with the minimal segment. The CONC-ALIST completing the
3-tuple is the one from the cross product element.
This process is repeated for each combination of cross product elements and segments from FOO, and all
the results collected into a list of 3-tuples. The collection of 3-tuples for (FOO (BAR X Y) (BIM X Y))
is:
(((($NIL $CHARACTER) -> $NIL) ; minimal segment
((X . (*REC TRUE-LISTP ; CONC-ALIST
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
(Y . (*OR $CHARACTER $INTEGER)))
(($NIL $CHARACTER) -> $NIL)) ; maximal segment
((($NIL $INTEGER) -> $NIL) ; minimal segment
((X . (*REC TRUE-LISTP ; CONC-ALIST
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
(Y . (*OR $CHARACTER $INTEGER)))
(($NIL $INTEGER) $NIL)) ; maximal segment
((((*CONS *UNIVERSAL ; minimal segment
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$INTEGER)
-> (*CONS $INTEGER $INTEGER))
((X . (*REC TRUE-LISTP ; CONC-ALIST
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
(Y . (*OR $CHARACTER $INTEGER)))
(((*CONS *UNIVERSAL ; maximal segment
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
Type Checking for Common Lisp 117
The Signature Checker
-> $INTEGER)
(*CONS $INTEGER $INTEGER)))
((((*CONS *UNIVERSAL ; minimal segment
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$CHARACTER)
-> (*CONS $CHARACTER $NIL))
((X . (*REC TRUE-LISTP ; CONC-ALIST
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
(Y . (*OR $CHARACTER $INTEGER)))
(((*CONS *UNIVERSAL ; maximal segment
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
$CHARACTER)
-> (*CONS $CHARACTER $NIL))))
Since this is the outermost form of the body, the maximal segments from this collection of tuples will
become the segments for function TC-INFER-EXAMPLE.
In this system, as in many type inference systems, unification is a central algorithm. [Knight 89] The
unification of descriptors is perhaps the central algorithm of both the inference algorithm and the signature
checker. As we can see from the preceding discussion of the checker, it is at the heart of the process of
deriving segments for function call results. Moreover, the entire function signature scheme, though
reminiscent of the traditional notation of function signature in programming (modified for OR semantics,
see the discussion in Section 5.7.1) is oriented toward applying the unifier to both the guard vector and the
segments. Thus in large measure, to understand the inference algorithm and the type checker, one needs
to understand at least the role which the unifier plays.
Informally, the specification of descriptor unification is as follows. Given two descriptors (or two lists of
descriptors of the same arity), the unifier produces a descriptor (or list of descriptors of the same arity as
the parameters) such that, under any given binding of type variables, any value (or list of values) which
satisfies both the input descriptors under our INTERP-SIMPLE or INTERP-SIMPLE-1 function will also
satisfy the result descriptors. In one sense, this specification encourages the notion that unification can
produce descriptors which are larger than necessary, in order to satisfy this soundness constraint. After
all, the unifier could always return *UNIVERSAL and satisfy this specification. But this is just our
soundness requirement, and our usability requirement is that the unifier should produce the most specific
descriptor possible which satisfies this constraint. In this sense, unification is essentially a narrowing
operation, and one may think of unification in the spirit of an intersection operation, where we want the
result to signify only values common to both input descriptors. Though this usability requirement is not
formalized, if we relaxed our treatment of it in the implementation, we would surely generate uselessly
general signatures.
DUNIFY-DESCRIPTORS-INTERFACE is the top level function of the unifier. It takes two formal
parameters, the descriptors to be unified, and returns a descriptor representing the unified result. In the
case where the arguments are single type descriptors, DUNIFY-DESCRIPTORS-INTERFACE returns a
single type descriptor. Alternatively, it can take as arguments two *DLIST forms, each of the form
(*DLIST td1 .. tdn), where each tdi is a single type descriptor, and return either a *DLIST of length n or
118 Type Checking for Common Lisp
The Signature Checker
an *OR of *DLIST forms, each of length n. A *DLIST is simply a notation for packaging a list of simple
descriptors into a single argument, serving like a mutual recursion flag so the function can take either
single descriptors or lists of descriptors. When an *OR of *DLIST forms is returned, it signifies that any
of the *DLISTS is a correct unification of the arguments. That it is packaged as an *OR is perhaps a case
of poor naming, since this *OR has no formal relation to the normal *OR descriptor, but indicates to the
caller that the result should be unpacked and considered a list of possible answers. In fact, *OR was used
because within DUNIFY-DESCRIPTORS-INTERFACE is a call to DUNIFY-DESCRIPTORS, which
can return a list of unified forms. These are combined with *OR, which is perfectly natural in the single
descriptor case. When *DLIST forms are used as arguments, the values used in the specification Lemma
DUNIFY-DESCRIPTORS-INTERFACE-OK must be considered lists of values of the same length as the
*DLISTs, and the calls of "I" are calls of INTERP-SIMPLE rather than INTERP-SIMPLE-1.
(and (I tda v b)
(I tdb v b))
=>
(I (dunify-descriptors-interface tda tdb) v b)
B is a binding of type variables to singleton values. By "fully instantiating", we mean simply that all the
variables appearing in the descriptor in question are bound in B.
This function checks that under a given binding B, each substitution maps a variable to a descriptor which
accurately characterizes the value to which the same variable is bound in B. I.e., for each substitution
(&i . tdi), under a binding including (&i . v), (INTERP-SIMPLE-1 tdi v b) = T. The substitution list is
used to accumulate information about variables already encountered on recursive descent into the
problem.
Type Checking for Common Lisp 119
The Signature Checker
A lemma which will be most critical to the proof of soundness for the unifier is the specification of
DUNIFY-DESCRIPTORS, below. It omits mention of TERM-RECS, which has no bearing on
soundness. That is, one can insert TERM-RECS in the call to DUNIFY-DESCRIPTORS, viewing it as
universally quantified.
Lemma DUNIFY-DESCRIPTORS-OK
Denoting (dunify-descriptors tda tdb substs)
by ((td1 . substs1) .. (tdn . substsn)),
for all v and fully instantiating b,
(and
H1 (interp-substs substs b)
H2 (I tda v b)
H3 (I tdb v b))
=>
for some i,
(and (interp-substs substsi b)
(I tdi v b))
Paraphrased, this says that under any binding, if a value satisfies both input descriptors under the binding,
and if the substitution is valid under the binding, then there exists some descriptor-subst pair in the result
such that the subst is valid, and the value satisfies the descriptor.
The unification rules are as follows. They should be considered in order, as two descriptors may match
more than one action. Thus, the triggering condition for any rule may be read to include the negation of
previous triggering conditions.
1. If DESCRIPTOR1 equals DESCRIPTOR2, then the result is the singleton list containing
(DESCRIPTOR1 . SUBSTS). (This corresponds to Case 1 in the proof of Lemma
DUNIFY-DESCRIPTORS-OK.)
2. If either descriptor is *EMPTY, the result is NIL. (Cases 2 and 2’)
3. If either descriptor is *UNIVERSAL, the result is the singleton pairing the other descriptor
with SUBSTS. (Cases 3 and 3’)
4. If DESCRIPTOR1 is of the form (*OR td1 .. tdn), then if DESCRIPTOR2 is equal to some
tdi, the result is the list containing only (DESCRIPTOR2 . SUBSTS), otherwise we append
the lists from the results of unifying DESCRIPTOR2 with each tdi under the same SUBSTS
and TERM-RECS. (Case 4)
5. If DESCRIPTOR2 is of the form (*OR td1 .. tdn), then proceed symmetrically as above.
(Case 4’)
120 Type Checking for Common Lisp
The Signature Checker
6. We will describe the case where either of the descriptors is a variable below. (Case 5)
7. If both DESCRIPTOR1 and DESCRIPTOR2 are *REC forms, we resort to a bag of tricks,
also described below. (Case 6.1)
8. If DESCRIPTOR1 is a *REC form, then if the pair (DESCRIPTOR1 . DESCRIPTOR2) is a
member of TERM-RECS, we return a list containing only (DESCRIPTOR2 . SUBSTS).
Otherwise, we recur, with DESCRIPTOR1 opened up, and the pair
(DESCRIPTOR1 . DESCRIPTOR2) pushed onto TERM-RECS. It would be possible, with
*REC forms in both descriptors, for this algorithm to go into an infinite recursion, first
unfolding one *REC and then the other, or by some similar but more elaborate scenario.
But DUNIFY-DESCRIPTORS maintains a stack of pairs of operands, the TERM-RECS
parameter, so that when it is recursively called with a pair which it has already seen, it can
recognize that it is in an endless recursive chain. Any infinite unfolding in this manner must
come back around to considering the original case, buried beneath a number of recursive
calls. Almost always, this indicates that no value can satisfy both descriptors, and hence,
almost always it would be sound to return *EMPTY rather than DESCRIPTOR2. An
example of this would be:
(DUNIFY-DESCRIPTORS
’(*REC DOUBLE
(*OR $NIL (*CONS *UNIVERSAL
(*CONS *UNIVERSAL
(*RECUR DOUBLE)))))
’(*CONS *UNIVERSAL
(*REC DOUBLE
(*OR $NIL (*CONS *UNIVERSAL
(*CONS *UNIVERSAL
(*RECUR DOUBLE))))))
SUBSTS TERM-RECS)
Here, proceeding without a stopper would result in first opening DESCRIPTOR1 and diving
into the *CONS, then opening DESCRIPTOR2 and diving into the *CONS, ad infinitum.
And there is no value which satisfies both these descriptors, since to do so would require a
list whose length is both even and odd. Unfortunately, it is also possible that some value
could satisfy both descriptors, and our algorithm simply is not strong enough to formulate
the unifying descriptor. For this reason, we have to be conservative, so we return
DESCRIPTOR2. We could return either DESCRIPTOR1 or DESCRIPTOR2 if we could
figure out which were better in some sense, but in the current implementation we do not
attempt to do so. (Case 6.2)
9. If DESCRIPTOR2 is a *REC form, proceed symmetrically as above. (Case 6.2’)
10. If both DESCRIPTOR1 and DESCRIPTOR2 are *CONS descriptors, we do exactly as with
*DLISTS below (since this is essentially a special case of length two), except that at the end
we glue the *CONS operator back onto the descriptors instead of *DLIST. (Case 7)
11. If both DESCRIPTOR1 and DESCRIPTOR2 are *DLIST descriptors, first we make sure the
lists are of the same length (otherwise indicates an error in the implementation, as does
unifying a *DLIST with a single descriptor). Then we engage in a recursive algorithm for
unifying lists of descriptors. First we unify the first descriptor in DESCRIPTOR1 with the
first descriptor in DESCRIPTOR2. If this is the end of the list, we return this result (with
the result descriptors wrapped in a list). Otherwise, we append the results of recurring, for
each descriptor-subst pair returned for the CARs, on the CDRS. On each recursive call, we
use the substs from the pair, which is a composition of the original SUBSTS and additional
substitutions accumulated from the CAR. Each recursive call will return a list of pairs, and
from this list we make another list, where we form the pairs by CONS-ing our CAR
descriptor onto the descriptor list in each result pair, and using the substitutions from the
result pair without modification. When we emerge from this recursive traversal, we stick
the *DLIST operator back onto each complete list, so that the "descriptor" in each of our
result pairs is a *DLIST of the same arity as the original. (Case 8)
Type Checking for Common Lisp 121
The Signature Checker
12. Otherwise return NIL, since we have eliminated all possibilities except that one descriptor is
a *CONS and the other is not, or that both descriptors are atomic simple descriptors like
$INTEGER but are not equal. Thus, there can be no commonality in the values they
represent. (Case 9)
So what do we do with variables? A basic notion is that when we unify a variable with something, the
result is a list of descriptor-substitution pairs where the variable is the descriptor and the substitution is the
input SUBSTS augmented with a new element which maps the variable to the thing with which the
variable is being unified. But adding this new element can be a complex operation. For instance, if the
variable is already mapped in SUBSTS, then the two mappings need to be reconciled. This is done by
unifying the two right hand sides, since unification is the operation which forms a descriptor for the
common ground between two descriptors. So with the complexity thus unfolding, let us look in detail at
how a new element is added to the SUBST list.
The merge is performed as follows. If there is no current element mapping VARIABLE in SUBSTS, we
simply insert (VARIABLE . DESCRIPTOR) as a new element. The SUBSTS are maintained as an
ordered list, according to the lexical ordering of the bound variables. As discussed below, this ordering is
one of several tricks that are used to help avoid potential problems with infinite loops. So inserting a new
element means inserting it at the proper position in the list. We then return the singleton list containing
this augmented SUBST.
If there is already an element mapping VARIABLE in SUBSTS, we get the right hand side and unify it
with our DESCRIPTOR, using the original SUBSTS as the parameter on the recursive call to DUNIFY-
DESCRIPTORS. We take each result from the unifier and create new substitution lists as follows,
collecting all the resulting substitution lists into a great list of substitutions. First, we compare the
substitution returned from the unifier with the substitution we started with, to see if the right hand sides
for any variables other than the one we are concerned with are different (or not present) in the two
substitutions. If not, then we just include in our result the original substitution with the right hand side for
our variable replaced with the descriptor from the unifier result which we are considering.
If there are any other different right hand sides, however, we engage in a second level merging operation.
This second level operation takes a list of substitution elements and a list of substitutions and returns a list
of substitutions. The idea is that we return all substitutions obtainable by "merging" the elements into any
of the given substitutions. The initial list of substitution elements contains all the elements in the
substitution (returned by the unifier) being considered which are new or different relative to the original
SUBSTS. The initial list of substitutions contains the single original substitution SUBSTS. We proceed
as follows, in the order determined by the lexical ordering of the variables. We take the first substitution
element in our first argument and use a recursive call of DMERGE-NEW-SUBST to merge the element
into each substitution in the second argument. The result from each such merge is, of course, a list of
substitutions, and all these lists are appended to form the results of merging the first new element. Then,
we merge the next new element into all these substitution lists, and so forth, until each variable with a new
right hand side has been merged.
122 Type Checking for Common Lisp
The Signature Checker
After this second level merging finally returns a list of substitutions, we return to the variable whose
unification began this whole process, and we replace its old right hand side in each of the substitution lists
from the second level merge with the right hand side proposed from the original unification. This final
step is the same as the final step for the case, described two paragraphs above, where no second level
merge was required.
The algorithm uses some safeguards against construction of ill-formed substitutions. For instance, the
case analysis in DUNIFY-DESCRIPTORS ensures that circularities that consist only of substitution
elements which map variables to variables are never constructed. Essentially, this is accomplished by
mapping variables directly to other variables only in a lexically downward manner, and by never allowing
an *OR as the top level form on the right hand side in a substitution element. (A variable could appear as
one of the disjuncts, thus creating a potential loop if it is lexically greater than the variable on the left hand
side.) Also, DUNIFY-DESCRIPTORS ensures that no substitution element can be created where the
right hand side is a *REC descriptor which, if opened up, could be an *OR with a variable as a top-level
disjunct.
But in the merging operation, it may still be possible to create a substitution in which a circularity exists
where some variable in the loop is embedded within a *CONS. As originally constructed, the algorithm
removed any such substitution from consideration, under the presumption that it represented an
unrealizable situation where some Lisp CONS value would be required to be equal to some component of
itself. For example,
((&1 . (*CONS &2 $NIL)) (&2 . &1))
The soundness assumption for any substitution which is integral to the proof of DUNIFY-
DESCRIPTORS-OK is that there may exist some binding of its type variables to Lisp values such that no
inconsistency arises. In this case, let us generically say that this binding is
((&1 . val1) (&2 . val2))
Clearly, there is a contradiction here, since VAL1 cannot be equal to its own CAR. But consider the
substitution
((&1 . (*CONS (*OR $T &2) $NIL)) (&2 . &1))
This substitution contains a loop, but is perfectly consistent with the binding
((&1 . (CONS T NIL)) (&2 . (CONS T NIL)))
Therefore, we cannot remove a substitution from consideration simply because it contains a loop, since
doing so may amount to removing the sole solution satisfying the conclusion of DUNIFY-
DESCRIPTORS-OK.
One possible way out of this quandary would be to analyze the substitution to determine how it could be
soundly pruned to remove the loop. The previous example could be soundly transformed to
((&1 . (*CONS $T $NIL)) (&2 . &1))
since the &2 disjunct in the substitution element for &1 did not contribute to the soundness argument with
respect to the good binding. But in general, it may be difficult to determine just where to perform this
kind of surgical pruning.
But it so happens that despite the measures taken, the proof we give for DUNIFY-DESCRIPTORS-OK is
a partial correctness proof, in that termination is not guaranteed. So, rather than complicate both the
algorithm and its proof of correctness with a difficult pruning operation, we chose to simply allow
DUNIFY-DESCRIPTORS to return a looping substitution. Then, when DUNIFY-DESCRIPTORS-
INTERFACE prepares to apply the substitution to construct its final result, it first checks that it is
well-formed. If there are any circularities, the algorithm announces that an ill-formed substitution was
constructed and punts by returning the descriptor argument whose form contains the fewer CONSes.
Either argument descriptor could be returned, and any heuristic for returning one or the other would be
trivially sound. We chose this heuristic because of its simplicity. The function which tests for
well-formedness is WELL-FORMED-SUBSTS, and its definition is in Appendix G.8. It checks both the
downward-directedness of all variable-to-variable substitution elements and the absence of circularity.
The saving grace is that, while we can construct, by hand, calls to the unifier which will result in the
construction of looping substitutions, we have failed, despite some thought and effort, to construct an
inference problem which led to such a call. So there is no apparent loss of functionality in this non-
implemention of a termination safeguard.
Now let us pop back to the two outstanding original cases: unifying a variable with another variable, and
unifying a variable with a non-variable descriptor. When we unify a variable with another variable, we
first determine which is lexically greater. We merge the binding of the larger variable to the smaller one
into SUBSTS, and for each resulting substitution, we form a descriptor-substitution pair where the
descriptor is the smaller variable. So:
(dunify-descriptors ’&1 ’&2 nil nil)
=
((&1 . ((&2 . &1))))
where the result is a single pair whose descriptor component is &1 and whose substitution list is ((&2 .
&1)). Also,
(dunify-descriptors ’&3 ’&2 ’((&2 . (*cons &1 $nil))) nil)
=
124 Type Checking for Common Lisp
The Signature Checker
and
(dunify-descriptors ’&3 ’&2 ’((&2 . &1)) nil)
=
((&2 . ((&2 . &1) (&3 . &2))))
If we unify a variable with a descriptor other than a variable, we first check several special cases. For the
sake of exposition, let us denote DESCRIPTOR1 by "&i". We have already considered the possibility
that DESCRIPTOR2 is an *OR descriptor, and in the case where DESCRIPTOR1 is one of its disjuncts,
we simply returned DESCRIPTOR1 paired with our original SUBSTS. But we have not yet considered
the possibility that DESCRIPTOR2 is a *REC form. If it is, and the body of the *REC descriptor is an
*OR form, and if &i is one of the possible disjuncts of that *OR, then we treat it as we would have with
the simple *OR, returning the single pair ((&i . SUBSTS)).
If DESCRIPTOR2 is some other descriptor in which &i appears, we do a trick which may seem arcane,
but which is easily justified and may result in more accurate results. We know from previous tests that
DESCRIPTOR2 in no way represents an *OR form where &i is one of its disjuncts. We also know that
DESCRIPTOR2 is not equal to &i. Therefore, if &i appears in DESCRIPTOR2, it must be because it
appears within some *CONS structure. Since variables represent specific Lisp values, there is no way &i
can unify with, or represent the same value as, a descriptor with &i buried inside a *CONS. This is not to
say our unification will fail. A valid result of unifying &i with
(*CONS (*OR &i $INTEGER) $NIL)
is
((&i . ((&i . (*CONS $INTEGER $NIL)))))
So what we do is "screen" DESCRIPTOR2 for occurrences of &i, eliminating them where they appear as
possible disjuncts in any *OR configuration. We can then return &i as our descriptor, paired with each
substitution formed by merging the mapping of &i to the screened descriptor with the original SUBSTS.
Of course, if the screened descriptor is effectively *EMPTY, we return an empty list. The code
implementing this case is presented in Appendix G.10.
If &i does not appear in DESCRIPTOR2, we first consider the case where DESCRIPTOR2 is a *REC
which has some variable as one of its disjuncts, for example where DESCRIPTOR2 is of the form
(*REC FOO (*OR $NIL &j (*CONS *UNIVERSAL (*RECUR FOO))))
If this is the case, we simply open the *REC using OPEN-REC-DESCRIPTOR-ABSOLUTE recursively
unify it with &i. We open the *REC because we do not want to allow a substitution of the form:
(&i . (*REC FOO (*OR $NIL &j (*CONS *UNIVERSAL (*RECUR FOO)))))
This is because, in our notion of substitutions, we have meticulously avoided any substitution mapping a
variable to an *OR, as this could profoundly complicate the issue of termination by allowing for the
creation of ill-formed, but realizable substitutions. (See discussion above.) A substitution element like
that above is a mapping to an *OR in thin disguise, and is likewise to be avoided. We accomplish this by
exposing the *OR as the top level form (i.e., by opening the *REC), so that on our recursive call we unify
with the *OR. As described in the original case analysis, this results in the *OR being split, and we wind
up creating distinct substitutions within which we map the variable to each disjunct of the *OR.
Type Checking for Common Lisp 125
The Signature Checker
This strategy itself could cause a non-termination problem resulting from the infinite unfolding of *REC
descriptors. (Perhaps our &i originated from a similar position to that of &j in some other *REC
descriptor.) This situation could only arise either from the unification of two *REC descriptors or from
the unification of a *REC descriptor with another descriptor containing a *REC. In the first case, our
special case *REC unification rules (described below) are such that no direct unification of *RECs can
lead to this scenario. And the TERM-RECS mechanism guards against infinite recursion arising from the
unification of a *REC with a non-*REC. In the absence of a termination proof, of course, we have to take
these claims on faith.
We identified the previous special case to protect against it falling into our last course of action, which is
simply to merge the substitution element (&i . DESCRIPTOR2) into SUBSTS, and to pair the descriptor
&i with each resulting substitution. This, in fact, is the normal case.
That completes the discussion of unification where one of the descriptors is a variable.
The only case which has not yet been explained is that of unifying two *REC descriptors. As stated
previously, there is no general algorithm for this case in the checker; it is handled by resorting to a bag of
tricks. The tricks are a collection of rules which are triggered essentially by pattern matching against the
arguments. A default rule handles all cases for which no other rule is eligible. Although a general
procedure like the one employed in the inference algorithm would have been attractive and powerful, we
were unable to imagine its proof against the semantic model. By using a rule-based approach, we reduce
the proof of this case to a manageable proof of soundness for each of our collection of special-case
rules.30
As an aid to pattern matching, we transform the descriptors to a slightly different canonical form. In the
case where the body of the *REC is an *OR (the normal case), we gather all the non-recurring cases into a
single disjunct. So for example,
(*REC FOO (*OR $INTEGER $NIL &1 (*CONS *UNIVERSAL (*RECUR FOO))
(*CONS (*RECUR FOO) *UNIVERSAL)))
is transformed to:
(*REC FOO (*OR (*OR $INTEGER $NIL &1)
(*CONS *UNIVERSAL (*RECUR FOO))
(*CONS (*RECUR FOO) *UNIVERSAL)))
If the top level form is a *CONS, we similarly re-canonicalize both its CAR and CDR components. After
application of the rules, any resulting descriptors are re-canonicalized to the more conventional original
form.
Each rule has two parts, an enabling condition and an action. The enabling condition is basically a pattern
match against the arguments. The action typically reduces the unification problem for the *RECs to some
embedding of recursive calls to the unifier on selected components of the *RECs. In some cases the
action is simply to note failure by returning NIL. In the default rule, employed when no other enabling
condition is satisfied, the action is to return the list containing a single pair consisting of the argument
with the lesser number of atoms in its form and the input substs. This is a conservative result, but after
exhausting all the possibilities embodied in the rules, the most precise characterization of the unifying
descriptor we could manage would be either DESCRIPTOR1 or DESCRIPTOR2, and we choose the
30
In the implementation, the top level function for *REC unification is DUNIFY-RECS.
126 Type Checking for Common Lisp
The Signature Checker
The rules employed are as follows. The notation is in a generic pidgin descriptor dialect, meant for
brevity rather than precision. Each rule is stated as an equality, meaning that the unification represented
on the left hand side results in the answer on the right hand side. Names like "foo", "bar", and "bim" are
generic *REC descriptor labels. Names like "d1", "d2", etc. represent arbitrary descriptors, except as
qualified in the text. "s" represents the input SUBSTS. The TERM-RECS argument is not mentioned.
The pattern embodied in the enabling condition is stated implicitly in the form of the arguments. In all the
*REC rules, we assume that the only occurrences of *RECUR with the same label as the top level *REC
are those explicitly shown. Rules whose names end with a "’" character are just slight modifications of
the rules with the same, but unprimed, names. The rules are considered in the order presented (as opposed
to the order of their numbers), hence with any rule we can assume that the enabling condition for all
previously stated rules has failed.
On the right hand sides of the rules, a name like "d12" represents the result of a call to
(DUNIFY-DESCRIPTORS-INTERFACE d1 d2), where d1 and d2 are variable-free descriptors. If we
were to use a call to DUNIFY-DESCRIPTORS instead, the result would be a list of descriptor-
substitution pairs where in each pair the substitution is our original substitution. All results with the same
substitution list may be combined with *OR, therefore in this case we can return a singleton result, with
the pair being the *OR of all the descriptor results. We are lazy. DUNIFY-DESCRIPTORS-
INTERFACE performs this disjunction for us, and since the recursive arguments are variable-free, we
need not be concerned about the impact of this recursive call on any substitution. So we simply use
DUNIFY-DESCRIPTORS-INTERFACE and pair its result with the substitution argument.
Rule DUNIFY-*REC12
Where the two *recs differ only in name, and where bar < foo in
the lexical ordering DESCRIPTOR-ORDER,
(dunify-descriptors (*rec foo ( .. (*recur foo) .. ))
(*rec bar ( .. (*recur bar) .. ))
s)
= (((*rec bar ( .. (*recur bar) .. )) . s))
Rule DUNIFY-*REC1
(dunify-descriptors
(*rec foo (*or $nil (*cons d1 (*recur foo))))
(*rec bar (*or $nil (*cons d2 (*recur bar))))
s)
=
(((*rec bim (*or $nil (*cons d12 (*recur bim)))) . s))
Rule DUNIFY-*REC1’
(dunify-descriptors
(*rec foo (*or $nil (*cons (*recur foo) d1)))
(*rec bar (*or $nil (*cons (*recur bar) d2)))
s)
=
(((*rec bim (*or $nil (*cons (*recur bim) d12))) . s))
Comment:
Rule DUNIFY-*REC1’ is identical to DUNIFY-*REC1, except that the
Type Checking for Common Lisp 127
The Signature Checker
Rule DUNIFY-*REC2
Where d1 and d3 have no *CONS and no variables,
(dunify-descriptors
(*rec foo (*or d1 (*cons d2 (*recur foo))))
(*rec bar (*or d3 (*cons d4 (*recur bar))))
s)
=
(((*rec bim (*or d13 (*cons d24 (*recur bim)))) . s))
Comment:
d1 and d3 are variable-free by prescription, d2 and d4 by definition,
since variables cannot appear in the replicating disjuncts of a *REC.
Rule DUNIFY-*REC2’
Where d1 and d3 have no *CONS and no variables,
(dunify-descriptors
(*rec foo (*or d1 (*cons (*recur foo) d2)))
(*rec bar (*or d3 (*cons (*recur bar) d4)))
s)
=
(((*rec bim (*or d13 (*cons (*recur bim) d24))) . s))
Comment:
Rule DUNIFY-*REC2’ is the same as Rule DUNIFY-*REC2, except that the
order of the arguments in the cons is reversed.
Rule DUNIFY-*REC3
Where d2, d3, and d4 contain no variables and d3 is either a
primitive descriptor or a disjunction of primitive descriptors,
and denoting
(dunify-descriptors &j (*rec bar (*or d3 (*cons d4 (*recur bar)))) s)
= ((&j . s1) .. (&j . sn)),
Rule DUNIFY-*REC3’
Where d2, d3, and d4 contain no variables and d3 contains no *cons
forms, and adopting the notation that
(dunify-descriptors &j (*rec bar (*or d3 (*cons (*recur bar) d4))) s)
= (((&j . s1) .. (&j . sn)),
Comment:
Rule DUNIFY-*REC3’ is the same as Rule DUNIFY-*REC3, except that the
order of the arguments in the cons is reversed.
Rule DUNIFY-*REC4
Where neither descriptor contains any variables and d1 and d5 contain
128 Type Checking for Common Lisp
The Signature Checker
(dunify-descriptors
(*rec foo (*or d1 (*cons d2 d3) (*cons d4 (*recur foo))))
(*rec bar (*or d5 (*cons d6 (*recur bar))))
s)
=
(((*rec bim (*or d15 (*cons d26 d3bar) (*cons d46 (*recur bim)))) . s))
Rule DUNIFY-*REC4’
Where neither descriptor contains any variables and d1 and d5 contain
no *cons forms,
(dunify-descriptors-interface
d3 (*rec bar (*or d5 (*cons d6 d7) (*cons (*recur bar) d8))))
= d3bar
(dunify-descriptors
(*rec foo (*or d1 (*cons d3 d2) (*cons (*recur foo) d4)))
(*rec bar (*or d5 (*cons (*recur bar) d6)))
s)
=
(((*rec bim (*or d15 (*cons d3bar d26) (*cons (*recur bim) d46))) . s))
Comment:
This is Rule DUNIFY-*REC4 with the arguments to the conses reversed.
Rule DUNIFY-*REC5
With no variables appearing in either descriptor, and denoting
(dunify-descriptors-interface
d1 (*cons d4 (*rec bar (*or d3 (*cons d4 (*recur bar))))))
= d14bar, and
(dunify-descriptors-interface
d3 (*cons d2 (*rec foo (*or d1 (*cons d2 (*recur foo))))))
= d32foo
(dunify-descriptors
(*rec foo (*or d1 (*cons d2 (*recur foo))))
(*rec bar (*or d3 (*cons d4 (*recur bar))))
s)
=
(((*rec bim (*or d13 d14bar d32foo (*cons d24 (*recur bim)))) . s))
Comment:
This rule is very general, in that it does not require d1 or d3 to be
cons-free. Since all the non-replicating disjuncts in the *rec body
are gathered into an *or, d1 and d3 represent all such terms, with
the only other restriction being that they are variable-free. Since
there may be other rules which apply to more specific cases and result
in less complex analysis than the computation of d13, d14bar, d32foo,
and d24, eligibility for applying this rule should be checked later
than the ones above.
Rule DUNIFY-*REC5’
With no variables in either descriptor,
(dunify-descriptors
(*rec foo (*or d1 (*cons (*recur foo) d2)))
(*rec bar (*or d3 (*cons (*recur bar) d4)))
s)
=
(((*rec bim (*or d13 d1bar4 d3foo2 (*cons (*recur bim) d24))) . s))
Comment:
Rule DUNIFY-*REC5’ is the same as Rule DUNIFY-*REC5, except that the
Type Checking for Common Lisp 129
The Signature Checker
Rule DUNIFY-*REC6
With no variables anywhere and no conses in d1 or d4,
(dunify-descriptors
(*rec foo (*or d1 (*cons d2 (*cons d3 (*recur foo)))))
(*rec bar (*or d4 (*cons d5 (*recur bar))))
s)
=
(((*rec bim (*or d14 (*cons d25 (*cons d35 (*recur bim))))) . s))
Rule DUNIFY-*REC7
No variables anywhere and no conses in d1 or d2,
(dunify-descriptors
(*rec foo (*or d1 (*cons (*recur foo) (*recur foo))))
(*rec bar (*or d2 (*cons (*recur bar) (*recur bar))))
s)
=
(((*rec bim (*or d12 (*cons (*recur bim) (*recur bim)))) . s))
Rule DUNIFY-*REC8
No variables anywhere and no conses in d1 or d5,
(dunify-descriptors
(*rec foo (*or d1 (*cons d2 (*cons d3 (*cons d4 (*recur foo))))))
(*rec bar (*or d5 (*cons d6 (*recur bar))))
s)
=
(((*rec bim (*or d15 (*cons d26 (*cons d36 (*cons d46 (*recur bim))))))
. s))
Rule DUNIFY-*REC9
No variables anywhere and no conses in d1 or d5,
(dunify-descriptors
(*rec foo (*or d1 (*cons d2 (*cons d3 (*cons d4 (*recur foo))))))
(*rec bar (*or d5 (*cons d6 (*cons d7 (*recur bar)))))
s)
=
(((*rec bim
(*or d15
(*cons d26
(*cons d37
(*cons d46
(*cons d27
(*cons d36
(*cons d47
(*recur bim)))))))))
. s))
Rule DUNIFY-*REC13
No variables anywhere and no conses in d2 or d4
(dunify-descriptors (*rec foo (*cons d1 (*or d2 (*recur foo))))
(*rec bar (*cons d3 (*or d4 (*recur bar))))
s)
=
(((*rec bim (*cons d13 (*or d24 (*recur bim)))) . s))
Rule DUNIFY-*REC10
No variables anywhere and no conses in d2 or d3,
(dunify-descriptors
130 Type Checking for Common Lisp
The Signature Checker
Rule DUNIFY-*REC11
Where d2, d3, and d4 contain no variables, and adopting the notation
(dunify-descriptors &j (*rec bar (*or d3 (*cons d4 (*recur bar)))) s)
= ((&j . s1) .. (&j . sn))
(dunify-descriptors
d3 (*cons d2 (*rec foo (*or &j (*cons d2 (*recur foo))))) s)
= ((d32foo1 . sd31) .. (d32foom . sd3m)),
Rule DUNIFY-*REC11’
Where d2, d3, and d4 contain no variables, and adopting the notation
(dunify-descriptors &j (*rec bar (*or d3 (*cons (*recur bar) d4))) s)
= (((&j . s1) .. (&j . sn)) and
(dunify-descriptors
d3 (*cons d2 (*rec foo (*or &j (*cons (*recur foo) d2)))) s)
= (((d32foo1 . sd31) .. (d32foom . sd3m)
Rule DUNIFY-*REC14
No variables anywhere and no conses in d1, d3, d5, or d7,
(dunify-descriptors
(*rec foo (*or d1 (*cons d2 (*or d3 (*cons d4 (*recur foo))))))
(*rec bar (*or d5 (*cons d6 (*or d7 (*cons d8 (*recur bar))))))
s)
=
(((*rec bim (*or d15 (*cons d26 (*or d37 (*cons d48 (*recur bim))))))
. s))
Rule DUNIFY-*REC0’
Where either foo or bar contains some variable,
(dunify-descriptors foo bar s)
=
(dunify-descriptors (replace-all-vars-with-universal foo)
(replace-all-vars-with-universal bar)
s)
Rule DUNIFY-*REC0
(dunify-descriptors (*rec foo .. ) (*rec bar .. ) s)
=
(if (< (fringe-length (*rec bar .. )) (fringe-length (*rec foo .. )))
(((*rec bar .. ) . s))
(((*rec foo .. ) . s)))
More rules could certainly be added. The rules that we use have been sufficient to support all but a very,
very few of the cases which have arisen in testing.
Despite all the steps taken to guarantee termination of the unifier algorithm, the termination argument is
very difficult. Among many other factors, the opening of *REC descriptors causes problems. We were
never able to find a correct measure with which to prove the unification algorithm terminates, though we
sketched one possible approach. For an extended discussion of the argument as we left it, see Appendix
F. The measure involved a vector of four distinct factors, each weighted differently and each rather
vaguely defined. This effort, if nothing else, convinced us that any correct measure would be of great
complexity, and further pursuing such a proof did not seem worthwhile. But even being aware of the
structure of this failed effort to find a measure, we were unable to construct an example which failed to
terminate, nor did we encounter any such example in any of the testing exercises. We believe the
algorithm will always terminate, but in the absence of proof, we must take this on faith. In any case, a
proof of termination is not necessary for the rest of this work.
Keeping the descriptors being manipulated by the checker in a canonical form helps minimize case
propagation and clarify the algorithms. So, in many instances after new descriptors have been formulated,
they are canonicalized. The canonicalization process is essentially the application of valid transformation
rules appropriate for the descriptor being considered.
All the transformation rules are value-preserving, in that the canonicalized descriptor always represents
the same set of values as the original, or rather, the same interpretation with respect to a binding under
INTERP-SIMPLE. For soundness purposes, it is sufficient for the formal specification for each
canonicalization to be of the form:
For any descriptor td, value v, and binding b covering td,
(I td v b) => (I (pcanonicalize-descriptor td) v b)
formal specifications of all the canonicalization rules, we have the following top-level lemma specifying
the correctness of PCANONICALIZE-DESCRIPTOR. As with DUNIFY-DESCRIPTORS-
INTERFACE, if the argument is a *DLIST, a *DLIST of the same length is returned, and in this case in
the lemma below, the value v is considered to be a list of values the same length as the *DLIST, and the
call to "I" denotes a call to INTERP-SIMPLE rather than INTERP-SIMPLE-1.
Lemma PCANONICALIZE-DESCRIPTOR-OK
(I td v b)
=>
(I (pcanonicalize-descriptor td) v b)
Below this level, the canonicalization code simply applies the transformation rules as appropriate. Each
rule is bi-directional, in that the input and output descriptors are equivalent. The rules are stated below,
with the direction of canonicalization being from left hand side to right hand side. Along with each is the
name of the principal Lisp function within which it is implemented. Comments are supplied occasionally
for clarity.
Rule 1:
(*OR $CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T (*CONS *UNIVERSAL *UNIVERSAL))
=> *UNIVERSAL
From function PCONSOLIDATE-UNIVERSAL-DESCRIPTORS
Rule 2:
(*REC FOO (... (*RECUR FOO) ...))
=> (... (*REC FOO (... (*RECUR FOO) ...)) ...)
From function OPEN-REC-DESCRIPTOR-ABSOLUTE
For example:
(OPEN-REC-DESCRIPTOR-ABSOLUTE
(*REC ILIST (*OR $NIL (*CONS $INTEGER (*RECUR ILIST)))))
=
(*OR $NIL
(*CONS $INTEGER
(*REC ILIST (*OR $NIL (*CONS $INTEGER
(*RECUR ILIST))))))))
Rule 3:
(... (*REC FOO (... (*RECUR FOO) ...)) ...)
=> (*REC FOO (... (*RECUR FOO) ...))
From function PFOLD-RECS-IF-POSSIBLE
For example:
(PFOLD-RECS-IF-POSSIBLE
(*OR $NIL
(*CONS $INTEGER
Type Checking for Common Lisp 133
The Signature Checker
Comment: Rules 2 and 3 are the inverses of each other. They represent
the validity of opening up and folding *REC descriptors. Each is
appropriate, depending on context. It is more common for
OPEN-REC-DESCRIPTOR-ABSOLUTE to be invoked as a stand-alone
canonicalization from elsewhere in the system, whereas
PFOLD-RECS-IF-POSSIBLE is invoked in the normal course of
canonicalization.
Rule 4:
(*OR .. (*OR d1 .. d2) ..) => (*OR .. d1 .. d2 ..)
From function PCANONICALIZE-OR-DESCRIPTOR
Comment: The nested *OR may appear in any argument position of the
surrounding *OR.
Rule 5:
(*OR .. d1 .. d1 ..) => (*OR .. d1 .. ..)
From function PCANONICALIZE-OR-DESCRIPTOR
Rule 6.
(*OR .. *EMPTY .. ) => (*OR .. .. )
From function PCANONICALIZE-OR-DESCRIPTOR
Rule 7.
Where D does not contain a (*RECUR FOO),
(*OR D (*REC FOO (*OR D .. (*RECUR FOO))))
=> (*REC FOO (*OR D .. (*RECUR FOO)))
From function PREMOVE-RECURSIVE-EXPANSION-DUPLICATES
Rule 8:
(*OR (*CONS d1 d2) (*CONS d3 d2)) => (*CONS (*OR d1 d3) d2)
From PMERGE-OR-DESCRIPTOR-CONSES
Rule 9:
(*OR (*CONS d1 d2) (*CONS d1 d3)) => (*CONS d1 (*OR d2 d3))
From PMERGE-OR-DESCRIPTOR-CONSES
Rule 10:
(*OR .. *UNIVERSAL ..) => *UNIVERSAL
From PCANONICALIZE-OR-DESCRIPTOR
Rule 11:
(*OR d) => d
From PCANONICALIZE-OR-DESCRIPTOR
Rule 12:
(*OR) => *EMPTY
From PCANONICALIZE-OR-DESCRIPTOR
Rule 13:
(*OR .. d1 .. d2 ..) => (*OR .. d2 .. d1 ..)
From PCANONICALIZE-OR-DESCRIPTOR
Rule 14:
(*CONS *EMPTY d) => *EMPTY
134 Type Checking for Common Lisp
The Signature Checker
From PCANONICALIZE-CONS-DESCRIPTOR
Rule 15:
(*CONS d *EMPTY) => *EMPTY
From PCANONICALIZE-CONS-DESCRIPTOR
Rule 16:
(*REC FOO (*OR .. (*RECUR FOO) ..)) => (*REC FOO (*OR .. ..))
From function PCANONICALIZE-REC-DESCRIPTOR
Rule 17:
Where within (.. (*RECUR FOO) ..) there is no *OR enclosing the
(*RECUR FOO), but some *CONS enclosing it,
Rule 18:
Where (*dlist d1 .. dfoo .. dn) and (*dlist d1 .. dbar .. dn)
are identical except that they differ in one position, dfoo vs. dbar:
(*or ... (*dlist d1 .. dfoo .. dn) ... (*dlist d1 .. dbar .. dn) ...)
= (*or ... (*dlist d1 .. (*or dfoo dbar) .. dn) ...)
From function PCANONICALIZE-OR-DESCRIPTOR
One may think of type descriptors as defining sets of Lisp values. The elementary notion of containment
of a descriptor TD1 in a descriptor TD2 is that membership in TD1 implies membership in TD2.
Membership is defined in terms of our interpreter function INTERP-SIMPLE and therefore in terms of a
binding of type variables to values.
The main feature of INTERP-SIMPLE with respect to variables is the binding list, which binds type
variables from the descriptor to Lisp values. When the interpreter encounters a variable &i, as in the call
(INTERP-SIMPLE-1 &i V B), it returns T if the binding of &i in B is equal to V, and NIL otherwise. So
one may think of the set of Lisp values defined by a descriptor TD as the set of all values V such that there
exists a binding B such that (INTERP-SIMPLE TD V B) is true. For example, the set of Lisp values
Type Checking for Common Lisp 135
The Signature Checker
I.e., we could imagine an infinite collection of bindings in which for any Lisp value X, there is some
binding in which &1 is mapped to X.
Our notion of containment is that TD1 is contained in TD2 if for any value V where some binding B
causes (INTERP-SIMPLE TD1 V B) to be true, there is some binding B’ such that (INTERP-SIMPLE
TD2 V B’) is also true. For example, (*CONS &1 $NIL) is contained in (*CONS &2 (*OR $NIL $T)),
because for any value V and binding B such
(INTERP-SIMPLE-1 (*CONS &1 $NIL) V B)
will be true, namely any binding which binds &2 to the same value to which &1 was bound in B. Our
algorithm for checking containment, then, has the task of demonstrating that B’ can be generated. If no
such binding can be found, the algorithm must fail.
6.8.1 Overview
Not surprisingly, the problem is much simpler if both descriptors are variable-free, so much so that this is
treated as a special case and solved with a distinct and much faster algorithm. Furthermore, even when
variables appear in the descriptors, some effort is made to reduce the problem to the variable-free case.
All these judgements are made in the function CONTAINED-IN-INTERFACE, which is the entry point
for the containment algorithm.
To rid the problem of variables, CONTAINED-IN-INTERFACE tries the following tricks. First, any
variable which occurs only once in a descriptor is replaced with *UNIVERSAL. Since the only
information a variable carries is that it refers to some particular, unspecified value, the only practical use
for a variable is to represent a multiple occurrence of that value within a descriptor or descriptor list.
Thus, when we have singleton variables, we replace them with *UNIVERSAL and canonicalize. Second,
after this replacement, if there are no variables in the second descriptor, the variables in the first descriptor
are irrelevant, and they can be replaced with *UNIVERSAL. This is sound because it is never unsound to
"increase" the value set of TD1. By doing so, we could never create containment where it did not already
136 Type Checking for Common Lisp
The Signature Checker
exist. Moreover, it has no impact on the result, since if TD1 is a variable and if TD2 is variable-free and
is not *UNIVERSAL, then we always say TD1 is not contained in TD2.
If trying these optimizations produces variable-free descriptors, then we call the simple algorithm
CONTAINED-IN. If there are still variables to contend with, we first standardize the variables in the two
descriptors apart and then call a complicated heuristic algorithm VCONTAINED-IN, which produces a
list of mappings, each of which can be interpreted as a schema for constructing bindings witnessing the
containment requirement. Then, we check the validity of the result from VCONTAINED-IN with yet
another, much simpler algorithm ICONTAINED-IN. The application of ICONTAINED-IN to the result
of VCONTAINED-IN is managed by the function MAPPINGS-DEMONSTRATE-CONTAINMENT.
All these subsidiary algorithms will be described below.
In our system proof, the optimizations in CONTAINED-IN-INTERFACE are proven sound as part of the
proof of CONTAINED-IN-INTERFACE-OK. The soundness of the CONTAINED-IN algorithm is stated
in Lemma CONTAINED-IN-OK. The VCONTAINED-IN algorithm is not proved, as it serves only as a
heuristic witness, suggesting a possible solution, but the MAPPINGS-DEMONSTRATE-
CONTAINMENT/ICONTAINED-IN algorithm which checks this solution is specified in the lemmas
ALL-DESCRIPTOR1-DISJUNCTS-OK-OK and ICONTAINED-IN-OK. All these lemmas will be stated
below and proved in Section 7.8.
The formal specifications of these algorithms are as follows. Here, as in all our other lemmas and proofs,
"I" is a shorthand notation representing INTERP-SIMPLE when its second and third arguments are lists,
INTERP-SIMPLE-1 when these arguments are a single descriptor and value. As with DUNIFY-
DESCRIPTORS-INTERFACE, the TD1 and TD2 arguments can be *DLIST forms of the same length,
and in this case in the lemmas below, the value V is considered to be a list of values the same length as the
*DLIST, and the calls to "I" denote calls to INTERP-SIMPLE rather than INTERP-SIMPLE-1. The top
level specification for CONTAINED-IN-INTERFACE is embodied in the lemma:
Lemma CONTAINED-IN-INTERFACE-OK
For any descriptors td1 and td2, value v, and binding b,
For any descriptors td1 and td2, Lisp value v, and binding b1,
(and
H1 (null (gather-variables-in-descriptor td1))
H2 (null (gather-variables-in-descriptor td2))
H3 (contained-in td1 td2)
H4 (I td1 v b1))
=>
For some b2, (I td2 v b2)
CONTAINED-IN and ICONTAINED-IN take an extra argument, TERM-RECS, which serves only as
part of a termination mechanism and hence has no bearing on this soundness lemma. Therefore we
eliminate reference to it to minimize notation. One may consider it to be universally quantified in the
lemmas.
Lemma ALL-DESCRIPTOR1-DISJUNCTS-OK-OK
For any descriptor td1 of the form (*or td11 .. td1n)
and descriptor td2, value reference vref, value v, and binding
b1 covering the variables of td1,
(and
H1 (disjoint (gather-variables-in-descriptor td1)
(gather-variables-in-descriptor td2))
H2 For all i, for some simple mapping m,
(and (well-formed-mapping m vref b1)
(icontained-in td1i (apply-subst m td2) vref))
H3 (I td1 v b1) )
=>
For some b2, (I td2 v b2)
For any descriptors td1 and td2, simple mapping m, Lisp value v,
symbolic value reference vref, and binding b1 covering the
variables in td1,
(and
H1 (disjoint (gather-variables-in-descriptor td1)
(gather-variables-in-descriptor td2))
H2 (well-formed-mapping m vref b1)
H3 (icontained-in td1 (dapply-subst-list-1 m td2) vref)
H4 (I td1 v b1) )
=>
For some b2, (I td2 v b2)
6.8.3 CONTAINED-IN
In the absence of variables, descriptors can be readily seen as representing sets of values, and the notion of
containment can be easily grasped as a subset operation. This mindset may help with understanding the
containment algorithm for variable-free descriptors.
The function CONTAINED-IN takes three parameters, the two descriptors TD1 and TD2, and a TERM-
RECS argument which functions just as it does in DUNIFY-DESCRIPTORS to prevent an infinitely
recursive unwinding of overlapping pairs of *REC descriptors. The recursive pattern of CONTAINED-
138 Type Checking for Common Lisp
The Signature Checker
The following case analysis is used. The cases are considered in order, so each assumes the descriptors
being considered do not suit any of the previous cases.
1. For any descriptor td, td is contained in td.
2. *EMPTY is contained in any descriptor.
3. No descriptor is contained in *EMPTY.
4. *UNIVERSAL is contained in no descriptor.
5. Any descriptor is contained in *UNIVERSAL.
6. An *OR descriptor is contained in TD2 if all its disjuncts are contained in TD2.
7. If TD1 is a *REC descriptor, then:
a. If TD2 is an *OR descriptor, one of whose disjuncts is name isomorphic to TD1,
then TD1 is contained in TD2. By name isomorphism, we mean either that the two
*REC descriptors are equal or that one is exactly like the other except for bearing a
different label.
b. If TD2 is a *REC descriptor, we try each of the *REC rules described below. Each
rule has an enabling condition and an action. The first rule whose enabling condition
is satisfied provides the verdict.
c. If TD2 is not a *REC, we first see if we have on previous recursion encountered the
TD1-TD2 pair. If so, we return NIL. This is the TERM-RECS test, since the
TERM-RECS parameter contains a list of all previously encountered pairs.
Otherwise, we open TD1 and recur, CONS-ing the TD1-TD2 pair onto the TERM-
RECS argument.
8. If TD2 is an *OR, then TD1 is contained in TD2 if it is contained in one of the disjuncts of
TD2.
9. If TD2 is a *REC, we administer the TERM-RECS test as above. If we are not looping, we
open TD2 and recur, with the augmented TERM-RECS list.
10. If both descriptors are CONS-es, we have containment if the respective CARs and CDRs
both exhibit containment.
11. If both descriptors are *DLISTs, we have containment if we have pairwise containment on
all the arguments.
12. Otherwise, containment fails.
Before trying the *REC rules, we canonicalize both descriptors in the same way as we did for the
DUNIFY-DESCRIPTORS *REC rules, turning all the top level *ORs into binary forms, where the first
disjunct is a disjunction of non-replicating forms, and the second is a disjunction of replicating forms
(forms containing the *RECUR).
The notation used in the rules is like that used in the DUNIFY-DESCRIPTORS rules. For instance, *REC
forms defined on the left hand side are referred to by their labels on the right hand side. A reference to
"foo" or "bar" on the right hand side of a rule is a shorthand for the entire *REC form of that name on the
left hand side. For a full explanation of the notation, see Section 6.6.3. Recall that the enabling condition
is embodied in the form of the left hand side of the equality. Because the rules are considered in order,
each enabling condition effectively includes the negation of all previous enabling conditions. The
numbering of the rules is somewhat irregular, so that it will correspond to analogous ICONTAINED-IN
*REC rules, to follow later. A rule whose name contains a "’" is a minor variant of its predecessor.
Type Checking for Common Lisp 139
The Signature Checker
In reading each of these rules, remember the CONTAINED-IN algorithm is only invoked when both of its
arguments are variable-free. Thus, all component descriptors are variable-free, and thus CONTAINED-
IN may be called recursively.
Rule Contain-*REC1
Where the two *recs differ only in name
(contained-in (*rec foo ( .. (*recur foo) .. ))
(*rec bar ( .. (*recur bar) .. )))
= t
Rule Contain-*REC2
(contained-in (*rec foo (*or $nil (*cons d1 (*recur foo))))
(*rec bar (*or $nil (*cons d2 (*recur bar)))))
=
(contained-in d1 d2)
Rule Contain-*REC2’
(contained-in (*rec foo (*or $nil (*cons (*recur foo) d1)))
(*rec bar (*or $nil (*cons (*recur bar) d2))))
=
(contained-in d1 d2)
Rule Contain-*REC3
Where d1 and d3 are either primitive non-cons descriptors or
*ORs of primitive non-cons descriptors
(contained-in (*rec foo (*or d1 (*cons d2 (*recur foo))))
(*rec bar (*or d3 (*cons d4 (*recur bar)))))
=
(and (contained-in d1 d3) (contained-in d2 d4))
Rule Contain-*REC3’
Where d1 and d3 are either primitive non-cons descriptors or
*ORs of primitive non-cons descriptors
(contained-in (*rec foo (*or d1 (*cons (*recur foo) d2)))
(*rec bar (*or d3 (*cons (*recur bar) d4))))
=
(and (contained-in d1 d3) (contained-in d2 d4))
Rule Contain-*REC5
(contained-in (*rec foo (*or d1 (*cons d2 (*recur foo))))
(*rec bar (*or d3 (*cons d4 (*recur bar)))))
=
(or (and (contained-in d1 d3) (contained-in d2 d4))
(contained-in foo d3)
(and (contained-in d1 bar) (contained-in d2 d4)))
Rule Contain-*REC6
No conses in d1 or d4.
(contained-in (*rec foo (*or d1 (*cons d2 (*cons d3 (*recur foo)))))
(*rec bar (*or d4 (*cons d5 (*recur bar)))))
=
(and (contained-in d1 d4) (contained-in d2 d5) (contained-in d3 d5))
Rule Contain-*REC6’
No conses in d1 or d4.
(contained-in (*rec bar (*or d4 (*cons d5 (*recur bar))))
(*rec foo (*or d1 (*cons d2 (*cons d3 (*recur foo))))))
=
nil
Rule Contain-*REC7
No conses in d1 or d4.
(contained-in (*rec foo (*or d1 (*cons (*recur foo) (*recur foo))))
(*rec bar (*or d2 (*cons (*recur bar) (*recur bar)))))
=
(contained-in d1 d2)
Rule Contain-*REC8
140 Type Checking for Common Lisp
The Signature Checker
No conses in d1 or d5.
(contained-in
(*rec foo (*or d1 (*cons d2 (*cons d3 (*cons d4 (*recur foo))))))
(*rec bar (*or d5 (*cons d6 (*recur bar)))))
=
(and (contained-in d1 d5)
(contained-in d2 d6)
(contained-in d3 d6)
(contained-in d4 d6))
Rule Contain-*REC8’
No conses in d1 or d5.
(contained-in
(*rec bar (*or d5 (*cons d6 (*recur bar))))
(*rec foo (*or d1 (*cons d2 (*cons d3 (*cons d4 (*recur foo)))))))
=
nil
Rule Contain-*REC9
No conses in d1 or d5.
(contained-in
(*rec foo (*or d1 (*cons d2 (*cons d3 (*cons d4 (*recur foo))))))
(*rec bar (*or d5 (*cons d6 (*cons d7 (*recur bar))))))
=
nil
Rule Contain-*REC9’
No conses in d1 or d5.
(contained-in
(*rec bar (*or d5 (*cons d6 (*cons d7 (*recur bar)))))
(*rec foo (*or d1 (*cons d2 (*cons d3 (*cons d4 (*recur foo)))))))
=
nil
Rule Contain-*REC10
No conses in d2 or d4
(contained-in (*rec foo (*cons d1 (*or d2 (*recur foo))))
(*rec bar (*cons d3 (*or d4 (*recur bar)))))
=
(and (contained-in d1 d3) (contained-in d2 d4))
Rule Contain-*REC11
No conses in d2 or d3.
(contained-in (*rec foo (*cons d1 (*or d2 (*recur foo))))
(*rec bar (*or d3 (*cons d4 (*recur bar)))))
=
(and (contained-in d1 d4) (contained-in d2 d3))
Rule Contain-*REC11’
d3 is a primitive or a disjunction of primitives,
(contained-in (*rec bar (*or d3 d4))
(*rec foo (*cons d1 d2)))
=
nil
Rule Contain-*REC13
No conses in d1, d3, d5, or d7
(contained-in
(*rec foo (*or d1 (*cons d2 (*or d3 (*cons d4 (*recur foo))))))
(*rec bar (*or d5 (*cons d6 (*or d7 (*cons d8 (*recur bar)))))))
=
(and (contained-in d1 d5)
(contained-in d2 d6)
(contained-in d3 d7)
(contained-in d4 d8))
Rule Contain-*REC0
(contained-in foo bar) = nil
Type Checking for Common Lisp 141
The Signature Checker
In the case where one or both of the descriptors contains a variable which could not be removed by
CONTAINED-IN-INTERFACE, to demonstrate containment we must demonstrate the existence of a
binding B’ such that the containment relation defined in terms of INTERP-SIMPLE holds. Since the
binding B’ will depend on information from B, as well as TD1 and TD2, without knowing B, we cannot
generate specific bindings B’. So instead, we attempt to generate what we will call a mapping which
shows precisely how B’ can be constructed, given some arbitrary B for which (I TD1 V B) is true.
Finding this mapping is the job of the function VCONTAINED-IN.
This task is relatively difficult, and a proof of the algorithm embodied in VCONTAINED-IN would be
daunting. But our soundness proof will be in the same spirit which led us to use a checker which is
proven correct validate the function signatures generated by the inference system. Rather than proving the
soundness of VCONTAINED-IN directly, we prove the correctness of an algorithm ICONTAINED-IN
which checks that a mapping produced by the unverified algorithm VCONTAINED-IN does indeed
demonstrate that the containment relation holds.
REC-TAIL is a function which returns the atom which is the final CDR of a list.
(DEFUN REC-TAIL (L) (IF (ATOM L) L (REC-TAIL (CDR L))))
DLIST-ELEM is a function which, given a list L and a positive integer i less than the length of L, returns
the ith element of L. This function is here only to index over the value list considered as V when TD1 and
TD2 are *DLISTs.
(DEFUN DLIST-ELEM (L I)
(IF (EQUAL I 1) (CAR L) (DLIST-ELEM (CDR L) (- I 1))))
A mapping directs the construction of B’ in the following manner. If a <target> is a <td1 variable name>
the right hand side of the corresponding binding would be the binding of the <target> in B. If a <target> is
$NIL, the right hand side would be NIL. Similarly for $T and T. If the <target> is a symbolic value
reference, such as V, (CAR V), or (REC-TAIL V), the right hand side would be the value of the
corresponding component of the Lisp value under consideration by INTERP-SIMPLE. This value is the
one represented symbolically by V in the statement of CONTAINED-IN-INTERFACE-OK. Note that we
could have used a symbolic value in all cases, but, as we shall see in the description of the algorithms, this
would not serve to handle multiple occurrences of variables within the descriptors.
ICONTAINED-IN determines whether the containment relationship exists between TD1 and TD2 under
the mapping. If it does, given B and V, the mapping can be easily transformed into a binding B’ which
will satisfy the containment requirement in terms of INTERP-SIMPLE. If any of the mappings in the list
formulated by VCONTAINED-IN can be validated, we have demonstrated containment. But as we shall
later see, if there is disjunction in TD1, it may be the case that no single mapping will demonstrate
containment. It is sufficient, however, to canonicalize TD1 so that every disjunction lying outside of any
embedded *REC descriptor is raised to the top, i.e., so that TD1 is a disjunction of descriptors, each of
which is *OR-free down to embedded *REC descriptors. Then, if for all disjuncts of TD1, there exists a
simple mapping which demonstrates containment, containment is thereby demonstrated for the
disjunction. We shall prove the correctness of ICONTAINED-IN, which as a checker will contribute to
the proof of the correctness of CONTAINED-IN-INTERFACE.
VCONTAINED-IN is more complicated than ICONTAINED-IN, since it has to discover the appropriate
mappings, rather than just validating them. To support this discovery requires a more elaborate notion of
a mapping. VCONTAINED-IN takes as parameters two descriptors TD1 and TD2, an initial (elaborate)
mapping M, a TERM-RECS argument (initially NIL), and a value reference VREF by which we will refer
to some arbitrary value V, and it returns a list of (elaborate) mappings M1..Mn.
(VCONTAINED-IN td1 td2 m nil vref) = (m1 .. mn)
Failure is signified by returning an empty list of mappings. The implicit assumption is that there exists a
binding B such that (INTERP-SIMPLE TD1 V B) is true. (M1 .. Mn) may be thought of as extensions of
M. Again, the nature of the result is to show precisely how a binding B’ can be constructed from B and V
such that (INTERP-SIMPLE TD2 V B’) is true. A list of mappings demonstrates containment if for any
value satisfying TD1 under B, one of the mappings in the list creates a B’ such that the value satisfies TD2
under B’.
To characterize elaborate mappings, we modify the preceding grammar for mappings by replacing
<mapping> ::= ( <elem>* )
with
Type Checking for Common Lisp 143
The Signature Checker
VCONTAINED-IN returns a list of mappings. In a sense of validity to be explained below, a mapping list
is a collection of mappings, all of which must be present in order to demonstrate containment. A
*CHOICE encapsulates a collection of mapping lists, each of which individually demonstrates
containment.
A list of mappings is necessary to characterize what happens when TD1 is an *OR. Consider the
example:
(VCONTAINED-IN ’(*OR $NIL $T) ’&1 NIL NIL vref)
=
(((&1 . NIL)) ((&1 . T)))
Containment is guaranteed if when the value V is NIL, &1 is bound to NIL, and if when the value V is T,
&1 is bound to T. Obviously, V cannot be simultaneously T and NIL, and the bindings to handle each
situation are mutually exclusive. So we use a list of bindings to characterize what must be possible in
order for containment to exist. The list of mappings enumerates the list of choices for B’ which might be
needed.
A *CHOICE indicates that, for all of the mapping lists which follow, one of its mappings will generate a
binding which will be acceptable. *CHOICE arises when TD2 is an *OR. For example,
(VCONTAINED-IN ’$INTEGER ’(*OR &1 &2) NIL NIL ’V)
=
((*CHOICE (((&1 . V))) (((&2 . V)))))
Containment is guaranteed either when &1 is bound to the integer in question, or &2 is similarly bound.
In either case, it does not matter what the other variable is bound to. We can see this in the setting of our
interpreter.
(INTERP-SIMPLE $INTEGER V B)
expands to
(OR (EQUAL V (CDR (ASSOC &1 B’))) (EQUAL (CDR (ASSOC &2 B’))))
A B’ constructed from this mapping can map either &1 or &2 to the integer V, and the predicate will be
true.
When generating mapping lists witnessing containment for *CONS or *DLIST structures, we compute the
mappings validating containment for the CARs. If there are no such mappings, we fail. Then we use each
of the mappings as the base mapping for the recursive analysis of the CDRs. If a base mapping is a
*CHOICE, we can succeed if any one of the choices leads to a positive result. But in order for
containment to be demonstrated for the *CONS or *DLIST, every base mapping from the CAR must lead
to a positive result.
To see how his result was derived, first notice that the preceding example gives us the result for the CARs
(giving (CAR V) rather than V as the symbolic value reference):
((*CHOICE (((&1 . (CAR V)))) (((&2 . (CAR V))))))
This is a singleton list composed of a single *CHOICE mapping, thus we need only show that the
*CHOICE will work in the CDR. So we proceed to the CDRs with our choices. There are two recursive
calls of VCONTAINED-IN, one for each choice. Since each choice is a list of mappings, each mapping
in one of the lists must be an effective base mapping for the CDR. But in this case, since each list is a
singleton, our solution can be produced by one of the mappings from either:
(VCONTAINED-IN ’$NIL ’&2 ’((&1 . (CAR V))) NIL (CDR V))
or
(VCONTAINED-IN ’$NIL ’&2 ’((&2 . (CAR V))) NIL (CDR V))
In either case, we must map &2 to $NIL to get a solution. This poses no conflict with our first choice,
which places no constraint on &2, and the first mapping is extended with the mapping element (&2 .
$NIL) to produce the result:
(((&1 . (CAR V)) (&2 . $NIL)))
But the second choice gets ruled out, because &2 could not be mapped to both (CAR V) and $NIL. If
(CAR V) is anything but NIL, there would be inconsistency. Thus, our ultimate result is the singleton list
of mappings:
(((&1 . (CAR V)) (&2 . $NIL)))
On the other hand, when a disjunction is in TD1, we require that each of the disjuncts be contained under
some mapping. The same mapping need not apply to all disjuncts. This generation of multiple mappings
is the reason VCONTAINED-IN returns a list of mappings rather than a single one. Consider the
example:
(VCONTAINED-IN ’(*OR &1 $INTEGER) ’&2 NIL NIL ’V)
=
(((&2 . &1)) ((&2 . V)))
=
(ALL-VCONTAINED-IN ’&1 ’&2 ’(((&2 . &1)) ((&2 . (CAR V)))))
=
NIL
because in the case where the CAR, and thus &2, corresponds to $INTEGER and &2 maps to (CAR V),
the CDR, and thus also &2, cannot necessarily correspond to an arbirtrary &1. An example which
validates this failure is the value ’(10 . NIL). The only binding b1 for which
(INTERP-SIMPLE (*CONS (*OR &1 $INTEGER) &1) (10 . NIL) b1)
Containment exists in this case, since the mapping arising from the CDR containment is in both cases
consistent with both of the cases from the CAR.
Moreover, the same phenomenon is the reason *CHOICE encompasses a list of mapping lists, rather than
just a list of mappings. For example,
(VCONTAINED-IN-INTERFACE (*CONS &1 &1)
(*OR (*CONS &2 &3) (*CONS &2 &2)))
=
((*CHOICE (((&2 . &1) (&3 . &1))) (((&2 . &1)))))
Generating mapping lists for witnessing containment of one *REC descriptor within another *REC
descriptor is handled as in DUNIFY-DESCRIPTORS and CONTAINED-IN. Rather than employing a
general algorithm, we will employ a set of rules, each suited to descriptors of a particular form. An
example of such a rule is:
(VCONTAINED-IN (*REC FOO (*OR $NIL (*CONS D1 (*RECUR FOO))))
(*REC BAR (*OR $NIL (*CONS D2 (*RECUR BAR))))
M
VREF)
=
(VCONTAINED-IN D1 D2 M *BOGUS-VREF*)
The FOO descriptor is contained in the BAR descriptor if D1 is contained in D2. Thus, the problem is
reduced to one which does not involve the top level *REC descriptors. The VREF parameter on the right
hand side is denoted bogus because there can be no variables in D1 or D2, and hence the VREF will never
come into play. Each rule has a set of conditions which must be fulfilled in order for it to be invoked on a
given problem. As a default, we have a final rule which will fire unconditionally when no other rule will,
and which declares that containment does not exist. This action is not very satisfying, but is sound.
(There being no formal notion of soundness for VCONTAINED, the reader can take this in the informal
sense.) The *REC descriptors are put in a special canonical form prior to being matched against the rule
enablers, as with DUNIFY-DESCRIPTORS and CONTAINED-IN. The rules are sufficiently similar in
form to the CONTAINED-IN *REC rules that they do not merit stating here, but they may be found in the
source code, stored in the constant REC-VCONTAINMENT-RULES. Since VCONTAINED-IN is just a
heuristic function about which we make no claims of correctness, the details of the rules are not so
146 Type Checking for Common Lisp
The Signature Checker
important.
Thus, ICONTAINED-IN, which otherwise is quite like CONTAINED-IN, differs in that it must be
prepared to handle very simple scenarios with variables and with our symbolic value references in TD2.
It makes judgements on value references by constructing its own symbolic value reference as it recurs
down through the *CONS structure of TD1 and TD2. This reference is a parameter of ICONTAINED-IN.
On the initial call the actual parameter is ’V, which is the same symbolic root which was given to
VCONTAINED-IN. Every time ICONTAINED-IN recurs into a *CONS, or a *DLIST, the proper
destructor (CAR, CDR, or DLIST-ELEM) is wrapped around the value reference. The following rules for
variables and symbolic value references augment the treatments of other forms, which are as in
CONTAINED-IN.
1. If TD2 is a type variable, if TD1 is the identical type variable, TD1 is contained in TD2,
otherwise it is not.
2. If TD2 is a symbolic value reference, if it is identical to the current value of the value
reference parameter to ICONTAINED-IN, then TD1 is contained in TD2, otherwise it is not.
As with UNIFY-DESCRIPTORS, CONTAINED-IN, and VCONTAINED-IN, the case where both TD1
and TD2 are *REC descriptors is handled with a collection of special case rules. The method is identical,
and the rules are similar to those used in CONTAINED-IN. The specific rules used in ICONTAINED-IN
may be found in the Appendix B-I, which also includes the proofs of several of the rules. The notation is
the same kind used for the CONTAINED-IN rules.
31
In the implementation code, these simple mappings are referred to as semibindings.
Type Checking for Common Lisp 147
The Proof of Soundness
Chapter 7
THE PROOF OF SOUNDNESS
This chapter presents the proof of soundness for the signature checker, i.e., of its validation of the function
signatures produced by the inference system. We discuss the organization of the proof and each of its
main ideas, state some key lemmas, and present some detailed proofs. In the case of more lengthy or
subsidiary proofs, we point to their appearance in Appendix B. The order of presentation will be
top-down, though this means, of course, that at almost every level the validity of the proof will rest on the
proofs of lemmas presented later in the discussion. Most of these lemmas, however, have been previously
stated in Chapter 6.
In the next section, we give a rough overview of the proof in terms of its main components, how they
relate to one another, and which formal structures they manipulate. In the following section, we provide
an informal sketch of the top level proof of soundness, thus providing a sense of direction for the sections
which follow. Then we proceed with the proof itself, starting with a section containing the proofs of the
top level lemmas. After a brief overview, we prove Lemma GUARD-COMPLETE, which is one of our
top-level lemmas and the foundation for the treatment of guard verification. In the section "Validation of
the Guard Descriptor", we prove Lemma TC-INFER-SIGNATURE-GUARD-OK, an important
supporting lemma for the proof of Lemma TC-SIGNATURE-OK, our top level goal, which follows in
Section 7.3.3. In the remaining sections, we prove important lemmas about subsidiary algorithms: the
principal recursive function in the checker TC-INFER, the descriptor unifier DUNIFY-DESCRIPTORS-
INTERFACE, the descriptor canonicalizer PCANONICALIZE-DESCRIPTOR, and the containment
algorithm CONTAINED-IN-INTERFACE.
Fruitful reading of this chapter will require familiarity with the material presented in Chapter 5, which
presents the formal semantics of the system, and Chapter 6, which gives detailed descriptions of the
algorithms in the checker.
148 Type Checking for Common Lisp
The Proof of Soundness
There are four basic components to the proof, each corresponding to a subsystem of the signature checker.
These are
• The descriptor canonicalizer, whose top level function is
PCANONICALIZE-DESCRIPTOR and whose top level soundness lemma is
PCANONICALIZE-DESCRIPTOR-OK,
• The descriptor unification algorithm, whose top level function is DUNIFY-DESCRIPTORS-
INTERFACE and whose top level lemma is DUNIFY-DESCRIPTORS-INTERFACE-OK,
• The containment algorithm, whose top level function is CONTAINED-IN-INTERFACE and
whose top level lemma is CONTAINED-IN-INTERFACE-OK, and
• The main checker algorithm, whose top level function is TC-SIGNATURE and whose top
level lemmas are TC-SIGNATURE-OK and GUARD-COMPLETE.
Two interpreters are central to the formal semantics of the type system. One, composed of the functions
INTERP-SIMPLE-1 and INTERP-SIMPLE, takes a type descriptor or a list of descriptors respectively, a
Lisp value or list of values respectively, and a binding of type variables to values and determines whether
the value is a member of the set of values represented by the descriptor under the binding. The Lisp
evaluator E takes a Lisp form, an environment binding Lisp variables to values, a world of defined
functions, and a non-negative integer clock and returns either a special value indicating the clock value
was not adequately large to allow full evaluation, a special value indicating a guard violation occurred on
some function call during the evaluation, or the Lisp value of the form in the environment if the evaluation
succeeded.
The realm of discourse in the canonicalization proof consists of descriptors and descriptor lists, Lisp
values, type variable bindings, and INTERP-SIMPLE. Bindings are present only because they are
required by INTERP-SIMPLE. They are not manipulated in any way. E does not appear at all. In the
global hierarchy of the proof, the canonicalization proof is self-contained. It relies on none of the other
components.
The unification proof also involves descriptors and descriptor lists, values, bindings, and INTERP-
SIMPLE. Also playing a role is another entity, a substitution, which maps type variables to other
descriptors. A significant companion function is INTERP-SUBSTS, a predicate which determines
whether a substitution is consistent with a binding. Substitutions appear only in the proof about the
unification algorithm. Again, E makes no appearance. The unifier proof relies directly on the proof of the
canonicalizer, but not on any other principal component.
As with the canonicalizer and the unifier, the containment proof involves descriptors and descriptor lists,
values, bindings, and INTERP-SIMPLE. A significant local notion in the containment proof is that of a
mapping, which embodies a method for constructing one type variable binding from another. Thus, the
properties and the manipulation of bindings are points of interest in this proof. In the proof hierarchy, the
containment proof uses the canonicalization lemmas, but does not rest on the unifier or the top level
lemmas.
The top level proof about the checker uses almost everything previously mentioned. It manipulates
function signatures, of which descriptors are components, values, type variable bindings, Lisp variable
environments, the database of function signatures, Lisp forms, the world, and the clock. Both INTERP-
SIMPLE and E are significantly explored. As both containment and unification are key concepts in the
checker, their respective soundness lemmas are used heavily in the proof, along with the ubiquitous
canonicalizer. Only in the top level proofs do Lisp forms, the evaluator E, the clock, and the world come
Type Checking for Common Lisp 149
The Proof of Soundness
into play.
Here we describe the very highest level of the formal setting of the inference algorithm and the checker,
providing some semi-formal intuition for understanding why the lemmas we have proved give a proof of
soundness for the system. In particular, we argue the soundness of using the untrusted signature generated
by the inference algorithm as a basis for the checker’s computation of a trusted signature, and how the
containment relation can reflect that trust back to the original signature. This discussion is intended
simply to motivate the ideas behind the top level proof. The rigorous proof is given later as the proof of
Lemma TC-SIGNATURE-OK (Section 7.3.3).
Let f∗ be the graph of a function f whose definition has been submitted to the inference system. The
inference algorithm applied to f generates some function signature S. We want to ensure that S is a valid
signature for f, i.e., that f∗ ⊆ S, where by "⊆" we mean that the signature embodies a set of ordered tuples
of which f∗ is a subset. Embodied in the specification is the notion that any tuple of values
[arg1 .. argm (f arg1 .. argm)] (where f is m-ary) satisfies S, meaning that if the values [arg1 .. argm]
satisfy the guard of the function, then the tuple satisfies some segment of S. Satisfaction is always
determined by the function INTERP-SIMPLE. A segment (td1 .. tdm) -> td of S is satisfied if there is
some binding b of the type variables in the segment such that,
(INTERP-SIMPLE (td1 .. tdm td) (arg1 .. argm (f arg1 .. argm)) b)
This line of reasoning is totally straightforward where f is a non-recursive function, i.e, a function for
which the body of its definition does not contain a call to itself. But if f is recursive, we must recognize
that TC-INFER-SIGNATURE assumes the correctness of S in deriving S’ (i.e., it assumes that f∗ ⊆ S).
Why is it valid to allow this assumption? A simple inductive argument on the graph of the function will
illustrate.
Imagine the graph of a recursive function as being the union of the graphs of an infinite number of
non-recursive functions, with the first being the empty function, the second being the ordered pairs formed
by cases where the function terminates without recursion, the next being cases where it terminates with at
most one recursion, and so forth. For a given function f, call these non-recursively defined partial
functions ⊥, f1, f2, .. fn. Each function definition fi is non-recursive in the sense that it is formed from the
definition of f by replacing the recursive calls with calls to fi−1. Since at each level fi, the function
perhaps adds tuples to the graph of fi−1 without taking any away, the following relationship holds:
⊥ ⊆ f1 ⊆ .. ⊆ fn .. ⊆ f∗
TC-INFER-SIGNATURE assumes the soundness of the database fs, which contains the signatures of all
previously defined functions. In the case when the function being analyzed, f, is recursive, this includes
150 Type Checking for Common Lisp
The Proof of Soundness
the signature S, which has been ascribed to f (or more precisely, fn).
fn ⊆ (TC-INFER-SIGNATURE f fs)
holds. Now we will show by induction on n that it holds for recursive functions, too. The base case is
trivial:
⊥ ⊆ (TC-INFER-SIGNATURE f fs)
fn ⊆ (TC-INFER-SIGNATURE f fs)
We wish to prove
we really mean
since the idea behind the checker is to treat f as a non-recursive function where calls to f are treated as
calls to fn, whose signature was computed by the inference algorithm and is represented by fn-sig. Since
fn is non-recursive, our assumption of the non-recursive case establishes the validity of fn-sig. Given the
casting of fn+1 as a non-recursive function which calls fn rather than itself, assuming we have a valid
signature for fn, and given that we have established our lemma for non-recursive functions, we establish
our goal.
Originally, we attacked this problem with an evaluator which assumed that all functions terminate
normally and simply failed when there was a guard violation. The clock which now appears in the
definition of E (Section 5.4) appeared in neither the original evaluator nor in any of the soundness
lemmas. Normal termination is a basic requirement placed on all new functions submitted to the system,
and under the assumption that all functions terminate for all values satisfying the guard, it is sufficient to
guarantee that the evaluator will return normally unless it finds a guard violation. Hence, we can count on
the use of our evaluator in our lemmas.
But the problem with this approach is that, since ⊥, f1, f2, .. fn are partial functions, the guard of f, which
we have ignored to this point in this discussion, does not guarantee they will return a value for all inputs
satisfying the guard. Specifically, if a call to fn requires more than n "recursive" calls, the evaluation of
the chain of partial functions will collide with ⊥. If we attempt to patch the guards of each fi, the fix will
require turning the guards into forms which cannot result in complete descriptors, thus violating a crucial
assumption for sound use of signatures.
To remedy the problem, we introduced a second kind of break in the E function, BREAK-OUT-OF-
TIME, and the clock parameter which ticks down on every entry to a function body. We modified all the
top level lemmas to be quantified over all non-negative integer clock values and to have a new assumption
specifying that the clock is sufficiently large to allow full evaluation of the form under consideration
Type Checking for Common Lisp 151
The Proof of Soundness
The top-level proof, then, becomes an induction on the value of the clock. In its base case, the clock is 0,
and the hypothesis guaranteeing no BREAK-OUT-OF-TIME is violated, establishing the goal trivially. In
the inductive step, where we reason about a call to fn+1, the clock allows enough time to acquire a value
from fn, whose signature is the one we assume correct by our inductive assumption. Then we can apply
the containment argument as before to achieve our goal.
The formal presentation of this argument appears in the proof of Lemma TC-SIGNATURE-OK-1, which
is subsidiary to the proof of TC-SIGNATURE-OK in Section 7.3.3.
Credit for the suggestion of this treatment of recursive functions goes to Matt Kaufmann.
The lemmas which we will prove about the highest level of the checker establish the following with
regard to a call to the function being analyzed. When
1. the guard descriptors for the function are complete and the guards of all functions
hereditarily in the call tree of the function are complete, i.e., when each guard is formulated
as a conjunction of calls to recognizer functions on distinct formal parameters,
2. the inference system does not return a *GUARD-VIOLATION result, and
3. the clock is sufficiently large to allow evaluation of the function call in question to finish,
then we will show
1. the evaluation of the guard expression on the actual parameters will not cause a guard
violation;
2. any actual parameters satisfying the guard expression will satisfy the guard descriptor
generated by the checker; and
3. if each segment generated by the checker is contained in some segment generated by the
inference tool, then for any values satisfying the guard expression, no guard violation will
occur in the course of evaluating the function call, and some segment generated by the
checker will characterize both the argument values and the value of the function applied to
them.
From the last claim in particular, following the approach outlined in Section 7.2, we will surmise that the
segments generated by the inference tool are a correct typing for the function. The segments generated by
the inference tool are correct, if not optimal, in that they represent a superset of the tuples defining the
function.
Our top-level correctness lemma, TC-SIGNATURE-OK captures the notions just stated.
Lemma TC-SIGNATURE-OK
for any world of Lisp functions world, including at least the above
definition of foo and the definitions of all the functions in the
call tree of foo,
for any list of function signatures fs, including signatures for
at least all the functions in the call tree of foo, except foo
itself,
for any non-negative integer clock,
when (tc-signature foo fs) successfully validates a signature
for foo,
Our other top-level result, Lemma GUARD-COMPLETE states that if a function’s guard is formulated as
a conjunction of calls to recognizer functions on distinct parameters, if descriptors characterizing the
actual parameters in a call to the function are contained in the function’s guard descriptors, and if the
clock is sufficient to allow full evaluation of the guard on those parameters, then the parameters will
satisfy the real guard.
Lemma GUARD-COMPLETE
(and
H1 (valid-fs fs world clock)
H2 (I (argtd1 .. argtdn) (arg1 .. argn) b)
H3 (contained-in-interface (*dlist argtd1 .. argtdn)
(*dlist rtd1 .. rtdn))
H4 (not (break-out-of-timep
(E (and (Ra a1) .. (Ra an))
1 n
((a1 . arg1) .. (an . argn))
world
clock))) )
=>
(equal (E (and (Ra a ) .. (Ra an))
1 1 n
((a1 . arg1) .. (an . argn))
world
clock)
Type Checking for Common Lisp 153
The Proof of Soundness
t)
This lemma is critical to the proof of Lemma TC-SIGNATURE-OK, but because it makes possible the
guard verification method discussed in Section 6.4 and because it is the primary specification of the guard
descriptor component of a signature, we present it as a top-level result.
An alternate formulation of Lemma GUARD-COMPLETE might have omitted the notions of containment
and argument descriptors and dealt strictly with some argument values satisfying the guard descriptor.
Thus, it would have been stated:
(and
H1 (valid-fs fs world clock)
H2 (I (rtd1 .. rtdn) (arg1 .. argn) b)
H3 (not (break-out-of-timep
(E (and (Ra a1) .. (Ra an))
1 n
((a1 . arg1) .. (an . argn))
world
clock))) )
=>
(equal (E (and (Ra a ) .. (Ra an))
1 1 n
((a1 . arg1) .. (an . argn))
world
clock)
t)
This would have more succinctly presented the notion that to determine some values satisfy a guard, it is
sufficient to determine that they satisfy the guard descriptor. But by bundling the lemma as we did, we
made it more convenient for every use in our proof.
Next, we will give the proof for GUARD-COMPLETE, which reveals it to be a corollary of another
lemma, RECOGNIZER-SEGMENTS-COMPLETE. Then we will state and prove Lemma TC-INFER-
SIGNATURE-GUARD-OK, which will support the proof of TC-SIGNATURE-OK, to which we return at
the end of this section.
The proof of this lemma will rest on the proof of an important property about the signatures of recognizer
functions. This is not surprising, since a guard descriptor is complete if the guard is a conjunction of
recognizer function calls on distinct formal parameters.
Recognizer functions are important in the type system because very precise signatures can be constructed
for them, and because calls to recognizer functions (see Section 5.6) typically appear where the
information they yield is important to the type inference problem. Recognizers are the linchpin of guard
verification in the type system. If the guard of a function is expressed as a conjunction of recognizer calls
on distinct formal parameters, we claim the containment test performed by the checker on function calls is
sufficient to guarantee satisfaction of the function guard.
154 Type Checking for Common Lisp
The Proof of Soundness
The formal property of recognizers which will support this proof is given below in Lemma
RECOGNIZER-SEGMENTS-COMPLETE. It is that, given a function which conforms to the various
syntactic requirements for recognizers, whose signature has two segments, both variable-free, one with a
result type of $T and the other with result type $NIL, where DUNIFY-DESCRIPTORS-INTERFACE
applied to the argument components of the two segments produces *EMPTY, if some descriptor
characterizing the argument to the function is contained in the argument descriptor for the $T segment,
and if the clock is sufficient to allow full evaluation of the function called with that argument, then the
result of that evaluation is T.
Lemma RECOGNIZER-SEGMENTS-COMPLETE
(and
H1 (valid-fs fs world clock)
H2 (I arg-td v b)
H3 (contained-in-interface arg-td targ)
H4 (not (break-out-of-timep (Rworld,clock(v)))) )
=>
(equal (Rworld,clock(v)) t)
(contained-in (universalize-all-vars32
(*dlist argtd1 .. argtdn))
(*dlist rtd1 .. rtdn))
32
..
(contained-in (universalize-all-vars argtdn) rtdn)
We claim that any values which satisfy the guard expression also satisfy the guard descriptor. This is
significant in our proof that the segments generated by TC-INFER-SIGNATURE are correct, because the
first step in the generation of those segments is the formulation of the type alists which will characterize
the arguments. The concrete alist is extracted from the guard descriptor, which is itself extracted from the
segments computed from the guard expression by TC-INFER. The guard descriptor is also the one
returned by TC-INFER-SIGNATURE.
The lemma which captures this notion, TC-INFER-SIGNATURE-GUARD-OK, says that if TC-INFER-
SIGNATURE does not return *GUARD-VIOLATION as its result, then for any binding environment of
the formal variables, if there is sufficient clock for the guard to evaluate fully in that environment, and if
the guard form evaluates to a non-NIL value, then that value does not signify a guard violation, and there
exists some binding of type variables b such that running INTERP-SIMPLE with the vector of guard
descriptors produced by TC-INFER-SIGNATURE and the actual parameters produces T.33 (Here, as in
33
In fact, the binding is immaterial, since no variables appear in the guard descriptors, but keeping the notion around allows
seamless discourse in terms of INTERP-SIMPLE.
156 Type Checking for Common Lisp
The Proof of Soundness
all our other lemmas and proofs, "I" is a shorthand notation representing INTERP-SIMPLE when its
second and third arguments are lists, INTERP-SIMPLE-1 when these arguments are a single descriptor
and value. )
Lemma TC-INFER-SIGNATURE-GUARD-OK
Note:
H1 establishes that fs is valid for the given clock.
H2 establishes that no guard violations are detected in the
course of analyzing the guard.
H3 establishes that the guard descriptors for all the functions in the
call tree of the guard are complete.
H4 establishes that the evaluation of the guard form under env terminates
without exhausting the clock.
H5 establishes that the evaluation of the guard form under env returns
a non-nil value.
In fact, we will not need H5 to establish the first conjunct of this goal, but there is no harm in stating the
lemma this way. The proof of this lemma utilizes two other significant lemmas, TC-INFER-OK and
TC-PREPASS-OK, the latter of which is stated when its application arises in the proof. TC-INFER-OK is
the soundness specification for TC-INFER, the recursive function which is the heart of the checker
algorithm. TC-INFER is discussed at length in Section 6.5. Lemma TC-INFER-OK appears and is
discussed further in Section 7.5 below, and its detailed proof is given in Appendix B.4. TC-PREPASS is
the pre-processor for Lisp forms which normalizes IF expressions so that the test components always
return Boolean values. It is discussed at length in Section 6.3. The detailed proof of Lemma TC-
PREPASS-OK appears in Appendix B.1.
Proof of Lemma TC-INFER-SIGNATURE-GUARD-OK
Now consider:
Lemma TC-PREPASS-OK
For any list of function signatures fs, Lisp world world,
non-negative integer clock, Lisp form form, and binding
environment env,
Recall the discussion in Section 7.2, in which we noted that the graph of a recursive function can be
represented as the union of the graphs of an infinite number of non-recursive functions, with the first
being the empty function, the second being the ordered pairs formed by cases where the function
terminates without recursion, the next being cases where it terminates with at most one recursion, and so
forth. For a given function f, we called these non-recursive functions ⊥, f1, f2, .. fn. Each function
definition fi is non-recursive in the sense that it is formed by replacing the recursive calls of f with calls to
fi−1. Since each of these functions is partial, yet retains the guard of f to satisfy the guard completeness
requirement, we adopted the clock semantics to support reasoning about them in the proof of Lemma
TC-SIGNATURE-OK and its main supporting lemma, TC-SIGNATURE-OK-1.
We will first give the simple proof for Lemma TC-SIGNATURE-OK, then present the statement and
proof of Lemma TC-SIGNATURE-OK-1.
Proof of Lemma TC-SIGNATURE-OK
(and
(not (break-out-of-timep (fooworld,clock(arg1 .. argn))))
(not (null
(E guard-form ((a1 . arg1) .. (an . argn)) world clock))) )
=>
(and
(not (break-guard-violationp (fooworld,clock(arg1 .. argn))))
for some k in [1..m]
for some binding b of type variables to Lisp values
covering tdk,1 .. tdk,n and tdk
(I (tdk,1 .. tdk,n tdk)
Lemma TC-SIGNATURE-OK-1
are denoted
((tc-td1,1 .. tc-td1,n tc-td1) .. (tc-tdl,1 .. tc-tdl,n tc-tdl))
for any world of Lisp functions world, including at least foo and
all the functions in the call tree of foo,
for any list of function signatures fs, including signatures for
at least all the functions in the call tree of foo, except foo
itself,
for any non-negative integer clock, for any Lisp values arg1 .. argn,
(and
H1 (valid-fs fs world clock)
H2 (and (not (equal (guard
(tc-infer-signature
foo (cons (infer-signature foo fs) fs)))
*guard-violation))
(not (equal (segments
(tc-infer-signature
foo (cons (infer-signature foo fs) fs)))
*guard-violation)))
H3 (tc-all-called-functions-complete guard-form fs)
H4 (tc-all-called-functions-complete+ body fs foo t)
H5 for all i in [1..l],
for some j in [1..m],
(contained-in-interface (*dlist tc-tdi,1 .. tc-tdi,n tc-tdi)
(*dlist tdj,1 .. tdj,n tdj))
H6 (well-formed-signature (gtd1 .. gtdn)
((tc-td1,1 .. tc-td1,n tc-td1)
..
(tc-tdl,1 .. tc-tdl,n tc-tdl))
n)
H7 (not (break-out-of-timep (fooworld,clock(arg1 .. argn))))
H8 (not (null (E guard-form ((a1 . arg1) .. (an . argn)) world clock))))
=>
(and
C1 (not (break-guard-violationp (fooworld,clock(arg1 .. argn))))
C2 for some k in [1..m]
for some binding b of type variables to Lisp values
covering tdk,1 .. tdk,n and tdk
(I (tdk,1 .. tdk,n tdk)
The containment algorithm is discussed at length in Section 6.8, and the proof of Lemma CONTAINED-
IN-INTERFACE-OK is presented in Section 7.8. We will use this lemma, Lemma TC-INFER-
SIGNATURE-GUARD-OK, and Lemma TC-INFER-OK in the proof of TC-SIGNATURE-OK-1.
Proof of Lemma TC-SIGNATURE-OK-1
break-out-of-time.
Inductive case:
Assuming the lemma for clock = c, we need to prove it holds for
clock = c + 1.
H9 (and
(not (break-guard-violationp (E guard-form env world c+1)))
For some binding b, (I (gtd1 .. gtdn) (arg1 .. argn) b) )
Lemma VALID-FS-CLOCK
H10 (and
Type Checking for Common Lisp 163
The Proof of Soundness
(good-signaturep foo
(gtd1 .. gtdn)
((td1,1 .. td1,n td1)
..
(tdm,1 .. tdm,n tdm))
world
c)
H11 (valid-fs (cons ((gtd1 .. gtdn) (infer-signature foo fs)) fs) c).
(tc-infer
(tc-prepass
body (cons ((gtd1 .. gtdn) (infer-signature foo fs)) fs))
((a1 . &1) .. (an . &n))
((a1 . (if gtd1 = *universal then &1 else gtd1))
..
(an . (if gtdn = *universal then &n else gtdn)))
(cons ((gtd1 .. gtdn) (infer-signature foo fs)) fs))
where &1 .. &n are new variables (i.e., variables not already
used in the analysis of foo; we choose these names arbitrarily
for the sake of exposition). So instantiate Lemma TC-INFER-OK,
with form = the first argument to TC-INFER, abs-alist = the second,
conc-alist = the third, fs = the fourth argument, world = world,
clock = c, env = ((a1 . arg1) .. (an . argn)),
and b = ((&1 . arg1) .. (&n . argn)). We relieve
its hypotheses. H11 is equal to H1’. H2’ is trivially true
under the binding b by simple expansion of I. H3’ follows directly
by observing H2’ and the second conjunct of H9, and expanding
I. H4’ is a conjunct of H2. H5’ is an immediate consequence of
H4. To establish H6’, expand the call to E in H7:
(and
(not (break-guard-violationp
(E (tc-prepass
body (cons ((gtd1 .. gtdn) (infer-signature foo fs)) fs))
env world c)))
for some i in 1..l,
for some binding b covering the descriptors below,
(and
(I (tc-tdi,1 .. tc-tdi,n tc-tdi)
(arg1 .. argn
(E (tc-prepass
body (cons ((gtd1 .. gtdn) (infer-signature foo fs)) fs))
env world c))
b)
(extends-binding b ((&1 . arg1) .. (&n . argn)))) )
(and
(not (break-guard-violationp (E body env world c)))
for some i in 1..l,
for some binding b covering the descriptors below,
(and (I (tc-tdi,1 .. tc-tdi,n tc-tdi)
(arg1 .. argn (E body env world c))
b)
(extends-binding b ((&1 . arg1) .. (&n . argn)))) )
H13 (and
(not (break-guard-violationp
(fooworld,c+1(arg1 .. argn))))
for some i in 1..l,
for some binding b,
(I (tc-tdi,1 .. tc-tdi,n tc-tdi)
b) )
Just to be complete, we make the following observation about the use of type variables. In the proof of
TC-INFER-OK below, and implicitly wherever we use this lemma (as we did in the previous proof), we
use the fact that whenever we utilize the VALID-FS predicate to obtain the segments for some function,
those segments are imported into the checker’s problem state with all fresh type variables. We claim that
these variables have never been introduced before and will never be introduced again. As a matter of
programming efficacy, the inference tool in fact does "reset" the counter on its variable generator
whenever a new function is introduced for analysis. This does not cause a problem, because no variables
from any previous problem state appear in the new problem state unless they are renamed on importation
of signature segments. From a formal viewpoint, though, we can consider that a type variable name is
actually a pair whose first component is the kind of name (&2, for instance) by which we have been
referring to them all along, and the second component is the name of the function being analyzed. Thus,
no variable is ever introduced twice as a "fresh" variable for the life of the operation of the system.
A similar trick is used to ensure that *REC descriptor labels are not duplicated. The labels are generated
by a pseudo-gensym function which forms them from the characters "!REC" followed by a counter. Since
*REC labels are not renamed when a signature is extracted from FS, this is not enough. So another string
of characters, unique for each function being processed, is attached to the end of the *REC label, ensuring
the uniqueness of labels for the life of the system operation.
As mentioned in Section 6.3, if the guard descriptor computed by the inference tool is different from the
one computed by the checker, the checker determines whether the inference tool descriptor may be used
instead of its own. This is done after the checker has generated its segments, so in any case it uses its own
guard descriptor for that purpose. The checker’s guard descriptor may be replaced by the inference tool
guard if the two are equivalent. This is determined by administering the containment test in both
directions. If each is contained in the other, the two are equivalent under interpretation by INTERP-
SIMPLE. The following lemma provides the formal justification.
166 Type Checking for Common Lisp
The Proof of Soundness
Lemma GUARD-REPLACEMENT-OK
(and
H1 (null (gather-vars-in-descriptor (*dlist td11.. td1n)))
H2 (null (gather-vars-in-descriptor (*dlist td21.. td2n)))
H3 (contained-in-interface (*dlist td11.. td1n)
(*dlist td21.. td2n))
H4 (contained-in-interface (*dlist td21.. td2n)
(*dlist td11.. td1n))
=>
(equal (I (td11.. td1n) (v1.. vn) b)
(I (td21.. td2n) (v1.. vn) b))
The following lemma, which we state without proof, establishes that fs, the initial state of function
signatures, is correct for all the subrs it represents. The initial fs appears in Appendix A.
Lemma INITIAL-FS-VALID
Where world is a world containing the initial set of primitive subrs,
and where initial-function-signatures is the list of function
signatures for the primitives,
The purpose of all the proofs in the preceding section was to illustrate that if the system database is a
VALID-FS initially, we obtain a VALID-FS by repeated applications of the tool. Lemma INITIAL-FS-
VALID establishes the base case for this line of reasoning, and Lemma TC-SIGNATURE-OK provides
the inductive step.
Type Checking for Common Lisp 167
The Proof of Soundness
The recursive function which is at the heart of the checker is TC-INFER. TC-INFER takes four
arguments, a Lisp FORM being analyzed, an abstract type alist ABS-ALIST, a concrete type alist CONC-
ALIST, and the database of FUNCTION-SIGNATURES. It returns a list of 3-tuples, where each tuple is
composed of a minimal segment, a concrete type alist, and a maximal segment. This function is discussed
at length in Section 6.5. Its formal specification is given by Lemma TC-INFER-OK, which states that if
FORM is evaluated in a context where the variables are bound to values which satisfy the descriptors in
both ABS-ALIST and CONC-ALIST, if TC-INFER does not return *GUARD-VIOLATION, if all the
guards in the call tree of form are complete, and if clock is sufficient to allow full evaluation of the form
in a given environment, then no guard violation will occur in the course of the evaluation, and some tuple
in the result is such that the variable values and the result of the evaluation satisfy both the minimal and
maximal segments in the tuple, and the variable values satisfy the descriptors in the concrete alist. The
type variable binding under which this is achieved is an extension (see Appendix B.3) of the one under
which the context values satisfied the two alist parameters.
Lemma TC-INFER-OK
For any Lisp form form, function signature list fs, Lisp world world
including all functions hereditarily in the call tree of form,
for any non-negative integer clock, type variable bindings b,
Lisp values arg1 .. argm,
Lisp variables a1 .. am,
binding environment env of the form ((a1 . arg1) .. (am . argm))
where a1 .. am include all the free variables in form,
ABS-ALIST of the form ((a1 . tda1) .. (am . tdam)),
CONC-ALIST of the form ((a1 . tdc1) .. (am . tdcm)),
and denoting
(tc-infer form abs-alist conc-alist fs)
by
(((mintd1,1 .. mintd1,m mintd1)
((a1 . tdconc1,1) .. (am . tdconc1,m))
(maxtd1,1 .. maxtd1,m maxtd1))
..
((mintdn,1 .. mintdn,m mintdn)
((a1 . tdconcn,1) .. (am . tdconcn,m))
(maxtdn,1 .. maxtdn,m maxtdn)))
Note:
168 Type Checking for Common Lisp
The Proof of Soundness
The top level conjecture for descriptor unification in the checker is characterized by the following lemma.
Here, as in all our other lemmas and proofs, "I" is a shorthand notation representing INTERP-SIMPLE
when its second and third arguments are lists, INTERP-SIMPLE-1 when these arguments are a single
descriptor and value.
Lemma DUNIFY-DESCRIPTORS-INTERFACE-OK
For any descriptors tda and tdb, Lisp value v and fully
instantiating binding of type variables to Lisp values b,
(and (I tda v b)
(I tdb v b))
=>
(I (dunify-descriptors-interface tda tdb) v b)
So two lemmas, one characterizing the result of DUNIFY-DESCRIPTORS and the other for DAPPLY-
SUBST-LIST, will provide the foundation for the proof of DUNIFY-DESCRIPTORS-INTERFACE-OK.
Type Checking for Common Lisp 169
The Proof of Soundness
(and
H1 (interp-substs substs b)
H2 (I tda v b)
H3 (I tdb v b))
=>
for some i,
(and (interp-substs substsi b)
(I tdi v b))
Note: Recall that DUNIFY-DESCRIPTORS also takes a fourth argument, TERM-RECS, which is
employed solely as a mechanism for terminating the computation in certain cases. As such, it has no
effect on the soundness argument, which does not rely on termination, and to avoid clutter in the lemmas
and proofs, we simply omit mention of it. One could also think of it as being a universally quantified
variable in the lemma.
The INTERP-SUBSTS predicate characterizes the information which is carried by SUBSTS and by each
SUBSTSi. The substitution lists map variables to descriptors characterizing what we know about the type
of the value corresponding to the variable. INTERP-SUBSTS guarantees that this information is
consistent with the value bound to the variable in BINDINGS.
(DEFUN INTERP-SUBSTS (SUBSTS BINDINGS)
;; SUBSTS should be an alist whose keys are type variables and
;; whose associated values are well-formed descriptors.
;; BINDINGS should be an alist associating type variables to
;; well-formed descriptors.
(IF (NULL SUBSTS)
T
(AND (INTERP-SIMPLE-1 (CDR (CAR SUBSTS))
(CDR (ASSOC (CAR (CAR SUBSTS)) BINDINGS))
BINDINGS)
(INTERP-SUBSTS (CDR SUBSTS) BINDINGS))))
Note that the reverse implication in DUNIFY-DESCRIPTORS-OK does not hold. Consider an example
where the unifier loses information:
(DUNIFY-DESCRIPTORS
’(*REC FOO (*OR &1 (*CONS (*OR $INTEGER $T) (*RECUR FOO))))
’(*REC BAR (*OR &1 (*CONS (*OR $INTEGER $NIL) (*RECUR BAR))))
NIL)
= (by *REC Rule 0’)
(((*REC BIM (*OR *UNIVERSAL (*CONS $INTEGER (*RECUR BIM)))) . NIL))
carry the information necessary to relate the last tail of v to the binding of &1 in b.
Lemma DAPPLY-SUBST-LIST-OK
For any well-formed substitution s, descriptor td, binding b,
and value v, (where td can be a *dlist containing n descriptors
iff v is a value list of length n, in which case the call to I
is on the list of n descriptors)
Lemma DAPPLY-SUBST-LIST-OK says that if the substitution list S is consistent with the binding list
B, as determined by INTERP-SUBSTS, and if the value V satisfies a descriptor TD with respect to B, then
with respect to B, V also satisfies the descriptor formed by applying the substitution list S to the descriptor
TD.
Note that since we are not demonstrating a proof of termination, this is a partial correctness proof, since
we are not guaranteeing that the length of the computation is finite. (See the discussion in Section 6.6.)
Proof of Lemma DUNIFY-DESCRIPTORS-INTERFACE-OK
34
giving us
(pcanonicalize-descriptor
(*or (dapply-subst-list substs1 td1)
..
(dapply-subst-list substsn tdn)))
(I (pcanonicalize-descriptor
(*or (dapply-subst-list substs1 td1)
..
(dapply-subst-list substsn tdn)))
v b)
QED.
172 Type Checking for Common Lisp
The Proof of Soundness
Although the canonicalization is directed, each rule happens to be an equality, in the sense that td1 and td2
represent the same set of values under any interpretation by INTERP-SIMPLE. All that is required for
soundness of each rule "td1 ==> td2" is proof of an associated lemma of the form35:
For every value v and binding b,
(I td1 v b)
=>
(I td2 v b)
Since it would be tedious to restate each rule in this form, we simply provide the rule notation, but the
proof follows the INTERP-SIMPLE model. The rules are stated in Section 6.7, and the proofs are in
Appendix B.6. Each rule is annotated with the name of the function in which it is implemented.
It will be convenient to have a top-level lemma to which to appeal when encountering canonicalization.
Lemma PCANONICALIZE-DESCRIPTOR-OK
(I td v b)
=>
(I (pcanonicalize-descriptor td) v b)
By definition,
(PCANONICALIZE-DESCRIPTOR td) = (PREAL-CANONICALIZE-DESCRIPTOR td).
By definition, PREAL-CANONICALIZE-DESCRIPTOR calls
(PCANONICALIZE-DESCRIPTOR-1 td), checks to see if the
result = td, and, if so, returns it. If not, it calls itself
recursively, by way of two individual canonicalizations,
(PREAL-CANONICALIZE-DESCRIPTOR
(PFOLD-RECS-IF-POSSIBLE
(PCONSOLIDATE-UNIVERSAL-DESCRIPTORS result)))
35
Here, as in all our other lemmas and proofs, "I" is a shorthand notation representing INTERP-SIMPLE when its second and third
arguments are lists, INTERP-SIMPLE-1 when these arguments are a single descriptor and value.
Type Checking for Common Lisp 173
The Proof of Soundness
(I td v b)
=>
(I (PREAL-CANONICALIZE-DESCRIPTOR td) v b)
(I td v b)
=>
(I (PCANONICALIZE-DESCRIPTOR-1 td) v b)
The containment algorithm has the task of demonstrating that b’ can be generated. If the algorithm cannot
determine how to do so, it returns a NIL result.
In the special case where TD1 and TD2 are variable-free, the binding is irrelevant. This is the case
handled by the subsidiary algorithm CONTAINED-IN. The task is the relatively simple one of
174 Type Checking for Common Lisp
The Proof of Soundness
determining whether a value’s membership in the set characterized by TD1 guarantees its membership in
the set corresponding to TD2.
As described in Section 6.8, if variables appear in the descriptors, the top level function
CONTAINED-IN-INTERFACE attempts to transform the arguments to variable-free forms and to employ
the CONTAINED-IN-INTERFACE algorithm. If it cannot do this, it employs the heuristic algorithm
VCONTAINED-IN to produce a mapping which represents a method for producing a B’ from any
satisfactory B. Then it validates that the mapping will work by calling a verified routine
MAPPINGS-DEMONSTRATE-CONTAINMENT, which manages the application of a checker algorithm
ICONTAINED-IN. Essentially, MAPPINGS-DEMONSTRATE-CONTAINMENT raises disjunctions in
TD1 to the top, as previously described, and then determines whether for each disjunct, the mapping
provides a method for constructing a binding which will establish containment. If so, CONTAINED-IN-
INTERFACE returns T, otherwise it returns NIL.
Let us leave for the moment further discussion of CONTAINED-IN-INTERFACE, except to note that it
makes a decision as to whether the problem can be handled by the variable-free containment function
CONTAINED-IN. We will state and prove lemmas about CONTAINED-IN and ICONTAINED-IN, then
return to the proof of CONTAINED-IN-INTERFACE. Here, as in all our other lemmas and proofs, "I" is
a shorthand notation representing INTERP-SIMPLE when its second and third arguments are lists,
INTERP-SIMPLE-1 when these arguments are a single descriptor and value.
For any descriptors td1 and td2, Lisp value v, and binding b1,
(and
H1 (null (gather-variables-in-descriptor td1))
H2 (null (gather-variables-in-descriptor td2))
H3 (contained-in td1 td2)
H4 (I td1 v b1))
=>
For some b2, (I td2 v b2)
Note: Recall that CONTAINED-IN also takes a third argument, TERM-RECS, which is employed solely
as a mechanism for terminating the computation in certain cases. As such, it has no effect on the
soundness argument, which does not rely on termination, and to avoid clutter in the lemmas and proofs,
we simply omit mention of it. One could also think of TERM-RECS as being a universally quantified
variable in the lemma. This lemma is proved in Appendix B.8.
For the case where we still have variables in our descriptors, VCONTAINED-IN is simply a heuristic
function serving as a witness for ICONTAINED-IN. Supplied with TD1 and TD2, VCONTAINED-IN
returns a mapping structure which is subsequently flattened into a list of simple mappings. (See the
discussion of mapping structures in Section 6.8.4.) Using equivalence-preserving canonicalization rules,
which have been proven correct, we lift *ORs in TD1. Furthermore, imagine a symbolic value reference
VREF which represents, on entry, the value to which we refer as V in Lemma CONTAINED-IN-
INTERFACE-OK. If for each disjunct TD1i, there exists some simple mapping Mj from our list which
can be applied to TD2 so that
(ICONTAINED-IN td1i (DAPPLY-SUBST-LIST mj td2) vref)
Type Checking for Common Lisp 175
The Proof of Soundness
holds, then we have shown containment, as we shall now prove, and CONTAINED-IN-INTERFACE
returns true. To show the soundness of this approach, we need to prove the following lemma:
Lemma ALL-DESCRIPTOR1-DISJUNCTS-OK-OK
For any descriptor td1 of the form (*or td11 .. td1n)
and descriptor td2, value reference vref, value v, and binding
b1 covering the variables of td1,
(and
H1 (disjoint (gather-variables-in-descriptor td1)
(gather-variables-in-descriptor td2))
H2 For all i, for some simple mapping m,
(and (well-formed-mapping m vref b1)
(icontained-in td1i (apply-subst m td2) vref))
H3 (I td1 v b1) )
=>
For some b2, (I td2 v b2)
This proof, of course, will require a statement of correctness of ICONTAINED-IN, which follows.
ICONTAINED-IN must be given a simple mapping which is well-formed in the context of the problem.
The predicate which determines well-formedness is WFF-MAPPING. The WFF-MAPPING predicate on
mappings is only strong enough to guarantee a mapping is syntactically consistent with the binding B and
the value reference being passed around.
(DEFUN WELL-FORMED-MAPPING (M B VREF)
;; M is assumed to be an alist whose keys are type variables.
(IF (NULL M)
T
(IF (OR (EQUAL (CDR (CAR M)) ’$NIL)
(EQUAL (CDR (CAR M)) ’$T)
(ASSOC (CDR (CAR M)) B)
(AND (VAR-REFP (CDR (CAR M)))
(SAME-ROOT (CDR (CAR M)) VREF)))
(WELL-FORMED-MAPPING (CDR M) B VREF)
NIL)))
For any descriptors td1 and td2, simple mapping m, Lisp value v,
symbolic value reference vref, and binding b1 covering the
variables in td1,
(and
H1 (disjoint (gather-variables-in-descriptor td1)
(gather-variables-in-descriptor td2))
H2 (well-formed-mapping m vref b1)
H3 (icontained-in td1 (dapply-subst-list-1 m td2) vref)
H4 (I td1 v b1) )
=>
For some b2, (I td2 v b2)
The proof of this lemma is given in Appendix B-G. With this lemma in hand, the proof of ALL-
176 Type Checking for Common Lisp
The Proof of Soundness
DESCRIPTOR1-DISJUNCTS-OK-OK is straightforward.
Proof of Lemma ALL-DESCRIPTOR1-DISJUNCTS-OK-OK
Lemma TC-UNIVERSALIZE-SINGLETON-VARS-1-OK
(I (tc-universalize-singleton-vars-1 td vlist) v b)
=>
For some b’ (I td v b’)
Lemma RENAMING-PRESERVES-I
(I td v b)
=>
for some b’ (I (subst &j &i td) v b’)
(equal <component of v> (cdr (assoc &j (subst &j &i b))))
QED.
Type Checking for Common Lisp 179
Testing the System
Chapter 8
TESTING THE SYSTEM
Not surprisingly, extensive testing of the system revealed its strengths and weaknesses very well. In this
chapter, we will present the test suite used to demonstrate the system, give an overview of the results
produced, and examine certain examples which illustrate the successes and shortcomings of the tool on
some representative functions. Along the way we will discuss how these results might suggest future
work.
The system was tested by placing it in its bare, initial state, described with the handful of signatures in
Appendix A, and then submitting a collection of almost 400 functions to the tool for analysis.
These functions are in three basic groups. The first is a group of functions which bootstrap the database
into a more useful collection of Lisp functions. These include some equality and membership functions,
some Boolean connectives, a family of recognizer functions which compose into increasingly complex
structures, and the family of CAR/CDR compositions (CADR, CDDR, CADDR, etc.). On this basis, we
introduce the second group, the non-mutually recursive functions from the axioms.lisp file of an early
developmental version of the Acl2 theorem prover of Boyer and Moore [Boyer 90]. The last group
consists mainly of functions which were conceived during the development process to test various aspects
of the system.
Many of the functions from axioms.lisp were modified so that the inference tool could produce better
signatures. This was done so that the signature base would be in better condition for evaluating later
results based on those signatures. In some cases, the original definition was submitted to show how the
tool performed, and then a modified definition was submitted as the basis for future reference. In a few
cases, a signature was composed by hand and summarily stuffed into the database, either because the
function could not be modified so that the tool could produce a satisfactory result, or because the author
lost patience with the process.
A number of other modifications to the axioms.lisp functions were made, and for a variety of reasons.
Almost all, if not all, of the changes fell under one of the following categories:
• Macros, such as AND, OR, LIST were expanded by hand, since macros are not part of the
accepted subset. This usually led to unattractive, verbose, and gratuitously inefficient code.
• Guards and IF tests were inserted, enhanced, or weakened. In some cases, this is because
the original function definitions actually had inadequate guards. In other cases, guards
which specified more than could be captured in the type system were weakened into type
predicates. For example, a guard like (EQ (CAR X) ’FOO) may have been transformed to
180 Type Checking for Common Lisp
Testing the System
(SYMBOLP (CAR X)). We attempted to comment all these cases in the examples.
• The above techniques and others were used to transform some functions frequently used as
guards or IF predicates into recognizers. This was to benefit from the increased accuracy of
the signature produced.
• Calls to NTH and UPDATE-NTH where the first argument is a constant were modified into
the appropriate CAR/CDR nests, since NTH’s signature yields no information about its
result. Examples of such modifications occur in functions like GLOBAL-TABLE and
UPDATE-GLOBAL-TABLE, which manipulate an object of a large state type employed in
the axioms.lisp code. There is no reason why the tool could not give NTH and UPDATE-
NTH special treatment and make these modifications in a prepass phase.
• Symbols with package names were changed to strings, since packaged atoms are not
handled. An example is in *INITIAL-GLOBAL-TABLE*.
• Global constants were changed to parameterless functions. Support for constants would be
trivial to implement, utilizing existing functionality in the tool.
• LET forms were transformed by dereferencing LET-bound variables, replacing each
occurrence with the terms to which they were bound. Thus, a form like
(LET ((X (FOO Y))) (CONS X X))
would be transformed to
(CONS (FOO Y) (FOO Y))
This led to great inefficiency, since types were computed repeatedly, particularly in cases,
like the ROUND function, where there are deeply nested LET constructs. This conversion,
though necessary to bring the function into the accepted subset, was extremely explosive.
• Signatures for COERCE and INTERN were introduced into the initial FUNCTION-
SIGNATURES. These signatures do not handle all the cases necessary to correspond to the
full Common Lisp definitions of the functions, but they were satisfactory for the arguments
with which those functions were invoked in the axioms code.
None of the functions from MUTUAL-RECURSION nests were included, since this definitional
capability is not supported in the inference system.
A simple driver program was used to submit this test suite to the inference tool. For each function, it
invokes a function NEW-FUNCTION, which simply calls the inference algorithm, then the signature
checker, and reports the result, including any accounts of failure. On the Symbolics, it also measures the
total time consumed on each invocation. Almost invariably, the larger portion of the time was spent in the
signature checker. Invoked at the end of the testbed run, the function CHECK-FUNCTION-
SIGNATURES reports on the condition of the signature database. For each function in the database, it
checks the well-formedness of the signature and reports the following results from the inference tool:
• Cases where the inference tool and/or the checker reported an incomplete guard descriptor
or when some function in the hereditary call tree of a function has an incomplete guard
descriptor,
• Cases where the checker’s guard descriptor was replaced by the inference tool guard
descriptor (where the descriptors are different, but containment holds in both directions).
• Cases where the checker’s signature segments are not contained in the inference algorithm
segments.
• Cases where the inference algorithm decreed a function to be a recognizer and the checker
disagreed.
The full results produced by the test suite are available via anonymous ftp. See Appendix H for
instructions on how to retrieve them. In the suite, some examples are preceded by comments, including
Type Checking for Common Lisp 181
Testing the System
functions from axioms.lisp which were modified from the original. The results are not annotated with
descriptive labels as they are in this chapter, but the signature components are in the same order as those
presented in annotated form in the next section.
The "Signature is certified sound" flag is provided as a quick reference for the reader. It signifies that the
checker has determined the guard is complete, that the guards of all the functions in the call tree are
complete, that the segment containment relation holds, and if the inference algorithm claims a function is
a recognizer, the checker agrees.
373 functions were submitted to the tool. For 231 of these functions, the inference tool generated
signatures which the checker subsequently validated to be sound.
The following things were noted for the 142 remaining functions:
26 functions were discarded by the inference algorithm because of
perceived guard violations.
Names: CASE-LIST CASE-LIST-CHECK REVERSE INTEGER-ABS-FOOBA
GETPROPS1 HAS-PROPSP1 DIMENSIONS MAXIMUM-LENGTH DEFAULT
ARRAY2P AREF1 COMPRESS1 AREF2 COMPRESS2 ASET2 OPEN-CHANNEL1
STATE-P1 OPEN-INPUT-CHANNEL-P1 OPEN-OUTPUT-CHANNEL-P1
MATCHFN FOO1 GOO1 GOO2 GOO4 GETPROPS
TRANSLATE-DECLARATION-TO-GUARD1
OPEN-OUTPUT-CHANNEL-ANY-P1 OPEN-INPUT-CHANNEL-ANY-P1
DELETE-PAIR SOME-SLASHABLE T-STACK-LENGTH1
32-BIT-INTEGER-STACK-LENGTH1 PUT-ASSOC-EQ FOO SYM-INP
APPLY-SUBST1 COND-MACRO
TRANSLATE-DECLARATION-TO-GUARD/INTEGER
Among the examples in which the inference algorithm detected a guard violation, in some, like
INTEGER-ABS-FOOBA, GETPROPS1 (presented in the next section), HASPROPS1, MATCHFN,
FOO1, GOO1, GOO2, and GOO4, there is a real guard violation in the function. In a number of other
cases, a problem arises because the tool cannot derive strong enough descriptors for forms used as guards
or IF tests to govern the violating call. For example, in CASE-LIST and CASE-LIST-CHECK, a call to
LEGAL-CASE-CLAUSESP is used as a guard. But because the definition of LEGAL-CASE-
CLAUSESP employs a test (EQ ’OTHERWISE (CAR (CAR TL))), it is not treated as a recognizer, and
its signature is therefore not sufficiently strong to define a safe type context for the operations within
CASE-LIST and CASE-LIST-CHECK. Another class of problem is exhibited by the guard violations in
DIMENSIONS, DEFAULT, ARRAY2P, AREF1, COMPRESS11, AREF2, ASET2, and COMPRESS2.
In these functions, a relation is used as a governor, either within the guard or in an IF test, and relations
yield no information as type predicates. A third class of guard violations is exhibited in STATE-P1,
OPEN-INPUT-CHANNEL-P1, and OPEN-OUTPUT-CHANNEL-P1, where the violation is caused by
the failure of the system to treat abstraction functions whose definitions are calls to destructor functions
Type Checking for Common Lisp 183
Testing the System
(like CADDR) in the same way as it treats the destructor function CADDR. As discussed in Section
4.4.4, the TYPE-PREDICATE-P algorithm within the inference algorithm gives special treatement to
certain destructor functions. If this treatment were extended to user-defined destructors like GLOBAL-
TABLE, the tool would be able to verify the guards in STATE-P1, et al.
Among the functions whose guard descriptors were judged incomplete by the checker but complete by the
inference algorithm, only the LENGTH function appears to get much use in subsequently defined
functions. Among the functions for which the inference algorithm and the checker agree on
incompleteness are many, like EQ, EQL (presented below), NTH, MEMBER, FLOOR, CEILING,
TRUNCATE, ROUND, MOD, REM, EXPT, DIMENSIONS, and a number of others which receive
subsequent use. The functions just named all have guards which either contain relations between formal
parameters, which have some value dependency, like (NOT (EQUAL X 0)), or which have a disjunction
on the types of distinct parameters, for example
(OR (EQLABLEP X) (EQLABLE-LISTP Y))
In none of these cases is there a way to fully capture the guard with a vector of descriptors, so the problem
does not reflect on the checker’s stringent requirement that a guard must be a conjunction of recognizer
calls on distinct parameters. Thus, of the 102 functions whose signatures the checker refused to
acknowledge for soundness, it is almost certain that the failure was due to guards which could not
legitimately be captured by the type system, regardless of the strength of the heuristics used.
A number of different reasons accounted for the failure of the segment containment check in 17 cases.
Perhaps the most common cause of failure was dissonance in the use of type variables between the
inference algorithm and the checker. If multiple occurrences of variables in the original signature do not
lead to checker segments with similar multiple occurrences, containment is almost sure to fail. A simple,
but characteristic example is GOO6, which we will display below, but a similar problem is shared by
XXXJOIN-2, ASET-32-BIT-INTEGER-STACK, DECREMENT-BIG-CLOCK, READ-IDATE, and
READ-RUN-TIME. Another common problem, which afflicted SUBST, APPLY-SUBST1, FOOP, and
UPDATE-NTH was the absence of an applicable *REC rule in the heuristic VCONTAINED-IN
algorithm. In any such case, it is likely a sound rule could be supplied. Similarly, the lack of a good
DUNIFY-DESCRIPTORS *REC rule caused a problem with OPEN-CHANNEL-LST-2 and OPEN-
CHANNELS-P-2. Infinite loops involving *REC descriptors in the DUNIFY-DESCRIPTORS were
resolved by the TERM-RECS mechanism while processing OPEN-CHANNEL-LST-2, OPEN-
CHANNELS-P-2, STATE-P, FOO7, likely leading to creation by default of overly inclusive descriptors
which contributed to containment failure. Some sort of bug in the inference algorithm, probably related to
the closure of *FIX forms, led to an incorrect signature being generated for MAKE-LIST-AC which the
checker correctly refused to validate. Finally, the ARRAY1P, given below, example illustrates some sort
of problem with the factoring of *OR, where a checker segment had a result type of (*OR $NIL $T), but
the original segments all had result types of either $T or $NIL, causing containment to fail.
For three of the five functions whose recognizer status was rejected by the checker, this failure was due to
the failure of the segment containment test. For the other two, PROPER-CONSP-2 and NON-T-NIL-
SYMBOL-PLISTP, the lack of a suitable *REC rule for unification or containment seemed to play a role.
For these two, their signatures were otherwise valid, and thus could be used for subsequent sound
inferences.
Of the functions which were resubmitted, most were functions from axioms.lisp which invoked either the
functions NTH or UPDATE-NTH, where the numeric argument was a constant. Since it was known that
the signatures for NTH or UPDATE-NTH yield no information about the result, the original functions
were submitted for illustration and then modified to have the calls to NTH or UPDATE-NTH replaced
184 Type Checking for Common Lisp
Testing the System
with appropriate CAR/CDR/CONS operations. The results make for interesting comparison and highlight
that NTH and UPDATE-NTH would be good candidates for special treatment in a prepass.
Of the functions for which hand-coded signatures were jammed into the database, nine were functions for
which the inference tool had signalled guard violations, two were functions whose signatures never
stabilized, and one for a function whose signature had an incomplete guard. For the remainder, we judged
by inspection of the function that similar problems would befall, and chose simply to not bother with the
original submission.
We believe these results are quite encouraging. In spite of the known inadequacies of the inference
algorithm, for almost two-thirds of the functions submitted, the tool generated sound signatures. Some of
these examples were designed to fail, via guard violation, for instance, or by use of guards known not to
be complete. A user who attempted to use the complete guard regimen wherever possible would likely
receive significant assistance from the tool in the task of guard verification. Moreover, for those functions
where guards would need to express more than could be captured in the type system, a highly symbiotic
relationship is easily imagined for the inference system and a general-purpose, automatic theorem prover
capable of handling those cases.
When displayed, the timings are from execution on a Symbolics 3645 machine. The system runs much
faster on a Sun Sparc platform. For some functions, notably large and complex functions manipulating
the huge state types of axioms.lisp, the amount of time consumed by the tool was significant, occasionally
exceeding a half hour.
The tool performed quite well on recognizer functions. Some examples follow.
(DEFUN TRUE-LISTP (X)
(IF (NULL X) (NULL X) (IF (CONSP X) (TRUE-LISTP (CDR X)) NIL)))
Function: TRUE-LISTP
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
-> $T)
(((*REC !REC1
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T
(*CONS *UNIVERSAL (*RECUR !REC1)))))
-> $NIL)
TC segments contained in Segments: T
Recognizer descriptor:
(*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
TC validates recognizer: T
Signature is certified sound: T
Time: 2. seconds
(IF (RATIONALP X)
(RATIONALP X)
(IF (SYMBOLP X) (SYMBOLP X) (CHARACTERP X))))
Function: EQLABLEP
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*OR $CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $T))
-> $T)
(((*OR $STRING (*CONS *UNIVERSAL *UNIVERSAL)))
-> $NIL)
TC segments contained in Segments: T
Recognizer descriptor:
(*OR $CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $T)
TC validates recognizer: T
Signature is certified sound: T
Time: 3. seconds
Function: EQLABLE-ALISTP
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*REC EQLABLE-ALISTP
(*OR $NIL
(*CONS (*CONS (*OR $CHARACTER $INTEGER $NIL
$NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $T)
*UNIVERSAL)
(*RECUR EQLABLE-ALISTP)))))
-> $T)
(((*REC !REC1
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T
(*CONS *UNIVERSAL (*RECUR !REC1))
(*CONS (*OR $CHARACTER $INTEGER $NIL
$NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T
(*CONS (*OR $STRING
(*CONS *UNIVERSAL *UNIVERSAL))
*UNIVERSAL))
*UNIVERSAL))))
-> $NIL)
TC segments contained in Segments: T
186 Type Checking for Common Lisp
Testing the System
Recognizer descriptor:
(*REC EQLABLE-ALISTP
(*OR $NIL
(*CONS (*CONS (*OR $CHARACTER $INTEGER $NIL
$NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$T)
*UNIVERSAL)
(*RECUR EQLABLE-ALISTP))))
TC validates recognizer: T
Signature is certified sound: T
Function: COND-CLAUSESP-1
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*REC COND-CLAUSESP-1
(*OR $NIL
(*CONS (*CONS *UNIVERSAL
(*OR $NIL (*CONS *UNIVERSAL $NIL)))
(*RECUR COND-CLAUSESP-1)))))
-> $T)
(((*REC !REC2
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T
(*CONS *UNIVERSAL (*RECUR !REC2))
(*CONS (*OR $CHARACTER $INTEGER $NIL
$NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T
(*CONS
*UNIVERSAL
(*OR $CHARACTER $INTEGER
$NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T
(*CONS
*UNIVERSAL
(*OR $CHARACTER $INTEGER
$NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T
(*CONS *UNIVERSAL *UNIVERSAL)
)))))
*UNIVERSAL))))
-> $NIL)
TC segments contained in Segments: T
Recognizer descriptor:
(*REC COND-CLAUSESP-1
(*OR $NIL
(*CONS (*CONS *UNIVERSAL
(*OR $NIL (*CONS *UNIVERSAL $NIL)))
Type Checking for Common Lisp 187
Testing the System
(*RECUR COND-CLAUSESP-1))))
TC validates recognizer: T
Signature is certified sound: T
Time: 8. seconds
This kind of result continued to scale up to enormous structures, as witnessed by the examples WORLDP,
KNOWN-PACKAGE-ALISTP, TIMER-ALISTP, OPEN-CHANNEL1-2, OPEN-CHANNEL-LST-2,
READABLE-FILE-2, READABLE-FILES-LST-2, WRITTEN-FILE-2, WRITTEN-FILE-LST-2, READ-
FILE-2, READ-FILE-LST-2, WRITEABLE-FILE-2, WRITEABLE-FILE-LST-2. (Most of these
functions were modified, for instance by replacing calls to NTH to the proper CAR/CDR nests.)
(DEFUN PROPER-CONSP (X)
(IF (CONSP X)
(IF (CONSP (CDR X)) (PROPER-CONSP (CDR X)) (EQUAL (CDR X) NIL))
NIL))
Function: PROPER-CONSP
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*REC !REC1 (*CONS *UNIVERSAL (*OR $NIL (*RECUR !REC1)))))
-> $T)
(((*OR $CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T
(*REC !REC3
(*CONS *UNIVERSAL
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T
(*RECUR !REC3))))))
-> $NIL)
TC segments contained in Segments: T
Recognizer descriptor:
(*REC !REC1 (*CONS *UNIVERSAL (*OR $NIL (*RECUR !REC1))))
TC validates recognizer: T
Signature is certified sound: T
Time: 8. seconds
Function: IMPROPER-CONSP
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*REC !REC2
(*CONS *UNIVERSAL
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T (*RECUR !REC2)))))
188 Type Checking for Common Lisp
Testing the System
-> $T)
(((*OR $CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T
(*REC !REC4
(*CONS *UNIVERSAL (*OR $NIL (*RECUR !REC4))))))
-> $NIL)
TC segments contained in Segments: T
Recognizer descriptor:
(*REC !REC2
(*CONS *UNIVERSAL
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T (*RECUR !REC2))))
TC validates recognizer: T
Signature is certified sound: T
Time: 6. seconds
Function: IMPLIES
Guard computed by the tool:
(*UNIVERSAL *UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL *UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL *UNIVERSAL))
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL *UNIVERSAL)))
-> $T)
(((*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL *UNIVERSAL))
$NIL)
-> $NIL)
(($NIL *UNIVERSAL) -> $T)
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: T
Time: 7. seconds
Unfortunately, the guard for EQL is not an acceptable type predicate, since it is a disjunction, rather than a
conjunction, of recognizer calls on distinct arguments. (The guard is the macro expansion of
(OR (EQLABLEP X) (EQLABLEP Y)).) So the guard descriptors default to *UNIVERSAL, and the
guard is not complete. This shortcoming will frequently percolate through applications as an inability to
guarantee the guard verification
Notice the form (*FREE-TYPE-VAR 1). This is just the form in which type variables are stored in the
system database. Each distinct variable will have a different integer within this representation.
(DEFUN EQL (X Y)
(DECLARE (XARGS :GUARD (IF (EQLABLEP X) (EQLABLEP X) (EQLABLEP Y))))
(EQUAL X Y))
Type Checking for Common Lisp 189
Testing the System
Function: EQL
Guard computed by the tool:
(*UNIVERSAL *UNIVERSAL)
Guard complete: NIL
All called functions complete: T
TC Guard:
(*UNIVERSAL *UNIVERSAL)
TC Guard complete: NIL
TC All called functions complete: NIL
TC Guard Replaced by Tool Guard: NIL
Segments:
(($NIL $NIL) -> $T)
(($T $T) -> $T)
(((*FREE-TYPE-VAR 1) (*FREE-TYPE-VAR 1)) -> $T)
(((*OR $CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING (*CONS *UNIVERSAL *UNIVERSAL))
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL *UNIVERSAL)))
-> $NIL)
(((*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL *UNIVERSAL))
$NIL)
-> $NIL)
(($T (*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING (*CONS *UNIVERSAL *UNIVERSAL)))
-> $NIL)
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: NIL
Time: 8. seconds
TWICE-GUARDED is a lovely example. In the (CONSP X) case, the tool deduces that the inner
recursive call can return only NIL, and likewise for the outer one. Then, seeing that the result type for the
CONSP case is the same as for the (ATOM X) case (which because of the (TRUE-LISTP X) guard has
been narrowed to (NULL X)), and since there is only one argument, it merges the (NULL X) and
(CONSP X) segments by disjoining their arguments with *OR. When it canonicalizes this result, it
notices that it can fold the *OR back into the *REC descriptor for TRUE-LISTP.
(DEFUN TWICE-GUARDED (X)
(DECLARE (XARGS :GUARD (TRUE-LISTP X)))
(IF (ATOM X) X (TWICE-GUARDED (TWICE-GUARDED (CDR X))))))
Function: TWICE-GUARDED
Guard computed by the tool:
((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
Guard complete: T
All called functions complete: T
TC Guard:
((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
-> $NIL)
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: T
The CADR function illustrates how one must coddle the tool to help it establish the formal soundness
claim. First we show the function as normally defined (but renamed to ORIG-CADR). The guard is the
190 Type Checking for Common Lisp
Testing the System
form
(OR (NULL X) (AND (CONSP X) (OR (NULL (CDR X)) (CONSP (CDR X)))))
Function: ORIG-CADR
Guard computed by the tool:
((*OR $NIL
(*CONS *UNIVERSAL (*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL)))))
Guard complete: T
All called functions complete: T
TC Guard:
((*OR $NIL
(*CONS *UNIVERSAL (*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL)))))
TC Guard complete: NIL
TC All called functions complete: NIL
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*OR $NIL (*CONS *UNIVERSAL $NIL)))
-> $NIL)
(((*CONS *UNIVERSAL (*CONS (*FREE-TYPE-VAR 1) *UNIVERSAL)))
-> (*FREE-TYPE-VAR 1))
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: NIL
The inference algorithm is clever enough to realize that the guard is complete. But the guard expression
does not conform to the requirement, imposed by the checker, that a guard must be a conjunction of
recognizer calls on distinct arguments in order to generate complete descriptors. Thus, the "TC Guard
complete" flag is NIL, and likewise for the "TC All called functions complete" flag, which can only be T
if the "TC Guard complete" flag is T.
However, if we take the guard expression and make it the body of a function, CADR-GUARD, the tool
will determine that the function is a recognizer and generate a signature which it certifies to be sound.
(DEFUN CADR-GUARD (X)
(IF (NULL X) (NULL X)
(IF (CONSP X)
(IF (NULL (CDR X)) (NULL (CDR X)) (CONSP (CDR X))) NIL)))
Function: CADR-GUARD
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*OR $NIL
(*CONS *UNIVERSAL (*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL)))))
-> $T)
(((*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
Type Checking for Common Lisp 191
Testing the System
$STRING $T
(*CONS *UNIVERSAL
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T))))
-> $NIL)
TC segments contained in Segments: T
Recognizer descriptor:
(*OR $NIL
(*CONS *UNIVERSAL (*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL))))
TC validates recognizer: T
Signature is certified sound: T
Then, replacing the old guard with a call to this function enables the checker to validate that the signature,
which is identical to the one generated for ORIG-CADR, is sound. Certainly, relaxing the checker’s
complete guard criteria would make the system a bit more usable. But modulo a little inconvenience, no
functionality is lost with the current implementation.
(DEFUN CADR (X)
(DECLARE (XARGS :GUARD (CADR-GUARD X)))
(CAR (CDR X)))
Function: CADR
Guard computed by the tool:
((*OR $NIL
(*CONS *UNIVERSAL (*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL)))))
Guard complete: T
All called functions complete: T
TC Guard:
((*OR $NIL
(*CONS *UNIVERSAL (*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL)))))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*OR $NIL (*CONS *UNIVERSAL $NIL)))
-> $NIL)
(((*CONS *UNIVERSAL (*CONS (*FREE-TYPE-VAR 1) *UNIVERSAL)))
-> (*FREE-TYPE-VAR 1))
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: T
There are a number of cases in the testbed where examples were modified and then resubmitted with
guards in a form amenable to completeness. Look for functions whose names end with "-GUARD".
As previously discussed, the lack of a type-instantiable variable within *REC descriptors leaves
something to be desired in the signature for the APPEND function.
(DEFUN APPEND (X Y)
(DECLARE (XARGS :GUARD (TRUE-LISTP X)))
(IF (NULL X) Y (CONS (CAR X) (APPEND (CDR X) Y))))
Function: APPEND
Guard computed by the tool:
((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
((*REC TRUE-LISTP (*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP))))
*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
192 Type Checking for Common Lisp
Testing the System
Segments:
(($NIL (*FREE-TYPE-VAR 1)) -> (*FREE-TYPE-VAR 1))
(((*CONS (*FREE-TYPE-VAR 2)
(*REC TRUE-LISTP
(*OR $NIL (*CONS *UNIVERSAL (*RECUR TRUE-LISTP)))))
(*FREE-TYPE-VAR 1))
-> (*CONS (*FREE-TYPE-VAR 2)
(*REC !REC4
(*OR (*FREE-TYPE-VAR 1)
(*CONS *UNIVERSAL (*RECUR !REC4))))))
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: T
Notice that the *UNIVERSAL within !REC4 represents the element types of the list X. If the second
segment could employ a different type of variable, for example:
(((*CONS (*NEW-TYPE-VAR 2)
(*REC TRUE-LISTP
(*OR $NIL
(*CONS (*NEW-TYPE-VAR 2) (*RECUR TRUE-LISTP)))))
(*FREE-TYPE-VAR 1))
-> (*CONS (*NEW-TYPE-VAR 2)
(*REC !REC4
(*OR (*FREE-TYPE-VAR 1)
(*CONS (*NEW-TYPE-VAR 2) (*RECUR !REC4))))))
then the signature for APPEND-INTLISTS below could carry into the body of the *REC descriptor in its
result type the information that the elements from the first parameter were integers. This would be a
worthy upgrade to the system. As it is, we must be content with:
(DEFUN APPEND-INTLISTS (X Y)
(DECLARE (XARGS :GUARD
(IF (INTEGER-LISTP X) (INTEGER-LISTP Y) NIL)))
(APPEND X Y))
Function: APPEND-INTLISTS
Guard computed by the tool:
((*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
Guard complete: T
All called functions complete: T
TC Guard:
((*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP))))
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(($NIL (*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
-> (*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
(((*CONS $INTEGER
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
(*REC INTEGER-LISTP
(*OR $NIL (*CONS $INTEGER (*RECUR INTEGER-LISTP)))))
-> (*CONS $INTEGER
(*REC !REC4-G1568
(*OR (*REC INTEGER-LISTP
Type Checking for Common Lisp 193
Testing the System
(*OR $NIL
(*CONS $INTEGER
(*RECUR INTEGER-LISTP))))
(*CONS *UNIVERSAL (*RECUR !REC4-G1568))))))
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: T
As mentioned in the discussion of the SAD-BUT-TRUE-LISTP example in Section 4.4.7, the tool can do
a less than adequate job when a function which is not a recognizer decomposes its argument recursively.
A good example of this phenomenon is the function STANDARD-CHAR-LISTP, below. Ideally, the
signature would have included a segment mapping a list of characters to $T or $NIL, and another segment
mapping everything else to $NIL. But since we do not construct *FIX forms on the left hand side of our
segments, we wind up with a signature which suggests $T as a possible result when the argument is, for
example, the CONS of a character onto an integer. There are several other examples in the test suite
illustrating this phenomenon, among them LAST, FLIP, and COND-CLAUSESP (which also shows the
futility of using a LENGTH test as a structure constraint in restricting TRUE-LISTP types).
(DEFUN STANDARD-CHAR-LISTP (L)
(IF (CONSP L)
(IF (CHARACTERP (CAR L))
(IF (STANDARD-CHAR-P (CAR L))
(STANDARD-CHAR-LISTP (CDR L)) NIL)
NIL)
(EQUAL L NIL)))
Function: STANDARD-CHAR-LISTP
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*CONS $CHARACTER *UNIVERSAL)) -> (*OR $NIL $T))
(((*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL *UNIVERSAL)))
-> $NIL)
(($NIL) -> $T)
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: NIL
The signature for STANDARD-CHAR-LISTP is not certified sound because the signature for MEMBER,
which has an incomplete guard, is not certified.
Next is an example from axioms.lisp where the inference algorithm discovers a guard violation. Other
examples in the test suite where guard violations are detected include the following.
(DEFUN GETPROPS1 (ALIST)
(IF (NULL ALIST) NIL
(IF (NULL (CDR (CAR ALIST))) (GETPROPS1 (CDR ALIST))
(CONS (CONS (CAR (CAR ALIST)) (CAR (CDR (CAR ALIST))))
(GETPROPS1 (CDR ALIST))))))
Analysis terminated
Result:
(*UNABLE-TO-TYPE*
((*UNIVERSAL) T T
(*GUARD-VIOLATION
((CAR ALIST) *MARKER-6
((ALIST *OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T
(*CONS *UNIVERSAL *UNIVERSAL)))
((*MARKER-5) &7))
(($CHARACTER) ($CHARACTER)))
NIL))
Next we show two examples where the checker could not confirm segment containment and therefore
refused to validate the soundness of a signature. The first suggests a problem in the checker algorithm
with carrying variables into its final result. The checker’s strategy of carrying both abstract and concrete
type alists and of generating both minimal and maximal segments enabled it to handle almost all the cases
with which we tested it. But this case is evidence that there is still some price being paid for not having a
descriptor form, like (*BOTH &1 $CHARACTER), which provides annotation of a type variable with a
descriptor characterizing a type constraint on the variable.
(DEFUN GOO6 (X)
(DECLARE (XARGS :GUARD (CONSP X)))
(IF (EQUAL (CAR X) 0) 0 (CONS (CAR X) NIL))))
Function: GOO6
Guard computed by the tool:
((*CONS *UNIVERSAL *UNIVERSAL))
Guard complete: T
All called functions complete: T
TC Guard:
((*CONS *UNIVERSAL *UNIVERSAL))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*CONS $INTEGER *UNIVERSAL))
-> $INTEGER)
(((*CONS (*FREE-TYPE-VAR 1) *UNIVERSAL))
-> (*CONS (*FREE-TYPE-VAR 1) $NIL))
TC segments contained in Segments: NIL
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: NIL
The second suggests a problem with the factorization of an *OR descriptor. If the troublesome segment
were factored into two segments, one with a result type of $NIL and the other $T, each would have been
Type Checking for Common Lisp 195
Testing the System
contained in one of the original signature segments. This factorization would be a relatively easy upgrade,
and its formal justification would be straightforward. But there might be a significant performance
penalty, and we would have to weigh whether it is justified by the rare occasion where benefit is derived.
(DEFUN ARRAY1P (NAME L)
(IF (SYMBOLP NAME)
(IF (ALISTP L)
(IF (KEYWORD-LISTP (CDR (HEADER NAME L)))
(IF (TRUE-LISTP (DIMENSIONS NAME L))
(IF (EQUAL (LENGTH (DIMENSIONS NAME L)) 1)
(IF (INTEGERP (CAR (DIMENSIONS NAME L)))
(IF (INTEGERP (MAXIMUM-LENGTH NAME L))
(IF (< 0 (CAR (DIMENSIONS NAME L)))
(IF
(< (CAR (DIMENSIONS NAME L))
(MAXIMUM-LENGTH NAME L))
(IF
(<= (MAXIMUM-LENGTH NAME L)
(*MAXIMUM-POSITIVE-32-BIT-INTEGER*))
(BOUNDED-INTEGER-ALISTP L
(CAR (DIMENSIONS NAME L)))
NIL)
NIL)
NIL)
NIL)
NIL)
NIL)
NIL)
NIL)
NIL)
NIL))
Function: ARRAY1P
Guard computed by the tool:
(*UNIVERSAL *UNIVERSAL)
Guard complete: T
All called functions complete: NIL
TC Guard:
(*UNIVERSAL *UNIVERSAL)
TC Guard complete: T
TC All called functions complete: NIL
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*OR $NIL $NON-T-NIL-SYMBOL $T)
(*OR (*CONS (*CONS *UNIVERSAL *UNIVERSAL)
(*REC ALISTP
(*OR $NIL
(*CONS (*CONS *UNIVERSAL *UNIVERSAL)
(*RECUR ALISTP)))))
(*REC !REC1
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T
(*CONS *UNIVERSAL (*RECUR !REC1))
(*CONS (*OR $CHARACTER $INTEGER $NIL
$NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T)
*UNIVERSAL)))
(*REC ALISTP
(*OR $NIL
(*CONS (*CONS *UNIVERSAL *UNIVERSAL)
(*RECUR ALISTP))))))
-> $NIL)
(((*OR $NIL $NON-T-NIL-SYMBOL $T) $NIL)
-> $T)
196 Type Checking for Common Lisp
Testing the System
Chapter 9
FUTURE WORK AND CONCLUSION
In this chapter we summarize and comment upon the significance of this research and then discuss several
avenues for its extension.
9.1 Summary
In this report we have presented a system for inferring type signatures for functions defined in a purely
applicative subset of Common Lisp. We defined a mechanism which allows a user to annotate his
function with a guard declaration, which can be an arbitrary predicate on the function’s parameters, but
which is most fruitfully used here for stating the type requirements on his parameters. The guard
mechanism has an advantage over the SATISFIES predicate of Common Lisp that it can be used to state
required relations among several parameters, but this usage is not well supported by our type system.
Starting from a very small library of pre-defined function signatures, the system accepts new function
definitions, checks that the type constraints embodied in guards are observed on function calls, and
assigns a type signature to any function for which no guard violation is diagnosed.
The type inference algorithm is heuristic and has no formal model, but the signatures it produces have a
formal semantics. We have defined a checker algorithm which validates that a signature produced by the
inference algorithm is correct with respect to our evaluation model for Lisp and an interpreter semantics
for our type descriptors. For the case where the guards of a function and all its subsidiary functions are
completely captured by type descriptors, we presented a proof establishing that any signature certified by
the checker is correct. By submitting a significant collection of typical (and some atypical) functions to
the system, we have subjectively observed that the signatures produced are "good", in addition to being
sound.
This system differs significantly from previous type inference efforts. Other type inference systems for
Lisp have generally focussed on automatically annotating a program to supply compiler directives, with
the goal being to reduce the need for run-time type checking. They have operated only with respect to a
known set of previously declared types, mainly the native types for atomic Lisp objects, though some
recent work supports record types declared with DEFSTRUCT and an abstraction for simple lists.
Though some of these systems offer a more extensive treatment of primitive types and operate on a larger
subset of the Common Lisp language, judging from the literature, our system represents several
significant steps over any previous work:
• Rather than producing annotations on forms, it generates complete function signatures
which capture ad hoc polymorphism and modularize results at function boundaries.
• It successfully infers and manipulates arbitrary recursive types which have not been
198 Type Checking for Common Lisp
Future Work and Conclusion
previously declared.
• It performs the static diagnosis of type errors and, under certain constraints on the style of
expression used in guards, can certify that a function is free from type errors.
• Its results are validated by a rigorous mathematical proof with respect to a formal evaluation
model and a formal semantics for type signatures.
This system also differs significantly, both in style and substance, from other type systems which do
satisfy the above criteria (along with many other marvelous formal properties) but operate on languages
other than Lisp. Truly formidable type systems have been created for languages carefully designed with
type inference in mind. But to our knowledge, no other such system works on a language which treats
data structures as freely as Common Lisp. In particular, our type system supports the creation and
recognition of objects of virtually arbitrary new and undeclared structure, such as can be freely
constructed with Lisp CONS, while simultaneously supporting the ad hoc polymorphism inherent in
Lisp’s IF.
Given that the type inference problem for our Common Lisp subset is undecidable, this system has a
strongly heuristic flavor. The separation of the heuristics in the type inference algorithm and the checker
from the formal validation of its results allows for incremental improvement or complete revision in its
heuristic techniques without imposing any new proof requirement, so long as the semantic model remains
stable.
The formal underpinnings of this system and its ability to deal with recursive functions suggest a
potentially symbiotic relationship with other formal, mechanical tools for analysis of Lisp functions. In
particular, powerful theorem provers which utilize Lisp-like logics are available to support mathematical
modelling of computing systems, and this type system could play the role of a specialized proof assistant
for type-related problems. Alternatively, this system could conceivably support a Lisp compiler or Lisp
system development tools in ways other Lisp type inference systems could not, by providing reliable type
annotations for complex, freely constructed structures.
This work could be extended in a number of ways. Three threads of work are apparent:
• re-implementing or adding certain features and heuristics to improve its performance on the
task as currently defined,
• extending the system to handle an enlarged language subset, and
• finding fruitful ways to utilize the system in a larger system development context.
We will discuss each thread separately.
Some of these possibilities have been previously mentioned in the report, but we collect them here and
add a few new suggestions. Some amount to little more than minor enhancements to the system, while
others suggest new approaches to particular problems which might be more fruitful than those currently
implemented.
Type Checking for Common Lisp 199
Future Work and Conclusion
Probably the single most important feature which could be added to the system would be a second class of
type variable, instantiable with type descriptors rather than singleton values. This has been previously
discussed with respect to the APPEND example. Such a variable could appear within a replicating
component of a *REC descriptor, for instance. Denoting it by %1 in the following example, a signature
segment for APPEND might be:
(((*CONS %1 (*REC TRUE-LISTP
(*OR $NIL (*CONS %1 (*RECUR TRUE-LISTP)))))
&2)
-> (*CONS %1 (*REC !REC4 (*OR &2 (*CONS %1 (*RECUR !REC4))))))
Thus, with the help of a little canonicalization, if we called APPEND with two lists of integers, we could
infer that the result would be a list of integers rather than a true list terminating in a list of integers.
Another significant effort would be to extend the descriptor language with a *BOTH form, whose first
component would be a (normal) type variable and whose second component would be a descriptor. Any
value satisfying the form would need to satisfy both the variable (i.e., the binding of the type variable
would need to equal the value) and the descriptor. Such a construct would obviate the need for the
checker’s schizophrenic strategy of carrying both abstract and concrete type alists and generating both
minimal and maximal segments. It would also remove the need for the *SUBST form and the restrictions
result component in the inference algorithm’s unification procedure. But we should not underestimate the
possible difficulty of implementing such a construct, even though its semantics in an INTERP-SIMPLE
setting are straightforward. This construct was considered during the development of the inference
algorithm, and the difficulties it presented were simply too challenging to face simultaneously with the
other problems confronted in the developmental prototype.
in type predicates resulted in a limitation in the power of the inference algorithm. The fundamental
problem is that the linear representation of a type alist or a guard descriptor does not allow for the
expression of a relation like the one above. A potentially worthwhile direction of inquiry would be to
study a different representation of type alists and guard descriptors, perhaps employing a clausal form
(disjunctive normal form, for instance). This would allow for freer use of negation of a type alist and
enlarge the class of forms which could be used to generate complete guard descriptors. Because of the
pervasiveness of guard descriptors and type alists, it is also an issue which could have root and branch
impact on the entire system, and careful study would be required to measure the cost-benefit tradeoffs.
200 Type Checking for Common Lisp
Future Work and Conclusion
A potential benefit to the system is that this alternate representation might allow a guard expression or
type predicate to be cleanly factored into type-specific components and other residual predicates, so that
type-specific problems (like type inference) could be cleanly handled by the type system, and other
problems by some other mechanism. This suggests an approach to guard verification, blending the
capabilities of the type system and other methods, so that the support for guard verification could be
generalized to arbitrary predicates on parameters.
The system as designed seeks to verify guards supplied by the user. It would be interesting to investigate
how the tool could be put in a mode in which it generates guards instead. That is, when a new function is
submitted (without a guard), the system could attempt to combine context information in the function
body with the guard requirements from all the functions called in the body to generate a guard expression
which, when holding for the parameters, would guarantee the absence of guard violations in the body.
This would involve only the addition of a new heuristic algorithm, since the process of signature
validation would be identical to that used now.
The unification algorithm in the checker, with its merging of substitutions and <descriptor>/<subst> result
form, is a much more satisfying and tractable treatment of variables than that of the inference algorithm’s
unification routine, which combines *SUBST annotations and an extra "restrictions" component in its
result. Short of implementing the *BOTH descriptor described above, it would be worthwhile to
determine if unification in the inference algorithm could be re-implemented to use the checker’s strategy.
The challenge would be to see if the checker’s strategy could support the extra descriptor forms
manipulated in the inference algorithm.
As was mentioned with the STANDARD-CHAR-LISTP example in the previous chapter, as well as in the
discussion of the SAD-BUT-TRUE-LISTP example in Section 4.4.7, the *FIX heuristic is only
constructed on the right hand, or result, side of a segment, when it would sometimes be appropriate to
construct it on the left hand, or argument, component as well. We believe the approach could be
generalized, so that *FIX descriptors would come into play when functions decompose their arguments
recursively, as well as when they construct their results recursively. It is conceivable that this approach
could render the special algorithm for the treatment of recognizer functions unnecessary. But in any case
it is likely that for functions whose guard does not specify a recursively typed argument, but whose body
nevertheless unwinds an argument recursively, we could see a vast improvement in the quality of
signatures produced by the inference algorithm.
The run-time performance of the checker leaves much to be desired. Given the consistent choice of
conceptual simplicity over computational efficiency, this is to be expected. But if computational overhead
becomes a serious deterrent to the employment of the type system, a good place to look for improvement
would be in the checker. It is quite possible that some simple steps could be taken to contain the
explosion of cases without involving extensive formal complication, perhaps by merging segments
produced in intermediate results. This would likely require a pragmatic approach, utilizing extensive
testing to see if such steps would result in a higher rejection rate by the checker.
There is some inconvenience to the user in the checker’s insistence that a guard must be a conjunction of
recognizer calls on distinct parameters in order to result in complete descriptors. That the inference
algorithm has more liberal criteria is an indication that the checker could be more lenient. Again, some
extra pain in the formal analysis might result in a relaxation of the checker’s requirement. Though the
CADR/CADR-GUARD example in the preceding chapter illustrates there is likely no true loss of
functionality with the current state of affairs, there is obviously some extra overhead for the user to
conform to the checker’s requirement which could be avoided.
9.2.1-J Miscellany
This section represents a potpourri of minor enhancements which have been previously mentioned and
which would probably be worthwhile.
Just as the system notices when a recognizer is being defined, it could also notice when a destructor
function is introduced. Thus, user abstraction functions whose bodies are essentially nests of CAR and
CDR calls could be treated uniformly with CAR and CDR, which receive special treatment in some
corners of the algorithm. (See the discussion of "feedback" in Section 4.4.7.)
NTH is a Lisp function commonly used to access elements of lengthy lists. Unfortunately, the signature
derived for NTH by the inference algorithm allows virtually no useful information to appear in the result
type. This suggests a need for special treatment, as mentioned in the previous chapter. Calls to NTH,
where the numeric argument is a constant, could be expanded in a prepass algorithm to the proper nest of
calls to CAR and CDR, thus allowing the type variable mechanism to handle the transferrence of
information from argument to result. UPDATE-NTH could be treated similarly.
Occasionally, one would like to define a recognizer function which, for example, specifies a list of some
specific length N, but using the form (EQUAL (LENGTH FOO) N) disqualifies the function from
treatment as a recognizer. To preserve the functionality, the user must construct an ugly CONSP nest to
convey the same idea. This is another form which could perhaps be transformed in a prepass so that the
ugly nest could be constructed reliably and without imposition on the user.
The ARRAY1P test suite example in the previous chapter illustrated a case where the checker failed to
factor a segment with an *OR result type into separate segments for each disjunct. This resulted in a
failure of the segment containment step. A trivial patch would be to perform this factorization. To do so
with minimal loss of efficiency, we might perform this factorization only after containment actually
failed. This kind of factorization could possibly mitigate against the loss of functionality hypothesized
above as a result of efficiency improvements in the checker.
202 Type Checking for Common Lisp
Future Work and Conclusion
As *REC descriptors are currently defined, the top level form in the body may be a *CONS, as in the
descriptor:
(*REC PROPER-CONSP
(*CONS *UNIVERSAL (*OR $NIL (*RECUR PROPER-CONSP))))
In numerous places, extra case analysis is required to consider this kind of form, and clarity in the
discussion of *REC descriptors can be reduced. It might be worthwhile to require that the top level form
in a *REC descriptor be an *OR. All that would likely be necessary to accomplish this is a
canonicalization applied to newly formed *REC descriptors which would bring an outer *CONS out of
the *REC body. The PROPER-CONSP descriptor above would thus be transformed to:
(*CONS *UNIVERSAL
(*REC !REC1 (*OR $NIL (*CONS *UNIVERSAL (*RECUR !REC1)))))
A number of extensions to the base language would be relatively easy to implement using the techniques
in the current system.
As we have mentioned, support for global constants would be trivial to implement. In fact, constants are
currently supported in the implementation of the inference algorithm. As a matter of expediency, and
because there seemed to be little of formal interest in them, constants were not supported in the checker.
Support of constants was simply disabled at the top level loop.
9.2.2-B LET
Though not quite so simple, support for LET was initially provided in the inference algorithm, for the case
where the LET bindings contained no recursive calls to the function in which they appeared. The
treatment was simply to compute the type for each LET-bound variable and push it onto the type-alist for
consideration in the LET body. To simplify the task presented to the initial implementation of the
checker, we again suppressed this capability. This seemed in keeping with the notion that macros are not
supported. Nevertheless, LET seems like an entirely feasible endeavor, if restricted as above. A general
treatment of LET in the presence of recursion gets into the realm of mutually recursive functions, which
are discussed below.
Multiple-value functions can be quite useful in a purely applicative language, and there seems to be no
good reason why they could not be easily supported within the inference system. One way of doing so
would be to interpret VALUES as we would LIST and MULTIPLE-VALUE-BIND as we would a
destructuring LET. This would likely require little extension to the system other than to treat these forms
in a prepass.
A very interesting problem would be to handle type inference for functions whose definitions are mutually
recursive. This would require the nest of all such functions to be submitted as a collection. Intuitively,
we can imagine the fundamental approach being to extend the iterative stabilization strategy in the
Type Checking for Common Lisp 203
Future Work and Conclusion
following manner. Currently, we iterate over the body of a new function, with each iteration establishing
a new set of working segments, until the collection of working segments stabilizes. To handle mutual
recursion, we imagine starting the analysis by assigning an empty set of segments to each function,
making one pass over all the functions, replacing its segments with the new ones, and then repeating this
process until the working segments for all the functions have stabilized. The checker could then treat the
functions one at a time. But the proof of Lemma TC-SIGNATURE-OK would need to be modified to
support the inductive step of adding a whole collection of new signatures at once. Without having
devoted any serious thought to this prospect, we can only say it would require very careful formal
attention.
Extension to arrays and structures defined by defstruct may be problematic, since we assume our Lisp
subset is purely applicative. Common Lisp’s standard update mechanism for arrays and structures is
SETF, which is a destructive operation. Tracking the side effects of destructive operations is beyond the
current scope and spirit of this effort. A purely applicative model of arrays and structures, however,
would lend itself well to treatment within the inference system.
One purpose for which type inference has been employed in other systems is to supply inferred type
declarations to the compiler. The improvement of run-time speed of code generated with prolific type
declarations is the apparent raison d’etre for the TICL type inference system [Ma 90] and the Nimble
Type Inferencer [Baker 90].
The system described in this report could be easily modified to generate declarations in this style. A
simple mapping could be maintained in the database associating descriptors with their recognizer
functions. Whenever the type of any form is determined to correspond to some variable-free type
descriptor, a THE declaration [Steele 90] could be wrapped around the form. Similarly, DECLARE forms
could be inserted to correspond to information derived from guard expressions. Thus, a function
(DEFUN RAW-FN (X Y)
(DECLARE (XARGS :GUARD (IF (INTEGER-LISTP X) (INTEGERP Y) NIL)))
(IF (EQUAL Y 0)
(CAR X)
(RAW-FN (CDR X) (MINUS Y 1))))
If for some target compiler, an INTEGER-LISTP is not useful, any such declaration could be toned down
to (OR CONSP NULL).
204 Type Checking for Common Lisp
Future Work and Conclusion
While our system could likely produce improved efficiency from such an endeavor, the Nimble and TICL
systems already seem to be quite adept at producing declarations for the types of atomic objects and
support a richer collection of such types than we do. But these systems appear to provide little support for
structured objects, which is the forte of our system. One could easily imagine a partnership between some
such system and ours, whereby the one could produce type declarations for atomic objects, and our system
would be responsible for annotating forms representing structured objects and objects derived from
structures.
Perhaps the most fruitful potential placement of our type system would be to embed it within a more
general automated reasoning system for Lisp functions. In particular, if our inference tool could work in
tandem with a more general purpose theorem prover to perform tasks like guard verification, it is likely
that such problems could be factored so that the inference tool could handle type-specific questions, and
problems outside the type domain could be handled by the prover. Many typical problems presented to a
theorem prover for computable functions have type-related components, and the type inference system
could potentially operate as a specialized algorithm within a prover, just as a rewriter or a specialized
decision procedure might also operate. Tailored as it is to recursive functions, our type inference system
could be expected to make a significant contribution. On balance, we believe this approach could increase
the capability of a prover to resolve type-related problems without user intervention.
Conversely, access to assistance from a general theorem prover could allow the type system to solve a
greater number of problems in its own domain. If, for example, the prover could assist the type inference
system in verifying guards containing predicates which lie partly outside the type domain, we could
support much greater generality in the kinds of guards one could use and still receive certified signatures
from the system. The class of functions whose signatures could be certified sound by the checker would
grow, and the tool would be more useful as a result. In short, we believe there is fertile ground for
symbiosis between the type inference system and a general purpose prover.
Type Checking for Common Lisp 205
Signatures of Functions in the Initial State
Appendix A
Signatures of Functions in the Initial State
This appendix illustrates the function signatures for all the functions in the initial system state. Each
(*FREE-TYPE-VAR n) form is a placeholder form representing a type variable, which will be instantiated
with a fresh variable when the signature is invoked for use. Multiple occurrences within a segment of the
number associated with a *FREE-TYPE-VAR marker indicate that the same variable will appear in all
such positions.
Function: CONS
Guard computed by the tool:
(*UNIVERSAL *UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL *UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*FREE-TYPE-VAR 1.) (*FREE-TYPE-VAR 2.))
-> (*CONS (*FREE-TYPE-VAR 1.) (*FREE-TYPE-VAR 2.)))
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: T
Function: CAR
Guard computed by the tool:
((*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL)))
Guard complete: T
All called functions complete: T
TC Guard:
((*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL)))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*CONS (*FREE-TYPE-VAR 1.) *UNIVERSAL)) -> (*FREE-TYPE-VAR 1.))
(($NIL) -> $NIL)
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: T
Function: CDR
Guard computed by the tool:
((*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL)))
Guard complete: T
All called functions complete: T
TC Guard:
((*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL)))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*CONS *UNIVERSAL (*FREE-TYPE-VAR 1.))) -> (*FREE-TYPE-VAR 1.))
(($NIL) -> $NIL)
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
206 Type Checking for Common Lisp
Signatures of Functions in the Initial State
Function: BINARY-+
Guard computed by the tool:
((*OR $INTEGER $NON-INTEGER-RATIONAL)
(*OR $INTEGER $NON-INTEGER-RATIONAL))
Guard complete: T
All called functions complete: T
TC Guard:
((*OR $INTEGER $NON-INTEGER-RATIONAL)
(*OR $INTEGER $NON-INTEGER-RATIONAL))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(($INTEGER $INTEGER) -> $INTEGER)
(($INTEGER $NON-INTEGER-RATIONAL) -> $NON-INTEGER-RATIONAL)
(($NON-INTEGER-RATIONAL $INTEGER) -> $NON-INTEGER-RATIONAL)
(($NON-INTEGER-RATIONAL $NON-INTEGER-RATIONAL)
-> (*OR $INTEGER $NON-INTEGER-RATIONAL))
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: T
Function: UNARY--
Guard computed by the tool:
((*OR $INTEGER $NON-INTEGER-RATIONAL))
Guard complete: T
All called functions complete: T
TC Guard:
((*OR $INTEGER $NON-INTEGER-RATIONAL))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(($INTEGER) -> $INTEGER)
(($NON-INTEGER-RATIONAL) -> $NON-INTEGER-RATIONAL)
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: T
Function: BINARY-*
Guard computed by the tool:
((*OR $INTEGER $NON-INTEGER-RATIONAL)
(*OR $INTEGER $NON-INTEGER-RATIONAL))
Guard complete: T
All called functions complete: T
TC Guard:
((*OR $INTEGER $NON-INTEGER-RATIONAL)
(*OR $INTEGER $NON-INTEGER-RATIONAL))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
((($INTEGER $INTEGER) -> $INTEGER)
(($INTEGER $NON-INTEGER-RATIONAL)
-> (*OR $INTEGER $NON-INTEGER-RATIONAL))
(($NON-INTEGER-RATIONAL $INTEGER)
-> (*OR $INTEGER $NON-INTEGER-RATIONAL))
(($NON-INTEGER-RATIONAL $NON-INTEGER-RATIONAL)
-> (*OR $INTEGER $NON-INTEGER-RATIONAL))
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Type Checking for Common Lisp 207
Signatures of Functions in the Initial State
Function: UNARY-/
Guard computed by the tool:
((*OR $INTEGER $NON-INTEGER-RATIONAL))
Guard complete: T
All called functions complete: T
TC Guard:
((*OR $INTEGER $NON-INTEGER-RATIONAL))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*OR $INTEGER $NON-INTEGER-RATIONAL))
-> (*OR $INTEGER $NON-INTEGER-RATIONAL))
TC segments contained in Segments: T
Recognizer descriptor:
NIL
TC validates recognizer: NIL
Signature is certified sound: T
Function: <
Guard computed by the tool:
((*OR $INTEGER $NON-INTEGER-RATIONAL)
(*OR $INTEGER $NON-INTEGER-RATIONAL))
Guard complete: T
All called functions complete: T
TC Guard:
((*OR $INTEGER $NON-INTEGER-RATIONAL)
(*OR $INTEGER $NON-INTEGER-RATIONAL))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*OR $INTEGER $NON-INTEGER-RATIONAL)
(*OR $INTEGER $NON-INTEGER-RATIONAL))
-> (*OR $NIL $T))
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: T
Function: EQUAL
Guard computed by the tool:
(*UNIVERSAL *UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL *UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(($NIL $NIL) -> $T)
(($T $T) -> $T)
(((*FREE-TYPE-VAR 1.) (*FREE-TYPE-VAR 1.)) -> $T)
(($NIL
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL *UNIVERSAL)))
-> $NIL)
(((*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL *UNIVERSAL))
$NIL)
-> $NIL)
(($T
(*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
208 Type Checking for Common Lisp
Signatures of Functions in the Initial State
Function: CONSP
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*CONS *UNIVERSAL *UNIVERSAL)) -> $T)
(((*OR $CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T))
-> $NIL))
TC segments contained in Segments: T
Recognizer descriptor: (*CONS *UNIVERSAL *UNIVERSAL)
TC validates recognizer: T
Signature is certified sound: T
Function: INTEGERP
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(($INTEGER) -> $T)
(((*OR $CHARACTER $NIL $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL *UNIVERSAL)))
-> $NIL)
TC segments contained in Segments: T
Recognizer descriptor: $INTEGER
TC validates recognizer: T
Signature is certified sound: T
Function: RATIONALP
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*OR $INTEGER $NON-INTEGER-RATIONAL)) -> $T)
(((*OR $CHARACTER $NIL $NON-T-NIL-SYMBOL $STRING $T
(*CONS *UNIVERSAL *UNIVERSAL)))
Type Checking for Common Lisp 209
Signatures of Functions in the Initial State
-> $NIL)
TC segments contained in Segments: T
Recognizer descriptor: (*OR $INTEGER $NON-INTEGER-RATIONAL)
TC validates recognizer: T
Signature is certified sound: T
Function: STRINGP
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(($STRING) -> $T)
(((*OR $CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $T (*CONS *UNIVERSAL *UNIVERSAL)))
-> $NIL)
TC segments contained in Segments: T
Recognizer descriptor: $STRING
TC validates recognizer: T
Signature is certified sound: T
Function: CHARACTERP
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(($CHARACTER) -> $T)
(((*OR $INTEGER $NIL $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL *UNIVERSAL)))
-> $NIL)
TC segments contained in Segments: T
Recognizer descriptor: $CHARACTER
TC validates recognizer: T
Signature is certified sound: T
Function: SYMBOLP
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*OR $NIL $NON-T-NIL-SYMBOL $T)) -> $T)
(((*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $STRING
(*CONS *UNIVERSAL *UNIVERSAL)))
-> $NIL)
TC segments contained in Segments: T
Recognizer descriptor: (*OR $NIL $NON-T-NIL-SYMBOL $T)
TC validates recognizer: T
Signature is certified sound: T
210 Type Checking for Common Lisp
Signatures of Functions in the Initial State
Function: NULL
Guard computed by the tool:
(*UNIVERSAL)
Guard complete: T
All called functions complete: T
TC Guard:
(*UNIVERSAL)
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(($NIL) -> $T)
(((*OR $CHARACTER $INTEGER $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL *UNIVERSAL)))
-> $NIL)
TC segments contained in Segments: T
Recognizer descriptor:
$NIL
TC validates recognizer: T
Signature is certified sound: T
Function: DENOMINATOR
Guard computed by the tool:
((*OR $INTEGER $NON-INTEGER-RATIONAL))
Guard complete: T
All called functions complete: T
TC Guard:
((*OR $INTEGER $NON-INTEGER-RATIONAL))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*OR $INTEGER $NON-INTEGER-RATIONAL)) -> $INTEGER)
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: T
Function: NUMERATOR
Guard computed by the tool:
((*OR $INTEGER $NON-INTEGER-RATIONAL))
Guard complete: T
All called functions complete: T
TC Guard:
((*OR $INTEGER $NON-INTEGER-RATIONAL))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*OR $INTEGER $NON-INTEGER-RATIONAL)) -> $INTEGER)
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: T
Function: SYMBOL-NAME
Guard computed by the tool:
((*OR $NIL $NON-T-NIL-SYMBOL $T))
Guard complete: T
All called functions complete: T
TC Guard:
((*OR $NIL $NON-T-NIL-SYMBOL $T))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
Type Checking for Common Lisp 211
Signatures of Functions in the Initial State
Function: SYMBOL-PACKAGE-NAME
Guard computed by the tool:
((*OR $NIL $NON-T-NIL-SYMBOL $T))
Guard complete: T
All called functions complete: T
TC Guard:
((*OR $NIL $NON-T-NIL-SYMBOL $T))
TC Guard complete: T
TC All called functions complete: T
TC Guard Replaced by Tool Guard: NIL
Segments:
(((*OR $NIL $NON-T-NIL-SYMBOL $T)) -> $STRING)
TC segments contained in Segments: T
Recognizer descriptor: NIL
TC validates recognizer: NIL
Signature is certified sound: T
212 Type Checking for Common Lisp
Proofs of Selected Lemmas
Type Checking for Common Lisp 213
Proofs of Selected Lemmas
Appendix B
Proofs of Selected Lemmas
TC-PREPASS normalizes IF expressions by ensuring that the test expressions return only T or NIL. In
any case where it cannot determine that an IF test evaluates only to T or NIL, it encapsulates the test form
in two calls to the NULL function. It also treats the special cases where the test is (or can be easily
reduced to) T or NIL, so that the entire IF form is reduced to its second or third argument, as appropriate.
The definition of TC-PREPASS and its subsidiary function TC-PREPASS-IF-PRED appears in Appendix
G.6.
The property we wish to prove is that for any non-negative integer clock, if FS is a valid representation of
our world of Lisp functions, evaluating (TC-PREPASS FORM FS) in any environment yields the same
result as evaluating FORM in the environment.
Intuitively, the argument is very simple. The TC-PREPASS algorithm performs only the two
transformations described above, and it performs them throughout the form. First let us consider the
transformation of wrapping two calls to NULL around an IF test. Since NULL is a SUBR, once the
argument is evaluated, checking the guard and returning a value require no clock time. So a BREAK-
OUT-OF-TIME is produced on E’s evaluation of a call to NULL if and only if a BREAK-OUT-OF-TIME
is produced while evaluating the parameter. Similarly for a BREAK-GUARD-VIOLATION, since the
guard to NULL is T. If evaluating an IF test does not yield a break, then the only significance of the result
is whether it is NIL or non-NIL.
(NULL (NULL NIL)) = (NULL T) = NIL
and
(NULL (NULL <any non-NIL value>)) = (NULL NIL) = T
Thus, the evaluator will receive the same direction from the test, regardless of whether it is encased in the
two calls to NULL.
The other transformations performed by TC-PREPASS are simplifications. In the case where the
(possibly transformed) IF test is NIL, the IF is replaced by its else arm, and where the IF test is T, the IF is
replaced by its then arm. Consider the case where the IF test is NIL. The argument for the T case will be
symmetric, with the only difference being consideration of the then arm rather than the else.
If CLOCK < 1, evaluation of either the original form or the transformed one produces a BREAK-OUT-
OF-TIME. Now consider the case where CLOCK ≥ 1. Evaluating NIL does not otherwise involve the
clock, and evaluating IF does not involve ticking the clock down on the recursive calls of E to evaluate the
test or either of the result arms. Furthermore, IF is lazy, in that it does not evaluate the arm which will not
contribute to the result value. Thus, the clock on entry to the transformed form (i.e., the else form), will
be the same as it would be on entry to the else form in the original IF, and therefore a form transformed in
this manner will evaluate to a BREAK-OUT-OF-TIME if and only if the original IF form evaluates to a
BREAK-OUT-OF-TIME. A similar argument holds for a guard violation, since there is no guard check
on an IF or on NIL. Since the value returned by an IF form whose test is NIL is just the value returned by
the else form, the value returned will be the same if the IF is replaced by its else form.
The preceding argument should suffice for a proof of TC-PREPASS-OK and spare the reader from
214 Type Checking for Common Lisp
Proofs of Selected Lemmas
delving into what lies below. But for the intrepid, the proofs below involve essentially an exhaustive
examination of cases arising from one-level expansions of the definitions of E, TC-PREPASS, and
TC-PREPASS-IF-PRED, down to recursive calls. The functions TC-PREPASS and TC-PREPASS-IF-
PRED (the latter specifically treats IF tests) are mutually recursive, and each is specified by its own
lemma. Hence, the proof of our evaluation property involves a proof of the conjunction of Lemma
TC-PREPASS-OK and Lemma TC-PREPASS-IF-PRED-OK. We present this composite proof as a proof
of each individual lemma under a joint computational induction scheme. The induction is on the number
of calls to TC-PREPASS and TC-PREPASS-IF-PRED necessary to return a result from the top level call
to TC-PREPASS. This will enable us to use each lemma as an inductive assertion on each subsidiary call.
Lemma TC-PREPASS-OK
For any list of function signatures fs, Lisp world world,
non-negative integer clock, Lisp form form, and binding
environment env,
Lemma TC-PREPASS-IF-PRED-OK
For any list of function signatures fs, Lisp world world, non-negative
integer clock, Lisp IF form test, and binding environment env,
First let us take care of the easy case where clock < 1. In this
case, by the definition of E, for any form, both
(break-out-of-timep form env world clock)
and
(break-out-of-timep (tc-prepass form fs) env world clock)
hold.
(E test env world clock) = (E (tc-prepass test fs) env world clock)
(E then-form env world clock)
= (E (tc-prepass then-form fs) env world clock)
(E else-form env world clock)
= (E (tc-prepass else-form fs) env world clock)
By definition of tc-prepass,
(tc-prepass (if test then-form else-form) fs)
=
(let ((prepassed-pred
(tc-prepass-if-pred (tc-prepass test fs) fs)))
(if (null prepassed-pred)
(tc-prepass else-form fs)
216 Type Checking for Common Lisp
Proofs of Selected Lemmas
(break-out-of-timep
(E (tc-prepass-if-pred (tc-prepass test fs) fs) env world clock))
We need to establish
(break-out-of-timep
(E (let ((prepassed-pred
(tc-prepass-if-pred (tc-prepass test fs) fs)))
(if (null prepassed-pred)
(tc-prepass else-form fs)
(if (equal prepassed-pred t)
(tc-prepass then-form fs)
(list ’if
prepassed-pred
(tc-prepass then-form fs)
(tc-prepass else-form fs)))))
env world clock))
Thus,
(tc-prepass (if test then-form else-form) fs)
= (tc-prepass else-form fs)
The argument goes just like that of Case 4.1, except that we do
not need to mention the possibility of the clock running down
in the final sub-case.
(and
H1 (valid-fs fs world clock)
H2 (E (tc-prepass test fs) env world clock) = nil
H3 (equal (E (tc-prepass-if-pred (tc-prepass test fs) fs)
env world clock)
nil) )
=>
(equal (E (if test then-form else-form) env world clock)
(E (tc-prepass (if test then-form else-form) fs)
env world clock))
By definition of E,
(E (if (tc-prepass-if-pred (tc-prepass test fs) fs)
(tc-prepass then-form fs)
(tc-prepass else-form fs))
env world clock)
= (if (not (equal (E (tc-prepass-if-pred (tc-prepass test fs) fs)
env world clock)
nil))
(E (tc-prepass then-form fs) env world clock)
(E (tc-prepass else-form fs) env world clock))
Case 4.4 (E test env world clock) = some non-nil Lisp value
First let us take care of the easy case where clock < 1. In this
case, by the definition of E, for any form, both
Type Checking for Common Lisp 219
Proofs of Selected Lemmas
By definition of td-prepass-if-pred,
(tc-prepass-if-pred form fs) = nil
Trivial.
By definition of tc-prepass-if-pred,
(tc-prepass-if-pred form fs) = t
By definition of E, (E form env world clock) = form
(E t env world clock) = t
Case 3: form is a cons whose car is the atom quote and whose cadr
is the atom nil.
By definition of tc-prepass-if-pred,
(tc-prepass-if-pred form fs) = nil
By definition of E,
(E (quote nil) env world clock) = nil
(E nil env world clock) = nil
Case 4: form is a cons whose car is the atom quote and whose cadr
is not nil.
By definition of tc-prepass-if-pred,
(tc-prepass-if-pred form fs) = t
By definition of E,
(E (quote form) env world clock) = form
(E t env world clock) = t
The null function returns T if its argument is nil, and nil if its
argument is anything but nil, so (null (null nil)) = nil.
Since (E nil env world clock) = nil, our goal is established.
By definition of E,
(E (if test then-form else-form) env world clock)
=
(if (equal (E test env world clock) nil)
(E else-form env world clock)
(E then-form env world clock))
(if (break-out-of-timep
(E (if test then-form else-form) env world clock))
(break-out-of-timep
(E (tc-prepass-if-pred (if test then-form else-form) fs)
env world clock))
(if (break-guard-violationp
(E (if test then-form else-form) env world clock))
(break-guard-violationp
(E (tc-prepass-if-pred (if test then-form else-form) fs)
env world clock))
(if (equal (E (if test then-form else-form) env world clock)
nil)
(equal
(E (tc-prepass-if-pred (if test then-form else-form) fs)
env world clock)
nil)
(equal
(E (tc-prepass-if-pred (if test then-form else-form) fs)
env world clock)
t))))
Case 6.3.4 (E else-form env world clock) some non-nil Lisp value v
By definition of E,
(E (if test then-form else-form) env world clock) = v (non-nil)
Using the inductive assumption of TC-PREPASS-IF-PRED-OK,
instantiated with form = else-form.
(equal (E (tc-prepass-if-pred else-form fs) env world clock) t)
and by definition of E
(E (tc-prepass-if-pred (if test then-form else-form) fs)
env world clock)
= t, establishing the goal for this case
Case 6.4 (E test env world clock) = some non-nil Lisp value v
As with Case 6.3, exchanging the role of nil and non-nil and using
the inductive assumption of TC-PREPASS-IF-PRED-OK, instantiated
with form = then-form rather than else-form.
By definition,
(td-prepass-if-pred (f form1 .. formn) fs)
=
if f can return only T or NIL (as determined in fs)
(f (tc-prepass form1 fs)
..
(tc-prepass formn fs))
else
(null (null (f (tc-prepass form1 fs)
..
(tc-prepass formn fs))))
The argument here is the same as in the previous case, but extended
as in Case 5 to drive E down through the calls to null. Since null
is a subr and its guard is T, evaluating (null (null x)) will cause
an out-of-time break if and only if evaluating x causes one, and
will cause a guard break if and only if evaluating x causes one.
QED.
Here, as in all our other lemmas and proofs, "I" is a shorthand notation representing INTERP-SIMPLE
when its second and third arguments are lists, INTERP-SIMPLE-1 when these arguments are a single
descriptor and value.
Lemma RECOGNIZER-SEGMENTS-COMPLETE
(and
H1 (valid-fs fs world clock)
H2 (I arg-td v b)
H3 (contained-in-interface arg-td targ)
H4 (not (break-out-of-timep (Rworld,clock(v)))) )
=>
(equal (Rworld,clock(v)) t)
H9 (I targ v b)
Using H9’, we can rule out the second disjunct. This establishes
(and (I targ v b) (equal t (Rworld,clock(v)) b)),
the second conjunct of which is our goal. QED.
Lemma VALID-FS-CLOCK
By induction on clock.
Here, as in all our other lemmas and proofs, "I" is a shorthand notation representing INTERP-SIMPLE
when its second and third arguments are lists, INTERP-SIMPLE-1 when these arguments are a single
descriptor and value.
Lemma TC-INFER-OK
For any Lisp form form, function signature list fs, Lisp world world
including all functions hereditarily in the call tree of form,
for any non-negative integer clock, type variable bindings b,
Lisp values arg1 .. argm,
Lisp variables a1 .. am,
binding environment env of the form ((a1 . arg1) .. (am . argm))
where a1 .. am include all the free variables in form,
ABS-ALIST of the form ((a1 . tda1) .. (am . tdam)),
CONC-ALIST of the form ((a1 . tdc1) .. (am . tdcm)),
and denoting
(tc-infer form abs-alist conc-alist fs)
by
(((mintd1,1 .. mintd1,m mintd1)
((a1 . tdconc1,1) .. (am . tdconc1,m))
(maxtd1,1 .. maxtd1,m maxtd1))
..
((mintdn,1 .. mintdn,m mintdn)
((a1 . tdconcn,1) .. (am . tdconcn,m))
(maxtdn,1 .. maxtdn,m maxtdn)))
Note:
H1 establishes that the signatures in the system state fs are valid.
H2 establishes that the abs-arglist is valid.
Type Checking for Common Lisp 227
Proofs of Selected Lemmas
Note that the second conjunct of the conclusion prescribes that the
binding b’ is to cover exactly the collection of variables in the
descriptors under the quantifier. Any binding which happens to
contain excess elements can be trimmed as necessary to suit the
exact coverage requirement. This is trivially obvious from the
definition of I.
Let us denote (E form env world clock) with the abbreviation form-val.
The descriptors returned for the form ai are thus the ones associated
with ai in the two alists. We can use b as the binding we
need in the conclusion. Thus, our goal is:
Lemma DESCRIPTOR-FROM-QUOTE-OK:
For any Lisp form, binding environment env, Lisp world, type variable
binding b, and non-negative integer clock,
fs)
= ((((mintd1,1 .. mintd1,m) -> mintd1)
((a1 . tdconc1,1) .. (am . tdconc1,m))
((maxtd1,1 .. maxtd1,m) -> maxtd1))
..
(((mintdn,1 .. mintdn,m) -> mintdn)
((a1 . tdconcn,1) .. (am . tdconcn,m))
((maxtdn,1 .. maxtdn,m) -> maxtdn)))
(tc-infer test-form
((a1 . tda1) .. (am . tdam))
((a1 . tdc1) .. (am . tdcm))
fs)
= ((((test-mintd1,1 .. test-mintd1,m) -> test-mintd1)
((a1 . test-tdconc1,1) .. (am . test-tdconc1,m))
((test-maxtd1,1 .. test-maxtd1,m) -> test-maxtd1))
..
(((test-mintdl,1 .. test-mintdl,m) -> test-mintdl)
((a1 . test-tdconcl,1) .. (am . test-tdconcl,m))
((test-maxtdl,1 .. test-maxtdl,m) -> test-maxtdl)))
(tc-infer then-form
((a1 . tda1) .. (am . tdam))
((a1 . test-maxtdi,1) .. (am . test-maxtdi,m))
fs)
= ((((then-mintdi,1,1 .. then-mintdi,1,m) -> then-mintdi,1)
((a1 . then-tdconci,1,1) .. (am . then-tdconci,1,m))
((then-maxtdi,1,1 .. then-maxtdi,1,m) -> then-maxtdi,1))
..
(((then-mintdi,o ,1 .. then-mintdi,o ,m) -> then-mintdi,o )
i i i
((a1 . then-tdconci,o ) .. (am . then-tdconci,o ))
i,1 i,m
((then-maxtdi,o .. then-maxtdi,o ) -> then-maxtdi,o )))
i,1 i,m i
(tc-infer else-form
((a1 . tda1) .. (am . tdam))
((a1 . test-maxtdi,1) .. (am . test-maxtdi,m))
fs)
= ((((else-mintdi,1,1 .. else-mintdi,1,m) -> else-mintdi,1)
((a1 . else-tdconci,1,1) .. (am . else-tdconci,1,m))
((else-maxtdi,1,1 .. else-maxtdi,1,m) -> else-maxtdi,1))
..
(((else-mintdi,p ,1 .. else-mintdi,p ,m) -> else-mintdi,p )
i i i
((a1 . else-tdconci,p ) .. (am . else-tdconci,p ))
i,1 i,m
((else-maxtdi,p .. else-maxtdi,p ) -> else-maxtdi,p )))
i,1 i,m i
We note also that the result of
(tc-infer (IF test-form then-form else-form)
((a1 . tda1) .. (am . tdam))
((a1 . tdc1) .. (am . tdcm))
fs)
is by definition a subsequence of the append of the results
over all i for which we call
(tc-infer then-form
((a1 . tda1) .. (am . tdam))
((a1 . test-maxtdi,1) .. (am . test-maxtdi,m))
fs)
and
(tc-infer else-form
((a1 . tda1) .. (am . tdam))
230 Type Checking for Common Lisp
Proofs of Selected Lemmas
(We shall see later that we do some filtering of cases such that
we do not necessarily call tc-infer on both the then-form and the
else-form for any given i.)
(test-mintdi,1 .. test-mintdi,m)
(test-tdconci,1 .. test-tdconci,m)
(test-maxtdi,1 .. test-maxtdi,m)
Lemma DUNIFY-DESCRIPTORS-INTERFACE-OK
For any descriptors tda and tdb, Lisp value v and fully
instantiating binding of type variables to Lisp values b,
(and (I tda v b)
(I tdb v b))
=>
(I (dunify-descriptors-interface tda tdb) v b)
is true. This being the case, we need not consider this tuple
in analyzing then-form, since there is no way this tuple can
signify a context which can result in a non-NIL value for the
test form. Removing it from the pool will not affect the
truth of H8.
(tc-infer then-form
((a1 . tda1) .. (am . tdam))
((a1 . test-maxtdi,1) .. (am . test-maxtdi,m))
fs)
H9 (and
(not (break-guard-violationp (E then-form env world clock)))
for all i such that the ith tuple survived the screen,
for some j in [1..oi],
for some bi,j covering the following descriptors,
(and (I (then-mintdi,j,1 .. then-mintdi,j,m then-mintdi,j)
(arg1 .. argm (E then-form env world clock))
bi,j)
(I (then-maxtdi,j,1 .. then-maxtdi,j,m then-maxtdi,j)
(arg1 .. argm (E then-form env world clock))
bi,j)
(I (then-tdconci,j,1 .. then-tdconci,j,m)
(arg1 .. argm)
bi,j)
(extends-binding bi,j bi)))
The argument is exactly analogous to Case 3.1, using $NIL for the
screen and using a recursive call on else-form to compute the
tuples which guarantee the result.
Type Checking for Common Lisp 233
Proofs of Selected Lemmas
(tc-infer formi
((a1 . tda1) .. (am . tdam))
((a1 . tdc1) .. (am . tdcm))
fs)
=
((((mintdi,1,1 .. mintdi,1,m) -> mintdi,1)
((a1 . tdconci,1,1) .. (am . tdconci,1,m))
((maxtdi,1,1 .. maxtdi,1,m) -> maxtdi,1))
..
(((mintdi,l ,1 .. mintdi,l ,m) -> mintdi,l )
i i i
((a1 . tdconci,l ) .. (am . tdconci,l ))
i,1 i,m
((maxtdi,l .. maxtdi,l ) -> maxtdi,l )))
i,1 i,m i
I.e., there are li different 3-tuples generated for the ith
argument.
(and
(tc-all-called-functions-complete guard-form fs)
(tc-all-called-functions-complete body fs)
(not (break-out-of-timep
(fworld,clock
((E form1 env world clock) .. (E formo env world clock)))))
(not (null (E guard-form
((v1 . (E form1 env world clock))
..
(vn . (E formo env world clock)))
world clock))) )
=>
(and
C1 (not (break-guard-violationp
(fworld,clock
((E form1 env world clock) .. (E formo env world clock)))))
C2 for some k in [1..p]
for some binding b of type variables to Lisp values
covering stdk,1 .. stdk,o and stdk
(I (stdk,1 .. stdk,o stdk)
((E form1 env world clock) .. (E formo env world clock)
(fworld,clock
((E form1 env world clock) .. (E formo env world clock))))
b) )
Notice that new variables are introduced to our problem state with
the use of this signature. By definition, when TC-INFER uses a
signature from fs, it replaces all the variables in the stored
representation of the signature with "fresh" or previously unused
variables. Thus, for any two triples returned by TC-INFER, the
variables they share are limited to those in the argument list of
the call.
Expand the definition of I and take the last conjunct of the result,
Thus,
m is the number of identifiers in the context, indexed by g
o is the number of parameters in the call to f, indexed by i
li is the number of tuples generated for each actual i, indexed by j
p is the number of segments in the signature for f, indexed by k
n is the number of tuples for (f form1 .. formo), indexed by s
q is the number of elements in the cross product (see below), indexed
by h
(and
H1 (valid-fs fs world clock)
H2 (I (tda1 .. tdam) (arg1 .. argm) b)
H3 (I (tdc1 .. tdcm) (arg1 .. argm) b)
H4 (not (equal (tc-infer form abs-alist conc-alist fs)
*guard-violation))
H5 (tc-all-called-functions-complete form fs)
H6 (not (break-out-of-timep (E form env world clock)))
H7 for each i in 1..o,
(and
(not (break-guard-violationp (E formi env world clock)))
for some j in [1..li] (the number of tuples for this actual),
there exists a binding bi,j covering the descriptors below
(and (I (mintdi,j,1 .. mintdi,j,m mintdi,j)
Type Checking for Common Lisp 237
Proofs of Selected Lemmas
H11 (and
(not (break-guard-violationp
(fworld,clock
((E form1 env world clock) .. (E formo env world clock)))))
for some k in [1..p],
for some binding b’ covering stdk,1 .. stdk,o and stdk,
(I (stdk,1 .. stdk,o stdk)
((E form1 env world clock) .. (E formo env world clock)
(fworld,clock
((E form1 env world clock) .. (E formo env world clock))))
b’) ) )
=>
(and
C1 (not (break-guard-violationp (E form env world clock)))
for some s in [1..n],
for some binding b’’ covering the descriptors below,
C2 (and (I (mintds,1 .. mintds,m mintds)
(arg1 .. argm form-val)
b’’)
C3 (I (maxtds,1 .. maxtds,m maxtds)
(arg1 .. argm form-val)
b’’)
C4 (I (tdconcs,1 .. tdconcs,m) (arg1 .. argm) b’’)
C5 (extends-binding b’’ b)) )
(H9 and H10 are no longer useful and have been discarded.)
The first conjunct of H7 for each formi and the first conjunct
of H11 establishes our goal C1, which says that the entire form will
evaluate without guard violation.
bi,j )
i
(I (tdconci,j .. tdconci,j )
i,1 i,m
(arg1 .. argm)
bi,j )
i
(extends-binding bi,j b))
i
Now, merge all the bi,j , just as we did above. The same
i
argument as used above, regarding the mergeability of these bindings
and their binding extension properties, establishes
(tc-infer formi
((a1 . tda1) .. (am . tdam))
((a1 . tdc1) .. (am . tdcm))
fs)
=
((((mintdi,1,1 .. mintdi,1,m) -> mintdi,1)
Type Checking for Common Lisp 239
Proofs of Selected Lemmas
Lemma TC-MAKE-ARG-CROSS-PRODUCT-OK
For any mintd, maxtd, and tdconc descriptors, type variable binding b,
non-negative integers l1 .. ln, m, and o, Lisp values arg1 .. argm,
and Lisp values actual1 .. actualo,
form-val)
(merge-bindings* b1,j .. bo,j ))
1 o
(I (cptdconch,1 .. cptdconch,m)
(arg1 .. argm)
(merge-bindings* b1,j .. bo,j )))
1 o
For f’s signature segments, represented in H11, what is missing in this
matchup are descriptors for each of the objects in the calling context.
These will each be represented also with *universal, giving us a
modified hypothesis:
These mintd’s are the descriptors which will finally be a part of the
result returned by TC-INFER. In particular, in each 3-tuple of the
result, the minimal segment will be extracted from one of these
unified results. The appended results of all the pairwise
unifications will form the entire collection of minimal segments.
(I (cptdconch,1 .. cptdconch,m)
(arg1 .. argm)
(merge-bindings (merge-bindings* b1,j .. bo,j ) b’))
1 o
which was established in H14, is preserved in the conclusion, where s
is the index rather than h. This establishes C4. The cptdconc’s are
the tdconc’s of our ultimate result.
The maximal segments for each tuple are computed by combining via
unification of the descriptors in each minimal segment/conc-alist
pair. C4 represents what we know of the conc-alist paired with
each minimal segment. Again, since the CONC-ALIST gives only
descriptors for the types of the variables in the environment and
says nothing about the type of the result of the function call (i.e.,
the conc-alist is n-ary and the minimal segment is has arity n+1), we
need to pad the descriptor lists associated with the CONC-ALIST before
unifying it with our minimal segments. We simply add a *universal to
correspond to the result type. Clearly, the result value satisfies
*universal. Thus, for each s, we perform the unification:
(dunify-descriptors-interface
(*dlist tdconcs,1 .. tdconcs,m *universal)
(*dlist mintds,1 .. mintds,m mintds))
=
((*dlist maxtds,1,1 .. maxtds,1,m maxtds,1)
..
(*dlist maxtds,x,1 .. maxtds,x,m maxtdx,s))
I.e., for any given minimal segment, unifying against the CONC-ALIST
may produce multiple results, and we make a list of them. Once again,
we invoke DUNIFY-DESCRIPTORS-INTERFACE-OK, instantiated for each
combination:
For any Lisp form form, binding environment env, world world, and
non-negative integer clock clock,
IS-QUOTED-FORM returns T if its argument is a self-evaluating literal (T, NIL, a character, an integer, a
rational, a string, or a keyword) or an explicitly quoted form. DESCRIPTOR-FROM-QUOTE derives a
descriptor for any such form.
For any Lisp form form, binding environment env, world world, and
non-negative integer clock clock,
(clock ≠ 0)
=>
(I (descriptor-from-quoted-form form)
(E (quote form) env world clock)
b)
Proof of DESCRIPTOR-FROM-QUOTE-OK
By the definition of E,
(E form env world clock) = (E (quote form) env world clock)
Using this substitution, the result follows directly from
Lemma DESCRIPTOR-FROM-QUOTED-FORM-OK, instantiated with
form = form.
Proof of DESCRIPTOR-FROM-QUOTED-FORM-OK
By induction on the cons structure of forms.
By definition,
By the definition of E,
We wish to prove
Here, as in all our other lemmas and proofs, "I" is a shorthand notation representing INTERP-SIMPLE
when its second and third arguments are lists, INTERP-SIMPLE-1 when these arguments are a single
descriptor and value.
Lemma TC-MAKE-ARG-CROSS-PRODUCT-OK
246 Type Checking for Common Lisp
Proofs of Selected Lemmas
For any mintd, maxtd, and tdconc descriptors, type variable binding b,
non-negative integers l1 .. ln, m, and o, Lisp values arg1 .. argm,
and Lisp values actual1 .. actualo,
Recall that in the setting for TC-MAKE-ARG-CROSS-PRODUCT, each list of triples in its argument is
the result of the call to TC-INFER on one of the actual parameters in our function call. Each element in
the value of the call to TC-MAKE-ARG-CROSS-PRODUCT is the result of combining one element from
each collection of segments corresponding to the actual parameters. Thus, each is a combination of m
segments. The combinator formats the segments to align descriptors for matching values and calls
DUNIFY-DESCRIPTORS-INTERFACE.
Proof of TC-MAKE-ARG-CROSS-PRODUCT-OK
(and
H1 for all i in [1..r+1],
for some j in [1..li],
(I (mintdi,j,1 .. mintdi,j,m mintdi,j tdconci,j,1 .. tdconci,j,m)
(arg1 .. argm actuali arg1 .. argm)
b)
H2 for some h,
(I (semi-cptdah,1 .. semi-cptdah,m
semi-cptdrh,1 .. semi-cptdrh,r
semi-cptdconch,1 .. semi-cptdconch,m)
(arg1 .. argm actual1 .. actualr arg1 .. argm)
b) )
=>
for some h in [1..q],
(I (cptdah,1 .. cptdah,m
cptdrh,1 .. cptdrh,r+1
cptdconch,1 .. cptdconch,m)
(arg1 .. argm actual1 .. actualo arg1 .. argm)
b)
(semi-cptdah,1 .. semi-cptdah,m
semi-cptdrh,1 .. semi-cptdrh,r
semi-cptdconch,1 .. semi-cptdconch,m)
(semi-cptdah,1 .. semi-cptdah,m
semi-cptdrh,1 .. semi-cptdrh,r
semi-cptdconch,1 .. semi-cptdconch,m)
(semi-cptdah,1 .. semi-cptdah,m
semi-cptdrh,1 .. semi-cptdrh,r
*universal
semi-cptdconch,1 .. semi-cptdconch,m)
(I (semi-cptdah,1 .. semi-cptdah,m
semi-cptdrh,1 .. semi-cptdrh,r
semi-cptdconch,1 .. semi-cptdconch,m)
(arg1 .. argm actual1 .. actualr arg1 .. argm)
b)
=>
(I (semi-cptdah,1 .. semi-cptdah,m
semi-cptdrh,1 .. semi-cptdrh,r
*universal
semi-cptdconch,1 .. semi-cptdconch,m))
(arg1 .. argm actual1 .. actualr actualr+1 arg1 .. argm)
b)
Similarly,
(mintdr+1,j,1 .. mintdr+1,j,m tdrr+1,j tdconcr+1,j,1 .. tdconcr+1,j,m)
(from H1 instantiated with i = r+1) is padded out with a
*universal for each parameter 1..r, giving
(mintdr+1,j,1 .. mintdr+1,j,m
*universal1 .. *universali
tdrr+1,j
tdconcr+1,j,1 .. tdconcr+1,j,m),
tdconcr+1,j,1 .. tdconcr+1,j,m)
(arg1 .. argm actual1 .. actualr+1 arg1 .. argm)
b))
=>
(I (cptda1 .. cptdam cptdr1 .. cptdrr+1 cptdconc1 .. cptdconcm)
(arg1 .. argm actual1 .. actualr+1 arg1 .. argm)
b)
In this case, the same deduction is true of the *or form as was true
of the simple *dlist from Case 3.1. But we split the result into
separate elements in the cross product, where each element is a
disjunct of the *or. Thus, Lemma DUNIFY-DESCRIPTORS-INTERFACE-OK
gives
Here are some lemmas about these functions which are used in the proof of Lemma TC-INFER-OK.
Lemma EXTENDS-BINDING-MONOTONIC
For any descriptors td, Lisp value v, and bindings b and b’,
where b covers the type variables in td,
(and (I td v b)
(extends-binding b’ b))
=>
(I td v b’)
Lemma MERGE-BINDINGS-EXTENDS-BINDINGS
(mergeable-bindings b1 b2)
=>
(and (extends-binding (merge-bindings b1 b2) b1)
(extends-binding (merge-bindings b1 b2) b2))
Lemma EXTENDS-BINDING-TRANSITIVE
This section is a collection of detailed proofs of some significant lemmas which contributed to the proof
of DUNIFY-DESCRIPTORS-INTERFACE-OK (see Section 7.6). The last subsection includes proofs for
some of the rules with which DUNIFY-DESCRIPTORS handles the case where both arguments are *REC
descriptors. The rules are stated in Section 6.6.3.
Here, as in all our other lemmas and proofs, "I" is a shorthand notation representing INTERP-SIMPLE
when its second and third arguments are lists, INTERP-SIMPLE-1 when these arguments are a single
descriptor and value.
Lemma DAPPLY-SUBST-LIST-OK
For any well-formed substitution s, descriptor td, binding b,
and value v, (where td can be a *dlist containing n descriptors
iff v is a value list of length n, in which case the call to I
is on the list of n descriptors)
A discussion of well-formed substitutions appears in Section 6.6.3, and the definition of the function
WELL-FORMED-SUBSTS, which implements the well-formedness test, is in Appendix G.8. Essentially,
each substitution element which maps a variable to another variable must be downward directed in its
mapping with respect to a lexical ordering for variable names. Also, no circularities may exist within the
substitution.
We will factor the proof, using the following lemma, which characterizes DAPPLY-SUBST-LIST-1
rather than DAPPLY-SUBST-LIST.
Lemma DAPPLY-SUBST-LIST-OK-1
For any well-formed substitution s, descriptor td, binding b,
and value v, (where td can be a *dlist containing n descriptors
iff v is a value list of length n, in which case the call to I
is on the list of n descriptors)
(dapply-subst-list s td)
=
(dapply-subst-list (dapply-subst-list-1 s td) td)
(I (dapply-subst-list-1 s td) v b)
Our goal is
(and
H1 (interp-substs (s1 .. sn+1) b)
H2 (I td v b)
H3 (and (interp-substs (s1 .. sn) b) (I td v b))
=>
(I (dapply-subst-list-1 (s1 .. sn) td) v b) )
=>
(I (dapply-subst-list-1 (s1 .. sn+1) td) v b)
(and
H1 (interp-substs (s1 .. sn+1) b)
H2 (I td v b)
H3 (I (dapply-subst-list-1 (s1 .. sn) td) v b) )
=>
(I (dapply-subst-list-1 (s1 .. sn+1) td) v b)
Case 2.1 &i does not appear in (dapply-subst-list-1 (s1 .. sn) td)
Trivial. By definition,
Lemma APPLY-SUBST-OK
(and
H1 (I td v b)
H2 (I tdi (cdr (assoc &i b)) b) )
=>
(I (subst tdi &i td) v b)
(subst tdi &i td) = td, equating the conclusion with the first
hypothesis.
(and
H1 (I td v b)
H2 (I tdi v b)
H3 (I (subst tdi &i tdcar) (car v) b)
H4 (I (subst tdi &i tdcdr) (cdr v) b) )
=>
(I (subst tdi &i td) v b)
(and (consp v)
(I (subst tdi &i tdcar) (car v) b)
(I (subst tdi &i tdcdr) (cdr v) b))
The expansion of H1 gives (consp v), and the other two conjuncts
are from our inductive hypotheses.
H1 expands to
(and
H1 (I (open-rec-descriptor-absolute td) v b)
H2 (I tdi (cdr (assoc &i b)) b) )
=>
(I (subst tdi &i (open-rec-descriptor-absolute td)) v b)
QED.
Here, as in all our other lemmas and proofs, "I" is a shorthand notation representing INTERP-SIMPLE
when its second and third arguments are lists, INTERP-SIMPLE-1 when these arguments are a single
descriptor and value.
Lemma DUNIFY-DESCRIPTORS-OK
Denoting (dunify-descriptors tda tdb substs)
by ((td1 . substs1) .. (tdn . substsn)),
for all v and fully instantiating b,
256 Type Checking for Common Lisp
Proofs of Selected Lemmas
(and
H1 (interp-substs substs b)
H2 (I tda v b)
H3 (I tdb v b))
=>
for some i,
(and (interp-substs substsi b)
(I tdi v b))
For brevity in the exposition of the proof, we will abbreviate SUBSTS with "s", and INTERP-SUBSTS
with "I-S".
Note that this lemma is part of the large nest of unification lemmas which are being proved together by
computational induction. These lemmas include DUNIFY-DESCRIPTORS-INTERFACE-OK, DUNIFY-
DESCRIPTORS-OK, DMERGE-OK, DMERGE-DIFFERENT-SUBST-INTO-SUBSTS-LIST-OK,
DMERGE-SECOND-ORDER-OK, DMERGE-DUNIFIED-FORMS-INTO-SUBSTS-OK, and the various
lemmas which are associated with the special case rules for unifying pairs of *REC descriptors. Within
this collection of proofs, we may use any of these lemmas as an inductive assumption to characterize a
subsidiary call of a function in question. A discussion of the induction appears in Section 7.6.
Proof of Lemma DUNIFY-DESCRIPTORS-OK
The algorithm and the proof are somewhat symmetric in their case
analysis, in the following sense. Whenever we consider a top
level case where tda is some particular kind of descriptor (for
example, Case 2 is where tda is *empty), the next top level case
is where we consider tdb to be the same kind of descriptor. The
only assymetry is that in any tdb case, we know that tda is not
the same kind of descriptor, since we just considered that case
immediately before, and can therefore assume its negation. Since
the proofs of the tdb cases will follow the same line of reasoning
as the tda case preceding it, we generally omit the argument to
save from being tedious.
For each j,
(and (I-S s b) (I tdaj v b) (I tdb v b))
=>
For some i in [1..mj]
(and (I-S sj,i b) (I tdajbi v b))
H2 expands by definition of I to
Lemma DMERGE-OK
H1 (and (I-S s b)
H2 (I &i v b)
H3 (I td v b))
=> for some k in [1..n], (I-S sk b)
C2 (I &i v b))
The proof is analogous to Case 5.1, with &i and &j switching places.
Lemma UNIFY-VAR-EMBEDDED-VAR
(I &i v b)
=>
(not (I (*cons .. &i ..) v b))
36
Examples:
(*rec foo (*or &i (*cons $t $nil) (*cons *universal (*recur foo))))
(*rec bim (*or (*rec bar (*or &i (*cons $integer (*recur bar))))
(*cons $character (*recur bim))))
37
Case 5.4 tdb is a *rec descriptor in which some variable other than
&i can be a top level disjunct.
By definition,
(dunify-descriptors &i tdb s)
= (dunify-descriptors &i (open-rec-descriptor-absolute tdb)38 s)
Our theorem is
H1 (and (I-S s b)
H2 (I &i v b)
H3 (I tdb v b))
=>
For some k,
C1 (and (I-S sk b)
C2 (I &i v b))
38
OPEN-REC-DESCRIPTOR-ABSOLUTE returns the body (or CADDR) of tdb with the *RECUR form for the *REC replaced by
tdb.
260 Type Checking for Common Lisp
Proofs of Selected Lemmas
will always produce a result, and the proof of soundness for this
case is therefore the conjoined proofs for all the rules. The
rules are named Rule Dunify-*Reci for some integer i, and are
stated and proved in Appendix B.7.
Choice 6.2.1:
Apply Canonicalization Rule 2 (which opens *rec descriptors) to
tda. Thus, (dunify-descriptors tda tdb s) =
(dunify-descriptors (open-rec-descriptor-absolute tda) tdb s)
By definition,
(dunify-descriptors (*cons tdacar tdacdr) tdb s) = ()
The only extra baggage is the consp test, which is trivially true.
Type Checking for Common Lisp 261
Proofs of Selected Lemmas
Comment: *dlist forms are never nested within any other descriptors.
Furthermore, they are only unified with other *dlists of the same
length.
We denote that
(dunify-descriptors
(*dlist tda1 .. tdan) (*dlist tdb1 .. tdbn) s)
= (((*dlist tdab1,1 .. tdabn,1) . s1)
..
((*dlist tdab1,m .. tdabn,m) . sm))}
Base case: n = 0
By definition,
(dunify-descriptors (*dlist) (*dlist) s) = (((*dlist) . s))
Let us denote
(dunify-descriptors
(*dlist tda1 .. tdan−1) (*dlist tdb1 .. tdbn−1) s)
= (((*dlist tdab1,1 .. tdabn−1,1) . sn−1,1)
..
((*dlist tdab1,p .. tdabn−1,p) . sn−1,p))}
(dunify-descriptors
(*dlist tda1 .. tdan−1) (*dlist tdb1 .. tdbn−1) s)
gives us
H1, H2, and H3 relieve the antecedents for H4, since the expansions
of I and I-S in H1, H2, and H3 establish a superset of the predicates
produced by the expansions of those calls in the hypotheses for H4.
Thus, we establish the conclusion in H4.
Finally, the second conjunct in the conclusion of H4, plus the second
conjunct of the conclusion of H5 establish the second conjunct of
our goal, and for that same j and k, the first conjunct of H5
establishes the first conjunct of the goal, and we are done.
Case 9.2 tdb = (*cons tdcar tdcdr) for any tdcar and tdcdr.
(dunify-descriptors $integer tdb s) = ()
Trivially true, as with Case 7.1.
QED.
Type Checking for Common Lisp 263
Proofs of Selected Lemmas
Note that DMERGE-OK and several other lemmas proved in this section are part of the large nest of
unification lemmas which are being proved together by computational induction. These lemmas include
DUNIFY-DESCRIPTORS-INTERFACE-OK, DUNIFY-DESCRIPTORS-OK, DMERGE-OK,
DMERGE-DIFFERENT-SUBST-INTO-SUBSTS-LIST-OK, DMERGE-SECOND-ORDER-OK,
DMERGE-DUNIFIED-FORMS-INTO-SUBSTS-OK, and the various lemmas which are associated with
the special case rules for unifying pairs of *REC descriptors. Within this collection of proofs, we may use
any of these lemmas as an inductive assumption to characterize a subsidiary call of a function in question.
A discussion of the induction appears in Section 7.6.
H1 (and (I-S s b)
H2 (I &i v b)
H3 (I td v b))
=> for some k in [1..n], (I-S sk b)
To prove this lemma, we will need some lemmas about subsidiary functions within DMERGE-NEW-
SUBST. DINSERT-SUBST simply inserts a new substitution element for &i into a substitution in which
&i is not already mapped.
Lemma DINSERT-SUBST-OK
H1 (and (I-S s b)
H2 (I &i v b)
H3 (I td v b)
=> (I-S (dinsert-subst &i td s) b)
Lemma DREPLACE-SUBST-OK
For any type variable &i, substitution s with &i in its domain,
descriptor td, binding b, and Lisp value v,
H1 (and (I-S s b)
H2 (I &i v b)
H3 (I td v b)
=> (I-S (dreplace-subst &i td s) b)
Lemma DMERGE-DIFFERENT-SUBST-INTO-SUBSTS-LIST-OK
(and
H1 (I-S dus b)
H2 for some k in [1..o], (I-S sk b) )
=> for some i in [1..n], (I-S smi b)
Proof of DMERGE-DIFFERENT-SUBST-INTO-SUBSTS-LIST-OK
By definition,
(DMERGE-DIFFERENT-SUBST-INTO-SUBSTS-LIST &i (td . s) (s1 .. so))
is the append of the results of
(DMERGE-NEW-SUBST &i (cdr (assoc &i s)) sj) for each j in 1..o.
So we will also want to use an inductive hypothesis on
DMERGE-NEW-SUBST, which is Lemma DMERGE-OK:
Since we are trying to prove our lemma for the case where o = m + 1,
H2 gives us:
Since these substs are part of the result, one of them will
satisfy the conclusion. QED
Lemma DMERGE-SECOND-ORDER-OK
Type Checking for Common Lisp 265
Proofs of Selected Lemmas
Proof of DMERGE-SECOND-ORDER-OK
We will induct on p. If p = 0, the result (sm1 .. smn)
is equal to (s1 .. sm). Thus, H2 equates with the conclusion.
(DMERGE-DIFFERENT-SUBST-INTO-SUBSTS-LIST
&j+1 (dutd . dus) (smj1 .. smjo)),
giving us
Lemma DMERGE-DUNIFIED-FORMS-INTO-SUBSTS-OK
39
This is a notational convenience which may be somewhat misleading. These variables need not be contiguously named, nor is &1
required to be among them. We mean merely to say that there is a list of variable names of length p, each of which is in the domain
of dus.
40
It is known that the bindings of those variables are different from the ones in the s1 .. sm.
266 Type Checking for Common Lisp
Proofs of Selected Lemmas
H1 (and (I-S s b)
H2 for some j in [1..m], (and (I-S ssj b) (I tdsj v b))
H3 (I &i v b))
=> for some k in [1..n] (I-S sk b)
Proof of DMERGE-DUNIFIED-FORMS-INTO-SUBSTS-OK
Case 1. ssj contains no substs (other than for &i) which differ
from s.
Case 2. ssj contains substs (other than for &i) which differ from s.
By definition,
(I &i v b)
(I tdsj v b)
=> (I-S (dreplace-subst &i tdsj msol) b)
H1 (and (I-S s b)
H2 (I &i v b)
H3 (I td v b))
=> for some k in [1..n], (I-S sk b)
(dmerge-dunified-forms-into-substs
((tdsi1 . ssi1) .. (tdsim . ssim)) &i s)
= (s1 .. sn)
This section contains the proofs of some, but not all of special-case rules employed by DUNIFY-
DESCRIPTORS to handle the situations where both its DESCRIPTOR1 and DESCRIPTOR2 arguments
are *REC descriptors. The rules are described and listed in Section 6.6.3. Each rule corresponds with a
lemma which validates the application of the rule. The collected proofs of these lemmas are what satisfy
Case 6.1 of the proof of DUNIFY-DESCRIPTORS-OK.
For selected rules, we state and prove the associated lemmas here. In many respects, the rules are very
similar. Because of this similarity, the proofs of the corresponding lemmas are very similar in style and
technique. Rather than burdening this document with the exhaustive proofs of all the rules, we present
selected examples which are sufficient to show the proof techniques to provide convincing evidence that
the other rules can be proven sound by similar arguments. The rules proved include some of the simplest
and the most challenging, since the simple ones exemplify the proof style most succinctly, and the
difficult ones bear the most scrutiny. The rule base is intended to be extendable, so other rules could be
added, and their proofs would almost certainly follow the models given here.
Recall, as discussed in Section B.5, that each of these lemmas is included in the large nest of lemmas
about mutually recursive functions in the unifier. The proof of this collection of lemmas is by a grand
computational induction on the length of the unification computation. Any lemma in the nest may be used
as an inductive assertion corresponding to the result of any subsidiary call in the algorithm. The *REC
rules typically include recursive calls to DUNIFY-DESCRIPTORS-INTERFACE in cases where the
arguments are known to be variable-free, or to DUNIFY-DESCRIPTORS for arguments which could
contain variables. In the first case, we use DUNIFY-DESCRIPTORS-INTERFACE-OK as the inductive
assertion, in the latter we use DUNIFY-DESCRIPTORS-OK. As with the other lemmas in the nest, the
*REC rule lemmas are partial correctness results, since we have not give a termination proof.
Here, as in all our other lemmas and proofs, "I" is a shorthand notation representing INTERP-SIMPLE
when its second and third arguments are lists, INTERP-SIMPLE-1 when these arguments are a single
descriptor and value. Also, we abbreviate INTERP-SUBSTS with "I-S". Furthermore, each rule is stated
in terms of named *REC descriptors, with names like "foo", "bar", and "bim". In each case, after having
introduced these descriptors, we refer to them simply by name. Finally, arbitrary component descriptors
of the *RECs are typically named di, where i is some integer, and we use dij to signify the result returned
by (DUNIFY-DESCRIPTORS-INTERFACE di dj).
Type Checking for Common Lisp 269
Proofs of Selected Lemmas
This is a very simple *REC rule. We do not have to worry about variables at all; they cannot occur in a
replicating component of a *REC descriptor, so D1 and D2 are variable-free. (See the discussion of
replicating components in Section 5.2.) Thus, substitutions are not an issue. The substitution s never gets
extended, and it appears in the result intact.
Our rule for this case says that the definition of (DUNIFY-DESCRIPTORS foo bar s) is the single pair
formed by s and a *REC descriptor which is of the same form as foo but with
(DUNIFY-DESCRIPTORS-INTERFACE d1 d2) replacing d1. Thus, the lemma which corresponds to
our rule is:
Lemma DUNIFY-*REC1-OK
H1 - H3 are the standard hypotheses about the arguments to DUNIFY-DESCRIPTORS in lemmas about
the unifier. H4 specifies the result returned from the recursive call to DUNIFY-DESCRIPTORS-
INTERFACE, tying it to the use of the same term d12 in bim. The conclusion is the image of the
conclusion to Lemma DUNIFY-DESCRIPTORS-OK for this case, stated in terms of the result
constructed according to the rule.
Proof of Lemma DUNIFY-*REC1-OK
Case 1: v = nil
H5 (I d12 vcar b)
H6 (I bim vcdr b)
Our case assumption establishes (consp v), H5 gives (I d12 vcar b),
and H6 gives (I bim vcdr b), establishing the goal and completing
the case analysis. QED.
Rule Dunify-*Rec3
Where d2, d3, and d4 contain no variables and d3 is either a
primitive descriptor or a disjunction of primitive descriptors,
and denoting
(dunify-descriptors &j (*rec bar (*or d3 (*cons d4 (*recur bar)))) s)
= ((&j . s1) .. (&j . sn)),
Rule Dunify-*Rec3, stated as a theorem in terms of the interpreter, and assuming the constraints we
placed on our descriptors above, is:
Lemma DUNIFY-*REC3-OK
Case 1. v is atomic
for some i,
(and (I-S si b) (I &j v b))
for some i,
(and (I-S si b) (I &j v b))
H6 (I d24 (car v) b)
H7 for some i,
(and (I-S si b) (I bim (cdr v) b))
goal, thus establishing the goal for this case and completing
the case analysis. QED.
Rule Dunify-*Rec5
(dunify-descriptors
(*rec foo (*or d1 (*cons d2 (*recur foo))))
(*rec bar (*or d3 (*cons d4 (*recur bar))))
s)
=
(((*rec bim (*or d13 d14bar d32foo (*cons d24 (*recur bim)))) . s))
Rule Dunify-*Rec5 is very general, in that it does not require d1 or d3 to be cons-free. Since all the
non-replicating terms in a *REC body can be gathered into an *OR, d1 and d3 represent any such terms,
with the only other restriction being that they are variable-free. The lemma corresponding to the rule is:
Lemma DUNIFY-*REC5-OK
C1 (or (I d13 v b)
C2 (I d14bar v b)
C3 (I d32foo v b)
C4 (and (consp v) (I d24 (car v) b) (I bim (cdr v) b)))
H2’ (or (I d1 v b)
(and (consp v) (I d2 (car v) b) (I foo (cdr v) b)))
H3’ (or (I d3 v b)
(and (consp v) (I d4 (car v) b) (I bar (cdr v) b)))
Case 1. v is atomic
Type Checking for Common Lisp 273
Proofs of Selected Lemmas
This establishes the first disjunct of H2’ and H3’, since the
consp tests will fail. As suggested by H4, instantiate Lemma
DUNIFY-DESCRIPTORS-INTERFACE-OK with tda = d1, tdb = d3, v = v,
and b = b. H2’ and H3’ relieve its hypotheses, giving us the
conclusion,
(I d13 v b)
H8 (I d24 (car v) b)
H9 (I bim (cdr v) b)
Rule Dunify-*Rec11
Where d2, d3, and d4 contain no variables, and adopting the notation
Rule Dunify-*Rec11, stated as a theorem in terms of the interpreter, and assuming the constraints we
placed on our descriptors above, is:
Lemma DUNIFY-*REC11-OK
For any Lisp value v, binding b covering foo and bar, and
substitution s,
(and
H1 (I-S s b)
H2 (I foo v b)
H3 (I bar v b)
H4 (equal (dunify-descriptors-interface d2 d4) d24)
H5 (equal (dunify-descriptors &j bar s)
((&j . s1) .. (&j . sn)))
H6 (equal (dunify-descriptors d3 (*cons d2 foo) s)
((d32foo1 . sd31) .. (d32foom . sd3m))) )
=>
(or
C1 for some i in 1..n,
(and (I-S si b)
(I (*rec bim (*or &j (*cons d24 (*recur bim)))) v b))
C2 for some k in 1..m
(and (I-S sd3k b)
(I (*rec bamk (*or d32fook (*cons d24 (*recur bamk)))) v b)))
C1 expands by definition of I to
C2 expands by definition of I to
Case 1. v is an atom
This rules out the consp disjuncts in H2’ and H3’. Instantiate
Lemma DUNIFY-DESCRIPTORS-OK as suggested by H5, with tda = &j,
tdb = bar, v = v, substs = s, and b = b. H1, H2’, and H3 relieve
its hypotheses, establishing the conclusion:
for some i,
(and (I-S si b) (I &j v b))
H7 (I d24 (car v) b)
The canonicalization rules are stated in the form "td1 ==> td2", where td1 is a generic representation of
descriptors eligible for the canonicalization, and td2 is the representation of the form into which those
descriptors are canonicalized. By "==>" we mean only that
(PCANONICALIZE-DESCRIPTOR TD1) = TD2
Although the canonicalization is directed, each rule happens to be an equality, in the sense that td1 and td2
represent the same set of values under any interpretation by INTERP-SIMPLE. All that is required for
soundness of each rule "td1 ==> td2" is proof of an associated lemma of the form:
For all value v and binding b,
(I td1 v b)
=>
(I td2 v b)
Since it would be tedious to restate each rule in this form, we simply provide the rule notation, but the
proofs follow the INTERP-SIMPLE model. Each rule is annotated with the name of the function in which
it is implemented.
Here, as in all our other lemmas and proofs, "I" is a shorthand notation representing INTERP-SIMPLE
when its second and third arguments are lists, INTERP-SIMPLE-1 when these arguments are a single
descriptor and value.
Rule 1:
(*OR $CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL
$NON-T-NIL-SYMBOL $STRING $T (*CONS *UNIVERSAL *UNIVERSAL))
==> *UNIVERSAL
From function PCONSOLIDATE-UNIVERSAL-DESCRIPTORS
The reverse implication is also straightforward. It requires the knowledge that the only data items in our
world are characters, integers, NIL, non-integer rational numbers, symbols other than T or NIL, strings, T,
and conses whose atomic leaves are among the aforementioned. This, of course, is a basic and valid
assumption in the problem domain. Thus, proof is simply to show by cases in the data space that for some
disjunct of the *OR descriptor, (I <disjunct> v b) holds for any v and b. I.e., if the object v were a
character, then for any b,
(I (*or $character $integer $nil $non-integer-rational
$non-t-nil-symbol $string $t (*cons *universal *universal))
v
b)
= T
Type Checking for Common Lisp 277
Proofs of Selected Lemmas
since this call to I would expand to a disjunction of calls, one of which, (I $character v b), would hold.
The same kind of argument would apply to all other types of data in the world.
Rule 2:
(*rec foo (... (*recur foo) ...))
==> (... (*rec foo (... (*recur foo) ...)) ...)
from function OPEN-REC-DESCRIPTOR-ABSOLUTE
Comment: This is a little strange, because the very function which performs this canonicalization is part of
the definition of the semantics. When the interpreter encounters a *REC form, it calls itself recursively
with the argument (OPEN-REC-DESCRIPTOR-ABSOLUTE form). Thus, the validity of this
canonicalization is trivially established.
Proof:
(I (*rec foo (... (*recur foo) ...)) v b)
= by expanding the definition of I
(I (... (*rec foo (... (*recur foo) ...)) ...) v b)
Rule 3:
(... (*rec foo (... (*recur foo) ...)) ...)
==> (*rec foo (... (*recur foo) ...))
From function PFOLD-RECS-IF-POSSIBLE
Comment: By this notation we indicate that the forms represented by the (... ...) outside the *REC
descriptor are identical to those represented by the (... ...) inside the *REC but outside the (*RECUR
FOO). Moreover, we intend that there may be more than one (*RECUR FOO) inside the *REC, and in
this case, the larger form would have matching occurrences of the *REC.
Proof:
This follows directly from the proof for Rule 2. If we apply the
interpreter to the right hand side and let it run one step, we
reduce the right hand side to the left hand side.
Rule 4:
(*or .. (*or d1 .. d2) ..) ==> (*or .. d1 .. d2 ..)
From function PCANONICALIZE-OR-DESCRIPTOR
Proof:
(I (*or .. (*or d1 .. d2) ..))
= (by definition of I)
(or ..
(I (*or d1 .. d2) v b)
.. )
= (by definition of I)
(or (or (I d1 v b)
..
(I d2 v b))
.. )
= (by associativity of OR)
(or ..
(I d1 v b)
..
(I d2 v b)
.. )
and
(I (*or .. d1 .. d2 ..) v b)
= (by definition of I)
(or ..
(I d1 v b)
..
(I d2 v b)
.. )
Thus, the two sides are equivalent.
278 Type Checking for Common Lisp
Proofs of Selected Lemmas
Rule 5:
(*or .. d1 .. d1 ..) ==> (*or .. d1 .. ..)
From function PCANONICALIZE-OR-DESCRIPTOR
Comment: This rule characterizes any multiple occurrence of a descriptor as arguments of an *or.
Proof:
(I (*or .. d1 .. d1 ..) v b)
= (by definition of I)
(or ..
(I d1 v b)
..
(I d1 v b)
.. )
= (by tautology)
(or ..
(I d1 v b)
..
..)
= (by definition of I)
(I (*or .. d1 .. ..) v b)
Rule 6.
(*or .. *empty .. ) ==> (*or .. .. )
From function PCANONICALIZE-OR-DESCRIPTOR
Comment: This rule characterizes any occurrence of *EMPTY as an argument of an *OR. The proof is
analogous in any such scenario.
Proof:
(I (*or .. *empty .. ) v b)
= (by definition of I)
(or ..
(I *empty v b)
.. )
= (by definition of I in the *empty case)
(or ..
nil
..)
= (by disjunctive syllogism)
(or ..
.. )
= (by definition of I)
(I (*or .. ..) v b)
Rule 7.
Where d does not contain a (*recur foo),
(*or d (*rec foo (*or d .. (*recur foo))))
==> (*rec foo (*or d .. (*recur foo)))
From function PREMOVE-RECURSIVE-EXPANSION-DUPLICATES
Proof:
(*or d (*rec foo (*or d .. (*recur foo))))
= (by Rule 2)
(*or d (*or d .. (*rec foo (*or d .. (*recur foo)))))
= (by Rule 5)
(*or d .. (*rec foo (*or d .. (*recur foo))))
= (by Rule 3)
(*rec foo (*or d .. (*recur foo)))
Rule 8:
(*or (*cons d1 d2) (*cons d3 d2)) ==> (*cons (*or d1 d3) d2)
From PMERGE-OR-DESCRIPTOR-CONSES
Proof:
(I (*or (*cons d1 d2) (*cons d3 d2)) v b)
= (by definition of I)
(or (I (*cons d1 d2) v b)
Type Checking for Common Lisp 279
Proofs of Selected Lemmas
Rule 9:
(*or (*cons d1 d2) (*cons d1 d3)) ==> (*cons d1 (*or d2 d3))
From PMERGE-OR-DESCRIPTOR-CONSES
Proof:
As with Rule 8.
Rule 10:
280 Type Checking for Common Lisp
Proofs of Selected Lemmas
Proof:
(I *universal v b) = T, so the proof is trivial.
Rule 11:
(*or d) ==> d
I.e., the *OR encapsulates only one disjunct.
From PCANONICALIZE-OR-DESCRIPTOR
Proof:
(I (*or d) v b)
= (by definition of INTERP-SIMPLE-OR)
(or (I d v b)
nil)
= (by definition of OR)
(I d v b)
Rule 12:
(*or) ==> *empty
Here, the *OR encapsulates no disjuncts.
From PCANONICALIZE-OR-DESCRIPTOR
Proof:
(I (*or) v b)
= (by definition of INTERP-SIMPLE-OR)
NIL
(I *empty v b)
= (by definition of INTERP-SIMPLE-OR)
NIL
Rule 13:
(*OR .. d1 .. d2 ..) ==> (*OR .. d2 .. d1 ..)
From PCANONICALIZE-OR-DESCRIPTOR
Comment: This rule allows the placement of *OR disjuncts into a canonical order, determined by the
lexical ordering function DESCRIPTOR-ORDER.
Proof:
(I (*or .. d1 .. d2 ..) v b)
= (by definition of INTERP-SIMPLE-OR)
(or ..
(I d1 v b)
..
(I d2 v b)
..)
= (by commutativity of OR)
(or ..
(I d2 v b)
..
(I d1 v b)
..)
= (by definition of INTERP-SIMPLE-OR)
(I (*or .. d2 .. d1 ..) v b)
Rule 14:
(*cons *empty d) ==> *empty
from PCANONICALIZE-CONS-DESCRIPTOR
Proof:
(I (*cons *empty d) v b)
= (by definition of I)
(and (consp v)
(I *empty v b)
(I d v b))
= (by definition of I)
Type Checking for Common Lisp 281
Proofs of Selected Lemmas
(and (consp v)
nil
(I d v b))
= (by simplification)
nil
= (by definition of I)
(I *empty v b)
Rule 15:
(*cons d *empty) ==> *empty
From PCANONICALIZE-CONS-DESCRIPTOR
Rule 16:
(*rec foo (*or tdfoo (*recur foo))) ==> (*rec bar tdbar)
where tdbar = (subst (*recur bar) (*recur foo) tdfoo)
and bar is a new *rec name and hence (*recur bar) does not occur in
tdfoo.
Comment: The critical feature of the left hand side is that the (*RECUR FOO) form is a top-level disjunct
in the body of the *REC. The idea motivating this canonicalization is that since the (*RECUR FOO) adds
no information not already present in the rest of the descriptor, it can be eliminated. The *REC must be
renamed, since all occurrences of *REC forms with the same label in any given context must be identical.
Proof:
We wish to prove that for any value v and binding b,
(and (I (*rec foo (*or tdfoo (*recur foo))) v b)
(equal tdbar (subst (*recur bar) (*recur foo) tdfoo))
for any x,
(equal (subst x (*recur bar) tdfoo) tdfoo)
=>
(I (*rec bar tdbar) v b)
Case 1. td = *empty
This falsifies H1.
Choose any tdi for which H1 holds and use the lemma as an
inductive assertion, instantiating with td = tdi, v = v,
and b = b. This establishes one disjunct of the goal, which
suffices.
(I (open-rec-descriptor-absolute td) v b)
Type Checking for Common Lisp 283
Proofs of Selected Lemmas
(I (open-rec-descriptor-absolute
(subst (*rec bar td1bar)
(*rec foo (*or td1foo (*recur foo)))
td)))
v b)
Rule 17:
Where within (.. (*RECUR FOO) ..) there is no *OR enclosing the
(*RECUR FOO), but some *CONS enclosing it,
Comment: With this notation, we indicate a *REC structure with no terminating disjunct. Thus, it can
represent only infinite objects, and since these are not allowed in our subset, we reduce the form to
*EMPTY.
Proof:
Let us perform the proof on a canonical example, so we avoid getting
tangled up in notation. We will prove
We will prove this by induction on the depth (in the CDR) of the data
object v.
(AND (CONSP V)
(I d (CAR V) B)
(I *EMPTY (CDR V) B))
QED
Rule 18:
Where (*dlist d1 .. dfoo .. dn) and (*dlist d1 .. dbar .. dn)
are identical except that they differ in one position, dfoo vs. dbar:
(*or ... (*dlist d1 .. dfoo .. dn) ... (*dlist d1 .. dbar .. dn) ...)
= (*or ... (*dlist d1 .. (*or dfoo dbar) .. dn) ...)
From function PCANONICALIZE-OR-DESCRIPTOR
Comment: An *OR of *DLISTS is not a conventional descriptor. But it is formed on exit from DUNIFY-
DESCRIPTORS-INTERFACE when we call it with *DLIST descriptors, whenever multiple results are
returned from its call to DUNIFY-DESCRIPTORS. Thus, rather than there being a semantic notion of an
*OR of *DLISTs, this packaging is just a protocol with any function which calls DUNIFY-
DESCRIPTORS-INTERFACE with *DLIST arguments, allowing DUNIFY-DESCRIPTORS-
INTERFACE to behave polymorphically. The caller has the obligation to unpack the *OR. and to treat
the list of results as a disjunction of possibilities. Hence, this canonicalization allows a merging of results.
Proof:
We will let v be the list of values (v1 .. vfoobar .. vn).
..
(I dn vn b))
= distributing OR over AND and reducing all instances of
(OR x x) to x
(and (I d1 v1 b)
..
(or (I dfoo vfoobar b)
(I dbar vfoobar b))
..
(I dn vn b))
= by definition of I
(and (I d1 v1 b)
..
(I (*or dfoo dbar) vfoobar b)
..
(I dn vn b))
= by definition of I
(I (d1 .. (*or dfoo dbar) .. dn)
(v1 .. vfoobar .. vn)
b)
QED
The lemmas proved in this section are CONTAINED-IN-OK and ICONTAINED-IN-OK and some
lemmas associated with their *REC rules, ICONTAINED-IN-EQUAL-TDS, which is subsidiary to
ICONTAINED-IN-OK, and UNIVERSALIZE-SINGLETON-VARS-1-OK, which is subsidiary to
CONTAINED-IN-INTERFACE-OK.
CONTAINED-IN-OK is the soundness lemma for the CONTAINED-IN algorithm, which is employed by
CONTAINED-IN-INTERFACE whenever neither of the descriptors contain type variables. The main
part of the proof is in the section which follows immediately. One case in this proof is deferred in this
section. This is the case where both DESCRIPTOR1 and DESCRIPTOR2 are *REC descriptors. The
proof approach for this case is in the second section to follow.
Here, as in all our other lemmas and proofs, "I" is a shorthand notation representing INTERP-SIMPLE
when its second and third arguments are lists, INTERP-SIMPLE-1 when these arguments are a single
descriptor and value.
Lemma CONTAINED-IN-OK
For any descriptors td1 and td2, Lisp value v, and binding b1,
(and
H1 (null (gather-variables-in-descriptor td1))
H2 (null (gather-variables-in-descriptor td2))
H3 (contained-in td1 td2)
H4 (I td1 v b1))
=>
Type Checking for Common Lisp 287
Proofs of Selected Lemmas
The proof of soundness of CONTAINED-IN is actually the "simultaneous" proof of the conjunction of a
collection of lemmas whose proofs rely on one another. The nest includes Lemma CONTAINED-IN-OK
and the correctness lemmas for all the *REC containment rules. The proof is by computational induction
on the length of the computation by CONTAINED-IN. This allows the assumption of any appropriate
lemma within the nest to characterize the result of any recursive call in the computation subsidiary to the
top level call.
Recall that CONTAINED-IN also takes a third argument, TERM-RECS, which is employed solely as a
mechanism for terminating the computation in certain cases. As such, it has no effect on the soundness
argument, which does not rely on termination, and to avoid clutter in the lemmas and proofs, we simply
omit mention of it. One could think of it as being a universally quantified variable in the lemma with no
particular demands placed on it.
Proof of Lemma CONTAINED-IN-OK
Since td1 and td2 are variable-free and the algorithm introduces no
new variables, b1 and b2 are irrelevant to the evaluation of
(I td1 v b1) and (I td2 v b2). We fix b2 to equal b1 and simply
refer to b to eliminate any confusion. Now proceed by cases.
Case 7.3. td2 falls under neither Case 7.1 nor 7.2.
By definition, (contained-in td1 td2)
= (contained-in (open-rec-descriptor-absolute td1) td2).
This equality is maintained under canonicalization Rule 2 (See
Section 6.7) for opening *rec descriptors.
Therefore, we appeal to the inductive hypothesis, instantiated
with td1 = (open-rec-descriptor-absolute td1).
Case 10. td1 = (*dlist td11 .. td1n) and td2 = (*dlist td21 .. td2n)
v is a list of values (v1 .. vn).
By definition,
Type Checking for Common Lisp 289
Proofs of Selected Lemmas
For any descriptors td1i and td2i, value vi, and binding b,
(and (null (gather-variables-in-descriptor td1i))
(null (gather-variables-in-descriptor td2i))
(contained-in td1i td2i)
(I td1i vi b))
=>
(I td2i vi b)
Case 11. td1 = (*cons td1car td1cdr) and td2 = (*cons td2car td2cdr)
This is analogous to Case 10, where the lists of arguments are
of length 2, the descriptor constructor is *cons rather than *dlist,
and the value is (cons (car v) (cdr v) rather than (v1 .. vn).
The proof is exactly analogous to Case 10, with these surface issues
taken into consideration.
Case 12. td1 = (*cons td1car td2cdr) and td2 is a primitive descriptor
(contained-in td1 td2) = nil, falsifying the antecedent H3.
Case 13. td1 is a primitive descriptor and td2 = (*cons td2car td2cdr).
(contained-in td1 td2) = nil, falsifying the antecedent H3.
Case 14. td1 = $integer (Without loss of generality, this same argument
applies to the cases for other primitive descriptors)
This section gives the proof approach for the case where both DESCRIPTOR1 and DESCRIPTOR2 are
*REC descriptors, which is handled by application of a collection of special purpose rules. Each rule has
an enabling condition which is essentially a pattern matching predicate on the forms of the descriptors.
Each rule also has an action, which determines whether containment exists for that case. The first such
rule which is enabled for given arguments makes the complete determination. Collectively, the proofs of
soundness of all these rules is sufficient to prove the soundness of the algorithm for this case, since
collectively they provide an exhaustive case analysis.
Each rule coincides with a lemma stated in terms of INTERP-SIMPLE. The statement of these lemmas
and their proofs are supplied for only some of the rules stated here, but these examples should provide
adequate demonstration that the other rules are provable by the same techniques. In particular, Rule
Contain-*REC2 was chosen for proof because it is a simple rule illustrating the basic technique, and Rule
Contain-*REC5 was chosen because it is perhaps the most complex of the rules.
Recall that the lemma corresponding to each rule is a member of the nest of lemmas, including
CONTAINED-IN-OK, which we are proving simultaneously via computational induction on the length of
the computation by CONTAINED-IN. This allows the assumption of each lemma with regard to every
subsidiary call within the computation.
There are numerous rules, and they are all stated in Section 6.8.3. The rules are stylistically quite similar.
Most reduce the problem to one or more containment problems for component descriptors, and hence the
assumption of CONTAINED-IN-OK in their proofs.
Since the CONTAINED-IN algorithm is only invoked when both of its arguments are variable-free, we
have a standing assumption on all the rules that the descriptors in question are variable-free. Thus their
components are also variable-free, and CONTAINED-IN may be called recursively.
Many of the rules have a NIL result, indicating that containment does not exist. The soundness proofs for
these rules are trivial, since returning a NIL result negates one of the antecedents to the lemma which
would correspond to any such rule.
Furthermore, there are some rules whose names end with a "’". These rules are generally symmetric
variants of the rules with the same, but unprimed, names. Their proofs would be similarly symmetric.
Rule Contain-*REC2 has just such a variant.
The rules are employed in the following manner. When CONTAINED-IN is called with the two *REC
descriptors, it invokes each rule in sequence until it finds one whose enabling condition establishes that
the *REC descriptors are of the form indicated by canonical example as FOO and BAR. The rule then
indicates the definition of CONTAINED-IN for *REC descriptors of that form. The lemmas are
generically of the same following form as CONTAINED-IN-OK:
For any variable-free *rec descriptors of the form foo and bar,
Lisp value v, and type variable binding b1,
but with the (CONTAINED-IN FOO BAR) hypothesis replaced by the right hand side of the containment
Type Checking for Common Lisp 291
Proofs of Selected Lemmas
rule. This simply reflects the fact that for this case, the right hand side provides the definition of
CONTAINED-IN for this case.
This rule reduces the containment problem for *REC descriptors of the form characterized by FOO and
BAR to the containment problem for D1 and D2.
We wish to prove a lemma validating this rule as an inference in the CONTAINED-IN algorithm. The
rule corresponds to and is justified by the lemma:
Lemma CONTAIN-*REC2-OK
For any (variable-free) descriptors d1 and d2, where foo is a
descriptor of the form (*rec foo (*or $nil (*cons d1 (*recur foo))))
and bar is a descriptor of the form
(*rec bar (*or $nil (*cons d2 (*recur bar)))),
for any Lisp value v, and type variable binding b1,
(and
H1 (null (gather-variables-in-descriptor foo))
H2 (null (gather-variables-in-descriptor bar))
H3 (contained-in d1 d2)
H4 (I foo v b1))
=>
For some b2, (I bar v b2)
For any descriptors d1, d2, d3, and d4, where foo is a descriptor of
the form (*rec foo (*or d1 (*cons d2 (*recur foo)))) and bar is a
descriptor of the form (*rec bar (*or d3 (*cons d4 (*recur bar)))),
For any value v and binding b1,
(and
H1 (null (gather-variables-in-descriptor foo))
H2 (null (gather-variables-in-descriptor bar))
H3 (or (and (contained-in d1 d3) (contained-in d2 d4))
(contained-in foo d3)
(and (contained-in d1 bar) (contained-in d2 d4)))
H4 (I foo v b1))
=>
For some binding b2, (I bar v b2)
Since foo and bar are variable-free, then d1, d2, d3, and d4, by
virtue of being syntactically contained in either foo or bar, are
also variable-free, and we will proceed under that notion without
further mention. Furthermore, there is no need to be concerned
about the bindings b1, b2, or any others, since because there
are no variables in the problem state, they never come into play
in the evaluation of I. So let us just use b1 as a witness for
any binding.
(or (I d3 v b1)
(and (consp v) (I d4 (car v) b1) (I bar (cdr v) b1)))
B.9 ICONTAINED-IN-OK
ICONTAINED-IN-OK is the soundness lemma for the ICONTAINED-IN algorithm, which is employed
by CONTAINED-IN-INTERFACE whenever either of the descriptors contain type variables. The main
part of the proof is in the section which follows immediately. One of its more significant cases is factored
into Lemma ICONTAINED-IN-EQUAL-TDS, which is proved in the next section. Also deferred in the
main text is the case where both DESCRIPTOR1 and DESCRIPTOR2 are *REC descriptors. The proof
approach for this case is in the third section to follow.
Here, as in all our other lemmas and proofs, "I" is a shorthand notation representing INTERP-SIMPLE
when its second and third arguments are lists, INTERP-SIMPLE-1 when these arguments are a single
descriptor and value.
Type Checking for Common Lisp 295
Proofs of Selected Lemmas
Lemma ICONTAINED-IN-OK:
For any descriptors td1 and td2, simple mapping m, Lisp value v,
symbolic value reference vref, and binding b1 covering the
variables in td1,
(and
H1 (disjoint (gather-variables-in-descriptor td1)
(gather-variables-in-descriptor td2))
H2 (well-formed-mapping m vref b1)
H3 (icontained-in td1 (dapply-subst-list-1 m td2) vref)
H4 (I td1 v b1) )
=>
For some b2, (I td2 v b2)
Note: Recall that ICONTAINED-IN takes another argument, TERM-RECS, which is employed solely as
a mechanism for terminating the computation in certain cases. As such, it has no effect on the soundness
argument, which does not rely on termination. To avoid clutter in the lemmas and proofs, we simply omit
further mention of it, though one could think of it as being universally quantified and arbitrarily
instantiable.
Proof of Lemma ICONTAINED-IN-OK
The EVAL function used here is the normal Lisp EVAL, since this
allows MAKE-BINDING-FROM-MAPPING to be executable in a normal
Lisp world. In our formal world, this call of EVAL corresponds
to a call to E with the same form, any binding (since the form
is a ground term), any world containing the SUBRs CAR, CDR,
DLIST-ELEM and REC-TAIL, and any positive integer clock. Since
the only functions invoked are SUBRs, no clock ticks are
consumed. (Although DLIST-ELEM and REC-TAIL are not in our initial
database of function signatures, they certainly could be. We chose
not to include them because the scenario just described here is
somewhat hypothetical and is relevant only to the proof, not to
the normal operation of the system.)
For any descriptors td1 and td2, simple mapping m, Lisp value v,
symbolic value reference vref, and binding b1 covering the
variables in td1,
(and
H1 (disjoint (gather-variables-in-descriptor td1)
(gather-variables-in-descriptor td2))
H2 (well-formed-mapping m vref b1)
H3 (icontained-in td1 (A-S m td2) vref)
H4 (I td1 v b1) )
=>
(I td2 v (make-binding-from-mapping m v b1 vref))
Since there are no value references in td1, our case assumption rules
out the possibility that m maps &i to a value ref.
Lemma ICONTAINED-IN-EQUAL-TDS
(and
H1 (disjoint (gather-variables-in-descriptor td1)
(gather-variables-in-descriptor td2))
H2 (well-formed-mapping m vref b1)
H3 (I td1 v b1)
H4 (equal td1 (dapply-subst-list-1 m td2)) )
=>
(I td2 v (make-binding-from-mapping m v b1 vref))
Note: The "equality" test the system employs to compare the first
argument of icontained-in to the disjuncts is really an isomorphism
test which allows the disjunct to be identical except for the
name used as a *rec label. A trivial observation is that for
any two name-isomorphic *rec descriptors rec1 and rec2,
for all v and b, (I rec1 v b) => (I rec2 v b). Thus, name
isomorphism is all that is required here.
Case 9.3. (A-S-1 m td2) falls under neither Case 9.1 nor 9.2.
By definition, (icontained-in td1 (A-S-1 m td2) vref)
298 Type Checking for Common Lisp
Proofs of Selected Lemmas
Since the cases where td1 is one of td2, *universal, *empty, an *or,
or a *rec have been covered above, we know td1 is a *cons or a
primitive.
Also by definition of I,
(I td2 v b2) = (or (I td21 v b2) .. (I td2n v b2)).
Lemma A-S-1-OPEN-REC-COMMUTE
For any *rec descriptor td and well-formed mapping m,
By definition,
(icontained-in
(*dlist td11 .. td1n)
(*dlist (A-S-1 m td21) .. (A-S-1 m td2n))
vref)
=
(and (icontained-in td11 (A-S-1 m td21) (dlist-elem vref 1))
..
(icontained-in td1n (A-S-1 m td2n) (dlist-elem vref n)))
(and
H1 (disjoint (gather-variables-in-descriptor td1i)
(gather-variables-in-descriptor td2i))
H2 (well-formed-mapping m (dlist-elem vref i) b1)
H3 (icontained-in td1i (A-S-1 m td2i) (dlist-elem vref i))
H4 (I td1i vi b1) )
=>
(I td2i
vi
(make-binding-from-mapping m vi b1 (dlist-elem vref i)))
Case 13. td1 = (*cons td1car td1cdr) and td2 = (*cons td2car td2cdr)
This is handled by ICONTAINED-IN analogously to Case 12, where
the length of the *dlist is two, where *cons is the constructor
instead of *dlist, where car and cdr are the accessors rather
than dlist-elem, and the value v is a cons rather than a list
of values. The argument is exactly analogous.
By definition,
(icontained-in td1 (A-S-1 m td2) vref) = nil,
falsifying H3.
By definition,
(icontained-in td1 (*cons td2car td2cdr) vref) = nil,
falsifying H3.
By definition,
(icontained-in td1 (A-S-1 m td2) vref) = nil,
falsifying H3.
41
Lemma ICONTAINED-IN-EQUAL-TDS
(and
H1 (disjoint (gather-variables-in-descriptor td1)
(gather-variables-in-descriptor td2))
H2 (well-formed-mapping m vref b1)
H3 (I td1 v b1)
H4 (equal td1 (dapply-subst-list-1 m td2)) )
=>
(I td2 v (make-binding-from-mapping m v b1 vref))
Then (A-S-1 m td2) = td2. But by H4, td1 = td2, and by H1,
the variables in td1 and td2 are distinct. Since td2 is a
variable, we have a contradiction.
= (by definition of I)
(equal v (cdr (assoc &j b1)))
By the definition of make-binding-from-mapping,
(make-binding-from-mapping m v b1 vref) includes the element
(&i . (cdr (assoc &j b1))). Therefore, in the conclusion,
(I &i v (make-binding-from-mapping m v b1 vref))
=
(equal v (cdr (assoc &i b1)))
which we established above.
By this lemma,
(A-S-1 m td2)
=
(*or (A-S-1 m td21) .. (A-S-1 m td2n))
By this lemma,
(A-S-1 m td2)
=
(*dlist (A-S-1 m td21) .. (A-S-1 m td2n))
(and (I td21
v1
(make-binding-from-mapping m v1 b1 (dlist-elem vref 1)))
..
(I td2n
vn
(make-binding-from-mapping m vn b1 (dlist-elem vref n))))
(Here by the notation (dlist-elem vref i), we mean the form whose
car is the atom dlist-elem, whose cadr is vref, and whose caddr
is the integer i, rather than the value of applying the function
dlist-elem to the symbolic value reference vref.)
Lemma A-S-1-OPEN-REC-COMMUTE
For any *rec descriptor td and well-formed mapping m,
(I (open-rec-descriptor-absolute td2)
v
(make-binding-from-mapping m v b1 vref))
What follows is a list of all the containment rules used in ICONTAINED-IN to handle the case where
both arguments are *REC descriptors. Each rule gives the definition of ICONTAINED-IN for the case
where the arguments are *REC descriptors of a certain form (where some previous case in
ICONTAINED-IN did not apply). Each rule coincides with a lemma stated in terms of the descriptor
interpreter. The statement of these lemmas and their proofs are supplied for only some of the rules stated
here, but these examples should provide adequate demonstration that the other rules are provable by the
same techniques. Most of these rules are exactly analogous to the similarly numbered Contain-*Rec rules
and are justified by the same means as those rules. The ones which are different in nature are those where
type variables or Lisp variable references appear in the arguments, and these rules are the ones which are
presented for proof. Some of the rules yield a NIL result for ICONTAINED-IN, and we note that the
proofs of these rules are trivial, since a NIL result negates one of the antecedents to the lemma which
would correspond to the rule.
Note that descriptors appearing within replicating components of the *REC descriptors are variable-free,
by the definition of well-formedness for *REC descriptors. (For a discussion, see Section 5.2.) Thus, the
rules frequently have an invocation of the CONTAINED-IN algorithm, rather than ICONTAINED-IN, on
the right hand side. For example, see Rule IContain-*Rec2. For the same reason, descriptors appearing
within replicating components are also free of symbolic value references, since such a reference could
have only gotten there via replacement of a type variable in application of a mapping.
Rule IContain-*REC1
Where the two *recs differ only in name,
Rule IContain-*REC2
(icontained-in (*rec foo (*or $nil (*cons d1 (*recur foo))))
(*rec bar (*or $nil (*cons d2 (*recur bar))))
vref)
=
(contained-in d1 d2)
Rule IContain-*REC2’
(icontained-in (*rec foo (*or $nil (*cons (*recur foo) d1)))
(*rec bar (*or $nil (*cons (*recur bar) d2)))
vref)
=
(contained-in d1 d2)
Rule IContain-*REC3
Type Checking for Common Lisp 305
Proofs of Selected Lemmas
Rule IContain-*REC3’
Where d1 and d3 are either primitive non-cons descriptors or
*ORs of primitive non-cons descriptors,
Rule IContain-*REC40
(icontained-in (*rec foo .. )
(*rec bar (*or *universal .. ))
vref)
= t
Rule IContain-*REC41
(icontained-in (*rec foo .. )
(*rec bar (*or vref (*cons d2 (*recur bar))))
vref)
= t
Rule IContain-*REC42
(icontained-in (*rec foo .. )
(*rec bar (*or (*or .. vref .. ) (*cons .. )))
vref)
= t
Rule IContain-*REC43
Where d3 is a primitive descriptor or a disjunction of primitive
descriptors,
(icontained-in
(*rec foo (*or d3 (*cons d4 (*recur foo))))
(*rec bar (*or (rec-tail vref) (*cons d2 (*recur bar))))
vref)
=
(contained-in d4 d2)
Rule IContain-*REC44
Rule IContain-*REC45
Where d1 and d2 contain no variables,
(icontained-in
(*rec bim (*or &i (*cons d1 (*recur bim))))
(*rec bar (*or (*or .. &i .. ) (*cons d2 (*recur bar))))
vref)
=
(contained-in d1 d2)
The proof of the following rule is trivial, since the NIL result
negates the antecedent that the result is non-NIL.
Rule IContain-*REC46
Where d2, d3, and d4 contain no variables,
Rule IContain-*REC5
When there are variables in either foo or bar,
Rule IContain-*REC6
There are no type variables anywhere and d1 and d4 allow no
*cons forms, i.e., they are primitive descriptors or
disjunctions of primitive descriptors.
(icontained-in
(*rec foo (*or d1 (*cons d2 (*cons d3 (*recur foo)))))
(*rec bar (*or d4 (*cons d5 (*recur bar))))
vref)
=
(and (contained-in d1 d4)
(contained-in d2 d5)
(contained-in d3 d5))
Rule IContain-*REC6’
There are no type variables anywhere and d1 and d4 allow no
*cons forms, i.e., they are primitive descriptors or
disjunctions of primitive descriptors.
(icontained-in
(*rec bar (*or d4 (*cons d5 (*recur bar))))
(*rec foo (*or d1 (*cons d2 (*cons d3 (*recur foo)))))
vref)
=
nil
Rule IContain-*REC7
There are no type variables anywhere and d1 and d4 allow no
*cons forms, i.e., they are primitive descriptors or
disjunctions of primitive descriptors,
(icontained-in
Type Checking for Common Lisp 307
Proofs of Selected Lemmas
Rule IContain-*REC8
There are no type variables anywhere and d1 and d5 allow no
*cons forms, i.e., they are primitive descriptors or
disjunctions of primitive descriptors,
(icontained-in
(*rec foo (*or d1 (*cons d2 (*cons d3 (*cons d4 (*recur foo))))))
(*rec bar (*or d5 (*cons d6 (*recur bar))))
vref)
=
(and (contained-in d1 d5)
(contained-in d2 d6)
(contained-in d3 d6)
(contained-in d4 d6))
Rule IContain-*REC8’
There are no type variables anywhere and d1 and d5 allow no
*cons forms, i.e., they are primitive descriptors or
disjunctions of primitive descriptors,
(icontained-in
(*rec bar (*or d5 (*cons d6 (*recur bar))))
(*rec foo (*or d1 (*cons d2 (*cons d3 (*cons d4 (*recur foo))))))
vref)
=
nil
Rule IContain-*REC9
There are no type variables anywhere and d1 and d5 allow no
*cons forms, i.e., they are primitive descriptors or
disjunctions of primitive descriptors,
(icontained-in
(*rec foo (*or d1 (*cons d2 (*cons d3 (*cons d4 (*recur foo))))))
(*rec bar (*or d5 (*cons d6 (*cons d7 (*recur bar)))))
vref)
=
nil
Rule IContain-*REC9’
There are no type variables anywhere and d1 and d5 allow no
*cons forms, i.e., they are primitive descriptors or
disjunctions of primitive descriptors,
(icontained-in
(*rec bar (*or d5 (*cons d6 (*cons d7 (*recur bar)))))
(*rec foo (*or d1 (*cons d2 (*cons d3 (*cons d4 (*recur foo))))))
vref)
=
nil
Rule IContain-*REC10
There are no type variables anywhere and d2 and d4 allow no
*cons forms, i.e., they are primitive descriptors or
disjunctions of primitive descriptors,
Rule IContain-*REC11
There are no type variables anywhere and d2 and d3 allow no
*cons forms, i.e., they are primitive descriptors or
disjunctions of primitive descriptors,
Rule IContain-*REC11’
There are no type variables in bar and d3 is a
primitive descriptor or a disjunction of primitive descriptors,
Rule IContain-*REC13
There are no type variables anywhere and d1, d3, d5, and d7
allow no *cons forms, i.e., they are primitive descriptors or
disjunctions of primitive descriptors,
(icontained-in
(*rec foo (*or d1 (*cons d2 (*or d3 (*cons d4 (*recur foo))))))
(*rec bar (*or d5 (*cons d6 (*or d7 (*cons d8 (*recur bar))))))
vref)
=
(and (contained-in d1 d5)
(contained-in d2 d6)
(contained-in d3 d7)
(contained-in d4 d8))
Rule IContain-*REC0
(icontained-in foo bar vref) = nil
Now for the soundness proofs of some of the rules above. Each rule being proved will be followed by
statement of the lemma corresponding to the rule, and then the lemma will be proved. Lemmas following
the same pattern could be stated for the other rules and proved using similar techniques.
Recall that the main theorem we are trying to prove is the following.
Lemma ICONTAINED-IN-OK:
For any descriptors td1 and td2, simple mapping m, Lisp value v,
symbolic value reference vref, and binding b1 covering the
variables in td1,
(and
H1 (disjoint (gather-variables-in-descriptor td1)
(gather-variables-in-descriptor td2))
H2 (well-formed-mapping m vref b1)
Type Checking for Common Lisp 309
Proofs of Selected Lemmas
Note that for each of our rules, the descriptor in the second argument to ICONTAINED-IN is not the TD2
in this theorem, but rather (DAPPLY-SUBST-LIST-1 M TD2). So in each of the following, we will use a
shorthand "bar-td2" to represent the *REC descriptor TD2, we will use "bar" to represent the second
argument in the icontained-in rule, i.e., (DAPPLY-SUBST-LIST M bar-td2). We will also abbreviate
DAPPLY-SUBST-LIST-1 with "A-S-1".
Rule IContain-*REC41
(icontained-in (*rec foo .. )
(*rec bar (*or vref (*cons .. (*recur bar) ..)))
vref)
= t
(and
H1 (disjoint (gather-variables-in-descriptor foo)
(gather-variables-in-descriptor bar-td2))
H2 (well-formed-mapping m vref b1)
H3 (equal bar (A-S-1 m bar-td2))
H4 (icontained-in foo (A-S-1 m bar-td2) vref)
H5 (I foo v b1))
=>
(I bar-td2 v (make-binding-from-mapping m v b1 vref))
(or (equal v v) .. )
= t
QED
Rule IContain-*REC43
Where d3 is a primitive descriptor or a disjunction of primitive
descriptors,
(icontained-in
(*rec foo (*or d3 (*cons d4 (*recur foo))))
(*rec bar (*or (rec-tail vref) (*cons d2 (*recur bar))))
vref)
=
(contained-in d4 d2)
(and
H1 (disjoint (gather-variables-in-descriptor foo)
(gather-variables-in-descriptor bar-td2))
H2 (well-formed-mapping m vref b1)
H3 (equal bar (A-S-1 m bar-td2))
H4 (contained-in d4 d2)
H5 (I foo v b1))
=>
C (I bar-td2 v (make-binding-from-mapping m v b1 vref))
Case 1 (I d3 v b1)
It suffices to establish
C1 (and (consp v)
C2 (I d2 (car v) (make-binding-from-mapping m v b1 vref))
C3 (I bar-td2 (cdr v) (make-binding-from-mapping m v b1 vref)))
Lemma CONTAINED-IN-OK
For any descriptors td1 and td2, Lisp value v, and binding b1,
(and
H1 (null (gather-variables-in-descriptor td1))
H2 (null (gather-variables-in-descriptor td2))
H3 (contained-in td1 td2)
H4 (I td1 v b1))
=>
For some b2, (I td2 v b2)
We instantiate this lemma with td1 = d4, td2 = d2, v = (car v),
and b1 = b1. The well-formedness rule for *rec descriptors
establishes that d2 and d4 have no variables, since both d2 and
d4 appear in replicating components of their respective *rec
descriptors. H4 equals the third antecedent. The fourth and
final antecedent is established by our case assumption. So we
can use the conclusion. As always, in the absence of variables,
the bindings are irrelevant to the evaluation of I, so b2 may as
well be the binding (make-binding-from-mapping m v b1 vref). This
establishes C2.
Rule IContain-*REC44
Where d1 and d2 contain no variables,
(icontained-in (*rec foo (*or &i (*cons d1 (*recur foo))))
(*rec bar (*or &i (*cons d2 (*recur bar))))
vref)
=
(contained-in d1 d2)
(and
H1 (disjoint (gather-variables-in-descriptor foo)
(gather-variables-in-descriptor bar-td2))
H2 (well-formed-mapping m vref b1)
H3 (equal bar (A-S-1 m bar-td2))
H4 (contained-in d1 d2)
H5 (I foo v b1))
=>
C (I bar-td2 v (make-binding-from-mapping m v b1 vref))
the mapping m, the binding for &k in the result is the same as
the binding of &i in b1. The (well-formed-mapping m vref b1)
predicate in H2 guarantees that &i is bound in b1. So with this
in mind, we see that (I &k v (make-binding-from-mapping m v b1 vref))
expands by the definition of I to (equal v (cdr (assoc &i b1))),
which we established above.
C1 (and (consp v)
C2 (I d2 (car v) (make-binding-from-mapping m v b1 vref))
C3 (I bar-td2 (cdr v) (make-binding-from-mapping m v b1 vref)))
Since bar-td2 has no symbolic value references and its only variable
is &k, the only point at which the binding enters in the evaluation
of these two calls of I is with respect to &k, and from what we
know of m and from the definition of make-binding-from-mapping,
we see that &k maps to the binding of &i in both
(make-binding-from-mapping m (cdr v) b1 vref) and
(make-binding-from-mapping m v b1 vref). Hence, C3 follows from
the conclusion of the inductive hypothesis
QED.
(I (tc-universalize-singleton-vars-1 td vlist) v b)
=>
314 Type Checking for Common Lisp
Proofs of Selected Lemmas
By definition,
(tc-universalize-singleton-vars-1 td vlist) = td
establishing the goal trivially, with b’ = b.
(I (subst *universal
&i
(tc-universalize-singleton-vars-1 td (cdr vlist)))
v b)
=>
For some b’ (I td v b’)
Appendix C
A Set-Based Semantics
In Chapter 5, we mentioned that several alternate semantic models were considered before we settled on
the INTERP-SIMPLE semantics. This appendix describes one of those models, one in which type
descriptor variables can be instantiated by other descriptors, rather than by singleton values. We
abandoned this model because of a desire for a more computational style of semantics. As this was a
work in progress when abandoned, it is no doubt imperfect.
This appendix consists of notes originating from a discussion with Matt Kaufmann about semantics,
which were extended and elaborated as the ideas were developed. Some of the discussion reflects that the
ideas were in flux. Kaufmann provided the first draft of the definition of Eval and the statement of the
theorems proved.
Let TD be the set of type descriptors, and let CLU be the Common Lisp universe of objects. Let TA be
the set of type assignments, i.e. functions mapping the set of type variables to the power set of CLU,
P(CLU). For the purpose of communication, we may sometimes refer to TA as mapping from type
variables to variable-free descriptors, as any variable-free descriptor is an exact representation of a set of
Lisp values.
We define a function "Eval", to be a mapping from a type descriptor and a type assignment to a set of
Common Lisp values.
Eval: TD x TA -> P(CLU)
Some discussion leads up to the definition of Eval((*REC <rec-name> td),a). Consider an example of a
recursive descriptor:
T = (*rec tl (*or $nil (*cons *universal (*recur tl))))
Then consider the descriptor To, derived from T by replacing the recursion point with a fresh variable,
which we will call &*.
To = (*or $nil (*cons *universal &*))
Eval(T,a) = ∪n Pan(phi)
i.e., the union over all naturals n of the function Pan with the initial argument of the empty set, where Pa
is defined
Pa(s) = Eval(To,a ++ {<&*,s>}) ("++" is an appending operator)
The expansions of this function are as follows (where "x" is a Cartesian product employing the CONS
constructor):
The base set is ∅.
etc.
Another notation for the big union "∪n Pan(∅)" is the form
Union{Pan(∅): n ∈ w}
Here, "w" is omega, the set of all natural numbers, the "∈" is "member", so the set qualifier is a
quantification over all natural numbers, with n being the quantified variable.
we have
To = td/((*recur <recname>) . &*)
Eval(T,a) = ∪n Pan(∅)
where
Pa(s) = Eval(To,a ++ {<&*,s>})
A proof of the following rewrite rule would have been crucial to the proof of unification. I did not get
around to this proof before abandoning this semantic model. With respect to the preceding fixed point
semantics, we would have needed to prove the equivalence:
Eval((*REC <rec-name> td),a) =
Type Checking for Common Lisp 319
A Set-Based Semantics
Next, we will need a notion of descriptor substitution. A substitution is simply a function from a set of
type variables into TD. The notion of application of substitution, written td/s (for td in TD and s a
substitution) has a simple recursive definition, defined as follows. Note that we can view the identity
substitution s_id as the empty function.
Definition of "/": (descriptor, substs) -> descriptor
&i/s = s(&i)
prim/s = prim, where "prim" is one of
{$CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T}
*universal/s = *universal
*empty/s = *empty
(*cons t1 t2)/s = (*cons tl/s t2/s)
(*or t1 t2)/s = (*or tl/s t2/s)
(*and t1 t2)/s = (*and tl/s t2/s)
(*rec <recname> td)/s = (*rec <recname> td/s)
C.1 Monotonicity
Proof:
Case 1 td = &i
td/s1 = si(&i) td/s2 = s2(&i)
Eval (s1(&i), a) ⊆ Eval (s2(&i), a) def of <=
Proof: Easy.
C.4 Instantiation
Proof:
Expand both the hypothesis and the conclusion using the definition of |=
all in TA, all (x1, .., xn) | xi in Eval(tdi,a) for 1 <= i <= n,
f(x1, .., xn) in Eval(td,a)
=>
all a in TA, all (x1, .., xn) | xi in Eval(tdi/s,a) for 1 <= i <= n,
f(x1, .., xn) in Eval(td/s,a)
So now consider:
Proof: Trivial, since Eval touches the context only when a variable
is encountered in t.
Proof of L1:
Let ah = s o ac. So the problem is to prove Eval(t,ah) = Eval(t/s,ac).
Do this by induction on the structure of type descriptors.
Given this semantics for type descriptors, we can now study the semantics of the function type signatures
generated by the inference system.
Imagine that for each function F in a state S, S assigns a set to F which we’ll call its domain. (We could
say that this is the set recognized by the guard of F.) In some sense all functions are total, so this is not
quite the standard notion of domain. But the semantic interpretation of a function applied to arguments
not satisfying the guard is extremely weak. Our type inference system can certainly only generate a
meaningful signature mapping arguments within the domain. So we will not consider mapping arguments
outside of the domain.
We can consider a function signature to be a set of segments, where a segment is a syntactic entity of the
form (TD1, TD2, ..., TDN) -> TD, where N is the arity of the function. Intuititively, such a segment says
that if the ith parameter of the function is in the set prescribed by TDi, then the value of the function
application may be in the set TD. We could say the specification for our inference algorithm TC, applied
to a function F in a state S (where the state contains the signatures of previously defined functions) is:
TC(f,s) == {(td1...tdn) -> td | f |= (td1...tdn) -> td }
By "F |= (TD1...TDN) -> TD" we mean that "(TD1...TDN) -> TD" is a valid segment for the function
F. By this we mean the following.
If f is a function of arity n then we might write the following
definition of |= :
But perhaps this definition of |= is would not provide the most worthwhile signatures for our Lisp
functions. Consider the function:
(DEFUN FOO (X)
(IF (EQUAL X 3)
10
(IF (EQUAL X 5)
’FOO
T)))
Under the preceding definition of |=, we might wish for the signature:
($INTEGER) -> (*OR $INTEGER $NON-T-NIL-SYMBOL $T)
($CHARACTER $NIL $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL $STRING $T)
-> $T
But producing a signature like this goes against the grain of an inference algorithm which will analyze the
function in Lisp’s recursive descent evaluation order. Ultimately, the results returned by the function will
be computed at the tips of its IF structure. At each tip we will have a context of the types of the formal
variables, as determined by the various IF tests through which they have passed. This suggests that the
orientation of the algorithm would be around generating a segment or a collection of segments at these
tips. Since there can be tests like (EQUAL X 3), an $INTEGER parameter could factor into more than
one of these tips. But the preceding definition of |= determines that each segment will have on its left
hand side a collection of type descriptors totally distinct from any other segment. To derive such a
Type Checking for Common Lisp 323
A Set-Based Semantics
collection would seem to require a complete inversion of the segments produced via recursive analysis.
Such an inversion might be possible, but it would be tricky to preserve information about variables, as the
scope of variables would then become the entire signature rather than being limited to one segment.
A signature for FOO more in harmony with recursive descent analysis would be:
($INTEGER) -> $INTEGER
($INTEGER) -> $NON-T-NIL-SYMBOL
(*UNIVERSAL) -> $T
One segment is produced for each tip of the IF tree. But clearly, we have a type assignment, i.e., the
empty one, for which if X1 = 3, the second and the third segments do not conform to |=. So this style of
signature is inconsistent with our definition of |=. We will need a definition of signatures wherein
individual segments are neither wholly independent nor absolutely applicable. For any actual argument
types, there may be more than one segment whose left hand side is satisfied by the arguments. So let us
try to find a definition consistent with this alternate signature for FOO.
One of the pieces of information returned by the inference algorithm is a descriptor list characterizing the
function guard. We can imagine two important components of our definition. One ensures that for any
arguments which satisfy the function guard, at least one segment will be applicable, i.e., will have a left
hand side which is also satisfied by the type of the arguments. Stated formally, this requirement is as
follows:
for all (x1...xn) such that there exists a substitution a in TA such that
for all i such that 1<=i<=n, xi in Eval(guardi,a)
where (guard1..guardn) = the type descriptor list characterizing
the function guard
there exists a segment (td1..tdn) -> td in TC(f,s) such that
there exists a substitution a in TA such that
for all i such that 1<=i<=n, xi in Eval(tdi,a)
We can paraphrase the first part of this requirement states by saying: "for all (X1...XN) such that
F(X1...XN) is defined (and N is the arity of F)". The second part says that for any arguments satisfying
the function guard, there is some segment whose left hand side is satisfied by the types of the arguments.
Now we need to characterize the fact that one of the segments will yield the correct type for the arguments
(X1...XN). There are several ways to phrase this requirement, all equivalent and each revealing a slightly
different view of a signature and how it might be used. The most straightforward simply states, in our
formal setting, that one of the segments applies.
where S = ((td11...tdn1) -> td1) .. ((td1m...tdnm) -> tdm) is the
collection of segments in TC(f,s) such that for all j, 1<=j<=m
there exists a substitution a in TA such that
for all i such that 1<=i<=n, xi in Eval(tdij,a)
for some segment ((td1...tdn) -> td) in S,
for some substitution a in TA such that
for all i such that 1<=i<=n, xi in Eval(tdi,a),
f(x1..xn) in Eval(td,a)
S is the set of segments whose left hand sides are satisfied by the (X1...XN) under some substitution for
type variables. The statement is, then, that one of the segments, given the right substitution, supplies the
correct result type.
The next statement, rather than stressing the selection of a single correct segment, focuses on an aggregate
result which is also correct under some substitution. This is the interpretation which, we will see, suggests
the principal rule used in formulating closed recursive forms for the results of recursive functions.
324 Type Checking for Common Lisp
A Set-Based Semantics
The following alternate statement is much like the previous one and perhaps more directly grounds in the
notion of sets.
where S = ((td11...tdn1) -> td1) .. ((td1m...tdnm) -> tdm) is the
collection of segments in TC(f,s) such that for all j, 1<=j<=m
there exists a substitution a in TA such that
for all i such that 1<=i<=n, xi in Eval(tdij,a)
and where (((td11...tdn1) -> td1),a1) .. (((td1m...tdnm) -> tdm),am)
is the collection of all such segments and their associated
substitutions,
f(x1..xn) in Union (Eval(td1,a1) .. Eval(tdm,am))
Type Checking for Common Lisp 325
A Function-Based Semantics
Appendix D
A Function-Based Semantics
In Chapter 5, we mentioned that several alternate semantic models were considered before we settled on
the INTERP-SIMPLE semantics. This appendix describes one of those models, where the semantics is
given in the vernacular of functions and meta-functions. Each descriptor corresponds to a function, and a
descriptor containing type variables corresponds to a function with one parameter for each type variable.
These parameters are to be instantiated with functions corresponding to the type descriptors instantiating
the variables. This direction was abandoned because its meta-theoretic approach posed more potential
problems than we would encounter with a more straightforwardly computational model. In particular, the
semantic definition itself generates function definitions, a notion which could become problematic. As
this was a work in progress when abandoned, it is no doubt imperfect.
The notion of well-formed descriptor is the precisely same in this semantic model as it is in the model
finally adopted for the system. While working on this model, we also had a descriptor construction
(*NOT <descriptor>), which represented the negation of the enclosed descriptor.
For any well-formed descriptor, there is an associated Lisp recognizer meta-function. A recognizer
meta-function is very much like a recognizer function, except that, rather than having arity one, its arity is
one plus the number of distinct variables which appear in the descriptor. The actuals supplied for these
variables are the names of recognizer meta-functions which have exactly one argument, i.e., recognizers
corresponding to variable-free type descriptors.
So let us define a generator function for these meta-functions. Given a descriptor, it generates a list of
DEFUNs of meta-recognizer functions, with the first being the top level meta-recognizer for the entire
descriptor. This generator provides the basis for a function-based semantics for the descriptor language.
GEN-RECOGNIZERS takes a type descriptor as input and generates a list of meta-recognizer functions
which, taken together, recognize objects conforming to the descriptor under the binding of type variables
to variable-free meta-recognizers provided in the parameter list.
Here are several examples of calls to GEN-RECOGNIZERS, followed by the list of functions each call
returns. Interleaved are examples of calls to the meta-recognizers on illustrative arguments and the results
to those calls. A hideous thing about reading these results is the names of the functions generated, which
are the string of characters composing the descriptor itself followed by "-RECOGNIZER". The names are
ugly, but are guaranteed to be unique and replicable. The only exception to this convention, which is
unnecessary but is a concession to readability, is the case of recursive meta-recognizers, which take on the
name associated with the *REC form from which they were generated. This concession is safe under the
requirement that identically named *REC descriptors must be identical.
When the descriptor is primitive, we generate a single function with the appropriate calls to primitive Lisp
functions.
(GEN-RECOGNIZERS ’$CHARACTER)
($CHARACTER-RECOGNIZER #\a)
T
When it is a *CONS, we have a CONSP test plus appropriate tests applied to the CAR and CDR.
(GEN-RECOGNIZERS ’(*CONS $INTEGER $NIL))
When the descriptor is a variable, we generate a meta-recognizer with an extra formal parameter
corresponding to the variable. The caller should supply a meta-recognizer function of arity one as the
actual parameter for this formal. This function is then applied to the object occupying the position of the
variable.
(GEN-RECOGNIZERS ’&1)
When the descriptor is a *REC, we generate a recursive function where the recursive call is in the position
corresponding to the *RECUR.
(GEN-RECOGNIZERS
’(*REC INT-LISTP-1 (*OR $NIL (*CONS $INTEGER (*RECUR INT-LISTP-1)))))
(GEN-RECOGNIZERS
’(*OR (*CONS *UNIVERSAL $T)
(*REC TRUE-LISTP-1
Type Checking for Common Lisp 327
A Function-Based Semantics
This example illustrates two things: first, that nested recursive descriptors are handled appropriately, with
their variable parameters composed correctly, and second, that the variable parameter meta-recognizers
are applied uniformly.
(GEN-RECOGNIZERS
’(*CONS &2
(*REC TRUE-LISTP-2
(*OR (*REC TRUE-LISTP-3
(*OR $NIL (*CONS &2 (*RECUR TRUE-LISTP-3))))
(*CONS &1 (*RECUR TRUE-LISTP-2))))))
(|(*CONS &2
(*REC TRUE-LISTP-2
(*OR (*REC TRUE-LISTP-3 (*OR $NIL (*CONS &2 (*RECUR TRUE-LISTP-3))))
(*CONS &1 (*RECUR TRUE-LISTP-2)))))-RECOGNIZER|
’(FOO 1 2 3 FOO BAR)
’$INTEGER-RECOGNIZER
’$NON-T-NIL-SYMBOL-RECOGNIZER)
T
(|(*CONS &2
(*REC TRUE-LISTP-2
(*OR (*REC TRUE-LISTP-3 (*OR $NIL (*CONS &2 (*RECUR TRUE-LISTP-3))))
(*CONS &1 (*RECUR TRUE-LISTP-2)))))-RECOGNIZER|
’(FOO 1 2 3 BAR 4)
’$INTEGER-RECOGNIZER
’$NON-T-NIL-SYMBOL-RECOGNIZER)
NIL
Here are the functions defining the meta-recognizer generator. There is an assumption that the descriptors
are well-formed.
328 Type Checking for Common Lisp
A Function-Based Semantics
(DEFUN GEN-RECOGNIZER-BODY
(DESCRIPTOR BASE REC-DEFUNS RECUR-NAME VAR-MAP)
;; DESCRIPTOR is a type descriptor or some componenent of a type
;; descriptor.
;; BASE is the formal associated with the data object, perhaps in a
;; nest of calls to CAR and CDR.
;; REC-DEFUNS is an alist mapping *REC forms in the original descriptor
;; to the functions which have been generated for them.
;; RECUR-NAME -- If the form presented to GEN-RECOGNIZERS is a *REC
;; form, RECUR-NAME is the name associated with the *REC. NIL
;; otherwise.
;; VAR-MAP is an alist mapping type descriptor variables occurring in
;; the outer form to names which can be used as parameters to the
;; meta-recognizer without confusing the Lisp reader. (&1 is
;; interpreted as a keyword parameter tag with a typo.)
(CASE DESCRIPTOR
(*EMPTY NIL)
(*UNIVERSAL T)
($CHARACTER ‘(CHARACTERP ,BASE))
($INTEGER ‘(INTEGERP ,BASE))
($NIL ‘(NULL ,BASE))
($NON-INTEGER-RATIONAL
‘(IF (RATIONALP ,BASE) (NULL (INTEGERP ,BASE)) NIL))
($NON-T-NIL-SYMBOL ‘(IF (SYMBOLP ,BASE)
(IF (NULL ,BASE)
NIL
(IF (EQUAL ,BASE T) NIL T))
NIL))
($STRING ‘(STRINGP ,BASE))
($T ‘(EQUAL ,BASE T))
(OTHERWISE
(IF (VARIABLE-NAMEP DESCRIPTOR)
(LIST ’APPLY
(CDR (ASSOC DESCRIPTOR VAR-MAP))
(LIST ’LIST BASE))
(CASE (CAR DESCRIPTOR)
Type Checking for Common Lisp 329
A Function-Based Semantics
(*CONS
‘(AND (CONSP ,BASE)
,(GEN-RECOGNIZER-BODY
(CADR DESCRIPTOR) ‘(CAR ,BASE) REC-DEFUNS
RECUR-NAME VAR-MAP)
,(GEN-RECOGNIZER-BODY
(CADDR DESCRIPTOR) ‘(CDR ,BASE) REC-DEFUNS
RECUR-NAME VAR-MAP)))
(*OR (CONS ’OR
(MAPCAR
(FUNCTION
(LAMBDA (DISJUNCT)
(GEN-RECOGNIZER-BODY DISJUNCT
BASE
REC-DEFUNS
RECUR-NAME
VAR-MAP)))
(CDR DESCRIPTOR))))
(*NOT
(LIST ’NOT
(GEN-RECOGNIZER-BODY (CADR DESCRIPTOR)
BASE
REC-DEFUNS
RECUR-NAME
VAR-MAP)))
(*REC
(LET ((REC-DEFN
(CDR (ASSOC DESCRIPTOR REC-DEFUNS :TEST #’EQUAL))))
(CONS (CADR REC-DEFN)
(CONS BASE (CDR (CADDR REC-DEFN))))))
;; Here, we assume the *RECUR label matches the nearest
;; enclosing *REC.
(*RECUR (CONS RECUR-NAME (CONS BASE (ALL-CDRS VAR-MAP))))
(OTHERWISE NIL))))))
Now let us apply this semantics for type descriptors in the setting of the type signatures generated by the
inference algorithm. A simple example will be a good vehicle for communicating the relevant ideas.
Consider a function:
(DEFUN FOO (X)
(DECLARE (XARGS :GUARD (CONSP X)))
(IF (EQUAL (CAR X) 0)
0
(CONS (CAR X) NIL)))
Signature segments:
((*CONS $INTEGER *UNIVERSAL)) -> $INTEGER
((*CONS &5 *UNIVERSAL)) -> (*CONS &5 $NIL)
There are two "segments" in this signature. The first says that if the argument Y to FOO is a CONS
whose CAR is an integer, then the result of evaluating (FOO Y) can be an integer. In the second segment,
&5 is a "type variable". This segment says that if the argument Y is a CONS whose CAR is an object of
some arbitrary type T, then the result can be a CONS whose CAR is an object of type T and whose CDR
is NIL.
Taken as a whole, this signature means that, on any call (FOO Y), where Y is some data object which
satisfies FOO’s guard, there is some segment of this signature and some binding of all type variables
appearing in the segment such that Y satisfies the descriptor on the left hand side of the segment under the
binding and (FOO Y) satisfies the descriptor on the right hand side of that segment under the same
binding.
In the vernacular of meta-recognizer functions, we would say this signature means that, on any call (FOO
Y), where Y is some data object which satisfies FOO’s guard, there is some segment of this signature and
some binding of each type variable appearing in the segment to some unary meta-recognizer function such
that Y satisfies the meta-recognizer function for the descriptor on the left hand side of the segment under
the binding and (FOO Y) satisfies the meta-recognizer for the descriptor on the right hand side of that
segment under the same binding.
For convenience, I will call this interpretation "OR semantics", the intuition being that for at least one, but
not necessarily all, of the segments for which the actual parameters satisfy the left hand side (or parameter
descriptors), the function result will satisfy the right hand side. This differs from "AND semantics",
which would claim that for any segment whose left hand side is satisfied by the actuals, the right hand
side would be satisfied by the result.
Type Checking for Common Lisp 331
A Function-Based Semantics
We can state this interpretation as a collection of lemmas which should be true for the function. The
lemma which characterizes the signature is as follows:
(IMPLIES
(AND (|($CONS *UNIVERSAL *UNIVERSAL)-RECOGNIZER| X) ; from the guard
(META-RECOGNIZER-P VAR-FN-1))
(OR (AND (|(*CONS $INTEGER *UNIVERSAL)-RECOGNIZER| X)
(|$INTEGER-RECOGNIZER| (FOO X)))
(AND (|(*CONS &5 *UNIVERSAL)-RECOGNIZER| X VAR-FN-1)
(|(*CONS &5 $NIL)-RECOGNIZER| (FOO X) VAR-FN-1))))
where:
There is one disjunct in the conclusion of the theorem for each segment in the signature. Only one need
be true, though more than one can be. Each disjunct is just the conjunction of all the predicates for the left
hand side of the segment (In this example, there is only one, because FOO is unary.) with the predicate for
the right hand, or result, side of the segment, with the meta-recognizer function VAR-FN-1 providing the
glue to bind the interpretation of the type variables.
The task of the checker would be to generate and prove these theorems.
There is also an excruciatingly tedious chain of lemmas which capture the notion of the guard being
complete and verified. To get a flavor for these, consider a slightly different function BAR.
(DEFUN BAR (X)
(DECLARE (XARGS :GUARD (IF (CONSP X) (INTEGERP (CAR X)) NIL)))
(IF (EQUAL (CAR X) 0)
(CONS (CAR X) T)
(CONS (CAR X) NIL)))
Signature segments:
((*CONS $INTEGER *UNIVERSAL)) -> (*CONS $INTEGER (*OR $NIL $T))
Note that we have the functions CAR, CONSP, and INTEGERP already resident in our state. Their
signatures are:
For CONSP:
Guards: *UNIVERSAL
Signature segments:
((*CONS *UNIVERSAL *UNIVERSAL)) -> $T
((*OR $CHARACTER $INTEGER $NIL $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T))
-> $NIL
For CAR:
Guards: (*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL))
Signature segments:
((*CONS &1 *UNIVERSAL)) -> &1
($NIL) -> $NIL
For INTEGERP:
Guards: *UNIVERSAL
Signature segments:
($INTEGER) -> $T
((*OR $CHARACTER $NIL $NON-INTEGER-RATIONAL $NON-T-NIL-SYMBOL
$STRING $T (*CONS *UNIVERSAL *UNIVERSAL)))
-> $NIL
Now consider the verification of guards for functions called within the guard expression itself,
(IF (CONSP X) (INTEGERP (CAR X)) NIL)
We will proceed in evaluation order. To verify the guards for BAR, we would also have to do this
analysis on the body of BAR, but it goes the same way.
On entry, we know nothing of the type of X. We need to verify the guard of CONSP is satisfied.
Type Checking for Common Lisp 333
A Function-Based Semantics
Fortunately, this is easy, since the guard is *UNIVERSAL, i.e., any object is acceptable. We must prove
the lemma:
Guard Lemma 1.
(IMPLIES ($UNIVERSAL-RECOGNIZER X) ; the predicate characterizing X
($UNIVERSAL-RECOGNIZER X)) ; the predicate for CONSP’s guard
where:
Easy. Next we evaluate (CAR X), so we need to check its guard, which is
(*OR $NIL (*CONS *UNIVERSAL *UNIVERSAL))
To do this, we need help from the context. Fortunately, we have it. We have the context provided by the
(CONSP X), which gives us a type-alist entry of
(X . (*CONS *UNIVERSAL *UNIVERSAL))
This is also trivially true. Now for (INTEGERP (CAR X)). The guard descriptor for INTEGERP is
*UNIVERSAL. X is still of type
(*CONS *UNIVERSAL *UNIVERSAL)
The algorithm has determined that (CAR X) under this context has type *UNIVERSAL. Thus, we
generate the lemma:
Guard Lemma 3.
(IMPLIES
(AND (|(*CONS *UNIVERSAL *UNIVERSAL)-RECOGNIZER| X) ; from the context
($UNIVERSAL-RECOGNIZER (CAR X))) ; surmised by the inference tool
($UNIVERSAL-RECOGNIZER (CAR X))) ; required by the guard of INTEGERP
Moving along, NIL requires no guard verification. IF is OK if its three arguments are OK. We are almost
done verifying the guards in the guard of BAR. The "top level" lemma that claims we are OK is:
(AND (VALID-FS FUNCTION-SIGNATURES)
;; COMPLETEP is a predicate which says that the guard is complete
;; and the function calls only functions whose guards are complete.
(COMPLETEP ’CONSP FUNCTION-SIGNATURES)
(COMPLETEP ’CAR FUNCTION-SIGNATURES)
(COMPLETEP ’INTEGERP FUNCTION-SIGNATURES)
<the conjunction of the three lemmas above>)
Also, there is the syntax check that BAR is not called in the guard of BAR, which I will omit.
All this supports the notion that if the world embodied in the initial state consists only functions whose
guard descriptors are "complete", and if we add to that world only functions which we determine are
complete, then we can guarantee that for any new function which calls only complete functions, if the
334 Type Checking for Common Lisp
A Function-Based Semantics
inference tool decides there are no guard violations, then in fact there can be no guard violations as a
result of applying the new function to any arguments which satisfy its guards.
Proof of the soundness of the inference system in the presence of incompletely guarded functions, i.e.,
functions whose guards are not determined by the system to be complete, is beyond the scope of this
exercise. We will confine ourselves to a world in which all functions have guards which are complete and
verified.
Finally, there is the issue of the well-formedness of a signature. If we are to generate a theorem whose
proof represents the soundness of a function signature, we first need to ensure that the signature is
well-formed. This builds fairly directly upon the well-formedness of descriptors. For our purposes, the
signature is composed of the GUARD, the GUARD-COMPLETE flag, the function calls complete
functions flag, and the SEGMENTS. A signature is well-formed if:
1. The arity of the guard is the same as the arity of the function.
2. The guard is a list of well-formed descriptors.
3. The GUARD-COMPLETE flag is T.
4. The function calls only functions whose guards are complete.
5. The left hand side of each segment is a list of well-formed descriptors whose arity is the
same as that of the function.
6. The right hand side of each segment is a well-formed descriptor.
7. All *REC descriptors throughout the signature with the same name must be identical.
Type Checking for Common Lisp 335
A Semantics with Composed Substitutions
Appendix E
A Semantics with Composed Substitutions
This appendix describes an attempt to formulate a semantics for function signatures which was
computational in style and allowed for a notion of substitutions for variables which were not necessarily
ground, i.e., a variable could be replaced by a descriptor containing a variable. The difficulty we had in
formulating this model revealed the painful complexity of a semantics for our function signatures in which
type variables represented types. This presentation, in fact, includes the discovery that one of the notions
being used was flawed. As such, this discussion provides a real, pragmatic motivation for the switch to
value-instantiated variables, a change which was made as the effort to further develop this model began to
deteriorate.
We first imagine three functions, two of which relate ground Lisp values to descriptors and the other
which relates two descriptors.
TCO (TD O A)
(for TYPE-CHECK-OBJECT) determines whether a data object O satisfies a
descriptor TD under some substitution A of variable-free descriptors for
all the variables in the descriptor.
TCOV (TDV OV A)
(for TYPE-CHECK-OBJECT-VECTOR) determines whether a vector of data
objects OV satisfies a vector (of the same arity) of descriptors TDV
under the complete substitution A.
Now imagine that we have a function F for which the inference tool returns the following:
Guard: (GD1..GDn) == GDV (guard descriptors are variable-free)
Segments:
(TD11 .. TDn1) -> TD1
..
(TD1k .. TDnk) -> TDk
where
&iV represents a vector of type variables, one for each formal argument
P is a relation which indicates that all the substitutions in A’
are consistent with those in A. By this we mean (TCTD &j TDji AA’),
i.e., that the substitutions in A’ are such that when applied to
our original assignment of variables to formal and to the descriptor
characterizing the formal, the subset relation holds.
AA’ is the composition of substitutions A and A’.
336 Type Checking for Common Lisp
A Semantics with Composed Substitutions
The purpose of complicated relationships among A, A’, and P is to enable us to universally quantify our
entire signature over A, but yet to extend A to A’ in order to cover variables nested within a segment’s
descriptors which are not in &iV. These extensions are peculiar to each segment, but must be consistent
with A. We were facing the situation where we needed an existential quantifier on the right hand side of
the main implication, a situation which should raise a red flag, and we wanted to use P to qualify our
existentially quantified variable in the strongest way possible.
For example, consider the APPEND function, which might have the signature:
Guard: ((*rec tl (*or $nil (*cons *universal (*recur tl))))
*universal)
Segments:
($nil &2) -> &2
((*cons &3 (*rec tl (*or $nil (*cons *universal (*recur tl))))) &2)
-> (*cons &3 (*rec app (*or &2 (*cons *universal (*recur tl)))))
We choose &iV = (&1 &2) and let XV = (X Y), and thus we have the following:
For all substitutions A grounding &1 and &2 and all values x and y,
(TCOV (&1 &2) (x y) A)
and
(TCOV ((*rec tl (*or $nil (*cons *universal (*recur tl)))) *universal)
(x y)
nil)
->
(TCOV ($nil &2 &2) (x y (APPEND x y)) A)
or
there exists a substitution A’ grounding &3 such that
(TCTD &1
(*cons &3 (*rec tl (*or $nil (*cons *universal (*recur tl)))))
AA’)
and
(TCOV ((*cons &3 (*rec tl (*or $nil (*cons *universal (*recur tl)))))
&2
(*cons &3 (*rec app (*or &2 (*cons *universal (*recur tl))))))
(x y (append x y))
AA’)
Now for example, suppose X has the value (CONS 2 (CONS 3 NIL)) and Y = T. Consider the trivial
A = ((&1 . *UNIVERSAL) (&2 . *UNIVERSAL)). Clearly,
(TCOV (&1 &2) (x y) A)
The first segment does not apply to these values of x and y, so the second one must. Let A’ be
((&3 . $INTEGER)). We can see that
(TCTD &1
(*cons &3 (*rec tl (*or $nil (*cons *universal (*recur tl)))))
AA’)
(*cons &3 (*rec app (*or &2 (*cons *universal (*recur tl))))))
(x y (append x y))
AA’)
which becomes clearer if we apply the substitutions and fill in the values:
(TCOV ((*cons $integer
(*rec tl (*or $nil (*cons *universal (*recur tl)))))
*universal
(*cons $integer
(*rec app
(*or *universal (*cons *universal (*recur tl))))))
((2 3) T (2 3 . t))
nil)
does not hold. But do need the TCTD at all? Why not streamline the specification as follows?
For all substitutions A grounding &1 and &2 and all values x and y,
(TCOV (&1 &2) (x y) A)
and
(TCOV ((*rec tl (*or $nil (*cons *universal (*recur tl)))) *universal)
(x y)
nil)
->
(TCOV ($nil &2 &2) (x y (APPEND x y)) A)
or
there exists a substitution A’ grounding &3 such that
(TCOV ((*cons &3 (*rec tl (*or $nil (*cons *universal (*recur tl)))))
&2
(*cons &3 (*rec app (*or &2 (*cons *universal (*recur tl))))))
(x y (append x y))
AA’)
where
&iV represents a vector of type variables, one for each formal argument
AA’ is the composition of substitutions A and A’.
338 Type Checking for Common Lisp
Thoughts on the Termination of DUNIFY-DESCRIPTORS
Type Checking for Common Lisp 339
Thoughts on the Termination of DUNIFY-DESCRIPTORS
Appendix F
Thoughts on the Termination of DUNIFY-DESCRIPTORS
Our termination argument for DUNIFY-DESCRIPTORS is ridiculously difficult. There are no fewer than
four separate factors which contribute to the composite count for the function. None of these factors are
simple, and we have not worked them out fully. But here is an overview of them, why they are necessary
components, where they come into play, how they relate to one another, and generally why we think they
work.
1) The size of the two descriptor arguments. This is the most commonly decreased measure, and the most
obvious choice. For instance, we decrease the size when we case split on the disjuncts of an *OR, or
when we take the first args of a pair of *CONS forms. It is not a straightforward measure, however,
because of complications introduced in several of the *REC rules. Here, for example (from Rule *Rec5),
where there are no variables anywhere in the descriptors, is the rule:
(dunify-descriptors (*rec foo (*or d1 (*cons d2 (*recur foo))))
(*rec bar (*or d3 (*cons d4 (*recur bar))))
s)
=
((*rec bim (*or d13 d14bar d32foo (*cons d24 (*recur bim)))) . s)
When we compute d14bar, we recur with d1, which is definitely smaller than foo, but the other argument
is the *CONS disjunct of bar, with the recursion opened, so the size of second argument is definitely
larger. If the size measure were simply the number of tokens involved, we could lose, because d24 may
be larger than foo. Therefore, we would need to give some special premium to the fact that we liquidated
the recursive form foo. But even the number of recursive forms is not good enough, because there may be
multiple recursive forms in d24, and we could even end up with more than we started. So, it appears we
would need to assign a weight to *RECs which nest other *RECs, or even worse (as seems to be required
in this case), to *RECs being unified with other *RECs nesting *RECs. One possible way out is to use
the TERM-RECS measure discussed below, which seems reasonable given that we open a *REC.
Unfortunately, the TERM-RECS mechanism is not employed in the *REC rules as implemented. It would
not be hard to do, though.
2) The TERM-RECS measure. This currently comes into play only once, though it may well be the key to
reducing the complexity in some other quarters as well, the above-mentioned being only one. The
TERM-RECS mechanism is used in DUNIFY-REC-WITH-NON-REC to ensure that we do not get caught
in overlapping recursions. For example:
(dunify-descriptors
’(*rec foo (*or $nil (*cons *universal
(*cons *universal (*recur foo)))))
’(*cons *universal
(*rec foo (*or $nil (*cons *universal
(*cons *universal (*recur foo))))))
nil nil)
would open up the first *REC, then on recursion open up the second, then the first, etc. But DUNIFY-
DESCRIPTORS maintains a stack of pairs of operands, so that when it is recursively called with a pair
340 Type Checking for Common Lisp
Thoughts on the Termination of DUNIFY-DESCRIPTORS
which it has already seen, it can recognize that it is in an endless recursive chain and return a default
result. It is possible that a TERM-RECS count could alleviate the problem with the size count, and even
eliminate the var-equalities count, to be discussed. But there is a technical problem. Just what is the
count corresponding to the TERM-RECS mechanism? Counts must be monotonically decreasing and
always non-negative, and the TERM-REC stack grows. So the length of the stack must be subtracted
from something, but what is that something? I would guess it to be something like the number of
descriptor pairs which could be generated by pairwise combining each of the *REC descriptors in the
problem with all descriptors in the full disjunction of both arguments, wherein *REC descriptors are
opened once and then treated atomically. Thus, there would only be finitely many pairs, and once we
generated them all and put them into TERM-RECS, the next time occasion we had to add one, we would
certainly find it already present, which would force termination. Incidentally, I believe the TERM-RECS
count would have to be the most dominant count, in that when it is used (i.e., when a *REC is opened on
recursion), the size measure is clearly increased, and in doing so we might also increase the variable
closure measure yet to be discussed (depending on how it is defined).
3) The variable equalities measure is necessary because of a single case where a variable is being dunified
with a *REC in which some possible disjunct is another variable. Examples:
(dunify-descriptors &i (*rec foo (*or &j .. (*cons .. (*recur foo)))) s)
(dunify-descriptors
&i (*rec foo (*or (*rec bar (*or &j .. (*cons .. (*recur bar))))
.. (*cons .. (*recur foo))))
s)
Here, we case split, recurring on &i and &j in one case, and on &i and the other possibilities in the other.
The reason for this has to do with wanting to prevent substitutions of the form
(&i . (*rec foo (*or &j .. (*cons .. (*recur foo)))))
where i < j, thus ensuring that, even in the case of narrowing a binding, all our substitutions are downward
directed on the lexical ordering of variable names. This is important to our method for avoiding loops.
The problem, of course, is that when we recur in the second case, the second argument is larger in size,
there may be replicated variables, (there are certainly no fewer) and so no natural variable measure is
decreasing, and nothing else decreases either. But what we ARE eliminating is the possibility that the
binding of one variable is another, and if we treat the number of instances where this possibility is evident
in the descriptors as a count, then we would in fact be reducing it. Again, the difficulty is to say exactly
how to compute this count, and again, the trick is possibly to treat *REC descriptors specially and do an
exhaustive counting as follows. If we unify two different variables, the count is 1. If we unify with a
term not containing a variable, the count is 0. If we unify something with an *OR, the count is the sum of
the counts of unifying with the disjuncts. If we unify with two *CONSes, the count is the sum of the
counts of the CARs and the CDRs. If we unify a non-*CONS (other than a *REC) with a *CONS, the
count is 0. If we unify a variable with a *REC, the count is the number of possible occurrences of a
different var as a disjunct of the *REC. If we unify a non-variable with a *REC, the count is the count
from recurring with the opened *REC, but subject to the TERM-RECS termination mechanism for
opening *RECs. If we unify two *RECs, the count is 0 (we do not need to use the count from their
exposed guts). Thus, the count never increases, and we reduce it in two cases. The first, when we unify
two variables, is a terminal case. The other is the case for which we need it. So for example, when we
unify
(*CONS &1 &2)
and
Type Checking for Common Lisp 341
Thoughts on the Termination of DUNIFY-DESCRIPTORS
(*OR &3 (*CONS &4 (*REC FOO (*OR &5 (*CONS *UNIVERSAL (*RECUR FOO))))))
the count is 2, one for the pairing of &1 and &4, and the other for the pairing of &2 with &5. What seems
odd is that, on the one case which matters, we had to inject the TERM-RECS mechanism to keep the
counter from running away. This seems to suggest we could substitute the TERM-RECS count, since it is
maximally significant.
4) Finally, we need some measure on the variables in our problem, which I will call the variable closure
measure. I do not know just what this measure should be. In the termination argument for classical
unification, this is just the number of unbound variables in the terms, under the binding list. Thus, the
measure goes down every time a variable is bound. But, for our algorithm, we need a measure which will
decrease whenever we proceed from unifying a variable with some descriptor to recursively unifying the
descriptor with the previous binding of a variable in the substitution list. We know that the variable will
not appear in either term, nor can there be a loop through the binding list back to the variable. But a
couple of examples will illustrate the difficulties of finding the right measure. First, an unbound variables
measure will not do. Consider:
(dunify-descriptors
’&4 ’(*cons &2 $nil) ((&4 . (*cons &3 $nil)) (&2 . &1) (&1 . $nil)))
The recursion, then, is to unify (*CONS &2 $NIL) with (*CONS &3 $NIL). In taking this step, we are
not reducing the number of unbound variables in our problem, since &4 was already bound. We could be
said to be reducing the length of the substitution chains down to unbound variables, since we are knocking
&4 out of consideration. But neither the length of such chains, nor the sum of their lengths, nor the
maximum of their lengths, nor any other measure I can think of, is strictly monitonically decreasing. In
the previous example, we can see that the unification will bind &3 to &2, thus linking to the chain down
to &1, which is longer chain than what we started with. Furthermore, we are not always taking the
variable whose binding we are considering out of future consideration. Consider:
(dunify-descriptors
’&2
’(*cons &3 $integer)
’((&2 . &1)
(&1 . (*rec bar (*or $integer (*cons $integer (*recur bar)))))
(&3 . &2)))
This results in unifying (*CONS &3 $INTEGER) with &1, then recusively unifying
(*CONS &3 $INTEGER) with
(*REC BAR (*OR $INTEGER (*CONS $INTEGER (*RECUR BAR))))
&3 with $INTEGER, &2 with $INTEGER, and &1 with $INTEGER. Thus, we revisited both &2 and
&1, albeit with shorter arguments. (Only upon emerging from this recursion do we discover we have an
ill-formed substitution caused by the inconsistent bindings of &1, but this does not bear on the termination
question.) So all I can say is that I believe there is a measure which goes down when we unify a new
binding for a variable with its old one, but I have not found it yet. It is probable that this measure is itself
a composite, perhaps involving the number of *REC forms in the substitution list, but that is speculation.
This speculation lends itself to the further speculation that TERM-RECS might play yet another role.
Assuming these four measures exist, however, and depending on their final definitions, I would probably
claim that the TERM-RECS is the most dominant, followed by the variable equalities and the variable
closure measures, and finally the term size, and that none increases except in cases where some more
dominant measure decreased, and that at least one decreases in all cases, as noted in the source code.
342 Type Checking for Common Lisp
Selected Functions from the Implementation
Type Checking for Common Lisp 343
Selected Functions from the Implementation
Appendix G
Selected Functions from the Implementation
This appendix contains the definitions of a few selected functions from the system implementation which
seem worthy of inclusion in this document. All the code in the implementation is available via
anonymous ftp from ftp.cli.com. Appendix H provides instructions on how to retrieve the code and other
information relevant to this thesis.
G.1 WELL-FORMED-DESCRIPTOR
;; --------------------------------------------------------------------
;; WELL-FORMED-DESCRIPTOR
;; --------------------------------------------------------------------
(CADR DESCRIPTOR))
(NULL (CDDDR DESCRIPTOR))))
(*RECUR
(AND (CONSP (CDR DESCRIPTOR))
(EQUAL (CADR DESCRIPTOR) NEAREST-REC)
(NULL (CDDR DESCRIPTOR))
(AND INSIDE-OR INSIDE-CONS)))
(OTHERWISE NIL))))))
G.2 WELL-FORMED-SIGNATURE
;; ----------------------------------------------------------------------
;; WELL-FORMED-SIGNATURE
;; ----------------------------------------------------------------------
G.3 WELL-FORMED-STATE-ASSUMING-ARITY
;; ----------------------------------------------------------------------
;; WELL-FORMEDNESS OF THE STATE
;; ----------------------------------------------------------------------
;; ----------------------------------------------------------------------
;; Well-formedness predicates for the arguments to E
;; ----------------------------------------------------------------------
;; ----------------------------------------------------------------------
;; Well-formedness predicates for the arguments to INTERP-SIMPLE
;; ----------------------------------------------------------------------
G.6 TC-PREPASS
;; ----------------------------------------------------------------------
;; TC-PREPASS
;; ----------------------------------------------------------------------
(TC-ONLY-T-NIL-RESULTS-FN-1
(CDR (SIGNATURE-SEGMENTS SIGNATURE))))))
G.7 DESCRIPTOR-FROM-QUOTE
G.8 WELL-FORMED-SUBSTS
;; ----------------------------------------------------------------------
;; The predicates for well-formed substitutions
;; ----------------------------------------------------------------------
G.9 DMERGE-NEW-SUBST
VARIABLE
(DUNIFIED-FORM-DESCRIPTOR DUNIFIED-FORM)
SECOND-ORDER-SUBSTS)))))
(DEFUN DMERGE-DIFFERENT-SUBST-INTO-SUBSTS-LIST
(DIFFERENT-SUBST-VARIABLE DUNIFIED-FORM SUBSTS-LIST)
(IF (NULL SUBSTS-LIST)
NIL
(APPEND (DMERGE-NEW-SUBST
DIFFERENT-SUBST-VARIABLE
(CDR (ASSOC DIFFERENT-SUBST-VARIABLE
(DUNIFIED-FORM-SUBSTS DUNIFIED-FORM)))
(CAR SUBSTS-LIST))
(DMERGE-DIFFERENT-SUBST-INTO-SUBSTS-LIST
DIFFERENT-SUBST-VARIABLE
DUNIFIED-FORM
(CDR SUBSTS-LIST)))))
(DEFUN DREPLACE-SUBST-IN-SECOND-ORDER-SUBSTS
(VARIABLE DESCRIPTOR SECOND-ORDER-SUBSTS)
(IF (NULL SECOND-ORDER-SUBSTS)
NIL
(CONS
(DREPLACE-SUBST VARIABLE DESCRIPTOR (CAR SECOND-ORDER-SUBSTS))
(DREPLACE-SUBST-IN-SECOND-ORDER-SUBSTS
VARIABLE DESCRIPTOR (CDR SECOND-ORDER-SUBSTS)))))
G.10 SCREEN-VAR-FROM-DESCRIPTOR
(DEFUN DUNIFY-VAR-WITH-CONTAINING-DESCRIPTOR
(VARIABLE DESCRIPTOR SUBSTS TERM-RECS)
;; VARIABLE should be a type variable.
;; DESCRIPTOR should be a well-formed descriptor.
;; SUBSTS should be an alist whose keys are type variables and
;; whose associated values are well-formed descriptors.
;; TERM-RECS should be a list of pairs of well-formed descriptors.
;;
;; This function implements the case of dunify-descriptors
;; corresponding to Case 5.3.2 of the proof.
(LET ((SCREENED-DESCRIPTOR
(SCREEN-VAR-FROM-DESCRIPTOR VARIABLE DESCRIPTOR)))
(IF (NOT (EQUAL SCREENED-DESCRIPTOR ’*FAIL-SCREEN))
;; Termination: Down on size
(DUNIFY-DESCRIPTORS
VARIABLE SCREENED-DESCRIPTOR SUBSTS TERM-RECS)
NIL)))
(LIST ’*REC
REC-NAME
(SUBST REC-NAME
(CADR DESCRIPTOR)
BODY))))
DESCRIPTOR))
(IF (RECUR-DESCRIPTORP DESCRIPTOR)
DESCRIPTOR
(INTERNAL-ERROR
"SCREEN-VAR-FROM-DESCRIPTOR -- Unexpected descriptor"
DESCRIPTOR))))))))
Appendix H
Supplementary Material Available Via FTP
This appendix provides a complete listing of files supplementary to this report which are available via ftp
from server ftp.cli.com and includes instructions on how to retrieve the files.
The following log illustrates the retrieval of these files from the ftp server.
axiom:akers[15]% ftp ftp.cli.com
Connected to jingles.
220 jingles FTP server (Version 5.90 Sat Apr 17 13:50:05 CDT 1993) ready.
Name (ftp.cli.com:akers): anonymous
331 Guest login ok, send ident as password.
Password:[email protected]
...
ftp> disconnect
221 Goodbye.
ftp> bye
axiom:akers[16]%
Bibliography
MAKE-BINDING-FROM-MAPPING OTHER-SUBSTS-DIFFERENT
definition 295 definition 352
MAKE-CONS-UNIFIED-FORM 44 Overloading 15
MAP-E
definition 82 P
MAP-VARS-TO-GOOD-NAMES definition in set-based semantics 318
definition 329 in composed substitution semantics 335
Mapping Parameteric polymorphism 3
grammar 141, 142 Parametric polymorphism 11
Mapping list 143 Partial polymorphic type inference 13
MAPPINGS-DEMONSTRATE-CONTAINMENT PCANONICALIZE-CONS-DESCRIPTOR 280,
136, 137, 146, 174 281
Maximal segment 104, 106, 110, 113, 116, 199 PCANONICALIZE-DESCRIPTOR 131, 148, 172
MEMBER 193 discussion 131
MERGE-BINDINGS PCANONICALIZE-DESCRIPTOR-1 132, 172
definition 250 PCANONICALIZE-DESCRIPTOR-OK 148
MERGE-BINDINGS* 235 definition 132, 172
definition 250 proof 172
MERGE-BINDINGS-EXTENDS-BINDINGS use 171
definition 250 PCANONICALIZE-OR-DESCRIPTOR 277, 278,
proof 250 280
use 235, 241 PCONSOLIDATE-UNIVERSAL-DESCRIPTORS
MERGE-BINDINGS. 235 172, 276
MERGEABLE-BINDINGS PFOLD-RECS-IF-POSSIBLE 172
definition 250 PMERGE-OR-DESCRIPTOR-CONSES 278, 279
Meta-function Polymorphic function 1
in function-based semantics 325 Polymorphic language 1
Minimal segment 104, 106, 110, 113, 116, 199 Polymorphic type 1
ML 3, 11 PREAL-CANONICALIZE-DESCRIPTOR 172
ML type inference PREAL-PCANONICALIZE-DESCRIPTOR, 132
complexity 13 PREMOVE-RECURSIVE-EXPANSION-
Modes 16 DUPLICATES 278
Monotonicity PREPASS 36, 74
in set-based semantics 319 discussion 50
Multiple-value functions 202 Principal type scheme 12
Mutually recursive functions 202 Proof
overview 30
NEW-FUNCTION 180 PROPER-CONSP
NO-REPLICATING-VARS test suite example 187
definition 344
NO-REPLICATING-VARS-1 Quote
definition 344 notation convention 28
Notation convention
character case 28 REC-TAIL 141
quote 28 Recognizer 6, 17, 51, 153, 223
NTH 201 definition 92
overview 24
OPEN-REC-DESCRIPTOR 48 Recognizer meta-function
OPEN-REC-DESCRIPTOR-ABSOLUTE 124, 255, in function-based semantics 325
260, 277, 288, 297, 303 Recognizer validation 102
definition 309 RECOGNIZER-SEGMENTS-COMPLETE 92
Optimizing compilers 19 definition 154, 223
OR semantics 93 proof 223
in function-based semantics 330 use 155
OR versus AND semantics 322 RECOGNIZERP
ORDERED-IF-VAR descriptor 55
definition 350 Recursive types 11, 17
ORIG-CADR Referential transparency 3
test suite example 190 RENAMING-PRESERVES-I
Type Checking for Common Lisp 373
Index
SAD-BUT-TRUE-LISTP TC-ALL-CALLED-FUNCTIONS-COMPLETE 28
inference algorithm example 73 TC-INFER 101, 226, 246
SAME-ROOT discussion 104, 106
definition 175 formal specification 105
SATISFIES 84 handling of function call, discussion 107
SATISFIES-1 84 handling of function call, example 111
Screen-var-from-descriptor 258 TC-INFER-EXAMPLE 112
definition 353 TC-INFER-OK 167
SCREEN-VAR-FROM-DESCRIPTOR-LIST definition 105, 167, 226
definition 354 proof 227
Screening 124 use 157, 163
Second order typed lambda calculus 12 TC-INFER-SIGNATURE 149, 223
Segment 6, 23 TC-INFER-SIGNATURE-GUARD-OK
Segment containment failure definition 156
in test suite 183 discussion 155
Segments 5 proof 156
Semantics of signatures use 162
function-based semantics 330 TC-MAKE-ARG-CROSS-PRODUCT 108, 114,
in set-based semantics 322 239, 246
Semantics with Composed Substitutions 95 TC-MAKE-ARG-CROSS-PRODUCT-OK
discussion 335 definition 245
Semibindings 146 proof 246
Set-based semantics 95 use 239
discussion 317 TC-MATCH-ACTUAL-HALF-SEG-TO-FORMAL-
Signature 22 SEG 110
All called functions complete flag 91 TC-ONLY-T-NIL-RESULTS-FN
formal semantics 88 definition 348
Guard complete flag 91 TC-ONLY-T-NIL-RESULTS-FN-1
Recognizer descriptor flag 92 definition 349
Signature is certified sound flag 92 TC-PREPASS 214
TC All called functions complete flag 91 definition 347
TC Guard complete flag 91 discussion 100
TC Guard Replaced by Tool Guard flag 92 TC-PREPASS-IF-PRED 214
TC segments contained in Segments flag 92 definition 348
TC validates recognizer flag 92 TC-PREPASS-IF-PRED-OK
Signature components, misc. definition 214
discussion 91 proof 218
Signatures use 216, 217, 218
well-formedness 81 TC-PREPASS-OK 101
Simple mapping definition 214
grammar 141 proof 214
SOLVE-EQUATIONS 25, 74 proof discussion 213
discussion 59 use 157, 164, 221, 222
STANDARD-CHAR-LISTP TC-SIGNATURE 30
test suite example 193 discussion 100
Static type checking 1 overview 28
STEP-VARS-IN-FORM TC-SIGNATURE-OK 103, 149
definition 350 definition 30, 97, 151
Strongly typed language 1 proof 159
Subr 21, 102 proof intuition 149
374 Type Checking for Common Lisp
Index
definition 344
Working segments 68
World
definition 27
parameter to E 81
Table of Contents
Chapter 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1. Static Type Checking and Common Lisp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2. A Comparison of ML-style Languages and Functional Common Lisp . . . . . . . . . 3
1.3. Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Chapter 3. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1. The Lisp Dialect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2. Type Checking and Function Signatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3. Recognizer Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4. The Inference Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.5. The Formal Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.6. The Signature Checker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.6.1. The Proof of Soundness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.7. Two Simple Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
i
Chapter 5. The Formal Semantic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.1. A Discussion of the Style of Formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.2. Type Descriptors and Their Well-Formedness . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.3. Well Formedness of Function Signatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.4. A Simple Lisp Evaluator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.5. The Formal Semantics of Type Descriptors and Signatures . . . . . . . . . . . . . . . . . 84
5.6. Other Components of the Signature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.7. Alternate Semantic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.7.1. OR Semantics versus AND Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.7.2. A More Constructive Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.7.3. Other Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
ii
Chapter 8. Testing the System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.1. The Test Suite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
8.2. Summary of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
8.3. Selected Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
iii
B.8. Proof of Lemma CONTAINED-IN-OK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
B-E. CONTAINED-IN-OK, Top Level Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
B-F. The CONTAINED-IN *REC Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
B.9. ICONTAINED-IN-OK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
B-G. Proof of Lemma ICONTAINED-IN-OK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
B-H. Proof of Lemma ICONTAINED-IN-EQUAL-TDS . . . . . . . . . . . . . . . . . . . . . . . . . . 301
B-I. The ICONTAINED-IN *REC Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
B.10. Proof of Lemma UNIVERSALIZE-SINGLETON-VARS-1-OK . . . . . . . . . . . . . . . . . 313
iv
List of Figures
v
List of Tables
vi