
PSYCHOMETRIKA--VOL. 27, NO. 2
JUNE, 1962

THE ANALYSIS OF PROXIMITIES: MULTIDIMENSIONAL
SCALING WITH AN UNKNOWN DISTANCE FUNCTION. I.

ROGER N. SHEPARD

BELL TELEPHONE LABORATORIES

A computer program is described that is designed to reconstruct the
metric configuration of a set of points in Euclidean space on the basis of
essentially nonmetric information about that configuration. A minimum
set of Cartesian coordinates for the points is determined when the only
available information specifies, for each pair of those points, not the distance
between them but some unknown, fixed monotonic function of that
distance. The program is proposed as a tool for reductively analyzing several
types of psychological data, particularly measures of interstimulus similarity
or confusability, by making explicit the multidimensional structure under-
lying such data.

Empirical procedures of several diverse kinds have this in common:
they start with a fixed set of entities and determine, for every pair of these,
a number reflecting how closely the two entities are related psychologically.
The nature of the psychological relation depends upon the nature of the
entities. If the entities are all stimuli or all responses, we are inclined to
think of the relation as one of similarity. A somewhat more objective (though
less intuitive) characterization of such a relation, perhaps, is that of sub-
stitutability. The statement that stimulus A is more similar to B than to C,
for example, could be interpreted to say that the psychological (or behavioral)
consequences are greater when C, rather than B, is substituted for A. From
this standpoint a natural procedure for determining similarities of stimuli
or responses is by recording substitution errors during identification learning
[2, 7, 12, 14, 17, 18]. In addition, though, disjunctive reaction time and
sorting time have also been proposed as measures of psychological similarity
[20]. Finally, of course, individuals have sometimes been instructed simply
to rate each pair of stimuli, directly, on a scale of apparent similarity [1, 6].
The notion of similarity is not necessarily restricted to stimuli or responses
(in the narrow sense of these words), however. Serviceable measures of
similarity may also be found for concepts, attitudes, personality structures,
or even social institutions, political systems, and the like.
With some classes of entities, the notions of substitutability and, hence,
similarity seem inappropriate. For example, data have been collected on
the number of times each pair of persons in an isolated group communicates
or otherwise associates with each other [3]. A number of this kind might be
interpreted as a measure of the degree of association or mutual choice be-
tween two persons. It would not, however, be taken as a measure of their
similarity. Experiments using word-association or free-recall methods furnish
another example [16]. Although the word "butter" is frequently produced
as an association to the word "bread," we are inclined to attribute this fact
to an associative connection between the two words but not, again, to any
high degree of similarity between these words or their referents.
Since the method of analysis to be described here may have some ap-
plication to measures of association as well as to measures of similarity or
substitutability, a generic term is needed to cover all of these various types
of measures. Fortunately, there is one notion (albeit a rather abstract one)
that does seem to run through all of these more specific concepts: that is
the notion of psychological nearness, closeness, or degree of proximity. Thus
one of two similar colors is said to be very near to the other, or two words
are said to be closely associated. For all sets of data of this general kind,
the number representing the closeness of the relation between a pair of
entities will be called the proximity measure for that pair. The method of anal-
ysis for such numbers will accordingly be termed the analysis of proximities.
The notion of nearness or proximity, which is objectively defined only
for pairs of objects in physical space, tends to be carried over to very different
situations where the space in which entities can be closer together or further
apart is not at all evident. As one possible basis for this extension of usage,
Shepard [20] has argued that there is a rough isomorphism between the
constraints that seem to govern all of these measures of similarity or associa-
tion, on the one hand, and the metric axioms (which formalize some of
the most fundamental properties of physical space), on the other. In par-
ticular, to the metric requirement that distance be symmetric, there is the
corresponding intuition that if A is near B then B is also near A. To the
metric requirement that the length of one side of a triangle cannot exceed
the sum of the other two, there is the corresponding intuition that, if A is
close to B and B to C, then A must be at least moderately close to C. The
insertion of words like "very" and "moderately" attests to the loss of pre-
cision entailed by a shift from the operationally defined concept of distance
to the intuitively defined notion of proximity. But this in turn invites an
attempt to carry the powerful quantitative machinery that has developed
around the concept of distance over into the qualitative area, where one has
been able to speak only of nearness or proximity. An attempt of just this
kind forms the subject matter of this paper.
If successful, such an attempt might advance us appreciably towards
the solution of a problem that immediately confronts anyone who collects
data of the proximity type; namely, the problem of the reduction of such
data. The following example may help to indicate the magnitude of the
reduction problem. There has been both theoretical and practical interest
in the factors that determine confusions between morse code signals [10, 12,
13, 14, 15]. But, since there are 36 different signals to be investigated, at
the very outset one is faced with the bewildering task of trying to find some
sort of pattern or structure in the N ( N - 1)/2 or 630 different similarities
between these signals. Once some underlying pattern is found, though, the
way would thereby be opened for the systematic exploration of how this
pattern depends upon the physical properties of the stimuli, the stage of
learning, prior training, the individual learner, and so on.
But the proposition has already been tendered that similarity is inter-
pretable as a relation of proximity; this immediately suggests that the struc-
ture we are seeking in data of this kind is a spatial structure. Certainly we
usually think that a greater degree of proximity implies a smaller distance of
separation. If some monotonic transformation of the proximity measures
could be found that would convert these implicit distances into explicit
distances, then we should be in a position to recover the spatial structure
contained only latently in the original data. Indeed, methods have already
been developed whereby, given an explicit set of distances of this kind,
one can map the stimuli, as points, into a Euclidean space (see Torgerson
[21], pp. 254-259). Such a mapping constitutes a true reduction of the data
since the originally bewildering array of proximity measures can then be
reconstructed from a relatively much smaller set of coordinates for the
resulting points in Euclidean space.
Clearly, the success of such an undertaking depends upon the selection
of the proper distance function; that is, the function that will transform the
proximity measures into Euclidean distances. Sometimes there is some in-
dependent information about the shape of this function. For example, Shepard
[19] has adduced evidence for an approximately exponential relation between
substitution errors during identification learning and the psychological dis-
tance between the stimuli or between the responses. Thus the distance function
should presumably be logarithmic in form for this special case. Unfortunately,
though, there is no reason to suppose that proximity measures obtained under
quite different conditions will also be exponentially related to distance. Even
in identification learning this relation apparently departs from the simple
exponential as soon as feedback as to the correctness of the responses is
curtailed [19]. As a matter of fact, in some cases we are primarily interested
in discovering the shape of this unknown function, rather than the spatial
configuration of the points.
Furthermore, even when the shape of the distance function is already
known, there is this remaining difficulty: since proximity measures are usually
bounded from below (typically by zero), they necessarily level off as distance
becomes very large. The consequent nonlinearity of the distance function
can magnify seemingly negligible statistical fluctuations of small proximity
measures into wild swings of the corresponding estimates for the large dis-
tances. Because of this, stable configurations have been achieved in practice
only by resorting to laborious smoothing procedures of a rather inelegant
and ad hoc nature ([17], p. 341).

Basic Ideas for a New Approach

A quite different approach to this whole problem is described here--an
approach that has become feasible, only recently, with the advent of digital
computers of sufficient speed and capacity. In this approach, no assumption
is made about the form of the distance function; the only requirement is that
it be monotonic. The objectives of the analysis are to determine three things:
(i) the minimum number of dimensions of the Euclidean space required such
that the distances in this space are monotonically related to the initially
given proximity measures, (ii) an actual set of orthogonal coordinates for
the points in this minimum space, and (iii) a plot showing the true shape of
the initially unknown function relating proximity to distance. It is a sur-
prising outcome of the present research that the two conditions of mono-
tonicity and minimum dimensionality (which seem nonmetric or qualitative
in nature) are generally sufficient to lead to a unique and quantitative solution.

Achievement of Monotonicity
The first of the two fundamental conditions to be imposed on the solution,
that of monotonicity, states that the rank order of the N ( N - 1)/2 dis-
tances between the N points should be the inverse of the rank order of the
N(N - 1)/2 proximity measures (at least to a sufficient degree of approxi-
mation). Thus the two points corresponding to the pair of entities with the
largest proximity measure should end up with the smallest (or very nearly
the smallest) distance of separation in Euclidean space. The device used to
secure this condition of monotonicity is basically quite simple. First, any
prescribed rank order of the distances whatever (including ties) can be realized
so long as we are free to arrange the N points in a space of at least N - 1
dimensions. Suppose, then, that we have some particular arrangement of
the points in an (N - 1)-dimensional space. If this particular configuration
does not meet the requirement of monotonicity, then we should be able to
find a better configuration simply by moving the points in such a way as
to stretch those distances that are too small and compress those distances
that are too large.
Stated more completely, the proposed procedure is as follows. First,
from the N ( N -- 1) coordinates specifying the initial configuration of points,
the N ( N -- 1)/2 Euclidean distances between these points are computed.
The computed distances are then rank ordered from smallest to largest, and
the resulting ranking is compared with the ranking already obtained for
the proximity measures. From each point N -- 1 vectors are then constructed
in such a way that each vector falls along the line connecting that point with
one of the N - 1 other points. The vector is directed towards or away from
the other point depending upon whether the rank of the distance between
those two points is too large or too small (as determined from the rank of
their proximity measure). The length of the vector is determined by the
size of this discrepancy in rank. Thus the N - 1 vectors issuing from a point
can be interpreted as N - 1 forces that are tending to pull that point towards
those other points that are too distant and away from those other points
that are too close.
Once the set of N - 1 vectors has been determined for all N points,
these points are simultaneously displaced in their (N - 1)-dimensional space.
The vector governing the displacement of each point is computed as the
vector sum (or resultant) of the N -- 1 individual vectors attached to that
point. Since the displacement of any one point is a compromise between
N - 1 separate tendencies, there is no guarantee that the simultaneous
displacement of all N points will immediately instate the desired condition
of inverse ranking. But there is no need to insist that the final solution be
achieved in a single stroke. Rather, the procedure can be iterated, each time
on the basis of the successively smaller residue of departures from monoton-
icity remaining from all preceding iterations.
In order to get the iterative process under way, some starting coordinates
must be supplied. Fortunately coordinates can always be found in N - 1
dimensions such that all N ( N - 1)/2 interpoint distances are equal. This
insures that no bias will be introduced into the final solution. The points
having this property coincide with the vertices of a regular simplex. (This
configuration is simply the generalization, to higher spaces, of the equilateral
triangle in two dimensions and the regular tetrahedron in three dimensions.)
As the iterative process proceeds, then, each point moves away from its
initial position at one vertex of the regular simplex. The trajectory of the
point is controlled by vector forces that are constantly changing so as to
satisfy the requirement of monotonicity. The whole configuration comes to
rest only after the rank order of the distances is the inverse of the rank order
of the initially given proximity measures.
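
In modern notation, the target condition can be sketched as follows (a minimal sketch in Python with numpy, not part of the FORTRAN program described here; the helper names are ours, and ties are broken arbitrarily rather than by the refinement introduced later):

```python
import numpy as np

def condensed_distances(x):
    """Euclidean distances for all i < j pairs of configuration x (N x K)."""
    n = len(x)
    return np.array([np.linalg.norm(x[i] - x[j])
                     for i in range(n) for j in range(i + 1, n)])

def inversely_ranked(prox, dist):
    """True when the rank order of the distances is the inverse of the
    rank order of the proximity measures (ties broken arbitrarily)."""
    ranks = lambda v: np.argsort(np.argsort(v))
    return np.array_equal(ranks(np.asarray(dist)), ranks(-np.asarray(prox)))
```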

Achievement of Minimum Dimensionality
Actually the process, as just described, would come to rest almost im-
mediately. This is because a trivial solution can always be found in N -- 1
dimensions by making arbitrarily small adjustments in the regular simplex.
An intuitive feeling for this fact can be gained by considering the special case
of four points in ordinary three-space. Initially these four points will coincide
with the vertices of a regular tetrahedron (a figure that cannot be isometrically
embedded in a space of fewer than three dimensions). Clearly, any one of
the six edges of this tetrahedron could now be shortened by some small
amount Δs without affecting the lengths of the other five edges. (It is suffi-
cient to subject two of the four triangular faces to a very small rigid rotation,
towards each other, about the edge opposite the one to be shortened.) More-
over, any one of the other five edges could then be shortened by some amount
smaller than Δs in exactly the same way without altering any of the other
edges.
This process could be continued so that each edge (taken in any sequence)
is shortened by a successively smaller amount. The final result would be a
tetrahedral configuration in which the distances between the four vertices
conform to any predesignated ordering. The extension of this argument to
the general case of N points in N - 1 dimensions need not be elaborated here;
for the fact that all orderings of distances are possible in N - 1 dimensions
follows from results already presented by Bennett and Hays ([4], pp. 37-38).
Taken by itself, then, the method outlined above for securing monotonicity
is of little utility. It leads to no real reduction of the original data. Further-
more, it does not yield a unique solution, since the amount each successive
edge is to be shortened is not completely specified--it can, in fact, be chosen
in infinitely many ways.
A nontrivial and unique solution can only be secured by supplementing
the monotonicity requirement with an additional requirement; namely, the
final configuration should be of the smallest possible dimensionality. As a
helpful heuristic, consider a mapping of a linear segment of length s onto a
semicircular segment spanning θ radians of a circle. The mapping can be
chosen so that the distance d between any two points of the segment is
transformed into the new distance f(d) given by

(1)    f(d) = r\sqrt{2[1 - \cos(\theta d/s)]}    (0 < \theta \le \pi).
Here r denotes the radius of curvature of the semicircular segment. This
radius can always be chosen so that the mean interpoint distance remains
invariant during the mapping. For example, if the N points are evenly dis-
tributed along the segment, invariance is approached (for large N) by choosing
r \approx \frac{s\theta}{24[1 - (2/\theta)\sin(\theta/2)]},

in which case the mean distance remains fixed in the vicinity of s/3.
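
A short numeric check of (1) and of this choice of r (a sketch under the stated assumptions; the variable names are ours): evenly spaced points on a unit segment have mean linear distance near s/3, and the chord distances under f preserve that mean.

```python
import numpy as np

s, theta, n = 1.0, np.pi / 2, 1000              # segment length, arc angle, points
r = s * theta / (24 * (1 - (2 / theta) * np.sin(theta / 2)))

def f(d):
    """Chord distance on the arc corresponding to linear distance d (eq. 1)."""
    return r * np.sqrt(2 * (1 - np.cos(theta * d / s)))

t = np.linspace(0, s, n)                        # evenly distributed points
d = np.abs(t[:, None] - t[None, :])[np.triu_indices(n, 1)]
print(d.mean(), f(d).mean())                    # both close to s/3
```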
In any case, since 0 < θ ≤ π, the function f is monotonic; hence the rank
order of the interpoint distances is necessarily preserved. Clearly, then, a
choice cannot be made between the straight and the curved segments on the
basis of the rank order of the distances alone. Note, however, that whereas
the reconstruction of the distances from Cartesian coordinates can be effected
with only one coordinate axis in the case of the linear segment, two such
axes are required for the semicircular segment. Note, also, that whereas
there is only one linear configuration, there is a different semicircular con-
figuration for each choice of the parameter θ governing the total curvature
of the segment. Thus considerations of simplicity or parsimony, on the one
hand, and of uniqueness, on the other, lead us to prefer the solution that
can be represented in the space of lesser dimensionality.
In this particular example the inverse of the function f can be used to
flatten the semicircular segment back into the straight line by systematically
stretching the larger distances and shrinking the smaller distances. A number
of other examples reinforce the inference that, in general, the collapsing of
a configuration into a space of smaller dimensionality is accompanied by an
increase in the variance of the distances. Suppose, in particular, that a set
of points is to be confined to a hyperspherical region of fixed radius. We have
already noted that the case of minimum variance of the interpoint distances
can only be achieved in N - 1 dimensions, where the vertices of a regular
simplex are all separated by exactly the same distance. At the other extreme,
if the rank order of the distances is disregarded, then the case of maximum
variance can only be realized by dividing the points evenly between two
spatially separated clusters in such a way that the distance between any
two points within the same cluster approaches zero. This is clearly a one-
dimensional configuration, since all distances can be reconstructed from
coordinates on a single axis passing through the two clusters. More generally,
Hammersley [8] has shown that the variance of the distribution of distances
in a hyperspherical region of K-dimensional space is a decreasing function
of the number of dimensions K. Indeed, this distribution is asymptotically
normal with variance decreasing approximately as the reciprocal of the
number of dimensions [8, 11].
These considerations suggest that the way to flatten a configuration into
a space of smaller dimensionality is to increase the variance of the distances
by further stretching those distances that are already large and by further
shrinking those distances that are already small. This is just what is done
in the method of analysis proposed here. During each iteration two sets of
N -- 1 vectors are computed for each of the N points. While the first set
(already described) is designed to improve the position of the point with
respect to the requirement of monotonicity, the second set is intended to
move that same point closer to nearby points and further from remote
points. Since the proximity measures are, by hypothesis, monotonically
related to the distances, these measures can be used in place of the distances
themselves to compute the second set of vectors. (This device seems to avoid
some instabilities inherent in the other alternative.) In particular, for any one
point a vector of this second type is extended towards or away from each
other point depending upon whether the proximity measure for the two points
is larger or smaller than the mean for all N ( N - 1 ) / 2 proximity measures.
The length of the vector, again, is proportional to the magnitude of this
deviation from the mean.
Hence the final displacement of each point during one iteration is the
resultant of altogether 2(N - 1) vectors. Half of these represent forces
that are acting to achieve monotonicity; the remaining half represent forces
that are acting to drive the configuration into a space of smaller dimen-
sionality. With the addition of this second set of vectors, therefore, the
spatial configuration continues to change until both criteria are satisfied.
At that point there are no further changes, since the vectors tending to flatten
the configuration still further are exactly counteracted by the vectors tending
to maintain monotonicity.
The convergence to this final configuration does not immediately supply
the reduction we are seeking, however, since each of the N points is still
specified by N - 1 coordinates. The situation is analogous to a coplanar set
of points in ordinary three-space. In order to reduce the number of coordinates
for each point from three to two, in that case, the coordinate system (with
origin at the centroid) must be rotated so that all projections on the third
axis vanish. Only then can the distances between the coplanar points be
reconstructed from their coordinates on two axes alone. Similarly in the
general case, the attainment of the most economical description requires
that the coordinate system be rotated into "principal axes." When this has
been done the total variance of the coordinates (which is invariant under
rotation) will be partitioned among the axes in such a way that each suc-
cessive axis accounts for the greatest possible fraction of that variance not
accounted for by all preceding axes. When only a negligible fraction remains
to be accounted for, all succeeding axes can be eliminated as superfluous.
Hopefully, the number of axes that must be retained will be small compared
with the number of points N. For it is the set of coordinates on these retained
axes that constitutes the reduced characterization of those points and, hence,
of the original proximity measures.

Method of Computation
A program for carrying out the operations outlined only roughly in the
preceding sections has now been compiled and tested on an IBM 7090 com-
puter. The remainder of this paper is devoted to the presentation of a more
complete and precise statement of the mathematical details of this program.*
A second paper (Part II, to follow in a subsequent issue of this journal)
will present the results of a number of tests that were performed to show that
this program does in fact yield solutions of the kind claimed for it.

Preparations for the Iterative Procedure


As soon as the N ( N - 1)/2 initially given proximity measures are fed
into the computer, two operations are performed. First, these measures are
linearly transformed to bring them into a standard form in which the largest
measure is unity and the smallest measure is zero. The standardized proximity
measure obtained in this way for the points i and j will be denoted by s_ij.
Second, these standardized measures are then rank ordered from smallest
to largest. Tied measures are, for convenience, put into an arbitrary order.
(Owing to a refinement in the use of the ranking, this treatment of ties has
no influence upon the outcome.)

*Copies of the actual FORTRAN listing for this program will be made available, by the author, to those who might wish to use it.
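
A minimal sketch of these two preparatory operations (Python with numpy; the names are ours, and the arbitrary ordering of ties simply follows numpy's default):

```python
import numpy as np

def standardize(prox):
    """Linear transformation to standard form: largest measure 1, smallest 0."""
    prox = np.asarray(prox, dtype=float)
    return (prox - prox.min()) / (prox.max() - prox.min())

raw = np.array([12.0, 3.0, 7.5, 7.5, 1.0, 9.0])   # N(N-1)/2 raw measures, N = 4
s = standardize(raw)
rank_of_s = np.argsort(np.argsort(s))             # ranks, smallest to largest
```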
The next step is to generate a set of coordinates for each of the N vertices
of a regular simplex in N - 1 dimensions. These are the starting coordinates
that will be modified during the first iteration. The orientation of the simplex
is arbitrary but, again for convenience, its position and size are so chosen
that the centroid is at the origin and all edges are of unit length. Specifically,
if x_ia denotes the coordinate for vertex i on axis a, then

(2a)    x_{i,2q-1} = \frac{\cos[2q(i-1)\pi/N]}{\sqrt{N}},

(2b)    x_{i,2q} = \frac{\sin[2q(i-1)\pi/N]}{\sqrt{N}},

as q runs from one to the greatest integer in (N - 1)/2. In case N is even,
the coordinates on the last axis are given by

(2c)    x_{i,N-1} = \frac{(-1)^{i-1}}{\sqrt{2N}}.

That the N points with these coordinates are in fact all separated by unit
distances can easily be shown (see Coxeter [5], p. 245).
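
The construction is also easy to verify numerically; a sketch (function name ours):

```python
import numpy as np

def simplex_coordinates(n):
    """Vertices of a regular simplex in n - 1 dimensions with unit edges and
    centroid at the origin, per eqs. (2a)-(2c)."""
    x = np.zeros((n, n - 1))
    i = np.arange(1, n + 1)
    for q in range(1, (n - 1) // 2 + 1):
        x[:, 2 * q - 2] = np.cos(2 * q * (i - 1) * np.pi / n) / np.sqrt(n)  # (2a)
        x[:, 2 * q - 1] = np.sin(2 * q * (i - 1) * np.pi / n) / np.sqrt(n)  # (2b)
    if n % 2 == 0:
        x[:, -1] = (-1.0) ** (i - 1) / np.sqrt(2 * n)                       # (2c)
    return x

x = simplex_coordinates(6)
d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
print(np.allclose(d[np.triu_indices(6, 1)], 1.0))   # True: all edges unit length
```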
Comparison of Rank Orders
At the beginning of each iteration, the Euclidean distances between
the N points are calculated from their (N -- 1)-dimensional coordinates for
that iteration by means of the generalized Pythagorean formula
(3)    d_{ij} = \left[\sum_{a=1}^{N-1} (x_{ia} - x_{ja})^2\right]^{1/2}.

These distances are then ranked from smallest to largest for purposes of
comparison with the ranking already determined for the proximity measures.
Tied distances are always put into an order that is exactly the opposite of
the order dictated by the ranking of the proximity measures. This is done
to insure that distances that should not be tied will be pulled apart during
the iteration.
A certain refinement is introduced into the comparison of the two rank-
ings for the following reason. Although any ranking of the distances can be
exactly realized in N - 1 dimensions, some small violations of monotonicity
must be anticipated whenever the configuration is forced into a space of
smaller dimensionality--particularly since the initial proximity measures
are usually subject to errors of measurement. However, some departures
from perfect inverse ranking can be tolerated better than others. Consider,
for example, a case in which the violation of monotonicity consists merely
in the reversal of two distances of adjacent ranks. Clearly, the seriousness
of such a reversal would depend upon the magnitude of the difference between
the two corresponding proximity measures. Indeed, if these two proximity
measures were sufficiently close together, the direction of their difference
would not be statistically reliable anyway. In that event, the reversal of
the ranks of the two distances could be disregarded. In comparing ranks,
discrepancies should somehow be weighted to reflect the magnitude of the
difference between the proximity measures.
Actually such a refinement is quite easy to implement. Instead of com-
paring the ranks themselves, these integers are first replaced by the cor-
responding standardized proximity measures. To be more precise, let R(s~i)
denote the rank of the proximity measure s , for the two points i and j;
let R ' ( d , ) denote the reversed rank of their distances so that, for perfect
inverse ranking, R ( s . ) = R ' ( d ~ ) . An obvious measure of the departure
from monotonicity of the rank of the distance between i and j is the algebraic
difference R ( s . ) - R ' ( d . ) . However, in place of this, the present proposal
is to use the more refined measure s~. - s(d.), where s(d,i) stands for the
proximity measure of rank R ' ( d ~ ) . This amounts, in effect, to a comparison
of each proximity measure with its corresponding distance after the scale
of distance has been subjected to a nonlinear transformation that renders
the distribution of distances identical with the distribution of proximity
measures. If the relation between these two variables is monotonic, then
this procedure insures that it will also be linear.
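
In code, the refined measure replaces each distance rank by the proximity measure of the same rank; a sketch (ours; the tie-breaking refinement for distances is omitted):

```python
import numpy as np

def monotonicity_discrepancies(s, d):
    """s_ij - s(d_ij) for every pair: each proximity measure minus the
    proximity measure whose rank equals the reversed rank of d_ij."""
    s, d = np.asarray(s, float), np.asarray(d, float)
    s_sorted = np.sort(s)                    # proximities, smallest first
    rank_d = np.argsort(np.argsort(d))       # 0 = smallest distance
    reversed_rank = len(d) - 1 - rank_d      # 0 = largest distance
    return s - s_sorted[reversed_rank]       # all zeros under perfect inversion
```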

Vectors for the Approach to Monotonicity


Since the standardized proximity measures range from zero to one,
the algebraic differences s_ij - s(d_ij) can assume any value between plus and
minus unity. A positive value indicates that the distance is too great, a
negative value that the distance is too small, and zero that there is no de-
parture from monotonicity for that pair of points. The vectors designed to
achieve a closer approach to monotonicity during a given iteration can
therefore be constructed on the basis of these algebraic differences. In par-
ticular, the component on axis a for the vector directed from point i along
the line passing through some other point j is given by
(4)    \alpha_{ija} = \frac{\alpha[s_{ij} - s(d_{ij})](x_{ja} - x_{ia})}{d_{ij}},

where the parameter α (without subscripts) is a constant multiplier for the
lengths of all vectors. Larger values of α promote more rapid convergence.
But, as in all error-correcting systems of this kind, if α is chosen too large
the system over-corrects and, hence, degenerates into nonconvergent oscil-
lation.
Vectors for the Approach to Minimum Dimensionality
A second set of vectors is then constructed in order to induce a slight
stretching of the larger distances and shrinkage of the smaller distances.
The component on axis a for the vector (of this second type) directed from
point i along the line passing through point j is given by
(5)    \beta_{ija} = \frac{\beta[s_{ij} - \bar{s}](x_{ja} - x_{ia})}{d_{ij}},

where s̄ is the mean of all N(N - 1)/2 proximity measures. Again the param-
eter β (without subscripts) governs the rate at which the configuration is
forced into spaces of smaller and smaller dimensionality. As will be seen, β
must generally be less than α. Otherwise the tendency to flatten the con-
figuration overrides the tendency to maintain monotonicity, and the config-
uration is consequently driven into a space of spuriously low dimensionality.

Displacement of the Points


The vector specifying the actual displacement of point i during the
given iteration is computed as the vector sum of both types of vectors.
Explicitly, the component of the displacement on axis a is given by
(6)    \Delta x_{ia} = \sum_{j \ne i} (\alpha_{ija} + \beta_{ija}),

where the summation is carried out over all N - 1 other points j (≠ i). Thus,
during one iteration the coordinate for point i on axis a is changed from
x_ia, at the beginning of the iteration, into the new value x'_ia given by

(7)    x'_{ia} = x_{ia} + \Delta x_{ia}.

Finally, at the end of each iteration the adjusted coordinates x'_ia undergo a
"similarity" transformation to re-center the centroid at the origin and to
re-scale all distances so that the mean interpoint separation is maintained at
unity. The adjusted and transformed coordinates are treated in exactly the
same way during the following iteration. The iterative process can be
terminated when the adjustments Δx_ia have become too small to have any
appreciable effect on the configuration.
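
Gathered into code, one full iteration might look like the following sketch (ours, not the FORTRAN program itself; it reuses monotonicity_discrepancies from the earlier sketch, and the equation numbers in the comments refer to the text). Starting from simplex_coordinates(N) and iterating until the displacements become negligible reproduces the process just described.

```python
import numpy as np

def iterate_once(x, s, alpha=0.20, beta=0.05):
    """One iteration: x is the N x (N-1) coordinate array, s the condensed
    vector of standardized proximities (pair order as in np.triu_indices)."""
    s = np.asarray(s, float)
    n = len(x)
    iu = np.triu_indices(n, 1)
    diff = x[:, None, :] - x[None, :, :]               # x_i - x_j, every pair
    d = np.linalg.norm(diff, axis=-1)
    off = ~np.eye(n, dtype=bool)                       # mask selecting j != i

    disc = np.zeros((n, n))                            # s_ij - s(d_ij), square form
    disc[iu] = monotonicity_discrepancies(s, d[iu])
    disc += disc.T

    s_sq = np.zeros((n, n))                            # proximities, square form
    s_sq[iu] = s
    s_sq += s_sq.T

    toward = -diff / np.where(off, d, 1.0)[..., None]  # unit vectors i -> j
    a_vec = alpha * disc[..., None] * toward           # eq. (4): monotonicity
    b_vec = beta * (s_sq - s.mean())[..., None] * toward   # eq. (5): flattening
    dx = ((a_vec + b_vec) * off[..., None]).sum(axis=1)    # eq. (6): resultant
    x = x + dx                                         # eq. (7)

    x = x - x.mean(axis=0)                             # re-center the centroid
    d_new = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    return x / d_new[iu].mean()                        # mean separation back to 1
```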
Rotation to Principal Axes
Following the last iteration, the coordinate system should be rotated
to principal axes. To this end, the matrix of scalar products of vectors from
the centroid to all pairs of points is constructed. The scalar product for the
points i and j is given by
(8)    b_{ij} = \sum_{a=1}^{N-1} x_{ia} x_{ja}.
136 I~SYCHOMETRIKA

Next, the characteristic roots and vectors of this matrix (which is real and
symmetric) are computed by the Jacobi method of diagonalization (e.g., [9],
pp. 180-185). These roots, with their associated vectors, are then ordered
so that the roots form a decreasing sequence. The ath such root λ_a specifies
the variance accounted for by the ath most important of the rotated axes.
If v_ia denotes the ith component of the characteristic vector associated with λ_a,
then the actual coordinates on the ath rotated axis are given by

(9)    x^*_{ia} = \sqrt{\lambda_a}\, v_{ia}.
Hopefully the first few axes will account for almost all of the variance. The
coordinates on the remaining axes can then be eliminated to achieve the
reduced representation of the spatial configuration.
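
A sketch of this rotation (ours; numpy's symmetric eigensolver stands in for the Jacobi diagonalization cited in the text):

```python
import numpy as np

def rotate_to_principal_axes(x):
    """Coordinates on principal axes via eqs. (8)-(9), plus the roots
    (variances) for judging how many axes to retain."""
    x = x - x.mean(axis=0)              # origin at the centroid
    b = x @ x.T                         # eq. (8): b_ij = sum_a x_ia * x_ja
    roots, vecs = np.linalg.eigh(b)     # real symmetric matrix
    order = np.argsort(roots)[::-1]     # decreasing sequence of roots
    roots, vecs = roots[order], vecs[:, order]
    roots = np.clip(roots, 0.0, None)   # guard against tiny negative round-off
    return vecs * np.sqrt(roots), roots # eq. (9): x*_ia = sqrt(lambda_a) v_ia
```

Axes whose roots account for only a negligible fraction of the total variance can then be dropped.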
After rotation to principal axes it is sometimes desirable to carry out
further iterations in the collapsed space. The reason for this is as follows.
First, as long as β > 0 (as it must be to achieve minimum dimensionality)
some slight distortion is necessarily introduced into the stabilized config-
uration. This distortion can be reduced to any desired level by choosing β
sufficiently small. But this, in turn, leads to a slower convergence. In order
to conserve computing time, therefore, it is often expedient to select a larger
β and even to rotate to principal axes before the configuration has come
completely to rest.
The results of this premature rotation cannot of course be taken as a
final solution. These results can be used to make an inference as to the number
of dimensions required and to provide a set of approximate coordinates in
the space of this number of dimensions. This approximate solution can then
be brought into its exact form by setting β = 0 and iterating to a final con-
figuration in the collapsed space. Since there will be no stretching or shrinking
with β = 0, the only criterion governing the final configuration is that of
monotonicity. If a sufficient degree of monotonicity is not achieved, this
indicates that the number of required dimensions has been underestimated.
In this event one can easily return to the coordinates obtained from the
"premature" rotation, add the next axis, and then iterate to a new solution
in the resulting augmented space.

Criterion for Terminating the Iterative Process


An explicit measure of the over-all departure from monotonicity remain-
ing after any iteration is the mean square discrepancy between the adjusted
rank numbers; namely,
\bar{\sigma} = \frac{2}{N(N-1)} \sum_{i=2}^{N} \sum_{j=1}^{i-1} [s_{ij} - s(d_{ij})]^2.
In the following section and in the sequel (Part II), σ̄ will be shown to provide
a satisfactory indicator for determining when to perform the "premature"
rotation as well as when to terminate, finally, the iterative process.
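
As a formula check (a sketch reusing the earlier helper): σ̄ is simply the mean of the squared discrepancies over the N(N - 1)/2 pairs.

```python
import numpy as np

def sigma_bar(s, d):
    """Mean square departure from monotonicity over all N(N-1)/2 pairs."""
    return float(np.mean(monotonicity_discrepancies(s, d) ** 2))
```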

Choice of Values for α and β


Owing to the trading relation, already noted, between speed and ac-
curacy in the process of convergence, it seemed desirable to undertake a
systematic exploration of the effects of these two parameters. In order to
provide for an objective evaluation of convergence, a set of artificial proximity
measures was used. These were generated by applying a known function
to the distances between points in a known configuration. Thus the spatial
configuration reconstructed during any stage of the iterative process could
be directly compared with the known "true" configuration. In order to save
computing time, a two-dimensional array of only five points was selected as
this known configuration. For convenience, the scale of distance was chosen
so that the mean of the ten interpoint distances was unity. The set of ten
artificial proximity measures was then generated by arbitrarily defining the
proximity measure for any two points to be a normal (Gaussian) function
of the distance between them.
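
The paper does not list the five-point configuration or the scale of the Gaussian, so the following sketch only illustrates the recipe; every particular here (layout, seed, scale) is our assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
true_x = rng.standard_normal((5, 2))                # some 5-point, 2-D layout
d = np.linalg.norm(true_x[:, None] - true_x[None, :], axis=-1)
d_true = d[np.triu_indices(5, 1)]
d_true /= d_true.mean()                             # mean of the 10 distances = 1
prox = np.exp(-d_true ** 2)                         # Gaussian function of distance
```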
The long rectangular panel in the upper left corner of Fig. 1 shows how
the discrepancy between the configuration reconstructed during each iteration
and the "true" configuration of five points declined during 80 consecutive
iterations with α = .20 and β = .05. The measure of discrepancy used is
the sum of squared deviations of the ten reconstructed distances from the
ten true distances divided by the sum of squared deviations of the true
distances from their mean (which in this case is unity). Since the ten re-
constructed distances are initially all unity, this measure of discrepancy
necessarily starts at 1.0 before the first iteration. Note that, with these
values for α and β, there was a very marked drop in the discrepancy during
the first iteration; convergence to the true solution was essentially achieved
by the 15th iteration. Thereafter, the discrepancy remained below .02.
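
In code, this discrepancy measure reads (a sketch; the function name is ours):

```python
import numpy as np

def discrepancy(d_rec, d_true):
    """Squared deviations of reconstructed from true distances, normalized by
    the squared deviations of the true distances from their mean."""
    d_rec, d_true = np.asarray(d_rec, float), np.asarray(d_true, float)
    return float(((d_rec - d_true) ** 2).sum()
                 / ((d_true - d_true.mean()) ** 2).sum())
```

With all reconstructed distances initially equal to unity (the mean of the true distances), the measure starts at 1.0, as noted above.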
The triangular array of square panels in the lower right section of the
figure uses the same measure of discrepancy to exhibit the analogous curves
for 21 combinations of values of α and β. The 32-fold range from 0.05 to 1.60
is covered for both parameters. But each panel displays the results for the
first 20 iterations only. (Thus the panel for α = .20 and β = .05 in this array
duplicates the square section at the left of the upper rectangular graph already
described.) Apparently, for small values of α, i.e., to the left in the two-
dimensional array of panels, the curves tend to be smooth and orderly; whereas
for large values of α, i.e., to the right, the curves tend to be more jagged and
erratic. Indeed, the excursions for α = 1.60 were so large that only the oc-
casional re-entries into the panels could be shown.
Apart from these oscillations for large values of α, the curves tend to
decrease more or less monotonically for β < α/4, say. But, for larger values
[Figure 1 appears here: a long rectangular panel (upper left) showing all 80 iterations for α = .20, β = .05, and a triangular array of square panels (lower right) covering the 21 combinations of α and β over the range 0.05 to 1.60.]

FIGURE 1
Discrepancy between the reconstructed configuration and the true configuration as a func-
tion of number of iterations for each of 21 different combinations of values for α and β.

of β, the curves pass through a minimum and then increase again. Evidently
the most precise convergence is achieved by choosing α as small as .20 and
by choosing β considerably smaller than α. Thus, if computing time is of
little concern, a solution of high accuracy could probably be achieved with
α = .20 and β = .01.
Unfortunately, when N is at all large convergence can be quite slow with
these values for α and β; hence, computing time does become a matter for
concern. The curves in Fig. 1 suggest a more expedient strategy in such cases.
Note, for example, the curve for α = .4 and β = .2. Although this curve
rises after the second iteration, it does indicate a rather close approach to
the true configuration after only two iterations. Other neighboring curves
exhibit similar behavior. If there were some way to establish just when the
discrepancy is minimum, the coordinate system could be rotated to principal
axes at that point. The final solution could then be secured rather quickly
by iterating in the collapsed space with α = .2 and β = 0, say. Of course,
since the true configuration is not generally known in advance, curves like
those shown in Fig. 1 cannot be plotted in practice. However, the measure
of the over-all departure from monotonicity, σ̄, can always be computed.
Moreover, this measure turns out, in general, to attain its minimum very
close to the point at which the discrepancy between the distances is minimum.
The points at which the quantity σ̄ is minimum are designated by the
small vertical arrows above the curves in Fig. 1. (When there are two arrows,
these indicate the first and last iterations for which the departure from
monotonicity attained its minimum value during the twenty iterations. When
N is as large as ten or fifteen, however, this departure usually reaches its
minimum only once during a comparable number of iterations.) These results
suggest--and subsequent experience confirms--that the number of dimensions
can be adequately estimated on the basis of the variances of the coordinates
on the principal axes when the departure from monotonicity is minimum.
The effectiveness of this strategy, particularly for cases of substantially larger
N, will become apparent in the sequel (Part II). That paper will be devoted
to the detailed description and evaluation of a number of test applications
of the method presented here.

REFERENCES
[1] Abelson, R. P. and Sermat, V. Multidimensional scaling of facial expressions. J. exp.
Psychol., in press.
[2] Attneave, F. Dimensions of similarity. Amer. J. Psychol., 1950, 63, 516-556.
[3] Bales, R. F., Strodtbeck, F. L., Mills, T. M., and Roseborough, M. E. Channels of
communication in small groups. Amer. sociol. Rev., 1951, 16, 461-468.
[4] Bennett, J. F. and Hays, W. L. Multidimensional unfolding: Determining the dimen-
sionality of ranked preference data. Psychometrika, 1960, 25, 27-43.
[5] Coxeter, H. S. M. Regular polytopes. London: Methuen, 1948.
[6] Ekman, G. Dimensions of color vision. J. Psychol., 1954, 38, 467-474.
[7] Green, B. F. and Anderson, L. K. The tactual identification of shapes for coding
switch handles. J. appl. Psychol., 1955, 39, 219-226.
[8] Hammersley, J. M. The distribution of distances in a hypersphere. Ann. math. Statist.,
1950, 21, 447-452.
[9] Harman, H. H. Modern factor analysis. Chicago: Univ. Chicago Press, 1960.
[10] Keller, F. S. and Taubman, R. E. Studies in international morse code. II. Errors
made in code reception. J. appl. Psychol., 1943, 27, 504-509.
[11] Lord, R. D. The distribution of distance in a hypersphere. Ann. math. Statist., 1954,
25, 794-798.
[12] Plotkin, L. Stimulus generalization in morse code learning. Arch. Psychol., 1943, 40,
No. 287.
[13] Rothkopf, E. Z. Signal similarity and reception errors in early morse code training.
In G. Finch and F. Cameron (Eds.), Symposium on Air Force human engineering,
personnel, and training research. Washington: National Academy of Sciences--National
Research Council Publication 455, 1956. Pp. 229-235.
[14] Rothkopf, E. Z. A measure of stimulus similarity and errors in some paired-associate
learning tasks. J. exp. Psychol., 1957, 53, 94-101.

[15] Seashore, H. and Kurtz, A. K. Analysis of errors in copying code. Washington, D. C.:
Office of Scientific Research and Development, Rep. No. 4010, 1944.
[16] Shepard, R. N. Multidimensional scaling of concepts based upon sequences of re-
stricted associative responses. Amer. Psychologist, 1957, 12, 440-441. (Abstract)
[17] Shepard, R. N. Stimulus and response generalization: A stochastic model relating
generalization to distance in psychological space. Psychometrika, 1957, 22, 325-345.
[18] Shepard, R. N. Stimulus and response generalization: Tests of a model relating
generalization to distance in psychological space. J. exp. Psychol., 1958, 55, 509-523.
[19] Shepard, R. N. Stimulus and response generalization: Deduction of the generalization
gradient from a trace model. Psychol. Rev., 1958, 65, 242-256.
[20] Shepard, R. N. Similarity of stimuli and metric properties of behavioral data. In
H. Gulliksen and S. Messick (Eds.), Psychological scaling: theory and method. New
York: Wiley, 1960. Pp. 33-43.
[21] Torgerson, W. S. Theory and methods of scaling. New York: Wiley, 1958.
Manuscript received 10/~/61
Revised manuscript received 12/7/61
