The Analysis of Proximities Multidimensional Scaling With An Unknown Distance
JUNE, 1962
THE ANALYSIS OF PROXIMITIES: MULTIDIMENSIONAL
SCALING WITH AN UNKNOWN DISTANCE FUNCTION. I.
ROGER N. SHEPARD
in the factors that determine confusions between Morse code signals [10, 12,
13, 14, 15]. But, since there are 36 different signals to be investigated, at
the very outset one is faced with the bewildering task of trying to find some
sort of pattern or structure in the N(N - 1)/2 or 630 different similarities
between these signals. Once some underlying pattern is found, though, the
way would thereby be opened for the systematic exploration of how this
pattern depends upon the physical properties of the stimuli, the stage of
learning, prior training, the individual learner, and so on.
      But the proposition has already been tendered that similarity is inter-
pretable as a relation of proximity; this immediately suggests that the struc-
 ture we are seeking in data of this kind is a spatial structure. Certainly we
usually think that a greater degree of proximity implies a smaller distance of
 separation. If some monotonic transformation of the proximity measures
 could be found that would convert these implicit distances into explicit
 distances, then we should be in a position to recover the spatial structure
 contained only latently in the original data. Indeed, methods have already
 been developed whereby, given an explicit set of distances of this kind,
 one can map the stimuli, as points, into a Euclidean space (see Torgerson
 [21], pp. 254-259). Such a mapping constitutes a true reduction of the data
 since the originally bewildering array of proximity measures can then be
 reconstructed from a relatively much smaller set of coordinates for the
 resulting points in Euclidean space.
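
The size of this reduction can be made concrete with a little arithmetic. The sketch below counts the proximity measures against the coordinates that would replace them; the choice of two dimensions is a hypothetical value used only for illustration, not a result stated in the text.

```python
def n_pairs(n):
    """Number of distinct pairs among n entities: n(n - 1)/2."""
    return n * (n - 1) // 2


def n_coordinates(n, k):
    """Coordinates needed to place n points in a k-dimensional space."""
    return n * k


n_signals = 36   # Morse code signals, as in the text
k_assumed = 2    # hypothetical dimensionality, for illustration only

print(n_pairs(n_signals))                   # 630 proximity measures
print(n_coordinates(n_signals, k_assumed))  # 72 coordinates
```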
      Clearly, the success of such an undertaking depends upon the selection
 of the proper distance function; that is, the function that will transform the
 proximity measures into Euclidean distances. Sometimes there is some in-
 dependent information about the shape of this function. For example, Shepard
 [19] has adduced evidence for an approximately exponential relation between
 substitution errors during identification learning and the psychological dis-
 tance between the stimuli or between the responses. Thus the distance function
 should presumably be logarithmic in form for this special case. Unfortunately,
 though, there is no reason to suppose that proximity measures obtained under
 quite different conditions will also be exponentially related to distance. Even
 in identification learning this relation apparently departs from the simple
 exponential as soon as feedback as to the correctness of the responses is
 curtailed [19]. As a matter of fact, in some cases we are primarily interested
 in discovering the shape of this unknown function, rather than the spatial
  configuration of the points.
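
For the special case of an exponential relation, the logarithmic distance function can be sketched as follows. The decay constant k and both function names are assumptions introduced here for illustration; the text states only that the relation is approximately exponential.

```python
import math


def proximity_from_distance(d, k=1.0):
    """Assumed exponential relation: proximity decays as exp(-k * d)."""
    return math.exp(-k * d)


def distance_from_proximity(s, k=1.0):
    """Logarithmic distance function that inverts the exponential relation."""
    if s <= 0.0:
        raise ValueError("proximity must be positive to take its logarithm")
    return -math.log(s) / k


# Round trip: distance -> proximity -> recovered distance.
for d in (0.5, 1.0, 2.0):
    s = proximity_from_distance(d)
    print(d, round(s, 4), round(distance_from_proximity(s), 4))
```

Because the logarithm is monotonic, a larger proximity always yields a smaller recovered distance, whatever positive value of k is assumed.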
      Furthermore, even when the shape of the distance function is already
 known, there is this remaining difficulty: since proximity measures are usually
 bounded from below (typically by zero), they necessarily level off as distance
 becomes very large. The consequent nonlinearity of the distance function
 can magnify seemingly negligible statistical fluctuations of small proximity
 measures into wild swings of the corresponding estimates for the large dis-
128                             PSYCHOMETRIKA
Achievement of Monotonicity
     The first of the two fundamental conditions to be imposed on the solution,
that of monotonicity, states that the rank order of the N(N - 1)/2 distances
between the N points should be the inverse of the rank order of the
N(N - 1)/2 proximity measures (at least to a sufficient degree of
approximation). Thus the two points corresponding to the pair of entities with the
largest proximity measure should end up with the smallest (or very nearly
the smallest) distance of separation in Euclidean space. The device used to
secure this condition of monotonicity is basically quite simple. First, any
prescribed rank order of the distances whatever (including ties) can be realized
so long as we are free to arrange the N points in a space of at least N - 1
dimensions. Suppose, then, that we have some particular arrangement of
the points in an (N - 1)-dimensional space. If this particular configuration
does not meet the requirement of monotonicity, then we should be able to
find a better configuration simply by moving the points in such a way as
to stretch those distances that are too small and compress those distances
that are too large.
     Stated more completely, the proposed procedure is as follows. First,
from the N(N - 1) coordinates specifying the initial configuration of points,
the N(N - 1)/2 Euclidean distances between these points are computed.
The computed distances are then rank ordered from smallest to largest, and
the resulting ranking is compared with the ranking already obtained for
the proximity measures. From each point N - 1 vectors are then constructed
in such a way that each vector falls along the line connecting that point with
one of the N - 1 other points. The vector is directed towards or away from
the other point depending upon whether the rank of the distance between
those two points is too large or too small (as determined from the rank of
their proximity measure). The length of the vector is determined by the
size of this discrepancy in rank. Thus the N - 1 vectors issuing from a point
can be interpreted as N - 1 forces that are tending to pull that point towards
those other points that are too distant and away from those other points
that are too close.
      Once the set of N - 1 vectors has been determined for all N points,
these points are simultaneously displaced in their (N - 1)-dimensional space.
The vector governing the displacement of each point is computed as the
vector sum (or resultant) of the N - 1 individual vectors attached to that
point. Since the displacement of any one point is a compromise between
N - 1 separate tendencies, there is no guarantee that the simultaneous
displacement of all N points will immediately instate the desired condition
of inverse ranking. But there is no need to insist that the final solution be
achieved in a single stroke. Rather, the procedure can be iterated, each time
on the basis of the successively smaller residue of departures from monoton-
icity remaining from all preceding iterations.
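
One iteration of the procedure just described might be sketched as follows. This is a minimal illustration rather than the program itself: the names are hypothetical, and the `step` parameter, together with the choice to make vector length directly proportional to the rank discrepancy, is an assumption, since the text says only that the length is determined by the size of the discrepancy.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every pair (i, j) with i < j."""
    return {(i, j): math.dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def ranks(values, reverse=False):
    """Rank 1..M of each pair by its value (ties broken arbitrarily)."""
    ordered = sorted(values, key=values.get, reverse=reverse)
    return {pair: r for r, pair in enumerate(ordered, start=1)}


def monotonicity_displacements(points, proximity, step=0.01):
    """Resultant displacement vector for each point.

    `step` (displacement per unit of rank discrepancy) is an assumed
    parameter, not a value taken from the text.
    """
    dist = pairwise_distances(points)
    dist_rank = ranks(dist)                       # smallest distance -> rank 1
    target_rank = ranks(proximity, reverse=True)  # largest proximity -> rank 1
    dim = len(points[0])
    moves = [[0.0] * dim for _ in points]
    for (i, j), d in dist.items():
        # Positive discrepancy: distance ranked too large, so the points attract.
        discrepancy = dist_rank[(i, j)] - target_rank[(i, j)]
        if d == 0.0 or discrepancy == 0:
            continue
        for a in range(dim):
            unit = (points[j][a] - points[i][a]) / d
            moves[i][a] += step * discrepancy * unit
            moves[j][a] -= step * discrepancy * unit
    return moves


def iterate_once(points, proximity, step=0.01):
    """Displace all points simultaneously by their resultant vectors."""
    moves = monotonicity_displacements(points, proximity, step)
    return [[x + m for x, m in zip(p, mv)] for p, mv in zip(points, moves)]


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
proximity = {(0, 1): 0.2, (0, 2): 0.9, (1, 2): 0.5}
print(monotonicity_displacements(points, proximity))
```

In this small example points 0 and 1 are the closest pair but have the smallest proximity, so they repel each other, while the other two pairs attract. A configuration that already satisfies the inverse ranking receives no displacement at all.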
      In order to get the iterative process under way, some starting coordinates
must be supplied. Fortunately coordinates can always be found in N - 1
dimensions such that all N ( N - 1)/2 interpoint distances are equal. This
insures that no bias will be introduced into the final solution. The points
having this property coincide with the vertices of a regular simplex. (This
configuration is simply the generalization, to higher spaces, of the equilateral
triangle in two dimensions and the regular tetrahedron in three dimensions.)
As the iterative process proceeds, then, each point moves away from its
initial position at one vertex of the regular simplex. The trajectory of the
point is controlled by vector forces that are constantly changing so as to
 satisfy the requirement of monotonicity. The whole configuration comes to
rest only after the rank order of the distances is the inverse of the rank order
 of the initially given proximity measures.
Achievement of Minimum Dimensionality
     Actually the process, as just described, would come to rest almost im-
mediately. This is because a trivial solution can always be found in N - 1
dimensions by making arbitrarily small adjustments in the regular simplex.
An intuitive feeling for this fact can be gained by considering the special case
of four points in ordinary three-space. Initially these four points will coincide
with the vertices of a regular tetrahedron (a figure that cannot be isometrically
embedded in a space of fewer than three dimensions). Clearly, any one of
the six edges of this tetrahedron could now be shortened by some small
amount Δs without affecting the lengths of the other five edges. (It is
sufficient to subject two of the four triangular faces to a very small rigid rotation,
 towards each other, about the edge opposite the one to be shortened.) More-
 over, any one of the other five edges could then be shortened by some amount
 smaller than Δs in exactly the same way without altering any of the other
 edges.
      This process could be continued so that each edge (taken in any sequence)
 is shortened by a successively smaller amount. The final result would be a
tetrahedral configuration in which the distances between the four vertices
conform to any predesignated ordering. The extension of this argument to
the general case of N points in N - 1 dimensions need not be elaborated here;
for the fact that all orderings of distances are possible in N - 1 dimensions
follows from results already presented by Bennett and Hays ([4], pp. 37-38).
Taken by itself, then, the method outlined above for securing monotonicity
is of little utility. It leads to no real reduction of the original data. Further-
more, it does not yield a unique solution, since the amount each successive
edge is to be shortened is not completely specified--it can, in fact, be chosen
in infinitely many ways.
      A nontrivial and unique solution can only be secured by supplementing
the monotonicity requirement with an additional requirement; namely, the
final configuration should be of the smallest possible dimensionality. As a
helpful heuristic, consider a mapping of a linear segment of length s onto a
semicircular segment spanning θ radians of a circle. The mapping can be
chosen so that the distance d between any two points of the segment is
transformed into the new distance f(d) given by
(1)        f(d) = r \sqrt{2[1 - \cos(\theta d / s)]}        (0 < \theta \le \pi).
Here r denotes the radius of curvature of the semicircular segment. This
radius can always be chosen so that the mean interpoint distance remains
invariant during the mapping. For example, if the N points are evenly dis-
tributed along the segment, invariance is approached (for large N) by choosing
                         r = \frac{s\theta}{24[1 - (2/\theta)\sin(\theta/2)]},
in which case the mean distance remains fixed in the vicinity of s/3.
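
The mapping of equation (1) and the accompanying choice of radius can be checked numerically. The sketch below assumes N = 200 evenly spaced points, s = 1, and θ = π; these values and the function names are illustrative choices, not values from the text.

```python
import math
from itertools import combinations


def curved_distance(d, s, theta, r):
    """Equation (1): chord between two arc points whose segment distance is d."""
    return r * math.sqrt(2.0 * (1.0 - math.cos(theta * d / s)))


def invariant_radius(s, theta):
    """Radius that keeps the mean interpoint distance near s/3 for large N."""
    return s * theta / (24.0 * (1.0 - (2.0 / theta) * math.sin(theta / 2.0)))


def mean_distances(n, s, theta):
    """Mean pairwise distance before and after mapping n evenly spaced points."""
    r = invariant_radius(s, theta)
    positions = [s * k / (n - 1) for k in range(n)]
    straight = [abs(p - q) for p, q in combinations(positions, 2)]
    curved = [curved_distance(d, s, theta, r) for d in straight]
    return sum(straight) / len(straight), sum(curved) / len(curved)


straight_mean, curved_mean = mean_distances(n=200, s=1.0, theta=math.pi)
print(straight_mean, curved_mean)  # both close to 1/3
```

The expression under the square root equals 4 sin²(θd/2s), so f(d) is simply the chord 2r sin(θd/2s) subtended by the two mapped points.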
     In any case, since 0 < θ ≤ π, the function f is monotonic; hence the rank
order of the interpoint distances is necessarily preserved. Clearly, then, a
choice cannot be made between the straight and the curved segments on the
basis of the rank order of the distances alone. Note, however, that whereas
the reconstruction of the distances from Cartesian coordinates can be effected
with only one coordinate axis in the case of the linear segment, two such
axes are required for the semicircular segment. Note, also, that whereas
there is only one linear configuration, there is a different semicircular
configuration for each choice of the parameter θ governing the total curvature
                            Method of Computation
     A program for carrying out the operations outlined only roughly in the
preceding sections has now been compiled and tested on an IBM 7090 com-
puter. The remainder of this paper is devoted to the presentation of a more
complete and precise statement of the mathematical details of this program.*
A second paper (Part II, to follow in a subsequent issue of this journal)
will present the results of a number of tests that were performed to show that
this program does in fact yield solutions of the kind claimed for it.
measure is unity and the smallest measure is zero. The standardized proximity
measure obtained in this way for the points i and j will be denoted by s_ij.
Second, these standardized measures are then rank ordered from smallest
to largest. Tied measures are, for convenience, put into an arbitrary order.
(Owing to a refinement in the use of the ranking, this treatment of ties has
no influence upon the outcome.)
        The next step is to generate a set of coordinates for each of the N vertices
of a regular simplex in N - 1 dimensions. These are the starting coordinates
that will be modified during the first iteration. The orientation of the simplex
is arbitrary but, again for convenience, its position and size are so chosen
that the centroid is at the origin and all edges are of unit length. Specifically,
if x_ia denotes the coordinate for vertex i on axis a, then
(2a)        x_{i(2q-1)} = \frac{\cos[2q(i - 1)\pi / N]}{\sqrt{N}},
That the N points with these coordinates are in fact all separated by unit
distances can easily be shown (see Coxeter [5], p. 245).
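
The unit-distance property can also be checked numerically. The sketch below handles odd N only. The cosine coordinates follow equation (2a); the matching sine coordinate on each even-numbered axis is an assumption made here to complete each pair of axes, since only (2a) appears in this text.

```python
import math
from itertools import combinations


def regular_simplex(n):
    """Vertices of a unit-edge regular simplex for odd n, in n - 1 dimensions.

    Axis 2q - 1 follows equation (2a). The sine coordinate on axis 2q is an
    assumption introduced for this sketch.
    """
    if n % 2 == 0:
        raise ValueError("this sketch handles odd n only")
    vertices = []
    for i in range(1, n + 1):
        coords = []
        for q in range(1, (n - 1) // 2 + 1):
            angle = 2.0 * q * (i - 1) * math.pi / n
            coords.append(math.cos(angle) / math.sqrt(n))  # axis 2q - 1
            coords.append(math.sin(angle) / math.sqrt(n))  # axis 2q (assumed)
        vertices.append(coords)
    return vertices


simplex = regular_simplex(5)
edges = [math.dist(p, q) for p, q in combinations(simplex, 2)]
print(min(edges), max(edges))
```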
Comparison of Rank Orders
     At the beginning of each iteration, the Euclidean distances between
the N points are calculated from their (N - 1)-dimensional coordinates for
that iteration by means of the generalized Pythagorean formula
            d_{ij} = \left[ \sum_{a=1}^{N-1} (x_{ia} - x_{ja})^2 \right]^{1/2}.
These distances are then ranked from smallest to largest for purposes of
comparison with the ranking already determined for the proximity measures.
Tied distances are always put into an order that is exactly the opposite of
the order dictated by the ranking of the proximity measures. This is done
to insure that distances that should not be tied will be pulled apart during
the iteration.
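
The treatment of tied distances can be illustrated with a short sketch. The function names are hypothetical, and the use of a secondary sort key is one possible way of realizing the rule, not necessarily how the program does it.

```python
def rank_distances(dist, proximity):
    """Rank distances from smallest (rank 1) to largest.

    Tied distances are ordered opposite to what the proximities dictate:
    among equal distances, the pair with the smaller proximity is ranked
    first, as if it were the closer pair.
    """
    ordered = sorted(dist, key=lambda pair: (dist[pair], proximity[pair]))
    return {pair: r for r, pair in enumerate(ordered, start=1)}


def target_ranks(proximity):
    """Rank each distance should have: largest proximity -> rank 1."""
    ordered = sorted(proximity, key=proximity.get, reverse=True)
    return {pair: r for r, pair in enumerate(ordered, start=1)}


# Starting simplex: every distance is tied at unity.
dist = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
proximity = {(0, 1): 0.9, (0, 2): 0.5, (1, 2): 0.1}
observed = rank_distances(dist, proximity)
wanted = target_ranks(proximity)
discrepancy = {pair: observed[pair] - wanted[pair] for pair in dist}
print(discrepancy)
```

With all distances tied, the pair with the largest proximity receives the largest positive discrepancy (its distance is treated as too large), and the pair with the smallest proximity receives the largest negative one.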
     A certain refinement is introduced into the comparison of the two
rankings for the following reason. Although any ranking of the distances can be
exactly realized in N - 1 dimensions, some small violations of monotonicity
must be anticipated whenever the configuration is forced into a space of
smaller dimensionality, particularly since the initial proximity measures
where the summation is carried out over all N - 1 other points j (≠ i). Thus,
during one iteration the coordinate for point i on axis a is changed from
x_ia, at the beginning of the iteration, into the new value x'_ia given by
(7)        x'_{ia} = x_{ia} + \Delta x_{ia}.
Finally, at the end of each iteration the adjusted coordinates x'_ia undergo a
"similarity" transformation to re-center the centroid at the origin and to
re-scale all distances so that the mean interpoint separation is maintained at
unity. The adjusted and transformed coordinates are treated in exactly the
same way during the following iteration. The iterative process can be
terminated when the adjustments Δx_ia have become too small to have any
appreciable effect on the configuration.
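
The similarity transformation applied at the end of each iteration might be sketched as follows. The function names are hypothetical, and `largest_adjustment` is only one possible way of deciding that the adjustments have become negligible; the text does not specify a threshold.

```python
import math
from itertools import combinations


def similarity_transform(points):
    """Re-center the centroid at the origin and rescale to unit mean distance."""
    n, dim = len(points), len(points[0])
    centroid = [sum(p[a] for p in points) / n for a in range(dim)]
    centered = [[p[a] - centroid[a] for a in range(dim)] for p in points]
    pairs = list(combinations(centered, 2))
    mean_dist = sum(math.dist(p, q) for p, q in pairs) / len(pairs)
    if mean_dist == 0.0:
        raise ValueError("all points coincide; cannot rescale")
    return [[x / mean_dist for x in p] for p in centered]


def largest_adjustment(deltas):
    """Largest absolute coordinate change, usable as a stopping measure."""
    return max(abs(x) for row in deltas for x in row)


adjusted = [[1.0, 1.0], [4.0, 1.0], [1.0, 5.0]]
print(similarity_transform(adjusted))
```

Because the transformation only translates and uniformly rescales the configuration, it leaves the rank order of the interpoint distances unchanged.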
Rotation to Principal Axes
     Following the last iteration, the coordinate system should be rotated
to principal axes. To this end, the matrix of scalar products of vectors from
the centroid to all pairs of points is constructed. The scalar product for the
points i and j is given by
(8)        b_{ij} = \sum_{a=1}^{N-1} x_{ia} x_{ja}.
Next, the characteristic roots and vectors of this matrix (which is real and
symmetric) are computed by the Jacobi method of diagonalization (e.g., [9],
pp. 180-185). These roots, with their associated vectors, are then ordered
so that the roots form a decreasing sequence. The ath such root λ_a specifies
the variance accounted for by the ath most important of the rotated axes.
If v_ia denotes the ith component of the characteristic vector associated with λ_a,
then the actual coordinates on the ath rotated axis are given by
(9)        x^{*}_{ia} = \lambda_a^{1/2} v_{ia}.
Hopefully the first few axes will account for almost all of the variance. The
coordinates on the remaining axes can then be eliminated to achieve the
reduced representation of the spatial configuration.
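
The rotation to principal axes might be sketched as follows, using a small cyclic Jacobi routine written for this illustration. The sweep limit, the tolerance, and the function names are assumptions. So is the scaling of each unit-length characteristic vector by the square root of its root, which is the usual convention in scaling from scalar products.

```python
import math


def scalar_products(points):
    """Matrix b[i][j] of scalar products between centroid-origin vectors."""
    return [[sum(x * y for x, y in zip(p, q)) for q in points] for p in points]


def jacobi_eigen(matrix, max_sweeps=100, tol=1e-24):
    """Roots and vectors of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [list(row) for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes a[p][q]; kept within 45 degrees.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4.0
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):      # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):      # accumulate the vectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    roots = [a[i][i] for i in range(n)]
    return roots, v  # column j of v belongs to roots[j]


def principal_axes(points):
    """Roots in decreasing order and coordinates on the rotated axes."""
    roots, vectors = jacobi_eigen(scalar_products(points))
    order = sorted(range(len(roots)), key=lambda j: roots[j], reverse=True)
    coords = [[math.sqrt(max(roots[j], 0.0)) * vectors[i][j] for j in order]
              for i in range(len(points))]
    return [roots[j] for j in order], coords


# Five centered points lying on a single line in three-space.
line = [[t / 3.0, 2.0 * t / 3.0, 2.0 * t / 3.0] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
roots, coords = principal_axes(line)
print([round(r, 6) for r in roots])
```

For points that lie on a single line through the centroid, only the first root differs appreciably from zero, so the first rotated axis alone reproduces the configuration up to a reflection.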
     After rotation to principal axes it is sometimes desirable to carry out
further iterations in the collapsed space. The reason for this is as follows.
First, as long as β > 0 (as it must be to achieve minimum dimensionality)
some slight distortion is necessarily introduced into the stabilized
configuration. This distortion can be reduced to any desired level by choosing β
sufficiently small. But this, in turn, leads to a slower convergence. In order
to conserve computing time, therefore, it is often expedient to select a larger
β and even to rotate to principal axes before the configuration has come
completely to rest.
     The results of this premature rotation cannot of course be taken as a
final solution. These results can be used to make an inference as to the number
of dimensions required and to provide a set of approximate coordinates in
the space of this number of dimensions. This approximate solution can then
be brought into its exact form by setting β = 0 and iterating to a final
configuration in the collapsed space. Since there will be no stretching or shrinking
with β = 0, the only criterion governing the final configuration is that of
monotonicity. If a sufficient degree of monotonicity is not achieved, this
indicates that the number of required dimensions has been underestimated.
In this event one can easily return to the coordinates obtained from the
"premature" rotation, add the next axis, and then iterate to a new solution
in the resulting augmented space.
                    \frac{2 \sum [s_{ij} - s(d_{ij})]^2}{N(N - 1)}
In the following section and in the sequel (Part II) this quantity will be shown to provide
                                           FIGURE 1
Discrepancy between the reconstructed configuration and the true configuration as a
  function of number of iterations for each of 21 different combinations of values for α and β.
of β, the curves pass through a minimum and then increase again. Evidently
the most precise convergence is achieved by choosing α as small as .20 and
by choosing β considerably smaller than α. Thus, if computing time is of
little concern, a solution of high accuracy could probably be achieved with
α = .20 and β = .01.
      Unfortunately, when N is at all large convergence can be quite slow with
these values for α and β; hence, computing time does become a matter for
concern. The curves in Fig. 1 suggest a more expedient strategy in such cases.
Note, for example, the curve for α = .4 and β = .2. Although this curve
rises after the second iteration, it does indicate a rather close approach to
the true configuration after only two iterations. Other neighboring curves
exhibit similar behavior. If there were some way to establish just when the
discrepancy is minimum, the coordinate system could be rotated to principal
axes at that point. The final solution could then be secured rather quickly
                                          REFERENCES
 [1] Abelson, R. P. and Sermat, V. Multidimensional scaling of facial expressions. J. exp.
     Psychol., in press.
 [2] Attneave, F. Dimensions of similarity. Amer. J. Psychol., 1950, 63, 516-556.
 [3] Bales, R. F., Strodtbeck, F. L., Mills, T. M., and Roseborough, M. E. Channels of
     communication in small groups. Amer. sociol. Rev., 1951, 16, 461-468.
 [4] Bennett, J. F. and Hays, W. L. Multidimensional unfolding: Determining the dimen-
     sionality of ranked preference data. Psychometrika, 1960, 25, 27-43.
 [5] Coxeter, H. S. M. Regular polytopes. London: Methuen, 1948.
 [6] Ekman, G. Dimensions of color vision. J. Psychol., 1954, 38, 467-474.
 [7] Green, B. F. and Anderson, L. K. The tactual identification of shapes for coding
     switch handles. J. appl. Psychol., 1955, 39, 219-226.
 [8] Hammersley, J. M. The distribution of distances in a hypersphere. Ann. math. Statist.,
     1950, 21, 447-452.
 [9] Harman, H. H. Modern factor analysis. Chicago: Univ. Chicago Press, 1960.
[10] Keller, F. S. and Taubman, R. E. Studies in international morse code. II. Errors
     made in code reception. J. appl. Psychol., 1943, 27, 504-509.
[11] Lord, R. D. The distribution of distance in a hypersphere. Ann. math. Statist., 1954,
     25, 794-798.
[12] Plotkin, L. Stimulus generalization in morse code learning. Arch. Psychol., 1943, 40,
     No. 287.
[13] Rothkopf, E. Z. Signal similarity and reception errors in early morse code training.
     In G. Finch and F. Cameron (Eds.), Symposium on Air Force human engineering,
     personnel, and training research. Washington: National Academy of Sciences--National
     Research Council Publication 455, 1956. Pp. 229-235.
[14] Rothkopf, E. Z. A measure of stimulus similarity and errors in some paired-associate
     learning tasks. J. exp. Psychol., 1957, 53, 94-101.
[15] Seashore, H. and Kurtz, A. K. Analysis of errors in copying code. Washington, D. C.:
     Office of Scientific Research and Development, Rep. No. 4010, 1944.
[16] Shepard, R. N. Multidimensional scaling of concepts based upon sequences of re-
     stricted associative responses. Amer. Psychologist, 1957, 12, 440-441. (Abstract)
[17] Shepard, R. N. Stimulus and response generalization: A stochastic model relating
     generalization to distance in psychological space. Psychometrika, 1957, 22, 325-345.
[18] Shepard, R. N. Stimulus and response generalization: Tests of a model relating
     generalization to distance in psychological space. J. exp. Psychol., 1958, 55, 509-523.
[19] Shepard, R. N. Stimulus and response generalization: Deduction of the generalization
     gradient from a trace model. Psychol. Rev., 1958, 65, 242-256.
[20] Shepard, R. N. Similarity of stimuli and metric properties of behavioral data. In
     H. Gulliksen and S. Messick (Eds.), Psychological scaling: theory and method. New
     York: Wiley, 1960. Pp. 33-43.
[21] Torgerson, W. S. Theory and methods of scaling. New York: Wiley, 1958.
Manuscript received 10/~/61
Revised manuscript received 12/7/61