Downloaded from http://cshprotocols.cshlp.org/ at GEORGE MASON UNIVERSITY on June 23, 2014. Published by Cold Spring Harbor Laboratory Press.
Comparison of the PAM and BLOSUM Amino Acid Substitution Matrices
David W. Mount
Cold Spring Harb Protoc; doi: 10.1101/pdb.ip59
Adapted from “Alignment of Pairs of Sequences,” Chapter 3, in Bioinformatics: Sequence and Genome
Analysis, 2nd edition, by David W. Mount. Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
NY, USA, 2004.
INTRODUCTION
The choice of scoring system, including the scores for matches, mismatches, substitutions, insertions, and deletions, influences the alignment of both DNA and protein sequences. To score matches and mismatches in alignments of proteins, it is necessary to know how often one amino acid is substituted for another in related proteins. Percent accepted mutation (PAM) matrices list the likelihood of change from one amino acid to another in homologous protein sequences during evolution and are thus focused on tracking the evolutionary origins of proteins. In contrast, the blocks amino acid substitution matrices (BLOSUM) are based on scoring substitutions found over a range of evolutionary periods. There are important differences in the ways that the PAM and BLOSUM scoring matrices were derived. These differences, which are discussed in this article, should be kept in mind when interpreting the results of protein sequence alignments obtained with these matrices.
RELATED INFORMATION
Additional information on using PAM and BLOSUM matrices is provided in “Using PAM Matrices in Sequence Alignments” (Mount 2008a) and “Using BLOSUM in Sequence Alignments” (Mount 2008b), respectively. PAM matrices are based on a Markov model of protein evolution; this model is tested in “A Test of the Markov Model of Evolution in Proteins” (Mount 2008c). The appropriate choice of gap penalties for use with the various matrices is discussed in “Using Gaps and Gap Penalties to Optimize Pairwise Sequence Alignments” (Mount 2008d). BLOSUM and other scoring matrices are compared in combination with various alignment algorithms and gap penalties in “Studies of Varying Alignment Algorithm, Amino Acid Scoring Matrix, and Gap Penalties” (Mount 2008e).
COMPARISON OF THE PAM AND BLOSUM MATRICES
The PAM matrices are based on a mutational model of evolution that assumes amino acid changes occur as a Markov process, with each amino acid change at a site being independent of previous changes at that site. Changes are counted in sequences that are at least 85% identical, after a phylogenetic history of the changes in each family has been predicted. Thus, the PAM matrices are based on a prediction of the first changes that occur as proteins diverge from a common ancestor during the evolution of a protein family. Matrices for comparing more distantly related proteins are then derived by extrapolation from these short-term changes, on the assumption that the more distant changes reflect the short-term changes occurring over and over again. For each longer evolutionary interval, each amino acid can change to any other with the same frequency as observed in the short term.
Please cite as: CSH Protocols; 2008; doi:10.1101/pdb.ip59 www.cshprotocols.org
© 2008 Cold Spring Harbor Laboratory Press. Vol. 3, Issue 6, June 2008
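The extrapolation step can be illustrated with a small sketch. Under the Markov assumption, if PAM1 is the mutation probability matrix for one accepted point mutation per 100 residues (row i gives the probability that residue i is replaced by residue j), then the matrix for n PAMs is PAM1 raised to the nth power. A log-odds score then compares each extrapolated probability with the background frequency of the replacing residue. The three-letter alphabet, the probabilities, the uniform background frequencies, and the 10·log10 scale factor below are all hypothetical choices for illustration, not Dayhoff's data.

```python
import math

# Hypothetical 1-PAM mutation probability matrix for a toy 3-letter alphabet.
# Row i, column j: probability that residue i is replaced by residue j over an
# interval of 1 accepted point mutation per 100 residues. Each row sums to 1.
ALPHABET = ["A", "B", "C"]
PAM1 = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.004, 0.004, 0.992],
]
# Hypothetical background frequencies: uniform, which is the stationary
# distribution of the symmetric (doubly stochastic) PAM1 matrix above.
FREQ = [1 / 3, 1 / 3, 1 / 3]


def mat_mult(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def pam_n(pam1, n):
    """Extrapolate to PAM n by raising the 1-PAM matrix to the nth power."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]  # identity = PAM0
    for _ in range(n):
        result = mat_mult(result, pam1)  # one more PAM unit of the same short-term process
    return result


def log_odds(mn, freq, scale=10.0):
    """Integer scores scale * log10(M_n[i][j] / f_j); the scale factor is an assumption."""
    size = len(mn)
    return [[round(scale * math.log10(mn[i][j] / freq[j])) for j in range(size)]
            for i in range(size)]


if __name__ == "__main__":
    for n in (1, 250):
        mn = pam_n(PAM1, n)
        print(f"PAM{n} row {ALPHABET[0]}:", [round(p, 3) for p in mn[0]])
        print(f"PAM{n} log-odds:", log_odds(mn, FREQ))
```

As n grows, every row drifts toward the background frequencies, so the log-odds scores shrink toward zero; this is why high-numbered PAM matrices discriminate less sharply between substitutions.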
In contrast, the BLOSUM matrices are not based on an explicit evolutionary model. They are derived by considering all amino acid changes observed in an aligned region from a family of related proteins, regardless of the overall degree of similarity between the protein sequences. These proteins are nevertheless known to be biochemically related and hence should share a common ancestry. The evolutionary model implied by such a scheme is that the proteins in each family share a common origin, but closer versus more distant relationships are ignored, as if all of the sequences were derived equally from the same ancestor. This is called a starburst model of protein evolution.
The PAM matrices are based on scoring all amino acid positions in related sequences, whereas the BLOSUM matrices are based on substitutions and conserved positions in blocks, which represent the most similar regions shared by related sequences. The PAM model is thus designed to track the evolutionary origins of proteins, whereas the BLOSUM model is designed to find their conserved domains. The choice of matrix therefore depends on the goals of the investigator: tracing evolutionary origins favors PAM, whereas finding conserved domains favors BLOSUM.
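The block-counting idea behind BLOSUM can be sketched as follows. Every pair of residues in each column of an ungapped block is tallied, with no regard to which sequences are more closely related, which is the counting that a starburst model implies. The observed pair frequency q_ij is compared with the frequency e_ij expected by chance from the residue frequencies (p_i² for identical residues, 2·p_i·p_j otherwise), and the score is reported in half-bit units, round(2·log2(q_ij/e_ij)), the scale used for BLOSUM62. The toy block is hypothetical, and the sketch ignores the sequence clustering and weighting used when the published BLOSUM matrices were built.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_style_scores(block):
    """Half-bit log-odds scores from all residue pairs in the columns of an ungapped block.

    block: equal-length aligned sequences without gaps (hypothetical input).
    Each pair of sequences contributes one residue pair per column; no clustering
    or weighting of closely related sequences is applied.
    """
    pair_counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1  # (A, S) and (S, A) are the same pair
    total_pairs = sum(pair_counts.values())
    q = {pair: n / total_pairs for pair, n in pair_counts.items()}

    # Background frequency of residue i: q_ii plus half of every q_ij with j != i.
    p = Counter()
    for (a, b), freq in q.items():
        p[a] += freq / 2
        p[b] += freq / 2

    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))  # half-bit units
    return scores


if __name__ == "__main__":
    toy_block = ["ACDE", "ACDE", "ACNE", "SCDE"]  # hypothetical ungapped block
    for pair, score in sorted(blosum_style_scores(toy_block).items()):
        print(pair, score)
```

With so small a block, every observed pair scores positively and unobserved pairs receive no score at all; real matrices are compiled from many blocks so that every pair of amino acids is represented.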
Other Amino Acid Scoring Matrices
Besides the BLOSUM matrices and the Dayhoff PAM matrices, together with the related matrices of Gonnet et al. (1992), Benner et al. (1994), and Jones et al. (1992), a number of other amino acid substitution matrices have been used to produce protein sequence alignments; several representative ones are listed in Table 1. For a more complete list and comparison, see Vogt et al. (1995). These matrices range from comparisons of simple chemical properties of amino acids to complex analyses of the substitutions found in secondary structural domains of proteins. Because most of these matrices are designed to align proteins on the basis of some such feature of the amino acids, rather than on an evolutionary model, they are not particularly suitable for evolutionary analysis. They may, however, be useful for discovering structural and functional relationships, or family relationships, among proteins.
A sequence alignment program that uses a combination of these matrices has been found to be particularly useful for detecting distant protein relationships (Argos 1987; Rechid et al. 1989). There have been extensive comparisons (Henikoff and Henikoff 1993; Pearson 1995, 1996, 1998) of the usefulness of various amino acid substitution matrices for aligning sequences, for finding similar sequences in a protein sequence database, and for aligning similar sequences on the basis of structure. The usefulness of these scoring matrices also depends on an appropriate choice of gap penalties (see “Using Gaps and Gap Penalties to Optimize Pairwise Sequence Alignments” [Mount 2008d]) and on a proper statistical evaluation of local alignment scores.
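To show how a substitution score and a gap penalty act together, the following is a minimal Smith-Waterman local alignment score. The scoring function (simple identity, criterion 1 of Table 1, with an assumed +2 for identical and −1 for different residues) and the linear gap penalty of −2 per residue are hypothetical values for illustration; in practice a PAM or BLOSUM matrix would be passed in as the scoring function, and the gap penalties would be tuned to that matrix (commonly as affine rather than linear penalties).

```python
from typing import Callable


def identity_score(a: str, b: str) -> int:
    """Hypothetical simple-identity scoring: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


def local_alignment_score(s: str, t: str,
                          score: Callable[[str, str], int] = identity_score,
                          gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score using a linear gap penalty."""
    prev = [0] * (len(t) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            curr[j] = max(
                0,                                        # start a new local alignment here
                prev[j - 1] + score(s[i - 1], t[j - 1]),  # align s[i-1] with t[j-1]
                prev[j] + gap,                            # s[i-1] aligned to a gap
                curr[j - 1] + gap,                        # t[j-1] aligned to a gap
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("PPWHEATPP", "WHEAT"))
    print(local_alignment_score("ACDF", "ACF"))
```

Any substitution matrix can be dropped in by replacing `identity_score` with a lookup into that matrix; changing the matrix without revisiting `gap` can shift which local alignment scores highest.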
Table 1. Criteria used in amino acid scoring matrices for sequence alignments
1. Simple identity, which scores only identical amino acids as a match and all others as a mismatch.
2. Genetic code changes, which score the minimum number of nucleotide changes needed to convert a codon for one amino acid into a codon for another (Fitch 1966), also in a version with added information based on the structural similarity of amino acid side chains (Feng et al. 1985). A similar matrix, based on the assumption that the genetic code is the only factor influencing amino acid substitutions, has been produced by Benner et al. (1994).
3. Matrices based on the chemical similarity, molecular volume, polarity, and hydrophobicity of amino acid side chains (see Vogt et al. 1995).
4. Amino acid substitutions in structurally aligned three-dimensional structures (Risler et al. 1988; matrix JO93, Johnson and Overington 1993). A similar matrix was described by Henikoff and Henikoff (1993). Sander and Schneider (1991) prepared a similar matrix based on these same substitutions, augmented by substitutions found in proteins that are so similar to the structure-solved group that they undoubtedly have the same three-dimensional structure.
5. Gonnet et al. (1994) prepared a 400 × 400 dipeptide substitution matrix for aligning proteins, based on the possibility that amino acid substitutions at a particular site are influenced by neighboring amino acids, and thus that the environment of an amino acid plays a role in protein evolution.
6. Jones et al. (1994) prepared a scoring matrix specifically for transmembrane proteins. This matrix was prepared using an analysis similar to that used for the original Dayhoff PAM matrices and therefore provides an estimate of evolutionary distances among members of this class of proteins.
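Criterion 2 of Table 1 is straightforward to compute. The sketch below builds the standard genetic code and, for any two amino acids, finds the minimum number of base differences between any codon of the first and any codon of the second. It reproduces only this minimum-change count; how such counts were converted into alignment scores by Fitch (1966), and the side-chain information added by Feng et al. (1985), are not modeled. The function names are hypothetical, and stop codons (`*`) appear in the table but are not meant to be queried.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
# (first base varies slowest, third base fastest). '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(aa):
    """All codons that encode the one-letter amino acid code aa."""
    return [codon for codon, a in CODE.items() if a == aa]


def min_nucleotide_changes(aa1, aa2):
    """Minimum number of base differences between any codon of aa1 and any codon of aa2.

    Raises ValueError for a letter with no codon in CODE (an unknown amino acid).
    """
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )


if __name__ == "__main__":
    standard = "ARNDCQEGHILKMFPSTWYV"
    for a in standard:
        print(a, [min_nucleotide_changes(a, b) for b in standard])
```

Every entry lies between 0 and 3, so a matrix built this way captures only how easily one residue can mutate into another, not how acceptable the substitution is to the protein.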
REFERENCES
Argos, P. 1987. A sensitive procedure to compare amino acid sequences. J. Mol. Biol. 193: 385–396.
Benner, S.A., Cohen, M.A., and Gonnet, G.H. 1994. Amino acid substitution during functionally constrained divergent evolution of protein sequences. Protein Eng. 7: 1323–1332.
Feng, D.F., Johnson, M.S., and Doolittle, R.F. 1985. Aligning amino acid sequences: Comparison of commonly used methods. J. Mol. Evol. 21: 112–125.
Fitch, W.M. 1966. An improved method of testing for evolutionary homology. J. Mol. Biol. 16: 9–16.
Gonnet, G.H., Cohen, M.A., and Benner, S.A. 1992. Exhaustive matching of the entire protein sequence database. Science 256: 1443–1445.
Gonnet, G.H., Cohen, M.A., and Benner, S.A. 1994. Analysis of amino acid substitution during divergent evolution: The 400 by 400 dipeptide substitution matrix. Biochem. Biophys. Res. Commun. 199: 489–496.
Henikoff, S. and Henikoff, J.G. 1993. Performance evaluation of amino acid substitution matrices. Proteins Struct. Funct. Genet. 17: 49–61.
Johnson, M.S. and Overington, J.P. 1993. A structural basis for sequence comparisons: An evaluation of scoring methodologies. J. Mol. Biol. 233: 716–738.
Jones, D.T., Taylor, W.R., and Thornton, J.M. 1994. A mutation data matrix for transmembrane proteins. FEBS Lett. 339: 269–275.
Mount, D.W. 2008a. Using PAM matrices in sequence alignments. CSH Protocols (this issue) doi: 10.1101/pdb.top38.
Mount, D.W. 2008b. Using BLOSUM in sequence alignments. CSH Protocols (this issue) doi: 10.1101/pdb.top39.
Mount, D.W. 2008c. A test of the Markov model of evolution in proteins. CSH Protocols (this issue) doi: 10.1101/pdb.ip58.
Mount, D.W. 2008d. Using gaps and gap penalties to optimize pairwise sequence alignments. CSH Protocols (this issue) doi: 10.1101/pdb.top40.
Mount, D.W. 2008e. Studies of varying alignment algorithm, amino acid scoring matrix and gap penalties. CSH Protocols (this issue) doi: 10.1101/pdb.ip60.
Pearson, W.R. 1995. Comparison of methods for searching protein sequence databases. Protein Sci. 4: 1150–1160.
Pearson, W.R. 1996. Effective protein sequence comparison. Methods Enzymol. 266: 227–258.
Pearson, W.R. 1998. Empirical statistical estimates for sequence similarity searches. J. Mol. Biol. 276: 71–84.
Rechid, R., Vingron, M., and Argos, P. 1989. A new interactive protein sequence alignment program and comparison of its results with widely used programs. Comput. Appl. Biosci. 5: 107–113.
Risler, J.L., Delorme, M.O., Delacroix, H., and Henaut, A. 1988. Amino acid substitutions in structurally related proteins: A pattern recognition approach. J. Mol. Biol. 204: 1019–1029.
Sander, C. and Schneider, R. 1991. Database of homology derived protein structures and the structural meaning of sequence alignment. Proteins 9: 56–68.
Vogt, G., Etzold, T., and Argos, P. 1995. An assessment of amino acid exchange matrices: The twilight zone re-visited. J. Mol. Biol. 249: 816–831.