Introduction to GNU m4 Macro Processor

m4 is a macro processor that copies its input to the output while expanding macros. It has built-in functions for tasks like file inclusion, running shell commands, and text manipulation. m4 was developed in the 1970s at Bell Labs and is now widely used in software build systems.

1.1 Introduction to m4

m4 is a macro processor, in the sense that it copies its input to the output,
expanding macros as it goes. Macros are either builtin or user-defined, and can
take any number of arguments. Besides just doing macro expansion, m4 has builtin
functions for including named files, running shell commands, doing integer
arithmetic, manipulating text in various ways, performing recursion, etc. m4 can
be used either as a front-end to a compiler, or as a macro processor in its own
right.

The m4 macro processor is widely available on all UNIXes, and has been standardized
by POSIX. Usually, only a small percentage of users are aware of its existence.
However, those who find it often become committed users. The popularity of GNU
Autoconf, which requires GNU m4 for generating configure scripts, is an incentive
for many to install it, even though they will not program in m4 themselves. GNU
m4 is mostly compatible with the System V, Release 4 version, except for some minor
differences. See Compatibility, for more details.

Some people find m4 to be fairly addictive. They first use m4 for simple problems,
then take bigger and bigger challenges, learning how to write complex sets of m4
macros along the way. Once really addicted, users write sophisticated m4
applications even to solve simple problems, devoting more time to debugging their
m4 scripts than to doing real work. Beware that m4 may be dangerous for the health
of compulsive programmers.

1.2 Historical references

Macro languages were invented early in the history of computing. In the 1950s Alan
Perlis suggested that the macro language be independent of the language being
processed. Techniques such as conditional and recursive macros, and using macros to
define other macros, were described by Doug McIlroy of Bell Labs in “Macro
Instruction Extensions of Compiler Languages”, Communications of the ACM 3, 4
(1960), 214–20, [Link]

An important precursor of m4 was GPM; see C. Strachey, “A general purpose
macrogenerator”, Computer Journal 8, 3 (1965), 225–41,
[Link]. GPM is also succinctly described in David Gries’s book Compiler
Construction for Digital Computers, Wiley (1971). Strachey was a brilliant
programmer: GPM fit into 250 machine instructions!

Inspired by GPM while visiting Strachey’s Lab in 1968, McIlroy wrote a model
preprocessor that fit into a page of Snobol 3 code, and McIlroy and Robert
Morris developed a series of further models at Bell Labs. Andrew D. Hall followed
up with M6, a general purpose macro processor used to port the Fortran source code
of the Altran computer algebra system; see Hall’s “The M6 Macro Processor”,
Computing Science Technical Report #2, Bell Labs (1972),
[Link]. M6’s source code consisted of about 600
Fortran statements. Its name was the first of the m4 line.

Brian Kernighan and P.J. Plauger’s book Software Tools, Addison-Wesley (1976),
describes and implements a Unix macro-processor language, which inspired Dennis
Ritchie to write m3, a macro processor for the AP-3 minicomputer.

Kernighan and Ritchie then joined forces to develop the original m4, described in
“The M4 Macro Processor”, Bell Laboratories (1977),
[Link]. It had only 21 builtin
macros.

While GPM was purer, m4 is meant to deal with the true intricacies of real
life: macros can be recognized without being pre-announced, skipping whitespace or
end-of-lines is easier, more constructs are builtin instead of derived, etc.

Originally, the Kernighan and Plauger macro-processor, and then m3, formed the
engine for the Rational FORTRAN preprocessor, that is, the Ratfor equivalent of
cpp. Later, m4 was used as a front-end for Ratfor, C and Cobol.

René Seindal released his implementation of m4, GNU m4, in 1990, with the aim of
removing the artificial limitations in many of the traditional m4 implementations,
such as maximum line length, macro size, or number of macros.

The late Professor A. Dain Samples described and implemented a further evolution in
the form of M5: “User’s Guide to the M5 Macro Language: 2nd edition”, Electronic
Announcement on [Link] newsgroup (1992).

François Pinard took over maintenance of GNU m4 in 1992 and, in 1994, released
GNU m4 1.4, which remained the stable release for 10 years. It was at this time
that GNU Autoconf decided to require GNU m4 as its underlying engine, since all
other implementations of m4 had too many limitations.

More recently, in 2004, Paul Eggert released 1.4.1 and 1.4.2, which addressed some
long-standing bugs in the venerable 1.4 release. Then in 2005, Gary V. Vaughan
collected together the many patches to GNU m4 1.4 that were floating around the net
and released 1.4.3 and 1.4.4. And in 2006, Eric Blake joined the team and prepared
patches for the release of 1.4.5, with subsequent releases through intervening
years, as recent as 1.4.18 in 2016.

Meanwhile, development has continued on new features for m4, such as dynamic module
loading and additional builtins. When complete, GNU m4 2.0 will start a new series
of releases.

1.3 Problems and bugs

If you have problems with GNU M4 or think you’ve found a bug, please report it.
Before reporting a bug, make sure you’ve actually found a real bug. Carefully
reread the documentation and see if it really says you can do what you’re trying to
do. If it’s not clear whether you should be able to do something or not, report
that too; it’s a bug in the documentation!

Before reporting a bug or trying to fix it yourself, try to isolate it to the
smallest possible input file that reproduces the problem. Then send us the input
file and the exact results m4 gave you. Also say what you expected to occur; this
will help us decide whether the problem was really in the documentation.

Once you’ve got a precise problem, send e-mail to bug-m4@[Link]. Please include
the version number of m4 you are using. You can get this information with the
command m4 --version. Also provide details about the platform you are executing on.

Non-bug suggestions are always welcome as well. If you have questions about things
that are unclear in the documentation or are just obscure features, please report
them too.

1.4 Using this manual

This manual contains a number of examples of m4 input and output, and a simple
notation is used to distinguish input, output and error messages from m4. Examples
are set out from the normal text, and shown in a fixed-width font, like this:

This is an example of an example!

To distinguish input from output, all output from m4 is prefixed by the string ‘⇒’,
and all error messages by the string ‘error→’. When showing how command line
options affect matters, the command line is shown with a prompt ‘$ like this’;
otherwise, you can assume that a simple m4 invocation will work. Thus:

$ command line to invoke m4
Example of input line
⇒Output line from m4
error→and an error message

The sequence ‘^D’ in an example indicates the end of the input file. The sequence
‘NL’ refers to the newline character. The majority of these examples are
self-contained, and you can run them with similar results by invoking m4 -d. In
fact, the testsuite that is bundled in the GNU M4 package consists of the examples
in this document! Some of the examples assume that your current directory is located
where you unpacked the installation, so if you plan on following along, you may
find it helpful to do this now:

$ cd m4-1.4.19

As each of the predefined macros in m4 is described, a prototype call of the macro
will be shown, giving descriptive names to the arguments, e.g.,

Composite: example (string, [count = ‘1’], [argument]…)

This is a sample prototype. There is not really a macro named example, but this
documents that if there were, it would be a Composite macro, rather than a Builtin.
It requires at least one argument, string. Remember that in m4, there must not be a
space between the macro name and the opening parenthesis, unless it was intended to
call the macro without any arguments. The brackets around count and argument show
that these arguments are optional. If count is omitted, the macro behaves as if
count were ‘1’, whereas if argument is omitted, the macro behaves as if it were the
empty string. A blank argument is not the same as an omitted argument. For example,
‘example(`a')’, ‘example(`a',`1')’, and ‘example(`a',`1',)’ would behave
identically with count set to ‘1’; while ‘example(`a',)’ and ‘example(`a',`')’
would explicitly pass the empty string for count. The ellipsis (‘…’) shows that the
macro processes additional arguments after argument, rather than ignoring them.
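
The omitted-versus-blank rule can be modeled outside m4. The Python sketch below is illustrative only: it assumes the call has already been split into a list of argument strings with the quotes removed, and bind_example_args is a hypothetical name, not part of m4. It shows that the default for count applies only when that argument is absent, never when it is present but blank.

```python
def bind_example_args(args):
    """Bind arguments for the hypothetical prototype
    example (string, [count = '1'], [argument]...).

    args holds the strings m4 collected, quotes already removed:
    example(`a') gives ["a"], example(`a',) gives ["a", ""].
    """
    if not args:
        raise ValueError("example requires at least one argument, string")
    string = args[0]
    # The default only applies when count is omitted; a blank count stays "".
    count = args[1] if len(args) > 1 else "1"
    # An omitted argument behaves like the empty string.
    argument = args[2] if len(args) > 2 else ""
    # The ellipsis: extra arguments are processed, not ignored.
    extra = args[3:]
    return string, count, argument, extra


# example(`a'), example(`a',`1') and example(`a',`1',) all see count "1".
for call in (["a"], ["a", "1"], ["a", "1", ""]):
    print(repr(bind_example_args(call)[1]))   # '1' each time
# example(`a',) and example(`a',`') both pass an explicitly empty count.
print(repr(bind_example_args(["a", ""])[1]))  # ''
```

Modeling count as "present or not" rather than "empty or not" is what keeps ‘example(`a',)’ distinct from ‘example(`a')’.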

All macro arguments in m4 are strings, but some are given special interpretation,
e.g., as numbers, file names, regular expressions, etc. The documentation for each
macro will state how the parameters are interpreted, and what happens if the
argument cannot be parsed according to the desired interpretation. Unless specified
otherwise, a parameter specified to be a number is parsed as a decimal, even if the
argument has leading zeros; and parsing the empty string as a number results in 0
rather than an error, although a warning will be issued.
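
As a rough model of these parsing rules, here is a minimal Python sketch. The name parse_m4_number is hypothetical, and both the warning text and the choice to raise ValueError on other malformed input are assumptions of the sketch; each macro's own documentation states what m4 really does with an unparseable argument.

```python
import warnings


def parse_m4_number(arg):
    """Interpret a macro argument as a decimal integer (sketch).

    - Leading zeros stay decimal: "010" is ten, not octal eight.
    - The empty string becomes 0, with a warning rather than an error.
    - Other non-numeric text raises ValueError (an assumption here).
    """
    if arg == "":
        warnings.warn("empty string treated as 0")
        return 0
    sign, digits = 1, arg
    if arg[0] in "+-":
        sign = -1 if arg[0] == "-" else 1
        digits = arg[1:]
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"non-numeric argument: {arg!r}")
    # Base 10 is explicit, so leading zeros carry no special meaning.
    return sign * int(digits, 10)


print(parse_m4_number("010"))    # 10
print(parse_m4_number("-0042"))  # -42
```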

This document consistently writes and uses builtin, without a hyphen, as if it were
an English word. This is how the builtin primitive is spelled within m4.

2 Invoking m4

The format of the m4 command is:

m4 [option…] [file…]
All options begin with ‘-’, or if long option names are used, with ‘--’. A long
option name need not be written completely; any unambiguous prefix is sufficient.
POSIX requires m4 to recognize arguments intermixed with files, even when
POSIXLY_CORRECT is set in the environment. Most options take effect at startup
regardless of their position, but some are documented below as taking effect after
any files that occurred earlier in the command line. The argument -- is a marker to
denote the end of options.

With short options, options that do not take arguments may be combined into a
single command line argument with subsequent options, options with mandatory
arguments may be provided either as a single command line argument or as two
arguments, and options with optional arguments must be provided as a single
argument. In other words, m4 -QPDfoo -d a -df is equivalent to m4 -Q -P -D foo -d
-df -- ./a, although the latter form is considered canonical.

With long options, options with mandatory arguments may be provided with an equal
sign (‘=’) in a single argument, or as two arguments, and options with optional
arguments must be provided as a single argument. In other words, m4 --def foo
--debug a is equivalent to m4 --define=foo --debug= -- ./a, although the latter form
is considered canonical (not to mention more robust, in case a future version of m4
introduces an option named --default).
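
The prefix rule can be sketched in Python. This is a simplified model that assumes the behavior of GNU getopt_long, where an exact match always wins and otherwise a prefix must select exactly one option. The option list is only the subset named in this chapter, so real m4 may report different ambiguities, and resolve_long_option is a hypothetical helper.

```python
# Long option names that appear in this chapter (a subset of m4's full list).
OPTIONS = [
    "help", "version", "fatal-warnings", "interactive", "prefix-builtins",
    "quiet", "silent", "warn-macro-sequence", "word-regexp", "define",
    "include", "synclines", "undefine", "gnu", "traditional", "hashsize",
    "nesting-limit", "diversions", "freeze-state", "reload-state", "debug",
]


def resolve_long_option(name, options=OPTIONS):
    """Expand an abbreviated long option name, getopt_long style."""
    if name in options:  # an exact match always wins
        return name
    matches = [opt for opt in options if opt.startswith(name)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unrecognized option '--{name}'")
    raise ValueError(f"option '--{name}' is ambiguous: {', '.join(matches)}")


print(resolve_long_option("def"))    # define
print(resolve_long_option("debug"))  # debug
try:
    # With a hypothetical future --default option, --def no longer resolves.
    resolve_long_option("def", OPTIONS + ["default"])
except ValueError as err:
    print(err)  # option '--def' is ambiguous: define, default
```

The last call illustrates why the canonical spelling --define=foo is the robust choice: it cannot become ambiguous when new options are added.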

m4 understands the following options, grouped by functionality.


• Operation modes: Command line options for operation modes
• Preprocessor features: Command line options for preprocessor features
• Limits control: Command line options for limits control
• Frozen state: Command line options for frozen state
• Debugging options: Command line options for debugging
• Command line files: Specifying input files on the command line

2.1 Command line options for operation modes

Several options control the overall operation of m4:

--help

    Print a help summary on standard output, then immediately exit m4 without
    reading any input files or performing any other actions.
--version

Print the version number of the program on standard output, then immediately
exit m4 without reading any input files or performing any other actions.
-E
--fatal-warnings

    Controls the effect of warnings. If unspecified, then execution continues and
    exit status is unaffected when a warning is printed. If specified exactly once,
    warnings become fatal; when one is issued, execution continues, but the exit
    status will be non-zero. If specified multiple times, then execution halts with
    non-zero status the first time a warning is issued. The introduction of behavior
    levels is new to M4 1.4.9; for behavior consistent with earlier versions, you
    should specify -E twice.
-i
--interactive
-e

Makes this invocation of m4 interactive. This means that all output will be
unbuffered, and interrupts will be ignored. The spelling -e exists for
compatibility with other m4 implementations, and issues a warning because it may be
withdrawn in a future version of GNU M4.
-P
--prefix-builtins

Internally modify all builtin macro names so they all start with the prefix
‘m4_’. For example, using this option, one should write ‘m4_define’ instead of
‘define’, and ‘m4___file__’ instead of ‘__file__’. This option has no effect if -R
is also specified.
-Q
--quiet
--silent

    Suppress warnings, such as missing or superfluous arguments in macro calls, or
    treating the empty string as zero.
--warn-macro-sequence[=regexp]

    Issue a warning if the regular expression regexp has a non-empty match in any
    macro definition (either by define or pushdef). Empty matches are ignored;
    therefore, supplying the empty string as regexp disables any warning. If the
    optional regexp is not supplied, then the default regular expression is
    ‘\$\({[^}]*}\|[0-9][0-9]+\)’ (a literal ‘$’ followed by multiple digits or by an
    open brace), since these sequences will change semantics in the default operation
    of GNU M4 2.0 (due to a change in how more than 9 arguments in a macro definition
    will be handled, see Arguments). Providing an alternate regular expression can
    provide a useful reverse lookup feature of finding where a macro is defined to
    have a given definition.
-W regexp
--word-regexp=regexp

Use regexp as an alternative syntax for macro names. This experimental option
will not be present in all GNU m4 implementations (see Changeword).
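
To experiment with the default --warn-macro-sequence pattern outside m4, it can be translated from GNU regex syntax (where ‘\(’ and ‘\)’ group and ‘\|’ alternates) into Python's re syntax. The sketch below is an approximation under that assumption: risky_sequences is a hypothetical helper, the macro bodies are made up, and Python's regex engine is not the one m4 uses, so unusual custom patterns may behave differently.

```python
import re

# GNU regex form (from the option description): \$\({[^}]*}\|[0-9][0-9]+\)
# Python form: \( \) become ( ), \| becomes |, and literal braces are escaped.
DEFAULT_SEQUENCE = r"\$(\{[^}]*\}|[0-9][0-9]+)"


def risky_sequences(definition, pattern=DEFAULT_SEQUENCE):
    """Return the non-empty matches of pattern in a macro definition.

    Empty matches are discarded, so an empty pattern reports nothing,
    mirroring how an empty regexp disables the warning.
    """
    return [m.group(0) for m in re.finditer(pattern, definition) if m.group(0)]


# Hypothetical macro bodies, as they might be passed to define.
definitions = {
    "first": "$1",
    "tenth": "$10",
    "named": "${name}",
}
for name, body in definitions.items():
    hits = risky_sequences(body)
    if hits:
        print(f"warning: definition of {name} contains {hits}")
```

Passing a different pattern, such as the literal text of a definition you are hunting for, gives the reverse-lookup use described above.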

2.2 Command line options for preprocessor features

Several options allow m4 to behave more like a preprocessor. Macro definitions and
deletions can be made on the command line, the search path can be altered, and the
output file can track where the input came from. These features occur with the
following options:

-D name[=value]
--define=name[=value]

This enters name into the symbol table. If ‘=value’ is missing, the value is
taken to be the empty string. The value can be any string, and the macro can be
    defined to take arguments, just as if it were defined from within the input. This
option may be given more than once; order with respect to file names is
significant, and redefining the same name loses the previous value.
-I directory
--include=directory

Make m4 search directory for included files that are not found in the current
working directory. See Search Path, for more details. This option may be given more
than once.
-s
--synclines
Generate synchronization lines, for use by the C preprocessor or other similar
tools. Order is significant with respect to file names. This option is useful, for
example, when m4 is used as a front end to a compiler. Source file name and line
number information is conveyed by directives of the form ‘#line linenum "file"’,
which are inserted as needed into the middle of the output. Such directives mean
that the following line originated or was expanded from the contents of input file
file at line linenum. The ‘"file"’ part is often omitted when the file name did not
change from the previous directive.

    Synchronization directives are always given on complete lines by themselves.
    When a synchronization discrepancy occurs in the middle of an output line, the
    associated synchronization directive is delayed until the next newline that does
    not occur in the middle of a quoted string or comment.

$ m4 --synclines
define(`twoline', `1
2')
⇒#line 2 "stdin"
⇒
changecom(`/*', `*/')
⇒
define(`comment', `/*1
2*/')
⇒#line 5
⇒
dnl no line
hello
⇒#line 7
⇒hello
twoline
⇒1
⇒#line 8
⇒2
comment
⇒/*1
⇒2*/
one comment `two
three'
⇒#line 10
⇒one /*1
⇒2*/ two
⇒three
goodbye
⇒#line 12
⇒goodbye

-U name
--undefine=name

This deletes any predefined meaning name might have. Obviously, only predefined
macros can be deleted in this way. This option may be given more than once;
undefining a name that does not have a definition is silently ignored. Order is
significant with respect to file names.
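
Here is a Python sketch of how a -D argument splits into a name and a value. It assumes the split happens at the first ‘=’, so a value may itself contain ‘=’, and it models only -D in command-line order; parse_define_option and apply_defines are hypothetical helpers, not part of m4.

```python
def parse_define_option(arg):
    """Split a -D/--define argument into (name, value).

    Only the first '=' separates name from value; a missing '=value'
    yields the empty string, as the option description states.
    """
    name, _sep, value = arg.partition("=")
    return name, value


def apply_defines(args, symbols=None):
    """Apply -D arguments in command-line order to a symbol table (a dict)."""
    symbols = {} if symbols is None else symbols
    for arg in args:
        name, value = parse_define_option(arg)
        # Redefining the same name replaces, and thus loses, the old value.
        symbols[name] = value
    return symbols


# Roughly models: m4 -D flag -D 'greet=Hello, $1' -D level=1 -D level=2
table = apply_defines(["flag", "greet=Hello, $1", "level=1", "level=2"])
print(table)  # {'flag': '', 'greet': 'Hello, $1', 'level': '2'}
```

The ‘$1’ in greet is kept verbatim, which is how a command-line definition can take arguments just like one made in the input.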

2.3 Command line options for limits control

There are some limits within m4 that can be tuned. For compatibility, m4 also
accepts some options that control limits in other implementations, but which are
automatically unbounded (limited only by your hardware and operating system
constraints) in GNU m4.

-g
--gnu

Enable all the extensions in this implementation. In this release of M4, this
option is always on by default; it is currently only useful when overriding a prior
use of --traditional. However, having GNU behavior as default makes it impossible
to write a strictly POSIX-compliant client that avoids all incompatible GNU M4
extensions, since such a client would have to use the non-POSIX command-line option
to force full POSIX behavior. Thus, a future version of M4 will be changed to
implicitly use the option --traditional if the environment variable POSIXLY_CORRECT
is set. Projects that intentionally use GNU extensions should consider using --gnu
to state their intentions, so that the project will not mysteriously break if the
user upgrades to a newer M4 and has POSIXLY_CORRECT set in their environment.
-G
--traditional

Suppress all the extensions made in this implementation, compared to the System
V version. See Compatibility, for a list of these.
-H num
--hashsize=num

Make the internal hash table for symbol lookup be num entries big. For better
performance, the number should be prime, but this is not checked. The default is
65537 entries. It should not be necessary to increase this value, unless you define
an excessive number of macros.
-L num
--nesting-limit=num

Artificially limit the nesting of macro calls to num levels, stopping program
execution if this limit is ever exceeded. When not specified, nesting defaults to
unlimited on platforms that can detect stack overflow, and to 1024 levels
otherwise. A value of zero means unlimited; but then heavily nested code could
potentially cause a stack overflow.

The precise effect of this option is more correctly associated with textual
nesting than dynamic recursion. It has been useful when some complex m4 input was
generated by mechanical means, and also in diagnosing recursive algorithms that do
not scale well. Most users never need to change this option from its default.

    This option does not have the ability to break endless rescanning loops, since
    these do not necessarily consume much memory or stack space. Through clever usage
    of rescanning loops, one can request complex, time-consuming computations from m4
    with useful results. Putting limitations in this area would break m4’s power.
    There are many pathological cases: ‘define(`a', `a')a’ is only the simplest
    example (but see Compatibility). Expecting GNU m4 to detect these would be a
    little like expecting a compiler system to detect and diagnose endless loops: it
    is quite a hard problem in general, if not undecidable!
-B num
-S num
-T num

    These options are present for compatibility with System V m4, but do nothing in
    this implementation. They issue a warning to that effect and may disappear in
    future releases.
-N num
--diversions=num

    These options are present only for compatibility with previous versions of GNU
    m4, where they controlled the number of diversions that could be used at the
    same time. They do nothing, because there is no longer a fixed limit. They issue
    a warning to that effect and may disappear in future releases.

2.4 Command line options for frozen state

GNU m4 comes with a feature of freezing internal state (see Frozen files). This can
be used to speed up m4 execution when reusing a common initialization script.

-F file
--freeze-state=file

    Once execution is finished, write out the frozen state to the specified file.
It is conventional, but not required, for file to end in ‘.m4f’.
-R file
--reload-state=file

Before execution starts, recover the internal state from the specified frozen
file. The options -D, -U, and -t take effect after state is reloaded, but before
the input files are read.

2.5 Command line options for debugging
