Variational Principles in Physics: From Classical To Quantum Realm
SpringerBriefs in Physics
Series Editors
Balasubramanian Ananthanarayan, Centre for High Energy Physics, Indian
Institute of Science, Bangalore, Karnataka, India
Egor Babaev, Department of Physics, Royal Institute of Technology, Stockholm,
Sweden
Malcolm Bremer, H. H. Wills Physics Laboratory, University of Bristol, Bristol,
UK
Xavier Calmet, Department of Physics and Astronomy, University of Sussex,
Brighton, UK
Francesca Di Lodovico, Department of Physics, Queen Mary University of
London, London, UK
Pablo D. Esquinazi, Institute for Experimental Physics II, University of Leipzig,
Leipzig, Germany
Maarten Hoogerland, University of Auckland, Auckland, New Zealand
Eric Le Ru, School of Chemical and Physical Sciences, Victoria University of
Wellington, Kelburn, Wellington, New Zealand
Dario Narducci, University of Milano-Bicocca, Milan, Italy
James Overduin, Towson University, Towson, MD, USA
Vesselin Petkov, Montreal, QC, Canada
Stefan Theisen, Max-Planck-Institut für Gravitationsphysik, Golm, Germany
Charles H. T. Wang, Department of Physics, University of Aberdeen, Aberdeen, UK
James D. Wells, Department of Physics, University of Michigan, Ann Arbor, MI,
USA
Andrew Whitaker, Department of Physics and Astronomy, Queen’s University
Belfast, Belfast, UK
Tamás Sándor Biró
Nano-Plasmonic Laser Inertial Fusion
Experiment (NAPLIFE)
Wigner Research Centre for Physics
Budapest, Hungary
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Man if ever learns it,
will do it in his kitchen.
Imre Madách: The Tragedy of Man
Preface
This book is based on a series of lectures given first to physics students at the Justus Liebig University (JLU) in Giessen, Germany, and later to physics and engineering students at the Budapest University of Technology and Economics (BME) in Budapest, Hungary. Unlike traditional courses in the theoretical physics curriculum, this course is not organized around a narrow circle of problems or disciplines. On the contrary, we follow a single principle of thinking and its mathematical method, the variational calculus, across classical mechanics, optics, electrodynamics, thermodynamics, and quantum mechanics, both in the interpretation and in the derivation of the known basic equations. This method is versatile, and not only in classical mechanics and particle physics courses.
Crossing the usual boundaries between the disciplines of theoretical physics, we use the variational principle as a guideline. What matters here is not its help in everyday calculations, but the role that variational and symmetry principles play in theory building. This shows in the treatment of subjects such as classical electrodynamics or the Schrödinger equation in quantum mechanics, where most textbooks and lectures ignore the variational principle and start right away with the discussion of the field equations.
The derivation of Newton’s equations of motion (also mentioned in high schools) from the variational principle based on the Lagrange function and the related action is part of the university curriculum of theoretical physics worldwide. In several places, however, the technical details of the Maxwell equations are discussed long before the Lagrange function of electrodynamics is mentioned, if it is mentioned at all in hasty courses. The Schrödinger equation usually appears as a deus ex machina, with the remark that its consequences are justified by experiments and modern technology. Students have to learn its use and solution in small practical problems. Since understanding it in the sense of a rediscovery, with all the associated emotions and cognitive rewards, looks hopeless, its “derivation” or a review of alternative but equivalent presentations has become a prohibited heresy. Meanwhile it is a fact that quantum mechanics is the closest compromise that keeps as much of classical mechanics as possible. Schrödinger’s variational principle formulates exactly
1 A more abstract mathematical approach also points out that classical mechanics can be defined on a distributive lattice in quantum logic, while quantum mechanics is defined on its closest generalization, an orthomodular lattice.
Acknowledgements
First of all, T.S.B. owes gratitude to Dr. László Orosz, docent at the Department of Physics of the Budapest University of Technology and Economics (BME), for initiating the idea of writing a textbook, for his persistent encouragement, and for his pedagogical advice during the preparation of the Hungarian edition. I thank Prof. János Kertész, member of the Hungarian Academy of Sciences (H.A.S.), for his support of my teaching within the Physics Institute at BME. The experience gained from this activity is distilled in this book.
The professional reviewer of the Hungarian edition in 2012 was Prof. András Patkós, member of the H.A.S., at that time head of the Department of Atomic Physics at the Eötvös Loránd University (ELTE). His many comments and questions inspired me to increase the precision of some statements, and in a few places to extend the explanations beyond my original plans.
Special thanks go to József Pálinkás, at that time president of the Hungarian Academy of Sciences, and to the Physics Class of the Academy for financial support of the Hungarian publication of this book.
Important help came from my colleagues at the Institute for Particle and Nuclear Physics (KFKI RMKI), Dr. Péter Ván and Dr. Árpád Lukács, who read my original manuscript before submission.
For their various contributions to theoretical research, acknowledgement is due to Drs. Antal Jakovác, Péter Ván, Etele Molnár and Rogerio Rosenfeld, and to Gábor Purcsel and Károly Ürmössy.
For the English edition, Dr. Archana Kumari was of indispensable help with her comments from a young researcher’s point of view. She deserves acknowledgement.
Finally, last but not least, all scientific research and university teaching needs a supportive background: without the patience and understanding of my family this book could not have been written. I hereby acknowledge the support of my wife, Tünde Biróné Riz, my daughters, Szilvia and Réka Biró, and my stepchildren, Katalin and Gábor Csengery.
Contents
1 Introduction
1.1 Brief History of Roots
1.1.1 Antiquity: Paradoxes
1.1.2 Medieval Thinking: Transition
1.1.3 Renaissance: Dynamics
1.1.4 Enlightenment: The Global View
1.1.5 Romanticism: Grand Unification
1.1.6 Modern Times: Uncertainty
1.1.7 Contemporary Adventure
1.2 Variational Calculus Basics
1.2.1 Function and Functional
1.2.2 Variation
1.2.3 Higher Functional Derivative
1.3 A Simple Exercise
1.3.1 Completion to a Full Square
1.3.2 Limit of an Inequality
1.3.3 Extremum of a Function
1.3.4 Variation of a Functional
1.4 A Somewhat More Involved Exercise
2 Mechanics: Geometry of Orbits
2.1 The Principle of Virtual Work
2.1.1 Halt on the Slope ...
2.1.2 Bascule in Equilibrium
2.2 D’Alembert Principle
2.2.1 Free Circular Motion
2.2.2 Pendulum
2.3 The Action Principle
2.4 Second Variation of the Free Motion
2.5 Gauss Principle
2.6 The Method of Lagrange Multipliers
Further Readings
Glossary of People Appearing in the Hungarian Book
Chapter 1
Introduction
This book begins with a brief history of the principles that define how a physicist thinks. After this review, the most important elements of the mathematical formalism of the variational calculus are summarized, with particular regard to the concept of the functional derivative and its applications. In the second chapter, the first technical one, we deal with the variational principles of mechanics. Static (equilibrium) problems are solved by the principle of virtual work (due to Bernoulli); its generalization, D’Alembert’s principle, serves as the basis for Hamilton’s action principle. Various constraints, treated as side conditions, are taken into account by the Gauss principle and are illuminated with the help of Lagrange multipliers, which in the process provides us with the modern idea of the effective potential. This is followed by an analysis of the Maupertuis principle, not only as the story of a mistake and its correction, but also as a foreshadowing of general geodesic motion. This principle, when applied to motion, focuses on the shape of the trajectories instead of their temporal evolution. The chapter on the variational principles of mechanics also mentions the general action-angle variables and their use. It concludes with a description of Fermat’s principle of least propagation time.
The third chapter deals with gravity, building on the treatment of the Maupertuis principle applied to a relativistically moving mass point. Similarly, the relativistic motion of electric point charges follows a geometric variational principle, from which the Lorentz force can be derived. Owing to the equivalence of gravitating and inertial mass, the Newtonian (i.e. weak) gravitational field can be interpreted as a curvature effect in the differentials of the temporal coordinate. This opens the way to the curvature of the whole of spacetime: minimizing the Einstein-Hilbert action yields the Einstein equations.
The fourth chapter describes the classical fundamental laws of electric and magnetic phenomena, derived from corresponding variational principles. The Gauss theory of electrostatics, Ampère’s law of magnetostatics, and the energy carried by the fields are derived under certain additional conditions. In electrodynamics, the symmetry of the magnetic and electric forces (EM-duality) is formulated by
describing the motion point by point, in small, causal steps. At the same time, further circumstances of the evolution of a physical system, in the form of initial, boundary or auxiliary conditions, can be taken into account more naturally, clearly and intuitively within the mathematical formalism of the variational method. The determination of an optimum implicitly assumes a view of the alternatives; as such it promotes the understanding of the path integral concept and, in general, of functional integrals summing over alternative orbits.
1.1 Brief History of Roots
The variational method has its ancestors. The laws of nature, in particular those of motion, were already in the focus of natural philosophy at the dawn of European culture, which had its roots in the East, in the first civilizations. The struggle between calculability and the paradoxical nature of motion was long part of the evolution of ideas.
Here we review a few stations of this evolution leading to modern physics and to the central role of the variational principle at its heart. The sampling is almost random, but a skeleton of the arrow of time should be visible in it.
1.1.1 Antiquity: Paradoxes
Our knowledge dates back to the ancient Greeks. For a long time European thinking was determined by Aristotle’s principle, cited as “horror vacui” (dread of nothingness), which was later demoted from the highest authority to the most criticized one, equally without justification. Today it is no longer easy to reconstruct Aristotle’s original thoughts. The phrase “horror vacui”, the abhorrence of the vacuum, is nowadays frequently interpreted as if he had regarded a sucking force of the vacuum as the cause of motion.
Beyond this vulgarized and rightly questioned interpretation, it is worth considering another, broader reconstruction. For the people of antiquity, and so for Aristotle, the vacuum was the archetype of paradoxical and therefore impossible things: the “nothing”, about which we talk as if it were “something” located “somewhere”. Can the nothing exist? According to them it cannot, since the question itself is paradoxical. In this context the “horror vacui” principle meant that phenomena described by contradictory arguments cannot be realized in nature: “the contradiction is not a real existing thing”. Freedom from contradiction is a requirement for scientific theories to this day. Based on the experience of many centuries, we even believe that this principle expresses one of the most important qualities of reality.
A further development of this principle, connected to the Pythagoreans, sounds more like aesthetics than experience: nature is “beautiful”, the universe is harmonic, therefore its description must be based on beautiful and harmonic laws. According to Pythagoras, science must describe everything by rational numbers or in the language of geometry. This strict requirement has since been abandoned in physics, but the numbers that do not fit are still called “irrational”, meaning “meaningless” or “disproportionate”. Take the square root of two: it exists as a geometrical object, the diagonal of a square with sides of unit length, but not as a ratio of two integers.
The sophists, first of all Zeno, also have to be mentioned here. The essence of their arguments was to demonstrate the paradoxical nature of (continuous) motion. According to our present knowledge this statement has a point. The conclusion, however, that what is paradoxical is also impossible, and that therefore there is no motion in reality, at most its illusion, contradicts experience. We should not forget that the paradox emerges from dividing the motion into smaller and smaller pieces without dividing the time accordingly, as opposed to the view that compares the initial and final states of a given motion. These views can only be harmonized in the framework of infinitesimal calculus, pioneered later.
Archimedes already made the first steps in that direction by seeking formulas for the volumes and surfaces of geometrical bodies and by determining the centers of gravity of variously shaped masses. However, Newton was the first to achieve a breakthrough, and Leibniz, Euler and Lagrange made this technique understandable and standardized much later.
1.1.2 Medieval Thinking: Transition
The best-known intellectual successor of Aristotle in the Middle Ages was Thomas Aquinas. He was most interested in the question: does motion have a goal? In our modern context he might have asked whether all physical motion can be described within a local theory, using causal chains. This obviously depends on the system under study. In more complex systems involving several factors it is not rare to observe a memory effect, which destroys locality in time. And we know that the clever rat, which remembers where the cheese is placed in a labyrinth, performs a motion that looks aimed and purposeful.
It seemed obvious to the medieval thinkers that humans and animals move purposefully, and lifeless bodies do not. The example of Buridan’s donkey became famous: the donkey finds itself between two fields, one with wheat, the other with barley. It feels hungry and loves both delicacies, so it would run to the nearest field. But whenever its position is symmetric, equally far from both fields, the donkey cannot decide and purposeful motion is impossible. The donkey, being a donkey, stays motionless and starves. What emerges here is the principle of symmetry connected to the theory of motion, and the breaking of symmetry as an attribute of reality. At the same time the paradoxical formulation of the story recalls the great predecessors of antiquity. The bottom line nevertheless points to the indefensibility of purposefulness in the description of physical motion.
1.1.3 Renaissance: Dynamics
Renaissance means being born anew; the birth of modern physics coincides with this period. Along the line marked by Kepler, Galilei and Newton, the perception and concept of motion changed entirely. Meanwhile Kepler earned his living as a court astrologer, and Newton had alchemy as his hobby. Their basic principles about motion are to this day the fundamental pillars of the physics taught in schools: uniform, inertial motion is itself a natural state without any extra cause, and therefore no absolute motion exists. We have to look for causes only behind changes in the state of motion; since then we call these causes “forces”. The purpose of physics is to find and calculate the most beautiful (state of) motion, not the most beautiful state.
The basic principles of Nature (principia naturalis) are causal: changes develop from point to point, from cause to effect. The following phrasing of this stems from Huygens: “all effect is local”. However, the theory of Newtonian gravity did not yet follow this ideal; it contained action at a distance. Newton himself referred to this as a deficiency to be removed from his theory.
1.1.4 Enlightenment: The Global View
The Age of Enlightenment enriched physics with new principles, first of all through the works of Euler, Lagrange, Hamilton and Jacobi, and of course Gauss, in which a new form of Newtonian mechanics was elaborated, leading to mathematically equivalent results. The formulas we use nowadays can be traced back to that period. It was then that the first variational principles in physics were born.
The action functional takes as its basis a calculation of cost, valid for the totality of the motion. We sense in this idea the developing world trade and the increasingly common attitude of calculating a maximal profit from the balance of expenditure and income. The notions of momentum, energy and velocity were mathematically defined and generalized. The most decisive experience is that holistic principles lead to local, causal equations of motion. The other side of this coin is that the interplay of local forces leads to a global harmony. The forces of the free market will make everyone happy, and whoever is not has only himself to blame. It becomes clear in modern mechanics that certain conditions and constraints, in particular those arising from symmetry, influence the motion to a great extent.
1.1.5 Romanticism: Grand Unification
The 19th century was the age of great theoretical works and grand unifications. Based on Maxwell’s equations, electrical, magnetic and optical phenomena are unified in electrodynamics, with a still perceptible ancestry in mechanics. At the same time this is the first field theory, an important step towards the view of local forces and of natural phenomena as played out on a multidimensional stage.
Another important development is the crystallization of thermodynamics. Variational principles with different conditions are interrelated and can be transformed into each other; the mathematics of the Legendre transformation is hidden in the background. Moreover, nature has some laws that are expressed as an inequality instead of an equation: as a trend, the entropy of a closed system cannot decrease. Related to this, the final state of the universe, the “heat death”, is envisioned.
A lasting merit of thermodynamics is that it drew the attention of physicists to the atomic world, until then a playground only for chemists. Mechanics, as a role model, was the midwife at the birth of the kinetic theory of gases. The trust in the fundamental principles is stronger than ever.
1.1.6 Modern Times: Uncertainty
The 20th century brought revolutions in several areas, and so it did in physics. Quantum mechanics, as the “new physics”, seemingly broke all known rules. Interestingly, however, the natural laws of quantum and classical physics, of the micro and the macro world, not only differ but also show similarities. Schrödinger derived his famous equation from a variational principle that optimizes the violation of the classical Hamilton–Jacobi equation. Quantum mechanics is the most classical nonclassical physics. What we thought we knew before is untrue, but we may take some comfort in the fact that it is untrue only to the least possible extent.
Another break with the classical tradition in 20th century physics is the theory of relativity. The special relativity principle was partially known since Galilei and is inherent in the Maxwell equations, but it was Einstein who first raised it to the rank of a fundamental principle. Its further generalization led to the unified geometric view of space, time and gravity. Are all laws of nature geometrical? This would not rule out the use of algebraic or other, e.g. numerical, symbols, but it would make it superfluous to cite genuine forces as the causes of changes.
1.1.7 Contemporary Adventure
The history of ideas after World Wars I and II, perhaps because of its proximity to our age, shows a fractal, chaotic picture. Yet we pick a few elements from the evolution of fundamental physical principles, even if arbitrarily.
The quantum version of field theory and the physics of elementary particles put the reconciliation of the quantum and relativity principles on the agenda. Measurement technology also reached the level where minuscule quantum particles moving with velocities near that of light can be studied. New among the principles is Feynman’s path integral method, which, instead of choosing the “best” orbit, democratically sums over all possible orbits “contributing to the reality”. True, their contribution is restricted by strict rules, and possible paths interfere like waves. Under certain circumstances some possibilities end up dominating the others, and exactly then the physical world appears as “classical”.
The quest for the Holy Grail of physics at the turn of the millennium, the quest for the original “ancestor” symmetry, is still under way. Grand unification theories, the “Theory of Everything”, quantum gravity and elementary string theory all show a ferment in the mathematical arsenal. Experimental control is not possible in all aspects, and in some others it looks rather discouraging so far. We would wish a good fundamental principle to be both beautiful and functional in practice.
1.2 Variational Calculus Basics
1.2.1 Function and Functional
Let the solution of a physical problem be a rule connecting a real number to another real number, briefly a function, $f: \mathbb{R} \to \mathbb{R}$. This can be the solution of an equation describing dynamics, giving the spatial coordinates as functions of time, or describing a two-dimensional orbit of a moving mass point in space. The set of these functions is denoted here by $\mathcal{F}$; the sought solution functions are elements of this set. A (real) functional assigns a (real) number to such a function: $I: \mathcal{F} \to \mathbb{R}$. In what follows we distinguish number-valued functions and functionals in the notation: the arguments of functions are written in round brackets, those of functionals in square brackets. In this way the real number $f(x)$ is the value of the function $f$ at the argument $x$, while $I[f]$ is the value of the functional $I$ for the selected function $f$.
1 In mathematics a functional is a mapping of a vector space $V$ over a base field into that base field. The base field can be $\mathbb{R}$, $\mathbb{C}$ or even $\mathbb{R}^n$.
An example of a functional is

$$I[f] = \int_0^1 dx \left( |f'(x)|^2 - x^2 |f(x)|^2 \right), \tag{1.1}$$

whose value is a definite integral over an integrand composed of the function, its derivative and in some cases another function of the independent variable. Another example describes the length of a curve, parameterized by the function $f(x)$ in a two-dimensional space, between the points at $x = a$ and $x = b$:

$$L[f; a, b] = \int_a^b dx \left( 1 + f'(x)^2 \right)^{1/2}. \tag{1.2}$$
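As a minimal numerical sketch of Eq. (1.2), the following Python fragment evaluates the length functional for two trial curves joining the same end points. The helper name `arc_length`, the number of cells and the two trial functions are illustrative assumptions, not part of the text; the derivative is replaced by a finite difference on each cell.

```python
import math

def arc_length(f, a, b, n=10_000):
    """Approximate L[f; a, b] = integral of sqrt(1 + f'(x)^2) dx over [a, b].

    The interval is cut into n cells; on each cell f' is replaced by the
    finite difference of f across the cell (an assumption of this sketch).
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x0 = a + i * h
        slope = (f(x0 + h) - f(x0)) / h
        total += math.sqrt(1.0 + slope * slope) * h
    return total

# A functional maps a whole function to a single number:
straight = arc_length(lambda x: x, 0.0, 1.0)      # straight line y = x
parabola = arc_length(lambda x: x * x, 0.0, 1.0)  # parabola y = x^2
```

Both curves run from (0, 0) to (1, 1), and the straight line receives the smaller value of the functional, as expected for a length.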
1.2.2 Variation
In classical physics the possible solutions are overwhelmingly continuous and repeatedly differentiable, in short smooth, functions. There are many such functions and they are dense in the set $\mathcal{F}$. Therefore it makes sense to consider “close” functions, and to talk about a nearby function, $f + \epsilon \in \mathcal{F}$. A well-applicable definition of the “nearby” property for functionals of the type $\int L(f, f', x)\,dx$ is as follows:
Definition 1.1 Let $f$ and $\epsilon$ be smooth functions on the interval $[a, b]$. We call the function $f + \epsilon$ near to $f$ if the function norm of $\epsilon$ can be made arbitrarily small, e.g.

$$\|\epsilon\| = \max_{[a,b]} \left( |\epsilon(x)| + |\epsilon'(x)| \right) \longrightarrow 0^+ .$$

It is a further condition that the derivative of the difference function, $\epsilon'$, also contributes to the function norm used; otherwise non-differentiable functions could also be classified as near to a given smooth solution for an orbit stemming from classical physics. We call the nearby function, $f + \epsilon$, the varied counterpart of the function $f$; it is frequently also denoted by $f + \delta f$.
This construction leads us to the determination of the variation and the functional derivative of a functional $I[f]$. The (first) variation of a functional is given as

$$\delta I = I[f + \epsilon] - I[f] = \int_a^b dx\, \epsilon(x)\, D(x) + \mathcal{O}(\|\epsilon\|). \tag{1.3}$$

The notation $\mathcal{O}$ is the “small ordo”, denoting terms that tend to zero even if divided by their argument. The coefficient function $D$ of the term in the integrand that is of first order in the difference function $\epsilon$ is the functional derivative of $I$. In the traditional notation

$$D = \frac{\delta I}{\delta f}. \tag{1.4}$$
The derivative operation for functionals can be used to search for extreme values of a functional, in analogy to the world of functions. The condition $\delta I = 0$ is fulfilled only if $D(x) = 0$, except possibly at zeros of the function $\epsilon$. The latter must be restricted to a set of measure zero inside the interval $[a, b]$. In practice the test functions used, $\epsilon(x)$, are nowhere zero. Since $\epsilon$ is arbitrary, this means that the function emerging from $D(x) = 0$ is the sought one at the extremum of the functional. In this case $D(x)$ is identically zero on the whole interval, not only at some points $x_i$.
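The definition in Eq. (1.3) can be checked numerically on a toy case. The sketch below assumes the illustrative functional $I[f] = \int_0^1 f(x)^2\,dx$, whose functional derivative is $D(x) = 2 f(x)$; the functional, the trial function, the nowhere-zero test function $\epsilon(x) = \eta\,(1 + x)$ and the midpoint quadrature are all choices made here, not taken from the text.

```python
import math

def integrate(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def I(f):
    """Illustrative functional I[f] = integral of f(x)^2 over [0, 1]."""
    return integrate(lambda x: f(x) ** 2, 0.0, 1.0)

f = lambda x: math.sin(math.pi * x)   # trial function (assumption)
D = lambda x: 2.0 * f(x)              # functional derivative of I at f

def remainder(eta):
    """delta I minus its linear part, for eps(x) = eta * (1 + x)."""
    eps = lambda x: eta * (1.0 + x)
    delta_I = I(lambda x: f(x) + eps(x)) - I(f)
    linear = integrate(lambda x: eps(x) * D(x), 0.0, 1.0)
    return delta_I - linear

r1, r2 = remainder(1e-2), remainder(1e-3)
```

For this quadratic functional the remainder equals $\int \epsilon^2\,dx = 7\eta^2/3$, so shrinking $\eta$ by a factor of ten shrinks it by a factor of one hundred; divided by the norm of $\epsilon$ it still tends to zero, as the remainder term in Eq. (1.3) requires.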
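$$\delta^2 I[f] = \delta I[f] - \delta I[f - \epsilon] = I[f + \epsilon] + I[f - \epsilon] - 2 I[f]. \tag{1.5}$$

$$\delta^2 I = \int_a^b dx \int_a^b dy\, \epsilon(x)\, M(x, y)\, \epsilon(y) + \mathcal{O}(\|\epsilon\|^2). \tag{1.6}$$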
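The symmetric difference in Eq. (1.5) is easy to evaluate numerically. The sketch below assumes the illustrative functional $I[f] = \int_0^1 f(x)^4\,dx$, for which the pointwise identity $(f+\epsilon)^4 + (f-\epsilon)^4 - 2f^4 = 12 f^2 \epsilon^2 + 2\epsilon^4$ separates the part that is quadratic in $\epsilon$ from a higher-order remainder. The functional, the trial function, the variation and the quadrature are assumptions made for this example only.

```python
import math

def integrate(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def I(f):
    """Illustrative functional I[f] = integral of f(x)^4 over [0, 1]."""
    return integrate(lambda x: f(x) ** 4, 0.0, 1.0)

def second_variation(I, f, eps):
    """Symmetric second difference I[f + eps] + I[f - eps] - 2 I[f], Eq. (1.5)."""
    plus = I(lambda x: f(x) + eps(x))
    minus = I(lambda x: f(x) - eps(x))
    return plus + minus - 2.0 * I(f)

eta = 1e-2
f = lambda x: math.sin(math.pi * x)   # trial function (assumption)
eps = lambda x: eta * (1.0 + x)       # small, nowhere-zero variation

d2 = second_variation(I, f, eps)
quadratic = integrate(lambda x: 12.0 * f(x) ** 2 * eps(x) ** 2, 0.0, 1.0)
quartic = integrate(lambda x: 2.0 * eps(x) ** 4, 0.0, 1.0)
```

Here `quadratic` plays the role of the double integral in Eq. (1.6) and `quartic` is the remainder, which is smaller by a factor of order $\eta^2$. Since `d2` is positive for every such variation, this toy functional has no maximum at the chosen $f$.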
3 In field theory the n-point functions are obtained from the path integral in this way.
with $G(x, y)$ being the associated Green function. A specific case is the solution of the trivial equation $F(x) = J(x)$; in this case $P = 1$ identically. The corresponding Green function is not a “function” but a mathematical distribution, the so-called Dirac delta. By definition it acts as the identity under the integral, in the sense that

$$F(x) = \int \delta(x - y)\, J(y)\, dy = J(x) \tag{1.9}$$

for all possible source functions $J(x)$. This property helps to obtain the inverse of a Green function. Considering

$$\int K(x, z)\, G(z, y)\, dz = \delta(x - y), \tag{1.10}$$

the kernel $K(x, y) = G^{-1}(x, y)$ is to be determined. Based on Eq. (1.8) the following differential equation determines the Green function:

$$P\left(\frac{d}{dx}\right) G(x, y) = \delta(x - y). \tag{1.11}$$

A comparison of Eqs. (1.10) and (1.11) reveals the connection between the polynomial differential operator $P\left(\frac{d}{dx}\right)$ and the integration kernel $K(x, y)$. Both expressions are the inverse of the Green function.
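Equation (1.11) can be illustrated on a grid. The sketch below assumes a concrete operator that is not taken from the text, $P(d/dx) = -d^2/dx^2$ on the interval $[0, 1]$ with $G = 0$ at both ends, whose Green function is $G(x, y) = x(1 - y)$ for $x \le y$ and $y(1 - x)$ otherwise. Applying the second-difference approximation of $P$ to $G$ should give a discrete Dirac delta, namely $1/h$ on the diagonal and zero elsewhere.

```python
n = 8                      # number of grid cells on [0, 1] (arbitrary choice)
h = 1.0 / n
xs = [i * h for i in range(n + 1)]

def G(x, y):
    """Green function of P = -d^2/dx^2 with G = 0 at x = 0 and x = 1."""
    return x * (1.0 - y) if x <= y else y * (1.0 - x)

def apply_P(i, j):
    """Second-difference approximation of (P G)(x_i, x_j), acting on x."""
    return (-G(xs[i - 1], xs[j]) + 2.0 * G(xs[i], xs[j])
            - G(xs[i + 1], xs[j])) / h**2

# Interior grid points only; the result approximates delta(x_i - x_j).
PG = [[apply_P(i, j) for j in range(1, n)] for i in range(1, n)]
```

On the grid the integral in Eq. (1.10) becomes a sum with weight $h$, so the matrix of the discretized operator and the matrix $h\,G(x_i, x_j)$ are inverses of each other, which is the discrete counterpart of the statement above.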
Seeking an optimum or extremum raises the question of the stability of the function or orbit found. As in the analysis of functions, $\delta^2 I > 0$ describes a minimum and $\delta^2 I < 0$ a maximum. In the case $\delta^2 I = 0$ the nature of the extremum remains indeterminate. The kernels can be analyzed by projecting them onto a given basis of functions $h_i(x)$, for example in a Fourier analysis, which reduces the stability investigation to the study of the spectrum of an infinite matrix $M_{ij}$. In favorable cases the subspace belonging to zero eigenvalues can be factored out, and the stability of the extremizing solution can be determined from the rest of the eigenvalues and subdeterminants.
Finally, the question naturally arises: if one can calculate derivatives of functionals with respect to functions, is the inverse operation, integration, also possible? Such an operation, and its result, is the functional integral. It does not yield a “primitive” functional for a given function, assumed to be the result of a functional derivation; rather it resembles the definite Riemann integral, approximated by ever finer trapezoid contributions: it sums a functional over all possible function values. In this way the procedure assigns a number to the starting functional. In more general cases, when the functional contains several functions as arguments and not all of them are integrated over, the result is another functional that has only the remaining functions as arguments:

$$Z = \int \mathcal{D}f\, I[f] = \lim_{N \to \infty} \prod_{i=1}^{N} \int df_i\, I[f(x_i)]. \tag{1.13}$$
Several methods are known for determining the extrema of functions, and the strategy is similar when using variational principles. The vanishing functional derivative (i.e. the variation), completing the square in quadratic cases, investigating the limits of inequalities, and arguing with symmetry all occur in variational problems, too. In order to demonstrate these basic strategies we solve a very simple problem with several of these methods in the following. The use of a functional will be the last in this queue (Fig. 1.1).
We seek among right triangles with a fixed hypotenuse the one with maximal area. Let the length of its span (the hypotenuse) be $c$, and its sides $a$ and $b$. The area is $t = ab/2$, and the Pythagorean formula $c^2 = a^2 + b^2$ holds.
Fig. 1.1 A simple problem: we seek among right triangles with a fixed span those which have a
maximal area. Based on the symmetry revealed in the Thales circle (left) this is the case for equal
sides (a = b), this being the highest of all triangles with the same span. The area squared as a
function of the square of one side is shown in the right part of this figure
Twice the area, $2t = ab$, can be interpreted as a geometric mean, which is always less than or equal to the arithmetic mean:
$$ab = \sqrt{a^2 b^2} \le (a^2 + b^2)/2 = c^2/2.$$
Equality holds for equal squares of the sides, $a^2 = b^2$, or equivalently for equal sides, $a = b$.
Ignoring the geometrical nature of the problem, one may try more general methods. Seeking the maximum of the function $f(a) = 2t = a\sqrt{c^2 - a^2}$, the first derivative vanishes at the maximum:
$$f'(a) = \sqrt{c^2 - a^2} - a^2/\sqrt{c^2 - a^2} = 0.$$
From this condition $a^2 = c^2 - a^2 = b^2$ follows, the known result. The novelty here is that this method is more automatic and works for a number of other problems, too.
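The result of the derivative method can be cross-checked numerically. The following is a minimal sketch, not part of the original text: the function names, the grid search and the value of the hypotenuse are illustrative assumptions.

```python
import math

def doubled_area(a, c):
    """f(a) = 2t = a * sqrt(c^2 - a^2) for a right triangle with hypotenuse c."""
    return a * math.sqrt(c * c - a * a)

def argmax_on_grid(c, n=100_000):
    """Brute-force search for the side a in (0, c) that maximizes f(a)."""
    best_a, best_f = 0.0, 0.0
    for k in range(1, n):
        a = c * k / n
        f = doubled_area(a, c)
        if f > best_f:
            best_a, best_f = a, f
    return best_a, best_f

c = 2.0  # illustrative hypotenuse length (assumption)
a_star, f_star = argmax_on_grid(c)
print(a_star, c / math.sqrt(2))  # the two numbers should nearly coincide
print(f_star, c * c / 2)         # maximal 2t compared with the bound c^2 / 2
```

The grid search knows nothing about the geometry, yet it lands on the side length $a = c/\sqrt{2}$, for which $a = b$.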
One may analyse the above problem using the notion of a functional, too, even if this is not compulsory. The area of a triangle can be calculated as the integral of a function consisting of two linear pieces. This is a real functional. Denoting the radius of the Thales circle drawn around the triangle by $r$ and placing the origin of the coordinate system at its center, the integration runs over the interval $[-r, +r]$, with $r = c/2$ fixed. In this way we are looking for the maximum of the integral
$$I[f] = \int_{-r}^{+r} f(x)\, dx.$$
By this move we have generalized the problem: we may seek the maximal area under any curve described by a function $f(x)$. Even the condition of the fixed span can be changed, e.g. to a given length of the curve. In our case the coordinates of the vertex opposite to the span are $x_0$ and $f_0 = f(x_0)$. The integral $I[f]$ then consists of the contributions of two linear pieces, and its value is
$$I[f] = r f_0 = r\sqrt{r^2 - x_0^2}.$$
This is largest for $x_0 = 0$, i.e. for the symmetric triangle with $a = b$.
In our next problem we seek the curve of given length which encloses a maximal area with the horizontal line. We show, using functional techniques, that such a curve must be an arc of a circle.
Let us consider a symmetric arrangement: the curve spans between $x = -a$ and $x = +a$ on the line, and is symmetric with respect to the $y$ axis. Its length is then
$$L = \int_{-a}^{+a} \sqrt{1 + f'(x)^2}\; dx, \tag{1.14}$$
as long as the function $y = f(x)$ describes the curve. Here $f'(x)$ is the derivative of the function $f$ with respect to $x$. The area to be maximized is the definite integral
$$T = \int_{-a}^{+a} f(x)\, dx. \tag{1.15}$$
$$\delta T - \mu\, \delta L = \int_{-a}^{+a} \left[ \delta f(x) - \mu\, \frac{f'(x)}{\sqrt{1 + f'(x)^2}}\, \delta f'(x) \right] dx. \tag{1.17}$$
Noting now that $\delta f'(x) = \frac{d}{dx}\, \delta f(x)$, we perform an integration by parts in the second term. The coefficient of $\delta f(x)$ under the integral is the functional derivative to be set to zero:
$$\frac{\delta}{\delta f}\,(T - \mu L) = 1 + \mu\, \frac{d}{dx}\, \frac{f'(x)}{\sqrt{1 + f'(x)^2}} = 0. \tag{1.18}$$
This is a second order ordinary differential equation; integrating it once gives
$$\frac{f'(x)}{\sqrt{1 + f'(x)^2}} = -\frac{x}{\mu} + C. \tag{1.19}$$
Since $f'(0) = 0$ due to the symmetry, the constant vanishes, $C = 0$. Now $f'$ can be expressed as
$$f'(x) = \frac{df}{dx} = \pm \frac{x}{\sqrt{\mu^2 - x^2}}. \tag{1.20}$$
Here the relation $\mu^2 \ge a^2$ must hold. Furthermore, since $f(x)$ is rising for negative $x$ and falling for positive $x$, the negative sign is to be chosen. The solution of Eq. (1.20) reads
$$f(x) = f(0) - \int_0^x \frac{t}{\sqrt{\mu^2 - t^2}}\, dt = f(0) + \sqrt{\mu^2 - x^2} - \mu. \tag{1.21}$$
Owing to $f(\pm a) = 0$ one determines $f(0)$ and obtains the final result
$$f(x) = \sqrt{\mu^2 - x^2} - \sqrt{\mu^2 - a^2}. \tag{1.22}$$
Finally the Lagrange multiplier $\mu$ can be related to the more physical quantities of total length $L$ and area $T$ as follows:
$$T = \mu^2 \arcsin\frac{a}{\mu} - a\sqrt{\mu^2 - a^2}, \tag{1.23}$$
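The closed formulas for the circular arc can be checked by direct numerical integration. The sketch below is an illustration only: the helper names, the midpoint rule and the sample values of $\mu$ and $a$ are assumptions, and the closed expression used for the arc length, $2\mu \arcsin(a/\mu)$, is the elementary length of a circular arc of radius $\mu$ rather than a formula quoted from the text.

```python
import math

def arc(x, mu, a):
    """Curve of Eq. (1.22): circular arc of radius mu through (-a, 0) and (+a, 0)."""
    return math.sqrt(mu * mu - x * x) - math.sqrt(mu * mu - a * a)

def midpoint_integral(g, lo, hi, n=20_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

mu, a = 2.0, 1.0  # illustrative values with mu^2 >= a^2 (assumption)

# Area under the arc, Eq. (1.15), against the closed form of Eq. (1.23).
area_numeric = midpoint_integral(lambda x: arc(x, mu, a), -a, a)
area_formula = mu**2 * math.asin(a / mu) - a * math.sqrt(mu**2 - a**2)

# Arc length, Eq. (1.14), with f'(x)^2 = x^2 / (mu^2 - x^2) from Eq. (1.20).
length_numeric = midpoint_integral(
    lambda x: math.sqrt(1.0 + x * x / (mu * mu - x * x)), -a, a)
length_closed = 2.0 * mu * math.asin(a / mu)  # elementary circle geometry

print(area_numeric, area_formula)
print(length_numeric, length_closed)
```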
It is customary by now to date the birth of modern physics to the dawn of the seventeenth century. In the footsteps of Kepler, Galilei and Newton the mechanical worldview developed, and it has served as an ideal for other disciplines to this day. Electrodynamics at the end of the nineteenth century was interpreted by some contemporary authors with the help of a mechanism transporting fluids and consisting of valves and taps, based on the supposed natural existence of a medium carrying the electric fluid, i.e. the ether. Even quantum mechanics, the theoretical dividing line, carries that ideal in its name. Moreover, from time to time there are public discussions about economic, political and social mechanisms.
In the Anglo-Saxon world this process is usually regarded as the birth of the “scientific method”, which singles out the analysis of causal connections as a noble and effective means of improving our lives, unifying theory building and the discussion of experimental proof in a single system of rules of thinking. This basic relation, causality, is not only a local, step-by-step principle. Whole causal chains, or a contiguous network of them, created by mutual forces and the changes they induce, form a complex picture as a whole in mechanics, which is, although technically involved, always calculable. After clarifying what changes (the state of motion) and how (e.g. following Newton’s second law), the noblest remaining challenge in physics is to investigate the nature of forces and the causes behind their appearance.
At the same time it has gradually been revealed that not only the links in the causal chain but also the beginning and the end of the whole chain, the initial and boundary conditions, play an important role in realizing the basic laws of physics. The view based on variational principles comprises both, the local action and the global boundary conditions, in a single principle. In this theoretical approach the role of symmetry is ever more prominent: initially just as a simplifying aid in understanding the world, but later more and more as an aesthetic category and as a fundamental principle in its own right in exploring physical laws.
Why does one need more than Newton’s third law, which assumes the balance between forces? Because only the direction of the so-called coercive (constraint) forces is known, not their magnitude. A good example is a massive body moving on a slope of a given shape. The direction of the coercive force at a given point is orthogonal to the solid surface, but its magnitude is just as large as needed to keep the body moving along the surface of the slope. It is not dynamical information but the geometrical nature of the restriction that determines the magnitude of such a force. It does so independently of the material properties of the heavy body and the slope, yet only up to a certain limit, beyond which either the body or the slope or both would become too elastic or even fluid. Then dissipation, memory effects and turbulence would influence the motion in a more complicated manner.
The principle of virtual work determines a static mechanical equilibrium by using the geometry of the restrictions imposed by surfaces. Let us review this procedure using the example of a simple mechanical system.
Figure 2.1 demonstrates the problem of finding equilibrium on a slope. The left side
shows the balance of physical forces, on the right side we have used an inherent sym-
metry in the situation. It is clear that the chain cannot turn right or left autonomously,
without an external action. Furthermore the hanging part obviously balances itself.
What remains is the weight proportional to the height of the slope, that is balanced
by another weight proportional to the length (span) of the slope. This solution was
obtained without using formulas from algebra, purely based on the geometry of
arrangement.
Fig. 2.1 The problem of balance on a slope based on the view of forces (left) and on a symmetric arrangement of weights (right)
Here we shall not discuss methods involving a detailed search for forces; these can be found in several high school textbooks. Instead we turn to problem solving based on the principle of virtual work, PVW. A consequence of the PVW is that in a gravitational field a small vertical movement of both weights corresponds to zero work to leading order:
$$G_1\, \Delta y_1 + G_2\, \Delta y_2 = 0. \tag{2.2}$$
Of course, the weight $G_2$ moves along the surface of the slope; its total displacement $\Delta\ell$ has a vertical component $\Delta y_2$. The tightness of the connecting cord and its “inability to be stretched” deliver a purely geometrical condition, not involving the weights:
$$\Delta y_1 + \Delta\ell = 0. \tag{2.3}$$
Finally, a rule for similar triangles known since classical antiquity connects the ratio of displacements to the sizes of the slope:
$$\frac{\Delta\ell}{\Delta y_2} = \frac{c}{a}, \tag{2.4}$$
with $c$ being the span or length of the slope and $a$ its vertical side, i.e. its height. These three pieces of information now lead easily to the “rule of slope”, obtained directly from symmetry arguments above:
$$G_1 : G_2 = a : c \tag{2.5}$$
is the condition for equilibrium. In a state near this equilibrium only a small force suffices to heave heavy weights, especially if the inclination of the slope, and with it the ratio $a : c$, is small enough. Using this knowledge gigantic pyramids were erected.
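The three ingredients of the slope rule (zero virtual work, an inextensible cord, similar triangles) can be put together in a few lines. This is a sketch under stated assumptions: `dl` denotes the displacement of the second weight along the slope, and the function name and numerical values are illustrative, not taken from the text.

```python
def virtual_work(G1, G2, a, c, dl):
    """Total virtual work of both weights for a cord displacement dl along the slope.

    G1 hangs vertically, G2 lies on a slope of height a and length c.
    """
    dy1 = -dl          # inextensible cord: the hanging weight sinks as the other climbs
    dy2 = dl * a / c   # similar triangles: vertical part of the displacement on the slope
    return G1 * dy1 + G2 * dy2

a, c = 3.0, 5.0        # illustrative height and length of the slope (assumption)
G2 = 10.0              # illustrative weight on the slope (assumption)
G1 = G2 * a / c        # rule of slope, G1 : G2 = a : c

print(virtual_work(G1, G2, a, c, dl=1e-3))        # vanishes up to rounding
print(virtual_work(1.2 * G1, G2, a, c, dl=1e-3))  # nonzero: no equilibrium
```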
In a similar way one can use the principle of virtual work (PVW) to describe the equilibrium of a bascule. According to the PVW, when the weight $G_1$ is lifted by $\Delta y_1$ and the counterweight $G_2$ sinks by $\Delta y_2$, the balance is given by
$$G_1\, \Delta y_1 = G_2\, \Delta y_2. \tag{2.6}$$
The geometry of the constraint is based on the rigidity of the arms: for an imagined small rotation around an axis passing through the point of support, the angles are equal:
$$\tan\varphi = \frac{\Delta y_1}{\ell_1} = \frac{\Delta y_2}{\ell_2}. \tag{2.7}$$
From these two equations the “bascule rule” emerges (Fig. 2.2):
$$G_1\, \ell_1 = G_2\, \ell_2. \tag{2.8}$$
The D’Alembert principle in the modern view is nothing else than the principle of virtual work with the inertial forces due to acceleration taken into account. These inertial forces appear in accelerating coordinate systems, and what they have in common is that their magnitude is proportional to the inertial mass. Examples are the centrifugal force felt in a vehicle taking a sudden turn, or the Coriolis force acting on masses moving in a rotating system, which is also responsible for huge cyclones in Earth’s atmosphere. In some classic western movies a glass of whiskey is shown as it slides along the table: since the surface of the liquid is orthogonal to the sum of forces, in an accelerating glass it is no longer horizontal, unlike in a glass of whiskey standing at rest (see Fig. 2.3).
Fig. 2.3 Illustration of the interpretation of the D’Alembert principle as a PVW: the inertial force $-ma$ has to be taken into account when explaining the surface of the liquid in an accelerating glass
The principle of general relativity states that the laws of physics are independent of the accelerating motion of observers, which can even be automatic devices, detector systems without a human. Therefore all accelerating motion can be described as a static equilibrium problem inside a comoving coordinate system. General relativity tells us something about gravitation because the gravitating mass, occurring in the Newtonian law of gravity, and the inertial mass, occurring in Newton’s second law, are equal, as was first verified to high precision by the experiments of the Hungarian Baron Roland Eötvös.
However, to its contemporaries the D’Alembert principle appeared as an autonomous principle in its own right. It served to describe the motion of mass points, or of centers of mass, in generally moving systems. If all forces, including the inertial “pseudo” forces, are taken into account when calculating the work for a virtual displacement, the vanishing of this total work to leading order constitutes D’Alembert’s principle:
$$\sum_i (F_i - m\ddot{x}_i)\, \Delta x_i = 0. \tag{2.9}$$
Some regard this formula as a consequence of Newton’s second law, but in that derivation not only the sum of all terms vanishes but also each term separately. The relation works in both directions: requiring the D’Alembert principle (DAP) for arbitrary virtual displacements $\Delta x_i$ is already equivalent to Newton’s law. In this sense the DAP is a (discrete) variational principle.
For conservative force fields the force is the negative gradient of a potential energy; for one-dimensional motion $F = -dV/dx$. As an illustration we show here that in this case the DAP expresses the constancy of the sum of kinetic and potential energy. For this purpose we consider infinitesimal virtual displacements and turn the sum into an integral:
$$D = \int (F - m\ddot{x})\, dx = \int \left( -\frac{dV}{dx} - m\frac{d^2 x}{dt^2} \right) dx. \tag{2.10}$$
Transforming this expression into an integral over the time coordinate along the orbit $x(t)$, we apply $dx = v\, dt$, with $v$ being the instantaneous velocity along the orbit. Now one recognizes that the DAP leads to the time integral of a total time derivative:
$$D = \int \left( -\frac{dV}{dx} - m\frac{d^2 x}{dt^2} \right) \frac{dx}{dt}\, dt = \int \left( -\frac{dV}{dt} - m v \frac{dv}{dt} \right) dt = -\int \frac{d}{dt}\left( V + \frac{m v^2}{2} \right) dt. \tag{2.11}$$
Since $D$ vanishes,
$$E = V(x) + \frac{m v^2}{2}, \tag{2.12}$$
the sum of kinetic and potential energy, is constant in time.
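The constancy of $E = V(x) + mv^2/2$ along an orbit can be observed in a small simulation. The following sketch is an illustration, not the author's method: the harmonic potential, the velocity-Verlet integrator and all parameter values are assumptions chosen for simplicity.

```python
def simulate(x0, v0, m, force, potential, dt, steps):
    """Integrate m * x'' = F(x) with velocity Verlet; return E = V + m v^2 / 2 per step."""
    x, v = x0, v0
    energies = []
    for _ in range(steps):
        energies.append(potential(x) + 0.5 * m * v * v)
        acc = force(x) / m
        x_new = x + v * dt + 0.5 * acc * dt * dt
        acc_new = force(x_new) / m
        v += 0.5 * (acc + acc_new) * dt
        x = x_new
    return energies

k, m = 4.0, 1.0  # illustrative spring constant and mass (assumptions)
energies = simulate(
    x0=1.0, v0=0.0, m=m,
    force=lambda x: -k * x,               # F = -dV/dx
    potential=lambda x: 0.5 * k * x * x,  # V(x)
    dt=1e-3, steps=10_000,
)
drift = max(energies) - min(energies)
print(energies[0], drift)  # the drift stays tiny compared with the energy itself
```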
A particular case is the free motion of a mass point, when the potential energy vanishes, $V = 0$. The kinetic energy of an arbitrary system of freely moving masses $m_i$ with velocities $v_i$,
$$T = \frac{1}{2} \sum_{i=1}^{N} m_i v_i^2, \tag{2.13}$$
is then constant in time; therefore one may describe the motion in the $3N$-dimensional configuration space by a differential arc length squared, weighted by the masses, as follows:
$$ds^2 = 2T\, dt^2 = \sum_{i=1}^{N} m_i\, (dx_i^2 + dy_i^2 + dz_i^2). \tag{2.14}$$
This is the differential geometric description of free motion. The use of generalized coordinates (like angles and radii) leads to a more powerful formula:
$$ds^2 = \sum_{i,k} a_{ik}(q)\, dq^i\, dq^k. \tag{2.15}$$
This form will be the foundation for the Maupertuis approach, which seeks the shortest path.
In the following we use two examples for applying the DAP: free motion on a circle and the swings of a pendulum.
$$ds^2 = 2T\, dt^2 = m\,(dx^2 + dy^2). \tag{2.16}$$
On the other hand it is immediately obvious that in polar coordinates the radius of a circle is constant, so the arc length squared has a more concise expression in terms of an angle differential:
$$ds^2 = m R^2\, d\varphi^2. \tag{2.17}$$
The free motion on a circle is one-dimensional and, although not straight, still uniform: $\varphi(t) = \varphi(0) + \omega t$. At a fixed kinetic energy $T$, the angular frequency of the circular motion is $\omega = \sqrt{2T/(m R^2)}$, and the period is $2\pi/\omega$. The velocity on the orbit has a constant magnitude, $v = R\omega$.
Alternatively, the DAP can be applied blindly not considering any speciality of
the circular geometry in the beginning. More precisely the latter will be taken into
account in the language of naive, Cartesian coordinates. The DAP confirms
follows. From this $dy = -x\,dx/y$ is obtained, and the arc length squared at constant kinetic energy becomes
$$ds^2 = 2T\, dt^2 = m\,(1 + x^2/y^2)\, dx^2 = \frac{m R^2}{R^2 - x^2}\, dx^2. \tag{2.20}$$
2.2.2 Pendulum
The motion of a pendulum appears in our age as a training exercise in high schools. However, in the 16th century it was a very important problem for humanity. An ancestor of the pendulum clock, Galilei’s pendulum, was good for measuring time, better than counting one’s own pulse or using a sand glass. Its only disadvantage is that its motion is not uniform in time, since the weight moves in a gravitational field, changing its potential energy, too. A reliable measurement of time also became important for navigation at sea, because the longitude of a ship’s position could not be determined from the instantaneous picture of the night sky alone, without knowing the precise time. No wonder that the British Admiralty announced a special prize for constructing a mechanical clockwork that beats uniformly in time. A variation of the pendulum which ticks uniformly in time, the solution of the so-called isochrone problem, is due to Huygens. The practical solution, some time later, became the mechanical clock with springs and gears.
The swings of the simple pendulum can be described in a coordinate system using the suspension point as origin. The $y$ axis is vertical; the potential energy changes along it, and its negative derivative is the force $-mg$. The DAP is of the following form
The geometry of the pendulum, with a taut cord of constant length, provides the following relation between the coordinates:
$$y = L\,(1 - \cos\varphi), \qquad x = L \sin\varphi. \tag{2.25}$$
Since only the angle of the cord changes during the motion of a pendulum, the coordinate differentials are related to the angle differential as
$$dy = L \sin\varphi\, d\varphi, \qquad dx = L \cos\varphi\, d\varphi. \tag{2.26}$$
In the complete DAP the work of gravity is also taken into account, so we arrive at
$$-m L^2\, d\varphi \left( \frac{g}{L} \sin\varphi + \ddot{\varphi} \right) = 0. \tag{2.29}$$
The vanishing of the quantity in the bracket delivers the equation of motion for the pendulum. It is noteworthy that we have obtained this result based only on the inertial forces and the principle of virtual work; angular momentum has not been mentioned at all.
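The equation of motion $\ddot{\varphi} = -(g/L)\sin\varphi$ can be integrated numerically to see why the simple pendulum is not isochronous: its period grows with the amplitude. The sketch below is illustrative only; the integrator (velocity Verlet), the function name and the default values of $g$ and $L$ are assumptions.

```python
import math

def pendulum_period(phi0, g=9.81, L=1.0, dt=1e-4):
    """Integrate phi'' = -(g/L) sin(phi), starting at rest from the angle phi0 > 0.

    The period is estimated as four times the time of the first zero crossing.
    """
    phi, omega, t = phi0, 0.0, 0.0
    while True:
        acc = -(g / L) * math.sin(phi)
        phi_new = phi + omega * dt + 0.5 * acc * dt * dt
        acc_new = -(g / L) * math.sin(phi_new)
        omega += 0.5 * (acc + acc_new) * dt
        if phi_new <= 0.0:
            # linear interpolation of the crossing time inside the last step
            t += dt * phi / (phi - phi_new)
            return 4.0 * t
        phi, t = phi_new, t + dt

small = pendulum_period(0.01)  # small swings
large = pendulum_period(2.0)   # large swings
print(small, 2.0 * math.pi * math.sqrt(1.0 / 9.81))  # close to the small-angle period
print(large)                                         # noticeably longer
```

For small swings the result approaches the familiar $2\pi\sqrt{L/g}$, while for large amplitudes the period is longer; this amplitude dependence is what the isochrone construction removes.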
The action principle is the successor of the DAP on the throne of variational principles in mechanics. In fact it can be derived from it. However, the point of view is different: so far we have analyzed a sum of small work terms connected to virtual displacements, declared to vanish to leading order. From now on we shall be looking for an extremum (maximum or minimum) of something.
As the PVW (principle of virtual work), due to Bernoulli, regards a sum of imagined works, the one-form functional $dW = \sum_i F_i\, dx^i$, we construct another one, the action functional, connected to D’Alembert’s principle:
$$\sum_i (F_i - m\ddot{x}_i)\, dx^i = dS[x_i]. \tag{2.30}$$
Certainly, since the work done by inertial forces contains the accelerations, such an action must depend on the time derivatives of the orbital curve $x_i(t)$, too. Finally
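is the general form of the action. Derivatives of the sought solution curve higher than the first may appear in problems regarding the shape of elastic rods under weight, but in the mechanics of mass points these do not occur. The explicit dependence on time is also absent for conservative (non-dissipative) systems.
In the presence of conservative force fields $F_i = -\partial V/\partial x^i$, so the complete action can be written as the time integral of a Lagrange function. We write the sum of work and “inertial work” as an infinitesimal change in the action:
$$dS = \int_{t_1}^{t_2} \sum_i \left( F_i - \frac{d}{dt}(m v_i) \right) dx^i\, dt. \tag{2.32}$$
With the conservative force inserted this reads
$$dS = -\int_{t_1}^{t_2} \sum_i \left( \frac{\partial V}{\partial x^i} + m\dot{v}_i \right) dx^i\, dt. \tag{2.33}$$
An integration by parts in the inertial term gives
$$dS = -\sum_i m v_i\, dx^i \Big|_{t_1}^{t_2} + \int_{t_1}^{t_2} \sum_i \left( m v_i\, (d\dot{x}^i) - \frac{\partial V}{\partial x^i}\, dx^i \right) dt. \tag{2.34}$$
With the displacements vanishing at the endpoints, this is the change of a single integral:
$$dS = d \int_{t_1}^{t_2} \left( \sum_i \frac{m}{2} v_i^2 - V(x) \right) dt. \tag{2.35}$$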
In words, the difference between kinetic and potential energy is the Lagrange function, whose integral over time is the action functional,
$$S = \int_{t_1}^{t_2} (T - V)\, dt = \int_{t_1}^{t_2} L\, dt. \tag{2.36}$$
The first variation of this action functional set to zero determines the classical equations of motion.
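For a general conservative Lagrange function, $L = L(x, \dot{x})$, the variational action principle leads to a general form of the equations of motion, the Euler–Lagrange equations. This derivation is widespread; it can be found in all textbooks on theoretical mechanics. We present it here for the sake of completeness.
The variation of the action,
$$\delta S = \delta \int L\, dt = \int \sum_i \left( \frac{\partial L}{\partial x_i}\, \delta x_i + \frac{\partial L}{\partial \dot{x}_i}\, \delta\dot{x}_i \right) dt, \tag{2.37}$$
is integrated by parts in the second term; with the variations vanishing at the integration limits this gives
$$\delta S = \int \sum_i \left( \frac{\partial L}{\partial x_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}_i} \right) \delta x_i\, dt. \tag{2.38}$$
Since this action-variation must vanish for arbitrary orbit-variations $\delta x_i(t)$, the expression inside the brackets, being at the same time the first functional derivative of the action functional, is identically zero:
$$\frac{\delta S}{\delta x_i} = \frac{\partial L}{\partial x_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}_i} = 0. \tag{2.39}$$
The orbital velocities $v_i(t) = \dot{x}_i(t)$ and positions $x_i(t)$ satisfy the equation of classical motion above. Here the $x_i(t)$ are not necessarily Cartesian coordinates; they are in general real functions of time describing the motion in general terms.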
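That the classical orbit makes the action stationary can be seen on a discretized path. The sketch below is an illustration under stated assumptions: the Lagrange function $L = m\dot{x}^2/2 - mgx$ of vertical motion in uniform gravity, the fixed endpoints $x(0) = x(T) = 0$, the sinusoidal variation and all function names are chosen for this example only.

```python
import math

def action(path, dt, m=1.0, g=9.81):
    """Discretized action: sum of (kinetic - potential) * dt for L = m v^2/2 - m g x."""
    total = 0.0
    for x0, x1 in zip(path, path[1:]):
        v = (x1 - x0) / dt
        total += (0.5 * m * v * v - m * g * 0.5 * (x0 + x1)) * dt
    return total

def varied_path(eps, T=1.0, n=1000, g=9.81):
    """Classical orbit x(t) = g t (T - t) / 2 plus the variation eps * sin(pi t / T)."""
    dt = T / n
    path = [0.5 * g * (k * dt) * (T - k * dt) + eps * math.sin(math.pi * k * dt / T)
            for k in range(n + 1)]
    return path, dt

def action_of_variation(eps):
    path, dt = varied_path(eps)
    return action(path, dt)

for eps in (-0.1, -0.01, 0.0, 0.01, 0.1):
    print(eps, action_of_variation(eps))
```

The printed values change only at second order in `eps` and are smallest at `eps = 0`: the first variation vanishes on the classical orbit, and for this Lagrange function the extremum is a minimum.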
Another view of the variation considers the case when the variation does not vanish at the integration limits (this is sometimes called a “second type” variation), but the Euler–Lagrange equation (2.39) is fulfilled. Then the remaining variation of the action comes from the integrated product term of the integration by parts:
$$\delta_{(2)} S = \left[ \frac{\partial L}{\partial \dot{x}_i}\, \delta x_i \right]_{t_1}^{t_2} = 0. \tag{2.40}$$
If the coordinate variations at the endpoints are nonzero and arbitrary, and can be viewed as the consequence of an infinitesimal coordinate transformation $x_i' = x_i + \delta x_i(\epsilon^a)$, parameterized by the quantities $\epsilon^a$, under which the action functional is invariant, then the following expression is constant in time:
$$Q_a = \sum_i \frac{\partial L}{\partial \dot{x}_i} \cdot \frac{\partial}{\partial \epsilon^a}\, \delta x_i. \tag{2.41}$$
Such conserved quantities $Q_a$ are called Noether charges. The best known are those belonging to a coordinate shift, $\delta x_i = \epsilon_i$, namely the momenta $P_i = \frac{\partial L}{\partial \dot{x}_i}$, and to a time shift, $t' = t + \epsilon$, with the corresponding energy. In the latter case the change in the action under such a transformation can be calculated in two ways leading to the same result: first by adding $\epsilon$ times the total time derivative of the action functional to itself, second by pushing the orbital points towards those belonging to a later time. In this way
$$S' = S + \epsilon L = \int L(x + \epsilon\dot{x},\; \dot{x} + \epsilon\ddot{x})\, dt. \tag{2.42}$$
Using now the Euler–Lagrange equation (2.39), the expression under the integral becomes a total time derivative, whose integral between the time instants $t_1$ and $t_2$ gives the difference
$$H(t_2) - H(t_1) = \int_{t_1}^{t_2} \frac{d}{dt} \left( \dot{x}\, \frac{\partial L}{\partial \dot{x}} - L \right) dt = 0. \tag{2.44}$$
The difference is zero; therefore the conserved energy can be expressed through the Lagrange function. The result is the Hamilton function:
$$H = \sum_i \dot{x}_i P^i - L. \tag{2.45}$$
This formula also states that the Hamiltonian is a Legendre transform of the Lagrange function. Here the independent variables are changed from the time derivatives of the generalized coordinates, i.e. the generalized velocities $v_i = \dot{x}_i$, to the corresponding momentum components $P_i$, defined by the symmetry under spatial shifts.
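The Legendre transform $H = \dot{x} P - L$ can be tried out on a concrete Lagrange function. The sketch below is a hedged illustration: the harmonic oscillator Lagrangian, the finite-difference momentum and the numerical values are assumptions, not taken from the text.

```python
def lagrangian(x, v, m=2.0, k=3.0):
    """L = T - V for a harmonic oscillator (illustrative choice of system)."""
    return 0.5 * m * v * v - 0.5 * k * x * x

def momentum(x, v, h=1e-6):
    """P = dL/dv, approximated by a central finite difference."""
    return (lagrangian(x, v + h) - lagrangian(x, v - h)) / (2.0 * h)

def hamiltonian(x, v):
    """H = v * P - L, the one-dimensional form of Eq. (2.45)."""
    return v * momentum(x, v) - lagrangian(x, v)

x, v = 0.7, -1.3                                 # arbitrary sample point (assumption)
total_energy = 0.5 * 2.0 * v * v + 0.5 * 3.0 * x * x
print(hamiltonian(x, v), total_energy)           # the two values should agree
```

For this Lagrange function the transform returns $T + V$, the conserved energy, while $L$ itself is $T - V$.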
For non-conservative dynamical systems the action depends explicitly on time. In this more general case the partial time derivative of the action contributes to the total change, too. The Hamiltonian is then not constant in time, but it exactly compensates the change due to the partial time derivative. This statement is expressed by the Hamilton–Jacobi equation:
$$H + \frac{\partial S}{\partial t} = 0. \tag{2.46}$$
Here, in the arguments of the Hamiltonian, the momenta are replaced by the corresponding gradient components of the action, $P_i = \frac{\partial S}{\partial x_i}$.
The vanishing of the first variation of the action delivers the differential equations which describe the classical motion. The second variation hints at the stability of those orbits in the framework of the variational principle. Here we do not deal with the general case; instead, the second variation of the action is calculated for the free motion of a mass point with mass $m$.
The motion is defined by the following Lagrange function:
$$L = \frac{1}{2} \sum_i m_i \dot{x}_i^2. \tag{2.47}$$
$$\frac{\partial L}{\partial x_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{x}_i} = -m_i \ddot{x}_i = 0, \tag{2.48}$$
one has to solve this “secular” equation. In order to obtain this one uses the eigenfunctions of the matrix $M_{ij}$, i.e.
The eigenfunctions for the free motion without acceleration are solutions of the following differential equations:
$$-m\, \frac{d^2}{dt^2} f_i^{(n)}(t) = \omega_n^2\, f_i^{(n)}(t). \tag{2.52}$$
The boundary conditions fix the variations at the beginning and at the end of the time integration; therefore $f_i^{(n)}(t_1) = 0$ and $f_i^{(n)}(t_2) = 0$ for all eigenfunctions. Such solutions of Eq. (2.52) are sine functions:
$$f_i^{(n)}(t) = A_i^{(n)} \sin\left( \frac{\omega_n}{\sqrt{m}}\, (t - t_1) \right). \tag{2.53}$$
Requiring now also that the variation vanishes at $t_2$, the eigenvalues must obey
$$\frac{\omega_n}{\sqrt{m}}\, (t_2 - t_1) = n\pi. \tag{2.54}$$
The variation of orbits, like a string fixed at its endpoints, produces oscillations. Their frequencies are functions of the length of the time interval:
$$\omega_n = \frac{\pi\sqrt{m}}{t_2 - t_1}\, n. \tag{2.55}$$
The orbit solution found by variation of the action is stable, since all $\omega_n^2$ values are positive and therefore the frequencies are real. Formally there is also a mode with zero frequency, but this would describe a variation that is identically zero over the full time interval. Such a null-variation is usually excluded when solving the variational principle (Fig. 2.4).
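The eigenvalue problem for the variations can be checked numerically: a sine mode with the frequency $\omega_n = n\pi\sqrt{m}/(t_2 - t_1)$ should satisfy $-m\ddot{f} = \omega_n^2 f$ and vanish at both ends of the time interval. The sketch below uses unit amplitude and a central second difference; the function names and parameter values are illustrative assumptions.

```python
import math

def mode(n, t, t1, t2, m):
    """Unit-amplitude sine mode and its frequency omega_n = n pi sqrt(m) / (t2 - t1)."""
    omega = n * math.pi * math.sqrt(m) / (t2 - t1)
    return math.sin(omega / math.sqrt(m) * (t - t1)), omega

def residual(n, t, t1=0.0, t2=2.0, m=1.5, h=1e-4):
    """-m f'' - omega_n^2 f, with f'' from a central second difference."""
    f0, omega = mode(n, t, t1, t2, m)
    f_plus, _ = mode(n, t + h, t1, t2, m)
    f_minus, _ = mode(n, t - h, t1, t2, m)
    second = (f_plus - 2.0 * f0 + f_minus) / (h * h)
    return -m * second - omega**2 * f0

for n in (1, 2, 3):
    end_value, omega = mode(n, 2.0, 0.0, 2.0, 1.5)
    print(n, omega**2, residual(n, 0.37), end_value)  # positive eigenvalue, tiny residual
```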
An arbitrary variation can be expanded in terms of the above eigenfunctions,
$$\delta x_i(t) = \sum_n c_n f_i^{(n)}(t) \tag{2.56}$$
is a periodic function with the period $t_2 - t_1$. Its function norm can be expressed with the help of the Fourier coefficients, where a norm of one can be chosen for each mode by rescaling the coefficients $c_n$ only. Finally
$$\|\delta x_i(t)\|^2 = \sum_i \int_{t_1}^{t_2} \delta x_i^2(t)\, dt = \frac{1}{2} \sum_{n,i} |c_n|^2\, |A_i^{(n)}|^2, \tag{2.57}$$
is the norm of the orbit variation, based on the energy theorem of Fourier.
Carl Friedrich Gauss enriched mathematics and physics with several novelties that we use to this day. The well-known Gauss distribution (the bell-shaped curve), the Gauss–Ostrogradsky theorem, popular among physicists and mathematicians, and such an everyday tool as the fit of a most probable line to a number of data points by minimizing the sum of the squares of the distances from this line are all related to his name. The Gauss principle in mechanics is in fact the minimization of the deviations from Newton’s second law by the Gauss method.
This result can be motivated starting with D’Alembert’s principle (DAP). Imagine that during the motion the coordinate functions change a little due to a change in time:
$$x_i(t + \tau) = x_i(t) + v_i(t)\, \tau + \frac{1}{2}\, a_i(t)\, \tau^2 + \cdots \tag{2.58}$$
Since in mechanics the initial position and velocity can be fixed, the leading order of the variation during this short time $\tau$ can be traced back to the variation of the acceleration:
$$\delta x_i(t + \tau) = \frac{1}{2}\, \delta a_i(t)\, \tau^2, \tag{2.59}$$
this being the difference between the original and the varied motion. Writing the DAP both at $t$ and at $t + \tau$, their difference will contain the variation of the acceleration only:
$$\frac{1}{2}\, \tau^2 \sum_i (F_i - m_i a_i)\, \delta a_i = 0. \tag{2.60}$$
Since the forces $F_i$ are given by the natural laws and by the particulars of the arrangement under scrutiny, they cannot be varied: $\delta F_i = 0$. Therefore the variation of the acceleration is essentially the variation of the whole expression in the bracket, $\delta a_i = -\delta(F_i - m_i a_i)/m_i$, so Eq. (2.60) can be written as a total variation being zero:
$$\frac{1}{2}\, \tau^2\, \delta \sum_i \frac{1}{2 m_i}\, (F_i - m_i a_i)^2 = 0. \tag{2.61}$$
This is equivalent to the condition for an extremum of the varied sum, logically its minimum:
$$G = \sum_i \frac{1}{2 m_i}\, (F_i - m_i a_i)^2 = \min. \tag{2.62}$$
The above is the Gauss principle: it minimizes the squares of the deviations from Newton’s second law, weighted by the inverse masses. Although at the absolute minimum of this expression Newton’s law is fulfilled for all particles with mass $m_i$ separately, less is enough for describing the motion. The Gauss principle is best used if certain forces are unknown and are instead caused by geometric constraints (Fig. 2.5).
As an example we discuss the motion on a flat slope. The equation determining a general constraint surface is of the form $z = f(x, y)$; for a flat slope this function is linear, $z = ax + by + c$. Gravity is a known force acting in the $z$ direction, $F_z = -mg$. Due to the constraint the $z$-components of the velocity and the acceleration are not independent of the other two components: derivatives of the constraint give
$$\begin{aligned}
\dot{z} &= f_x \dot{x} + f_y \dot{y}, \\
\ddot{z} &= f_x \ddot{x} + f_y \ddot{y} + f_{xx} \dot{x}^2 + (f_{xy} + f_{yx})\, \dot{x}\dot{y} + f_{yy} \dot{y}^2.
\end{aligned} \tag{2.63}$$
Here the lower indices of the function $f$ denote partial derivatives with respect to the indicated variable. The Gauss principle for a single mass is reduced to the problem of minimizing the expression below:
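This quantity is to be varied with respect to $\ddot{x}$ and $\ddot{y}$; a possible variation with respect to $\ddot{z}$ would not contain additional information:
$$\begin{aligned}
\frac{\delta}{\delta\ddot{x}}\, (F_z - m\ddot{z}) &= -m\, \frac{\delta\ddot{z}}{\delta\ddot{x}} = -m f_x, \\
\frac{\delta}{\delta\ddot{y}}\, (F_z - m\ddot{z}) &= -m\, \frac{\delta\ddot{z}}{\delta\ddot{y}} = -m f_y.
\end{aligned} \tag{2.65}$$
$$\begin{aligned}
2m\, \frac{\delta G}{\delta\ddot{x}} &= 2(F_x - m\ddot{x})(-m) + 2(F_z - m\ddot{z})(-m f_x) = 0, \\
2m\, \frac{\delta G}{\delta\ddot{y}} &= 2(F_y - m\ddot{y})(-m) + 2(F_z - m\ddot{z})(-m f_y) = 0.
\end{aligned} \tag{2.66}$$
What is still needed is the substitution of $\ddot{z}$ from Eq. (2.63), leading in the general case to rather involved formulas. However, there exists a combination of the above two lines which is independent of $\ddot{z}$:
$$\dot{L}_z = m\,(f_y \ddot{x} - f_x \ddot{y}) = f_y F_x - f_x F_y. \tag{2.68}$$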
Finally, using the formula for a flat slope, $z = ax + by + c$, and the known components of the gravitational force, $F_x = 0$, $F_y = 0$, $F_z = -mg$, the equations of motion in (2.67) take a much simpler form:
$$\begin{aligned}
\frac{1}{a}\, \ddot{x} + a\ddot{x} + b\ddot{y} &= -g, \\
\frac{1}{b}\, \ddot{y} + a\ddot{x} + b\ddot{y} &= -g.
\end{aligned} \tag{2.69}$$
This in turn delivers the following result for the acceleration components:
$$\ddot{x} = -g\, \frac{a}{1 + a^2 + b^2}, \qquad \ddot{y} = -g\, \frac{b}{1 + a^2 + b^2}, \qquad \ddot{z} = -g\, \frac{a^2 + b^2}{1 + a^2 + b^2}. \tag{2.70}$$
The absolute value of the acceleration vector can be expressed from the above formulas for its components as
$$|\mathbf{a}| = \sqrt{\ddot{x}^2 + \ddot{y}^2 + \ddot{z}^2} = g\, \sqrt{\frac{a^2 + b^2}{1 + a^2 + b^2}} = g \sin\alpha, \tag{2.71}$$
with $\alpha$ being the angle of the slope to the horizontal plane. It was exactly this reduction of the gravitational acceleration to $g \sin\alpha$ that led Galileo Galilei to experiment on slopes, because in this way free fall can be examined in slow motion.
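The Gauss principle lends itself to a direct numerical treatment: one simply minimizes the sum of squared deviations over the free acceleration components. The sketch below does this for a flat slope $z = ax + by + c$ by plain gradient descent; the minimization method, the function name and the slope parameters are assumptions made for illustration.

```python
import math

def gauss_minimum(a, b, g=9.81, m=1.0, steps=200):
    """Minimize G = [(m ax)^2 + (m ay)^2 + (m g + m az)^2] / (2 m) with az = a*ax + b*ay.

    The constraint z = a x + b y + c ties the vertical acceleration az to ax and ay;
    the only applied force is gravity, F = (0, 0, -m g).
    """
    ax = ay = 0.0
    rate = 1.0 / (m * (1.0 + a * a + b * b))  # step size suited to this quadratic form
    for _ in range(steps):
        common = g + a * ax + b * ay          # equals -(F_z - m az) / m
        grad_x = m * (ax + a * common)
        grad_y = m * (ay + b * common)
        ax -= rate * grad_x
        ay -= rate * grad_y
    return ax, ay, a * ax + b * ay

a, b, g = 0.5, 0.25, 9.81                     # illustrative slope parameters (assumption)
ax, ay, az = gauss_minimum(a, b, g)
norm = 1.0 + a * a + b * b
print(ax, -g * a / norm)                      # compare with the closed-form result
print(ay, -g * b / norm)
print(math.sqrt(ax**2 + ay**2 + az**2), g * math.sin(math.atan(math.hypot(a, b))))
```

No constraint force is ever computed: the minimization over the accelerations compatible with the slope is enough to reproduce the reduced acceleration $g \sin\alpha$.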
Lagrange suggested thinking differently instead: the constraints are valid both for the original and for the varied orbits, therefore the first variations of both $F$ and $f$ vanish at the solution:
$$\begin{aligned}
\delta F &= \sum_i \frac{\partial F}{\partial x_i}\, \delta x_i = 0, \\
\delta f &= \sum_i \frac{\partial f}{\partial x_i}\, \delta x_i = 0.
\end{aligned} \tag{2.73}$$
Furthermore, an arbitrary linear combination of the above two equations is also zero; the admixture coefficient is the Lagrange multiplier $\lambda$:
$$\delta\,(F + \lambda f) = 0. \tag{2.74}$$
The strategy with Lagrange multipliers is as follows: (i) from one of the variations above, say the $n$-th, we obtain $\lambda$, and then (ii) the remaining $n - 1$ equations are treated as equations of motion for the coordinates $(x_1, x_2, \ldots, x_{n-1})$. Certainly the constraint $f = 0$ is also to be used at the end, but equations of this type are often simpler than those containing the derivatives of the original functions, as when using the Gauss principle in the previous section.
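In the case of several constraints the Lagrange method leads to the modified variational principle
$$\bar{F} = F + \sum_{i=1}^{m} \lambda_i f_i = \text{extremum}, \tag{2.75}$$
$$\begin{aligned}
\frac{\delta\bar{F}}{\delta x_k} &= \frac{\delta F}{\delta x_k} + \sum_{i=1}^{m} \lambda_i\, \frac{\delta f_i}{\delta x_k} = 0, \\
\frac{\delta\bar{F}}{\delta\lambda_i} &= f_i = 0.
\end{aligned} \tag{2.76}$$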
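$$V = g \sum_{k=0}^{N-1} m_k \bar{y}_k = \frac{g\sigma}{2} \sum_{k=0}^{N-1} (y_k + y_{k+1})\, \ell_k, \tag{2.78}$$
with $g$ being the gravitational acceleration and $\sigma$ the constant mass density along the chain. It is expedient to choose the Lagrange multipliers as $\lambda_k/2$ and to minimize
$$\bar{V} = V + \sum_{k=0}^{N-1} \frac{\lambda_k}{2} \left[ (x_{k+1} - x_k)^2 + (y_{k+1} - y_k)^2 - \ell_k^2 \right] \tag{2.79}$$
$$\bar{V} = \frac{1}{2} \sum_{k=0}^{N-1} \left[ (y_k + y_{k+1})\, g\sigma \ell_k + \lambda_k (x_{k+1} - x_k)^2 + \lambda_k (y_{k+1} - y_k)^2 - \lambda_k \ell_k^2 \right]. \tag{2.80}$$
Variation with respect to the $x_k$ coordinates gives
$$\frac{\delta\bar{V}}{\delta x_k} = -\lambda_k (x_{k+1} - x_k) + \lambda_{k-1} (x_k - x_{k-1}) = 0. \tag{2.81}$$
$$\frac{\delta\bar{V}}{\delta y_k} = \frac{g\sigma}{2}\, (\ell_k + \ell_{k-1}) - \lambda_k\, \Delta y_k + \lambda_{k-1}\, \Delta y_{k-1} = 0, \tag{2.83}$$
with $\Delta y_k = y_{k+1} - y_k$. It is clear that a useful choice for the constant $c$ is $c = g\sigma(\ell_k + \ell_{k-1})/2$, meaning a uniform division of the chain into elements of equal length. In the limit of continuity the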
Finally, one may be interested in the shape of the curve in a non-parametric description, as a function $y(x)$. Then the above equation appears as a second order differential equation,
$$\frac{1}{\sqrt{1 + y'(x)^2}}\; y'' = 1. \tag{2.86}$$
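Reading Eq. (2.86) as $y''/\sqrt{1 + y'^2} = 1$, the hyperbolic cosine, the classical catenary, is a natural candidate solution. The sketch below tests this with finite differences; the function name and the sample points are illustrative assumptions.

```python
import math

def catenary_residual(x, h=1e-4):
    """y'' / sqrt(1 + y'^2) - 1 for y(x) = cosh(x), using central differences."""
    y = math.cosh
    first = (y(x + h) - y(x - h)) / (2.0 * h)
    second = (y(x + h) - 2.0 * y(x) + y(x - h)) / (h * h)
    return second / math.sqrt(1.0 + first * first) - 1.0

for x in (-1.5, 0.0, 0.8):
    print(x, catenary_residual(x))  # residuals close to zero
```

The residual vanishes within the accuracy of the finite differences, because $y' = \sinh x$ gives $\sqrt{1 + y'^2} = \cosh x = y''$.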
$$\bar L = L - \sum_{i=1}^{m} \lambda_i f_i. \tag{2.87}$$
The negative gradient of this completed potential gives exactly the coercive force
components:
38 2 Mechanics: Geometry of Orbits
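$$K_i = \frac{\partial \bar L}{\partial x_i} - \frac{\partial L}{\partial x_i} = -\sum_{j=1}^{m} \lambda_j \frac{\partial f_j}{\partial x_i}, \tag{2.88}$$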
while the constraints f j (x1 , . . . , xn ) = 0 are satisfied. Imagine now a cost factor in
the space of the f j components, acting analogously to the potential energy. Starting
with a single constraint, which does not change the essence of the logic, we expand
this quantity in a Taylor series around the point f = 0:
$$V_1 = \Phi(f) = \Phi(0) + \Phi'(0)\,f + \frac{1}{2}\Phi''(0)\,f^2 + \cdots \tag{2.89}$$
Here $\Phi(0)$ can be chosen to be zero, since a constant in the energy landscape does
not matter, and we omit the linear term assuming $\Phi'(0) = 0$. That can be achieved by
choosing a corresponding linear combination of constraints as the “real” constraint.
Writing $\Phi''(0) = 1/\varepsilon$, the leading term in the cost factor is then the quadratic one:
$$V_1 = \frac{1}{2\varepsilon}\,f^2. \tag{2.90}$$
One may consider this form also as additive while applying the Lagrange method;
the condition f²/2 = 0 is equivalent to f = 0. The coercive forces, determined from
these two (quadratic and linear) approaches, must be equal,
$$K_i = -\frac{\partial V_1}{\partial x_i} = -\frac{f}{\varepsilon}\frac{\partial f}{\partial x_i} = -\lambda\,\frac{\partial f}{\partial x_i}. \tag{2.91}$$
This relation connects the linear and quadratic Lagrange multiplier methods: $\lambda = f/\varepsilon$.
In this sense the quadratic method handles small imagined violations of the
f = 0 condition with the typical size $f = \lambda\varepsilon$. This consideration is sometimes
called a “softening” of the constraint, resulting in a soft version of it. The constraint becomes
hard again in the $\varepsilon \to 0$ limit. In fact, in quantum field theory the quadratic term
supplements the naive Lagrange function, and is not set to zero from the beginning of
the calculations.
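The relation between the hard and the softened constraint can be illustrated on a toy problem. The sketch below assumes F = x² + y² with the single constraint f = x + y − 1 = 0 (both chosen here for illustration only) and a penalty term f²/(2·eps), with eps playing the role of the small softening parameter. The closed-form minimizers are worked out in the comments.

```python
def hard_solution():
    """Stationary point of F + lam*f for F = x^2 + y^2, f = x + y - 1.

    Stationarity gives 2x + lam = 0 and 2y + lam = 0; with x + y = 1
    this yields x = y = 1/2 and lam = -1.
    """
    x = y = 0.5
    lam = -2.0 * x
    return x, y, lam

def soft_solution(eps):
    """Minimizer of F + f^2 / (2*eps), obtained in closed form.

    By symmetry x = y, and d/dx [2x^2 + (2x - 1)^2 / (2*eps)] = 0
    gives x = 1 / (2 * (1 + eps)).
    """
    x = 1.0 / (2.0 * (1.0 + eps))
    f = 2.0 * x - 1.0            # small violation of the constraint
    return x, x, f, f / eps      # f / eps plays the role of lam

_, _, lam = hard_solution()
for eps in (1.0, 0.1, 0.001):
    xs, ys, f, lam_soft = soft_solution(eps)
    # As eps shrinks, f -> 0 and f/eps approaches the hard multiplier lam.
    print(eps, xs, f, lam_soft, lam)
```

In this toy case the violation f is of order eps while the ratio f/eps stays finite, which is the sense in which the soft constraint becomes hard again for vanishing eps.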
According to the original formulation of the Maupertuis principle, the motion follows
the path of minimal length. This was the principle of shortest path. Although this
proved to be erroneous for general motion under the influence of forces, it is valid
for free motion. As such it can also be applied to motions that are free over time
intervals and change only at instants of zero measure. Its generalized form, valid for
all motions in conservative force fields, was derived by Euler and Lagrange. This
2.7 Mapertuis Principle and Geodetic Motion 39
was a typical case in which an error helped to find the true relation! Please also note
that this principle does not refer to forces or energy; it states something about the orbits.
This principle, applied to free motion crossing boundaries between media, leads to
formulas akin to those used in geometrical optics. The main difference is that for the
propagation of waves, instead of mass points, the principle of shortest time, the Fermat
principle, applies.
For a free motion the Lagrange function contains a kinetic energy term only; the
solution of the equation of motion ensures a velocity constant in time. For
the motion of a single mass point the time integral of its kinetic energy provides
the action, which is to be minimized:
$$\int T\,dt = \int \frac{m v^2}{2}\,dt = \text{minimum}. \tag{2.92}$$
For a constant velocity v, these two principles may agree, but for a general motion
they do not.
For a motion under the influence of forces the principle of shortest path is modified to
a statement about a weighted length of that path. Essentially one seeks a particular
form of the action principle without any explicit reference to time; only integrals
over paths occur. Throughout, the total energy is still conserved.
Let us consider a Lagrange function as follows:
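$$L = \frac{1}{2}\sum_{i,k} a_{ik}(q)\,\dot q_i \dot q_k - V(q), \tag{2.94}$$
with $q_i$ generalized coordinates and V(q) the potential energy, a function of the
position only. The generalized momenta are
$$p_i = \frac{\partial L}{\partial \dot q_i} = \sum_k a_{ik}(q)\,\dot q_k. \tag{2.95}$$
The energy is
$$E = \frac{1}{2}\sum_{i,k} a_{ik}(q)\,\dot q_i \dot q_k + V(q). \tag{2.96}$$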
This information can be used for rewriting the action $S = \int L\,dt$ in the new form:
$$S = \int \sum_{i,k} a_{ik}(q)\,\dot q_i \dot q_k\,dt - \int \Big[\frac{1}{2}\sum_{i,k} a_{ik}(q)\,\dot q_i \dot q_k + V(q)\Big]\,dt. \tag{2.97}$$
Substituting the generalized momenta (2.95) into the first term and the energy
given by Eq. (2.96) into the second term, we get
$$S = \int \sum_i p_i \dot q_i\,dt - \int E\,dt. \tag{2.98}$$
For constant energy E, the second integral is proportional to the total time interval of
the motion and is independent of the variations of the orbit. Therefore it suffices
to vary the reduced action,
$$S' = S + E(t_2 - t_1), \tag{2.99}$$
resulting in
$$\delta S' = \delta \int \sum_i p_i\,dq_i = 0. \tag{2.100}$$
This is the Hamilton principle, an equivalent of the action principle for constant
energy motion. Eq. (2.100) does not refer to time coordinates or to derivatives
with respect to time. The varied quantities can be described by the variation of the paths
followed during the motion in an arbitrary parameterization. So this principle refers to
paths and not to the time history of any motion.
In order to derive the formula known today as the Maupertuis action, we start with the
expression
$$2\,(E - V(q)) = \sum_{i,k} a_{ik}(q)\,\dot q_i \dot q_k. \tag{2.101}$$
Integrating the right hand side (rhs) of this expression delivers the reduced action
appearing in Eq. (2.100) as Hamilton’s principle:
$$S' = \int \sum_i p_i\,dq_i = \int \sum_{i,k} a_{ik}(q)\,\dot q_i \dot q_k\,dt. \tag{2.102}$$
Due to Eq. (2.101) this integrand can be viewed as the product of the square roots of the rhs and the left
hand side expressions; by this trick we remove time derivatives and use coordinate
differentials only:
$$S' = \int \sqrt{2\,(E - V(q))}\;\sqrt{\sum_{i,k} a_{ik}(q)\,dq_i\,dq_k}. \tag{2.103}$$
We note that for a single mass point the square root of the weighted sum of squared coordinate
differentials is by definition the differential arc length along the path of the
motion:
$$ds = \sqrt{\sum_{i,k} g_{ik}\,dq_i\,dq_k}, \tag{2.104}$$
with $a_{ik} = m g_{ik}$. In this case the reduced action really becomes the shortest path
principle:
$$S' = \int \sqrt{2m\,(E - V(q))}\;ds = \text{extremum}. \tag{2.105}$$
Here we have used the relation $ds^2 = \sum_i dx_i^2$ and its consequence $ds\,d\delta s = \sum_i dx_i\,d\delta x_i$,
as well as the fundamental relation in variational calculus, according to which
δ(ds) = dδs. Integrating now the second term by parts we obtain the vanishing first
variation condition as
$$2\sqrt{E - V}\,\frac{d}{ds}\Big(\sqrt{E - V}\,\frac{dx_i}{ds}\Big) = -\frac{\partial V}{\partial x_i} = F_i. \tag{2.107}$$
Here we denoted the force component, obtained as the negative gradient of the potential energy,
by $F_i$. This result has a wonderful geometrical interpretation. Observing that
$$2\sqrt{E - V}\,\frac{d}{ds}\sqrt{E - V} = -\frac{dV}{ds} = \sum_i F_i\,\frac{dx_i}{ds}, \tag{2.108}$$
the product rule applied to Eq. (2.107) gives
$$\Big(\sum_j F_j\,\frac{dx_j}{ds}\Big)\frac{dx_i}{ds} + 2(E - V)\,\frac{d^2 x_i}{ds^2} = F_i. \tag{2.109}$$
The second derivative along the path replaces the Newtonian acceleration in this
formula. Besides, a term containing the square of first derivatives appears: this is
analogous to the formula for geodesic motion in general relativity. An inertial force,
proportional to the square of the “velocity”, emerges as a correction to direct forces. In this case
the general Christoffel symbol is restricted to a special form, $\Gamma^i_{kj} = F_j\,\delta_{ik}$ (Fig. 2.8).
Introducing the tangential vector to the path, $t_i = dx_i/ds$, it is easy to convince
oneself that its length is unity: $\sum_i t_i t_i = 1$. Using this result the second derivative
along the orbit can be expressed from Eq. (2.109):
$$\frac{d^2 x_i}{ds^2} = \frac{F_i - \big(\sum_j F_j t_j\big)\,t_i}{2(E - V)}. \tag{2.110}$$
This formula shows the following: only the component of the force orthogonal to the path
causes acceleration, $F_i^{\perp} = F_i - \big(\sum_j F_j t_j\big)\,t_i$. At the same time this second derivative
is nothing else than the first derivative of the vector $t_i$ tangent to the path. Due to the
unit length of this vector its derivative is orthogonal to the tangent. Its direction can
be denoted by another unit vector, $n_i$, and its magnitude by the curvature, 1/R, of the
path at a given point:
$$\frac{d^2 x_i}{ds^2} = \frac{dt_i}{ds} = \frac{1}{R}\,n_i. \tag{2.111}$$
Indeed R is the radius of the circle osculating the path at that point. Using this result
and the expression E − V = mv²/2 for the kinetic energy, Eq. (2.109) expresses the
centripetal acceleration on a circle touching the path:
$$\frac{m v^2}{R}\,n_i = F_i^{\perp}. \tag{2.112}$$
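The statement that only the orthogonal force component bends the path can be checked numerically. The sketch below assumes a projectile in a uniform field (mass, g and the initial velocity are illustrative values, not from the text) and compares the curvature |F_perp| / (2(E − V)) implied by Eq. (2.110) with the textbook geometric curvature of the parabola.

```python
import math

m, g = 1.0, 9.81            # assumed mass and gravitational acceleration
v0x, v0y = 3.0, 4.0         # assumed initial velocity components

def state(t):
    """Velocity and (constant) force of the projectile at time t."""
    velocity = (v0x, v0y - g * t)
    force = (0.0, -m * g)
    return velocity, force

def curvature_from_force(t):
    """1/R from Eq. (2.110): |F_perp| / (2(E - V)), with 2(E - V) = m v^2."""
    (vx, vy), (fx, fy) = state(t)
    speed = math.hypot(vx, vy)
    tx, ty = vx / speed, vy / speed              # unit tangent t_i
    f_par = fx * tx + fy * ty                    # sum_j F_j t_j
    fpx, fpy = fx - f_par * tx, fy - f_par * ty  # orthogonal component
    return math.hypot(fpx, fpy) / (m * speed**2)

def curvature_geometric(t):
    """Curvature |x' y'' - y' x''| / |v|^3 of the trajectory."""
    (vx, vy), _ = state(t)
    ax, ay = 0.0, -g
    return abs(vx * ay - vy * ax) / math.hypot(vx, vy) ** 3

for t in (0.0, 0.2, 0.7):
    print(t, curvature_from_force(t), curvature_geometric(t))
```

The two columns agree at every instant, which is Eq. (2.112) in disguise: m v²/R equals the magnitude of the orthogonal force.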
2.8 Legendre Transformation and Action—Angle Variables 43
The fall of an apple and the circular orbit of the moon are connected to each other;
not only in a tale about Newton, but also by the Maupertuis principle, in its correct
form due to Euler and Lagrange.
We have mentioned the Legendre transformation earlier in connection with the rela-
tion between the Lagrange and Hamilton functions. During this transformation part
of or all the descriptive variables will be replaced by new variables. The function
governing the motion must be changed accordingly. By this special transformation
the new variables are the old partial derivatives of the old governing function, and
vice versa. Repeating a Legendre transformation towards the original variables we
get back the original governing function.
Let us denote the “old” variables by $u_i$, the “new” ones by $v_i$. Given a function
$F(u_1, \ldots, u_n)$, the new variables are the partial derivatives $v_i = \partial F/\partial u_i$, and the
transformed function is given by
$$G = \sum_{i=1}^{n} u_i\,\frac{\partial F}{\partial u_i} - F. \tag{2.113}$$
Variations of functionals based on the new function contain these partial derivatives
by construction. The total differential,
$$dG = \sum_i \frac{\partial G}{\partial v_i}\,dv_i, \tag{2.114}$$
can also be evaluated from the definition (2.113):
$$dG = \sum_i (u_i\,dv_i + du_i\,v_i) - dF = \sum_i u_i\,dv_i + \sum_i \Big(v_i - \frac{\partial F}{\partial u_i}\Big)\,du_i. \tag{2.115}$$
It follows that the partial derivatives of the new function are exactly the old variables:
$$\frac{\partial G}{\partial v_i} = u_i. \tag{2.116}$$
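A one-variable example makes the construction concrete. The sketch below assumes the quadratic function F(u) = a·u²/2 (an illustrative choice; the constant a and the helper names are not from the text), builds G(v) = u·v − F(u) and checks numerically that the derivative of G returns the old variable.

```python
a = 3.0  # assumed constant in the illustrative function F(u) = a * u**2 / 2

def F(u):
    return 0.5 * a * u**2

def dF(u):
    """New variable v = dF/du."""
    return a * u

def G(v):
    """Legendre transform of F, Eq. (2.113) for a single variable."""
    u = v / a                 # invert v = dF/du (possible because F is convex)
    return u * v - F(u)

def dG(v, h=1e-6):
    """Central-difference derivative of G with respect to v."""
    return (G(v + h) - G(v - h)) / (2 * h)

u = 1.7
v = dF(u)
print(dG(v), u)   # the two numbers should agree, cf. Eq. (2.116)
```

For this choice G(v) = v²/(2a); applying the same construction to G leads back to F, as stated in the text.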
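$$p_i = \frac{\partial L}{\partial \dot q_i}, \quad H = \sum_i p_i \dot q_i - L, \qquad \dot q_i = \frac{\partial H}{\partial p_i}, \quad L = \sum_i p_i \dot q_i - H. \tag{2.117}$$
$$S = \int_{t_1}^{t_2} \Big(\sum_i p_i \dot q_i - H\Big)\,dt = \sum_i p_i q_i \Big|_{t_1}^{t_2} - \int_{t_1}^{t_2} \Big(\sum_i q_i \dot p_i + H\Big)\,dt \tag{2.118}$$
Varying this action with respect to the momenta $p_i(t)$ and the position coordinates $q_i(t)$,
one obtains the equations of motion of Hamiltonian dynamics:
$$\dot q_i - \frac{\partial H}{\partial p_i} = 0, \qquad -\dot p_i - \frac{\partial H}{\partial q_i} = 0. \tag{2.119}$$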
it follows
$$\sum_i \Big(\frac{\partial \dot q_i}{\partial q_i} + \frac{\partial \dot p_i}{\partial p_i}\Big) = 0, \tag{2.121}$$
upon using the equations of motion (2.119). This can be interpreted as the diver-
genceless property of a current vector, v = (q̇i , ṗi ), in 6N -dimensional phase space
for N particles:
∇ · v = 0, (2.122)
with $\nabla = (\partial/\partial q_i,\ \partial/\partial p_i)$. The trajectories, describing the classical motion in phase space,
constitute an incompressible (divergenceless) fluid. This is analogous to the theory of
fluids, so the above result indicates the constancy of the phase space volume occupied
by those moving points:
This is stated by the Liouville theorem. Its essence is that the solution of the equations
of motion derived from an action principle for a short dt time step is always a
canonical transformation leaving the phase space volume invariant.
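The invariance of the phase space volume under a short time step can be illustrated numerically. The sketch below is not the exact flow: it assumes a pendulum Hamiltonian H = p²/2 + (1 − cos q) and one step of the symplectic Euler scheme (both chosen here for illustration), and estimates the Jacobian determinant of the map (q, p) → (q', p') by finite differences.

```python
import math

def step(q, p, dt):
    """One symplectic-Euler step for the assumed H = p^2/2 + (1 - cos q)."""
    p_new = p - dt * math.sin(q)     # dp/dt = -dH/dq
    q_new = q + dt * p_new           # dq/dt = +dH/dp
    return q_new, p_new

def jacobian_det(q, p, dt, h=1e-6):
    """Determinant of d(q', p')/d(q, p), estimated by central differences."""
    q_qp, p_qp = step(q + h, p, dt)
    q_qm, p_qm = step(q - h, p, dt)
    q_pp, p_pp = step(q, p + h, dt)
    q_pm, p_pm = step(q, p - h, dt)
    dq_dq = (q_qp - q_qm) / (2 * h)
    dp_dq = (p_qp - p_qm) / (2 * h)
    dq_dp = (q_pp - q_pm) / (2 * h)
    dp_dp = (p_pp - p_pm) / (2 * h)
    return dq_dq * dp_dp - dq_dp * dp_dq

print(jacobian_det(0.8, -0.3, 0.1))   # should be close to 1
```

A determinant of one means that a small phase space area element is mapped onto one of equal area, the discrete counterpart of the incompressible flow described above.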
after reaching its extremum does not change in time. We have d/dt = 0 along the
solution curves of the equation of motion. Due to Stokes’ theorem an integral over a
closed curve can be calculated from the flux going through the enclosed area:
$$\oint \sum_i p_i\,dq_i = m \oint \mathbf{v}\cdot d\mathbf{s} = m \int (\nabla \times \mathbf{v})\cdot d^2\mathbf{f}. \tag{2.125}$$
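$$\{u, v\} := \sum_i \Big(\frac{\partial u}{\partial q_i}\frac{\partial v}{\partial p_i} - \frac{\partial v}{\partial q_i}\frac{\partial u}{\partial p_i}\Big). \tag{2.127}$$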
Finally we demonstrate that the actions calculated over periodic orbits themselves
can be used as generalized momenta. Moreover, exactly this canonical description
is the simplest, as it reflects the integrability of a motion.
We assign a quantity measured in units of action to each pair of generalized coordinate
and momentum components:
$$J_k = \frac{1}{2\pi}\oint p_k\,dq_k. \tag{2.129}$$
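$$\vartheta_i = \frac{\partial S}{\partial J_i}. \tag{2.130}$$
These conjugate variables carry no physical dimension; they are called angle
variables. The integral of an angle variable over exactly one period of the motion is
exactly 2π, as shown by the short derivation below:
$$\oint d\vartheta_i = \sum_k \oint \frac{\partial}{\partial J_i}\frac{\partial S}{\partial q_k}\,dq_k = \frac{\partial}{\partial J_i}\sum_k \oint p_k\,dq_k = 2\pi \sum_k \frac{\partial J_k}{\partial J_i} = 2\pi. \tag{2.131}$$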
The goal is to find the canonical transformation that describes the motion
in terms of action–angle variables, (ϑ, J), instead of the original (q, p) phase space.
Observing that the reduced action can be expressed as a function of the conserved
energy, E,
$$S'(q, E) = \int p(q, E)\,dq, \tag{2.132}$$
the total differential, $dS'$, can be written both in terms of old and new variables:
$$dS' = \frac{\partial S'}{\partial q}\,dq + \frac{\partial S'}{\partial p}\,dp = \frac{\partial S'}{\partial \vartheta}\,d\vartheta + \frac{\partial S'}{\partial J}\,dJ. \tag{2.133}$$
Now if one wishes to change from the variable $p = \partial S'/\partial q$ to J, then the conditions
$\partial S'/\partial p = 0$ and $\partial S'/\partial \vartheta = 0$ have to be satisfied. From this the following Hamiltonian equations of
motion are derived:
$$\dot J = 0, \qquad \dot\vartheta = \frac{\partial S'}{\partial J} = T\,\frac{\partial E}{\partial J}, \tag{2.134}$$
with T being the period time.
Summarizing, the J quantities are constant during the motion and the total change
of the action is a sum of terms 2πJ. Therefore conservative motions are characterized,
besides the conserved energy E, also by the action components $J_k$, which are reminiscent
of angular momentum. If the number of such $J_k$ reaches the number of
generalized coordinates, then the motion is integrable.
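For a concrete feeling of the action variable, the sketch below evaluates the loop integral of Eq. (2.129) for a harmonic oscillator, H = p²/(2m) + m·ω²·q²/2. The oscillator, the parameter values and the function name are assumptions made for illustration. The orbit is parameterized by an angle θ with q = A sin θ, p = m ω A cos θ, and the expected result is J = E/ω.

```python
import math

def action_variable(m, omega, energy, n=2000):
    """J = (1 / 2 pi) * loop integral of p dq for the assumed oscillator."""
    amp = math.sqrt(2.0 * energy / (m * omega**2))   # turning-point amplitude A
    dtheta = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * dtheta                   # midpoint rule
        p = m * omega * amp * math.cos(theta)
        dq = amp * math.cos(theta) * dtheta          # dq = A cos(theta) dtheta
        total += p * dq
    return total / (2.0 * math.pi)

m, omega, energy = 2.0, 3.0, 5.0
print(action_variable(m, omega, energy), energy / omega)
```

Since E = ωJ here, the derivative ∂E/∂J equals the angular frequency, which is the familiar action–angle form of the oscillator.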
The laws of geometrical optics can be derived from the principle of shortest time
proposed by Fermat; this is complementary to the Maupertuis principle. Here the
most typical case is free propagation except at points of medium change. According
to Fermat’s principle
2.9 Fermat Principle 47
$$T = \int_{t_1}^{t_2} dt = \int_{x_1}^{x_2} \frac{dx}{v(x)} = \text{minimum}, \tag{2.135}$$
Differentiating this expression with respect to y we obtain a condition for this extremum:
$$c\,\frac{\partial T}{\partial y} = \frac{y - h}{\sqrt{d^2 + (h - y)^2}} + \frac{y}{\sqrt{d^2 + y^2}} = 0. \tag{2.137}$$
This result ensures the equality of the angles of the incoming and reflected rays with respect to the plane.
The Snellius–Descartes law of refraction at the boundary of two homogeneous but
different media can also be derived from Fermat’s principle. The above derivation
changes at a few points: now we follow a light ray starting at $(x_2, h)$ and arriving at
$(-x_1, 0)$. A part of this path proceeds in a medium with refractive index n, where the
speed of light is c/n. Let the ray cross the boundary at the point (0, y). According to Fermat’s
principle
$$cT = n\sqrt{x_1^2 + y^2} + \sqrt{x_2^2 + (h - y)^2} = \text{minimum}. \tag{2.138}$$
Differentiating with respect to y gives
$$c\,\frac{\partial T}{\partial y} = n\,\frac{y}{\sqrt{x_1^2 + y^2}} + \frac{y - h}{\sqrt{x_2^2 + (h - y)^2}} = n \sin\beta - \sin\alpha = 0. \tag{2.139}$$
From this the Snellius–Descartes law for the ratio of sines trivially follows (Fig. 2.9).
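The refraction law can also be recovered by brute-force minimization of the optical path, without differentiating anything. The sketch below assumes the geometry of the text (source at (x2, h), target at (−x1, 0), boundary crossed at (0, y)) with illustrative numbers, and uses a plain ternary search, which is adequate because the optical path is a convex function of y. The function names are hypothetical.

```python
import math

def optical_path(y, n, x1, x2, h):
    """c*T of Eq. (2.138): n * sqrt(x1^2 + y^2) + sqrt(x2^2 + (h - y)^2)."""
    return n * math.hypot(x1, y) + math.hypot(x2, h - y)

def fermat_crossing(n, x1, x2, h, iters=200):
    """Ternary search for the crossing height y in [0, h] minimizing c*T."""
    lo, hi = 0.0, h
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if optical_path(a, n, x1, x2, h) < optical_path(b, n, x1, x2, h):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

n, x1, x2, h = 1.5, 1.0, 1.0, 1.0          # assumed illustrative values
y = fermat_crossing(n, x1, x2, h)
sin_beta = y / math.hypot(x1, y)           # angle in the medium with index n
sin_alpha = (h - y) / math.hypot(x2, h - y)
print(n * sin_beta, sin_alpha)             # the two should nearly coincide
```

The agreement is limited by the flatness of the minimum, so the two printed numbers match to roughly single precision rather than to machine accuracy.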
Finally we note that, based on the similarity between the principles of shortest time
and shortest path, it is suggestive to consider some correspondence between the laws
of optics and mechanics:
the Maupertuis principle is based on the integral $\int p\,dq$, the
Fermat principle on $\int dq/v$. This suggests a correspondence like p ∼ 1/v. Considering
quantities of dimension length in both cases and the usual expression p = mu
for momenta, one arrives at the reciprocity formula uv = c². This coincides with the
relation between group velocity and phase velocity known from wave propagation:
an excellent motivation for de Broglie to state the particle–wave duality. It is an
unexpected bonus on top of this that the quantization of action for periodic
orbits in Bohr’s atomic model ensures that only an integer multiple of the wavelength
can be associated with an orbit of length 2πr.
Chapter 3
Gravity: The Optimal Curvature
To date, the theory of gravity has gone farthest on the way to interpreting motion in
terms of geometry: a change of position does not happen in a separate time, but space
and time follow a common, four-dimensional geometry (special relativity), and bodies
left alone in this spacetime follow shortest paths, i.e. geodesics (general relativity).
An intriguing consequence of the principle stating the equivalence of descriptions of
physical laws in either freely falling or accelerating reference frames is that the local
density of energy and momentum determines the geometry of spacetime: the
source of the curvature of spacetime is exactly the presence of moving matter. The
Einstein–Hilbert action minimizes the sum of covariant action integrals stemming
from the curvature of spacetime and from the presence of matter or radiation energy.
As a first step we extend the Maupertuis principle, related to the reduced action over
paths, to four-dimensional spacetime; this will be the variational principle describing
the (special) relativistic motion of a free mass point.
In this chapter, unless it has a special meaning, we use the unit system
c = 1, setting the speed of light to one lightyear per year. Since it was revealed
on the basis of the Maxwell equations that the propagation speed of light waves in vacuum
(without the presence of any material medium) is a natural constant, we may use it as a
unit, following Albert Einstein. Adopting this convention, the Maupertuis principle
treats the lengths of paths in four-dimensional spacetime, i.e. those of worldlines,
re-weighted in the free case only with the mass, as minimal, or at least as an integral
to be varied.
The principle of special relativity, however, contains one more important notion:
there is no absolute rest, for only the velocity of the relative motion counts in physics.
This postulate, dating back to Galileo Galilei, together with the constancy of the
lightspeed, enforces the Lorentz-transformation as the unique linear transformation
between spacetime coordinates among moving reference frames. This transformation
leaves a four-dimensional arc length invariant:
c2 dτ 2 = x2 = c2 dt 2 − d x 2 − dy 2 − dz 2 , (3.1)
assuming a so-called flat spacetime geometry with hyperbolic signature, also called a
Minkowski spacetime. Minkowski himself, being a mathematician, suggested viewing
the fourth dimension as pure imaginary, ict. In this book we refer to the
four-dimensional arc length, c dτ in Eq. (3.1), as proper time (multiplied by
c = 1). Since the ratio of space and time differentials delivers the actual velocity,
$\mathbf{v} = d\mathbf{r}/dt$, one easily sees that
$$d\tau = dt\,\sqrt{1 - v^2/c^2} = dt/\gamma, \tag{3.2}$$
Viewing this as a coordinate-time integral we derive the Lagrange function for the
free relativistic motion: starting from
$$S = \int L\,dt = -\int mc^2 \sqrt{1 - v^2/c^2}\;dt \tag{3.4}$$
one obtains
$$L = -mc^2 \sqrt{1 - v^2/c^2} = -mc^2 + \frac{m v^2}{2} + \cdots \tag{3.5}$$
For low velocities ($v = |\mathbf{v}| \ll c$) the Lagrange function, disregarding the negative
constant, gives back the familiar kinetic energy. The variation of the action
(3.3) provides the equation of motion.
3.1 Mapertuis Principle in Spacetime 51
To obtain this we utilize the four-vector index notation for the coordinate differentials,
$dx^i = (c\,dt, dx, dy, dz)$ for the contravariant, while $dx_i = (c\,dt, -dx, -dy, -dz)$
for the covariant version. The square of the proper time differential can now be
expressed as
$$c^2 d\tau^2 = dx_i\,dx^i, \tag{3.6}$$
where, using the Einstein convention, indices occurring twice, once lower (covariant) and
once upper (contravariant), are to be summed over for i = 0, 1, 2, 3. The variation of
the proper time length, c dτ, is therefore as follows:
$$c\,\delta d\tau = \frac{dx_i\,\delta dx^i}{\sqrt{dx_j\,dx^j}} = \frac{dx_i}{c\,d\tau}\,\delta dx^i. \tag{3.7}$$
$$u_i = \frac{dx_i}{d\tau}, \tag{3.8}$$
which by construction is the tangent four-vector of normalized length to the worldline
$x^i(\tau)$, $u_i u^i = c^2$. Its components are $u^i = (\gamma c, \gamma \mathbf{v})$.
Using these notations the variation of the relativistic mass point action is given as
$$\delta S = -m \int u_i\,d\delta x^i = -m\,u_i\,\delta x^i \Big|_1^2 + m \int \delta x^i\,du_i. \tag{3.9}$$
Dropping the initial and final point contributions, which arise from the integration
by parts, the equation of motion follows from this variation:
$$m\,\frac{du_i}{d\tau} = 0. \tag{3.10}$$
This is equivalent to the constancy of the velocity, $\mathbf{v}$.
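On the other hand, from the Lagrange function (3.5) the canonical momentum
follows,
$$\mathbf{p} = \frac{\partial L}{\partial \mathbf{v}} = \frac{m \mathbf{v}}{\sqrt{1 - v^2/c^2}}. \tag{3.11}$$
With the help of this expression the Hamilton function, equal to the energy, is expressed:
$$H = E = \mathbf{p}\cdot\mathbf{v} - L = \frac{mc^2}{\sqrt{1 - v^2/c^2}}. \tag{3.12}$$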
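The relations (3.5), (3.11) and (3.12) can be cross-checked numerically in a few lines. The sketch below assumes one-dimensional motion and units with c = 1; the function names and the sample values are illustrative. It verifies that p·v − L reproduces the energy and that E² − p²c² equals (mc²)².

```python
import math

C = 1.0  # speed of light; units with c = 1 are assumed in this sketch

def gamma(v):
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

def lagrangian(m, v):
    """Eq. (3.5): L = -m c^2 sqrt(1 - v^2/c^2)."""
    return -m * C**2 / gamma(v)

def momentum(m, v):
    """Eq. (3.11): p = m v / sqrt(1 - v^2/c^2)."""
    return gamma(v) * m * v

def energy(m, v):
    """Eq. (3.12): E = m c^2 / sqrt(1 - v^2/c^2)."""
    return gamma(v) * m * C**2

m, v = 2.0, 0.6
print(momentum(m, v) * v - lagrangian(m, v), energy(m, v))      # Legendre transform
print(energy(m, v) ** 2 - (momentum(m, v) * C) ** 2, (m * C**2) ** 2)
```

The second line is the invariant length of the four-momentum, which is why energy and momentum fit naturally into a single four-vector.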
The momentum and the energy can be united in a single four-vector, too. This
four-momentum, $p^i = (E/c, \mathbf{p}) = m u^i$, is constant during the free motion, so both
$$g^i = (\gamma\,\mathbf{v}\cdot\mathbf{f},\ \gamma c\,\mathbf{f}), \tag{3.15}$$
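$$p_i = -\frac{\partial S}{\partial x^i}, \tag{3.16}$$
Subtracting the rest energy contribution from the relativistic energy, $E = -\partial S/\partial t$, one
obtains a reduced action, $S' = S + mc^2 t$, satisfying the above Hamilton–Jacobi equation.
In the nonrelativistic expansion, for 1/c² → 0, one obtains:
$$\frac{\partial S'}{\partial t} + \frac{1}{2m}\Big(\frac{\partial S'}{\partial \mathbf{r}}\Big)^2 = \frac{1}{2mc^2}\Big(\frac{\partial S'}{\partial t}\Big)^2. \tag{3.18}$$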
Finally we emphasize that the action for a relativistically moving mass point is at the
same time the four-dimensional generalization of the Hamilton action and expresses
the principle of shortest time:
$$S = -\int p_i\,dx^i = \int (\mathbf{p}\cdot d\mathbf{x} - E\,dt) = -mc^2 \int d\tau. \tag{3.19}$$
3.2 Motion of Charges, Lorentz Force 53
The action for a point charge moving in an electromagnetic field is easily derivable from
that of the free mass point: only the four-momentum will be supplemented with a
term, which couples the otherwise free system to the electromagnetic four-potential.
The replacement $p_i \to p_i + e A_i$ results in the action
$$S = -\int (p_i + e A_i)\,dx^i; \tag{3.20}$$
e
pi + e Ai = ∂i − Ai . (3.21)
i i
The four-vector potential contains both the scalar and the vector potential
parts of classical electrodynamics, $A_i = (\Phi, \mathbf{A})$.
The first term in the action integral Eq. (3.20) can be rewritten in Maupertuis
form, as discussed in the previous section. By doing so we arrive at an
action whose differential consists of products of Lorentz scalars as well as of Lorentz
vectors:
$$S = -\int m\,d\tau - e \int A_i\,dx^i. \tag{3.22}$$
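$$H = \mathbf{v}\cdot\frac{\partial L}{\partial \mathbf{v}} - L = \frac{m}{\sqrt{1 - v^2}} + e\Phi \tag{3.24}$$
$$\boldsymbol{\pi} = \frac{\partial L}{\partial \mathbf{v}} = \frac{m \mathbf{v}}{\sqrt{1 - v^2}} + e\mathbf{A}. \tag{3.25}$$
$$\frac{d}{dt}\frac{\partial L}{\partial \mathbf{v}} - \frac{\partial L}{\partial \mathbf{r}} = 0. \tag{3.26}$$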
In order to study this form, we transform the partial derivative of the Lagrangian with
respect to the coordinate vector, r, using a brief notation ∇ for this special operation:
Using this result and the formula for the canonical momentum, (3.25), we ascertain
that
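$$\frac{d}{dt}(\mathbf{p} + e\mathbf{A}) = e(\mathbf{v}\cdot\nabla)\mathbf{A} + e\,\mathbf{v}\times(\nabla\times\mathbf{A}) - e\nabla\Phi. \tag{3.28}$$
Taking into account furthermore that the total time derivative of the vector potential,
occurring on the left hand side of the above equation, can be written as a sum of
the partial time derivative and a term due to the velocity, i.e.
$$\frac{d}{dt}\mathbf{A} = \frac{\partial}{\partial t}\mathbf{A} + (\mathbf{v}\cdot\nabla)\mathbf{A}, \tag{3.29}$$
some terms cancel in the final result:
$$\frac{d\mathbf{p}}{dt} = -e\,\frac{\partial \mathbf{A}}{\partial t} - e\nabla\Phi + e\,\mathbf{v}\times(\nabla\times\mathbf{A}). \tag{3.30}$$
Now, using the expressions of the electric and magnetic field strengths in terms of
potentials known from electrodynamics,
$$\mathbf{E} = -\frac{\partial \mathbf{A}}{\partial t} - \nabla\Phi, \qquad \mathbf{B} = \nabla\times\mathbf{A}, \tag{3.31}$$
we arrive at the Lorentz force law:
$$\frac{d\mathbf{p}}{dt} = e\,(\mathbf{E} + \mathbf{v}\times\mathbf{B}). \tag{3.32}$$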
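A direct consequence of the force law e(E + v × B) is that the magnetic part never does work on the charge. The sketch below evaluates the right-hand side for plain 3-tuples and checks this orthogonality; the helper names and the numerical field values are assumptions chosen for illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lorentz_force(e, E, v, B):
    """Force e (E + v x B) on a charge e moving with velocity v."""
    v_cross_b = cross(v, B)
    return tuple(e * (Ei + ci) for Ei, ci in zip(E, v_cross_b))

v = (0.1, 0.2, 0.3)                       # assumed velocity (units with c = 1)
B = (0.5, -1.0, 4.0)                      # assumed magnetic field
F_mag = lorentz_force(2.0, (0.0, 0.0, 0.0), v, B)
print(F_mag, dot(F_mag, v))               # the dot product vanishes up to rounding
```

With a nonzero electric field the power delivered to the charge is e E·v alone, since the magnetic contribution drops out of the dot product.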
The very same result can be achieved of course also without utilizing three-vector
notation and field strength vectors. All we need is to start from the four-dimensional
form of the action Eq. (3.20). The variation means the variation of the worldline
orbits, x i (τ ). This has two effects: one directly through the coordinate differential
featured in the action integral, and another, indirect one through the dependence of
3.3 The Weak (Newtonian) Gravitational Field 55
Expanding the differential of the four-vector potential in this formula we obtain the
following condition for the vanishing of the action variation:
$$\frac{dp_i}{d\tau} + e\,\frac{\partial A_i}{\partial x^j}\,u^j - e\,\frac{\partial A_j}{\partial x^i}\,u^j = 0. \tag{3.35}$$
Introducing the four-dimensional field strength tensor,
$$F_{ij} = \frac{\partial A_j}{\partial x^i} - \frac{\partial A_i}{\partial x^j} = \partial_i A_j - \partial_j A_i, \tag{3.36}$$
being nothing else than the four-dimensional curl of the four-vector potential, the following
compact equation of motion for a point charge in four-dimensional form emerges:
$$\frac{dp_i}{d\tau} = e F_{ij}\,u^j. \tag{3.37}$$
Recalling the antisymmetry of the field strength tensor, $F_{ji} = -F_{ij}$, based
on the definition Eq. (3.36), one sees that the right hand side of (3.37) delivers a
genuine four-force, since $u^i (e F_{ij} u^j) = 0$ due to the antisymmetry.
$$d\tau^2 = dt^2 - dx^2 - dy^2 - dz^2. \tag{3.40}$$
Rotating the reference frame with constant angular velocity ω around the z axis, we
express the spacetime coordinates in terms of those in the rotating (primed) system as follows:
$$x = x'\cos(\omega t') - y'\sin(\omega t'), \quad y = y'\cos(\omega t') + x'\sin(\omega t'), \quad z = z', \quad t = t'. \tag{3.41}$$
Based on this the coordinate differentials are obtained and the result is substituted
into the Minkowski metric (3.40), delivering
$$d\tau^2 = \left[1 - \omega^2 (x'^2 + y'^2)\right] dt'^2 - dx'^2 - dy'^2 - dz'^2 + 2\omega y'\,dx'\,dt' - 2\omega x'\,dy'\,dt'. \tag{3.42}$$
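The algebra leading to the rotating-frame line element can be verified numerically. The sketch below (units with c = 1; function names, the event and the displacement are assumptions for illustration) maps two nearby events from the rotating to the inertial frame, evaluates the Minkowski interval (3.40) there, and compares it with the quadratic form (3.42) written in rotating-frame differentials.

```python
import math

def lab_coords(xp, yp, zp, tp, omega):
    """Eq. (3.41): inertial-frame (x, y, z, t) from rotating-frame coordinates."""
    c, s = math.cos(omega * tp), math.sin(omega * tp)
    return xp * c - yp * s, yp * c + xp * s, zp, tp

def dtau2_lab(P, dP, omega):
    """Minkowski interval (3.40) between the images of P and P + dP."""
    a = lab_coords(*P, omega)
    b = lab_coords(*(p + d for p, d in zip(P, dP)), omega)
    dx, dy, dz, dt = (bi - ai for ai, bi in zip(a, b))
    return dt**2 - dx**2 - dy**2 - dz**2

def dtau2_rot(P, dP, omega):
    """Right-hand side of Eq. (3.42) in rotating-frame differentials."""
    xp, yp, zp, tp = P
    dxp, dyp, dzp, dtp = dP
    return ((1.0 - omega**2 * (xp**2 + yp**2)) * dtp**2
            - dxp**2 - dyp**2 - dzp**2
            + 2.0 * omega * yp * dxp * dtp
            - 2.0 * omega * xp * dyp * dtp)

P = (0.3, -0.2, 0.1, 1.5)                  # assumed event (x', y', z', t')
dP = (1e-5, 2e-5, -1e-5, 5e-5)             # assumed small displacement
print(dtau2_lab(P, dP, 0.7), dtau2_rot(P, dP, 0.7))
```

The two numbers agree up to terms of third order in the displacement, as expected when finite differences stand in for differentials.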
Here, in the coefficient of the term with dt′², the centrifugal “force” is recognized
besides the 1, while the mixed (not purely quadratic) terms reflect the effect of the
Coriolis “force”. The expression “force” is put between quotation marks because,
as shown, these effects are only due to a time-dependent transformation of the
spatial coordinates. They are in fact pseudo forces. In the four-dimensional spacetime
this transformation is a point-dependent mixing of coordinates; a continuous and
differentiable geometrical transformation. The source of this “force” is nothing else
than the accelerating motion of the observer. Similarly gravity also does not have a
special “cause”. A freely falling system is an inertial system, as experienced in
the “weightless” motion of astronauts orbiting the Earth.
According to the general recipe the geometry of spacetime is modified compared
to the gravitation-free case. In a general metric the proper time differential length
squared is given by
$$d\tau^2 = g_{ik}(x)\,dx^i\,dx^k, \tag{3.43}$$
The set of coefficients $g_{ik}$ is called the metric tensor. Based on the above construction
this tensor is symmetric in its two indices. Whenever an experienced force effect (and
its potential energy contribution, respectively) is proportional to the inertial mass,
then this effect can be viewed in spacetime as a coordinate transformation: the effect
is purely geometrical. The potential energy with the factor m removed is usually called
the gravitational potential, ϕ(r).
A given metric tensor describes the geometry of spacetime; it contains all information
with respect to apparent acceleration. In particular the coefficients occurring in
the expression for the arc length squared, i.e. the metric tensor, also serve as a starting
point in calculating four-dimensional lengths, areas, volumes or four-volumes.
In the so-called covariant integrals the integration measure is given by $\sqrt{\det g}\;d^4x$
(due to the role that the Levi-Civita symbol, $\epsilon_{ijk\ell}$, plays in calculating determinants).
Finally, since there exists a smooth coordinate system of spacetime in which the components
of the metric tensor are locally Minkowskian, i.e. only the diagonal elements
$g_{ii} = (1, -1, -1, -1)$ occur, the relation between general differentials and these specific
differentials is given exactly by these metric tensor elements:
$$dx_i = \frac{\partial x_i}{\partial x'^k}\,dx'^k = g_{ik}(x')\,dx'^k. \tag{3.44}$$
contains this component of the metric tensor. For motions with velocities much
smaller than the speed of light, and for a weak gravitational potential, the Lagrange
function is given by
$$L = -mc^2 + \frac{m v^2}{2} - m\varphi, \tag{3.46}$$
and the corresponding action integral by
$$S = -mc \int \Big(c - \frac{v^2}{2c} + \frac{\varphi}{c}\Big)\,dt = -mc^2 \int d\tau. \tag{3.47}$$
Here the second equality sign indicates that we interpret this action as that of a freely
moving mass point in the language of proper coordinates. Hence dτ is expanded like
$$c\,d\tau = \Big(c - \frac{v^2}{2c} + \frac{\varphi}{c}\Big)\,dt. \tag{3.48}$$
Using now that $\mathbf{v}\,dt = d\mathbf{r}$ and $v \ll c$, in leading order we get for the square of the
proper time differential
Comparing this with the general expression (3.43) for the metric tensor, we realize
that only the $g_{00}$ component differs from the Minkowski spacetime values:
$$g_{00} = 1 + \frac{2\varphi}{c^2}. \tag{3.50}$$
It is revealed that since the gravitational potential, ϕ, in the Newtonian gravity theory
is a function of space, the spacetime is curved, while the three-dimensional Euclidean
space itself is flat.
In this section we determine the equations for the free motion of a mass point m in
the case of a general metric tensor. We trace back the variation of the Maupertuis
action, $-m\int d\tau$, to the variation of the arc length squared, which comprises the
information on the spacetime geometry. In the latter, partial derivatives due to the space
and time dependence of the metric tensor play a role. Collecting them in a smart
way one arrives at the equation describing the free motion as a geodesic acceleration.
In this section we use units with c = 1.
3.4 Geodesic Motion in General Gravitational Field 59
$$\delta\left(g_{ik}\,dx^i\,dx^k\right) = dx^i\,dx^k\,\frac{\partial g_{ik}}{\partial x^j}\,\delta x^j + 2 g_{ik}\,dx^i\,\delta dx^k. \tag{3.52}$$
Using this relation the variation of the Maupertuis action becomes
$$\delta S = -m \int d\tau \left[\frac{1}{2}\,\frac{dx^i}{d\tau}\frac{dx^k}{d\tau}\frac{\partial g_{ik}}{\partial x^j}\,\delta x^j + g_{ik}\,\frac{dx^i}{d\tau}\frac{d\delta x^k}{d\tau}\right]. \tag{3.53}$$
Finally we change the summation index from k to j in the second term, making
it possible to factor out the variation $\delta x^j$ from the total expression under the integral.
Introducing the four-velocity, $u^i = dx^i/d\tau$, we arrive at the following form of the
Euler–Lagrange equation:
$$\frac{1}{2}\,u^i u^k\,\frac{\partial g_{ik}}{\partial x^j} - \frac{d}{d\tau}\left(g_{ij}\,u^i\right) = 0. \tag{3.55}$$
In order to reach the familiar form of the geodesic equation we need some more index
manipulation. When differentiating the product $g_{ij} u^i$ with respect to τ, we apply the
relation
$$\frac{dg_{ij}}{d\tau} = \frac{dx^k}{d\tau}\frac{\partial}{\partial x^k}\,g_{ij} = u^k\,\frac{\partial g_{ij}}{\partial x^k}. \tag{3.56}$$
Further we utilize the i ↔ k index symmetry implied in the second term to obtain
$$u^i u^k\,\frac{\partial g_{ij}}{\partial x^k} = \frac{1}{2}\,u^i u^k \Big(\frac{\partial g_{ij}}{\partial x^k} + \frac{\partial g_{kj}}{\partial x^i}\Big). \tag{3.57}$$
Then we arrive at the geodesic equation, describing accelerating motion due to the
metric:
$$g_{ij}\,\frac{du^i}{d\tau} + \Gamma_{j,ik}\,u^i u^k = 0, \tag{3.58}$$
where the
$$\Gamma_{j,ik} = \frac{1}{2}\Big(\frac{\partial g_{ij}}{\partial x^k} + \frac{\partial g_{kj}}{\partial x^i} - \frac{\partial g_{ik}}{\partial x^j}\Big) \tag{3.59}$$
coefficients, the Christoffel symbols, can be determined from first derivatives of the
metric tensor. These quantities do not form a three-index tensor, since their
transformation properties do not follow those of co- or contravariant vectors.
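To see the Christoffel symbols at work, the sketch below evaluates Γ_{j,ik} = (∂_k g_ij + ∂_i g_kj − ∂_j g_ik)/2 by finite differences for the weak-field metric with g_00 = 1 + 2ϕ and c = 1. The uniform test potential ϕ = slope·x, the coordinate ordering (t, x, y, z) and all function names are assumptions made for illustration.

```python
def phi_uniform(X, slope=0.01):
    """Assumed test potential: uniform field along x, phi = slope * x."""
    return slope * X[1]

def metric(X, phi):
    """Weak-field metric: g_00 = 1 + 2 phi, g_11 = g_22 = g_33 = -1 (c = 1)."""
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0] = 1.0 + 2.0 * phi(X)
    g[1][1] = g[2][2] = g[3][3] = -1.0
    return g

def dmetric(X, phi, j, h=1e-5):
    """Central-difference estimate of d g_ik / d x^j as a 4x4 table."""
    Xp, Xm = list(X), list(X)
    Xp[j] += h
    Xm[j] -= h
    gp, gm = metric(Xp, phi), metric(Xm, phi)
    return [[(gp[i][k] - gm[i][k]) / (2 * h) for k in range(4)] for i in range(4)]

def christoffel_first_kind(X, phi):
    """Table G[j][i][k] = Gamma_{j,ik} built from first metric derivatives."""
    dg = [dmetric(X, phi, j) for j in range(4)]      # dg[j][i][k] = d_j g_ik
    return [[[0.5 * (dg[k][i][j] + dg[i][k][j] - dg[j][i][k])
              for k in range(4)] for i in range(4)] for j in range(4)]

G = christoffel_first_kind([0.0, 1.0, 2.0, 3.0], phi_uniform)
print(G[1][0][0])   # Gamma_{1,00}, expected to equal -d(phi)/dx = -0.01
```

For a slow particle (u⁰ ≈ 1, spatial components negligible) the j = 1 component of the geodesic equation then reads −du¹/dτ + Γ_{1,00} = 0, i.e. an acceleration of −∂ϕ/∂x, the Newtonian result.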
In this equation the acceleration mixes with the square of velocity, similarly to
the centrifugal acceleration terms discussed in the study of non-relativistic, curved
orbits. The four-acceleration is caused by four-centrifugal forces.
is true for an infinitesimally small path. The deviation in vector components can be
written as a linear combination of coordinate differentials and the moved vector:
$$\Delta A_k = \cdots A_i\,dx^j, \tag{3.60}$$
where the ellipsis denotes unknown coefficients. Since three directed vectors occur
in this problem, these coefficients are quantities with three indices. These
happen to be the Christoffel symbols, introduced in the geodesic equation:
$$\Delta A_k = \Gamma^i_{kj}\,A_i\,dx^j. \tag{3.61}$$
However, not only the distance covered but also the size of the enclosed piece of
surface can characterize the change in the transported vector. The
latter is a two-dimensional, two-index quantity. A connection between these two
representations is made by the Stokes theorem. For a small surface element $\Delta f^{jm}$
we write
$$\Delta A_k = \frac{1}{2}\left[\frac{\partial(\Gamma^i_{km} A_i)}{\partial x^j} - \frac{\partial(\Gamma^i_{kj} A_i)}{\partial x^m}\right]\Delta f^{jm}. \tag{3.62}$$
Expanding the derivatives we obtain terms belonging to two classes: one containing
the derivatives of Christoffel-symbols and another one containing the derivatives of
the vector components. The factor 1/2 is necessary due to the “double” counting of
the indices of the infinitesimal surface element.
The partial derivatives of the vector components express both physical changes and
location dependence. The “real”, purely physical change is carried by the covariant
derivative, where the effect of parallel transport is subtracted from the result of the
derivatives with respect to coordinates:
$$D_j A_i = \frac{\partial}{\partial x^j}\,A_i - \Gamma^n_{ij}\,A_n. \tag{3.63}$$
If only a pure transport happens, the covariant derivative of the vector is zero, $D_j A_i = 0$.
In this case the partial derivatives with respect to the coordinates can be expressed by the
Christoffel symbols:
$$\frac{\partial A_i}{\partial x^j} = \Gamma^n_{ij}\,A_n. \tag{3.64}$$
Substituting this relation into Eq. (3.62), which reflects the Stokes theorem, we obtain
that the change of the parallel transported vector is proportional to that vector and to
the enclosed surface element. The proportionality coefficient has four indices:
$$\Delta A_k = \frac{1}{2}\,R^i_{k\ell m}\,A_i\,\Delta f^{\ell m}, \tag{3.65}$$
where
$$R^i_{k\ell m} = \frac{\partial \Gamma^i_{km}}{\partial x^\ell} - \frac{\partial \Gamma^i_{k\ell}}{\partial x^m} + \Gamma^i_{n\ell}\,\Gamma^n_{km} - \Gamma^i_{nm}\,\Gamma^n_{k\ell}. \tag{3.66}$$
This four-index quantity is now a tensor; it transforms in each of its indices like
coordinate differentials. A spacetime is called flat whenever all components of
the Riemann tensor vanish, $R^i_{k\ell m} = 0$. Furthermore this tensor can be used for the
derivation of the scalar curvature. In two dimensions this coincides with the Gauss
curvature, the reciprocal of the product of the radii of the tangential circles.
We flash another face of the Riemann tensor, related to gauge theory; i.e. we
spell out the similarity of the mathematical arsenal above to the field strength tensor
in electrodynamics. Namely, both can be written as the commutator of covariant
derivative operations. Taking into account that
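\[ D_\ell A_i = \partial_\ell A_i - \Gamma^k_{i\ell} A_k = \left(\delta^k_i\,\partial_\ell - \Gamma^k_{i\ell}\right) A_k , \tag{3.67} \]
the covariant derivative operation in the direction $\ell$ can be viewed as a two-index matrix:
\[ (D_\ell)^k_{\;i} = \delta^k_i\,\partial_\ell - \Gamma^k_{i\ell} . \tag{3.68} \]
This is a matrix in the indices $i$ and $k$, and at the same time an operator when acting on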
spacetime functions, like physical fields. Similarly, in electrodynamics the covariant four-derivative consists of a coordinate derivative and a multiplication by a vector potential, which is not gauge invariant:
\[ D_\ell = \partial_\ell - ieA_\ell , \tag{3.69} \]
with $A_\ell = (\Phi, \mathbf{A})$ being the four-vector potential. The correspondence in quantum mechanics between momentum and differentiation leads to the so-called kinetic momentum,
\[ \frac{1}{i}\, D_\ell = P_\ell - eA_\ell , \tag{3.70} \]
which describes a coupling between charges and electromagnetic fields and modifies
energy and momentum.
Since the covariant derivative is an operator and a matrix at the same time, its commutators are not trivial. Based on Eq. (3.66) one obtains
\[ [D_k, D_\ell]\, A_i = R^m_{\;ik\ell}\, A_m . \tag{3.71} \]
It is noteworthy that here the indices $k$ and $\ell$ belong to the directions of the covariant derivatives, while the indices $i$ and $m$ denote the components of the vector, i.e. they refer to the basis vectors along the coordinate axes. In the classical, four-dimensional theory of gravity these indices stem from the same set, so the distinction is merely conceptual and inessential in physical models. With this in mind the Riemann tensor can be written in a more abstract form, suppressing the basis direction indices:
\[ R_{k\ell} = [D_k, D_\ell] . \tag{3.72} \]
This form is analogous to the connection of the electromagnetic field strength tensor
with the four potential:
\[ -ie\,F_{k\ell} = [D_k, D_\ell] . \tag{3.73} \]
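The commutator relation in Eq. (3.73) can be checked directly from the form of the covariant derivative in Eq. (3.69) by letting both sides act on a test function $\psi$. The symbol $\psi$ and the explicit form of $F_{k\ell}$ read off at the end are introduced here only for illustration:

```latex
\begin{aligned}
[D_k, D_\ell]\,\psi
  &= (\partial_k - ieA_k)(\partial_\ell - ieA_\ell)\,\psi
   - (\partial_\ell - ieA_\ell)(\partial_k - ieA_k)\,\psi \\
  &= -ie\,\bigl(\partial_k A_\ell - \partial_\ell A_k\bigr)\,\psi ,
\end{aligned}
\qquad\Longrightarrow\qquad
F_{k\ell} = \partial_k A_\ell - \partial_\ell A_k .
```

The second derivatives, the mixed terms $A_k \partial_\ell \psi$, and the quadratic terms $A_k A_\ell \psi$ are symmetric in $k$ and $\ell$ and cancel; only the derivatives of the potential survive.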
The coefficient $K$ contains the speed of light as a natural constant, $K = -c^3/(16\pi\kappa)$. Although this action functional is linear in the scalar curvature, it is highly nonlinear in terms of the metric tensor. This is the origin of several difficulties.
In describing the “matter” part the central element is a Lagrange density, L,
composed from contributions stemming from physical fields, e.g. electromagnetic
fields or other fields describing elementary particles, sometimes classical fluids.
These contributions to the action are also integrated over the spacetime using the
same covariant measure:
\[ S_m = \int \mathcal{L}\,\sqrt{-g}\; d^4x . \tag{3.75} \]
Gravitating systems are described by combining the gravitational field and matter terms; the full Hilbert–Einstein action is $S = S_g + S_m$. As mentioned, the strongly nonlinear dependence on the determinant of the metric tensor couples the “material” Lagrange density to the spacetime geometry. The consequence is not only that equations known from other areas of physics are modified in a gravitational field, but also that the source of gravity lies in these very material fields. The coupling between matter and spacetime via the metric tensor makes this theory tensorial, characterized by exactly ten independent quantities due to the symmetry of the metric tensor, $g_{ik} = g_{ki}$. The source of gravity can be collected into another symmetric tensor, the energy-momentum tensor, $T_{ik} = T_{ki}$.
To derive the energy-momentum tensor we regard the action term, Sm . Varying
this against the metric tensor gives
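\[ \delta S_m = \int \left[ \frac{\partial (\mathcal{L}\sqrt{-g})}{\partial g^{ik}} - \frac{\partial}{\partial x^\ell}\, \frac{\partial (\mathcal{L}\sqrt{-g})}{\partial \dfrac{\partial g^{ik}}{\partial x^\ell}} \right] \delta g^{ik}\; d^4x . \tag{3.76} \]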
Here the components of the metric tensor, $g^{ik}$, play the role of the generalized coordinates to be varied. The corresponding generalized velocities are its derivatives with respect to the coordinates (not only the time derivatives, because the action is obtained by integrating over both time and space). Therefore the above result is already in a form prepared for the Euler–Lagrange field equation, since $\delta g^{ik}$ has been factored out. Still, the integration must be rewritten in a generally covariant form with the measure $\sqrt{-g}\,d^4x$. Beyond that, all that is needed is the derivative of the determinant of the metric tensor with respect to its tensor components.
It helps here that the metric tensor with mixed upper and lower indices is always the unit tensor, therefore the inverse of the metric coincides with its raised-index form: $g^{ik} = (g^{-1})^{ik}$. This is simply the abstract identity $g = g\,g^{-1}g$. Taking this into account, and observing that a determinant can be expanded in terms of its sub-determinants, each of which is an element of the inverse matrix up to a determinant factor, we obtain the following relation for the variation of the square root of the determinant:
\[ \delta\sqrt{-g} = -\frac{1}{2\sqrt{-g}}\,\delta g = -\frac{1}{2}\,\sqrt{-g}\; g_{ik}\,\delta g^{ik} . \tag{3.77} \]
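The intermediate steps behind Eq. (3.77) can be spelled out as follows. This is a sketch of the standard argument, assuming a four-dimensional spacetime so that $g^{ik} g_{ik} = 4$:

```latex
\begin{aligned}
\delta g &= g\, g^{ik}\,\delta g_{ik}
  && \text{(cofactor expansion of the determinant)} \\
\delta\bigl(g^{ik} g_{ik}\bigr) = \delta(4) = 0
  \;&\Rightarrow\; g^{ik}\,\delta g_{ik} = -\,g_{ik}\,\delta g^{ik} \\
\delta g &= -\,g\, g_{ik}\,\delta g^{ik} \\
\delta\sqrt{-g} &= -\frac{\delta g}{2\sqrt{-g}}
  = \frac{g\, g_{ik}\,\delta g^{ik}}{2\sqrt{-g}}
  = -\frac{1}{2}\,\sqrt{-g}\; g_{ik}\,\delta g^{ik} .
\end{aligned}
```

The last step uses $g = -(\sqrt{-g})^2$, which holds because the determinant of a metric with Minkowskian signature is negative.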
Collecting all these we arrive at the following variation of the matter action part:
\[ \delta S_m = \frac{1}{2}\int T_{ik}\,\delta g^{ik}\,\sqrt{-g}\; d^4x , \tag{3.78} \]
This definition results from the variation of the generally covariant action integral with respect to the components of the metric tensor. But does this result agree with other results for energy and momentum, derived using Noether’s theorem? First of all, are the energy and momentum described by $T_{ik}$ conserved?
It can be shown that this tensor satisfies a local, covariant conservation law. Let us start from the variation of the matter action, (3.78). We interpret the components of the metric tensor $g^{ik}$ as coefficients in a linear coordinate transformation between the locally geodesic coordinates $\xi^k$ and the general coordinates $x^i$,
\[ g^{ik} = \frac{\partial \xi^i}{\partial x_k} . \tag{3.80} \]
\[ \delta g^{ik} = \frac{1}{2}\left( D^k \delta\xi^i + D^i \delta\xi^k \right) \tag{3.81} \]
is the variation of the metric tensor. The rules of integration by parts are valid for covariant integrals with the measure $\sqrt{-g}\,d^4x$, with the only difference that the simple partial derivatives have to be replaced by covariant derivatives. With this knowledge the variation of the matter action can be transformed into
\[ \delta S_m = -\int D^k (T_{ik})\,\delta\xi^i\,\sqrt{-g}\; d^4x . \tag{3.82} \]
Consequently the energy-momentum tensor $T_{ik}$ satisfies the following local, covariant conservation law:
\[ D_k T^{ki} = 0 . \tag{3.83} \]
For obtaining the full theory of gravity one needs to vary the action term containing
the scalar curvature, too. This we trace back to the variation of the Ricci and the
metric tensor. Here the former contains second derivatives of the latter. The variation
\[ \delta S_g / K = \delta \int R\,\sqrt{-g}\; d^4x = \delta \int g^{ik} R_{ik}\,\sqrt{-g}\; d^4x \tag{3.84} \]
contains three types of terms: the direct variation of the metric tensor, the variation
of the Ricci tensor, and the variation of the square root of the determinant. For the
last term we utilize the result (3.77). We obtain
\[ \delta S_g / K = \int \left( R_{ik} - \frac{1}{2}\, g_{ik} R \right) \delta g^{ik}\,\sqrt{-g}\; d^4x + \int g^{ik}\,\delta R_{ik}\,\sqrt{-g}\; d^4x . \tag{3.85} \]
In the last term above an exchange of the summation indices $\ell$ and $k$ does not change the result. Therefore
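\[ g^{ik}\,\delta R_{ik} = g^{ik}\,\frac{\partial}{\partial x^\ell}\,\delta\Gamma^\ell_{ik} - g^{i\ell}\,\frac{\partial}{\partial x^\ell}\,\delta\Gamma^k_{ik} . \tag{3.88} \]
\[ g^{ik}\,\delta R_{ik} = \frac{\partial w^\ell}{\partial x^\ell} , \tag{3.89} \]
\[ g^{ik}\,\delta R_{ik} = D_\ell w^\ell = \frac{1}{\sqrt{-g}}\,\frac{\partial}{\partial x^\ell}\left( w^\ell \sqrt{-g} \right) . \tag{3.90} \]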
\[ G_{ik} = R_{ik} - \frac{1}{2}\, g_{ik} R = \frac{8\pi\kappa}{c^4}\, T_{ik} . \tag{3.92} \]
Chapter 4
Electrodynamics: Forces, Fields, Waves
Here $\varepsilon_0$ is the dielectric constant of the vacuum. Of course, since the integrand is everywhere positive or zero, the minimal value of the above integral is also zero.
In the presence of charges, however, with a general electric charge density $\rho$, the electric field must satisfy the Gauss law:
\[ \nabla\cdot\mathbf{E} = \rho/\varepsilon_0 . \tag{4.2} \]
The functional derivative of this quantity with respect to the electric field delivers a
new equation:
\[ \frac{\delta W}{\delta \mathbf{E}} = \varepsilon_0\left(\mathbf{E} + \nabla\Phi\right) = 0 . \tag{4.4} \]
The variation with respect to the Lagrange multiplier, $\Phi$, trivially leads back to the Gauss law.
According to the variational solution, the electric field strength is the negative gradient of a scalar field, $\mathbf{E} = -\nabla\Phi$. This scalar field is the electric potential. An immediate consequence is the basic equation of electrostatics
\[ \nabla\times\mathbf{E} = 0 . \tag{4.5} \]
Using furthermore the Poisson equation, derived from the $\mathbf{E} = -\nabla\Phi$ relation and the Gauss law,
\[ \nabla^2\Phi = -\rho/\varepsilon_0 , \tag{4.8} \]
The total energy put into the electric field, $H_0$, based on Eq. (4.8) also coincides with this integrated value:
\[ H_0 = \frac{\varepsilon_0}{2}\int (\nabla\Phi)^2\, d^3r = \frac{1}{2}\int \rho\,\Phi\; d^3r , \tag{4.10} \]
Varying this functional with respect to $\Phi$, of course, gives back the Poisson equation (4.8).
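The equality of the two energy expressions in Eq. (4.10) can be illustrated numerically. The sketch below is a one-dimensional toy version and not the author's calculation: it assumes units with $\varepsilon_0 = 1$, a grounded interval (Φ = 0 at both ends), and a Gaussian sample charge density. The names `solve_poisson_1d` and `field_and_source_energy` are hypothetical helpers.

```python
import math

def solve_poisson_1d(rho, h):
    """Solve -phi'' = rho on a uniform grid with phi = 0 at both ends.

    rho holds the charge density at the interior grid points; units with
    epsilon_0 = 1 are an assumption of this sketch.
    """
    n = len(rho)
    # Thomas algorithm for the tridiagonal system
    #   -phi[i-1] + 2*phi[i] - phi[i+1] = h^2 * rho[i]
    c = [0.0] * n
    d = [0.0] * n
    c[0] = -0.5
    d[0] = 0.5 * h * h * rho[0]
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (h * h * rho[i] + d[i - 1]) / denom
    phi = [0.0] * n
    phi[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        phi[i] = d[i] - c[i] * phi[i + 1]
    return phi

def field_and_source_energy(rho, h):
    """Return ((1/2) * integral of (phi')^2, (1/2) * integral of rho * phi)."""
    phi = solve_poisson_1d(rho, h)
    padded = [0.0] + phi + [0.0]  # add the grounded boundary points
    field = 0.5 * h * sum(((padded[i + 1] - padded[i]) / h) ** 2
                          for i in range(len(padded) - 1))
    source = 0.5 * h * sum(r * p for r, p in zip(rho, phi))
    return field, source

# Example: a Gaussian charge blob in the middle of the unit interval.
n, h = 199, 1.0 / 200
rho = [math.exp(-(((i + 1) * h - 0.5) / 0.1) ** 2) for i in range(n)]
field, source = field_and_source_energy(rho, h)
```

On the grid the two numbers agree up to rounding, because the discrete analogue of the integration by parts that relates them holds exactly once the discrete Poisson equation is satisfied.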
4.2 Magnetostatics
It is a natural question whether something similar can be formulated for the connection between stationary currents and magnetic fields. The magnetic field $\mathbf{B}$ (see footnote 1) carries an energy analogous to the electrostatic case:
\[ H = \frac{1}{2}\int \mathbf{B}^2\, d^3r . \tag{4.12} \]
Footnote 1: We do not distinguish here between magnetic induction and magnetic field, treating the magnetic permeability as unity.
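Here, too, a condition has to be added with a Lagrange multiplier, connecting the currents with derivatives of the field strength. This is formulated in Ampère’s law,
\[ \nabla\times\mathbf{B} - \mathbf{j} = 0 , \tag{4.13} \]
with $\mathbf{j}$ being the current density vector. This extra condition is a vector (with three components in this case), therefore the Lagrange multiplier is also a vector field:
\[ W = \int \left[ \frac{1}{2}\,\mathbf{B}^2 - \mathbf{A}\cdot\left(\nabla\times\mathbf{B} - \mathbf{j}\right) \right] d^3r = \text{minimum} \tag{4.14} \]
\[ \frac{\delta W}{\delta \mathbf{B}} = \mathbf{B} - \nabla\times\mathbf{A} = 0 . \tag{4.15} \]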
To arrive at this formula with the correct signs one has to keep in mind that not only the integration by parts results in a relative minus sign, but also the permutation of indices in the Levi-Civita symbol inherent in the definition of the rotation:
\[ \int \mathbf{A}\cdot(\nabla\times\delta\mathbf{B})\, d^3r = \int \varepsilon_{ijk}\, A_i \nabla_j (\delta B_k)\, d^3r = -\int (\varepsilon_{ijk}\nabla_j A_i)\,\delta B_k\, d^3r = \int (\varepsilon_{ijk}\nabla_i A_j)\,\delta B_k\, d^3r . \tag{4.16} \]
Viewed from the variational principle, both the scalar potential introduced in electrostatics and the vector potential in magnetostatics play the role of Lagrange multipliers.
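Similarly to electrostatics, the $\mathbf{B}$ field can be eliminated by substituting Eq. (4.15) into Eq. (4.13):
\[ \nabla\times(\nabla\times\mathbf{A}) = \mathbf{j} . \tag{4.17} \]
The rotation of a rotation can be expressed as the gradient of the divergence minus the Laplace operator:
\[ \nabla\times(\nabla\times\mathbf{A}) = \nabla(\nabla\cdot\mathbf{A}) - \Delta\mathbf{A} , \tag{4.18} \]
\[ \mathbf{B} = \nabla\times\mathbf{A} = \nabla\times(\mathbf{A} + \nabla\chi) , \tag{4.19} \]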
since the rotation of a gradient is identically zero. This so-called gauge freedom allows us to choose a vector potential that simplifies our formulas. If we can arrange that the divergence of the new vector potential, $\mathbf{A}' = \mathbf{A} + \nabla\chi$, vanishes, i.e. $\nabla\cdot\mathbf{A}' = 0$, then by Eqs. (4.17) and (4.18) we obtain a purely Laplacian equation for each component:
\[ -\Delta\mathbf{A}' = \mathbf{j} . \tag{4.20} \]
This can be achieved if the function $\chi$ satisfies the condition
\[ \nabla\cdot(\mathbf{A} + \nabla\chi) = 0 , \tag{4.21} \]
that is,
\[ \Delta\chi = -\nabla\cdot\mathbf{A} . \tag{4.22} \]
As one easily recognizes, a little freedom of choice is left over, but it is restricted to solutions of the homogeneous Laplace equation.
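The gauge freedom expressed in Eq. (4.19) can be checked numerically at a single point. The following is a minimal sketch: the sample potential `A`, the gauge function χ = x²y + sin z behind `grad_chi`, the evaluation point, and the finite-difference helper `curl` are all choices made here for illustration.

```python
import math

def curl(field, point, h=1e-4):
    """Central-difference curl of a vector field at a point (numerical sketch)."""
    def d(component, axis):
        plus, minus = list(point), list(point)
        plus[axis] += h
        minus[axis] -= h
        return (field(plus)[component] - field(minus)[component]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

def A(r):
    """An arbitrary sample vector potential (chosen only for illustration)."""
    x, y, z = r
    return (-y * z, x * z, x * y)

def grad_chi(r):
    """Gradient of the sample gauge function chi = x^2 * y + sin(z)."""
    x, y, z = r
    return (2 * x * y, x * x, math.cos(z))

def A_gauged(r):
    """Gauge-transformed potential A' = A + grad(chi)."""
    return tuple(a + g for a, g in zip(A(r), grad_chi(r)))

point = (0.3, -0.7, 1.1)
B = curl(A, point)                # analytically (0, -2y, 2z) for this sample A
B_gauged = curl(A_gauged, point)  # should agree with B component by component
```

Both potentials give the same magnetic field at the chosen point, up to finite-difference rounding, because the added gradient term has vanishing rotation.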
It was James Clerk Maxwell’s discovery that besides the current density occurring in Eq. (4.13) there is another current, called the displacement current, proportional to the rate of change of the electric field in time. Using reduced units, in which the dielectric constant and the magnetic permeability are unity, we simply write
\[ \nabla\times\mathbf{B} = \mathbf{j} + \dot{\mathbf{E}} \tag{4.23} \]
Here the overdot denotes a partial derivative with respect to time. The task is to correct the current density, $\mathbf{j}$, in the variational principles discussed above. However, since electric and magnetic effects combine with each other in the dynamics, we have to construct a single dynamical variational principle.
In accordance with experimental results, not the sum but the difference of the electric and magnetic field energies enters this new principle, analogous to the difference of kinetic and potential energy terms in the Lagrange function. The variational principle of full electrodynamics is therefore constructed like the action in mechanics:
\[ S = \int dt\, \{ W_e - W_m \} \tag{4.24} \]
with
\[ W_e = \int d^3r \left[ \frac{1}{2}\,\mathbf{E}^2 - \Phi\,(\nabla\cdot\mathbf{E} - \rho) \right], \qquad W_m = \int d^3r \left[ \frac{1}{2}\,\mathbf{B}^2 - \mathbf{A}\cdot\left(\nabla\times\mathbf{B} - \mathbf{j} - \dot{\mathbf{E}}\right) \right]. \tag{4.25} \]
In these formulas we use units in which the dielectric constant and the magnetic permeability are unity; this means measuring charge and current in appropriate units. This also sets the speed of light to unity, as is frequently done in the theory of relativity.
The action integral of electrodynamics extends over space and time, and the action functional is invariant. This relativistic invariance under Lorentz transformations, however, stays hidden in the present notation. At the same time it should be noted already at this point that the terms connected to external sources, i.e. charge and current densities, are collected in a form identical to the scalar product of Lorentz four-vectors: $(\rho, \mathbf{j}) \cdot (\Phi, \mathbf{A})$.
The full electrodynamics action (4.24) is constructed by involving the Gauss and Ampère laws. Nothing else has to be added: the description of the induction discovered by Faraday follows automatically from this action by functional differentiation. Variations with respect to $\Phi$ and $\mathbf{A}$ give back the conditions, but variations with respect to the field strength vectors $\mathbf{E}$ and $\mathbf{B}$ lead to new relations:
\[ \frac{\delta S}{\delta \mathbf{E}} = \mathbf{E} + \nabla\Phi + \dot{\mathbf{A}} = 0 , \qquad \frac{\delta S}{\delta \mathbf{B}} = -\mathbf{B} + \nabla\times\mathbf{A} = 0 . \tag{4.26} \]
All this can be obtained by integrating by parts in both the spatial and the temporal integrals. The Faraday equation describing magnetic induction follows from the two lines of Eq. (4.26): one only has to compare the rotation of the first line with the (partial) time derivative of the second line:
\[ \nabla\times\mathbf{E} = -\nabla\times\dot{\mathbf{A}} = -\dot{\mathbf{B}} . \tag{4.27} \]
\[ \nabla\cdot\mathbf{B} = 0 . \tag{4.28} \]
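with
\[ \begin{aligned} F_e &= \Phi\,(\nabla\cdot\mathbf{E} - \rho) , \\ F_m &= \Phi_m\,(\nabla\cdot\mathbf{B} + \rho_m) , \\ G_e &= \mathbf{A}\cdot\left(\nabla\times\mathbf{B} - \mathbf{j} - \dot{\mathbf{E}}\right) , \\ G_m &= \mathbf{A}_m\cdot\left(\nabla\times\mathbf{E} + \mathbf{j}_m + \dot{\mathbf{B}}\right) . \end{aligned} \tag{4.30} \]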
The list above contains, besides the original electric contributions, indexed by e,
also a dual counterpart, indexed by m. In this setting the variations against Lagrange
multipliers directly deliver the original Maxwell equations in the language of field
strengths, at least when we substitute ρm = 0 and jm = 0.
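\[ \begin{aligned} \frac{\delta S_d}{\delta \Phi} &= -\nabla\cdot\mathbf{E} + \rho = 0 , \\ \frac{\delta S_d}{\delta \Phi_m} &= -\nabla\cdot\mathbf{B} - \rho_m = 0 , \\ \frac{\delta S_d}{\delta \mathbf{A}} &= \nabla\times\mathbf{B} - \mathbf{j} - \dot{\mathbf{E}} = 0 , \\ \frac{\delta S_d}{\delta \mathbf{A}_m} &= \nabla\times\mathbf{E} + \mathbf{j}_m + \dot{\mathbf{B}} = 0 . \end{aligned} \tag{4.31} \]
In this interpretation the variations with respect to the field strength components deliver relations between potentials (Lagrange multipliers) and field strengths:
\[ \begin{aligned} \frac{\delta S_d}{\delta \mathbf{E}} &= \mathbf{E} + \nabla\Phi + \dot{\mathbf{A}} + \nabla\times\mathbf{A}_m = 0 , \\ \frac{\delta S_d}{\delta \mathbf{B}} &= -\mathbf{B} + \nabla\times\mathbf{A} - \dot{\mathbf{A}}_m + \nabla\Phi_m = 0 . \end{aligned} \tag{4.32} \]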
Studying the Maxwell equations without source terms reveals that a so-called dual transformation exchanges the magnetic and electric fields in an antisymmetric manner. The following transformation,
\[ \mathbf{E}' = -\mathbf{B} , \qquad \mathbf{B}' = \mathbf{E} , \tag{4.33} \]
changes the action into its negative. This means that the extremizing variational equations, reflecting the observable dynamics, do not change.
When using all (real and imagined) source terms we need more: in order to turn $S_d$ defined in (4.29) into its negative, the following changes also have to apply:
This duality can be taken into account in a remarkably simple and elegant way by using a complex field strength vector,
\[ \mathbf{F} = \mathbf{E} + i\mathbf{B} , \tag{4.35} \]
with $i$ being the imaginary unit. The dual transformation is then simply a multiplication by $i$: $\mathbf{F}' = i\mathbf{F}$. The complete dual action itself is easier to handle when using complex quantities. For the sake of brevity, from now on the unindexed quantities denote the complex constructions,
\[ \mathbf{A} = \mathbf{A}_e + i\mathbf{A}_m , \qquad \Phi = \Phi_e - i\Phi_m , \qquad \mathbf{j} = \mathbf{j}_e + i\mathbf{j}_m , \qquad \rho = \rho_e - i\rho_m . \tag{4.36} \]
In this case the dual transformations listed under Eq. (4.34) are also achieved by a simple multiplication by $i$.
Let us now formulate a new action principle, without the magnetic charges and their current but using the complex field strength:
\[ S_d = \int dt\, d^3r \left[ \frac{1}{2}\,\mathbf{F}^2 - \Phi\,(\nabla\cdot\mathbf{F} - \rho) - \mathbf{A}\cdot\left( i\nabla\times\mathbf{F} + \dot{\mathbf{F}} + \mathbf{j} \right) \right] , \tag{4.37} \]
and its extremum delivers electrodynamics. In the above formula conditions resembling the Gauss and Ampère laws are involved in complex notation. The square of the complex field strength,
\[ \mathbf{F}^2 = \mathbf{E}^2 - \mathbf{B}^2 + 2i\,\mathbf{E}\cdot\mathbf{B} \tag{4.38} \]
gives back the traditional term $E^2 - B^2$ as its real part, but its imaginary part is a new contribution at this level. We have to convince ourselves that this new term does not spoil the original theory!
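The complex bookkeeping of Eqs. (4.33), (4.35) and (4.38) can be tried out with ordinary complex numbers. The sketch below uses arbitrary sample field values; the helper names `complex_field`, `dual` and `square` are introduced here for illustration only.

```python
def complex_field(E, B):
    """Build F = E + iB component-wise, cf. Eq. (4.35)."""
    return [e + 1j * b for e, b in zip(E, B)]

def dual(F):
    """Dual transformation F' = iF."""
    return [1j * f for f in F]

def square(F):
    """Bilinear square F.F (no complex conjugation), cf. Eq. (4.38)."""
    return sum(f * f for f in F)

E = [1.0, 2.0, -0.5]   # sample values, chosen only for illustration
B = [0.3, -1.0, 4.0]
F = complex_field(E, B)
F_dual = dual(F)
E_dual = [f.real for f in F_dual]   # equals -B, as in Eq. (4.33)
B_dual = [f.imag for f in F_dual]   # equals  E, as in Eq. (4.33)
```

Multiplying by $i$ reproduces the dual transformation component by component, and `square(F_dual)` is the negative of `square(F)`, which mirrors the sign flip of the field part of the action under duality.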
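From here on we set the magnetic-monopole-like terms, indexed with $m$, to zero, but keep the complex notation. The variations with respect to the complex functions then lead to the following results.
Variation with respect to the complex field strength:
\[ \frac{\delta S_d}{\delta \mathbf{F}} = \mathbf{F} + \nabla\Phi + \dot{\mathbf{A}} - i\nabla\times\mathbf{A} = 0 . \tag{4.39} \]
The real part of this complex equation connects the electric field, and its imaginary part the magnetic field, with the Lagrange multipliers: $\mathbf{E} + \nabla\Phi + \dot{\mathbf{A}} = 0$ and $\mathbf{B} - \nabla\times\mathbf{A} = 0$.
Variation with respect to the scalar potential results in
\[ \frac{\delta S_d}{\delta \Phi} = -\nabla\cdot\mathbf{F} + \rho = 0 , \tag{4.40} \]
whose real part is the Gauss law, $\nabla\cdot\mathbf{E} = \rho$, and whose imaginary part expresses the absence of magnetic monopoles, $\nabla\cdot\mathbf{B} = 0$.
Finally, variation with respect to the vector potential delivers
\[ \frac{\delta S_d}{\delta \mathbf{A}} = -i\nabla\times\mathbf{F} - \dot{\mathbf{F}} - \mathbf{j} = 0 , \tag{4.41} \]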
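disturbing? Fortunately, the variation of the imaginary part of the action does not lead to new equations:
\[ \begin{aligned} \frac{\delta}{\delta \mathbf{B}}\, \Im m\, S_d &= \mathbf{E} + \nabla\Phi + \dot{\mathbf{A}} = \frac{\delta}{\delta \mathbf{E}}\, \Re e\, S_d , \\ \frac{\delta}{\delta \mathbf{E}}\, \Im m\, S_d &= \mathbf{B} - \nabla\times\mathbf{A} = -\frac{\delta}{\delta \mathbf{B}}\, \Re e\, S_d , \end{aligned} \tag{4.43} \]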
The structure of the above functional derivative relations is a full analogy to the
Cauchy–Riemann relations known from the theory of complex analytic functions.
Summarizing briefly, the complex action is a complex analytic functional of the
complex field strength.
The Maxwell equations not only unified the descriptions of electric and magnetic behavior, but also predicted a brand new physical phenomenon: electromagnetic waves. Visible light occupies a particularly narrow wavelength range of them. The existence of waves must be recognizable on the level of the variational principle, too. The clue is the Poisson equation; there we have seen that formulating the action in terms of the Lagrange multiplier fields (in electrostatics the scalar potential, $\Phi$) leads to a second-order partial differential equation. A similar form can be achieved for the magnetic sector as well, but there we also exploited the gauge freedom. In this section we rearrange the action of full electrodynamics containing the complex field strength in such a way that the description of electromagnetic waves becomes obvious.
In the action functional presented in Eq. (4.37) we integrate by parts those terms which contain derivatives of the complex field strength, $\mathbf{F}$. There are three such terms, containing a divergence, a rotation and a partial time derivative, respectively.
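\[ S_d = \int dt\, d^3r \left[ \frac{1}{2}\,\mathbf{F}^2 - \Phi\,\bigl( \underbrace{\nabla\cdot\mathbf{F}}_{\text{by parts}} - \rho \bigr) - \mathbf{A}\cdot\bigl( \underbrace{i\nabla\times\mathbf{F} + \dot{\mathbf{F}}}_{\text{by parts}} + \mathbf{j} \bigr) \right] \tag{4.44} \]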
with
\[ \boldsymbol{\Phi} = \nabla\Phi + \dot{\mathbf{A}} - i\nabla\times\mathbf{A} . \tag{4.47} \]
It is obvious that the vanishing of the first, quadratic term at $\mathbf{F} = -\boldsymbol{\Phi}$ delivers the known relations between field strengths and potentials. At the same time this accomplishes the optimization of a Lagrange density that is a complete square: the variation with respect to the complex field strength, $\mathbf{F}$, vanishes. The remaining action, similarly to the manipulations in the derivation of the Poisson equation, simplifies to
\[ S_0 = \int dt\, d^3r \left[ \rho\,\Phi - \mathbf{j}\cdot\mathbf{A} - \frac{1}{2}\,\boldsymbol{\Phi}^2 \right] . \tag{4.48} \]
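Using the expression (4.47) for the complex potential, $\boldsymbol{\Phi}$, the real and imaginary parts of the above action read:
\[ \begin{aligned} \Re e\, S_0 &= \int dt\, d^3r \left[ \rho\,\Phi - \mathbf{j}\cdot\mathbf{A} - \frac{1}{2}\left( \nabla\Phi + \dot{\mathbf{A}} \right)^2 + \frac{1}{2}\left( \nabla\times\mathbf{A} \right)^2 \right] , \\ \Im m\, S_0 &= \int dt\, d^3r \left( \nabla\Phi + \dot{\mathbf{A}} \right)\cdot\left( \nabla\times\mathbf{A} \right) . \end{aligned} \tag{4.49} \]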
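For our analysis of the variations, we start with the imaginary part:
\[ \begin{aligned} \frac{\delta}{\delta \Phi}\, \Im m\, S_0 &= -\nabla\cdot(\nabla\times\mathbf{A}) = 0 , \\ \frac{\delta}{\delta \mathbf{A}}\, \Im m\, S_0 &= \nabla\times\dot{\mathbf{A}} - \frac{\partial}{\partial t}\left( \nabla\times\mathbf{A} \right) = 0 . \end{aligned} \tag{4.50} \]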
We conclude that the variations with respect to both the scalar and the vector potential lead to identities! Therefore the dynamics described by the complex reduced action and by its real part alone are identical. In this sense the complex action is equivalent to its real part.
The equations governing the dynamics of the potentials are therefore derived from the real part of the complex action. We must arrive at the same result if we vary the full complex action with respect to the scalar and vector potentials and then take the real part. The variations are
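\[ \begin{aligned} \frac{\delta}{\delta \Phi}\, S_0 &= \nabla\cdot\left( \nabla\Phi + \dot{\mathbf{A}} - i\nabla\times\mathbf{A} \right) + \rho = 0 , \\ \frac{\delta}{\delta \mathbf{A}}\, S_0 &= \frac{\partial}{\partial t}\left( \nabla\Phi + \dot{\mathbf{A}} - i\nabla\times\mathbf{A} \right) + i\nabla\times\left( \nabla\Phi + \dot{\mathbf{A}} - i\nabla\times\mathbf{A} \right) - \mathbf{j} = 0 . \end{aligned} \tag{4.51} \]
The real parts of the above equations contain the second time and spatial derivatives of the potentials:
\[ \begin{aligned} \Delta\Phi + \nabla\cdot\dot{\mathbf{A}} &= -\rho , \\ \nabla\dot{\Phi} + \ddot{\mathbf{A}} + \nabla\times(\nabla\times\mathbf{A}) &= \mathbf{j} . \end{aligned} \tag{4.52} \]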
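Here in the second equation the rotation of the rotation of the vector potential appears, which can be expressed using the Laplace operator. So we obtain the following two equations with charge and current densities as source terms:
\[ \begin{aligned} -\frac{\partial}{\partial t}\left( \dot{\Phi} + \nabla\cdot\mathbf{A} \right) + \ddot{\Phi} - \Delta\Phi &= \rho , \\ \nabla\left( \dot{\Phi} + \nabla\cdot\mathbf{A} \right) + \ddot{\mathbf{A}} - \Delta\mathbf{A} &= \mathbf{j} . \end{aligned} \tag{4.53} \]
\[ \nabla\cdot\mathbf{A} + \dot{\Phi} = 0 . \tag{4.54} \]
With this condition something remarkable happens: in the first and second line of Eq. (4.53) the first terms vanish, and the rest become pure wave equations for the scalar and vector potential:
\[ \begin{aligned} \ddot{\Phi} - \Delta\Phi &= \rho , \\ \ddot{\mathbf{A}} - \Delta\mathbf{A} &= \mathbf{j} . \end{aligned} \tag{4.55} \]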
Particular solutions to these wave equations in the absence of sources, i.e. charges and currents, are the electromagnetic waves (EM waves). They can be observed experimentally at numerous frequencies and are utilized in various technologies. Beyond visible light, radio waves, microwaves and infrared radiation are also part of modern technology and of everyday life. On the other hand, our picture of the distant universe came exclusively from the detection of these waves until, a few years ago, gravitational waves were detected as well. EM waves propagate in vacuum with the speed of light, taken as unity in this chapter. They describe vibrations transverse to the direction of propagation. At the same time they carry energy and momentum, exactly in the direction of their propagation. The relation between energy and momentum, corresponding to the relation between frequency and wave number vector, i.e. the dispersion relation of the EM wave, is that of a massless particle, the photon.
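The dispersion relation mentioned above can be made explicit by inserting a plane wave into the source-free version of the wave equation (4.55). The amplitude $\Phi_0$, the wave vector $\mathbf{k}$ and the frequency $\omega$ are symbols introduced here for illustration, in the units of this chapter with the speed of light set to unity:

```latex
\Phi(\mathbf{r}, t) = \Phi_0\, e^{\,i(\mathbf{k}\cdot\mathbf{r} - \omega t)}
\quad\Longrightarrow\quad
\ddot{\Phi} - \Delta\Phi = \bigl(-\omega^2 + |\mathbf{k}|^2\bigr)\,\Phi = 0
\quad\Longrightarrow\quad
\omega = \pm\,|\mathbf{k}| .
```

A relation of the form $\omega = |\mathbf{k}|$ corresponds to energy proportional to the magnitude of momentum, with no rest mass term.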
In the previous section we have seen that the EM wave equations could be derived from the complex variational principle in electrodynamics only if a particular gauge fixing, the Lorenz gauge, is applied. This gauge fixing choice, however, is not only a condition on the level of equations: it can, and should, be built into the action itself. Almost any variational action can be supplemented by side conditions using the method of Lagrange. We now include the Lorenz gauge fixing in a quadratic form, with a special coefficient, 1/2:
\[ \Re e\, S_0 = \int dt\, d^3r \left[ -\frac{1}{2}\left( \nabla\Phi + \dot{\mathbf{A}} \right)^2 + \frac{1}{2}\left( \dot{\Phi} + \nabla\cdot\mathbf{A} \right)^2 + \frac{1}{2}\left( \nabla\times\mathbf{A} \right)^2 + \left( \rho\,\Phi - \mathbf{j}\cdot\mathbf{A} \right) \right] . \tag{4.56} \]
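In this case the sum of the first two terms can be rearranged so that the mixed terms together constitute a total time derivative. Such parts of a Lagrangian can be omitted from the action. In more detail,
\[ -\frac{1}{2}\left( \nabla\Phi + \dot{\mathbf{A}} \right)^2 + \frac{1}{2}\left( \dot{\Phi} + \nabla\cdot\mathbf{A} \right)^2 = \frac{1}{2}\left[ \dot{\Phi}^2 - (\nabla\Phi)^2 \right] - \frac{1}{2}\left[ \dot{\mathbf{A}}^2 - (\nabla\cdot\mathbf{A})^2 \right] + \left[ \dot{\Phi}\,\nabla\cdot\mathbf{A} - \dot{\mathbf{A}}\cdot\nabla\Phi \right] , \tag{4.57} \]
from which, after integrating by parts the contribution of the last bracket, once in the time and once in the space integral, we obtain:
\[ \int dt\, d^3r \left[ \dot{\Phi}\,\nabla\cdot\mathbf{A} - \dot{\mathbf{A}}\cdot\nabla\Phi \right] = \int dt\, d^3r \left[ \dot{\Phi}\,\nabla\cdot\mathbf{A} + \Phi\,\nabla\cdot\dot{\mathbf{A}} \right] . \tag{4.58} \]
This expression contains the time derivative of a product, whose contribution after integration can again be left out of the action:
\[ \int dt\, d^3r \left[ \dot{\Phi}\,\nabla\cdot\mathbf{A} + \Phi\,\nabla\cdot\dot{\mathbf{A}} \right] = \int dt\, \frac{\partial}{\partial t} \int d^3r\, \left( \Phi\,\nabla\cdot\mathbf{A} \right) = 0 . \tag{4.59} \]
What is left of the action of electrodynamics after including the Lorenz gauge in quadratic form is as follows:
\[ \Re e\, S_0 = \int dt\, d^3r \left[ \frac{1}{2}\left( \dot{\Phi}^2 - (\nabla\Phi)^2 \right) - \frac{1}{2}\left( \dot{\mathbf{A}}^2 - (\nabla\cdot\mathbf{A})^2 - (\nabla\times\mathbf{A})^2 \right) + \left( \rho\,\Phi - \mathbf{j}\cdot\mathbf{A} \right) \right] . \tag{4.60} \]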
This formula is now more symmetric with respect to the scalar and vector potentials, and we start to see Lorentz-invariant structures. The combinations of the type time-derivative squared minus space-derivative squared, as well as the last term resembling the product of four-vectors, reflect the signature of the metric in Minkowski spacetime.
The gauge-fixed reduced action, Eq. (4.60), varied with respect to the potentials now directly delivers wave equations:
\[ \begin{aligned} \frac{\delta}{\delta \Phi}\, \Re e\, S_0 &= -\ddot{\Phi} + \Delta\Phi + \rho = 0 , \\ \frac{\delta}{\delta \mathbf{A}}\, \Re e\, S_0 &= \ddot{\mathbf{A}} - \Delta\mathbf{A} - \mathbf{j} = 0 . \end{aligned} \tag{4.61} \]
The complex field strength notation suggests another interesting observation beyond the dual symmetry and Lorentz transformation behavior discussed above: the electromagnetic field equations in vacuum, written in complex notation, resemble a quantum mechanical setup. In the Heisenberg representation of energy and momentum the complex field strength behaves as a polarization vector part of a wave function in the Schrödinger picture. This means that the electromagnetic action, and therefore the Maxwell equations, describe a spin-one wave. The impossibility of a longitudinal photon is an extra condition on top of that; the sourceless Gauss equation enforces it.
The Maxwell equations in vacuum for the complex field strength, $\mathbf{F} = \mathbf{E} + i\mathbf{B}$, are as follows:
\[ \dot{\mathbf{F}} + i\,(\nabla\times\mathbf{F}) = 0 , \qquad \nabla\cdot\mathbf{F} = 0 . \tag{4.62} \]
This equation has the same form as a time-dependent Schrödinger equation. Albert Einstein “almost” created quantum theory by studying photons; what he could not make peace with was the uncertainty relation. The classical theory describing photons is very similar to the Schrödinger formalism; only the operator interpretation appears strained. In the next chapter, dealing with the variational principle behind quantum mechanics, we shall see that neither the operator algebra nor the uncertainty relation is needed for obtaining the Schrödinger equation.
Introducing now the momentum operator component-wise, $p_j = \frac{1}{i}\nabla_j$, Eq. (4.63) delivers a particular Hamilton operator:
\[ H_{ik} = i\,\varepsilon_{ijk}\, p_j . \tag{4.64} \]
This operator in matrix representation has a spectrum; the eigenvalues can be calculated by solving the characteristic equation
\[ \det\left( i\,\varepsilon_{ijk}\, p_j - \omega\,\delta_{ik} \right) = 0 . \tag{4.65} \]
In this expression the triple product $i\,p_1 p_2 p_3$ occurs twice, but with opposite signs, so these terms cancel. In the rest the sum of the squares of the $p_i$ components occurs, which is the squared length of the momentum vector. So the characteristic equation simplifies to
\[ D = -\omega\,(\omega^2 - |\mathbf{p}|^2) = 0 . \tag{4.68} \]
\[ i\,\mathbf{p}\times\mathbf{F} = \omega\,\mathbf{F} , \qquad i\,\mathbf{p}\cdot\mathbf{F} = 0 , \tag{4.69} \]
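The characteristic equation can be verified for a concrete momentum. The sketch below builds the 3×3 matrix of Eq. (4.64) for numerical components of $\mathbf{p}$ (treated as plain numbers, which is an assumption appropriate for a plane wave) and evaluates the determinant of Eq. (4.65) by explicit cofactor expansion. The function names `hamiltonian` and `char_det` are illustrative.

```python
def hamiltonian(p):
    """H_ik = i * eps_ijk * p_j as a 3x3 complex matrix, cf. Eq. (4.64)."""
    p1, p2, p3 = p
    return [[0, -1j * p3, 1j * p2],
            [1j * p3, 0, -1j * p1],
            [-1j * p2, 1j * p1, 0]]

def char_det(p, omega):
    """Determinant of (H - omega * identity), cf. Eq. (4.65)."""
    H = hamiltonian(p)
    M = [[H[i][k] - (omega if i == k else 0) for k in range(3)] for i in range(3)]
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

p = (1.0, 2.0, 2.0)                    # sample momentum with |p| = 3
roots = [0.0, 3.0, -3.0]               # candidate eigenvalues 0 and +/- |p|
values_at_roots = [char_det(p, w) for w in roots]
value_off_root = char_det(p, 1.0)      # compare with -w * (w^2 - |p|^2) at w = 1
```

For this sample momentum the determinant vanishes at ω = 0 and ω = ±|p|, and away from the roots it agrees with the closed form of Eq. (4.68). The root ω = 0 belongs to the longitudinal direction, which the condition $\mathbf{p}\cdot\mathbf{F} = 0$ excludes.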
We have experienced in the previous section how to use complex fields in exploring
the symmetries of the electrodynamical action. The question arises now, whether
further algebraic structures may also hide in the system of Maxwell equations, or in
the action in the background, respectively. The answer is positive as we know from
the history of special relativity, since the formulas of the Lorentz transformation,
mixing the components of four vectors linearly, are inherent in the equations of
electrodynamics.
On the level of the action our task is to find the elegant and concise structure whose composed scalar serves as the Lagrange density. The theory of special relativity, unifying time and space coordinates and potential components into four-vectors, hints at this. Such structures had occurred, however, in physics and mathematics well before the theory of relativity and the Minkowski geometry, in particular as results of mathematical attempts to generalize the highly successful complex numbers. Hamilton called these four-element structures quaternions, while Gauss referred to them as hypercomplex numbers. These two structures are equivalent. All operations with such quantities can be derived from the corresponding operations of certain “unit elements”, spanning a ring over the algebra.
In general, complex numbers can be given by two real numbers, and their multiplication rules are based on the corresponding rules for the imaginary unit, $i$, and the real unit, 1. The product of general complex numbers, $(x, y) = x + iy$, owing to the rule $i^2 = -1$, follows the pattern:
Beyond the real number components we have three, pairwise “orthogonal” imaginary
units, i, j, k. Using them as “axes” we obtain the quaternion representation.
However, one problem is still left. If $k = ji$ is another imaginary unit, then what is $ij$? In a product of two different hypercomplex numbers both orders of $i$ and $j$ occur. We have to find a consistent convention which closes the multiplication table of the quaternion algebra. It can be proven that the only correct choice is $ij = -k$, if we want to preserve the defining property of imaginary units, the square being minus one, for all of them: $i^2 = j^2 = k^2 = -1$. Since according to the above rules we have $ijk = ij\,ji = -i^2 = 1$, it follows that $-ij = ijk^2 = ijk \cdot k = k$, so $ij = -k$.
From the same requirement it follows that exchanging the order of different imaginary units introduces a sign change, i.e. these quaternion imaginary units anticommute. Counting them with the index notation of Minkowski geometry from 0 to 3, it is convenient to define $e_0 = 1$, $e_1 = j$, $e_2 = i$ and $e_3 = k$, so that the quaternion algebra reads
\[ e_0^2 = e_0 , \qquad e_0 e_i = e_i e_0 = e_i , \qquad e_i e_j = -\delta_{ij}\, e_0 + \varepsilon_{ijk}\, e_k , \tag{4.72} \]
So a general quaternion can be viewed as a structure of four real numbers, but also as a combined structure of a scalar and a three-vector:
We shall see that the scalar–vector notation fits spacetime and the electromagnetic equations well; the symmetries of the action are reflected in the multiplication rule above. The product of two general quaternions, based on the above rule, becomes
or in 3-vector notation
These rules combine the scalar and vector product applied to three-vectors. This shows that they are suited for exploring the algebraic structure of the Maxwell equations.
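The multiplication rules (4.72) can be represented in various ways. The most economical one uses the Pauli matrices with a factor $i$ and the $2 \times 2$ unit matrix:
\[ e_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} , \quad e_1 = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} , \quad e_2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} , \quad e_3 = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix} \tag{4.76} \]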
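That the matrices of Eq. (4.76) satisfy the multiplication rules (4.72) can be confirmed by brute force. The sketch below stores the four matrices as nested tuples; the helper names `matmul`, `levi_civita` and `rule_holds`, and the zero-based index labels, are conventions of this example only.

```python
E0 = ((1, 0), (0, 1))
E1 = ((1j, 0), (0, -1j))
E2 = ((0, 1), (-1, 0))
E3 = ((0, 1j), (1j, 0))
UNITS = (E1, E2, E3)

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(a[r][n] * b[n][c] for n in range(2)) for c in range(2))
                 for r in range(2))

def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices taken from {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def rule_holds(i, j):
    """Check e_i e_j = -delta_ij e_0 + eps_ijk e_k.

    The indices i, j = 0, 1, 2 label e_1, e_2, e_3 of the text.
    """
    lhs = matmul(UNITS[i], UNITS[j])
    for r in range(2):
        for c in range(2):
            rhs = -E0[r][c] if i == j else 0
            rhs += sum(levi_civita(i, j, k) * UNITS[k][r][c] for k in range(3))
            if abs(lhs[r][c] - rhs) > 1e-12:
                return False
    return True

all_ok = all(rule_holds(i, j) for i in range(3) for j in range(3))
```

All nine products pass the check. In particular `matmul(E1, E2)` gives `E3`, which is the matrix form of $k = ji$ with the identification $e_1 = j$, $e_2 = i$, $e_3 = k$.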
The quaternion conjugate changes the sign of the vector part (all imaginary parts), $Q^\dagger = (q_0, -\mathbf{q})$, so it can be used to form the squared length as a genuine scalar quantity:
\[ |Q|^2 = Q\,Q^\dagger = Q^\dagger Q = (q_0^2 + \mathbf{q}^2,\; \mathbf{0}) \tag{4.77} \]
is symmetric, contains only a real component, and is zero if and only if all components are zero. On the other hand the square of a quaternion, like the square of a complex number, is still a general quaternion, $Q^2 = (q_0^2 - \mathbf{q}^2,\; 2q_0\mathbf{q})$.
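Both statements can be checked with a small scalar–vector implementation of the quaternion product. The product formula used in `qmul` is the one that follows from Eq. (4.72), namely $(p_0, \mathbf{p})(q_0, \mathbf{q}) = (p_0 q_0 - \mathbf{p}\cdot\mathbf{q},\; p_0\mathbf{q} + q_0\mathbf{p} + \mathbf{p}\times\mathbf{q})$; the tuple representation and the sample values are assumptions of this sketch.

```python
def dot(a, b):
    """Bilinear scalar product of two 3-tuples."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product of two 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def qmul(P, Q):
    """Quaternion product in scalar-vector form, derived from Eq. (4.72)."""
    p0, p = P
    q0, q = Q
    pxq = cross(p, q)
    return (p0 * q0 - dot(p, q),
            tuple(p0 * q[n] + q0 * p[n] + pxq[n] for n in range(3)))

def conj(Q):
    """Quaternion conjugate: flip the sign of the vector part."""
    return (Q[0], tuple(-x for x in Q[1]))

Q = (2.0, (1.0, -3.0, 0.5))     # sample values, for illustration only
norm_sq = qmul(Q, conj(Q))      # expected form: (q0^2 + q^2, 0)
square = qmul(Q, Q)             # expected form: (q0^2 - q^2, 2*q0*q)
```

For the sample quaternion the product with its conjugate has a vanishing vector part, while the plain square keeps a vector part proportional to $\mathbf{q}$, as stated in the text.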
The quaternions inherent in electrodynamics, however, contain their vectorial part multiplied by another imaginary unit, $i$, due to the Minkowski metric. This $i$ is different from all quaternion units $e_k$! Only the vector component is purely imaginary. The four-vector potential is therefore represented by a quaternion, $A = (\Phi, i\mathbf{A})$, and the derivatives with respect to space and time coordinates by the partial derivative quaternion, $\partial = (\partial_t, i\nabla)$. In this section the field strength is also a quaternion; for a later purpose we define it as a conjugate one:
\[ F = \partial^\dagger A^\dagger . \tag{4.78} \]
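Examining the scalar and vector components of the result, $F = (G, i\mathbf{F})$, we find that the 3-vector part is the complex field strength, $\mathbf{F} = \mathbf{E} + i\mathbf{B}$, and the scalar part, $G = \dot{\Phi} + \nabla\cdot\mathbf{A}$, is exactly the quantity set to zero in the Lorenz gauge. Half of the squared length of the field strength quaternion delivers the part of the Lagrange density without the coupling to charges and currents:
\[ \frac{1}{2}\, F^\dagger F = \frac{1}{2}\left( G^2 - \mathbf{F}^2 \right) , \tag{4.80} \]
which, after substituting the expressions for $G$ and $\mathbf{F}$ from Eq. (4.56), gives
\[ \frac{1}{2}\, F^\dagger F = \frac{1}{2}\left( \dot{\Phi} + \nabla\cdot\mathbf{A} \right)^2 - \frac{1}{2}\left( \dot{\mathbf{A}} + \nabla\Phi - i\nabla\times\mathbf{A} \right)^2 . \tag{4.81} \]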
For the coupling to charges and currents we utilize the four-current quaternion, $J = (\rho, i\mathbf{j})$. This, multiplied by the four-vector potential quaternion, still contains some vectorial components. We cure this problem by adding the conjugate term.
and
\[ A\,J^\dagger = (\Phi, i\mathbf{A}) \cdot (\rho, -i\mathbf{j}) \tag{4.83} \]
\[ \frac{1}{2}\left( J A^\dagger + A J^\dagger \right) = (\rho\,\Phi - \mathbf{j}\cdot\mathbf{A},\; \mathbf{0}) . \tag{4.84} \]
Based on all this, the variational principle of electrodynamics as a functional of the quaternion fields and currents, in Lorenz gauge, takes the form
\[ S = \int d^4x\; \frac{1}{2}\left[ F^\dagger F + J A^\dagger + A J^\dagger \right] . \tag{4.85} \]
This form is simple and elegant. Moreover the wave equations can be derived from
this easily. Only the basic properties of the quaternion multiplication and conjugation
have to be used for this. Since the conjugate of a product is the product of the conjugates in opposite order (cf. the simplest quaternion representation, which consists of matrices), the conjugate quaternion field strength can be expressed by using the derivative operator “acting backwards”:
\[ F^\dagger = \left( \partial^\dagger A^\dagger \right)^\dagger = A\,\overleftarrow{\partial} . \tag{4.86} \]
Here the partial derivative quaternion stands after the four-potential quaternion,
but it acts “backwards”, on the components of $A$. This is denoted by the backward
arrow over the partial derivative quaternion.
Using all this we obtain for the part independent of sources
$$S_0 = \frac{1}{2}\int d^4x\, F^\dagger F = \frac{1}{2}\int d^4x\, \left(A\,\overleftarrow{\partial}\right)\left(\partial^\dagger A^\dagger\right). \tag{4.87}$$
Integration by parts can be arranged in a single step in this notation. The “backwards”
derivative then becomes a “forward” derivative, at the price of a sign change:
$$S_0 = -\frac{1}{2}\int d^4x\, A\left(\partial\,\partial^\dagger\right)A^\dagger. \tag{4.88}$$
In this form only the squared length of the partial derivative quaternion
occurs, which is a scalar operator:
$$\Box = \partial\,\partial^\dagger = \left(\partial_t^2 - \nabla^2,\, 0\right). \tag{4.89}$$
$$\frac{\delta S}{\delta A} = -\Box A^\dagger + J^\dagger = 0,$$
$$\frac{\delta S}{\delta A^\dagger} = -\Box A + J = 0. \tag{4.91}$$
The result of the variation with respect to the quaternion four-vector potential is a pure
wave equation for all components (since $\Box$ is a quaternion scalar operation): $\Box A = J$.
Viewing the electromagnetic field strength as a quaternion plays a useful role
beyond the elegant formulation of the action and the simple derivation of the wave
equation. It can also be used for constructing the components of the energy–momentum
tensor. If in the product with $F$ we use the component-wise complex conjugate
instead of the quaternion conjugate, $F^* = (G, i\mathbf{F}^*)$, then the energy density and
the energy current appear:
$$T = \frac{1}{2} F F^* = \frac{1}{2}(G, i\mathbf{F}) \cdot (G, i\mathbf{F}^*) \tag{4.92}$$
can be cast into the form
$$T = \frac{1}{2}\left(G^2 + \mathbf{F}\cdot\mathbf{F}^*,\; iG\mathbf{F}^* + iG\mathbf{F} - \mathbf{F}\times\mathbf{F}^*\right). \tag{4.93}$$
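Using the electric and magnetic fields in the complex field strength, this expression
simplifies to
$$T = \frac{1}{2}\left(G^2 + \mathbf{E}^2 + \mathbf{B}^2,\; 2iG\mathbf{E} + 2i\mathbf{E}\times\mathbf{B}\right). \tag{4.94}$$
In the Lorenz gauge $G = 0$, so what remains are only the familiar expressions of the
energy density and the Poynting vector,
$$T = \left(\frac{1}{2}\left(\mathbf{E}^2 + \mathbf{B}^2\right),\; i\mathbf{E}\times\mathbf{B}\right) = (w, i\mathbf{S}). \tag{4.95}$$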
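Eq. (4.95) can be reproduced numerically. The sketch below again assumes the Hamilton product rule with complex components. The function names and the sample field values are hypothetical, chosen only to illustrate the statement.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def qmul(p, q):
    """Hamilton product (a, U)(b, V) = (ab - U.V, aV + bU + U x V)."""
    a, U = p
    b, V = q
    UxV = cross(U, V)
    return (a * b - dot(U, V),
            tuple(a * V[i] + b * U[i] + UxV[i] for i in range(3)))

# Sample real fields (hypothetical values) in the Lorenz gauge, G = 0.
E = (1.0, 2.0, -0.5)
B = (0.5, -1.0, 2.0)
G = 0.0

F_vec = tuple(e + 1j * b for e, b in zip(E, B))           # F = E + iB
F = (G, tuple(1j * x for x in F_vec))                      # (G, iF)
F_star = (G, tuple(1j * x.conjugate() for x in F_vec))     # (G, iF*)

product = qmul(F, F_star)
T = (0.5 * product[0], tuple(0.5 * x for x in product[1]))

energy_density = 0.5 * (dot(E, E) + dot(B, B))   # w
poynting = cross(E, B)                           # S
```

Here `T[0]` agrees with `energy_density`, and each component of `T[1]` equals `1j` times the corresponding component of `poynting`.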
Chapter 5
Quantum Mechanics: The Most Classical
Non-classical Theory
with S being the action, Q the generalized coordinate, t the time parameter and
H (Q, P, t) the Hamilton function.
In the stationary case the Hamilton function does not depend on time explicitly,
therefore the time dependence of the action is trivial and can be separated:
$S = \mathcal{S}(Q, \dot{Q}) - tE$. The $Q$-derivative of the reduced action, $\mathcal{S}$, is the same as
that of the complete action. The Hamilton–Jacobi equation takes a simpler form,
$$H\left(Q, \frac{\partial S}{\partial Q}\right) - E = 0, \tag{5.2}$$
stating that the value of the Hamilton function is the energy. The classical motion in
phase space happens on a hypersurface of constant energy (the energy shell).
In quantum mechanics a break with the classical dynamics is realized. The
question is how. Let us characterize the measure of the break with classical physics
by the following integral,
$$K_{\mathrm{stac}} = \int dQ \left[ H\left(Q, \frac{\partial S}{\partial Q}\right) - E \right] w(Q), \tag{5.3}$$
and search for the theory that minimizes this break. This minimization is a variational
principle, which leads to the stationary Schrödinger equation under the
assumptions below.
First we express the action in terms of the eikonal, $\psi$, as
$$S = k \ln \psi, \tag{5.4}$$
with $k$ being a constant to be determined later. Although this formula is reminiscent of
Boltzmann's definition of entropy, the two should not be confused. The action $S$ is not
an entropy, and the eikonal factor is not a probability or a phase space volume. The
weighting factor, $w(Q)$, in fact depends on the whole orbit; in simple cases only
through the action. We search for the weighting factor $w(\psi)$ under the integral
that makes the variational minimum equation linear in $\psi$. This requirement of
linearity corresponds to the observed wavelike behavior, the interference of
possible orbits in quantum mechanics.
First we carry out this analysis for a certain class of Hamilton functions,
$$H(Q, P) = \frac{1}{2M} P^2 + V(Q). \tag{5.5}$$
This means a separable and nonrelativistic kinetic energy term. Our quantity to be
varied is the following integral:
$$K_{\mathrm{stac}} = \int dQ \left[ \frac{k^2}{2M}\left(\frac{1}{\psi}\frac{\partial\psi}{\partial Q}\right)^2 + V(Q) - E \right] w(\psi). \tag{5.6}$$
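In order to achieve a variational equation linear in $\psi$, the expression under the integral
must be quadratic, so we choose
$$w(\psi) = \psi^2. \tag{5.7}$$
By doing so we obtain
$$K_{\mathrm{stac}}^{\mathrm{lin}} = \int dQ \left[ \frac{k^2}{2M}\left(\frac{\partial\psi}{\partial Q}\right)^2 + \left(V(Q) - E\right)\psi^2 \right]. \tag{5.8}$$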
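A small numerical illustration of the functional (5.8) follows. It is a sketch under assumptions that are not in the text: a harmonic potential $V = \frac{1}{2}M\Omega^2 Q^2$ in units $k = M = \Omega = 1$, a Gaussian trial function $\psi = e^{-aQ^2}$, and the Ritz-type reading in which $E$ is the value at which $K_{\mathrm{stac}}^{\mathrm{lin}}$ vanishes for a given $\psi$, i.e. the ratio of the two integrals. The names `energy`, `widths`, `best_a` and `best_E` are hypothetical.

```python
import math

def energy(a, k=1.0, M=1.0, omega=1.0, L=10.0, n=4000):
    """E(a) = int[k^2/(2M) psi'^2 + V psi^2] dQ / int psi^2 dQ for psi = exp(-a Q^2)."""
    h = 2.0 * L / n
    num = 0.0
    den = 0.0
    for i in range(n + 1):
        Q = -L + i * h
        psi = math.exp(-a * Q * Q)
        dpsi = -2.0 * a * Q * psi                  # analytic derivative of the trial function
        V = 0.5 * M * omega ** 2 * Q * Q           # assumed harmonic potential
        weight = 0.5 if i in (0, n) else 1.0       # trapezoidal rule
        num += weight * (k * k / (2.0 * M) * dpsi * dpsi + V * psi * psi)
        den += weight * psi * psi
    return num / den                               # the step size h cancels in the ratio

# Scan the width parameter a from 0.10 to 1.00 and pick the smallest E(a).
widths = [0.1 + 0.01 * i for i in range(91)]
best_a = min(widths, key=energy)
best_E = energy(best_a)
```

For this trial family the integrals can be done by hand, giving $E(a) = a/2 + 1/(8a)$ in the chosen units, so the scan should settle at $a = 1/2$ with $E = 1/2$, the ground state of the oscillator.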
The variational equation minimizing the above expression, which quantifies the break with
the classical dynamics, is nothing else than the stationary Schrödinger equation:
$$\frac{\delta K_{\mathrm{stac}}^{\mathrm{lin}}}{\delta\psi} = -\frac{k^2}{2M}\frac{\partial^2\psi}{\partial Q^2} + \left(V(Q) - E\right)\psi = 0. \tag{5.9}$$
$$\frac{1}{2M} P^2 + V(Q) - E = 0, \tag{5.10}$$
in general a Hamilton–Jacobi equation, except that $P$ carries a meaning and a role different
from those in classical mechanics. Mathematically, this amounts to forcing a consequence
of Schrödinger's variational principle into the form of the classical energy formula:
$$\left[ \frac{1}{2M}\hat{P}^2 + V(\hat{Q}) - E\,\hat{1} \right]\psi = 0. \tag{5.11}$$
Here the hat sign emphasizes that these quantities are no longer functions or
variables, but operators:
$$\hat{P} = \frac{k}{i}\frac{\partial}{\partial Q}, \qquad \hat{Q} = Q\,\cdot \tag{5.12}$$
$$\left[\hat{P}, \hat{Q}\right] = \frac{k}{i}. \tag{5.13}$$
Finally, the value of the constant $k$ is $\hbar$, the Planck constant divided by $2\pi$.
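The commutator (5.13) can be verified exactly on polynomial test functions. The sketch below represents $\psi(Q)$ by its list of coefficients, an illustrative choice that is not in the text, and all function names are hypothetical.

```python
def derivative(p):
    """d/dQ of a polynomial given by coefficients p[n] of Q**n."""
    return [n * c for n, c in enumerate(p)][1:] or [0]

def Q_op(p):
    """Multiplication by Q shifts every coefficient up by one power."""
    return [0] + list(p)

def P_op(p, k=1.0):
    """P = (k/i) d/dQ acting on the polynomial p."""
    return [(k / 1j) * c for c in derivative(p)]

def subtract(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

def commutator_PQ(p, k=1.0):
    """[P, Q] psi = P(Q psi) - Q(P psi)."""
    return subtract(P_op(Q_op(p), k), Q_op(P_op(p, k)))

k = 1.0
psi = [3.0, -1.0, 0.5, 2.0]              # 3 - Q + 0.5 Q^2 + 2 Q^3 (arbitrary)
result = commutator_PQ(psi, k)
expected = [(k / 1j) * c for c in psi]   # (k/i) psi
```

For any coefficient list, `result` equals `expected`: the commutator acts as multiplication by $k/i$, independently of the test function.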
5.2 Dynamical Case
In the dynamical case the variational principle contains a time integral, too. To retain
linearity in $\psi$, we have to allow $\psi$ to be complex. A complex eikonal
leads to a complex action, and therefore the momentum is also not necessarily real.
This is the situation in the “prohibited” zone, where the kinetic energy is negative,
which is forbidden in classical physics.
Generalizing the discussion of the previous section, the weighting factor under
the integral is modified to $w = |\psi|^2 = \psi^*\psi$. The term expressing the kinetic energy
also changes:
$$T = \frac{1}{2M}\left|\frac{\partial S}{\partial Q}\right|^2 \geq 0 \tag{5.14}$$
is now the non-negative expression. The break with the classical Hamilton–Jacobi
equation is now integrated over both space and time:
$$K_{\mathrm{din}} = \int dt \int dQ \left[ H\left(\frac{\partial S}{\partial Q}, Q, t\right) + \frac{\partial S}{\partial t} \right] w(Q, t). \tag{5.15}$$
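$$\frac{\delta K_{\mathrm{din}}}{\delta\psi^*} = -\frac{|k|^2}{2M}\frac{\partial^2}{\partial Q^2}\psi + V(Q)\psi + k\frac{\partial\psi}{\partial t} = 0. \tag{5.17}$$
We can be satisfied with this result if the complex conjugate of the variation with
respect to $\psi$ agrees with the result of the variation with respect to the conjugate $\psi^*$:
$$\frac{\delta K_{\mathrm{din}}}{\delta\psi} = -\frac{|k|^2}{2M}\frac{\partial^2}{\partial Q^2}\psi^* + V(Q)\psi^* - k\frac{\partial\psi^*}{\partial t} = 0. \tag{5.18}$$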
$$k = -k^*, \tag{5.20}$$
i.e. $k$ is purely imaginary. The familiar time-dependent Schrödinger equation emerges
with the choice $k = \hbar/i$:
$$i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2M}\frac{\partial^2}{\partial Q^2}\psi + V(Q)\psi. \tag{5.21}$$
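As a quick plausibility check of Eq. (5.21), a free plane wave with $E = p^2/2M$ should make both sides agree. The sketch below tests this with finite differences. The units $\hbar = M = 1$, the momentum value and the function names are hypothetical choices for illustration.

```python
import cmath

hbar, M = 1.0, 1.0            # hypothetical units
p = 1.3                       # arbitrary momentum
E = p * p / (2.0 * M)         # free-particle energy, V = 0

def psi(Q, t):
    """Plane wave exp(i (p Q - E t) / hbar)."""
    return cmath.exp(1j * (p * Q - E * t) / hbar)

def residual(Q, t, h=1e-3):
    """i hbar dpsi/dt + hbar^2/(2M) d2psi/dQ2, which vanishes if (5.21) holds with V = 0."""
    dpsi_dt = (psi(Q, t + h) - psi(Q, t - h)) / (2.0 * h)
    d2psi_dQ2 = (psi(Q + h, t) - 2.0 * psi(Q, t) + psi(Q - h, t)) / (h * h)
    return 1j * hbar * dpsi_dt + hbar ** 2 / (2.0 * M) * d2psi_dQ2

res = abs(residual(0.7, 0.4))
```

The residual `res` is zero up to the discretization error of the central differences, which is of order `h**2`.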
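Insisting formally on the classical dynamical equation, i.e. using the operator versions
$\hat{P}$ and $\hat{Q}$, one defines a Hamilton operator,
$$i\hbar\frac{\partial\psi}{\partial t} = H\left(\frac{\hbar}{i}\frac{\partial}{\partial Q}, Q\right)\psi = \hat{H}\psi. \tag{5.22}$$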
We note that for a free motion w = 1 is constant and normalized all over the orbit.
5.3 Relativistic Case: Time-Independent Potential
The relativistic Hamilton–Jacobi equation for a mass point is based on the
energy–momentum relation, the dispersion relation for the wave:
$$E = Mc^2 + \frac{1}{2M} P^2 + V. \tag{5.25}$$
This expression contains the usual kinetic and potential energy terms besides a rest
mass energy, $Mc^2$. The corresponding Hamilton–Jacobi equation relies on the
substitutions
$$P = \frac{\partial S}{\partial Q}, \qquad E = Mc^2 - \frac{\partial S}{\partial t}, \tag{5.26}$$
In the nonrelativistic approximation the right hand side (rhs) is zero. Physically this
means that the kinetic energy
$$T = \frac{1}{2M}\left(\frac{\partial S}{\partial Q}\right)^2 \tag{5.29}$$
would be equal to $-\partial S/\partial t - V$, which, substituted into the rhs, translates into the
negligibility of the ratio of kinetic to rest mass energy, $T/Mc^2$.
Learning from the experience of the dynamical case, we allow for a complex eikonal in the
relation $S = k\ln\psi$. Therefore the factor $w = |\psi|^2$ weights the break with the
Hamilton–Jacobi equation, and the kinetic terms contain the absolute value squares
of complex terms. The variational principle is to extremize the following functional:
$$K = \int dt\, dQ \left[ \left| Mc^2 - V - \frac{k}{\psi}\frac{\partial\psi}{\partial t} \right|^2 - \left| c\,\frac{k}{\psi}\frac{\partial\psi}{\partial Q} \right|^2 - (Mc^2)^2 \right] |\psi|^2. \tag{5.30}$$
If the operator $1 - \hat{A}$ can be inverted and expanded into a convergent series, meaning
the smallness of the kinetic energy compared to the rest mass energy, we obtain
$$\left[ \hat{A} + \sum_{n=0}^{\infty} \hat{A}^n\, \hat{T} \right]\psi = 0. \tag{5.36}$$
Another interesting limiting case occurs when $V = Mc^2$. With this choice Eq. (5.33)
becomes the Klein–Gordon equation:
$$\frac{1}{c^2}\frac{\partial^2}{\partial t^2}\psi - \nabla^2\psi + \left(\frac{Mc}{\hbar}\right)^2\psi = 0. \tag{5.37}$$
It is also worth discussing the extreme relativistic case, occurring for vanishing rest
mass. Then from Eq. (5.32) we gain
$$\frac{1}{c^2}\ddot{\psi} - \nabla^2\psi + \frac{2i}{\hbar c^2} V\dot{\psi} - \frac{1}{\hbar^2 c^2} V^2\psi = 0. \tag{5.38}$$
Its solution is
$$\psi = \varphi\, e^{-\frac{i}{\hbar} t V}, \tag{5.39}$$
and for $\varphi$ we obtain a wave equation describing propagation with the speed of light,
$$\frac{1}{c^2}\ddot{\varphi} - \nabla^2\varphi = 0. \tag{5.40}$$
This result also holds for time-dependent potentials $V$, with the only difference that
in the exponent $\int V\,dt$ appears instead of $tV$.
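The step from Eq. (5.38) to Eq. (5.40) can be made explicit. The short derivation below is a sketch assuming that $V$ does not depend on position, so that the phase factor passes through $\nabla^2$. This assumption is made here for the check and is not stated in the text.

```latex
\begin{align*}
\psi &= \varphi\, e^{-\frac{i}{\hbar} t V}, \\
\dot{\psi} &= \left( \dot{\varphi} - \frac{i}{\hbar} V \varphi \right) e^{-\frac{i}{\hbar} t V}, \\
\ddot{\psi} &= \left( \ddot{\varphi} - \frac{2i}{\hbar} V \dot{\varphi}
              - \frac{1}{\hbar^2} V^2 \varphi \right) e^{-\frac{i}{\hbar} t V}, \\
\ddot{\psi} + \frac{2i}{\hbar} V \dot{\psi} - \frac{1}{\hbar^2} V^2 \psi
  &= \ddot{\varphi}\, e^{-\frac{i}{\hbar} t V}, \\
\frac{1}{c^2}\left( \ddot{\psi} + \frac{2i}{\hbar} V \dot{\psi} - \frac{1}{\hbar^2} V^2 \psi \right)
  - \nabla^2 \psi
  &= \left( \frac{1}{c^2}\ddot{\varphi} - \nabla^2 \varphi \right) e^{-\frac{i}{\hbar} t V}.
\end{align*}
```

The terms linear in $\dot{\varphi}$ and the terms proportional to $V^2\varphi$ cancel pairwise, so the left-hand side of Eq. (5.38) vanishes exactly when $\varphi$ satisfies the free wave equation.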
The above analysis is incomplete: it handles neither spin nor antimatter.
Spin becomes noticeable in the presence of a magnetic field, and the antiparticle carries an
electric charge opposite to that of the particle. So the relativistic Schrödinger equation
is even more interesting when a general electromagnetic vector potential is present.
In a relativistic dispersion relation both the energy and the momentum get shifted
in external electromagnetic fields:
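Variation with respect to $\psi^*$ is now equivalent to the complex conjugate of the variation
with respect to $\psi$ for arbitrary complex $k$. It provides
$$\begin{aligned}
\frac{\delta K}{\delta\psi^*} ={}& (Mc^2 - e\Phi)^2\psi - (Mc^2 - e\Phi)k\dot{\psi} + k^*\frac{\partial}{\partial t}\left[(Mc^2 - e\Phi)\psi\right] - |k|^2\ddot{\psi} \\
& - \left[ -c^2|k|^2\nabla^2\psi - cke\mathbf{A}\cdot\nabla\psi + ck^*\nabla\cdot(e\mathbf{A}\psi) + e^2\mathbf{A}^2\psi \right] \\
& - (Mc^2)^2\psi
\end{aligned} \tag{5.43}$$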
In the Lorenz gauge, $c\nabla\cdot\mathbf{A} + \dot{\Phi} = 0$, so the derivatives of the scalar and
vector potentials cancel each other. The rest can be collected into the following
equation,
$$c^2|k|^2\nabla^2\psi - |k|^2\ddot{\psi} + (k^* - k)\left[(Mc^2 - e\Phi)\dot{\psi} - ec\mathbf{A}\cdot\nabla\psi\right] + \left[-2Mc^2 e\Phi + e^2(\Phi^2 - \mathbf{A}^2)\right]\psi = 0. \tag{5.44}$$
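It is rewarding to rewrite Eq. (5.45) using the coupling constant $g = e/\hbar c$ and the
inverse Compton wavelength $\mu = Mc/\hbar$. It is
$$\frac{1}{c^2}\ddot{\psi} - \nabla^2\psi + 2g\Phi\mu\psi = 2i\left(\frac{1}{c}\mu\dot{\psi} - g\mathbf{A}\cdot\nabla\psi\right) + g^2(\Phi^2 - \mathbf{A}^2)\psi. \tag{5.47}$$
The massless limit appears at $\mu = 0$:
$$\frac{1}{c^2}\ddot{\psi} - \nabla^2\psi = -2ig\mathbf{A}\cdot\nabla\psi + g^2(\Phi^2 - \mathbf{A}^2)\psi. \tag{5.48}$$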
The zero coupling limit looks like
$$\frac{1}{c^2}\ddot{\psi} - \nabla^2\psi = 2i\mu\frac{1}{c}\dot{\psi}, \tag{5.49}$$
or rearranged somewhat,
$$i\hbar\dot{\psi} = \frac{\hbar^2}{2M}\partial^2\psi, \tag{5.50}$$
with
$$\partial^2 = \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \nabla^2 \tag{5.51}$$
being the d'Alembert wave operator. In the limit $c^2 \to \infty$ we regain the free
Schrödinger equation. At the same time the solution of the wave equation (5.50),
$$\psi = \varphi\, e^{\frac{i}{\hbar} Mc^2 t}, \tag{5.52}$$
5.5 Speculations
From the variational principle approach the Copenhagen interpretation does not follow.
Here $w = |\psi|^2$ is a weighting factor measuring the break with the classical
dynamics, not a probability density. Therefore it is not guaranteed that its integral
over space is finite, so whether the wave function is “normalizable” is a
further question. On the other hand, theoretically, it may differ from zero on disjoint
intervals. In that way regions may exist in spacetime where $w = 0$ and therefore
arbitrary deviations from the classical Hamilton–Jacobi equation are allowed.
From the classical definition of the eikonal it follows that if $S/k$ is purely imaginary,
i.e. if, using $k = \hbar/i$, the action $S$ is real, then the weighting factor is unity:
$w = 1$. This means that all trajectories with this property count with the same factor
in the variational principle. By contrast, in typical classically forbidden
zones, e.g. with negative kinetic energy, the momentum is purely imaginary
and so is the action $S = \int P\,dQ$. For such regions the weighting factor is less than
one, and it decreases exponentially with the intrusion depth. There the break with the
classical dynamics carries less weight, so it can be larger. This is the physics behind
quantum tunneling.
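The exponential decrease of the weighting factor can be made concrete for a constant barrier. The sketch below assumes $V > E$ constant inside the forbidden zone and picks the decaying branch of the imaginary momentum, so that $P = i\hbar\kappa$ with $\kappa = \sqrt{2M(V - E)}/\hbar$ and, with $k = \hbar/i$, $\psi = e^{S/k} = e^{-\kappa d}$ at depth $d$. The function name and the default units are hypothetical.

```python
import math

def weight(depth, V, E, M=1.0, hbar=1.0):
    """w = |psi|^2 at a given intrusion depth into a constant barrier of height V."""
    if V <= E:
        return 1.0                                   # allowed zone: real action, w = 1
    kappa = math.sqrt(2.0 * M * (V - E)) / hbar      # |P| / hbar in the forbidden zone
    return math.exp(-2.0 * kappa * depth)            # |exp(-kappa d)|^2

# Weights at increasing depth for an illustrative barrier V = 2, E = 1.
depths = [0.0, 0.5, 1.0, 2.0]
weights = [weight(d, V=2.0, E=1.0) for d in depths]
```

Doubling the depth squares the weight, which is the exponential law referred to above. In the allowed zone the function returns unity.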
The following is speculative, but possibly a challenging exercise for the imagination.
Let us investigate step by step the thoughts summarized above by looking for
mathematical alternatives and their physical interpretation. One question is why the
constant $k$ connecting the action and the eikonal cannot be real, or genuinely
complex instead of purely imaginary. Perhaps the real part is suppressed by a $c^2$ factor.
We emphasize that for a real $k = b$ the Schrödinger equation would not emerge,
since the term describing imaginary diffusion would be missing. On the other hand it
would for sure be causal, since the term describing the wave propagation in the
relativistic treatment contains $|k|^2$ only. Then the functional derivative of the variational
principle (5.32) would deliver
$$\frac{1}{c^2}\ddot{\psi} - \nabla^2\psi + \frac{1}{b^2 c^2} V(2Mc^2 - V)\psi = 0. \tag{5.54}$$
This equation gives wavelike solutions in the range $V < 2Mc^2$, and exponentially
damped ones for $V > 2Mc^2$. $V = 2Mc^2$ is the threshold for pair creation; above this
the Klein paradox occurs. Here quantum field theory starts improving on quantum
mechanics.
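The two regimes can be read off from the sign of the last term in Eq. (5.54). The sketch below classifies spatially uniform solutions for a constant potential. It assumes $V > 0$ and uses only the sign of $V(2Mc^2 - V)$, so it does not depend on the positive prefactor in front of it. The function name and the sample values are hypothetical.

```python
def long_wavelength_regime(V, M=1.0, c=1.0):
    """Classify uniform solutions psi(t) of Eq. (5.54) for a constant potential V > 0."""
    restoring = V * (2.0 * M * c * c - V)   # has the sign of omega^2 at zero wave number
    if restoring > 0:
        return "wavelike"       # psi oscillates in time
    if restoring < 0:
        return "exponential"    # real exponentials in time
    return "threshold"          # V = 2 M c^2

# Illustrative potentials in units M = c = 1, where the threshold is V = 2.
regimes = {V: long_wavelength_regime(V) for V in (0.5, 1.9, 2.0, 2.5)}
```

For a uniform $\psi(t)$ the equation reduces to $\ddot{\psi} = -\omega^2\psi$ with $\omega^2$ proportional to $V(2Mc^2 - V)$, which is why the sign alone decides between oscillation and exponential behavior.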
One may also speculate about what more can be done based on a variational principle.
Formulating the Hamilton–Jacobi equation in a gravitational field, the above
recipe may be applied to seek an equation that breaks the classical theory minimally
in a variational sense. It is also an interesting question whether the Einstein–Hilbert
action possesses a wavelike limit, describing perhaps a post-Newtonian gravity wave,
where the above strategy could be repeated. This would be a Schrödinger-type variational
approach to quantum gravity. It is not clear at this level whether this could be
related to string theory or not.
We formulate it again: breaking the equation that describes the classical dynamics
with a weight factor in spacetime, so that the variational problem becomes linear
in something (so far we have discussed the complex eikonal), truly contains
quantum uncertainty. Obviously the Schrödinger equation, solving the variational
problem of breaking with the Hamilton–Jacobi equation, is not identical with the
Hamilton–Jacobi equation itself. If someone insists on the validity of the
Hamilton–Jacobi equation (HJ), then that person will be at pains to transform the
Schrödinger equation into an HJ form. The price to pay is the operator formalism. And
so it happened in the history of physics in the present universe.
The rest follows shortly. The operators $\hat{P}$ and $\hat{Q}$ do not commute, and therefore,
due to an algebraic identity, their uncertainties are correlated: the product of their
standard deviations cannot be made smaller than $\hbar/2$ for each component. Consequently
the operator $\hat{H}$ is not exactly the physical energy, it “quantum fluctuates”. The
variational principle discussed in this chapter also states $H = E$, not pointwise but
smeared over space, so that the total deviation is minimal.
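The bound $\hbar/2$ is reached by a Gaussian. The sketch below checks this numerically for the real, centered trial function $\psi = e^{-Q^2/4\sigma^2}$, for which $\langle Q\rangle = \langle P\rangle = 0$ and $\langle P^2\rangle = \hbar^2\int|\psi'|^2 dQ / \int|\psi|^2 dQ$ after an integration by parts. The function name, the units $\hbar = 1$ and the grid parameters are hypothetical choices.

```python
import math

def uncertainty_product(sigma, hbar=1.0, n=4000):
    """Delta Q * Delta P for psi(Q) = exp(-Q^2 / (4 sigma^2)) on a uniform grid."""
    L = 12.0 * sigma                     # the Gaussian is negligible beyond this range
    h = 2.0 * L / n
    norm = q2 = p2 = 0.0
    for i in range(n + 1):
        Q = -L + i * h
        psi = math.exp(-Q * Q / (4.0 * sigma ** 2))
        dpsi = -Q / (2.0 * sigma ** 2) * psi
        norm += psi * psi
        q2 += Q * Q * psi * psi          # accumulates <Q^2> times the norm
        p2 += hbar ** 2 * dpsi * dpsi    # accumulates <P^2> times the norm
    return math.sqrt(q2 / norm) * math.sqrt(p2 / norm)

products = [uncertainty_product(s) for s in (0.5, 1.0, 2.0)]
```

Independently of the width `sigma`, the product comes out as $\hbar/2$ within numerical accuracy. A narrower packet in $Q$ is compensated by a broader spread in $P$.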
This view might help to resolve a contradiction between point particles and plane
waves. The relativistic improvements on the Schrödinger equation, on the other hand,
ensure that no physical effect propagates faster than light, so no EPR paradox arises.
Further Readings
• I.M. Gelfand, S.V. Fomin, Calculus of Variations (translated to English by R.A. Silverman)
(Prentice-Hall Inc., Englewood Cliffs, N.J., 1963) (Reprinted by Dover
Publications, New York, 2000).
• J.T. Devreese, G. Vanden Berghe, Magic is No Magic; The Wonderful World of Simon
Stevin (WIT Press, Antwerpen, 2008).
Glossary of People Appearing in the Hungarian Book
Ampère, André-Marie 1775–1836, French physicist and mathematician, one of the
founding fathers of the theory of electricity; the unit of electric current is
named after him. Ampère's law, describing the magnetic effect of electric currents,
is also associated with his name.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
T. S. Biró, Variational Principles in Physics,
SpringerBriefs in Physics,
https://doi.org/10.1007/978-3-031-27876-1
Bohr, Niels 1885–1962, Danish physicist, one of the pioneers of quantum physics.
The Bohr model was the first non-classical model of the atom. His name appears in
phrases like Bohr radius and Bohr magneton, too. Famous are his debates with
Einstein about the essence and interpretation of quantum uncertainty. The element
107 is named bohrium. A crater on the Moon and the asteroid 3948 carry his name.
The Niels Bohr Institute in Copenhagen is renowned among physics research institutes.
Buridan, Jean 1295–1358, by his Latin name Johannes Buridanus, French priest and
scholar, whose ideas helped prepare the ground for the Copernican theory in Europe.
As the most important philosopher of the late Middle Ages, he founded the notion of
inertia. The theory of impetus is attributed to him, and, better known, a parable about
a hungry ass not being able to choose the shortest path to the food.
Dirac, Paul Adrien Maurice 1902–1984, British physicist, one of the founders of
quantum mechanics, brother-in-law of Eugene Wigner. He received the Nobel Prize
in 1933 together with Schrödinger, “for the discovery of new productive forms of
atomic theory”. His name is attached to the Dirac equation, which was the first to
predict the positron, the antiparticle of the electron. The Dirac distribution, also
called the Dirac delta, is frequently used in mathematical physics. The statistics of
fermions is often cited as the Fermi–Dirac distribution.
Einstein, Albert 1879–1955, German-born, then Swiss, later Prussian physicist;
he escaped from Nazi Germany to the USA. He is best known for his theories
of special and general relativity; however, he was awarded the Nobel Prize of 1921
for explaining the photoelectric effect. He was the first to explain Brownian motion, and
pioneered the photon theory of light. His name is part of the Bose–Einstein
distribution and of the element einsteinium. A pacifist and pantheist, he worked long
on a final unified field theory. His disputes with Bohr about the interpretation of
quantum mechanics (according to Einstein God does not throw dice), as well as his
role in promoting the construction of the first atomic bomb in his letter to President
Roosevelt, contributed to his world fame. The presidency of the modern state of Israel
was offered to him in 1952, but he declined.
Eötvös, Loránd (Roland) 1848–1919, Baron, Hungarian physicist. Best known are
his experiments with the Eötvös pendulum for measuring the strength and gradients
of the gravitational acceleration, but he worked on the capillarity of fluids, too. He
was President of the Hungarian Academy of Sciences and minister of education. His
name is carried by the Eötvös Physical Society and the Roland Eötvös University
in Budapest. The mineral lorándite is named after him. The CGS unit for the gravity
gradient is 1 eotvos.
Euler, Leonhard Paul 1707–1783, mathematician and physicist of Swiss origin,
who spent the majority of his life on German ground and in Saint Petersburg,
Russia. His results are important for the fields of mathematical analysis, number theory,
geometry and graph theory. In physics, the Euler equation describes the flow
of fluids, while the Euler–Lagrange equations are the equations of motion derived from
a Lagrangian.
Fermat, Pierre de 1601–1665, French mathematician. Best known for the Fermat
conjecture, which was proven only more than 300 years later, in the recent past.
His interests included the calculus of probability and the prime numbers. In
optics, the principle of the shortest propagation time of a light ray is named the Fermat
principle.
Fock, Vladimir Alexandrovich 1898–1974, Soviet physicist. Most known from the
many body quantum mechanics; his name is connected to the phrase “Fock space”
(and to corresponding operators), the Fock representation and the Hartree–Fock
method.
Galilei, Galileo 1564–1642, the most renowned Italian physicist. The foundation of
mechanics, the study of free fall and the relativity principle for motion are attached to his
name, as well as the discovery of the four largest moons of Jupiter with a
self-assembled telescope. He discovered the phases of Venus and the sunspots, and he
was a resolute promoter of the Copernican worldview. The Holy Inquisition made
him withdraw his theses; allegedly he commented after this, “eppur si muove”
(yet it moves, i.e. the Earth). The CGS unit of the gravitational acceleration is 1 gal,
in his honor.
others in the Gibbs distribution. He pioneered the use of vector analysis in theoretical
physics and chemistry. Also worth mentioning are the Gibbs–Duhem relation and the Gibbs
free energy.
Hamilton, William Rowan, Sir 1805–1865, Irish physicist, astronomer and
mathematician. In physics his name is best known for Hamiltonian mechanics, but a
central quantity in quantum mechanics, the Hamilton operator, also carries his name.
He initiated the mechanical action principle, the Hamilton–Jacobi equation, the use
of the phrase “tensor”, the application of the “nabla” symbol, the quaternion algebra
and some further theorems in algebra and graph theory, like the Cayley–Hamilton
theorem, the Hamiltonian path in graphs and more.
I, J
Langevin, Paul 1872–1946, French physicist. Most known is the Langevin equation,
developed by him to describe the stochastic Brownian motion. He dealt with para-
and diamagnetism, and also took part in the development of submarine detection by
ultrasound. The twin paradox in special relativity stems from him.
Levi-Civita, Tullio 1873–1941, Italian mathematician, best known for his works in
tensor calculus. The totally antisymmetric unit tensor bears his name.
Lorentz, Hendrik Antoon 1853–1928, Nobel Prize-winning Dutch physicist. His
name is remembered in physics by the Lorentz transformation, and by the Lorentz
force acting on charges which move in electromagnetic fields. He received the Nobel
Prize in 1902 for the theoretical explanation of the Zeeman effect.
Navier, Claude-Louis 1785–1836, French physicist and engineer. His name is pre-
served in the Navier–Stokes equation. Since 1824 member of the French Academy,
he invested ample time in researching elasticity.
Newton, Isaac, Sir 1643–1727, the best known English physicist of all times;
moreover he was an astronomer, mathematician and philosopher, as well as an alchemist
and theologian. His central work, “Philosophiae Naturalis Principia Mathematica”
(mathematical principles of natural philosophy), lays down the basic principles of
Newtonian mechanics. Besides the Newton equation, the further axioms laid down by him for
the description of mechanical motion and his law of universal gravitation
unified earthly and celestial physics. Well known are his experiments in optics (prism,
telescope), as well as his integral and differential calculus, the “fluxion theory”, at the
foundations of mathematical analysis, like the Newton–Leibniz theorem.
Pythagoras, of Samos 575–495 BC, ancient Greek Ionian philosopher and
mathematician. Best known for the Pythagoras theorem, relating the side lengths of a
right triangle. Based on his views about cosmic harmony and the importance
of mathematics, a religious movement was founded in classical antiquity. The
proof that the square root of two is an irrational number and the phrase “harmony of
spheres” are attributed to him.
Planck, Max 1858–1947, German physicist, born as Karl Ernst Ludwig Marx Planck;
he himself later changed his name Marx to Max. Most famous is the Planck constant
named after him. That quantity was introduced in the Planck law of black body
radiation, and it was revealed later that it is a fundamental constant of nature.
Nobel Prize winner (1918), best known promoter of Albert Einstein. The German
network of research institutions, formerly the Kaiser Wilhelm Institutes, is today
named the Max Planck Institutes. His name is borne by the Planck length, also called
the Planck scale, and by the Planck mass, Planck time, Planck energy and Planck
temperature, beyond which the entanglement between gravity and quantum physics can no
longer be avoided.
Podolsky, Boris 1896–1966, Russian, later American physicist. Known from the
EPR (Einstein–Podolsky–Rosen) paradox, originally formulated to demonstrate the
incompatibility between quantum mechanics and special relativity. A closer investi-
gation of the mechanism of the information propagation resolves this paradox.
Poynting, John Henry 1852–1914, English physicist. His name is known from the
Poynting vector, assigned to the energy current flowing in electromagnetic fields.
Prigogine, Ilya Viscount 1917–2003, born in Russia, a Belgian chemical
physicist of Russian origin. In 1977 he won the Nobel Prize for his results
on dissipative structures. Later the theory of Self-Organized Criticality (SOC) was
developed from those results.
Ritz, Walter 1878–1909, Swiss theoretical physicist. He is known due to the Ritz
method and the Rydberg–Ritz formula.
U, V
Zenon (Zeno of Elea), 490–430 B.C., pre-Socratic Greek philosopher from Elea, southern
Italy. Aristotle called him the inventor of dialectics (the science of disputes). He
is known for his paradoxes about motion and time.