How Did We Get Here? The Tangled History of the Second Law of Thermodynamics

The Basic Arc of the Story

As I’ve explained elsewhere, I think I now finally understand the Second Law of thermodynamics. But it’s a new understanding, and to get to it I’ve had to overcome a certain amount of conventional wisdom about the Second Law that I at least have long taken for granted. And to check myself I’ve been keen to know just where this conventional wisdom came from, how it’s been validated, and what might have made it go astray.

And from this I’ve been led into a rather detailed examination of the origins and history of thermodynamics. All in all, it’s a fascinating story that both explains what’s been believed about thermodynamics and provides some powerful examples of the complicated dynamics of the development and acceptance of ideas.

The basic concept of the Second Law was first formulated in the 1850s, and rather rapidly took on something close to its modern form. It began partly as an empirical law, and partly as something abstractly constructed on the basis of the idea of molecules, which nobody at the time knew for sure existed. But by the end of the 1800s, with the existence of molecules increasingly firmly established, the Second Law began to often be treated as an almost-mathematically-proven necessary law of physics. There were still mathematical loose ends, as well as issues such as its application to living systems and to systems involving gravity. But the almost-universal conventional wisdom became that the Second Law must always hold, and if it didn’t seem to in a particular case, then that must just be because there was something one didn’t yet understand about that case.

There was also a sense that regardless of its foundations, the Second Law was successfully used in practice. And indeed particularly in chemistry and engineering it’s often been in the background, justifying all the computations routinely done using entropy. But despite its ubiquitous appearance in textbooks, when it comes to foundational questions, there’s always been a certain air of mystery around the Second Law. Though after 150 years there’s typically an assumption that “somehow it must all have been worked out”. I myself have been interested in the Second Law now for a little more than 50 years, and over that time I’ve had a growing awareness that actually, no, it hasn’t all been worked out. Which is why, now, it’s wonderful to see the computational paradigm—and ideas from our Physics Project—after all these years be able to provide solid foundations for understanding the Second Law, as well as seeing its limitations.

And from the vantage point of the understanding we now have, we can go back and realize that there were precursors of it even from long ago. In some ways it’s all an inspiring tale—of how there were scientists with ideas ahead of their time, blocked only by the lack of a conceptual framework that would take another century to develop. But in other ways it’s also a cautionary tale, of how the forces of “conventional wisdom” can blind people to unanswered questions and—over a surprisingly long time—inhibit the development of new ideas.

But, first and foremost, the story of the Second Law is the story of a great intellectual achievement of the mid-19th century. It’s exciting now, of course, to be able to use the latest 21st-century ideas to take another step. But to appreciate how this fits in with what’s already known we have to go back and study the history of what originally led to the Second Law, and how what emerged as conventional wisdom about it took shape.

What Is Heat?

Once it became clear what heat is, it actually didn’t take long for the Second Law to be formulated. But for centuries—and indeed until the mid-1800s—there was all sorts of confusion about the nature of heat.

That there’s a distinction between hot and cold is a matter of basic human perception. And seeing fire one might imagine it as a disembodied form of heat. In ancient Greek times Heraclitus (~500 BC) talked about everything somehow being “made of fire”, and also somehow being intrinsically “in motion”. Democritus (~460–~370 BC) and the Epicureans had the important idea (that also arose independently in other cultures) that everything might be made of large numbers of a few types of tiny discrete atoms. They imagined these atoms moving around in the “void” of space. And when it came to heat, they seem to have correctly associated it with the motion of atoms—though they imagined it came from particular spherical “fire” atoms that could slide more quickly between other atoms, and they also thought that souls were the ultimate sources of motion and heat (at least in warm-blooded animals?), and were made of fire atoms.

And for two thousand years that’s pretty much where things stood. And indeed in 1623 Galileo (1564–1642) (in his book The Assayer, about weighing competing world theories) was still saying:

Those materials which produce heat in us and make us feel warmth, which are known by the general name of “fire,” would then be a multitude of minute particles having certain shapes and moving with certain velocities. Meeting with our bodies, they penetrate by means of their extreme subtlety, and their touch as felt by us when they pass through our substance is the sensation we call “heat.”

He goes on:

Since the presence of fire-corpuscles alone does not suffice to excite heat, but their motion is needed also, it seems to me that one may very reasonably say that motion is the cause of heat… But I hold it to be silly to accept that proposition in the ordinary way, as if a stone or piece of iron or a stick must heat up when moved. The rubbing together and friction of two hard bodies, either by resolving their parts into very subtle flying particles or by opening an exit for the tiny fire-corpuscles within, ultimately sets these in motion; and when they meet our bodies and penetrate them, our conscious mind feels those pleasant or unpleasant sensations which we have named heat…

And although he can tell there’s something different about it, he thinks of heat as effectively being associated with a substance or material:

The tenuous material which produces heat is even more subtle than that which causes odor, for the latter cannot leak through a glass container, whereas the material of heat makes its way through any substance.

In 1620, Francis Bacon (1561–1626) (in his “update on Aristotle”, The New Organon) says, a little more abstractly, if obscurely—and without any reference to atoms or substances:

[It is not] that heat generates motion or that motion generates heat (though both are true in certain cases), but that heat itself, its essence and quiddity, is motion and nothing else.

But real progress in understanding the nature of heat had to wait for more understanding about the nature of gases, with air being the prime example. (It was actually only in the 1640s that any kind of general notion of gas began to emerge—with the word “gas” being invented by the “anti-Galen” physician Jan Baptista van Helmont (1580–1644), as a Dutch rendering of the Greek word “chaos”, that meant essentially “void”, or primordial formlessness.) Ever since antiquity there’d been Aristotle-style explanations like “nature abhors a vacuum” about what nature “wants to do”. But by the mid-1600s the idea was emerging that there could be more explicit and mechanical explanations for phenomena in the natural world.

And in 1660 Robert Boyle (1627–1691)—now thoroughly committed to the experimental approach to science—published New Experiments Physico-mechanicall, Touching the Spring of the Air and its Effects in which he argued that air has an intrinsic pressure associated with it, which pushes it to fill spaces, and for which he effectively found Boyle’s Law PV = constant.

But what was air actually made of? Boyle had two basic hypotheses that he explained in rather flowery terms:

Click to enlarge

His first hypothesis was that air might be like a “fleece of wool” made of “aerial corpuscles” (gases were later often called “aeriform fluids”) with a “power or principle of self-dilatation” that resulted from there being “hairs” or “little springs” between these corpuscles. But he had a second hypothesis too—based, he said, on the ideas of “that most ingenious gentleman, Monsieur Descartes”: that instead air consists of “flexible particles” that are “so whirled around” that “each corpuscle endeavors to beat off all others”. In this second hypothesis, Boyle’s “spring of the air” was effectively the result of particles bouncing off each other.

And, as it happens, in 1668 there was quite an effort to understand the “laws of impact” (applicable, for example, to balls in games like croquet and billiards, which had existed since at least the 1300s and were becoming popular), with John Wallis (1616–1703), Christopher Wren (1632–1723) and Christiaan Huygens (1629–1695) all contributing, and Huygens producing diagrams like:

Click to enlarge

But while some understanding developed of what amount to impacts between pairs of hard spheres, there wasn’t the mathematical methodology—or probably the idea—to apply this to large collections of spheres.

Meanwhile, in his 1687 Principia Mathematica, Isaac Newton (1642–1727), wanting to analyze the properties of self-gravitating spheres of fluid, discussed the idea that fluids could in effect be made up of arrays of particles held apart by repulsive forces, as in Boyle’s first hypothesis. Newton had of course had great success with his 1/r² universal attractive force for gravity. But now he noted (writing originally in Latin) that with a 1/r repulsive force between particles in a fluid, he could essentially reproduce Boyle’s law:

Click to enlarge
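
In modern terms (and certainly not in Newton’s own notation), the argument can be sketched as follows. Assume a static array of particles at nearest-neighbor spacing d, each repelling only its immediate neighbors with a force F ∝ 1/d. The number of particles pressing on a unit area of wall scales as 1/d², so the pressure scales as

```latex
P \;\propto\; \frac{F}{d^{2}} \;\propto\; \frac{1}{d^{3}} \;\propto\; \frac{N}{V}
\;\;\Longrightarrow\;\; P\,V \approx \text{const.}
```

which is Boyle’s law, with the caveat (which Newton himself discusses) that this only works if each particle effectively repels just its near neighbors.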

Newton discussed questions like whether one particle would “shield” others from the force, but then concluded:

But whether elastic fluids do really consist of particles so repelling each other, is a physical question. We have here demonstrated mathematically the property of fluids consisting of particles of this kind, that hence philosophers may take occasion to discuss that question.

Well, in fact, particularly given Newton’s authority, for well over a century people pretty much just assumed that this was how gases worked. There was one major exception, however, in 1738, when—as part of his eclectic mathematical career spanning probability theory, elasticity theory, biostatistics, economics and more—Daniel Bernoulli (1700–1782) published his book on hydrodynamics. Mostly he discusses incompressible fluids and their flow, but in one section he considers “elastic fluids”—and along with a whole variety of experimental results about atmospheric pressure in different places—draws the picture

Click to enlarge

and says

Let the space ECDF contain very small particles in rapid motion; as they strike against the piston EF and hold it up by their impact, they constitute an elastic fluid which expands as the weight P is removed or reduced; but if P is increased it becomes denser and presses on the horizontal case CD just as if it were endowed with no elastic property.

Then—in a direct and clear anticipation of the kinetic theory of heat—he goes on:

The pressure of the air is increased not only by reduction in volume but also by rise in temperature. As it is well known that heat is intensified as the internal motion of the particles increases, it follows that any increase in the pressure of air that has not changed its volume indicates more intense motion of its particles, which is in agreement with our hypothesis…

But at the time, and in fact for more than a century thereafter, this wasn’t followed up.

A large part of the reason seems to have been that people just assumed that heat ultimately had to have some kind of material existence; to think that it was merely a manifestation of microscopic motion was too abstract an idea. And then there was the observation of “radiant heat” (i.e. infrared radiation)—that seemed like it could only work by explicitly transferring some kind of “heat material” from one body to another.

But what was this “heat material”? It was thought of as a fluid—called caloric—that could suffuse matter, and for example flow from a hotter body to a colder. And in an echo of Democritus, it was often assumed that caloric consisted of particles that could slide between ordinary particles of matter. There was some thought that it might be related to the concept of phlogiston from the mid-1600s, that was effectively a chemical substance, for example participating in chemical reactions or being generated in combustion (through the “principle of fire”). But the more mainstream view was that there were caloric particles that would collect around ordinary particles of matter (often called “molecules”, after the use of that term by Descartes (1596–1650) in 1620), generating a repulsive force that would for example expand gases—and that in various circumstances these caloric particles would move around, corresponding to the transfer of heat.

To us today it might seem hacky and implausible (perhaps a little like dark matter, cosmological inflation, etc.), but the caloric theory lasted for more than two hundred years and managed to explain plenty of phenomena—and indeed was certainly going strong in 1825 when Laplace wrote his A Treatise of Celestial Mechanics, which included a successful computation of properties of gases like the speed of sound and the ratio of specific heats, on the basis of a somewhat elaborated and mathematicized version of caloric theory (that by then included the concept of “caloric rays” associated with radiant heat).
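
For the speed of sound, the key result usually credited to Laplace (stated here in modern notation, not in the caloric-theory terms in which he derived it) is the correction of Newton’s isothermal estimate by the ratio of specific heats γ:

```latex
c_{\text{Newton}} = \sqrt{\frac{P}{\rho}}
\;\;\longrightarrow\;\;
c_{\text{Laplace}} = \sqrt{\frac{\gamma\,P}{\rho}},
\qquad \gamma = \frac{c_P}{c_V}
```

For air γ ≈ 1.4, which brings the predicted speed into good agreement with measurement.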

But even though it wasn’t understood what heat ultimately was, one could still measure its attributes. Already in antiquity there were devices that made use of heat to produce pressure or mechanical motion. And by the beginning of the 1600s—catalyzed by Galileo’s development of the thermoscope (in which heated liquid could be seen to expand up a tube)—the idea quickly caught on of making thermometers, and of quantitatively measuring temperature.

And given a measurement of temperature, one could correlate it with effects one saw. So, for example, in the late 1700s the French balloonist Jacques Charles (1746–1823) noted the linear increase of volume of a gas with temperature. Meanwhile, at the beginning of the 1800s Joseph Fourier (1768–1830) (science advisor to Napoleon) developed what became his 1822 Analytical Theory of Heat, and in it he begins by noting that:

Heat, like gravity, penetrates every substance of the universe, its rays occupy all parts of space. The object of our work is to set forth the mathematical laws which this element obeys. The theory of heat will hereafter form one of the most important branches of general physics.

Later he describes what he calls the “Principle of the Communication of Heat”. He refers to “molecules”—though basically just to indicate a small amount of substance—and says

When two molecules of the same solid are extremely near and at unequal temperatures, the most heated molecule communicates to that which is less heated a quantity of heat exactly expressed by the product of the duration of the instant, of the extremely small difference of the temperatures, and of a certain function of the distance of the molecules.

then goes on to develop what’s now called the heat equation and all sorts of mathematics around it, all the while effectively adopting a caloric theory of heat. (And, yes, if you think of heat as a fluid it does lead you to describe its “motion” in terms of differential equations just like Fourier did. Though it’s then ironic that Bernoulli, even though he studied hydrodynamics, seemed to have a less “fluid-based” view of heat.)
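
In modern notation, the heat equation Fourier arrived at (written here for the temperature field u, with α the thermal diffusivity) is:

```latex
\frac{\partial u}{\partial t} \;=\; \alpha\,\nabla^{2} u
```

Notably, nothing in this equation depends on whether heat is a fluid or a form of microscopic motion, which is part of why Fourier’s mathematics survived the demise of the caloric theory intact.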

Heat Engines and the Beginnings of Thermodynamics

At the beginning of the 1800s the Industrial Revolution was in full swing—driven in no small part by the availability of increasingly efficient steam engines. There had been precursors of steam engines even in antiquity, but it was only in 1712 that the first practical steam engine was developed. And after James Watt (1736–1819) produced a much more efficient version in 1776, the adoption of steam engines began to take off.

Over the years that followed there were all sorts of engineering innovations that increased the efficiency of steam engines. But it wasn’t clear how far it could go—and whether for example there was a limit to how much mechanical work could ever, even in principle, be derived from a given amount of heat. And it was the investigation of this question—in the hands of a young French engineer named Sadi Carnot (1796–1832)—that began the development of an abstract basic science of thermodynamics, and led to the Second Law.

The story really begins with Sadi Carnot’s father, Lazare Carnot (1753–1823), who was trained as an engineer but ascended to the highest levels of French politics, and was involved with both the French Revolution and Napoleon. Particularly in years when he was out of political favor, Lazare Carnot worked on mathematics and mathematical engineering. His first significant work—in 1778—was entitled Memoir on the Theory of Machines. The mathematical and geometrical science of mechanics was by then fairly well developed; Lazare Carnot’s objective was to understand its consequences for actual engineering machines, and to somehow abstract general principles from the mechanical details of the operation of those machines. In 1803 (alongside works on the geometrical theory of fortifications) he published his Fundamental Principles of [Mechanical] Equilibrium and Movement, which argued for what was at one time called (in a strange foreshadowing of reversible thermodynamic processes) “Carnot’s Principle”: that useful work in a machine will be maximized if accelerations and shocks of moving parts are minimized—and that a machine with perpetual motion is impossible.

Sadi Carnot was born in 1796, and was largely educated by his father until he went to college in 1812. It’s notable that during the years when Sadi Carnot was a kid, one of his father’s activities was to give opinions on a whole range of inventions—including many steam engines and their generalizations. Lazare Carnot died in 1823. Sadi Carnot was by that point a well-educated but professionally undistinguished French military engineer. But in 1824, at the age of 28, he produced his one published work, Reflections on the Motive Power of Fire, and on Machines to Develop That Power (where by “fire” he meant what we would call heat):

Click to enlarge

The style and approach of the younger Carnot’s work is quite similar to his father’s. But the subject matter turned out to be more fruitful. The book begins:

Everyone knows that heat can produce motion. That it possesses vast motive-power none can doubt, in these days when the steam-engine is everywhere so well known… The study of these engines is of the greatest interest, their importance is enormous, their use is continually increasing, and they seem destined to produce a great revolution in the civilized world. Already the steam-engine works our mines, impels our ships, excavates our ports and our rivers, forges iron, fashions wood, grinds grain, spins and weaves our cloths, transports the heaviest burdens, etc. It appears that it must some day serve as a universal motor, and be substituted for animal power, water-falls, and air currents. …

Notwithstanding the work of all kinds done by steam-engines, notwithstanding the satisfactory condition to which they have been brought to-day, their theory is very little understood, and the attempts to improve them are still directed almost by chance. …

The question has often been raised whether the motive power of heat is unbounded, whether the possible improvements in steam-engines have an assignable limit, a limit which the nature of things will not allow to be passed by any means whatever; or whether, on the contrary, these improvements may be carried on indefinitely. We propose now to submit these questions to a deliberate examination.

Carnot operated very much within the framework of caloric theory, and indeed his ideas were crucially based on the concept that one could think about “heat itself” (which for him was caloric fluid), independent of the material substance (like steam) that was hot. But—like his father’s efforts with mechanical machines—his goal was to develop an abstract “metamodel” of something like a steam engine, crucially assuming that the generation of unbounded heat or mechanical work (i.e. perpetual motion) in the closed cycle of the operation of the machine was impossible, and noting (again with a reflection of his father’s work) that the system would necessarily maximize efficiency if it operated reversibly. And he then argued that:

The production of motive power is then due in steam-engines not to an actual consumption of caloric, but to its transportation from a warm body to a cold body, that is, to its re-establishment of equilibrium…

In other words, what was important about a steam engine was that it was a “heat engine”, that “moved heat around”. His book is mostly words, with just a few formulas related to the behavior of ideal gases, and some tables of actual parameters for particular materials. But even though his underlying conceptual framework—of caloric theory—was not correct, the abstract arguments that he made (that involved essentially logical consequences of reversibility and of operating in a closed cycle) were robust enough that it didn’t matter, and in particular he was able to successfully show that there was a theoretical maximum efficiency for a heat engine, that depended only on the temperatures of its hot and cold reservoirs of heat. But what’s important for our purposes here is that in the setup Carnot constructed he basically ended up introducing the Second Law.
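
In modern terms (and using the absolute temperature scale that only came later, with Kelvin), the bound implied by Carnot’s argument is that a heat engine operating between a hot reservoir at temperature T_h and a cold one at T_c can convert at most a fraction

```latex
\eta_{\max} \;=\; \frac{W}{Q_h} \;=\; 1 - \frac{T_c}{T_h}
```

of the heat it draws from the hot reservoir into work, regardless of its internal construction.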

At the time it appeared, however, Carnot’s book was basically ignored, and Carnot died in obscurity from cholera in 1832 (about 3 months after Évariste Galois (1811–1832)) at the age of 36. (The Sadi Carnot who would later become president of France was his nephew.) But in 1834, Émile Clapeyron (1799–1864)—a rather distinguished French engineering professor (and steam engine designer)—wrote a paper entitled “Memoir on the Motive Power of Heat”. He starts off by saying about Carnot’s book:

The idea which serves as a basis of his researches seems to me to be both fertile and beyond question; his demonstrations are founded on the absurdity of the possibility of creating motive power or heat out of nothing. …

This new method of demonstration seems to me worthy of the attention of theoreticians; it seems to me to be free of all objection …

I believe that it is of some interest to take up this theory again; S. Carnot, avoiding the use of mathematical analysis, arrives by a chain of difficult and elusive arguments at results which can be deduced easily from a more general law which I shall attempt to prove…

Clapeyron’s paper doesn’t live up to the claims of originality or rigor expressed here, but it served as a more accessible (both in terms of where it was published and how it was written) exposition of Carnot’s work, featuring, for example, for the first time a diagrammatic representation of a Carnot cycle

Click to enlarge

as well as notations like Q-for-heat that are still in use today:

Click to enlarge

The Second Law Is Formulated

One of the implications of Newton’s Laws of Motion is that momentum is conserved. But what else might also be conserved? In the 1680s Gottfried Leibniz (1646–1716) suggested the quantity m v², which he called, rather grandly, vis viva—or, in English, “life force”. And yes, in things like elastic collisions, this quantity did seem to be conserved. But in plenty of situations it wasn’t. By 1807 the term “energy” had been introduced, but the question remained of whether it could in any sense globally be thought of as conserved.

It had seemed for a long time that heat was something a bit like mechanical energy, but the relation wasn’t clear—and the caloric theory of heat implied that caloric (i.e. the fluid corresponding to heat) was conserved, and so certainly wasn’t something that for example could be interconverted with mechanical energy. But in 1798 Benjamin Thompson (Count Rumford) (1753–1814) measured the heat produced by the mechanical process of boring a cannon, and began to make the argument that, in contradiction to the caloric theory, there was actually some kind of correspondence between mechanical energy and amount of heat.

It wasn’t a very accurate experiment, and it took until the 1840s—with new experiments by the English brewer and “amateur” scientist James Joule (1818–1889) and the German physician Robert Mayer (1814–1878)—before the idea of some kind of equivalence between heat and mechanical work began to look more plausible. And in 1847 this was something William Thomson (1824–1907) (later Lord Kelvin)—a prolific young physicist recently graduated from the Mathematical Tripos in Cambridge and now installed as a professor of “natural philosophy” (i.e. physics) in Glasgow—began to be curious about.

But first we have to go back a bit in the story. In 1845 Kelvin (as we’ll call him) had spent some time in Paris (primarily at a lab that was measuring properties of steam for the French government), and there he’d learned about Carnot’s work from Clapeyron’s paper (at first he couldn’t get a copy of Carnot’s actual book). Meanwhile, one of the issues of the time was a proliferation of different temperature scales based on using different kinds of thermometers based on different substances. And in 1848 Kelvin realized that Carnot’s concept of a “pure heat engine”—assumed at the time to be based on caloric—could be used to define an “absolute” scale of temperature in which, for example, at absolute zero all caloric would have been removed from all substances:

Click to enlarge

Having found Carnot’s ideas useful, Kelvin in 1849 wrote a 33-page summary of them (small world that it was then, the immediately preceding paper in the journal is “On the Theory of Rolling Curves”, written by the then-17-year-old James Clerk Maxwell (1831–1879), while the one that follows is “Theoretical Considerations on the Effect of Pressure in Lowering the Freezing Point of Water” by James Thomson (1822–1892), engineering-oriented older brother of William):

Click to enlarge

He characterizes Carnot’s work as being based not so much on physics and experiment, but on the “strictest principles of philosophy”:

Click to enlarge

He doesn’t immediately mention “caloric” (though it does slip in later), referring instead to a vaguer concept of “thermal agency”:

Click to enlarge

In keeping with the idea that this is more philosophy than experimental science, he refers to “Carnot’s fundamental principle”—that after a complete cycle an engine can be treated as back in the “same state”—while adding the footnote that “this is tacitly assumed as an axiom”:

Click to enlarge

In actuality, to say that an engine comes back to the same state is a nontrivial statement of the existence of some kind of unique equilibrium in the system, related to the Second Law. But in 1849 Kelvin brushes this off by saying that the “axiom” has “never, so far as I am aware, been questioned by practical engineers”.

His next page is notable for the first-ever use of the term “thermo-dynamic” (then hyphenated) to discuss systems where what matters is “the dynamics of heat”:

Click to enlarge

That same page has a curious footnote presaging what will come, and making the statement that “no energy can be destroyed”, and considering it “perplexing” that this seems incompatible with Carnot’s work and its caloric theory framework:

Click to enlarge

After going through Carnot’s basic arguments, the paper ends with an appendix in which Kelvin basically says that even though the theory seems to just be based on a formal axiom, it should be experimentally tested:

Click to enlarge

He proceeds to give some tests, which he claims agree with Carnot’s results—and finally ends with a very practical (but probably not correct) table of theoretical efficiencies for steam engines of his day:

Click to enlarge

But now what of Joule’s and Mayer’s experiments, and their apparent disagreement with the caloric theory of heat? By 1849 a new idea had emerged: that perhaps heat was itself a form of energy, and that, when heat was accounted for, the total energy of a system would always be conserved. And what this suggested was that heat was somehow a dynamical phenomenon, associated with microscopic motion—which in turn suggested that gases might indeed consist just of molecules in motion.

And so it was that in 1850 Kelvin (then still “William Thomson”) wrote a long exposition “On the Dynamical Theory of Heat”, attempting to reconcile Carnot’s ideas with the new concept that heat was dynamical in origin:

Click to enlarge

He begins by quoting—presumably for some kind of “British-based authority”—an “anti-caloric” experiment apparently done by Humphry Davy (1778–1829) as a teenager, involving melting pieces of ice by rubbing them together, and included anonymously in a 1799 list of pieces of knowledge “principally from the west of England”:

Click to enlarge

But soon Kelvin is getting to the main point:

Click to enlarge

And then we have it: a statement of the Second Law (albeit with some hedging to which we’ll come back later):

Click to enlarge

And there’s immediately a footnote that basically asserts the “absurdity” of a Second-Law-violating perpetual motion machine:

Click to enlarge

But by the next page we find out that Kelvin admits he’s in some sense been “scooped”—by a certain Rudolf Clausius (1822–1888), who we’ll be discussing soon. But what’s remarkable is that Clausius’s “axiom” turns out to be exactly equivalent to Kelvin’s statement:

Click to enlarge

And what this suggests is that the underlying concept—the Second Law—is something quite robust. And indeed, as Kelvin implies, it’s the main thing that ultimately underlies Carnot’s results. And so even though Carnot is operating on the now-outmoded idea of caloric theory, his main results are still correct, because in the end all they really depend on is a certain amount of “logical structure”, together with the Second Law (and a version of the First Law, but that’s a slightly trickier story).

Kelvin recognized, though, that Carnot had chosen to look at the particular (“equilibrium thermodynamics”) case of processes that occur reversibly, effectively at an infinitesimal rate. And at the end of the first installment of his exposition, he explains that things will be more complicated if finite rates are considered—and that in particular the results one gets in such cases will depend on things like having a correct model for the nature of heat.

Kelvin’s exposition on the “dynamical nature of heat” runs to four installments, and the next two dive into detailed derivations and attempted comparison with experiment:

Click to enlarge

But before Kelvin gets to publish part four of his exposition he publishes two other pieces. In the first, he’s talking about sources of energy for human use (now that he believes energy is conserved):

Click to enlarge

He emphasizes that the Sun is—directly or indirectly—the main source of energy on Earth (later he’ll argue that coal will run out, etc.):

Click to enlarge

But he wonders how animals actually manage to produce mechanical work, noting that “the animal body does not act as a thermo-dynamic engine; and [it is] very probable that the chemical forces produce the external mechanical effects through electrical means”:

Click to enlarge

And then, by April 1852, he’s back to thinking directly about the Second Law, and he’s cut through the technicalities, and is stating the Second Law in everyday (if slightly ponderous) terms:

Click to enlarge

It’s interesting to see his apparently rather deeply held Presbyterian beliefs manifest themselves here in his mention that “Creative Power” is what must set the total energy of the universe. He ends his piece with:

Click to enlarge

In (2) the hedging is interesting. He makes the definitive assertion that what amounts to a violation of the Second Law “is impossible in inanimate material processes”. And he’s pretty sure the same is true for “vegetable life” (recognizing that in his previous paper he discussed the harvesting of sunlight by plants). But what about “animal life”, like us humans? Here he says that “by our will” we can’t violate the Second Law—so we can’t, for example, build a machine to do it. But he leaves it open whether we as humans might have some innate (“God-given”?) ability to overcome the Second Law.

And then there’s his (3). It’s worth realizing that his whole paper is less than 3 pages long, and right before his conclusions we’re seeing triple integrals:

Click to enlarge

So what is (3) about? It’s presumably something like a Second-Law-implies-heat-death-of-the-universe statement (but what’s this stuff about the past?)—but with an added twist that there’s something (God?) beyond the “known operations going on at present in the material world” that might be able to swoop in to save the world for us humans.

It doesn’t take people long to pick up on the “cosmic significance” of all this. But in the fall of 1852, Kelvin’s colleague, the Glasgow engineering professor William Rankine (1820–1872) (who was deeply involved with the First Law of thermodynamics), is writing about a way the universe might save itself:

Click to enlarge

After touting the increasingly solid evidence for energy conservation and the First Law

Click to enlarge

he goes on to talk about dissipation of energy and what we now call the Second Law

Click to enlarge

and the fact that it implies an “end of all physical phenomena”, i.e. heat death of the universe. He continues:

Click to enlarge

But now he offers a “ray of hope”. He believes that there must exist a “medium capable of transmitting light and heat”, i.e. an aether, “[between] the heavenly bodies”. And if this aether can’t itself acquire heat, he concludes that all energy must be converted into a radiant form:

Click to enlarge

Now he supposes that the universe is effectively a giant drop of aether, with nothing outside, so that all this radiant energy will get totally internally reflected from its surface, allowing the universe to “[reconcentrate] its physical energies, and [renew] its activity and life”—and save it from heat death:

Click to enlarge

He ends with the speculation that perhaps “some of the luminous objects which we see in distant regions of space may be, not stars, but foci in the interstellar aether”.

But independent of cosmic speculations, Kelvin himself continues to study the “dynamical theory of gases”. It’s often a bit unclear what’s being assumed. There’s the First Law (energy conservation). And the Second Law. But there’s also reversibility. Equilibrium. And the ideal gas law (P V = R T). But it soon becomes clear that that’s not always correct for real gases—as the Joule–Thomson effect demonstrates:

Click to enlarge
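
In modern terms, the point is that for an ideal gas a throttling (constant-enthalpy) expansion leaves the temperature unchanged, while for real gases it does not. The Joule–Thomson coefficient

```latex
\mu_{\mathrm{JT}} \;\equiv\; \left(\frac{\partial T}{\partial P}\right)_{\!H}
```

vanishes for an ideal gas but is nonzero for real gases, which cool or warm on expansion depending on whether they are below or above their inversion temperature.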

Kelvin soon returned to more cosmic speculations, suggesting that perhaps gravitation—rather than direct “Creative Power”—might “in reality [be] the ultimate created antecedent of all motion…”:

Click to enlarge

Not long after these papers Kelvin got involved with the practical “electrical” problem of laying a transatlantic telegraph cable, and in 1858 was on the ship that first succeeded in doing this. (His commercial efforts soon allowed him to buy a 126-ton yacht.) But he continued to write physics papers, which ranged over many different areas, occasionally touching thermodynamics, though most often in the service of answering a “general science” question—like how old the Sun is (he estimated 32,000 years from thermodynamic arguments, though of course without knowledge of nuclear reactions).

Kelvin’s ideas about the inevitable dissipation of “useful energy” spread quickly—by 1854, for example, finding their way into an eloquent public lecture by Hermann von Helmholtz (1821–1894). Helmholtz had trained as a doctor, becoming in 1843 a surgeon to a German military regiment. But he was also doing experiments and developing theories about “animal heat” and how muscles manage to “do mechanical work”, for example publishing an 1845 paper entitled “On Metabolism during Muscular Activity”. And in 1847 he was one of the inventors of the law of conservation of energy—and the First Law of thermodynamics—as well as perhaps its clearest expositor at the time (the word “force” in the title is what we now call “energy”):

Click to enlarge

By 1854 Helmholtz was a physiology professor, beginning a distinguished career in physics, psychophysics and physiology—and talking about the Second Law and its implications. He began his lecture by saying that “A new conquest of very general interest has been recently made by natural philosophy”—and what he’s referring to here is the Second Law:

Click to enlarge

Having discussed the inability of “automata” (he uses that word) to reproduce living systems, he starts talking about perpetual motion machines:

Click to enlarge

First he disposes of the idea that perpetual motion can be achieved by generating energy from nothing (i.e. violating the First Law), charmingly including the anecdote:

Click to enlarge

And then he’s on to talking about the Second Law

Click to enlarge

and discussing how it implies the heat death of the universe:

Click to enlarge

He notes, correctly, that the Second Law hasn’t been “proved”. But he’s impressed at how Kelvin was able to go from a “mathematical formula” to a global fact about the fate of the universe:

Click to enlarge

He ends the whole lecture quite poetically:

Click to enlarge

We’ve talked quite a bit about Kelvin and how his ideas spread. But let’s turn now to Rudolf Clausius, who in 1850 at least to some extent “scooped” Kelvin on the Second Law. At that time Clausius was a freshly minted German physics PhD. His thesis had been on an ingenious but ultimately incorrect theory of why the sky is blue. But he’d also worked on elasticity theory, and there he’d been led to start thinking about molecules and their configurations in materials. By 1850 caloric theory had become fairly elaborate, complete with concepts like “latent heat” (bound to molecules) and “free heat” (able to be transferred). Clausius’s experience in elasticity theory made him skeptical, and knowing Mayer’s and Joule’s results he decided to break with the caloric theory—writing his career-launching paper (translated from German in 1851, with Carnot’s puissance motrice [“motive power”] being rendered as “moving force”):

Click to enlarge

The first installment of the English version of the paper gives a clear description of the ideal gas laws and the Carnot cycle, having started from a statement of the “caloric-busting” First Law:

Click to enlarge

The general discussion continues in the second installment, but now there’s a critical side comment that describes the “general deportment of heat, which every-where exhibits the tendency to annul differences of temperature, and therefore to pass from a warmer body to a colder one”:

Click to enlarge

Clausius “has” the Second Law, as Carnot basically did before him. But when Kelvin quotes Clausius he does so much more forcefully:

Click to enlarge

But there it is: by 1852 the Second Law is out in the open, in at least two different forms. The path to reach it has been circuitous and quite technical. But in the end, stripped of its technical origins, the law seems somehow unsurprising and even obvious. For it’s a matter of common experience that heat flows from hotter bodies to colder ones, and that motion is dissipated by friction into heat. But the point is that it wasn’t until basically 1850 that the overall scientific framework existed to make it useful—or even really possible—to enunciate such observations as a formal scientific law.

Of course the fact that a law “seems true” based on common experience doesn’t mean it’ll always be true, and that there won’t be some special circumstance or elaborate construction that will evade it. But somehow the very fact that the Second Law had in a sense been “technically hard won”—yet in the end seemed so “obvious”—appears to have given it a sense of inevitability and certainty. And it didn’t hurt that somehow it seemed to have emerged from Carnot’s work, which had a certain air of “logical necessity”. (Of course, in reality, the Second Law entered Carnot’s logical structure as an “axiom”.) But all this helped set the stage for some of the curious confusions about the Second Law that would develop over the century that followed.

The Concept of Entropy

In the first half of the 1850s the Second Law had in a sense been presented in two ways. First, as an almost “footnote-style” assumption needed to support the “pure thermodynamics” that had grown out of Carnot’s work. And second, as an explicitly-stated-for-the-first-time—if “obvious”—“everyday” feature of nature, that was now realized as having potentially cosmic significance. But an important feature of the decade that followed was a certain progressive at-least-phenomenological “mathematicization” of the Second Law—pursued most notably by Rudolf Clausius.

In 1854 Clausius was already beginning this process. Perhaps confusingly, he refers to the Second Law as the “second fundamental theorem [Hauptsatz]” in the “mechanical theory of heat”—suggesting it’s something that is proved, even though it’s really introduced just as an empirical law of nature, or perhaps a theoretical axiom:

Click to enlarge

He starts off by discussing the “first fundamental theorem”, i.e. the First Law. And he emphasizes that this implies that there’s a quantity U (which we now call “internal energy”) that is a pure “function of state”—so that its value depends only on the state of a system, and not the path by which that state was reached. And as an “application” of this, he then points out that the overall change in U in a cyclic process (like the one executed by Carnot’s heat engine) must be zero.

And now he’s ready to tackle the Second Law. He gives a statement that at first seems somewhat convoluted:

Click to enlarge

But soon he’s deriving this from a more “everyday” statement of the Second Law (which, notably, is clearly not a “theorem” in any normal sense):

Click to enlarge

After giving a Carnot-style argument he’s then got a new statement (that he calls “the theorem of the equivalence of transformations”) of the Second Law:

Click to enlarge

And there it is: basically what we now call entropy (even with the same notation of Q for heat and T for temperature)—together with the statement that this quantity is a function of state, so that its differences are “independent of the nature of the process by which the transformation is effected”.

Pretty soon there’s a familiar expression for entropy change:

Click to enlarge

And by the next page he’s giving what he describes as “the analytical expression” of the Second Law, for the particular case of reversible cyclic processes:

Click to enlarge

A bit later he backs out of the assumption of reversibility, concluding that:

Click to enlarge

(And, yes, with modern mathematical rigor, that should be “non-negative” rather than “positive”.)
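
In modern notation (and with the modern sign convention in which δQ counts heat absorbed by the system, so the sign is flipped relative to Clausius’s “positive” quantity), the statements he is building up amount to:

```latex
\oint \frac{\delta Q_{\mathrm{rev}}}{T} = 0
\;\;\text{(reversible cycle)},
\qquad
\oint \frac{\delta Q}{T} \le 0
\;\;\text{(general cycle)},
\qquad
dS = \frac{\delta Q_{\mathrm{rev}}}{T}
```

with S, the quantity he will later name the entropy, being a function of state, so that its change between two states is independent of the (reversible) path along which it is computed.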

He goes on to say that if something has changed after going around a cycle, he’ll call that an “uncompensated transformation”—or what we would now refer to as an irreversible change. He lists a few possible (now very familiar) examples:

Click to enlarge

Earlier in his paper he’s careful to say that T is “a function of temperature”; he doesn’t say it’s actually the quantity we measure as temperature. But now he wants to determine what it is:

Click to enlarge

He doesn’t talk about the ultimately critical assumption (effectively the Zeroth Law of thermodynamics) that the system is “in equilibrium”, with a uniform temperature. But he uses an ideal gas as a kind of “standard material”, and determines that, yes, in that case T can be simply the absolute temperature.

So there it is: in 1854 Clausius has effectively defined entropy and described its relation to the Second Law, though everything is being done in a very “heat-engine” style. And pretty soon he’s writing about “Theory of the Steam-Engine” and filling actual approximate steam tables into his theoretical formulas:

Click to enlarge

After a few years “off” (working, as we’ll discuss later, on the kinetic theory of gases) Clausius is back in 1862 talking about the Second Law again, in terms of his “theorem of the equivalence of transformations”:

Click to enlarge

He’s slightly tightened up his 1854 discussion, but, more importantly, he’s now stating a result not just for reversible cyclic processes, but for general ones:

Click to enlarge

But what does this result really mean? Clausius claims that this “theorem admits of strict mathematical proof if we start from the fundamental proposition above quoted”—though it’s not particularly clear just what that proposition is. But then he says he wants to find a “physical cause”:

Click to enlarge

A little earlier in the paper he said:

Click to enlarge

So what does he think the “physical cause” is? He says that even from his first investigations he’d assumed a general law:

Click to enlarge

What are these “resistances”? He’s basically saying they are the forces between molecules in a material (which from his work on the kinetic theory of gases he now imagines exist):

Click to enlarge

He introduces what he calls the “disgregation” to represent the microscopic effect of adding heat:

Click to enlarge

For ideal gases things are straightforward, including the proportionality of “resistance” to absolute temperature. But in other cases, it’s not so clear what’s going on. A decade later he identifies “disgregation” with average kinetic energy per molecule—which is indeed proportional to absolute temperature. But in 1862 it’s all still quite muddy, with somewhat curious statements like:

Click to enlarge

And then the main part of the paper ends with what seems to be an anticipation of the Third Law of thermodynamics:

Click to enlarge

There’s an appendix entitled “On Terminology” which admits that between Clausius’s own work, and other people’s, it’s become rather difficult to follow what’s going on. He agrees that the term “energy” that Kelvin is using makes sense. He suggests “energy of the body” for what he calls U and we now call “internal energy”. He suggests “heat of the body” or “thermal content of the body” for Q. But then he talks about the fact that these are measured in thermal units (say the amount of heat needed to increase the temperature of water by 1°), while mechanical work is measured in units related to kilograms and meters. He proposes therefore to introduce the concept of “ergon” for “work measured in thermal units”:

Click to enlarge

And pretty soon he’s talking about the “interior ergon” and “exterior ergon”, as well as concepts like “ergonized heat”. (In later work he also tries to introduce the concept of “ergal” to go along with his development of what he called—in a name that did stick—the “virial theorem”.)

But in 1865 he has his biggest success in introducing a term. He’s writing a paper, he says, basically to clarify the Second Law (or, as he calls it, “the second fundamental theorem”—rather confidently asserting that he will “prove this theorem”):

Click to enlarge

Part of the issue he’s trying to address is how the calculus is done:

Click to enlarge

The partial derivative symbol ∂ had been introduced in the late 1700s. He doesn’t use it, but he does introduce the now-standard-in-thermodynamics subscript notation for variables that are kept constant:

Click to enlarge

A little later, as part of the “notational cleanup”, we see the variable S:

Click to enlarge

And then—there it is—Clausius introduces the term “entropy”, “Greekifying” his concept of “transformation”:

Click to enlarge

His paper ends with his famous crisp statements of the First and Second Laws of thermodynamics—manifesting the parallelism he’s been claiming between energy and entropy:

Click to enlarge

The Kinetic Theory of Gases

We began above by discussing the history of the question of “What is heat?” Was it like a fluid—the caloric theory? Or was it something more dynamical, and in a sense more abstract? But then we saw how Carnot—followed by Kelvin and Clausius—managed in effect to sidestep the question, and come up with all sorts of “thermodynamic conclusions”, by talking just about “what heat does” without ever really having to seriously address the question of “what heat is”. But to be able to discuss the foundations of the Second Law—and what it says about heat—we have to know more about what heat actually is. And the crucial development that began to clarify the nature of heat was the kinetic theory of gases.

Central to the kinetic theory of gases is the idea that gases are made up of discrete molecules. And it’s important to remember that it wasn’t until the beginning of the 1900s that anyone knew for sure that molecules existed. Yes, something like them had been discussed ever since antiquity, and in the 1800s there was increasing “circumstantial evidence” for them. But nobody had directly “seen a molecule”, or been able, for example, until about 1870, to even guess what the size of molecules might be. Still, by the mid-1800s it had become common for physicists to talk and reason in terms of ordinary matter at least effectively being made up of molecules.

But if a gas was made of molecules bouncing off each other like billiard balls according to the laws of mechanics, what would its overall properties be? Daniel Bernoulli had in 1738 already worked out the basic answer that pressure would vary inversely with volume, or in his notation, π = P/s (and he even also gave formulas for molecules of nonzero size—in a precursor of van der Waals):

Click to enlarge
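
The modern descendant of such finite-molecular-size corrections is the van der Waals equation (which came only in 1873, long after Bernoulli):

```latex
\left(P + \frac{a\,n^{2}}{V^{2}}\right)\left(V - n\,b\right) \;=\; n\,R\,T
```

where, for n moles, b accounts for the volume excluded by the molecules themselves and a for their mutual attraction.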

Results like Bernoulli’s would be rediscovered several times, for example in 1820 by John Herapath (1790–1868), a math teacher in England, who developed a fairly elaborate theory that purported to describe gravity as well as heat (but for example implied a P V = a T² gas law):

Click to enlarge

Then there was the case of John Waterston (1811–1883), a naval instructor for the East India company, who in 1843 published a book called Thoughts on the Mental Functions, which included results on what he called the “vis viva theory of heat”—that he developed in more detail in a paper he wrote in 1846. But when he submitted the paper to the Royal Society it was rejected as “nonsense”, and its manuscript was “lost” until 1891 when it was finally published (with an “explanation” of the “delay”):

Click to enlarge

The paper had included a perfectly sensible mathematical analysis that included a derivation of the kinetic theory relation between pressure and mean-square molecular velocity:

Click to enlarge

But with all these pieces of work unknown, it fell to a German high-school chemistry teacher (and sometime professor and philosophical/theological writer) named August Krönig (1822–1879) to publish in 1856 yet another “rediscovery”, that he entitled “Principles of a Theory of Gases”. He said it was going to analyze the “mechanical theory of heat”, and once again he wanted to compute the pressure associated with colliding molecules. But to simplify the math, he assumed that molecules went only along the coordinate directions, at a fixed speed—almost anticipating a cellular automaton fluid:

Click to enlarge

What ultimately launched the subsequent development of the kinetic theory of gases, however, was the 1857 publication by Rudolf Clausius (by then an increasingly established German physics professor) of a paper entitled rather poetically “On the Nature of the Motion Which We Call Heat” (“Über die Art der Bewegung die wir Wärme nennen”):

Click to enlarge

It’s a clean and clear paper, with none of the mathematical muddiness around Clausius’s work on the Second Law (which, by the way, isn’t even mentioned in this paper even though Clausius had recently worked on it). Clausius figures out lots of the “obvious” implications of his molecular theory, outlining for example what happens in different phases of matter:

Click to enlarge

It takes him only a couple of pages of very light mathematics to derive the standard kinetic theory formula for the pressure of an ideal gas:

Click to enlarge

He’s implicitly assuming a certain randomness to the motions of the molecules, but he barely mentions it (and this particular formula is robust enough that average values are actually all that matter):

Click to enlarge

But having derived the formula for pressure, he goes on to use the ideal gas law to derive the relation between average molecular kinetic energy (which he still calls “vis viva”) and absolute temperature:

Click to enlarge
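
In modern notation (Clausius, of course, had neither the Boltzmann constant nor this way of writing things), the two results amount to:

```latex
P\,V \;=\; \tfrac{1}{3}\,N\,m\,\langle v^{2}\rangle,
\qquad
\tfrac{1}{2}\,m\,\langle v^{2}\rangle \;=\; \tfrac{3}{2}\,k_{B}\,T
```

for N molecules of mass m with mean-square speed ⟨v²⟩, the second relation referring to the translational kinetic energy alone.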

From this he can do things like work out the actual average velocities of molecules in different gases—which he does without any mention of the question of just how real or not molecules might be. By knowing experimental results about specific heats of gases he also manages to determine that not all the energy (“heat”) in a gas is associated with “translatory motion”: he realizes that for molecules involving several atoms there can be energy associated with other (as we would now say) internal degrees of freedom:

Click to enlarge
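
In the later language of equipartition (which emerged only with Maxwell and Boltzmann), each effective quadratic degree of freedom contributes ½ k_B T of energy per molecule, so that, for example, the molar heat capacity at constant volume is

```latex
C_V \;=\; \frac{f}{2}\,R,
\qquad
f = 3 \;\;(\text{monatomic}),
\qquad
f \approx 5 \;\;(\text{diatomic, near room temperature})
```

which is just the kind of departure from the purely translational value that Clausius was inferring from the measured specific heats.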

Clausius’s paper was widely read. And it didn’t take long before the Dutch meteorologist (and effectively founder of the World Meteorological Organization) Christophorus Buys Ballot (1817–1890) asked why—if molecules were moving as quickly as Clausius suggested—gases didn’t mix much more quickly than they’re observed to do:

Click to enlarge

Within a few months, Clausius published the answer: the molecules didn’t just keep moving in straight lines; they were constantly being deflected, to follow what we would now call a random walk. He invented the concept of a mean free path to describe how far on average a molecule goes before it hits another molecule:

Click to enlarge

As a capable theoretical physicist, Clausius quickly brings in the concept of probability

Click to enlarge

and is soon computing the average number of molecules which will survive undeflected for a certain distance:

Click to enlarge

Then he works out the mean free path λ (and it’s often still called λ):

Click to enlarge
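
Clausius assumed that all the molecules move at the same speed, so his numerical prefactor differs slightly from the modern one; with Maxwell’s later averaging over relative velocities, the mean free path for molecules of diameter d at number density n comes out as:

```latex
\lambda \;=\; \frac{1}{\sqrt{2}\,\pi\,d^{2}\,n}
```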

And he concludes that actually there’s no conflict between rapid microscopic motion and large-scale “diffusive” motion:

Click to enlarge

Of course, he could have actually drawn a sample random walk, but drawing diagrams wasn’t his style. And in fact it seems as if the first published drawing of a random walk was something added by John Venn (1834–1923) in the 1888 edition of his Logic of Chance—and, interestingly, in alignment with my computational irreducibility concept from a century later he used the digits of π to generate his “randomness”:

Click to enlarge
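
As a small illustration of the idea (a sketch only; the exact mapping from digits to directions that Venn used may have differed), here is how one might reproduce such a “π-digit random walk”, letting each decimal digit 0–7 pick one of eight compass directions and simply skipping 8s and 9s:

```python
import math

# First 60 decimal digits of pi after the "3." (enough for a short walk)
PI_DIGITS = "141592653589793238462643383279502884197169399375105820974944"

# Eight unit steps at 45-degree intervals: E, NE, N, NW, W, SW, S, SE
DIRECTIONS = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(8)]

def pi_walk(digits):
    """Return the list of (x, y) points visited by the digit-driven walk."""
    x, y = 0.0, 0.0
    points = [(x, y)]
    for ch in digits:
        d = int(ch)
        if d >= 8:          # digits 8 and 9 don't map to a direction; skip them
            continue
        dx, dy = DIRECTIONS[d]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

if __name__ == "__main__":
    for px, py in pi_walk(PI_DIGITS):
        print(f"{px:8.3f} {py:8.3f}")
```

Plotting the points traces out an irregular path of the same general character as Venn’s figure, even though every step is completely determined by the digits of π.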

In 1859, Clausius’s paper came to the attention of the then-28-year-old James Clerk Maxwell, who had grown up in Scotland, done the Mathematical Tripos in Cambridge, and was now back in Scotland as professor of “natural philosophy” at Aberdeen. Maxwell had already worked on things like elasticity theory, color vision, the mechanics of tops, the dynamics of the rings of Saturn and electromagnetism—having published his first paper (on geometry) at age 14. And, by the way, Maxwell was quite a “diagrammist”—and his early papers include all sorts of pictures that he drew:

Click to enlarge

But in 1859 Maxwell applied his talents to what he called the “dynamical theory of gases”:

Click to enlarge

He models molecules as hard spheres, and sets about computing the “statistical” results of their collisions:

Click to enlarge

And pretty soon he’s trying to compute the distribution of their velocities:

Click to enlarge

It’s a somewhat unconvincing (or, as Maxwell himself later put it, “precarious”) derivation (how does it work in 1D, for example?), but somehow it manages to produce what’s now known as the Maxwell distribution:

Click to enlarge
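
In modern notation (Maxwell himself wrote it in terms of a parameter rather than temperature), the normalized speed distribution is:

```latex
f(v) \;=\; 4\pi \left(\frac{m}{2\pi k_{B} T}\right)^{\!3/2} v^{2}\,
\exp\!\left(-\frac{m\,v^{2}}{2\,k_{B}\,T}\right)
```

so that each velocity component is independently Gaussian, which is exactly the connection to the “method of least squares” that Maxwell points out next.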

Maxwell observes that the distribution is the same as for “errors … in the ‘method of least squares’”:

Click to enlarge
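In modern notation, what Maxwell's argument amounts to is that each velocity component is independently Gaussian (hence the connection to the "law of errors" in least squares), so that in three dimensions the distribution of speeds is

$$f(v)\,dv \;=\; 4\pi \left(\frac{m}{2\pi k T}\right)^{3/2} v^{2}\, e^{-\frac{m v^{2}}{2 k T}}\, dv .$$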

Maxwell didn’t get back to the dynamical theory of gases until 1866, but in the meantime he was making a “dynamical theory” of something else: what he called the electromagnetic field:

Click to enlarge

Even though he’d worked extensively with the inverse square law of gravity he didn’t like the idea of “action at a distance”, and for example he wanted magnetic field lines to have some underlying “material” manifestation

Click to enlarge

imagining that they might be associated with arrays of “molecular vortices”:

Click to enlarge

We now know, of course, that there isn’t this kind of “underlying mechanics” for the electromagnetic field. But—with shades of the story of Carnot—even though the underlying framework isn’t right, Maxwell successfully derives correct equations for the electromagnetic field—that are now known as Maxwell’s equations:

Click to enlarge
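Maxwell's own presentation ran to many component equations; in the compact vector notation later introduced by Oliver Heaviside, they read

$$\nabla\cdot\mathbf{E} = \frac{\rho}{\epsilon_0}, \qquad \nabla\cdot\mathbf{B} = 0, \qquad \nabla\times\mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \qquad \nabla\times\mathbf{B} = \mu_0\,\mathbf{J} + \mu_0\epsilon_0\,\frac{\partial \mathbf{E}}{\partial t} .$$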

His statement of how the electromagnetic field “works” is highly reminiscent of the dynamical theory of gases:

Click to enlarge

But he quickly and correctly adds:

Click to enlarge

And a few sections later he derives the idea of general electromagnetic waves

Click to enlarge

noting that there’s no evidence that the medium through which he assumes they’re propagating has elasticity:

Click to enlarge
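In modern terms, in a source-free region these equations combine into a wave equation, with the propagation speed fixed by the electric and magnetic constants and coming out equal to the measured speed of light:

$$\nabla^{2}\mathbf{E} = \mu_0 \epsilon_0\, \frac{\partial^{2}\mathbf{E}}{\partial t^{2}}, \qquad c = \frac{1}{\sqrt{\mu_0 \epsilon_0}} \approx 3\times 10^{8}\ \text{m/s} .$$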

By the way, when it comes to gravity he can’t figure out how to make his idea of a “mechanical medium” work:

Click to enlarge

But in any case, after using it as an inspiration for thinking about electromagnetism, Maxwell in 1866 returns to the actual dynamical theory of gases, still feeling that he needs to justify looking at a molecular theory:

Click to enlarge

And now he gives a recognizable (and correct, so far as it goes) derivation of the Maxwell distribution:

Click to enlarge

He goes on to try to understand experimental results on gases, about things like diffusion, viscosity and conductivity. For some reason, Maxwell doesn’t want to think of molecules, as he did before, as hard spheres. And instead he imagines that they have “action at a distance” forces, which basically work like hard spheres if there’s an r⁻⁵ force law:

Click to enlarge

In the years that followed, Maxwell returned to the dynamical theory of gases several more times. In 1871, eight years before he died at age 48, he wrote a textbook entitled Theory of Heat, which begins, in erudite fashion, discussing what “thermodynamics” should even be called:

Click to enlarge

Most of the book is concerned with the macroscopic “theory of heat”—though, as we’ll discuss later, in the very last chapter Maxwell does talk about the “molecular theory”, if in somewhat tentative terms.

“Deriving” the Second Law from Molecular Dynamics

The Second Law was in effect originally introduced as a formalization of everyday observations about heat. But the development of kinetic theory seemed to open up the possibility that the Second Law could actually be proved from the underlying mechanics of molecules. And this was something that Ludwig Boltzmann (1844–1906) embarked on towards the end of his physics PhD at the University of Vienna. In 1865 he’d published his first paper (“On the Movement of Electricity on Curved Surfaces”), and in 1866 he published his second paper, “On the Mechanical Meaning of the Second Law of Thermodynamics”:

Click to enlarge

The introduction promises “a purely analytical, perfectly general proof of the Second Law”. And what he seemed to imagine was that the equations of mechanics would somehow inevitably lead to motion that would reproduce the Second Law. And in a sense what computational irreducibility, rule 30, etc. now show is that in the end that’s indeed basically how things work. But the methods and conceptual framework that Boltzmann had at his disposal were very far away from being able to see that. And instead what Boltzmann did was to use standard mathematical methods from mechanics to compute average properties of cyclic mechanical motions—and then made the somewhat unconvincing claim that combinations of these averages could be related (e.g. via temperature as average kinetic energy) to “Clausius’s entropy”:

Click to enlarge

It’s not clear how much this paper was read, but in 1871 Boltzmann (now a professor of mathematical physics in Graz) published another paper entitled simply “On the Priority of Finding the Relationship between the Second Law of Thermodynamics and the Principle of Least Action” that claimed (with some justification) that Clausius’s then-newly-announced virial theorem was already contained in Boltzmann’s 1866 paper.

But back in 1868—instead of trying to get all the way to Clausius’s entropy—Boltzmann uses mechanics to get a generalization of Maxwell’s law for the distribution of molecular velocities. His paper “Studies on the Equilibrium of [Kinetic Energy] between [Point Masses] in Motion” opens by saying that while analytical mechanics has in effect successfully studied the evolution of mechanical systems “from a given state to another”, it’s had little to say about what happens when such systems “have been left moving on their own for a long time”. He intends to remedy that, and spends 47 pages—complete with elaborate diagrams and formulas about collisions between hard spheres—deriving an exponential distribution of energies if one assumes “equilibrium” (or, more specifically, balance between forward and backward processes):

Click to enlarge

It’s notable that one of the mathematical approaches Boltzmann uses is to discretize (i.e. effectively quantize) things, then look at the “combinatorial” limit. (Based on his later statements, he didn’t want to trust “purely continuous” mathematics—at least in the context of discrete molecular processes—and wanted to explicitly “watch the limits happening”.) But in the end it’s not clear that Boltzmann’s 1868 arguments do more than the few-line functional-equation approach that Maxwell had already used. (Maxwell would later complain about Boltzmann’s “overly long” arguments.)

Boltzmann’s 1868 paper had derived what the distribution of molecular energies should be “in equilibrium”. (In 1871 he was talking about “equipartition” not just of kinetic energy, but also of energies associated with “internal motion” of polyatomic molecules.) But what about the approach to equilibrium? How would an initial distribution of molecular energies evolve over time? And would it always end up at the exponential (“Maxwell–Boltzmann”) distribution? These are questions deeply related to a microscopic understanding of the Second Law. And they’re what Boltzmann addressed in 1872 in his 22nd published paper “Further Studies on the Thermal Equilibrium of Gas Molecules”:

Click to enlarge

Boltzmann explains that:

Maxwell already found the value Av²e^(−Bv²) [for the distribution of velocities] … so that the probability of different velocities is given by a formula similar to that for the probability of different errors of observation in the theory of the method of least squares. The first proof which Maxwell gave for this formula was recognized to be incorrect even by himself. He later gave a very elegant proof that, if the above distribution has once been established, it will not be changed by collisions. He also tries to prove that it is the only velocity distribution that has this property. But the latter proof appears to me to contain a false inference. It has still not yet been proved that, whatever the initial state of the gas may be, it must always approach the limit found by Maxwell. It is possible that there may be other possible limits. This proof is easily obtained, however, by the method which I am about to explain…

(He gives a long footnote explaining why Maxwell might be wrong, talking about how a sequence of collisions might lead to a “cycle of velocity states”—which Maxwell hasn’t proved will be traversed with equal probability in each direction. Ironically, this is actually already an analog of where things are going to go wrong with Boltzmann’s own argument.)

The main idea of Boltzmann’s paper is not to assume equilibrium, but instead to write down an equation (now called the Boltzmann Transport Equation) that explicitly describes how the velocity (or energy) distribution of molecules will change as a result of collisions. He begins by defining infinitesimal changes in time:

Click to enlarge

He then goes through a rather elaborate analysis of velocities before and after collisions, and how to integrate over them, and eventually winds up with a partial differential equation for the time variation of the energy distribution (yes, he confusingly uses x to denote energy)—and argues that Maxwell’s exponential distribution is a stationary solution to this equation:

Click to enlarge
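In the modern textbook form of what is now called the Boltzmann equation (written for the velocity distribution rather than Boltzmann's energy variable, and for hard-sphere-like collisions), this reads

$$\frac{\partial f}{\partial t} + \mathbf{v}\cdot\nabla_{\mathbf{x}} f \;=\; \int\!\!\int \left[\,f(\mathbf{v}')\,f(\mathbf{v}_1') - f(\mathbf{v})\,f(\mathbf{v}_1)\,\right] |\mathbf{v}-\mathbf{v}_1|\;\sigma(\Omega)\;d\Omega\;d^{3}v_1 ,$$

with the primes denoting post-collision velocities.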

A few paragraphs further on, something important happens: Boltzmann introduces a function that here he calls E, though later he’ll call it H:

Click to enlarge

Ten pages of computation follow

Click to enlarge

and finally Boltzmann gets his main result: if the velocity distribution evolves according to his equation, H can never increase with time, and it stops changing only when the distribution is the Maxwell one. In other words, he is saying that he’s proved that a gas will always (“monotonically”) approach equilibrium—which seems awfully like some kind of microscopic proof of the Second Law.
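In modern notation, Boltzmann's H (the E of this paper) and his result are

$$H(t) = \int f(\mathbf{v},t)\,\ln f(\mathbf{v},t)\; d^{3}v, \qquad \frac{dH}{dt} \le 0 ,$$

with dH/dt = 0 exactly when f is the Maxwell distribution.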

But then Boltzmann makes a bolder claim:

It has thus been rigorously proved that, whatever the initial distribution of kinetic energy may be, in the course of a very long time it must always necessarily approach the one found by Maxwell. The procedure used so far is of course nothing more than a mathematical artifice employed in order to give a rigorous proof of a theorem whose exact proof has not previously been found. It gains meaning by its applicability to the theory of polyatomic gas molecules. There one can again prove that a certain quantity E can only decrease as a consequence of molecular motion, or in a limiting case can remain constant. One can also prove that for the atomic motion of a system of arbitrarily many material points there always exists a certain quantity which, in consequence of any atomic motion, cannot increase, and this quantity agrees up to a constant factor with the value found for the well-known integral ∫dQ/T in my [1871] paper on the “Analytical proof of the 2nd law, etc.”. We have therefore prepared the way for an analytical proof of the Second Law in a completely different way from those previously investigated. Up to now the object has been to show that ∫dQ/T = 0 for reversible cyclic processes, but it has not been proved analytically that this quantity is always negative for irreversible processes, which are the only ones that occur in nature. The reversible cyclic process is only an ideal, which one can more or less closely approach but never completely attain. Here, however, we have succeeded in showing that ∫dQ/T is in general negative, and is equal to zero only for the limiting case, which is of course the reversible cyclic process (since if one can go through the process in either direction, ∫dQ/T cannot be negative).

In other words, he’s saying that the quantity H that he’s defined microscopically in terms of velocity distributions can be identified (up to a sign) with the entropy that Clausius defined as dQ/T. He says that he’ll show this in the context of analyzing the mechanics of polyatomic molecules.

But first he’s going to take a break and show that his derivation doesn’t need to assume continuity. In a pre-quantum-mechanics pre-cellular-automaton-fluid kind of way he replaces all the integrals by limits of sums of discrete quantities (i.e. he’s quantizing kinetic energy, etc.):

Click to enlarge

He says that this discrete approach makes everything clearer, and quotes Lagrange’s derivation of vibrations of a string as an example of where this has happened before. But then he argues that everything works out fine with the discrete approach, and that H still decreases, with the Maxwell distribution as the only possible end point. As an aside, he makes a jab at Maxwell’s derivation, pointing out that with Maxwell’s functional equation:

… there are infinitely many other solutions, which are not useful however since ƒ(x) comes out negative or imaginary for some values of x. Hence, it follows very clearly that Maxwell’s attempt to prove a priori that his solution is the only one must fail, since it is not the only one but rather it is the only one that gives purely positive probabilities, and therefore the only useful one.

But finally—after another aside about computing thermal conductivities of gases—Boltzmann digs into polyatomic molecules, and his claim about H being related to entropy. There’s another 26 pages of calculations, and then we get to a section entitled “Solution of Equation (81) and Calculation of Entropy”. More pages of calculation about polyatomic molecules ensue. But finally we’re computing H, and, yes, it agrees with the Clausius result—but anticlimactically he’s only dealing with the case of equilibrium for monatomic molecules, where we already knew we got the Maxwell distribution:

Click to enlarge

And now he decides he’s not talking about polyatomic molecules anymore, and instead:

In order to find the relation of the quantity [H] to the second law of thermodynamics in the form ∫dQ/T < 0, we shall interpret the system of mass points not, as previously, as a gas molecule, but rather as an entire body.

But then, in the last couple of pages of his paper, Boltzmann pulls out another idea. He’s discussed the concept that polyatomic molecules (or, now, whole systems) can be in many different configurations, or “phases”. But now he says: “We shall replace [our] single system by a large number of equivalent systems distributed over many different phases, but which do not interact with each other”. In other words, he’s introducing the idea of an ensemble of states of a system. And now he says that instead of looking at the distribution just for a single velocity, we should do it for all velocities, i.e. for the whole “phase” of the system.

[These distributions] may be discontinuous, so that they have large values when the variables are very close to certain values determined by one or more equations, and otherwise vanishingly small. We may choose these equations to be those that characterize visible external motion of the body and the kinetic energy contained in it. In this connection it should be noted that the kinetic energy of visible motion corresponds to such a large deviation from the final equilibrium distribution of kinetic energy that it leads to an infinity in H, so that from the point of view of the Second Law of thermodynamics it acts like heat supplied from an infinite temperature.

There are a bunch of ideas swirling around here. Phase-space density (cf. Liouville’s equation). Coarse-grained variables. Microscopic representation of mechanical work. Etc. But the paper is ending. There’s a discussion about H for systems that interact, and how there’s an equilibrium value achieved. And finally there’s a formula for entropy

Click to enlarge

that Boltzmann said “agrees … with the expression I found in my previous [1871] paper”.

So what exactly did Boltzmann really do in his 1872 paper? He introduced the Boltzmann Transport Equation which allows one to compute at least certain non-equilibrium properties of gases. But is his ƒ log ƒ quantity really what we can call “entropy” in the sense Clausius meant? And is it true that he’s proved that entropy (even in his sense) increases? A century and a half later there’s still a remarkable level of confusion around both these issues.

But in any case, back in 1872 Boltzmann’s “minimum theorem” (now called his “H theorem”) created quite a stir. After some time, though, an objection was raised, which we’ll discuss below. And partly in response to this, Boltzmann (after spending time working on microscopic models of electrical properties of materials—as well as doing some actual experiments) wrote another major paper on entropy and the Second Law in 1877:

Click to enlarge

The translated title of the paper is “On the Relation between the Second Law of Thermodynamics and Probability Theory with Respect to the Laws of Thermal Equilibrium”. And at the very beginning of the paper Boltzmann makes a statement that was pivotal for future discussions of the Second Law: he says it’s now clear to him that an “analytical proof” of the Second Law is “only possible on the basis of probability calculations”. Now that we know about computational irreducibility and its implications one could say that this was the point where Boltzmann and those who followed him went off track in understanding the true foundations of the Second Law. But Boltzmann’s idea of introducing probability theory was effectively what launched statistical mechanics, with all its rich and varied consequences.

Boltzmann makes his basic claim early in the paper

Click to enlarge

with the statement (quoting from a comment in a paper he’d written earlier the same year) that “it is clear” (always a dangerous thing to say!) that in thermal equilibrium all possible states of the system—say, spatially uniform and nonuniform alike—are equally probable

… comparable to the situation in the game of Lotto where every single quintet is as improbable as the quintet 12345. The higher probability that the state distribution becomes uniform with time arises only because there are far more uniform than nonuniform state distributions…

He goes on:

[Thus] it is possible to calculate the thermal equilibrium state by finding the probability of the different possible states of the system. The initial state will in most cases be highly improbable but from it the system will always rapidly approach a more probable state until it finally reaches the most probable state, i.e., that of thermal equilibrium. If we apply this to the Second Law we will be able to identify the quantity which is usually called entropy with the probability of the particular state…

He’s talked about thermal equilibrium, even in the title, but now he says:

… our main purpose here is not to limit ourselves to thermal equilibrium, but to explore the relationship of the probabilistic formulation to the [Second Law].

He says his goal is to calculate the probability distribution for different states, and he’ll start with

as simple a case as possible, namely a gas of rigid absolutely elastic spherical molecules trapped in a container with absolutely elastic walls. (Which interact with central forces only within a certain small distance, but not otherwise; the latter assumption, which includes the former as a special case, does not change the calculations in the least).

In other words, yet again he’s going to look at hard sphere gases. But, he says:

Even in this case, the application of probability theory is not easy. The number of molecules is not infinite, in a mathematical sense, yet the number of velocities each molecule is capable of is effectively infinite. Given this last condition, the calculations are very difficult; to facilitate understanding, I will, as in earlier work, consider a limiting case.

And this is where he “goes discrete” again—allowing (“cellular-automaton-style”) only discrete possible velocities for each molecule:

Click to enlarge

He says that upon colliding, two molecules can exchange these discrete velocities, but nothing more. As he explains, though:

Even if, at first sight, this seems a very abstract way of treating the problem, it rapidly leads to the desired objective, and when you consider that in nature all infinities are but limiting cases, one assumes each molecule can behave in this fashion only in the limiting case where each molecule can assume more and more values of the velocity.

But now—much like in an earlier paper—he makes things even simpler, saying he’s going to ignore velocities for now, and just say that the possible energies of molecules are “in an arithmetic progression”:

Click to enlarge

He plans to look at collisions, but first he just wants to consider the combinatorial problem of distributing these energies among n molecules in all possible ways, subject to the constraint of having a certain fixed total energy. He sets up a specific example, with 7 molecules, total energy 7, and maximum energy per molecule 7—then gives an explicit table of all possible states (up to, as he puts it, “immaterial permutations of molecular labels”):

Click to enlarge
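Boltzmann's little example is easy to reproduce computationally. The sketch below enumerates the distinct energy distributions (his "state distributions", up to permutations of molecular labels) and counts how many labeled "complexions" each one corresponds to:

```python
from math import factorial
from itertools import combinations_with_replacement
from collections import Counter

n_molecules, total_energy = 7, 7   # Boltzmann's toy example

# nondecreasing energy tuples summing to the total: one representative per
# distribution, ignoring "immaterial permutations of molecular labels"
states = [s for s in combinations_with_replacement(range(total_energy + 1), n_molecules)
          if sum(s) == total_energy]

def complexions(state):
    # number of distinct ways of assigning these energies to labeled molecules
    counts = Counter(state)
    m = factorial(n_molecules)
    for c in counts.values():
        m //= factorial(c)
    return m

print(len(states))                              # 15 distinct distributions
print(sum(complexions(s) for s in states))      # 1716 labeled complexions in all
```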

Tables like this had been common for nearly two centuries in combinatorial mathematics books like Jacob Bernoulli’s (1655–1705) Ars Conjectandi

Click to enlarge

but this might have been the first place such a table had appeared in a paper about fundamental physics.

And now Boltzmann goes into an analysis of the distribution of states—of the kind that’s now long been standard in textbooks of statistical physics, but would then have been quite unfamiliar to the pure-calculus-based physicists of the time:

Click to enlarge

He derives the average energy per molecule, as well as the fluctuations:

Click to enlarge

He says that “of course” the real interest is in the limit of an infinite number of molecules, but he still wants to show that for “moderate values” the formulas remain quite accurate. And then (even without Wolfram Language!) he’s off finding (using Newton’s method it seems) approximate roots of the necessary polynomials:

Click to enlarge

Just to show how it all works, he considers a slightly larger case as well:

Click to enlarge

Now he’s computing the probability that a given molecule has a particular energy

Click to enlarge

and determining that in the limit it’s an exponential

Click to enlarge

that is, as he says, “consistent with that known from gases in thermal equilibrium”.
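One compact modern way to see where this exponential comes from (essentially the calculation Boltzmann is heading toward with his "permutability measure") is to maximize the multiplicity of a distribution subject to the constraints on molecule number and total energy:

$$W(\{n_i\}) = \frac{N!}{\prod_i n_i!}, \qquad \sum_i n_i = N, \qquad \sum_i n_i\,\epsilon_i = E .$$

Using Stirling's approximation and Lagrange multipliers, the maximizing distribution is exponential in the energy:

$$\frac{n_i}{N} \;\propto\; e^{-\beta\,\epsilon_i} .$$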

He claims that in order to really get a “mechanical theory of heat” it’s necessary to take a continuum limit. And here he concludes that thermal equilibrium is achieved by maximizing the quantity Ω (where the “l” stands for log, so this is basically ƒ log ƒ):

Click to enlarge

He explains that Ω is basically the log of the number of possible permutations, and that it’s “of special importance”, and he’ll call it the “permutability measure”. He immediately notes that “the total permutability measure of two bodies is equal to the sum of the permutability measures of each body”. (Note that Boltzmann’s Ω isn’t the modern total-number-of-states Ω; confusingly, that’s essentially the exponential of Boltzmann’s Ω.)

He goes through some discussion of how to handle extra degrees of freedom in polyatomic molecules, but then he’s on to the main event: arguing that Ω is (essentially) the entropy. It doesn’t take long:

Click to enlarge

Basically he just says that in equilibrium the probability ƒ(…) for a molecule to have a particular velocity is given by the Maxwell distribution, then he substitutes this into the formula for Ω, and shows that indeed, up to a constant, Ω is exactly the “Clausius entropy” ∫dQ/T.

So, yes, in equilibrium Ω seems to be giving the entropy. But then Boltzmann makes a bit of a jump. He says that in processes that aren’t reversible both “Clausius entropy” and Ω will increase, and can still be identified—and enunciates the general principle, printed in his paper in special double-spaced form:

Click to enlarge

… [In] any system of bodies that undergoes state changes … even if the initial and final states are not in thermal equilibrium … the total permutability measure for the bodies will continually increase during the state changes, and can remain constant only so long as all the bodies during the state changes remain infinitely close to thermal equilibrium (reversible state changes).

In other words, he’s asserting that Ω behaves the same way entropy is said to behave according to the Second Law. He gives various thought experiments about gases in boxes with dividers, gases under gravity, etc. And finally concludes that, yes, the relationship of entropy to Ω “applies to the general case”.

There’s one final paragraph in the paper, though:

Up to this point, these propositions may be demonstrated exactly using the theory of gases. If one tries, however, to generalize to liquid drops and solid bodies, one must dispense with an exact treatment from the outset, since far too little is known about the nature of the latter states of matter, and the mathematical theory is barely developed. But I have already mentioned reasons in previous papers, in virtue of which it is likely that for these two aggregate states, the thermal equilibrium is achieved when Ω becomes a maximum, and that when thermal equilibrium exists, the entropy is given by the same expression. It can therefore be described as likely that the validity of the principle which I have developed is not just limited to gases, but that the same constitutes a general natural law applicable to solid bodies and liquid droplets, although the exact mathematical treatment of these cases still seems to encounter extraordinary difficulties.

Interestingly, Boltzmann is only saying that it’s “likely” that in thermal equilibrium his permutability measure agrees with Clausius’s entropy, and he’s implying that actually that’s really the only place where Clausius’s entropy is properly defined. But certainly his definition is more general (after all, it doesn’t refer to things like temperature that are only properly defined in equilibrium), and so—even though Boltzmann didn’t explicitly say it—one can imagine basically just using it as the definition of entropy for arbitrary cases. Needless to say, the story is actually more complicated, as we’ll see soon.

But this definition of entropy—crispened up by Max Planck (1858–1947) and with different notation—is what ended up years later “written in stone” at Boltzmann’s grave:

Click to enlarge
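(The inscription, in the notation Planck settled on, reads S = k log W, where W counts the number of microscopic "complexions" compatible with a given macroscopic state.)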

The Concept of Ergodicity

In his 1877 paper Boltzmann had made the claim that in equilibrium all possible microscopic states of a system would be equally probable. But why should this be true? One reason could be that in its pure “mechanical evolution” the system would just successively visit all these states. And this was an idea that Boltzmann seems to have had—with increasing clarity—from the time of his very first paper in 1866 that purported to “prove the Second Law” from mechanics.

In modern times—with our understanding of discrete systems and computational rules—it’s not difficult to describe the idea of “visiting all states”. But in Boltzmann’s time it was considerably more complicated. Did one expect to hit all the infinite possible infinitesimally separated configurations of a system? Or somehow just get close? The fact is that Boltzmann had certainly dipped his toe into thinking about things in terms of discrete quantities. But he didn’t make the jump to imagining discrete rules, even though he certainly did know about discrete iterative processes, like Newton’s method for finding roots.

Boltzmann knew about cases—like circular motion—where everything was purely periodic. But maybe when motion wasn’t periodic, it’d inevitably “visit all states”. Already in 1868 Boltzmann was writing a paper entitled “Solution to a Mechanical Problem” where he studies a single point mass moving in an α/r − β/r² potential and bouncing elastically off a line—and manages to show that it visits every position with equal probability. In this paper he’s just got traditional formulas, but by 1871, in “Some General Theorems about Thermal Equilibrium”—computing motion in the same potential as before—he’s got a picture:

Click to enlarge

Boltzmann probably knew about Lissajous figures—cataloged in 1857

Click to enlarge

and the fact that in this case a rational ratio of x and y periods gives a closed, periodic curve, while an irrational ratio gives a curve that eventually passes arbitrarily close to every position, might have led him to suspect that all systems would either be periodic, or would visit every possible configuration (or at least, as he identified in his paper, every configuration that had the same values of “constants of the motion”, like energy).
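The distinction is easy to see by drawing a couple of Lissajous figures, one with a rational frequency ratio and one with an irrational one; a minimal sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 300, 300000)
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, ratio, label in [(axes[0], 2 / 3, "frequency ratio 2:3"),
                         (axes[1], np.sqrt(2), "frequency ratio sqrt(2):1")]:
    ax.plot(np.sin(t), np.sin(ratio * t), lw=0.2)   # closed curve vs. filling curve
    ax.set_title(label)
    ax.set_aspect("equal")
plt.show()
```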

In early 1877 Boltzmann returned to the same question, including as one section in his “Remarks on Some Problems in the Mechanical Theory of Heat” more analysis of the same potential as before, but now showing a diversity of more complicated pictures that almost seem to justify his rule-30-before-its-time idea that there could be “pure mechanics” that would lead to “Second Law” behavior:

Click to enlarge

In modern times, of course, it’s easy to solve those equations of motion numerically, and to look at typical results for an array of values of the parameters.
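Here, for instance, is a minimal sketch of integrating motion in a potential of this general form (the signs and parameter values are chosen purely for illustration, and the elastic reflection off a line in Boltzmann's setup is omitted); typical bound trajectories come out as precessing rosettes:

```python
import numpy as np
from scipy.integrate import solve_ivp
import matplotlib.pyplot as plt

alpha, beta = 1.0, 0.1          # illustrative values only

def rhs(t, state):
    x, y, vx, vy = state
    r = np.hypot(x, y)
    # radial force for V(r) = -alpha/r + beta/r**2, unit mass
    f_r = -alpha / r**2 + 2 * beta / r**3
    return [vx, vy, f_r * x / r, f_r * y / r]

sol = solve_ivp(rhs, (0, 100), [1.0, 0.0, 0.0, 0.8],
                max_step=0.01, rtol=1e-9, atol=1e-9)
plt.plot(sol.y[0], sol.y[1], lw=0.4)
plt.gca().set_aspect("equal")
plt.show()
```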

Boltzmann returned to these questions in 1884, responding to Helmholtz’s analysis of what he was calling “monocyclic systems”. Boltzmann used the same potential again, but now with a name for the “visit-all-states” property: isodic. Meanwhile, Boltzmann had introduced the name “ergoden” for the collection of all possible configurations of a system with a given energy (what would now be called the microcanonical ensemble). But somehow, quite a few years later, Boltzmann’s student Paul Ehrenfest (1880–1933) (along with Tatiana Ehrenfest-Afanassjewa (1876–1964)) would introduce the term “ergodic” for Boltzmann’s isodic. And “ergodic” is the term that caught on. And in the twentieth century there was all sorts of development of “ergodic theory”, as we’ll discuss a bit later.

But back in the 1800s people continued to discuss the possibility that what would come to be called ergodicity was somehow generic, and would explain why all states would somehow be equally probable, why the Maxwell distribution of velocities would be obtained, and ultimately why the Second Law was true. Maxwell worked out some examples. So did Kelvin. But it remained unclear how it would all work out, as Kelvin (now with many letters after his name) discussed in a talk he gave in 1900 celebrating the new century:

Click to enlarge

The dynamical theory of light didn’t work out. And on the dynamical theory of heat, Kelvin quotes what Maxwell (following Boltzmann) had said in one of Maxwell’s very last papers, published in 1878, in reference to what amounts to a proof of the Second Law from underlying dynamics:

Click to enlarge

Kelvin talks about exploring test cases:

Click to enlarge

When, for example, is the motion of a single particle bouncing around in a fixed region ergodic? He considers first an ellipse, and proves that, no, there isn’t in general ergodicity there:

Click to enlarge

Then he goes on to the much more complicated case

Click to enlarge

and now he does an “experiment” (with a rather Monte Carlo flavor):

Click to enlarge

Kelvin considers a few other examples

Click to enlarge

but mostly concludes that he can’t tell in general about ergodicity—and that probably something else is needed, or as he puts it (somehow wrapping the theory of light into the story as well):

Click to enlarge

But What about Reversibility?

Had Boltzmann’s 1872 H theorem proved the Second Law? Was the Second Law—with its rather downbeat implication about the heat death of the universe—even true? One skeptic was Boltzmann’s friend and former teacher, the chemist Josef Loschmidt (1821–1895), who in 1866 had used kinetic theory to (rather accurately) estimate the size of air molecules. And in 1876 Loschmidt wrote a paper entitled “On the State of Thermal Equilibrium in a System of Bodies with Consideration of Gravity” in which he claimed to show that when gravity was taken into account, there wouldn’t be uniform thermal equilibrium, the Maxwell distribution, or the Second Law—and thus, as he poetically explained:

The terroristic nimbus of the Second Law is destroyed, a nimbus which makes that Second Law appear as the annihilating principle of all life in the universe—and at the same time we are confronted with the comforting perspective that, as far as the conversion of heat into work is concerned, mankind will not solely be dependent on the intervention of coal or of the Sun, but will have available an inexhaustible resource of convertible heat at all times.

His main argument revolves around a thought experiment involving molecules in a gravitational field:

Click to enlarge

Over the next couple of years, despite Loschmidt’s progressively more elaborate constructions

Click to enlarge

Boltzmann and Maxwell would debunk this particular argument—even though to this day the role of gravity in relation to the Second Law remains incompletely resolved.

But what’s more important for our narrative is a couple of paragraphs in Loschmidt’s original paper, tucked away at the end of one section (making a point that Kelvin had in fact basically anticipated in 1874):

[Consider what would happen if] after a time t sufficiently long for the stationary state to obtain, we suddenly reversed the velocities of all atoms. Initially we would be in a state that would look like the stationary state. This would be true for some time, but in the long run the stationary state would deteriorate and after the time t we would inevitably return to the initial state…

It is clear that in general in any system one can revert the entire course of events by suddenly inverting the velocities of all the elements of the system. This doesn’t give a solution to the problem of undoing everything that happens [in the universe] but it does give a simple prescription: just suddenly revert the instantaneous velocities of all atoms of the universe.

How did this relate to the H theorem? The underlying molecular equations of motion that Boltzmann had assumed in his proof were reversible in time. Yet Boltzmann claimed that H was always going to a minimum. But why couldn’t one use Loschmidt’s argument to construct an equally possible “reverse evolution” in which H was instead going to a maximum?
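The point is easy to see in a toy computation. In the sketch below the "gas" is just free-streaming particles in a periodic box (no collisions at all, so this illustrates only the reversibility issue, not Boltzmann's actual dynamics): a coarse-grained measure of spread rises as the particles disperse, and reversing every velocity makes it fall right back:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
pos = rng.uniform(0.0, 0.1, size=(n, 2))   # all particles start in one corner
vel = rng.normal(size=(n, 2))

def coarse_spread(p, bins=10):
    # coarse-grained "entropy" of occupancy over a grid of cells
    counts, _, _ = np.histogram2d(p[:, 0], p[:, 1], bins=bins, range=[[0, 1], [0, 1]])
    f = counts[counts > 0] / len(p)
    return float(-(f * np.log(f)).sum())

def evolve(p, v, t):
    return (p + v * t) % 1.0               # free streaming in a periodic box

t = 5.0
pos_t = evolve(pos, vel, t)                # the particles spread out
pos_back = evolve(pos_t, -vel, t)          # Loschmidt's reversal: retraces the motion
print(coarse_spread(pos), coarse_spread(pos_t), coarse_spread(pos_back))
```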

It didn’t take Boltzmann long to answer, in print, tucked away in a section of his paper “Remarks on Some Problems in the Mechanical Theory of Heat”. He admits that Loschmidt’s argument “has great seductiveness”. But he claims it is merely “an interesting sophism”—and then says he will “locate the source of the fallacy”. He begins with a classic setup: a collection of hard spheres in a box.

Suppose that at time zero the distribution of spheres in the box is not uniform; for example, suppose that the density of spheres is greater on the right than on the left … The sophism now consists in saying that, without reference to the initial conditions, it cannot be proved that the spheres will become uniformly mixed in the course of time.

But then he rather boldly claims that with the actual initial conditions described, the spheres will “almost always [become] uniform” at a future time t. Now he imagines (following Loschmidt) reversing all the velocities in this state at time t. Then, he says:

… the spheres would sort themselves out as time progresses, and at [the analog of] time 0, they would have a completely nonuniform distribution, even though the [new] initial distribution [one had used] was almost uniform.

But now he says that, yes—given this counterexample—it won’t be possible to prove that the final distribution of spheres will always be uniform.

This is in fact a consequence of probability theory, for any nonuniform distribution, no matter how improbable it may be, is still not absolutely impossible. Indeed it is clear that any individual uniform distribution, which might arise after a certain time from some particular initial state, is just as improbable as an individual nonuniform distribution; just as in the game of Lotto, any individual set of five numbers is as improbable as the set 1, 2, 3, 4, 5. It is only because there are many more uniform distributions than nonuniform ones that the distribution of states will become uniform in the course of time. One therefore cannot prove that, whatever may be the positions and velocities of the spheres at the beginning, the distribution must become uniform after a long time; rather one can only prove that infinitely many more initial states will lead to a uniform one after a definite length of time than to a nonuniform one.

He adds:

One could even calculate, from the relative numbers of the different state distributions, their probabilities, which might lead to an interesting method for the calculation of thermal equilibrium.

And indeed within a few months Boltzmann has followed up on that “interesting method” to produce his classic paper on the probabilistic interpretation of entropy.

But in his earlier paper he goes on to argue:

Since there are infinitely many more uniform than nonuniform distributions of states, the latter case is extraordinarily improbable [to arise] and can be considered impossible for practical purposes; just as it may be considered impossible that if one starts with oxygen and nitrogen mixed in a container, after a month one will find chemically pure oxygen in the lower half and nitrogen in the upper half, although according to probability theory this is merely very improbable but not impossible.

He talks about how interesting it is that the Second Law is intimately connected with probability while the First Law is not. But at the end he does admit:

Perhaps this reduction of the Second Law to the realm of probability makes its application to the entire universe appear dubious, but the laws of probability theory are confirmed by all experiments carried out in the laboratory.

At this point it’s all rather unconvincing. The H theorem had purported to prove the Second Law. But now he’s just talking about probability theory. He seems to have given up on proving the Second Law. And he’s basically just saying that the Second Law is true because it’s observed to be true—like other laws of nature, but not like something that can be “proved”, say from underlying molecular dynamics.

For many years not much attention was paid to these issues, but by the late 1880s there were attempts to clarify things, particularly among the rather active British circle of kinetic theorists. A published 1894 letter from the Irish mathematician Edward Culverwell (1855–1931) (who also wrote about ice ages and Montessori education) summed up some of the confusions that were circulating:

Click to enlarge

At a lecture in England the next year, Boltzmann countered (conveniently, in English):

Click to enlarge

He goes on, but doesn’t get much more specific:

Click to enlarge

He then makes an argument that will be repeated many times in different forms, saying that there will be fluctuations, where H deviates temporarily from its minimum value, but these will be rare:

Click to enlarge

Later he’s talking about what he calls the “H curve” (a plot of H as a function of time), and he’s trying to describe its limiting form:

Click to enlarge

And he even refers to Weierstrass’s recent work on nondifferentiable functions:

Click to enlarge

But he doesn’t pursue this, and instead ends his “rebuttal” with a more philosophical—and in some sense anthropic—argument that he attributes to his former assistant Ignaz Schütz (1867–1927):

Click to enlarge

It’s an argument that we’ll see in various forms repeated over the century and a half that follows. In essence what it’s saying is that, yes, the Second Law implies that the universe will end up in thermal equilibrium. But there’ll always be fluctuations. And in a big enough universe there’ll be fluctuations somewhere that are large enough to correspond to the world as we experience it, where “visible motion and life exist”.

But regardless of such claims, there’s a purely formal question about the H theorem. How exactly is it that from the Boltzmann transport equation—which is supposed to describe reversible mechanical processes—the H theorem manages to prove that the H function irreversibly decreases? It wasn’t until 1895—fully 23 years after Boltzmann first claimed to prove the H theorem—that this issue was even addressed. And it first came up rather circuitously through Boltzmann’s response to comments in a textbook by Gustav Kirchhoff (1824–1887) that had been completed by Max Planck.

The key point is that Boltzmann’s equation makes an implicit assumption, essentially the same one Maxwell had made back in 1860: that before each collision between molecules, the molecules are statistically uncorrelated, so that the probability for the collision has the factored form ƒ(v1) ƒ(v2). But what about after the collision? Inevitably the collision itself will lead to correlations. So now there’s an asymmetry: there are no correlations before each collision, but there are correlations after. And that’s why the behavior of the system doesn’t have to be symmetrical—and the H theorem can prove that H irreversibly decreases.
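In modern notation, the assumption is that the two-particle distribution entering the collision term factorizes for molecules that are about to collide:

$$f_2(\mathbf{v}_1, \mathbf{v}_2) \;\approx\; f(\mathbf{v}_1)\, f(\mathbf{v}_2) \quad \text{(before a collision)},$$

while no such factorization is assumed, or in fact holds, for the correlated pairs emerging from a collision.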

In 1895 Boltzmann wrote a 3-page paper (about half of it in footnotes) entitled “More about Maxwell’s Distribution Law for Speeds” where he explained what he thought was going on:

[The reversibility of the laws of mechanics] has been recently applied in judging the assumptions necessary for a proof of [the H theorem]. This proof requires the hypothesis that the state of the gas is and remains molecularly disordered, namely, that the molecules of a given class do not always or predominantly collide in a specific manner and that, on the contrary, the number of collisions of a given kind can be found by the laws of probability.

Now, if we assume that in general a state distribution never remains molecularly ordered for an unlimited time and also that for a stationary state-distribution every velocity is as probable as the reversed velocity, then it follows that by inversion of all the velocities after an infinitely long time every stationary state-distribution remains unchanged. After the reversal, however, there are exactly as many collisions occurring in the reversed way as there were collisions occurring in the direct way. Since the two state distributions are identical, the probability of direct and indirect collisions must be equal for each of them, whence follows Maxwell’s distribution of velocities.

Boltzmann is introducing what we’d now call the “molecular chaos” assumption (and what Ehrenfest would call the Stosszahlansatz)—giving a rather self-fulfilling argument for why the assumption should be true. In Boltzmann’s time there wasn’t really anything better to do. By the 1940s the BBGKY hierarchy at least let one organize the hierarchy of correlations between molecules—though it still didn’t give one a tractable way to assess what correlations should exist in practice, and what not.

Boltzmann knew these were all complicated issues. But he wrote about them at a technical level only a few more times in his life. The last time was in 1898 when, responding to a request from the mathematician Felix Klein (1849–1925), he wrote a paper about the H curve for mathematicians. He begins by saying that although this curve comes from the theory of gases, the essence of it can be reproduced by a process based on accumulating balls randomly picked from an urn. He then goes on to outline what amounts to a story of random walks and fractals. In another paper, he actually sketches the curve

Click to enlarge

saying that his drawing “should be taken with a large grain of salt”, noting—in a remarkably fractal-reminiscent way—that “a zincographer [i.e. an engraver of printing plates] would not have been able to produce a real figure since the H-curve has a very large number of maxima and minima on each finite segment, and hence defies representation as a line of continuously changing direction.”

Of course, in modern times it’s easy to produce an approximation to the H curve according to his prescription.
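Here is a minimal sketch in that spirit. It uses what is essentially the later Ehrenfest urn model (balls moved at random between two urns), which may not match Boltzmann's exact prescription but reproduces the qualitative character of the curve: long stretches near the minimum, with occasional "humps":

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n_balls, n_steps = 100, 20000
in_left = n_balls                      # start with every ball in one urn
h = np.empty(n_steps)

for step in range(n_steps):
    # pick a ball uniformly at random and move it to the other urn
    if rng.integers(n_balls) < in_left:
        in_left -= 1
    else:
        in_left += 1
    p = np.array([in_left, n_balls - in_left]) / n_balls
    p = p[p > 0]
    h[step] = float((p * np.log(p)).sum())   # 0 initially, about -log 2 at "equilibrium"

plt.plot(h, lw=0.5)
plt.xlabel("step")
plt.ylabel("H")
plt.show()
```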

But at the end of his “mathematical” paper he comes back to talking about gases. And first he makes the claim that the effective reversibility seen in the H curve will never be seen in actual physical systems because, in essence, there are always perturbations from outside. But then he ends, in a statement of ultimate reversibility that casts our everyday observation of irreversibility as tautological:

There is no doubt that it is just as conceivable to have a world in which all natural processes take place in the wrong chronological order. But a person living in this upside-down world would have feelings no different than we do: they would just describe what we call the future as the past and vice versa.

The Recurrence Objection

Probably the single most prominent research topic in mathematical physics in the 1800s was the three-body problem—of solving for the motion under gravity of three bodies, such as the Earth, Moon and Sun. And in 1890 the French mathematician Henri Poincaré (1854–1912) (whose breakout work had been on the three-body problem) wrote a paper entitled “On the Three-Body Problem and the Equations of Dynamics” in which, as he said:

It is proved that there are infinitely many ways of choosing the initial conditions such that the system will return infinitely many times as close as one wishes to its initial position. There are also an infinite number of solutions that do not have this property, but it is shown that these unstable solutions can be regarded as “exceptional” and may be said to have zero probability.

This was a mathematical result. But three years later Poincaré wrote what amounted to a philosophy paper entitled “Mechanism and Experience” which expounded on its significance for the Second Law:

In the mechanistic hypothesis, all phenomena must be reversible; for example, the stars might traverse their orbits in the retrograde sense without violating Newton’s law; this would be true for any law of attraction whatever. This is therefore not a fact peculiar to astronomy; reversibility is a necessary consequence of all mechanistic hypotheses.

Experience provides on the contrary a number of irreversible phenomena. For example, if one puts together a warm and a cold body, the former will give up its heat to the latter; the opposite phenomenon never occurs. Not only will the cold body not return to the warm one the heat which it has taken away when it is in direct contact with it; no matter what artifice one may employ, using other intervening bodies, this restitution will be impossible, at least unless the gain thereby realized is compensated by an equivalent or larger loss. In other words, if a system of bodies can pass from state A to state B by a certain path, it cannot return from B to A, either by the same path or by a different one. It is this circumstance that one describes by saying that not only is there not direct reversibility, but also there is not even indirect reversibility.

But then he continues:

A theorem, easy to prove, tells us that a bounded world, governed only by the laws of mechanics, will always pass through a state very close to its initial state. On the other hand, according to accepted experimental laws (if one attributes absolute validity to them, and if one is willing to press their consequences to the extreme), the universe tends toward a certain final state, from which it will never depart. In this final state, which will be a kind of death, all bodies will be at rest at the same temperature.

But in fact, he says, the recurrence theorem shows that:

This state will not be the final death of the universe, but a sort of slumber, from which it will awake after millions of millions of centuries. According to this theory, to see heat pass from a cold body to a warm one … it will suffice to have a little patience. [And we may] hope that some day the telescope will show us a world in the process of waking up, where the laws of thermodynamics are reversed.

By 1903, Poincaré was more strident in his critique of the formalism around the Second Law, writing (in English) in a paper entitled “On Entropy”:

Click to enlarge

But back in 1896, Boltzmann and the H theorem had another critic: Ernst Zermelo (1871–1953), a recent German math PhD who was then working with Max Planck on applied mathematics—though he would soon turn to foundations of mathematics and become the “Z” in ZFC set theory. Zermelo’s attack on the H theorem began with a paper entitled “On a Theorem of Dynamics and the Mechanical Theory of Heat”. After explaining Poincaré’s recurrence theorem, Zermelo gives some “mathematician-style” conditions (the gas must be in a finite region, must have no infinite energies, etc.), then says that even though there must exist states that would be non-recurrent and could show irreversible behavior, there would necessarily be infinitely more states that “would periodically repeat themselves … with arbitrarily small variations”. And, he argues, such repetition would affect macroscopic quantities discernible by our senses. He continues:

In order to retain the general validity of the Second Law, we therefore would have to assume that just those initial states leading to irreversible processes are realized in nature, their small number notwithstanding, while the other ones, whose probability of existence is higher, mathematically speaking, do not actually occur.

And he concludes that the Poincaré recurrence phenomenon means that:

… it is certainly impossible to carry out a mechanical derivation of the Second Law on the basis of the existing theory without specializing the initial states.

Boltzmann responded promptly but quite impatiently:

I have pointed out particularly often, and as clearly as I possibly could … that the Second Law is but a principle of probability theory as far as the molecular-theoretic point of view is concerned. … While the theorem by Poincaré that Zermelo discusses in the beginning of his paper is of course correct, its application to heat theory is not.

Boltzmann talks about the H curve, and first makes rather a mathematician-style point about the order of limits:

If we first take the number of gas molecules to be infinite, as was clearly done in [my 1896 proof], and only then let the time grow very large, then, in the vast majority of cases, we obtain a curve asymptotically [always close to zero]. Moreover, as can easily be seen, Poincaré’s theorem is not applicable in this case. If, however, we take the time [span] to be infinitely great and, in contrast, the number of molecules to be very great but not absolutely infinite, then the H-curve has a different character. It almost always runs very close to [zero], but in rare cases it rises above that, in what we shall call a “hump” … at which significant deviations from the Maxwell velocity distribution can occur …

Boltzmann then argues that even if you start “at a hump”, you won’t stay there, and “over an enormously long period of time” you’ll see something infinitely close to “equilibrium behavior”. But, he says:

… it is [always] possible to reach again a greater hump of the H-curve by further extending the time … In fact, it is even the case that the original state must return, provided only that we continue to sufficiently extend the time…

He continues:

Mr. Zermelo is therefore right in claiming that, mathematically speaking, the motion is periodic. He has by no means succeeded, however, in refuting my theorems, which, in fact, are entirely consistent with this periodicity.

After giving arguments about the probabilistic character of his results, and (as we would now say it) the fact that a 1D random walk is certain to repeatedly return to the origin, Boltzmann says that:

… we must not conclude that the mechanical approach has to be modified in any way. This conclusion would be justified only if the approach had a consequence that runs contrary to experience. But this would be the case only if Mr. Zermelo were able to prove that the duration of the period within which the old state of the gas must recur in accordance with Poincaré’s theorem has an observable length…

He goes on to imagine “a trillion tiny spheres, each with a [certain initial velocity] … in the one corner of a box” (and by “trillion” he means million million million, or today’s quintillion) and then says that “after a short time the spheres will be distributed fairly evenly in the box”, but the period for a “Poincaré recurrence” in which they all will return to their original corner is “so great that nobody can live to see it happen”. And to make this point more forcefully, Boltzmann has an appendix in which he tries to get an actual approximation to the recurrence time, concluding that its numerical value “has many trillions of digits”.
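A crude modern back-of-the-envelope in the same spirit (the numbers here are purely illustrative, not Boltzmann's): if each of 10^18 spheres independently spends, say, a fraction 10^-3 of its time in the original corner region, the chance of catching all of them there at once is about 10^(-3×10^18), so the expected waiting time is of order 10^(3×10^18) relaxation times, a number whose decimal representation itself has of order 10^18 digits.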

He concludes:

If we consider heat as a motion of molecules that occurs in accordance with the general equations of mechanics and assume that the arrangement of bodies that we perceive is currently in a highly improbable state, then a theorem follows that is in agreement with the Second Law for all phenomena so far observed.

Of course, this theorem can no longer hold once we observe bodies of so small a scale that they only contain a few molecules. Since, however, we do not have at hand any experimental results on the behavior of bodies so small, this assumption does not run counter to previous experience. In fact, certain experiments conducted on very small bodies in gases seem rather to support the assumption, although we are still far from being able to assert its correctness on the basis of experimental proof.

But then he gives an important caveat—with a small philosophical flourish:

Of course, we cannot expect natural science to answer the question as to why the bodies surrounding us currently exist in a highly improbable state, just as we cannot expect it to answer the question as to why there are any phenomena at all and why they adhere to certain given principles.

Unsurprisingly—particularly in view of his future efforts in the foundations of mathematics—Zermelo is unconvinced by all of this. And six months later he replies again in print. He admits that a full Poincaré recurrence might take astronomically long, but notes that (where, by “physical state”, he means one that we perceive):

… we are after all always concerned only with the “physical state”, which can be realized by many different combinations, and hence can recur much sooner.

Zermelo zeroes in on many of the weaknesses in Boltzmann’s arguments, saying that the thing he particularly “contests … is the analogy that is supposed to exist between the properties of the H curve and the Second Law”. He claims that irreversibility cannot be explained from “mechanical suppositions” without “new physical assumptions”—and in particular criteria for choosing appropriate initial states. He ends by saying that:

From the great successes of the kinetic theory of gases in explaining the relationships among states we must not deduce its … applicability also to temporal processes. … [For in this case I am] convinced that it necessarily fails in the absence of entirely new assumptions.

Boltzmann replies again—starting off with the strangely weak argument:

The Second Law receives a mechanical explanation by virtue of the assumption, which is of course unprovable, that the universe, when considered as a mechanical system, or at least a very extensive part thereof surrounding us, started out in a highly improbable state and still is in such a state.

And, yes, there’s clearly something missing in the understanding of the Second Law. And even as Zermelo pushes for formal mathematician-style clarity, Boltzmann responds with physicist-style “reasonable arguments”. There’s lots of rhetoric:

The applicability of the calculus of probabilities to a particular case can of course never be proved with precision. If 100 out of 100,000 objects of a particular sort are consumed by fire per year, then we cannot infer with certainty that this will also be the case next year. On the contrary! If the same conditions continue to obtain for 10¹⁰ years, then it will often be the case during this period that the 100,000 objects are all consumed by fire at once on a single day, and even that not a single object suffers damage over the course of an entire year. Nevertheless, every insurance company places its faith in the calculus of probabilities.

Or, in justification of the idea that we live in a highly improbable “low-entropy” part of the universe:

I refuse to grant the objection that a mental picture requiring so great a number of dead parts of the universe for the explanation of so small a number of animated parts is wasteful, and hence inexpedient. I still vividly remember someone who adamantly refused to believe that the Sun’s distance from the Earth is 20 million miles on the ground that it would simply be foolish to assume so vast a space only containing luminiferous aether alongside so small a space filled with life.

Curiously—given his apparent reliance on “commonsense” arguments—Boltzmann also says:

I myself have repeatedly cautioned against placing excessive trust in the extension of our mental pictures beyond experience and issued reminders that the pictures of contemporary mechanics, and in particular the conception of the smallest particles of bodies as material points, will turn out to be provisional.

In other words, we don’t know that we can think of atoms (even if they exist at all) as points, and we can’t really expect our everyday intuition to tell us about how they work. Which presumably means that we need some kind of solid, “formal” argument if we’re going to explain the Second Law.

Zermelo didn’t respond again, and moved on to other topics. But Boltzmann wrote one more paper in 1897 about “A Mechanical Theorem of Poincaré” ending with two more why-it-doesn’t-apply-in-practice arguments:

Poincaré’s theorem is of course never applicable to terrestrial bodies which we can hold in our hands as none of them is entirely closed. Nor is it applicable to an entirely closed gas of the sort considered by the kinetic theory if first the number of molecules and only then the quotients of the intervals between two neighboring collisions in the observation time is allowed to become infinite.

Ensembles, and an Effort to Make Things Rigorous

Boltzmann—and Maxwell before him—had introduced the idea of using probability theory to discuss the emergence of thermodynamics and potentially the Second Law. But it wasn’t until around 1900—with the work of J. Willard Gibbs (1839–1903)—that a principled mathematical framework for thinking about this developed. And while we can now see that this framework distracts in some ways from several of the key issues in understanding the foundations of the Second Law, it’s been important in framing the discussion of what the Second Law really says—as well as being central in defining the foundations for much of what’s been done over the past century or so under the banner of “statistical mechanics”.

Gibbs seems to have first gotten involved with thermodynamics around 1870. He’d finished his PhD at Yale on the geometry of gears in 1863—getting the first engineering PhD awarded in the US. After traveling in Europe and interacting with various leading mathematicians and physicists, he came back to Yale (where he stayed for the remaining 34 years of his life) and in 1871 became professor of mathematical physics there.

His first papers (published in 1873 when he was already 34 years old) were in a sense based on taking seriously the formalism of equilibrium thermodynamics defined by Clausius and Maxwell—treating entropy and internal energy, just like pressure, volume and temperature, as variables that defined properties of materials (and notably whether they were solids, liquids or gases). Gibbs’s main idea was to “geometrize” this setup, and make it essentially a story of multivariate calculus:

Click to enlarge

Unlike the European developers of thermodynamics, Gibbs didn’t interact deeply with other scientists—with the possible exception of Maxwell, who (a few years before his death in 1879) made a 3D version of Gibbs’s thermodynamic surface out of clay, and who, after the first edition of his textbook Theory of Heat, supplemented his 2D thermodynamic diagrams with renderings of 3D versions:

Click to enlarge

Three years later, Gibbs began publishing what would be a 300-page work defining what has become the standard formalism for equilibrium chemical thermodynamics. He began with a quote from Clausius:

Click to enlarge

In the years that followed, Gibbs’s work—stimulated by Maxwell—mostly concentrated on electrodynamics, and later quaternions and vector analysis. But Gibbs published a few more small papers on thermodynamics—always in effect taking equilibrium (and the Second Law) for granted.

In 1882 a certain Henry Eddy (1844–1921) (who in 1879 had written a book on thermodynamics, and in 1890 would become president of the University of Cincinnati) claimed that “radiant heat” could be used to violate the Second Law:

Click to enlarge

Gibbs soon published a 2-page rebuttal (in the 6th-ever issue of Science magazine):

Click to enlarge

Then in 1888 Clausius died, and Gibbs wrote an obituary—praising Clausius but making it clear he didn’t think the kinetic theory of gases was a solved problem:

Click to enlarge

The following year Gibbs announced a short course that he would teach at Yale on “The a priori Deduction of Thermodynamic Principles from the Theory of Probabilities”. After a decade of work, this evolved into Gibbs’s last publication—an original and elegant book that’s largely defined how the Second Law has been thought about ever since:

Click to enlarge

The book begins by explaining that mechanics is about studying the time evolution of single systems:

Click to enlarge

But Gibbs says he is going to do something different: he is going to look at what he’ll call an ensemble of systems, and see how the distribution of their characteristics changes over time:

Click to enlarge

He explains that these “inquiries” originally arose in connection with deriving the laws of thermodynamics:

Click to enlarge

But he argues that this area—which he’s calling statistical mechanics—is worth investigating even independent of its connection to thermodynamics:

Click to enlarge

Still, he expects this effort will be relevant to the foundations of thermodynamics:

Click to enlarge

He immediately then goes on to what he’ll claim is the way to think about the relation of “observed thermodynamics” to his exact statistical mechanics:

Click to enlarge

Soon he makes the interesting—if, in the light of history, very overly optimistic—claim that “the laws of thermodynamics may be easily obtained from the principles of statistical mechanics”:

Click to enlarge

At first the text of the book reads very much like a typical mathematical work on mechanics:

Click to enlarge

But soon it’s “going statistical”, talking about the “density” of systems in “phase” (i.e. with respect to the variables defining the configuration of the system). And a few pages in, he’s proving the fundamental result that the density of “phase fluid” satisfies a continuity equation (which we’d now call the Liouville equation):

Click to enlarge
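In modern notation (a restatement, not Gibbs’s own symbols), the result is that the phase-space density ρ(q, p, t) of the ensemble satisfies

\[ \frac{\partial \rho}{\partial t} + \sum_i \left( \frac{\partial \rho}{\partial q_i}\,\dot{q}_i + \frac{\partial \rho}{\partial p_i}\,\dot{p}_i \right) = 0 \]

i.e. the density is constant along every trajectory of the underlying mechanical evolution—the “phase fluid” flows incompressibly.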

It’s all quite elegant, and all very rooted in the calculus-based mathematics of its time. He’s thinking about a collection of instances of a system. But while with our modern computational paradigm we’d readily be able to talk about a discrete list of instances, with his calculus-based approach he has to consider a continuous collection of instances—whose treatment inevitably seems more abstract and less explicit.

He soon makes contact with the “theory of errors”, discussing in effect how probability distributions over the space of possible states evolve. But what probability distributions should one consider? By chapter 4, he’s looking at what he calls (and is still called) the “canonical distribution”:

Click to enlarge

He gives a now-classic definition for the probability as a function of energy ϵ:

Click to enlarge
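In the notation that has remained standard (essentially Gibbs’s own), the canonical probability density for states of energy ϵ is

\[ P = e^{(\psi - \epsilon)/\Theta} \]

where Θ is what Gibbs calls the “modulus” (playing the role of temperature) and ψ is fixed by requiring the total probability to be 1.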

He observes that this distribution combines nicely when independent parts of a system are brought together, and soon he’s noting that:

Click to enlarge

But so far he’s careful to just talk about how things are “analogous”, without committing to a true connection:

Click to enlarge

More than halfway through the book he’s defined certain properties of his probability distributions that “may … correspond to the thermodynamic notions of entropy and temperature”:

Click to enlarge

Next he’s on to the concept of a “microcanonical ensemble” that includes only states of a given energy. For him—with his continuum-based setup—this is a slightly elaborate thing to define; in our modern computational framework it actually becomes more straightforward than his “canonical ensemble”. Or, as he already says:

Click to enlarge

But what about the Second Law? Now he’s getting a little closer:

Click to enlarge

When he says “index of probability” he’s talking about the log of a probability in his ensemble, so this result is about the fact that this quantity is extremized when all the elements of the ensemble have equal probability:

Click to enlarge
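Stated in modern terms (a paraphrase rather than Gibbs’s exact wording): with index η = log P, the ensemble average

\[ \bar{\eta} = \int P \log P \, d\Gamma \]

over a fixed region of phase space is minimized—so that −η̄, the entropy-like quantity, is maximized—precisely when P is constant over that region, i.e. when all elements of the ensemble are equally probable.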

Soon he’s discussing whether he can use his index as a way—like Boltzmann tried to do with his version of entropy—to measure deviations from “statistical equilibrium”:

Click to enlarge

But now Gibbs has hit one of the classic gotchas of his approach: if you look in perfect detail at the evolution of an ensemble of systems, there’ll never be a change in the value of his index—essentially because of the overall conservation of probability. Gibbs brings in what amounts to a commonsense physics argument to handle this. He says to consider putting “coloring matter” in a liquid that one stirs. And then he says that even though the liquid (like his phase fluid) is microscopically conserved, the coloring matter will still end up being “uniformly mixed” in the liquid:

Click to enlarge

He talks about how the conclusion about whether mixing happens in effect depends on what order one takes limits in. And while he doesn’t put it quite this way, he’s essentially realized that there’s a competition between the system “mixing things up more and more finely” and the observer being able to track finer and finer details. He realizes, though, that not all systems will show this kind of mixing behavior, noting for example that there are mechanical systems that’ll just keep going in simple cycles forever.

He doesn’t really resolve the question of why “practical systems” should show mixing, more or less ending with a statement that even though his underlying mechanical systems are reversible, it’s somehow “in practice” difficult to go back:

Click to enlarge

Despite things like this, Gibbs appears to have been keen to keep the majority of his book “purely mathematical”, in effect proving theorems that necessarily followed from the setup he had given. But in the penultimate chapter of the book he makes what he seems to have viewed as a less-than-satisfactory attempt to connect what he’s done with “real thermodynamics”. He doesn’t really commit to the connection, though, characterizing it more as an “analogy”:

Click to enlarge

But he soon starts to be pretty clear that he actually wants to prove the Second Law:

Click to enlarge

He quickly backs off a little, in effect bringing in the observer to soften the requirements:

Click to enlarge

But then he fires his best shot. He says that the quantities he’s defined in connection with his canonical ensemble satisfy the same equations as Clausius originally set up for temperature and entropy:

Click to enlarge

He adds that fluctuations (or “anomalies”, as he calls them) become imperceptible in the limit of a large system:

Click to enlarge
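In modern terms, the “anomalies” he’s describing are just the energy fluctuations of the canonical ensemble, which satisfy

\[ \langle (\epsilon - \bar{\epsilon})^2 \rangle = \Theta^2 \frac{\partial \bar{\epsilon}}{\partial \Theta} \]

so that for a system of n molecules (with ε̄ proportional to n) the relative fluctuation scales like 1/√n—imperceptibly small for macroscopic n.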

But in physical reality, why should one have a whole collection of systems as in the canonical ensemble? Gibbs suggests it would be more natural to look at the microcanonical ensemble—and in fact to look at a “time ensemble”, i.e. an averaging over time rather than an averaging over different possible states of the system:

Click to enlarge

Gibbs has proved some results (e.g. related to the virial theorem) about the relation between time and ensemble averages. But as the future of the subject amply demonstrates, they’re not nearly strong enough to establish any general equivalence. Still, Gibbs presses on.
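The equivalence at issue—what would later be called the ergodic problem—can be stated in modern notation as the question of when the time average along a single trajectory equals the ensemble average:

\[ \lim_{T\to\infty}\frac{1}{T}\int_0^T f(q(t),p(t))\,dt \;=\; \int f(q,p)\,P(q,p)\,dq\,dp \]

Gibbs’s results constrain this equality, but don’t establish it for the systems of physical interest.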

In the end, though, as he himself recognized, things weren’t solved—and certainly the canonical ensemble wasn’t the whole story:

Click to enlarge

He discusses the tradeoff between having a canonical ensemble “heat bath” of a known temperature, and having a microcanonical ensemble with known energy. At one point he admits that it might be better to consider the time evolution of a single state, but basically decides that—at least in his continuous-probability-distribution-based formalism—he can’t really set this up:

Click to enlarge

Gibbs definitely encourages the idea that his “statistical mechanics” has successfully “derived” thermodynamics. But he’s ultimately quite careful and circumspect in what he actually says. He mentions the Second Law only once in his whole book—and then only to note that he can get the same “mathematical expression” from his canonical ensemble as Clausius’s form of the Second Law. He doesn’t mention Boltzmann’s H theorem anywhere in the book, and—apart from one footnote concerning “difficulties long recognized by physicists”—he mentions only Boltzmann’s work on theoretical mechanics.

One can view the main achievement of Gibbs’s book as having been to define a framework in which precise results about the statistical properties of collections of systems could be defined and in some cases derived. Within the mathematics and other formalism of the time, such ensemble results represented in a sense a distinctly “higher-order” description of things. Within our current computational paradigm, though, there’s much less of a distinction to be made: whether one’s looking at a single path of evolution, or a whole collection, one’s ultimately still just dealing with a computation. And that makes it clearer that—ensembles or not—one’s thrown back into the same kinds of issues about the origin of the Second Law. But even so, Gibbs provided a language in which to talk with some clarity about many of the things that come up.

Maxwell’s Demon

In late 1867 Peter Tait (1831–1901)—a childhood friend of Maxwell’s who was by then a professor of “natural philosophy” in Edinburgh—was finishing his sixth book. It was entitled Sketch of Thermodynamics and gave a brief, historically oriented and not particularly conceptual outline of what was then known about thermodynamics. He sent a draft to Maxwell, who responded with a fairly long letter:

Click to enlarge

The letter begins:

I do not know in a controversial manner the history of thermodynamics … [and] I could make no assertions about the priority of authors …

Any contributions I could make … [involve] picking holes here and there to ensure strength and stability.

Then he continues (with “ΘΔcs” being his whimsical Greekified rendering of the word “thermodynamics”):

To pick a hole—say in the 2nd law of ΘΔcs, that if two things are in contact the hotter cannot take heat from the colder without external agency.

Now let A and B be two vessels divided by a diaphragm … Now conceive a finite being who knows the paths and velocities of all the molecules by simple inspection but who can do no work except open and close a hole in the diaphragm by means of a slide without mass. Let him … observe the molecules in A and when he sees one coming … whose velocity is less than the mean [velocity] of the molecules in B let him open the hole and let it go into B [and vice versa].

Then the number of molecules in A and B are the same as at first, but the energy in A is increased and that in B diminished, that is, the hot system has got hotter and the cold colder and yet no work has been done, only the intelligence of a very observant and neat-fingered being has been employed.

Or in short [we can] … restore a uniformly hot system to unequal temperatures… Only we can’t, not being clever enough.

And so it was that the idea of “Maxwell’s demon” was launched. Tait must at some point have shown Maxwell’s letter to Kelvin, who wrote on it:

Very good. Another way is to reverse the motion of every particle of the Universe and to preside over the unstable motion thus produced.

But the first place Maxwell’s demon idea appeared in print was in Maxwell’s 1871 textbook Theory of Heat:

Click to enlarge

Much of the book is devoted to what was by then quite traditional, experimentally oriented thermodynamics. But Maxwell included one final chapter:

Click to enlarge

Even in 1871, after all his work on kinetic theory, Maxwell is quite circumspect in his discussion of molecules:

Click to enlarge

But Maxwell’s textbook goes through a series of standard kinetic theory results, much as a modern textbook would. The second-to-last section in the whole book sounds a warning, however:

Click to enlarge

Interestingly, Maxwell continues, somewhat in anticipation of what Gibbs will say 30 years later:

Click to enlarge

But then there’s a reminder that this is being written in 1871, several decades before any clear observation of molecules was made. Maxwell says:

Click to enlarge

In other words, if there are water molecules, there must be something other than a law of averages that makes them all appear the same. And, yes, it’s now treated as a fundamental fact of physics that, for example, all electrons have exactly—not just statistically—the same properties such as mass and charge. But back in 1871 it was much less clear what characteristics molecules—if they existed as real entities at all—might have.

Maxwell included one last section in his book that to us today might seem quite wild:

Click to enlarge

In other words, aware of Darwin’s (1809–1882) 1859 Origin of Species, he’s considering a kind of “speciation” of molecules, along the lines of the discrete species observed in biology. But then he notes that unlike biological organisms, molecules are “permanent”, so their “selection” must come from some kind of pure separation process:

Click to enlarge

And at the very end he suggests that if molecules really are all identical, that suggests a level of fundamental order in the world that we might even be able to follow through to “exact principles of distributive justice” (presumably for people rather than molecules):

Click to enlarge

Maxwell has described rather clearly his idea of demons. But the actual name “demon” first appears in print in a paper by Kelvin in 1874:

Click to enlarge

It’s a British paper, so—in a nod to future nanomachinery—it’s talking about (molecular) cricket bats:

Click to enlarge

Kelvin’s paper—like his note written on Maxwell’s letter—imagines that the demons don’t just “sort” molecules; they actually reverse their velocities, thus in effect anticipating Loschmidt’s 1876 “reversibility objection” to Boltzmann’s H theorem.

In an undated note, Maxwell discusses demons, attributing the name to Kelvin—and then starts considering the “physicalization” of demons, simplifying what they need to do:

Concerning Demons.
1. Who gave them this name? Thomson.
2. What were they by nature? Very small BUT lively beings incapable of doing work but able to open and shut valves which move without friction or inertia.
3. What was their chief end? To show that the 2nd Law of Thermodynamics has only a statistical certainty.
4. Is the production of an inequality of temperature their only occupation? No, for less intelligent demons can produce a difference in pressure as well as temperature by merely allowing all particles going in one direction while stopping all those going the other way. This reduces the demon to a valve. As such value him. Call him no more a demon but a valve like that of the hydraulic ram, suppose.

It didn’t take long for Maxwell’s demon to become something of a fixture in expositions of thermodynamics, even if it wasn’t clear how it connected to other things people were saying about thermodynamics. And in 1879, for example, Kelvin gave a talk all about Maxwell’s “sorting demon” (like other British people of the time he referred to Maxwell as “Clerk Maxwell”):

Click to enlarge

Kelvin describes—without much commentary, and without mentioning the Second Law—some of the feats of which the demon would be capable. But he adds:

Click to enlarge

The description of the lecture ends:

Click to enlarge

Presumably no actual Maxwell’s demon was shown—or Kelvin wouldn’t have continued for the rest of his life to treat the Second Law as an established principle.

But in any case, Maxwell’s demon has always remained something of a fixture in discussions of the foundations of the Second Law. One might think that the observability of Brownian motion would make something like a Maxwell’s demon possible. And indeed in 1912 Marian Smoluchowski (1872–1917) suggested experiments that one could imagine would “systematically harvest” Brownian motion—but showed that in fact they couldn’t. In later years, a sequence of arguments was advanced that the mechanism of a Maxwell’s demon just couldn’t work in practice—though even today microscopic versions of what amount to Maxwell’s demons are routinely being investigated.

What Happened to Those People?

We’ve finally now come to the end of the story of how the original framework for the Second Law came to be set up. And, as we’ve seen, only a fairly small number of key players were involved.

So what became of these people? Carnot lived a generation earlier than the others, never made a living as a scientist, and was all but unknown in his time. But all the others had distinguished careers as academic scientists, and were widely known in their time. Clausius, Boltzmann and Gibbs are today celebrated mainly for their contributions to thermodynamics; Kelvin and Maxwell also for other things. Clausius and Gibbs were in a sense “pure professors”; Boltzmann, Maxwell and especially Kelvin also had engagement with the more general public.

All of them spent the majority of their lives in the countries of their birth—and all (with the exception of Carnot) were able to live out the entirety of their lives without time-consuming disruptions from war or other upheavals:

Sadi Carnot (1796–1832)

Almost all of what is known about Sadi Carnot as a person comes from a single biographical note written nearly half a century after his death by his younger brother Hippolyte Carnot (who was a distinguished French politician—and sometime education minister—and father of the Sadi Carnot who would become president of France). Hippolyte Carnot began by saying that:

As the life of Sadi Carnot was not marked by any notable event, his biography would have occupied only a few lines; but a scientific work by him, after remaining long in obscurity, brought again to light many years after his death, has caused his name to be placed among those of great inventors.

The Carnots’ father was close to Napoleon, and Hippolyte explains that when Sadi was a young child he ended up being babysat by “Madame Bonaparte”—but one day wandered off, and was found inspecting the operation of a nearby mill, and quizzing the miller about it. For the most part, however, throughout his life, Sadi Carnot apparently kept very much to himself—while with quiet intensity showing a great appetite for intellectual pursuits from mathematics and science to art, music and literature, as well as practical engineering and the science of various sports.

Even his brother Hippolyte can’t explain quite how Sadi Carnot—at the age of 28—suddenly “came out” and in 1824 published his book on thermodynamics. (As we discussed above, it no doubt had something to do with the work of his father, who died two years earlier.) Sadi Carnot funded the publication of the book himself—having 600 copies printed (at least some of which remained unsold a decade later). But after the book was published, Carnot appears to have returned to just privately doing research, living alone, and never publishing again in his lifetime. And indeed he lived only another eight years, dying (apparently after some months of ill health) in the same Paris cholera outbreak that claimed General Lamarque of Les Misérables fame.

Twenty-three pages of unpublished personal notes survive from the period after the publication of Carnot’s book. Some are general aphorisms and life principles:

Speak little of what you know, and not at all of what you do not know.

Why try to be witty? I would rather be thought stupid and modest than witty and pretentious.

God cannot punish man for not believing when he could so easily have enlightened and convinced him.

The belief in an all-powerful Being, who loves us and watches over us, gives to the mind great strength to endure misfortune.

When walking, carry a book, a notebook to preserve ideas, and a piece of bread in order to prolong the walk if need be.

But others are more technical—and in fact reveal that Carnot, despite having based his book on caloric theory, had realized that it probably wasn’t correct:

When a hypothesis no longer suffices to explain phenomena, it should be abandoned. This is the case with the hypothesis which regards caloric as matter, as a subtile fluid.

The experimental facts tending to destroy this theory are as follows: The development of heat by percussion or friction of bodies … The elevation of temperature which takes place [when] air [expands into a] vacuum …

He continues:

At present, light is generally regarded as the result of a vibratory movement of the ethereal fluid. Light produces heat, or at least accompanies radiating heat, and moves with the same velocity as heat. Radiating heat is then a vibratory movement. It would be ridiculous to suppose that it is an emission of matter while the light which accompanies it could be only a movement.

Could a motion (that of radiating heat) produce matter (caloric)? No, undoubtedly; it can only produce a motion. Heat is then the result of a motion.

And then—in a rather clear enunciation of what would become the First Law of thermodynamics:

Heat is simply motive power, or rather motion which has changed form. It is a movement among the particles of bodies. Wherever there is destruction of motive power there is, at the same time, production of heat in quantity exactly proportional to the quantity of motive power destroyed. Reciprocally, wherever there is destruction of heat, there is production of motive power.

Carnot also wonders:

Liquefaction of bodies, solidification of liquids, crystallization—are they not forms of combinations of integrant molecules? Supposing heat due to a vibratory movement, how can the passage from the solid or the liquid to the gaseous state be explained?

There is no indication of how Carnot felt about this emerging rethinking of thermodynamics, or of how it might affect the results in his book. Carnot clearly hoped to do experiments (as outlined in his notes) to test what was really going on. But as it was, he presumably didn’t get around to any of them—and his notes, ahead of their time as they were, did not resurface for many decades, by which time the ideas they contained had already been discovered by others.

Rudolf Clausius (1822–1888)

Rudolf Clausius was born in what’s now Poland (and was then Prussia), one of more than 14 children of an education administrator and pastor. He went to university in Berlin, and, after considering doing history, eventually specialized in math and physics. After graduating in 1844 he started teaching at a top high school in Berlin (which he did for 6 years), and meanwhile earned his PhD in physics. His career took off after his breakout paper on thermodynamics appeared in 1850. For a while he was a professor in Berlin, then for 12 years in Zürich, then briefly in Würzburg, then—for the remaining 19 years of his life—in Bonn.

He was a diligent—if, one suspects, somewhat stiff—professor, notable for the clarity of his lectures, and his organizational care with students. He seems to have been a competent administrator, and late in his career he spent a couple of years as the president (“rector”) of his university. But first and foremost, he was a researcher, writing about a hundred papers over the course of his career. Most physicists of the time devoted at least some of their efforts to doing actual physics experiments. But Clausius was a pioneer in the idea of being a “pure theoretical physicist”, inspired by experiments and quoting their results, but not doing them himself.

The majority of Clausius’s papers were about thermodynamics, though late in his career his emphasis shifted more to electrodynamics. Clausius’s papers were original, clear, incisive and often fairly mathematically sophisticated. But from his very first paper on thermodynamics in 1850, he very much adopted a macroscopic approach, talking about what he considered to be “bulk” quantities like energy, and later entropy. He did explore some of the potential mechanics of molecules, but he never really made the connection between molecular phenomena and entropy—or the Second Law. He had a number of run-ins about academic credit with Kelvin, Tait, Maxwell and Boltzmann, but he didn’t seem to ever pay much attention to, for example, Boltzmann’s efforts to find molecular-based probabilistic derivations of Clausius’s results.

It probably didn’t help that after two decades of highly productive work, two misfortunes befell Clausius. First, in 1870, he had volunteered to lead an ambulance corps in the Franco-Prussian war, and was wounded in the knee, leading to chronic pain (as well as to his habit of riding to class on horseback). And then, in 1875, Clausius’s wife died in the birth of their sixth child—leaving him to care for six young children (which apparently he did with great conscientiousness). Clausius nevertheless continued to pursue his research—even to the end of his life—receiving many honors along the way (like election to no less than 40 professional societies), but it never again rose to the level of significance of his early work on thermodynamics and the Second Law.

Kelvin (William Thomson) (1824–1907)

Of the people we’re discussing here, by far the most famous during their lifetime was Kelvin. In his long career he wrote more than 600 scientific papers, received dozens of patents, started several companies and served in many administrative and governmental roles. His father was a math professor, ultimately in Glasgow, who took a great interest in the education of his children. Kelvin himself got an early start, effectively going to college at the age of 10, and becoming a professor in Glasgow at the age of 22—a position in which he continued for 53 years.

Kelvin’s breakout work, done in his twenties, was on thermodynamics. But over the years he also worked on many other areas of physics, and beyond, mixing theory, experiment and engineering. Beginning in 1854 he became involved in a technical megaproject of the time: the attempt to lay a transatlantic telegraph cable. He wound up very much on the front lines, helping out as a just-in-time physicist + engineer on the cable-laying ship. The first few attempts didn’t work out, but finally in 1866—in no small part through Kelvin’s contributions—a cable was successfully laid, and Kelvin (or William Thomson, as he then was) became something of a celebrity. He was made “Sir William Thomson” and—along with two other techies—formed his first company, which had considerable success in exploiting telegraph-cable-related engineering innovations.

Kelvin’s first wife died after a long illness in 1870, and Kelvin, with no children and already enthusiastic about the sea, bought a fairly large yacht, and pursued a number of nautical-related projects. One of these—begun in 1872—was the construction of an analog computer for calculating tides (basically with 10 gears for adding up 10 harmonic tide components), a device that, with progressive refinements, continued to be used for close to a century.

Being rather charmed by Kelvin’s physicist-with-a-big-yacht persona, I once purchased a letter that Kelvin wrote in 1877 on the letterhead of “Yacht Lalla Rookh”:

Click to enlarge

The letter—in true academic style—promises that Kelvin will soon send an article he’s been asked to write on elasticity theory. And in fact he did write the article, and it was an expository one that appeared in the 9th edition of the Encyclopedia Britannica.

Kelvin was a prolific (if, to modern ears, sometimes rather pompous) writer, who took exposition seriously. And indeed—finding the textbooks available to him as a professor inadequate—he worked over the course of a dozen years (1855–1867) with his (and Maxwell’s) friend Peter Guthrie Tait to produce the influential Treatise on Natural Philosophy.

Kelvin explored many topics and theories, some more immediately successful than others. In the 1870s he suggested that perhaps atoms might be knotted vortices in the (luminiferous) aether (causing Tait to begin developing knot theory)—a hypothesis that’s in some sense a Victorian prelude to modern ideas about particles in our Physics Project.

Throughout his life, Kelvin was a devout Christian, writing that “The more thoroughly I conduct scientific research, the more I believe science excludes atheism.” And indeed this belief seems to make an appearance in his implication that humans—presumably as a result of their special relationship with God—might avoid the Second Law. But more significant at the time was Kelvin’s skepticism about Charles Darwin’s 1859 theory of natural selection, believing that there must in the end be a “continually guiding and controlling intelligence”. Despite being somewhat ridiculed for it, Kelvin talked about the possibility that life might have come to Earth from elsewhere via meteorites, believing that his estimates of the age of the Earth (which didn’t take into account radioactivity) made it too young for the things Darwin described to have occurred.

By the 1870s, Kelvin had become a distinguished man of science, receiving all sorts of honors, assignments and invitations. And in 1876, for example, he was invited to Philadelphia to chair the committee judging electrical inventions at the US Centennial International Exhibition, notably reporting, in the terms of the time:

Click to enlarge

Then in 1892 a “peerage of the realm” was conferred on him by Queen Victoria. His wife (he had remarried) and various friends (including Charles Darwin’s son George) suggested he pick the title “Kelvin”, after the River Kelvin that flowed by the university in Glasgow. And by the end of his life “Lord Kelvin” had accumulated enough honorifics that they were just summarized with “…” (the MD was an honorary degree conferred by the University of Heidelberg because “it was the only one at their disposal which he did not already possess”):

Click to enlarge

And when Kelvin died in 1907 he was given a state funeral and buried in Westminster Abbey near Newton and Darwin.

James Clerk Maxwell (1831–1879)

James Clerk Maxwell lived only 48 years but in that time managed to do a remarkable amount of important science. His early years were spent on a 1500-acre family estate (inherited by his father) in a fairly remote part of Scotland—to which he would return later. He was an only child and was homeschooled—initially by his mother, until she died, when he was 8. At 10 he went to an upscale school in Edinburgh, and by the age of 14 had written his first scientific paper. At 16 he went as an undergraduate to the University of Edinburgh, then, effectively as a graduate student, to Cambridge—coming second in the final exams (“Second Wrangler”) to a certain Edward Routh, who would spend most of his life coaching other students on those very same exams.

Within a couple of years, Maxwell was a professor, first in Aberdeen, then in London. In Aberdeen he married the daughter of the university president, who would soon be his “Observer K” (for “Katherine”) in his classic work on color vision. But after nine fairly strenuous years as a professor, Maxwell in 1865 “retired” to his family estate, supervising a house renovation, and in “rural solitude” (recreationally riding around his estate on horseback with his wife) having the most scientifically productive time of his life. In addition to his work on things like the kinetic theory of gases, he also wrote his 2-volume Treatise on Electricity and Magnetism, which ultimately took 7 years to finish, and which, with considerable clarity, described his approach to electromagnetism and what are now called “Maxwell’s Equations”. Occasionally, there were hints of his “country life”—like his 1870 “On Hills and Dales” that in his characteristic mathematicize-everything way gave a kind of “pre-topological” analysis of contour maps (perhaps conceived as he walked half a mile every day down to the mailbox at which journals and correspondence would arrive):

Click to enlarge

As a person, Maxwell was calm, reserved and unassuming, yet cheerful and charming—and given to writing (arguably sometimes sophomoric) poetry:

Click to enlarge

With a certain sense of the absurd, he would occasionally publish satirical pieces in Nature, signing them dp/dt, which in the thermodynamic notation created by his friend Tait was equal to JCM, which were his initials. Maxwell liked games and tricks, and spinning tops featured prominently in some of his work. He enjoyed children, though never had any of his own. As a lecturer, he prepared diligently, but often got too sophisticated for his audience. In writing, though, he showed both great clarity and great erudition, for example freely quoting Latin and Greek in articles he wrote for the 9th edition of the Encyclopedia Britannica (of which he was scientific co-editor) on topics such as “Atom” and “Ether”.

As we mentioned above, Maxwell was quite an enthusiast of diagrams and visual presentation (even writing an article on “Diagrams” for the Encyclopedia Britannica). He was also a capable experimentalist, making many measurements (sometimes along with his wife), and in 1861 creating the first color photograph.

In 1871 William Cavendish, 7th Duke of Devonshire, who had studied math in Cambridge, and was now chancellor of the university, agreed to put up the money to build what became the Cavendish Laboratory and to endow a new chair of experimental physics. Kelvin having turned down the job, it was offered to the still-rather-obscure Maxwell, who somewhat reluctantly accepted—with the result that for several years he spent much of his time supervising the design and building of the lab.

The lab was finished in 1874, but then William Cavendish dropped on Maxwell a large collection of papers from his great uncle Henry Cavendish, who had been a wealthy “gentleman scientist” of the late 1700s and (among other things) the discoverer of hydrogen. Maxwell liked history (as some of us do!), noticed that Cavendish had discovered Ohm’s law 50 years before Ohm, and in the end spent several years painstakingly editing and annotating the papers into a 500-page book. By 1879 Maxwell was finally ready to energetically concentrate on physics research again, but, sadly, in the fall of that year his health failed, and he died at the age of 48—having succumbed to stomach cancer, as his mother also had at almost the same age.

J. Willard Gibbs (1839–1903)

Gibbs was born near the Yale campus, and died there 64 years later, in the same house where he had lived since he was 7 years old (save for three years spent visiting European universities as a young man, and regular summer “out-in-nature” vacations). His father (who, like “our Gibbs”, was named “Josiah Willard”—which is why “our Gibbs” was called “Willard”) came from an old and distinguished intellectual and religious New England family, and was a professor of sacred languages at Yale. Willard Gibbs went to college and graduate school at Yale, and then spent his whole career as a professor at Yale.

He was, it seems, a quiet, modest and rather distant person, who radiated a certain serenity, regularly attended church, had a small circle of friends and lived with his two sisters (and the husband and children of one of them). He diligently discharged his teaching responsibilities, though his lectures were very sparsely attended, and he seems not to have been thought forceful enough in dealing with people to have been called on for many administrative tasks—though he became the treasurer of his former high school, and himself was careful enough with money that by the end of his life he had accumulated what would now be several million dollars.

He had begun his academic career in practical engineering, for example patenting an “improved [railway] car-brake”, but was soon drawn in more mathematical directions, favoring a certain clarity and minimalism of formulation, and a cleanliness, if not brevity, of exposition. His work on thermodynamics (initially published in the rather obscure Transactions of the Connecticut Academy) was divided into two parts: the first, in the 1870s, concentrating on macroscopic equilibrium properties, and the second, in the 1890s, concentrating on microscopic “statistical mechanics” (as Gibbs called it). Even before he started on thermodynamics, he’d been interested in electromagnetism, and between his two “thermodynamic periods”, he again worked on electromagnetism. He studied Maxwell’s work, and was at first drawn to the then-popular formalism of quaternions—but soon decided to invent his own approach and notation for vector analysis, which at first he presented only in notes for his students, though it later became widely adopted.

And while Gibbs did increasingly mathematical work, he never seems to have identified as a mathematician, modestly stating that “If I have had any success in mathematical physics, it is, I think, because I have been able to dodge mathematical difficulties.” His last work was his book on statistical mechanics, which—with considerable effort and perhaps damage to his health—he finished in time for publication in connection with the Yale bicentennial in 1901 (an event which notably also brought a visit from Kelvin), only to die soon thereafter.

Gibbs had a few graduate students at Yale, a notable one being Lee de Forest, inventor of the vacuum tube (triode) electronic amplifier, and radio entrepreneur. (de Forest’s 1899 PhD thesis was entitled “Reflection of Hertzian Waves from the Ends of Parallel Wires”.) Another student of Gibbs was Lynde Wheeler, who became a government radio scientist, and who wrote a biography of Gibbs, of which I have a copy bought years ago at a used bookstore—which I was now just about to put back on a shelf when I opened its front cover and found an inscription:

Click to enlarge

And, yes, it’s a small world, and “To Willard” refers to Gibbs’s sister’s son (Willard Gibbs Van Name, who became a naturalist and wrote a 1929 book about national park deforestation).

Ludwig Boltzmann (1844–1906)

Of the people we’re discussing, Boltzmann is the one whose career was most focused on the Second Law. Boltzmann grew up in Austria, where his father was a civil servant (who died when Boltzmann was 15) and his mother was something of an heiress. Boltzmann did his PhD at the University of Vienna, where his professor notably gave him a copy of some of Maxwell’s papers, together with an English grammar book. Boltzmann started publishing his own papers near the end of his PhD, and soon landed a position as a professor of mathematical physics in Graz. Four years later he moved to Vienna as a professor of mathematics, soon moving back to Graz as a professor of “general and experimental physics”—a position he would keep for 14 years.

He’d married in 1876, and had 5 children, though a son died in 1889, leaving 3 daughters and another son. Boltzmann was apparently a clear and lively lecturer, as well as a spirited and eager debater. He seems, at least in his younger years, to have been a happy and gregarious person, with a strong taste for music—and some charming do-it-your-own-way tendencies. For example, wanting to provide fresh milk for his children, he decided to just buy a cow, which he then led from the market through the streets—though had to consult his colleague, the professor of zoology, to find out how to milk it. Boltzmann was a capable experimental physicist, as well as a creator of gadgets, and a technology enthusiast—promoting the idea of airplanes (an application for gas theory!) and noting their potential power as a means of transportation.

Boltzmann had always had mood swings, but by the early 1890s he claimed they were getting worse. It didn’t help that he was worn down by administrative work, and had worsening asthma and increasing nearsightedness (that he’d thought might be a sign of going blind). He moved positions, but then came back to Vienna, where he embarked on writing what would become a 2-volume book on Gas Theory—in effect contextualizing his life’s work. The introduction to the first volume laments that “gas theory has gone out of fashion in Germany”. The introduction to the second volume, written in 1898 when Boltzmann was 54, then says that “attacks on the theory of gases have begun to increase”, and continues:

… it would be a great tragedy for science if the theory of gases were temporarily thrown into oblivion because of a momentary hostile attitude toward it, as, for example, was the wave theory [of light] because of Newton’s authority.

I am conscious of being only an individual struggling weakly against the stream of time. But it still remains in my power to contribute in such a way that, when the theory of gases is again revived, not too much will have to be rediscovered.

But even as he was writing this, Boltzmann had pretty much already wound down his physics research, and had basically switched to exposition, and to philosophy. He moved jobs again, but in 1902 came back to Vienna once more, now also as a professor of philosophy. He gave an inaugural lecture, first quoting his predecessor Ernst Mach (1838–1916) as saying “I do not believe that atoms exist”, then discussing the philosophical relations between reality, perception and models. Elsewhere he discussed things like his view of the different philosophical character of models associated with differential equations and with atomism—and he even wrote an article on the general topic of “Models” for Encyclopedia Britannica (which curiously also talks about “in pure mathematics, especially geometry, models constructed of papier-mâché and plaster”). Sometimes Boltzmann’s philosophy could be quite polemical, like his attack on Schopenhauer, which ends by saying that “men [should] be freed from the spiritual migraine that is called metaphysics”.

Then, in 1904, Boltzmann addressed the Vienna Philosophical Society (a kind of predecessor of the Vienna Circle) on the subject of a “Reply to a Lecture on Happiness by Professor Ostwald”. Wilhelm Ostwald (1853–1932) (a chemist and social reformer who was a personal friend of Boltzmann’s, but an intellectual adversary) had proposed the concept of “energy of will” to apply mathematical physics ideas to psychology. Boltzmann mocked this, describing its faux formalism as “dangerous for science”. Meanwhile, Boltzmann gave his own Darwinian theory for the origin of happiness, based essentially on the idea that unhappiness is needed as a way to make organisms improve their circumstances in the struggle for survival.

Boltzmann himself was continuing to have problems that he attributed to the then-popular but very vague “diagnosis” of “neurasthenia”, and had even briefly been in a psychiatric hospital. But he continued to do things like travel. He visited the US three times, in 1905 going to California (mainly Berkeley)—which led him to write a witty piece entitled “A German Professor’s Trip to El Dorado” that concluded:

Yes, America will achieve great things. I believe in these people, even after seeing them at work in a setting where they’re not at their best: integrating and differentiating at a theoretical physics seminar…

In 1905 Einstein published his Boltzmann-and-atomism-based results on Brownian motion and on photons. But it’s not clear Boltzmann ever knew about them. For Boltzmann was sinking further. Perhaps he’d overexerted himself in California, but by the spring of 1906 he said he was no longer able to teach. In the summer he went with his family to an Italian seaside resort in an attempt to rejuvenate. But a day before they were to return to Vienna he failed to join his family for a swim, and his youngest daughter found him hanged in his hotel room, dead at the age of 62.

Coarse-Graining and the “Modern Formulation”

After Gibbs’s 1902 book introducing the idea of ensembles, most of the language used (at least until now!) to discuss the Second Law was basically in place. But in 1912 one additional term—representing a concept already implicit in Gibbs’s work—was added: coarse-graining. Gibbs had discussed how the phase fluid representing possible states of a system could be elaborately mixed by the mechanical time evolution of the system. But realistic practical measurements could not be expected to probe all the details of the distribution of phase fluid; instead one could say that they would only sample “coarse-grained” aspects of it.
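As a minimal modern sketch of the basic effect (not anything from the historical papers—the function names here are just for illustration), one can see with a discrete probability distribution that averaging the probabilities within each coarse-grained cell never decreases the entropy, even though the fine-grained evolution itself conserves it:

entropy[p_] := -Total[p Log[p]]   (* assumes all entries of p are positive *)
coarseGrain[p_, b_] := Flatten[ConstantArray[Mean[#], b] & /@ Partition[p, b]]

p = Normalize[RandomReal[{0.01, 1}, 64], Total];   (* a "fine-grained" distribution over 64 cells *)
{entropy[p], entropy[coarseGrain[p, 8]]}           (* the coarse-grained entropy is always at least as large *)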

The term “coarse-graining” first appeared in a survey article entitled “The Conceptual Foundations of the Statistical Approach in Mechanics”, written for the German-language Encyclopaedia of the Mathematical Sciences by Boltzmann’s former student Paul Ehrenfest, and his wife Tatiana Ehrenfest-Afanassjewa:

Click to enlarge

The article also introduced all sorts of now-standard notation, and in many ways can be read as a final summary of what was achieved in the original development around the foundations of thermodynamics and the Second Law. (And indeed the article was sufficiently “final” that when it was republished as a book in 1959 it could still be presented as usefully summarizing the state of things.)

Looking at the article now, though, it’s notable how much it recognized was not at all settled about the Second Law and its foundations. It places Boltzmann squarely at the center, stating in its preface:

Click to enlarge

The section titles are already revealing:

Click to enlarge

And soon they’re starting to talk about “loose ends”, and lots of them. Ergodicity is something one can talk about, but there’s no known example (and with this definition it was later proved that there couldn’t be):

Click to enlarge

But, they point out, it’s something Boltzmann needed in order to justify his results:

Click to enlarge

Soon they’re talking about Boltzmann’s sloppiness in his discussion of the H curve:

Click to enlarge

And then they’re on to talking about Gibbs, and the gaps in his reasoning:

Click to enlarge

In the end they conclude:

Click to enlarge

In other words, even though people now seem to be buying all these results, there are still plenty of issues with their foundations. And despite people’s implicit assumptions, we can in no way say that the Second Law has been “proved”.

Radiant Heat, the Second Law and Quantum Mechanics

It was already realized in the 1600s that when objects get hot they emit “heat radiation”—which can be transferred to other bodies as “radiant heat”. And particularly following Maxwell’s work in the 1860s on electrodynamics it came to be accepted that radiant heat was associated with electromagnetic waves propagating in the “luminiferous aether”. But unlike the molecules of which matter was increasingly assumed to be made, these electromagnetic waves were always treated—particularly on the basis of their mathematical foundations in calculus—as fundamentally continuous.

But how might this relate to the Second Law? Could it be, perhaps, that the Second Law should ultimately be attributed not to some property of the large-scale mechanics of discrete molecules, but rather to a feature of continuous radiant heat?

The basic equations assumed for mechanics—originally due to Newton—are reversible. But what about the equations for electrodynamics? Maxwell’s equations are in and of themselves also reversible. But when one thinks about their solutions for actual electromagnetic radiation, there can be fundamental irreversibility. And the reason is that it’s natural to describe the emission of radiation (say from a hot body), but then to assume that, once emitted, the radiation just “escapes to infinity”—rather than ever reversing the process of emission by being absorbed by some other body.

All the various people we’ve discussed above, from Clausius to Gibbs, made occasional remarks about the possibility that the Second Law—whether or not it could be “derived mechanically”—would still ultimately work, if nothing else, because of the irreversible emission of radiant heat.

But the person who would ultimately be most intimately connected to these issues was Max Planck (1858–1947)—though in the end the somewhat-confused connection to the Second Law would recede in importance relative to what emerged from it, which was basically the raw material that led to quantum theory.

As a student of Helmholtz’s in Berlin, Max Planck got interested in thermodynamics, and in 1879 wrote a 61-page PhD thesis entitled “On the Second Law of Mechanical Heat Theory”. It was a traditional (if slightly streamlined) discussion of the Second Law, very much based on Clausius’s approach (and even with the same title as Clausius’s 1867 paper)—and without any mention whatsoever of Boltzmann:

Click to enlarge

For most of the two decades that followed, Planck continued to use similar methods to study the Second Law in various settings (e.g. elastic materials, chemical mixtures, etc.)—and meanwhile ascended the German academic physics hierarchy, ending up as a professor of theoretical physics in Berlin. Planck was in many ways a physics traditionalist, not wanting to commit to things like “newfangled” molecular ideas—and as late as 1897 (with his assistant Zermelo having made his “recurrence objection” to Boltzmann’s work) still saying that he would “abstain completely from any definite assumption about the nature of heat”. But regardless of its foundations, Planck was a true believer in the Second Law, for example in 1891 asserting that it “must extend to all forces of nature … not only thermal and chemical, but also electrical and other”.

And in 1895 he began to investigate how the Second Law applied to electrodynamics—and in particular to the “heat radiation” that it had become clear (particularly through Heinrich Hertz’s (1857–1894) experiments) was of electromagnetic origin. In 1896 Wilhelm Wien (1864–1928) suggested that the heat radiation (or what we now call blackbody radiation) was in effect produced by tiny Hertzian oscillators with velocities following a Maxwell distribution.

Planck, however, had a different viewpoint, instead introducing the concept of “natural radiation”—a kind of intrinsic thermal equilibrium state for radiation, with an associated intrinsic entropy. He imagined “resonators” interacting through Maxwell’s equations with this radiation, and in 1899 invented a (rather arbitrary) formula for the entropy of these resonators, that implied (through the laws of electrodynamics) that overall entropy would increase—just like the Second Law said—and when the entropy was maximized it gave the same result as Wien for the spectrum of blackbody radiation. In early 1900 he sharpened his treatment and began to suggest that with his approach Wien’s form of the blackbody spectrum would emerge as a provable consequence of the universal validity of the Second Law.

But right around that time experimental results arrived that disagreed with Wien’s law. And by the end of 1900 Planck had a new hypothesis, for which he finally began to rely on ideas from Boltzmann. Planck started from the idea that he should treat the behavior of his resonators statistically. But how then could he compute their entropy? He quotes (for the first time ever) his simplification of Boltzmann’s formula for entropy:

Click to enlarge
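In the notation that has since become standard, the formula is simply

\[ S = k \log W \]

with k what we now call Boltzmann’s constant—though it was Planck who first wrote the relation in this explicit form.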

As he explains it—claiming now, after years of criticizing Boltzmann, that this is a “theorem”:

We now set the entropy S of the system proportional to the logarithm of its probability W… In my opinion this actually serves as a definition of the probability W, since in the basic assumptions of electromagnetic theory there is no definite evidence for such a probability. The suitability of this expression is evident from the outset, in view of its simplicity and close connection with a theorem from kinetic gas theory.

But how could he figure out the probability for a resonator to have a certain energy, and thus a certain entropy? For this he turns directly to Boltzmann—who, as a matter of convenience, had in his 1877 paper introduced discrete values of energy for molecules. Planck simply states that it’s “necessary” (i.e. to get the experimentally right answer) to treat the resonator energy “not as a continuous, infinitely divisible quantity, but as a discrete quantity composed of an integral number of finite equal parts”. As an example of how this works he gives a table just like the one in Boltzmann’s paper from nearly a quarter of a century earlier:

Click to enlarge
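To get a feel for the kind of counting involved, here is a minimal Wolfram Language sketch (with made-up small numbers and function names of my own choosing, not anything from Planck’s paper) that enumerates the ways a given number of indivisible energy units can be shared among a set of resonators, and checks the count against the standard binomial-coefficient formula for such distributions:

  (* all ways p indistinguishable energy units can be distributed among n resonators *)
  energyDistributions[n_, p_] := Select[Tuples[Range[0, p], n], Total[#] == p &]

  (* closed-form count of such distributions ("complexions") *)
  complexionCount[n_, p_] := Binomial[n + p - 1, p]

  energyDistributions[3, 4]            (* the explicit ways 4 units can be split among 3 resonators *)
  Length[energyDistributions[3, 4]]    (* 15 *)
  complexionCount[3, 4]                (* also 15 *)

  (* a Boltzmann-style entropy is then taken proportional to the log of this count *)
  Log[complexionCount[3, 4]]           (* Log[15] *)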

Pretty soon he’s deriving the entropy of a resonator as a function of its energy, and its discrete energy unit ϵ:

Click to enlarge

Connecting this to blackbody radiation he claims that each resonator’s energy unit is connected to its frequency according to

Click to enlarge

so that its entropy is

Click to enlarge

“[where] h and k are universal constants”.

In a similar situation Boltzmann had effectively taken the limit ϵ→0, because that’s what he believed corresponded to (“calculus-based”) physical reality. But Planck—in what he later described as an “act of desperation” to fit the experimental data—didn’t do that. So in computing things like average energies he’s evaluating Sum[x Exp[-a x], {x, 0, Infinity}] rather than Integrate[x Exp[-a x], {x, 0, Infinity}]. And in doing this it takes him only a few lines to derive what’s now called the Planck spectrum for blackbody radiation (i.e. for “radiation in equilibrium”):

Click to enlarge

And then by fitting this result to the data of the time he gets “Planck’s constant” (the modern value is 6.626×10⁻³⁴ J·s):

Click to enlarge
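Just to see concretely what that sum-versus-integral difference does, here is a minimal Wolfram Language sketch (the symbol names en, eps, k, T are placeholders of mine, and convergence conditions are glossed over) computing the average energy of a single oscillator both ways:

  (* continuous energies (the eps -> 0 limit Boltzmann would have taken): equipartition *)
  meanContinuous =
    Integrate[en Exp[-en/(k T)], {en, 0, Infinity}, Assumptions -> k T > 0]/
     Integrate[Exp[-en/(k T)], {en, 0, Infinity}, Assumptions -> k T > 0]
  (* k T *)

  (* discrete energies n eps, as Planck assumed *)
  meanDiscrete =
    Simplify[
     Sum[n eps Exp[-n eps/(k T)], {n, 0, Infinity}]/
      Sum[Exp[-n eps/(k T)], {n, 0, Infinity}], eps > 0 && k T > 0]
  (* eps/(E^(eps/(k T)) - 1) *)

  Limit[meanDiscrete, eps -> 0]   (* k T again: Boltzmann's limit *)

The discrete result, with eps identified as hν, is what feeds into the Planck spectrum above; the continuous result, kT, is what Boltzmann’s eps→0 limit would have given.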

And, yes, this was essentially the birth of quantum mechanics—in effect a spinoff from an attempt to extend the domain of the Second Law. Planck himself didn’t seem to internalize what he’d done for at least another decade. And it was really Albert Einstein’s 1905 analysis of the photoelectric effect that made the concept of the quantization of energy that Planck had assumed (more as a calculational hypothesis than anything else) seem to be something of real physical significance—something that would lead to the whole development of quantum mechanics, notably in the 1920s.

Are Molecules Real? Continuous Versus Discrete

As we discussed at the very beginning above, already in antiquity there was a notion that at least things like solids and liquids might not ultimately be continuous (as they seemed), but might instead be made of large numbers of discrete “atomic” elements. By the 1600s there was also the idea that light might be “corpuscular”—and, as we discussed above, gases too. But meanwhile, there were opposing theories that espoused continuity—like the caloric theory of heat. And particularly with the success of calculus, there was a strong tendency to develop theories based on continuity—to which calculus could readily be applied.

But in the early 1800s—notably with the work of John Dalton (1766–1844)—there began to be evidence that there were discrete entities participating in chemical reactions. Meanwhile, as we discussed above, the success of the kinetic theory of gases gave increasing evidence for some kind of—at least effectively—discrete elements in gases. But even people like Boltzmann and Maxwell were reluctant to assert that gases really were made of molecules. And there were plenty of well-known scientists (like Ernst Mach) who “opposed atomism”, often effectively on the grounds that in science one should only talk about things one can actually see or experience—not things like atoms that were too small for that.

But there was something else too: with Newton’s theory of gravitation as a precursor, and then with the investigation of electromagnetic phenomena, there emerged in the 1800s the idea of a “continuous field”. The interpretation of this was fairly clear for something like an elastic solid or a fluid that exhibited continuous deformations.

Mathematically, things like gravity, magnetism—and heat—seemed to work in similar ways. And it was assumed that this meant that in all cases there had to be some fluid-like “carrier” for the field. And this is what led to ideas like the luminiferous aether as the “carrier” of electromagnetic waves. And, by the way, the idea of an aether wasn’t even obviously incompatible with the idea of atoms; Kelvin, for example, had a theory that atoms were vortices (perhaps knotted) in the aether.

But how does this all relate to the Second Law? Well, particularly through the work of Boltzmann there came to be the impression that given atomism, probability theory could essentially “prove” the Second Law. A few people tried to clarify the formal details (as we discussed above), but it seemed like any final conclusion would have to await the validation (or not) of atomism, which in the late 1800s was still a thoroughly controversial theory.

By the first decade of the 1900s, however, the fortunes of atomism began to change. In 1897 J. J. Thomson (1856–1940) discovered the electron, showing that electricity was fundamentally “corpuscular”. And in 1900 Planck had (at least calculationally) introduced discrete quanta of energy. But it was the three classic papers of Albert Einstein in 1905 that—in their different ways—began to secure the ultimate success of atomism.

First there was his paper “On a Heuristic View about the Production and Transformation of Light”, which began:

Maxwell’s theory of electromagnetic [radiation] differs in a profound, essential way from the current theoretical models of gases and other matter. We consider the state of a material body to be completely determined by the positions and velocities of a finite number of atoms and electrons, albeit a very large number. But the electromagnetic state of a region of space is described by continuous functions …

He then points out that optical experiments look only at time-averaged electromagnetic fields, and continues:

In particular, blackbody radiation, photoluminescence, [the photoelectric effect] and other phenomena associated with the generation and transformation of light seem better modeled by assuming that the energy of light is distributed discontinuously in space. According to this picture, the energy of a light wave emitted from a point source is not spread continuously over ever larger volumes, but consists of a finite number of energy quanta that are spatially localized at points of space, move without dividing and are absorbed or generated only as a whole.

In other words, he’s suggesting that light is “corpuscular”, and that energy is quantized. When he begins to get into details, he’s soon talking about the “entropy of radiation”—and, then, in three core sections of his paper, he’s basing what he’s doing on “Boltzmann’s principle”:

Click to enlarge

Two months later, Einstein produced another paper: “Investigations on the Theory of Brownian Motion”. Back in 1827 the British botanist Robert Brown (1773–1858) had seen under a microscope tiny grains (ejected by pollen) randomly jiggling around in water. Einstein began his paper:

In this paper it will be shown that according to the molecular-kinetic theory of heat, bodies of microscopically visible size suspended in a liquid will perform movements of such magnitude that they can be easily observed in a microscope, on account of the molecular motions of heat.

He doesn’t explicitly mention Boltzmann in this paper, but there’s Boltzmann’s formula again:

Click to enlarge

And by the next year it’s become clear experimentally that, yes, the jiggling Robert Brown had seen was in fact the result of impacts from discrete, real water molecules.

Einstein’s third 1905 paper, “On the Electrodynamics of Moving Bodies”—in which he introduced relativity theory—wasn’t so obviously related to atomism. But in showing that the luminiferous aether would (as Einstein put it) “prove superfluous” he was removing what was (almost!) the last remaining example of something continuous in physics.

In the years after 1905, the evidence for atomism mounted rapidly, segueing in the 1920s into the development of quantum mechanics. But what happened with the Second Law? By the time atomism was generally accepted, the generation of physicists that had included Boltzmann and Gibbs was gone. And while the Second Law was routinely invoked in expositions of thermodynamics, questions about its foundations were largely forgotten. Except perhaps for one thing: people remembered that “proofs” of the Second Law had been controversial, and had depended on the controversial hypothesis of atomism. But—they appear to have reasoned—now that atomism isn’t controversial anymore, it follows that the Second Law is indeed “satisfactorily proved”. And, after all, there were all sorts of other things to investigate in physics.

There are a couple of “footnotes” to this story. The first has to do with Einstein. Right before Einstein’s remarkable series of papers in 1905, what was he working on? The answer is: the Second Law! In 1902 he wrote a paper entitled “Kinetic Theory of Thermal Equilibrium and of the Second Law of Thermodynamics”. Then in 1903: “A Theory of the Foundations of Thermodynamics”. And in 1904: “On the General Molecular Theory of Heat”. The latter paper claims:

I derive an expression for the entropy of a system, which is completely analogous to the one found by Boltzmann for ideal gases and assumed by Planck in his theory of radiation. Then I give a simple derivation of the Second Law.

But what’s actually there is not quite what’s advertised:

Click to enlarge

It’s a short argument—about interactions between a collection of heat reservoirs. But in a sense it already assumes its answer, and certainly doesn’t provide any kind of fundamental “derivation of the Second Law”. And this was the last time Einstein ever explicitly wrote about deriving the Second Law. Yes, in those days it was just too hard, even for Einstein.

There’s another footnote to this story too. As we said, at the beginning of the twentieth century it had become clear that lots of things that had been thought to be continuous were in fact discrete. But there was an important exception: space. Ever since Euclid (~300 BC), space had almost universally been implicitly assumed to be continuous. And, yes, when quantum mechanics was being built, people did wonder about whether space might be discrete too (and even in 1917 Einstein expressed the opinion that eventually it would turn out to be). But over time the idea of continuous space (and time) got so entrenched in the fabric of physics that when I started seriously developing the ideas that became our Physics Project based on space as a discrete network (or what, in homage to the dynamical theory of heat, one might call the “dynamical theory of space”) it seemed to many people quite shocking. And looking back at the controversies of the late 1800s around atomism and its application to the Second Law it’s charming how familiar many of the arguments against atomism seem. Of course it turns out they were wrong—as they seem again to be in the case of space.

The Twentieth Century

The foundations of thermodynamics were a hot topic in physics in the latter half of the nineteenth century—worked on by many of the most prominent physicists of the time. But by the early twentieth century it’d been firmly eclipsed by other areas of physics. And going forward it’d receive precious little attention—with most physicists just assuming it’d “somehow been solved”, or at least “didn’t need to be worried about”.

As a practical matter, thermodynamics in its basic equilibrium form nevertheless became very widely used in engineering and in chemistry. And in physics, there was steadily increasing interest in doing statistical mechanics—typically enumerating states of systems (quantum or otherwise), weighted as they would be in idealized thermal equilibrium. In mathematics, the field of ergodic theory developed, though for the most part it concerned itself with systems (such as ordinary differential equations) involving few variables—making it relevant to the Second Law essentially only by analogy.

There were a few attempts to “axiomatize” the Second Law, but mostly only at a macroscopic level, not asking about its microscopic origins. And there were also attempts to generalize the Second Law to make robust statements not just about equilibrium and the fact that it would be reached, but also about what would happen in systems driven in some manner away from equilibrium. The fluctuation-dissipation theorem about small perturbations from equilibrium—established in the mid-1900s, though anticipated in Einstein’s work on Brownian motion—was one example of a widely applicable result. And there were also related ideas of “minimum entropy production”—as well as “maximum entropy production”. But for large deviations from equilibrium there really weren’t convincing general results, and in practice most investigations basically used phenomenological models that didn’t have obvious connections to the foundations of thermodynamics, or derivations of the Second Law.

Meanwhile, through most of the twentieth century there were progressively more elaborate mathematical analyses of Boltzmann’s equation (and the H theorem) and their relation to rigorously derivable but hard-to-manage concepts like the BBGKY hierarchy. But despite occasional claims to the contrary, such approaches ultimately never seem to have been able to make much progress on the core problem of deriving the Second Law.

And then there’s the story of entropy. And in a sense this had three separate threads. The first was the notion of entropy—essentially in the original form defined by Clausius—being used to talk quantitatively about heat in equilibrium situations, usually for either engineering or chemistry. The second—that we’ll discuss a little more below—was entropy as a qualitative characterization of randomness and degradation. And the third was entropy as a general and formal way to measure the “effective number of degrees of freedom” in a system, computed from the log of the number of its achievable states.

There are definitely correspondences between these different threads. But they’re in no sense “obviously equivalent”. And much of the mystery—and confusion—that developed around entropy in the twentieth century came from conflating them.

Another piece of the story was information theory, which arose in the 1940s. And a core question in information theory is how long an “optimally compressed” message will be. And (with various assumptions) the average such length is given by a −∑p log p form that has essentially the same structure as Boltzmann’s expression for entropy. But even though it’s “mathematically like entropy” this has nothing immediately to do with heat—or even physics; it’s just an abstract consequence of needing log Ω bits (i.e. log Ω degrees of freedom) to specify one of Ω possibilities. (Still, the coincidence of definitions led to an “entropy branding” for various essentially information-theoretic methods, with claims sometimes being made that, for example, the thing called entropy must always be maximized “because we know that from physics”.)
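As a tiny illustration of that correspondence (just a sketch with made-up probabilities; the function name is mine), one can check that the −∑p log p quantity reduces to log Ω when all Ω outcomes are equally likely:

  (* information-theoretic "entropy" of a discrete probability distribution, in bits *)
  bitEntropy[probs_List] := -Total[# Log2[#] & /@ Select[probs, # > 0 &]]

  bitEntropy[{1/2, 1/4, 1/4}]     (* 3/2 bits: the average optimally compressed length *)
  bitEntropy[Table[1/8, 8]]       (* 3 bits, i.e. Log2[8]: the equal-probability "log Ω" case *)
  bitEntropy[{1, 0, 0, 0}]        (* 0: a certain outcome carries no information *)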

There’d been an initial thought in the 1940s that there’d be an “inevitable Second Law” for systems that “did computation”. The argument was that logical gates (like And and Or) take 2 bits of input (with 4 overall states 11, 10, 01, 00) but give only 1 bit of output (1 or 0), and are therefore fundamentally irreversible. But in the 1970s it became clear that it’s perfectly possible to do computation reversibly (say with 2-input, 2-output gates)—and indeed this is what’s used in the typical formalism for quantum circuits.
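Here is a small sketch of that contrast (not a reconstruction of any particular formalism from the period; the gate functions are my own illustrative definitions): an ordinary And gate merges distinct inputs into the same output, while a 2-input, 2-output gate like the classical controlled-not permutes its input states and so can be undone:

  (* irreversible: 4 distinct input states collapse onto only 2 possible outputs *)
  andGate[{a_, b_}] := {BitAnd[a, b]}
  andGate /@ Tuples[{0, 1}, 2]
  (* {{0}, {0}, {0}, {1}} -- the inputs cannot be recovered from the output *)

  (* reversible: this 2-in, 2-out gate is a permutation of the 4 input states *)
  cnotGate[{a_, b_}] := {a, BitXor[a, b]}
  cnotGate /@ Tuples[{0, 1}, 2]
  (* {{0, 0}, {0, 1}, {1, 1}, {1, 0}} -- every output occurs exactly once *)

  cnotGate[cnotGate[{1, 0}]]   (* {1, 0}: the gate is its own inverse *)

The point is just that a gate whose action is a permutation of its input states discards no information, so a circuit built from such gates can always be run backwards.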

As I’ve mentioned elsewhere, there were some computer experiments in the 1950s and beyond on model systems—like hard sphere gases and nonlinear springs—that showed some sign of Second Law behavior (though less than might have been expected). But the analysis of these systems very much concentrated on various regularities, and not on the effective randomness associated with Second Law behavior.

In another direction, the 1970s saw the application of thermodynamic ideas to black holes. At first, it was basically a pure analogy. But then quantum field theory calculations suggested that black holes should produce thermal radiation as if they had a certain effective temperature. By the late 1990s there were more direct ways to “compute entropy” for black holes, by enumerating possible (quantum) configurations consistent with the overall characteristics of the black hole. But such computations in effect assume (time-invariant) equilibrium, and so can’t be expected to shed light directly on the Second Law.

Talking about black holes brings up gravity. And in the course of the twentieth century there were scattered efforts to understand the effect of gravity on the Second Law. Would a self-gravitating gas achieve “equilibrium” in the usual sense? Does gravity violate the Second Law? It’s been difficult to get definitive answers. Many specific simulations of n-body gravitational systems were done, but without global conclusions for the Second Law. And there were cosmological arguments, particularly about the role of gravity in accounting for entropy in the early universe—but not so much about the actual evolution of the universe and the effect of the Second Law on it.

Yet another direction has involved quantum mechanics. The standard formalism of quantum mechanics—like classical mechanics—is fundamentally reversible. But the formalism for measurement introduced in the 1930s—arguably as something of a hack—is fundamentally irreversible, and there’ve been continuing arguments about whether this could perhaps “explain the Second Law”. (I think our Physics Project finally provides more clarity about what’s going on here—but also tells us this isn’t what’s “needed” for the Second Law.)

From the earliest days of the Second Law, there had always been scattered but ultimately unconvincing assertions of exceptions to it—usually based on elaborately constructed machines that were claimed to be able to achieve perpetual motion “just powered by heat”. Of course, the Second Law is a claim about large numbers of molecules, etc.—and shouldn’t be expected to apply to very small systems. But by the end of the twentieth century it was starting to be possible to make micromachines that could operate on small numbers of molecules (or electrons). And with the right control systems in place, it was argued that such machines could—at least in principle—effectively be used to set up Maxwell’s demons that would systematically violate the Second Law, albeit on a very small scale.

And then there was the question of life. Early formulations of the Second Law had tended to talk about it applying only to “inanimate matter”—because somehow living systems didn’t seem to follow the same process of inexorable “dissipation to heat” as inanimate, mechanical systems. Quite to the contrary, they seemed able to take disordered input (like food) and generate ordered biological structures from it. Indeed, Erwin Schrödinger (1887–1961), in his 1944 book What Is Life?, talked about “negative entropy” associated with life. But he—and many others since—argued that life doesn’t really violate the Second Law because it’s not operating in a closed environment where one should expect evolution to equilibrium. Instead, it’s constantly being driven away from equilibrium, for example by “organized energy” ultimately coming from the Sun.

Still, the concept of at least locally “antithermodynamic” behavior is often considered to be a potential general signature of life. But already by the early part of the 1900s, with the rise of things like biochemistry, and the decline of concepts like “life force” (which seemed a little like “caloric”), there developed a strong belief that the Second Law must at some level always apply, even to living systems. But, yes, even though the Second Law seemed to say that one can’t “unscramble an egg”, there was still the witty rejoinder: “unless you feed it to a chicken”.

What about biological evolution? Well, Boltzmann had been an enthusiast of Darwin’s idea of natural selection. And—although it’s not clear he made this connection—it was pointed out many times in the twentieth century that just as in the Second Law reversible underlying dynamics generate an irreversible overall effect, so also in Darwinian evolution effectively reversible individual changes aggregate to what at least Darwin thought was an “irreversible” progression to things like the formation of higher organisms.

The Second Law also found its way into the social sciences—sometimes under names like “entropy pessimism”—most often being used to justify the necessity of “Maxwell’s-demon-like” active intervention or control to prevent the collapse of economic or social systems into random or incoherent states.

But despite all these applications of the Second Law, the twentieth century largely passed without significant advances in understanding the origin and foundations of the Second Law. Though even by the early 1980s I was beginning to find results—based on computational ideas—that seemed as if they might finally give a foundational understanding of what’s really happening in the Second Law, and the extent to which the Second Law can in the end be “derived” from underlying “mechanical” rules.

What the Textbooks Said: The Evolution of Certainty

Ask a typical physicist today about the Second Law and they’re likely to be very sure that it’s “just true”. Maybe they’ll consider it “another law of nature” like the conservation of energy, or maybe they’ll think it is something that was “proved long ago” from basic principles of mathematics and mechanics. But as we’ve discussed here, there’s really nothing in the history of the Second Law that should give us this degree of certainty. So where did all the certainty come from? I think in the end it’s a mixture of a kind of don’t-question-this-it-comes-from-sophisticated-science mystique about the Second Law, together with a century and a half of “increasingly certain” textbooks. So let’s talk about the textbooks.

While early contributions to what we now call thermodynamics (and particularly those from continental Europe) often got published as monographs, the first “actual textbooks” of thermodynamics already started to appear in the 1860s, with three examples (curiously, all in French) being:

Click to enlarge

And in these early textbooks what one repeatedly sees is that the Second Law is simply cited—without much comment—as a “principle” or “axiom” (variously attributed to Carnot, Kelvin or Clausius, and sometimes called “the Principle of Carnot”), from which theory will be developed. By the 1870s there’s a bit of confusion starting to creep in, because people are talking about the “Theorem of Carnot”. But, at least at first, by this they mean not the Second Law, but the result on the efficiency of heat engines that Carnot derived from this.

Occasionally, there are questions in textbooks about the validity of the Second Law. A notable one, that we discussed above when we talked about Maxwell’s demon, shows up under the title “Limitation of the Second Law of Thermodynamics” at the end of Maxwell’s 1871 Theory of Heat.

Tait’s largely historical 1877 Sketch of Thermodynamics notes that, yes, the Second Law hasn’t successfully been proved from the laws of mechanics:

Click to enlarge

In 1879, Eddy’s Thermodynamics at first shows even more skepticism

Click to enlarge

but soon he’s talking about how “Rankine’s theory of molecular vortices” has actually “proved the Second Law”:

Click to enlarge

He goes on to give some standard “phenomenological” statements of the Second Law, but then talks about “molecular hypotheses from which Carnot’s principle has been derived”:

Click to enlarge

Pretty soon there’s confusion like the section in Alexandre Gouilly’s (1842–1906) 1877 Mechanical Theory of Heat that’s entitled “Second Fundamental Theorem of Thermodynamics or the Theorem of Carnot”:

Click to enlarge

More textbooks on thermodynamics follow, but the majority tend to be practical expositions (that are often incredibly similar to each other) with no particular theoretical discussion of the Second Law, its origins or validity.

In 1891 there’s an “official report about the Second Law” commissioned by the British Association for the Advancement of Science (and written by a certain George Bryan (1864–1928) who would later produce a thermodynamics textbook):

Click to enlarge

There’s an enumeration of approaches so far:

Click to enlarge

Somewhat confusingly it talks about a “proof of the Second Law”—actually referring to an already-in-equilibrium result:

Click to enlarge

There’s talk of mechanical instability leading to irreversibility:

Click to enlarge

The conclusions say that, yes, the Second Law isn’t proved “yet”

Click to enlarge

but imply that if only we knew more about molecules that might be enough to nail it:

Click to enlarge

But back to textbooks. In 1895 Boltzmann published his Lectures on Gas Theory, which includes a final chapter about the H theorem and its relation to the Second Law. Boltzmann goes through his mathematical derivations for gases, then (rather over-optimistically) asserts that they’ll also work for solids and liquids:

We have looked mainly at processes in gases and have calculated the function H for this case. Yet the laws of probability that govern atomic motion in the solid and liquid states are clearly not qualitatively different … from those for gases, so that the calculation of the function H corresponding to the entropy would not be more difficult in principle, although to be sure it would involve greater mathematical difficulties.

But soon he’s discussing the more philosophical aspects of things (and by the time Boltzmann wrote this book, he was a professor of philosophy as well as physics). He says that the usual statement of the Second Law is “asserted phenomenologically as an axiom” (just as he says the infinite divisibility of matter also is at that time):

… the Second Law is formulated in such a way that the unconditional irreversibility of all natural processes is asserted as an axiom, just as general physics based on a purely phenomenological standpoint asserts the unconditional divisibility of matter without limit as an axiom.

One might then expect him to say that actually the Second Law is somehow provable from basic physical facts, such as the First Law. But actually his claims about any kind of “general derivation” of the Second Law are rather subdued:

Since however the probability calculus has been verified in so many special cases, I see no reason why it should not also be applied to natural processes of a more general kind. The applicability of the probability calculus to the molecular motion in gases cannot of course be rigorously deduced from the differential equations for the motion of the molecules. It follows rather from the great number of the gas molecules and the length of their paths, by virtue of which the properties of the position in the gas where a molecule undergoes a collision are completely independent of the place where it collided the previous time.

But he still believes in the ultimate applicability of the Second Law, and feels he needs to explain why—in the face of the Second Law—the universe as we perceive it “still has interesting things going on”:

… small isolated regions of the universe will always find themselves “initially” in an improbable state. This method seems to me to be the only way in which one can understand the Second Law—the heat death of each single world—without a unidirectional change of the entire universe from a definite initial state to a final state.

Meanwhile, he talks about the idea that elsewhere in the universe things might be different—and that, for example, entropy might be systematically decreasing, making (he suggests) perceived time run backwards:

In the entire universe, the aggregate of all individual worlds, there will however in fact occur processes going in the opposite direction. But the beings who observe such processes will simply reckon time as proceeding from the less probable to the more probable states, and it will never be discovered whether they reckon time differently from us, since they are separated from us by eons of time and spatial distances 10^(10^10) times the distance of Sirius—and moreover their language has no relation to ours.

Most other textbook discussions of thermodynamics are tamer than this, but the rather anthropic-style argument that “we live in a fluctuation” comes up over and over again as an ultimate way to explain the fact that the universe as we perceive it isn’t just a featureless maximum-entropy place.

It’s worth noting that there are roughly three general streams of textbooks that end up discussing the Second Law. There are books about rather practical thermodynamics (of the type pioneered by Clausius), that typically spend most of their time on the equilibrium case. There are books about kinetic theory (effectively pioneered by Maxwell), that typically spend most of their time talking about the dynamics of gas molecules. And then there are books about statistical mechanics (as pioneered by Gibbs) that discuss with various degrees of mathematical sophistication the statistical characteristics of ensembles.

In each of these streams, many textbooks just treat the Second Law as a starting point that can be taken for granted, then go from there. But particularly when they are written by physicists with broader experience, or when they are intended for a not-totally-specialized audience, textbooks will quite often attempt at least a little justification or explanation for the Second Law—though rather often with a distinct sleight of hand involved.

For example, when Planck in 1903 wrote his Treatise on Thermodynamics, his discussion of the Second Law included a chapter misleadingly entitled “Proof”. Still, he explains that:

The second fundamental principle of thermodynamics [Second Law] being, like the first, an empirical law, we can speak of its proof only in so far as its total purport may be deduced from a single self-evident proposition. We, therefore, put forward the following proposition as being given directly by experience. It is impossible to construct an engine which will work in a complete cycle, and produce no effect except the raising of a weight and the cooling of a heat-reservoir.

In other words, his “proof” of the Second Law is that nobody has ever managed to build a perpetual motion machine that violates it. (And, yes, this is more than a little reminiscent of P ≠ NP, which, through computational irreducibility, is related to the Second Law.) But after many pages, he says:

In conclusion, we shall briefly discuss the question of the possible limitations to the Second Law. If there exist any such limitations—a view still held by many scientists and philosophers—then this [implies an error] in our starting point: the impossibility of perpetual motion …

(In the 1905 edition of the book he adds a footnote that frankly seems bizarre in view of his—albeit perhaps initially unwilling—role in the initiation of quantum theory five years earlier: “The following discussion, of course, deals with the meaning of the Second Law only insofar as it can be surveyed from the points of view contained in this work avoiding all atomic hypotheses.”)

He ends by basically saying “maybe one day the Second Law will be considered necessarily true; in the meantime let’s assume it and see if anything goes wrong”:

Presumably the time will come when the principle of the increase of the entropy will be presented without any connection with experiment. Some metaphysicians may even put it forward as being a priori valid. In the meantime, no more effective weapon can be used by both champions and opponents of the Second Law than the indefatigable endeavour to follow the real purport of this law to the utmost consequences, taking the latter one by one to the highest court of appeal, experience. Whatever the decision may be, lasting gain will accrue to us from such a proceeding, since thereby we serve the chief end of natural science, the enlargement of our stock of knowledge.

Planck’s book came in a sense from the Clausius tradition. James Jeans’s (1877–1946) 1904 book The Dynamical Theory of Gases came instead from the Maxwell + Boltzmann tradition. He says at the beginning—reflecting the fact that the existence of molecules had not yet been firmly established in 1904—that the whole notion of the molecular basis of heat “is only a hypothesis”:

Click to enlarge

Later he argues that molecular-scale processes are just too “fine-grained” to ever be directly detected:

Click to enlarge

But soon Jeans is giving a derivation of Boltzmann’s H theorem, though noting some subtleties:

Click to enlarge

His take on the “reversibility objection” is that, yes, the H function will be symmetric at every maximum, but, he argues, it’ll also be discontinuous there:

Click to enlarge

And in the time-honored tradition of saying “it is clear” right when an argument is questionable, he then claims that an “obvious averaging” will give irreversibility and the Second Law:

Click to enlarge

Later in his book Jeans simply quotes Maxwell and mentions his demon:

Click to enlarge

Then he effectively just tells readers to go elsewhere:

Click to enlarge

In 1907 George Bryan (whose 1891 report we mentioned earlier) published Thermodynamics, an Introductory Treatise Dealing Mainly with First Principles and Their Direct Applications. But despite its title, Bryan has now “walked back” the hopes of his earlier report and is just treating the Second Law as an “axiom”:

Click to enlarge

And—presumably reflecting his interactions with Boltzmann—he says that the Second Law is basically an empirical fact of our particular experience of the universe, and thus not something fundamentally derivable:

Click to enlarge

As the years went by, many thermodynamics textbooks appeared, increasingly with an emphasis on applications, and decreasingly with a mention of foundational issues—typically treating the Second Law essentially just as an absolute empirical “law of nature” analogous to the First Law.

But in other books—including some that were widely read—there were occasional mentions of the foundations of the Second Law. A notable example was in Arthur Eddington’s (1882–1944) 1929 The Nature of the Physical World—where now the Second Law is exalted as having the “supreme position among the laws of Nature”:

Click to enlarge

Although Eddington does admit that the Second Law is probably not “mathematically derivable”:

Click to enlarge

And even though in the twentieth century questions about thermodynamics and the Second Law weren’t considered “top physics topics”, some top physicists did end up talking about them, if nothing else in general textbooks they wrote. Thus, for example, in the 1930s and 1940s people like Enrico Fermi (1901–1954) and Wolfgang Pauli (1900–1958) wrote in some detail about the Second Law—though they rather strenuously avoided discussing its foundational issues.

Lev Landau (1908–1968), however, was a different story. In 1933 he wrote a paper “On the Second Law of Thermodynamics and the Universe” which basically argues that our everyday experience is only possible because “the world as a whole does not obey the laws of thermodynamics”—and suggests that perhaps relativistic quantum mechanics (which he says, quoting Niels Bohr (1885–1962), could be crucial in the center of stars) might fundamentally violate the Second Law. (And yes, even today it’s not clear how “relativistic temperature” works.)

But this kind of outright denial of the Second Law had disappeared by the time Lev Landau and Evgeny Lifshitz (1915–1985) wrote the 1951 version of their book Statistical Physics—though they still showed skepticism about its origins:

There is no doubt that the foregoing simple formulations [of the Second Law] accord with reality; they are confirmed by all our everyday observations. But when we consider more closely the problem of the physical nature and origin of these laws of behaviour, substantial difficulties arise, which to some extent have not yet been overcome.

Their book continues, discussing Boltzmann’s fluctuation argument:

Firstly, if we attempt to apply statistical physics to the entire universe … we immediately encounter a glaring contradiction between theory and experiment. According to the results of statistics, the universe ought to be in a state of complete statistical equilibrium. … Everyday experience shows us, however, that the properties of Nature bear no resemblance to those of an equilibrium system; and astronomical results show that the same is true throughout the vast region of the Universe accessible to our observation.

We might try to overcome this contradiction by supposing that the part of the Universe which we observe is just some huge fluctuation in a system which is in equilibrium as a whole. The fact that we have been able to observe this huge fluctuation might be explained by supposing that the existence of such a fluctuation is a necessary condition for the existence of an observer (a condition for the occurrence of biological evolution). This argument, however, is easily disproved, since a fluctuation within, say, the volume of the solar system only would be very much more probable, and would be sufficient to allow the existence of an observer.

What do they think is the way out? The effect of gravity:

… in the general theory of relativity, the Universe as a whole must be regarded not as a closed system but as a system in a variable gravitational field. Consequently the application of the law of increase of entropy does not prove that statistical equilibrium must necessarily exist.

But they say this isn’t the end of the problem, essentially noting the reversibility objection. How should this be overcome? First, they suggest the solution might be that the observer somehow “artificially closes off the history of a system”, but then they add:

Such a dependence of the laws of physics on the nature of an observer is quite inadmissible, of course.

They continue:

At the present time it is not certain whether the law of increase of entropy thus formulated can be derived on the basis of classical mechanics. … It is more reasonable to suppose that the law of increase of entropy in the above general formulation arises from quantum effects.

They talk about the interaction of classical and quantum systems, and what amounts to the explicit irreversibility of the traditional formalism of quantum measurement, then say that if quantum mechanics is in fact the ultimate source of irreversibility:

… there must exist an inequality involving the quantum constant ℏ which ensures the validity of the law and is satisfied in the real world…

What about other textbooks? Joseph Mayer (1904–1983) and Maria Goeppert Mayer’s (1906–1972) 1940 Statistical Mechanics has the rather charming

Click to enlarge

though in the end they sidestep difficult questions about the Second Law by basically making convenient definitions of what S and Ω mean in S = k log Ω.

For a long time one of the most cited textbooks in the area was Richard Tolman’s (1881–1948) 1938 Principles of Statistical Mechanics. Tolman (basically following Gibbs) begins by explaining that statistical mechanics is about making predictions when all you know are probabilistic statements about initial conditions:

Click to enlarge

Tolman continues:

Click to enlarge

He notes that, historically, statistical mechanics was developed for studying systems like gases, where (in a vague foreshadowing of the concept of computational irreducibility) “it is evident that we should be quickly lost in the complexities of our computations” if we try to trace every molecule, but where, he claims, statistical mechanics can still accurately tell us “statistically” what will happen:

Click to enlarge

But where exactly should we get the probability distributions for initial states from? Tolman says he’s going to consider the kinds of mathematically defined ensembles that Gibbs discusses. And tucked away at the end of a chapter he admits that, well, yes, this setup is really all just a postulate—set up so as to make the results of statistical mechanics “merely a matter for computation”:

Click to enlarge

On this basis Tolman then derives Boltzmann’s H theorem, and his “coarse-grained” generalization (where, yes, the coarse-graining ultimately operates according to his postulate). For 530 pages, there’s not a single mention of the Second Law. But finally, on page 558 Tolman is at least prepared to talk about an “analog of the Second Law”:

Click to enlarge

And basically what Tolman argues is that his coarse-grained quantity can reasonably be identified with the thermodynamic entropy S. In the end, the argument is very similar to Boltzmann’s, though Tolman seems to feel that it has achieved more:

Click to enlarge

Very different in character from Tolman’s book, another widely cited book is Percy Bridgman’s (1882–1961) largely philosophical 1943 The Nature of Thermodynamics. His chapter on the Second Law begins:

Click to enlarge

A decade earlier Bridgman had discussed outright violations of the Second Law, saying that he’d found that the younger generation of physicists at the time seemed to often think that “it may be possible some day to construct a machine which shall violate the Second Law on a scale large enough to be commercially profitable”—perhaps, he said, by harnessing Brownian motion:

Click to enlarge

At a philosophical level, a notable treatment of the Second Law appeared in Hans Reichenbach’s (1891–1953) (unfinished-at-his-death) 1953 work The Direction of Time. Wanting to make use of the Second Law, but concerned about the reversibility objections, Reichenbach introduces the notion of “branch systems”—essentially parts of the universe that can eventually be considered isolated, but which were once connected to other parts that were responsible for determining their (“nonrandom”) effective initial conditions:

Click to enlarge

Most textbooks that cover the Second Law use one of the formulations that we’ve already discussed. But there is one more formulation that also sometimes appears, usually associated with the name “Carathéodory” or the term “axiomatic thermodynamics”.

Back in the first decade of the twentieth century—particularly in the circle around David Hilbert (1862–1943)—there was a lot of enthusiasm for axiomatizing things, including physics. And in 1908 the mathematician Constantin Carathéodory (1873–1950) suggested an axiomatization of thermodynamics. His essential idea—that he developed further in the 1920s—was to consider something like Gibbs’s phase fluid and then roughly to assert that it gets (in some measure-theoretic sense) “so mixed up” that there aren’t “experimentally doable” transformations that can unmix it. Or, in his original formulation:

In any arbitrary neighborhood of an arbitrarily given initial point there is a state that cannot be arbitrarily approximated by adiabatic changes of state.

There wasn’t much pickup of this approach—though Max Born (1882–1970) supported it, Max Planck dismissed it, and in 1939 S. Chandrasekhar (1910–1995) based his exposition of stellar structure on it. But in various forms, the approach did make it into a few textbooks. An example is Brian Pippard’s (1920–2008) otherwise rather practical 1957 The Elements of Classical Thermodynamics:

Click to enlarge

Yet another (loosely related) approach is the “postulatory formulation” on which Herbert Callen’s (1919–1993) 1959 textbook Thermodynamics is based:

Click to enlarge

In effect this is now “assuming the result” of the Second Law:

Click to enlarge

Though in an appendix he rather tautologically states:

Click to enlarge

So what about other textbooks? A famous set is Richard Feynman’s (1918–1988) 1963 Lectures on Physics. Feynman starts his discussion of the Second Law quite carefully, describing it as a “hypothesis”:

Click to enlarge

Feynman says he’s not going to go very far into thermodynamics, though he quotes (and criticizes) Clausius’s statements:

Click to enlarge

But then he launches into a whole chapter on “Ratchet and pawl”:

Click to enlarge

His goal, he explains, is to analyze a device (similar to what Marian Smoluchowski had considered in 1912) that one might think by its one-way ratchet action would be able to “harvest random heat” and violate the Second Law. But after a few pages of analysis he claims that, no, if the system is in equilibrium, thermal fluctuations will prevent systematic “one-way” mechanical work from being achieved, so that the Second Law is saved.

But now he applies this to Maxwell’s demon, claiming that the same basic argument shows that the demon can’t work:

Click to enlarge

But what about reversibility? Feynman first discusses what amounts to Boltzmann’s fluctuation idea:

Click to enlarge

But then he opts instead for the argument that for some reason—then unknown—the universe started in a “low-entropy” state, and has been “running down” ever since:

Click to enlarge

By the beginning of the 1960s an immense number of books had appeared that discussed the Second Law. Some were based on macroscopic thermodynamics, some on kinetic theory and some on statistical mechanics. In all three of these cases there was elegant mathematical theory to be described, even if it never really addressed the ultimate origin of the Second Law.

But by the early 1960s there was something new on the scene: computer simulation. And in 1965 that formed the core of Fred Reif’s (1927–2019) textbook Statistical Physics:

Click to enlarge

In a sense the book is an exploration of what simulated hard sphere gases do—as analyzed using ideas from statistical mechanics. (The simulations had computational limitations, but they could go far enough to meaningfully see most of the basic phenomena of statistical mechanics.)

Even the front and back covers of the book provide a bold statement of both reversibility and the kind of randomization that’s at the heart of the Second Law:

Click to enlarge

But inside the book the formal concept of entropy doesn’t appear until page 147—where it’s defined very concretely in terms of states one can explicitly enumerate:

Click to enlarge

And finally, on page 283—after all necessary definitions have been built up—there’s a rather prosaic statement of the Second Law, almost as a technical footnote:

Click to enlarge

Looking through many textbooks of thermodynamics and statistical mechanics it’s striking how singular Reif’s “show-don’t-tell” computer-simulation approach is. And, as I describe in detail elsewhere, for me personally it has a particular significance, because this is the book that in 1972, when I was 12, launched me on what has now been a 50-year journey to understand the Second Law and its origins.

When the first textbooks that described the Second Law were published nearly a century and a half ago they often (though even then not always) expressed uncertainty about the Second Law and just how it was supposed to work. But it wasn’t long before the vast majority of books either just “assumed the Second Law” and got on with whatever they wanted to apply it to, or tried to suggest that the Second Law had been established from underlying principles, but that it was a sophisticated story that was “out of the scope of this book” and to be found elsewhere. And so it was that a strong sense emerged that the Second Law was something whose ultimate character and origins the typical working scientist didn’t need to question—and should just believe (and protect) as part of the standard canon of science.

So Where Does This Leave the Second Law?

The Second Law is now more than 150 years old. But—at least until now—I think it’s fair to say that the fundamental ideas used to discuss it haven’t materially changed in more than a century. There’s a lot that’s been written about the Second Law. But it’s always tended to follow lines of development already defined over a century ago—and mostly those from Clausius, or Boltzmann, or Gibbs.

Looking at word clouds of titles of the thousands of publications about the Second Law over the decades we see just a few trends, like the appearance of the “generalized Second Law” in the 1990s relating to black holes:

But with all this activity why hasn’t more been worked out about the Second Law? How come after all this time we still don’t really even understand with clarity the correspondence between the Clausius, Boltzmann and Gibbs approaches—or how their respective definitions of “entropy” are ultimately related?

In the end, I think the answer is that it needs a new paradigm—one that, yes, is fundamentally based on computation and on ideas like computational irreducibility. A little more than a century ago—with people still actively arguing about what Boltzmann was saying—I don’t think anyone would have been too surprised to find out that making progress would need a new way of looking at things. (After all, just a few years earlier Boltzmann and Gibbs had needed to bring in the new idea of using probability theory.)

But as we discussed, by the beginning of the twentieth century—with other areas of physics heating up—interest in the Second Law was waning. And even with many questions unresolved people moved on. And soon several academic generations had passed. And as is typical in the history of science, by that point nobody was questioning the foundations anymore. In the particular case of the Second Law there was some sense that the uncertainties had to do with the assumption of the existence of molecules, which had by then been established. But more important, I think, was just the passage of “academic time” and the fact that what might once have been a matter of discussion had now just become a statement in the textbooks—that future academic generations should learn and didn’t need to question.

One of the unusual features of the Second Law is that at the time it passed into the “standard canon of science” it was still rife with controversy. How did those different approaches relate? What about those “mathematical objections”? What about the thought experiments that seemed to suggest exceptions? It wasn’t that these issues were resolved. It was just that after enough time had passed people came to assume that “somehow that must have all been worked out ages ago”.

And it wasn’t that there was really any pressure to investigate foundational issues. The Second Law—particularly in its implications for thermal equilibrium—seemed to work just fine in all its standard applications. And it even seemed to work in new domains like black holes. Yes, there was always a desire to extend it. But the difficulties encountered in trying to do so didn’t seem in any obvious way related to issues about its foundations.

Of course, there were always a few people who kept wondering about the Second Law. And indeed I’ve been surprised at how much of a Who’s Who of twentieth-century physics this seems to have included. But while many well-known physicists seem to have privately thought about the foundations of the Second Law they managed to make remarkably little progress—and as a result left very few visible records of their efforts.

But—as is so often the case—the issue, I believe, is that a fundamentally new paradigm was needed in order to make real progress. When the “standard canon” of the Second Law was formed in the latter part of the nineteenth century, calculus was the primary tool for physics—with probability theory a newfangled addition introduced specifically for studying the Second Law. And from that time it would be many decades before even the beginnings of the computational paradigm began to emerge, and nearly a century before phenomena like computational irreducibility were finally discovered. Had the sequence been different I have no doubt that what I have now been able to understand about the Second Law would have been worked out by the likes of Boltzmann, Maxwell and Kelvin.

But as it is, we’ve had to wait more than a century to get to this point. And having now studied the history of the Second Law—and seen the tangled manner in which it developed—I believe that we can now be confident that we have indeed successfully been able to resolve many of the core issues and mysteries that have plagued the Second Law and its foundations over the course of nearly 150 years.

Note

Almost all of what I say here is based on my reading of primary literature, assisted by modern tools and by my latest understanding of the Second Law. About some of what I discuss, there is—sometimes quite extensive—existing scholarship; some references are given in the bibliography.

Stephen Wolfram (2023), "How Did We Get Here? The Tangled History of the Second Law of Thermodynamics," Stephen Wolfram Writings, January 31, 2023. writings.stephenwolfram.com/2023/01/how-did-we-get-here-the-tangled-history-of-the-second-law-of-thermodynamics.
