COMPLEXITY

David Wolpert & Farita Tasnim on The Thermodynamics of Communication

Episode Notes

Communication is a physical process. It’s common sense that sending and receiving intelligible messages takes work…but how much work? The question of the relationship between energy, information, and matter is one of the deepest known to science. There appear to be limits to the rate at which communication between two systems can happen…but the search for a fundamental relationship between speed, error, and energy (among other things) promises insights far deeper than merely whether we can keep making faster internet devices. Strap in (and consider slowing down) for a broad and deep discussion on the bounds within which our entire universe must play…

Welcome to COMPLEXITY, the official podcast of the Santa Fe Institute. I’m your host, Michael Garfield, and every other week we’ll bring you with us for far-ranging conversations with our worldwide network of rigorous researchers developing new frameworks to explain the deepest mysteries of the universe.

This week we speak with SFI Professor David Wolpert and MIT Physics PhD student Farita Tasnim, who have worked together over the last year on pioneering research into the nonlinear dynamics of communication channels. In this episode, we explore the history and ongoing evolution of information theory and coding theory, what the field of stochastic thermodynamics has to do with limits to human knowledge, and the role of noise in collective intelligence.

Be sure to check out our extensive show notes with links to all our references at complexity.simplecast.com. If you value our research and communication efforts, please subscribe, rate and review us at Apple Podcasts or Spotify, and consider making a donation — or finding other ways to engage with us, including a handful of open postdoctoral fellowships — at santafe.edu/engage.

Lastly, this weekend, October 22nd & 23rd, is the return of our InterPlanetary Festival! Join our YouTube livestream for two full days of panel discussions, keynotes, and bleeding edge multimedia performances focusing on space exploration through the lens of complex systems science. The fun begins at 11 A.M. Mountain Time on Saturday and ends at 6 P.M. Mountain Time on Sunday. Everything will be recorded and archived at the stream link in case you can’t tune in for the live event. Learn more at interplanetaryfest.org…

Thank you for listening!

Join our Facebook discussion group to meet like minds and talk about each episode.

Podcast theme music by Mitch Mignano.

Referenced in this episode:

Nonlinear thermodynamics of communication channels
by Farita Tasnim and David Wolpert (forthcoming at arXiv.org)

Heterogeneity and Efficiency in the Brain
by Vijay Balasubramanian

Noisy Deductive Reasoning: How Humans Construct Math, and How Math Constructs Universes
by David Wolpert & David Kinney

Stochastic Mathematical Systems
by David Wolpert & David Kinney

Twenty-five years of nanoscale thermodynamics
by Chase P. Broedersz & Pierre Ronceray

Ten Questions about The Hard Limits of Human Intelligence
by David Wolpert

What can we know about that which we cannot even imagine?
by David Wolpert

Communication consumes 35 times more energy than computation in the human cortex, but both costs are needed to predict synapse number
by William Levy & Victoria Calvert

An exchange of letters on the role of noise in collective intelligence
by Daniel Kahneman, David Krakauer, Olivier Sibony, Cass Sunstein, David Wolpert

When Slower Is Faster
by Carlos Gershenson & Dirk Helbing

Additional Resources:

The stochastic thermodynamics of computation
by David Wolpert

Elements of Information Theory, Second Edition (textbook)
by Thomas Cover & Joy Thomas

Computational Complexity: A Modern Approach (textbook)
by Sanjeev Arora & Boaz Barak

An Introduction to Kolmogorov Complexity and Its Applications (textbook)
by Ming Li & Paul Vitányi

Episode Transcription

Farita Tasnim (0s): It's all throughout the literature that in such systems, the highest energy cost comes from the communication that is required between these components. These components have limited access to information, not only from their environment but also from each other, and we want to try to clarify that.

David Wolpert (22s): Information theory, as that term is commonly used in all of the sciences outside of engineering, frankly, is the introductory chapter of the freshman textbook. It is not Shannon coding theory. It's not about codebooks, it's not really about information capacity. It's got nothing to do with the full-on richness. People love to talk about Kullback-Leibler divergence, mutual information, entropy. You'll see it all over the place. That ain't information theory, folks, any more than being able to solve a linear algebra equation is calculus.

Michael Garfield (1m 22s): Communication is a physical process. It's common sense that sending and receiving intelligible messages takes work. But how much work? The question of the relationship between energy, information and matter is one of the deepest known to science. There appear to be limits to the rate at which communication between two systems can happen, but the search for a fundamental relationship between speed, error and energy among other things, promises insights far deeper than merely whether we can keep making faster internet devices. Strap in and consider slowing down for a broad and deep discussion on the bounds within which our entire universe must play.

Welcome to Complexity, the official podcast of the Santa Fe Institute. I'm your host, Michael Garfield, and every other week we'll bring you with us for far ranging conversations with our worldwide network of rigorous researchers developing new frameworks to explain the deepest mysteries of the universe. This week we speak with SFI Professor David Wolpert and MIT Physics PhD student Farita Tasnim, who have worked together over the last year on pioneering research into the non-linear dynamics of communication channels.

In this episode, we explore the history and ongoing evolution of information theory and coding theory, what the field of stochastic thermodynamics has to do with limits to human knowledge and the role of noise in collective intelligence. Be sure to check out our extensive show notes with links to all of our references and more at complexity.simplecast.com. If you value our research and communication efforts, please subscribe, rate and review us at Apple Podcasts or Spotify and consider making a donation or finding other ways to engage with us, including a handful of open post-doctoral fellowships at santafe.edu/engage.

Lastly, this weekend, October 22nd and 23rd, is the return of our InterPlanetary Festival. Join our YouTube livestream for two full days of panel discussions, keynotes and bleeding edge multimedia performances focusing on space exploration through the lens of complex systems science. The fun begins at 11:00 AM Mountain Time on Saturday and ends at 6:00 PM Mountain Time on Sunday. Everything will be recorded and archived at the stream link in case you can't tune in for the live event.

Learn more at interplanetaryfest.org. Thank you for listening. All right, David Wolpert, Farita Tasnim, it is a pleasure to have you both on the Complexity podcast.

Farita Tasnim (4m 6s): Thanks for having us.

Michael Garfield (4m 8s): So David, we've had you on the show before. Farita, we have not. I imagine we've got a lot of Wolpert fans in the audience, so I'm especially excited to introduce you, because I met you, Farita, through the graces of your collaboration with David that we're gonna talk about today. So if we can, I'd just like to have you introduce yourself and give a bit of background on your work as a scientist, and then what brought you to SFI?

Farita Tasnim (4m 39s): Yeah, so I'm a grad student at MIT, actually, with a background in electrical engineering. I did my master's in biomedical engineering before I realized that I want to be spending more time trying to understand the principles by which living systems can exist and maintain themselves. I kind of got introduced to SFI randomly through one of the education initiatives here, Complexity Interactive, and then got introduced to David Wolpert when someone noticed that I was asking lots of questions about non-equilibrium statistical physics. I started working with David soon thereafter, around the end of 2020, and now I'm doing my PhD with David, studying essentially what the features are of the organization of the components of a living system that allow them to perform kind of insane computations, predicting the future, et cetera, and adjusting their actions accordingly.

So that's what we'll talk some about today.

Michael Garfield (5m 51s): For sure. And before we get to the paper that I want to discuss, you also sent me this piece by Vijay Balasubramanian, who's also with us here at the Institute, his paper "Heterogeneity and Efficiency in the Brain." I just wanna read a short piece from the introduction here, because for those who are not familiar with this particular domain of questions, I feel like this one sums it up really well.

He says, “Energetically, the brain is the most expensive tissue in the body. It is 2% of body weight, but 20% of metabolic load, more expensive per gram than muscle when you are working out, suggesting there will be evolutionary pressure toward computational efficiency. On the other hand, the brain consumes a mere 12 to 20 watts of power, about the same as a refrigerator light bulb, and uses this to nearly beat supercomputers at chess, produce art and music, store memories of a lifetime, experience emotions like love and anger, learn from experience, and build skyscrapers and nanoscale devices alike.”

How does it manage to do all this on such a meager budget? Okay, so having set the stage with that profound question, I'd like to ask you, David, because you've done quite a bit of work in this area. We didn't talk about it much when we had you on the show last, but you've got quite a bit of research that we'll link to, and you've given this years of thought already. So if you'd be so kind, I'd love to hear you talk a little bit about the history of this area, the thermodynamics of computation.

It's certainly something that extends beyond the brain to all kinds of technological areas, but yeah, how did you get into this and why and how much have things changed since you first started pursuing this research?

David Wolpert (7m 53s): Okay, that's a tall order. So there's another aspect to the thermodynamics of computation. It's viewed by many physicists as an actually far more fundamental, profound part of the fabric of reality than merely how biological systems operate and how we might engineer devices and such practical trivialities. There's this famous phrase from John Wheeler back in the 20th century, “It from bit,” the notion that somehow all of reality was nothing more than a manifestation of, in some non-trivial sense, information processing. Through the 20th century that theme was picked up by people like David Deutsch, who, following up on Richard Feynman and so on and so forth, started talking about quantum computation, introducing the notion. Interestingly, one of David's motivations was to try to actually give some kind of formal rigor to the notion of the physical Church-Turing thesis: that nothing in the physical world can actually perform a computation that's more powerful than a Turing machine.

But in any case, it seems that in many regards, that particular focus of physicists has been most successful when they've been thinking about statistical physics, because statistical physics, compared to conventional physics, is all about the ignorance of the human being. It's not about anything intrinsic to the system, like in quantum mechanics or anything like that, but rather about the experimentalist: what do they know and what can they actually access?

There are some deep aspects of statistical physics that really serve to emphasize that you might have information about a system which would have profound thermodynamic benefits, if only you had a device in your tabletop experiment that could actually exploit that information. There's a famous example of this called the Gibbs paradox. You might know that in a container there's a whole bunch of red particles on one side and blue particles on the other side.

If you had a filter that was selective and only allowed red particles through, you could exploit that information to gain power, to gain energy that you could then use for other things. But if you have that information and don't have such a filter, it's as though you didn't have it in the first place. So there's this really deep notion in statistical physics that it doesn't just partly reflect your ignorance; going even further, it reflects not just constraints on what you know, but constraints on what you can actually do, on how you can access the real world.
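One way to put numbers on the red/blue example, assuming ideal gases and perfectly selective membranes standing in for the filter: the textbook maximum work obtainable as separated species mix isothermally is -N k_B T Σ x_i ln x_i, which for a 50/50 split is k_B T ln 2 per particle. This is a minimal illustrative sketch; the function name and the one-mole, 300 K example are assumptions, not anything from the conversation.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K (exact SI value)
N_A = 6.02214076e23   # Avogadro constant, 1/mol (exact SI value)

def max_work_from_mixing(n_particles: float, fractions: list[float], temperature_k: float) -> float:
    """Upper bound (J) on work extractable as separated ideal gases mix at constant temperature.

    Assumes ideal, perfectly selective membranes (the hypothetical 'filter').
    Equals T times the entropy of mixing: -N k_B T sum(x ln x).
    """
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    mixing_term = sum(x * math.log(x) for x in fractions if x > 0)
    return -n_particles * K_B * temperature_k * mixing_term

# One mole, half red and half blue, at 300 K: R * T * ln 2, roughly 1.7 kJ.
work = max_work_from_mixing(N_A, [0.5, 0.5], 300.0)
print(f"{work:.0f} J")
```

Without the membranes the same information about which side holds which color buys nothing, which is the point being made: the bound only becomes usable work if you have a device that can act on it.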

So anyway, from a high level, it's because of those aspects of statistical physics that it's not too surprising that the research program of trying to find the relationship between information processing and physics has borne the most fruit in that particular domain. Even so, it's been an extraordinarily challenging problem to make progress on. In the 20th century, people only had access to equilibrium statistical physics and some small variants where you were either close to equilibrium or stuck in a local equilibrium.

They did some amazing things with that. Giorgio Parisi won his Nobel Prize a couple of years ago, in large measure for his work on spin glasses. A spin glass is a system that is frozen in a local equilibrium. It's not the global thermal equilibrium, but it's a local equilibrium. So a lot of the techniques of conventional equilibrium statistical physics can allow you to at least formulate the mathematical issues, and it's in making progress on those issues that Parisi's star really shone.

But if you think about any real computational system, you are extraordinarily far, and I mean you as in you physically, the listener, as well as the little device you're listening to this podcast through and so on and so forth, you are extraordinarily far from anything that's an approximation of an equilibrium. You are also not changing slowly, which is another requirement of equilibrium statistical physics: that any changes be what are called quasi-static, infinitesimally slow changes. You're changing damn fast, and so is the little device that you're listening to this through.

So what all that strongly suggests is that they did not have the tools in the 20th century to actually properly address these issues. They had some great intuitive insights, but the actual physics equations, which one might have hoped exist, that prove the precise details of when these intuitive insights hold and what they mean more precisely and formally, with all the kind of structure and richness that gives you a feeling of being on top of things, like when you use quantum physics to analyze the hydrogen atom or something like that: they weren't there.

As a result, not surprisingly, there was controversy. It bled over into the philosophy literature and so on and so forth. So that's pretty much where things stayed at the end of the 20th century, and people sort of moved on because they didn't see how to make any progress beyond that. Okay, now let's look to the 21st century, da-da-da, fanfare of trumpets.

Other physicists came up with these brilliant insights that now form a field that's sometimes called stochastic thermodynamics. They basically realized that there are ways to modify statistical physics to allow it to apply arbitrarily far away from thermal equilibrium, to systems that are changing arbitrarily quickly. This is still an exploding field. There was a great perspective in Nature on this particular topic about a year ago, and it's getting many, many new results.

It's still a newborn, fresh, wet behind the ears and so on and so forth. But I have to say, looking at it from the outside, it's somewhat typical of all science in that it's siloed. These people have constructed these amazingly powerful tools that can, at least theoretically, so to speak, apply to everything from the evolution of ecosystems to financial markets, never mind physical systems where energy is one of the big issues, which was its original birthplace. But they're all focused on things that are important but nonetheless small biochemical issues, like ligand detectors and cell walls, what's going on with operating some of the machinery that makes ATP, some experiments with quantum dots, and so on. Very important stuff, front cover of Nature several times and so on and so forth. But gee, it's very, let's just say, non-SFI; it's small-bore. They're coming up with revolutionary science, and they are doing it within the mindset of conventional siloed scientists.

So I was frying some other fish, so to speak, when I stumbled across this work, and I said, holy expletive deleted. We can now apply this to properly start to investigate the non-equilibrium, far-from-static statistical physics properties of real computers, not these semi-formal word arguments that people were restricted to before. Now we can go after the whole enchilada, the real McCoy, or at least start going down that path.

Hey, there's the cliff. It's just over there. We were going straight, not getting anywhere, sort of bumping into some rocks and things, and we wanted to go flying. There's a cliff off to the left. Let's all jump. And that's basically how I got into this: jumping.

Michael Garfield (16m 18s): Awesome. Well, let's jump in then. There's this piece, "Non-equilibrium Thermodynamics of Communication Channels." Farita, since you're the first author on this paper, why don't you take us from David's context here and tell us a little bit about it. This is gonna be the "explain it to me like I'm five" section of the podcast. This piece builds on work by Claude Shannon, and specifically it focuses on this quantity, the channel capacity.

So that seems like a good place to start, if you could lay down some of the core concepts for this paper before we actually dive into the specifics of the way that the two of you are formalizing all of this and the conclusions that you come to.

Farita Tasnim (17m 12s): Yeah, so maybe starting with some motivation: why are we studying communication channels, given everything that David said? We want to understand how these living systems work, and that's a flexible definition, anything from the brain to modern digital computers to, you know, you could even consider an ecosystem or a financial market potentially as a living system. The key is that all of these dynamic, far-from-equilibrium systems have limited information about the things that exist in their environment, the things that they're interacting with, and within themselves.

The components that make up these systems have limited information about each other, and therefore they have to compute some function; they have to be learning something, right? And oftentimes such systems have multiple components. These multiple components are continuously shuttling information amongst each other in order to compute whatever function it is they're computing. And if we look into the literature, it's all throughout the literature that in such systems, the highest energy cost comes from the communication that is required between these components.

These components have limited access to information, not only from their environment but also from each other, and we want to try to clarify that. We see that in the brain. There's a paper, I think in PNAS, from Levy and Calvert, with a title like "Communication consumes 35 times more energy than computation in the human cortex." And of course, we can argue about semantics and the specifics of what is communication and what is computation, but this has also been observed phenomenologically in the creation of digital computers, for example through Rent's rule, with the energetic cost in a CPU or a chip scaling with the number of wiring connections in that computing unit. And this is also one of the main drivers behind the entire field of neuromorphic computing, which is growing hugely. In traditional computing architectures you have what's called von Neumann computing, where your CPU is separated from your memory.
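Rent's rule is an empirical relation between the number of gates g in a block of logic and the number of external connections T it needs, usually written T = t g^p with a Rent exponent p below 1. The sketch below generates counts from that formula with hypothetical values t = 4 and p = 0.6, then recovers p with a least-squares fit on log-log axes, one common way the exponent is estimated from real layouts. It illustrates the wiring scaling only and does not model energy.

```python
import math

def rent_terminals(gates: float, t: float = 4.0, p: float = 0.6) -> float:
    """Rent's rule T = t * g**p. The defaults for t and p are hypothetical, not measured values."""
    return t * gates ** p

def fit_rent_parameters(gates: list[float], terminals: list[float]) -> tuple[float, float]:
    """Least-squares fit of log T = log t + p * log g; returns (t, p)."""
    xs = [math.log(g) for g in gates]
    ys = [math.log(term) for term in terminals]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope

block_sizes = [10, 100, 1_000, 10_000, 100_000]
counts = [rent_terminals(g) for g in block_sizes]
t_fit, p_fit = fit_rent_parameters(block_sizes, counts)
print(f"fitted t = {t_fit:.3f}, p = {p_fit:.3f}")

# Doubling a block multiplies its external connections by 2**p, not by 2.
print(f"growth when gates double: x{rent_terminals(2_000) / rent_terminals(1_000):.3f}")
```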

Whereas what neuromorphic computing is trying to do is reduce the communication required between these two units by putting some memory at the location of the computing devices themselves. That reduces the overall amount of communication required. So we have noticed that in all such computing systems communication is extremely costly, and yet we don't have a proper understanding of why that is the case, what the origin of this communication cost is, or what its details and subtle features are: how does energy scale with the amount of desired communication in a channel?

So I would say we kind of touch on Shannon's work. Claude Shannon essentially started the entire field of communication theory in his 1948 paper, "A Mathematical Theory of Communication," in which he posited and at least informally proved the noisy channel coding theorem. Here's what it says. A communication channel usually consists of some source of information, and that information gets transformed through some encoding: a mapping from sequences of symbols that come from the source, bit sequences, to codewords, and that mapping defines a codebook.

The codeword goes to the input, and the input then transfers that information through a noisy communication channel to an output. What that means is the output will sometimes have errors in what it receives: if the input is trying to send a one, the output sometimes receives, with some probability, a zero instead, if we're talking about a binary alphabet. Then there's some decoding function with which the output takes its n-bit strings and converts them back to what it thinks the original message from the information source was.

And perhaps there's some feedback, which can then adjust the next sequence of inputs. So what Shannon essentially proved is that for such a communication channel you can send information up to some maximum rate with negligible error, and that rate is called the channel capacity. So even if there is noise, you can send at up to that maximum rate with negligible error.
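To make the encode, transmit, decode pipeline concrete, here is a minimal simulation assuming the simplest textbook setup rather than anything from the paper: a binary symmetric channel that flips each bit with probability 0.1, and a repetition code (a toy codebook) decoded by majority vote. All function names are illustrative. Repetition drives the error down only by also driving the rate 1/n toward zero; Shannon's theorem says far better codebooks exist that keep the rate near capacity while the error still vanishes.

```python
import random

def binary_symmetric_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

def encode_repetition(message, n):
    """Toy codebook: each source bit b maps to the codeword of n copies of b (rate 1/n)."""
    return [b for b in message for _ in range(n)]

def decode_repetition(received, n):
    """Majority vote over each block of n received bits (n should be odd)."""
    return [int(sum(received[i:i + n]) > n // 2) for i in range(0, len(received), n)]

def bit_error_rate(n, flip_prob, num_bits=20_000, seed=0):
    """Fraction of source bits decoded incorrectly after encoding, channel noise and decoding."""
    rng = random.Random(seed)
    message = [rng.randint(0, 1) for _ in range(num_bits)]
    received = binary_symmetric_channel(encode_repetition(message, n), flip_prob, rng)
    decoded = decode_repetition(received, n)
    return sum(m != d for m, d in zip(message, decoded)) / num_bits

for n in (1, 3, 5, 9):
    print(f"rate 1/{n}: bit error rate {bit_error_rate(n, 0.1):.4f}")
```

For n = 3 the exact majority-vote error is 3p^2(1-p) + p^3 = 0.028 at p = 0.1, so the simulated value should land near that.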

And that is, you know, kind of striking. You would assume that you would always have to accept some amount of error, no matter what, if you have a noisy channel. But he proved otherwise: he proved that there's just this maximum speed of information transfer across the communication channel, and he went on to prove that this maximum rate of information transfer is equal to the maximum mutual information between the input and the output, as you vary the probability of seeing different symbols at the input, that is, the probability distribution at the input.
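Shannon's characterization, capacity as the maximum over input distributions of the mutual information I(X;Y), can be checked numerically for a binary symmetric channel (an assumed example, not the authors' model). For flip probability p the closed form is C = 1 - H2(p), about 0.531 bits per channel use at p = 0.1, and a brute-force search over P(X = 1) should land on the uniform input. The helper names are hypothetical.

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information_bsc(q, flip_prob):
    """I(X;Y) = H(Y) - H(Y|X) for a binary symmetric channel with P(X=1) = q."""
    prob_y1 = q * (1 - flip_prob) + (1 - q) * flip_prob
    return h2(prob_y1) - h2(flip_prob)

def capacity_by_search(flip_prob, steps=1000):
    """Grid search over input distributions; returns (max mutual information, best q)."""
    return max((mutual_information_bsc(i / steps, flip_prob), i / steps) for i in range(steps + 1))

cap, best_q = capacity_by_search(0.1)
print(f"search: capacity {cap:.4f} bits/use at P(X=1) = {best_q:.2f}")
print(f"closed form: 1 - H2(0.1) = {1 - h2(0.1):.4f}")
```

Against that benchmark, a rate-1/3 repetition code runs at roughly 63% of capacity, the kind of comparison an engineer makes when judging whether a design is close enough to the limit.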

And the implications are actually pretty profound: there exists a codebook, in fact infinitely many codebooks, that can achieve this channel capacity. However, in practice it has been found to be extremely difficult for humans to come up with such codebooks. They exist, but Shannon's proof is non-constructive. It's not that, oh look, here's this codebook that gives you this channel capacity.

He just shows that it exists. We don't quite know how to get there. But what it is useful for is this: an engineer designing some communication channel can perhaps calculate what the channel capacity is, assuming they can do so, which itself can be difficult, and if they can design their system to get to about 90 or 95% of that channel capacity, they know that they're probably not gonna be able to do much better than that, and so they can just move on. What we have done so far in this work is uncover some non-trivial structure: in certain setups for communication channels, we want to know what the energetic cost is of transmitting information down the channel at a certain rate. And what we have found is something quite interesting, and I won't go into it quite yet; I'll let you perhaps ask some more questions to lead us in there. But overall, what's interesting is that it does imply that there are setups for communication channels where there is a way to perhaps tune the noise in the communication channel to achieve a minimum of heat dissipation, or in our terms, entropy production.

So yeah, I can just go from there.
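For readers new to the term, one standard definition from stochastic thermodynamics, for a system modeled as a continuous-time Markov jump process with transition rates W[i][j], is Schnakenberg's steady-state entropy production rate, σ = Σ_{i<j} (p_i W_ij - p_j W_ji) ln(p_i W_ij / (p_j W_ji)), in units of k_B per unit time. The sketch applies it to a hypothetical three-state ring, not to the communication-channel model in the paper: with balanced rates the ring obeys detailed balance and σ is zero, while driving it one way around produces a steady positive σ, equal to (k_cw - k_ccw) ln(k_cw / k_ccw) for this ring.

```python
import math

def stationary_distribution(W, iters=20_000):
    """Stationary distribution of a continuous-time Markov chain, W[i][j] = rate of jump i -> j.

    Uses uniformization: repeatedly applies the stochastic matrix I + Q/lam, Q being the generator.
    Assumes at least one nonzero rate and a unique stationary distribution.
    """
    n = len(W)
    exit_rates = [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]
    lam = 1.1 * max(exit_rates)  # any lam above the largest exit rate keeps the matrix stochastic
    p = [1.0 / n] * n
    for _ in range(iters):
        new_p = [p[j] * (1.0 - exit_rates[j] / lam) for j in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    new_p[j] += p[i] * W[i][j] / lam
        p = new_p
    return p

def entropy_production_rate(W, p):
    """Schnakenberg entropy production rate (k_B per unit time) for distribution p.

    Pairs with a one-way rate are skipped here; strictly, such transitions make it infinite.
    """
    n = len(W)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            forward, backward = p[i] * W[i][j], p[j] * W[j][i]
            if forward > 0 and backward > 0:
                total += (forward - backward) * math.log(forward / backward)
    return total

def ring_rates(k_cw, k_ccw):
    """Hypothetical three-state ring: i -> i+1 at rate k_cw, i -> i-1 at rate k_ccw."""
    W = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        W[i][(i + 1) % 3] = k_cw
        W[i][(i - 1) % 3] = k_ccw
    return W

for k_cw in (1.0, 2.0, 4.0):
    W = ring_rates(k_cw, 1.0)
    sigma = entropy_production_rate(W, stationary_distribution(W))
    print(f"k_cw = {k_cw}: entropy production rate {sigma:.4f}")
```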

Michael Garfield (25m 0s): Sure, thank you. Okay, so David, I'm curious, because two things come up for me listening to all of that, and these are tangents maybe, but I'm curious whether you think they actually are, or whether this is all pointing to something really, really fundamental. One is that what you're talking about here kind of reminds me of the way that rivers don't run straight. There seems to be a link to this other thing around turbulence formation and the way that rivers bend because you get vertical flows in water, and so the minimization of friction and the role of noise in systems that we don't typically think of as computational, like a river delta, is connected to this through what your work is saying about the structure of systems and the way that they produce entropy and dissipate things.

So that's one question, and it would be interesting to hear you speak to it. And then the other thing is, David, you and David Kinney published this preprint, "Stochastic Mathematical Systems," and it strikes me that this question about the codebook, and there being an infinite number of codebooks that satisfy these requirements, but where are they, sounds a lot like questions that come up in astrobiology, where you say, look, there should be all of these different possible amino acid sequences, but we don't see that stuff.

Does it exist? Could it exist? Is this based simply on our lack of evidence, or is this pointing to something about the way that the actual is a much, much smaller subset of the possible, for concrete practical reasons? I don't know, there seemed to be an underground tunnel between that kind of question and this one. And maybe I'm trying to bite off too much here, but I'm actually thinking of an earlier paper that you wrote with David Kinney about the math that we have and this much larger space of possible mathematics.

So I guess what I'm really asking is, do you think that this work can be generalized in either of those two directions, because a river basin is out of equilibrium, or because questions about the codebook speak to this much more general question about constraints that comes up again and again in complex systems research, and why it is that we seem to occupy a very narrow strip of the theoretically possible? And yeah, at this point I've thoroughly distracted people from the actual meat of this paper, but am I making any sense at all?

And if I'm not, how am I not?

David Wolpert (28m 3s): That's a ton of issues. So, trying to channel my inner Mike Garfield, what I think you may be responding to is not so much the papers that I've done with David Kinney per se, but an Aeon essay that I recently wrote, and especially a more technical version of it, an arXiv preprint, which David Kinney was not involved with but which is related. There the emphasis was on the astonishingly constrained, restricted formulations of our very thought processes, the things that others celebrate, like for example our communication systems in a broader sense: the way that we speak, our language, our mathematics, our most important and powerful cognitive prosthesis. Cognitive crutch, the thing that has arguably allowed us to have this fantastic success of monopolizing, depending on how you count, 75% of the biomass on planet Earth, and destroying it, where no other creature was able to do that.

This all comes from our ability with language. And language, all language including mathematics, has this striking character to it: every single set of statements in any human tongue is a finite set of finite-length strings of symbols. When you read a book, that's what you're getting. And those symbols themselves are all chosen from a finite set, what's called the alphabet. So every sentence in a book is a string of finite length, every element in that string is an element of a finite set, namely the alphabet, and the book as a whole is only a finite number of these sentences strung one after the other.

And finites are finites are finites. Many people revel in the fact that we can use these to construct mathematics, and arguably even artistic constructions, which have abilities like self-reference: oh, isn't this amazing, we just happen to have the minimal machinery necessary to understand the fabric of reality, because we have some way of capturing self-reference. They presume that that must be the fabric of reality. I react to that same fact, that everything is finite strings over finite symbol sets, very differently: I'm struck by how astonishingly narrow it is.

People have played with what are called infinitary logics, where you've actually got infinite strings. But even then the alphabets are typically, though not always, finite, and it's still what's called a countable infinity. That's very, very different from the vortices in that river flow you were talking about, which live in the real numbers. It's also all deterministic; all standard mathematics is deterministic. If A is true, and it's also true that A implies B, then B follows: modus ponens.
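The point about finite strings over a finite alphabet has a precise form: that set is countably infinite, because the strings can be listed one by one, by length and then alphabetically (shortlex order), so every string sits at some finite position. The real numbers admit no such list. The sketch below, with hypothetical helper names, enumerates strings in that order and computes any string's position directly.

```python
from itertools import count, islice, product

def all_finite_strings(alphabet):
    """Yield every finite string over `alphabet` in shortlex order (by length, then lexicographically).

    Every string appears exactly once at a finite position, which is what makes the set countable.
    """
    for length in count(0):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)

def index_of(string, alphabet):
    """Position of `string` in the shortlex enumeration (read as a bijective base-k numeral)."""
    k = len(alphabet)
    digit = {symbol: i + 1 for i, symbol in enumerate(alphabet)}
    index = 0
    for ch in string:
        index = index * k + digit[ch]
    return index

print(list(islice(all_finite_strings("ab"), 7)))
print(index_of("modus", "abcdefghijklmnopqrstuvwxyz"))
```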

That is assumed to be the case in mathematics with a hundred percent probability, always, every time: the laws of math have no randomness in them. But just perhaps they actually do. And if we were to allow that possibility, how might that expand our minds, so to speak, and, if nothing else, illustrate how minuscule our conceptions are at present, and perhaps always will be, formulated as they are in terms of our simple language structures?

So I think that's what you are reacting to. And there's a similar issue that you were alluding to with astrobiology: these very strange phenomena where there seem to be a lot of metabolic pathways that could have been exploited by life, even simple things like other amino acids, and people do not know why they are never exploited. Why did natural selection not come across them and take advantage of them? Because they seem quite accessible.

And the jury is out on this. It's only recently that people have even realized this, and this isn't my work, but it is other work going on at SFI. Most likely it's because there are some problems with those other approaches that we're not currently sensitized to, and the more we explore, the more we will find them, but it's not really known. Now, both of those bodies of research, on unexploited metabolic pathways and on the limitations of human language and therefore of human conception of reality, are, so to speak, science rather than engineering: we're stumbling around in a room that's already been made, figuring out some of its features, and figuring out that there's a whole bunch of things that could have been here. There could have been a whole bunch of other furniture; why are we not seeing it? Or, hey, we're just looking around the ground floor; might there actually be some levels up above, other rooms in this particular dwelling? Things like that. It's already there.

It's what we are confronted with when we open our eyes on birthday plus one. When you're born, you open your eyes: this is what you've got, folks, and what's going on there? That's, so to speak, science. Communication theory is a body of math that can perhaps provide explanations for all of that, but it's about engineering. As opposed to just trying to find things, your goal now is to try to construct them. And in some cases the challenges you face in trying to construct them will give you insights into what it is that you do find out there.

To give a very simple example that ties back to the theme of this podcast: thermodynamics grew out of engineering, where people were trying to build steam engines to drive trains along tracks, to power mills, and things like this. Out of that grew thermodynamics, with things like the second law, the impossibility of perpetual motion machines, and so on and so forth. And that has now provided an amazingly powerful tool, or a flashlight, I suppose I should say, that people can use to look around this room we find ourselves in, because these are universal engineering constraints.

We know that they must apply to whatever it was that constructed all these other pieces of furniture, to really keep abusing this metaphor past its expiration date. So it's in that vein that communication theory, and in particular potentially its thermodynamic aspects, becomes so important for us in understanding this room. It's also very important for designing more efficient devices, and that was all of the original motivation.

But especially if there are what appear to be these deep, fundamental restrictions, then their physical, thermodynamic implications would be consequential in a way that goes far beyond your iPhone 25 and the limits on how long it can get by without being recharged, or something like that. Instead, the fundamental nature of communication that grew out of engineering, and its thermodynamic aspects, might actually be able to tell us things about not just an iPhone, but any computational entity anywhere in this physical universe.

So it might even, who knows, have something to say about metabolic pathways and so on and so forth. So that's the view from 40,000 feet as we're coming down to the Mike Garfield Regional Airport. But then at the last minute we see that, oh, it looks like the ground control tower there at the airport is having some problems. So we're just gonna divert to this nearby airport, which is a little bit closer to the original theme of the podcast.

So Claude Shannon: like a bunch of other 20th-century people, I sometimes just hazard the guess that he must've been an alien, because his mind was just so far beyond what I can imagine doing. He sat down and said, okay, I've got what's called a noisy communication channel. What in the world does that even mean? Going back to that five-year-old: it's a wire, it's a copper wire, or it's just the ether that we're sending some radio waves across.

It's something that's between me and you, between my mouth and your ears. There's a communication channel; in this case it's the air. There's noise in that communication channel; in this case it's the very, very slight sounds of traffic coming through the window and so on and so forth. What that noise means is that what's gonna be hitting your ear is, with some probability, not the same mellifluous, beautiful, well-chosen prose that was emitted from my particular mouth. Okay?

So in that communication channel, the very, very fortunate air that is right next to my mouth is the input, and the rather less jazzed air that's next to your ears is the output of the communication channel. Your brain is what's called the decoder; my brain is the encoder. So here's the way it works, and this is precisely the way Shannon thought about things, by the way. I come up in my head, for my own nefarious reasons, with a message that I want to get into your consciousness.

The only thing I've got is this noisy channel. So what do I do? I use a codebook: I take the message I want to convey and translate it into a string. These are finite-length strings again, just like in all of our language, but that string can be very, very long. It can be a real long-winded soliloquy, like this particular monologue I've been using to try to convey to you the depths of my brilliance and insight during this podcast, as an example.

And so Shannon realized, well, hey, if we make the string with which David is trying to convey this message to Mike long enough, with enough redundancy, then we can actually lower the effective probability of error. Yes, instantaneously, so to speak, there might be a substantial probability of Mike's ear not getting what my mouth spat out into the ether. But if instead Mike gets to hear David saying a whole long list of things, then there's a much greater probability of him not being mistaken, of him not thinking David said zero when David said one. It's gonna be much more accurate.

So Shannon said, okay, let's take it to the limit. This translation of a message into a long string of symbols is called, in the vernacular, a code word. Let it get very, very long. But if it gets too long, then the rate at which David is actually conveying messages falls to the floor. Sure, yeah, he got one message across with almost no probability of error, but it took him the whole damn podcast to do it. That's not very efficient. So what is the greatest rate?
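A minimal sketch of the crudest possible redundancy, a repetition code: send each bit n times and decode by majority vote. This is not Shannon's construction, and the binary symmetric channel with flip probability p = 0.1 is an assumption chosen for illustration. It shows the trade-off being described: the error probability falls as n grows, but the rate, 1/n message bits per channel use, falls to the floor along with it.

```python
from math import comb


def repetition_error(p: float, n: int) -> float:
    """Probability that majority-vote decoding of an n-fold repetition code
    fails on a binary symmetric channel that flips each bit with probability p."""
    if n % 2 == 0:
        raise ValueError("use an odd repetition length so majority vote never ties")
    # Decoding fails when more than half of the n transmitted copies are flipped.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


p = 0.1  # assumed per-use flip probability, purely illustrative
for n in (1, 3, 5, 11, 51):
    rate = 1 / n  # one message bit per n channel uses
    print(f"n={n:3d}  rate={rate:.3f}  P(error)={repetition_error(p, n):.2e}")
```

Driving the error to zero this way forces the rate to zero too; the point of Shannon's theorem is that cleverer codes avoid that.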

If we want him to be able to get messages across with vanishingly low probability of error through some codebook, through some language, what's the greatest rate at which he could do so, given that he's gotta go through the bottleneck of that noisy channel? That is the problem Shannon posed to himself. I would like to think that it almost broke him to figure out the answer, but no, he most likely figured it out while playing with his rubber duckies in the bathtub one evening. But basically what he showed was: I cannot actually construct a codebook, but I can prove that there's an infinite number of codebooks that do the job.

In fact, almost every random codebook, if it's long enough, will have properties that I can exploit to come up with a formula for this thing called the channel capacity, which is a function of the noise in the channel. And that channel capacity is the maximal rate. If you were to use one of these codebooks, which I, Claude Shannon, have now proven exist, even though I can't quite give you one of them, then you'll be able to get the messages across at a rate that goes all the way up to the channel capacity.
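For one channel the capacity formula is short enough to write down: a binary symmetric channel that flips each bit with probability p has capacity C = 1 - H2(p) bits per use, where H2 is the binary entropy. The sketch below assumes that channel model only; for a general noisy channel the capacity is a maximization over input distributions and rarely has a closed form. Unlike a repetition code, whose rate must shrink to zero to drive the error to zero, the theorem says suitable long codes can be made arbitrarily reliable at any fixed rate below C.

```python
from math import log2


def binary_entropy(p: float) -> float:
    """Binary entropy H2(p) in bits, using the convention 0 * log(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def bsc_capacity(p: float) -> float:
    """Capacity, in bits per channel use, of a binary symmetric channel
    with crossover (bit-flip) probability p: C = 1 - H2(p)."""
    return 1.0 - binary_entropy(p)


for p in (0.0, 0.01, 0.1, 0.25, 0.5):
    print(f"p={p:<5} capacity={bsc_capacity(p):.4f} bits per use")
```

At p = 0.5 the output is independent of the input and the capacity is zero, however long the codewords.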

And so that's the best you can do. And even though that was an engineering result, it applies everywhere in reality, so it's actually a very, very powerful flashlight we can use when we're looking around this room. The amazing thing is that, to this day, people do not have general-purpose codebooks that can achieve the channel capacity across a broad set of situations. But we know from this reasoning that the capacity is the crucial feature.

And so in this project with Farita, what we're trying to say is this. If you'll notice, everything David was just talking about there, springing like Athena out of my head in one fell swoop, is a chapter of an amazing textbook, and in all of that drivel there wasn't any mention of physics; it was all abstract, in math land. Well, if we now try to instantiate it in terms of physical systems, does that start to tell us why it is, perhaps, that communication among the components of computers is so absurdly expensive, thermodynamically? Basically, what is going on underneath those fascinating comments Vijay was putting forth in his particular article?

And that's what we're trying to do.

Michael Garfield (42m 34s): Awesome. So again, maybe I'm abusing this analogy, but if you're thinking about liberalizing communication channels and channels through which other things flow, you know, what we were talking about... No? You don't wanna do that? Okay, 'cause

David Wolpert (42m 53s): Looks the same, but it's a completely different creature.

Michael Garfield (42m 56s): Okay. 'Cause when you're talking about increasing temperature and increasing signal rate, it does seem like whether the water actually stays...

David Wolpert (43m 5s): The water is not actually being used as a communication channel. Nobody's trying to create vortices of a certain sort and send them down it to then be decoded on the other end. It's rather chaotic processes on their own that are just doing their own thing. So it's got noise and it's actually got a directionality to it, but there's nothing about it where there's really a transmitter and a receiver and any sense of an optimized code book that's going on.

So it's not at the level at which this particular project is trying to investigate things.

Michael Garfield (43m 43s): Okay, well then let's land at the right airport. I feel like I'm still gonna be banging my head against the wall with this after we wrap, but we cannot waste people's time. So Farita, tell me a little bit about the way you are formalizing this: the constraints, for instance, how the coupling between input and output is nonreciprocal, how the input is set exogenously, external to the system, et cetera. And then from the model building we can finally get into your results. So if you can kind of unpack that for us as we enter the house there, that'd be great.

Farita Tasnim (44m 23s): Yeah, sure. To hit on some of the points of the setup, we can start with the model of the communication channel. We tried to be as true as possible to Shannon's original model of a communication channel while also paying heed to what one might expect in real communication channels. So number one, yes, the input is set exogenously. What we're actually trying to figure out here is the thermodynamics of copying.

What are the thermodynamic costs of a message appearing accurately at point B when it has been transmitted from point A? So we don't really care how, or what, set the messages at the input. In that sense we are black-boxing away the information source, and instead we're making use of one of the conclusions of Shannon's noisy channel coding theorem. Intuitively, what it tells you is that all of these infinitely many codebooks that are very hard for us to find make the input, right as it's about to be sent through the noisy channel, look as if it consists of independent, identically distributed samples from a distribution that maximizes the mutual information between the input and the output. So the input is set exogenously, and we don't care about the details of that. All we do is set the input to a certain distribution so that the channel can be said to be running at its capacity. And of course, since it's a mutual information, this channel capacity also depends on the amount of noise in the channel.
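Finding "a distribution that maximizes the mutual information between the input and the output" is itself a computation. A standard way to do it for a discrete channel is the Blahut-Arimoto algorithm, sketched below; it is a textbook method named here as an illustration, not necessarily the procedure used in this project. The channel matrix `W[x][y] = P(y | x)` and the Z-channel example are assumptions for illustration.

```python
from math import exp, log


def blahut_arimoto(W, tol=1e-10, max_iter=100_000):
    """Capacity (in nats) and a capacity-achieving input distribution for a
    discrete memoryless channel with transition matrix W[x][y] = P(y | x)."""
    nx, ny = len(W), len(W[0])
    r = [1.0 / nx] * nx  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output marginal q(y) = sum_x r(x) W(y|x).
        q = [sum(r[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # d[x] = KL divergence between W(.|x) and q, in nats.
        d = [
            sum(W[x][y] * log(W[x][y] / q[y]) for y in range(ny) if W[x][y] > 0)
            for x in range(nx)
        ]
        lower = sum(r[x] * d[x] for x in range(nx))  # I(X;Y) under the current r
        upper = max(d)  # standard upper bound on the capacity
        if upper - lower < tol:
            break
        # Multiplicative update: r(x) proportional to r(x) * exp(d[x]).
        weights = [r[x] * exp(d[x]) for x in range(nx)]
        total = sum(weights)
        r = [w / total for w in weights]
    return lower, r


# Assumed example: a Z-channel, where input 0 always arrives intact and
# input 1 is flipped to 0 half the time.
capacity_nats, p_in = blahut_arimoto([[1.0, 0.0], [0.5, 0.5]])
print(f"capacity = {capacity_nats / log(2):.4f} bits/use, optimal P(x=1) = {p_in[1]:.3f}")
```

For an asymmetric channel like this one the optimal input is not uniform, which is why the input distribution has to be tuned to the channel's noise for the channel to run at capacity.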

So that's one part of it. The nonreciprocity between the input and the output is, again, just staying true to a communication channel: the output state changes depending on the input state, but the input state's dynamics are not affected by the output state. That's all that means. And the way we set it up, the noise in the channel could actually come from multiple sources. In the simulations whose results I'll talk about, I specifically consider the case where you have two independent sources of noise in the channel.

They can be formalized as thermal reservoirs. Each has some temperature, and that temperature corresponds to the amount of noise: the higher the temperature of a particular noise source, the higher the probability of error, right? So say you have two independent sources of noise in this channel, so that the output's state transitions can be affected by either one. What we find is this: if we fix the temperature of one of those reservoirs and sweep the temperature of the other one through some range, well, obviously we are changing the channel capacity, because that depends on the amount of noise.

But the capacity changes monotonically with the temperature of the second source of noise, which I'll just call the second reservoir from now on. However, the entropy production, or the heat dissipation, of communication in the channel is non-monotonic. At very low temperatures you have high entropy production, a high amount of heat dissipation. And as you increase that temperature, it goes through some minimum.

And then as you increase the temperature further, the entropy production, or heat dissipation, increases again. This non-monotonicity of the entropy production, coupled with the monotonicity of the channel capacity in the temperature of the second reservoir, is a really striking result, because it implies that if I want to send information down a channel while minimizing the amount of energy I use to do that, I really should build that channel with its noise sources tuned to the specific level that achieves this minimum of energy dissipation.
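To make the setup concrete, here is a minimal sketch of a toy model in the spirit of what is described, under loudly stated assumptions: one input bit and one output bit; an input that flips at a fixed rate regardless of the output (the nonreciprocal coupling); an assumed energy penalty `eps` for an output that disagrees with the input; and output flips driven by two thermal reservoirs whose rates obey local detailed balance at their own temperatures. It computes the steady state, a crude capacity proxy (treating the stationary mismatch probability as a binary symmetric channel's flip probability), and the Schnakenberg entropy production rate with k_B = 1. All parameter values are arbitrary, and whether this toy reproduces the non-monotonic dissipation curve depends on them; it is a scaffold for exploring the question, not the authors' model or results.

```python
from math import exp, log, log2

# Hypothetical toy model, NOT the authors' model. States are (x, y), with
# x = input bit and y = output bit, indexed as 2*x + y. Units: k_B = 1.


def energy(x, y, eps):
    # Assumed energy landscape: an output that disagrees with the input costs eps.
    return eps if x != y else 0.0


def build_edges(eps, k_in, gammas, temps):
    """Return (i, j, rate i->j, rate j->i) for every transition mechanism."""
    edges = []
    for y in (0, 1):
        # The input flips at a fixed rate regardless of y: nonreciprocal coupling.
        edges.append((y, 2 + y, k_in, k_in))
    for x in (0, 1):
        dE = energy(x, 1, eps) - energy(x, 0, eps)
        for gamma, T in zip(gammas, temps):
            # Output flips through each reservoir obey local detailed balance
            # at that reservoir's temperature: w_fwd / w_bwd = exp(-dE / T).
            edges.append((2 * x, 2 * x + 1,
                          gamma * exp(-dE / (2 * T)), gamma * exp(dE / (2 * T))))
    return edges


def steady_state(edges, n=4):
    """Stationary distribution: solve the master equation A p = 0 with sum(p) = 1."""
    A = [[0.0] * n for _ in range(n)]
    for i, j, wij, wji in edges:
        A[j][i] += wij  # probability flowing i -> j
        A[i][i] -= wij
        A[i][j] += wji  # probability flowing j -> i
        A[j][j] -= wji
    A[-1] = [1.0] * n  # swap one redundant balance equation for normalization
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, n):
            f = A[row][col] / A[col][col]
            for c in range(col, n):
                A[row][c] -= f * A[col][c]
            b[row] -= f * b[col]
    p = [0.0] * n
    for row in range(n - 1, -1, -1):
        p[row] = (b[row] - sum(A[row][c] * p[c] for c in range(row + 1, n))) / A[row][row]
    return p


def entropy_production(p, edges):
    """Schnakenberg entropy production rate, summed mechanism by mechanism, so
    heat flowing between the two reservoirs through the output bit is counted."""
    sigma = 0.0
    for i, j, wij, wji in edges:
        forward, backward = p[i] * wij, p[j] * wji
        sigma += (forward - backward) * log(forward / backward)
    return sigma


def binary_entropy(q):
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)


T1 = 1.0  # reservoir 1 held fixed (assumed value)
for T2 in (0.2, 0.5, 1.0, 2.0, 5.0):  # sweep reservoir 2
    edges = build_edges(eps=2.0, k_in=0.1, gammas=(1.0, 1.0), temps=(T1, T2))
    p = steady_state(edges)
    p_err = p[1] + p[2]  # states (x=0, y=1) and (x=1, y=0)
    capacity_proxy = 1.0 - binary_entropy(p_err)  # crude BSC stand-in, an assumption
    print(f"T2={T2:<4} P(error)={p_err:.3f}  capacity proxy={capacity_proxy:.3f}  "
          f"entropy production={entropy_production(p, edges):.4f}")
```

Keeping each reservoir's transitions as separate edges matters: lumping them into one effective rate would hide the heat current between reservoirs at different temperatures and undercount the dissipation.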

And that goes on to imply something in the idealized scenario where you have a bunch of independent channels running in parallel. Of course, if you were to actually engineer this, you'd have to be careful about how you split your information among those channels. But assuming that you can run them all at their capacity, it would be more efficient to send information at a specific lower rate across multiple channels than to run at a high rate using one channel.
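Why could a lower per-channel rate ever be cheaper? One hedged way to see the arithmetic: suppose, purely hypothetically, that each channel dissipates a fixed overhead plus a term that grows faster than linearly in its rate r, say `overhead + a * r**2`. Splitting a total rate R evenly over n channels then costs n * overhead + a * R**2 / n, which is minimized at a specific per-channel rate, sqrt(overhead / a), rather than by pushing everything through one channel. This cost function is an assumption for illustration, not a result from the paper, and the sketch ignores each channel's capacity limit.

```python
def total_power(n_channels, total_rate, overhead=1.0, a=1.0):
    """Hypothetical dissipation rate when total_rate is split evenly across
    n_channels identical channels, each costing overhead + a * r**2 at rate r."""
    r = total_rate / n_channels
    return n_channels * (overhead + a * r * r)


def best_split(total_rate, max_channels=100, **cost_params):
    """Number of channels (1..max_channels) that minimizes total_power."""
    return min(range(1, max_channels + 1),
               key=lambda n: total_power(n, total_rate, **cost_params))


R = 10.0  # total information rate to deliver, arbitrary units
n_best = best_split(R)
print(f"one channel: {total_power(1, R):.1f}, best: {n_best} channels "
      f"at {R / n_best:.1f} each, costing {total_power(n_best, R):.1f}")
```

Whether real channels have a cost curve of this shape is exactly the kind of question the thermodynamic analysis is meant to settle; with a purely linear cost, splitting would buy nothing.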

And I mean energetically, thermodynamically efficient. I won't go too much into the details, because it really has to be worked out to a T before we can say anything about how to engineer a communication channel, or a set of communication channels, to achieve that. But this is good, because it at least gives us an intuitive, theoretical heading towards what we have already observed in both biological and artificially created communication channels.

For example, two brain regions don't have just a single neuronal pathway between them. They have a multitude of them, a very large multiplicity of neural pathways between any two given regions in the brain. At the same time, the gold standard in all of our wireless communication systems for cellular service uses multiplexing, which again makes use of this multiple-channel effect. So yeah, those are the results we've found, and we're continuing to try to expand upon them and see what the exact limits are and what kinds of scenarios you can have, like more than two sources of noise, blah, blah, blah.

So we're checking all of that now.

Michael Garfield (50m 38s): Awesome. So this Goldilocks level of noise seems to fly in the face of a commonly held conviction that in general you want to minimize the amount of noise in systems, that you get more efficiency with less noise. And David, this is a topic that came up in an exchange of letters on the role of noise in collective intelligence that you and David Krakauer had with Daniel Kahneman, Olivier Sibony, and Cass Sunstein.

Again, I want to check with you that I'm not making an unreasonable analogical leap here, but it seems like there's a link between there being a Goldilocks level of noise in a communication channel, and the implication that you're gonna work with that by multiplexing communication channels, and this other thing, which is a bit different: that collective intelligence benefits in some way from there being noise in the system, that it actually harvests noise.

So I'm curious if you can kind of disambiguate this for me. In what ways are these kind of different things?

David Wolpert (51m 56s): Yeah, so the book by Danny and Cass was fine as far as it goes. It was about how people in human societies tend to worry about biases, which in the statistical sense means a tendency to operate in a certain way other than what the average is. That is a huge concern, in many, many forms, in much of current sociopolitical discourse, I suppose. Their point was that noise can also be very, very problematic. For example, it's well known that when you give judges discretion, well, judges are human, and therefore they've got all kinds of bad characteristics. Something that people, I think, tend to forget when they try to push away from standardized tests and so on is that the alternative is to have human judges, and humans are pretty piss-poor at this kind of thing.

But in any case, for example, there are judges who, just depending on the time of day, will have a 50% different conviction rate for essentially the same kind of evidence, or give massively different sentences before lunch versus just after lunch, and all these kinds of things. Those are all forms of noise. And Danny and Cass were making the very legitimate point that people should be more aware of this, that op-ed writers and participants in our social media flame wars, I suppose, should be concerned about these things as well.

And David and I, our response is basically: well, yeah, sure, of course, and more power to you, this is an important point to make, but this is far more than baby with the bathwater. Don't throw the elephant out with the bathwater. Noise in general, as a concept, plays a huge number of other roles. It's much more multivalent: there are many more features to the concept of noise and its manifestations than just judges being human and therefore noisy.

And that's not a good thing when you have people sitting in judgment of one another. But take, for example, things like Monte Carlo algorithms, going back to some of my previous work. Monte Carlo algorithms are this amazing way of using noise to escape the curse of dimensionality. If I'm doing a search in a D-dimensional space, in general the difficulty of that search problem grows exponentially with the dimension D. In Monte Carlo search, instead, the error goes down as one over the square root of the number of samples you've taken so far in your search process, independent of the dimensionality. It circumvents the curse of dimensionality.
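The Monte Carlo point is usually stated for integration: the statistical error of a Monte Carlo average shrinks like 1/sqrt(N) in the number of samples N, and that rate does not depend on the dimension d (though the constant in front, set by the integrand's variance, can). A grid-based quadrature rule with a fixed number of points per axis, by contrast, needs exponentially many points in d. A minimal sketch, assuming a simple test integrand (the sum of squares over the unit cube, whose exact average is d/3) chosen only because its answer is known:

```python
import random


def mc_mean(f, d, n, rng):
    """Monte Carlo estimate of the average of f over the unit cube [0, 1]^d,
    from n uniformly random sample points."""
    total = 0.0
    for _ in range(n):
        total += f([rng.random() for _ in range(d)])
    return total / n


def sum_of_squares(x):
    # Test integrand with a known answer: its average over [0, 1]^d is d / 3.
    return sum(v * v for v in x)


rng = random.Random(0)  # fixed seed so runs are repeatable
n = 10_000
for d in (2, 10, 50):
    est = mc_mean(sum_of_squares, d, n, rng)
    exact = d / 3
    grid_points = 10 ** d  # a modest 10-points-per-axis grid
    print(f"d={d:2d}  MC={est:.3f}  exact={exact:.3f}  "
          f"grid would need {grid_points:.0e} evaluations")
```

The same 10,000 random samples give a usable estimate in 50 dimensions, where even a coarse grid would need 10^50 evaluations.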

That's why many high-dimensional problems are done with Monte Carlo rather than by some more exhaustive, grid-based procedure, what's called quadrature. Another example comes from theoretical computer science. It's not actually known for sure yet, but for a while people were fairly convinced that, in terms of computational complexity theory, randomized algorithms would actually give you a benefit over deterministic algorithms. So there are these kinds of benefits to having noise in general.

Certainly noise plays a fundamental role in natural selection. If it were all deterministic, it wouldn't work, et cetera. In the case of communication channels, the work that Farita's looking at, it's a more nuanced kind of thing, in that we're talking about different kinds of noise, as she very rightly emphasizes. We've gotta be very careful in terms of the maximization of all this. But what it appears like is the following. Suppose I give you, Michael, a set of communication channels (and I'm not even gonna try to figure out what this might mean about you talking to multiple people at once or something like that, but I'm sure you can insert the proper narrative), and you've got a message that you want to send, and one of the channels has an information capacity that suffices to send it.

The other ones are noisier; they have smaller information capacities, and you might naively think, we'll just go with the most efficient one and be done with it. What we seem to be finding is that in some situations there are major thermodynamic advantages to not running exactly at the capacity. You might want to go under the capacity, and the other channels you might even want to run noisier, beyond their capacity. The details have yet to be figured out, as Farita said. Part of what's so intriguing about it is the apparent analogy with things like multiplexing within the human brain, where there are many, many channels, many different kinds of neurons that seem to be conveying very similar things, but doing so through very different media.

And it's not clear why. Some people, like Vijay in the articles you were pointing to, and other fellow travelers, are hypothesizing that information theory has something to do with it. But information theory, as that term is commonly used in all of the sciences outside of engineering, frankly, is the introductory chapter of the freshman textbook. It is not Shannon coding theory; it's not about codebooks; it's not really about information capacity; it's got nothing to do with the full-on richness.

People love to talk about Kullback-Leibler divergence, mutual information, entropy. You'll see it all over the place. That ain't information theory, folks, any more than being able to solve a linear algebra equation is calculus. Those are just the very, very teeny elementary aspects of it. There's a richness to it that communication engineering, what's called coding theory, is all about, and that is front and center in my being able to communicate from my mouth to your ear.

And we are finding evidence that that same mathematical richness and depth also plays a role in the thermodynamics of communication, in why there can be benefits to doing things like multiplexing, and in yet other kinds of benefits to noise that of course were not at all considered by Danny and Cass et cetera in their book emphasizing some of the problems of noise in certain contexts.

Michael Garfield (58m 36s): All right, well, we're already way late into your next scheduled meeting with each other, so I don't wanna press this too much further, but one more quick question. We have Carlos Gershenson here as a visiting scholar on sabbatical, and in the talk he gave here recently he mentioned a 2015 paper he did with Dirk Helbing, "When Slower Is Faster," looking at things like traffic, logistics, public transport, et cetera. And at least in the example systems they give in that paper, there's a lot of overlap with the kind of example systems you give in the intro to yours.

And so it strikes me that, when you talk about sometimes there being a benefit to running below channel capacity, both of your papers are kind of pointing to the same truth: that maybe one of the reasons the brain is so good at what it does is connected to the observation people have made that the kind of thinking brains seem really good at is different from the kind we see von Neumann machines excelling at.

And that part of it is that brains are very cheap, and slower than these other systems. So, in closing, I would love to hear you speak to this question of speed and intentionally slowing down, or, if not that, what light you think this paper shines on questions about creating more thermodynamically efficient computers. I mean, yours and his seem related in that way.

David Wolpert (1h 0m 20s): Okay, that's an interesting connection you're seeing there. So what Dirk and Carlos did in that paper, loosely speaking, was things like this. It's got nothing to do with multiplexing, or at least there are large parts of it that don't involve multiplexing. I've just got a single lane of traffic (though actually this usually arises in cases where you have multiple lanes of traffic), but the idea is quite simple. If I were to lower the speed limit, the total throughput might actually go up. And that's true in many physical systems.

In the case of traffic, it's not hard to see what's going on. If you lower the speed limit, then you can have a nonlinear improvement in the intercar spacing. If you and I are both just driving much more slowly, I can be much closer to you with the same margin of safety, and that can simply have the effect that the total number of cars per unit time through the system is higher. You run into the same kind of thing in fluid mechanics. If you try to push water through some small channel too fast, you're gonna cause it essentially to block up. You slow it down, and all of a sudden you're able to get much more water through that channel.
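The traffic intuition can be checked with a toy single-lane model, under the assumption (for illustration only, not taken from the Gershenson and Helbing paper) that every driver keeps a full stopping distance: a reaction distance v * t plus a braking distance v^2 / (2b) behind a car of length L. Throughput is speed divided by spacing, v / (L + v t + v^2 / (2b)), and setting its derivative to zero gives a flow-maximizing speed of sqrt(2 b L). Above that speed, driving faster moves fewer cars per hour. The parameter values are assumptions.

```python
from math import sqrt


def throughput(v, car_length=5.0, reaction_time=1.0, decel=5.0):
    """Toy single-lane flow, in cars per second past a fixed point, if every driver
    keeps a full stopping distance (reaction distance plus braking distance)."""
    spacing = car_length + v * reaction_time + v * v / (2 * decel)
    return v / spacing


# Setting d(throughput)/dv = 0 gives the flow-maximizing speed sqrt(2 * decel * car_length).
v_best = sqrt(2 * 5.0 * 5.0)
for v in (3.0, v_best, 15.0, 30.0):
    print(f"speed={v:5.1f} m/s  flow={throughput(v) * 3600:6.0f} cars/hour")
```

The braking term grows quadratically with speed while throughput gains only linearly, which is the nonlinearity in intercar spacing that makes slower faster.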

Michael Garfield (1h 1m 30s): I hate to say this but this is totally the point I was trying to get at earlier.

David Wolpert (1h 1m 33s): There you go. I'm catching up with you.

Michael Garfield (1h 1m 36s): Catching up with you. We were both trying to get there too fast.

David Wolpert (1h 1m 40s): So that's the kind of phenomenon going on there. In a certain sense, what you're getting at here is mathematically interesting, because there are many instances in the sciences where what appear to be disparate phenomena are actually governed by similar mathematics. For example, in communication theory there's a particular class of problems called max-flow min-cut, which is very much related to this issue of actually getting things through a pipeline, and communication theory, coding theory, when you talk about going through a network, runs up against these issues full on, full bore.
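The max-flow min-cut theorem says that the most you can push from a source to a sink through a network of capacity-limited links equals the total capacity of the cheapest set of links whose removal disconnects them; cut-set bounds in network information theory have the same shape. Below is a minimal sketch of the standard Edmonds-Karp algorithm, which repeatedly augments flow along shortest paths in the residual network. The node names and capacities in the example are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow. capacity[u][v] is the capacity of link u -> v.
    Returns (flow value, set of nodes on the source side of a minimum cut)."""
    residual = {}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)  # reverse edge, lets flow be undone
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No path left: nodes still reachable from the source form the min cut.
            return flow, set(parent)
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


network = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
value, cut = max_flow(network, "s", "t")
print(value, cut)
```

Here the two links leaving the source form the bottleneck, so their combined capacity bounds everything downstream.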

The kinds of issues Farita is looking at right now, though, are much more about a single link in that particular network. And some of the kinds of things Carlos and Dirk were highlighting were links in different kinds of networks. But where there would be a very striking similarity would be if you were to take the kinds of links in their system, put them together into a network, and look at the properties of flow through that network, comparing it to Farita's case, where the individual links are instead going to be things like electrons going down a wire, as opposed to cars going down a road with a speed limit.

So in Farita's case it's not gonna be a safety distance between successive electrons that gives you the benefit, but in both cases, when you start to go into the regime of networks, since both of them have to do with flow moving through a networked system, you're gonna see that the mathematics is very similar. So that's a large part of where that commonality is gonna be.

Michael Garfield (1h 3m 22s): All right, well we're way over time. Farita, I wanna give you the last word here just in closing, if you have anything that you feel we ought to mop up before this ends or if you just wanna point people into the bold future of unanswered questions.

Farita Tasnim (1h 3m 38s): Yeah, well, we're gonna be putting out an arXiv preprint on this in the next few weeks, so people will have something to look at to match up with all this. I think we did some good mopping up, but perhaps I can talk a little bit about how this fits in with where we're going next. So we talked a lot about how communication is extremely important for computation, and what this project is really trying to get at is: what are the actual communication costs?

And as simple as the model we're looking at is, you can think of it as two nodes in a network, with information flowing from node A to node B. We're trying to understand the energetic cost of that, and the subtleties of the mathematical function describing how that energy varies with the amount of noise and the types of noise across the channel. Well, now we can expand: what happens if you have multiple nodes connected together in some sort of network?

How does the network configuration affect the energetic costs of communication from point A to point Z in a network of many, many more nodes? In particular, we've seen clues in the literature about certain features of the network, graph properties such as modularity or hierarchy in how these components are connected to each other. Do they form clusters, communities?

That's what modularity is. Are they organized in a certain tree-like fashion? That could be connected to hierarchy. How much does that give us in terms of energetic benefits for just communication, again, just copying a message straight down the chain, like banana phone or whatever that game is? So that's kind of where we're going next, and we're really excited to see what the results are from there and how we can exploit our most recent results to connect us into that future.
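A minimal sketch of the "copying a message straight down the chain" case, under the simplifying assumption that every link is the same binary symmetric channel with flip probability p and that nothing re-encodes the message between hops. This is a standard textbook setup, not the model from the upcoming paper. Composing n such channels gives another binary symmetric channel whose bias shrinks geometrically, 1 - 2 p_n = (1 - 2p)^n, so the end-to-end capacity can only fall with each hop, as the data processing inequality requires. It says nothing about energetics or about modular or hierarchical topologies; it only shows the information-theoretic baseline such network studies start from.

```python
from math import log2


def cascade_crossover(p, hops):
    """Effective flip probability after a bit is passed through `hops` identical
    binary symmetric channels in series. Each hop multiplies (1 - 2p)."""
    return (1 - (1 - 2 * p) ** hops) / 2


def bsc_capacity(p):
    """Capacity in bits per use of a binary symmetric channel: 1 - H2(p)."""
    if p <= 0.0 or p >= 1.0:
        return 1.0
    return 1.0 + p * log2(p) + (1 - p) * log2(1 - p)


p = 0.05  # assumed per-hop flip probability
for hops in (1, 2, 5, 10, 50):
    q = cascade_crossover(p, hops)
    print(f"hops={hops:3d}  end-to-end flip prob={q:.3f}  capacity={bsc_capacity(q):.3f} bits/use")
```

Even a fairly reliable link degrades badly over many hops, which is why network structure, and where re-encoding happens, could matter so much for the cost of communication.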

Michael Garfield (1h 5m 40s): Awesome. Thanks to you both for taking so much time to unpack all of this stuff. I know our listeners appreciate it. I certainly appreciate it, and I hope we didn't cut too much into your next conversation.

Farita Tasnim (1h 5m 52s): Thanks for giving us the communication channel by which to communicate these results.

Michael Garfield (1h 5m 57s): Everybody's gonna listen to this one on half speed. Thank you for listening. Complexity is produced by the Santa Fe Institute, a nonprofit hub for complex systems science located in the high desert of New Mexico. For more information, including transcripts, research links, and educational resources, or to support our science and communication efforts, visit Santafe.edu/podcast.