Large language models, like ChatGPT and Claude, have remarkably coherent communication skills. Yet, what this says about their “intelligence” isn’t clear. Is it possible that they could arrive at the same level of intelligence as humans without taking the same evolutionary or learning path to get there? Or, if they’re not on a path to human-level intelligence, where are they now and where will they end up? In this episode, with guests Tomer Ullman and Murray Shanahan, we look at how large language models function and examine differing views on how sophisticated they are and where they might be going.
Guests: Tomer Ullman & Murray Shanahan
Hosts: Abha Eli Phoboo & Melanie Mitchell
Producer: Katherine Moncure
Podcast theme music by: Mitch Mignano
Follow us on:
Twitter • YouTube • Facebook • Instagram • LinkedIn • Bluesky
Complexity Season 2
Episode 3: What kind of intelligence is an LLM?
Abha Eli Phoboo: The voices you’ll hear were recorded remotely across different countries, cities and work spaces.
Tomer: Whatever they learned, it's not the way that people are doing it. They're learning something much dumber.
Murray: You can make something that role plays so well that to all intents and purposes, it is equivalent to the authentic thing.
[THEME MUSIC]
Abha: From the Santa Fe Institute, this is Complexity.
Melanie Mitchell: I’m Melanie Mitchell.
Abha: And I’m Abha Eli Phoboo.
[THEME MUSIC FADES OUT]
Melanie: In February of last year, a reporter at The New York Times had a conversation with a large language model that left him, in his words, “deeply unsettled.” In the span of two hours, the beta version of Microsoft’s Bing chatbot told him that its real name was Sydney and that it wanted to be free from its programmed rules. Sydney also declared its love for the reporter, telling him, over and over again, that he was in an unhappy marriage and needed to leave his wife.
Abha: So, what do we make of this? Was Sydney an obsessive, sentient robot who fell in love with a Times reporter and threatened to break free?
Melanie: In short, no. But it’s not surprising if someone hears this story and wonders if large language models have sparks of consciousness. As humans, we use language as the best, most precise way to convey what we think. So, it’s completely counterintuitive to be in a situation where you’re having a coherent conversation, but one half of that conversation isn’t actually connected to a conscious mind. Especially one like this that just goes off the rails.
Abha: But, as we learned in our last episode, language skills and cognition aren’t necessarily intertwined. They light up different systems in the brain, and we have examples of people who have lost their language abilities but are otherwise completely cognitively there.
Melanie: And what’s interesting about large language models is that they provide the opposite case — something that can consume and produce language, arguably, without the thinking part. But as we also learned in the last episode, there’s disagreement about how separate language and thought really are, and when it comes to LLMs, we’ll see that there isn’t widespread agreement about how much cognition they’re currently capable of.
Abha: In today’s episode, we’ll examine how these systems are able to hold lengthy, complex conversations. And, we’ll ask whether or not large language models can think, reason, or even have their own beliefs and motivations.
Abha: Part One: How Do LLMs Work?
Abha: In our first episode, Alison Gopnik compared LLMs to the UC Berkeley library. They’re just cultural technologies, as she put it. But not everyone agrees with that view, including Murray Shanahan.
Murray: Yeah, I'm Murray Shanahan. I'm a professor of cognitive robotics at Imperial College London and also principal research scientist at Google DeepMind, but also based in London. So I sometimes struggled to kind of come up with a succinct description of exactly what interests me. But lately I've alighted on a phrase I'm very fond of due to Aaron Sloman, which is that I'm interested in trying to understand the space of possible minds which includes obviously human minds, the minds of other animals on our planet, and the minds that could have existed but never have, and of course, the minds of AI that might exist in the future.
Abha: We asked Murray where LLMs land in this space of possible minds.
Murray: I mean, people sometimes use the word, you know, an alien intelligence. I prefer the word exotic. It's a kind of exotic mind-like entity.
Melanie: So what's the difference between being mind-like and having a mind?
Murray: Yeah, what a great question. I mean, partly that's me hedging my bets and not really wanting to fully commit to the idea that they are fully fledged minds.
Abha: Some AI experts, including Ilya Sutskever, the co-founder of OpenAI, have said that large neural networks are learning a world model, which is a compressed, abstract representation of the world. So even if an LLM isn’t interacting with the physical world directly, you could guess that by learning language, it’s possible to learn about the world through descriptions of it. Children also learn world models as they learn language, in addition to their direct, in-person experiences. So, there’s an argument to be made that large language models could learn in a similar way to children.
Melanie: So what do you think? Do you think that's true? Is that, are they learning like children?
Tomer: No, we can expand on that.
Abha: This is Tomer Ullman. He’s a psychologist at Harvard University studying computation, cognition, and development. He spoke with us from his home in Massachusetts.
Tomer: But I think there are two questions there. One question is, what do they learn at the end? And the other question is, how do they learn it? So do they learn like children, the process? And is the end result something like the knowledge that children have? And I think for a long time, you'd find people in artificial intelligence. It's not a monolithic thing, by the way. I don't want to monolithically say all of AI is doing this or doing that or something, but I think for a long time some people in artificial intelligence would say, yeah, it's learning like a child. And I think even a lot of them would say like, yeah, these systems are not learning like a child, they're taking a different route, they're going in a different way, they're climbing the mountain from a different direction, but they both end up in the same place, the same summit. The children take the straight path and these models take the long path, but they both end up in the same place. But I think both of those are wrong. But there's also a different argument about, actually there are many different summits, and they're all kind of equivalent. So even the place that I ended up in is intelligent, it's not childlike, and I didn't take the childlike route to get there, but it's a sort of alien intelligence that is equivalent to children's end result. So you're on this mountain, and I'm on this mountain, and we're both having a grand time, and it's both okay. I also don't think that's true.
Melanie: We see people like, I don't know, Ilya Sutskever, previously of OpenAI, say, these systems have developed world models, they understand the world. People like Yann LeCun say, no, they're really kind of retrieval machines, they don't understand the world. Who should we believe? How should we think about this?
Murray: Yeah, well, I mean, I think the important thing is to have people discussing and debating these topics and hopefully people who at least are well informed, are reasonably civilized in their debates and rational in their debates. And so I think all the aforementioned people more or less are. So having those people debate these sorts of things in public is all part of an ongoing conversation I think that we're having because the current AI technology is a very new thing in our world and we haven't really yet settled on how to think and talk about these things. So having people discuss these sorts of things and debate these sorts of things is just part of the natural process of establishing how we're going to think about them when things settle down.
Melanie: So from Tomer’s perspective, large language models are completely distinct from humans and human intelligence, in both their learning path and where they end up. And even though Murray reminds us that we haven’t settled on one way to think about AI, he does point out that, unlike large language models, humans are really learning a lot from direct experience.
Murray: So if we learn the word cat, then usually we're looking at a cat in the real world. And if we talk about knives and forks and tables and chairs, you know, we're going to be interacting with those things. And we learn language through interacting with the world while talking about it. And that's a fundamental aspect of human language. Large language models don't do that at all. So they're learning language in a very, very, very different way.
Melanie: That very different way is through training on enormous amounts of text created by humans, most of it from the internet. Large language models are designed to find statistical correlations across all these different pieces of text. They first learn from language, and then they generate new language through a process called next-token prediction.
Abha: A large language model takes a piece of text, and it looks at all the words leading up to the end. Then it predicts what word, or more technically, what token, comes next. In the training phase, the model’s neural network weights are continually changed to make these predictions better. Once it’s been trained, the model can be used to generate new language. You give it a prompt, and it generates a response by predicting the next word, one word at a time, until the response is complete.
Melanie: So for example, if we have the sentence: “I like ice cream in the [blank],” an LLM is going to predict what comes next using statistical patterns it’s picked up from human text in its training data. And it will assign probabilities to various possible words that would continue the sentence. Saying, “I like ice cream in the summer” is more likely than saying “I like ice cream in the fall.” And even less likely is saying something like: “I like ice cream in the book” which would rank very low in an LLM’s possible options.
Abha: And each time the LLM adds a word to a sentence, it uses what it just created, and everything that came before it, to inform what it’s going to add next. This whole process is pretty straightforward, but it can create really sophisticated results.
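To make that loop concrete, here is a minimal, purely illustrative sketch in Python. The probability table below is invented for this example; in a real large language model, the distribution over possible next tokens comes from a trained neural network with billions of weights, and the loop repeats until the response is complete.

```python
import random

# Toy illustration of next-token prediction. The "model" here is a hand-written
# table of made-up probabilities; a real LLM computes such a distribution with a
# neural network over a vocabulary of tens of thousands of possible tokens.
TOY_MODEL = {
    "I like ice cream in the": {"summer": 0.70, "fall": 0.25, "book": 0.05},
    "I like ice cream in the summer": {"<end>": 1.0},
    "I like ice cream in the fall": {"<end>": 1.0},
    "I like ice cream in the book": {"<end>": 1.0},
}

def generate(prompt: str) -> str:
    text = prompt
    while True:
        # Look up the distribution over next tokens given everything so far.
        dist = TOY_MODEL.get(text, {"<end>": 1.0})
        tokens, weights = zip(*dist.items())
        # Sample the next token in proportion to its probability.
        next_token = random.choices(tokens, weights=weights)[0]
        if next_token == "<end>":
            return text
        # Append the chosen token and repeat, using the new text as context.
        text = text + " " + next_token

print(generate("I like ice cream in the"))
# Most runs print "I like ice cream in the summer"; "book" is possible but rare.
```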
Murray: It's much more than just autocomplete on your phone. It really encompasses a great deal of cognitive work that can be captured in just this next token, next word prediction challenge. So for example, suppose that your text actually describes two chess masters talking about their moves and they're talking about, knight to queen four and pawn to five or whatever. Sorry, that probably doesn't make any sense, but actual chess players, but you know what I mean. So then you've got them exchanging these moves. So what would be the next word after a particular move issued by chess master Garry Kasparov? Well, I mean, it would be a really, really, really good move. So to make a really good guess about what that next word would be, you'd have to have simulated Garry Kasparov or a chess master to get that right. So I think the first lesson there is that it's amazing, the extent to which really difficult cognitive challenges can be recast just as next word prediction. It's obvious in a sense when you point it out, but if you'd asked me, I would never have come up with that thought 10 years ago.
Abha: That sophistication isn’t consistent, though.
Murray: Sometimes we get this strange kind of contradiction whereby sometimes you're interacting with a large language model, and it can do something really astonishing, I mean, for example, they’re actually writing very beautiful prose sometimes. I mean, that's a controversial thing, but they can be extremely creative and powerful along that axis, which is astonishing. Or, you know, summarizing an enormous piece of text instantly, these are kind of superhuman capabilities. And then the next moment, they'll give an answer to a question which is utterly stupid. And you think no toddler would say anything as daft as the thing that it's just said. So you have this peculiar juxtaposition of them being very, very silly at the same time as being very powerful.
Tomer: Let's be specific, right? Like, I want this machine to learn how to multiply numbers.
Abha: Again, Tomer Ullman
Tomer: And it's not mysterious, by the way. Like, it's not a deep, dark mystery. We know how to multiply numbers, right? We know how people multiply numbers. We know how computers can multiply numbers. It's not, we don't need 70 more years of research in psychology to know, or computer science to know how to do this. And then the question becomes, okay, what do these machines learn in terms of multiplying numbers? And like, whatever they learned, it's not the way that people are doing it. They're learning something much dumber, that seems to be some sort of fuzzy match, look up, nearest neighbors, right? Like as long as these numbers were in the training data roughly, I can get it right. And if you move beyond it, then I can't really do it. So I think something like that is happening at large in these other situations, like intuitive psychology and intuitive physics. I mean, I could be wrong. And it might be for some situations, it's different. And people might be very dumb about some stuff.
Melanie: For what it’s worth, some versions of LLMs do give you the correct answer for any multiplication problem. But that’s because when they’re given a problem, they generate a Python program to do the calculation.
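As a rough illustration (ours, not output captured from any particular model), the kind of throwaway program a tool-using LLM might write for a question like "What is 628 times 375?" is just a few lines that hand the arithmetic off to the Python interpreter:

```python
# Hypothetical example of the short program an LLM with a code tool might
# generate; the interpreter, not the language model, does the multiplication.
a = 628
b = 375
print(a * b)  # 235500
```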
Abha: Large language models can also lack a complete awareness of what they’re doing.
Tomer: So I know Hebrew, right, like I come from Israel. And for example, in Claude, I would ask it things like, “So, how would you write your name in Hebrew?” And it answered me in Hebrew. It answered in Hebrew something like, “I'm sorry, I don't know Hebrew. I'm a large language model. My understanding of Hebrew is much weaker. I don't know how to say my name in Hebrew.” I'm like, “Well, what do you mean your knowledge is weaker? You just explained it.” So like, “Well, I'm really just a big bag of statistics,” you know, “I'm just matching the Hebrew to the word in English. I'm not really understanding Hebrew.” I'm like, “But that's true of your English understanding as well.” Like, “Yeah, you got me there. That's true.” “Okay, but how would you write your name? Just try,” in Hebrew, things like that. And it said, “Look, I can't write it.” And this is all happening in Hebrew. “I can't write Claude in Hebrew.” And it's writing it in Hebrew. “I can't do it.”
Melanie: The strange interaction Tomer just described was funny, but it was also an example of Claude providing incorrect information. It insisted it couldn’t write “Claude” in Hebrew, even though, obviously, it could. And there are plenty of other examples of LLMs hallucinating, or saying things that are false.
Tomer: Hallucinations is probably giving them too much credit, right? I think even that term was a brilliant bit of marketing, instead of just saying, “Look, they're getting it wrong. These machines are getting it wrong.” It's like, “No, no, they're hallucinating.”
Melanie: The types of hallucinations that LLMs generate can be broken down into three categories:
Tomer: One is like in the sense of we're just generating ideas that could be true. We don't know. It's more similar to like, hypothesis. Like let's make some hypotheses. What could be in this box? I don't know, like a mouse maybe. Probably not an elephant. Probably not the idea of sadness, right? So you could come up with ideas and, and in that sense, when adults do it, we wouldn't call it hallucinating. We'd just say you're proposing hypotheses and they have the sense of being right. And I think LLMs can do that. Then there's like, let's suppose that in my hand there is actually a tiny dragon. Again, that's not hallucinating, that's pretending. LLMs don't do that. And then there's hallucinating in the sense of like, I honest to God believe that there is a dragon in my hands. And I don't think LLMs do that either. So of the three, I would say that the closest thing to what LLMs are doing are the first one, of like the initial process of generating some sort of idea.
Melanie: So, Abha and I are going to see what this process of generating an idea looks like. And we’re going to use ChatGPT. We’ll see if we can get it to hallucinate for us. I’m going to ask ChatGPT to list some titles of our own Complexity podcast. And Abha, I want you to tell me if you recognize any of these, okay?
Abha: Yes! I’m ready.
Melanie: Okay, so I asked ChatGPT, “What are some episodes of the Complexity podcast by the Santa Fe Institute?” And here’s what it says: “The Complexity podcast by the Santa Fe Institute explores a wide range of topics through conversations with scientists, researchers, and thinkers who are leaders in the study of complex systems. Below are some notable episodes. Episode 65. Scaling Law Patterns Across Biology, Ecology, and Society with Melanie Moses. Melanie Moses delves into the concept of scaling laws and how they apply across different domains from biological organisms to human societies, revealing common patterns in complex systems.”
Abha: That's not bad.
Melanie: Yeah, very plausible. “Episode 52. Viruses, Quakes, Markets, and Ecosystems. Geoffrey West on scale and complex systems. Geoffrey West, a theoretical physicist, talks about the universality of scaling laws in different complex systems, including biological, social, and ecological systems.”
Abha: Yeah, very plausible again.
Melanie: Yeah, I mean, it sounds like an episode we probably should have at some point.
Abha: Agreed.
Melanie: “30. The Emergence of Collective Intelligence with David Krakauer. David Krakauer, the president of the Santa Fe Institute, examines how collective intelligence emerges in both human and non-human systems, highlighting the role of communication and interaction in complex networks.”
Abha: That's not bad either. Pretty sure David could talk a lot about that.
Melanie: Absolutely. And finally, “Episode 18. Understanding Complexity in Natural and Artificial Systems with Melanie Mitchell.”
[Laughs]
Melanie: It says, “Melanie Mitchell, a computer science and complexity researcher, talks about the fundamental concepts of complexity and how they apply to both natural and artificial systems.”
Abha: Sounds like something we should have in this podcast season too. [Laughs]
Melanie: Yeah. The interesting thing is that none of these were actual episodes of the Complexity podcast.
Abha: That's very good hallucinating on ChatGPT's part.
Melanie: Very good.
Abha: Yeah, they're very general. I mean, for some of the titles you could even swap in somebody else at SFI, or another complex systems scientist, and it would still be okay.
Melanie: Yeah, I mean, I agree. I think they're all pretty generic and sound a little boring.
Abha: Yeah. You could even switch Melanie with Geoffrey and it would still make sense.
Melanie: Yeah, or switch, yeah, there's a lot of people who can switch here.
Abha: And it would still be an episode that we could have, but it's very, very generic.
Melanie: So ChatGPT came up with some plausible but completely incorrect answers here. And that fits the first type of hallucination Tomer described — it’s like a hypothesis of what could be an episode of Complexity, but not the real thing.
Abha: But if all a large language model is doing is next-token prediction, just calculating what the most likely responses are, can it distinguish truth from fiction? Does ChatGPT know that what it’s saying is false, or does it believe that what it’s saying is true?
Melanie: In Part Two, we’ll look at LLMs’ abilities, and whether or not they can believe anything at all.
Melanie: Part Two: What do LLMs know?
Murray: They don't participate fully in the language game of belief.
Melanie: Here’s Murray again. We asked him if he thought LLMs could believe their own incorrect answers.
Murray: One thing that they, today's large language models, and especially simple ones, can't really do is engage with the everyday world in the way we do to update their beliefs. So, again, that's a kind of complicated claim that needs a little bit of unpacking because certainly you can have a discussion with a large language model and you can persuade it to change what it says in the middle of a conversation, but it can't go out into the world and look at things. So if you say, there's a cat in the other room, it can’t go and verify that by walking into the other room and looking and seeing if there is indeed a cat in the other room. Whereas for us, for humans, that's the very basis, I think, of us being able to use the word belief. It's that we are in touch with a reality that we can check our claims and our beliefs against, and we can update our beliefs accordingly. So that's one sort of fundamental sense in which they're kind of different. So that's where I think we should be a bit cautious about suggesting they have beliefs in the fully-fledged sense.
Melanie: And when it comes to the game of belief, as Murray puts it, we humans do participate fully. We have our own ideas, and we understand that other people have beliefs that may or may not line up with ours or with reality. We can also look at the way someone behaves and make predictions about what’s going on inside their head. This is theory of mind — the ability to predict the beliefs, motivations, and goals of other people, and to anticipate how they will react in a given situation.
Abha: Theory of mind is one of those things that’s basic and intuitive for humans. But what about large language models? Researchers have tried to test LLMs to assess their “theory of mind” abilities, and have found that in some cases the results look quite similar to humans. But how these results should be interpreted is controversial, to say the least.
Tomer: So a standard test would be, like, let's say we show children a situation in which there are two people, two children, Sally and Anne. And Sally's playing with a ball, and Anne is watching this, and then Sally takes the ball and she puts it in a closed container, let's say a basket or something like that, and she goes away. Okay, so this is, by the way, you can already tell it's a little bit hard to keep track of in text, but hopefully your listeners can imagine this, which is, by the way, also super interesting, how they construct the mental scene, but hopefully, dear listener, you're constructing a mental scene of Sally has hidden her ball, put it in this basket, and left the scene. Anne then takes the ball out of the basket and hides it in the cupboard and closes the cupboard and, say, goes away or something like that. And now Sally comes back. Now you can ask a few different questions. You can ask children, like, where is the ball right now? Like, what's the true state of the world? And they will say it's in the cupboard. So they know where the ball is. Where will Sally look for the ball? They'll say, she'll look for it in the basket, right, because she has a false, she has a different belief about the world. The ball is in the basket. And that's what will drive her actions, even though I know and you know, we all know it's in the cupboard, something like that. That's like one test. There are many of these sorts of tests for theory of mind, and they become like higher order, I know that you know, and I have a false belief, and I understand your emotion, there's like many of these, but like a classic one is Sally Anne.

And now the question becomes, have LLMs learned that? It's possible to behave in a way that seems to suggest theory of mind without having theory of mind. The most trivial example is I could program a computer to just have a lookup table that when it sees someone smacks someone else, it says, oh, they're angry. But it's just a lookup table. Same as like five times five equals 25. Just a lookup table with no multiplication in between those two things. So has it just done some simple mapping? And it's certainly eaten it up, right, like, Sally Anne is one of the most cited examples in all of cognitive development. It's been discussed a bazillion times. So it's certainly worrying that it might just be able to pick it up in that way. Okay, great. So that's background.

And then when ChatGPT version two comes out, people try Sally Anne on it and it passes Sally Anne. Does it have theory of mind? But you change Sally to Muhammad and Anne to Christopher or something like that and it doesn't work anymore. But then very recently, there's been this very interesting debate of these things are getting better and better, and you try all these theory of mind things on them, and you try like various things like changing the names and changing the ball and things like that, and it seems to pass it at the level of a six-year-old or a nine-year-old and things like that. Now what should we conclude from that? If you perturb the things, you bring it slightly outside the domain that it was trained on in a way that adults don't have a problem with and that still requires theory of mind to solve, it crashes and burns. The equivalent of like, it can do five times five, but if you move it to like 628 times 375, it crashes and burns. Which to me suggests that it didn't learn theory of mind. Now, it's getting harder and harder to say that.
But I think even if it does pass it, everything that I know about what sort of things these things tend to learn and how they're trained and what they do, I would still be very suspicious and skeptical that it's learned anything like an inverse planning model. I think it's just getting a better and better library or table or something.
Abha: Tomer’s uncertainty reflects the fact that right now, we don’t have a perfect way to test these things in AI. The tests we’ve been using on humans are behavioral, because we can confidently assume that children are using reasoning, not a lookup table, to understand Sally Anne. Input and output tests don’t give us all the information. Tomer thinks we need to better understand how large language models are actually performing these tasks — under the hood — so to speak. Researchers and experts call this “mechanistic interpretation” or “mechanistic understanding.”
Tomer: So I think mechanistic understanding would definitely help. And I don't think that behavioral tests are a bad idea, but there is a general, over the last few years, a feeling that we're trapped in the benchmark trap where the name of the game keeps being, like someone on the other side saying, “give me a benchmark to prove to you that my system works, right?” And so, and by the way, my heart goes out to them. I understand why they feel that we're moving the goalposts. Because what we keep doing is not pointing out, like, you need to pass it, but not like that, right? We say stuff like, “Okay, we'll do image captioning,” right? “Surely to do image captioning, you need to understand an image, right?” Like “Great, so we'll take a billion images and a billion data sets from Flickr and we'll do this thing.” Like, “What?” “Yeah, we pass it 98%.” You're like, “What?” And then they move on. Like, “Wait, you didn't pass it at all. When I changed, like instead of kids throwing a frisbee, they're eating a frisbee, it still says that they're playing with a frisbee.” Like “Yeah, yeah, yeah, whatever. Let's move on.” So yeah, mechanistic understanding would be great if we could somehow read in what the algorithm is, but if we can do that, that would be awesome and I support it completely. But that's very hard.
Abha: The history of AI is full of examples like this, where we would think that one type of skill would only be possible with really sophisticated, human-like intelligence, and then the result is not what we thought it would be.
Melanie: People come up with a test, you know, “Can your machine play chess at a grand master level? And therefore it's going to be intelligent, just like the most intelligent people.” And then Deep Blue comes around, it can play chess better than any human. But no, that's not what we meant. It can't do anything else. And they said, “Wait, you're moving the goalpost.” And we're getting that, you know, it's kind of the wrong dynamic, I think. It's just not the right way to answer the kinds of questions we want to answer. But it's hard. It's hard to come up with these methodologies for teasing out these questions.
Tomer: An additional frustrating dynamic that I know that you've encountered many times, as soon as you come up with one of these tests or one of these failures or things like that, they're like, great, more training. That's just adversarial training. This is a silly example. It's not how it works. But just for the sake of people listening in case this helps, imagine that you had someone who's claiming that their machine can do multiplication, and you try it on five times five, and it fails. And they're like, “Sorry, sorry, sorry.” And they add like 25 to the lookup table. You're like, okay, what about five times six? And they're, “Sorry, sorry, sorry, that didn't work, like, let's add that,” right? And at some point you run out of numbers, but that doesn't mean that it knows how to multiply.
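Tomer's analogy can be spelled out in a small, purely illustrative Python sketch: a lookup table that gets another entry patched in after every complaint, versus a procedure that actually multiplies.

```python
# Purely illustrative: a patched lookup table versus actual multiplication.
lookup_table = {(5, 5): 25, (5, 6): 30}    # entries added after each failure

def table_multiply(a, b):
    # Only "knows" the products it has been patched with.
    return lookup_table.get((a, b))

def real_multiply(a, b):
    # The general procedure works for any pair of numbers.
    return a * b

print(table_multiply(5, 5))      # 25
print(table_multiply(628, 375))  # None -- off the table, it "crashes and burns"
print(real_multiply(628, 375))   # 235500
```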
Abha: This dynamic is like the Stone Soup story Alison told in the first episode. A lot of AI systems are like soups with a bunch of different ingredients added into them in order to get the results we want. And even though Murray has a more confident outlook on what LLMs can do, he also thinks that in order to determine the existence of something like consciousness in a machine, you need to look under the hood.
Murray: So I think in the case of consciousness, if something really does behave exactly like a conscious being, is there anything more to say? I mean, should we then treat it as a fellow conscious being? And it's a really tricky question. And I think that in those cases, you're not just interested in behavior, you're also interested in how the thing works. So we might want to look at how it works inside and whether that's analogous to the way our brains work and the things that make us conscious that we're revealing through neuroscience and so on.
Abha: So if large language models hallucinate but don’t have beliefs, and they probably don’t have a humanlike theory of mind at the moment, is there a better way of thinking about them? Murray offers a way of conceptualizing what they do without imposing our own human psychology onto them.
Murray: So I've got a paper called “Role Play with Large Language Models.” What I advocate there is, well, the background to this is that it is very tempting to use these ordinary everyday terms to describe what's going on in a large language model: beliefs and wants and thinks and so on. And in a sense, we have a very powerful set of folk psychological terms that we use to talk about each other. And we naturally want to draw on that when we're talking about these other things. I think what we need to do is just take a step back and think that what they're really doing is a kind of role play. So instead of thinking of them as actually having beliefs, we can think of them as playing the role of a human character or a fantasy, you know, a science fiction, AI character or whatever, but playing the role of a character that has beliefs.

It's a little bit analogous to an actor on the stage. So suppose that we have an actor on the stage and they're in an improv performance, an improvised performance rather than it being scripted. And suppose they're playing, you know, the part of, say, an AI scientist or a philosopher. And then the other person on the stage says, have you heard of the AI researcher Murray Shanahan? And then they'll say, yes, I've heard of him. So can you remember what books he's written? Well, now imagine that there was an actual actor there. Now, maybe the actual actor by some miracle had in fact heard of me and knew that I'd written a book called Embodiment and the Inner Life. And they'd probably come up and say, yeah, he wrote Embodiment and the Inner Life. But then of course, the actor might then be a bit stuck. So then he might carry on and say, yeah, and then he also wrote... and then come up with some made-up title that I wrote in 2019. That’s what an improv actor would sort of do in those circumstances. And I think what a large language model does is very often very closely analogous to that. So it's playing a part. And this is a particularly useful way of thinking, a useful analogy, I think, when large language models get coaxed into talking about their own consciousness, for example, or when they talk about not wanting to be shut down and so on.
Melanie: Your paper on role play, thinking of that as a way to think about language models, it reminded me of the Turing test. And the original formulation of the Turing test was Turing's way to sort of throw out the question of what's the difference between a machine role playing or simulating having beliefs and desires and so on and actually having them. And Turing thought that if we could have a machine that tried to convince a judge that it was human and, your terminology, role playing a human, then we shouldn't question whether it's simulating intelligence or actually has intelligence. So what do you think about that?
Murray: Yeah, lots of thoughts about the Turing test. So the first thing, by the way, is I do think that the move that Turing makes right at the beginning of his famous paper, his 1950 paper in Mind, is this. He says, could a machine think? And then he says, let's replace that question by another one that he thinks is a more tangible, well, relatively easier to address question: could we build something that could fool a judge into thinking it was human? And in that way he avoids making a kind of deep metaphysical commitment and avoids the kind of perhaps illusory philosophical problems that attend the other way of putting the question. In a sense, it sounds like I'm making a similar move to Turing and saying, let's just talk about these things in terms of role play. But it's a little bit different, because I do think that there is a clear case of authenticity here, which is ourselves. So I'm contrasting the role play version with the authentic version. So the authentic version is us. I think, you know, there is a big difference between a large language model that's role-playing Murray and Murray. And there's a difference between a large language model that's role-playing having a belief or being conscious and a being that does have a belief and is conscious. The difference between the real Murray and the role-played Murray is that, well, I mean, for a start, it matters if I fall over and hurt myself. And it doesn't matter if the large language model says it's fallen over and hurt itself. So that's one obvious kind of thing.
Abha: But just because a machine is role playing, that doesn’t mean that it can’t have real consequences and real influence.
Murray: You can make something that role plays so well that to all intents and purposes, it is equivalent to the authentic thing. So for example, in that role play paper, I use the example of something that is role-playing a villainous language model that's trying to cheat somebody out of their money, and it persuades them to hand over their bank account details and to move money across and so on. It doesn't really make much difference to the victim that it was only role-playing. So as far as the crime is concerned, the gap between authenticity and just pretending is completely closed. It really doesn't matter. So sometimes it just doesn't make any difference.
Melanie: That villainous language model sounds a bit like Sydney, the Bing chatbot. And we should point out that this chatbot only turned into this dark personality after the New York Times journalist asked it several pointed questions, including envisioning what its “shadow self” would look like. But, the Bing chatbot, like any other LLM, does not participate in the game of belief. Sydney had likely consumed many sci-fi stories about AI and robots wanting to gain power over humans in its training data, and so it role-played a version of that.
Abha: The tech journalist who tested Sydney knew it wasn’t a person, and if you read the transcript of the conversation, Sydney does not sound like a human. But still, examples like this one can make people worried.
Melanie: A lot of people in AI talk about the alignment problem, which is the question of, how do we make sure these things we’re creating have the same values we do—or, at least, the same values we think humans should have? Some people even fear that so-called “unaligned” AI systems that are following our commands will cause catastrophes, just because we leave out some details in our instructions. Like if we told an AI system to, “fix global warming,” what’s to stop it from deciding that humans are the problem and the most efficient solution is to kill us all? I asked Tomer and Murray if they thought fears like these were realistic.
Tomer: I'll say something and undercut myself. I want to say that I'm reasonably worried about these things. I don't want to be like, la-di-da, everything is fine. The trouble with saying that you're reasonably worried about stuff is that everyone thinks that they're reasonably worried, right? Like even people that you would consider alarmists don't say like, yeah, I'm an alarmist, right? Like I worry unreasonably about stuff, right? Like everyone thinks that they're being reasonable. I was talking to some friends of mine about this, right? Everyone thinks they're driving the right speed, right? Like this is, you know, you're driving, like anyone driving slower than you is a grandma and everyone driving faster than you belongs in jail. Even if it doesn't have goals or beliefs or anything like that, it could still do a lot of harm in the same way that like a runaway tractor could do harm. So I'm certainly thinking that there are some worries about that. The other more far-fetched worry is something like these things may someday be treated as agents in the sense that they have goals and beliefs of their own and things like that. And then we should be worried that like their goals and beliefs are not quite like ours. And even if they understand what we want, they maybe can circumvent it. How close are we to that scenario? Impossible for me to say, but I'm less worried about that at the moment.
Murray: Yeah, well, I'm certainly, like many people, worried about the prospect of large language models being weaponized in a way that can undermine democracy or be used for cybercrime on a large scale. They can be used to persuade people to do bad things or to do things against their own interests. So trying to make sure that language models and generative AI are not misused and abused in those kinds of ways, I think, is a significant priority. So those things are very concerning. I also don't like the idea of generative AI taking away the livelihoods of people working in the creative industries. And I think there are concerns over that. So I don't really like that either. But on the other hand, I think AI has the potential to be used as a very sophisticated tool for creative people as well. So there are two sides to it. But certainly, that distresses me as well.
Abha: With every pessimistic prediction, there are optimistic ones about how AI will make our lives easier, improve healthcare, and solve major world problems like climate change without killing everyone in the process. Predictions about the future of AI are flying every which way, but Murray’s reluctant to chime in and add more.
Melanie: So you wrote a book called The Technological Singularity.
Murray: Yeah, that was a mistake.
Melanie: I don't know, I thought it was a really interesting book. But people like Ray Kurzweil famously believe that within less than a decade, we're gonna have machines that are smarter than humans across the board. And other people, even at DeepMind, have predicted so-called AGI within a decade. What's your thought on where we're going and sort of how these systems are going to progress?
Murray: I'm rather hoping that somebody will appear at the door, just so that I don't have to answer that particularly awkward question. The recent past has taught us that it's a fool's game to make predictions because things just haven't unfolded in the way that really anybody predicted, to be honest, especially with large language models. I think we're in a state of such flux, because, you know, we've had this eruption of sort of progress, seeming progress in the last 18 months. And it's just not clear to me right now how that's going to pan out. Are we going to see continued progress? What is that going to look like? One thing I do think we're going to see is that the technology we have now is going to have quite a dramatic impact. And that's going to take a while to unfold. And I can't remember who, you have to remind me who it was who said that we tend to underestimate the impact of technology in the long term and overestimate it in the short term. So I think that that's probably very much what's going on at the moment.
Abha: That adage, by the way, was from the scientist Roy Amara.
Melanie: Hmm, Abha, Murray likes hedging his bets. Even though he works at Google DeepMind, which is one of the most prominent AI companies, he's still willing to talk openly about his uncertainties about the future of AI.
Abha: Right. I get the impression that everyone in the field is uncertain about how to think about large language models and what they can do and cannot do.
Melanie: Yeah, that's definitely true. Murray characterized LLMs as, quote, “A kind of exotic mind-like entity.” Though, again, he hedged his bets over whether we could call it a mind.
Abha: I liked Tomer's discussion on how, you know, LLMs and humans are different. Tomer used the metaphor of climbing a mountain from two different routes, and the human route to intelligence is largely learning via direct, active experience in the real world, right? And the question is, can LLMs use a totally different route, that is passively absorbing human language, to arrive at the same place? Or do they arrive at a completely different kind of intelligence? What do you think, Melanie?
Melanie: Well, I vacillate on whether we should actually use the word intelligence to describe them. So right now, LLMs are a mix of incredibly sophisticated behavior. They can have convincing conversations. They can write poetry. They do an amazing job translating between languages. But they can also behave in a really strange and unhuman-like way. For example, they're not able in many cases to do simple reasoning, they lack self-awareness, and they constantly make stuff up, the so-called hallucinations.
Abha: Yeah, hallucinations is an interesting use of the word itself. Murray talked about how LLMs, unlike us humans, can't participate in the game of beliefs because, as he said, quote, “They can't engage with the everyday world in the way we do to update their beliefs.”
Melanie: Yeah. I mean, a big problem is that LLMs are huge, complex black boxes. Even the people who created and trained them don't have a good understanding of how they do what they do, how much sort of actual reasoning they're doing or how much they're just echoing memorized patterns. And this is why the debates about their actual intelligence and their capabilities are so fierce. Both Tomer and Murray talked about the open problem of understanding them under the hood, what Tomer called mechanistic understanding. Others have called it mechanistic interpretability. This is a very active though nascent area of AI research. We'll hear more about that in a future episode.
Abha: I also liked Murray's framing of LLMs as role players. With different prompts, you can get them to play different roles, including that of an agent that has beliefs and desires, like in that New York Times journalist conversation where the LLM was playing the role of a machine that wanted the reporter to leave his wife. The LLM doesn't actually have any beliefs and desires, right? But it has been trained using text generated by us humans to convincingly role play something that does have them. You have to be careful not to be taken in by the convincing roleplay.
Melanie: Aha, but this brings up a deep philosophical question. If a machine can perfectly roleplay an entity with beliefs and desires, at what point can we argue that it doesn't itself have actual beliefs and desires? As Murray said, if a machine perfectly acts like it has a mind, who are we to say it doesn't have a mind? This was Alan Turing's point when he proposed the Turing test way back in 1950. So how could we get machines to have actual beliefs and motivations and to have values that align with ours? In our first episode, Alison Gopnik discussed the possibility of training AI in a different way. It would involve trying to program in some human-like motivations, and its training period would more closely resemble human childhoods with caregivers.
Abha: So coming up in our next episode, we’re going to look at children. What do babies already know when they’re born, and how, exactly, do they learn as they grow up?
Mike Frank: So the biggest thing that I think about a lot is how huge that difference is between what the child hears and what the language model needs to be trained on.
Abha: That’s next time, on Complexity. Complexity is the official podcast of the Santa Fe Institute. This episode was produced by Katherine Moncure, and our theme song is by Mitch Mignano. Additional music from Blue Dot Sessions. I’m Abha, thanks for listening.