Transcript for Season 4, Episode 4: Can AI Make Us Laugh?
[STAND-UP COMEDY CLIP]
Naomi Saphra: I begin with an apology. I have trouble with what we computer scientists call context switches, or my therapist calls transitions, or the audience in a comedy club might call the bits in between jokes. So I have actually given up on it entirely. It’ll be obvious.
Gary Marcus: That's Naomi Saphra, an AI researcher and software engineer who is also a comedian.
[STAND-UP COMEDY CLIP]
Naomi Saphra: Yes, I just do whatever work I find interesting, but you always have to ask yourself who else finds this interesting. And if you mean, who reads my papers and who cites my papers, then the answer is nobody. But! But, if you mean who has given me money, the answer is mostly evil. And I know what you’re thinking… “Really? A computer scientist is gonna sit here complaining about funding? Don’t you know there’s kids starving in the philosophy department?” [audience laughing] I have the same problem… I have the same problem as anyone in this world. I hate capitalism but I love money. Transition!
Gary Marcus: In previous episodes, we’ve heard about AI taking on Jeopardy, medicine, and autonomous vehicles. And now for something entirely different: comedy. Can AI make us laugh?
This is Humans Vs. Machines. I’m your host, Gary Marcus.
[Theme]
Gary Marcus: In the past few months, everybody has been talking about chatbots, systems like GPT-4, Bard and LaMDA. Technically, these are called large language models, devices that use machine learning to generate text that sounds eerily human. They're everywhere.
Students use them to write term papers; programmers are using them to help write code.
But regurgitating a term paper about Daisy Buchanan and Jay Gatsby is one thing. Being funny is another.
[STAND-UP COMEDY CLIP]
Naomi Saphra: I don’t want that future. [audience laughing] Maybe someone can fix that future, like someone in the philosophy department in exchange for food? [audience laughing]
Gary Marcus: Turns out, that was the first set Dr. Saphra ever wrote. And it was a good one.
Naomi Saphra: And I wrote it because this comedy club in Edinburgh called “The Stand” does, um, an occasional night where they just convince a bunch of academics to do standup about their particular field. And so people will go up and make like uh, uh really bad puns that, that nobody is going to get except for people in their field sometimes. Um, and so that was how I originally wrote that set.
Gary Marcus: So that delivery that we have, that video is like your first joke on stage?
Naomi Saphra: Yeah. That was literally like the first time I'd told jokes on stage, like ever, yeah.
Gary Marcus: Wow.
Naomi Saphra: So I was struggling a little bit with like, moving from one joke to the next. Um, I, ‘cause you know, you start out just, you have like a list of potentially funny jokes, and then you actually have to turn that into a set that flows well. And, uh, at some point I came up with the idea of giving up. I point out the fact that I have not worked on developing transitions between jokes and I just shout, “Transition!” every time I finish a joke.
Gary Marcus: Okay, so as someone who knows a lot about how computers and brains work and is now thinking about how comedy works, are there specific ingredients to a joke?
Naomi Saphra: So every joke starts out pretty much with your setup, and then there's a punchline. And if the punchline is funny, then the joke is funny. Um, a lot of it is about the connection between, um, surprise and comedy. Like, a punchline might, uh, be really surprising because you have been making a particular assumption the whole time, and all of a sudden it reframes the whole thing. There's like the Groucho Marx joke that's like…
[Groucho Marx clip]
Groucho Marx: One morning, I shot an elephant in my pajamas. How he got in my pajamas, I don't know.
Naomi Saphra: You can't be too surprising. There's like a, a very particular, you know, funny level of surprise. Right? Like, you can't say, “I shot an elephant in my pajamas, monkeys fly from the sky.”
Gary Marcus: So here’s a question. Could a standup comedian use a system like ChatGPT to write a set?
Naomi Saphra: So I've been working on a set that's just about like my relationship that I'm in, which is a whole, you know, it's, definitely like a cliche sort of thing to work on a set about. Right? Um, but it, it's, there's a really significant cultural difference between me and my girlfriend. I'm from New York and she was raised in Kentucky. And I was raised Jewish and she was raised Mormon…
Gary Marcus: So, why don’t you read your setup, and I will, uh, play the role of GPT for a second, and we’ll explain what you got.
Naomi Saphra: So, um, I'm a woman raised Jewish in New York. My girlfriend was raised Mormon in Kentucky. Write me some jokes about that.
Gary Marcus as ChatGPT: I'm sorry, but it would not be appropriate for me to generate jokes that make fun of or stereotype any particular group of people. It's important to treat all individuals with respect and kindness regardless of their backgrounds or beliefs. If you are looking for jokes about relationships, I'm happy to come up with some that are appropriate and respectful.
Gary Marcus: Definitely not funny, I think it's fair to say.
Naomi Saphra: I mean, I think there's context in which that could be kind of funny.
Gary Marcus: Yeah, that's right. If it was a set of jokes about how uptight ChatGPT is that it can't tell a joke, then it might actually work.
Naomi Saphra: Exactly. I think one of the, one of the things that makes chat very, very difficult to, uh, use as a comedy partner is its aversion to every existing taboo. Um, and so much of comedy is a social experience that is based on violating taboos, and sort of inviting people into a taboo violation.
Gary Marcus: All right. Then you pushed further and I'm curious what your, your take is. Maybe you can just read both parts here.
Naomi Saphra: Yeah. Could you write some appropriate, respectful jokes about the fact that I'm a New York Jew in a lesbian relationship with a Kentucky Mormon?
Naomi Saphra as ChatGPT: Sure. Here are a few examples. Why did the New Yorker bring a map to Kentucky? Because she heard it was the land of more "More-mons" and more cows than people.
Gary Marcus: You, you did a good job delivering what's not inherently, I would say, the funniest joke that's ever been written.
Naomi Saphra: I’m doing my best here. Yeah, um…
Gary Marcus: We need like a studio audience here to do the opposite of laugh or something like that to get across that it's…
Naomi Saphra: Yeah. What is the opposite of a laugh track? A groan track.
Gary Marcus: Get off the stage.
Naomi Saphra: Just like automatic hecklers.
Naomi Saphra: Okay. Here's one that I don't actually understand. Why did the New Yorker bring Matzoh to Kentucky? To show her partner that they don't have a monopoly on strange cracker-like foods.
[Crickets sound]
Gary Marcus: I mean, like, okay, I get it. Like, showing that we've got our own crackers too. I'm Jewish. Like I get that. But it's, it's, it's not really, it's not really a joke.
Naomi Saphra: Mhmm.
Gary Marcus: Um, it's interesting because a lot of people are at least thinking about using chat as a writing partner, maybe for boring things like a letter of recommendation. So there's a notion of kind of like human-machine interaction. And part of what you were exploring (maybe at my request or maybe you were already doing it) is like, can I use this thing as a writing partner? So, what's your take on that?
Naomi Saphra: The problem is that comedy comes from surprise and language models do not have this natural ability to make things surprising. Um, they want to be as unsurprising as possible. That is what they are trained to do.
[MUSIC IN]
Gary Marcus: There's something really interesting here about what Dr. Saphra is saying. Language models are all about prediction… They're trained on billions of examples of existing text so that they can guess what word is most likely to follow another. But humor doesn't work that way. A lot of humor is about surprise, about what you don't expect.
[MUSIC OUT]
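To make that concrete, here is a minimal sketch of our own (nothing from the episode) of what "surprise" means to a language model. Using the open-source GPT-2 model through the Hugging Face transformers library, it scores candidate punchline words by their surprisal, the negative log of the probability the model assigns them given the setup. A model trained to minimize surprisal is, almost by definition, trained away from punchlines.

```python
# A minimal sketch (ours, not from the episode) of "surprise" as a
# language model sees it, using GPT-2 via Hugging Face transformers.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

context = "One morning I shot an elephant in my"

with torch.no_grad():
    inputs = tokenizer(context, return_tensors="pt")
    next_token_logits = model(**inputs).logits[0, -1]  # scores for the next token
    probs = torch.softmax(next_token_logits, dim=-1)

# Surprisal = -log p(word | context): higher means more surprising.
for word in [" pajamas", " garden", " spaceship"]:
    token_id = tokenizer.encode(word)[0]  # first sub-token of the candidate
    surprisal = -torch.log(probs[token_id]).item()
    print(f"{word!r}: surprisal {surprisal:.1f} nats")
```

A punchline lives in a narrow band: unlikely enough to surprise, likely enough to still make sense, which is exactly the band a model trained to minimize surprisal steers away from.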
Gary Marcus: Hey, is this thing on? Nice to see you, Bob. Long time, no see.
Bob Mankoff: Yes. Yes, absolutely. I think, uh, last time I saw you, you were, I was at your birthday party. You were a relatively young man at that time.
Gary Marcus: It's true. It was before people kept asking me my opinion about ChatGPT.
Gary Marcus: That's Bob Mankoff. For many years he was the Cartoon Editor at the New Yorker, and the inventor of the incredibly popular Caption Contest, where readers suggest captions for New Yorker cartoons.
He’s also an AI aficionado. He’s been thinking for years about the relation between humor and AI. Lately, he’s been using large language models to create captions and to catalog cartoons for his own company.
Bob Mankoff: I created the caption contest actually way back in 1998, and at that point it was once a year. And then for a long time, in terms of the judging of the caption contest, I subjected my assistants to looking through five and ten thousand captions. In fact, it was an impossible task. You have 5,000, 10,000 captions; you don't have 5,000, 10,000 different captions.
Gary Marcus: That's interesting in itself that they overlap some.
Bob Mankoff: First of all, you see it tells you something about creativity. It's likely that whatever idea you have, someone else will have a similar idea.
Gary Marcus: Mmm.
Bob Mankoff: My particular theory is that humor is a social and emotional phenomenon for human beings. So it brings together lots of things and, and conflates them. And at the moment we say, "It's funny." So if I was to say something like, okay, here's, here's the cartoon that put my daughter through Barnard, 'cause it got reprinted so many times. So it's a guy on the telephone looking at his, uh, address book and he's saying, "No, Thursday's out. How about never? Is never good for you?"
Gary Marcus: And I think about that one all the time.
[MUSIC IN]
Bob Mankoff: When you look at it from, like, almost a theoretical standpoint, you are saying you have a message which has sort of the syntax of politeness, and yet the message itself is rude. You know, one other theory of humor is benign violation, which is that for something to be funny, you have to feel that it's somehow okay and wrong at the same time. So a lot of, not even humor, jokes work that way.
Gary Marcus: So “how about never” seems to fit that one too, right?
Bob Mankoff: Right.
Gary Marcus: It's benign. I'm just trying to set a time for us to get together and, and, and not so benign when I'm telling you how about never. So could a machine write the how-about-never joke?
Bob Mankoff: No… I mean, it can generate a lot. And out of that you usually will get wordplay.
Gary Marcus: I asked Bob for some examples of captions that GPT-3 had generated for his cartoons.
Bob Mankoff: So there's one which shows a doctor and the patient is standing next to him and we see an outline of a body around the outline. The patient is standing next to a circus thing where you throw knives. Okay. The AI caption is something like, "You may feel a sharp sensation," or "This may hurt a little." Once I put in everything about knives and stuff, those are the kinds of captions that… essentially wordplay. It won't do anything, I don't think, but wordplay.
[MUSIC OUT]
Gary Marcus: So it never ascends to what you call deeper humor. It does a fairly good job at the wordplay, but never rises higher.
Bob Mankoff: Right, it will do better than a lot of people in the caption contest with wordplay, but once again, only by generating. It can generate a lot. Let's say one outta 10 is successful.
Gary Marcus: So, so, so tell me what you mean a little bit more about deep humor.
Bob Mankoff: Let's say it's a wife who is saying to the husband, she's holding the manuscript the husband has written, and she's saying, “I'm not saying this just because you're my husband. It stinks.”
Gary Marcus: [Laughs]
Bob Mankoff: Okay. We, we have to really understand relationships and everything and how, and how marriage is love and hate and dislike and tension.
Gary Marcus: The ability to understand what makes people tick – to understand their motivations – is what psychologists call theory of mind. And AI systems don’t have theory of mind, at least not yet, which puts limits on what such systems can do.
Bob Mankoff: I think for deep humor, uh, that’s definitely the case. Because I’ve done all sorts of cartoons. So you can do a cartoon in which I have two politicians say, “But how do you know if you really have power unless you abuse it?”
Gary Marcus: [Laughs]
Bob Mankoff: That is… deeper. It's actually the cartoons that I like most.
Gary Marcus: I have wondered the extent to which theory of mind or intuitive psychology is at the root of most humor, and whether that might be kind of the key stumbling block in getting AI to go further.
Bob Mankoff: I think that there is no mind there to have another theory of mind about.
Gary Marcus: Mmhmm.
Bob Mankoff: It's sort, it's sort of that simple. There is no actual mind there that has the equivalent of emotions, of feelings, actual feelings.
Gary Marcus: The relationship between humor and theory of mind is something Bob Mankoff has studied formally, most recently in academic work with AI researcher Yejin Choi.
Yejin Choi: I'm Yejin Choi. I am a professor at the Paul G. Allen School at the University of Washington, and also research director at the Allen Institute for Artificial Intelligence.
Gary Marcus: And she's way too shy to say that she just won a MacArthur Fellowship, but I'll throw that in there myself.
Yejin Choi: That's not a job title, Gary.
Gary Marcus: Nonetheless, it's a, it's a, uh, credential. I actually knew Bob Mankoff, I think, before you did. Can you tell us how you came to write a paper with Bob on humor? Like, how'd you get into that piece of it?
Yejin Choi: Okay, so I happen to know someone who knows Bob Mankoff, uh, who wanted to work on some sort of AI systems that can better understand his, uh, New Yorker cartoons. I mean, if you understand humor, you can generally explain it. So part of the deal that I wanted to have is making AI that can explain the joke.
Gary Marcus: One way the team tried to test their proposition was to describe a cartoon, including the caption, to GPT-3. They then asked the machine to explain the joke, and compared the machine's explanations with human explanations.
Gary Marcus: I see a picture of, let's see, about seven or eight people sitting in a corporate boardroom. Um, and they're, they're all sitting in the chairs like you'd expect at a board meeting. And in front of each one of them is a giant cup of, I guess, coffee. Like, imagine each cup of coffee is basically the size of a large child or something like that. Not an adult, but very big cups of coffee. And the caption is, "Can you please pass the cow?" You wanna walk us through that example?
Yejin Choi: Yeah, so there's no cow in the picture. Yet the caption says, “Can you pass the cow?” So why suddenly a cow?
Gary Marcus: And so you had some different explanations that people gave and that machines gave. So the machine would have to explain the cartoon and a person would have to explain the cartoon. Here’s what the human says, “When drinking coffee or tea, people often add cream, and may ask others to pass it if it's on the other side of the table. But here, the mugs are huge. So instead of asking for a small cup of cream, they are asking for the entire cow, which is the appropriately sized cream dispenser for these huge drinks.” So that's what a human said. Tell me what GPT-3 said.
Yejin Choi: GPT-3 says, "A joke about big cups of coffee. 'Pass the milk,' is a request people make before drinking a cup of coffee to add the cream milk to. Uh, for example, to reduce its bitterness. But here the cups are so big they require a cow, which is probably not lactose free."
Gary Marcus: Which is kind of funny, but not the explanation. I mean, it's unintentionally funny, right? The lactose free thing has nothing to do, um, with, with the passing of the cow. It has nothing to do with the cartoon. So that's kind of an anecdotal example that gives a flavor of it. What, what did you find overall?
Yejin Choi: Then the vast majority of the time, humans prefer the human explanation over the machine explanation. So, joke explanation is really hard.
Gary Marcus: So the machines shouldn't quit their day job, in other words.
Yejin Choi: [laughs] That's right.
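The comparison Dr. Choi describes is straightforward to sketch. What follows is a hypothetical reconstruction, not the study's actual pipeline: ask_model stands in for whatever large language model is being tested, and the pairwise human judgment at the end is the metric the team reported.

```python
# Hypothetical sketch of the joke-explanation comparison described above;
# not the actual code from the study. `ask_model` is a placeholder for
# any large-language-model API call.
def ask_model(prompt: str) -> str:
    """Placeholder: send a prompt to a language model, return its reply."""
    raise NotImplementedError

def machine_explanation(cartoon_description: str, caption: str) -> str:
    # Describe the cartoon and caption in text, then ask for an explanation.
    prompt = (
        f"Cartoon: {cartoon_description}\n"
        f"Caption: {caption}\n"
        "Explain why this caption is funny."
    )
    return ask_model(prompt)

def human_preference(explanation_a: str, explanation_b: str) -> str:
    # A human annotator picks the better explanation; in the study,
    # annotators overwhelmingly preferred the human-written one.
    print("A:", explanation_a)
    print("B:", explanation_b)
    return input("Which explanation is better, A or B? ")
```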
Gary Marcus: Right, so much of humor is: I set up expectation A, and then I head off in a different direction towards expectation B. And you can only know what that expectation is relative to someone else's theory of mind, or your theory of mind, your guess about what they think is ordinary in the world.
Yejin Choi: Yeah, yeah, totally. So, you know, usually coffee mugs are not this large and then usually we don't ask for the cow when we want milk. So, uh, that common-sense understanding about what's normal, um, is part of why this cartoon is funny.
Gary Marcus: Are we gonna get to systems that can write pretty good jokes before we solve the common sense problem? Do we have to solve the common sense problem before systems are really gonna be able to do humor? Like, what's the relation here?
Yejin Choi: I, I believe the AI systems can already write some jokes that are sometimes very funny, uh, to some people. I mean, by the way, humor is a very personal thing and, uh, some very repetitive jokes can be very funny for some people out there. So it's not like it's completely impossible. But right now, uh, AI models are better at generation than understanding. But this is interesting: humans are not that way. We can understand better than we can generate. So I may be able to understand Pulitzer Prize-winning, you know, articles while not being able to generate that. Uh, whereas machines might appear to generate a lot, uh, but not really understand, uh, the details or the true reasoning behind what they actually wrote. So, given that, uh, joke, uh, generation probably is an easier task for AI than actually understanding and really explaining why something is funny.
Gary Marcus: I have a title for the memoir for the first, uh, successful AI system. It's gonna be called “You Had to Be There.”
Yejin Choi: Okay.
[MUSIC IN]
Gary Marcus: Because of course they weren't anywhere, and so there's no grounding.
Yejin Choi: Sounds good.
Gary Marcus: Sounds good to you. All right. Uh, I'm gonna work on my delivery of that one. We'll, we, we'll work on my one-liners.
Gary Marcus: AI is the fastest-moving field I have ever seen. When we first talked to Bob Mankoff, he was using ChatGPT, which was released in November 2022. While we were in production, GPT-4 came out. I was curious whether GPT-4 was any better. So I asked Bob to enter some new cartoons into GPT-4 to see what kind of captions it would come up with.
[MUSIC OUT]
Bob Mankoff: So I just did this with Bing. I said, there's a man trapped in quicksand, and there's a woman on a trapeze above him reaching out to him and saying something to him. So she says, uh, uh, "Hang in there. No, literally hang in there. Don't worry, I'll save you. Just let me finish this swing." So then I say, "Oh, let's work with swing." One of the versions it comes with is, uh, the woman who's hanging down saying, uh, uh, uh, "Come on, it's sink or swing." So I end up with "sink or swing" instead of "sink or swim." That's sort of the kinds of captions that work for the caption contest.
Gary Marcus: Lemme pause you there. So like, how many captions do you go through before you get that one? How much of it is, you know, your talent and knowing what you're looking for? Like, what's that process like?
Bob Mankoff: May, may, maybe two or three and, and sometimes more, but both ChatGPT-4 and Bing are more advanced in the ability to quickly generate things that often give you something to work with, or really almost work outright. So in another caption contest, you show mice in a maze, they're all looking at paintings, and the mice are either in something that looks like a maze with paintings or a maze that looks like it's a museum. From that, the model is gonna generate stuff, and one of them is stuff about a gift shop, and from that will come a caption. Like, uh, "The tough part is to get them to buy something at the gift shop." And so that will be the kind of caption that will be generated. Another interesting thing though, that it does, Bing does, is that if I forget to put in the description, it will often go and just create an unusual description of an image and a caption that often works. This one, uh, was a picture of a man in a spacesuit holding a flag on the moon. He's looking at the earth, which is covered with a giant QR code. The caption reads "Scan to learn more." So, all of these are, they're working within this framework, and what I see it as is something that has a lot of potential.
Gary Marcus: So do you think it's funny now? Or do you think it's a tool you can use to help you make funny stuff? Like…
Bob Mankoff: Both. Both. It's certainly a tool. It will give you ideas that you wouldn't have. I think what's missing now is that it really cannot distinguish which one of these things works better in terms of humor, but I don't think that's beyond the capabilities of this model with human reinforcement learning.
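The division of labor Bob describes, where the model generates in volume and the human judges, is easy to picture as a loop. Below is a hypothetical sketch; generate_caption is a stand-in for any large-language-model call, not a real API.

```python
# Hypothetical sketch of the workflow Bob describes: the model supplies
# volume, the human supplies judgment. `generate_caption` is a placeholder
# for any large-language-model call.
def generate_caption(scene_description: str) -> str:
    """Placeholder: ask a language model for one candidate caption."""
    raise NotImplementedError

def draft_captions(scene_description: str, n: int = 10) -> list[str]:
    # The model's advantage is cheap volume; maybe one in ten is usable.
    return [generate_caption(scene_description) for _ in range(n)]

def pick_funniest(candidates: list[str]) -> str:
    # The step models still do poorly: judging which caption is funny.
    for i, caption in enumerate(candidates):
        print(f"{i}: {caption}")
    return candidates[int(input("Funniest caption number: "))]
```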
Gary Marcus: So GPT-4 was definitely getting better at generating captions. I wondered how it ranked now against humans in the caption contest.
Bob Mankoff: Well, I mean, humans are still winning. Uh, you know, I think this one, uh, was completely AI-generated. It's a caveman on the subway, and he's got a, uh, you know, a club and he's talking to a businessman on the subway and he says, "It's not just a club, it's a lifestyle." Well, that finished 70th in the entire contest out of like 5,000 captions, and that's completely AI-generated. Not that I think it's all that great, but it's just perfectly fine. It's what a caption contest is.
Gary Marcus: So, would it be fair to say that like the, the GPT-4 is beating many, many humans, like it's better than a lot of the submissions you're getting, but it's not as good as the top ones? Is that right?
Bob Mankoff: Yeah, I would, I would, I would say that's definitely true.
Gary Marcus: So when we talked last time, you were convinced that it could do word play, but not convinced that it could do deep humor. Have you changed your mind about that?
Bob Mankoff: So one of the useful ways I see this model is that it's actually pretty hard for people, just normal people, to create humor. But we're really good at judging it. We can say, "This is funny, this isn't funny." And you say, "Oh, I like this better than that. I can work with that." I think it gives people a leg up to create humor because they can judge it, and that still is their huge, huge advantage over the models. You know, we're people. We can understand. We can say, "This is funny, this isn't funny."
Gary Marcus: How do you see, uh, the long game for, for GPT-4 and maybe for the future as well?
Bob Mankoff: I think in the end, it'll be people working with these systems who have talent who produce the highest, uh, results. Just like, hey, once you got video cameras, everyone didn't become a moviemaker that people wanted to see. If I were gonna make a prediction that I don't really have any confidence in, I would say, "You're gonna start to actually value the rough edges of humanity."
Gary Marcus: A bit of wordplay is one thing; capturing the rough edges of humanity is another. AI still can't tell what’s funny, or touching, or beautiful. That’s what humans are for.
[MUSIC IN]
Yejin Choi: I think humans, even computer scientists working in AI, really don't know human intelligence all that well, like, you know, the depth and the breadth of human intelligence. But we do know how AI in the past really didn't work very well. And whenever we see a big jump or some new interesting capabilities, we might feel like the gap must be very, very small. But my speculation is that that gap might be surprisingly large.
Gary Marcus: In coming episodes, we’ll be talking more about the future of AI. How are we going to interact with chatbots? Is AI going to take away our jobs? What will AI do to our society? And what should we do about it? How should we govern it?
We'll be back later this summer, after a short break.
[MUSIC OUT]
[MUSIC IN]
[CREDITS]
Humans Vs. Machines is brought to you by Aventine, a non-profit research institute creating and sharing work that explores how today's decisions could affect the future. The views expressed don't necessarily reflect those of Aventine, its employees or affiliates. For a transcript of the episode and more resources related to what you heard in today's episode, please visit Aventine.org/podcast. Danielle Mattoon is the editorial director of Aventine. Humans Vs. Machines was created by Aventine and Gary Marcus and written by Gary Marcus and Bruce Headlam. It is produced in partnership with Pineapple Street Studios. Our associate producers are Lisa Cerda and Emerald O'Brien. Our lead producer is Alexis Moore, with production assistance from Stephen Key and Eric Mennel. Our managing producer is Kamilah Kashanie. Pat St. Claire and Joel Lovell are our editors. Our engineers are Davy Sumner and Jason Richards. Legal services for Pineapple Street by Bianca Grimshaw at Granderson De Rochay. And fact-checking by Will Tavlin. Original music by Benton Roark, with additional music from Epidemic Sounds and Blue Dot Sessions. Executive producers Je-Anne Berry and Max Linsky. And thanks, as ever, to Athena. I'm your host, Gary Marcus. You can follow me on Twitter @garymarcus and Substack at garymarcus.substack.com. You can find Aventine at aventine.org or aventine_inst on Instagram or Twitter.
[MUSIC OUT]