Transcript for Season 4, Episode 1: And the winner is…Watson!
[MUSIC]
[Montage of news reports about AI]
Sundar Pichai: AI is one of the most important things humanity is working on. It's more profound than, I don’t know, electricity or fire.
Bloomberg: A senior Google engineer who says one of the company's artificial intelligence systems has become a sentient being.
News Anchor 1: Driverless cars and trucks are just the beginning of a wave of automation that will threaten millions of jobs in every industry at once.
News Anchor 2: Is the artificial intelligence bot, ChatGPT, coming for your job?
Sundar Pichai: AI holds the potential for some of the biggest advances we are going to see. You know, whenever I see the news of a young person dying of cancer, you realize AI is gonna play a role in solving that in the future.
[Theme Music]
GARY MARCUS: We hear promises, practically every day, about self-driving cars, about chatbots, even machines with consciousness. Some come true; some don't. Machines really can beat humans at chess and Go, but driverless cars still aren't ready for prime time, even after a decade of hype.
Want to know what’s real—and what’s not? You’ve come to the right place.
This is Humans vs Machines. We’re going to take a long, hard look at hype and hubris; what’s real and what’s BS. Are computers becoming sentient? Will they take all our jobs? Try to take over the world?
I'm your host, Gary Marcus. I'm a cognitive scientist, an author, and an entrepreneur. I built an AI company that I sold to Uber, and co-wrote a book called Rebooting AI. I've spent most of my life, pretty much since I was 8 years old, thinking about minds and machines. I'm sometimes seen as a critic of AI, but I love AI, and want to see it work. I also know how hard it is to get it right.
AI is already changing the world – sometimes in good ways, sometimes not… But to me, AI is on the wrong path – technologically, philosophically and even ethically. If we want artificial intelligence to really help us in our everyday lives, then we’re going to have to rethink AI.
Let's go back more than a decade… to the birth of an incredible technology – IBM's Watson – one of the most surprising accomplishments in AI history – and then see how it went terribly wrong.
[SFX]
GARY MARCUS: It's 2004. Charles Lickel, an IBM executive, is sitting in a restaurant with colleagues, wondering what the company should do next. A few years earlier, IBM got amazing publicity when one of its computers, Deep Blue, beat the world champion, Garry Kasparov, at chess. Lickel's looking for the next big thing. He glances around the bar, and notices that everyone is crowded around the TVs.
The crowd is transfixed. Ken Jennings, Jeopardy! superstar, is on what would turn out to be a 74-game winning streak.
GARY MARCUS: Lightning strikes. Lickel thinks to himself: IBM won at chess, could a computer win at Jeopardy!? He starts shopping the idea around IBM. At first, there are no takers.
David Ferrucci: Nobody thought it was doable. Nobody wanted to take it on.
GARY MARCUS: That’s David Ferrucci. Nowadays he’s the CEO of an AI startup, Elemental Cognition. Back then he was a rising star at IBM Research.
David Ferrucci: I spent a little time and said, “You know what? I think this is doable.” I said, “I would love to do this. I mean, I agree that this is sort of out there and looks extremely hard relative to the kinds of problems we're working on now, but this is an opportunity of a lifetime.”
Gary Marcus: Did you think you had a chance to win or you were just like, “If we make progress at all, it'll be exciting”?
David Ferrucci: I think at that time I was more thinking, “This is a great problem to focus on, to push the whole area of natural language processing, to push the whole area of semantic search, of question answering. Jeopardy looks like a really great problem.”
GARY MARCUS: Ferrucci saw even then that the kind of simple keyword search that people knew from Google had its limits; he wanted to figure out how to get machines to answer questions that were posed in everyday language.
David Ferrucci: If we get the right investment, we can really sort of push the limits on this and see what was possible and what wasn't possible.
GARY MARCUS: Given where AI was at the time, the challenge facing Dave and his team was massive. There were some existing AI tools for finding basic facts, but only with easy, straightforward questions. Jeopardy! was a whole new beast.
A traditional search system might be asked “Where was George Washington born?” On Jeopardy!, you might get asked something more like “Martha’s famous politician husband, now found on some US currency, first learned to walk here.”
Type those words into a Google search in 2006, or even today, and you get articles about Martha Stewart and US currency – not what you want. Most of the web hits you get aren't even relevant; you need a human to sort them out.
David Ferrucci: Here's the interesting thing about search. When you set the expectation that if you put in a query, I'm gonna give you something related and interesting to read, you're kind of always happy. If you give the expectation that you're a question answering system and you're gonna accurately answer questions, the expectations go through the roof almost immediately.
GARY MARCUS: With all these wild and sometimes cryptic Jeopardy questions, he was going to have to reinvent what search could be.
And he was going to have to build new hardware, and solve problems that by rights seemed impossible with the technology of that era. There was no road map here, no off-the-shelf software. It was like launching a rocket to Jupiter and trying to figure out along the way how you were going to build the life support systems.
And they couldn't take forever, either. They had maybe four years to get the job done; they needed to make progress right away.
David Ferrucci: I had researchers who were very concerned about doing it because of how hard it was. And I sort of challenged them with that. I said, “If you go off and continue to do what we're doing now, do you think this technology can solve Jeopardy, can win at Jeopardy? Would you be able to answer that question?” And they said, “No.” And I said, “How, how does that make you feel?” They kind of thought about that and said, “You know, you're absolutely right. This is the kind of risk scientists should not be afraid of, the kind research scientists should jump on.”
GARY MARCUS: Up until this point, Ferrucci had been running a research team, building new algorithms, publishing papers… Now he realizes everything is different: nobody cares how many papers he publishes. He needs to build a product. And that product needs to win.
David Ferrucci: People would come to me and say, “Dave, you know, we have like three to five years, let's say we have four years to solve this,” ‘cause that was the promise I gave (laughs) … and they would say, “We have plenty of time.” And I said, “We're already late.”
There was sort of a traditional academic research mentality, where people were thinking about writing individual algorithms and publishing on them, and so, like, decomposing the problem.
And so I was like, “We can't really think that way. We always have to think of the final goal.” When you put it all together, the system that's actually gonna run to play Jeopardy, is it getting better?
GARY MARCUS: Listening to him, and what he pulled off, it dawned on me that he was probably one of the greatest technical managers in the history of AI. Often the hardest part is getting everyone to row in the same direction, keeping them happy. And here’s Ferrucci on day one delivering a brutal truth, telling everyone, “We’ve got to change everything that we’re doing, or we will never win.”
David Ferrucci: We're gonna have 25, 30 researchers on this. You know, we need an architecture where people work on various components that plug into that architecture. And moreover, we need those components to be loosely integrated.
GARY MARCUS: Beating Jeopardy! was never going to be easy. Take, for example, an actual $600 question, straight out of high school biology, but not something most of us think about every day.
David Ferrucci: “In cell division, mitosis splits the nucleus, and cytokinesis splits this liquid cushioning the nucleus.”
(RECORD SCRATCH)
GARY MARCUS: Now that's pretty tricky, even for a human. It's about how cells work, and about a liquid that is inside the cell. And no machine back then, or even now, could actually work all the way through the question. Watson didn't actually know what mitosis is; it didn't understand what cell division is; it didn't even know what the word cushioning meant. All it could really do was look carefully at a whole bunch of stuff that was written on the web using these various words, and make educated guesses.
[MUSIC IN]
Step one was to analyze the question itself, breaking it down into pieces. Often a single question had more than one clue; the first job was just to find those clues.
Step two was to make a guess about what kind of thing any given question was getting at. Animal? Mineral? Vegetable? In the cell division question, Watson's first job would be to guess that the question was looking for a liquid. For a single question, Watson might generate a thousand hypotheses.
Once it made that guess, another system would do something like a web search, looking through Watson’s databases for evidence for each of the thousand hypotheses, like cell division and mitosis. Eventually Watson would narrow the list down to a small number of possibilities.
David Ferrucci: We might generate organelle, vacuole, cytoplasm, plasma, mitochondria, blood, chromosome, meiosis…
GARY MARCUS: All those words are about biology, so it’s a start. But now Watson has to guess which of those words actually are liquids. Blood is, organelle is not, and so on.
Watson then scores all the evidence to come up with a final ranking for its answer, along with its confidence level.
David Ferrucci: It's almost like, in order to solve an open-ended question, we turned it into a multiple choice question. So it'll generate a lot of possible answers, a thousand possible answers, say. Now I go and I evaluate each one of those answers, because most of them are crap. One of them is right, and there's probably a bunch that might be close to right.
GARY MARCUS: The right answer is cytoplasm.
[MUSIC OUT]
David Ferrucci: All of a sudden we were now answering questions we couldn't even come close to before.
GARY MARCUS: The first version of the system was so slow it took Watson two hours just to answer a single question.
Still, the important thing was that the IBM team had figured out a way to make tricky Jeopardy questions into something like multiple choice. And that gave them a fighting chance.
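To make that pipeline concrete, here is a minimal Python sketch of the generate-and-rank idea – emphatically not IBM's actual code. Every candidate, type check, and evidence score below is invented for illustration; the point is the shape: generate many hypotheses, guess the answer type, score the evidence, and rank.

```python
# Toy illustration of generate-and-rank -- NOT IBM's code. Candidates,
# type checks, and evidence scores are all invented for illustration.

CLUE = "cytokinesis splits this liquid cushioning the nucleus"
EXPECTED_TYPE_IS_LIQUID = True  # step two: guess the answer type

# Hypothetical candidates with made-up evidence scores from text search.
candidates = {
    "cytoplasm":    {"is_liquid": True,  "evidence": 0.81},
    "organelle":    {"is_liquid": False, "evidence": 0.64},
    "vacuole":      {"is_liquid": False, "evidence": 0.55},
    "blood":        {"is_liquid": True,  "evidence": 0.22},
    "mitochondria": {"is_liquid": False, "evidence": 0.47},
}

def score(features):
    # Reward candidates that match the expected type, then weigh evidence.
    type_bonus = 1.0 if features["is_liquid"] == EXPECTED_TYPE_IS_LIQUID else 0.3
    return type_bonus * features["evidence"]

# Rank every hypothesis and report the winner with a confidence-like score.
ranked = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
best = ranked[0]
print(best, round(score(candidates[best]), 2))  # -> cytoplasm 0.81
```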
With this architecture in place, Dave divided up the team into subteams, building different components to address different parts of the problem: generating hypotheses, looking for evidence, and sorting possible answers.
David Ferrucci: Now all of a sudden, like, how do we integrate all those? And that's where we used machine learning.
GARY MARCUS: Machine learning is a part of AI that is about getting machines to learn things based on data. Suppose you want a machine to recognize the difference between a cat and a dog; instead of telling the machine to go look at the whiskers or the ears, you give the computer a lot of pictures, some of cats and some of dogs, each with a label, and let the machine tune itself.
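As a minimal sketch of that idea – with toy numbers standing in for pictures, and scikit-learn assumed to be available – you hand the algorithm labeled examples and let it fit the weights itself:

```python
# A minimal supervised-learning sketch in the spirit of the cat-vs-dog
# example. Features and numbers are invented; a real system would learn
# from image pixels, but the principle -- learn from labeled data -- holds.
from sklearn.linear_model import LogisticRegression

# Each row is one animal: [ear_pointiness, snout_length] (toy features).
X = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.1],   # cats
     [0.2, 0.9], [0.3, 0.8], [0.1, 0.7]]   # dogs
y = [0, 0, 0, 1, 1, 1]                     # labels: 0 = cat, 1 = dog

model = LogisticRegression().fit(X, y)     # the machine "tunes itself"
print(model.predict([[0.85, 0.25]]))       # -> [0], i.e. "cat"
```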
In the end, Ferrucci used a whole set of machine learning algorithms to “fit” the data, to see what actually worked, getting each component to work as well as it could with the data that they had.
David Ferrucci: So this allowed people to go off, suggest projects, work on projects, and the machine learning would find the right weights. Now, it wasn't that simple. We ended up with, I think, eight successive machine learning models, but that allowed us to have independent groups generate ideas, produce features, and then quickly integrate and see if they mattered. And then the shape of our confidence curve, our confidence prediction, became much more accurate.
GARY MARCUS: Now this is critical too. Dave just mentioned confidence. Search engines like Google give you pages of possible answers, but Google doesn’t tell you how sure it is of those answers. Simple keyword search was never going to cut it for Jeopardy. Watson was going to need to know how confident it was about its answers.
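A hedged sketch of what that can look like – with invented weights, features, and threshold, not Watson's actual eight-model stack – is a learned weighted sum of component scores squashed into a probability, which then gates the buzzer:

```python
# Sketch only: merge component scores into one calibrated confidence and
# decide whether to buzz. Weights, features, and threshold are invented;
# Watson reportedly used successive learned models, not this single sum.
import math

features = {"type_match": 0.9, "passage_support": 0.7, "popularity": 0.4}
weights  = {"type_match": 2.1, "passage_support": 1.6, "popularity": 0.5}
bias = -2.0

z = bias + sum(weights[f] * v for f, v in features.items())
confidence = 1 / (1 + math.exp(-z))   # logistic squash into [0, 1]

BUZZ_THRESHOLD = 0.5                  # assumed: tuned on archives of old games
print(round(confidence, 2), "buzz!" if confidence > BUZZ_THRESHOLD else "stay quiet")
```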
[MUSIC IN]
Gary Marcus: And that's critical in Jeopardy ‘cause you have to decide whether it's worth hitting the buzzer.
David Ferrucci: And then there was another dynamic which was: do you wanna look stupid?
Gary Marcus: Ah…
GARY MARCUS: Ferrucci thought that confidence was so important he wanted Watson’s confidence to be displayed on the television screen during the actual competition. He didn’t just want Watson to give its answer — he wanted Watson to say how sure it was, and he wanted the viewers to get a taste of what was going on – something he called the “answer panel”.
David Ferrucci: So the answer panel was a little, you know, display on the screen that showed, uh, Watson's top three answers, and its relative confidence in those answers.
GARY MARCUS: Now Ferrucci has a framework, complete with machine learning, and his answer panel. After a couple of years, he's made some serious progress.
Gary Marcus: And at that point, were you pretty sure you were gonna win? Or just…
David Ferrucci: No, no. We knew that to really be competitive with the likes of Ken Jennings, we had to be somewhere in the low to mid seventies in percent accuracy, with very good confidence predictions, to win and not look stupid at Jeopardy. I, you know, at IBM, there were lots of internal debates about, “What do we do if we lose?” I mean, everyone understood there was a risk of losing.
GARY MARCUS: Ferrucci kept testing his system against a giant database of previous games, going back decades. Every quarter, he’d report the data to the executives, letting them in on Watson’s progress. Then, they hit a snag.
David Ferrucci: I remember one of the guys on the team, great guy, he was doing a lot of the leading, a lot of the ML stuff and all of a sudden he goes, “Holy crap, we're not looking at our performance chronologically.”
Gary Marcus: Mm.
David Ferrucci: I'm like, “Oh my God.” And there was a big drop. The clear measurement was, there was a point in time, it was like the mid-eighties, that if you looked at our performance, it dropped 10%. And 10% was significant; it literally took you outta the game.
GARY MARCUS: This… was a setback, a big one:
David Ferrucci: Oh my God. Oh my God. I felt very sick to my stomach. You know, it's like that gut wrenching, holy shit, right?
Gary Marcus: Yeah, so what that meant is like you've got a system that could, could have won Jeopardy in 1981, but if you have 1990 questions now you're, you're just a decent player, not a great player.
David Ferrucci: Correct. You can't compete with the top players. You can compete with a person off the street, easy, but you can't compete with the top people. We have to understand that a Jeopardy game is sort of always evolving. It's changing, you know, it changes with the times. The writers evolve. And so from a scientific perspective, from a language processing perspective, the language is evolving, right?
GARY MARCUS: One of the biggest problems in machine learning is that the world changes over time. Your systems work on one set of data, and then the world throws you a curveball. The whole enterprise can be really delicate. Ferrucci had run straight up against that.
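Here's a toy sketch of the trap, with entirely invented data: a blended accuracy over all eras hides a drop on recent questions, and the drop only shows up when you slice the test set chronologically, which is what the team finally did.

```python
# Invented data illustrating drift: the simulated system "knows" the older
# question style better. A blended accuracy hides the chronological drop.
import random
random.seed(0)

questions = [
    {"year": year, "correct": random.random() < (0.85 if year < 1985 else 0.70)}
    for year in range(1964, 2005) for _ in range(50)
]

def accuracy(qs):
    return sum(q["correct"] for q in qs) / len(qs)

recent = [q for q in questions if q["year"] >= 1985]
print("all eras blended:", round(accuracy(questions), 2))  # looks fine
print("recent era only: ", round(accuracy(recent), 2))     # notably lower
```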
Eventually, with a lot of hard work, they were able to move on. But even so, there were other problems.
David Ferrucci: And don't forget, we didn't have access to the cloud of machines. We had to show up with the machine; it had to be in a room disconnected from the internet, right? ‘Cause it was human against the machine.
[MUSIC IN]
Ken Jennings: I was 100% confident that I would win.
GARY MARCUS: That's Ken Jennings, the all-time Jeopardy champion with the 74-game streak, who had inspired IBM's Watson project. He and Brad Rutter, another of Jeopardy's best players, were invited to compete against Watson.
Ken Jennings: I had taken enough AI classes in college as a computer science major to know that natural language question answering was an extremely hard task, that there were no solutions that were playing Jeopardy at anywhere near human levels at the time. And I thought, “Sure, I'll go on there and defend carbon-based life. Easy money.”
GARY MARCUS: A few months before the taping, Ken was shown a chart of Watson's performance against his own – and it showed that Watson was catching up.
Ken Jennings: It was alarming. It kind of felt like this is, this is what it's like when the future comes for you. It's not a Terminator laser eye piece tracking you down. It's just a line on a graph getting closer and closer to the thing you can do that you thought made you human and irreplaceable.
GARY MARCUS: Soon the team had built everything they needed and got to a point where they thought they had a chance. But they knew there was no guarantee, either:
[MUSIC OUT]
David Ferrucci: The algorithms, the architecture, the hardware, the scaling, the speed, the confidence, the performance of the system, the data we had to demonstrate really did something great, was fantastic, super happy. But here's the problem. There was like a 25 to 28% chance that we were gonna lose. And everybody's gonna go, “Oh, well I guess you didn't do that.”
Gary Marcus: That's right. The academics would know how amazing it was.
David Ferrucci: The academics would know.
Gary Marcus: But the rest of the world would be like, yeah, they played, whatever.
David Ferrucci: They lost.
Gary Marcus: They lost.
David Ferrucci: So that was kind of frightening, right? And, I asked PhD researchers not to publish for four years.
Gary Marcus: Which is really dangerous for their career. I mean, scary for them.
David Ferrucci: Correct. I asked them to take a risk and to bet on me and to bet on us and to bet on the project. And I said, “If we win, people will wanna read our papers.”
Gary Marcus: If you lose, you'll write nice letters of recommendation.
David Ferrucci: [Laugh]
GARY MARCUS: All this echoed in Dave's head as they walked in, the day of the big match. It was January 2011. About a hundred people, including Alex Trebek, were in the auditorium at IBM's T.J. Watson Research Center… all sworn to secrecy with airtight NDAs.
David Ferrucci: I think one of the most touching moments of my career happened because you could imagine, you know, four plus years in this thing, in the trenches with the team, unbelievable ups and downs. Like that machine just had to work this one time, right? It all came down to this thing. So it was just, I mean, the emotions were just, you know, intense.
GARY MARCUS: Even now, more than a decade later, Dave is blown away by that moment:
David Ferrucci: And when I got up to speak, and this of course was before, you know, we played the game, no one knew what was gonna happen. And my team and all the IBM execs, you know, gave me a standing ovation.
Gary Marcus: Well deserved.
GARY MARCUS: And then… it's time to play.
[Tape]
This is Jeopardy! The IBM Challenge.
GARY MARCUS: In total, it would be three consecutive games. IBM had a lot on the line.
Tape: From Los Angeles, California, Brad Rutter. An IBM computer system able to rapidly analyze and understand natural language, Watson. And from Seattle, Washington, Ken Jennings.
[SFX]
Ken Jennings: Watson, of course, was not the black rectangle you saw on TV. That was just an avatar, a flat screen TV turned on its side and then connected by a cable to a room full of IBM's fastest servers running in parallel, or something like that. Watson was the size of an RV. And as a result, Watson doesn't travel. Between us is kind of this looming presence, and there was kind of a HAL 9000 vibe from the absence there, the sense that there's no… sorry, Watson, there's no human soul there to play against. But also the fact that it's got this robotic thumb. You could hear this kind of ominous clicking, tick, tick, tick, tick, tick, tick, tick. Again, like a Terminator-like robot coming for you. They had turned one of those little corporate auditoriums into a mini Jeopardy set. But if you looked closely, it wasn't a real Jeopardy set. The icon on the floor, like at center court, if you will, was the Watson logo. It was like walking out on the floor of Boston Garden or, you know, the Chicago Bulls arena, and, you know, the intimidation factor of, “Hey, you're in Watson's house now.”
David Ferrucci: The actual game was a nail-biter. People don't realize, ‘cause a lot of people don't completely understand the actual game dynamics of Jeopardy.
GARY MARCUS: Game One was promising, ending with Brad and Watson tied for the lead. Game Two saw a shocking moment:
David Ferrucci: When we got the first Final Jeopardy answer horribly wrong, or what seemed to be horribly wrong. The category was US Cities. So they were looking for US cities. The question was…
Alex Trebek: Its largest airport is named for a World War II hero; its second largest, for a World War II battle. 30 seconds, players. Good luck.
GARY MARCUS: Final Jeopardy starts with whoever has the least money. Ken goes first.
Alex Trebek: And you wrote down, “What is Chicago?” That is correct, and you wagered $2,400. That doubles your score to 4,800. Down to Brad now —
GARY MARCUS: Brad follows suit; he gets it right. But Watson…
Alex Trebek: And the response was… What is Toronto, with a lot of question marks. Which means, of course, that Watson had many, many doubts.
David Ferrucci: In final Jeopardy, you have to answer. You have no choice. You must answer. So if this were a regular Jeopardy question, we would've never answered it.
Ken Jennings: Such an important part of good human-level Jeopardy play is not just knowing, “Hey, do I have a guess here or not?” The more important question is, “Do I feel certain enough about my guess to hit the little buzzer?”
David Ferrucci: For the Final Jeopardy questions there was, sadly, no answer panel. So our first answer was Toronto, which was of course wrong. Our second answer was Chicago.
GARY MARCUS: Watson at least had the sense not to bet too much, because it knew it wasn't confident. Watson had guessed correctly that it needed a US city, but it wasn't absolutely certain Toronto wasn't a US city; even though the answer was wrong, everything in the system functioned more or less as it was supposed to. It made the best guess it could and knew it was probably wrong. Here's where tying betting to confidence made a difference.
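Here's a hedged Python sketch of that coupling – the policy, function, and numbers are invented, not Watson's actual wagering model: when answering is mandatory and confidence is low, the only lever left is the size of the bet.

```python
# Invented wagering policy, NOT Watson's: Final Jeopardy forces an answer,
# so low confidence shows up as a small bet rather than a pass.
def final_jeopardy_wager(confidence, my_score, second_place_score):
    if confidence < 0.5:
        # Cautious: bet no more than you can lose while still beating a
        # runner-up who doubles their money (the classic lockout margin).
        return max(0, my_score - 2 * second_place_score)
    # Confident: bet enough to guarantee the win if the answer is right.
    return min(my_score, 2 * second_place_score - my_score + 1)

# Toy numbers: a big lead plus low confidence yields a modest, safe bet.
print(final_jeopardy_wager(confidence=0.14, my_score=23_000,
                           second_place_score=10_400))   # -> 2200
```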
Alex Trebek: And the wager, how much are you going to lose? (Audience Groans) Oh you sneak, $947. (Clapping, cheering)
GARY MARCUS: Watson is still in the lead.
Ken Jennings: But it was the kind of thing where its mistake revealed that when an artificial intelligence errs, it doesn't do so the way a bad or substandard human player would. It can kind of go off the rails, and if your machine is not playing Jeopardy, if it's driving your Tesla for you, it's important to know about those mistake use cases.
GARY MARCUS: It all comes down to Game 3. Game 3 starts reasonably well for Watson. Near the end, Brad has been left in the dust. Ken is in the lead with 17,000. Watson is trailing just behind, with about 15,000. And everyone’s kind of waiting on the Daily Double.
David Ferrucci: Watson had not clinched it until the last daily double of the final game. If you land on a daily double, you can take a very risky bet and you can bet everything you have if you want on that question. And you can flip the outcome of the game.
GARY MARCUS: Watson draws the Double.
Alex Trebek: Here we go, in legalese. This two word phrase means the power to take private property for public use. It’s okay as long as there is just compensation.
David Ferrucci: And I think at that point, had Ken gotten a daily double, he could have flipped it and, and won.
GARY MARCUS: But he doesn’t.
Watson: What is eminent domain?
Alex Trebek: You're right.
GARY MARCUS: Watson's score continues to rise. Then comes Final Jeopardy.
Alex Trebek: The category is 19th century novelists, and here is the clue. William Wilkinson’s ‘An Account of the Principalities of Wallachia and Moldavia’ inspired this author’s most famous novel.
GARY MARCUS: Everybody gets it right, but the betting is key. Brad is up first. His answer? Bram Stoker. He's right.
Alex Trebek: You'll double your score to 11,200, and we add that to your total from yesterday… that gives you a two-day total of 21,600.
GARY MARCUS: Then Ken.
Alex Trebek: And we find… who is Stoker? “I, for one, welcome our new computer overlords.” (laughter) And your wager was 1,000. That gives you a two-day total of 24,000.
Alex Trebek: Now we come to Watson. We're looking for Bram Stoker. And we find… Who is Bram Stoker? (cheers) And the wager? Hello! 17,973. 41,413, and a two-day total of 77,147.
David Ferrucci: Had we not gotten Final Jeopardy right, we would've won by a dollar. But we got Final Jeopardy right, and it looks like we won by a lot.
Ken Jennings: I think Dave's right. The final scores did look like a crush, you know? The previous day, Brad and I had each beaten Watson, and Watson just got a very favorable spread of categories in the televised games and found the daily doubles when it needed to. It's never fun to lose on Jeopardy. I just remember standing there at that podium thinking, “I guess this is what it feels like to be replaceable.” You know, like a lot of my identity was tied up in being the person who knows stuff. And on some level, you just feel like the thing that makes you weird is also what makes you unique and irreplaceable. And it was kind of a moment of reckoning to see that, you know, you throw enough money at the problem and you can basically duplicate my only marketable skill. I kind of felt like I'd been kicked outta the information economy, and that I was the first one, but that I probably wouldn't be the last.
GARY MARCUS: For Dave, it was another feeling entirely.
[MUSIC IN]
David Ferrucci: It was a huge relief, there was elation, absolutely the moment. And then I think I went home that night and relaxed that evening and the next day I was like, “Now what am I gonna do?”
GARY MARCUS: And that wasn't just the question for Dave, but for IBM. When the show aired a few weeks later, Watson's victory exploded in the media. The show's ratings climbed each night of the contest, making it the highest-rated show in the top ten markets in the U.S.
IBM’s stock price surged almost 13 percent for the year, on its way to an all-time high in early 2013. IBM had been in the shadow of tech companies like Google and Apple… but suddenly, it had a hit on its hands.
This is where our story starts to turn dark. IBM needed to figure out what to do with Watson next. It reached, and it reached too far.
[TAPE:]
John Kelly: Fast-forward from that game show five years ago and we’re in cancer now.
Charlie Rose: How did you do it?
John Kelly: We worked with the best oncologists, cancer doctors at Memorial Sloan Kettering, M.D. Anderson and they told us what data to feed Watson and Watson learned over that period of time.
Zak Kohane: They said, “Well, what about Watson? It's solving the problem.” And I said, “It's not solving the problem. It's not getting close. They're making a lot of noise.”
Suchi Saria: So they're declaring victory based on a conceptual idea, without going through the rubric of, like, rigor.
Casey Ross: I think they just over-promised and under-delivered. In essence, Watson was sold for its parts.
GARY MARCUS: In the next episode: after a stunning rise, a bigger fall.
[MUSIC UP]
CREDITS