How IBM’s Big Bet Failed

Transcript for Season 4, Episode 2: How IBM’s Big Bet Failed

[TAPE]

Alex Trebek: Now we come to Watson. We’re looking for Bram Stoker. And we find… Who is Bram Stoker? And the wager? (cheering) Hello! $17,973. $41,413 and a two-day total of $77,147. 

[THEME MUSIC] 

Gary Marcus: It’s 2011. Watson, a computer built by IBM, wins Jeopardy against two of the game show’s best contestants. It’s an astounding feat for Watson, and it’s a publicity coup for IBM. Almost immediately, executives start to dream of ways to capitalize on their victory… Their flagship goal: medicine, and the toughest fight of all, cancer.

[Tape MONTAGE]

“Watson is the beginning of a whole new era of computing…”

“If you think solving cancer is cool, then we’re cool…”

“Watson was hailed as the great hope for IBM or a great hope for IBM, and actually in the real world of business it's having a hard time.”

Gary Marcus: In the last episode, we heard about Watson’s astonishing Jeopardy victory. Today, Watson’s second act: a more troubled story, about what came after Jeopardy. It’s a tale of hubris, a tale of why AI is often harder than most people think, and a warning that when it comes to AI, we should all be more skeptical of corporate hype. 

I’m Gary Marcus, and this is Humans vs. Machines.

Watson was meant to diagnose cancer. The dream? Feed in a patient’s medical records, and Watson would read those records and the medical literature, put it all together, and out would come a diagnosis and a treatment plan.

[phone sfx] 

Ari Caroline: I think I first heard about it over Twitter actually, where there was the early buzz going on in terms of what they were doing with Jeopardy.

Gary Marcus: That’s Ari Caroline, now the chief executive of a biomedical AI startup called Weave.

[MUSIC IN]

Back in 2012, he was at Memorial Sloan Kettering, known as MSK, one of the world’s top cancer centers and one of the first to collaborate with Watson. It was a moment of pure buzz, when the promise of IBM’s Watson was at the forefront of everyone’s mind:   

[TAPE MONTAGE]

“Watson…”

“Watson…”

“What's next for Watson? IBM hopes to use the technology to help doctors diagnose patients.”

Gary Marcus: For all the hype, the reality was that Watson wasn't going to be able to do any of that out of the gate.

Ari Caroline: We weren't, you know, under any illusions about it being able to be plug and play with cancer research. So what we were proposing to them was a research collaboration with the IBM research team. 

Gary Marcus: IBM was very happy to take the meeting. They were eager to kickstart Watson’s foray into cancer. At their first meeting at MSK, they brought the top brass: CEO Ginni Rometty, and David Ferrucci, who had led the team to the Watson victory. But even then, Dr. Caroline sensed that something was wrong.

Ari Caroline: That first meeting, I think we got a few of the first warning signs that this actually wasn't going to be a collaborative research project. 

Gary Marcus: So what were the signs, which maybe in hindsight you wish you had, you know, taken more seriously?

Ari Caroline: Well, so first of all, the person who was leading the discussion… There was, uh, some confusion about whether he represented IBM Corporate or IBM Research. But he came across as being a very, you know, sort of prototypical entrepreneur-type who is describing something that he clearly didn't really understand himself. 

Gary Marcus: Ferrucci, the genius behind Watson, who we met in the last episode, was there… but strangely silent. Before long, he was gone from IBM altogether. 

Gary Marcus: So Ferrucci was on the research side. He's gone now. And you're mostly talking to corporate types. Like, what's the issue with talking to the corporate people?

Ari Caroline: Well, it wasn't a research project. You know, it became very clear that we weren't working together on it. The reason my team had been assigned to it was because it was thought of as a research project with efforts, um, between our research teams and their research teams. That clearly wasn't happening.  Everybody we interacted with was from IBM Corporate.

Gary Marcus: And so what was their ask at that point? 

Ari Caroline: They wanted a lot of data, and they wanted time with clinicians.

Gary Marcus: The big source of the data that IBM wanted was patient records, but it turns out patient records aren’t as straightforward as you might think.

Ari Caroline: Most of the important clinical features that you would look for are often captured in the notes and not captured in structured data fields in the EMR.

Gary Marcus: I'm just gonna unpack for one second there. You go to your doctor, you get an electronic medical record, that's the EMR, and some of it is structured: you just tick a box, you put in a number or something like that. And some of it is just text, it's doctors taking notes, and that's what you're describing as the unstructured notes. Your thought is that there's really good information in there that's getting lost, because machines can't really read.

Ari Caroline: Exactly, exactly.

Gary Marcus: The problem is twofold. First, electronic medical records are built around billing and insurance, not science. The things medical researchers want to know about aren’t always emphasized in what the insurance companies need. And second, doctors often add helpful details to the patient notes, but as free-form writing rather than database records, which computers can’t read; so machines miss a lot of what the doctors are trying to say.

Ari Caroline: I'll give you an example in breast cancer. Often the diagnosis code will reflect the extent of the disease. But from a treatment perspective, what a doctor wants to know is: is it HER2-positive? Does it have this biomarker that's represented by the HER2 gene? And that's what they'll describe when they're describing it to a colleague or a tumor board; that's the way they think of the diagnosis, actually. But that diagnosis, again, is not captured in any structured form. It's described in free text.

Gary Marcus: Part of the issue here is that most of the data exists around billing. So the billing process cares about one thing, and the doctors know something else is important, but they can't put it into the billing software, so they write it down in the kind of open-ended place. And so part of what you hoped from Watson was that they would help you with problems like that.

Ari Caroline: That's exactly right.

Gary Marcus: Dr. Caroline wanted to teach Watson to help the machines read the text, but that would take time and research, and IBM didn’t seem to have the patience. Slowly but surely, it was becoming clear that MSK and IBM had very different ideas about the collaboration.

Ari Caroline: They didn't explicitly blow us off, and it took us some time to realize that they were implicitly blowing us off.

Gary Marcus: It takes a while cuz you have a bunch of meetings. You're still optimistic. And then at some point it dawns on you that, like, this thing that we want isn't gonna happen.

Ari Caroline: Yeah. I mean, again, we had very early warning signs and I had brilliant analysts working on this project who, from day one were telling me, “The technology isn't there, the people aren't there, this isn't working. This isn't what we expected. They're not working with us at all.” There was potential there that was real potential because we were actually working with the technology ourselves. We saw that the potential was there. 

Gary Marcus: The potential was there, but the research that Dr. Caroline wanted to see wasn’t happening. And Watson wasn’t working as planned when they tried it out on patient records. 

Ari Caroline: It didn't really recognize where it didn't have sufficient information and shouldn't give an answer; it started making recommendations even when it didn't have enough information.

Gary Marcus: So it was both unsophisticated in that sense of not having kind of a self-awareness, not in the consciousness sense, but, an awareness of its own limitations of where it needs more data. And sometimes it was wrong.

Ari Caroline: Oh, yeah, it was often, frequently wrong.

Gary Marcus: And so you see this and you, you tell them, “Hey, this is wrong.” And they kind of say, “Yeah, we're working on it.” Is that kind of how it went?

Ari Caroline: Yeah, and they often did go in and fix some of those rules-based algorithms, and, you know, some of the settings. But, it was always just tweaks. It didn't change the underlying system.

Gary Marcus: So it might have been possible to do it with a more open-ended research effort, but they had the tools they wanted to use and the tools they wanted to use just really weren't up to the job.

Ari Caroline: Oh, not even close. 

Gary Marcus: Maybe it could have worked, if different choices had been made, but they weren’t. Dr. Caroline’s staff got fed up; tensions grew in his lab. Eventually, he was taken off the project. He feels bad that it didn’t work out for his team. 

Ari Caroline: Things started to become tense between me and the IBM team very quickly. 2013, going into 2014, my boss made the decision, which I think was the correct decision at the time, to remove me from the project because I was not able to bring anything positive to it. The physicians were themselves very frustrated and, you know, several of them reached out to me and felt like I had abandoned them to some degree, abandoned them to these IBM folks. But honestly, there was a lot of, um, bad feeling about the way we'd been treated and the bait and switch that they'd pulled on us.

Gary Marcus: After our interview, Dr. Caroline felt so moved reflecting on things that he sent us a voice note.

[MUSIC]

Ari Caroline: So with some time to reflect over Sabbath, you know, with all devices turned off for 24 hours… I realized there was an aspect of this conversation that we didn’t really get into, and that was the real cost of all of this time that was spent trying to make this project work. From the perspective of the doctors, and what they gave up, it’s really hard to rationalize it that way… Mark Kris, who eventually became the physician head of Watson, I think that actually became his title… He stepped down as the chief of thoracic oncology and cut back his clinic hours quite substantially. Saw far fewer patients.

This is the type of person who, you know, everybody loved. He attended the funerals of the patients that he saw, who passed away, and there was one time early on in this project where I had a meeting with him. But he got a phone call just before we were supposed to start, and I was outside his office and the door was partially open, so I heard a bit of the conversation and it was the wife of one of his patients calling, uh,  her husband had just passed away. And, he spoke to her for like an hour, just comforting her. How many physicians do we have like this in the world right now? And we took somebody like that off of patient care? That haunts me.

Gary Marcus: We reached out to IBM for comment. They didn't respond.

Gary Marcus: Dr. Caroline left the Watson project in 2014. IBM continued, buying more data, focusing on other collaborations. From the outside, things still seemed to be going great. Not long after, CEO Ginni Rometty appeared on Charlie Rose to say so. 

[TAPE] 

Ginni Rometty: We have participated in some of the most glorious moments of history. Whether it might've been the first systems that ever did the census, or landing a man on the moon. I’m telling you, our moonshot will be the impact we have on healthcare.

Casey Ross: I first started covering Watson right around, it was about 2016, um, and then into 2017… where I had noticed some reporting and some questions that were beginning to pop up about IBM Watson, in particular, in the realm of cancer care.

Gary Marcus: Here’s Casey Ross, from STAT News, who helped break the story.

Casey Ross: And so we didn't know how the story was gonna unfold, but then we started interviewing doctors at facilities all around the world that had begun using Watson — in particular Watson for Oncology, uh, which is the cancer care product again. And we found from those interviews that it was a very muddled picture.

Gary Marcus: One of those collaborations was with MD Anderson, the renowned cancer clinic in Texas.

Casey Ross: That project ran way over budget and they weren't able to develop the level of functionality that they needed to make the treatment advisor sort of useful and, and scalable to other institutions, even, even within, uh, MD Anderson itself. And so the project kind of, uh, imploded.

Gary Marcus: The budget was 50 million dollars or something, is that right? My memory is that it was a pretty substantial number.

Casey Ross: Well, I think it was initially in the thirties and then it ended up being like 62 million, something like that. 

Gary Marcus: Casey’s first article on Watson came out in September 2017. 

Gary Marcus: You feel confident now that you have a picture of what's going on. What did you say? 

Casey Ross: We said that Watson for Oncology was not living up to its hype or its expectations and was not delivering the type of benefits that the institutions using it had expected. And in fact, many of them were having significant problems applying the technology to the patients in front of them. It was also not winning a lot of adoption; they were not getting a lot of business. Um, they had more or less been blocked out of the US market. You know, there were very few takers, essentially, in the US or in Europe. They had more in Asia. The ones they had in Asia, though, were pretty critical.

Gary Marcus: How did IBM respond? 

Casey Ross: To us directly? It was pretty much crickets. 

[MUSIC]

Gary Marcus: IBM wasn’t responding to Casey. But even as these problems were being raised, the company continued to promote Watson publicly.

Here’s Ginni Rometty on Fox Business in 2018, speaking from Davos, Switzerland:

[TAPE]

Ginni Rometty: Watson, I know you and I talked about it for healthcare, that's really where we started and —

Anchor: I think it's incredible…

Ginni Rometty: We will have treated over 100,000 patients now, and Watson for Oncology is rolling out around the world. India, China, big take-up. And if you think about it, India: one oncologist for 1,400 to 1,600 people. And so they'd never have a chance to have world-class care, to have assistance for a doctor on the diagnosis and the treatment.

Gary Marcus: But on the inside, some of the employees were getting worried.

Casey Ross: Like we literally started getting the manila-envelope-in-the-mail type of thing. Because there were people that read our reporting within the company who thought it was accurate and wanted to support the questions that we were raising. And we had individuals providing us with recordings of internal meetings, where the executives at the company that were in charge of developing Watson for oncology would respond to our reporting. And it really began to kind of take off from there. 

Gary Marcus: Were those shocking to you? Like when they came in?

Casey Ross: Completely shocking. Yes. We'd received the official line repeatedly from IBM. You know, that this, uh, this is working, we don't have any doubts about it. The internal documents that we got completely obliterated those defenses. 

There was a quote in that eventual follow-up story, where a clinician from one hospital said, you know, “We initially started to use this product because we thought it was good for marketing, but at this point it's just turned out to be a piece of S H I T.”

Gary Marcus: A slightly different question… How far did it actually get? So some patients were actually treated, I guess. Like, were there screw-ups? Did any patients suffer because of the screw-ups?

Casey Ross: We never got any examples of patient harm. There were some suggestions within the internal documents of recommendations that Watson would make that were unsafe or for products that had, you know, a warning label that indicated that they shouldn't be used for the patient that was in front of the doctor at the time.

Gary Marcus: So, uh, there were always doctors in the loop, essentially. You never had like machine directly giving advice to a patient. 

Casey Ross: Yeah. And frankly, it was always designed to be that way. There was always supposed to be a human in the loop. It was just that the human in the loop was supposed to be wowed by what the technology was saying. And that is the thing that was not happening. 

Gary Marcus: So the doctors never got the “wow” moment that they would have been promised.

Casey Ross: Right. That “wow” sort of Eureka moment just never occurred.

Gary Marcus: The goal of the Watson project was noble: to help cancer doctors make better and more efficient decisions for their patients. And the company spent billions of dollars trying to make that happen. But they couldn’t keep their promises. What went wrong?

Isaac Kohane: The failure of Watson was basically twofold. 

Gary Marcus: That’s Dr. Isaac Kohane, Chair of the Department of Biomedical Informatics at Harvard Medical School.

Isaac Kohane: Those who are involved in machine learning know that data cleaning, data access, is 80% of our work. That part was not done well, and there was not a good understanding of what it would take to have that done in real time.

Gary Marcus: People often talk about AI and machine learning as if they were some magical process: data in, and insights out. But it doesn’t really work that way. There is a lot of grunt work in just making sure your data is collected in a form the program can understand. If you just throw in all the data you’ve collected, willy-nilly, you are going to wind up with a mess.

And medicine is, well, complicated. Because humans are complicated, and hospitals are complicated, too. Dr. Kohane gives a great example of the kinds of challenges that can arise in interpreting medical data. 

Isaac Kohane: I was visited by a, an executive, a, uh, CEO of a startup from Silicon Valley. They had worked with Google. They had worked with insurance companies, and, um, he was, he was a little cocky. And so I asked him the following question. We've looked at literally millions of patients across tens of millions of laboratory studies. I said, “Tell me why, if you have the misfortune of being like me, white, between age 50 and 65, and your white blood count is low at three o'clock in the morning, your chance of death in the next three years is 53%. Same 50-to-65-year-old, uh, white male category, and it's three o'clock in the afternoon: your chance of death is less than 3% in the next three years. How do you account for that difference, between 53% and 3%? Any thoughts?”

Gary Marcus: The young CEO didn’t get it; he couldn’t figure out what explained the difference. Your average machine learning system wouldn’t get it either. To see it, to see what’s going on, you would need to look beyond the numbers, to know something about people.

Isaac Kohane: The difference is, if it's three o'clock in the morning and someone's drawing blood out of your arm, that means you are in a hospital and you are dying. Or you look like you're dying. If it's three o'clock in the afternoon and they're drawing blood on you, it's a routine blood draw. And therefore, the context is everything. And we showed that for 63% of laboratory studies, knowing the time and location of the test was actually more predictive than the actual value of the measurement itself.
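Dr. Kohane’s point can be made concrete with a toy simulation. This is purely illustrative: the patient mix, mortality rates, and white-blood-count distributions below are invented to mirror his 3 a.m. versus 3 p.m. example, and none of it reflects IBM’s (or anyone’s) real system.

```python
import random

random.seed(0)

# Synthetic patients: inpatients (3 a.m. draws, high mortality) vs.
# routine outpatients (3 p.m. draws, low mortality). The lab value
# (white blood count) only loosely tracks the outcome; the draw time
# is a proxy for "already hospitalized and very sick."
def make_patient():
    inpatient = random.random() < 0.5
    hour = 3 if inpatient else 15
    wbc = random.gauss(3.5 if inpatient else 6.0, 2.0)  # overlapping distributions
    died = random.random() < (0.53 if inpatient else 0.03)
    return hour, wbc, died

data = [make_patient() for _ in range(10_000)]

def accuracy(predict):
    """Fraction of patients whose outcome the rule gets right."""
    return sum(predict(h, w) == d for h, w, d in data) / len(data)

# Rule 1: predict death from the measurement itself (low white count).
acc_value = accuracy(lambda h, w: w < 4.5)
# Rule 2: ignore the lab value entirely; predict death from draw time.
acc_time = accuracy(lambda h, w: h == 3)

print(f"value-based rule: {acc_value:.2f}")
print(f"time-based rule:  {acc_time:.2f}")

# In this synthetic setup the time-of-draw rule scores higher:
# the "model" has learned context, not biology.
assert acc_time > acc_value
```

Because the time of the draw is a proxy for being hospitalized, a rule that ignores the lab value outright beats one based on the measurement itself. That is exactly the kind of spurious shortcut a machine learning system, left to its own devices, will happily learn.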

Gary Marcus: This is a big deal. The point here is that medical data, without context, is a total minefield. If you build a machine learning system it will notice patterns. But will it notice the right patterns? The difference between the right patterns and the wrong patterns can be immense. What’s genuine causation, and what’s mere correlation?  In machine learning, the issue is often about spurious correlations; you think you’ve got the right answer, but it turns out the AI has gotten fooled by some small detail or another. There were other problems too:

Isaac Kohane: They barely had any integration of the actual data of the patients into the machine learning system. The expectations that were around this project seemed to me completely out of sync with some of the facts on the ground.

Gary Marcus: Watson’s struggle didn’t just hurt IBM. For a while it hurt the whole field of medical AI, because people thought, wrongly, that IBM had the whole field sewn up. 

Isaac Kohane: I would say it was distracting when I was talking to graduate students or other professors about what we were doing. For example, they said, “What about Watson? It's solving the problem!” And I said, “It's not solving the problem. It's not getting close. They're making a lot of noise.”

Gary Marcus: Watson was taking away a lot of oxygen from other people’s research, even superstars like Dr. Kohane. When a big corporation says they are working on a problem that they aren’t actually solving, it makes it harder for other teams to do their own work. I worry about something similar happening today, within academic labs — with them being more and more crowded out by big corporate promises. 

At the end of the day, though, the biggest flaw in IBM’s thinking wasn’t about the data. It was in thinking that in AI, one size fits all. Jeopardy was mainly about memory. Being good at memory doesn’t mean you are good at being a doctor. Watson was a great system for Jeopardy, but that didn’t mean you could expect the same technology to solve cancer too.

Suchi Saria: Whenever you're at the forefront of a very fast-evolving technology, you have to approach it with humility, right? Even when you think you've got it figured out. You know, we're at the frontier of a field where there's so much to be done, and the world is a complicated place.

Gary Marcus: That’s Dr. Suchi Saria, a medical AI researcher and the chief executive of a medical AI startup called Bayesian Health. Her view is that Watson was too ambitious. That a different approach was needed. Rather than shooting for the moon, trying to solve all of cancer with some single untested magic technology, she’s trying to focus on a small number of more manageable problems, taking them step by step. And she’s making real progress.

One of the main problems she’s been focusing on for the last several years is sepsis, the body’s overreaction to infection. It’s not a sexy topic in medicine, but sepsis is a huge problem. It’s the leading cause of in-hospital deaths, killing more than 270,000 people every year in the United States and millions worldwide. It’s hard to diagnose in its early stages, and once somebody has a full-blown infection, it’s very hard and expensive to treat.

Suchi Saria: What I realized was when I was first doing the work, I was doing it because it was clear to me that it was the leading cause of hospital deaths. It was a model disease of a number of other complications like it where there’s opportunity to introduce completely new ways of practicing. 

[MUSIC] 

Gary Marcus: Dr. Saria and her team wanted to build an AI system to monitor medical records, current symptoms, and lab results in real time – essentially an early warning system that would send alerts to doctors and nurses if it determined a patient might have sepsis. And for Dr. Saria, the problem was more than academic. She had been writing papers about sepsis since 2013. One day, when she was a young assistant professor, things got very real.

Suchi Saria: What was to me very, uh, sad was, you know, I remember receiving a phone call from my mom about my nephew. He was in India, in the ICU, not doing very well. Sepsis was suspected when he was already in shock, a state that's pretty late in the game, when it's much harder to resuscitate the person. And they wanted me to interpret his labs and figure out what the heck is going on, and is there anything we can do? What was really, really sad about it was that I'd actually written papers on sepsis, and the papers had gotten a fair amount of attention. And then I lost my nephew to sepsis. I kind of realized, like, “Damn it, I’d done all this work academically, but what was it actually doing to make it out in the field?” And it went from a theoretical exercise of writing papers and, you know, doing work that my peers and colleagues thought was great, to actually solving the problem. It definitely gave me a very real honesty around making sure what we were building was going to work in the real world.

Gary Marcus: After years of hard work, Dr. Saria and her team got to test their ideas in the real world. Starting in 2018, they began a two-year study of her system at five hospitals. This wasn’t a randomized trial, but the results were promising. The system correctly identified 82% of the sepsis cases and, crucially, it detected those cases earlier and reduced the time before antibiotics were administered by almost two hours – a potentially critical difference in a fast-moving disease.

Overall, her team estimated that the system reduced mortality by about 18%. Results like these, if they hold up, could someday help save hundreds of thousands of lives. Dr. Saria’s approach has been slow and painstaking; she’s been at it for eight years. It’s about as far away from the overnight magic that IBM promised as you could imagine. It’s all hard-won lessons about the journey from having a demo to having a real product that’s useful in the real world. She calls it her road map.

Step one is just getting good-quality data. It was the same thing that Dr. Caroline and MSK faced, for the same reason: health records aren’t built for making medicine better; they are built for insurers.

Suchi Saria: Financial codes are extremely, like, poor in terms of fidelity of capturing the real sepsis cases. So it's garbage in, garbage out, right? If you have poor-quality targets to learn from, you're just gonna learn systems that don't make any sense.

Gary Marcus: So even figuring out who actually has sepsis is a challenge. And it’s only once you get good data that you can get to step two: figuring out whether your algorithms really work.

Suchi Saria: The second piece of it was, then, as we started to improve the quality of the metrics we were using to measure performance, that uncovered for us all sorts of ways in which the system itself was actually performing poorly.

Gary Marcus: False alarms are a big problem in systems like this; so are misses, when a system fails to detect sepsis when it is really there. You can’t really afford to make either kind of mistake. And then, once you think you have nailed that on your first set of data, you get to step three: moving from the lab to the real world.

Suchi Saria: When you go from your ideal setup, which is a lab data environment where you've collected data very meaningfully, you've cleaned it up and only taken records where there's enough data, to operating in the real world, where someone in the emergency department has very little data… we had to think about learning strategies that can operate in these kinds of, like, highly variable data environments.

Gary Marcus: So you’ve sorted out that mess. You’ve dealt with the fact that everybody’s recording data in different ways. Now you have to deal with the pesky human beings that use your system, who don’t appreciate all the beauty of the algorithms you’ve developed, and who don’t care about all the hours you spent cleaning data either. They have their own problems to deal with. 

Dr. Saria’s solution was to send alerts through the hospital's own medical record system. That way, anytime a doctor or nurse checked the records, they could receive a warning about possible sepsis. In a test trial, health care providers paid attention to 89% of the alerts, a promising sign that her system is beginning to get critical, real-world traction.

Suchi Saria: Then there was a question of how are we gonna deploy it in a way that builds provider trust? And here we had to really understand the healthcare environment within which we are deploying.

Gary Marcus: Step four is trust. What makes sense to an engineer building a system might not make sense to a customer using it. Here the customers, doctors and nurses, are busy; time is an issue.

Suchi Saria: They don't have time to try to spend energy learning about your machine learning system in order to be able to ingest it as part of their workflow. So we had to think, you know, so I started working on this idea of human machine teaming. How do we build machine systems that amplify human experts? 

Gary Marcus: Remember that dream of having systems that simply look at a person’s medical file, and out comes a personalized treatment plan? Right now that’s just not possible. The AI for doing that, for reading patients’ notes, correctly interpreting them, and understanding them in the context of the medical literature, just isn’t far enough along.

So any real world medical AI system has to be built with humans in the loop. And that means it’s not enough just to craft some new machine learning algorithm; you have to craft your software to be something that busy humans can trust and actually want to use. 

Step five. You don’t just need an algorithm; you need a whole ecosystem. 

Suchi Saria: I've spent the last four to five years doing work in this space on safety, reliability, and a huge part of like, for us being able to safely deploy where thousands of providers are trusting our system was then having those processes in place that would allow us to do real-time understanding of safety efficacy.

Gary Marcus: In your TED Talk you use the word grit, and I think that's part of what you're talking about here. To get this stuff done, you need grit. You can't just be like, “I got a good idea.” You have to go through these processes to make it work.

Suchi Saria: Yeah, I think you need at least a decade plus of grit. I don't think there is a shortcut here, and I think anyone who tells me that there is a shortcut, uh, nine outta 10 times, I wanna say maybe 10 outta 10 times, you know, when you probe deep, they don't have it figured out. 

[MUSIC]

Gary Marcus: In medicine, as in life, you can’t always just look up the answers. Every patient is different… Some of what doctors do is basically just remembering stuff, but some of it is about reasoning, thinking about what might follow from what you already know in a situation you haven’t encountered before.

David Ferrucci, the guy who built the original Watson, the version that won at Jeopardy, had much the same take as I did. Watson was great for Jeopardy, but that didn’t mean it necessarily would work for cancer. I asked him a bit more about this, and he started talking about human cognition. 

David Ferrucci: We make decisions in two ways. You know, we use our, our statistical experience, if you will. We go back and we say, “Well, you know, I've been in this situation, you know, five times or 10 times.”

Gary Marcus: Statistical experience is one way we make decisions. The other way is more like reasoning. Making inferences, deduction, logic, that sort of thing. Watson was pretty good at the statistics, but it wasn’t able to reason. To illustrate the difference here, Dave told a powerful story about his own father.

David Ferrucci: You know, he had a cardiac arrest while in a restaurant at a party. The ambulance came, brought him to a hospital. It obviously was a disaster, a horrible nightmare experience. But at one point, while they were trying to keep him alive, the doctor, a resident, comes out to me and says, “You know, bad news, you're gonna have to sign a do not resuscitate.” “Why?” “Because your dad is brain dead.” I say, “How do you know he is brain dead?” And the resident says, “Well, there's a 98% chance that he's brain dead.” “How so?” “People who come in under similar circumstances, 98% of them turned out to be brain dead.” And I said, “So there's a 2% chance he's not brain dead.”

Gary Marcus: That’s just the statistics. But this was life or death: Ferrucci wanted to know that the doctor was reasoning based on his father’s specific case.

David Ferrucci: Basically in not-so-many words I was telling him, “You need to give me a deductive, logical argument of why my father in particular is brain dead. Show me evidence about him, not a statistical argument.”

[MUSIC]

David Ferrucci: But when you're that individual, you want an explanation. When you're treating a patient, when you're diagnosing a patient, you need that explanation. He was not brain dead, no brain damage at all, it turned out, 24 hours later. So he was in the 2%, but I had to be able to distinguish between the two kinds of decision-making.

Gary Marcus: Watson could dredge up vast amounts of statistical information, but it couldn’t really do the reasoning part, the type of thinking we use to make careful, deliberate decisions.

In January 2022, IBM announced that it was selling much of what was left of its Watson health-care business. Here’s Casey Ross, from STAT News.

Casey Ross: In essence, Watson was sold for its parts. It wasn't sold as a whole; it was just sold for the data assets that underlay the AI.

Gary Marcus: Presumably the answer to, “Did those patients have any idea?” is no. 

Casey Ross: No, I mean, so many steps down the line, right? Their data is now held by a private equity company. Did you give your data initially to the institution that collected it so that it could be, you know, sort of purchased eventually by a private equity company and monetized for a 10x value? Is that why you gave your data to your provider?

Gary Marcus: IBM overpromised; the medical arm of Watson collapsed. But that was hardly the first or the last time that happened in the history of AI. In the coming episodes we will show you how to separate hype from reality, and help you understand what the latest advances in AI do and do not mean for business, our jobs, and our daily lives. 

[TAPE MONTAGE]

“You get into a car that literally has no driver in it. It’s amazing.”

“Look, I hate to be the Debbie Downer of technology, because I'm a roboticist and I'm a futurist and I want to see this technology, and I really, really, really wish they could get it together in 6 months before my daughter gets her license, but they're not going to. And telling people that they can be hands-free is wrong.”

Gary Marcus: I’m your host, Gary Marcus, and this is Humans vs. Machines.
