EPISODE 410
Don’t Fear the Power of AI, Leverage It - Dr. Nigam Shah, Chief Data Scientist at Stanford University
08-30-2023
Transcript
Shiv Gaglani: Hi, I'm Shiv Gaglani. We've learned a lot on Raise the Line about the ever-growing amount of data available to healthcare providers and how that can seem overwhelming to them. Well, today we're going to look at how that data can be analyzed and made useful in providing care, among other purposes.
Joining me is Dr. Nigam Shah, chief data scientist at Stanford Healthcare and a professor of Medicine and Biomedical Data Science at Stanford. His research group analyzes multiple types of health data to answer clinical questions, generate insights, and build predictive models for the learning health system. At Stanford Healthcare, he leads artificial intelligence and data science efforts for advancing the scientific understanding of disease, improving the practice of clinical medicine, and orchestrating the delivery of healthcare.
Dr. Shah is also an inventor on eight patents and patent applications, has authored over 200 scientific publications, and has co-founded three companies. Before we get started, I'd like to thank Morgan Cheatham, who's a fellow medical student, entrepreneur -- and a venture capitalist in his case -- who first introduced me to Dr. Shah.
So, Dr. Shah, thanks for taking the time to be with us today.
Dr. Nigam Shah: Absolutely. It's a great pleasure to be here.
Shiv: You know, we always like to ask our guests to, in their own words, tell us how they got to where they are today. So, how'd you get interested in both being a physician and then data science and biomedical informatics?
Dr. Shah: That's a great question in the sense that the right answer here is part luck and part being at the right time. So, in my case, when I finished med school, one of our family friends who got his PhD in the U.S. convinced me to try my hand at research and said if you don't like it, you come back to your residency and you can be an orthopedic surgeon, which I was going to be. That was in 2000, and that's when the use of computation for genomic sequencing sort of hit the front-page news everywhere -- The New York Times, Cell, Science, and so on. And I got really excited about like, oh, this is new. How could I use a computer to do better biology or better science of medicine?
So, I ended up doing a PhD in molecular medicine and along the way, I convinced my committee to essentially do a minor in computer science. When I finished, everybody said doctors who use computers and like reasoning systems and such go to Stanford. So, you should go there. And so, I came here in 2005 and never left.
Shiv: That's a good decision. We actually recently had Lloyd Minor, the dean of Stanford, on the podcast. And clearly, it's the right place, right time. There's so much innovation going on there. Let's talk a bit about your history. Like, what are some of the things you've worked on once you joined Stanford in terms of biomedical informatics? And then how has that evolved? Plus, if you can comment on your actual physician career...like, you got your MBBS, did you practice? I would love to hear more about that, too.
Dr. Shah: Sure, sure. So, I did the one year of internship that's required to complete the MBBS degree. I did my three months in internal medicine, surgery, OBGYN, and primary care. Other than that, I have not practiced after that. So, I wouldn't trust myself to treat myself at this point.
In terms of research, in 2005, I was very interested in using reasoning engines to make sense of molecular data, and then along the way, I discovered that the amount of data available in structured form -- and as someone from Elsevier, you'll appreciate this -- is very limited. Most of it is text. And so, from 2005 to 2010, roughly speaking, I spent time working on knowledge representation and text processing techniques in order to find data and extract facts from it. Then in 2010, I went on the job market and got a faculty position, and applied all that I'd learned from 2010 to 2015 to pharmacovigilance -- extracting things from the electronic health record, as opposed to just biomedical text, which is freely available. Being at a medical center, we had access to the EHR.
I got tenured in 2015 and then I said, “Well, what would be the ultimate use of information extracted from the EHR?” And at least the answer I arrived at was to make better decisions for the next patient that walks in the door. So, from 2015 to 2020, we worked on something called the Green Button Project, where we ran a bedside service. Given a case, we would go through all the EHRs that Stanford had and apply everything I had learned from text processing -- statistics, reasoning engines, cohort building, phenotyping, and all of that informatics toolkit, so to speak -- and provide a report to the physician treating the patient about “what happened to patients like mine.” It took a while to build all the tech and infrastructure, but in the end we ran that service for a year and a half. We did 100 cases, wrote a paper, and then spun out a company, Atropos Health, which actually Morgan, who you just referred to, is a huge fan of.
So, I went from tenured to full professor. I'd done two projects -- pharmacovigilance and this Green Button thing -- and I was like, “All right, what do I do next?” I wrote up a two-pager on how data science can improve a healthcare system and showed it around to my department chair, Dean Minor, and our CEO, and they said, “Well, why don't you do it?” And so that led to the chief data scientist job starting in 2022.
Shiv: That's incredible. What a journey. And I do want to talk a little bit about kind of the very interesting set of hats you wear, right? You're a researcher yourself, you run a lab; you're in leadership as chief data scientist; you're the founder of three companies. So what does an average day in the life look like for you, if you can even summarize something?
Dr. Shah: You have to be very disciplined with time. As I often tell my students, time is the only non-renewable resource. You can lose a car, you can lose money, you can lose clothes and a passport and a phone or whatever. Everything can be replaced. Time cannot be replaced. So prioritization of time is sort of something I'm really a stickler about.
You probably experienced that in scheduling this as well, right? I was like, “No, not now. Later.” So, I think that is sort of the secret sauce: prioritization, being able to say no to good, exciting things so that you can focus on what you're currently working on. So, a typical day...I actually don't work that many hours. In fact, I like to watch TV with my kids and go to bed by 10:30 p.m. on most days. But the times I am working, it's focused on a specific thing, time bound and deadline driven.
Shiv: Yeah, that's great advice. One of my mentors, a CEO coach, used to say to me that ‘no is the amplifier of yes.’ Another famous person from the Bay Area, Steve Jobs, was known for saying you've got to say no to a thousand things and get that discipline, which then amplifies the core signal from the noise. Another favorite quote that I actually learned on this podcast from an edtech entrepreneur was, ‘your timing is perfect if you stick around long enough.’
You've been at AI and biomedical informatics for decades now. There's been an explosion, clearly, since November 2022's release of ChatGPT. It's in the zeitgeist. There's a lot of hype around it. Can you talk to us about this moment in AI and generative AI? Help us think about it as current and future clinicians. Is it more hype? What are the real applications you're seeing, given all this context that you have that many of our learners and listeners don't?
Dr. Shah: Absolutely. I love talking about this part. We're not on video, but I'm smiling because outside my office hangs a stained-glass plaque that says SUMEX-AIM, which stands for Stanford University Medical Experimental Computer for Artificial Intelligence in Medicine. It's from 1980. So, the world's first supercomputer for AI in medicine was on our campus in 1980.
I recently had the opportunity to write a short history of AI -- a ‘how we got to now’ book chapter. We had a little plot showing the different hype cycles that have occurred in AI to date. This is the third hype cycle; there have already been two prior AI winters. For those who are insiders in this space, the thing that is most shocking is the sheer number of humans caught up in the hype this time. Previous hype cycles were not this big in magnitude. Today, it's probably 20% to 30% of the US that is caught up in the hype; previously, it was in the low single digits. So that, I think, is what is unique about this time around.
Obviously, the other things are we've never had computers this powerful, and we've never had this much electronic data. You put those two together and I think we're at a point where there's a good chance that this time it sticks. I still say chance, because we've been down this road twice. We've had two hype cycles, and that's one of the reasons I'm a huge proponent of systematically verifying the claimed benefits of, in this case, language models, AI, or what have you...whatever is the flavor of the day. Because often what happens is that people try one thing -- you give one complex case to GPT and it gives you a plausible-looking answer -- and we conclude that, “Oh, GPT can treat complex cases.” That's not quite how it works.
The analogy I would love to plant in the listeners' heads is think about driving. We have a human, and we have a complex gadget, a machine, a car, basically. And the way we do it, we send the human to take a multiple-choice test at the DMV. We take the car, and we send it to the transportation safety board, and they do a whole bunch of tests -- rollover testing, crash testing, what have you -- and then we put the two together, and we do a road driving test.
But in the case of language models, what we're doing is basically taking the car, sending it to take the multiple-choice exam, and saying it's fit to drive. That's basically what we're doing when we take GPT-4, have it take the USMLE, and say now it can practice medicine.
Shiv: That's funny. That's a really good analogy and a good way to think about it. And you preempted my next question. I was catching up with Morgan a couple of months ago, and I had read the paper that you and he wrote, along with other people, the preprint of which you published in December of last year. So, ChatGPT comes out in late November 2022, and you'd already published a paper in December that got a lot of attention -- especially from us at Osmosis and other education tech companies -- showing that GPT-3.5, and then eventually 4, performed really well on USMLE Step 1 practice questions.
Obviously, we're all looking at GPT-5 and where it's going to go from here. People were extrapolating: does this mean we won't need trained doctors? How should we select med students? And I agree with you that there was a lot of hype from that, which I'm sure you and Morgan and others wanted to tamp down, as you have been doing here. But I would love your opinion -- having trained as a physician yourself -- on what medical schools should be doing right now. Does it still make sense to select students who can take tests really well? Because the people who are getting into med school now will be practicing in the late 2020s and early 2030s. By then, we'll have GPT-7 or other LLMs that you and your colleagues will create.
Dr. Shah: Yeah. So, I think the question to ask is what can we do with this technology, rather than what can this technology do to replace parts of what we do. I'd love to give sort of another, you know, colorful analogy. One of our colleagues here, Dr. Erik Brynjolfsson, has this idea of what he calls the Turing Trap. Most people have heard of the Turing Test, where a computer passes itself off as a human, and a lot of people would say GPT-4 has passed the Turing Test. The Turing Trap is where we limit our imagination to having computers or AI models only do those things that humans already know how to do. So we basically automate stuff we know how to do, like automate billing, automate history taking, automate note taking, and so on. That's the Turing Trap.
What we should be asking is: what is it that this human and a computer together can do that neither of them could do alone? Imagine if the Greeks had automated everything they did 3,500 years ago, or if medicine had automated everything we were doing even 200 years ago -- bloodletting would be automated. We'd have a machine that would do bloodletting, right? But we didn't fall into that trap.
So, I think what we need to be thinking is: today we have primary care doctors with a panel of, say, 1,500 to 2,000 patients. Can these things enable a primary care doctor to have a panel of 5,000 patients? Can they reprioritize work so that when my radiologist colleagues come to work on Saturday morning and there are 140 x-rays to read, the normal ones are at the bottom of the pile?
So, instead of getting caught up in ‘oh, we're going to put doctors out of business,’ the mindset should be: what are the things that I can completely offload from a doctor? And that is different from having a human in the loop. The obvious answer everybody gives is we'll just have a human in the loop. It's like, no, no...that actually increases my work. We're not going to check every damn thing this produces. Focus on taking 30% off my job -- history taking, for example, or translating instructions into the reading level or the language the patient expects.
In terms of education, right now most med students end up spending $4,000 or $5,000 buying teaching material and USMLE question banks, so to speak. Why can we not generate those questions? Why can't everybody have questions generated on demand based on the last ten questions you answered and what you got wrong? That is the way we can leverage these technologies to help us train better, as opposed to being fearful of them taking over our jobs.
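To make the on-demand question idea concrete, here is a minimal sketch of how a prompt for an LLM question generator might be assembled from a learner's last ten answers. The record schema, topics, and prompt wording are illustrative assumptions, not any product's actual implementation:

```python
# Hypothetical sketch: build an LLM prompt targeting the topics a learner
# most recently got wrong. The history format and template are assumptions.

def build_prompt(history: list[dict]) -> str:
    """history: the learner's last ten questions, each a dict with
    'topic' and 'correct' keys (an assumed schema)."""
    missed = [h["topic"] for h in history if not h["correct"]]
    return (
        "Write one USMLE-style multiple-choice question with five options "
        "and a short explanation, targeting these weak topics: "
        + ", ".join(missed)
    )

if __name__ == "__main__":
    last_ten = [
        {"topic": "beta-blocker pharmacology", "correct": False},
        {"topic": "renal physiology", "correct": True},
        {"topic": "acid-base disorders", "correct": False},
    ]
    print(build_prompt(last_ten))  # this string would be sent to an LLM API
```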
Shiv: Yeah, I love that nuance, and I'd never heard that term Turing Trap. It reminds me of a talk Vinod Khosla gave on Stanford's campus a decade ago, in which he famously said that 80% of what doctors do will be obviated or replaced in, I think, a decade or by 2030. He was a bit ambitious with the timing, but the media also misinterpreted it and said 80% of doctors won't be needed by then, which clearly is not what he was saying and not the case. The role of a clinician will change -- as you're saying with the car analogy, we still want the driver.
However, there was this paper on human-computer interaction that came out recently that I'd love your opinion on. They were looking at radiologists and had three groups: radiologists diagnosing images alone, AI diagnosing images alone, and radiologists diagnosing images with AI. Counterintuitively, the third group performed the worst, because the radiologists in that group didn't understand how to use the AI: they were overruling it when they shouldn't have, or putting too much stock in it when they shouldn't have. The other two groups performed better and, I think, almost equivalently with the current models we have. Do you have any thoughts or commentary about those pitfalls to look out for...things like human-computer interaction or bias or any of that?
Dr. Shah: I think it builds on the point we were just making. In that scenario, you're creating a dyad of a human and a computer, but the algorithm does something which the human then has to check. It's like somebody giving you advice which you then have to verify before deciding whether to trust it. One, it increases your work, and two, if you're not sure about the quality of the advice, it confuses you. And that's exactly the point I was making: if we use AI in a way that it reads through the 100 images and says, “These twenty I'm sure of; you don't need to worry about them. Spend your time on the other eighty,” that's actually helping me. Otherwise, it's providing me with irrelevant, potentially wrong, or hallucinated information that I now have to do extra labor to spot.
Even in the work we did with Morgan about GPT-3.5 and 4, the headline result was that in about half of the responses, twelve doctors could not agree whether the GPT response agreed with the prior known answer, disagreed with it, or was wrong. No majority. So, now imagine if you sought a second opinion from somebody and you showed it to twelve people, and the twelve can't agree. That second opinion is useless because it's confusing.
So, I think this gets back to this whole idea of doing the driving test, or what I would call functional testing: if we use it in practice, is it delivering the value we had hoped for? If not, maybe we should change the manner in which, or the place in the workflow where, we use these things.
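To ground the triage idea Dr. Shah describes -- the model takes confident normals off the pile rather than adding a verification step to every read -- here is a minimal sketch in Python. The probability field, cutoff, and study IDs are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Study:
    study_id: str
    p_abnormal: float  # model's predicted probability the image is abnormal

# Hypothetical cutoff: only very confident "normal" calls are deprioritized,
# so false reassurance stays rare.
CONFIDENT_NORMAL = 0.02

def triage(worklist: list[Study]) -> list[Study]:
    """Reorder reads: likely-abnormal studies first, confident normals last.

    The point is to take work off the reader's plate, not to add a
    verification step for every image.
    """
    uncertain = [s for s in worklist if s.p_abnormal >= CONFIDENT_NORMAL]
    confident_normal = [s for s in worklist if s.p_abnormal < CONFIDENT_NORMAL]
    uncertain.sort(key=lambda s: s.p_abnormal, reverse=True)
    return uncertain + confident_normal

if __name__ == "__main__":
    reads = [Study("xr-001", 0.91), Study("xr-002", 0.01), Study("xr-003", 0.40)]
    for s in triage(reads):
        print(s.study_id, s.p_abnormal)
```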
Shiv: Absolutely. That nuance is critical. I think a lot of the discussion on AI has been in the abstract: ‘this is going to discover all these new drugs and cure cancer and global warming’ and all this. And that goes for the negative hype too: ‘AGI is going to kill humans the first chance it gets.’ We'll get into some of that. That's abstract. Then it goes academic -- obviously you and several others publish papers, which are academic -- but you also have this role as chief data scientist at Stanford Healthcare.
What are some of the ‘boots on the ground’ actual applications that you've led at Stanford Healthcare right now that you're most proud of -- that are lowering costs, improving quality, whatever it is -- and it doesn't have to be gen AI. It could just be something that you're proud that you guys worked on over the last decade.
Dr. Shah: So, the example I will use is something very pedestrian. It's a classifier, a supervised learning classifier. Now, there are press articles out there that say we trained a deep learning model to do what I'm going to talk about -- and we did train a deep learning neural net for that -- but we're not using it. The thing we're using is much simpler: a gradient boosted model. But the crux of what makes it work is that, given the model's prediction -- and in this case, the model predicts the chance that somebody dies in the next twelve months -- you have to be absolutely clear about what you're going to do and who is going to do it. Do they have the capacity and the incentive to do it? Is the cost structure of the intervention such that we can sustain it long term? And do the various stakeholders, patients included, agree with it?
So, we started using this classifier for predicting who is likely to pass away in the next twelve months -- we built three of those models and published a bunch of papers. Then when we deployed it, we worked with the physicians who lead our serious illness conversation planning and those who lead palliative care practices, and we worked with an ethicist to survey all the stakeholders: who should be shown the model output, and what would they do with it? We redesigned the downstream care workflow so that it's not just the physician who has to take action. A respiratory therapist or a nurse practitioner, or in some cases a medical student, can pick up the serious illness conversation guide and have that conversation with the patient. And then we watch it. We're flagging 80 or 100 patients a day, we're triggering these alerts for people to have those conversations, we've taught them what to do -- are they doing it?
There's a quality metric requiring advance care planning to be done for a certain percentage of the patients who need it -- maybe 10%, 15%, I forget the exact threshold -- and we made sure we were following through at a high enough rate to cross that threshold. So, you have to do all of these: you have to do the model, you have to do the workflow, you have to manage capacity, throughput, and follow-through, and you have to watch it so that if it's not working, you can shut it off.
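As a rough illustration of the kind of pipeline described here -- a simple supervised classifier whose output feeds a capacity-limited workflow trigger -- consider this sketch. The features, threshold, and daily cap are assumptions for illustration, not Stanford's production values:

```python
# Minimal sketch: a gradient boosted classifier predicting 12-month mortality
# risk on synthetic stand-in data, wired to a capacity-limited daily flag list.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in features, e.g., age, comorbidity count, recent admissions, albumin.
X = rng.normal(size=(5000, 4))
# Synthetic label standing in for "died within 12 months".
y = (X @ np.array([0.8, 0.6, 0.5, -0.7]) + rng.normal(size=5000)) > 1.5

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

RISK_THRESHOLD = 0.7   # set with the care team's capacity in mind
DAILY_CAPACITY = 100   # ~80-100 flags per day, per the episode

risk = model.predict_proba(X_test)[:, 1]   # P(death within 12 months)
order = np.argsort(risk)[::-1]             # highest risk first
flagged = order[risk[order] >= RISK_THRESHOLD][:DAILY_CAPACITY]
print(f"{len(flagged)} patients flagged for a serious illness conversation")
```

The design point Dr. Shah makes is that the threshold and cap belong to the workflow (who acts, with what capacity), not just to the model's ROC curve.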
Shiv: That's a great example, and hopefully there's a case study around that, or you've published it.
Dr. Shah: Yeah, we have. There's one in NEJM Catalyst, and we're also getting a HIMSS Davies Award for doing this, in their view, the right way.
Shiv: That's awesome. I think a lot of health systems and physicians and leaders are confused as to how to incorporate this stuff, and that's why I think you most recently published this paper in JAMA -- we're talking on August 10th, you published it August 7th -- Creation and Adoption of Large Language Models in Medicine. Do you just want to spend a minute describing what that paper was getting across, and the applications for our learners and listeners?
Dr. Shah: So, you know, everyone has most likely tried ChatGPT by now. ChatGPT is a software application built on top of one of two LLMs -- GPT-3.5 or GPT-4 -- and you can pick in the user interface which one you want to use. Now, when these things are first built -- and this is from OpenAI's own website -- when GPT-3 was built, if you gave it an instruction saying ‘explain the moon landing to a six-year-old,’ it would respond with ‘explain gravity to a six-year-old,’ because those are the most probable words given what you just said. A human then had to align the output with what we expect by teaching it, and that's called adaptation or tuning, depending on how you do it.
In one case, you show it the right answer, which a human types out, and say, “Look, this is the better answer. Don't say what you just said.” In the other, you let it produce three or four answers, pick the best one, and iterate -- that's called Reinforcement Learning from Human Feedback. Once that was done, we got the magic that's called ChatGPT.
The point here is that none of these things have been instruction-tuned for medicine, but we expect them to work out of the box -- and why would they? And so, the core essence of the article is a call to action to the medical community: if we really want to use these things, we have to create the instruction-tuning data so that they produce the output we expect. So, creation and adoption -- or shaping the creation and adoption -- of language models in medicine.
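To make "creating instruction-tuning data" concrete, here is a minimal sketch of what medicine-specific (instruction, response) pairs could look like, serialized in a common JSONL fine-tuning format. The examples and file name are illustrative; the clinical answers would need to be written and vetted by clinicians:

```python
import json

# Illustrative (instruction, response) pairs for medical instruction tuning.
# Responses here are placeholders, not vetted clinical guidance.
examples = [
    {
        "instruction": "Explain heart failure discharge instructions at a "
                       "sixth-grade reading level.",
        "response": "Weigh yourself every morning. Call your care team if "
                    "you gain more than three pounds in a day...",
    },
    {
        "instruction": "Summarize the contraindications for metformin for "
                       "a new intern.",
        "response": "Avoid metformin in severe renal impairment "
                    "(eGFR < 30)...",
    },
]

# One JSON object per line -- a shape most fine-tuning pipelines accept
# in some variation.
with open("medical_instruction_tuning.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```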
Shiv: Yeah, very important, because, again, there's so much confusion around this, and people are misapplying these models without knowing their limits or potential applications.
I'm curious, going beyond AI and healthcare, there's been a lot of hype -- positive and negative, as I said -- around AGI, Artificial General Intelligence. You have pioneering neural network researchers like Geoffrey Hinton who have said AGI is going to end the world, that there's a lot of risk here. On the other side, you've got Yann LeCun, who's at Meta, who has said, no, this is all blown out of proportion. Same with Marc Andreessen at Andreessen Horowitz. Where do you fall in this? Do you have a strong opinion around AGI and timelines and risk, or is it too early to tell?
Dr. Shah: I don't have a strong opinion. What I have is a dose of skepticism because of the logical inconsistencies in what is being said. Here's the inconsistency: to all the ones saying that these things are so bad they might end humanity -- turn off your APIs. They won't do that, because they all want to make money. At the same time, those same people are going in front of Congress saying, “Oh, this is too dangerous, you should regulate it.” To me, that stinks a little bit like a power grab, because then you'll turn around and say, “This is so dangerous that only we can be trusted to use it, and hence everybody else has to use our APIs.” That is the classic definition of establishing a natural monopoly. So, that's why I don't trust it: if the fear is real, why do you have the server on?
Shiv: Yeah. Regulatory capture is definitely one way people have looked at it. That could actually be what's happening. The other piece is, I think they would say, if we don't do it and we have this race, not only will that hurt our profits and our shareholder value, but China will do it, and then what happens? It's like a game theory issue.
Dr. Shah: Right. Forty years ago, it was Russia, and now it's China. That's why I'm a little bit skeptical: we're seeing a lot of polemic stances from people because they either have a lot to gain or a lot to lose, and I think the nuance is getting lost. Even when Travelocity and Expedia came around, there was this panic that people would book their flights to the wrong San Jose and there would be pandemonium across the planet. It didn't happen. So, part of it, I think, is overblown. I'm quite sure both sides are overblowing it for different reasons, and the truth will land somewhere in the middle.
Shiv: Yeah, like most things. That's a really good example. I have two last questions for you. The first is, you said you have children. A lot of our listeners are early-stage clinicians approaching their careers. What advice would you give to any of them -- your children or our listeners -- about approaching their careers when things seem to be changing so incredibly quickly?
Dr. Shah: Well, two things. One, learn how to learn, because whatever you do your undergraduate degree in is not going to be your job by the time you graduate. That I can guarantee. The majors that will exist ten years from now are not even defined yet. The boundaries between the classical disciplines are blurring. You can't tell a civil engineer apart from a mechanical engineer, a structural engineer, or an architect, right? So, get over the notion of classic boxes and labels.
And the second point is, don't be scared of technology. These things are there for us to use to our advantage. I'm sure when Microsoft Word and computers and typewriters came around, everybody said, ‘it's the death of calligraphy, and people will never learn how to write.’ Again, overblown fear. Google comes around, or calculators come around, and people say, ‘oh, everybody's going to forget how to count.’ Again, overblown. So, that same cycle will repeat.
People will say, ‘oh, if you start using GPT at a very young age, you're not going to develop these other faculties.’ I think our school of education disagrees. In fact, there are articles and research out there showing that the earlier you're exposed to these things, in a controlled manner, the better off you are in terms of having a plan for how to deal with them. And a lot of teachers are worried about students using these things to cheat, but imagine what Khan Academy is doing. They're creating personalized tutors so that everybody has an on-demand tutor that meets them at their level. That is amazing. So, we've got to proactively pick the amazing and stay away from all the fearmongering.
Shiv: Yeah, that's great advice and certainly close to our heart as a company that's also trying to provide an on-demand tutor to our learners that's personalized and adaptive. I agree with a lot of what you've said there. The last question is an open mic: is there anything else that you want our audience to know about you, about AI or Stanford that we haven't gotten a chance to talk about yet?
Dr. Shah: I would say don't get over-indexed on AI. Today, it's artificial intelligence and deep neural networks; tomorrow it'll be sideways learning and upside-down intelligence. I mean, who knows? Keep your eye on what it is that you're doing with it, and ask the question: why are you doing whatever it is that you're doing? Don't get too hung up on the how.
Shiv: Yeah, that's a great reminder not to be a hammer in search of a nail, but rather understand the problems deeply -- whether that's a patient you're seeing or the education system -- whatever it may be. So, that's some really wonderful advice.
Dr. Shah, I've really appreciated this conversation. You're obviously pioneering a lot of this work, and it's been fun to see everything you've done so far and what you're going to do over the coming months and years.
Dr. Shah: Well, thanks for having me.
Shiv: And with that, I'm Shiv Gaglani. Thank you to our audience for checking out today's show, and remember to do your part to raise the line and strengthen our healthcare system. We're all in this together. Take care.