Is Artificial Intelligence the Answer to Health Equity? - Munjal Shah, CEO of Hippocratic AI
Shiv Gaglani: Hi, I'm Shiv Gaglani. We've explored a variety of approaches to training more healthcare workers to fill existing and predicted gaps on the Raise the Line podcast, as well as ways to reduce demand on the system through improving health. Today, we're going to come at this issue from a different angle and take a look at how generative AI might help fill the need for a wide variety of roles -- from cardiologists to nutritionists to genetic counselors -- and improve healthcare access and equity as a result.
My guest is Munjal Shah, the co-founder and CEO of Hippocratic AI, a new startup that recently raised $50 million from General Catalyst and Andreessen Horowitz to build a safety-focused, large language model specifically for the healthcare industry. Munjal is a serial entrepreneur, startup advisor, and investor who has built and or helped build numerous other technology and AI startups.
He has a master's in computer science focused on AI from Stanford and a bachelor's in computer science from the University of California, San Diego, where his senior thesis focused on the use of AI in the design of new therapeutics.
So Munjal, thanks for taking the time to be with us today.
Munjal Shah: Hey, thanks for having me.
Shiv: So, you have quite an impressive background, and we always like to ask our guests in their own words to describe what got them interested in the technology space in your case and then healthcare.
Munjal: You know, technology for me started very early. I learned to program at age eight, which back then was pretty unusual, and just loved programming, loved video games. There was a guy across the street from me who worked for Atari at the time, and if you mowed his lawn, you got to play the newest games. He had the best lawn in the neighborhood because I was over there all the time. I just, you know, absolutely fell in love with technology at a very early age.
Healthcare was a different journey for me. I had a machine learning computer vision company called Like.com, L-I-K-E, that we sold to Google. The very next day after we sold the company, I had chest pains and ended up in the ER. My father had his first heart attack in his mid-forties. I was only thirty-seven at the time. It didn't end up being a heart attack, but it was something pretty serious. And so I really got control of my health, lost thirty or forty pounds, actually took endocrinology classes at Stanford in the School of Medicine just for fun. Like, I audited them, and I loved it. I actually was like, “Man, had I done this before, maybe I would have become a doctor instead of a computer scientist.”
Then I spent the last ten years running a Medicare Advantage-related company. So, it just ended up that, you know, these two parts of my life happened to come together now in building Hippocratic.
Shiv: That's awesome. Well, I'm sure you would have been a great doctor, but I'm glad you've chosen the tech entrepreneur role because you're obviously making a big impact in healthcare. So, let's talk about Hippocratic AI because ChatGPT and large language models have been all anyone seems to talk about since they were released in November of last year. You're positioning Hippocratic AI as a safety-focused, large language model, which acknowledges the concerns about using LLMs to provide people with medical advice. Can you tell us more about those concerns and how you're specifically addressing them and making your LLM different?
Munjal: Yeah, look, we started with the name: Hippocratic. Next, we went to the tagline, ‘do no harm.’ And then the third thing is we're going to be restricting the product when it comes out. It's not going to be able to do diagnoses. If you try to get it to do diagnosis, it's going to say, ‘I'm sorry, I can't do diagnoses.’ We'll try to make sure we put in all the checks, but you've used ChatGPT and it will tell you ‘I won't comment on certain things,’ or ‘I won't talk about certain things.’ There are ways to make these models have the safety elements to them.
But then beyond that, there's a number of key features that we're building into the model to really allow it to be safe. The first and foremost element is not focusing on diagnosis. I think people who are not in healthcare don't realize how many other roles there are and how much labor we use doing a ton of other things. I don't think American healthcare and the health of Americans is struggling because we are not diagnosing them well enough. It's struggling largely because once diagnosed, we're not able to effect the change in their overall health for largely chronic conditions, and I think that's the real opportunity.
There's no diagnosis in that. It's reminding you to take your medications, reminding you to go in for that checkup, reminding you to go in for that procedure, making sure the kind of social determinants of health are addressed. You know, “Hey, do you have enough to eat today? Should we call the local food shelter and ask them to drop off some food to you? Do you need a ride to your appointment? Is that why you didn't end up coming?” So, I think that people outside of healthcare don't realize how much of healthcare is everything that comes after the diagnosis.
Shiv: Yeah, no, absolutely. I mean, one of the themes we've been addressing on this podcast is provider burnout because of all the ancillary things they're supposed to be doing -- the administrative burden -- which many of our listeners experience daily, which is needing to submit billing codes, needing to submit health insurance reimbursement for authorization forms. Is that the type of stuff you guys are tackling first? Like, what is your initial product going to look like?
Munjal: Well, let me back up and actually give you some of the features of how we're making it safe, and then I'll tell you kind of the first use cases. So, there's a bunch of things we're doing to make it safe. First, we're training it on additional evidence-based content, because we really want it to not just have the stuff that's on the internet, because some of what's on the internet is evidence-based, and some of it's how, you know, drinking almond milk cured your heart disease, which is just not true. And so-
Shiv: You aren't crawling Dr. Oz's site, are you?
Munjal: No. We wanted to get evidence-based content. The second thing we did was we wanted to certify the system. So, we actually identified 114 medical certifications. These are the tests and certs that nurses go through, and dentists go through, and pharmacists and pharmacy techs and lactation consultants and genetic counselors go through to get certified to be able to do that job in the real world.
We're like, look, language models should have the knowledge to be able to pass those tests. And we showed that not only did we pass them, we passed them better than GPT-4 and many other models as well. So, we wanted to make sure it was smart enough, right? It had the right knowledge, and it had evidence-based knowledge. And then we said, look, it's not enough to just be smart in building a better healthcare large language model -- and notice I say healthcare model, not medical large language model, because we're really focused on healthcare at large, not just medicine...in fact, we're less focused on medicine -- the second part was, hey, it's got to have good bedside manner.
I think we came to realize that's not only about being empathetic; there's also a major paradigm shift that occurs because of one new fact: running a language model will cost about eighteen cents an hour, okay? And actually that'll probably be five cents an hour in the next year, and that includes voice recognition costs and other things. So, now when you have a system that's that inexpensive, I can spend an hour talking to the patient about anything they want to talk about. I can be talking to that vet about that battle in Desert Storm where he got injured, and he wanted to share what really happened from his perspective. And great, because part of the health journey for a lot of people is sharing their own health history and what happened to them. In today's world, you know, the nurse can't do that. She's got to get on the next phone call, and so she'll be like, “Hey, I'm just calling to remind you to take your medications, and do you need a ride to your next appointment on Tuesday?”
Then administrators are like, “Huh, you know, we have no patient engagement. We really need to work on patient engagement.” I'm like, that's because we have no patient relationship, because we had no time. And it's not their fault. But now with this thing, we have almost infinite time. So, we need to rethink what we can do with this. I call this bedside manner with a capital B, which is rethinking the entire bedside process. When you have discharge instructions, why do I give you the whole week's worth of instructions at once? You can't remember them. Even if I write them down, you're like, “Well, what was I supposed to do...change the whole bandage on Wednesday or on day three?” Why don't we just call you on day three and tell you exactly what to do on day three and day four? Because we can't afford it. We just don't have the bandwidth. But at 18 cents an hour, we absolutely can, and so the second thing is bedside manner.
The third thing is there's a technique in large language models called RLHF. It stands for Reinforcement Learning from Human Feedback. It's what most people believe was used to go from GPT-3 to ChatGPT, and what made it talk so well. It's basically humans grading the model's responses, and then you feed that back into the model.
We say, great, but you know what, can we use medical professionals to do that? And can we use them per function? Can we have genetic counselors do it for the cases where our system is trying to play the role of a genetic counselor? Can we have chronic care nurses do it for places where we're playing that? And how about we don't even launch this until a super majority of those people say, “Hey, this is ready to go.” So, we're actually letting the industry, these health professionals themselves, determine readiness of the system, and that's part of our commitment to all of this.
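As a rough illustration of that per-role readiness gate, here is a minimal Python sketch. The role names, data shape, and two-thirds threshold are all assumptions for illustration, not Hippocratic's actual pipeline:

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Rating:
    role: str          # e.g. "genetic_counselor" -- the role the model was playing
    certified: bool    # reviewer holds the matching real-world certification
    ready: bool        # reviewer judged the model's responses ready for use

def readiness_by_role(ratings, supermajority=2/3):
    """Group ratings by role and report whether each role's certified
    reviewers reached a supermajority verdict of 'ready'."""
    by_role = defaultdict(list)
    for r in ratings:
        if r.certified:  # only count reviewers with the matching certification
            by_role[r.role].append(r.ready)
    return {
        role: sum(votes) / len(votes) >= supermajority
        for role, votes in by_role.items()
    }

# Invented sample data: two of three genetic counselors say ready,
# but only one of two chronic care nurses does.
ratings = [
    Rating("genetic_counselor", True, True),
    Rating("genetic_counselor", True, True),
    Rating("genetic_counselor", True, False),
    Rating("chronic_care_nurse", True, False),
    Rating("chronic_care_nurse", True, True),
]
verdict = readiness_by_role(ratings)
```

With this toy data, the genetic counselor role clears the two-thirds bar while the chronic care nurse role does not, so only the former would "ship."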
The last part is we're working with a bunch of healthcare organizations and actually putting up safety and governance committees and working very closely. Our first release won't just be an open API for everybody to use. It's actually more of a closed beta where we're working very closely with a number of health systems that have joined our founding partner program to just get it right and just make sure nobody gets hurt in this process. I'll pause there for a second and then I'm happy to talk about the use cases.
Shiv: Yeah, I'd like to. I could pull on any one of those points and threads, but the third one about training the model based on specific roles and then having those people in those roles assess whether it's ready to go...I'm curious, obviously you have investors and have a lot of LPs or healthcare systems, so you probably have those great connections, but does that mean you're actively contracting with like 500 genetic counselors, for example? Because one question we like to ask our guests is how can our audience maybe get involved or contribute if they're interested?
Munjal: We will be recruiting folks to come on. We'll probably do it in kind of an Uber app sort of way that you can do it on the side, on the beach, in between appointments, you know. And we'll pay. I mean, we won't ask you to do it for free, but we'll pay people for their time to be able to give us those answers. But yes, we'll be recruiting them.
We will be checking their certifications, because we want to make sure we have people with that exact certification and that exact role in real life giving us the feedback. We already have quite a number working, and we'll just be adding many more as we work through this.
Shiv: Yeah, that makes sense, and I've seen the benchmarks on your site. Very impressive in terms of how your existing models already perform relative to GPT-4 and PaLM and other models on your healthcare-specific certifications. Hopefully that'll address the hallucinations. What about the bias aspect of large language models? I'm curious, do you have any thoughts on that?
Munjal: Yeah, actually, on our site, you can see we've actually run an initial set of bias testing and showed that we were less biased than GPT-4. We said, all right, let's identify an underrepresented group, let's identify the healthcare issues that are overrepresented in that group, and then let's assess the knowledge level that our system has on those things. Because if you don't know a lot about certain treatments, and those treatments occur in a certain group, then you are going to end up not being able to treat that group as well. So, we built that out.
Then we built another whole set of questions that are just straight bias sort of testing questions and we put that up there as well for folks to see. I don't think we're done. I don't think we've done enough there, honestly. But we at least put that out there saying, “Hey, day one, right when we're starting, we're thinking about this, we're working on this, we're trying to make sure.”
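A toy version of that knowledge-gap check might look like the following Python sketch. The group tags, accuracy numbers, and gap threshold are invented for illustration; the real evaluation is surely more involved:

```python
from collections import defaultdict

def bias_probe(results, max_gap=0.05):
    """results: list of (group_tag, answered_correctly) pairs, where the
    tag marks questions about conditions overrepresented in a given group.
    Computes per-group accuracy and flags any group whose accuracy trails
    the best-scoring group by more than max_gap."""
    by_group = defaultdict(list)
    for group, correct in results:
        by_group[group].append(correct)
    accuracy = {g: sum(v) / len(v) for g, v in by_group.items()}
    best = max(accuracy.values())
    flagged = sorted(g for g, a in accuracy.items() if best - a > max_gap)
    return accuracy, flagged

# Invented example: 9/10 correct on general questions, but only 7/10 on
# questions about conditions overrepresented in one group.
results = (
    [("general", True)] * 9 + [("general", False)] * 1 +
    [("sickle_cell_related", True)] * 7 + [("sickle_cell_related", False)] * 3
)
accuracy, flagged = bias_probe(results)
```

The point of the check is exactly the one made above: if the model knows less about the treatments a group disproportionately needs, it will serve that group worse, and that gap should be surfaced before launch.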
This brings me to like, actually, the big idea here, which is, we are not doing this just to say let's help health system XYZ save some money on their labor. That's not why I'm here. I'm not even here to say let's help health system XYZ fill their staffing gap, okay? I'm here to get to what I call the idealized staffing. Honestly, I don't have a good word for this so actually, if you do, I'd love you to help coin a term.
Let’s look at it this way: 51% of American adults have one chronic disease. 25% or so have two or more. Do we provide a chronic care nurse to all 25% or even all 51% to help them manage? No. Could we? No, we can't afford it. Even if we could afford it, are there that many chronic care nurses or nurses period? No. Like, the math doesn't work. And so part of our vision is we want to really solve health equity for everybody. We need a system that's not a zero-sum game. Right?
Right now we're taking from Peter to pay Paul, so to speak. There's only so many nurses to go around. But now we can have an infinite number, almost, and we can keep all the nurses we have. In fact, we should. We may have them do more physical things that the language models can't do. What we should do is utilize our nurses in a more effective way at the top of their license, and then use the language models to handle all of the kind of more mundane activities that can be done.
Think of pre-op questions. Pre-op questions are fairly mundane, right? “Can I eat before? When should I stop taking this? Am I supposed to shave my leg before the knee surgery or will you shave it?” But that takes a lot of time today. There's a lot less risk in pre-op questions than post-op questions, but what would happen in the world if we could truly have 30 million nurses? How much would America's health improve? That's the question I want to ask, and that's the vision we're after: how do you use this technology to deliver a whole other tier of SLA -- service level agreement, to use a computer science term -- one we've never seen before and could never afford before. And that's how we solve health equity, because then we'll have a surplus and we'll be able to help every single group, and we won't just be taking from one group to give to another. That's what we're excited about.
The best answer on hallucinations is to pick the use cases where that hallucination is less of an issue. Just touching on all that, that's why one of our first use cases is the chronic care nurse. That’s mostly a follow-up activity. It's not typically a diagnosis activity. It's about booking appointments. It's about just having an open-ended conversation where maybe a senior will bring up something that they wouldn't have brought up otherwise. It's about checking in more regularly and, you know, even some education, right? “You need to take that medication.” “Why? It makes me dizzy. I don't like it.” “Well, you know, here's why” and like helping to motivate some of that.
Shiv: Totally. I love all that. On this podcast we've talked about how we need to raise the line and strengthen our healthcare system...get more clinicians trained. The WHO estimates, I think, a worldwide shortage of 18 million healthcare professionals by 2030. And you're taking that to the next level of everyone having their own personal kind of concierge. Similar to our previous guest Sal Khan, who gave that TED Talk about everyone having a personal tutor. We've all been talking about this ever since the smartphone came out -- everyone could have a personal tutor -- but it was different then, where you still had to be very self-directed and use an app to learn something. Whereas here, with an AI kind of system that maybe pushes content out to you and knows you personally -- everything about you -- it's a totally different era of user interface and computing.
Munjal: Yeah. We call it “healthcare in every home.” We should have healthcare in every home. Everybody should be able to have it in their language at any time they want. We're not calling it “doctor in every home” because we don't think it's ready to be a doctor, but we do think some of these other things are quite useful. And there's just a ton of things we can do that are not diagnoses.
I mean, even explanation of benefits and billing. Have you ever really understood your healthcare bill? I work in the industry and I still can't understand the damn bill. Then when I call up about it, I don't always get the same answer if I call twice and talk to two different customer service reps. But, you know, these language models can memorize the 200-page plan details on every single plan in the country, and they can reason across it and say, yes, that would be covered in this plan, but not covered in your old plan. You probably saw the GPT-4 demo where it reasoned across the tax code. I mean, if these things can reason across the tax code, they can definitely reason across healthcare policies and reimbursement.
So, anyways, that's what we're looking at: how do we use this to just massively increase healthcare capacity? Not just to fill the gap, but to take it to a level where we truly improve outcomes for not just the few most expensive patients, right? Right now you give chronic care nurses to the top 2-3% most expensive patients. What about the other 48% that have a chronic disease? Oh, well, we don't have enough nurses. But really, it's also ROI negative, by and large, at $90 an hour, to call them. It has to be at a different cost scale.
Shiv: Totally. I'm also curious...it's early days and there's a lot you can just do with single mode, text-based large language models. Are you guys already exploring or thinking about multimodal for your LLM? And if so, what are some of the use cases you're thinking of?
Munjal: Well, so there's two ways you can say the word multimodal. One is you're inputting images to be able to do analysis of an X-ray or some sort of imaging. We're not doing that. We think, again, you need that for diagnosis. You don't really need that for most of these. But we are doing both text in and out and voice in and out. And we think the voice in and out is going to be particularly important for dealing with the senior population. Voice is more convincing. You can compel somebody to act better when you're talking to them than you can when you're chatting with them. That's just the truth of it, but it's typically so expensive that we've all moved to these other mediums.
The other thing about voice that we can do is...we're actually building what's called a clip model. And a clip model is a form of a large language model that has two inputs to it. It has both the text coming in and it has another file format, in this case, a WAV file. And so we'll actually put both in so that it can tell the difference between, “My back hurts,” and “My back hurts,” because they should respond differently, right? The WAV file will show that intonation, while the speech-to-text won't show any difference between those two statements I just made. So, it's only with the development of a clip model that we get tone detection and can then learn how to respond properly to the tone.
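To make the "same words, different intonation" point concrete, here is a toy Python sketch. Everything in it is invented for illustration -- a real system like the one described would learn tone end-to-end from the WAV input rather than use hand-set pitch and energy thresholds:

```python
def classify_tone(pitch_hz, energy):
    """Toy tone classifier over per-frame pitch/energy features.
    Stand-in for what a learned model would extract from the raw WAV."""
    mean_pitch = sum(pitch_hz) / len(pitch_hz)
    mean_energy = sum(energy) / len(energy)
    if mean_energy > 0.7 and mean_pitch > 220:
        return "distressed"
    return "neutral"

def respond(transcript, pitch_hz, energy):
    """Same transcript, different audio features -> different response."""
    tone = classify_tone(pitch_hz, energy)
    if tone == "distressed":
        return "That sounds serious -- let's talk about your pain right now."
    return "Noted. How long has your back been bothering you?"

# Identical text; only the (invented) acoustic features differ.
calm = respond("My back hurts", [180, 190, 185], [0.3, 0.4, 0.35])
urgent = respond("My back hurts", [240, 260, 250], [0.8, 0.9, 0.85])
```

The speech-to-text output is identical in both calls; only the audio channel carries the distinction, which is the whole argument for the dual-input model.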
Shiv: Yeah, that's incredible. The ways you can get health information and meta health information from voice we've covered before. We've had companies on like Suki and Ellipsis and Abridge in the past. And that actually leads to my next question, which is that things are changing every week, really every day. There are new models coming out. A couple of weeks ago, AutoGPT was like a big thing. Nobody really knows what's going to shake out of the next couple of months, let alone years. The big tech companies are all coming into this. Microsoft is huge with Nuance, and there's Epic. You're a very well-funded, well-backed company with incredible talent. How do you kind of see the ecosystem shaking out? Those are big competitors -- Apple, Microsoft, Google -- all trying to do the same thing. So, how do you see a Hippocratic kind of emerging -- or an Abridge or Ellipsis as well -- in a very crowded space?
Munjal: Yeah, so the good news in healthcare-focused LLMs is it's not that crowded. There's not a lot of people building them. If you go out and try to get an API today that's a healthcare-focused LLM...there's not one actually that you can buy and use. I think it's early. But the number one thing is specialization. We're just getting really, really specialized. You'll see us put out some benchmarks against some of the current healthcare-focused LLMs. There's only one really -- MedPaLM and MedPaLM 2 -- but most people don't realize it actually wasn't trained on any additional healthcare data. It was instruction tuned, as they call it, for the USMLE. You know what that means in real life? That means they sent it to Princeton Review for the USMLE. It's not that they gave it new content; they just said, we're going to really teach to the test, so to speak.
So, nobody has trained one of these yet. We’ll probably be the first one in the market. But you're right, there's some big guys out there. The Nuance stuff is for ambient listening. There isn't a default large language model thing. The announcement from our friends at Epic was to use a generic GPT-4 or GPT-3.5 -- I don't remember which one they announced. So nobody's built a vertical one.
Nobody's thought through things like bedside manner to get it right. Nobody's thought through things like, don't do batch mode, do just-in-time delivery of information after an operation, like we talked about earlier. There's a very different set of features to make this effective. I think that in the end, we're just at the beginning, and we're the only ones doing RLHF with these professionals and saying we're not shipping till it's ready. Nobody else has said that, and as far as I know, they're not planning to do that at the moment.
I think our safety focus hopefully will differentiate us as well. But they're focused the other way. Actually, a lot of times they seem to be very focused on doing more diagnoses. MedPaLM 2's team launched the ability to upload imaging. I'm like, what? Why are you doing that? Like, you're going more in the direction of diagnoses? I don't think this stuff's safe enough to do diagnoses with. Like, I really don't.
Shiv: Yeah. You've been in the tech space for a while. There was the SaaS wave, and then vertical SaaS became the way to win. Vertical AI is kind of what you guys are doing and it seems like a really thoughtful approach.
I want to be respectful of your time, so I only have a couple other questions. First is, as you know, Osmosis is a teaching company. We do health education. We have an audience of millions of current and future healthcare professionals, including genetic counselors and critical care nurses and chronic disease nurses. There's two questions on this: one is, if we could teach any group of people anything related to what you're doing in AI and large language models, what topics would you want us to build -- whether it's a video or a course -- and why?
Munjal: Interesting. I think that looking at burnout in this industry, I think there's a very unique opportunity to almost create a new way to use ChatGPT to augment and reduce the effort in your job for all healthcare professionals. There's this notion that the chronic care nurse will just reach out automatically, that's one level. And then there's the large language model as your ‘mini-me,’ in the sense of being able to help you with certain tasks. And I think people need to just even know when you use it, how would you ask it? What would you put in the prompts? What things do you need to do to get the right result? There is quite a technique, right, to actually navigating and utilizing a large language model. And there's a technique to doing it well for healthcare.
Even when ours comes out, you will probably also need to still interact with it correctly, or at least, you know, with some additional skills. Like in the early days when search came out we all had to learn how to use the search engine properly. There was actually a skill that was developed. Now, it's all second nature to us, we don't even remember that we learned a skill. But I think there is a skill to use these and then to remember to use these at the right time so that you can save time and, you know, hopefully, reduce burnout in this sector. So, I think that there's just a lot of opportunity to increase productivity and frankly, help people just have less burnout.
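As a hedged example of the kind of prompting technique being alluded to, one common pattern is to template a healthcare prompt with an explicit role, a hard safety constraint, and an output format. The template, context, and wording below are entirely hypothetical:

```python
# Hypothetical prompt template: an explicit role, a safety constraint, and an
# output format tend to produce more predictable results than a bare question.
TEMPLATE = """You are assisting a chronic care nurse. Do NOT diagnose; \
if asked for a diagnosis, decline and suggest contacting a clinician.
Patient context: {context}
Task: {task}
Respond in plain language a patient can understand, in under 100 words."""

prompt = TEMPLATE.format(
    context="68-year-old with type 2 diabetes, metformin 500 mg twice daily",
    task="Draft a friendly reminder about tomorrow's A1C check.",
)
```

The payoff of templating is exactly the skill-transfer point above: once the structure is worked out, a busy professional fills in two fields instead of re-learning prompt craft on every call.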
Shiv: Yeah, absolutely. Of course, you'll need to build, like, “Here's how you use a language model. You don't write these prompts; you write those prompts, or you would maybe rephrase that prompt in this way.” And also contextual, in-app guidance on how to do the right prompting.
The second to last question is, a decade ago, I was where you are in Palo Alto and I was listening to a talk by Vinod Khosla, who famously said 80% of what doctors do will be obviated in the next -- I forgot what he said -- decade or twenty years. He got a lot of flak for that, basically because people misinterpreted it as saying 80% of doctors won't be needed in ten or twenty years, which is not what he was saying. Clearly, it's not what you're saying as well. What advice would you give to our audience of future healthcare professionals about pursuing their careers? Like, should somebody be a radiologist in this day and age? Or, you know, what should they be doing and thinking about? What basic general career advice do you have for them?
Munjal: I really like to not give advice. This is something I don't do. I don't know that I understand enough about what happens here, but I would just say that I think with each new technology, the statements of how it's going to, you know, destroy the past are just always exaggerated, and never really quite come true the way we thought they're going to come true. For some reason, with generative AI, but also with a lot of technologies, people just want to take the limit of X as it goes to infinity and be like, “This is what's going to happen.”
When you look historically, people's predictions are super inaccurate. Like, super inaccurate. And I don't know why everybody gets all uppity about it when, if you look historically, almost every single one of these things had a twist or turn and did not turn out like we thought it was going to turn out. I think we can't predict this at all. I mean, does the world need any fewer pilots now that we have autopilot in planes? Like, nope, didn't happen. We all thought self-driving cars were going to eliminate every truck driver in the country. I don't know...that kind of didn't work that way.
So, I think that there are real benefits this is going to bring, but they're not the portents that other people have put out there. They just ain't going to happen that way. They never do. There's always some major twist. I don't know why people can't recognize that they were wrong the other five times and they still prognosticate the sixth time as if this time they know for sure. I'm like, “You don't know anything.”
Shiv: Yeah. There’s a Mark Twain quote, which is “rumors of my demise have been greatly exaggerated.”
Munjal: That's what I was trying to remember. But yeah, I think let the cards play. We still need to understand this technology. We still need to understand where it goes. I think we all are excited about it. There's genuine interest there, but there's a long road here. We don't really understand it. Where is it going to end up? What new job is it going to create? Would you have predicted that a lot of online marketing was going to move from ad words to influencers? Like, no, that was a brand-new thing that just emerged. Now there's a whole job category called influencer. That was not a job before. So, I don't know. Things change.
Shiv: Yeah. One reason I love having guests on who are CEOs running companies is that there's another quote -- I love quotes as Michael knows -- which is the best way to predict the future is to create the future so why not just go ahead and start building? That's something I like to encourage our audience to do, is join companies like Hippocratic AI, if there's openings, and start contributing.
Munjal: If I can plug it, come to our site. We're hiring. I mean, we did have a ton of people come to us after our launch, but, you know we are hiring certain folks on the clinical side, as well as folks on the computer science side.
Shiv: Very cool. Yeah, definitely. We'll put that in the show notes to get people to look at the site for that. Well, with that, Munjal, thanks so much for taking the time to be with us today and more importantly, for the work you're doing to hopefully reduce burnout among our audience over the coming years.
Munjal: Awesome. Hey, thank you for having me.
Shiv: I'm Shiv Gaglani. Thank you to our audience for checking out today's show, and remember to do your part to raise the line and strengthen our healthcare system. We're all in this together. Take care.