Fuzzy Meets Exact: The Synergy between Large Language Models and Biophysics

How Large Language Models can enhance our scientific understanding

Jun 16, 2023

In this edition:

I explore the synergy between large language models (LLMs) and biophysics models to advance scientific understanding
When an LLM is given access to knowledge, it may help generate biophysics models to simulate disease mechanisms
Conversely, LLMs can potentially provide “readable” explanations of complex biophysics models, aiding scientists in understanding and hypothesis generation
Combining the “fuzzy” reasoning of LLMs with the “exact” knowledge capture of biophysics models may open up new avenues of scientific discovery and understanding

Read time: 4 minutes

Welcome to the second edition of Transforming Med.Tech, the newsletter in which I share my exploration of technical innovations that I believe will change medicine, science, and healthcare. A big thanks to the 50+ subscribers who have joined since this newsletter launched a week ago! In this edition, I’m diving into the recent hype around ChatGPT-like chatbots and their potential to help us understand disease better.

The world of artificial intelligence continues to push the boundaries of scientific understanding. I believe an underexplored area with potentially great promise is the synergy between large language models (LLMs) and biophysics models. Biophysics models, like those I used for cardiac arrhythmia modeling, play a crucial role in understanding complex biological behaviors and can offer valuable insights into various scientific fields. Could LLMs impact the development and understanding of biophysics models?

Contrasting worlds: Biophysics models vs. Large Language Models

Biophysics models represent a junction between the fields of biology and physics, offering a powerful approach to studying complex biological systems. By employing mathematical equations and computational simulations to analyze the physical properties underlying biological interactions, these models help researchers uncover crucial insights into disease. I have seen how they create a deeper understanding of the fundamental principles governing disease by capturing our (presumably) exact understanding of biophysical interactions.
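To make "mathematical equations and computational simulations" a bit more tangible, here is a minimal sketch of a single excitable cell. It uses the FitzHugh-Nagumo equations, a textbook two-variable simplification of an action potential, and not the detailed cardiac models I worked with. The parameter values, the approximate resting state, and the function name simulate are assumptions chosen for illustration.

```python
# Toy single-cell model (FitzHugh-Nagumo); illustrative, not a validated cardiac model.
A, B, EPS = 0.7, 0.8, 0.08            # classic FitzHugh-Nagumo constants (assumed here)
V_REST, W_REST = -1.1994, -0.6243     # approximate resting state for these constants


def simulate(v0, t_end=100.0, dt=0.01):
    """Integrate one cell with forward Euler, starting from membrane variable v0."""
    v, w = v0, W_REST
    trace = [v]
    for _ in range(int(t_end / dt)):
        dv = v - v ** 3 / 3 - w        # fast, self-amplifying term drives the upstroke
        dw = EPS * (v + A - B * w)     # slow recovery variable ends the action potential
        v, w = v + dt * dv, w + dt * dw
        trace.append(v)
    return trace


if __name__ == "__main__":
    for label, v0 in [("weak stimulus", -1.0), ("strong stimulus", 0.0)]:
        peak = max(simulate(v0))
        print(f"{label}: peak v = {peak:.2f}")
```

A weak stimulus decays back to rest, while a strong one triggers a full action potential. That all-or-nothing behavior is nowhere written down explicitly: it emerges from two short equations, which is what I mean by a model capturing exact knowledge.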

Large language models (LLMs), on the other hand, are much fuzzier. LLMs are trained on large bodies of text, from which they learn how to reason (by predicting the most likely next token), but they do not reliably retain the knowledge captured in all those texts. You'll need to give them access to knowledge (e.g., by letting them search the internet) if you want accurate answers:

I definitely know to which LLM I’ll turn for some compliments… (Note that not everything is correct though, my methods are not based on Bayesian MAP estimation.)

Thus, LLMs may be excellent reasoning frameworks (albeit overconfident at times) but fail at retrieving knowledge if not provided explicit access. Biophysics models, on the other hand, arguably cannot reason - but can capture (exact) knowledge through the mathematical representation of biological processes. A synergy between the “fuzzy” reasoning of LLMs and the “exact” knowledge capture of biophysics models may open up new avenues of scientific discovery and understanding.
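For readers curious what "predicting the most likely next token" means mechanically, the sketch below is a deliberately tiny stand-in: it counts which word follows which in a made-up corpus and returns the most frequent follower. A real LLM replaces these counts with a large neural network, so treat this only as an illustration of the idea. The corpus and the function names train_bigrams and most_likely_next are hypothetical.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real LLM trains on vastly more text with a neural network.
CORPUS = "the heart pumps blood . the heart beats fast . the cell fires ."


def train_bigrams(text):
    """Count, for every token, which tokens follow it."""
    counts = defaultdict(Counter)
    tokens = text.split()
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def most_likely_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    followers = counts.get(token)
    return followers.most_common(1)[0][0] if followers else None


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    print(most_likely_next(model, "the"))         # most frequent follower in CORPUS
    print(most_likely_next(model, "arrhythmia"))  # token that never appeared in CORPUS
```

The model stores statistics about word order, not facts about the heart. Ask it about a word it has never seen and it has nothing to offer, which is the toy version of why an LLM needs explicit access to knowledge.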

The potential synergy between biophysics models and large language models.

From generating code to shaping biophysics models

A corner of the veil has already been lifted: LLMs have gained significant attention for their ability to generate code. As they are surprisingly good at the problem-solving tasks required for coding, they could potentially contribute to the creation of biophysics models through coding. But that would mean they need access to the knowledge to be captured by such models, either through input from a user or through explicit access to a knowledge source.

Currently, LLMs may get access to such knowledge through web search or dedicated databases; e.g., Google's Bard and Microsoft's Bing can give more accurate responses to your queries because they can access the internet, as opposed to OpenAI's ChatGPT. But capturing exact knowledge may still be challenging. Some promising results were obtained with physics-informed neural networks, but those currently work only for small networks (far from the size of LLMs) and for relatively simple mathematical equations.
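As a rough sketch of how a physics-informed neural network tries to combine the two worlds, its training objective is commonly written as a data term plus a physics term. The symbols below are my own notation, not taken from a specific paper: u_θ is the network, (x_i, u_i) are measurements, N[u] = 0 is the governing equation, x_j are points where that equation is checked, and λ weighs the two terms.

```latex
\mathcal{L}(\theta) =
  \frac{1}{N_d} \sum_{i=1}^{N_d} \left| u_\theta(x_i) - u_i \right|^2
  + \lambda \, \frac{1}{N_r} \sum_{j=1}^{N_r} \left| \mathcal{N}[u_\theta](x_j) \right|^2
```

The first term rewards fitting the data (the fuzzy part), and the second penalizes any violation of the known equation (the exact part).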

To create a biophysics model incorporating more extensive knowledge, an LLM would need to tap into the vast scientific literature on the topic. It would need to be able to deal with conflicting literature; after all, no two scientists ever obtain exactly the same result in an experiment… But if the LLM sufficiently understands disease mechanisms, it could come up with a mathematical equation that captures them, similar to how it can already generate code to solve programming challenges. While still in its early stages, the idea of LLMs automating the generation of biophysics models represents an exciting development. And since LLMs were recently used to design a working tomato-picking robot, this idea does not seem far-fetched.

How LLMs can help understand biophysics

An alternative and equally promising approach is using LLMs to provide “readable” explanations of complex emergent properties found in existing biophysics models. Say we provide an LLM with access to code that captures a biological phenomenon, for example, the propagation of electrical activation in the heart. The LLM can then be probed with questions that help the user/scientist understand what is currently modeled. What part of the model causes the upstroke of an electrical action potential, for example? Such probing questions could be particularly helpful to scientists entering a new field. LLMs can already do that for some well-known physical problems:

Google Bard explaining the equation F=ma.

… and gives at least a general answer to my probing questions on a detailed cardiac action potential model:

LLMs might also help generate new hypotheses that scientists can test with models, further advancing knowledge in the field. Let it reason about electrical activation of the heart being impacted by scar or ion channel abnormalities (both of which we understand relatively well), or challenge it to reason about combinations of such abnormalities (for which we still lack insights). Its reasoning may help to come up with new hypotheses that we can test in the model or in the experimental wet lab. Ultimately, some even say LLMs may gain a true scientific understanding, although that still seems far away.
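To picture what "testing a hypothesis in the model" could look like, here is a hedged toy sketch: a strand of FitzHugh-Nagumo cells in which a wave either crosses a scar region or gets blocked. Everything here is a simplifying assumption for illustration. The scar is represented as weakened cell-to-cell coupling, the ion channel abnormality as a scaling of the fast current, and the function name wave_crosses and all parameter values are hypothetical. A real cardiac model is far more detailed.

```python
# Toy 1D strand of FitzHugh-Nagumo cells; illustrative, not a validated cardiac model.
A, B, EPS = 0.7, 0.8, 0.08            # classic FitzHugh-Nagumo constants (assumed)
V_REST, W_REST = -1.1994, -0.6243     # approximate resting state for these constants


def wave_crosses(excitability=1.0, scar_coupling=1.0,
                 n_cells=30, scar=(12, 20), dt=0.05, t_end=150.0):
    """Return True if a wave started at the left end activates the last cell.

    excitability  scales the fast current (hypothetical stand-in for an
                  ion channel abnormality)
    scar_coupling scales cell-to-cell coupling inside the scar (0 = no coupling)
    """
    v = [V_REST] * n_cells
    w = [W_REST] * n_cells
    for i in range(5):                 # stimulate the five leftmost cells
        v[i] = 1.0
    # g[i] couples cell i to cell i + 1; links inside the scar are scaled
    g = [scar_coupling if scar[0] <= i < scar[1] else 1.0
         for i in range(n_cells - 1)]
    for _ in range(int(t_end / dt)):
        new_v = v[:]
        for i in range(n_cells):
            left = g[i - 1] * (v[i - 1] - v[i]) if i > 0 else 0.0
            right = g[i] * (v[i + 1] - v[i]) if i < n_cells - 1 else 0.0
            fast = excitability * (v[i] - v[i] ** 3 / 3 - w[i])
            new_v[i] = v[i] + dt * (fast + left + right)
            w[i] += dt * EPS * (v[i] + A - B * w[i])
        v = new_v
        if v[-1] > 0.0:                # last cell fired: the wave got through
            return True
    return False


if __name__ == "__main__":
    for exc in (1.0, 0.6):
        for coup in (1.0, 0.2, 0.0):
            print(f"excitability={exc}, scar coupling={coup}: "
                  f"conducts={wave_crosses(exc, coup)}")
```

With the scar fully uncoupled the wave cannot pass by construction, and in the unmodified strand it travels to the far end. The intermediate combinations are the interesting ones: an LLM could propose which combination it expects to block conduction, and a sweep like this checks that reasoning against the equations.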

The potential synergy

The synergy between large language models and biophysics models has the potential to transform our understanding of complex biological processes and expedite scientific discoveries in the field. Of course, connecting fuzzy LLM reasoning with precise biophysics modeling presents a challenge. But by equipping researchers with the ability to create, understand, and reason about biophysics models more effectively, we can expect to witness significant advances in our collective scientific understanding with the help of these powerful tools. And these are not hollow phrases: stay tuned for the next newsletter, where I will explore this by asking LLM-fueled chatbots to create and explain code that captures electrophysiology. Will they succeed?