This podcast features Gabriele Corso and Jeremy Wohlwend, co-founders of Boltz and authors of the Boltz Manifesto, discussing the rapid evolution of structural biology models from AlphaFold to their own open-source suite, Boltz-1 and Boltz-2. The central thesis is that while single-chain protein structure prediction is largely “solved” through evolutionary hints, the next frontier lies in modeling complex interactions (protein-ligand, protein-protein) and generative protein design, which Boltz aims to democratize via open-source foundations and scalable infrastructure.

Full Video Pod On YouTube!

Timestamps

* 00:00 Introduction to Benchmarking and the “Solved” Protein Problem
* 06:48 Evolutionary Hints and Co-evolution in Structure Prediction
* 10:00 The Importance of Protein Function and Disease States
* 15:31 Transitioning from AlphaFold 2 to AlphaFold 3 Capabilities
* 19:48 Generative Modeling vs. Regression in Structural Biology
* 25:00 The “Bitter Lesson” and Specialized AI Architectures
* 29:14 Development Anecdotes: Training Boltz-1 on a Budget
* 32:00 Validation Strategies and the Protein Data Bank (PDB)
* 37:26 The Mission of Boltz: Democratizing Access and Open Source
* 41:43 Building a Self-Sustaining Research Community
* 44:40 Boltz-2 Advancements: Affinity Prediction and Design
* 51:03 BoltzGen: Merging Structure and Sequence Prediction
* 55:18 Large-Scale Wet Lab Validation Results
* 01:02:44 Boltz Lab Product Launch: Agents and Infrastructure
* 01:13:06 Future Directions: Developability and the “Virtual Cell”
* 01:17:35 Interacting with Skeptical Medicinal Chemists

Key Summary

Evolution of Structure Prediction & Evolutionary Hints

* Co-evolutionary Landscapes: The speakers explain that breakthrough progress in single-chain protein prediction relied on decoding evolutionary correlations, where a mutation at one position necessitates a compensatory mutation at another to conserve the 3D structure (a toy sketch of this signal follows the list).
* Structure vs. Folding: They differentiate between structure prediction (getting the final answer) and folding (the kinetic process of reaching that state), noting that the field is still quite poor at modeling the latter.
* Physics vs. Statistics: RJ posits that while models use evolutionary statistics to find the right “valley” in the energy landscape, they likely possess a “light understanding” of physics to refine the local minimum.
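To make the co-evolution signal concrete, here is a minimal sketch of the classic mutual-information scan over alignment columns. The five-column alignment is invented for illustration, and real pipelines use richer pairwise models (e.g., Potts models / direct coupling analysis) and learned representations rather than raw MI, but the underlying hint is the same: columns that mutate together suggest residues that touch in 3D.

```python
from collections import Counter
from itertools import combinations
from math import log

# Toy multiple sequence alignment (MSA): one homologous sequence per row.
# Columns 1 and 4 co-vary (E pairs with K, K pairs with E), mimicking the
# compensatory-mutation pattern that hints the two residues are in contact.
msa = [
    "AEGCK",
    "AEGCK",
    "AKGCE",
    "AKGCE",
    "SEGCK",
    "SKGCE",
]

def column(i):
    return [seq[i] for seq in msa]

def mutual_information(i, j):
    """MI between alignment columns i and j (natural log, no corrections)."""
    n = len(msa)
    pi = Counter(column(i))
    pj = Counter(column(j))
    pij = Counter(zip(column(i), column(j)))
    return sum(
        (c / n) * log((c / n) / ((pi[a] / n) * (pj[b] / n)))
        for (a, b), c in pij.items()
    )

L = len(msa[0])
scores = sorted(
    ((mutual_information(i, j), i, j) for i, j in combinations(range(L), 2)),
    reverse=True,
)
for mi, i, j in scores[:3]:
    print(f"columns {i}-{j}: MI = {mi:.3f}")  # top pair should be 1-4
```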
The Shift to Generative Architectures

* Generative Modeling: A key leap in AlphaFold 3 and Boltz-1 was moving from regression (predicting one static set of coordinates) to a generative diffusion approach that samples from a posterior distribution.
* Handling Uncertainty: This shift allows models to represent multiple conformational states and avoid the “averaging” effect seen in regression models when the ground truth is ambiguous (see the toy demonstration after this summary).
* Specialized Architectures: Despite the “bitter lesson” favoring general-purpose transformers, the speakers argue that equivariant architectures remain vastly superior for biological data due to the inherent 3D geometric constraints of molecules.

Boltz-2 and Generative Protein Design

* Unified Encoding: Boltz-2 (and BoltzGen) treats structure and sequence prediction as a single task by encoding amino acid identities into the atomic composition of the predicted structure.
* Design Specifics: Instead of a sequence, users feed the model blank tokens and a high-level “spec” (e.g., an antibody framework), and the model decodes both the 3D structure and the corresponding amino acids.
* Affinity Prediction: While model confidence is a common metric, Boltz-2 focuses on affinity prediction, quantifying exactly how tightly a designed binder will stick to its target.

Real-World Validation and Productization

* Generalized Validation: To prove the model isn’t just “regurgitating” known data, Boltz tested its designs on 9 targets with zero known interactions in the PDB, achieving nanomolar binders for two-thirds of them.
* Boltz Lab Infrastructure: The newly launched Boltz Lab platform provides “agents” for protein and small molecule design, optimized to run 10x faster than the open-source versions through proprietary GPU kernels.
* Human-in-the-Loop: The platform is designed to convert skeptical medicinal chemists by allowing them to run parallel screens and use their intuition to filter model outputs.
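The “averaging” failure mode flagged under “Handling Uncertainty” above is easy to see with a toy example (invented numbers, not from the episode): if a flexible residue genuinely occupies two conformations, the mean-squared-error-optimal point prediction is their average, a geometry the molecule never adopts, while a model that samples from the learned distribution lands on one real mode or the other.

```python
import random

random.seed(0)

# Hypothetical 1-D "coordinate" of a flexible residue with two conformations:
# half the time near -1.0, half the time near +1.0.
def ground_truth():
    center = -1.0 if random.random() < 0.5 else 1.0
    return random.gauss(center, 0.05)

observations = [ground_truth() for _ in range(10_000)]

# A regression model trained with mean-squared error converges to the mean
# of the data -- here ~0.0, a conformation the molecule never actually adopts.
mse_optimal = sum(observations) / len(observations)
print(f"regression (MSE-optimal) point estimate: {mse_optimal:+.3f}")

# A generative model that has learned the distribution samples from it, so
# every draw lands near one of the two real modes instead of between them.
for _ in range(3):
    print(f"generative sample: {ground_truth():+.3f}")
```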
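For context on the “nanomolar binders” result: a dissociation constant Kd maps to a binding free energy through the standard textbook relation ΔG = RT ln(Kd) (1 M reference state), so nanomolar affinity corresponds to roughly -12 kcal/mol at room temperature. A quick conversion, using generic constants rather than anything Boltz-specific:

```python
from math import log

R = 1.987e-3  # gas constant, kcal/(mol*K)
T = 298.0     # room temperature, K

def delta_g(kd_molar):
    """Binding free energy from a dissociation constant (1 M standard state)."""
    return R * T * log(kd_molar)

for kd in (1e-6, 1e-9):  # micromolar vs. nanomolar binder
    print(f"Kd = {kd:.0e} M  ->  dG = {delta_g(kd):+.1f} kcal/mol")
```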
Transcript

RJ [00:05:35]: But the goal remains to, like, you know, really challenge the models, like, how well do these models generalize? And, you know, we’ve seen in some of the latest CASP competitions, like, while we’ve become really, really good at proteins, especially monomeric proteins, you know, other modalities still remain pretty difficult. So it’s really essential, you know, in the field that there are, like, these efforts to gather, you know, benchmarks that are challenging. So it keeps us honest, you know, about what the models can and cannot do.

Gabriel [00:06:26]: Yeah, it’s interesting you say that. Like, in some sense, at CASP 14, a problem was solved, and, like, pretty comprehensively, right? But at the same time, it was really only the beginning. So can you say, like, what was the specific problem you would argue was solved? And then, you know, what is remaining, which is probably quite open?

RJ [00:06:48]: I think we’ll steer away from the term solved, because we have many friends in the community who get pretty upset at that word. And I think, you know, fairly so. But the problem that a lot of progress was made on was the ability to predict the structure of single-chain proteins. So proteins can be composed of many chains, and single-chain proteins are, you know, just a single sequence of amino acids. And one of the reasons that we’ve been able to make such progress is also because we take a lot of hints from evolution. So the way the models work is that, you know, they sort of decode a lot of hints that come from evolutionary landscapes. So if you have, like, you know, some protein in an animal, and you go find the similar protein across, you know, different organisms, you might find different mutations in them. And as it turns out, if you take a lot of the sequences together and you analyze them, you see that some positions in the sequence tend to evolve at the same time as other positions in the sequence, sort of this, like, correlation between different positions. And it turns out that that is typically a hint that these two positions are close in three dimensions. So part of the breakthrough has been, like, our ability to also decode that very, very effectively. But what it implies also is that in the absence of that co-evolutionary landscape, the models don’t quite perform as well. And so, you know, I think when that information is available, maybe one could say, you know, the problem is, like, somewhat solved from the perspective of structure prediction. When it isn’t, it’s much more challenging. And I think it’s also worth differentiating the, sometimes we conflate a little bit, structure prediction and folding. Folding is the more complex process of actually understanding, like, how it goes from this disordered state into, like, a structured state. And that, I don’t think we’ve made that much progress on. But the idea of, like, yeah, going straight to the answer, we’ve become pretty good at.

Brandon [00:08:49]: So there’s this protein that is, like, just a long chain and it folds up. Yeah. And so we’re good at getting from that long chain in whatever form it was originally to the thing. But we don’t know how it necessarily gets to that state. And there might be intermediate states that it’s in sometimes that we’re not aware of.

RJ [00:09:10]: That’s right. And that relates also to, like, you know, our general ability to model the different, you know, proteins are not static. They move, they take different shapes based on their energy states. And I think we are also not that good at understanding the different states that a protein can be in, and at what frequency, what probability. So I think the two problems are quite related in some ways. Still a lot to solve. But I think it was very surprising at the time, you know, that even with these evolutionary hints we were able to make such dramatic progress.

Brandon [00:09:45]: So I want to ask why the intermediate states matter. But first, I kind of want to understand: why do we care what proteins are shaped like?

Gabriel [00:09:54]: Yeah, I mean, proteins are kind of the machines of our body. You know, the way that all the processes that we have in our cells work is typically through proteins, sometimes other molecules, sort of intermediate interactions. And through those interactions, we have all sorts of cell functions. And so when we try to understand, you know, a lot of biology, how our body works, how diseases work, we often try to boil it down to, okay, what is going right in the case of, you know, our normal biological function, and what is going wrong in the case of the disease state. And we boil it down to, you know, proteins and other molecules and their interactions. And so when we try predicting the structure of proteins, it’s critical to, you know, have an understanding of those interactions. It’s a bit like the difference between having, kind of, a list of parts that you would put in a car and seeing the car in its final form. You know, seeing the car really helps you understand what it does. On the other hand, kind of going to your question of, you know, why do we care about how the protein folds or, you know, how the car is made, to some extent, is that sometimes when something goes wrong, you know, there are cases of proteins misfolding in some diseases and so on. If we don’t understand this fo