60 episodes

Welcome to this space dedicated to the M2D2 Talks co-organized by Valence Discovery and Mila - Quebec AI Institute.

From applied research papers to open source projects, we're hoping to use these talks to help demystify AI for drug discovery and make the field more accessible for newcomers. M2D2 will bring our vibrant AI & drug discovery communities together and spark new perspectives, provoke discussions, and offer a safe space to share new ideas.

For the best experience, please visit our YouTube channel where slides and video presentations can be referenced.

Molecular Modelling and Drug Discovery | Valence Discovery

    • Science

    Structure-Independent Peptide Binder Design via Generative Language Models | Pranam Chatterjee

    [DISCLAIMER] - For the full visual experience, we recommend you tune in through our YouTube channel to see the presented slides.

    Try datamol.io - the open source toolkit that simplifies molecular processing and featurization workflows for machine learning scientists working in drug discovery: https://datamol.io/

    If you enjoyed this talk, consider joining the Molecular Modeling and Drug Discovery (M2D2) talks live.

    Also, consider joining the M2D2 Slack.

    Abstract: The ability to modulate pathogenic proteins represents a powerful treatment strategy for diseases. Unfortunately, many proteins are considered “undruggable” by small molecules, and are often intrinsically disordered, precluding the usage of structure-based tools for binder design. To address these challenges, we have developed a suite of algorithms that enable the design of target-specific peptides via protein language model embeddings, without the requirement of 3D structures. First, we train a model that leverages ESM-2 embeddings to efficiently select high-affinity peptides from natural protein interaction interfaces. We experimentally fuse model-derived peptides to E3 ubiquitin ligases and identify candidates exhibiting robust degradation of undruggable targets in human cells. Next, we develop a high-accuracy discriminator, based on the CLIP architecture, to prioritize and screen peptides with selectivity to a specified target protein. As input to the discriminator, we create a Gaussian diffusion generator to sample an ESM-2-based latent space, fine-tuned on experimentally-valid peptide sequences. Finally, to enable de novo generation of binding peptides, we train an instance of GPT-2 with protein interacting sequences to enable peptide generation conditioned on target sequence. Our model demonstrates low perplexities across both existing and generated peptide sequences. Together, our work lays the foundation for programmable protein targeting and editing applications.
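    The abstract above reports low perplexities for the generative model. As a reminder of what that metric measures, here is a minimal sketch in plain Python. The per-residue probabilities are made-up illustrative values, not outputs of the speaker's model, and the function name `perplexity` is an assumption for this example.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Hypothetical probabilities a model assigned to each residue of a short peptide.
uniform_guess = [1 / 20] * 4   # a model guessing uniformly over 20 amino acids
confident = [0.5] * 4          # a model that puts half its mass on the right residue

print(perplexity(uniform_guess))
print(perplexity(confident))
```

    A model that guesses uniformly over the 20 amino acids has a perplexity of about 20, so lower values indicate that the model finds the sequences more predictable.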



    Speaker: Pranam Chatterjee

    Twitter - Prudencio

    Twitter - Jonny

    Twitter - datamol.io

    • 1 hr
    Learning Local Equivariant Representations for Large-Scale Atomistic Dynamics | Albert Musaelian

    Abstract: Trade-offs between accuracy and speed have long limited the applications of machine learning interatomic potentials. Recently, E(3)-equivariant architectures have demonstrated leading accuracy, data efficiency, transferability, and simulation stability, but their computational cost and scaling have generally reinforced this trade-off. In particular, the ubiquitous use of message passing architectures has precluded the extension of accessible length- and time-scales with efficient multi-GPU calculations.

    In this talk I will discuss Allegro, a strictly local equivariant deep learning interatomic potential designed for parallel scalability and increased computational efficiency that simultaneously exhibits excellent accuracy. After presenting the architecture, I will discuss applications and benchmarks on various materials and chemical systems, including recent demonstrations of scaling to large all-atom biomolecular systems such as solvated proteins and a 44-million-atom model of the HIV capsid. Finally, I will summarize the software ecosystem and tooling around Allegro.

    Speaker: Albert Musaelian

    • 1 hr 9 min
    Multimodal Deep Learning for Protein Engineering | Kevin K. Yang

    Abstract: Engineered proteins play increasingly essential roles in industries and applications spanning pharmaceuticals, agriculture, specialty chemicals, and fuel. Machine learning could enable an unprecedented level of control in protein engineering for therapeutic and industrial applications. Large self-supervised models pretrained on millions of protein sequences have recently gained popularity in generating embeddings of protein sequences for protein property prediction. However, protein datasets contain information in addition to sequence that can improve model performance. This talk will cover models that use sequences, structures, and biophysical features to predict protein function or to generate functional proteins.

    Speaker: Kevin K. Yang

    • 1 hr 2 min
    Systematic Analysis of Biomolecular Conformational Ensembles with PENSA | Martin Vögele

    Abstract: Molecular simulations enable the study of biomolecules and their dynamics on an atomistic scale. A common task is to compare several simulation conditions - like mutations or different ligands - to find significant differences and interrelations between them. However, the large amount of data produced for ever larger and more complex systems often renders it difficult to identify the structural features that are relevant for a particular phenomenon. PENSA is a flexible software package that enables a comprehensive and thorough investigation into biomolecular conformational ensembles. It provides a wide variety of featurizations and feature transformations that allow for a complete representation of biomolecules like proteins and nucleic acids, including water and ion cavities within the biomolecular structure, thus avoiding bias that would come with manual selection of features. PENSA implements various methods to systematically compare the distributions of these features across ensembles to find the significant differences between them and identify regions of interest. It also includes a novel approach to quantify the state-specific information between two regions of a biomolecule which allows, e.g., the tracing of information flow to identify signaling pathways. PENSA is a modular open-source library that also comes with convenient tools for loading data and visualizing results in ways that make them quick to process and easy to interpret. This talk will demonstrate its usefulness in real-world examples by showing how it helps to determine molecular mechanisms efficiently.
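    To illustrate what comparing the distribution of one feature across two ensembles can look like, here is a minimal NumPy sketch that uses the Jensen-Shannon distance between histograms. It runs on synthetic data and does not use PENSA's API; the function name `js_distance`, the bin count, and the "torsion" samples are assumptions made for this example.

```python
import numpy as np


def js_distance(a, b, bins=30):
    """Jensen-Shannon distance (log base 2) between histograms of two 1-D samples."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(b, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(x, y):
        mask = x > 0  # bins with x == 0 contribute nothing
        return np.sum(x[mask] * np.log2(x[mask] / y[mask]))

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.sqrt(max(jsd, 0.0)))


# Synthetic "torsion angle" samples (degrees) standing in for two simulation conditions.
rng = np.random.default_rng(0)
torsion_a = rng.normal(-60.0, 10.0, 5000)
torsion_b = rng.normal(60.0, 10.0, 5000)

print(js_distance(torsion_a, torsion_a))  # identical ensembles
print(js_distance(torsion_a, torsion_b))  # well-separated ensembles
```

    With base-2 logarithms the distance lies between 0 and 1, so ranking features by this value is one simple way to flag the ones that differ most between conditions.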

    Speaker: Martin Vögele

    • 50 min
    Training Neural Network Potentials: Bayesian and Simulation-based Approaches | Stephan Thaler

    Speaker: Stephan Thaler

    • 1 hr 3 min
    Accelerating Cryptic Pocket Discovery Using AlphaFold and Markov State Modelling | Soumendranath Bhakat

    Abstract: Cryptic pockets, which are absent in ligand-free structures and have the potential to be used as drug targets, are often challenging to access through conventional biomolecular simulations due to their slow motions. To overcome this limitation, we have combined AlphaFold and Markov state modelling (MSM) to accelerate the discovery of cryptic pockets. AlphaFold was used to generate a diverse structural ensemble with open or partially open pockets that can serve as starting points for molecular dynamics simulations, which were later stitched together using MSM to predict the free energy and kinetics associated with cryptic pocket opening. Our approach explored known cryptic pockets and discovered new cryptic pockets that were absent from the PDB. Our study highlighted the power of AlphaFold and MSM to discover novel cryptic pockets, which can unlock the development of next-gen therapeutics.
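    To make the Markov state modelling step more concrete, here is a minimal NumPy sketch that estimates a transition matrix from a discretized trajectory and converts its stationary distribution into relative free energies. It is not the speaker's pipeline: the two states ("closed" and "open"), the synthetic trajectory, and the helper names are assumptions for illustration, and real analyses typically rely on dedicated MSM software with more careful estimators.

```python
import numpy as np


def estimate_msm(dtraj, n_states, lag=1):
    """Row-normalized transition matrix from transition counts at the given lag."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def stationary_distribution(T):
    """Left eigenvector of T for the largest eigenvalue, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()


# Synthetic two-state trajectory: 0 = pocket closed, 1 = pocket open (hypothetical).
rng = np.random.default_rng(0)
true_T = np.array([[0.95, 0.05],
                   [0.20, 0.80]])
dtraj = [0]
for _ in range(20000):
    dtraj.append(int(rng.choice(2, p=true_T[dtraj[-1]])))

T = estimate_msm(dtraj, n_states=2)
pi = stationary_distribution(T)

# Relative free energies in units of kT, shifted so the most stable state is 0.
free_energy = -np.log(pi)
free_energy -= free_energy.min()

print(T)
print(pi)
print(free_energy)
```

    Counts from several short trajectories could be accumulated into the same count matrix before normalizing, which is the sense in which an MSM combines many independent simulations.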

    Speaker: Soumendranath Bhakat

    • 31 min
