69 episodes

A podcast about computational biology, bioinformatics, and next generation sequencing.

the bioinformatics chat, by Roman Cheplyaka

    • Science
    • 4.8 • 32 Ratings

    Suffix arrays in optimal compressed space and δ-SA with Tomasz Kociumaka and Dominik Kempa

    Today on the podcast we have Tomasz Kociumaka and Dominik Kempa,
    the authors of the preprint
    Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space.


    The suffix array is one of the foundational data structures in bioinformatics,
    serving as an index that allows fast substring searches in a large text.
    However, in its raw form, the suffix array occupies space proportional to (and
    several times larger than) the original text.
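
    To make the idea concrete, here is a minimal sketch of a suffix array and a
    substring search over it. It is only an illustration: the function names are
    ours, and the construction is the naive sort-all-suffixes approach, not what a
    production index would use. It also hints at the space cost: one integer is
    stored per character of the text.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`, in sorted order."""
    # Naive construction: sort the start positions by the suffix they begin.
    # Fine for illustration, far too slow and memory-hungry for genomes.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions of `pattern` in `text`."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with the pattern.
    return sorted(sa[start:lo])
```

    Because the suffixes are sorted, all occurrences of a pattern form one
    contiguous block of the array, which is why two binary searches suffice.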


    In their paper, Tomasz and Dominik construct a new index, δ-SA, which on the
    one hand can be used in the same way (answers the same queries) as the suffix
    array and the inverse suffix array, and on the other hand occupies space
    roughly proportional to the size of the gzip’ed text (or, more precisely, to
    the measure δ that they define, hence the name).
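
    The sketch below illustrates the two ingredients mentioned above on toy
    inputs. It is not the δ-SA construction. The inverse suffix array maps a text
    position to the rank of its suffix. For δ we assume the substring-complexity
    definition from the compressed-indexing literature (the maximum over k of the
    number of distinct length-k substrings divided by k); the episode notes do not
    spell the definition out, so treat that formula and the function names as
    assumptions.

```python
def inverse_suffix_array(sa: list[int]) -> list[int]:
    """Given a suffix array, return isa with isa[sa[r]] == r for every rank r."""
    isa = [0] * len(sa)
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    return isa


def substring_complexity(text: str) -> float:
    """Assumed definition of delta: max over k of (distinct length-k substrings) / k."""
    n = len(text)
    best = 0.0
    # Brute force over every substring length; only suitable for short strings.
    for k in range(1, n + 1):
        distinct = len({text[i:i + k] for i in range(n - k + 1)})
        best = max(best, distinct / k)
    return best
```

    Under this definition a highly repetitive string such as a long run of one
    letter has δ = 1 regardless of its length, while a string with many distinct
    substrings has a large δ.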


    Moreover, they mathematically prove that this index is optimal, in the sense
    that any index that supports these queries — or even much weaker queries, such
    as simply accessing the i-th character of the text — cannot be significantly
    smaller (as a function of δ) than δ-SA.






    Links:



    Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space (Dominik Kempa, Tomasz Kociumaka)





    Thank you to Jake Yeung and other Patreon members for supporting this episode.

    • 56 min
    Phylogenetic inference from raw reads and Read2Tree with David Dylus

    In this episode, David Dylus talks about Read2Tree, a tool that builds
    alignment matrices and phylogenetic trees from raw sequencing reads.
    By leveraging the database of orthologous genes called OMA, Read2Tree bypasses traditional, time-consuming steps such as genome assembly, annotation and all-versus-all sequence comparisons.
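
    As a rough illustration of what an alignment matrix is, the sketch below
    concatenates per-gene alignments into one matrix with a row per species,
    padding genes that a species lacks with gaps. This is a generic toy example
    under our own assumptions (the function name, the dict-based input, and the
    `-` gap character are ours); it is not Read2Tree's code or file format.

```python
def concatenate_alignments(gene_alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments (species -> aligned sequence) into one matrix."""
    # Every species seen in any gene gets a row in the final matrix.
    species = sorted({name for aln in gene_alignments for name in aln})
    rows: dict[str, list[str]] = {name: [] for name in species}

    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for name in species:
            # A species missing from this gene contributes a block of gaps.
            rows[name].append(aln.get(name, "-" * width))

    return {name: "".join(parts) for name, parts in rows.items()}
```

    A tree-building program can then take the resulting matrix as input, since
    every row has the same length.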





    Links:



    Inference of phylogenetic trees directly from raw sequencing reads using Read2Tree
    (David Dylus, Adrian Altenhoff, Sina Majidian, Fritz J. Sedlazeck, Christophe Dessimoz)
    Background story
    Read2Tree on GitHub
    OMA browser
    The Guardian’s podcast about Victoria Amelina and Volodymyr Vakulenko





    If you enjoyed this episode, please consider supporting the podcast on Patreon.

    • 49 min
    AlphaFold and variant effect prediction with Amelie Stein

    This is the third and final episode in the AlphaFold series, originally recorded on February 23, 2022,
    with Amelie Stein, now an associate professor at the University of Copenhagen.


    In the episode, Amelie explains what 𝛥𝛥G is, how it informs us
    whether a particular protein mutation affects its stability, and how AlphaFold 2
    helps in this analysis.
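
    For readers who want the arithmetic, here is a small sketch of how ΔΔG relates
    to stability. It assumes the common convention that ΔG is the free energy of
    folding in kcal/mol (negative means the folded state is favoured), so that a
    positive ΔΔG is destabilising, and it assumes a simple two-state folding
    model. The sign convention, the function names, and the example temperature
    are our assumptions, not something stated in the episode notes.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def ddg(dg_wild_type: float, dg_mutant: float) -> float:
    """Change in folding free energy caused by a mutation (mutant minus wild type)."""
    return dg_mutant - dg_wild_type


def fraction_folded(dg_folding: float, temperature_k: float = 298.15) -> float:
    """Two-state model: share of molecules that are folded at equilibrium."""
    # K = [folded]/[unfolded] = exp(-dG/RT), so the folded fraction is
    # K / (1 + K), which simplifies to 1 / (1 + exp(dG/RT)).
    return 1.0 / (1.0 + math.exp(dg_folding / (R_KCAL * temperature_k)))
```

    With these conventions, a mutation that moves the folding ΔG from -5 to -3
    kcal/mol has ΔΔG = +2 kcal/mol and lowers the folded fraction.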






    A note from Amelie:



    Something that has happened in the meantime is the publication of methods
    that predict 𝛥𝛥G with ML methods, so much faster than Rosetta. One of
    them, RaSP, is from our group, while
    ddMut is from another subset of
    authors of the AF2 community assessment paper.



    Other links:



    A structural biology community assessment of AlphaFold2 applications
    (Mehmet Akdel, Douglas E. V. Pires, Eduard Porta Pardo, Jürgen Jänes, Arthur O. Zalevsky, Bálint Mészáros, Patrick Bryant, Lydia L. Good, Roman A. Laskowski, Gabriele Pozzati, Aditi Shenoy, Wensi Zhu, Petras Kundrotas, Victoria Ruiz Serra, Carlos H. M. Rodrigues, Alistair S. Dunham, David Burke, Neera Borkakoti, Sameer Velankar, Adam Frost, Jérôme Basquin, Kresten Lindorff-Larsen, Alex Bateman, Andrey V. Kajava, Alfonso Valencia, Sergey Ovchinnikov, Janani Durairaj, David B. Ascher, Janet M. Thornton, Norman E. Davey, Amelie Stein, Arne Elofsson, Tristan I. Croll & Pedro Beltrao)
    A crime in the making: Russia’s atrocities — the podcast episode about the Olenivka prison massacre





    If you enjoyed this episode, please consider supporting the podcast on Patreon.

    • 35 min
    AlphaFold and shape-mers with Janani Durairaj

    This is the second episode in the AlphaFold series, originally recorded on February 14, 2022,
    with Janani Durairaj, a postdoctoral
    researcher at the University of Basel.


    Janani talks about how she used shape-mers and topic modelling to discover
    classes of proteins assembled by AlphaFold 2 that were absent from the Protein
    Data Bank (PDB).





    The bioinformatics discussion starts at 03:35.


    Links:



    A structural biology community assessment of AlphaFold2 applications
    (Mehmet Akdel, Douglas E. V. Pires, Eduard Porta Pardo, Jürgen Jänes, Arthur O. Zalevsky, Bálint Mészáros, Patrick Bryant, Lydia L. Good, Roman A. Laskowski, Gabriele Pozzati, Aditi Shenoy, Wensi Zhu, Petras Kundrotas, Victoria Ruiz Serra, Carlos H. M. Rodrigues, Alistair S. Dunham, David Burke, Neera Borkakoti, Sameer Velankar, Adam Frost, Jérôme Basquin, Kresten Lindorff-Larsen, Alex Bateman, Andrey V. Kajava, Alfonso Valencia, Sergey Ovchinnikov, Janani Durairaj, David B. Ascher, Janet M. Thornton, Norman E. Davey, Amelie Stein, Arne Elofsson, Tristan I. Croll & Pedro Beltrao)
    The Protein Universe Atlas
    What is hidden in the darkness? Deep-learning assisted large-scale protein family curation uncovers novel protein families and folds (Janani Durairaj, Andrew M. Waterhouse, Toomas Mets, Tetiana Brodiazhenko, Minhal Abdullah, Gabriel Studer, Mehmet Akdel, Antonina Andreeva, Alex Bateman, Tanel Tenson, Vasili Hauryliuk, Torsten Schwede, Joana Pereira)
    Geometricus: Protein Structures as Shape-mers derived from Moment Invariants on GitHub
    The group page
    The Folded Weekly newsletter
    A New York Times article about the Kramatorsk missile strike. The Instagram video, part of which you can hear at the beginning of the episode, appears to have been deleted.





    If you enjoyed this episode, please consider supporting the podcast on Patreon.

    • 20 min
    AlphaFold and protein interactions with Pedro Beltrao

    In this episode, originally recorded on February 9, 2022,
    Roman talks to Pedro Beltrao
    about AlphaFold, the software developed by DeepMind that predicts a protein’s
    3D structure from its amino acid sequence.


    Pedro is an associate professor at ETH Zurich and the coordinator of
    the structural biology community assessment of AlphaFold2 applications project,
    which involved over 30 scientists from different institutions.


    Pedro talks about the origins of the project,
    its main findings, the importance of the confidence metric that AlphaFold
    assigns to its predictions, and Pedro’s own area of interest — predicting
    pockets in proteins and protein-protein interactions.
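
    As a small illustration of working with that confidence metric, the sketch
    below assumes it refers to the per-residue pLDDT score, which AlphaFold writes
    into the B-factor column of its PDB-format output. The function names and the
    cut-off of 70 are our choices for the example, not something prescribed in the
    episode.

```python
def plddt_per_residue(pdb_text: str) -> dict[tuple[str, int], float]:
    """Map (chain, residue number) to the value stored in the B-factor column."""
    scores: dict[tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            residue_number = int(line[22:26])
            scores[(chain, residue_number)] = float(line[60:66])
    return scores


def low_confidence_residues(
    scores: dict[tuple[str, int], float], threshold: float = 70.0
) -> list[tuple[str, int]]:
    """Return residues whose score falls below the chosen threshold."""
    return [key for key, score in scores.items() if score < threshold]
```

    Reading one value per residue from the alpha-carbon atom is enough here
    because the score is assigned per residue, not per atom.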






    Links:



    A structural biology community assessment of AlphaFold2 applications
    (Mehmet Akdel, Douglas E. V. Pires, Eduard Porta Pardo, Jürgen Jänes, Arthur O. Zalevsky, Bálint Mészáros, Patrick Bryant, Lydia L. Good, Roman A. Laskowski, Gabriele Pozzati, Aditi Shenoy, Wensi Zhu, Petras Kundrotas, Victoria Ruiz Serra, Carlos H. M. Rodrigues, Alistair S. Dunham, David Burke, Neera Borkakoti, Sameer Velankar, Adam Frost, Jérôme Basquin, Kresten Lindorff-Larsen, Alex Bateman, Andrey V. Kajava, Alfonso Valencia, Sergey Ovchinnikov, Janani Durairaj, David B. Ascher, Janet M. Thornton, Norman E. Davey, Amelie Stein, Arne Elofsson, Tristan I. Croll & Pedro Beltrao)
    Pedro’s group at ETH Zurich





    If you enjoyed this episode, please consider supporting the podcast on Patreon.

    • 52 min
    Enformer: predicting gene expression from sequence with Žiga Avsec

    In this episode, Jacob Schreiber interviews Žiga Avsec about
    a recently released model, Enformer. Their discussion begins with the
    differences between life in academia and in industry, specifically how
    research is conducted in the two settings. Then, they discuss the Enformer
    model, how it builds on previous work, and the potential that models like it
    have for genomics research in the future. Finally, they have a high-level
    discussion on the state of modern deep learning libraries and which ones they
    use in their day-to-day development.






    Links:



    Effective gene expression prediction from sequence by integrating long-range interactions (Žiga Avsec, Vikram Agarwal, Daniel Visentin, Joseph R. Ledsam, Agnieszka Grabska-Barwinska, Kyle R. Taylor, Yannis Assael, John Jumper, Pushmeet Kohli & David R. Kelley)
    DeepMind Blog Post (Žiga Avsec)





    If you enjoyed this episode, please consider supporting the podcast on Patreon.

    • 59 min

Customer Reviews

4.8 out of 5
32 Ratings

Adam Klie

Great breadth and exposition of cool topics!

I get a lot out of these podcasts! I’m a 3rd year PhD student studying bioinformatics and I feel that the breadth of these topics are giving me a much better feel of all that’s out there. They also have simplified a lot of complex concepts for me. Thanks so much for putting this on! Dreaming of the day where I can be a guest ;)

slinkerlee

great podcast!

This podcast has great interviews and in-depth coverage of new tools and techniques.
