Micro binfie podcast Microbial Bioinformatics
-
- Science
Microbial Bioinformatics is a rapidly changing field marrying computer science and microbiology. Join us as we share some tips and tricks we’ve learnt over the years. If you’re a student just getting to grips with the field, or someone who just wants to keep tabs on the latest and greatest, this podcast is for you.
The hosts are Dr. Lee Katz from the Centers for Disease Control and Prevention (US), and Dr. Nabil-Fareed Alikhan and Dr. Andrew Page, both from Quadram Institute Bioscience (UK). Together they bring years of experience in microbial bioinformatics.
The opinions expressed here are our own and do not necessarily reflect the views of the Centers for Disease Control and Prevention or Quadram Institute Bioscience.
Intro music : Werq - Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 3.0 License
http://creativecommons.org/licenses/by/3.0/
Outro music : Scheming Weasel (faster version) - Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 3.0 License
http://creativecommons.org/licenses/by/3.0/
Questions and comments? microbinfie@gmail.com
-
Kostas Konstantinidis returns to talk to us about ANI and metagenomics
-
Kostas Konstantinidis talks to us about ANI and metagenomics
We talk with Kostas! For more information please visit https://enve-omics.gatech.edu/
-
123 The Revolution of Hash Databases in cgMLST
In this episode of the Micro Binfie Podcast, hosts Dr. Andrew Page and Dr. Lee Katz delve into the fascinating world of hash databases and their application in cgMLST (core genome Multilocus Sequence Typing) for microbial bioinformatics.
The discussion begins with the challenges faced by bioinformaticians due to siloed MLST databases across the globe, which hinder synchronization and effective genomic surveillance. To address these issues, the concept of using hash databases for allele identification is introduced. Hashing allows for the creation of unique identifiers for genetic sequences, enabling easier database synchronization without the need for extensive system support or resources.
Dr. Katz explains the principle of hashing and its application in genomics, where even a single nucleotide polymorphism (SNP) can result in a different hash, making it a perfect solution for distinguishing alleles. Various hashing algorithms, such as MD5 and SHA-256, are discussed, along with their advantages and potential risks of hash collisions. Despite these risks, the use of more complex hashes has been shown to significantly reduce the probability of such collisions.
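As a rough illustration of the idea above, the sketch below hashes two made-up allele sequences that differ by a single SNP and prints their MD5 and SHA-256 digests. The sequences, the `allele_hash` helper and the normalisation step (upper-casing, stripping whitespace) are our own assumptions, not the scheme of any particular database. The `collision_probability` helper uses the standard birthday-bound approximation for accidental collisions only.

```python
import hashlib


def allele_hash(sequence: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of an allele after simple normalisation (assumed convention)."""
    normalised = sequence.strip().upper()
    return hashlib.new(algorithm, normalised.encode("ascii")).hexdigest()


def collision_probability(n_alleles: int, hash_bits: int) -> float:
    """Birthday-bound approximation: number of allele pairs divided by number of hash values."""
    return n_alleles * (n_alleles - 1) / 2 / 2 ** hash_bits


# Two hypothetical alleles that differ only at the final base (one SNP).
allele_a = "ATGGCTAGCTAGGATCCGATTA"
allele_b = "ATGGCTAGCTAGGATCCGATTG"

for algorithm in ("md5", "sha256"):
    print(algorithm, allele_hash(allele_a, algorithm))
    print(algorithm, allele_hash(allele_b, algorithm))

# Approximate chance of any accidental collision among a billion distinct alleles.
print("md5 (128 bit):", collision_probability(10**9, 128))
print("sha256 (256 bit):", collision_probability(10**9, 256))
```

The digests for the two alleles share nothing recognisable even though the sequences differ by one base, which is why a hash can act as an allele identifier but cannot tell you how similar two alleles are.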
The episode also explores practical aspects of implementing hash databases in bioinformatics software. Because a hash only identifies a sequence that matches exactly, allele-calling software has to use exact matching algorithms rather than similarity searches. Existing tools like EToKi and upcoming software are mentioned as examples of applications that can utilize hash databases.
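To make the exact-matching point concrete, here is a minimal sketch of a hash-keyed allele lookup. The `known_alleles` table, the locus names and the `call_allele` helper are hypothetical and are not taken from EToKi or any other tool.

```python
import hashlib


def sha256_hex(sequence: str) -> str:
    """Hash an upper-cased sequence with SHA-256."""
    return hashlib.sha256(sequence.upper().encode("ascii")).hexdigest()


# Hypothetical local hash database: allele hash -> locus name.
known_alleles = {
    sha256_hex("ATGGCTAGCTAGGATCCGATTA"): "locusA",
    sha256_hex("ATGCCCGGGTTTAAACCCGGGT"): "locusB",
}


def call_allele(sequence: str, database: dict) -> "str | None":
    """Return the locus for an exact sequence match, or None if the hash is unknown."""
    return database.get(sha256_hex(sequence))


print(call_allele("ATGGCTAGCTAGGATCCGATTA", known_alleles))  # exact match
print(call_allele("ATGGCTAGCTAGGATCCGATTG", known_alleles))  # one SNP away, so no match
```

A query one base away from a known allele simply misses, so finding the nearest known allele would still need a separate alignment or similarity step.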
Furthermore, the conversation touches on the concept of sequence types in cgMLST and the challenges associated with naming and standardizing them in a decentralized database system. Alternatives like allele codes are mentioned, which could potentially simplify the representation of sequence types.
Finally, the potential for adopting this hashing approach within larger bioinformatics organizations like PHA4GE or GMI is discussed, with an emphasis on the need for a standardized and community-supported framework to ensure the longevity and effectiveness of hash databases in microbial genomics.
This episode provides a comprehensive overview of how hash databases can revolutionize microbial genomics by solving long-standing issues of database synchronization and allele identification, paving the way for more efficient and collaborative genomic surveillance worldwide.
-
122 GAMBIT: Genomic Approximation Method for Bacterial Identification and Tracking
We discuss GAMBIT, software for accurately classifying bacteria and eukaryotes using a targeted k-mer based approach.
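For a feel of what a targeted k-mer comparison can look like, here is a toy sketch. It assumes the approach works by collecting only the k-mers that follow a fixed prefix and comparing genomes by Jaccard distance on those sets. The prefix, the value of k, the sequences and the function names are illustrative choices, and this is not GAMBIT's own code (it ignores the reverse strand, for example).

```python
def prefixed_kmers(sequence: str, prefix: str, k: int) -> set:
    """Collect the k-mers that immediately follow each occurrence of the prefix."""
    sequence = sequence.upper()
    kmers = set()
    start = sequence.find(prefix)
    while start != -1:
        begin = start + len(prefix)
        kmer = sequence[begin:begin + k]
        if len(kmer) == k:  # skip a truncated k-mer at the end of the sequence
            kmers.add(kmer)
        start = sequence.find(prefix, start + 1)
    return kmers


def jaccard_distance(a: set, b: set) -> float:
    """1 minus (shared k-mers / all k-mers); 0.0 means identical sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Two made-up sequences; the prefix ATGAC and k=4 are illustrative values only.
genome_1 = "ATGACAAAACCCCGGGGATGACTTTTGGGGCCCC"
genome_2 = "ATGACAAAACCCCGGGGTTGACTTTTGGGGCCCC"

kmers_1 = prefixed_kmers(genome_1, "ATGAC", 4)
kmers_2 = prefixed_kmers(genome_2, "ATGAC", 4)
print(kmers_1, kmers_2, jaccard_distance(kmers_1, kmers_2))
```

Restricting attention to k-mers after a prefix keeps each genome's signature small, at the cost of looking at only a sample of the genome.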
GAMBIT software: https://github.com/gambit-suite/gambit
GAMBIT suite: https://github.com/gambit-suite
GAMBIT (Genomic Approximation Method for Bacterial Identification and Tracking): A methodology to rapidly leverage whole genome sequencing of bacterial isolates for clinical identification.
https://doi.org/10.1371/journal.pone.0277575
TheiaEuk: a species-agnostic bioinformatics workflow for fungal genomic characterization
https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2023.1198213/full
-
121 K-mers, Sourmash, and Open Source Software - More Conversations with Titus Brown
In this episode, Andrew Page and Lee Katz continue their conversation with Titus Brown, diving deeper into his work on k-mers, Sourmash, and open source software development.
Topics discussed:
K-mers for analyzing sequencing data, and how Sourmash builds on MinHash
How Sourmash handles k-mers for metagenomic comparisons vs. Mash
The modhash and bottom sketch approaches used in Sourmash
Dealing with sequencing errors and noise in k-mer data
Sourmash as a reference-based method, and applications for metagenomics
Titus' focus on building reusable libraries and APIs vs one-off tools
Recruiting collaborators through "nerd sniping" with interesting problems
The open source philosophy that motivates Titus' software work
Overall, the conversation provides insight into Titus' approach to bioinformatics software through iterating quickly, focusing on usability, and building open source tools.
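The two sketching approaches mentioned in the topics can be illustrated with a small toy example. This is a sketch under our own assumptions and does not use the Sourmash API: SHA-256 truncated to 64 bits stands in for whatever hash function a real tool uses, k-mers are not canonicalised, and the function names are made up.

```python
import hashlib


def kmer_hashes(sequence: str, k: int) -> set:
    """Hash every k-mer to a 64-bit integer (stand-in for a real k-mer hash function)."""
    sequence = sequence.upper()
    hashes = set()
    for i in range(len(sequence) - k + 1):
        digest = hashlib.sha256(sequence[i:i + k].encode("ascii")).digest()
        hashes.add(int.from_bytes(digest[:8], "big"))
    return hashes


def bottom_sketch(hashes: set, size: int) -> set:
    """Bottom sketch: keep the `size` smallest hashes, so the sketch has a fixed size."""
    return set(sorted(hashes)[:size])


def mod_sketch(hashes: set, modulus: int) -> set:
    """Modulo-based sketch: keep hashes divisible by `modulus`, so the sketch grows with the data."""
    return {h for h in hashes if h % modulus == 0}


hashes = kmer_hashes("ACGTACGTTGCAAGCTTAGCCGATAGGCTTAA", 5)
print(len(hashes), len(bottom_sketch(hashes, 4)), len(mod_sketch(hashes, 4)))
```

The practical difference is that a bottom sketch always has the same number of hashes regardless of input size, while a modulo-based sketch keeps roughly a fixed fraction of all k-mers, which is what makes comparisons between a small genome and a large metagenome workable.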
Papers:
Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4
Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2
IBD exploration - https://dib-lab.github.io/2021-paper-ibd/
-
120 Scaling Metagenomic Search with Sourmash - Conversations with Titus Brown
In this final episode with Titus Brown, the conversation focuses on his work scaling metagenomic search with Sourmash:
An overview of what Sourmash does - sketching and comparing large k-mer datasets
How the sampling approach enables analyses like containment estimation
Exciting capabilities of the Branchwater tool for multi-threaded real-time SRA search
Scaling to search across millions of metagenomes in seconds with WebAssembly
Potential public health applications for tracking and sourcing pathogens
Important caveats around resolution limits and need for follow-up analyses
Ongoing work to characterize the technique's specificity and sensitivity
Overall, this episode highlights the massive scaling Sourmash enables for metagenomic search, and the potential use cases in public health, while acknowledging current limitations and uncertainties. Titus emphasizes the need to precisely convey what bioinformatic tools can and cannot do as research continues.
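Containment estimation from sampled k-mers, mentioned in the topics above, can be sketched as follows. This is a toy illustration under our own assumptions rather than Sourmash or Branchwater code: the hash function, the modulo-based sampling rule, the `scaled` parameter name as used here, and the sequences are all illustrative.

```python
import hashlib


def sampled_hashes(sequence: str, k: int, scaled: int) -> set:
    """Hash every k-mer and keep roughly 1 in `scaled` of the hashes."""
    sequence = sequence.upper()
    kept = set()
    for i in range(len(sequence) - k + 1):
        digest = hashlib.sha256(sequence[i:i + k].encode("ascii")).digest()
        value = int.from_bytes(digest[:8], "big")
        if value % scaled == 0:
            kept.add(value)
    return kept


def containment(query: set, reference: set) -> float:
    """Fraction of the query's sampled hashes that also occur in the reference."""
    if not query:
        return 0.0  # nothing sampled, so no estimate is possible
    return len(query & reference) / len(query)


# A made-up "genome" embedded in a made-up "metagenome".
genome = "ACGTACGTTGCA"
metagenome = "TTTT" + genome + "GGGG"

query = sampled_hashes(genome, 4, scaled=1)
reference = sampled_hashes(metagenome, 4, scaled=1)
print(containment(query, reference))  # how much of the genome is in the metagenome
print(containment(reference, query))  # how much of the metagenome is in the genome
```

Containment is asymmetric by design: a genome can be fully contained in a metagenome while explaining only a small part of it. With larger `scaled` values the answer becomes an estimate from a sample, which ties into the caveats about resolution limits and follow-up analyses.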
Papers:
Spacegraphcats - https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02066-4
Sourmash - https://www.biorxiv.org/content/10.1101/2022.01.11.475838v2
IBD exploration - https://dib-lab.github.io/2021-paper-ibd/