64 episodes

A podcast about computational biology, bioinformatics, and next generation sequencing.

the bioinformatics chat, by Roman Cheplyaka

    • Science
    • 4.8 • 32 Ratings


    Enformer: predicting gene expression from sequence with Žiga Avsec


    In this episode, Jacob Schreiber interviews Žiga Avsec about
    a recently released model, Enformer. Their discussion begins with the
    differences between life in academia and industry, specifically how research
    is conducted in the two settings. Then, they discuss the Enformer model,
    how it builds on previous work, and the potential that models like it have
    for genomics research in the future. Finally, they have a high-level discussion
    on the state of modern deep learning libraries and which ones they use in their
    day-to-day development work.






    Links:



    Effective gene expression prediction from sequence by integrating long-range interactions (Žiga Avsec, Vikram Agarwal, Daniel Visentin, Joseph R. Ledsam, Agnieszka Grabska-Barwinska, Kyle R. Taylor, Yannis Assael, John Jumper, Pushmeet Kohli & David R. Kelley)
    DeepMind Blog Post (Žiga Avsec)

    • 59 min
    Bioinformatics Contest 2021 with Maksym Kovalchuk and James Matthew Holt


    The Bioinformatics Contest is back this year, and we are back to discuss it!


    This year’s contest winners, Maksym Kovalchuk (1st prize) and Matt Holt (2nd prize),
    talk about how they approach participating in the contest and what strategies
    have earned them the top scores.






    Timestamps and links for the individual problems:



    00:10:36 Genotype Imputation
    00:21:26 Causative Mutation
    00:30:27 Superspreaders
    00:37:22 Minor Haplotype
    00:46:37 Isoform Matching


    Links:



    Matt’s solutions
    Max’s solutions

    • 1 hr
    Steady states of metabolic networks and Dingo with Apostolos Chalkis


    In this episode, Apostolos Chalkis presents sampling steady
    states of metabolic networks as an alternative to the widely used flux balance
    analysis (FBA). We also discuss dingo, a
    Python package written by Apostolos that employs geometric random walks to
    sample steady states. You can see dingo in action
    here.






    Links:



    Dingo on GitHub
    Searching for COVID-19 treatments using metabolic networks
    Tweag open source fellowships
    This episode was originally published on the Compositional podcast.

    • 38 min
    3D genome organization and GRiNCH with Da-Inn Erika Lee


    In this episode, Jacob Schreiber interviews Da-Inn Erika Lee about
    data and computational methods for making sense of 3D genome structure. They
    begin their discussion by talking about 3D genome structure at a high level
    and the challenges in working with such data. Then, they discuss a method
    recently developed by Erika, named GRiNCH, that mines this data to
    identify spans of the genome that cluster together in 3D space and
    potentially help control gene regulation.






    Links:


    GRiNCH: simultaneous smoothing and detection of topological units of genome organization from sparse chromatin contact count matrices with matrix factorization (Da-Inn Lee and Sushmita Roy)
    GRiNCH Project Page
    In silico prediction of high-resolution Hi-C interaction matrices (Shilu Zhang, Deborah Chasman, Sara Knaack, and Sushmita Roy)

    • 1 hr 9 min
    Differential gene expression and DESeq2 with Michael Love


    In this episode, Michael Love joins us to talk about differential gene
    expression analysis of bulk RNA-Seq data.


    We talk about the history of Mike’s own differential expression package,
    DESeq2, as well as other packages in this space, like edgeR and limma, and the
    theory they are based upon. Mike also shares his experience of being the
    author and maintainer of a popular bioinformatics package.






    Links:



    Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
    (Love, M.I., Huber, W. & Anders, S.)
    DESeq2 on Bioconductor
    Chan Zuckerberg Initiative: Ensuring Reproducible Transcriptomic Analysis with DESeq2 and tximeta


    And a more comprehensive set of links from Mike himself:



    limma, the original paper and limma-voom:

    https://pubmed.ncbi.nlm.nih.gov/16646809/

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053721/




    edgeR papers:

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2796818/

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378882/




    The recent manuscript mentioned from the Kendziorski lab, which has a Gamma-Poisson hierarchical structure, although it does not in general reduce to the Negative Binomial:

    https://doi.org/10.1101/2020.10.28.359901




    We talk about robust steps for estimating the middle of the dispersion prior distribution; references are Anders and Huber 2010 (DESeq), Eling et al 2018 (one of the BASiCS papers), and Phipson et al 2016:

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3218662/
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6167088/
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5373812/




    The Stan software:

    https://mc-stan.org/




    We talk about using publicly available data as a prior; the references I mention are the McCall et al paper, which uses publicly available data to ask whether a gene is expressed, and a new manuscript from my lab that compares splicing in a sample to GTEx as a reference panel:

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013751/
    https://doi.org/10.1101/856401




    Regarding estimating the width of the dispersion prior, references are the Robinson and Smyth 2007 paper, McCarthy et al 2012 (edgeR), and Wu et al 2013 (DSS):

    https://pubmed.ncbi.nlm.nih.gov/17881408/

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3378882/

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3590927/




    Schurch et al 2016, an RNA-seq dataset with many replicates, helpful for benchmarking:

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4878611/




    Stephens paper on the false sign rate (ash):

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5379932/




    Heavy-tailed distributions for effect sizes, Zhu et al 2018:

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6581436/




    I credit Kevin Blighe and Alexander Toenges, who help to answer lots of DESeq2 questions on the support site:

    https://www.biostars.org/u/41557/

    https://www.biostars.org/u/25721/




    The EOSS award, which has funded vizWithSCE by Kwame Forbes, and nullranges by Wancen Mu and Eric Davis:

    https://chanzuckerberg.com/eoss/proposals/ensuring-reproducible-transcriptomic-analysis-with-deseq2-and-tximeta/

    https://kwameforbes.github.io/vizWithSCE/

    https://nullranges.github.io/nullranges/




    One of the recent papers from my lab, MRLocus for eQTL and GWAS integration:

    https://mikelove.github.io/mrlocus/

    • 1 hr 31 min
    Proteomics calibration with Lindsay Pino


    In this episode, Lindsay Pino discusses the
    challenges of making quantitative measurements in the field of proteomics.
    Specifically, she discusses the difficulties of comparing measurements across
    different samples, potentially acquired in different labs, as well as a method
    she has developed recently for calibrating these measurements without the need
    for expensive reagents. The discussion then turns more broadly to questions in
    genomics that can potentially be addressed using proteomic measurements.





    Links:


    Talus Bioscience
    Matrix-Matched Calibration Curves for Assessing Analytical Figures of Merit in Quantitative Proteomics
    (Lindsay K. Pino, Brian C. Searle, Han-Yin Yang, Andrew N. Hoofnagle, William S. Noble, and Michael J. MacCoss)

    • 48 min

Customer Reviews

4.8 out of 5
32 Ratings


Adam Klie

Great breadth and exposition of cool topics!

I get a lot out of these podcasts! I’m a 3rd year PhD student studying bioinformatics and I feel that the breadth of these topics are giving me a much better feel of all that’s out there. They also have simplified a lot of complex concepts for me. Thanks so much for putting this on! Dreaming of the day where I can be a guest ;)

slinkerlee

great podcast!

This podcast has great interviews and in-depth coverage of new tools and techniques.
