43 episodes

A podcast about computational biology, bioinformatics, and next generation sequencing.

the bioinformatics chat, by Roman Cheplyaka

    • Life Sciences

    #43 Generalized PCA for single-cell data with William Townes

    Will Townes proposes a new, simpler way to analyze scRNA-seq data with unique
    molecular identifiers (UMIs). Observing that such data are not zero-inflated,
    Will designed a PCA-like procedure inspired by generalized linear models
    (GLMs). Unlike standard PCA, it models the count nature of the data and
    avoids spurious correlations, such as one or more of the top principal
    components being correlated with the number of non-zero gene counts per cell.
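
    As a rough illustration of the idea (not the code of the GLM-PCA packages
    listed below), this sketch fits a Poisson GLM-PCA with plain gradient
    descent. It assumes a cells × genes UMI count matrix, a log total-count
    offset per cell, a fixed per-gene intercept, and no cell or gene with zero
    total counts; the function `poisson_glmpca` and its parameters are
    hypothetical. The rows of `U` play the role that principal component scores
    play in standard PCA.

```python
import numpy as np


def poisson_glmpca(Y, n_factors=2, n_iter=500, lr=1e-3, seed=0):
    """Hypothetical sketch of Poisson GLM-PCA on a cells x genes count matrix.

    Assumed model: Y[i, j] ~ Poisson(mu[i, j]) with
        log mu[i, j] = log(total counts of cell i) + a[j] + U[i] @ V[j].
    Every cell and every gene must have at least one nonzero count.
    """
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    n_cells, n_genes = Y.shape
    # Per-cell offset: log of the cell's total UMI count (a size factor).
    offset = np.log(Y.sum(axis=1, keepdims=True))
    # Fixed per-gene intercept: log of the gene's share of all counts.
    a = np.log(Y.sum(axis=0) / Y.sum())
    # Small random start for cell scores U and gene loadings V.
    U = rng.normal(scale=0.01, size=(n_cells, n_factors))
    V = rng.normal(scale=0.01, size=(n_genes, n_factors))
    losses = []
    for _ in range(n_iter):
        eta = offset + a + U @ V.T  # linear predictor = log of the Poisson mean
        mu = np.exp(eta)
        # Poisson negative log-likelihood, dropping the constant log(y!) term.
        losses.append(float(np.sum(mu - Y * eta)))
        resid = mu - Y  # derivative of the loss with respect to eta
        grad_U = resid @ V
        grad_V = resid.T @ U
        # Simultaneous gradient step on both factor matrices.
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V, losses


rng = np.random.default_rng(1)
counts = rng.poisson(5.0, size=(60, 30))
U, V, losses = poisson_glmpca(counts)
print(U.shape, V.shape, losses[0] > losses[-1])
```

    A small fixed step keeps the sketch short; a practical implementation would
    need a better optimizer and a convergence check.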

    Also check out Will’s paper for a feature selection algorithm based on
    deviance, which we didn’t get a chance to discuss on the podcast.
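
    The deviance criterion can be sketched in a few lines of NumPy. This
    assumes the per-gene binomial deviance against a null model in which each
    gene takes a constant fraction of every cell's counts, which is our reading
    of the approach in the paper linked below; the function names are
    hypothetical and this is not the scry package's code. Genes with the
    largest deviance are the ones kept.

```python
import numpy as np


def _ylogy_over_mu(y, mu):
    """Elementwise y * log(y / mu), using the convention 0 * log(0) = 0."""
    y, mu = np.broadcast_arrays(np.asarray(y, float), np.asarray(mu, float))
    out = np.zeros(y.shape)
    pos = y > 0
    out[pos] = y[pos] * np.log(y[pos] / mu[pos])
    return out


def binomial_deviance(Y):
    """Per-gene binomial deviance for a cells x genes UMI count matrix Y."""
    Y = np.asarray(Y, dtype=float)
    n = Y.sum(axis=1, keepdims=True)  # total counts per cell, shape (cells, 1)
    pi = Y.sum(axis=0) / n.sum()      # each gene's overall fraction of counts
    mu = n * pi                       # expected counts under the null model
    # "Success" term (the gene) plus "failure" term (all other genes).
    dev = _ylogy_over_mu(Y, mu) + _ylogy_over_mu(n - Y, n - mu)
    return 2.0 * dev.sum(axis=0)


def top_deviance_genes(Y, k):
    """Column indices of the k genes with the largest deviance."""
    return np.argsort(binomial_deviance(Y))[::-1][:k]


Y = np.array([[10, 10, 0], [20, 20, 0], [5, 5, 30]])
print(binomial_deviance(Y), top_deviance_genes(Y, 1))
```

    A gene whose counts are exactly proportional to each cell's total has zero
    deviance, so this criterion favors genes whose share of the counts varies
    across cells.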

    Links:

    Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model (F. William Townes, Stephanie C. Hicks, Martin J. Aryee, Rafael A. Irizarry)
    GLM-PCA for R
    GLM-PCA for Python
    scry: an R package for feature selection by deviance (alternative to highly variable genes)
    Droplet scRNA-seq is not zero-inflated (Valentine Svensson)

    • 59 min
    #42 Spectrum-preserving string sets and simplitigs with Amatur Rahman and Karel Břinda

    In this episode we hear from Amatur Rahman and Karel Břinda, who
    independently released preprints on the same concept, called simplitigs or
    spectrum-preserving string sets. Simplitigs offer a way to efficiently store
    and query large sets of k-mers or, equivalently, large de Bruijn graphs.
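
    The key property can be illustrated with a simple greedy construction:
    repeatedly take an unused k-mer and extend it to the right and left with
    unused k-mers that overlap it by k-1 characters. Every input k-mer ends up
    in exactly one output string, so the set of k-mers (the spectrum) is
    preserved while the total length shrinks. This Python sketch assumes
    single-stranded k-mers (real tools also merge reverse complements) and
    k ≥ 2; the function names are hypothetical.

```python
def greedy_simplitigs(kmers):
    """Greedily join a set of equal-length k-mers into simplitig-like strings.

    Single-stranded sketch: reverse complements are not merged.
    Each input k-mer is spelled by exactly one output string, exactly once.
    """
    remaining = set(kmers)
    if not remaining:
        return []
    k = len(next(iter(remaining)))
    if k < 2:
        raise ValueError("k must be at least 2")
    simplitigs = []
    while remaining:
        contig = remaining.pop()  # arbitrary unused seed k-mer
        # Extend to the right while an unused k-mer overlaps by k-1 characters.
        extended = True
        while extended:
            extended = False
            suffix = contig[-(k - 1):]
            for base in "ACGT":
                if suffix + base in remaining:
                    remaining.remove(suffix + base)
                    contig += base
                    extended = True
                    break
        # Extend to the left in the same way.
        extended = True
        while extended:
            extended = False
            prefix = contig[:k - 1]
            for base in "ACGT":
                if base + prefix in remaining:
                    remaining.remove(base + prefix)
                    contig = base + contig
                    extended = True
                    break
        simplitigs.append(contig)
    return simplitigs


def spectrum(strings, k):
    """Set of all k-mers spelled by a collection of strings."""
    return {s[i:i + k] for s in strings for i in range(len(s) - k + 1)}


kmers = {"ACG", "CGT", "GTA", "TAC", "CGG", "GGA"}
result = greedy_simplitigs(kmers)
print(result, spectrum(result, 3) == kmers)
```

    The output depends on which seed k-mers are picked, but the spectrum of the
    result always equals the input set.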

    Links:

    Simplitigs as an efficient and scalable representation of de Bruijn graphs (Karel Břinda, Michael Baym, Gregory Kucherov)
    Representation of k-mer sets using spectrum-preserving string sets (Amatur Rahman, Paul Medvedev)
    Open mic

    • 53 min
    #41 Epidemic models with Kris Parag

    Kris Parag is here to teach us about the mathematical modeling of infectious
    disease epidemics. We discuss the SIR model, renewal models, and how
    insights from information theory can help us predict where an epidemic is
    going.
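
    To make the renewal model concrete, this sketch simulates expected
    incidence with I_t = R_t · Σ_s w_s · I_{t-s}, where w is a discretized
    generation-time distribution, and then estimates R over a trailing window
    of k days using the posterior mean under a Gamma(shape a, scale b) prior and
    Poisson incidence. The weights, prior values, window length and function
    names are illustrative assumptions, not taken from the papers below.

```python
import numpy as np


def total_infectiousness(I, w, t):
    """Lambda_t = sum over s = 1..min(t, len(w)) of w[s-1] * I[t-s]."""
    s_max = min(t, len(w))
    return sum(w[s - 1] * I[t - s] for s in range(1, s_max + 1))


def renewal_simulate(R, w, I0):
    """Deterministic renewal model: I[t] = R[t-1] * Lambda_t for t = 1..len(R).

    R[t-1] is the reproduction number on day t; I[0] = I0 seeds the epidemic.
    """
    I = np.zeros(len(R) + 1)
    I[0] = I0
    for t in range(1, len(R) + 1):
        I[t] = R[t - 1] * total_infectiousness(I, w, t)
    return I


def estimate_R(I, w, t, k, a=1.0, b=5.0):
    """Posterior mean of R over days t-k+1..t (requires t - k + 1 >= 1).

    Assumes I[s] ~ Poisson(R * Lambda_s) within the window and a Gamma prior
    on R with shape a and scale b, so the posterior is Gamma with shape
    a + sum(I) and rate 1/b + sum(Lambda).
    """
    window = range(t - k + 1, t + 1)
    lam_sum = sum(total_infectiousness(I, w, s) for s in window)
    inc_sum = sum(I[s] for s in window)
    return (a + inc_sum) / (1.0 / b + lam_sum)


w = [0.2, 0.5, 0.3]  # hypothetical generation-time weights (sum to 1)
R_true = np.concatenate([np.full(20, 1.5), np.full(10, 0.8)])
I = renewal_simulate(R_true, w, I0=10.0)
print(estimate_R(I, w, t=30, k=7))
```

    The window length k is a tuning choice: short windows react quickly to
    changes in R but give noisier estimates, while long windows smooth the
    noise but lag behind.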

    Links:

    Optimising Renewal Models for Real-Time Epidemic Prediction and Estimation (KV Parag, CA Donnelly)
    Adaptive Estimation for Epidemic Renewal and Phylogenetic Skyline Models (KV Parag, CA Donnelly)
    The listener survey

    • 1 hr 8 min
    #40 Plasmid classification and binning with Sergio Arredondo-Alonso and Anita Schürch

    Does a given bacterial gene live on a plasmid or the chromosome? What
    other genes live on the same plasmid?

    In this episode, we hear from Sergio Arredondo-Alonso and Anita Schürch, whose
    projects mlplasmids and gplas answer these types of questions.

    Links:

    mlplasmids: a user-friendly tool to predict plasmid- and chromosome-derived sequences for single species (Sergio Arredondo-Alonso, Malbert R. C. Rogers, Johanna C. Braat, Tess D. Verschuuren, Janetta Top, Jukka Corander, Rob J. L. Willems, Anita C. Schürch)
    gplas: a comprehensive tool for plasmid analysis using short-read graphs (Sergio Arredondo-Alonso, Martin Bootsma, Yaïr Hein, Malbert R.C. Rogers, Jukka Corander, Rob JL Willems, Anita C. Schürch)

    • 45 min
    #39 Amplicon sequence variants and bias with Benjamin Callahan

    In this episode Benjamin Callahan talks about some of the issues
    microbiologists face when conducting amplicon sequencing and metagenomic
    studies. The two main themes are:

    • Why one should probably avoid OTUs (operational taxonomic units) and use
      exact sequence variants instead (also called amplicon sequence variants,
      or ASVs), and how DADA2 manages to infer the exact sequences present in a
      sample.
    • Why abundances inferred from community sequencing data are biased, and
      how this bias can be modeled and corrected.
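
    The bias correction can be illustrated with a multiplicative model, which
    is our reading of the McLaren, Willis and Callahan paper listed below: each
    taxon has its own detection efficiency, and observed proportions are actual
    proportions multiplied by those efficiencies and renormalized to sum to
    one. In this NumPy sketch the efficiencies are estimated from mock
    communities of known composition; it assumes every taxon is present in
    every mock and sample, and the function names are hypothetical.

```python
import numpy as np


def close(x):
    """Rescale a nonnegative vector, or each row of a matrix, to sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def apply_bias(actual, efficiency):
    """Observed composition: actual proportions times efficiency, renormalized."""
    return close(np.asarray(actual, dtype=float) * efficiency)


def estimate_bias(observed_mocks, actual_mocks):
    """Estimate per-taxon efficiencies from mocks (rows = mocks, columns = taxa).

    Efficiencies are only defined up to a common factor, so the result is
    scaled to geometric mean 1. Assumes no zeros in either matrix.
    """
    log_ratio = np.log(np.asarray(observed_mocks, dtype=float)) - np.log(
        np.asarray(actual_mocks, dtype=float)
    )
    # Within each mock, log(observed / actual) = log(efficiency) minus a
    # mock-specific constant; averaging over mocks and then centering across
    # taxa removes those constants.
    log_eff = log_ratio.mean(axis=0)
    log_eff -= log_eff.mean()
    return np.exp(log_eff)


def correct_bias(observed, efficiency):
    """Undo the multiplicative bias: divide by efficiency and renormalize."""
    return close(np.asarray(observed, dtype=float) / efficiency)


efficiency = np.array([1.0, 4.0, 0.5])  # hypothetical true efficiencies
mocks = np.array([[1 / 3, 1 / 3, 1 / 3], [0.5, 0.25, 0.25]])
est = estimate_bias(apply_bias(mocks, efficiency), mocks)
sample = apply_bias([0.2, 0.3, 0.5], efficiency)
print(sample, correct_bias(sample, est))
```

    Because the common factor cancels when the corrected composition is
    renormalized, the geometric-mean scaling does not affect the corrected
    proportions.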

    Links:

    Exact sequence variants should replace operational taxonomic units in marker-gene data analysis (Benjamin J Callahan, Paul J McMurdie & Susan P Holmes)
    DADA2: High-resolution sample inference from Illumina amplicon data (Benjamin J Callahan, Paul J McMurdie, Michael J Rosen, Andrew W Han, Amy Jo A Johnson & Susan P Holmes)
    In Nature, There Is Only Diversity (Michael R. McLaren, Benjamin J. Callahan)
    Consistent and correctable bias in metagenomic sequencing experiments (Michael R McLaren, Amy D Willis, Benjamin J Callahan)

    • 1 hr 1 min
    #38 Issues in legacy genomes with Luke Anderson-Trocmé

    In this episode Luke Anderson-Trocmé talks about his findings from the 1000
    Genomes Project: some of the early sequenced genomes contain specific
    mutational signatures that have not been replicated in other sources and
    can be detected through their association with lower base quality scores.
    Listen to Luke tell the story of how he stumbled upon and investigated these
    fake variants, and what their impact is.

    Links:

    Legacy Data Confounds Genomics Studies (bioRxiv; Molecular Biology and Evolution, paywalled) (Luke Anderson-Trocmé, Rick Farouni, Mathieu Bourgey, Yoichiro Kamatani, Koichiro Higasa, Jeong-Sun Seo, Changhoon Kim, Fumihiko Matsuda, Simon Gravel)

    • 1 hr 1 min