23 episodes

Deep Papers is a podcast series featuring deep dives on today’s seminal AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning. 

Deep Papers Arize AI

    • Science

    Breaking Down EvalGen: Who Validates the Validators?

    Due to the cumbersome nature of human evaluation and limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM-generated evaluators often inherit the problems of the LLMs they evaluate, requiring further human validation. This week’s paper explores EvalGen, a mixed-initiative approach to aligning LLM-generated evaluation functions with human preferences. EvalGen assists users in developing both criteria acc...

    • 44 min
    Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models

    This week we explore ReAct, an approach that enhances the reasoning and decision-making capabilities of LLMs by combining step-by-step reasoning with the ability to take actions and gather information from external sources in a unified framework. To learn more about ML observability, join the Arize AI Slack community or get the latest on our LinkedIn and Twitter.

    • 45 min
    Demystifying Chronos: Learning the Language of Time Series

    This week, we’re covering Amazon’s time series model: Chronos. Developing accurate machine-learning-based forecasting models has traditionally required substantial dataset-specific tuning and model customization. Chronos, however, is built on a language model architecture and trained with billions of tokenized time series observations, enabling it to provide accurate zero-shot forecasts matching or exceeding purpose-built models. We dive into time series forecasting, some recent research our te...

    • 44 min
    Anthropic Claude 3

    This week we dive into the latest buzz in the AI world: the arrival of Claude 3. Claude 3 is the newest family of models in the LLM space, and Claude 3 Opus (Anthropic's "most intelligent" Claude model) challenges the likes of GPT-4. The Claude 3 family of models, according to Anthropic, "sets new industry benchmarks," and includes "three state-of-the-art models in ascending order of capability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus." Each of these models "allows users to select...

    • 43 min
    Reinforcement Learning in the Era of LLMs

    We’re exploring Reinforcement Learning in the Era of LLMs this week with Claire Longo, Arize’s Head of Customer Success. Recent advancements in Large Language Models (LLMs) have garnered wide attention and led to successful products such as ChatGPT and GPT-4. Their proficiency in adhering to instructions and delivering harmless, helpful, and honest (3H) responses can largely be attributed to the technique of Reinforcement Learning from Human Feedback (RLHF). This week’s paper aims to link th...

    • 44 min
    Sora: OpenAI’s Text-to-Video Generation Model

    This week, we discuss the implications of Text-to-Video Generation and speculate as to the possibilities (and limitations) of this incredible technology with some hot takes. Dat Ngo, ML Solutions Engineer at Arize, is joined by community member and AI Engineer Vibhu Sapra to review OpenAI’s technical report on their Text-To-Video Generation Model: Sora. According to OpenAI, “Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.” At th...

    • 45 min

You Might Also Like

Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and al
Alessio + swyx
No Priors: Artificial Intelligence | Technology | Startups
Conviction | Pod People
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Sam Charrington
"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis
Erik Torenberg, Nathan Labenz
Super Data Science: ML & AI Podcast with Jon Krohn
Jon Krohn
a16z Podcast
Andreessen Horowitz