23 episodes

Deep Papers is a podcast series featuring deep dives on today’s seminal AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning. 

Deep Papers Arize AI

    • Science
    • 5.0 • 9 Ratings

    Breaking Down EvalGen: Who Validates the Validators?

    Due to the cumbersome nature of human evaluation and limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM-generated evaluators often inherit the problems of the LLMs they evaluate, requiring further human validation. This week’s paper explores EvalGen, a mixed-initiative approach to aligning LLM-generated evaluation functions with human preferences. EvalGen assists users in developing both criteria acc...

    • 44 min
    Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models

    This week we explore ReAct, an approach that enhances the reasoning and decision-making capabilities of LLMs by combining step-by-step reasoning with the ability to take actions and gather information from external sources in a unified framework. To learn more about ML observability, join the Arize AI Slack community or get the latest on our LinkedIn and Twitter.

    • 45 min
    Demystifying Chronos: Learning the Language of Time Series

    This week, we’re covering Amazon’s time series model: Chronos. Developing accurate machine-learning-based forecasting models has traditionally required substantial dataset-specific tuning and model customization. Chronos, however, is built on a language model architecture and trained with billions of tokenized time series observations, enabling it to provide accurate zero-shot forecasts matching or exceeding purpose-built models. We dive into time series forecasting, some recent research our te...

    • 44 min
    Anthropic Claude 3

    This week we dive into the latest buzz in the AI world – the arrival of Claude 3. Claude 3 is the newest family of models in the LLM space, and Claude 3 Opus (Anthropic's "most intelligent" Claude model) challenges the likes of GPT-4. The Claude 3 family of models, according to Anthropic, "sets new industry benchmarks," and includes "three state-of-the-art models in ascending order of capability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus." Each of these models "allows users to select...

    • 43 min
    Reinforcement Learning in the Era of LLMs

    We’re exploring Reinforcement Learning in the Era of LLMs this week with Claire Longo, Arize’s Head of Customer Success. Recent advancements in Large Language Models (LLMs) have garnered wide attention and led to successful products such as ChatGPT and GPT-4. Their proficiency in adhering to instructions and delivering harmless, helpful, and honest (3H) responses can largely be attributed to the technique of Reinforcement Learning from Human Feedback (RLHF). This week’s paper aims to link th...

    • 44 min
    Sora: OpenAI’s Text-to-Video Generation Model

    This week, we discuss the implications of Text-to-Video Generation and speculate as to the possibilities (and limitations) of this incredible technology with some hot takes. Dat Ngo, ML Solutions Engineer at Arize, is joined by community member and AI Engineer Vibhu Sapra to review OpenAI’s technical report on their Text-To-Video Generation Model: Sora. According to OpenAI, “Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.” At th...

    • 45 min

Customer Reviews

5.0 out of 5
9 Ratings
