24 Episodes

Deep Papers is a podcast series featuring deep dives on today’s seminal AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning. 

Deep Papers
Arize AI

    • Science
    • 5.0 • 1 Rating

    Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment

    We break down the paper "Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment." Ensuring alignment (i.e., making models behave in accordance with human intentions) has become a critical task before deploying LLMs in real-world applications. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. To address this issue, this paper presents a comprehensive ...

    • 48 min
    Breaking Down EvalGen: Who Validates the Validators?

    Due to the cumbersome nature of human evaluation and the limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM-generated evaluators often inherit the problems of the LLMs they evaluate, requiring further human validation. This week's paper explores EvalGen, a mixed-initiative approach to aligning LLM-generated evaluation functions with human preferences. EvalGen assists users in developing both criteria acc...

    • 44 min
    Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models

    This week we explore ReAct, an approach that enhances the reasoning and decision-making capabilities of LLMs by combining step-by-step reasoning with the ability to take actions and gather information from external sources in a unified framework. To learn more about ML observability, join the Arize AI Slack community or get the latest on our LinkedIn and Twitter.

    • 45 min
    Demystifying Chronos: Learning the Language of Time Series

    This week, we're covering Amazon's time series model: Chronos. Developing accurate machine-learning-based forecasting models has traditionally required substantial dataset-specific tuning and model customization. Chronos, however, is built on a language model architecture and trained on billions of tokenized time series observations, enabling it to provide accurate zero-shot forecasts matching or exceeding purpose-built models. We dive into time series forecasting, some recent research our te...

    • 44 min
    Anthropic Claude 3

    This week we dive into the latest buzz in the AI world: the arrival of Claude 3. Claude 3 is the newest family of models in the LLM space, and Claude 3 Opus (Anthropic's "most intelligent" Claude model) challenges the likes of GPT-4. The Claude 3 family of models, according to Anthropic, "sets new industry benchmarks" and includes "three state-of-the-art models in ascending order of capability: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus." Each of these models "allows users to select...

    • 43 min
    Reinforcement Learning in the Era of LLMs

    We're exploring Reinforcement Learning in the Era of LLMs this week with Claire Longo, Arize's Head of Customer Success. Recent advancements in Large Language Models (LLMs) have garnered wide attention and led to successful products such as ChatGPT and GPT-4. Their proficiency in adhering to instructions and delivering harmless, helpful, and honest (3H) responses can largely be attributed to the technique of Reinforcement Learning from Human Feedback (RLHF). This week's paper aims to link th...

    • 44 min

Customer Reviews

5.0 out of 5
1 Rating

Top Podcasts in Science

Aha! Zehn Minuten Alltags-Wissen
WELT
Das Wissen | SWR
SWR
KI verstehen
Deutschlandfunk
ZEIT WISSEN. Woher weißt Du das?
ZEIT ONLINE
radioWissen
Bayerischer Rundfunk
Quarks Daily
Quarks

You Might Also Like

Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and al
Alessio + swyx
No Priors: Artificial Intelligence | Technology | Startups
Conviction | Pod People
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Sam Charrington
"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis
Erik Torenberg, Nathan Labenz
Super Data Science: ML & AI Podcast with Jon Krohn
Jon Krohn
a16z Podcast
Andreessen Horowitz