Phillip Carter, formerly of Honeycomb, and Ben Lorica talk about observability and AI—what observability means, how generative AI causes problems for observability, and how generative AI can be used as a tool to help SREs analyze telemetry data. There’s tremendous potential because AI is great at finding patterns in massive datasets, but it’s still a work in progress.
About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.
Timestamps
- 0:00: Introduction to Phillip Carter, a product manager at Salesforce. We'll focus on observability, which he worked on at Honeycomb.
- 0:35: Let’s start with the elevator definition of observability; then we’ll get into observability in the age of AI.
- 0:44: If you google “What is observability?” you’re going to get 10 million answers. It’s an industry buzzword. There are a lot of tools in the same space.
- 1:12: At a high level, I like to think of it in two pieces. The first is an acknowledgement that you have a system of some kind, and you cannot pull that system onto your local machine and inspect what is happening at a moment in time. When something gets large and complex enough, it’s impossible to keep in your head. The product I worked on at Honeycomb is actually a very sophisticated querying engine tied to a lot of AWS services in a way that makes it impossible to debug on my laptop.
- 2:40: So what can I do? I can collect data, called telemetry, that I can aggregate and analyze: trillions of data points that say this user went through the system in this way under these conditions. I can pull from those different dimensions and hold something constant.
- 3:20: Then I look at how the values differ when I hold one thing constant, and then another. Layering those comparisons gives me an overall picture of what is happening in the real world. (See the sketch after the timestamps.)
- 3:37: That is the crux of observability. I’m debugging, but not by stepping through something on my local machine. I click a button and can see that it manifests in a database call. But there are potentially millions of users, and things go wrong somewhere else in the system, so I need to understand which paths lead there and what commonalities exist in those paths.
- 4:14: This is my very high-level definition. It’s many operations and tasks, almost a workflow, as well as a set of tools.
- 4:32: Based on your description, observability people are sort of like security people. With AI, there are two aspects: observability problems introduced by AI, and the use of AI to help with observability. Let’s tackle each separately. Before AI, we had machine learning, and observability people had a handle on traditional machine learning. What specific challenges did generative AI introduce?
- 5:36: In some respects, these problems used to be confined to big tech. LLMs are the first time we got truly world-class machine learning available behind an API call. Before that, it was in the hands of Google and Facebook and Netflix, who helped develop a lot of this technology and have been solving the problems everyone else has to solve now: for example, recommendation systems that take in many signals. Long before the AI Overviews feature, Google had natural language answers for search queries, sourced from web documents, along with a box for follow-up questions. They built this before Gemini, and it’s essentially the same tech. They had to apply observability to run this stuff at scale.
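To make the “hold one dimension constant” workflow from the 2:40 and 3:20 segments concrete, here is a minimal sketch in Python. Everything in it is invented for illustration (the event fields, the `avg_duration_by` helper, the numbers); a real observability backend such as Honeycomb runs this kind of query over trillions of events, not an in-memory list.

```python
from collections import defaultdict
from statistics import mean

# Each telemetry event is one data point: a measurement (duration_ms)
# plus several dimensions we can slice on. All field names are hypothetical.
events = [
    {"endpoint": "/checkout", "region": "us-east", "tier": "free", "duration_ms": 120},
    {"endpoint": "/checkout", "region": "us-east", "tier": "paid", "duration_ms": 95},
    {"endpoint": "/checkout", "region": "eu-west", "tier": "free", "duration_ms": 480},
    {"endpoint": "/search",   "region": "eu-west", "tier": "free", "duration_ms": 60},
    {"endpoint": "/checkout", "region": "eu-west", "tier": "paid", "duration_ms": 455},
]

def avg_duration_by(dimension, events, **held_constant):
    """Hold zero or more dimensions constant, then group the remaining
    events by `dimension` and average their durations."""
    groups = defaultdict(list)
    for event in events:
        if all(event[k] == v for k, v in held_constant.items()):
            groups[event[dimension]].append(event["duration_ms"])
    return {value: mean(durations) for value, durations in groups.items()}

# Overall picture: average latency by region across all endpoints.
print(avg_duration_by("region", events))
# Hold the endpoint constant: is /checkout slow everywhere, or only in eu-west?
print(avg_duration_by("region", events, endpoint="/checkout"))
```

The second query mirrors the debugging loop Carter describes: fix one dimension, compare what’s left, then repeat with another dimension until the commonality behind the slow or failing paths stands out.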
Information
- Show: Generative AI in the Real World
- Frequency: Every two weeks
- Published: September 18, 2025, 14:00 UTC
- Length: 38 minutes
- Season: 1
- Episode: 27
- Rating: Suitable for children