Generative AI in the Real World

O'Reilly

In 2023, ChatGPT put AI on everyone’s agenda. Now, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

  1. 5小时前

    Phillip Carter on Where Generative AI Meets Observability

    Phillip Carter, formerly of Honeycomb, and Ben Lorica talk about observability and AI—what observability means, how generative AI causes problems for observability, and how generative AI can be used as a tool to help SREs analyze telemetry data. There’s tremendous potential because AI is great at finding patterns in massive datasets, but it’s still a work in progress. About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise. Timestamps 0:00: Introduction to Phillip Carter, a product manager at Salesforce. We'll focus on observability, which he worked on at Honeycomb.0:35: Let’s have the elevator definition of observability first, then we’ll go into observability in the age of AI.0:44: If you google “What is observability?” you’re going to get 10 million answers. It’s an industry buzzword. There are a lot of tools in the same space.1:12: At a high level, I like to think of it in two pieces. The first is that this is an acknowledgement that you have a system of some kind, and you do not have the capability to pull that system onto your local machine and inspect what is happening at a moment in time. When something gets large and complex enough, it’s impossible to keep in your head. The product I worked on at Honeycomb is actually a very sophisticated querying engine that's tied to a lot of AWS services in a way that makes it impossible to debug on my laptop.2:40: So what can I do? I can have data, called telemetry, that I can aggregate and analyze. I can aggregate trillions of data points to say that this user was going through the system in this way under these conditions. I can pull from these different dimensions and hold something constant.3:20: Let’s look at how the values differ when I hold one thing constant. Let’s hold another thing constant. That gives me an overall picture of what is happening in the real world.3:37: That is the crux of observability. I'm debugging, but not by stepping through something on my local machine. I click a button, and I can see that it manifests in a database call. But there are potentially millions of users, and things go wrong somewhere else in the system. And I need to try to understand what paths lead to that, and what commonalities exist in those paths.4:14: This is my very high-level definition. It’s many operations, many tasks, almost a workflow as well, and a set of tools.4:32: Based on your description, observability people are sort of like security people. WIth AI, there are two aspects: observability problems introduced by AI, and the use of AI to help with observability. Let’s tackle each separately. Before AI, we had machine learning. Observability people had a handle on traditional machine learning. What specific challenges did generative AI introduce?5:36: In some respects, the problems have been constrained to big tech. LLMs are the first time that we got truly world-class machine learning support available behind an API call. Prior to that, it was in the hands of Google and Facebook and Netflix. They helped develop a lot of this stuff. They’ve been solving problems related to what everyone else has to solve now. They’re building recommendation systems that take in many signals. For a long time, Google has had natural language answers for search queries, prior to the AI overview stuff. That stuff would be sourced from web documents. They had a box for follow-up questions. They developed this before Gemini. It’s kind of the same tech. They had to apply observability to make this stuff available at large.

    38 分钟
  2. 1天前

    Raiza Martin on Building AI Applications for Audio

    Audio is being added to AI everywhere: both in multimodal models that can understand and generate audio and in applications that use audio for input. Now that we can work with spoken language, what does that mean for the applications that we can develop? How do we think about audio interfaces—how will people use them, and what will they want to do? Raiza Martin, who worked on Google’s groundbreaking NotebookLM, joins Ben Lorica to discuss how she thinks about audio and what you can build with it. Timestamps 0:00: Introduction to Raiza Martin, who cofounded Huxe and formerly led Google’s NotebookLM team. What made you think this was the time to trade the comforts of big tech for a garage startup?1:01: It was a personal decision for all of us. It was a pleasure to take NotebookLM from an idea to something that resonated so widely. We realized that AI was really blowing up. We didn’t know what it would be like at a startup, but we wanted to try. Seven months down the road, we’re having a great time.1:54: For the 1% who aren’t familiar with NotebookLM, give a short description.2:06: It’s basically contextualized intelligence, where you give NotebookLM the sources you care about and NotebookLM stays grounded to those sources. One of our most common use cases was that students would create notebooks and upload their class materials, and it became an expert that you could talk with.2:43: Here’s a use case for homeowners: put all your user manuals in there.3:14: We have had a lot of people tell us that they use NotebookLM for Airbnbs. They put all the manuals and instructions in there, and users can talk to it.3:41: Why do people need a personal daily podcast?3:57: There are a lot of different ways that I think about building new products. On one hand, there are acute pain points. But Huxe comes from a different angle: What if we could try to build very delightful things? The inputs are a little different. We tried to imagine what the average person’s daily life is like. You wake up, you check your phone, you travel to work; we thought about opportunities to make something more delightful. I think a lot about TikTok. When do I use it? When I’m standing in line. We landed on transit time or commute time. We wanted to do something novel and interesting with that space in time. So one of the first things was creating really personalized audio content. That was the provocation: What do people want to listen to? Even in this short time, we’ve learned a lot about the amount of opportunity.6:04: Huxe is mobile first, audio first, right? Why audio?6:45: Coming from our learnings from NotebookLM, you learn fundamentally different things when you change the modality of something. When I go on walks with ChatGPT, I just talk about my day. I noticed that was a very different interaction from when I type things out to ChatGPT. The flip side is less about interaction and more about consumption. Something about the audio format made the types of sources different as well. The sources we uploaded to NotebookLM were different as a result of wanting audio output. By focusing on audio, I think we’ll learn different use cases than the chat use cases. Voice is still largely untapped.8:24: Even in text, people started exploring other form factors: long articles, bullet points. What kinds of things are available for voice?8:49: I think of two formats: one passive and one interactive. With passive formats, there are a lot of different things you can create for the user. The things you end up playing with are (1) what is the content about and (2) how flexible is the content? Is it short, long, malleable to user feedback? With interactive content, maybe I’m listening to audio, but I want to interact with it. Maybe I want to join in. Maybe I want my friends to join in. Both of those contexts are new. I think this is what’s going to emerge in the next few years.

    36 分钟
  3. 2天前

    Stefania Druga on Designing for the Next Generation

    How do you teach kids to use and build with AI? That’s what Stefania Druga works on. It’s important to be sensitive to their creativity, sense of fun, and desire to learn. When designing for kids, it’s important to design with them, not just for them. That’s a lesson that has important implications for adults, too. Join Stefania Druga and Ben Lorica to hear about AI for kids and what that has to say about AI for adults. Timestamps 0:27: You’ve built AI education tools for young people, and after that, worked on multimodal AI at DeepMind. What have kids taught you about AI design?0:48: It’s been quite a journey. I started working on AI education in 2015. I was on the Scratch team in the MIT Media Lab. I worked on Cognimates so kids could train custom models with images and texts. Kids would do things I would have never thought of, like build a model to identify weird hairlines or to recognize and give you backhanded compliments. They did things that are weird and quirky and fun and not necessarily utilitarian.2:05: For young people, driving a car is fun. Having a self-driving car is not fun. They have lots of insights that could inspire adults.2:25: You’ve noticed that a lot of the users of AI are Gen Z, but most tools aren’t designed with them in mind. What is the biggest disconnect?2:47: We don’t have a knob for agency to control how much we delegate to the tools. Most of Gen Z use off-the-shelf AI products like ChatGPT, Gemini, and Claude. These tools have a baked-in assumption that they need to do the work rather than asking questions to help you do the work. I like a much more Socratic approach. A big part of learning is asking and being asked good questions. A huge role for generative AI is to use it as a tool that can teach you things, ask you questions; [it’s] something to brainstorm with, not a tool that you delegate work to.4:25: There’s this big elephant in the room where we don’t have conversations or best practices for how to use AI.4:42: You mentioned the Socratic approach. How do you implement the Socratic approach in the world of text interfaces?4:57: In Cognimates, I created a copilot for kids coding. This copilot doesn’t do the coding. It asks them questions. If a kid asks, “How do I make the dude move?” the copilot will ask questions rather than saying, “Use this block and then that block.”6:40: When I designed this, we started with a person behind the scenes, like the Wizard of Oz. Then we built the tool and realized that kids really want a system that can help them clarify their thinking. How do you break down a complex event into steps that are good computational units?8:06: The third discovery was affirmations—whenever they did something that was cool, the copilot says something like “That’s awesome.” The kids would spend double the time coding because they had an infinitely patient copilot that would ask them questions, help them debug, and give them affirmations that would reinforce their creative identity.8:46: With those design directions, I built the tool. I’m presenting a paper at the ACM IDC (Interaction Design for Children) conference that presents this work in more detail. I hope this example gets replicated.9:26: Because these interactions and interfaces are evolving very fast, it’s important to understand what young people want, how they work and how they think, and design with them, not just for them.9:44: The typical developer now, when they interact with these things, overspecifies the prompt. They describe so precisely. But what you’re describing is interesting because you’re learning, you’re building incrementally. We’ve gotten away from that as grown-ups.10:28: It’s all about tinkerability and having the right level of abstraction. What are the right Lego blocks? A prompt is not tinkerable enough. It doesn’t allow for enough expressivity. It needs to be composable and allow the user to be in control.

    33 分钟
  4. 3天前

    Douwe Kiela on Why RAG Isn’t Dead

    Join our host Ben Lorica and Douwe Kiela, cofounder of Contextual AI and author of the first paper on RAG, to find out why RAG remains as relevant as ever. Regardless of what you call it, retrieval is at the heart of generative AI. Find out why—and how to build effective RAG-based systems. Points of Interest 0:25: Today’s topic is RAG. With frontier models advertising massive context windows, many developers wonder if RAG is becoming obsolete. What’s your take?1:03: We now have a blog post: isragdeadyet.com. If something keeps getting pronounced dead, it will never die. These long context models solve a similar problem to RAG: how to get the relevant information into the language model. But it’s wasteful to use the full context all the time. If you want to know who the headmaster is in Harry Potter, do you have to read all the books? 2:04: What will probably work best is RAG plus long context models. The real solution is to use RAG, find as much relevant information as you can, and put it into the language model. The dichotomy between RAG and long context isn’t a real thing.2:48: One of the main issues may be that RAG systems are annoying to build, and long context systems are easy. But if you can make RAG easy too, it’s much more efficient.3:07: The reasoning models make it even worse in terms of cost and latency. And if you’re talking about something with a lot of usage, high repetition, it doesn’t make sense. 3:39: You’ve been talking about RAG 2.0, which seems natural: emphasize systems over models. I’ve long warned people that RAG is a complicated system to build because there are so many knobs to turn. Few developers have the skills to systematically turn those knobs. Can you unpack what RAG 2.0 means for teams building AI applications?4:22: The language model is only a small part of a much bigger system. If the system doesn’t work, you can have an amazing language model and it’s not going to get the right answer. If you start from that observation, you can think of RAG as a system where all the model components can be optimized together.5:40: What you’re describing is similar to what other parts of AI are trying to do: an end-to-end system. How early in the pipeline does your vision start?6:07: We have two core concepts. One is a data store—that’s really extraction, where we do layout segmentation. We collate all of that information and chunk it, store it in the data store, and then the agents sit on top of the data store. The agents do a mixture of retrievers, followed by a reranker and a grounded language model.7:02: What about embeddings? Are they automatically chosen? If you go to Hugging Face, there are, like, 10,000 embeddings.7:15: We save you a lot of that effort. Opinionated orchestration is a way to think about it.7:31: Two years ago, when RAG started becoming mainstream, a lot of developers focused on chunking. We had rules of thumb and shared stories. This eliminates a lot of that trial and error.8:06: We basically have two APIs: one for ingestion and one for querying. Querying is contextualized on your data, which we’ve ingested. 8:25: One thing that’s underestimated is document parsing. A lot of people overfocus on embedding and chunking. Try to find a PDF extraction library for Python. There are so many of them, and you can’t tell which ones are good. They’re all terrible.8:54: We have our stand-alone component APIs. Our document parser is available separately. Some areas, like finance, have extremely complex layouts. Nothing off the shelf works, so we had to roll our own solution. Since we know this will be used for RAG, we process the document to make it maximally useful. We don’t just extract raw information. We also extract the document hierarchy. That is extremely relevant as metadata when you’re doing retrieval.10:11: There are open source libraries—what drove you to build your own, which I assume also encompasses OCR?

    35 分钟
  5. 6天前

    Danielle Belgrave on Generative AI in Pharma and Medicine

    Join Danielle Belgrave and Ben Lorica for a discussion of AI in healthcare. Danielle is VP of AI and machine learning at GSK (formerly GlaxoSmithKline). She and Ben discuss using AI and machine learning to get better diagnoses that reflect the differences between patients. Listen in to learn about the challenges of working with health data—a field where there’s both too much data and too little, and where hallucinations have serious consequences. And if you’re excited about healthcare, you’ll also find out how AI developers can get into the field. Points of Interest 0:00: Introduction to Danielle Belgrave, VP of AI and machine learning at GSK. Danielle is our first guest representing Big Pharma. It will be interesting to see how people in pharma are using AI technologies.0:49: My interest in machine learning for healthcare began 15 years ago. My PhD was on understanding patient heterogeneity in asthma-related disease. This was before electronic healthcare records. By leveraging different kinds of data, genomics data and biomarkers from children, and seeing how they developed asthma and allergic diseases, I developed causal modeling frameworks and graphical models to see if we could identify who would respond to what treatments. This was quite novel at the time. We identified five different types of asthma. If we can understand heterogeneity in asthma, a bigger challenge is understanding heterogeneity in mental health. The idea was trying to understand heterogeneity over time in patients with anxiety. 4:12: When I went to DeepMind, I worked on the healthcare portfolio. I became very curious about how to understand things like MIMIC, which had electronic healthcare records, and image data. The idea was to leverage tools like active learning to minimize the amount of data you take from patients. We also published work on improving the diversity of datasets. 5:19: When I came to GSK, it was an exciting opportunity to do both tech and health. Health is one of the most challenging landscapes we can work on. Human biology is very complicated. There is so much random variation. To understand biology, genomics, disease progression, and have an impact on how drugs are given to patients is amazing.6:15: My role is leading AI/ML for clinical development. How can we understand heterogeneity in patients to optimize clinical trial recruitment and make sure the right patients have the right treatment?6:56: Where does AI create the most value across GSK today? That can be both traditional AI and generative AI.7:23: I use everything interchangeably, though there are distinctions. The real important thing is focusing on the problem we are trying to solve, and focusing on the data. How do we generate data that’s meaningful? How do we think about deployment?8:07: And all the Q&A and red teaming.8:20: It’s hard to put my finger on what’s the most impactful use case. When I think of the problems I care about, I think about oncology, pulmonary disease, hepatitis—these are all very impactful problems, and they’re problems that we actively work on. If I were to highlight one thing, it’s the interplay between when we are looking at whole genome sequencing data and looking at molecular data and trying to translate that into computational pathology. By looking at those data types and understanding heterogeneity at that level, we get a deeper biological representation of different subgroups and understand mechanisms of action for response to drugs.

    32 分钟
  6. 9月11日

    The Startup Opportunity with Gabriela de Queiroz

    Ben Lorica and Gabriela de Queiroz, director of AI at Microsoft, talk about startups: specifically, AI startups. How do you get noticed? How do you generate real traction? What are startups doing with agents and with protocols like MCP and A2A? And which security issues should startups watch for, especially if they’re using open weights models? Points of Interest 0:30: You work with a lot of startups and founders. How have the opportunities for startups in generative AI changed? Are the opportunities expanding?0:56: Absolutely. The entry barrier for founders and developers is much lower. Startups are exploding—not just the amount but also the interesting things they are doing.1:19: You catch startups when they’re still exploring, trying to build their MVP. So startups need to be more persistent in trying to find differentiation. If anyone can build an MVP, how do you distinguish yourself?1:46: At Microsoft, I drive several strategic initiatives to help growth-stage startups. I also guide them in solving real pain points using our stacks. I’ve designed programs to spotlight founders.3:08: I do a lot of engagement where I help startups go from the prototype or MVP to impact. An MVP is not enough. I need to see a real use case and I need to see some traction. When they have real customers, we see whether their MVP is working.3:49: Are you starting to see patterns for gaining traction? Are they focusing on a specific domain? Or do they have a good dataset?4:02: If they are solving a real use case in a specific domain or niche, this is where we see them succeed. They are solving a real pain, not building something generic. 4:27: We’re both in San Francisco, and solving a specific pain or finding a specific domain means something different. Techie founders can build something that’s used by their friends, but there’s no revenue.5:03: This happens everywhere, but there’s a bigger culture around that here. I tell founders, “You need to show me traction.” We have several companies that started as open source, then they built a paid layer on top of the open source project.5:34: You work with the folks at Azure, so presumably you know what actual enterprises are doing with generative AI. Can you give us an idea of what enterprises are starting to deploy? What is the level of comfort of enterprise with these technologies?6:06: Enterprises are a little bit behind startups. Startups are building agents. Enterprises are not there yet. There’s a lot of heavy lifting on the data infrastructure that they need to have in place. And their use cases are complex. It’s similar to Big Data, where the enterprise took longer to optimize their stack.7:19: Can you describe why enterprises need to modernize their data stack? 7:42: Reality isn’t magic. There’s a lot of complexity in data and how data is handled. There is a lot of data security and privacy that startups aren’t aware of but are important to enterprises. Even the kinds of data—the data isn’t well organized, there are different teams using different data sources.8:28: Is RAG now a well-established pattern in the enterprise?8:44: It is. RAG is part of everybody’s workflow.8:51: The common use cases that seem to be further along are customer support, coding—what other buckets can you add?9:07: Customer support and tickets are among the main pains and use cases. And they are very expensive. So it’s an easy win for enterprises when they move to GenAI or AI agents. 9:48: Are you saying that the tool builders are ahead of the tool buyers?10:05: You’re right. I talk a lot with startups building agents. We discuss where the industry is heading and what the challenges are. If you think we are close to AGI, try to build an agent and you’ll see how far we are from AGI. When you want to scale, there’s another level of difficulty. When I ask for real examples and customers, the majority are not there yet.

    31 分钟
  7. 9月10日

    Securing AI with Steve Wilson

    Join Steve Wilson and Ben Lorica for a discussion of AI security. We all know that AI brings new vulnerabilities into the software landscape. Steve and Ben talk about what makes AI different, what the big risks are, and how you can use AI safely. Find out how agents introduce their own vulnerabilities, and learn about resources such as OWASP that can help you understand them. Is there a light at the end of the tunnel? Can AI help us build secure systems even as it introduces its own vulnerabilities? Listen to find out. Points of Interest 0:49: Now that AI tools are more accessible, what makes LLM and agentic AI security fundamentally different from traditional software security?1:20: There’s two parts. When you start to build software using AI technologies, there is a new set of things to worry about. When your software is getting near to human-level smartness, the software is subject to the same issues as humans: It can be tricked and deceived. The other part is what the bad guys are doing when they have access to frontier-class AIs.2:16: In your work at OWASP, you listed the top 10 vulnerabilities for LLMs. What are the top one or two risks that are causing the most serious problems?2:42: I’ll give you the top three. The first one is prompt injection. By feeding data to the LLM, you can trick the LLM into doing something the developers didn’t intend.3:03: Next is the AI supply chain. The AI supply chain is much more complicated than the traditional supply chain. It’s not just open source libraries from GitHub. You’re also dealing with gigabytes of model weights and terabytes of training data, and you don’t know where they’re coming from. And sites like Hugging Face have malicious models uploaded to them. 3:49: The last one is sensitive information disclosure. Bots are not good at knowing what they should not talk about. When you put them into production and give them access to important information, you run the risk that they will disclose information to the wrong people.4:25: For supply chain security, when you install something in Python, you’re also installing a lot of dependencies. And everything is democratized, so people can do a little on their own. What can people do about supply chain security?5:18: There are two flavors: I’m building software that includes the use of a large language model. If I want to get Llama from Meta as a component, that includes gigabytes of floating point numbers. You need to put some skepticism around what you’re getting.6:01: Another hot topic is vibe coding. People who have never programmed or haven’t programmed in 20 years are coming back. There are problems like hallucinations. With generated code, they will make up the existence of a software package. They’ll write code that imports that. And attackers will create malicious versions of those packages and put them on GitHub so that people will install them.7:28: Our ability to generate code has gone up 10x to 100x. But our ability to security check and quality check hasn’t. For people starting, get some basic awareness of the concepts around application security and what it means to manage the supply chain.7:57: We need a different generation of software composition environment tools that are designed to work with vibe coding and integrate into environments like Cursor.8:44: We have good basic guidelines for users: Does a library have a lot of users? A lot of downloads? A lot of stars on GitHub? There are basic indications. But professional developers augment that with tooling. We need to bring those tools into vibe coding.9:20: What’s your sense of the maturity of guardrails? 9:50: The good news is that the ecosystem around guardrails started really soon after ChatGPT came out. Things at the top of the OWASP Top 10, prompt injection and information disclosure, indicated that you needed to police the trust boundaries around your LLM.

    43 分钟
  8. 9月9日

    Shreya Shankar on AI for Corporate Data Processing

    Businesses have a lot of data—but most of that data is unstructured textual data: reports, catalogs, emails, notes, and much more. Without structure, business analysts can’t make sense of the data; there is value in the data, but it can’t be put to use. AI can be a tool for finding and extracting the structure that’s hidden in textual data. In this episode, Ben and Shreya talk about a new generation of tooling that brings AI to enterprise data processing. Points of Interest 0:18: One of the themes of your work is a specific kind of data processing. Before we go into tools, what is the problem you’re trying to address? 0:52: For decades, organizations have been struggling to make sense of unstructured data. There’s a massive amount of text that people make sense of. We didn’t have the technology to do that until LLMs came around.1:38: I’ve spent the last couple of years building a processing framework for people to manipulate unstructured data with LLMs. How can we extract semantic data?1:55: The prior art would be using NLP libraries and doing bespoke tasks?2:12: We’ve seen two flavors of approach: bespoke code and crowdsourcing. People still do both. But now LLMs can simplify the process.2:45: The typical task is “I have a large collection of unstructured text and I want to extract as much structure as possible.” An extreme would be a knowledge graph; in the middle would be the things that NLP people do. Your data pipelines are designed to do this using LLMs.3:22: Broadly, the tasks are thematic extraction: I want to extract themes from documents. You can program LLMs to find themes. You want some user steering and guidance for what a theme is, then use the LLM for grouping.4:04: One of the tools you built is DocETL. What’s the typical workflow?4:19: The idea is to write MapReduce pipelines, where map extracts insights, and group does aggregation. Doing this with LLMs means that the map is described by an LLM prompt. Maybe the prompt is “Extract all the pain points and any associated quotes.” Then you can imagine flattening this across all the documents, grouping them by the pain points, and another LLM can do the summary to produce a report. DocETL exposes these data processing primitives and orchestrates them to scale up and across task complexity.5:52: What if you want to extract 50 things from a map operation? You shouldn’t ask an LLM to do 50 things at once. You should group them and decompose them into subtasks. DocETL does some optimizations to do this.6:18: The user could be a noncoder and might not be working on the entire pipeline.7:00: People do that a lot; they might just write a single map operation.7:16: But the end user you have in mind doesn’t even know the words “map” and “filter.”7:22: That's the goal. Right now, people still need to learn data processing primitives. 7:49: These LLMs are probabilistic; do you also set the expectations with the user that you might get different results every time you run the pipeline?8:16: There are two different types of tasks. One is where you want the LLM to be accurate and there is an exact ground truth—for example, entity extraction. The other type is where you want to offload a creative process to the LLM—for example, “Tell me what’s interesting in this data.” They’ll run it until there are no new insights to be gleaned.  When is nondeterminism a problem? How do you engineer systems around it?9:56: You might also have a data engineering team that uses this and turns PDF files into something like a data warehouse that people can query. In this setting, are you familiar with lakehouses architecture and the notion of the medallion architecture?10:49: People actually use DocETL to create a table out of PDFs and put it in a relational database. That’s the best way to think about how to move forward in the enterprise setting. I’ve also seen people using these tables in RAG or downstream LLM applications.

    30 分钟

关于

In 2023, ChatGPT put AI on everyone’s agenda. Now, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.