Summary: Timothy De Block is joined by Keith Hoodlet, Engineering Director at Trail of Bits, for a fascinating, in-depth look at AI red teaming and the security challenges posed by Large Language Models (LLMs). They discuss how prompt injection is effectively a new form of social engineering against machines, exploiting the training data's inherent human biases and logical flaws. Keith breaks down the mechanics of LLM inference, the rise of middleware for AI security, and cutting-edge attacks using everything from emojis and bad grammar to weaponized image scaling. The episode stresses that the fundamental solutions—logging, monitoring, and robust security design—are simply timeless principles being applied to a terrifyingly fast-moving frontier. Key Takeaways The Prompt Injection Threat Social Engineering the AI: Prompt injection works by exploiting the LLM's vast training data, which includes all of human history in digital format, including movies and fiction. Attackers use techniques that mirror social engineering to trick the model into doing something it's not supposed to, such as a customer service chatbot issuing an unauthorized refund. Business Logic Flaws: Successful prompt injections are often tied to business logic flaws or a lack of proper checks and guardrails, similar to vulnerabilities seen in traditional applications and APIs. Novel Attack Vectors: Attackers are finding creative ways to bypass guardrails: Image Scaling: Trail of Bits discovered how to weaponize image scaling to hide prompt injections within images that appear benign to the user, but which pop out as visible text to the model when downscaled for inference. Invisible Text: Attacks can use white text, zero-width characters (which don't show up when displayed or highlighted), or Unicode character smuggling in emails or prompts to covertly inject instructions. Syntax & Emojis: Research has shown that bad grammar, run-on sentences, or even a simple sequence of emojis can successfully trigger prompt injections or jailbreaks. Defense and Design LLM Security is API Security: Since LLMs rely on APIs for their "tool access" and to perform actions (like sending an email or issuing a refund), security comes down to the same principles used for APIs: proper authorization, access control, and eliminating misconfiguration. The Middleware Layer: Some companies are using middleware that sits between their application and the Frontier LLMs (like GPT or Claude) to handle system prompting, guard-railing, and filtering prompts, effectively acting as a Web Application Firewall (WAF) for LLM API calls. Security Design Patterns: To defend against prompt injection, security design patterns are key: Action-Selector Pattern: Instead of a text field, users click on pre-defined buttons that limit the model to a very specific set of safe actions. Code-Then-Execute Pattern (CaMeL): The first LLM is used to write code (e.g., Pythonic code) based on the natural language prompt, and a second, quarantined LLM executes that safer code. Map-Reduce Pattern: The prompt is broken into smaller chunks, processed, and then passed to another model, making it harder for a prompt injection to be maintained across the process. Timeless Hygiene: The most critical defenses are logging, monitoring, and alerting. You must log prompts and outputs and monitor for abnormal behavior, such as a user suddenly querying a database thousands of times a minute or asking a chatbot to write Python code. Resources & Links Mentioned Trail of Bits Research: Blog: blog.trailofbits.com Company Site: trailofbits.com Weaponizing image scaling against production AI systems Call Me A Jerk: Persuading AI to Comply with Objectionable Requests Securing LLM Agents Paper: Design Patterns for Securing LLM Agents against Prompt Injections. Camel Prompt Injection Defending LLM applications against Unicode character smuggling Logit-Gap Steering: Efficient Short-Suffix Jailbreaks for Aligned Large Language Models LLM Explanation: Three Blue One Brown (3Blue1Brown) has a great short video explaining how Large Language Models work. Lakera Gandalf: Game for learning how to use prompt injection against AI Keith Hoodlet's Personal Sites: Website: securing.dev and thought.dev