October 31
16 min

🛡️ Breaking Agent Backbones: Evaluating LLM Security in AI Agents

AI agents are being deployed at scale, but their security is complicated by non-deterministic behavior and novel vulnerabilities. This episode introduces the "threat snapshot" framework and the new b3 benchmark, which systematically isolate and evaluate the security risks that stem from an agent's backbone LLM. Key findings: enhanced reasoning capabilities generally improve security, yet larger model size does not correlate with lower vulnerability scores.

Show: Build Wiz AI Show
Published: October 31, 2025, 7:35 AM UTC
Duration: 16 min
Rating: All ages