In this episode of BHIS Presents: AI Security Ops, the team digs into a problem every AI-enabled SOC eventually hits: the demo looked great, until the inference bill showed up!

AI in SecOps gets expensive because security data is huge, repetitive, and constant. Logs, alerts, runbooks, tool definitions, and historical context all get pushed into models again and again. That burns money, slows systems down, and often makes answers worse.

The fix is not exotic. It is basic engineering: use smaller models where they work, cache what repeats, stop dumping raw logs, and save expensive reasoning for the cases that actually need it.

We dig into:
• Why AI SecOps workloads get expensive fast
• When smaller models are good enough
• Where frontier models still make sense
• How grouping alerts into cases reduces waste
• Using strong models to judge cheaper models
• Why prompt caching can be a major cost lever
• How small prompt changes can break caching
• Batch APIs for non-urgent security work
• Why raw logs make prompts noisy and expensive
• RAG, deduplication, and cached verdicts
• Budget caps, circuit breakers, and stolen-key risk
• When deterministic code beats another model call

AI cost control is not just a budgeting exercise. It is a security architecture issue. If every alert goes to the biggest model with no caching, no limits, and no measurement, the system is not just expensive; it is uncontrolled. Good AI SecOps design means scoping the model, reducing unnecessary context, measuring spend, and putting guardrails around how AI is allowed to operate.
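For readers who want something concrete, here is a minimal Python sketch of how a few of these ideas could fit together: trimming an alert to the fields that matter, reusing a cached verdict for repeated alert patterns, routing by difficulty, and enforcing a hard spend cap. It is not taken from the episode. Every name in it (`KEEP_FIELDS`, `Triage`, `call_model`, the tier names, and the per-call prices in cents) is a hypothetical placeholder, and `call_model` stands in for whatever model client you actually use.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Dict

# Assumption: which fields matter depends on your alert schema.
KEEP_FIELDS = ("rule", "host", "user", "process", "severity")


class BudgetExceeded(RuntimeError):
    """Raised when the next call would exceed the spend cap (circuit breaker)."""


def reduce_alert(raw: dict) -> dict:
    """Keep only the fields the model needs instead of the whole raw log."""
    return {k: raw[k] for k in KEEP_FIELDS if k in raw}


def fingerprint(alert: dict) -> str:
    """Stable hash of the reduced alert, used to recognize repeats."""
    canonical = json.dumps(alert, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class Triage:
    # Hypothetical model client: (tier, reduced_alert) -> verdict string.
    call_model: Callable[[str, dict], str]
    # Illustrative per-call prices in integer cents, keyed by tier name.
    cost_cents: Dict[str, int]
    budget_cents: int
    spent_cents: int = 0
    verdicts: Dict[str, str] = field(default_factory=dict)

    def pick_tier(self, alert: dict) -> str:
        # Illustrative routing rule only: send high-severity alerts to the
        # larger model and everything else to the smaller one.
        if alert.get("severity") in ("high", "critical"):
            return "large"
        return "small"

    def triage(self, raw: dict) -> str:
        alert = reduce_alert(raw)
        key = fingerprint(alert)

        # Cached verdict: a repeated alert pattern costs nothing.
        if key in self.verdicts:
            return self.verdicts[key]

        tier = self.pick_tier(alert)
        cost = self.cost_cents[tier]

        # Hard cap: refuse the call rather than overspend.
        if self.spent_cents + cost > self.budget_cents:
            raise BudgetExceeded(
                f"cap of {self.budget_cents} cents reached "
                f"({self.spent_cents} spent, next call costs {cost})"
            )

        verdict = self.call_model(tier, alert)
        self.spent_cents += cost
        self.verdicts[key] = verdict
        return verdict
```

Because the fingerprint is computed after the alert is reduced, two alerts that differ only in dropped fields (a timestamp or the raw log body, for example) share one cached verdict. Whether that is safe depends on which fields you keep, so treat `KEEP_FIELDS` as a decision to validate against real historical cases rather than a default.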
⸻

📚 Key Concepts & Topics

AI Cost Architecture
• SecOps cost comes from large inputs, repeated context, and high alert volume
• Model selection should match task difficulty
• Routine triage can often use smaller models
• Hard correlation and judgment may justify stronger models

Model Evaluation
• Test smaller models against real historical cases
• Use stronger models as judges when appropriate
• Compare quality before moving workloads
• Do not assume the biggest model is always necessary

Prompt & Context Design
• Cache static instructions, tool definitions, and repeated context
• Keep cacheable sections stable
• Avoid changing static prompts with unnecessary variables
• Better prompt structure can reduce both cost and noise

Data Reduction & Retrieval
• Do not send entire logs when only a few fields matter
• Preprocess alerts before model calls
• Use RAG instead of stuffing whole libraries into prompts
• Cache repeated verdicts for repeated alert patterns

Operational Guardrails
• Track AI spend by workload
• Set hard caps and circuit breakers
• Use limits to reduce stolen-key blast radius
• Treat AI pipelines like production security systems

Deterministic Workflows
• Not every task needs inference
• Repeatable logic should become code
• AI can help write that code
• Once the workflow is deterministic, stop paying the model to repeat it

#AISecurity #LLMSecurity #CyberSecurity #ArtificialIntelligence #SecOps #SOC #InfoSec #BHIS #AppSec #PromptEngineering #securityarchitecture

----------------------------------------------------------------------------------------------

About Brian Fehrman - https://www.blackhillsinfosec.com/team/brian-fehrman/
About Bronwen Aker - https://www.blackhillsinfosec.com/team/bronwen-aker/
About Derek Banks - https://www.blackhillsinfosec.com/team/derek-banks/
About Ethan Robish - https://www.blackhillsinfosec.com/team/ethan-robish/
About Ben Bowman - https://www.blackhillsinfosec.com/team/ben-bowman/

(00:00) - Intro: When the AI Triage Assistant Gets Expensive
(01:27) - The Setup: Saving Money Without Killing the Workflow
(02:22) - Right-Size the Model: Cheap for Routine, Big for Hard
(05:36) - Testing Smaller Models, Judges & Real SOC Workflows
(13:46) - Prompt Caching: The Big Lever Hiding in Plain Sight
(18:37) - Batch APIs: Half the Urgency, Lower the Cost
(20:19) - Stop Dumping Logs: Less Noise, Better Answers
(24:20) - RAG, Dedupe, Budgets & the Deterministic Code Bonus

Creators & Guests
Ethan Robish - Guest
Derek Banks - Host
Brian Fehrman - Host

Brought to you by:
Black Hills Information Security https://www.blackhillsinfosec.com
Antisyphon Training https://www.antisyphontraining.com/
Active Countermeasures https://www.activecountermeasures.com
Wild West Hackin Fest https://wildwesthackinfest.com

🔗 Register for FREE Infosec Webcasts, Anti-casts & Summits
https://poweredbybhis.com