Automatic

Eric Lamanna

Technology

Podcast for Automatic.co and LLM.co, the AI automation specialists.

7h ago

Model Rollbacks: The AI Version of Panic Mode

Deploying a new AI model and watching things go sideways is a rite of passage for most ML teams, but how you respond in that moment defines the maturity of your operation. This episode of Automatic digs into the strategy, psychology, and mechanics behind model rollbacks, drawing on this in-depth look at AI rollback strategy to make the case that reverting to a previous model is less a concession and more a calculated act of operational courage.

The episode covers the full landscape of rollbacks, from what triggers them to what makes them succeed, including:

Reframing the rollback mindset: Why reverting to a prior model signals disciplined engineering, not defeat, and how language choices shape team culture around incident response.
What actually causes teams to pull the cord: Beyond the obvious metric drops, rollbacks are often driven by regulatory pressure, fairness threshold failures, or qualitative signals from users who simply feel something is "off."
The infrastructure that makes rollbacks possible: Immutable model versioning, reproducible builds, canarying, staging environments, and shadow deployments are the groundwork that turns a chaotic revert into a clean one.
Observability as a prerequisite: Structured feedback collection, per-cohort metric segmentation, and request tracing are what convert incident fog into an actionable map; without them, rollback becomes guesswork.
The psychology of incidents: How to separate diagnosis from blame, why predictable status updates keep teams calm, and why the safest teams often look almost boring when the alarms are going off.
Pre-mortems, checklists, and practice runs: Proactive habits, including mock rollbacks with timer pressure, that build the muscle memory needed to act decisively rather than reactively.

The episode also touches on sneakier failure modes like feature store drift, where the model itself is fine but the data underneath it has quietly shifted, and how circuit breakers and guardrails can turn a potential cliff dive into a controlled slide. The throughline: rollback readiness isn't something you scramble for during an incident; it's something you build long before one arrives.

If the intersection of AI reliability and team culture is your thing, don't miss the earlier episode The Private LLM Revolution: Why Enterprises Are Bringing AI In-House, which explores another dimension of how organizations are taking greater ownership of their AI systems.
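As a rough illustration of the immutable versioning and clean revert mentioned in this episode's summary, here is a minimal sketch. The names `ModelVersion`, `ModelRegistry`, `promote`, and `rollback`, and the example URIs, are all hypothetical and not taken from the episode; a real registry would also persist state and handle concurrent access.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelVersion:
    """Immutable record of one model build (hypothetical fields)."""
    version: str
    artifact_uri: str


@dataclass
class ModelRegistry:
    """Keeps registered versions and the order in which they were promoted."""
    versions: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # promoted versions, oldest first

    def register(self, version: str, artifact_uri: str) -> ModelVersion:
        # Immutability: a version name can never be re-pointed at a new artifact.
        if version in self.versions:
            raise ValueError(f"version {version!r} already registered")
        record = ModelVersion(version, artifact_uri)
        self.versions[version] = record
        return record

    def promote(self, version: str) -> None:
        # Only registered versions can go live.
        if version not in self.versions:
            raise KeyError(version)
        self.history.append(version)

    @property
    def live(self) -> ModelVersion:
        return self.versions[self.history[-1]]

    def rollback(self) -> ModelVersion:
        # Revert to the previously promoted version; refuse if there is none.
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.live


registry = ModelRegistry()
registry.register("v1", "s3://example-bucket/models/v1")  # illustrative URI
registry.register("v2", "s3://example-bucket/models/v2")  # illustrative URI
registry.promote("v1")
registry.promote("v2")
previous = registry.rollback()
```

Because every version record is frozen and a name can only be registered once, a rollback in this sketch is a pointer move rather than a rebuild, which is the property that makes a revert predictable under pressure.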
1d ago

The Private LLM Revolution: Why Enterprises Are Bringing AI In-House

The question of where sensitive enterprise data goes when it hits a public AI tool is no longer theoretical; it's a boardroom concern. This episode of Automatic examines the accelerating shift toward private, in-house large language models, drawing on LLM.co's coverage of the private LLM revolution to trace what's driving the trend, who's already building, and what the road to production actually looks like.

The episode covers the full arc of the private LLM case, from the compliance pressures that sparked the conversation to the deeper competitive advantages that are keeping it going:

Privacy as the floor, not the ceiling: Regulatory exposure in healthcare, finance, law, and government makes third-party data handling a liability, but the more compelling argument is what becomes possible when an organization fully controls its own model.
Industry-specific use cases gaining traction: Law firms are compressing contract review cycles, finance teams are building auditable AI workflows, hospitals are exploring on-premise clinical documentation, and manufacturers are turning dusty SOPs into real-time conversational interfaces.
The institutional knowledge advantage: Private models can be trained on proprietary data (case law databases, internal wikis, historical filings) and paired with permission-aware retrieval so that outputs are both relevant and access-controlled.
The real implementation hurdles: Hardware costs, talent scarcity, data preparation complexity, and ongoing governance requirements mean this is an infrastructure investment on the scale of an ERP rollout, not a quick deployment.
A shifting cost calculus: Model distillation, maturing open-source weights, and falling hardware costs are steadily improving the build-vs-buy math, especially once API subscription spending and compliance risk are factored in.
The "WordPress moment" thesis: Just as open-source web publishing democratized who could build an online presence, converging forces may be approaching a similar inflection point for enterprise AI ownership.

For more on keeping AI systems sharp as data and requirements evolve over time, check out the related episode Forgetting to Forget: How to Keep AI Systems Sharp Over Time. More from the show on enterprise AI architecture, compliance frameworks, and industry use cases is published and updated regularly at LLM.co.
2d ago

Forgetting to Forget: How to Keep AI Systems Sharp Over Time

AI models don't just degrade from bugs or bad data; sometimes they simply forget. Catastrophic forgetting is the quiet failure mode where updating a model with new information causes it to lose competence on tasks it previously handled with ease. This episode of Automatic unpacks the phenomenon in depth, drawing on the source article on continual learning and forgetting to make the case that keeping models sharp over time is as much a discipline problem as a technical one.

The episode covers the full arc of the problem: why it happens at the weight level, the engineering patterns that prevent it, and the cultural habits that determine whether any of those patterns actually stick. Key topics include:

Why catastrophic forgetting occurs: Fine-tuning adjusts a model's internal weights without preserving a map of what those weights were doing, causing new learning to silently overwrite old competence.
Two foundational strategies, separation and rehearsal: Keeping capabilities in distinct parameter regions (via adapters or specialized submodels) and periodically exposing the model to curated examples of earlier tasks so prior knowledge isn't erased.
Building smarter memory buffers: How to choose what to retain using diversity sampling, difficulty sampling, and importance weighting, so the buffer reflects real priorities rather than data volume.
Technical toolkit options: Regularization methods that penalize destructive weight changes, adapter layers that slot new skills into existing architectures, and mixture-of-experts routing that gives new tasks their own parameters without displacing old ones.
Evaluation and drift discipline: Why testing only on new tasks is a trap, how to build rolling test suites that span historical and fresh data, and how to distinguish intentional drift (the world changed) from harmful forgetting (the model regressed).
Culture and ethics: Versioning everything, treating rollbacks as wins, running weekly drift reviews, and building explicit retention policies that prevent sensitive data from persisting under the guise of stability.

The throughline is a reframe of how teams should think about model memory: not as a default behavior, but as a designed one. If retention isn't explicitly planned and enforced, it won't happen, and the next model update may quietly erase last quarter's hard-won gains. The episode closes with a practical entry point for teams feeling the weight of all this: start with a small rehearsal buffer, a weekly drift check, and one adapter instead of a global update. Small habits compound into systems that learn without forgetting what already works.

For more from the show, check out the episode AI Latency Budgets: Why Seconds Kill Products, a close companion to this one for teams thinking seriously about production AI reliability.
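To make the "small rehearsal buffer" idea from this episode's summary concrete, here is a minimal sketch. It uses plain uniform reservoir sampling, which is a simplification chosen for brevity and not the diversity, difficulty, or importance-weighted sampling the summary mentions. The class name `RehearsalBuffer`, the `mix` method, and the 25% replay fraction are all assumptions made for illustration.

```python
import random


class RehearsalBuffer:
    """Fixed-size store of earlier-task examples, filled by reservoir sampling."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Keep the new example with probability capacity / seen.
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = example

    def mix(self, new_batch: list, rehearsal_fraction: float = 0.25) -> list:
        """Return the new batch plus a sample of replayed old examples."""
        k = min(len(self.items), round(len(new_batch) * rehearsal_fraction))
        return list(new_batch) + self.rng.sample(self.items, k)


buffer = RehearsalBuffer(capacity=4)
for i in range(100):
    buffer.add(("old", i))

new_batch = [("new", i) for i in range(8)]
training_batch = buffer.mix(new_batch, rehearsal_fraction=0.25)
```

Each fine-tuning batch built this way carries a few earlier-task examples alongside the new data, so the update is pulled toward preserving old behavior as well as learning the new task.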
2d ago

AI Latency Budgets: Why Seconds Kill Products

Speed is a trust signal, not just a technical metric, and in AI products it may be the difference between a loyal user and a silent churn. This episode of Automatic draws on the original article on AI latency budgets and why seconds kill products to make the case that every team shipping an AI feature needs to treat response time as a first-class design constraint, not an afterthought. Unlike a traditional web request, an AI pipeline chains together a dozen moving parts (intent detection, retrieval, reranking, prompt construction, model inference, post-processing, and client rendering), each silently consuming milliseconds. Without a formal budget, those costs compound invisibly.

The episode walks through what it actually takes to define, instrument, and enforce a latency budget across a modern AI stack, covering:

The human thresholds that define "fast enough": why sub-second interactions feel instantaneous, why three seconds marks the edge of trust, and how the same answer delivered faster is perceived as smarter and safer.
How to structure a component-level budget: assigning explicit millisecond allowances to each pipeline stage (input sanitization, retrieval, reranking, generation, streaming) so tradeoffs become visible and negotiable instead of political.
Why server-side averages are a trap: the case for client-side instrumentation, tail latency percentiles (P95 and P99), correlation IDs, and structured spans that map the full journey from tap to paint.
Streaming and progressive disclosure as perception tools: how showing partial output within a few hundred milliseconds bends the user's patience curve without changing total compute time.
The overlooked latency culprits: tokenization overhead, vector store recall budgets, cold starts, payload size, TLS negotiation, and geographic distance between compute and user, all adding up faster than model inference alone.
Speed as an organizational culture, not just an engineering target: why teams that celebrate latency wins ship faster products, and how performance gates in the build pipeline prevent regressions from sneaking into production.

The episode closes with a practical playbook: name an end-to-end target, assign component allowances, parallelize aggressively, right-size models for each pipeline stage, and instrument everything.

If you enjoyed this one, don't miss How Private LLMs Reduce Operational Risk for Finance Teams for another look at building AI systems that earn and hold user trust.
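The component-level budget and tail-percentile check described in this episode's summary could look roughly like the sketch below. The stage names follow the summary, but every millisecond figure, the 1000 ms end-to-end target, and the helper names `percentile` and `over_budget` are invented for illustration.

```python
import math

# Hypothetical per-stage allowances in milliseconds (illustrative numbers only).
BUDGET_MS = {
    "input_sanitization": 20,
    "retrieval": 150,
    "reranking": 80,
    "generation": 600,
    "streaming": 50,
}
END_TO_END_TARGET_MS = 1000  # assumed target, not from the episode


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def over_budget(measured_ms, pct=95):
    """Map each stage whose pct-th percentile exceeds its allowance to (observed, allowance)."""
    report = {}
    for stage, allowance in BUDGET_MS.items():
        observed = percentile(measured_ms[stage], pct)
        if observed > allowance:
            report[stage] = (observed, allowance)
    return report


# Synthetic measurements: retrieval has a slow tail that an average would hide.
measured = {
    "input_sanitization": [10] * 20,
    "retrieval": [100] * 18 + [400] * 2,
    "reranking": [50] * 20,
    "generation": [500] * 20,
    "streaming": [30] * 20,
}
violations = over_budget(measured, pct=95)
```

In the synthetic data the mean retrieval latency is 130 ms, which sits under the 150 ms allowance, while the P95 is 400 ms. That gap is the reason the summary calls averages a trap.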
3d ago

Bringing Agentic AI In-House: Private LLMs That Act, Not Just Chat

Most AI deployments inside companies stop at conversation: ask a question, get an answer, then go do the work yourself. This episode of Automatic examines the next leap: agentic AI systems that don't just respond but genuinely act, and why a growing number of organizations are choosing to run them entirely on private infrastructure. Drawing on this deep-dive on private agentic LLMs, the episode walks through what it actually takes to move from a chat assistant to an autonomous co-worker operating inside your own walls.

Here's what the episode covers:

The agentic shift explained: How modern LLMs move beyond text generation to execute multi-step tasks (querying data warehouses, filing tickets, routing approvals, and sending outputs) without human hand-offs at every stage.
Three enabling technologies: Toolformer-style API-aware training, long-context architectures that preserve a plan across many steps, and fine-grained policy engines that block unsafe actions before they fire.
Why private deployment wins: The compounding case for keeping agentic AI behind your own firewall, including compliance obligations (HIPAA, SOC 2, GDPR), competitive sensitivity of the data in real prompts, direct control over guardrails, and more predictable infrastructure costs versus token-based public APIs.
A five-layer architecture: The episode unpacks the distinct roles of the core inference layer, a vector-database memory and planning layer, a tooling and orchestration layer (LangChain or custom schemas), a policy enforcement sandbox, and a human-in-the-loop approval portal, and explains why clean boundaries between layers make the whole system maintainable.
Governance that holds up in practice: Pre-deployment alignment through curated training data and scoped permissions, plus real-time oversight via exhaustive logging, rate limits, mandatory approvals, and "shadow mode" pilots where agents suggest but cannot act until accuracy benchmarks are met.
A phased rollout playbook: Start narrow (legal-hold reminders, CRM cleansing, nightly ops reports), instrument everything, then widen autonomy incrementally as the system earns organizational trust and each small win funds the next infrastructure step.

The episode closes with the strategic argument that organizations crossing this threshold now, with the right layered architecture and rigorous governance, will set a meaningfully higher bar for what intelligent automation looks like.

More from the show: if you want to see how a specific regulated industry is already navigating this transition, check out the episode From Paper Stacks to Private LLMs: How Insurers Are Reinventing Claims.
3d ago

Prompt Caching: How to Make LLMs Faster and Cheaper Without Cutting Corners

Every time an LLM application sends the same background context (system prompts, schema definitions, policy documents) without caching, it pays for that context all over again. At scale, across thousands of daily requests, that overhead compounds fast. This episode of Automatic unpacks the full guide to prompt caching for LLM applications, exploring how teams can dramatically reduce token spend and response latency while keeping output quality high, or even improving it.

The episode covers the core mechanics of prompt caching and works through the practical architecture decisions, common failure modes, and design patterns that separate a well-built caching layer from one that quietly causes problems:

What caching actually preserves: not full responses, but reusable intermediate artifacts like retrieval plans, chunk summaries, tool-call templates, and formatting scaffolds.
Why quality can improve: caching deterministic steps removes variance from parts of a pipeline that should behave consistently, letting the model focus fresh generation where it actually matters.
Where caching fits in real stacks: retrieval and grounding systems, tool-use orchestration, and structured generation each have distinct cacheable units and different risk profiles.
Key design decisions: how to define the right cacheable unit, build cache keys that balance hit rate against staleness risk, and set expiration and invalidation policies that won't surprise the team during a model upgrade.
Practical patterns that pay off quickly: intent normalization before routing, plan-then-execute splits that cache blueprints, and templated outputs with live-filled slots.
The traps to avoid: over-caching personal or rapidly shifting context, hidden coupling between pipeline layers, and applying reuse in domains like legal or medical guidance where even structural similarity can mislead.

The episode also addresses safety and trust: why cache hits should still pass through lightweight validators, how to log cache provenance for observability, and what privacy hygiene looks like when personal data and secrets are in play. The closing argument is that prompt caching, done with discipline, isn't an optimization hack; it's closer to craftsmanship, and the underlying pattern will remain valuable even as context windows and model capabilities continue to expand.

For more on how AI is reshaping complex, document-heavy industries, listen to From Paper Stacks to Private LLMs: How Insurers Are Reinventing Claims.
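One way to picture the cache-key and expiration decisions mentioned in this episode's summary is the sketch below. It assumes an application-level, in-memory cache of intermediate artifacts, not any provider's built-in prompt caching feature. The names `cache_key` and `ArtifactCache`, the choice of key fields, and the 60-second TTL are all hypothetical.

```python
import hashlib
import time


def cache_key(intent: str, model_version: str, template_version: str) -> str:
    """Build a key from the normalized intent plus the versions that should invalidate it."""
    normalized = " ".join(intent.lower().split())  # collapse case and whitespace
    raw = "\x1f".join([normalized, model_version, template_version])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class ArtifactCache:
    """In-memory cache of intermediate artifacts with a time-to-live."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            del self.store[key]  # expired: drop it and report a miss
            return None
        return value

    def put(self, key, value) -> None:
        self.store[key] = (value, self.clock())


now = [0.0]  # fake clock for the demonstration
cache = ArtifactCache(ttl_seconds=60, clock=lambda: now[0])

key_a = cache_key("  Refund   Policy ", "model-1", "template-1")
key_b = cache_key("refund policy", "model-1", "template-1")
key_c = cache_key("refund policy", "model-2", "template-1")

cache.put(key_a, "retrieval plan for refund policy")
hit = cache.get(key_b)   # same normalized intent and versions
miss = cache.get(key_c)  # a different model version never sees the old entry
now[0] = 61.0
expired = cache.get(key_a)
```

Normalizing the intent raises the hit rate, while putting the model and template versions in the key means an upgrade produces misses instead of stale reuse. The TTL bounds how long any entry can outlive the context it was built from.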
4d ago

From Paper Stacks to Private LLMs: How Insurers Are Reinventing Claims

Insurance claims have never been clean, simple, or fast, and for decades the industry has relied on rules-based software and armies of human reviewers to manage the mess. This episode of Automatic examines how that model is changing, drawing on this deep dive into private LLMs and claims data parsing to explore why large language models are proving to be one of the most consequential tools insurers have ever adopted. The shift goes well beyond efficiency: it touches auditability, regulatory compliance, and the day-to-day experience of policyholders navigating some of the hardest moments of their lives.

The episode walks through the full arc of how private LLMs are being deployed inside insurance operations, from raw document ingestion to final adjuster review. Key topics include:

Why claims data resists automation: A single claim can contain narrative reports, scanned PDFs, billing codes, email threads, and legacy attachments, a mix that traditional rules-based systems were never built to handle.
Context-aware reading at scale: Unlike keyword-matching software, LLMs understand that identical language carries different meaning depending on where it appears in a document, a distinction that previously required careful human reading.
The ingestion and extraction pipeline: How optical character recognition, layout analysis, and entity extraction work together to turn unstructured claims documents into structured, cited, auditable outputs.
Policy reasoning and conflict detection: How well-designed systems compare claim details against live policy language, flagging potential exclusions, recommending reserve amounts, and explaining their reasoning in plain language for adjuster review.
Privacy and governance as non-negotiables: Why insurers deploying these tools keep everything inside controlled networks, with locked-down data residency, access logging, encryption, and regression testing built into the model update cycle.
Retrieval-augmented generation as a trust layer: How grounding model outputs in retrieved, up-to-date sources, rather than relying on training data alone, addresses the hallucination problem that makes enterprise AI adoption so risky in regulated industries.

The episode also addresses the human side of implementation: how LLMs are being positioned as companion services alongside legacy claims platforms, why adjusters remain essential for judgment-heavy decisions, and what concrete success metrics (handling time, straight-through processing rates, claim reopening rates) actually reveal about whether a deployment is working.

For more on how this technology is reshaping regulated industries, the episode How Private LLMs Reduce Operational Risk for Finance Teams covers closely related ground from a financial-services perspective.
Jul 24

How Private LLMs Reduce Operational Risk for Finance Teams

Operational risk in finance rarely announces itself with a dramatic failure; it hides in a four-cent ledger discrepancy, a Friday afternoon rubber-stamp, or a compliance policy sitting unread in a shared folder. This episode of Automatic breaks down how private large language models are giving finance teams a fundamentally different kind of control, drawing on this in-depth look at how private LLMs lower operational risk for finance teams. Rather than bolting more rules onto aging infrastructure, private LLMs work with the messy, unstructured reality of modern finance, and they do it entirely inside the firm's own walls.

The episode covers the full arc of where and how this architecture makes a difference:

The hidden shape of operational risk: Why most financial errors are quiet, contextual, and invisible to traditional rules-based controls, and why that gap has been widening as data environments grow more complex.
What "private" actually means: A private LLM lives inside the firm's own infrastructure, keeping every query, output, and data reference behind the organization's encryption perimeter, which is a hard requirement under strict data residency rules, not just a preference.
Faster, cleaner reconciliations: How private models cross-reference multi-system ledgers at speed, surface variances, and draft explanations for human review, shrinking the month-end close from an all-nighter to a routine afternoon task.
Real-time compliance enforcement: Embedding current policies directly into the model means every request is scanned against live rules before anything moves forward, with audit evidence tagged and organized as it accumulates rather than assembled under pressure.
Democratizing analytical access: Breaking the bottleneck that forces non-technical staff to queue behind specialists for risk and exposure data, so decision-makers get answers in plain English, in seconds, when it matters.
A compounding feedback loop: Each human accept-or-reject interaction becomes a training signal, quietly improving the model's fit to a specific desk and workflow over time, without manual retraining cycles.

The episode also examines how reducing third-party integrations trims the vendor attack surface, how logged exceptions and timed reconciliation tasks feed measurable key risk indicators, and why this shift represents a strategic edge rather than an incremental upgrade.

For more on how machine learning models handle imperfect data, check out the earlier episode Synthetic Labels: Training Machine Learning Models Without the Truth.


Creator: Eric Lamanna
Years Active: 2026
Episodes: 61
Rating: Clean
Show Website: Automatic
