Just Now Possible

When AI Becomes Your SRE: How Incident.io Is Automating Incident Response

Guests

  • Lawrence Jones, Founding Engineer at Incident.io
  • Ed Dean, Product Lead for AI at Incident.io

Key Takeaways

  • AI’s biggest impact comes from compressing time: identifying causes in minutes instead of hours.
  • Retrieval-augmented reasoning still benefits from simplicity: deterministic tagging and re-ranking often beat complex vector setups.
  • Post-incident “time travel” evals let teams score AI accuracy after they know what really happened.
  • Building trust in AI isn’t just about precision—it’s about showing reasoning and uncertainty in ways humans understand.

Mentioned Tools & Concepts

  • Slack as the interface for human-AI collaboration
  • PGVector and Postgres for retrieval experiments
  • RAG (Retrieval-Augmented Generation)
  • Multi-agent orchestration
  • “AI as your company’s immune system”

Chapters

  • 00:00 Meet the Founders: Lawrence and Ed
  • 00:41 Introduction to Incident.io
  • 01:25 Evolution of Incident.io Products
  • 02:14 Understanding SRE and Its Importance
  • 04:01 Real-World Incident Management
  • 05:51 The Role of AI in Incident Management
  • 10:12 Challenges and Innovations in AI SRE
  • 12:14 Prototyping and Iterating AI Solutions
  • 16:25 Refining Retrieval Strategies
  • 21:52 Balancing AI and Human Interaction
  • 32:06 User Experience and Trust in AI Systems
  • 36:08 Interactive Slack Integration
  • 37:08 Understanding the AI Investigation Process
  • 37:50 Parallel Checks and Data Sources
  • 38:35 Building Hypotheses and Refining Findings
  • 40:09 Human-Agent Collaboration
  • 49:23 Evaluating AI Effectiveness
  • 01:04:13 Future Developments and Integrations