Most organizations think their AI rollout failed because the model wasn’t smart enough, or because users “don’t know how to prompt.” That’s the comforting story. It’s also wrong. In enterprises, AI fails because context is fragmented: identity doesn’t line up with permissions, work artifacts don’t line up with decisions, and nobody can explain what the system is allowed to treat as evidence. This episode maps context as architecture: memory, state, learning, and control. Once you see that substrate, Copilot stops looking random and starts behaving exactly like the environment you built for it.

1) The Foundational Misunderstanding: Copilot isn’t the system

The foundational mistake is treating Microsoft 365 Copilot as the system. It isn’t. Copilot is an interaction surface. The real system is your tenant: identity, permissions, document sprawl, metadata discipline, lifecycle policies, and unmanaged connectors. Copilot doesn’t create order. It consumes whatever order you already have. If your tenant runs on entropy, Copilot operationalizes entropy at conversational speed.

Leaders experience this as “randomness.” The assistant sounds plausible—sometimes accurate, sometimes irrelevant, occasionally risky. Then the debate starts: is the model ready? Do we need better prompts? Meanwhile, the substrate stays untouched.

Generative AI is probabilistic. It generates best-fit responses from whatever context it sees. If retrieval returns conflicting documents, stale procedures, or partial permissions, the model blends. It fills gaps. That’s not a bug. That’s how it works. So when executives say, “It feels like it makes things up,” they’re observing the collision between deterministic intent and probabilistic generation. Copilot cannot be more reliable than the context boundary it operates inside.

Which means the real strategy question is not “How do we prompt better?” It’s “What substrate have we built for it to reason over?” What counts as memory? What counts as state?
What counts as evidence? What happens when those are missing? Because when Copilot becomes the default interface for work—documents, meetings, analytics—the tenant becomes a context compiler. And if you don’t design that compiler, you still get one. You just get it by accident.

2) “Context” Defined Like an Architect Would

Context is not “all the data.” It’s the minimal set of signals required to make a decision correctly, under the organization’s rules, at a specific moment in time. That forces discipline. Context is engineered from:

- Identity (who is asking, under what conditions)
- Permissions (what they can legitimately see)
- Relationships (who worked on what, and how recently)
- State (what is happening now)
- Evidence (authoritative sources, with lineage)
- Freshness (what is still true today)

Data is raw material. Context is governed material. If you feed raw, permission-chaotic data into AI and call it context, you’ll get polished outputs that fail audit.

Two boundaries matter:

- Context window: what the model technically sees
- Relevance window: what the organization authorizes as decision-grade evidence

Bigger context ≠ better context. Bigger context often means diluted signal and increased hallucination risk. Measure context quality like infrastructure:

- Authority
- Specificity
- Timeliness
- Permission correctness
- Consistency

If two sources disagree and you haven’t defined precedence, the model will average them into something that never existed. That’s not intelligence. That’s compromise rendered fluently.

3) Why Agents Fail First: Non-determinism meets enterprise entropy

Agents fail before chat does. Why? Because chat can be wrong and ignored. Agents can be wrong and create consequences. Agents choose tools, update records, send emails, provision access. That means ambiguity becomes motion.

Typical failure modes:

- Wrong tool choice. The tenant never defined which system owns which outcome. The agent pattern-matches and moves.
- Wrong scope.
“Clean up stale vendors” without a definition of stale becomes overreach at scale.
- Wrong escalation. No explicit ownership model? The agent escalates socially, not structurally.
- Hallucinated authority. Blended documents masquerade as binding procedure.

Agents don’t break because they’re immature. They break because enterprise context is underspecified. Autonomy requires evidence standards, scope boundaries, stopping conditions, and escalation rules. Without that, it’s motion without intent.

4) Graph as Organizational Memory, Not Plumbing

Microsoft Graph is not just APIs. It’s organizational memory. Storage holds files. Memory holds meaning. Graph encodes relationships:

- Who met
- Who edited
- Which artifacts clustered around decisions
- Which people co-author repeatedly
- Which documents drove escalation

Copilot consumes relational intelligence. But Graph only reflects what the organization leaves behind. If containers are incoherent, memory retrieval becomes probabilistic. If containers are engineered with ownership and authority, retrieval becomes repeatable. Agents need memory to understand context. But memory without trust is dangerous. Which brings us to permissions.

5) Permissions Are the Context Compiler

Permissions don’t just control access. They shape intelligence. Copilot doesn’t negotiate permissions. It inherits them. Over-permissioning creates AI-powered oversharing. Under-permissioning creates AI mediocrity. Permission drift accumulates through:

- Broken SharePoint inheritance
- “Temporary” broad access
- Guest sprawl
- Sharing links replacing group governance
- Orphaned containers

When Copilot arrives, it becomes a natural language interface to permission debt. Less eligible context often produces better answers. Least privilege is not ideology. It’s autonomy hygiene. Because agents don’t just read. They act.

6) Prompt Engineering vs Grounding Architecture

Prompting steers conversation. Grounding constrains decisions. Prompts operate at the interaction layer.
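To make that layering concrete, here is a minimal sketch in Python. All names are hypothetical (`Doc` and `eligible_context` are not a Microsoft API); the point is that grounding runs before any prompt is assembled, deciding what the model is allowed to see at all.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical document record. In a real tenant these signals would come
# from Graph metadata and the permission system, not from local objects.
@dataclass
class Doc:
    title: str
    owner_approved: bool      # authority: is this an authoritative source?
    readers: frozenset        # permission correctness: who may see it
    last_reviewed: date       # freshness signal

def eligible_context(user, docs, max_age_days=180):
    """Filter to decision-grade evidence before any prompt exists.
    Anything failing permission, authority, or freshness checks is
    simply invisible to the model, however the prompt is worded."""
    cutoff = date.today() - timedelta(days=max_age_days)
    return [
        d for d in docs
        if user in d.readers            # permission correctness
        and d.owner_approved            # authority
        and d.last_reviewed >= cutoff   # freshness
    ]
```

In this sketch, tightening `eligible_context` changes what the assistant can say; rewording the prompt only changes how it says it.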
Grounding architecture operates at the substrate layer. Substrate wins. Grounding primitives include:

- Authoritative sources
- Scoped retrieval
- Freshness constraints
- Permission correctness
- Provenance
- Citations-or-silence

If the system can’t show evidence, it must escalate. Web grounding expands the boundary beyond your tenant. Treat it like public search. Prompts don’t control what the system is allowed to know. Permissions and grounding do.

7) Relevance Windows: The Discipline Nobody Budgets For

Relevance windows define eligible evidence per workflow step. Not everything retrievable is admissible. Components:

- Authority hierarchy
- Freshness rules
- Version precedence
- Scope limits
- Explicit exclusions

More context increases contradictions. Tighter windows increase dependability. If a workflow cannot state “only these sources count,” it isn’t ready for agents.

8) Dataverse as Operational Memory

Microsoft Dataverse is operational memory. State answers:

- Who owns this right now?
- What step are we in?
- What approval exists?
- What exception was granted?

Without state, agents loop. With explicit state machines:

- Ownership
- Status transitions
- SLAs
- Approval gates
- Exception tracking

Agents stop guessing. They check. Operational memory reduces hallucinations without touching the model.

Become a supporter of this podcast: https://www.spreaker.com/podcast/m365-fm-modern-work-security-and-productivity-with-microsoft-365--6704921/support

If this clashes with how you’ve seen it play out, I’m always curious. I use LinkedIn for the back-and-forth.
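One way to read section 8’s claim that agents check rather than guess: operational memory is an explicit state machine the agent consults before acting. Below is a minimal sketch in Python; the states, owner field, and transition table are illustrative assumptions, not a Dataverse schema.

```python
# Legal status transitions. Anything not listed here is blocked,
# which is the "approval gate" in miniature: an agent cannot jump
# from Submitted straight to Provisioned.
ALLOWED = {
    ("Submitted", "InReview"),
    ("InReview", "Approved"),
    ("InReview", "Rejected"),
    ("Approved", "Provisioned"),
}

class WorkflowRecord:
    """Illustrative stand-in for a row in an operational-memory table:
    one owner, one current state, and an auditable history."""

    def __init__(self, owner):
        self.owner = owner
        self.state = "Submitted"
        self.history = []

    def transition(self, new_state, actor):
        """The agent does not guess the next step. It checks ownership
        and the transition table, and records an escalation or a block
        instead of acting when either check fails."""
        if actor != self.owner:                      # ownership gate
            self.history.append(f"ESCALATE: {actor} is not owner")
            return False
        if (self.state, new_state) not in ALLOWED:   # sequence/approval gate
            self.history.append(f"BLOCKED: {self.state} -> {new_state}")
            return False
        self.history.append(f"{self.state} -> {new_state} by {actor}")
        self.state = new_state
        return True
```

The design choice worth noticing: hallucination risk drops not because the model improved, but because ambiguous moves are converted into recorded escalations instead of actions.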