High Output: The Future of Engineering

Maestro AI

5.0 (2)
Technology
Updated weekly

A window into tomorrow's software organizations through conversations with visionary engineering leaders who are redefining the profession. Join us to explore how leadership will evolve, what makes high-performing teams tick, and where the true value of engineering lies as technology and human creativity continue to intersect in unexpected ways. maestroai.substack.com

2 days ago

The Handoff Tax

I asked Muddassar Shaikh where engineering work is actually heading, and he answered with what he admitted could be the setup to a joke. “This can be the start of a great joke,” he said. “A product manager and a designer and an engineer walk into a room, and then jam on the idea together. And they come out with a working prototype instead of coming out with a spec.” What’s missing is the handoffs. No PRD for a designer to interpret. No mockups for an engineer to build from. No ticket waiting to be picked up. The session ends with a thing that runs, not a document describing the thing that should run. He kept returning to that image, and most of our conversation was about the distance between that room and the one his teams actually work in. Muddassar is the SVP of Engineering at GoodRx, with two decades behind him — Ticketmaster, where he grew the app install base to 42 million users, then Beachbody, now GoodRx. He’s led the kind of multi-year migrations that reshape an org chart, and so his first instinct about AI is that it’s familiar. “I would say this is part of my playbook,” he said. “I’ve led technology transformation and organizational transformation at a number of companies.” Cloud was one. Monolith to microservices was another. AI, in his telling, is just the next. Hold onto that, because by the end he complicates it himself. Leakage When I asked him to walk through how software actually gets made at GoodRx, the answer ran long. A PM talks to a business stakeholder. The ideas become product specs. The specs get handed to a designer, who makes visual artifacts. Those go to an architect or tech lead, who writes the technical diagrams. Then a team builds it. Business intent into PRD. PRD into wireframes. Wireframes into architecture. Every arrow is a person reading what the last person produced and trying to figure out what they meant. 
There’s “a lot of potential of leakage,” he said, as “these handoffs are happening between different roles.” Leakage is the right word for it. Each handoff is a lossy compression: the stakeholder had something in their head, the PM wrote down a version of it, the designer drew a version of that, the engineer built a version of that, and what ships is four translations downstream of the original intent. No single handoff is broken in a way you can name. But by the time the thing reaches production, some real fraction of what the business actually wanted has been quietly washed out of it. Muddassar’s read is that AI’s real leverage is on the chain itself, not on the code at the end of it. “What AI will do, already doing, is diffusing these different roles. So one person can play multiple roles. We can also find ways to reduce the leakage as handoffs go on. Or we can completely eliminate certain handoffs.” Most productivity tools just speed handoffs up. He’s saying some of those handoffs shouldn’t exist in the first place. Collapsing the chain That room from the joke — GoodRx isn’t in it yet. “We are not there yet,” he said. So he’s working toward it one handoff at a time. He gave me three examples, each killing a different translation step. JIRA automation closes the gap between a ticket and a branch: “add a label, or add a bot. It’ll read the JIRA specification. It’ll recognize what parts of the code need to change. It’ll go make the changes.” A tool his team open-sourced, called Lifecycle, shortens the engineer-to-QA loop by spinning up an ephemeral environment for every PR and posting the test link back to the ticket. And the third handoff isn’t technical at all: “We’ve had a lot of product managers starting to deploy quick fixes. AI has truly enabled me to democratize access to code.” For a copy change, the PM just ships it. The handoff to engineering disappears. Each one cuts out a step that used to be just how work moved through the org. 
You make progress by subtracting, and the subtractions stack. His last transformation cut cycle time from “13 days or so to about six days,” and “that took us about two and a half years.” Since adopting AI: “the cycle time has again reduced by half in the last eight months.” Same size of gain, a quarter of the time. Two workforces This is where his just-another-transformation framing breaks, and he knew it. “Previous transformations were primarily human driven. And now you have to manage humans, and you have to manage non-humans — the agents.” There’s a second workforce in the org now. It doesn’t attend standup, isn’t bound by morale or meeting culture, and runs at a pace no migration ever did. A cloud migration never made anyone manage a fleet of teammates that don’t sleep — and it never made the human teammates wonder, as Muddassar put it, whether “this role is even going to be around two years or five years from now.” So a leader now runs two workforces at once, and can see neither clearly. The pace is the part that surprised me — Muddassar told me he used to read a daily brief on the AI world and had to give it up for a weekly one, because there was too much shipping in any given day to keep up with. And the humans? Their most important work has moved to a place the old dashboards don’t look. When the act of coding gets cheaper, the value moves upstream of the code: into how an engineer scopes a problem, what they ask the model, whether they catch it when it’s wrong. As Muddassar put it, “the act of coding itself will become less and less important.” What survives is “system thinking” and “your ability to give really clear specs.” That work happens before a single line is committed — and none of it shows up in a PR count. The one handoff you can’t collapse For all the acceleration, the thing he was most insistent on was the part that doesn’t speed up. With more code generated by models — and then read and changed by models — the old review process strains. 
But the answer isn’t a lower bar. “The bar for product quality cannot reduce. So we have to have stronger harnesses to test the changes.” He’s seen the cautionary tales: “changes at much bigger companies being rolled out with AI that have caused business impact.” His reframe is that the code itself stops being the artifact you guard. “The quality of code will matter less and less. The outcome that comes out of the coding session, the final output — that’s gonna matter. If you’re able to write a well-defined spec, and if you have well-architected harnesses to evaluate the output of the prompt, then how the code is actually written matters less and less.” Not how it’s written. Whether it does the thing, and whether you can prove it. If the leverage is in collapsing handoffs, the one handoff you can’t collapse is between “the model produced something” and “we know it’s right.” That one you have to build deliberately, stronger than before. High Output is brought to you by Maestro AI. Muddassar described leaders running two workforces at once — the humans and the agents — and being able to see neither clearly. The agents are the newer blind spot. Teams are all-in on AI with almost no view into how it’s actually being used: what’s working, what’s wasted effort, and which engineers have learned to direct an agent well. Maestro plugs into Claude Code and Codex and gives you that view. The point isn’t to grade your engineers — it’s to help every one of them get better at directing AI. We see what your strongest AI users actually do differently and turn it into patterns the rest of the team can learn from, so every engineer on your team can master AI. Your team adopted AI. Maestro helps you see how it’s really going — and helps every engineer learn to direct an agent well. This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit maestroai.substack.com

27 min
June 3

Open the Barn Door

Twenty minutes into our conversation, I asked Charity Majors how engineering leaders should be finding good junior engineers right now. “God, I don’t f*****g know.” She apologized, then doubled back. “Sorry. Excuse me. You do need them. They’re not hard to find.” That answer is the whole interview in miniature. How a junior breaks into engineering today, Charity will tell you, is genuinely unresolved. None of the paths that worked for her exist anymore. How an engineering org builds a healthy pipeline, on the other hand, is not particularly hard. The two questions sit next to each other, and she refused to collapse them into a tidy answer. Charity is the co-founder and CTO of Honeycomb, twenty years into the industry, two O’Reilly books behind her and the second edition of one in progress. Her career has been built on distributed systems — production engineering at Parse, then Linden Lab, then founding an observability company. But most of what she said over the next half hour was about people, and she came back to one idea four or five times: engineering teams are not social systems and they are not technical systems. They’re sociotechnical systems, and the way you reason about one shapes the way you have to reason about the other. Idaho Charity grew up in the backwoods of Idaho. No computers, no phone line for most of her childhood. She got to college on a classical piano scholarship and noticed something there. “People who studied music were still hanging out working minimum-wage jobs in their thirties, forties, and fifties. And I was like, I grew up being poor. I am not going to be a poor adult. And so I switched lanes.” She got into tech in the late nineties. “Any smart kid who is willing to work weird hours and try a lot of stuff could make a go of it.” She doesn’t romanticize that. Tech was a toy then, she said, and now powers nuclear power plants, so the bar going up is correct. 
But twenty years on, she’s worried about what’s happened to the door behind her. “I think we really risk it becoming the sort of ivory tower where we keep out anyone who has a non-traditional background. You need to think harder about crafting paths into technology to meet the moment.” I asked how she got into management. “I was a reluctant manager.” She drew a line between management and leadership before I could follow up. These are sociotechnical systems, she said, “they’re not social or technical, or we could just take the great managers from Starbucks and put them in charge of engineering teams.” The reason she ended up doing the job at all was anger. “I got into people management the same way a lot of people do, which was enraged, because I didn’t like the way it was being done. And I was like, *god damn it, I guess I will do it differently. I will not make any of these mistakes.* So I made different mistakes, of course.” The self-correction is constant in conversation with her. She said something close to it three more times over the next half hour. The freeze When I brought up the AI-killing-the-junior-pipeline discourse, she pointed to something specific. She’d just read a piece by Annie Lowrey in the Atlantic that morning. The Job Market Is Hell. Unemployment is around 4.7%, which is historically fine, but nobody is leaving their jobs and nobody is hiring. On both sides of the resume, AI is doing the talking. Recruiters feed inbound applications into screening tools. Candidates feed job listings into chatbots. “The result is there are no people talking to people. Nobody’s figured out how to do this.” The framing she rejected was the one that treats this as inevitable. “What I don’t like about the way people talk about bringing juniors into tech is they talk about it like it’s some force of nature that we have no control over, which is absolute horseshit. This is a world we create. 
It’s a world that we reinforce.” It’s a sequence of decisions made by people in rooms. And the people most responsible for those decisions, she would argue later, aren’t the ones the org chart suggests. Make friends with the discomfort Before she got to the operational claims, she walked me through what she thinks her generation of managers got wrong, because the failure mode shapes everything else. “My generation swung the other way and was like very rigorous about, *you should have work-life balance. Nobody should be pinging you after hours.*” The intent was correct. She was managing in reaction to the era of people sleeping under their desks. But she watched it overcorrect. “I see some managers being like, *you’re working more than 40 hours, stop.* And honestly, we live in a very complex, fast-changing world, and if you’re intrinsically motivated to be working, if you’re learning, if you’re having fun, nobody should be stopping you, because that really is the path to success.” She isn’t arguing for the swing back, either. I brought up 996, the Chinese nine-to-nine, six-days-a-week framing that’s been making the rounds on Hacker News. She had nothing nice to say about the swing-back. “It all swings back. It all swings back, doesn’t it?” Then, more bluntly: “That’s b******t.” She read the cycle as a generational pattern, and she was harder on her own generation than on either pole. “If anyone had told me that, if I had followed that advice, I would not be where I am.” The piece of this that connects to junior hiring is the part most management writing skips. “You need to learn to make friends with the discomfort. You need to learn to find joy in the pain.” None of us, she said, evolved to handle data structures and algorithms, and the early years of an engineering career are genuinely agonizing. The juniors who make it through are the ones who learn to like the agony. A lot of senior engineers, looking back, have forgotten that they once lived through it. 
It’s a humanistic argument, not just an operational one. She talked for a while about school stamping out the curiosity children are born with. Twelve, twenty, twenty-five years of report cards, conditioning us to associate learning with extrinsic reward. What she loves about adulthood is the chance to rediscover the original instinct. Engineering is one of the few careers that pays you for it. 50 to 1 Her first operational claim was about team composition. “For every staff engineer that you have, let alone principal engineer, you need 50 intermediate engineers.” The number is a gesture. The shape of the argument is specific. Most companies have over-corrected toward senior hiring on the theory that they’ll get more leverage per dollar. The people who actually ship the bulk of features, she said, aren’t seniors. They’re intermediates. “Some of the most productive engineers that I’ve ever worked with have been intermediate engineers. They can just put on their headphones, beginning of the day, go deep, and just pound out the features and the bug fixes.” Heads down, pattern matching, finishing things. “Nobody who’s been in engineering for seven, ten years wants to do that. They’re sick of that.” The bored staff engineer is not a leverage win. “When people get bored, you do not get great work out of them. You get the best work out of people when they are working at that place that’s right on the edge of their ability.” And the supply chain only runs one direction. “Nobody stays a junior engineer for long, two years at most. So you’ve gotta keep feeding the system. You’ve gotta keep bringing new blood in.” Opening the barn door I asked what she’d recommend to companies that are paranoid about hiring right now. “I would advocate for opening the barn door a bit wider, giving more people a shot. Understanding that it means you will have to fire more of them. You will have to let more of them go. 
But I feel like it’s worse to never give people a shot.” The second half is the part she emphasized. A wider door costs you in faster, more honest performance management, and most engineering managers are bad at that part. “Nothing demoralizes a team more than when someone that they work with every day, who’s not pulling their weight, just hangs around forever.” The unsalvageable cases weren’t the ones that escalated. They were the ones that drifted. “Some of the most heartbreaking situations I’ve ever been in as a manager are when a person’s being let go after years of them doing exactly the same thing, and they’re legitimately dumbstruck.” There’s a side benefit she pointed out that I hadn’t considered. Junior engineers audit your systems in a way nobody else can. “If you’re an engineer joining a team where there is very low turnover, where people never join, where people never leave, that is not likely to be a very high functioning team either.” Old docs. Idiosyncratic mental models locked in three people’s heads. A dev environment that takes a month to set up because nobody’s tried in six. “If you’re used to bringing on junior engineers, oh boy, those kids will audit your systems like no one else.” That’s the sociotechnical argument in plain language. The team isn’t separable from the systems it owns, and the hiring policy isn’t separable from the operational health of the codebase. Both improve together or neither does. What she watches for in a junior The most optimistic moment came when I asked what she watches for in her own juniors. “Some of our junior engineers talk about how they are in conversation with Claude all day long. By the time they bring a question to their senior engineer, which they do very often, they have tried all the low-hanging fruit, they’ve tried a bunch of stuff, they’ve asked a lot of questions. 
So it is very well worth that senior engineer’s time.” That’s not the threatened-junior story most engineering leaders are telling right now. The juniors she descr

34 min
May 14

When Craft Meets Non-Determinism

Superhuman built its reputation on a number: 100 milliseconds. Every interaction in the product has to feel instantaneous. Not fast. Instantaneous. That’s the threshold where the human brain stops perceiving lag and starts feeling like the software is an extension of thought. They’ve been engineering to that constraint for years, and it has shaped everything — the architecture, the hiring bar, the way even a billing email gets crafted like a product. Then they added AI. And for the first time, they were shipping something they couldn’t fully control. The feeling that built a company “Every single interaction needs to be below 100 milliseconds, because this is when you feel that things are instantaneous,” Loic says. The number didn’t come from a product spec. It came from game design. Rahul Vohra, Superhuman’s CEO, studied how games create the feeling of flow, and bet that people hate email because of how email works, not because email is email. The architecture follows from that constraint. Superhuman assumes the network will slow you down, so they build as if the network isn’t there — local-first, syncing in the background, optimistic UI throughout. “You need to build without a backend. How do you do that across multiple devices and make it crazy fast?” People pay $40 a month for email and feel it’s worth it. Their users — mostly executives and salespeople who average three hours a day in their inboxes — describe the experience the way people describe good tools: the software stops mattering and the work takes over. How taste becomes infrastructure Loic joined at the beginning of 2025 as an outsider. “I came in with genuine curiosity. I was blown away.” What surprised him wasn’t the rule but how thoroughly it had been internalized. “Even a backend engineer will think about the latency of their API and how this will reflect in the experience.” In most engineering organizations, backend engineers think about correctness and throughput. 
At Superhuman, they think about how the user will feel. It starts in hiring — product sense is a criterion for every role, not just product and design. The finance team applies the same scrutiny to the email a customer gets when they’re being told what they owe as the product team applies to the inbox. The offer letter is a product experience. “The offer is a ceremony. It’s not transactional — it’s already an experience.” Candidates who got that treatment show up acting like it. Rahul reviews everything going into production. “Within the organization, this is building a muscle in every single engineer, designer, product manager — everyone knows the [bar] is that high.” You can’t work at Superhuman long without developing an eye for when something feels off — a slightly slow animation, a misaligned pixel, an API call that’s a few milliseconds slower than it ought to be. Loic calls it sensation transference. Packaging changes how you experience the product inside. They take that idea seriously enough that the bill you get from the finance team is treated like part of the product. The part they can’t control For ten years, everything in Superhuman’s stack was deterministic. Same input, same output. That’s what made the 100ms promise keepable: you could engineer to it, measure it, hold it. AI broke that. “The consistency we were used to is not there anymore,” Loic says. “We all face the surprising change of behavior of a model that is technically not changing its version.” A model API doesn’t update its version number, but its outputs shift. The same query returns different results this week than last week. For most products, this is annoying. For Superhuman, it’s a more serious problem, because their users aren’t tolerant of inconsistency. “We are similar to Apple in the sense that people expect the best. They pay a bunch, so they always expect the best.” The specific problem is what happens when AI meets user-generated input. 
Superhuman can engineer every designed interaction. They cannot engineer how users phrase search queries. “We were controlling every single part of the interaction — feels fast, feels right, feels correct — and all of a sudden, the outcome of the search box is not what I was looking for. Garbage in, garbage out. But how do you control the garbage in?” There’s no bug to fix and no perf target to chase. The product was built on consistency, and now consistency is the thing they can’t fully promise. What the numbers don’t say Superhuman’s AI adoption numbers look good: 90% of engineers using AI daily, 70% of PRs AI-augmented, 90% of those interactions net positive, some engineers claiming 40% velocity gains. Loic is careful about how he explains this. The numbers work partly because of who their engineers are. “We have a very senior team — over-optimized on seniority. Those people tend to use AI with care. They know the outcome they want, and they just use AI to get faster to that outcome.” The 40% gains aren’t coming from code generation. They’re coming from everything before the code. “Coming into a new codebase, trying to understand what this library is doing — before, you had to find the entry point, map the dependencies, build your own mental model. Now Claude Code does that so much faster.” The win is in comprehension and orientation, not typing speed. But the same playbook doesn’t transfer automatically. “If you have a lot of junior engineers, vibe coding’s impact on code quality might be real. It’s not a problem for us — it’s not part of our DNA.” Taste filters the output. Senior engineers with strong judgment about what “right” looks like can catch what the model gets wrong. Engineers without that judgment can’t. Teams celebrating big AI velocity gains may be doing so because they have enough experienced judgment to catch the mistakes. 
Teams where most of the engineers are still building that judgment may be accumulating comprehension debt they don’t know about yet. The acquisition test The Grammarly acquisition tests the same question at a different scale: can Superhuman’s taste survive contact with mass distribution? Grammarly has the opposite profile. They’re embedded in Google Docs, Word, email clients, browsers. They have AI capabilities built over years of NLP work. What they’ve optimized for is breadth: supporting every kind of user, every context. Superhuman has been doing the opposite, going deep on one persona and refusing to compromise. Loic frames the challenge clearly: “How do we make Superhuman not this niche, very fancy application, but something brought to the mass — while keeping our identity?” He reaches for Apple as the reference point. “Learning from Grammarly’s scale and AI capabilities, keeping our culture and taste, and bringing that to the mass — that would be really interesting.” It’s a genuinely hard problem. Making things simple is hard. Linear built something delightful for small engineering teams, then got successful, then came the bigger companies, the feature requests, the complexity. The focus that made it work is what success makes hardest to maintain. What this means for you Superhuman is hitting a wall any product with a quality bar will hit. Three things their experience suggests are worth borrowing. Make your implicit promises explicit. Superhuman’s was 100ms and determinism — they had ten years of architecture built around it before AI made determinism optional. Most teams have a similar promise they’ve never said out loud: accuracy, consistency, availability, something. Find yours before the model finds it for you, because you can’t defend a contract you haven’t named. Treat the prompt box as a UX surface, not a backend problem. The moment that surprised Loic wasn’t a model bug — it was the search box. Users phrase queries badly. 
Prompts are now part of the interface the user sees, and “garbage in, garbage out” is no longer an engineering excuse. Better prompts and evals matter, but if the search box returns the wrong thing, the design team owns that, not the ML team. Don’t credit the tools for what your senior engineers are doing. Superhuman’s 40% velocity gains work because the people using AI know what right looks like and catch what the model gets wrong. If your team is junior, the same playbook will produce comprehension debt instead of speed. Once you can’t tell the tool’s contribution from the engineer’s, you’re not measuring AI productivity. You’re measuring how much taste you happened to hire. Loic spent time before tech in contexts where craft standards weren’t optional and the feedback was immediate — a French Navy vessel that had to be back at sea in six weeks, no extensions. The discipline from that kind of constraint is different from the kind you get from a style guide. You learn it because you have no choice, and then it doesn’t really leave. He thinks that’s what Superhuman has built. He’s been there less than a year. Whether the taste travels at Grammarly scale is the thing he’s actually being paid to find out. High Output is brought to you by Maestro AI. Loic’s AI numbers look good — 90% daily adoption, 40% velocity gains — but he’s the first to say the metrics don’t explain themselves. They work because his senior engineers have the judgment to catch what the model gets wrong. Most engineering leaders have no way to see that layer. You can see PR counts and cycle time. You can’t see whether your engineers are using AI well or just generating output faster. Maestro’s daily briefings reveal where your team’s time and energy actually go — not just what shipped, but the quality of the judgment behind it. Visit https://getmaestro.ai to see how we help engineering leaders understand what their AI adoption numbers actually mean. This is a public episode. If you would like t

40 min
April 29

Stop writing code. Start reading it.

We recorded this episode with Steve back in October of 2025, before he invented Beads and Gastown. Several of his predictions have aged well in the months since. Steve Yegge has been VP or head of engineering at four companies. He keeps stepping down on purpose. Not because things went wrong — his organizations were doing well. He’s the kind of leader whose reputation travels through a company; at Amazon, at Google, engineers lined up to transfer onto his teams. He stepped down each time because he noticed the same thing: the moment he stopped being able to code alongside his engineers, conversations started requiring translation. Once you’re in translation mode, Yegge figured out, you’re not leading anymore. You’re triangulating toward an answer you don’t fully understand. In the AI era, he thinks this problem just got much more expensive. The translation layer When Yegge handed over the engineering org at Sourcegraph — his fourth deliberate step-down in a career that spans Amazon, Google, and Grab — he gave a specific reason. “I was going through a translation layer with my engineers where they’d be like, ‘Well, you see the AI does this, and then I do that, and then the AI does that, and then there’s a gateway’ — and I’m like, what?” It wasn’t that he didn’t trust his engineers. It was that he’d lost the ability to sense-check them. And he’d noticed what happened to leaders who stayed in that position too long: “That’s a technique that non-technical leaders use. People who’ve lost their technical chops, they can still be effective leaders, but they have to be very good at triangulating, almost like a GPS on the right answer by going to different technical people and getting it.” Triangulation is better than nothing. But it’s slow, and it requires your engineers to speak in executive-friendly summaries, which means you’re always one abstraction layer removed from what’s actually happening. 
Yegge’s response has been consistent across his career: hand the org to someone ready to take it, go back to IC, get his hands back in the code. At Sourcegraph that meant 18 months as an individual contributor during the period when AI coding changed the most — which is exactly when he made the predictions that got Anthropic’s attention. His observation about himself is worth sitting with: his most accurate forecasts came during IC phases, not executive phases. Proximity to the work makes the signal cleaner. The “Otherwise” has arrived The case for technical proximity isn’t just philosophical anymore. Yegge has data. Andrew Glover, Director of Productivity at OpenAI, shared findings with Yegge and his co-author Gene Kim: at OpenAI itself, engineers who adopted Codex — their fully agentic CLI coding tool — are producing pull requests that, even accounting for higher rejection rates, “dwarf the contributions of the people who aren’t doing agentic coding by an order of magnitude. Ten times as many commits.” The interesting part isn’t the 10x number. It’s where the 10x is and isn’t happening. “The ones who are successful with agentic coding were the ones living in the microservices world, where there’s lots of small, well-factored bits of software. The ones who are struggling are the folks in ChatGPT Land, which is one of the world’s largest monoliths.” For a decade, engineers warned that monolithic codebases would become a liability — every warning came with an implicit otherwise at the end: refactor now, or else. But the or-else never arrived. You could run with a monolith indefinitely; deployment was easier, QA was simpler, everything just “floated off and got deployed somewhere.” The warning was technically correct but operationally optional. “You didn’t refactor it. And so what we’re faced with right now is this rat race where first of all, everyone who’s already in microservices land is just being pigs. They can use all the tokens they want. 
AI is working for them beautifully. The ones with monoliths — and you just point at any company and they have a monolith — it is time to break them up.” The otherwise, he says, has finally arrived. A 2025 METR study found that experienced developers were 19% slower when using AI tools on large, real-world repositories — the kind of environments where monoliths live. What Bezos actually understood about services Yegge built some of the original infrastructure that justified Amazon’s service-oriented architecture, so he has a view on why Bezos pushed it so hard in the early 2000s that most people don’t know about. It wasn’t primarily an engineering decision. “I heard this later from a colleague at Amazon. Jeff had come from D.E. Shaw on Wall Street, and D.E. Shaw is a company that buys companies and breaks them up and sells the pieces off for a huge profit. He was worried that Amazon was gonna die because of the dot-com bust. And so what he wanted to do, as a last resort, was I’m gonna bust Amazon up and sell the pieces. Which means every one of them has to have a service interface.” An exit strategy for a dying company accidentally created the architecture for a trillion-dollar one. Bezos wasn’t playing chess when everyone else was playing checkers — he was scared. The mandate came from a Wall Street M&A playbook, not a software architecture philosophy. Modular design was a byproduct of an exit strategy. The companies that invested in microservices over the past decade for code organization reasons are now discovering they got AI compatibility for free. The companies that didn’t are discovering the bill is coming due. The “Dial” Yegge has a name for the decision every engineering leader is quietly making right now: the Dial. “Every company has been given a dial that goes from zero to a hundred, and it is the number of engineers that you’re gonna fire in order to pay for the rest of them to have AI.” He’s not being glib. 
If a subset of your engineers can produce 10x the output with agentic tooling, and those tools require meaningful investment in compute and licensing, the question of headcount allocation is already embedded in your budget decisions. You’re turning the dial whether you’re thinking about it explicitly or not. Most companies aren’t thinking about it explicitly. Yegge thinks that’s a mistake. “Once you finally figure out how coding is done today — with Codex, with Claude Code, with Sourcegraph Amp — you switched into that world. You are playing in the big leagues and everyone else is falling behind.” The dial isn’t just about AI spending. It’s about what you believe your engineers will be doing in 18 months. Writing code is for agents Which brings Yegge to his single most concrete piece of advice: stop spending your energy on writing code. Start spending it on reading code. “You’re gonna be generating 10 to 100 times as much code as you ever did before, and you’re gonna need to read it at some point because you need to own it.” Addy Osmani, VP of Engineering at Google Chrome, calls the alternative “comprehension debt” — the accumulation of plausible-looking code you’ve approved without truly understanding, a debt that comes due when something breaks at 2am and you can’t trace why. The shift is real and immediate. Yegge has already made it. He describes his current workflow as watching his agents code — actually sitting there, following the diffs, paying attention to what they produce — rather than writing much himself. “Turn off permission checks so you don’t have to hit enter all the time and just watch it. Watch it code. Pay attention to the diffs.” The skill of reading code fast and evaluating it accurately — is this correct? Does this make sense architecturally? Would I defend this in a code review? — is what separates a developer who’s a good director of agents from one who’s just vibe coding at scale and hoping for the best. 
Yegge’s analogy: a musician who practices sight reading every day for 10 minutes compounds that skill faster than someone who only practices composition. The reading muscle and the writing muscle are different. For most developers, the writing muscle is heavily developed and the reading muscle isn’t, because historically writing was the job. That’s the ratio that’s inverting. What this means to you If you’re a leader who has drifted from direct technical work, the cost of that drift just increased. AI coding is changing fast enough that managing by summary will leave you making decisions you don’t understand. You don’t need to write the code — but you need to be able to read the diffs. Ask whether your codebase is AI-ready. Not “are we using AI tools?” but “can an agent work effectively in our codebase?” The answer is mostly a function of modularity. If your engineers are struggling to adopt agentic coding, the problem is probably architectural, not motivational. Have an explicit conversation with your leadership team about how AI changes the headcount math. Not as a cost-cutting exercise, but as a forcing function for getting clarity on what you believe your engineering team will look like in two years. Leaving this implicit means it gets decided by budget pressure instead. And if you’re an engineer: watch your agent work. Follow the diffs. Treat it like sight reading practice. The engineers who can evaluate agent output quickly — who own what the agent ships — will be the ones who remain indispensable as the generation overhead approaches zero. High Output is brought to you by Maestro AI. Steve Yegge talked about the “translation layer” that forms when leaders drift from the code — but there’s a deeper version of that problem right now. Every engineering leader knows AI adoption is happening. What they can’t see is whether it’s working. Token counts and PR velocity tell you who’s generating more. They

46 min
Feb 11

Principles Over Process with Gaurav Gargate

Most engineering leaders spend enormous energy on process. Which agile framework. Which sprint cadence. Which AI coding tool to adopt. How to standardize workflows across teams. The assumption is that the right process produces the right outcomes. Gaurav Gargate has come to believe the opposite. Get the principles right, and the process can flex. Gaurav is VP of Engineering at Confluent, where he runs their Security Products and Cloud Platform powering their cloud-native data streaming ecosystem. He joined when the business was sub-$100 million; today it’s $1.1 billion. Before Confluent, he spent seven years at Box and six years at Microsoft. And before any of that, he started his career at a 15-person startup in India — “Didn’t know what we were doing, but it was fun.” Across all of those environments—from a scrappy team of 15 to a billion-dollar enterprise—one pattern has held: the organizations that thrive are rigid about their principles and flexible about everything else. The ones that struggle have it backwards. The Agile Dogma Aha Moment Gaurav has a specific story about when this clicked. Early in his career, he was a believer in classical agile—sprints, scrums, the full playbook. He thought it was the way to run engineering projects. Then he hired a leader who was completely aligned on the principles: execution pays the bills, work needs visibility and traceability, quality gates matter. But the process? Different. “Look, I don’t necessarily care about the book process, whether you call it agile or you call it scrum or something else. I would love to have the agency to ensure I manage and track my work. My engineers feel like they’re actually doing the best work of their life and there is quality gate and accountability.” Gaurav calls this a strong aha moment. “I realized I was being unnecessarily dogmatic in my approach. And actually this additional way of doing it opened up so many gates.” The lesson wasn’t that agile is bad. 
It was that confusing a specific process with the underlying principle is a trap. The principle—visible, accountable, high-quality execution—can be achieved multiple ways. Insisting on one process locks out people who could deliver the same outcomes through a different path. It closes doors you didn’t know existed. The constraint is real, though. “You don’t wanna have 30 teams have 30 different innovative ways.” There’s a phase where letting a thousand flowers bloom is the right move, and there’s a point where you need to converge on five or six archetypes. The art is knowing when you’re in which phase. Culture Add Over Culture Fit The same logic applies to hiring. Early in his career, Gaurav screened for culture fit—people who matched the team’s existing style. Over time, he realized this was the same mistake as the agile dogma, applied to people instead of methodology. “It’s actually a bad idea to have a very closed door—only follow this culture and nothing else.” When you hire exclusively for fit, you get a team that reinforces its own assumptions. The same instincts. The same blind spots. The culture calcifies instead of evolving. His alternative: hire for culture add. Find people who share your principles and values, but bring their own approaches and experiences. “New people join in, people grow in their roles, people from different companies and backgrounds and experiences come together—the beauty is that an evolving culture being held strong on the principles of the company actually makes it a success story.” The distinction is subtle but important: principles are fixed, culture is not. Values are the foundation. Everything built on top should be allowed to shift. 
Share the Why, Trust the How Gaurav applies the same framework to day-to-day management, and he sums it up bluntly: “The fundamental principle is to treat people like adults and they will behave like adults.” In practice, that means sharing context aggressively—where the business is going, how decisions get made, what the company needs right now—and then stepping back. “Enable them, let them have that agency to make those micro decisions as much as possible.” He’s not flexible about everything. Collaboration, one-team attitude, flat hierarchy, open communication—these are non-negotiable. “There are certain principles which I’m actually not ready to compromise on.” But beyond those fixed points, he lets leaders find their own style. “Ultimately what every strong individual or leader wants is to be held accountable for the outcomes and the results they deliver. And nobody likes to be micromanaged on how they get there.” Rigid on values. Flexible on methods. The same pattern, applied to management instead of hiring or methodology. The SDLC Tree Where this gets most interesting is how Gaurav applies the framework to AI adoption. His approach is different from the typical “push coding copilots” playbook—and the principle underneath it is the same one driving everything else. The principle: engineers should spend their time on high-value, creative work. The process for achieving that? That’s what changes. Gaurav looks at the entire software development lifecycle as a tree of workflows and targets the branches no engineer enjoys. “Especially as a cloud infrastructure company, there is a ton of work in operating, managing, keeping your infrastructure secure, scaling the business. There are a lot of things that AI can generally do well.” Confluent handles security patches and vulnerability management across three clouds and roughly a hundred regions. Infrastructure gets set up, tested, and torn down constantly. 
These are the branches AI is taking over completely—with engineers administering and managing rather than doing the work by hand. “Engineers actually love to do the innovation. They love to do the new problem solving. They love to have that ability to write new code in a way they feel is appropriate.” His conclusion follows directly: “I would love my engineers to actually have that mental space to invest their time in that high value work and let all the undifferentiated work be taken over completely by AI.” This is a fundamentally different framing from “AI makes engineers faster.” It’s not about speed. It’s about expanding what engineering teams can accomplish. “The pie is getting bigger. We gotta look at AI as a way to expand the pie of work that an engineer can do, not necessarily just what they were doing last year.” He invokes Jevons’ paradox—the idea that when something becomes more efficient, total consumption increases rather than decreases. Because it’s easier to build, more will get built. More demand, more opportunity, more roles. And his take on whether AI threatens engineering jobs is unequivocal: “Every role, every job category is going to change because of AI.” But change isn’t elimination. It’s the same transition the industry went through when cloud replaced data center ops. The people who understood first principles learned the new layer and kept going. The Fundamentals Don’t Change This is the thread that ties everything together. Principles endure. Process shifts. When Gaurav joined Microsoft, people questioned whether he was a real engineer because he didn’t write device drivers. “The previous generation did something at a lot lower level, and then the next generation is doing something at a different layer. That’s always been happening for decades.” But through those decades of transformation, the fundamentals haven’t changed. 
Understanding operating systems, databases, memory management—”the fundamental understanding of these core principles is what allows a great engineer to learn and pick up new things.” His advice to new graduates is the same advice he’d have given five years ago: focus on the fundamentals. “Learning new things has become easier. Building and experimenting has become a lot easier than before. If people can really spend time understanding the core fundamental building blocks of computer science, applying them to learn and build new things is actually gonna be easier going ahead.” The career lesson mirrors the organizational one. The engineers who thrive across generational shifts are the ones grounded in principles, not attached to any particular layer or tool. The organizations that scale from startup to $1.1 billion are the ones that hold their values tight and let everything else evolve. The leaders who get the most from AI are the ones who know which work matters and which work is just process. Same pattern. Every level. What This Means for You First, separate your principles from your processes. Gaurav’s agile aha moment came when he realized he was treating a specific methodology as a principle. Identify which of your team’s practices are genuinely non-negotiable values and which are just comfortable habits dressed up as requirements. Second, audit your hiring for culture fit vs. culture add. Are you screening for people who share your principles, or people who share your habits? The first builds a team that evolves. The second builds one that calcifies. Third, when deploying AI, map your SDLC and target the work nobody wants. Instead of asking “how do we code faster,” ask “which branches of our workflow tree drain engineers without engaging them?” Security patches, infrastructure provisioning, repetitive operations—these are the high-ROI AI targets that also free engineers to do the work that drew them to the field. Fourth, give context instead of instructions. 
If you want people to make good micro-decisions without being micromanaged, they need the same information you have. Share the why and how you measure the what—then trust them to figure out the how. The question worth asking your team: Are the things you’re rigid about actually principles—or are they processes you’ve he

31 min
2025/12/11

Why AI Productivity Gains Are Context-Dependent | With Raju Matta

Some engineering teams are seeing real, measurable AI productivity gains. Cursor is transforming how frontend developers build React apps. AI-assisted code review is catching bugs before deployment. Prototypes that took weeks now take days. But not everyone’s seeing the same results. Raju Matta runs engineering for Cambridge Mobile Telematics—200+ engineers, three countries, petabytes of real-time sensor data processing driver safety. Six months ago, he formed a tiger team to systematically track AI tool adoption. Status reports every two weeks. Multiple tools tested: Copilot, Cursor, PR review bots. His finding? “I’ve not seen the measurable velocity increase that people are saying out in the market—but that doesn’t mean I have totally written off LLMs yet.” This isn’t skepticism. It’s measured evaluation. And the pattern Raju’s seeing reveals something important about when AI tools deliver and when they don’t. Where AI Tools Excel As part of their evaluation, CMT ran an internal hackathon to see what AI tools could do in practice. The results told a clear story. Eighteen projects, all using AI. Teams built fully working web apps—complete with datasets—in 2-4 hours. “For that purpose, it’s great. It’s not bad at all,” he says. The pattern: AI coding tools work brilliantly for rapid prototyping with established patterns, web development using well-documented frameworks, mechanical coding tasks like boilerplate and test generation, and quick experiments to validate product ideas. These are real productivity gains. The people claiming 2x-3x aren’t exaggerating—they’re working in contexts where AI capabilities align perfectly with task requirements. When your bottleneck is writing React components or generating CRUD endpoints, AI tools deliver measurable acceleration. But CMT’s production systems are different. The Complexity Multiplier They’re processing petabytes of data from gyroscopes, accelerometers, GPS sensors, video streams. 
They’re distinguishing potholes from crashes, sharp corners from reckless driving. They’ve been using AI and machine learning for this work for 13 years—long before LLMs became everyone’s productivity obsession. The engineering challenge isn’t writing code. It’s architecting systems that handle sensor fusion at scale, debugging why clusters fail under load, ensuring accuracy when lives depend on your classifications, and managing tech debt across distributed teams in six countries. “You can outsource your engineering and coding with AI tools, but not your thinking,” Raju explains. In complex production systems, the thinking is where the time goes. Code generation helps, but it’s not the bottleneck. The productivity multiplier drops from 3x to “incrementally helpful” because the constraint isn’t in the typing—it’s in the architectural decisions, the system design, the understanding of how everything fits together. This doesn’t make AI tools useless. They still catch bugs in PRs. They still help prototype solutions. They still accelerate certain tasks. But the overall velocity gain is modest because code generation often isn’t the long pole. The Tiger Team Approach Here’s what makes Raju’s perspective valuable: he’s not guessing. Six months ago, CMT’s CTO gathered the engineering leaders. “How are you guys thinking of AI?” The response: treat it like a first-class citizen. They formed a dedicated tiger team. Three people producing status reports every two weeks on tool adoption, usage patterns, and measurable impact. “We have about three or four tools that we are using all the way from PR review tools to tools like Copilot, Cursor.” This is systematic evaluation, not anecdotal impressions. 
And the data shows results that differ from the market narrative: “My general experience is that it’s good, it’s doing its job, but I haven’t seen the measurable velocity increase as much as what people are saying out in the market.” His peer conversations confirm the pattern isn’t unique to CMT: “Even other leaders and my peers that I speak with, who are working at big tech companies, have said similar things. So it’s not uncommon.” But Raju’s not dismissing the technology. “The tools are progressing at a very fast pace. I wouldn’t be surprised if it’s another six months or a year where we get to exhaust more pieces of the tool and get more done.” That “yet” matters. He’s still tracking, still evaluating, still expecting improvement. When Mistakes Have Consequences When Raju says “we have to save people’s lives,” he’s not being dramatic. CMT’s technology directly impacts driver safety. Their telematics platform processes sensor data to detect dangerous driving, assess risk, and potentially prevent accidents. This creates a different bar for “move fast and break things.” “We are a little bit more diligent because at the end of the day, we have to save people’s lives. So for us, we’d rather spend the time beforehand than reactively trying to address it.” The stakes are high—both financially and ethically. When your technology directly impacts human safety, you can’t afford to ship fast and fix later. The constraint isn’t just technical complexity—it’s consequence of failure. “AI tools can take you north, but with the same speed, they can take you south.” In safety-critical systems, the review time, the testing time, the verification time doesn’t compress even if code generation does. You can’t ship and iterate rapidly when mistakes could harm people. The overall productivity gain shrinks accordingly because the non-coding portions of the development cycle remain unchanged. This applies beyond telematics. Financial systems. Healthcare platforms. 
Infrastructure control. Any domain where errors have serious consequences faces the same limitation: AI can accelerate code generation, but it can’t compress the necessary validation and testing cycles. Where AI Struggles AI’s limitations show up in unexpected places. CMT uses AI to filter thousands of resumes for each job opening. The results? “50% makes sense. And 50% don’t make sense.” This split illustrates a broader pattern. AI works brilliantly for well-defined, repeatable tasks. It struggles with judgment calls, context-dependent decisions, and situations requiring nuanced understanding. The tool saves time on mechanical filtering. But the judgment about who’s actually right for the role? Still human. And critically, the humans can immediately spot when AI recommendations miss the mark—they don’t trust it blindly. This mirrors the coding experience. AI generates boilerplate quickly. But understanding whether the generated code fits the broader system architecture, handles edge cases properly, and follows team conventions? That requires human judgment that doesn’t compress. Where This Leaves Engineering Leaders The mistake isn’t believing AI tools work—they demonstrably do in many contexts. The mistake is assuming your context will see the same gains as someone in a completely different situation. Raju’s systematic evaluation reveals the variables that matter: Your problem domain determines gains. Web apps and prototypes with established patterns can see significant productivity improvements. Complex distributed systems with unique requirements tend to see incremental improvements. The difference isn’t the tool quality—it’s how much of your bottleneck typically sits in code generation versus system design. Your constraint defines the impact. If implementing features is your rate-limiting step, AI delivers massive value. If architectural decisions and system design are your constraint, AI helps less. 
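The point about constraints can be made concrete with an Amdahl-style back-of-the-envelope calculation. This is a minimal sketch, not anything Raju's team reported using: the function name `overall_speedup` and the time fractions are hypothetical, and it assumes only the code-writing share of the work gets faster while design, review, and testing stay the same.

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl-style estimate of end-to-end speedup.

    coding_fraction: share of total delivery time spent writing code (0..1).
    coding_speedup:  how much faster that share gets with AI tools (> 0).
    Everything else (design, review, testing) is assumed unchanged.
    """
    if not 0.0 <= coding_fraction <= 1.0:
        raise ValueError("coding_fraction must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    remaining = (1.0 - coding_fraction) + coding_fraction / coding_speedup
    return 1.0 / remaining


# Hypothetical teams: the fractions below are illustrative, not measured.
prototype_team = overall_speedup(coding_fraction=0.8, coding_speedup=3.0)
production_team = overall_speedup(coding_fraction=0.2, coding_speedup=3.0)
print(f"prototype-heavy work: {prototype_team:.2f}x")
print(f"design-heavy work:    {production_team:.2f}x")
```

Under these assumed numbers, the same 3x coding speedup yields a bit over 2x end to end when coding is 80% of the work, and only about 1.15x when coding is 20% of it.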
Most production systems fall into the second category after the initial prototyping phase. Your risk tolerance changes the math. If you can ship and iterate rapidly, AI accelerates that cycle. If mistakes have serious consequences, the review and testing time doesn’t compress proportionally. The overall velocity gain depends heavily on how much of your process can safely be accelerated. Your system complexity matters. Greenfield projects with established patterns see huge gains. Legacy systems with unique constraints and interconnected dependencies see modest gains. The complexity of your codebase directly impacts how useful AI-generated code becomes. The Honest Assessment Raju isn’t claiming AI tools are overhyped. He’s providing the nuanced reality: they work extremely well for specific contexts and deliver modest improvements in others. His 6-month tiger team experiment with dedicated tracking hasn’t found a productivity revolution. They’ve found incremental gains with clear constraints. That’s the honest number engineering leaders need for planning. “LLMs can help us experiment and prototype features faster. They can help developers catch mistakes in our pull requests. They can help us find answers faster, and we are constantly evaluating,” he explains. “But I’ve not seen the impact that people are saying out there.” This doesn’t mean ignore AI tools. It means understand your context, measure systematically, and set realistic expectations. For rapid prototyping and web development? The 2-3x gains are real. For complex production systems with safety requirements? The gains exist but are much more modest. Both can be true simultaneously—the difference is context. What This Means for You First, measure systematically rather than relying on anecdotes. Set up dedicated tracking like Raju’s tiger team—assign ownership, establish regular reporting, and gather actual usage data. 
The hype cycle around AI tools means everyone has an opinion, but data reveals what actually works in your specific context. Second, understand where your bottleneck actually sits. If architectural decisions and system design consume most of your time, AI tools wi

36 min
2025/11/11

Building AI Products Under HIPAA | With Muhammad Atif

When you’ve bootstrapped an engineering org from 2 people to 500, working with Fortune 500 clients like Intel and Samsung, you learn something most AI builders miss: the best technology doesn’t always ship. Muhammad Atif, President and CTO of PureLogics, recently deployed an on-prem AI model that hits 70% of the accuracy of their original cloud-based prototype. That 30% accuracy gap represents the tradeoff required for HIPAA compliance. The cloud-based prototype couldn’t be deployed—patient data can’t touch external APIs under their client’s compliance requirements. This is the reality healthcare engineering leaders face: you’re building for the best model that meets your compliance requirements, not just the highest-performing model in isolation. Since co-founding Pure Logics in 2007, Muhammad has grown it from 2 people coding in a room to a 500-person global engineering firm. They build on-prem models that achieve 60-70% of cloud-based prototype accuracy while meeting the strict data security requirements that healthcare demands. The Compliance Wall Everyone Hits Muhammad’s team was prototyping an AI feature using OpenAI’s API. Fast iteration, impressive results. Then the client’s compliance team saw the architecture diagram. “When the customer said they need to have on-prem AI, we changed the entire paradigm,” Muhammad explains. The entire approach had to be rethought. The paradigm shift required rethinking four critical areas. First, hardware specification: what GPU specifications, how much RAM, what storage architecture. These decisions determine whether your model trains in days or weeks, whether inference is real-time or batch. Second, model selection: which open source model fits your domain? Healthcare has different requirements than generic NLP—you need models that work for medical terminology, clinical workflows, provider documentation patterns. Third, and most challenging, training data acquisition. 
You need millions of records to train effectively, but healthcare data is protected. “We need to have millions of records of data to train that model to bring up to that accuracy,” Muhammad explains. Where do you get training data that doesn’t violate HIPAA? Fourth, compliance layers: NIST AI RMF compliance, HHS Trustworthy AI practices, OWASP LLM practices, HIPAA audit trails. “We need to make sure that we have all these security and safety guardrails implemented, especially when dealing with live patient data,” Muhammad says. “We have deployed an onsite model. It’s almost 70% accurate compared to the one we used to have in the initial POC,” Muhammad says. That 30% accuracy gap represents the tradeoff for meeting compliance requirements. The on-prem model that meets HIPAA requirements ships. The cloud-based prototype doesn’t. This is the reality healthcare leaders face. The question isn’t “what’s the highest-performing model?” It’s “what’s the best model we can deploy within our regulatory constraints?” What Compliance Expertise Enables Pure Logics’ on-prem AI capabilities unlock healthcare applications that wouldn’t be possible without deep compliance knowledge. Take their diabetic foot monitoring project. Diabetic patients often can’t feel temperature changes in their feet, a dangerous condition that can lead to undetected injuries and infections. Pure Logics is building algorithms that analyze thermal images of patients’ feet to detect temperature anomalies, giving providers early warning signs before problems escalate. Or their women’s health platform, which helps women track and manage their health throughout hormonal and menstrual cycles. These aren’t trivial consumer apps; they’re handling protected health information that requires the full compliance framework Pure Logics has built. 
“We have also been working with few startups who are working on like diagnostics and disease detection kind of algorithms, and we are really proud that we are going to be part of those teams,” Muhammad says. This is the payoff for solving the hard problems. Teams that can’t navigate HIPAA constraints can’t build these applications. Teams that can navigate HIPAA but can’t achieve reasonable AI model performance on-prem can’t make them useful. Pure Logics’ expertise in both areas—compliance frameworks and on-prem AI deployment—creates the foundation for meaningful healthcare innovation. The Hidden Cost of Moving Fast Muhammad sees a pattern with technical debt. “Tech debt is mostly built due to business pressure—’keep delivering, I need this thing or that thing’—or it can be due to poor planning or prioritization.” Add AI to the mix, and the pressure intensifies. Your CEO reads about companies shipping 4x faster with AI. Your board asks why you’re not seeing similar gains. Your competitors claim massive productivity jumps. But in healthcare, you can’t just vibe-code a system into production. “You can keep building things, but especially with AI—we are generating code through AI as well—we wanted to make sure we’re not building a product that reaches a certain level where we can’t add any further features, or it’s not scalable.” Pure Logics’ solution: quarterly audits. Load testing. Security reviews. Code quality checks. Database design reviews. Access audits—who has credentials to which systems. And version upgrade planning—if you’re on Python version X but version Z is stable, what’s the migration path? This sounds expensive. It is. But Muhammad has watched what happens without it: systems that need complete rebuilds after two years. Technical debt that makes simple features take weeks. Security vulnerabilities that surface during compliance audits. The paradox: moving slower with proper guardrails lets you move faster long-term. 
The Twenty-Year View Muhammad started Pure Logics in 2007 with one other person. They worked 12-14 hour days, went home at midnight, worked weekends. “The initial four to five months were quite challenging.” By 2008, they landed Fortune 500 clients: Live Nation, where they managed web presence for Mariah Carey and Taylor Swift. By now, they have 500 people across multiple countries. This growth path offers a different model than the typical startup story. No VC funding. No blitzscaling. Just steady, sustainable growth by solving real problems for enterprise clients. What does this teach about AI adoption? “We need to have people who are not just coders, but they are also thinking from an end-to-end problem solving mindset. And they are great at other areas like soft skills: communication, explaining and connecting with people and driving to a solution.” The companies that win with AI won’t be the ones that generate the most code the fastest. They’ll be the ones that understand the complete problem: technical constraints, compliance requirements, security frameworks, and human workflows. What This Means For You If you’re building AI products in regulated industries, Muhammad’s framework offers a practical path: First, map your constraints before you optimize. Don’t start with “what’s the best model?” Start with “what meets our compliance requirements?” An on-prem model that achieves 70% of your prototype’s accuracy but ships is more valuable than a cloud-based prototype that can’t be deployed. Second, build security guardrails into your development workflow. Muhammad’s team achieves 20-25% productivity gains from AI coding tools while maintaining code quality through static analysis, peer review, and technical debt checks. Third, audit regularly, not reactively. Quarterly reviews of code quality, security, database design, and access controls catch problems when they’re manageable, not when they’ve compounded into system-wide issues. 
Fourth, choose tools for integration, not hype. The best AI tool isn’t the one with the most impressive demos. It’s the one that integrates with your existing quality processes and workflow. Fifth, remember that constraints can become advantages. Pure Logics’ on-prem expertise differentiates them. Companies that need HIPAA-compliant AI need teams that understand both AI and compliance frameworks. Your constraints are your moat. The critical question: are you building AI products that work within your industry’s reality, or are you trying to force approaches that only work for unrestricted consumer apps? About PureLogics: PureLogics is a global engineering firm specializing in healthcare software development with deep expertise in HIPAA compliance and on-prem AI deployment. Founded in 2007, they’ve grown from 2 engineers to a 500-person team serving Fortune 500 clients including Intel, Samsung, and Live Nation. The company focuses on building compliant AI solutions for healthcare organizations, from e-prescription systems and EMR integrations to on-prem AI models for sensitive patient data. Their expertise in both AI implementation and healthcare compliance frameworks enables them to build applications that meet strict regulatory requirements while delivering meaningful clinical outcomes. Learn more at purelogics.com. About Maestro AI: High Output is brought to you by Maestro AI. Maestro is an engineering visibility platform that helps leaders make data-driven decisions backed by narrative context. While most dashboards offer surface-level metrics, Maestro analyzes your team’s actual code, PRs, tickets, and communications to reveal not just what’s happening, but why. The platform automatically synthesizes this activity into real-time feeds for every project, team, and individual, replacing subjective status meetings with objective truth. 
This allows you to identify blockers before they impact deadlines, de-risk key initiatives, and measure the true impact of tools like AI on your organization. Visit https://getmaestro.ai to see how we help engineering leaders build more predict

35 min
2025/10/29

Stop starving your GPUs—with Jaikumar Ganesh

When you provision thousands of GPU clusters weekly for Apple, Spotify, OpenAI, Uber, Runway, and Cursor, you see something nobody else does. From the infrastructure layer, the patterns are unmistakable. Different companies, different products, different use cases, but they’re all hitting the same bottlenecks.

Jaikumar Ganesh would know. As Head of Engineering at Anyscale, he runs Ray, the distributed compute engine that powers production AI at scale. Ray sits in the stack between Kubernetes and the AI workloads, orchestrating the compute that makes everything run. When you’re that deep in everyone’s infrastructure, you see the convergence before anyone else does.

And what he sees right now? Companies are throwing money at GPU shortages while their GPUs sit idle half the time, waiting for CPUs to finish resizing images. It’s not a GPU problem. It’s a coordination problem. And it’s just one of several patterns everyone’s hitting, patterns most don’t even realize are shared.

The bottlenecks everyone’s hitting

Here’s what actually happens in production. You need to process multimodal data: audio, video, robotics sensors, Zoom recordings. Reading images and resizing them? CPUs. LLM inference? GPUs. Writing results back? CPUs again. These are staged pipelines.

“In a lot of legacy systems, you read an image and you have to wait till all the images are read till you activate the GPU,” JK explains. “Now what happens is there’s a GPU shortage, but you have a GPU sitting there idle, and so your GPU utilization is low and your finance team is like, ‘Hey, you’re spending so much money.’”

This is the shift that sounds obvious but isn’t: we’ve moved from a CPU-centric world to a heterogeneous compute world: CPU plus GPU. Most frameworks were built for one or the other. Very few handle the handoff well. Ray Data solves this by handling the transitions without writing to disk at every stage.
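To make the idle-GPU effect concrete, here is a toy timing model in Python. It is a sketch under stated assumptions, not Ray Data's API or implementation: the `Timeline`, `staged`, and `streaming` names are hypothetical, and the model assumes one CPU worker, one GPU, and a fixed per-item cost for each stage. It compares a pipeline that waits for every item to be preprocessed before touching the GPU against one that hands each item over as soon as it is ready.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeline:
    """Hypothetical summary of one pipeline run (illustrative only)."""
    makespan: float  # time until the last item leaves the GPU, in seconds
    gpu_busy: float  # total time the GPU spent on inference, in seconds

    @property
    def gpu_utilization(self) -> float:
        return self.gpu_busy / self.makespan


def staged(n_items: int, cpu_s: float, gpu_s: float) -> Timeline:
    """Barrier between stages: the GPU starts only after all items are preprocessed."""
    cpu_done = n_items * cpu_s
    gpu_busy = n_items * gpu_s
    return Timeline(makespan=cpu_done + gpu_busy, gpu_busy=gpu_busy)


def streaming(n_items: int, cpu_s: float, gpu_s: float) -> Timeline:
    """No barrier: each item goes to the GPU as soon as it has been preprocessed."""
    gpu_free_at = 0.0
    for i in range(n_items):
        ready_at = (i + 1) * cpu_s          # single CPU worker, items finish in order
        start = max(ready_at, gpu_free_at)  # wait for the item or for the GPU
        gpu_free_at = start + gpu_s
    return Timeline(makespan=gpu_free_at, gpu_busy=n_items * gpu_s)


if __name__ == "__main__":
    for name, run in (("staged", staged), ("streaming", streaming)):
        t = run(n_items=4, cpu_s=1.0, gpu_s=1.0)
        print(f"{name:9s} makespan={t.makespan:.1f}s gpu_utilization={t.gpu_utilization:.0%}")
```

With four items and one second per stage, the staged run takes 8 seconds at 50% GPU utilization, while the streaming run takes 5 seconds at 80%. In this simplified model the gap widens as the batch grows, because the staged GPU always waits for the full CPU pass while the streaming GPU only waits for the first item.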
Different pipeline stages execute on the right resource, and nothing sits waiting. The companies that figure this out have massive cost advantages. The ones that don’t keep throwing money at GPU clusters that spend half their time idle.

But here’s what’s remarkable: when you’re provisioning clusters at this scale, you see more than just the GPU coordination problem. You see the entire stack converging. Pull up any major AI company’s infrastructure and you’ll see the same architecture:

* At the top: AI workloads (data processing, pre-training, post-training, model serving)
* Below that: Training frameworks (PyTorch, JAX)
* Then: LLM-specific engines (vLLM for serving, DeepSpeed and FSDP for parallelism)
* Distributed compute: Ray
* Container orchestration: Kubernetes
* At the bottom: Cloud providers and GPU providers

“Across all the companies we have worked with, in open source as well as those who are Anyscale customers, this pattern is consistent,” JK explains.

Here are the four patterns driving convergence:

* Heterogeneous compute coordination: CPU-centric thinking doesn’t work anymore. You need CPU and GPU working together efficiently. Most frameworks handle one or the other well, but the handoff between them is where money gets burned. Multimodal data processing (audio, video, sensor data) exposes this immediately.
* Post-training infrastructure complexity: Everyone thinks pre-training is the hard part. Wrong. Post-training is where the real infrastructure complexity lives, and it’s where customization happens. Eight of the ten most popular open source post-training libraries are built on Ray. Why? Because you need inference stages mixed with training stages, all within the same workload. Someone has to orchestrate where each stage runs, whether to transfer model weights, and how to handle the compute efficiently.
* Multimodal data pipeline bottlenecks: It’s not a model problem, it’s an engineering problem. The bottleneck isn’t which model handles video best. It’s moving data between CPUs and GPUs efficiently without writing to disk at every stage. Fix the pipeline, not the model selection.
* Domain-specific approaches returning: While everyone obsessed over LLMs, reinforcement learning quietly came back in gaming and simulation. Riot Games, one of Ray’s largest customers, uses RL to power the models behind their characters. When you have a physical world or game environment to model, RL still wins. Different problems need different approaches. They all need the same underlying infrastructure to scale.

The interesting part isn’t that everyone uses similar tools. It’s that the bottlenecks are identical. They’re all hitting the same walls, and most of them think they’re the only ones.

Where the real moat lives

Pre-training gets the headlines and the hype. Post-training is where the actual differentiation happens. Think about it: pre-training is increasingly commoditized. You can use foundation models from OpenAI, Anthropic, or Meta. But post-training (fine-tuning models for your specific use case, your specific data, your specific product needs) is where you build something defensible.

And post-training infrastructure is brutally complex. You need inference stages mixed with training stages. You’re constantly moving between different compute resources. You’re orchestrating model weight transfers. You’re debugging why your pipeline breaks at 2am.

This is why eight of the ten most popular open source post-training libraries are built on Ray. Anthropic’s Claude uses them. Cursor’s agents use them. Not because Ray is magic, but because orchestrating this complexity requires infrastructure built specifically for heterogeneous compute.

“They all use post-training libraries, and someone has to orchestrate and handle compute efficiently,” JK explains. “You can have inference stage, your training stage within the post-training libraries itself, and there’ll be a lot of complexity around where each one of these stages runs.”

Your competitors are using the same foundation models. They’re reading the same papers. The differentiation isn’t in the base technology. It’s in how efficiently you can customize it for your needs. That’s an infrastructure problem, not a model problem.

The distance that creates the view

JK’s ability to see these patterns comes from somewhere specific. He grew up sharing a classroom with one other student, not at a small private school but in a remote village in India, where it took 10 days to receive a telegram about his grandmother’s death. The world had phones. His village didn’t.

Years later, visiting relatives in the city, he saw a two-line pager clipped to his uncle’s belt. “I was like, whoa, what the hell is this?” he recalls. He asked which company made it. Motorola. “That’s where I want to be.”

That distance from infrastructure, and then getting close to it, shapes how you think about abstraction layers. He joined Motorola during its decline, then landed on the early Android team at Google when they were 10-15 people figuring out what they were building. Then he co-started Uber’s AI group. Then Anyscale. JK has been in this position before: seeing the platform-level patterns emerge while individual companies think they’re solving unique problems.

The moment that crystallized it happened on a bus in Panama in 2012. A local spent the entire ride on WhatsApp. JK asked what he was doing for so long. “He kind of gave me this look saying, dude, what a stupid question,” JK remembers. “He just said that this has allowed me to keep in touch with my family in remote village.”

From Android enabling that Panama bus connection to Ray enabling AI at scale, JK’s entire career has been about building the infrastructure layer that lets others build. And that vantage point is what lets him see the convergence happening now.
The honest take on AI coding agents

Anyscale is targeting 30% productivity gains from AI coding tools. Not 10x. Not zero. Thirty percent. That’s the honest number, and it’s harder to achieve than you’d think.

JK tried an experiment a year ago: he fed Ray code to an AI agent and ran the output without reviewing it, just to see what would happen. His cluster crashed. He spent the next two hours debugging why. The agent had written code that consumed too much memory, causing out-of-memory errors. This is someone who runs production AI infrastructure at massive scale. Even he can’t blindly trust AI-generated code.

What works instead: spec-driven development. Detailed markdown files. Clear design documents. Tell the agent exactly what you want: function length limits, testing requirements, how to use specific libraries. Then review what it produces.

“You cannot just vibe-code a system into production,” he explains. “I’ve seen engineers who say, ‘Oh, I just used agents for this,’ and they’ll look at the crap it has produced and it’s caused me more problems.”

But here’s where it gets interesting. One of Anyscale’s senior engineers was firmly in the anti-LLM camp. Then he worked on a complicated problem with a distinguished member of the staff who used agents effectively. Two days later, JK got a Slack message: “Couldn’t have produced this code in three days. It would’ve taken me two weeks.”

The pattern? Senior engineers who’ve been through previous platform shifts (internet, mobile, cloud) adapt faster. They recognize tectonic change when they see it. Junior engineers coming in fresh adapt quickly too, since they haven’t developed rigid workflows yet. It’s the engineers in the middle who struggle most.

The critical part: humans stay in the loop. “We do not want to be completely dependent on agents that we lose the critical thinking part,” JK emphasizes. “You need to understand your code at a deep level. You need to understand your design at a deep level and then let agents do their thing.”

That’s the realistic target: 30% productiv

41 min


Creator: Maestro AI
Years active: 2025 - 2026
Episodes: 17
Rating: Explicit
Copyright: © Maestro AI
Show website: High Output: The Future of Engineering