Digital Thoughts: AI From the Trenches

Pawel Jozefiak

What actually happens when an e-commerce manager builds AI agents, tests every model, and lets them run night shifts. No hype, just results. thoughts.jock.pl

Episodes

  1. I Built a Job Finder Agent for My Friends. I Just Showed It on Live.

    May 29 ·  Bonus

    Thank you Karo (Product with Attitude), Leo Ram, Rajendran, Krithika, and many others for tuning into my live video with Wyndo and Dheeraj Sharma!

    Last week I sat down with Wyndo from The AI Maker for a Substack Live. The topic was one specific agent I have been running quietly for months: a job finder I built for the people closest to me. Wyndo just published his writeup of the show, and it is generous and clear, and I am grateful for the room he gave me to actually show the thing.

    This is the other side of that conversation. The builder’s side. What is inside the agent, why I built it the way I did, what surprised me about running it for real humans, and the smallest version someone could build this weekend.

    Let me start with why this one matters more to me than most things I have shipped.

    Why I Built It

    Someone close to me was looking for a new role and doing what most people do. Scanning LinkedIn alerts. Filtering through noise. Fighting the algorithm’s idea of what they should want. The signal-to-noise ratio was awful. Five alerts a day, maybe one was worth opening, and that one was already three days stale by the time it arrived.

    The job market is not a problem of supply. There are roles. The problem is alignment, and alignment is exactly the kind of work an agent should be doing while you sleep.

    I gave myself one rule before writing a line of code. The agent does not apply for anyone. The human judgment, the cover letter, the decision to spend an hour on a specific company, that stays with the person. The agent’s job is upstream of all of that. It is the filter and the explanation. The person opens an email in the morning, reads three roles with a clear reason for each, and either replies or applies. That is the whole loop.

    The smallest useful version of this idea is already a real product if you build it carefully.

    The Loop, In One Paragraph

    Every morning at 6:15, the agent wakes up.
    It loads the person’s profile, picks which job sources are due based on a tier and a cooldown, browses each one, scores everything it finds against the profile, writes an email with the best three to five roles and a one-line reason for each, and sends it. Throughout the day, the person can reply to the email. A short “not this kind of role, too senior” or “more like this one please.” The next morning’s search uses that reply.

    That is the agent. Profile in, daily email out, reply-driven refinement. Everything else is plumbing.

    What Is In The Profile

    The agent is only as good as the profile it reads. This is the part I underestimated for the first two weeks, and it is the part that did the most work once I took it seriously.

    For each person I run the agent for, there is a single source-of-truth file that looks something like this:

    * Current situation. What they do now, employment status, when they can start. Two or three lines, not a CV.
    * Target lanes. Not one role. Three or four. A strong candidate fits more than one pattern, and the agent should respect that. For a creative leader the lanes might be Head of Creative, in-house Creative Director, Brand Creative Lead. For an e-commerce operator the lanes might be VP e-commerce, Digital COO, Head of Digital Transformation. Lanes catch reality.
    * Geography rules. Hard rules. Remote EU first, then named hub cities, then the rest. Anything outside the allowed list gets rejected before it even scores.
    * Salary floor and target. A floor and a target in one currency. Below the floor, the role is rejected unless the company is on a tiny aspirational list. Without a floor, the agent will dribble out underpaid roles forever.
    * Dealbreakers. Concrete things, not vibes. No alcohol, no tobacco, no gambling. No must-have language other than the ones the person actually speaks. Industries that have been tried and disliked.
    * Positive examples. Three to five roles the person would actually want. Real job posts, pasted in.
    The agent uses these as reference points when it scores. Concrete examples beat any prompt I could write.

    The profile is a markdown file. That is the entire format. The agent reads it the way Wiz reads its own CLAUDE.md when it wakes up, with the same discipline: top of context, every run, before any decision.

    The Search, In Tiers

    Most people building a job agent for the first time make the same mistake I made. They try to search everything every day. That gets you rate-limited fast, costs money, and produces noise.

    I run sources in three tiers with a per-source cooldown. LinkedIn is tier one, every day, because it is where the volume lives. Tier two is the major aggregators that have decent role pages, on rotation, two or three per day, with a cooldown so the agent does not pound the same site. Tier three is the company career pages the person actually cares about, listed in their profile. Those run on a longer cooldown because their pages do not change as often.

    Three different tools handle the actual fetching:

    * Firecrawl for clean job-page extraction. It returns markdown, which the agent reads directly.
    * Web search through Claude for broad first-pass discovery.
    * Playwright for the sites that need a real browser, which mostly means LinkedIn behind an authenticated session.

    None of these is magic on its own. The reason it works is that the agent picks the right tool for the right source, and the cooldowns keep any one of them from becoming the bottleneck. I went deeper on the harness side of this in my post on agent coding harnesses if you want the broader picture.

    The Scoring

    Every candidate role gets scored on a 0-10 scale against the profile. The score is not a slider the agent moves around. It is a small set of rules the agent applies in the same order each time.

    * Lane match. Does the title plus the JD fit one of the lanes in the profile? If no, the role is out.
    * Geography. Is it in an allowed location, with the right remote rules? If no, the role is out.
    * Language. Are the must-have languages on the whitelist? If no, the role is out.
    * Salary. Floor first, then target. Below floor and not on the aspirational list, the role is rejected before scoring.
    * Fit reasoning. Why this role for this person, in one sentence. The agent has to write the sentence to keep the score.
    * Concerns. What might be a mismatch. Also one sentence. If the agent cannot name a real concern, the role is probably overhyped.

    Anything that scores six or above makes the morning email. Anything below six does not. The number is not a serving suggestion; it is a hard gate. I would rather get two good roles tomorrow than five mediocre ones.

    The Email

    This is the part the friend sees, so this is the part I obsess over.

    The morning email is three to five roles. For each role: title, company, link, a one-sentence reason it fits, a one-sentence concern, salary if listed, location, and a suggested next action. That is the whole thing. No promotional framing, no agent personality, no apology when the day is quiet.

    If the day is quiet, the email does not arrive. The agent logs a quiet day and goes back to sleep. I learned this one the hard way. An empty digest is worse than no digest, because it teaches the person to stop opening the email. The right move when there is nothing to send is to send nothing.

    I write more about how I built the email layer in the post on knowing my agents are actually working. The short version: the email itself is the user interface, so it gets the same care as a product.

    The Reply Loop, The Part I Did Not Expect

    I thought the search and the scoring would be the interesting parts. They were not. The reply loop was the interesting part.

    People do not reply to job alerts. People do reply to a personal email that asks them a real question. So the email closes with a short note: “Not relevant? Reply with one line and tomorrow’s search adjusts.” No form. No button. Just reply.

    When the reply arrives, the agent does three things.
    It classifies the feedback. It applies the change. It updates the profile.

    If someone replies “this role at a gambling company, never,” the agent does not just skip that role. It adds gambling to their dealbreakers, permanently. If a reply says “this is too senior, I want builder-track not exec,” the agent shifts the lane weights. If a reply says “more like this one please,” the agent saves that role as a positive example, and the next day’s search leans that direction.

    This is the same architecture I described in the post about my self-improving agent: corrections in, classified, graduated into permanent rules when they stop being a one-off. The job finder is the cleanest example of that loop I have built. The feedback is short, the surface is one email, and the change is visible the next morning. People notice when their agent listens.

    What I Showed On The Live

    Wyndo asked me to do three things on the stream, and I think this is the right order if you ever demo an agent of your own.

    First, the email. Before anything else. He had me open a sanitized morning brief and read it out. The audience does not need to see the code yet, they need to see the output. Five roles, a reason for each, a concern for each. That is what the friend opens. If you cannot show that first, the rest of the demo will not land.

    Second, the profile. I showed the markdown file with the lanes and the dealbreakers and the positive examples. This is where people get the idea. Most of the audience comments came in during this part. “Oh, so the agent uses the JD against the profile?” Yes. The profile is the thing.

    Third, the reply. I showed one fake feedback message and walked through what the agent did with it. Classified, applied, saved. The audience watched the profile file change. That was the moment of the show, the part I think Wyndo wrote about as “the part where the agent learns from you.”

    I did not start in the terminal. I did not show subagents or scheduling or memory.
None of that mattered for the demo. The agent loop is profile, search, score, em
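The tiered search described above (tier one daily, tier two on rotation with a cap of two or three per day, tier three on a longer cooldown) can be sketched roughly as follows. This is only an illustration, not the author's code: the `Source` class, its field names, `is_due`, `pick_sources`, and the `tier_two_cap` parameter are all hypothetical, and the choice to rotate tier two by "least recently run first" is an assumption the post does not spell out.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Source:
    """Hypothetical record for one job source; field names are illustrative."""
    name: str
    tier: int                       # 1 = daily, 2 = aggregators, 3 = company pages
    cooldown_days: int              # minimum days between two runs of this source
    last_run: Optional[date] = None


def is_due(source: Source, today: date) -> bool:
    """A source is due if it never ran or its cooldown has fully elapsed."""
    if source.last_run is None:
        return True
    return today - source.last_run >= timedelta(days=source.cooldown_days)


def pick_sources(sources: list[Source], today: date, tier_two_cap: int = 3) -> list[Source]:
    """Pick today's sources: all due tier one and tier three, capped tier two."""
    due = [s for s in sources if is_due(s, today)]
    tier_one = [s for s in due if s.tier == 1]
    # Assumption: rotate tier two by taking the least recently run sources first.
    tier_two = sorted(
        (s for s in due if s.tier == 2),
        key=lambda s: s.last_run or date.min,
    )[:tier_two_cap]
    tier_three = [s for s in due if s.tier == 3]
    return tier_one + tier_two + tier_three
```

After a run, the caller would set `last_run` to today's date for every source it actually fetched, so that the cooldown applies on the next morning's pass.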

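The scoring rules from the post (hard gates applied in a fixed order, then a threshold of six for the morning email) might look something like this in code. Treat it as a sketch under stated assumptions: `Profile`, `Role`, `gate`, and `makes_email` are hypothetical names, the post does not say how the 0-10 fit score itself is produced (here it is simply passed in), and letting a role with no listed salary pass the salary gate is my assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

EMAIL_THRESHOLD = 6  # from the post: six or above makes the morning email


@dataclass
class Profile:
    """Hypothetical, simplified view of the markdown profile."""
    lanes: set[str]
    allowed_locations: set[str]
    languages: set[str]
    salary_floor: int
    aspirational_companies: set[str] = field(default_factory=set)


@dataclass
class Role:
    """Hypothetical candidate role after extraction from a job page."""
    title: str
    company: str
    lane: Optional[str]            # lane the role was matched to, if any
    location: str
    required_languages: set[str]
    salary: Optional[int]          # None when the posting lists no salary


def gate(role: Role, profile: Profile) -> Optional[str]:
    """Apply the hard gates in order; return the failing gate or None."""
    if role.lane not in profile.lanes:
        return "lane"
    if role.location not in profile.allowed_locations:
        return "geography"
    if not role.required_languages <= profile.languages:
        return "language"
    # Assumption: an unlisted salary does not trigger the floor check.
    below_floor = role.salary is not None and role.salary < profile.salary_floor
    if below_floor and role.company not in profile.aspirational_companies:
        return "salary"
    return None


def makes_email(role: Role, profile: Profile, fit_score: int,
                reason: str, concern: str) -> bool:
    """A role is emailed only if it passes every gate, has a written
    reason and concern, and scores at or above the threshold."""
    if gate(role, profile) is not None:
        return False
    # Assumption: a missing reason or concern drops the role outright.
    if not reason.strip() or not concern.strip():
        return False
    return fit_score >= EMAIL_THRESHOLD
```

Returning the name of the failing gate, instead of a bare boolean, makes it easy to log why a role was dropped on a quiet day.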
    1 hr
  2. The Compounding Agent

    Apr 11

    Episode four. What happens when hobbyist AI starts growing up into production AI, and how the lessons compound if you pay attention.

    First, a rare look inside the pros’ toolbox. Claude Code’s source got leaked. Instead of treating it like drama, I treated it like a free masterclass. Tool permission gating, risk classification, blocking budgets, memory management, multi-agent coordination, feature flags like autoDream and KAIROS. Most people building agents today are reinventing patterns that professional teams already solved. You learn more from reading one real production codebase than from ten tutorial posts.

    Then, applying those lessons to my own stack. My $599 Mac Mini M4 runs a 35 billion parameter model at 17.3 tokens per second. That alone is surprising. Then I swapped the brain of the classification tier to Gemma 4, and classification went from 8.5 seconds down to 1.9 seconds. A 4.4x speedup. I also disabled chain-of-thought on simple classification calls and got 30x faster results with identical accuracy. Production AI isn’t one giant model doing everything. It’s the right model for the right job, and most jobs don’t need the biggest one.

    Finally, handing the wisdom forward. After six months of running this thing daily, I wrote a beginner’s guide to building your first agent. Folder structure is the architecture. The nine common mistakes people make early. Model routing across Haiku, Sonnet, and Opus tiers. Progressive permissions. The context window trap. Overnight automation is where the real leverage lives. Not a hype piece. A map for the person walking in the door behind me.

    The thread: compounding expertise. Study how the pros build. Optimize your own stack with those patterns. Teach the next person who walks in. The gap between hobbyist AI and production AI is closing, and the fastest way to cross it is learning from real systems instead of tutorials.

    Posts discussed in this episode:
    - Claude Code’s Source Got Leaked. Here’s What’s Actually Worth Learning (https://thoughts.jock.pl/p/claude-code-source-leak-what-to-learn-ai-agents-2026)
    - My $600 Mac Mini Runs a 35B AI Model. Yesterday I Swapped Its Brain (https://thoughts.jock.pl/p/local-llm-35b-mac-mini-gemma-swap-production-2026)
    - How to Build Your First AI Agent (Basics) (https://thoughts.jock.pl/p/how-to-build-your-first-ai-agent-beginners-guide-2026)

    Get full access to Digital Thoughts at thoughts.jock.pl/subscribe

    26 min
  3. When AI Meets Reality

    Mar 23

    Episode three. What happens when AI stops being theoretical and starts touching real money and real hardware.

    First, a failure worth studying. I told my AI agent to build one useful app every day. It produced unit converters, color pickers, base64 encoders. Statistically average, completely forgettable. Nobody cared. Then I changed one word: “experiments” instead of “apps,” with specific creative direction. One of those experiments hit #3 on Hacker News. The lesson: AI execution costs dropped to near zero. The only competitive advantage left is human taste and vision.

    Then, applying that lesson to revenue. I directed my agent to package what I know into digital products and sell them. $355 in three weeks against $400/month in AI costs. Near break-even on month one. The real story is the “execution gap”: most experts never monetize their knowledge because packaging, marketing, and distribution are hard. The agent handles all of that. What happens when that gap closes for everyone?

    Finally, where this is heading. I ran Qwen 3.5, a 9 billion parameter model, on my MacBook and iPhone. No cloud. No subscription. No internet. The gap between local and cloud AI is closing fast. If you can run capable AI on hardware you already own, the barrier to entry for everything above collapses.

    The thread: AI needs human direction to create value. The tools to provide that direction are becoming radically cheaper. The bottleneck isn’t technology anymore. It’s having something worth saying.

    Posts discussed in this episode:
    - I Told My AI to Build Apps Every Day. The Results Were Painfully Boring. Here’s the Lesson (https://thoughts.jock.pl/p/directed-ai-experiments-vibe-business)
    - My AI Costs $400/Month. This Month It Made $355 (https://thoughts.jock.pl/p/project-money-ai-agent-value-creation-experiment-2026)
    - I Ran Local AI on My MacBook and iPhone. The Gap Is Closing Fast (https://thoughts.jock.pl/p/local-llm-macbook-iphone-qwen-experiment)

    19 min
  4. How I Taught My AI Agent to Think

    Mar 17

    Episode two. Three stages of giving an AI agent real independence.

    First, a counterintuitive discovery: more instructions made my agent worse. I went from 471 lines of rules down to 61 by replacing abstract adjectives with concrete behaviors. “Principle beats rule” turned out to be the single biggest performance unlock.

    Then, teaching it to learn. Error logging, structured lessons, and an identity layer that knows who I am. But MIT research shows personalized profiles increase sycophancy by 33-45%. The AI starts telling you what you want to hear instead of catching your mistakes. True autonomy requires friction, not agreement.

    Finally, giving it a physical home. Migrating to a dedicated Mac Mini broke everything: no display meant no UI automation (solved with a virtual 5K screen hack), hundreds of hard-coded paths pointed to folders that didn’t exist, and the agent burned through API credits stuck in silent error loops. The fix: full root authority inside a contained blast radius. If the AI deletes the entire drive, it literally doesn’t matter.

    The payoff: a self-improving agent running 24/7 on its own machine, with its own iCloud account, reachable via iMessage like a coworker.

    Posts discussed in this episode:
    - I Built a Personal AI Agent Called Wiz (https://thoughts.jock.pl/p/how-i-structure-claude-md-after-1000-sessions)
    - My AI Agent Learns From Its Own Mistakes. Here’s the Architecture (https://thoughts.jock.pl/p/wiz-ai-agent-self-improvement-architecture)
    - I Gave My AI Agent Its Own Computer. Here’s Every Lesson From 72 Hours of Migration (https://thoughts.jock.pl/p/mac-mini-ai-agent-migration-headless-2026)

    22 min
  5. Building an AI Agent That Runs Night Shifts

    Feb 18

    First episode. I built an AI agent called Wiz that runs night shifts, deploys apps, and once changed my password twice in one night. This episode covers why I built it, what broke, and why a cheaper model made it better. Based on posts from Digital Thoughts - subscribe at thoughts.jock.pl for the full story.

    I’ve been writing Digital Thoughts for a while now, and some of you told me you’d rather listen than read. Fair enough. So I’m experimenting with a podcast version - AI-generated conversations based on my posts. Not me reading articles out loud, but two AI hosts digging into the ideas, arguing about them, and finding connections I didn’t even see when writing.

    This first episode covers the full arc of building Wiz - my personal AI agent. From “why would you build your own instead of using ChatGPT?” to the moment it started writing its own skills without asking. We get into the failures: tasks that looped infinitely, passwords changed twice in one night, and the counterintuitive discovery that downgrading to a cheaper model made the whole thing better.

    If you’re hearing this on Spotify or Apple Podcasts - every episode is based on posts from Digital Thoughts, where I write about using AI daily as a practitioner, not a pundit. Subscribe there if you want the full picture.

    Posts discussed in this episode:
    - I Built a Personal AI Agent Called Wiz
    - Why I Built My Own AI Agent Instead of Using OpenClaw
    - My AI Agent Runs Night Shifts, Builds Apps & Earns Revenue
    - Why I Switched My AI Agent from Opus to Haiku

    15 min
