Wrong reasoning → bad prompt or model. Wrong action → tool wired wrong. Wrong observation → context engineering issue. Three steps. Same loop in every framework.
| Layer | What it is | Today's pick | If you outgrow it |
|---|---|---|---|
| Model | The reasoning engine. Decides what to do, drafts output, decides when to stop. | Sonnet 4.6 + Haiku 4.5 | Qwen 3.6 (local) · DeepSeek (frontier OSS) |
| Runtime | Manages the loop, state, tool calls, retries, audit log. | n8n (visual, OSS) | LangGraph (code-first) · Temporal (durable) |
| Tools | External capabilities the model can invoke. | Claude web_search · Firecrawl · Gmail · Sheets | + MCP servers as you grow (HubSpot · Linear · Stripe) |
For the next 25 minutes we're going to make exactly these choices for our discovery agent.
| Model | In/M | Out/M | Released | Use for |
|---|---|---|---|---|
| Claude Opus 4.7 | $5 | $25 | Apr 16 | Hard edges only |
| Claude Sonnet 4.6 | $3 | $15 | — | Drafting · nuance · voice match |
| Claude Haiku 4.5 | $1 | $5 | Oct 2025 | Routing · classify · extract |
| GPT-5.5 | $5 | $30 | Apr 23 | Doubled price · OAI-locked |
| Gemini 3.1 Pro / Flash | $2 / $0.50 | $12 / $3 | — | If on GCP / budget tier |
| Qwen 3.6-27B (local) | ~$0 | ~$0 | Apr 2026 | Classification · claims Sonnet parity |
| DeepSeek V3.2 | $0.14 | $1.10 | — | Cheapest frontier reasoning |
Today's stack: Sonnet 4.6 + Haiku 4.5. Two models, one cascade — cost reasoning in Part 5.
4 engineers · product · ops · marketing · support
$40K/mo loaded
1 founder + agent stack · same surface area
$300/mo total
Pieter Levels: $3M+ ARR, zero employees · Ben Broca / Polsia: $1M+ ARR, 1,100 client cos solo. The wall is around $50–150K MRR. That's a runway, not a problem.
Stop asking AI questions.
Start giving AI tasks.
1. Define a typed JSON schema
2. Give 5 in-context examples
3. Set a human checkpoint
4. Hit Run.
From the description
The 3-axis matrix · 90-sec worksheet · Green/Yellow/Red for your tasks.
From the description
The Founder's Discovery Engine · clonable n8n JSON · MIT licensed on GitHub.
HITL = Human-in-the-Loop
The 4-stage trust ladder · voice.md pattern · 10× bandwidth math.
Plus — 12 transferable patterns & 7 other agents on the same stack.
How often does this task come up?
High — daily/weekly. Low — occasional. Don't automate low-volume; build cost will exceed time saved.
Are the steps the same every time?
Deterministic — rules apply. Ambiguous — needs judgment. Ambiguous tasks need a human checkpoint.
If the agent's wrong, what breaks?
Low-stakes — recoverable. Irreversible — not. Irreversible tasks should never be fully automated.
GREEN = high vol × deterministic × reversible · YELLOW = high vol × ambiguous × reversible · HITL · RED = anything irreversible · don't
Today's demo lives in YELLOW — cold outreach with a human checkpoint.
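The three axes reduce to a small decision function. This is a minimal sketch of the worksheet logic; the function name and the `SKIP` label for low-volume tasks are assumptions, since the matrix itself only names Green, Yellow and Red.

```python
def classify_task(high_volume: bool, deterministic: bool, reversible: bool) -> str:
    """Return GREEN, YELLOW, RED, or SKIP for one row of the worksheet."""
    if not reversible:
        return "RED"      # anything irreversible: never fully automate
    if not high_volume:
        return "SKIP"     # assumption: low-volume tasks get no color, just no build
    if deterministic:
        return "GREEN"    # high volume x deterministic x reversible
    return "YELLOW"       # high volume x ambiguous x reversible: add a HITL checkpoint


# Cold outreach: happens weekly, needs judgment, and a bad draft is recoverable.
print(classify_task(high_volume=True, deterministic=False, reversible=True))
```

Checking reversibility first mirrors the rule that irreversibility overrides the other two axes.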
| Task you do every week | Volume H/L | Determinism D/A | Reversibility Low/High | Color G/Y/R |
|---|---|---|---|---|
Pick one Green from your list. That's your first agent.
Stack: n8n.cloud trial · Groq Llama 4 Scout · HN Algolia + Reddit JSON · Jina Reader. Cost: $0 for 14 days. Good enough to evaluate; voice match below Sonnet.
Stack: n8n.cloud · Anthropic Sonnet 4.6 + Haiku 4.5 · Firecrawl. Cost: ~$0.21/run · ~$6/mo at default frequency. Best voice match — what the live demo just used.
Walk through 17 steps in TUTORIAL.md. ~45 minutes self-paced. You learn the architecture deeply. Best for technical founders who want to own the stack.
Paste a prompt into Claude for Chrome. The browser agent does signup, OAuth, sheet creation, workflow import. ~10 min hands-off. Best for non-technical visionaries.
You'll get the slides · the speaker notes · the workflow JSONs · the autopilot prompts · the tutorials. Don't memorize anything. Reference. Pick your track this weekend.
Cost per run: ~$0.21 · Monthly: ~$6 paid · ~$8 max-DIY · Build today: 25 min
ICP + Drive voice.md · config-as-files pattern
Sent log · idempotency = re-run safety
Runs · audit + cost meter
Three sub-agents · each gets only the context it needs. None see the whole conversation. Sub-agent decomposition is how you cut cost without losing quality.
| Trigger | Plain English | When it fires | Founder use case |
|---|---|---|---|
| Cron | A timer · "run this every X" | Schedule · e.g. 7am daily | Recurring jobs · daily / weekly |
| Webhook | A doorbell · another system rings · agent answers | Event from elsewhere | Form fill · signup · support ticket |
| Manual | You · clicking Run | You decide | Testing · ad-hoc demos |
Today's choice: cron(0 13 * * *) — daily at 7am Boulder time (13:00 UTC during MDT) — plus a manual trigger for the live demo. Almost every SMB agent starts as cron.
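A cron expression pinned to UTC does not follow daylight saving, which is why the line above says "during MDT". The sketch below checks the local fire time in both seasons. It assumes the scheduler evaluates the expression in UTC and uses fixed offsets (MDT = UTC-6, MST = UTC-7) rather than a timezone database; `local_fire_hour` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for Boulder (assumption: no timezone database needed for the check).
MDT = timezone(timedelta(hours=-6), "MDT")  # summer
MST = timezone(timedelta(hours=-7), "MST")  # winter


def local_fire_hour(utc_hour: int, tz: timezone) -> int:
    """Local hour at which a cron scheduled for `utc_hour` UTC fires."""
    fire = datetime(2026, 5, 8, utc_hour, 0, tzinfo=timezone.utc)
    return fire.astimezone(tz).hour


print(local_fire_hour(13, MDT))  # 7
print(local_fire_hour(13, MST))  # 6
```

If the 7am slot matters in winter, either shift the expression to hour 14 when the clocks change or set the schedule in local time, if your runtime supports a workflow timezone.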
Every input that drives the agent — voice, ICP, schemas, examples, exclusion lists — lives as a versioned editable file in Drive or Sheets. Not buried in workflow nodes.
| File | Lives in | Controls | Edited by |
|---|---|---|---|
| voice.md | Drive | Tone · words · 5 example emails | Founder · rarely |
| icp.md | Drive | Who we look for · keywords | Founder · monthly |
| do-not-contact.csv | Sheets | Exclusion list | Anyone · any time |
| schemas.md | Drive | Structured JSON output shapes | Architect · rarely |
The shorthand: look at every place in your workflow where a string of English drives the agent's behavior. That's a file. Not a prompt. A file. With version history.
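In code, the pattern amounts to reading the files at the start of every run instead of hard-coding their contents. A minimal sketch, assuming the four files have been synced from Drive and Sheets into one local folder; `load_config`, `excluded_emails` and the one-address-per-line CSV layout are illustrative assumptions, not part of the workshop repo.

```python
from pathlib import Path

CONFIG_FILES = ["voice.md", "icp.md", "schemas.md", "do-not-contact.csv"]


def load_config(config_dir: Path) -> dict[str, str]:
    """Read every config file fresh, so an edit shows up on the next run."""
    return {name: (config_dir / name).read_text(encoding="utf-8") for name in CONFIG_FILES}


def excluded_emails(do_not_contact_csv: str) -> set[str]:
    """Parse the exclusion list; assumes one email address per line, no header."""
    return {line.strip().lower() for line in do_not_contact_csv.splitlines() if line.strip()}
```

Because nothing is cached between runs, changing tone or adding an exclusion is a file edit with version history rather than a workflow change.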
Request body · Claude Haiku 4.5 with the web_search tool enabled:
{
  "model": "claude-haiku-4-5",
  "max_tokens": 2048,
  "tools": [{ "type": "web_search_20250305", "name": "web_search" }],
  "system": "[ICP keywords from icp.md]",
  "messages": [{
    "role": "user",
    "content": "Find founders publicly displaying ICP signals on HN, Reddit, Product Hunt this week. Return JSON: [{ person, signal_type, source_url, evidence_quote, score 0-10 }]"
  }]
}
Sub-agent #1 is classification. $1/$5 per million tokens is fine. Save Sonnet for nuance. This is the cascade pattern.
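Whatever the model returns should be checked against the schema before the next step consumes it. The sketch below validates the five fields named in the prompt above; the `Lead` dataclass and `parse_leads` are hypothetical names, and the integer-only score is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Lead:
    person: str
    signal_type: str
    source_url: str
    evidence_quote: str
    score: int


def parse_leads(raw: str) -> list[Lead]:
    """Parse the model's JSON array and reject anything off-schema."""
    leads = []
    for item in json.loads(raw):
        lead = Lead(**item)  # raises TypeError on missing or unexpected fields
        if not isinstance(lead.score, int) or not 0 <= lead.score <= 10:
            raise ValueError(f"score out of range: {lead.score!r}")
        leads.append(lead)
    return leads
```

Failing loudly here keeps a malformed response from turning into a malformed email two steps later.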
One vendor · one key · one node. ~$1.50/mo more at our scale. The simplicity is worth it.
Idempotent = running the agent twice produces the same outcome as running it once. Without it: bug fix triggers a re-run · missed cron triggers a re-run · demo on stage triggers a re-run · and you double-email five prospects who block you and tell their friends.
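A minimal sketch of the dedup check, assuming the Sent log can be loaded as a set of keys and that the lowercased `person` field is the dedup key (both are assumptions; in the workshop stack the log lives in a Sheets tab). `run_once` is a hypothetical name.

```python
def run_once(leads: list[dict], sent_log: set[str]) -> list[dict]:
    """Draft only for leads not already in the Sent log, then record them."""
    drafted = []
    for lead in leads:
        key = lead["person"].strip().lower()
        if key in sent_log:
            continue           # already drafted or sent: skip on re-run
        sent_log.add(key)      # record first, so duplicates in this batch are skipped too
        drafted.append(lead)
    return drafted


log: set[str] = set()
batch = [{"person": "kbrant"}, {"person": "KBrant "}, {"person": "alee"}]
print(len(run_once(batch, log)))  # 2
print(len(run_once(batch, log)))  # 0
```

The second call drafts nothing, which is the property that makes a re-run after a bug fix or a missed cron safe.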
Plumbing. The unsexy stuff that separates "demo agent" from "agent that runs every day."
Firecrawl: AGPL-3.0 OSS. Self-host on Docker for free or use Cloud at $83/mo. Best signal-to-noise on JS-heavy sites.
{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "system": "[contents of voice.md] + [5 example emails]",
  "messages": [{
    "role": "user",
    "content": "Draft a customer-discovery email to {person}.\nTheir signal: {evidence_quote}\nSource: {source_url}\nTheir company context: {enriched_summary}\nFollow voice.md exactly. Interview ask, not pitch.\nSoft CTA. 80-110 words."
  }]
}
Voice match is where quality matters. Haiku writes generic-friendly. Sonnet reads voice.md and writes like the person who wrote those examples.
In-context examples beat prompt instructions. The discipline of writing 5 good emails is what makes voice work. Few-shot > zero-shot.
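One way to enforce the five-example discipline is to refuse to build the system prompt without them. A sketch under assumptions: `build_system_prompt` is a hypothetical helper, and the heading layout is illustrative rather than the workshop's actual template.

```python
def build_system_prompt(voice_md: str, examples: list[str], min_examples: int = 5) -> str:
    """Join the voice guide and the example emails into one few-shot system prompt."""
    if len(examples) < min_examples:
        raise ValueError(f"need at least {min_examples} example emails, got {len(examples)}")
    blocks = [voice_md.strip(), "## Examples"]
    for i, email in enumerate(examples, start=1):
        blocks.append(f"### Example {i}\n{email.strip()}")
    return "\n\n".join(blocks)
```

The returned string is what would go in the `system` field of the drafting request.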
# Sophia's voice — for outbound writing
## Posture
- Direct, no fluff. Short sentences. Cut adverbs.
- Specific over abstract. Numbers over adjectives.
- Confident but not arrogant. Assume the reader is smart.
## Words I use
"architecture", "compounding", "leverage", "the math here is"
## Words I never use
"synergy", "unlock", "revolutionary", "I hope this finds you well"
## Email structure (cold)
1. One-line context: how I found you (specific)
2. Two-line observation: something specific about your situation
3. One question I'd ask
4. One sentence on what I do
5. Soft CTA: "worth 15 minutes Thursday?"
## Examples
[5 real emails — see voice-md-template handout]
The agent inherits your taste through examples, not adjectives. If you can't write 5 good examples, the agent can't either.
Drafts land in Gmail Drafts folder
↓
You skim · edit ~2 words · hit Send
↓
~10 seconds per draft
↓
~300 personalized emails / month
↓
~50 minutes of YOUR time / month
HITL = a checkpoint where a human reviews agent output before it goes external.
The 10× output math the workshop description promised.
10 hours/week back. Goes to product · calls · fundraising.
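The review-time figure in the flow above is plain arithmetic, shown here so you can plug in your own volume and pace. The function name is hypothetical.

```python
def review_minutes_per_month(drafts_per_month: int, seconds_per_draft: float) -> float:
    """Human review time per month, in minutes."""
    return drafts_per_month * seconds_per_draft / 60


print(review_minutes_per_month(300, 10))  # 50.0
```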
Subject: Discovery Pulse — May 8
Today:
· 5 new drafts queued
· 3 follow-ups ready
· 1 reply yesterday from @kbrant
Cost today: $0.21
Top signal: pricing-frustration (n=2)
Your move: approve before 5pm
| date | leads_found | drafts | cost_usd | avg_score | top_signal | errors |
|---|---|---|---|---|---|---|
| 2026-05-07 | 30 | 5 | 0.21 | 7.4 | pricing | 0 |
| 2026-05-08 | 28 | 5 | 0.19 | 6.9 | hiring | 0 |
Trust gets built through visible track record. After two weeks of high approval rates, you can promote the agent to Stage 3.
Same architecture as Step 6. Different inputs. Once you've built one agent, you've built five.
Claude API — Haiku 4.5 routes & scores · Sonnet 4.6 drafts in your voice. Cascade cuts cost 60–70%. Free: Llama 4 Scout on Groq.
web_search + Firecrawl — Claude finds prospects, Firecrawl extracts company sites. Free: HN Algolia + Reddit JSON + Jina Reader, no auth.
n8n — open-source orchestrator. Visual nodes wire up the steps. Self-host on a $5 VPS or use n8n.cloud trial.
Google Sheets — three tabs: ICP (your config), Sent (idempotency), Runs (audit log). Edit the sheet, agent inherits.
voice.md in Drive — your writing samples. Cached on every Sonnet call. Edit it; the next run sounds more like you.
Gmail Drafts — the agent NEVER sends. You read each draft, edit, hit send. Stage 2 of the trust ladder.
To run · 4 free accounts: n8n.cloud · Anthropic ($5 credit) · Firecrawl (500 free) · Google.
To customize for your business · 4 things in your Sheet: voice.md (style) · icp_description (target) · signal_keywords (listen for) · subreddits (where they hang out). No redeploy.
| Primitive | Purpose | Where |
|---|---|---|
| Trigger types (cron · webhook · manual) | Wake the agent up | Step 1 |
| Config-as-files | Editable behavior, no redeploys | Step 2 |
| Sub-agent decomposition | Cheaper, narrower context per step | Steps 3 · 6 · 8 |
| Cascade (Haiku → Sonnet → Opus) | 60-70% bill cut | Steps 3 & 6 |
| Schema-first JSON outputs | Connections that don't break | Steps 3 · 5 |
| Idempotent dedup | Re-run safety | Step 4 |
| HITL via Gmail drafts | Brand voice protection | Step 7 |
| Audit log + cost meter | Trust through track record | Step 8 |
The spine of every agent you'll build for the next 5 years. Tools change. Patterns don't.
Combine them. Recombine them. Tools change. Patterns don't.
| Agent | What it does | Trigger | Output |
|---|---|---|---|
| Inbound Triage | Form fill → research → draft response | Webhook | Gmail draft |
| Competitor Pulse | Daily diff competitor changelogs | Cron daily | Email digest |
| Support Draft | Ticket → KB → drafted reply | Webhook | Helpdesk draft |
| Content Repurposer | Blog → tweet thread + LinkedIn + newsletter | Webhook | Drafts in Drive |
| Call Notes → CRM | Recording → structured notes → CRM update | Webhook | CRM entry |
| Investor Update | Metrics → narrative → draft email | Cron monthly | Gmail draft |
| Onboarding Personalizer | Signup → enriched welcome series | Webhook | Gmail sequence |
Same architecture. Different trigger · search/extract · output. Ship a different one every week.
Sonnet for everything: $18/month at 100K input / 20K output tokens daily.
Haiku + Sonnet cascade: $6/month.
60–70% reduction. Same workflow, same output quality. The most underused cost lever in 2026. Free path follows the same logic — Llama 3.1 8B Instant for cheap, Llama 4 Scout for the heavy work.
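The monthly figures can be reproduced from the per-million-token prices in the model table. In this sketch the routing split between Haiku and Sonnet is not stated in the slides, so `haiku_share` is a hypothetical parameter; prompt caching and web_search fees are not modeled.

```python
# USD per million tokens (input, output), taken from the pricing table.
PRICES = {
    "sonnet": (3.00, 15.00),
    "haiku": (1.00, 5.00),
}


def monthly_cost(model: str, in_per_day: int, out_per_day: int, days: int = 30) -> float:
    """Monthly spend for one model at a fixed daily token volume."""
    in_price, out_price = PRICES[model]
    return days * (in_per_day * in_price + out_per_day * out_price) / 1_000_000


def cascade_cost(haiku_share: float, in_per_day: int, out_per_day: int) -> float:
    """Blend the two models; haiku_share is the fraction of tokens routed to Haiku."""
    haiku = monthly_cost("haiku", in_per_day, out_per_day)
    sonnet = monthly_cost("sonnet", in_per_day, out_per_day)
    return haiku_share * haiku + (1 - haiku_share) * sonnet


print(monthly_cost("sonnet", 100_000, 20_000))  # 18.0
print(monthly_cost("haiku", 100_000, 20_000))   # 6.0
```

At these list prices a 90/10 Haiku/Sonnet split comes to about $7.20/month, a 60% cut, so the saving depends on how much of the token volume the cheap model can absorb.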
Try Tier 0 free this week. No credit card. No install. Move to Tier 1 or 3 when you outgrow the trial.
| Build | Price | Runs | Throughput |
|---|---|---|---|
| AMD Strix Halo (Framework Desktop) | $1,499 — $2,500 | Llama 3 70B Q4 | 14–18 t/s |
| Mac Studio M3 Ultra 96GB | $3,999 | Qwen 3.6-27B | 17–18 t/s |
| RTX 5090 32GB | ~$3,600 | Qwen 3.6-27B Q4 · Gemma 4 31B | 5,200 t/s prefill |
Break-even vs API at ~2–3M tokens/day · ~12-month payback. Then 40–200× cheaper ongoing. Stack: llama.cpp (HIP / Vulkan / Metal) + Ollama or vLLM.
Revisit local when API bill > $200/mo · PII workloads · predictable throughput.
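A rough payback check for the local route, as a sketch: it divides the hardware price by the API bill it replaces and ignores electricity, maintenance and any quality gap between models (all simplifying assumptions). `payback_months` is a hypothetical helper.

```python
def payback_months(hardware_usd: float, monthly_api_usd: float) -> float:
    """Months until the hardware price equals the API spend it replaces."""
    if monthly_api_usd <= 0:
        raise ValueError("monthly_api_usd must be positive")
    return hardware_usd / monthly_api_usd


# Mac Studio from the table against the $200/mo revisit threshold.
print(round(payback_months(3999, 200), 1))  # 20.0
```

On the same hardware, a 12-month payback corresponds to an API bill of roughly $333/mo.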
| Stage | What the agent does | What you do | Guardrails to add | Promote when |
|---|---|---|---|---|
| 1 · Shadow | Runs alongside · no external action | Compare agent output to your own work | Schema validation · cost ceiling per run | ≥ 90% agreement over 50+ runs |
| 2 · Pending Review (today) | Drafts only · never sends | Approve / edit / reject each output | + voice-distance check · audit log · do-not-contact filter | ≥ 95% approval over 2 weeks |
| 3 · Conditional Auto | Auto-fires high-confidence (score ≥ 8 + voice match ≥ 0.85) | Review low-confidence + 10% sample audit | + rate limit per recipient · anomaly detection · sandbox isolation | Auto-error rate < 1% sustained 30+ days |
| 4 · Full Auto | Fires all · alerts on exceptions | Weekly audit log review | + A/B replay testing · drift detection · automatic rollback | Trusted like a senior teammate |
Discovery Engine ships at Stage 2. Each row adds the guardrails of the row before. Graduate one stage at a time.
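The thresholds in the table translate directly into two checks. This is a sketch with hypothetical function names; it assumes the approval counts and the voice-match score are already available from the audit log, and how the voice-match score is computed is left open.

```python
def should_auto_send(score: int, voice_match: float) -> bool:
    """Stage 3 gate: auto-fire only high-confidence drafts."""
    return score >= 8 and voice_match >= 0.85


def ready_for_stage_3(approved: int, reviewed: int, days_observed: int) -> bool:
    """Stage 2 to Stage 3 promotion: at least 95% approval over 2 weeks."""
    if reviewed == 0:
        return False
    return days_observed >= 14 and approved / reviewed >= 0.95


print(should_auto_send(score=9, voice_match=0.80))                  # False
print(ready_for_stage_3(approved=67, reviewed=70, days_observed=14))  # True
```

Anything that fails `should_auto_send` stays in the review queue, which preserves the Stage 2 behavior for low-confidence drafts.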
"The secret to successful agents in Enterprise is not the agents — it's the scaffolding around the agents."
— Sanjin Bicanic, Bain & Co · Apr 29, 2026
Same lesson at startup scale: the agent is 20%. Your dedup, HITL gate, voice.md, audit log — that's the 80%.
Reason → Action → Observation → loop. When you debug an agent, one of these three is broken. Memorize the loop, not the framework.
voice.md in Drive. ICP description in a Sheet cell. Edit a file — the next run inherits. The single most important pattern in this hour.
Haiku for cheap classification. Sonnet for the one step that needs voice nuance. Free path: Llama 3.1 8B Instant for cheap, Llama 4 Scout for the heavy work.
The agent drafts. You send. ~10 seconds per draft. ~10 hours/week back. Brand voice protected. Stage 2 of the trust ladder is the default — earn higher autonomy through the audit log over weeks.
Inbound triage · competitor pulse · support draft · content repurposer · call notes → CRM · investor update · onboarding personalizer. Different trigger, different search, different output. Same six pieces.
5 free 30-min audits for BSW attendees.
You bring your bottleneck. I'll tell you what to automate, what to leave alone, and which 3 tools to use.
agenticarchitect.ai/blog
Weekly deep-dives on agentic architecture for lean founders. n8n templates · cost teardowns · what's actually working in 2026.
Repo: github.com/sudosoph/bsw26-agentic-workflows · MIT · fork · ship yours
Boulder is the lean-founder agentic capital, and you're already here.
Boulder AI Builders · Boulder Startup Week · Silicon Flatirons (CU Boulder)