AI Agent Series

Using AI Agents in
Spec-Driven Development

A practical approach to working with AI agents —
using clear workflows, agent definition files, and step-by-step reviews

Agent Definition Files · ADR-driven Workflow · Sub-agent Orchestration · Security Review Agent

Why LLMs Are Hard to Use in Real Projects

🎭

Hallucination

The model writes code using functions or libraries that do not exist. It sounds confident, but the result does not work.

🔀

Non-Deterministic Output

The same question gives a different answer in each session. You cannot rely on getting the same code structure or naming.

📉

Context Window Limits

After many messages, the model starts to ignore your original rules — the coding style, the approved libraries, the project setup.

🔄

No Long-Term Memory

Every new session starts from zero. The model has no idea what you decided before — past choices, rejected ideas, team rules.

What Actually Happens in a Long Chat Session

Session start

You give the agent its rules: which libraries to use, how to name things, what to avoid. It follows them well at first.

Mid session

As the chat grows, the model's attention spreads across all the messages. The early rules start to get less weight.

End of session

The model breaks your naming rules, adds libraries you did not ask for, and forgets the decisions you made two hours ago.

❶  No Long-Term Memory

There is no built-in way to carry over what was decided. You have to re-explain everything from the start.

❷  Context Window Problem

A single long system prompt tries to cover everything — and covers nothing well. The model gets confused about its own role.

❸  No Review Step

Without a review step, wrong code gets merged. Problems are found late — or not at all.

A Better Way to Work with AI Agents

  • Specific Agent for Each Job Type: Instead of one chat that does everything, use separate agents — one for architecture, one for coding, one for security. Each stays focused on its own area.
  • Start in Plan Mode, Ask Before Deciding: The agent explains its plan first and waits. It does not pick a technology or make a design choice on its own — it always asks the developer first.
  • Explicit User Approval at Every Step: Nothing moves forward without a clear yes from the developer. This keeps a record of every decision and stops the agent from going in the wrong direction.
  • Agent .md Files + Skills Files: A short markdown file defines what the agent does, what tools it can use, and what it must never do. This replaces the long, unreliable system prompt.
  • Pass Output to Sub-Agent as Input: The architect writes a decision document. The developer agent reads it and starts coding. Each agent gets the output from the one before it — like a clear handover.
  • Double-Check All Steps After Completion: After finishing, the agent goes through a short checklist to make sure it did everything it was asked to do and did not add anything extra.
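The points above come together in the agent definition file itself. A minimal sketch of what such a file might look like — the filename, section names, and rules here are illustrative, not a standard format:

```markdown
# Agent: security-reviewer

## Role
Review code for vulnerabilities. Produce a report; never modify code.

## Allowed tools
- Read any file in the repository
- Write one report file (SECURITY_FIXES.md)

## Must never
- Edit or commit source code
- Add dependencies
- Make a design decision without asking the developer first

## Handover
Input:  the codebase plus the architect's decision document
Output: a severity-grouped report with old code, new code, and verification steps
```

Because the file is short and lives in the repository, it survives between sessions and replaces the long system prompt that the model would otherwise slowly forget.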

Security Review Agent — From Definition to Output

① Existing Agent (Trail of Bits)

github.com/trailofbits/skills

└─ plugins/differential-review

Published security methodology

No setup needed — just reference it

Differential security review pattern

Trusted by security researchers

No custom agent file needed

② Prompt Given

"Use the security audit approach from github.com/trailofbits/skills
…/differential-review
to review this Scala push notification API. Read all source files. Find vulnerabilities. For each issue, show the current code, the fixed code, and steps to verify the fix. Group by severity. Mark anything that needs a human to fix manually."

→ agent reads codebase autonomously

③ Output — SECURITY_FIXES_EN.md

🔴 CRITICAL 2 issues  ·  timing attack, leaked Firebase key
🟠 HIGH 6 issues  ·  ownership check, logs, Docker root…
🟡 MEDIUM 5 issues  ·  payload size, Swagger, thread pool…
🟢 LOW 3 issues  ·  dep scope, driver version…
each fix: old code · new code · verification steps
📂 The agent read every source file on its own, grouped the problems, wrote the fixes, and noted which ones need a human. No code was touched — only a report was created. → IDE Demo
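One of the CRITICAL findings above is a timing attack — typically a plain string comparison of a secret token, which leaks information through response time. The project under review is Scala, but the shape of the fix is the same in any language; here is a sketch in Python using the standard library's constant-time comparison (the token name and value are made up):

```python
import hmac

EXPECTED_TOKEN = "s3cr3t-api-token"  # hypothetical secret; loaded from config in practice

def check_token_unsafe(supplied: str) -> bool:
    # Vulnerable: `==` returns at the first mismatching character,
    # so response time reveals how many leading characters are correct.
    return supplied == EXPECTED_TOKEN

def check_token_safe(supplied: str) -> bool:
    # Constant-time comparison: runtime does not depend on where the
    # strings differ, which closes the timing side channel.
    return hmac.compare_digest(supplied.encode(), EXPECTED_TOKEN.encode())
```

This is exactly the kind of fix the report presents as a pair: the current code, the replacement, and a verification step (e.g. confirm both valid and invalid tokens still behave correctly).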

Architecture Design with an AI Agent

Before — Everything in one request

API Client
Notification Endpoint
Controller (waits for FCM)
⚠ 20s timeout · no retry · quota errors
FCM API

After — Queue-based, separate worker

API Client
202 Accepted
add to queue
SQS Queues × 3 (+ dead-letter)
worker picks up · retries on failure
Worker (new)
FCM API
🤖 The architect agent found the bottleneck, raised 14 design questions for the developer to answer, and wrote a full decision document — before any code was written. → IDE Demo
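The "after" design above can be sketched end to end: the endpoint enqueues and returns 202 immediately, and a separate worker drains the queue with retries and a dead-letter queue. A minimal in-process sketch — real SQS, the FCM client, and the retry limit are replaced by stand-ins:

```python
import queue

MAX_ATTEMPTS = 3                     # illustrative retry limit

notification_queue = queue.Queue()   # stand-in for an SQS queue
dead_letter_queue = queue.Queue()    # stand-in for the dead-letter queue

def notify_endpoint(payload: dict) -> int:
    # Controller no longer waits for FCM: enqueue and return 202 Accepted.
    notification_queue.put({"payload": payload, "attempts": 0})
    return 202

def send_to_fcm(payload: dict) -> None:
    # Stand-in for the real FCM call; raises on failure.
    if payload.get("fail"):
        raise RuntimeError("FCM quota error")

def worker_drain() -> None:
    # Worker picks up messages, retries on failure,
    # and dead-letters a message after MAX_ATTEMPTS.
    while not notification_queue.empty():
        msg = notification_queue.get()
        try:
            send_to_fcm(msg["payload"])
        except RuntimeError:
            msg["attempts"] += 1
            if msg["attempts"] < MAX_ATTEMPTS:
                notification_queue.put(msg)   # retry later
            else:
                dead_letter_queue.put(msg)    # give up, keep for inspection
```

The design choice is the same one the architect agent proposed: the client sees a fast 202, and slow or failing FCM calls only affect the worker, never the API.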

How Agent Tasks Run in Parallel

Task 1 · Add SQS library
Task 2 · Config setup
Task 3 · SQS connector
Task 4 · Queue publisher
Task 6 · FCM improvements
▶ run together
Task 5 · Async controller
+
Task 7 · Worker service
Task 8 · Worker entry point
Task 9 · Deploy config
Task 10 · Integration tests
Task 11 · Health checks
▶ run together
📂 The architect wrote out the task order and marked which ones can run at the same time. The coding agent picks up each task as a separate job — no waiting, no overlap. → IDE Demo
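The schedule above can be expressed as dependency groups: tasks inside a group are independent and run together, while the groups themselves run in order. A sketch using a thread pool — the task names mirror the slide, and the work itself is simulated:

```python
from concurrent.futures import ThreadPoolExecutor

# Groups run in order; tasks inside a group run together.
TASK_GROUPS = [
    ["Add SQS library", "Config setup", "SQS connector",
     "Queue publisher", "FCM improvements"],
    ["Async controller", "Worker service"],
    ["Worker entry point", "Deploy config",
     "Integration tests", "Health checks"],
]

def run_task(name: str) -> str:
    # Stand-in for handing one task to a coding agent as a separate job.
    return f"done: {name}"

def run_schedule(groups: list[list[str]]) -> list[str]:
    results = []
    with ThreadPoolExecutor(max_workers=5) as pool:
        for group in groups:
            # Everything in the group is dispatched at once...
            futures = [pool.submit(run_task, name) for name in group]
            # ...but the next group starts only after every task finishes.
            results.extend(f.result() for f in futures)
    return results
```

The important part is the barrier between groups: the async controller can only start once the publisher exists, but nothing inside a group waits on a sibling.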

What We Covered Today

🤖

Specialized Agents

A separate agent for each job — architecture, coding, security — works much better than trying to do everything in one chat session.

📄

Agent Files = Persistent Memory

A short definition file gives each agent a clear role and rules. It carries over between sessions so you do not have to start from scratch.

Human in the Loop

The agent brings up the choices and waits. It does not pick technologies or make design decisions on its own — that is the developer's job.

Parallel Sub-Agents = Speed

When the task order is clear, agents can work on separate pieces at once. This cuts down the total time without losing quality.

📐 Plan first, code second 🔁 Plan → Approve → Build → Review 🛡 Decision docs prevent mistakes 🔗 Each agent hands off to the next
 The real superpower is not the AI — it is the workflow around it.