Building Custom Agents

Agents combine multiple skills with context, memory, and decision-making to handle complex, multi-step workflows autonomously. Learn how to design, build, and test agents that work.

From Skills to Agents

A skill is a single task. An agent is a workflow. Skills are like individual tools in a workshop — a saw, a drill, a sander. An agent is the craftsperson who knows which tool to pick up, in what order, and how to adapt when something unexpected happens.

The jump from skills to agents is the most powerful upgrade in your Claude Code practice. Skills save you time on individual tasks. Agents save you time on entire processes — the kind that involve research, analysis, decision-making, and output generation across multiple steps.

Here is the key difference: with a skill, you invoke it and get a result. With an agent, you describe an objective and the agent decides which skills to use, in what order, and how to handle intermediate results. You shift from directing each step to defining the goal.

Agent Architecture

Every effective agent is built from four components. Understanding these components helps you design agents that are reliable rather than unpredictable.

1. Skills — The Capabilities

Skills are the actions your agent can take. A prospecting agent might have access to a research skill, a scoring skill, and an email-drafting skill. The agent orchestrates these skills rather than doing everything from a single monolithic prompt.

Agent: Prospecting Agent
├── Skill: research-company.md
├── Skill: score-lead.md
├── Skill: draft-outreach.md
└── Skill: log-to-crm.md

2. Context — The Knowledge

Context is the information the agent uses to make decisions. This comes from your CLAUDE.md file (company info, product details, ICP definition), from files in your project (prospect lists, templates, historical data), and from the inputs you provide when launching the agent.

Context sources:
├── CLAUDE.md → Company info, ICP, product positioning
├── data/prospects.csv → List of leads to process
├── templates/email-templates.md → Approved messaging
└── User input → "Focus on Series B companies in fintech"

3. Memory — The State

Memory is how the agent tracks what it has done, what it has learned, and what it still needs to do. Within a single session, Claude Code maintains conversation context naturally. For longer-running workflows, you can have the agent write intermediate results to files that persist between sessions.

Memory patterns:
├── Session memory → Conversation context (automatic)
├── File memory → Write progress to output/progress.md
├── Log memory → Append results to output/agent-log.md
└── State memory → Track processed items in output/state.json
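The file- and state-memory patterns above can be sketched in a few lines. This is a hypothetical helper, not a Claude Code API; it assumes the output/state.json path named above and uses illustrative company names:

```python
import json
from pathlib import Path

STATE_FILE = Path("output/state.json")  # hypothetical state file from the pattern above

def load_state() -> set:
    """Return the set of already-processed item IDs, or an empty set on first run."""
    if STATE_FILE.exists():
        return set(json.loads(STATE_FILE.read_text()))
    return set()

def mark_processed(item_id: str) -> None:
    """Persist item_id so a later session can skip it."""
    done = load_state()
    done.add(item_id)
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps(sorted(done)))

done = load_state()
for company in ["Vercel", "Supabase", "Railway"]:  # illustrative batch
    if company in done:
        continue  # already handled in a previous session
    # ... run the research/scoring skills here ...
    mark_processed(company)
```

Because the state lives in a file rather than in conversation context, a second session can pick up exactly where the first one stopped.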

4. Tools — The Interfaces

Tools are how the agent interacts with the outside world — reading files, searching the web, writing output, running commands. Claude Code provides these natively. Your agent definition specifies which tools the agent should use and how.

Designing an Agent

An agent is defined in a skill file, but with a crucial difference: instead of describing a single task, you describe a workflow with decision points. Here is the structure:

# Agent: {Name}

## Objective
What this agent accomplishes end-to-end.

## Skills Used
- List of skill files this agent orchestrates

## Context Required
- What information the agent needs access to

## Workflow
1. Step 1 (which skill or action)
   - Decision point: if X, do Y; otherwise do Z
2. Step 2
   ...

## Output
What the agent produces when finished.

## Guardrails
- Safety constraints and boundaries
- When to stop and ask for human input

Start Simple

The most common mistake with agents is making them too complex on the first attempt. Your first agent should chain 2-3 skills together in a linear flow. Add branching logic and error handling after the basic flow works reliably. Complex agents built on shaky foundations fail in unpredictable ways.

Example: Prospecting Agent

This agent takes a list of prospect companies, researches each one, scores them against your ideal customer profile, and drafts personalized outreach for the highest-scoring leads.

# Agent: Prospecting Pipeline

## Objective
Process a list of prospect companies through research, scoring, and
outreach drafting. Produce a prioritized list with ready-to-send emails
for the top prospects.

## Skills Used
- research-company.md — gather company intelligence
- score-lead.md — score against ICP criteria
- outreach-email.md — draft personalized emails

## Context Required
- CLAUDE.md: Our ICP definition, product info, value props
- Input: List of company names (provided by user or from a file)

## Workflow
1. **Intake:** Read the prospect list. Log total count.

2. **For each prospect company:**
   a. Run /research-company to gather:
      - What they do, company size, industry, funding stage
      - Recent news and activity
      - Key decision-makers
   b. Run /score-lead using the research output:
      - Score from 1-10 based on ICP fit
      - Scoring criteria: industry match, company size, funding stage,
        technology stack, growth signals
   c. Log the company name, score, and one-line summary to output/scores.md
   d. **Decision point:**
      - If score >= 7: Queue for outreach drafting
      - If score 4-6: Log as "nurture" — no outreach now
      - If score < 4: Log as "not a fit" — skip

3. **For each queued prospect (score >= 7):**
   a. Run /outreach-email using the research and scoring data
   b. Save the draft email to output/emails/{company-name}.md
   c. Log completion to output/progress.md

4. **Wrap-up:**
   a. Produce a summary report:
      - Total prospects processed
      - Breakdown by score tier
      - List of drafted emails with links
   b. Save to output/prospecting-report.md

## Output
- output/scores.md — All prospects with scores and summaries
- output/emails/*.md — Draft emails for high-scoring prospects
- output/prospecting-report.md — Summary report

## Guardrails
- Process max 10 companies per run (to manage quality)
- If research returns very little data, score conservatively
- Never fabricate company information
- Flag any prospect where you are less than 70% confident in the data
- Stop and ask the user if a company seems to be a direct competitor
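The decision point in step 2d is a plain tiering rule. Sketched in Python with the thresholds copied from the workflow above (the function name is illustrative):

```python
def tier(score: int) -> str:
    """Map an ICP fit score (1-10) to the action tiers in step 2d."""
    if score >= 7:
        return "outreach"    # queue for email drafting
    if score >= 4:
        return "nurture"     # log, but no outreach now
    return "not a fit"       # skip

assert tier(8) == "outreach"
assert tier(5) == "nurture"
assert tier(3) == "not a fit"
```

Writing the rule this explicitly in the agent definition is what keeps the decision point from triggering incorrectly at the boundaries (exactly 4 and exactly 7).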

To launch this agent, you would provide the prospect list and let it run:

/prospecting-pipeline
Here are 8 companies to research: Vercel, Supabase, Railway,
Render, PlanetScale, Neon, Turso, Xata

The agent will work through each company systematically, producing files in the output directory as it goes. You can check progress by looking at output/progress.md, and when it finishes, you have a complete prospecting report with draft emails ready for review.

Example: Content Repurposing Agent

This agent takes a single piece of long-form content (a blog post, whitepaper, or transcript) and transforms it into multiple formats for different channels.

# Agent: Content Repurposer

## Objective
Take one piece of long-form content and produce channel-specific
versions for social media, email newsletter, and internal summary.

## Skills Used
- summarize.md — extract key themes and points
- social-post.md — write platform-specific social posts
- newsletter-block.md — draft email newsletter section
- internal-brief.md — create team-facing summary

## Context Required
- CLAUDE.md: Brand voice guidelines, social media handles, audience info
- Input: Source content (URL, file path, or pasted text)

## Workflow
1. **Analyze source content:**
   - Run /summarize to extract main thesis, key points, notable quotes
   - Identify the 3 most shareable or impactful ideas
   - Note any data points or statistics

2. **Generate social content:**
   a. LinkedIn post (150-200 words):
      - Professional tone, insight-driven
      - End with a question or discussion prompt
   b. Twitter/X thread (5-7 tweets):
      - Punchy, numbered insights
      - First tweet is the hook
      - Last tweet links to the full content
   c. Save both to output/social/

3. **Generate newsletter block:**
   - 100-150 word summary with a compelling subject line option
   - Pull quote or standout statistic
   - CTA linking to the full content
   - Save to output/newsletter-block.md

4. **Generate internal brief:**
   - TL;DR (2 sentences)
   - Why this matters for our team
   - Talking points for sales/CS to use with clients
   - Save to output/internal-brief.md

5. **Produce content kit:**
   - Compile all outputs into a single output/content-kit.md
   - Include source link and metadata
   - Add "suggested posting schedule" (which channel, which day)

## Output
- output/social/linkedin.md
- output/social/twitter-thread.md
- output/newsletter-block.md
- output/internal-brief.md
- output/content-kit.md (master file)

## Guardrails
- Maintain the original author's argument — do not change the message
- Each channel version must feel native (not a copy-paste with minor edits)
- Do not exceed platform character/word limits
- If the source content is thin (< 500 words), warn that repurposing
  may produce repetitive outputs
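The character/word-limit guardrail can be mechanized as a small validation pass over each draft before saving it. A sketch, assuming the limits stated in the workflow above (LinkedIn 150-200 words, newsletter block 100-150 words, 280 characters per tweet); the channel names are illustrative:

```python
def check_limits(channel: str, text: str) -> list[str]:
    """Return a list of guardrail violations for one channel draft (empty = OK)."""
    problems = []
    words = len(text.split())
    if channel == "linkedin" and not 150 <= words <= 200:
        problems.append(f"LinkedIn post is {words} words; target 150-200")
    elif channel == "newsletter" and not 100 <= words <= 150:
        problems.append(f"Newsletter block is {words} words; target 100-150")
    elif channel == "tweet" and len(text) > 280:
        problems.append(f"Tweet is {len(text)} chars; limit 280")
    return problems
```

Running a check like this between generation and save turns the guardrail from a hope into an enforced step.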

A single blog post now produces an entire content kit in minutes. The agent ensures each channel gets content that feels native to that platform rather than a one-size-fits-all summary pasted everywhere.

Example: Data Cleaning Agent

This agent takes a messy CSV export (typically from a CRM) and produces a clean, validated version with a detailed report of what was fixed.

# Agent: Data Cleaner

## Objective
Clean and validate a CSV/data file. Fix common issues, flag anomalies,
and produce a clean version with a change log.

## Workflow
1. **Audit the data:**
   - Read the file and report: row count, column count, column names
   - Identify data types per column
   - Count missing values per column
   - Identify duplicate rows
   - Check for encoding issues

2. **Clean the data:**
   a. Standardize formatting:
      - Trim whitespace from all string fields
      - Normalize phone numbers to consistent format
      - Standardize date formats to YYYY-MM-DD
      - Fix common email typos (gmial→gmail, .con→.com)
      - Normalize company names (remove Inc., LLC, etc. inconsistencies)
   b. Handle missing values:
      - If a row is missing > 50% of fields, flag for removal
      - For email: flag as missing, do not fabricate
      - For phone: flag as missing, do not fabricate
      - For name fields: flag as incomplete
   c. Remove exact duplicate rows (keep first occurrence)
   d. Flag near-duplicates (same email, different name) for human review

3. **Validate:**
   - Email format validation (basic regex check)
   - Phone number length/format validation
   - Required fields check (configurable)

4. **Produce outputs:**
   a. Clean CSV: The corrected data file
   b. Change log: Every modification made, with row number and field
   c. Flagged rows: Items that need human review
   d. Summary report: Statistics on what was cleaned

## Output
- output/cleaned-data.csv — The corrected file
- output/change-log.md — Detailed list of every change
- output/flagged-for-review.md — Items needing human judgment
- output/cleaning-report.md — Summary statistics

## Guardrails
- Never delete data silently — every removal must be logged
- Never fabricate or guess at missing data
- Always produce the change log, even if no changes were needed
- If the file has more than 1000 rows, process in batches of 200
  and report progress between batches
- Preserve the original file — never overwrite it
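A minimal sketch of the step-2 normalizations, using only the standard library; the field names, typo table, and sample row are illustrative, not part of the agent definition:

```python
import csv
import io

EMAIL_TYPOS = {"gmial.com": "gmail.com", ".con": ".com"}  # illustrative typo table

def clean_row(row: dict, log: list) -> dict:
    """Trim whitespace and fix common email typos, logging every change."""
    cleaned = {}
    for field, value in row.items():
        new = value.strip()
        if field == "email":
            for typo, fix in EMAIL_TYPOS.items():
                if new.endswith(typo):
                    new = new[: -len(typo)] + fix
        if new != value:
            log.append(f"{field}: {value!r} -> {new!r}")  # change log entry
        cleaned[field] = new
    return cleaned

raw = "name,email\n Ada Lovelace ,ada@gmial.com\n"  # illustrative messy export
log = []
rows = [clean_row(r, log) for r in csv.DictReader(io.StringIO(raw))]
```

Note that every modification appends to `log` before the value is replaced: that is the "never change data silently" guardrail expressed in code.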

This agent turns a painful, error-prone manual task into a systematic process. The change log is critical — it gives you full visibility into what the agent did, so you can review and override any decisions before using the cleaned data.

Building Your First Agent

Follow these steps to build your first working agent. We will keep it simple — two skills chained together with a decision point.

Step 1: Choose a workflow with 2-3 distinct steps

Pick a process where you currently do multiple things in sequence. Good examples:

  • Research a topic, then write a summary
  • Read a document, then extract action items
  • Analyze data, then produce a report

Avoid workflows with more than 3 steps for your first agent. You can add complexity later.

Step 2: Write the individual skills first

Before writing the agent, create standalone skill files for each step. Test them individually. Make sure each skill produces reliable output on its own. If a skill is inconsistent in isolation, it will be worse inside an agent.

Step 3: Write the agent definition

Create a new file in .claude/skills/ that describes the agent workflow. Use the agent template from the "Designing an Agent" section above. Be explicit about:

  • The objective (what does "done" look like?)
  • The step-by-step workflow
  • Any decision points (if X, then Y)
  • Where to save output
  • When to stop or ask for help

Step 4: Run with a single input

Do not test with a batch. Run the agent on one item first. Watch it work through each step. Check the intermediate outputs. This is where you will spot issues — a skill producing output in a format the next skill does not expect, or a decision point that triggers incorrectly.

Step 5: Iterate and expand

Once the single-item flow works cleanly:

  • Test with 3-5 items to check consistency
  • Add error handling for edge cases you discovered
  • Add progress logging so you can monitor longer runs
  • Consider adding a "dry run" mode that shows what the agent would do without executing

Testing and Iterating on Agents

Agent testing is different from skill testing. Skills have a single input-output relationship. Agents have multiple stages where things can go wrong. Here is a systematic approach:

Unit Test Each Skill

Before testing the agent as a whole, run each skill independently with inputs that simulate what the agent would provide. Verify that the output format matches what the next skill expects.

Integration Test the Flow

Run the full agent with a single, well-understood input. Compare the final output to what you would have produced manually. Check:

  • Did the agent follow the intended sequence?
  • Were decision points triggered correctly?
  • Is the final output complete and well-formatted?
  • Were intermediate files saved correctly?

Edge Case Testing

Test with inputs designed to break things:

  • An empty or minimal input
  • An unusually large input
  • An input where one of the intermediate steps should fail gracefully
  • An input that hits a decision point boundary

Common Failure Modes

When agents produce bad output, the cause is almost always one of these:

  • Format mismatch: Skill A's output is not in the format Skill B expects. Fix: add explicit output templates to each skill.
  • Context loss: By the time the agent reaches step 4, it has forgotten important details from step 1. Fix: have intermediate steps save key information to files.
  • Scope creep: The agent tries to do more than instructed, going down rabbit holes. Fix: add explicit scope boundaries in guardrails.
  • Silent failure: A step fails but the agent continues without flagging it. Fix: add validation checks between steps.

The Progress Log Pattern

Always have your agent write to a progress log file as it works. Something as simple as appending a line after each major step — "Researched Company X: score 8/10" — gives you visibility into what the agent is doing and makes debugging dramatically easier. This is the single most useful pattern for agent reliability.
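As a sketch, the pattern is nothing more than an append per major step; the path and line format here are illustrative:

```python
from datetime import datetime
from pathlib import Path

LOG = Path("output/progress.md")  # hypothetical progress log

def log_step(message: str) -> None:
    """Append one timestamped line per major step, creating the file on first use."""
    LOG.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    with LOG.open("a") as f:
        f.write(f"- {stamp} {message}\n")

log_step("Researched Company X: score 8/10")
```

Opening in append mode means a crashed or interrupted run leaves every completed step on record, which is exactly what you need for debugging.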

What's Next

You now understand how to design and build agents from skills. The final lesson in this module provides complete, real-world playbooks across four domains — research, content, operations, and code — so you can see agents in action for common business scenarios.

Ready for Production?

When your agents are working locally, Keyset can take them further. Host your agents in the cloud, schedule them to run automatically, add human-in-the-loop checkpoints for high-stakes decisions, and even offer them as a service to clients. It is the path from "useful local tool" to "production workflow."