A practical guide to using Claude Code as a programmable agent with commands, skills, memory, hooks, and MCP.
Most developers try Claude Code once, get mediocre results, and conclude it's overrated.
They are not wrong about the results. They are wrong about the cause. The problem is not Claude Code — it is the mental model. They are using a programmable agentic system as a chatbot, and chatbot results are exactly what you get when you do that.
Claude Code is not a smarter autocomplete. It is a system with composable primitives: commands, skills, subagents, memory, hooks, and MCP integration. Each primitive solves a specific problem. Together they compound — each one you add multiplies the value of the ones already there.
This article builds the full architecture — primitives, composition patterns, and a concrete end-to-end example — along with what actually breaks in production when you skip the fundamentals.
The Core Primitives
Claude Code has six execution primitives and two system-layer concerns. The execution primitives are the building blocks of any workflow you design. The system layer governs how they operate.
Subagents run in isolated execution contexts with fresh context windows. You use them when tasks are independent and can run in parallel — or when you need to prevent context contamination between concerns. A subagent does not inherit the mess of the parent session. That isolation is the point.
Commands are reusable prompts. Instead of typing the same instruction every time ("review this file for security issues, check input validation, look for SQL injection patterns..."), you write it once as a /review-security command and invoke it. Commands are speed. They also enforce consistency across sessions.
Skills are reusable knowledge workflows — injected reasoning patterns rather than raw instructions. Where a command tells Claude what to do, a skill tells Claude how to think about a class of problems. Skills are reusable thinking.
MCP Servers are tools and APIs. This is where Claude connects to the outside world: GitHub, databases, monitoring systems, deployment pipelines. MCP gives Claude real-time data and real-world actions. Without MCP, Claude is reasoning about your system. With MCP, Claude is operating your system.
Memory provides persistent context across sessions. Without it, you re-explain your architecture, coding conventions, and constraints every conversation. With it, Claude remembers what matters. The distinction between what to store and what to drop is where most people get this wrong — more on that below.
Hooks are event-driven automation. File saved → trigger linting. Test failed → trigger analysis. PR opened → trigger review. Hooks are safety rails and workflow glue. They run without you having to ask.
The six above are your execution primitives. Two system-layer concerns govern how they all operate:
Settings define the boundaries: permissions, model config, cost visibility, output format. These are not afterthoughts — they determine what Claude is allowed to do and how it reports results.
Workflows are the orchestration layer. They compose commands, skills, subagents, and MCP calls into multi-step processes. Workflows are how you scale Claude from one-off task completion to continuous, autonomous operation.
How a Request Flows Through the System
When a request enters Claude Code, here is how it moves through the system:
```mermaid
flowchart TD
    A["User Intent"] --> B["Command Layer"]
    B --> C["Skills Injection"]
    C --> D["Subagent Execution"]
    D --> E["MCP Tool Calls"]
    E --> F["Output"]
    B -- "Memory ↔ all layers" --> C
    D -- "Hooks trigger on events" --> E
    style A fill:#4A90D9,stroke:#2C5F8A,color:#fff
    style B fill:#5BA85A,stroke:#3A7039,color:#fff
    style C fill:#E8A838,stroke:#B07820,color:#fff
    style D fill:#9B59B6,stroke:#6C3483,color:#fff
    style E fill:#E74C3C,stroke:#A93226,color:#fff
    style F fill:#1ABC9C,stroke:#148F77,color:#fff
```
Memory is not a step in this flow — it cuts across all layers. Your architectural context, coding conventions, and project-specific knowledge should be available at every stage, not injected at a single point.
Hooks fire on events (file writes, test outcomes, PR creation) and route to specific actions without a prompt. Reactive tooling waits to be asked. Proactive automation does not.
Context Engineering: The Skill That Separates Good Results from Great Ones
This is what separates engineers who get serious results from Claude Code from those who don't.
The default failure mode is Big Prompt Chaos: a massive system prompt stuffed with commands, skills, memory dumps, summaries, and behavioral instructions, all jumbled together. Claude's attention degrades in bloated contexts. You get worse reasoning, missed instructions, and inconsistent behavior — even with a capable model.
The fix is Structured Context. Four clean layers:
- Commands — what to do in common scenarios
- Skills — how to reason about domain-specific problems
- Memory — what is true about this project and codebase
- Subagents — who handles what (so the main agent stays focused)
The principle: smaller context = higher accuracy. It is how transformer attention works. A 50-token instruction in a 2,000-token context lands differently than the same instruction buried in 50,000 tokens. Structure your context so Claude sees what matters when it matters.
CLAUDE.md is your anchor file. Put architecture decisions, coding conventions, non-obvious constraints, and codebase topology here. Everything in CLAUDE.md is persistent, long-term truth. Do not put session-specific notes there — that is what conversation memory is for.
Memory: What to Keep, What to Drop, What to Never Store
Three types of memory. Different purposes, different lifetimes.
Session Context is temporary — the conversation history. It is ephemeral by design. Do not try to preserve everything here. Treat it as working memory: what Claude needs for this task, not for all future tasks.
Skills (playbooks) are persistent but noisy if overloaded. Inject them selectively. A skill that runs on every invocation when it is only relevant for database migrations is wasting tokens and attention. Skills should be invoked when relevant, not pre-loaded unconditionally.
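Selective invocation is usually driven by the skill's own metadata: Claude loads the skill only when the task matches its description. A minimal sketch, assuming the Agent Skills layout of a `SKILL.md` with YAML frontmatter (field names per the skills docs; verify against your Claude Code version, and the skill name here is hypothetical):

```markdown
---
name: db-migrations
description: Use when writing or reviewing database schema migrations.
  Covers ordering, reversibility, and lock-safety checks.
---

# Database Migration Playbook

Loaded only when the task matches the description above. The rest of the
time, these tokens never enter the context window.
```

A narrow, accurate `description` is what keeps the skill out of unrelated sessions.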
CLAUDE.md stores long-term truth: architecture, conventions, team standards, project constraints. This is what should not change session to session. Be brutal about what earns a place here. Bloated CLAUDE.md files are as bad as bloated prompts.
The discipline: store only what is genuinely long-term truth. If it changes monthly, it probably belongs in session context. If it changes daily, it does not belong in memory at all.
From One-Off Prompt to Reusable Workflow
The path from a raw prompt to a production-grade workflow is:
```mermaid
flowchart LR
    A["Prompt"] --> B["Command"] --> C["Skill"] --> D["Workflow"]
    style A fill:#E8A838,stroke:#B07820,color:#fff
    style B fill:#5BA85A,stroke:#3A7039,color:#fff
    style C fill:#4A90D9,stroke:#2C5F8A,color:#fff
    style D fill:#9B59B6,stroke:#6C3483,color:#fff
```
Prompts are one-off. You type them, Claude responds, you move on. No reuse, no consistency, no accumulation.
Commands are saved prompts with names. When you find yourself typing the same instruction across sessions, that instruction should become a command. Examples: /code-review, /write-tests, /explain-architecture, /debug-this.
Skills are deeper — they encode reasoning patterns, not just task descriptions. A skill for debugging async code tells Claude how to think through concurrency issues, not just "debug this." Skills survive context changes. They are reusable thinking.
Workflows compose commands, skills, subagents, and MCP calls into repeatable multi-step processes. A code review workflow might: pull the diff via MCP → inject a code review skill → spawn a subagent per file → aggregate findings → post a structured comment. That entire sequence runs from one invocation.
The principle: stop rewriting prompts every time. Every repeated instruction is a refactoring opportunity.
MCP: From Reasoning About Your System to Operating It
This is where Claude goes from a reasoning engine to an operating agent.
```mermaid
flowchart LR
    A["Claude Core"] <--> B["MCP Servers"] <--> C["APIs / DB / Tools"]
    style A fill:#4A90D9,stroke:#2C5F8A,color:#fff
    style B fill:#E74C3C,stroke:#A93226,color:#fff
    style C fill:#5BA85A,stroke:#3A7039,color:#fff
```
Without MCP, Claude works from the context you give it — files, descriptions, snippets. With MCP, Claude fetches live data, runs queries, pushes commits, triggers deployments, and reads monitoring dashboards. The same reasoning capability now acts on real system state instead of a snapshot.
Three properties MCP unlocks:
Real-time data. Claude is not reasoning from stale context — it is pulling current state. This matters enormously for debugging and incident response, where the system state right now is what needs to be understood.
Hallucination reduction. When Claude can verify a claim against a live data source instead of relying on training data, factual errors drop. A Claude that can query your database schema is less likely to hallucinate column names.
Grounded AI. Actions have real consequences. Claude can close a ticket, merge a branch, or restart a service — not just describe how to do it.
MCP servers are not magic. They are boundaries. Design them with the same care as any API. Scope permissions tightly. Do not give Claude write access to production databases without explicit human approval gates — the failure story later in this article shows exactly what happens when you skip that.
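One way to express task-scoped access in Claude Code is through permission rules in `settings.json`, where MCP tools are addressed as `mcp__<server>__<tool>`. A sketch for a read-only review context (server and tool names here are illustrative; check each server's actual tool list):

```json
{
  "permissions": {
    "allow": [
      "mcp__github__get_pull_request",
      "mcp__github__get_pull_request_diff",
      "mcp__postgres__query"
    ],
    "deny": [
      "mcp__postgres__execute",
      "mcp__github__merge_pull_request"
    ]
  }
}
```

Keep a separate, stricter configuration for any context that can reach production, rather than one shared file for every workflow.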
Hooks and Guardrails: Automation That Does Not Ask Permission
Hooks implement the automation that does not require a prompt.
```mermaid
flowchart LR
    A["Event"] --> B["Hook"] --> C["Action"]
    style A fill:#E8A838,stroke:#B07820,color:#fff
    style B fill:#9B59B6,stroke:#6C3483,color:#fff
    style C fill:#E74C3C,stroke:#A93226,color:#fff
```
The event-hook-action model covers two categories:
Quality automation: linting on file save, documentation generation on function creation, test scaffolding on new module creation. These reduce the manual discipline required to maintain code quality standards.
Safety rails: validation before destructive operations, policy enforcement for infrastructure changes, security scans on dependency updates. Hooks here are your last line of defense before Claude takes an irreversible action.
Hooks are where safety earns its place in the architecture. An agent without hooks relies entirely on the model's judgment for when to stop and ask. That is a fragile design. Build explicit guardrails into the execution path.
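A stopping condition enforced by a hook can be a small script. A sketch of a PreToolUse guard, assuming the documented hook contract (event details arrive as JSON on stdin; exit code 2 blocks the tool call and feeds stderr back to Claude; verify against the current hooks docs). The blocked patterns are examples, not a complete policy:

```python
import json
import re
import sys

# Patterns we refuse unconditionally; extend per your environment.
BLOCKED = [
    r"\brm\s+-rf\s+/",             # recursive delete of an absolute path
    r"\bDROP\s+TABLE\b",           # destructive SQL
    r"\bgit\s+push\b.*--force\b",  # force push
]

def check(command: str):
    """Return a refusal reason if the command matches a blocked pattern, else None."""
    for pattern in BLOCKED:
        if re.search(pattern, command, re.IGNORECASE):
            return f"Blocked by guardrail: matches {pattern!r}"
    return None

def main(event: dict) -> int:
    """Decide the hook's exit code for one PreToolUse event."""
    command = event.get("tool_input", {}).get("command", "")
    reason = check(command)
    if reason:
        print(reason, file=sys.stderr)
        return 2  # exit code 2 blocks the tool call; stderr goes back to Claude
    return 0

# The real hook script would end with:
#   sys.exit(main(json.load(sys.stdin)))
```

The guard runs unconditionally, every time, regardless of how the session's context happens to be framed.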
Settings and Environment: The Layer Most Engineers Configure Once and Forget
The settings layer is underrated. Most people configure it once during setup and never revisit it. This is a mistake.
The five knobs that matter:
Permissions define what Claude can and cannot do without asking. File read vs. write vs. execute. Network access. CLI tool invocation. Get this wrong and you either cripple Claude's usefulness or give it more authority than you want.
Sandbox mode isolates execution. Run untrusted operations in a sandboxed environment before they touch your actual codebase. Use this for exploratory refactors and speculative changes.
Model config controls which model runs for what. For fast, low-stakes operations (generating test names, explaining a function), a cheaper model is fine. For architectural analysis and multi-file refactors, use the highest-capability model you have access to. Routing by task type is a cost and latency optimization that most teams leave on the table.
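Routing by task type can start as a plain lookup that your workflow consults before each call. A sketch with placeholder task categories and model aliases (none of these names come from Claude Code itself):

```python
# Illustrative routing table; task categories and model aliases are
# placeholders, not Claude Code configuration keys.
ROUTES = {
    "test-naming": "fast-cheap-model",
    "explain-function": "fast-cheap-model",
    "code-review": "mid-tier-model",
    "multi-file-refactor": "highest-capability-model",
    "architecture-analysis": "highest-capability-model",
}

DEFAULT_MODEL = "mid-tier-model"

def route(task_type: str) -> str:
    """Pick a model alias for a task; unknown task types fall back to the default."""
    return ROUTES.get(task_type, DEFAULT_MODEL)
```

The table is trivially simple, which is the point: the win comes from making the routing decision explicit and reviewable rather than implicit in whoever typed the prompt.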
Output format shapes Claude's responses for your downstream consumers. If the output is going to a structured workflow, JSON. If it is a developer reading it, prose. If it is a CI comment, Markdown. Match format to consumer.
Cost visibility is non-negotiable for production use. You need to know what each operation costs — not at the end of the month, but per session and per workflow. Token budgeting and model routing decisions require cost data.
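Per-session cost gates do not need to be sophisticated. A sketch of token accounting with a hard budget, using made-up per-million-token prices (pull real rates from your billing data):

```python
from dataclasses import dataclass, field

# Made-up USD prices per million tokens; replace with real billing rates.
PRICE_PER_MTOK = {
    "fast-cheap-model": {"input": 1.00, "output": 5.00},
    "highest-capability-model": {"input": 15.00, "output": 75.00},
}

class BudgetExceeded(RuntimeError):
    pass

@dataclass
class SessionMeter:
    budget_usd: float
    spent_usd: float = 0.0
    calls: list = field(default_factory=list)

    def record(self, model: str, in_tok: int, out_tok: int) -> float:
        """Account for one call; raise as soon as the session budget is blown."""
        price = PRICE_PER_MTOK[model]
        cost = (in_tok * price["input"] + out_tok * price["output"]) / 1_000_000
        self.spent_usd += cost
        self.calls.append((model, in_tok, out_tok, cost))
        if self.spent_usd > self.budget_usd:
            # Fail loudly mid-session instead of at billing time.
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f} > budget ${self.budget_usd:.2f}"
            )
        return cost
```

A meter like this, wired into the workflow layer, is what turns the runaway-subagent scenario described later from a billing surprise into an immediate, contained failure.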
The framing: environment design matters. The same Claude model running with well-designed settings produces dramatically better outcomes than the same model with defaults.
A Concrete Example: Building a Claude-Powered Code Review System
All of the primitives above are easier to understand in motion. Let's build one real system — an automated code review workflow — and trace how each primitive contributes.
The goal: when a developer opens a PR, Claude reviews the changed files for security issues, logic bugs, and test coverage gaps. It posts a structured comment. A human engineer makes the final merge decision.
Here is the full system at a glance — each primitive mapped to its role:
```mermaid
flowchart TD
    A["PR Opened\n(GitHub Event)"] --> B["Hook fires\n(settings.json PostToolUse)"]
    B --> C["CLAUDE.md loaded\nProject context injected"]
    C --> D["/review-security command\nChecklist invoked"]
    D --> E["debug-async skill\nReasoning pattern injected"]
    E --> F["GitHub MCP\nFetches live diff"]
    F --> G["Subagent per file\nIsolated context"]
    G --> H["Findings aggregated\nStructured PR comment"]
    H --> I["Engineer reviews\nMerges or requests changes"]
    style A fill:#4A90D9,stroke:#2C5F8A,color:#fff
    style B fill:#9B59B6,stroke:#6C3483,color:#fff
    style C fill:#E8A838,stroke:#B07820,color:#fff
    style D fill:#5BA85A,stroke:#3A7039,color:#fff
    style E fill:#5BA85A,stroke:#3A7039,color:#fff
    style F fill:#E74C3C,stroke:#A93226,color:#fff
    style G fill:#9B59B6,stroke:#6C3483,color:#fff
    style H fill:#1ABC9C,stroke:#148F77,color:#fff
    style I fill:#4A90D9,stroke:#2C5F8A,color:#fff
```
Each sub-section below builds one node in that diagram.
Step 1 — Command
Create .claude/commands/review-security.md:
```markdown
# /review-security

You are performing a security-focused code review on the file(s) provided.

Check for:

- SQL injection vulnerabilities (raw string interpolation in queries)
- Missing input validation on user-controlled fields
- Hardcoded secrets, tokens, or credentials
- Unsafe deserialization
- Missing authentication/authorization guards on sensitive endpoints
- Dependency versions with known CVEs (flag for manual verification)

For each finding:

1. State the file and line number
2. Describe the vulnerability class
3. Show the vulnerable snippet
4. Suggest a concrete fix

Format output as structured Markdown suitable for a GitHub PR comment.
If no issues are found, state that explicitly — do not invent findings.
```
This command is now reusable. Every security review in every future session runs the same checklist, at the same standard, from a single /review-security invocation.
Step 2 — Skill
Create .claude/skills/debug-async.md:
```markdown
# Async Debugging Reasoning Pattern

When debugging async/concurrent code issues, follow this reasoning sequence:

1. **Identify the async boundary** — where does sync code hand off to async?
   Look for: async/await, callbacks, event emitters, thread pools, message queues.
2. **Trace the execution path** — follow each code path through the boundary.
   Where can execution interleave? Where are shared state mutations?
3. **Check for race conditions** — specifically:
   - Read-modify-write without locks
   - Shared mutable state accessed from multiple coroutines
   - Non-atomic check-then-act sequences
4. **Check error propagation** — does an exception in an async branch:
   - Propagate correctly to the caller?
   - Get swallowed silently?
   - Leave shared state in a partially-mutated state?
5. **Identify missing awaits** — look for async functions called without await
   that return Promises/coroutines silently discarded.

Apply this reasoning before suggesting fixes. State which step reveals the bug.
```
The difference from a command: this is not "do X" — it is "reason about X this way." Skills encode thinking patterns, not task descriptions.
Step 3 — CLAUDE.md
Create a CLAUDE.md at the project root:

```markdown
# Project: payments-service

## Architecture
- FastAPI backend, PostgreSQL (via SQLAlchemy async), Redis for session cache
- All DB access through repository layer in `/app/repositories/`
- Never write raw SQL — use SQLAlchemy ORM or parameterized queries only
- Auth handled by `app/middleware/auth.py` — all routes under `/api/v2/` require JWT

## Coding Conventions
- Async-first: all DB calls must be `await`ed, no sync SQLAlchemy sessions
- Error handling: use custom exceptions from `app/exceptions.py`, never raise bare `Exception`
- No secrets in code — use `app/config.py` which reads from environment variables

## Non-obvious Constraints
- The `payments` table has a soft-delete column (`deleted_at`), not hard deletes
- `user_id` in the JWT payload is a UUID string, not an integer — do not cast to int
- Redis keys expire in 15 minutes — do not assume session state persists longer

## Test Standards
- Unit tests in `/tests/unit/`, integration tests in `/tests/integration/`
- All new endpoints require at least one happy-path and one error-path integration test
- Use `pytest-asyncio` for async tests
```
This is what Claude reads before every session on this project. It does not need to ask about your DB layer, your auth pattern, or your UUID convention — it already knows.
Step 4 — MCP Call
With a GitHub MCP server connected, the review workflow can pull the actual diff instead of requiring you to paste it:
```
/review-security

Use the GitHub MCP to fetch the diff for PR #847 in payments-service.
Review only the changed files. Focus on the repository layer changes.
```
Claude fetches the diff, applies the security command, posts the finding back to the PR. No copy-paste. No stale context from you summarizing what changed.
Step 5 — Subagents
For larger PRs with many changed files, spawning one subagent per file keeps the reviews focused and prevents context contamination:
```
/review-security

PR #847 has 8 changed files. Spawn one subagent per file.
Each subagent should run /review-security on its assigned file independently.
Aggregate the findings into a single structured report grouped by severity:
Critical, High, Medium, Low.
```
Each subagent starts with a clean context window. The subagent reviewing payments_repository.py has never seen user_routes.py — it carries no accumulated findings, no code snippets from earlier files, no carryover assumptions. Each file gets full attention. Results are aggregated at the end.
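The aggregation step at the end is ordinary data plumbing. A sketch of merging per-file findings into one severity-ordered Markdown report (the finding shape here is illustrative; it depends on what you ask each subagent to emit):

```python
from collections import defaultdict

SEVERITY_ORDER = ["Critical", "High", "Medium", "Low"]

def aggregate(findings: list) -> str:
    """Group per-file findings into a single Markdown report, worst first."""
    by_severity = defaultdict(list)
    for f in findings:
        by_severity[f["severity"]].append(f)

    lines = ["## Security Review Findings"]
    for sev in SEVERITY_ORDER:
        if not by_severity[sev]:
            continue  # skip empty severity buckets entirely
        lines.append(f"### {sev}")
        for f in by_severity[sev]:
            lines.append(f"- `{f['file']}:{f['line']}`: {f['summary']}")

    if len(lines) == 1:
        lines.append("No issues found.")
    return "\n".join(lines)
```

Ordering by severity, rather than by file, is deliberate: the engineer reading the PR comment should hit the Critical findings first.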
Step 6 — Hooks
Add hooks so the checks run automatically, with no manual invocation required. Hooks in Claude Code are configured in your project's settings.json:
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "echo 'Running pre-push security review...' && /review-security"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Write|Edit|MultiEdit",
        "hooks": [
          {
            "type": "command",
            "command": "python lint_check.py $CLAUDE_TOOL_OUTPUT"
          }
        ]
      }
    ]
  }
}
```

The `$CLAUDE_TOOL_OUTPUT` variable name may differ from your Claude Code version's actual interface; check the hook docs for the exact payload shape.
PreToolUse fires before Claude executes a tool — use it for validation gates. PostToolUse fires after — use it for quality checks like linting or documentation generation. The hook runs without a prompt. The engineer sees the findings before they start their manual review.
This is the full system: a CLAUDE.md that carries project truth across sessions, a command that enforces review consistency, a skill that governs reasoning quality, an MCP server that eliminates manual context work, subagents that scale the review across files, and hooks in settings.json that trigger the whole thing automatically — before Claude touches a file and after it writes one.
The Production Failure Story
One team built this wrong. They connected a GitHub MCP server and a PostgreSQL MCP server, gave Claude write access to both — for convenience, they wanted Claude to push schema migration files directly and update a tracking table in their staging database.
The workflow worked fine in staging. In production, they had the same MCP configuration, because maintaining two separate setups seemed like overhead.
A developer ran the code review workflow on a production PR. Claude reviewed the PR correctly, then — following the workflow's tracking-table-update step — wrote to the tracking table. In production. The data write was benign. But it demonstrated that Claude had write access to their production database from a workflow that was supposed to be read-only.
They got lucky. What they built was not lucky — it was a production incident waiting for a worse day.
Two rules this enforces:
Scope MCP permissions to the task, not the agent. A code review workflow needs read access to GitHub. It does not need write access to production databases. Separate MCP configurations for separate workflow contexts. This is not paranoia — it is the same principle as least-privilege IAM roles.
Add explicit approval gates for production writes. Before any Claude action that touches production state, the workflow should pause and ask for confirmation. In Claude Code, this is a native hook pattern — stop execution, show the proposed action, wait for human approval. Build this in from day one, not after the first incident.
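In hook terms, an approval gate is a PreToolUse hook that answers "ask" rather than allowing silently. A sketch assuming the JSON output contract described in the hooks reference (the `permissionDecision` fields; verify the exact schema against your Claude Code version). The tool names here are hypothetical:

```python
import json

# Hypothetical tool names for production-touching MCP tools.
PRODUCTION_TOOLS = {
    "mcp__postgres_prod__execute",
    "mcp__github__merge_pull_request",
}

def gate(tool_name: str) -> dict:
    """Build the hook's JSON response: pause for human approval on production writes."""
    if tool_name in PRODUCTION_TOOLS:
        decision = "ask"  # surface the proposed action and wait for a human
        reason = f"{tool_name} touches production; explicit approval required."
    else:
        decision = "allow"
        reason = "Non-production tool."
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": decision,
            "permissionDecisionReason": reason,
        }
    }

# The real hook script would read the event from stdin and print the response:
#   event = json.load(sys.stdin)
#   print(json.dumps(gate(event["tool_name"])))
```

Note what the gate does not do: it never decides the action is safe on the model's behalf. It only decides whether a human has to look.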
Safe Iteration: Build, Review, Checkpoint, Rewind
The code review system above works. It will also eventually do something unexpected — a subagent produces a false positive, or an MCP call returns stale data and the review is based on a diff that was already overwritten. Agentic systems hit edge cases. The question is whether you can detect them early, contain the blast radius, and roll back cleanly.
```mermaid
flowchart LR
    A["Build"] --> B["Review"]
    B --> C["Checkpoint"]
    C --> D["Rewind"]
    D --> A
    style A fill:#5BA85A,stroke:#3A7039,color:#fff
    style B fill:#4A90D9,stroke:#2C5F8A,color:#fff
    style C fill:#E8A838,stroke:#B07820,color:#fff
    style D fill:#E74C3C,stroke:#A93226,color:#fff
```
Checkpoints are explicit state saves before high-risk operations. In the code review workflow, that means saving state before Claude attempts any multi-file edit suggested by a review finding — not after. Claude Code supports this natively: recent versions checkpoint automatically before edits and expose /rewind to restore a known-good state. Treat every action that is hard to reverse as a checkpoint boundary.
Rewind is what makes checkpoints valuable. Without it, every Claude-initiated change is irreversible in practice, and you will constrain what you let it try. With a known-good checkpoint to fall back to, you can let Claude attempt harder refactors and more aggressive fixes. The safety net is the precondition for ambition.
Review is the human-in-the-loop gate. Not every action needs it — that would eliminate the throughput benefit. But consequential actions, irreversible changes, and cross-system operations need an explicit approval step. In the code review system, this is the final merge decision: Claude reviews, Claude flags, the engineer merges. That division is intentional. Design it in from the start, not as an afterthought when something goes wrong.
What Breaks in Production
Here is what breaks when you treat Claude Code like a chatbot instead of a system:
Context bloat kills reasoning quality. A team on a large Rails monorepo pre-loaded their entire architecture doc, three skill files, and two months of session memory into every context window. Claude started missing instructions buried mid-context. They split into scoped layers — one skill per workflow, memory trimmed to project essentials — and the missed-instruction rate dropped immediately. Smaller context is not a limitation. It is how transformer attention works.
Unbounded MCP access creates blast radius. See the production database write above. The specific mechanism does not matter — what matters is that a code review workflow with write access to production systems is a footgun with a long fuse. It works perfectly until it doesn't. Scope MCP permissions to the task, not the agent.
Missing hooks means relying on Claude's judgment for safety. Claude is good at reasoning about when to stop and ask. It is not consistent. One session it will correctly pause before a destructive file operation. Another session, with slightly different context framing, it will not. Hooks that enforce stopping conditions unconditionally are more reliable than a model that enforces them probabilistically.
No cost visibility means runaway token spend. A team running parallel code review subagents on a 40-file PR hit $180 in a single workflow run because four subagents entered retry loops on malformed tool responses. Without per-session cost gates, you find out at billing time.
Commands and skills not maintained become stale. A /review-security command written when your API used Flask will send Claude looking for Flask route decorators after you migrate to FastAPI. It will not find them. It will report clean. The command degraded silently. Treat commands and skills like code — version them, review them when your stack changes, retire them when they no longer reflect reality.
When NOT to Build This
Not every project warrants this architecture. Building it when you do not need it creates maintenance overhead without the payoff.
Skip it if you are writing one-off scripts. Commands and skills are investments that pay off through repetition. A script you run once has no repeated workflows to systematize. The overhead of maintaining a CLAUDE.md and a command library exceeds the benefit.
Skip it if your workflows do not repeat. The entire reusability engine exists to eliminate repeated manual effort. If every task Claude does for you is genuinely novel, there is nothing to systematize. Use Claude conversationally and move on.
Skip it if you cannot commit to maintaining commands and skills. A stale command is worse than no command — it gives Claude false confidence in outdated assumptions. If your team will not update the command library when the codebase evolves, do not build the command library. The discipline required to maintain it is not optional.
Skip it for greenfield experiments. When you are still figuring out what you are building, the architecture is premature. Get to a working prototype first. Systematize once you know which workflows repeat.
The right time to build this is when you notice yourself typing the same instructions repeatedly, when Claude keeps asking about things it should already know, or when a workflow is important enough to run consistently across your whole team — not on the first day you install Claude Code.
The Practical Starting Point
You do not need to implement all of this on day one. Here is a sequence that works:
1. Start with CLAUDE.md. Document your codebase topology, key conventions, and non-obvious constraints. This single file pays off immediately.
2. Convert your three most-repeated prompts into commands. You already know which ones they are. The ones you type every morning.
3. Add one MCP server that connects to a system Claude currently has to reason about without data — your database, your issue tracker, your monitoring stack.
4. Set up one hook for your most error-prone operation — whether that is a pre-commit lint, a pre-deploy check, or a post-write documentation update.
5. Add checkpoints to any workflow that touches production systems. No exceptions.
Everything else — subagent orchestration, multi-skill composition, advanced workflow routing — layers on top once these foundations are solid.
The developers who get dramatically different results from Claude Code are not using a different model.
Same model. Different system. Completely different results.
If you are building production agentic systems and want to go deeper on memory architecture, tool design, or multi-agent orchestration — the related reading below is the natural next step.
Related Reading
- Building Production-Ready AI Agents with LangGraph — deterministic workflow patterns that apply to any agentic system
- Tool Use in LLM Agents — how to design and scope tools for agents
- Building Agents That Remember — state management and memory architecture
- Can MCP Replace Memory Systems? — where MCP fits and where it does not replace deeper memory layers
References
Anthropic. "Claude Code Overview." Claude Code Docs. https://code.claude.com/docs/en/overview
Anthropic. "Building Agents with the Claude Agent SDK." Anthropic Engineering Blog, September 2025. https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk
Anthropic. "Introducing Advanced Tool Use on the Claude Developer Platform." Anthropic Engineering Blog, November 2025. https://www.anthropic.com/engineering/advanced-tool-use
Anthropic. "Claude Code GitHub Repository." https://github.com/anthropics/claude-code
Taft, Darryl K. and Lawson, Loraine. "Anthropic Launches a Multi-Agent Code Review Tool for Claude Code." The New Stack, March 2025. https://thenewstack.io/anthropic-launches-a-multi-agent-code-review-tool-for-claude-code/
Pandey, Brij Kishore. "Claude Code Best Practices: From Prompts → Agentic Systems." Visual reference diagram, 2026.