Works with: Any language · Any framework · Any architecture style
Stack-specific appendices: Java/Spring/JSP · Python/Django/FastAPI · Node.js/Express · .NET/C# · Ruby on Rails · Go
How to Use This Guide
This guide has two layers:
Figure: How to Use This Guide
Start here. Complete the core guide with your codebase. When a step says "identify entry points" or "run dependency analysis," jump to your stack appendix for the exact files, commands, and patterns to look for.
Navigation: Find Your Starting Point
Not every reader needs to start at page 1. Jump directly to what matters for your situation:
| Your situation | Go directly to |
|---|---|
| "I have 2 hours and need something now" | Time-Box Strategy → Minimum Viable Doc |
| "I'm presenting to a CTO / VP next week" | Audience Adaptation → Output Template |
| "I need to audit an unfamiliar codebase" | Pass 1 → Anti-Pattern Checklist |
| "I have a doc but need to validate it" | Architecture Validation Loop |
| "I need to track architecture changes over time" | Architecture Evolution & Decision Records |
| "My team disagrees on the architecture" | Team & Org Context |
| "I want to score debt and prioritize fixes" | Debt Risk × Effort Matrix |
| "My system is microservices or event-driven" | If Your System Is Microservices or Event-Driven |
| "My system is serverless" | If Your System Is Serverless |
| "I'm using AI to help with analysis" | AI-Assisted Workflow → Prompt Library |
Core Mental Model
Code = Details
Architecture = Compression of intent
Rule: If it cannot be drawn, it is not architecture.
Every architectural claim must be expressible as a diagram - a component box, an arrow, a sequence, or a boundary. If you can only describe it in prose, keep digging until you find the shape.
Think in 3 passes:
| Pass | Focus | Output |
|---|---|---|
| 1 | Structure discovery | Modules, services, entry points |
| 2 | Behavior understanding | Data flow, request traces, state |
| 3 | Abstraction | Clean architecture doc |
⏱️ Time-Box Strategy
Match depth of analysis to time available. Start small - the 2-hour version forces you to find the 20% that explains 80% of behavior.
| Time available | Strategy |
|---|---|
| 2 hours | Entry points only + trace 1 core flow. Produce: layer map + 1 sequence diagram. |
| 1 day | Full Pass 1 + 2 key flows. Produce: component diagram + 2 sequence diagrams + debt list. |
| 3–5 days | All 3 passes + complete output doc. Produce: full architecture document (all 12 sections). |
| 1–2 weeks | Above + runtime validation (profiling, log analysis, developer interviews). |
Pass 1 - Structural Mapping (What exists?)
Objective: Build a complete map of the system before understanding behavior. Entry points and module boundaries are defined differently per stack - see your appendix for specifics.
Step 1 - Identify Entry Points
Every system has entry points: the places where external input arrives. Find them all before going deeper.
| Entry point type | What to look for (generic) | Stack appendix has specifics |
|---|---|---|
| HTTP entry | Route definitions, URL mappings, controller registrations | ✓ |
| Config / DI root | Dependency injection setup, service wiring, IoC container config | ✓ |
| Background jobs | Cron expressions, queue consumers, event listeners, schedulers | ✓ |
| CLI entry | Main functions, command definitions, argument parsers | ✓ |
| Event triggers | Message broker subscriptions, webhook receivers, pub-sub listeners | ✓ |
Entry points define system boundaries. If you haven't found all of them, your component diagram will have invisible holes.
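A first-pass scan can be as simple as grepping the source tree for entry-point idioms. The sketch below assumes a Python codebase; the regex patterns are illustrative placeholders, not an exhaustive set — swap in the idioms from your stack appendix.

```python
import re
from pathlib import Path

# Illustrative patterns only -- replace with the idioms from your stack appendix.
ENTRY_PATTERNS = {
    "http_route": re.compile(r"@(app|router)\.(get|post|put|delete|patch)\b"),
    "cli_entry": re.compile(r"if __name__ == .__main__.|def main\("),
    "scheduler": re.compile(r"@(cron|schedule|periodic_task)\b"),
}

def scan_entry_points(root: str) -> dict[str, list[str]]:
    """Return {entry_type: [file paths]} for every pattern match under root."""
    hits: dict[str, list[str]] = {name: [] for name in ENTRY_PATTERNS}
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for name, pattern in ENTRY_PATTERNS.items():
            if pattern.search(text):
                hits[name].append(str(path))
    return hits
```

A scan like this finds candidates, not truth — every hit still needs to be confirmed by reading the file.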
Step 2 - Extract Module Structure
Every codebase has a layering strategy - even if it isn't enforced. Look for these universal layers, regardless of what they're named in your stack:
Figure: System Layer Diagram
Map your codebase's actual folder/module structure onto these layers. Do not assume they're clean - verify it.
Step 3 - Identify Dependencies
Three things to extract, regardless of stack:
1. Build file / package manifest
Every stack has one. This is your fastest source of: language version, framework choice, and all third-party dependencies.
| Stack | Build file |
|---|---|
| Java | pom.xml, build.gradle |
| Python | requirements.txt, pyproject.toml, Pipfile |
| Node.js | package.json, yarn.lock |
| .NET | *.csproj, packages.config, NuGet.config |
| Ruby | Gemfile, Gemfile.lock |
| Go | go.mod, go.sum |
2. Dependency graph between modules
Use static analysis to find which module imports which. Look for violations: does your data layer import from the HTTP layer? Does a domain model import an HTTP client? See appendix for tools.
3. Circular dependencies
Circular imports between modules are a strong signal of missing abstraction. Every stack has a tool to detect them - see appendix.
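Once your stack's tool has emitted an import graph, cycle detection is a short depth-first search. This sketch assumes the graph has already been extracted into a plain dict of module names:

```python
def find_cycles(imports: dict[str, list[str]]) -> list[list[str]]:
    """Return module cycles found by DFS over an import graph.

    `imports` maps each module name to the modules it imports.
    A cycle is returned as a path that starts and ends on the same module.
    """
    cycles = []
    visiting, done = set(), set()

    def dfs(node, path):
        if node in done:
            return
        if node in visiting:
            # Back-edge: the slice from the first occurrence is the cycle.
            cycles.append(path[path.index(node):])
            return
        visiting.add(node)
        for dep in imports.get(node, []):
            dfs(dep, path + [dep])
        visiting.discard(node)
        done.add(node)

    for module in imports:
        dfs(module, [module])
    return cycles
```

Any non-empty result is a Pass 1 finding: record the cycle and the missing abstraction it implies in your debt list.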
Pass 2 - Behavioral Mapping (How does it work?)
Objective: Stop thinking about files. Think about what happens when a user does something. Trace flows through the full stack.
Step 1 - Flow Prioritization Strategy
Not all flows are equally important. Trace in this order:
| Priority | Flow type | Rationale |
|---|---|---|
| 1 - Revenue-generating | Order placement, payment, account creation | Business stops if these break. Highest scrutiny. |
| 2 - Highest-frequency | Search, list views, dashboard load | Performance bottlenecks live here. |
| 3 - Most complex | Multi-step workflows, approval chains, batch | Hidden state and race conditions live here. |
| 4 - Most failure-prone | External integrations, file uploads, scheduled jobs | Error paths are usually underdocumented. |
| 5 - Authentication | Login, session/token management, logout | Defines the trust model for everything else. |
Stop when you can answer: "What does this system actually do, and where does it break?"
Step 2 - Trace Key Request Flows
For each selected flow, trace the full path from entry to response:
Figure: Request Flow Diagram
For each hop, capture:
- What data enters and exits
- Where state changes (DB write, cache update, session mutation)
- Where the transaction boundary is (if applicable)
- What happens on failure at this hop
Step 3 - Map State, Async, and Concurrency
These three are the most underdocumented aspects of any system:
State management
Where does the system store state between requests? Options: session, JWT, database, cache, in-memory. List every state store and what it holds.
Async boundaries
Where does execution leave the request thread? Message queues, async/await, background workers, webhooks. Each async boundary is an invisible failure point if not documented.
Concurrency
Where can two requests race? Shared in-memory state, non-atomic DB operations, cache-then-write patterns. Document these explicitly - they're where production incidents come from.
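A minimal illustration of the read-modify-write hazard and its single-process fix. The `Counter` here is a stand-in for any shared in-memory state; note that a lock only protects one process — races across instances need an atomic database or cache operation instead.

```python
import threading

class Counter:
    """Shared in-memory state: a classic read-modify-write hazard."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self):
        # Two requests can both read the same value, then both write
        # value + 1 -- one increment is silently lost.
        current = self.value
        self.value = current + 1

    def increment_safe(self):
        # The lock makes the read-modify-write atomic within this process.
        with self._lock:
            self.value += 1

def run(fn, threads=8, per_thread=1000):
    """Hammer `fn` from several threads and wait for completion."""
    workers = [
        threading.Thread(target=lambda: [fn() for _ in range(per_thread)])
        for _ in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Documenting where such locks (or their distributed equivalents) live is exactly the concurrency inventory this step asks for.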
Pass 3 - Abstraction (Think Like an Architect)
Objective: Compress what you've learned into diagrams and principles. Stop thinking like a developer.
Step 1 - Apply Architecture Compression Rules
This is the step most engineers skip. Without it, two engineers using this guide produce completely different outputs.
Rule 1: Collapse classes/functions into capabilities
Figure: Classes to capabilities
Rule 2: Collapse endpoints into use cases
Figure: Endpoints to Use cases
Rule 3: Collapse tables/collections into domain concepts
Figure: Tables to domain concepts
Rule 4: Collapse integrations into roles
Figure: Integration to roles
Rule 5: Express layer interactions in one sentence per boundary
Figure: Express layer interactions in one sentence per boundary
"The handler layer delegates all business decisions to the service layer. The service layer owns consistency boundaries and orchestrates repositories. Repositories are the only components that touch the data store."

If you cannot write this cleanly, the layers are not clean. Document the violation.
Step 2 - Identify the Architecture Style
| Style | Universal signals | Implication |
|---|---|---|
| Layered monolith | Single deployable, shared DB, clear package layers | Simple ops, hard to scale parts independently |
| Modular monolith | Multiple internal modules with defined boundaries, single deploy | Better separation, still one deployment unit |
| Microservices | Multiple deployables, each with own DB, API-to-API communication | Independent scaling, complex distributed failure modes |
| Event-driven | Message broker central, components subscribe to events | Loose coupling, hard to trace flows end-to-end |
| Serverless | Functions as entry points, no persistent process | Low ops overhead, cold start and state limitations |
| Transitional | Mix of the above - some parts modernised, others legacy | Document which style governs which part explicitly |
If your system is Microservices or Event-Driven: The 3-pass methodology still applies, but the artifacts and tools differ. See the callout below before continuing.
If Your System Is Microservices or Event-Driven
The component diagram, sequence diagram, and anti-pattern checklist all remain valid - but three additional documentation concerns apply that don't exist in monolithic systems.
1. Service Map (replaces the component diagram)
Instead of one component diagram, produce a service map: one box per deployable service, arrows showing synchronous calls (HTTP/gRPC) and asynchronous events (message broker topics). Each box must show:
Figure: Order Service Card
Key additions vs monolith component diagram:
- Each service owns its own DB - document which DB per service
- Every event topic must be named and documented (publisher + all subscribers)
- Synchronous vs asynchronous calls must be visually distinct (solid vs dashed arrows)
2. Inter-Service Contract Documentation
In a monolith, interfaces are enforced by the compiler. In microservices, they are enforced by nothing - they drift silently. Document every service boundary:
| Contract type | Tool | What to capture |
|---|---|---|
| REST APIs | OpenAPI / Swagger | Endpoints, request/response schema, error codes |
| Async events | AsyncAPI | Topic names, message schema, producer, consumers |
| gRPC | .proto files | Service definition, message types, versioning |
Minimum requirement: every service must have an OpenAPI or AsyncAPI spec checked into its repository. If it doesn't exist, creating it is the first Pass 1 output for that service.
3. Distributed Tracing Integration
Sequence diagrams that cross service boundaries cannot be verified by reading code alone. They must be validated using distributed tracing.
# If OpenTelemetry is instrumented, query traces for the flow you documented.
# Jaeger UI: http://localhost:16686
# Filter by: operation name matching your entry point
# Verify: does the trace span tree match your sequence diagram?

# If no tracing is instrumented, this is the highest-priority debt item
# for any microservices system. Add it before documenting other flows.

Distributed tracing is Gate 3 (runtime validation) for microservices. Without it, sequence diagrams for cross-service flows are educated guesses.
Gate 3 (runtime validation) is explained in the section "Architecture Validation Loop"
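One way to mechanise the "does the trace match the diagram?" check is to diff your documented hops against span names exported from a trace. This is a minimal sketch; the hop/span naming convention is an assumption you would align with your tracing setup.

```python
def diff_trace(documented_hops: list[str], trace_spans: list[str]) -> dict:
    """Compare a documented sequence diagram against span names pulled
    from a distributed trace (e.g. exported from the Jaeger UI).

    Order is checked only on the hops both sides share; extra and
    missing hops are reported separately.
    """
    doc, seen = set(documented_hops), set(trace_spans)
    shared_in_doc = [h for h in documented_hops if h in seen]
    shared_in_trace = [s for s in trace_spans if s in doc]
    return {
        "missing_from_trace": [h for h in documented_hops if h not in seen],
        "undocumented": [s for s in trace_spans if s not in doc],
        "order_matches": shared_in_doc == shared_in_trace,
    }
```

Anything in `undocumented` is a candidate missing component or flow; anything in `missing_from_trace` is an unverified claim in your diagram.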
Additional anti-patterns specific to microservices/event-driven:
DISTRIBUTED SYSTEMS
[ ] Synchronous chain of 3+ services (distributed monolith - single point of failure)
[ ] No circuit breaker between services (cascade failure risk)
[ ] Shared database between services (defeats service isolation)
[ ] No distributed tracing instrumented (flows are unverifiable)
[ ] Event topics with no schema registry (consumer drift)
[ ] No dead-letter queue for failed event processing
[ ] Services communicating without contract/spec (implicit coupling)
[ ] No idempotency on event consumers (duplicate processing risk)

Add these to your anti-pattern checklist score if your system is microservices or event-driven.
If Your System Is Serverless
Trigger: Functions as entry points, no persistent process (AWS Lambda, Azure Functions, Google Cloud Functions, Cloudflare Workers, Vercel Edge Functions).
The 3-pass methodology applies, but serverless introduces four documentation concerns unique to this style.
1. Function Inventory (replaces the component diagram)
In serverless, the unit of deployment is a function, not a service. Your component diagram becomes a function map:
Figure: ProcessOrder Lambda Card
Key fields every function entry must include:
- Trigger - what invokes it (API Gateway, EventBridge rule, SQS, S3 event, cron)
- Runtime + memory + timeout - these are architectural decisions, not config details
- Cold start p95 - the hidden latency that breaks SLAs at low traffic
- State it reads/writes - functions are stateless; state lives elsewhere, document where
2. State Inventory (the critical artifact)
Serverless functions are stateless by design. All state lives outside the function. Document every state store explicitly - this is the architectural skeleton of a serverless system:
| State type | Store used | What it holds | Access pattern |
|---|---|---|---|
| Persistent data | DynamoDB / RDS / S3 | Orders, customers, products | Read-heavy, keyed by ID |
| Session / auth | JWT (stateless) or ElastiCache | User claims | Per-request verification |
| Distributed cache | ElastiCache / Redis | Product catalog, config | TTL-based, warm reads |
| Async queue | SQS / EventBridge | Events between functions | At-least-once delivery |
| File / blob | S3 | Uploads, reports, exports | Event-triggered processing |
If a function reads or writes to something not on this list, the state inventory is incomplete.
3. Cold Start Documentation
Cold start latency is the most common serverless architectural gap - it's invisible during development and catastrophic in low-traffic production, where infrequent invocations keep functions cold. Document it explicitly:
Figure: Cold start profile
Functions on synchronous user-facing paths (API Gateway, ALB) must have cold start latency explicitly accepted or mitigated. Functions on async paths (SQS, EventBridge) can tolerate cold starts - document which category each function falls into.
4. Execution Boundary Tracing
Serverless flows are harder to trace than monolith flows because execution is distributed across functions with no shared call stack. Every flow must be documented as a chain of trigger → function → output:
Place Order flow (serverless):
1. User → API Gateway POST /orders
2. → ProcessOrder (Lambda, sync) → validates input → writes ORDER record (DynamoDB) → emits order.placed (EventBridge) → returns 202 Accepted [user response ends here]
3. order.placed event → EventBridge rule → ReserveInventory (Lambda, async) → reads PRODUCT record (DynamoDB) → updates STOCK_LEVEL (DynamoDB, conditional write) → emits inventory.reserved OR inventory.insufficient
4a. inventory.reserved → ChargePayment (Lambda, async) → calls PaymentGateway (external, 30s timeout) → writes PAYMENT record (DynamoDB) → emits payment.completed OR payment.failed
4b. inventory.insufficient → NotifyUser (Lambda, async) → sends email via SES → writes ORDER status = CANCELLED (DynamoDB)

Every async hop is a potential failure point. Document the dead-letter queue and retry behavior at each async step.
Additional anti-patterns specific to serverless:
SERVERLESS
[ ] Synchronous function chain > 3 hops (compounds cold start latency)
[ ] Function timeout set to maximum (masks slow dependencies)
[ ] Shared mutable state between function invocations (race condition)
[ ] No cold start latency documented for user-facing functions
[ ] Missing dead-letter queue on async triggers (silent failure)
[ ] Database connection pool not managed per invocation (connection exhaustion)
[ ] Secrets hardcoded in environment variables (should use Secrets Manager)
[ ] No distributed tracing (X-Ray / OpenTelemetry) - flows unverifiable
[ ] Function doing > 1 business responsibility (violates single-purpose principle)

Add these to your anti-pattern checklist score if your system is serverless.
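At-least-once delivery makes duplicate events routine, so the idempotency item above is worth a sketch. The class below is hypothetical and keeps processed IDs in memory; production code would back the set with a DynamoDB conditional write or a Redis SETNX with a TTL.

```python
class IdempotentConsumer:
    """Wraps an event handler with a processed-ID check so at-least-once
    delivery (SQS, EventBridge) cannot double-apply an event."""

    def __init__(self, handler):
        self._handler = handler
        self._processed: set[str] = set()  # sketch: use a durable store in production

    def consume(self, event: dict) -> bool:
        """Return True if the event was processed, False if skipped as a duplicate."""
        event_id = event["id"]
        if event_id in self._processed:
            return False  # duplicate delivery -- skip side effects
        self._handler(event)
        self._processed.add(event_id)  # record only after the handler succeeds
        return True
```

Recording the ID only after success means a crashed handler retries cleanly; recording it before would turn failures into silently dropped events.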
Step 3 - Identify Cross-Language Design Patterns
These patterns appear in every stack. Learn to recognise them regardless of language:
| Pattern | What it looks like (language-agnostic) |
|---|---|
| Repository | A dedicated class/module for all data access - the only place that touches the DB |
| Service/Use Case | Orchestrates multiple repositories and domain objects for one business operation |
| Adapter/Client | Wraps an external API, translates its interface to your domain's language |
| Middleware/Filter | Intercepts every request - used for auth, logging, rate limiting |
| Factory | Centralises object creation, hides construction complexity |
| Observer/Event bus | Decouples components by emitting events rather than calling directly |
| CQRS | Separates read and write models - two different paths through the stack |
| Saga | Manages multi-step distributed transactions through compensating actions |
Step 4 - Document Actual vs Intended Architecture
The most valuable output is the gap analysis: what the architecture was designed to be vs what it has become.
| Area | Universal check |
|---|---|
| Layer violations | Does any layer import from a layer it shouldn't depend on? |
| Fat handlers | Do HTTP handlers contain business logic beyond routing and input parsing? |
| Logic in views | Do templates/views make data decisions rather than just rendering? |
| God modules | Does one module own too many unrelated responsibilities? |
| Bypassed consistency | Are writes happening outside the documented consistency boundary? |
| Undocumented async | Are there background operations not represented in the architecture? |
Anti-Pattern Detection Framework
Run this language-agnostic checklist against any codebase. Score your findings.
LAYER VIOLATIONS
[ ] Handler/controller directly calls data access layer (bypasses service)
[ ] Domain/model layer imports from HTTP or UI layer
[ ] Data access layer contains business logic beyond querying
[ ] Service layer imports from handler/controller layer

HANDLER / VIEW LAYER
[ ] Handler method exceeds ~50 lines of logic
[ ] Business rules computed inside a template or view
[ ] State mutated directly in the handler without going through a service
[ ] Auth checks scattered across handlers (not centralised in middleware)

CONSISTENCY / STATE
[ ] Multi-step write operation with no rollback/compensation if step N fails
[ ] In-memory state shared across requests without synchronisation
[ ] Cache written before DB (or vice versa) without atomic update
[ ] Session/token stores domain objects that should live in the DB

SERVICE / ORCHESTRATION LAYER
[ ] God service: one module/class handling > 3 distinct business domains
[ ] Business logic duplicated across multiple services
[ ] Service calls another service directly, creating hidden coupling
[ ] No clear owner for a cross-domain operation (logic scattered)

DATA ACCESS
[ ] N+1 queries (queries inside loops)
[ ] Raw query strings concatenated with user input (injection risk)
[ ] No query abstraction - SQL/query language used directly across many layers
[ ] Missing index on frequently filtered or joined column

INTEGRATIONS
[ ] External API called directly from service with no adapter/client wrapper
[ ] No timeout configured on outbound HTTP calls
[ ] No retry or circuit breaker on failure-prone external calls
[ ] External API error codes leaked as-is into internal domain errors

Scoring Guide
| Score | Health |
|---|---|
| 0–3 violations | Healthy - normal technical debt |
| 4–8 violations | Moderate debt - prioritize top 3 for refactoring |
| 9–15 violations | High debt - architecture needs active remediation |
| 16+ violations | Critical - modernization planning required before feature work |
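The scoring guide maps directly to a threshold function, which is handy if you automate the checklist:

```python
def debt_health(violations: int) -> str:
    """Map an anti-pattern violation count to the scoring guide's tiers."""
    if violations <= 3:
        return "Healthy - normal technical debt"
    if violations <= 8:
        return "Moderate debt - prioritize top 3 for refactoring"
    if violations <= 15:
        return "High debt - architecture needs active remediation"
    return "Critical - modernization planning required before feature work"
```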
Common Failure Modes
Use these as a self-correction checklist before sharing any architecture output.
❌ Failure 1 - Too Granular (Function/Class-Level Diagram)
Signal: More than 12 components in your diagram.
Fix: Apply Compression Rule 1. Merge anything that serves the same capability. A good architecture diagram has 5–12 components.
❌ 42 components (basically a class diagram)
✅ 7 capabilities: User Mgmt · Order Mgmt · Payment · Inventory · Notification · Reporting · Admin

❌ Failure 2 - Too Abstract (No Actionable Structure)
Signal: Every component description uses only business words with no pointer to real code.
Fix: Every component must map to at least one real module, class, or file in the codebase.
❌ "The system manages the customer journey end-to-end"
✅ OrderService (src/services/order.py) → handles place, modify, cancel operations
   depends on: InventoryRepository, PaymentClient, NotificationService

❌ Failure 3 - Wrong Boundaries (Domain Leakage)
Signal: A component's responsibility statement uses "and" more than once across unrelated domains.
Fix: Split along noun/domain boundaries. Payment belongs in a Payment component even if OrderService currently calls it.
❌ OrderManagement: orders AND payments AND invoices AND customer lookup
✅ OrderManagement: order lifecycle only
   PaymentProcessing: charge, refund, gateway
   CustomerManagement: lookup, profile
   Invoicing: document generation

❌ Failure 4 - Missing Flows (Static Architecture Only)
Signal: The doc has component diagrams but no sequence diagrams.
Fix: Add at least one end-to-end sequence diagram for the primary revenue-generating flow.
❌ Doc contains: component diagram + layer table + data model
   Missing: any sequence diagram showing what actually happens at runtime
✅ Add: Key Flow - Place Order
   User → POST /orders → AuthMiddleware → OrderHandler → OrderService
   (consistency boundary begins)
   → InventoryRepo.reserve()
   → PaymentClient.charge() [external, outside boundary]
   → OrderRepo.save()
   → (consistency boundary commits)
   → response 201

❌ Failure 5 - Aspirational Architecture (Documents Intent, Not Reality)
Signal: The doc was written from design docs or README, not from reading the actual code.
Fix: Run the anti-pattern checklist. Mark anything unverified as [unverified - needs validation]. Get a developer who works in the codebase daily to review it.
❌ "Clean layered architecture, no business logic in handlers" (written from the design doc, not the code)
✅ "Intended: layered. Actual: 14 handler methods contain business logic. See debt list section 11 for full inventory."

Quick Self-Correction Checklist
[ ] Component count is 5–12
[ ] Every component maps to a named capability (not a function/class name)
[ ] No component description uses "and" across unrelated domains
[ ] At least 1 end-to-end sequence diagram exists
[ ] Every external dependency has a role name (not just a product name)
[ ] Document validated against actual code (not just design docs or README)
[ ] A developer has confirmed the component boundaries
[ ] Architectural debt is documented, not hidden

Golden Output Example
This is the calibration target. Your output should be this compressed, this diagram-driven, and this decision-ready. The example uses an e-commerce order system - adapt the structure to your domain.
System: E-Commerce Order Processing Platform
Language/framework: (fill in your stack)
Architecture style: Layered Monolith
Component Overview
Figure: Component Overview Diagram
Key Flow: Place Order
Figure: Key Flow: Place Order
Domain Concepts
| Concept | Storage | Aggregate root |
|---|---|---|
| Order | orders, order_lines, order_status | Order |
| Customer | customers, addresses | Customer |
| Inventory | products, stock_levels, reservations | Product |
| Payment | payment_records, payment_attempts | Payment |
Architecture Health
| Check | Status |
|---|---|
| Layer violations | 2 found - document in debt list |
| Logic in handlers | 6 handlers contain business logic |
| God modules | None - services well-scoped |
| Async boundaries documented | Notification only |
| Anti-pattern score | 5/35 - Moderate debt |
Modernization Priorities
- Move business logic out of 6 handlers → service layer (low risk)
- Extract PaymentProcessing into standalone module (enables independent deploy)
- Add circuit breaker to PaymentGateway client (reduces incident blast radius)
Architecture Validation Loop
Architecture is only trustworthy when validated. The loop has 5 steps - do not skip Step 5.
Figure: Architecture Validation Loop
Gate 1 - Static Validation
Goal: Confirm your component diagram reflects what the code actually imports, not what you assumed.
- Run your stack's dependency analysis tool (see appendix)
- Check: does every arrow in your diagram correspond to a real import?
- Check: are there imports in the code that have no arrow in your diagram?
- Run the anti-pattern checklist - score it
Failure signal: Any module importing from a layer it shouldn't depend on.
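For a Python codebase, the "every arrow corresponds to a real import" check can be scripted with the stdlib `ast` module. This sketch assumes the caller restricts the comparison to internal module names (stdlib and third-party imports should be filtered out first); other stacks would use the dependency tool from their appendix instead.

```python
import ast

def module_imports(source: str) -> set[str]:
    """Top-level package names imported by a Python source file."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found

def layer_violations(imports_by_module, allowed):
    """Arrows present in the code but absent from the diagram.

    imports_by_module: {"handlers": {"services", ...}, ...} from the code
    allowed:           {"handlers": {"services"}, ...} -- your diagram's arrows
    """
    return {
        mod: sorted(deps - allowed.get(mod, set()))
        for mod, deps in imports_by_module.items()
        if deps - allowed.get(mod, set())
    }
```

A non-empty result is a Gate 1 failure: either the diagram is missing an arrow or the code contains a layer violation to record.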
Gate 2 - Developer Validation
Goal: Confirm component names and boundaries match how the team thinks about the system.
Who: The engineer with the most commits in the past 12 months - not the original designer.
Questions to ask:
1. "Does this component diagram match how you'd explain the system to a new hire?"
2. "Are there components I've merged that you'd keep separate?"
3. "Is there behavior that doesn't appear in these diagrams?"
4. "Does this consistency boundary diagram match what actually commits and rolls back?"
5. "Which parts would you argue with?"

Failure signal: The developer disputes more than two component boundaries - your compression decisions were wrong. Redo them with the developer's input.
Gate 3 - Runtime Validation
Goal: Confirm your documented flows match actual production behavior.
- Extract most-called endpoints from access logs
- Find most-executed code paths from APM or tracing
- Compare slow operations against documented flows
- Check: do the most-called paths match what you documented?
- Check: are there high-frequency paths not in your diagrams?
| Metric | What it validates |
|---|---|
| Endpoint call frequency | That your priority flow ranking is correct |
| Slowest operations | That complex flows are correctly identified |
| DB query count per request | That N+1 patterns are in your debt list |
| Async queue depth | That your async boundaries can handle load |
| Error rate per component | That failure-prone flows are correctly flagged |
Failure signal: High-frequency production path not in your sequence diagrams → missing component or flow.
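Extracting call frequency from access logs is often a one-screen script. The sketch below assumes combined-log-format lines and collapses numeric path segments into `:id` so `/orders/42` and `/orders/99` count as one endpoint; adjust the normalisation to your URL scheme.

```python
import re
from collections import Counter

# Matches the request portion of a combined-format access log line.
LOG_LINE = re.compile(r'"(?P<method>GET|POST|PUT|PATCH|DELETE) (?P<path>\S+)')

def rank_endpoints(log_lines, top=10):
    """Rank endpoints by call frequency from access log lines."""
    counts = Counter()
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m:
            # Collapse numeric IDs so each endpoint counts as one bucket.
            path = re.sub(r"/\d+", "/:id", m.group("path"))
            counts[f'{m.group("method")} {path}'] += 1
    return counts.most_common(top)
```

Compare the top of this ranking against your priority flow table: any high-frequency endpoint without a sequence diagram is a Gate 3 gap.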
Gate 4 - Failure Validation
Goal: Confirm your architecture doc can explain what breaks and what survives under failure.
Walk each scenario against your doc. The doc should be able to answer "what breaks?" and "what's the fallback?"
Scenario 1: Primary database unavailable
→ Which components fail immediately?
→ Which can degrade gracefully?
→ Is there a circuit breaker? Does the doc show it?

Scenario 2: External payment/API service returns 503
→ Does the sequence diagram show the error path?
→ Does the consistency boundary roll back?
→ What is the user-visible impact?

Scenario 3: Message broker / queue unavailable
→ Which async flows stop?
→ Are async dependencies documented in the component diagram?
→ What is the fallback for each?

Scenario 4: Deployment / process restart mid-request
→ What state is lost?
→ Is the state inventory in the doc accurate?
→ What operations are not idempotent?

Failure signal: A known production incident from the past 12 months cannot be explained by the doc → the doc is wrong. Find the incident, trace what actually happened, and update the doc.
Step 5 - Refinement (Closing the Loop)
Validation without refinement is just criticism. After each gate, findings must flow back into the document.
| Gate | If gaps found, update these |
|---|---|
| Gate 1 - Static | Component diagram, layer violation inventory |
| Gate 2 - Developer | Component names, boundaries, anything disputed |
| Gate 3 - Runtime | Sequence diagrams, priority flow ranking, NFR section |
| Gate 4 - Failure | Error paths in sequence diagrams, fallback behavior, deployment notes |
Refinement rules:
✅ Every gap found → update the diagram OR add to the debt list (never silently ignore)
✅ Every disputed boundary → redrawn with developer input
✅ Every unverified claim → marked [unverified] until confirmed
✅ Every production incident the doc can explain → referenced as validation evidence
❌ Never mark validation complete if Gate 2 found disputes
❌ Never leave a known gap undocumented

Validation Cadence
| When | What |
|---|---|
| Before publishing | Gates 1 + 2 - mandatory |
| Within 2 weeks of publishing | Gate 3 |
| Quarterly | Gate 4 + developer re-review + Step 5 |
| After major releases | All 4 gates + Step 5 |
An architecture doc not validated in 6 months should be treated as a hypothesis, not a fact.
AI-Assisted Workflow
Universal Chunking Strategy
- Start with the build/package file - establishes framework context for all subsequent prompts
- Chunk by module/package, not by file - intra-module relationships are where design decisions live
- Send handlers with their backing services - a handler without its service loses half the context
- Send integration clients with the services that call them - timeout and error handling decisions live in this pairing
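The "chunk by module, not by file" rule can be automated: group source files by their top-level package directory so each prompt receives one module's files together. A minimal sketch, assuming a conventional one-directory-per-module layout:

```python
from collections import defaultdict
from pathlib import Path

def chunk_by_module(root: str, exts=(".py",)) -> dict[str, list[str]]:
    """Group source files by top-level package/module directory,
    so each AI prompt receives one module's files together."""
    chunks = defaultdict(list)
    base = Path(root)
    for path in sorted(base.rglob("*")):
        if path.suffix in exts:
            rel = path.relative_to(base)
            module = rel.parts[0] if len(rel.parts) > 1 else "(root)"
            chunks[module].append(str(rel))
    return dict(chunks)
```

Pair each handler chunk with its backing service chunk manually if they live in different directories — the grouping above only automates the common case.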
What NOT to Do
Figure: AI Workflow Do Not Rules
Universal Prompts by Artifact Type
For any service/use-case module
Analyze this module. Describe:
1. Its business responsibility in one sentence (capability, not class name)
2. The operations it exposes
3. Its dependencies (data access, external clients, other services)
4. Where it owns a consistency boundary (transaction, saga, etc.)
5. Any notable patterns or anti-patterns
Apply compression: name the capability, not the class.

For any handler/controller/route
Analyze this handler/route file. Describe:
1. The URL(s) and HTTP methods it handles
2. What input it reads (path params, query, body, headers, session/token)
3. What service/use-case it delegates to
4. What it returns or renders
5. Flag any business logic that should not be in the handler layer
Map it as a request flow: entry → middleware → handler → service → response.

For any data access layer (ORM model, repository, DAO)
Analyze this data access module. Describe:
1. The table(s) or collection(s) it maps to
2. The queries or operations it exposes
3. Any relationships and their loading strategy (eager/lazy)
4. Any identified N+1 query risks
5. Whether it contains business logic it shouldn't

For any external integration client/adapter
Analyze this integration client. Describe:
1. What external service it wraps (name it by role, not product)
2. The operations it exposes to the internal system
3. Timeout, retry, and circuit breaker configuration
4. How external errors are translated to internal domain errors
5. Whether the external API contract is hidden behind an interface

For any build / package manifest
Analyze this build file. Extract:
1. Language version and target runtime/platform
2. All major framework and library dependencies with versions
3. Any build plugins or tools that affect code generation or behavior
4. Module/package structure if multi-module
5. Any outdated, deprecated, or conflicting dependencies

For generating an ADR from a known decision
I need to write an Architecture Decision Record for the following decision:

Decision context: [describe the situation that forced the decision]
What was decided: [state the decision in one sentence]
When it was made: [date or approximate period]
Constraints at the time: [technical, resource, time, or org constraints]
Alternatives that were considered: [list them]

Generate a complete ADR using this structure:
1. Title (ADR-NNN: [imperative verb + subject])
2. Status: Accepted
3. Context (2–4 sentences - what was true at the time that forced this decision)
4. Decision (1–2 sentences, direct - no hedging)
5. Alternatives considered (table: option | why rejected)
6. Consequences (positive / negative / risks)
7. Review date (suggest a specific date based on the decision type)

Write it as if you are the engineer who made the decision, in past tense.
Be specific - avoid generic phrases like "to improve performance."

For translating a debt list to CTO / executive audience
Below is a technical architectural debt list from an engineering analysis.
Rewrite it for a CTO / VP Engineering audience using these rules:
1. Replace every technical term with its business impact (e.g. "N+1 query" → "database asks the same question N times under load")
2. Frame each item as: [what it is] → [what happens if we don't fix it] → [what fixing it enables]
3. Group items into three tiers:
   - Immediate risk (production incidents or security exposure likely)
   - Delivery friction (slowing feature development)
   - Strategic constraints (limiting future scaling or modernization)
4. For each item, provide: effort estimate in weeks, not story points
5. End with a recommended priority order and the business case for the top 3

Technical debt list to translate:
[paste your anti-pattern checklist results here]

For scoring anti-patterns on the risk × effort matrix
```
Below is a list of anti-patterns found in a codebase analysis.
Score each one on two dimensions and assign a decision:

Risk score (1–5):
  5 = Data loss, security breach, or production outage likely
  4 = Significant user impact or revenue loss possible
  3 = Degraded performance or increased incident frequency
  2 = Developer friction - slows delivery, doesn't break anything
  1 = Cosmetic / style - no functional impact

Effort score (1–5):
  5 = Architectural overhaul - months
  4 = Multi-sprint refactor - weeks, multiple teams
  3 = Single-sprint fix - 1–2 weeks, one team
  2 = Days of work - targeted, low risk
  1 = Hours - config change or single file

Decision rules:
  High risk + Low effort → Fix Now
  High risk + Med effort → Fix Next Sprint
  High risk + High effort → Plan & Track
  Med risk + Low effort → Fix When Passing
  Med risk + Med effort → Backlog With Date
  Low/Med + High effort → Accept & Document
  Low risk + Any effort → Skip

For each item produce:
  Item | Risk score (with one-sentence justification) | Effort score (with one-sentence justification) | Decision

Anti-patterns to score:
[paste your checklist findings here]
```

For validating a sequence diagram against logs
```
I have a documented sequence diagram for [flow name]:

[paste your sequence diagram here]

And the following log excerpt from a production trace of the same flow:

[paste log excerpt here]

Compare them. For each hop in the sequence diagram:

1. Does it appear in the logs?
2. Are there hops in the logs that are NOT in the diagram?
3. Does the ordering match?
4. Do the error paths in the diagram match what the logs show?

Produce:
- A verified steps list (diagram matches log)
- A discrepancy list (diagram says X, log shows Y)
- Missing steps list (in logs but not in diagram)
- Recommended diagram updates
```

Synthesis Prompt - Deriving Architecture
```
Given these module summaries, apply the following compression rules:

- Collapse modules/classes into capabilities (not names)
- Collapse endpoints into use cases
- Collapse tables/collections into domain concepts
- Collapse integrations into roles

Then produce:

1. Architecture style (one of: layered monolith / modular monolith / microservices / event-driven / serverless / transitional)
2. Component diagram (ASCII or Mermaid) with 5–12 components max
3. Primary request flows as numbered step sequences
4. Key design patterns identified
5. Top 5 architectural debt items ranked by risk × effort

Format as an architecture overview document with explicit diagrams.
```

Full Pipeline
| Stage | Input | Tool | Output |
|---|---|---|---|
| 1. Static analysis | Source / compiled output | Stack-specific (see appendix) | Dependency graph |
| 2. Build file parsing | Package manifest | LLM | Framework + dependency inventory |
| 3. Module summarization | Source files (chunked by module) | LLM | Module summaries |
| 4. Compression | Module summaries | LLM + compression rules | Capability map (5–12 components) |
| 5. Pattern detection | Summaries + capability map | LLM + anti-pattern checklist | Pattern catalog, debt score |
| 6. Diagram generation | Capability map + dependency graph | LLM → Mermaid | Component, sequence, ER diagrams |
| 7. Validation | Diagrams + logs + developer | Gates 1–4 | Verified or flagged claims |
| 8. Refinement | Validation findings | Human | Updated diagrams + debt list |
| 9. Doc assembly | All above | LLM | Final architecture doc |
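When no gate fails, the nine stages compose as a straight pipeline. A minimal sketch of that shape — the stage functions below are toy stand-ins for the real tools and LLM calls, and the stage outputs are invented for illustration:

```python
def run_pipeline(repo_path, stages):
    """Run each stage in order, accumulating outputs in one shared context.

    `stages` is an ordered list of (output_name, stage_fn); each stage
    sees everything produced before it, mirroring the table above.
    """
    ctx = {"repo_path": repo_path}
    for name, stage_fn in stages:
        ctx[name] = stage_fn(ctx)
    return ctx

# Toy stand-ins for stages 1-3; real stages would call jdeps, an LLM, etc.
stages = [
    ("dependency_graph", lambda ctx: {"orders": ["payments"], "payments": []}),
    ("framework_inventory", lambda ctx: ["spring-mvc", "hibernate"]),
    ("module_summaries",
     lambda ctx: {m: f"summary of {m}" for m in ctx["dependency_graph"]}),
]

result = run_pipeline("/path/to/repo", stages)
# result["module_summaries"] has one entry per module in the dependency graph
```

Note that a strictly linear driver like this cannot express the stage 7–8 loop back to refinement; that limitation is exactly what the next section addresses.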
Building an Agentic Pipeline for This Methodology
If you want to automate this methodology — turning a Git repository into an architecture document without manual passes — the guide maps cleanly onto a multi-agent pipeline. This section describes the minimal viable architecture for that system.
Why a pipeline, not a simple LLM call
Three things in this methodology require more than a single LLM invocation:
Cycles. The refinement step (after Gate 4) may loop back to compression if major gaps are found. A chain has no way to do this — it falls over at the loop boundary. You need a graph-based orchestrator.
Human-in-the-loop. Gate 2 (developer review) cannot be automated. Component relationship errors appear in ~30% of LLM-generated diagrams. Without a human checkpoint, you produce a confident but wrong document. The pipeline must be able to pause, surface the draft to a reviewer, collect structured feedback, and resume.
State across 12 nodes. Each node needs access to everything produced before it — module summaries, flow traces, the component diagram, gate violations, human feedback. This state must persist across the human interrupt (which may take hours) and be recoverable if any node fails.
Use LangGraph. It gives you cycles, interrupt nodes, checkpointing, and state persistence out of the box. A simple chain cannot satisfy all three requirements above; teams that try to build this on a chain typically hit the cycle problem on the first real codebase.
The 12-node pipeline
Node 1: ingest_repo → Clone repo, parse build file, extract dependency graph
Node 2: chunk_by_module → Group files by module boundary (never by individual file)
Node 3: pass1_structure → LLM: summarise each module (G.1–G.6 prompts)
Node 4: pass2_behavior → LLM: trace top 3 flows, map state and async boundaries
Node 5: pass3_compress → LLM: apply compression rules, generate component diagram
Node 6: assemble_draft → Fill 12-section template, mark all claims [unverified]
Node 7: gate1_static → Run jdeps/depcruise, diff import graph vs diagram
Node 8: gate2_human → interrupt() — pause for developer review and feedback
Node 9: gate3_runtime → Fetch logs/APM, compare top flows against sequence diagrams
Node 10: gate4_failure → LLM: test doc against 4 standard failure scenarios
Node 11: refinement → Update diagrams + debt list; loop back if major gaps found
Node 12: publish_doc → Write final Markdown + Mermaid, generate ADR stubs

Node 8 is the most important node in the pipeline. Do not remove it to make the system "fully automated." The 30% error rate on component relationships is not a prompting problem — it is a structural problem with LLM-generated diagrams that only developer review resolves.
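The cycle and the human pause are the two things a plain chain cannot express. A minimal stand-in in plain Python — deliberately not the LangGraph API — showing the Node 11 → Node 5 loop-back and the Node 8 interrupt; the node behaviors are toys:

```python
def run_graph(state, nodes, edges, interrupt_at, get_feedback, max_loops=3):
    """Walk the node graph from compression to publish.

    At `interrupt_at` the runner collects human feedback before running
    the node (the pause). After `refinement`, a conditional edge loops
    back to compression while major gaps remain (the cycle).
    """
    current, loops = "pass3_compress", 0
    while current != "publish_doc":
        if current == interrupt_at:
            state["human_feedback"] = get_feedback(state)
        state = nodes[current](state)
        nxt = edges[current]
        if current == "refinement" and state.get("major_gaps") and loops < max_loops:
            nxt, loops = "pass3_compress", loops + 1
        current = nxt
    return nodes["publish_doc"](state)

# Toy nodes: compression produces diagram v1, refinement flags gaps once,
# the graph loops back, and the second pass (v2) is accepted.
nodes = {
    "pass3_compress": lambda s: {**s, "loop": s.get("loop", 0) + 1,
                                 "diagram": f"v{s.get('loop', 0) + 1}"},
    "gate2_human": lambda s: s,
    "refinement": lambda s: {**s, "major_gaps": s["loop"] < 2},
    "publish_doc": lambda s: {**s, "final_doc": s["diagram"]},
}
edges = {"pass3_compress": "gate2_human",
         "gate2_human": "refinement",
         "refinement": "publish_doc"}

out = run_graph({}, nodes, edges, "gate2_human", lambda s: "fix the arrows")
# out["final_doc"] == "v2": published only after one refinement loop
```

A real implementation would also need the checkpointing that makes the interrupt survivable across hours — that is the part LangGraph provides and this sketch does not.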
Three non-negotiable design decisions
1. Chunk by module, not by file
The single biggest source of quality failures in codebase analysis agents is sending individual files to the LLM. A PaymentService.java analyzed in isolation looks self-contained — it isn't. Its dependencies, its callers, and the boundary decisions around it are all in the surrounding package. Node 2 must group every file by its module boundary before any LLM call in Node 3.
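One way the grouping might look in practice — a rough sketch that treats the package directory as the module boundary. The paths and the boundary rule are illustrative; real boundaries should come from the build file:

```python
from collections import defaultdict
from pathlib import PurePosixPath

def modules_by_package(file_paths):
    """Group source files by their package directory.

    A simplification: real module boundaries should come from the build
    file (Maven modules, Gradle subprojects), not from paths alone.
    """
    modules = defaultdict(list)
    for path in file_paths:
        modules[str(PurePosixPath(path).parent)].append(path)
    return dict(modules)

files = [
    "src/main/java/com/example/orders/OrderService.java",
    "src/main/java/com/example/orders/OrderDAO.java",
    "src/main/java/com/example/payments/PaymentService.java",
]
chunks = modules_by_package(files)
# Two chunks: both orders files travel to the LLM together; payments is separate.
```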
```python
# Wrong — loses inter-module context
for file in repo.all_files():
    summary = llm.summarise(file)

# Right — preserves module relationships
for module in repo.modules_by_package():
    summary = llm.summarise(module.all_files())
```

2. Apply compression rules as a post-processing step
LLMs decompose codebases into components named after classes and files. The compression agent (Node 5) must explicitly apply the 5 compression rules after the LLM's first output — never publish the raw decomposition. Build a structured prompt that enforces: collapse classes into capabilities, collapse endpoints into use cases, enforce a 5–12 component ceiling.
3. Design the state schema before writing any node
Every node reads from and writes to shared state. Define this schema before building Node 1:
```python
class PipelineState(TypedDict):
    repo_path: str
    build_file_summary: dict
    module_chunks: list[ModuleChunk]
    module_summaries: list[ModuleSummary]
    flow_traces: list[FlowTrace]
    component_diagram: str  # Mermaid source
    anti_pattern_score: int
    gate1_violations: list[str]
    human_feedback: str
    gate3_gaps: list[str]
    gate4_gaps: list[str]
    final_doc: str
    debt_register: list[DebtItem]
```

If you design state as you go, Node 8 will break because it can't find what Node 3 produced.
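With the schema fixed, each node becomes a function from state to a partial update: read what earlier nodes wrote, return only the keys this node owns. A sketch of Node 7 — the key names and edge extraction here are simplified stand-ins, not the schema above; a real node would parse jdeps/depcruise output:

```python
def gate1_static(state):
    """Gate 1: diff documented component edges against the actual import
    graph. Edges in the code but not in the diagram become violations."""
    documented = set(state["documented_edges"])   # written by pass3_compress
    actual = set(state["import_edges"])           # written by ingest_repo
    return {"gate1_violations": sorted(f"{a} -> {b}"
                                       for a, b in actual - documented)}

state = {
    "documented_edges": [("web", "orders"), ("orders", "db")],
    "import_edges": [("web", "orders"), ("web", "db")],  # undocumented edge
}
state.update(gate1_static(state))
# state["gate1_violations"] == ["web -> db"]
```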
Where to start
Build nodes 1, 2, and 3 only. Get module chunking working reliably on one real codebase and validate that the module summaries are accurate before touching compression or the validation gates. The quality of everything downstream — the component diagram, the sequence diagrams, the anti-pattern score — is determined entirely by how well Node 2 chunks and Node 3 summarises.
Do not start with the diagram generation. Start with the summaries.
Output Document Template
Sections marked (stack-specific) should be filled in using your appendix.
01 - Executive Overview
System purpose, business context, user base. Architecture style in one paragraph. Key business capabilities supported.
02 - System Architecture
Architecture style. Component diagram (5–12 components). Runtime topology (app server/platform, DB, cache, message broker). Language, framework, runtime version. (stack-specific)
03 - Entry / Handler Layer (stack-specific)
Route/URL mapping table (path → handler). Middleware/filter chain with purpose of each. Session/token management strategy. Framework-specific patterns in use.
04 - Component Design
Capability table (not class names). Per-component: responsibility, interfaces, dependencies. Inter-component dependency diagram. Design patterns identified per component.
05 - Data Flow - Key Use Cases
Sequence diagram per flow (revenue-generating first). Consistency boundary shown per flow. Error handling paths. Async vs sync distinction.
06 - Data Model
ER or schema diagram (key entities only - 80/20 rule). Domain concept groupings. ORM/ODM relationship summary. Any raw query areas outside the ORM.
07 - Security Architecture
Authentication mechanism. Authorization model. Session/token security. Identified gaps. (stack-specific)
08 - External Integrations
All outbound calls grouped by role. Message brokers. Legacy interfaces. Timeout/retry per integration. Error/fallback behavior.
09 - Deployment and Operations (stack-specific)
Deployment unit and target platform. Config management approach. Logging and observability setup. Process management and scaling approach.
10 - Key Design Decisions and Tradeoffs
Why this architecture. Major framework decisions. Known tradeoffs. Regretted decisions.
11 - Architectural Debt and Modernization Map
Anti-pattern checklist score. Layer violation inventory. Refactoring priorities (ranked risk × effort). Migration paths under consideration.
12 - Constraints and Non-Functional Requirements
Performance (observed). Scalability limits. Maintainability. Test coverage gaps. Compliance requirements.
Minimum Viable Architecture Doc (When to Stop)
The guide tells you what to document. This section tells you when enough is enough.
The trap most engineers fall into: treating architecture documentation as a completeness exercise. It isn't. Stop when the document can answer the stakeholder's actual question. Every additional section beyond that point has diminishing returns.
The Minimum Viable Doc by Stakeholder Question
| Stakeholder question | Minimum artifacts required | You can skip |
|---|---|---|
| "What does this system do?" | Executive overview (01) + component diagram (02) | Sections 03–12 |
| "Why is this slow / breaking?" | Key flows (05) + anti-pattern score (11) | Sections 01, 06–10, 12 |
| "Can we onboard a new engineer?" | Sections 01, 02, 03, 05 + stack appendix | Sections 07–12 |
| "Is it safe to go to production?" | Security (07) + deployment (09) + NFR (12) | Most of 03–06 |
| "Should we modernize / rewrite?" | Debt map (11) + design decisions (10) + NFR (12) | Sections 03–06 |
| "Full architecture review" | All 12 sections | Nothing |
The "Done Enough" Test
Before adding another section, ask:
1. Who specifically will read this section?
2. What decision will it help them make?
3. If I skip it, what is the worst realistic outcome?

If you cannot name a real person and a real decision for questions 1 and 2, stop. The section is for completeness, not utility.
Minimum Viable Doc: The 3-Artifact Floor
Regardless of stakeholder, every architecture output must have at least these three artifacts before it can be called architecture (not just notes):
✅ 1. Component diagram (5–12 components, capability-named)
   → Answers: what are the moving parts?
✅ 2. One end-to-end sequence diagram (the primary revenue flow)
   → Answers: what actually happens at runtime?
✅ 3. Debt score from the anti-pattern checklist
   → Answers: how healthy is it, and what's the biggest risk?

Everything else is depth on top of this floor. If you have time for only one thing: produce these three artifacts and stop.
When to Expand Beyond the Floor
Add sections when a specific gap creates a specific risk:
| Gap | Risk if undocumented | Add section |
|---|---|---|
| Security mechanism unknown | Audit failure, breach | 07 - Security |
| Deployment process unclear | Production incident | 09 - Deployment |
| External dependencies untracked | Integration failure, vendor lock-in | 08 - Integrations |
| Data model not understood | Data loss, migration failure | 06 - Data Model |
| Performance constraints unknown | SLA breach | 12 - NFR |
Audience Adaptation Guide
The same architecture produces a different document depending on who reads it. Don't produce one document and hope everyone gets what they need from it. Adapt deliberately.
Audience Profiles
CTO / VP Engineering (Strategic)
What they need: Business impact, risk exposure, modernization cost, decision support.
Reading time available: 5–10 minutes.
Format: Executive brief, not technical deep-dive.
Include:
✅ Section 01 (Executive Overview) - expanded, business-language
✅ Section 02 (Architecture) - component diagram only, no code references
✅ Section 10 (Design Decisions) - framed as "why we built it this way"
✅ Section 11 (Debt Map) - framed as risk and cost, not technical violations
✅ Modernization options table (3 options with effort/risk/benefit)

Remove or move to appendix:
❌ All code snippets and bash commands
❌ Section 03 (Handler Layer detail)
❌ Section 06 (Data Model detail)
❌ Anti-pattern checklist raw output

Framing rule: Replace all technical terms with business impact statements.
❌ Technical: "14 JSP files contain scriptlets - layer violation"
✅ Business: "14 UI components contain business logic - each change requires a developer instead of a designer, slowing feature delivery by ~2 days per change"

Engineering Team / New Engineers (Operational)
What they need: How to work in the system. What exists, where things live, what the rules are.
Reading time available: 30–60 minutes.
Format: Reference document they can return to.
Include:
✅ All 12 sections
✅ Stack appendix (their specific stack)
✅ Anti-pattern checklist (so they know what to avoid adding)
✅ Sequence diagrams with code-level detail
✅ Layer interaction rules (what is and isn't allowed)

Emphasise:
→ Section 03 (Handler Layer) - where to add new endpoints
→ Section 04 (Component Design) - which service to call for what
→ Section 05 (Data Flows) - how to trace a bug through the system
→ Section 11 (Debt Map) - what not to copy from existing code

Security / Compliance Auditor
What they need: Evidence of controls, identified gaps, data flows for sensitive data.
Reading time available: 20–30 minutes on their priority sections.
Format: Structured, citable, gap-explicit.
Include:
✅ Section 07 (Security Architecture) - primary section, fully detailed
✅ Section 05 (Data Flows) - annotated with: what data, where it goes, who can see it
✅ Section 08 (Integrations) - every outbound call with auth mechanism
✅ Section 09 (Deployment) - config management, secrets handling
✅ Section 12 (NFR) - compliance requirements explicitly called out

Format requirements:
→ Every security control must state: what it protects, how it's enforced, where it breaks
→ Gaps must be listed explicitly - do not omit or soften them
→ Use "identified gap" not "area for improvement"

External Consultant / Modernization Assessor
What they need: Current state, debt score, change risk, migration options.
Reading time available: 1–2 hours, deep read.
Format: Complete picture with no gaps hidden.
Include:
✅ All 12 sections
✅ Actual vs intended architecture gap analysis (Section 11)
✅ Full anti-pattern checklist results - raw, unfiltered
✅ Architecture style identification (Pass 3, Step 2)
✅ Known failure modes and production incidents referenced

Critical: do not sanitize the debt map for this audience. A consultant working from an optimistic debt list will recommend the wrong modernization path.

Adaptation Checklist
Before sharing any architecture doc, confirm:
[ ] I know who will read this
[ ] I have removed or summarised sections they won't use
[ ] Technical terms affecting non-technical readers have been translated
[ ] The debt/risk section is framed in the language this audience uses
[ ] The document is the right length for their available reading time

Debt Risk × Effort Matrix
Finding debt is not enough. The hardest question is: fix it now, or document it and live with it?
This matrix answers that question with a structured decision framework.
Step 1 - Score Each Debt Item
For each item from your anti-pattern checklist, score it on two axes:
Risk score (1–5): What is the blast radius if this goes wrong?
| Score | Meaning |
|---|---|
| 5 | Data loss, security breach, or production outage likely |
| 4 | Significant user impact or revenue loss possible |
| 3 | Degraded performance or increased incident frequency |
| 2 | Developer friction - slows delivery, doesn't break anything |
| 1 | Cosmetic / style - no functional impact |
Effort score (1–5): How much work to fix it?
| Score | Meaning |
|---|---|
| 5 | Architectural overhaul - months, high coordination cost |
| 4 | Multi-sprint refactor - weeks, affects multiple teams |
| 3 | Single-sprint fix - 1–2 weeks, one team |
| 2 | Days of work - targeted, low risk |
| 1 | Hours - a config change or a single file |
Step 2 - Place on the Matrix
Figure: Debt Risk × Effort Matrix
Step 3 - Apply the Decision Rules
FIX NOW (High risk, Low effort)
Do this before the next release. These are the "free" wins - disproportionate risk reduction for minimal cost.
Example: Raw SQL string with user input (SQL injection risk, 2-hour fix)
FIX NEXT SPRINT (High risk, Medium effort)
Schedule explicitly. Do not let these sit in a backlog with no date - they will never happen.
Example: No circuit breaker on payment gateway calls (production incident risk, 1-week fix)
PLAN & TRACK (High risk, High effort)
These require a modernization project. Quantify the risk annually and use it to justify the investment.
Example: Monolith preventing independent scaling of payment service
FIX WHEN PASSING (Medium/Low risk, Low effort)
Fix opportunistically - when a developer is already in that file for another reason.
Example: Handler method at 60 lines, should be 40
BACKLOG WITH DATE (Medium risk, Medium effort)
Add to backlog with a real review date. If the date passes without action, escalate the risk score.
Example: 14 view files containing business logic - functional, but slowing delivery
ACCEPT & DOCUMENT (Low/Medium risk, High effort)
Explicitly accept this debt. Document it as a known constraint, not a gap. Include it in onboarding so new engineers understand it's intentional.
Example: Legacy SOAP integration that would take months to replace - system works fine with it
SKIP (Low risk, any effort)
Do not spend architectural attention here. Let linters handle it.
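The decision rules can be encoded directly, which keeps scoring consistent across reviewers. The band boundaries below (1–2 = Low, 3 = Med, 4–5 = High) are my reading of the two score tables; note that the worked example later in this section sometimes bumps borderline risk-3 items up by judgment, which a mechanical rule will not do:

```python
def band(score):
    """Map a 1-5 score onto matrix bands: 1-2 Low, 3 Med, 4-5 High."""
    return "Low" if score <= 2 else ("Med" if score == 3 else "High")

def decide(risk, effort):
    """Apply the matrix decision rules to one debt item."""
    r, e = band(risk), band(effort)
    if r == "Low":  # low risk, any effort
        return "Skip"
    if r == "High":
        return {"Low": "Fix Now", "Med": "Fix Next Sprint",
                "High": "Plan & Track"}[e]
    return {"Low": "Fix When Passing", "Med": "Backlog With Date",
            "High": "Accept & Document"}[e]

# Clear-cut items reproduce mechanically:
assert decide(5, 1) == "Fix Now"   # e.g. raw SQL with user input
assert decide(1, 1) == "Skip"      # e.g. inconsistent method casing
```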
Step 4 - Produce the Debt Register
For each item in "Fix Now" or "Fix Next Sprint," create a debt register entry:
```
Debt item: [Name of the anti-pattern]
Location: [File / module / layer where it occurs]
Risk score: [1–5] - [one sentence explaining the risk]
Effort score: [1–5] - [one sentence explaining the fix]
Decision: [Fix Now / Fix Next Sprint / Plan / Accept / Skip]
Owner: [Team or engineer responsible]
Target date: [Specific date, not "Q3" - vague dates mean never]
Evidence: [Link to the specific code location]
```

Worked Example - Debt Scoring in Practice
This example shows 6 real debt items scored, placed on the matrix, and assigned decisions. Use it to calibrate your own scoring.
| # | Debt item | Location | Risk | Effort | Decision |
|---|---|---|---|---|---|
| 1 | Raw SQL concatenated with user input | ReportController.java:214 | 5 - SQL injection, data breach risk | 1 - Single parameterised query fix | FIX NOW |
| 2 | No timeout on payment gateway HTTP client | PaymentClient.java | 5 - Gateway hang = app hang, revenue loss | 2 - One config property | FIX NOW |
| 3 | OrderController calls InventoryDAO directly | OrderController.java:89 | 3 - Bypasses transaction scope, occasional inconsistency | 3 - Move call through OrderService | FIX NEXT SPRINT |
| 4 | 14 JSP files contain scriptlet business logic | WEB-INF/views/*.jsp | 2 - Slows delivery, doesn't break anything | 4 - Migrate to Thymeleaf, multi-sprint | BACKLOG WITH DATE |
| 5 | OrderService owns Order + Payment + Invoicing | OrderService.java | 3 - Single point of failure, deployment coupling | 5 - Domain split, architectural project | PLAN & TRACK |
| 6 | Handler method names inconsistently cased | AdminController.java | 1 - Cosmetic, no functional impact | 1 - Auto-fixable with linter rule | SKIP / AUTO-LINT |
Scoring notes for borderline cases:
Risk 3 vs 4 - ask: "Has this caused a production incident in the past year?" Yes → score 4. No → score 3.
Effort 3 vs 4 - ask: "Does fixing this require coordinating more than one team?" Yes → score 4. No → score 3.
When in doubt, score higher on risk, lower on effort. Higher risk → more urgency to fix. Lower effort → easier to justify doing it now.

Debt register entries for the "Fix Now" items:
```
Debt item: Raw SQL concatenated with user input
Location: src/main/java/com/example/ReportController.java:214
Risk score: 5 - SQL injection vulnerability, direct data breach path
Effort score: 1 - Replace string concat with PreparedStatement (2 hours)
Decision: Fix Now
Owner: Platform team
Target date: 2025-02-14
Evidence: github.com/org/repo/blob/main/.../ReportController.java#L214

---

Debt item: No timeout on payment gateway HTTP client
Location: src/main/java/com/example/PaymentClient.java
Risk score: 5 - Gateway hang blocks all payment threads, revenue stops
Effort score: 2 - Set connectTimeout + readTimeout in RestTemplate config
Decision: Fix Now
Owner: Payments team
Target date: 2025-02-14
Evidence: github.com/org/repo/blob/main/.../PaymentClient.java#L31
```

Team & Org Context
The guide assumes one engineer doing the analysis alone. Real enterprise scenarios are messier. This section covers the human and political dimensions.
When Engineers Disagree on the Architecture
Disagreement about what the architecture is (not what it should be) is more common than you might expect, and it has a specific cause: different engineers have different accurate views of different parts of the system. Both views are correct for their slice. The architecture doc must reconcile them, not pick a winner.
Resolution process:
Step 1: Map the disagreement precisely
→ "We disagree about whether X calls Y directly" is precise
→ "We disagree about the architecture" is not
→ If you can't write the disagreement in one sentence, it's not clear enough to resolve

Step 2: Go to the code, not to consensus
→ Check the actual import graph, not memory
→ Run the dependency analysis tool (see appendix)
→ The code is the ground truth - opinions are not

Step 3: Document both views if the code is transitional
→ "Intended: X → Service → Y. Actual: 3 of 14 handlers call Y directly. Migration is 60% complete."
→ Don't flatten a transitional state into a clean diagram

Step 4: If still unresolved after step 2, the dispute is about the intended architecture, not the current one
→ Separate the two explicitly in the doc
→ "Current state" vs "target state" are different sections

What to Do with Sensitive Findings
Some findings are politically sensitive: a beloved senior engineer's module is the God service. The team lead's design decision from 2018 is now the biggest scalability constraint. A compliance gap exists that no one wants to own.
Principles for handling sensitive findings:
✅ Document the finding - omitting it makes the doc misleading
✅ Frame as system state, not personal failure
   → "The payment module has grown to own 4 distinct domains"
   → NOT "John's payment module violates single responsibility"
✅ Pair every finding with a recommended action
   → A finding without a path forward reads as blame
✅ Share findings with affected team leads before publishing broadly
   → No one should read about a problem with their code for the first time in a team-wide document
❌ Never soften findings to the point of hiding them
❌ Never name individual engineers in debt or violation lists

Presenting Architecture to Non-Technical Stakeholders
When presenting architecture findings to business stakeholders, the architecture doc is not the deliverable - a decision brief is.
Structure the brief as:
1. What the system does (1 paragraph, business language)
2. What the current constraints are (3 bullets, business impact framing)
3. What options we have (2–3 options, each with: cost / risk / benefit)
4. What we recommend (1 option, with rationale)
5. What we need from you (a decision, a resource, an approval)

The architecture doc is the evidence behind the brief. It should be available as an appendix, not the main document.
Translation rules for non-technical audiences:
| Technical term | Business translation |
|---|---|
| Layer violation | "Code in the wrong place - means changes take longer and break more often" |
| God service | "One component doing too many things - a bottleneck and a single point of failure" |
| N+1 query | "The system asks the database the same question N times when once would do - causes slowdowns under load" |
| No circuit breaker | "If payment provider goes down, we go down too - no automatic fallback" |
| Session affinity | "Users are tied to a specific server - we can't add capacity during peak load without disrupting sessions" |
| Tech debt score | "Every point is a risk we're carrying - at 16+, we're spending more managing debt than building features" |
Architecture Evolution & Decision Records
Architecture is not a snapshot - it is a living system. A document that captures the architecture today without capturing how it got here and where it's going will mislead the team within 6 months.
Architecture Decision Records (ADRs)
An ADR is a short document that captures a single architectural decision: what was decided, why, what alternatives were considered, and what the consequences are.
Why ADRs matter:
Every legacy codebase has patterns that look wrong to new engineers. Without ADRs, those engineers refactor them - only to discover six months later why the original decision was made. ADRs prevent that cycle.
ADR format:
```markdown
# ADR-[number]: [Decision title]

**Date:** YYYY-MM-DD
**Status:** Proposed | Accepted | Deprecated | Superseded by ADR-[N]
**Deciders:** [Names or teams involved]

## Context
What situation forced this decision? What constraints existed?
(2–4 sentences. Be specific about what was true at the time.)

## Decision
What was decided?
(1–2 sentences. State it directly - no hedging.)

## Alternatives considered
| Option | Why rejected |
|--------|--------------|
| [Option A] | [Reason] |
| [Option B] | [Reason] |

## Consequences
**Positive:** What does this enable?
**Negative:** What does this constrain or cost?
**Risks:** What could go wrong because of this decision?

## Review date
When should this decision be revisited? (Set a real date.)
```

ADR examples for common enterprise decisions:
- ADR-001: Use layered monolith instead of microservices
- ADR-002: Use Spring MVC over REST-only API
- ADR-003: Store sessions in DB instead of in-memory
- ADR-004: Accept SOAP integration with ERP system (not replace)
- ADR-005: Use JPA/Hibernate for all data access

Where to Store ADRs
```
docs/
  architecture/
    adr/
      ADR-001-monolith-over-microservices.md
      ADR-002-spring-mvc-choice.md
      ADR-003-session-storage.md
    diagrams/
      component-diagram-v3.mermaid
      order-flow-sequence.mermaid
    architecture-guide.md   ← this document
```

Keep ADRs in the same repository as the code they govern. If the code moves, the ADRs move with it.
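Numbering and stub creation are easy to script. A sketch that drops the next numbered stub into a layout like the one above — the template is abbreviated, and the function name and slug convention are my own, not an established tool:

```python
from pathlib import Path

# Abbreviated stub; section names follow the ADR format in this guide.
ADR_TEMPLATE = """# ADR-{num:03d}: {title}

**Date:** {date}
**Status:** Proposed

## Context

## Decision

## Alternatives considered

## Consequences

## Review date
"""

def new_adr(adr_dir, title, date):
    """Create the next numbered ADR stub, slugging the title into the name."""
    adr_dir = Path(adr_dir)
    adr_dir.mkdir(parents=True, exist_ok=True)
    num = len(list(adr_dir.glob("ADR-*.md"))) + 1
    slug = title.lower().replace(" ", "-")
    path = adr_dir / f"ADR-{num:03d}-{slug}.md"
    path.write_text(ADR_TEMPLATE.format(num=num, title=title, date=date))
    return path

# new_adr("docs/architecture/adr", "Store sessions in DB", "2024-06-01")
# would create docs/architecture/adr/ADR-001-store-sessions-in-db.md
```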
Versioning the Architecture Document
When the architecture changes significantly, don't overwrite the previous version - capture the evolution.
Version triggers - create a new architecture version when:
[ ] A new service or major component is added or removed
[ ] The deployment model changes (e.g., monolith → modular, on-prem → cloud)
[ ] A major external dependency changes (new payment provider, new DB)
[ ] The authentication model changes
[ ] A significant debt item is resolved (document the before/after)
[ ] The team structure changes in a way that reflects in component ownership

Version header to add to every architecture doc:
```markdown
## Document Version History

| Version | Date | Author | Summary of changes |
|---------|------|--------|-------------------|
| v1.0 | 2019-03 | [name] | Initial architecture capture |
| v1.1 | 2020-08 | [name] | Added payment service, updated sequence diagrams |
| v2.0 | 2022-11 | [name] | Migrated from Struts to Spring MVC - full re-capture |
| v3.0 | 2024-06 | [name] | Extracted notification module, updated debt map |
```

Architecture Drift Detection
Architecture drift - the gap between documented and actual architecture - accumulates silently. Build a lightweight mechanism to detect it.
Automated drift signals:
```bash
# Run monthly - compare import graph against last documented state
# If new cross-layer imports appear, flag for review
jdeps -dotoutput ./deps-current target/myapp.jar
diff deps-baseline.dot deps-current/myapp.dot

# Count anti-pattern checklist items - track trend over time
# Increasing score = drift is accelerating
grep -c "^\[x\]" architecture-checklist-current.txt
```

Human drift signals - any of these should trigger an architecture review:
[ ] A developer says "I didn't know that module existed"
[ ] A bug required changes in 4+ unrelated modules
[ ] A new hire's mental model of the system is significantly wrong after onboarding
[ ] A production incident exposed a dependency not in the architecture doc
[ ] The team is debating "how it works" rather than "how to improve it"
[ ] An integration test exercises a path not in any sequence diagram

Drift review cadence:
| Trigger | Action |
|---|---|
| Monthly | Run automated import graph diff - flag new cross-layer violations |
| Quarterly | Gate 4 validation (failure scenarios) + human drift signal check |
| After major feature | Update sequence diagrams for affected flows |
| After incident | Trace the incident path through the doc - update where it diverged |
| After team change | Re-validate component ownership - teams and components should align |
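The monthly import-graph diff in the table reduces to a set comparison once dependency edges have been extracted (for example, parsed out of jdeps dot output). A sketch, with invented edge data:

```python
def drift_report(baseline_edges, current_edges):
    """Compare dependency edge sets between two points in time.

    New edges are drift candidates to review against the diagram;
    removed edges may mean the doc describes a path that no longer exists.
    """
    baseline, current = set(baseline_edges), set(current_edges)
    return {"new_edges": sorted(current - baseline),
            "removed_edges": sorted(baseline - current)}

baseline = [("web", "service"), ("service", "dao")]
current = [("web", "service"), ("service", "dao"), ("web", "dao")]
report = drift_report(baseline, current)
# report["new_edges"] == [("web", "dao")] - a new cross-layer import to flag
```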
Pro Tips (Language-Agnostic)
- Apply compression rules before writing any component name - ask "is this a capability or a class?"
- Stop when the document can answer the stakeholder's question - completeness is not the goal, utility is
- The middleware/filter chain is your cross-cutting concerns map - document it before anything else in the handler layer
- Document all state stores explicitly - session, JWT, DB, cache, in-memory - this is the hidden scalability constraint
- Config files from the system's early years are the most valuable artifacts - they capture the intended architecture before drift
- Write one ADR for every decision you wish had been documented when you arrived - future engineers will thank you
- Treat the debt map as a deliverable, not a footnote - frame it in risk×effort language that stakeholders can act on
- Start with the happy path per flow, then expand to error paths - error handling reveals the real complexity
- Never name engineers in debt or violation lists - findings are about the system state, not about people
- If it cannot be drawn, it is not architecture - every claim must have a corresponding box or arrow
- An architecture doc not validated in 6 months is a hypothesis - re-validate before any major decision is based on it
Stack Appendices
Each appendix provides the stack-specific details for every step in the core guide.
Appendix A - Java / Spring / JSP
Entry Points
| Type | Files / annotations |
|---|---|
| HTTP | web.xml, @Controller, @RestController, @WebServlet, DispatcherServlet |
| IoC root | applicationContext.xml, @Configuration, @ComponentScan, ejb-jar.xml |
| Background jobs | @Scheduled, QuartzJobBean, @MessageDriven, @JmsListener |
| Build root | pom.xml, build.gradle, settings.gradle |
Module Layer Mapping
| Layer | Typical packages | Key types |
|---|---|---|
| Handler | *.web, *.controller, *.action | Servlet, Controller, ActionForm |
| Orchestration | *.service, *.business, *.facade, *.ejb | @Service, @Stateless, Session Bean |
| Data access | *.dao, *.repository | @Repository, JpaRepository, JdbcTemplate |
| Domain | *.model, *.domain, *.entity | @Entity, POJO |
| Integration | *.integration, *.client, *.adapter | RestTemplate, WebServiceTemplate |
| Cross-cutting | *.aspect, *.security, *.util | @Aspect, Filter, HandlerInterceptor |
Dependency Analysis Tools
```bash
# Module-level dependency graph
jdeps --print-module-deps -recursive target/myapp.jar
jdeps -dotoutput ./deps target/myapp.jar

# Full transitive dependency tree
mvn dependency:tree -Dverbose
mvn help:effective-pom

# Circular dependency detection - add to pom.xml:
# <rule implementation="org.apache.maven.enforcer.rules.dependency.BanCircularDependencies"/>

# Architecture rule enforcement (as JUnit tests)
# ArchUnit: com.tngtech.archunit
```
JSP / Web Tier Specific Scanning
```bash
# Find scriptlets (business logic in views - architectural debt)
grep -rn "<%[^@!]" src/main/webapp --include="*.jsp"

# Find session state (scalability constraint inventory)
grep -rn "session\.setAttribute" src/ --include="*.java" --include="*.jsp"

# Find JNDI lookups (service locator pattern - legacy signal)
grep -rn "InitialContext\|lookup(" src/main/java --include="*.java"

# Find raw SQL strings (injection risk + query inventory)
grep -rn "\"SELECT\|\"INSERT\|\"UPDATE\|\"DELETE" src/ --include="*.java"
```
Anti-Pattern Additions (Java-Specific)
- [ ] JSP contains `<% %>` scriptlets with business logic
- [ ] @Transactional placed on DAO methods (should be on service)
- [ ] JDBC calls with manual commit/rollback outside service layer
- [ ] HttpSession stores non-serializable domain entities
- [ ] N+1 queries from LAZY fetch in a loop (Hibernate)
- [ ] Spring XML config overridden by annotation config inconsistently

Runtime Observation Tools
| Tool | Purpose |
|---|---|
| jdeps | Module dependency graph from bytecode |
| ArchUnit | Architecture rules as JUnit tests |
| jQAssistant | Graph-based codebase analysis (Neo4j) |
| JVisualVM / JConsole | Thread dumps, heap, JMX metrics |
| Async Profiler | CPU/allocation profiler, low overhead |
| p6spy / datasource-proxy | JDBC call logging with parameters |
| Hibernate Statistics | N+1 detection, cache hit rate |
| Prometheus + Micrometer | Metrics if Spring Actuator present |
AI Prompt Additions
For a JSP file:
```
Analyze this JSP. Describe:
1. What model attributes it expects
2. Any business logic in scriptlets - list each occurrence
3. Which forms it submits and to which action URLs
4. Which other JSPs it includes
5. Classify: pure view / mixed view-logic / heavy logic (migration needed)
```
For pom.xml:
```
Analyze this Maven POM. Extract:
1. Java version and target runtime (Tomcat / JBoss / WebLogic)
2. All major framework dependencies with versions
3. Build plugins affecting behavior (aspectj, code generation, etc.)
4. Multi-module structure
5. Dependency conflicts or end-of-life libraries
```
Appendix B - Python / Django / FastAPI
Entry Points
| Type | Files / decorators |
|---|---|
| HTTP (Django) | urls.py, views.py, @api_view, ViewSet |
| HTTP (FastAPI) | main.py, @app.get, @app.post, APIRouter |
| HTTP (Flask) | app.py, @app.route, Blueprint |
| IoC / config | settings.py, config.py, INSTALLED_APPS |
| Background jobs | celery.py, @shared_task, @app.task, APScheduler |
| Build root | requirements.txt, pyproject.toml, Pipfile |
Module Layer Mapping
| Layer | Typical locations | Key types |
|---|---|---|
| Handler | views.py, routers/, api/ | View, ViewSet, APIRouter, endpoint function |
| Orchestration | services/, use_cases/, business/ | Plain Python classes/functions |
| Data access | repositories/, models.py (queries only) | QuerySet, ORM Manager, raw SQL |
| Domain | models.py (structure), domain/, entities/ | Django Model, dataclass, Pydantic model |
| Integration | clients/, adapters/, integrations/ | requests, httpx, boto3 clients |
| Cross-cutting | middleware/, decorators/, utils/ | Django Middleware, FastAPI Dependency |
Dependency Analysis Tools
```bash
# Import graph (pip install pydeps)
pydeps src/myapp --max-bacon=3 --cluster

# Circular import detection (pylint's cyclic-import check)
pip install pylint
pylint --disable=all --enable=cyclic-import src/

# Import ordering check
pip install isort
isort --check-only --diff .

# Dead code / unused imports
pip install vulture
vulture src/

# Dependency tree
pip install pipdeptree
pipdeptree

# Type-checked architecture rules
pip install import-linter
# Define contracts in .importlinter config
```
Django / FastAPI Specific Scanning
```bash
# Find business logic in views (fat view detection)
grep -rn "def get\|def post\|def put\|def delete" */views.py | wc -l

# Find raw SQL (injection risk)
grep -rn "raw(\|cursor.execute\|RawSQL" . --include="*.py"

# Find direct model access in views (bypasses service layer)
grep -rn "\.objects\." */views.py --include="*.py"

# Find N+1 risks (queryset in loop)
grep -rn "for.*in.*\:" . --include="*.py" -A2 | grep "\.objects\."

# Django: check for missing select_related / prefetch_related
grep -rn "\.objects\.filter\|\.objects\.all" . --include="*.py" | \
  grep -v "select_related\|prefetch_related"
```
Anti-Pattern Additions (Python-Specific)
- [ ] Business logic in views.py / route handlers (fat views)
- [ ] Direct .objects. queryset calls inside view functions
- [ ] Missing select_related / prefetch_related (Django N+1)
- [ ] settings.py contains secrets (not env vars)
- [ ] Celery tasks contain business logic (should delegate to service)
- [ ] Django signals used for core business logic (obscures flow)
- [ ] Synchronous external HTTP calls inside async endpoint handlers

Runtime Observation Tools
| Tool | Purpose |
|---|---|
| pydeps | Module dependency graph |
| import-linter | Architecture rule enforcement |
| Django Debug Toolbar | Query count, timing per request |
| django-silk | Request/response profiling |
| Flower | Celery task monitoring |
| py-spy | Sampling profiler, low overhead |
| Prometheus + django-prometheus | Metrics |
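One Appendix B anti-pattern - synchronous external calls inside async endpoint handlers - is easy to demonstrate without any framework. A minimal sketch using plain asyncio; `time.sleep` stands in for a blocking HTTP call, and the effect inside a FastAPI `async def` endpoint is the same:

```python
import asyncio
import time

async def blocking_handler(log):
    log.append("blocking start")
    time.sleep(0.1)  # synchronous call: never yields, freezes the whole event loop
    log.append("blocking end")

async def other_task(log):
    log.append("other")  # cannot run until the blocking handler finishes

async def main():
    log = []
    await asyncio.gather(blocking_handler(log), other_task(log))
    return log

log = asyncio.run(main())
print(log)  # → ['blocking start', 'blocking end', 'other']
```

With `await asyncio.sleep(0.1)` instead, "other" would run during the pause - which is why blocking calls belong in a thread pool (`loop.run_in_executor`) or a plain sync endpoint.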
Appendix C - Node.js / Express / NestJS
Entry Points
| Type | Files / decorators |
|---|---|
| HTTP (Express) | app.js, server.js, routes/, router.use() |
| HTTP (NestJS) | main.ts, @Controller, @Module, AppModule |
| HTTP (Fastify) | app.js, fastify.register(), route plugins |
| Config / DI | app.module.ts (NestJS), container.js (custom DI) |
| Background jobs | bull queues, node-cron, @nestjs/schedule, @Processor |
| Build root | package.json, yarn.lock, tsconfig.json |
Module Layer Mapping
| Layer | Typical locations | Key types |
|---|---|---|
| Handler | controllers/, routes/, *.controller.ts | Express Router, NestJS Controller |
| Orchestration | services/, *.service.ts, use-cases/ | Plain class, NestJS Service |
| Data access | repositories/, *.repository.ts, models/ | TypeORM Repository, Mongoose Model |
| Domain | entities/, domain/, *.entity.ts | TypeORM Entity, Mongoose Schema |
| Integration | clients/, adapters/, *.client.ts | axios/got wrappers, SDK clients |
| Cross-cutting | middleware/, guards/, interceptors/, pipes/ | NestJS Guard, Interceptor, Middleware |
Dependency Analysis Tools
```bash
# Module dependency graph
npx depcruise --include-only "^src" --output-type dot src | dot -T svg > deps.svg

# Circular dependency detection
npx madge --circular src/

# Unused exports / dead code
npx ts-prune

# Outdated packages
npm outdated
npx npm-check-updates

# Bundle analysis (if applicable)
npx webpack-bundle-analyzer
```
Express / NestJS Specific Scanning
```bash
# Find route definitions (entry point inventory)
grep -rn "router\.\(get\|post\|put\|delete\|patch\)" src/ --include="*.js" --include="*.ts"

# Find direct DB calls in controllers (layer violation)
grep -rn "\.find\|\.save\|\.query\|\.execute" src/controllers/ --include="*.ts"

# Find missing async error handling
grep -rn "async.*req.*res" src/routes/ --include="*.js" | grep -v "try\|catch"

# Find untyped any (TypeScript debt)
grep -rn ": any" src/ --include="*.ts" | wc -l
```
Anti-Pattern Additions (Node.js-Specific)
- [ ] Business logic in Express route handlers directly
- [ ] Missing async error handling (unhandled promise rejections)
- [ ] Synchronous file I/O (fs.readFileSync) inside request handlers
- [ ] Missing input validation before DB operations
- [ ] No connection pooling configured for DB client
- [ ] Secrets hardcoded in source (not process.env)
- [ ] Callback hell - nested callbacks instead of async/await
- [ ] No rate limiting on public endpoints

Runtime Observation Tools
| Tool | Purpose |
|---|---|
| depcruise | Module dependency graph and rule enforcement |
| madge | Circular dependency detection |
| clinic.js | CPU, memory, async profiling |
| 0x | Flame graph profiler |
| Bull Board | Queue monitoring |
| Pino / Winston | Structured logging |
| Prometheus + prom-client | Metrics |
Appendix D - .NET / C# / ASP.NET
Entry Points
| Type | Files / attributes |
|---|---|
| HTTP | Program.cs, Startup.cs, [ApiController], [Route], MapControllers() |
| IoC / DI root | Program.cs (.AddScoped, .AddSingleton), appsettings.json |
| Background jobs | IHostedService, BackgroundService, Hangfire, Quartz.NET |
| Build root | *.csproj, *.sln, NuGet.config, Directory.Build.props |
Module Layer Mapping
| Layer | Typical locations | Key types |
|---|---|---|
| Handler | Controllers/, Endpoints/, *.Controller.cs | ControllerBase, MinimalAPI handler |
| Orchestration | Services/, Application/, UseCases/ | Plain C# class, MediatR Handler |
| Data access | Repositories/, Data/, *.Repository.cs | EF Core DbContext, Dapper queries |
| Domain | Domain/, Entities/, Models/ | POCO, record types, value objects |
| Integration | Infrastructure/, Clients/, Adapters/ | HttpClient wrappers, SDK clients |
| Cross-cutting | Filters/, Middleware/, Behaviors/ | ActionFilter, Middleware, MediatR Pipeline |
Dependency Analysis Tools
```bash
# NDepend (commercial) - comprehensive .NET dependency analysis

# dotnet-depends (free)
dotnet tool install -g dotnet-depends
dotnet depends

# Circular reference detection
# ReSharper / Rider: built-in architecture diagram

# Outdated packages
dotnet list package --outdated

# Unused references
# ReSharper / Rider: "Remove Unused References" inspection
```
Anti-Pattern Additions (.NET-Specific)
- [ ] Business logic in Controller action methods
- [ ] DbContext injected directly into Controllers (bypasses repository)
- [ ] Missing cancellation token propagation in async methods
- [ ] Synchronous .Result or .Wait() calls on async methods (deadlock risk)
- [ ] Missing using statements / DbContext not disposed (connection leak)
- [ ] Secrets in appsettings.json committed to source control
- [ ] Missing EF Core AsNoTracking() on read-only queries
- [ ] N+1 queries from missing .Include() in EF Core

Runtime Observation Tools
| Tool | Purpose |
|---|---|
| NDepend | Architecture rule enforcement, dependency graph |
| dotMemory / dotTrace | Memory and CPU profiling |
| MiniProfiler | Per-request DB query profiling |
| Application Insights | APM, distributed tracing |
| Seq | Structured log analysis |
| Hangfire Dashboard | Background job monitoring |
Appendix E - Ruby on Rails
Entry Points
| Type | Files / conventions |
|---|---|
| HTTP | config/routes.rb, app/controllers/, *_controller.rb |
| Config / init | config/application.rb, config/initializers/, Gemfile |
| Background jobs | app/jobs/, *_job.rb, Sidekiq workers, Resque |
| Build root | Gemfile, Gemfile.lock, .ruby-version |
Module Layer Mapping
| Layer | Typical locations | Key types |
|---|---|---|
| Handler | app/controllers/ | ApplicationController subclasses |
| Orchestration | app/services/, app/interactors/, app/use_cases/ | Plain Ruby objects (POROs) |
| Data access | app/models/ (ActiveRecord queries) | ActiveRecord model query methods |
| Domain | app/models/ (structure), app/domain/ | ActiveRecord model, value objects |
| Integration | app/clients/, app/adapters/ | Faraday, HTTParty wrappers |
| Cross-cutting | app/concerns/, lib/, app/middleware/ | Concern, Rack Middleware |
Dependency Analysis Tools
```bash
# Gem dependency tree
bundle viz  # generates a dependency graph image

# Rails convention / structure checks
gem install rubocop-rails
rubocop --only Rails/FilePath

# Dead code detection
gem install debride
debride app/

# Known-vulnerable gem versions
gem install bundler-audit
bundle audit

# Rails-specific: N+1 query detection
gem install bullet  # add to Gemfile (development group)
```
Anti-Pattern Additions (Rails-Specific)
- [ ] Fat controller - business logic beyond params + render in controller
- [ ] Fat model - ActiveRecord model with 500+ lines of non-persistence logic
- [ ] Logic in views / ERB templates
- [ ] Direct ActiveRecord queries in controllers (bypasses service layer)
- [ ] N+1 queries - missing .includes() / .eager_load()
- [ ] Callbacks (before_save, after_create) containing business logic
- [ ] God model - one ActiveRecord class owning unrelated domains
- [ ] Missing database indexes on foreign keys and frequently queried columns

Runtime Observation Tools
| Tool | Purpose |
|---|---|
| bundle viz | Gem dependency graph |
| Bullet | N+1 query detection |
| rack-mini-profiler | Request profiling |
| Skylight / Scout APM | Production APM |
| Sidekiq Web | Background job monitoring |
| PgHero | PostgreSQL query analysis |
Appendix F - Go
Entry Points
| Type | Files / patterns |
|---|---|
| HTTP | main.go, cmd/, internal/handler/, http.HandleFunc, chi.Router, gin.Engine |
| Config | config/, internal/config/, env-based config structs |
| Background jobs | goroutine launchers in main.go, internal/worker/, cron packages |
| Build root | go.mod, go.sum, Makefile |
Module Layer Mapping
| Layer | Typical locations | Key types |
|---|---|---|
| Handler | internal/handler/, internal/api/ | http.HandlerFunc, gin/chi handler |
| Orchestration | internal/service/, internal/usecase/ | Plain Go struct with methods |
| Data access | internal/repository/, internal/store/ | Interface + concrete DB implementation |
| Domain | internal/domain/, internal/model/ | Go structs, value types |
| Integration | internal/client/, internal/adapter/ | HTTP clients, SDK wrappers |
| Cross-cutting | internal/middleware/, pkg/ | Middleware functions, shared utilities |
Dependency Analysis Tools
```bash
# Module dependency graph
go mod graph

# Import cycle detection (built into go build)
go build ./...  # fails on circular imports

# Static analysis
go vet ./...

# Dead code
go install golang.org/x/tools/cmd/deadcode@latest
deadcode ./...

# Dependency visualization
go install github.com/kisielk/godepgraph@latest
godepgraph ./... | dot -Tpng -o deps.png

# Outdated dependencies
go list -m -u all
```
Anti-Pattern Additions (Go-Specific)
- [ ] Business logic in http.HandlerFunc directly
- [ ] Missing context propagation (ctx not passed through call chain)
- [ ] Goroutine leak - goroutine started with no cancellation mechanism
- [ ] Missing error wrapping (errors.Wrap / fmt.Errorf %w)
- [ ] Global state in package-level variables (not safe for concurrent use)
- [ ] Interface not defined at point of use (defined in implementation package)
- [ ] Missing graceful shutdown handling (http.Server.Shutdown)
- [ ] init() functions with side effects (hidden initialization order)

Runtime Observation Tools
| Tool | Purpose |
|---|---|
| go tool pprof | CPU, memory, goroutine profiling |
| go tool trace | Execution trace, goroutine scheduling |
| expvar / pprof HTTP endpoint | Runtime metrics |
| OpenTelemetry Go | Distributed tracing |
| Prometheus + promhttp | Metrics |
| golangci-lint | Comprehensive static analysis |
Appendix G - AI Prompt Library (Complete Reference)
This appendix consolidates every prompt in the guide into a single reference. Copy, adapt, and chain these prompts across your LLM tool of choice. Each prompt is self-contained - paste the prompt, fill in the bracketed sections, and send.
Section 1 - Analysis Prompts (Pass 1 & 2)
G.1 - Service / use-case module
```
Analyze this module. Describe:
1. Its business responsibility in one sentence (capability, not class name)
2. The operations it exposes
3. Its dependencies (data access, external clients, other services)
4. Where it owns a consistency boundary (transaction, saga, etc.)
5. Any notable patterns or anti-patterns
Apply compression: name the capability, not the class.
[paste module source code here]
```
G.2 - Handler / controller / route
```
Analyze this handler/route file. Describe:
1. The URL(s) and HTTP methods it handles
2. What input it reads (path params, query, body, headers, session/token)
3. What service/use-case it delegates to
4. What it returns or renders
5. Flag any business logic that should not be in the handler layer
Map it as a request flow: entry → middleware → handler → service → response.
[paste handler source code here]
```
G.3 - Data access layer (ORM model, repository, DAO)
```
Analyze this data access module. Describe:
1. The table(s) or collection(s) it maps to
2. The queries or operations it exposes
3. Any relationships and their loading strategy (eager/lazy)
4. Any identified N+1 query risks
5. Whether it contains business logic it shouldn't
[paste data access source code here]
```
G.4 - External integration client / adapter
```
Analyze this integration client. Describe:
1. What external service it wraps (name it by role, not product)
2. The operations it exposes to the internal system
3. Timeout, retry, and circuit breaker configuration
4. How external errors are translated to internal domain errors
5. Whether the external API contract is hidden behind an interface
[paste client source code here]
```
G.5 - Build / package manifest
```
Analyze this build file. Extract:
1. Language version and target runtime/platform
2. All major framework and library dependencies with versions
3. Any build plugins or tools that affect code generation or behavior
4. Module/package structure if multi-module
5. Any outdated, deprecated, or conflicting dependencies
[paste build file contents here]
```
G.6 - Configuration / IoC wiring file
```
Analyze this configuration file. Extract:
1. What components/beans/services are registered
2. What environment-specific values are present vs externalised
3. Any security-sensitive config (credentials, secrets, connection strings)
4. Cross-cutting concerns configured here (logging, auth, caching, tracing)
5. What this config tells us about the intended architecture
[paste config file here]
```
Section 2 - Synthesis Prompts (Pass 3)
G.7 - Derive architecture from module summaries
```
Given these module summaries, apply the following compression rules:
- Collapse modules/classes into capabilities (not names)
- Collapse endpoints into use cases
- Collapse tables/collections into domain concepts
- Collapse integrations into roles

Then produce:
1. Architecture style (one of: layered monolith / modular monolith / microservices / event-driven / serverless / transitional)
2. Component diagram (ASCII or Mermaid) with 5–12 components max
3. Primary request flows as numbered step sequences
4. Key design patterns identified
5. Top 5 architectural debt items ranked by risk × effort

Format as an architecture overview document with explicit diagrams.
[paste module summaries here]
```
G.8 - Generate Mermaid sequence diagram
```
Based on these component summaries, generate a Mermaid sequence diagram
for the [flow name] flow.
Include every hop: entry point → middleware → handler → service(s) →
data access → external calls → response.
For each hop show:
- The component name
- The operation called
- Whether it is synchronous or async (use -->> for async)
- Where the consistency boundary begins and ends (add a note)
- What happens on failure at this hop (add alt block)
[paste component summaries here]
```
G.9 - Generate component dependency diagram (Mermaid)
```
Based on these module summaries, generate a Mermaid graph TD component
diagram showing:
1. All components as nodes (5–12 max, capability-named)
2. Dependency arrows (A --> B means A depends on B)
3. External systems as a different node shape
4. Group components by layer using subgraph blocks
Label each arrow with the relationship type: calls | reads | writes | emits | consumes
[paste module summaries here]
```
Section 3 - Decision & Documentation Prompts
G.10 - Generate an ADR from a known decision
```
I need to write an Architecture Decision Record for the following decision:

Decision context: [describe the situation that forced the decision]
What was decided: [state the decision in one sentence]
When it was made: [date or approximate period]
Constraints at the time: [technical, resource, time, or org constraints]
Alternatives that were considered: [list them]

Generate a complete ADR using this structure:
1. Title (ADR-NNN: [imperative verb + subject])
2. Status: Accepted
3. Context (2–4 sentences - what was true at the time that forced this decision)
4. Decision (1–2 sentences, direct - no hedging)
5. Alternatives considered (table: option | why rejected)
6. Consequences (positive / negative / risks)
7. Review date (suggest a specific date based on the decision type)

Write it as if you are the engineer who made the decision, in past tense.
Be specific - avoid generic phrases like "to improve performance."
```
G.11 - Score anti-patterns on the risk × effort matrix
```
Below is a list of anti-patterns found in a codebase analysis.
Score each one on two dimensions and assign a decision.

Risk score (1–5):
  5 = Data loss, security breach, or production outage likely
  4 = Significant user impact or revenue loss possible
  3 = Degraded performance or increased incident frequency
  2 = Developer friction - slows delivery, doesn't break anything
  1 = Cosmetic / style - no functional impact

Effort score (1–5):
  5 = Architectural overhaul - months
  4 = Multi-sprint refactor - weeks, multiple teams
  3 = Single-sprint fix - 1–2 weeks, one team
  2 = Days of work - targeted, low risk
  1 = Hours - config change or single file

Decision rules:
  High risk + Low effort → Fix Now
  High risk + Med effort → Fix Next Sprint
  High risk + High effort → Plan & Track
  Med risk + Low effort → Fix When Passing
  Med risk + Med effort → Backlog With Date
  Low/Med + High effort → Accept & Document
  Low risk + Any effort → Skip

For each item produce a table row:
  Item | Risk (score + one-sentence justification) | Effort (score + justification) | Decision

Anti-patterns to score:
[paste your checklist findings here]
```
G.12 - Translate debt list for CTO / executive audience
```
Below is a technical architectural debt list from an engineering analysis.
Rewrite it for a CTO / VP Engineering audience using these rules:

1. Replace every technical term with its business impact
   Examples:
   - "N+1 query" → "database asks the same question N times under load - causes slowdowns at peak"
   - "God service" → "one component doing too many things - a bottleneck and single point of failure"
   - "No circuit breaker" → "if payment provider goes down, we go down with it - no automatic fallback"
2. Frame each item as:
   [What it is in plain language] → [What happens if we don't fix it] → [What fixing it enables]
3. Group items into three tiers:
   - Immediate risk (production incidents or security exposure)
   - Delivery friction (slowing feature development)
   - Strategic constraints (limiting future scaling or modernization)
4. For each item provide effort in weeks, not story points
5. End with:
   - Recommended priority order (top 3 with business case)
   - Total estimated remediation cost (engineering weeks)
   - "If we do nothing" scenario in 12 months

Technical debt list:
[paste your anti-pattern checklist results here]
```
G.13 - Generate executive architecture brief (1-pager)
```
Based on this architecture analysis, produce a 1-page executive brief
for a CTO / VP Engineering audience.

Structure it exactly as:
1. What this system does (2 sentences, business language only)
2. Current architecture (1 sentence naming the style + 1 key diagram reference)
3. Current constraints (3 bullets, each: constraint → business impact)
4. Options (2–3 options, each: name | effort | risk | what it enables)
5. Recommendation (1 option with 2-sentence rationale)
6. What we need from you (one specific decision or resource)

Rules:
- No technical jargon
- No code references
- Each bullet max 2 lines
- Total length: fits on one A4 page

Architecture analysis input:
[paste your architecture doc sections 01, 10, 11 here]
```
Section 4 - Validation Prompts
G.14 - Validate sequence diagram against logs
```
I have a documented sequence diagram for [flow name]:
[paste your sequence diagram here]

And the following log excerpt from a production trace of the same flow:
[paste log excerpt here]

Compare them. For each hop in the sequence diagram:
1. Does it appear in the logs? (yes / no / partially)
2. Are there log entries for hops NOT in the diagram?
3. Does the ordering match?
4. Do the error paths in the diagram match what the logs show?

Produce four lists:
- Verified steps (diagram matches log)
- Discrepancies (diagram says X, log shows Y - explain each)
- Missing from diagram (in logs but not documented)
- Recommended diagram updates
```
G.15 - Identify architecture drift between two doc versions
```
I have two versions of an architecture document.

Version A (older - the baseline):
[paste older component diagram / capability table here]

Version B (current - what we believe is true now):
[paste current component diagram / capability table here]

Identify:
1. Components added (in B, not in A)
2. Components removed (in A, not in B)
3. Dependency changes (arrows that changed direction or were added/removed)
4. Responsibility changes (same component, different scope)
5. Architecture style changes (if the overall style shifted)

For each change, assess:
- Is this intentional (planned evolution) or drift (unplanned accumulation)?
- Does it require a new ADR?
- Does it require updating sequence diagrams?
```
G.16 - Review architecture against failure scenario
```
Here is my architecture documentation for [system name]:
[paste component diagram + key sequence diagrams here]

Walk through this failure scenario:
Scenario: [describe the failure - e.g., "Primary database becomes unavailable for 5 minutes"]

For each component in the architecture:
1. Is it directly affected? (yes / no / cascades from another)
2. What is the user-visible impact?
3. Does the architecture documentation show a fallback or circuit breaker?
4. Does the documented behavior match what would actually happen?

Produce:
- Impact map (which components fail, cascade, or survive)
- Documentation gaps (what the doc doesn't answer about this scenario)
- Recommended additions to sequence diagrams or deployment section
```
Section 5 - Prompt Chaining Guide
These prompts are designed to be chained. Here is the recommended sequence for a full analysis:
- Day 1 - Structure (Pass 1): G.5 (build file) → G.6 (config) → G.3 (data access) → G.1 (services)
- Day 2 - Behavior (Pass 2): G.2 (handlers) → G.4 (integrations) → G.8 (sequence diagrams)
- Day 3 - Abstraction (Pass 3): G.7 (derive architecture) → G.9 (component diagram) → G.11 (score debt)
- Day 4 - Documentation: G.10 (ADRs for top 3 decisions) → G.12 (exec translation) → G.13 (brief)
- Day 5 - Validation: G.14 (validate sequences vs logs) → G.16 (failure scenarios) → refine

Chaining rule: Always feed the output of one prompt as context into the next. Never start a new analysis prompt from scratch - accumulated context is what makes LLM-assisted architecture work.
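The chaining rule reduces to a simple fold over the prompt sequence, with every prior output fed back in as context. A minimal sketch; `call_llm` is a placeholder stub, not a real client API - swap in whichever LLM SDK you use:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real model call - replace with your LLM client."""
    return f"<analysis of {prompt.splitlines()[0]}>"

def run_chain(prompts):
    """Run prompts in order, feeding every prior output back in as context."""
    context = ""
    outputs = []
    for name, template in prompts:
        full_prompt = f"{name}\n{template}\n\nAccumulated context:\n{context}"
        out = call_llm(full_prompt)
        outputs.append((name, out))
        context += f"\n--- {name} ---\n{out}"  # chaining rule: never start from scratch
    return outputs

chain = [
    ("G.5", "Analyze this build file..."),
    ("G.6", "Analyze this configuration file..."),
    ("G.7", "Given these module summaries..."),
]
results = run_chain(chain)
print([name for name, _ in results])  # → ['G.5', 'G.6', 'G.7']
```

In practice you would also truncate or summarise `context` as it grows - which is exactly the context window concern addressed next.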
Context window management: If your analysis spans more than ~20 modules, split into two chains: one for the service/orchestration layer, one for the data/integration layer. Merge outputs at the G.7 synthesis step.
References
Foundational Architecture Literature
- Bass, L., Clements, P., & Kazman, R. (2012). Software Architecture in Practice (3rd ed.). Addison-Wesley. - The foundational text on architecture documentation and the ADD (Attribute-Driven Design) method. The quality attribute approach underpins the NFR section of this guide.
- Fowler, M. (2002). Patterns of Enterprise Application Architecture. Addison-Wesley. - Repository, Service Layer, Data Mapper, and Unit of Work patterns that appear throughout the Java and .NET appendices.
- Evans, E. (2003). Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley. - The aggregate root, bounded context, and domain concept vocabulary used in Compression Rule 3 (collapse tables into domain concepts).
- Newman, S. (2021). Building Microservices (2nd ed.). O'Reilly. - Service decomposition, inter-service contracts, and the distributed systems anti-patterns in the microservices callout.
- Richardson, C. (2018). Microservices Patterns. Manning. - Saga pattern, circuit breaker, and API Gateway patterns referenced in the microservices and serverless anti-pattern checklists.
Architecture Decision Records
- Nygard, M. (2011). Documenting Architecture Decisions. cognitect.com/blog/2011/11/15/documenting-architecture-decisions - The original ADR format proposal. The ADR template in this guide is adapted directly from Nygard's structure.
- Keeling, M. (2017). Design It! From Programmer to Software Architect. Pragmatic Bookshelf. - The "just enough architecture" philosophy that shaped the Minimum Viable Architecture Doc section.
Technical Debt and Code Quality
- Cunningham, W. (1992). The WyCash Portfolio Management System (OOPSLA '92). - The original technical debt metaphor. The debt register and risk×effort matrix extend this framing into a prioritisation tool.
- Kerievsky, J. (2004). Refactoring to Patterns. Addison-Wesley. - The pattern recognition approach used in Pass 3 design pattern identification.
- Feathers, M. (2004). Working Effectively with Legacy Code. Prentice Hall. - The legacy codebase characterization that informed the anti-pattern checklist and the actual-vs-intended architecture gap analysis.
Validation and Observability
- Humble, J., & Farley, D. (2010). Continuous Delivery. Addison-Wesley. - The pipeline and validation gate structure that influenced the 5-gate validation loop.
- Beyer, B., Jones, C., Petoff, J., & Murphy, N.R. (Eds.) (2016). Site Reliability Engineering. O'Reilly. - The failure scenario validation approach and the "error budget" framing referenced in Gate 4.
Stack-Specific References
Java / Spring / JEE
- Walls, C. (2022). Spring in Action (6th ed.). Manning.
- Johnson, R. et al. (2004). Expert One-on-One J2EE Design and Development. Wrox. - Session Facade, Transfer Object, and DAO patterns in Appendix A.
Python / Django / FastAPI
- Percival, H., & Gregory, B. (2020). Architecture Patterns with Python. O'Reilly. - Repository pattern and dependency injection in Python, directly referenced in Appendix B.
Node.js
- Casciaro, M., & Mammino, L. (2016). Node.js Design Patterns (2nd ed.). Packt. - Event-driven patterns and middleware chains referenced in Appendix C.
.NET / C#
- Microsoft Docs. Application Architecture Guide. learn.microsoft.com - MediatR, CQRS, and Clean Architecture patterns in Appendix D.
Go
- Butcher, M. (2016). Go Design Patterns. Packt. - Interface-at-point-of-use and dependency inversion patterns in Appendix F.
AI-Assisted Development
- White, J. et al. (2023). A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. arXiv:2302.11382. - Prompt structuring principles underlying the AI Prompt Library in Appendix G.
- Khattab, O. et al. (2023). DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. arXiv:2310.03714. - The structured prompt chaining approach referenced in the Section 5 chaining guide.
Tooling Documentation
- jdeps (Java): docs.oracle.com/en/java/javase/17/docs/specs/man/jdeps.html
- ArchUnit: archunit.org/userguide/html/000_Index.html
- jQAssistant: jqassistant.org/get-started
- pydeps: pydeps.readthedocs.io
- depcruise: github.com/sverweij/dependency-cruiser
- madge: github.com/pahen/madge
- NDepend: ndepend.com/docs
- Bullet (Rails N+1): github.com/flyerhzm/bullet
- golangci-lint: golangci-lint.run/usage/quick-start
- OpenTelemetry: opentelemetry.io/docs
- AsyncAPI Specification: asyncapi.com/docs/specifications/v3.0.0
- OpenAPI Specification: spec.openapis.org/oas/v3.1.0
Further Reading on This Blog
- MCP vs RAG vs Tools: When to Use Each (and When Not To) - Architectural decision-making for AI systems; the same "which control flow matches my problem?" framing applies to codebase architecture choices.
- Building Production-Ready AI Agents with LangGraph - State machine patterns and observability approaches directly applicable to the async boundary documentation in Pass 2.
- Asynchronous Processing and Message Queues in Agentic AI Systems - The async boundary and queue architecture patterns referenced in the event-driven callout.
- Inside the LLM Inference Engine - Inference architecture documentation as a worked example of the component diagram + sequence diagram approach.