The Problem: Agent Loops Are State Machines Pretending to Be Intelligent
Your agent architecture looks like this: a loop that generates thoughts, calls tools, observes results, and repeats until some termination condition. You've probably implemented ReAct or something similar. The model generates "Thought: I need to check the database" followed by "Action: query_database" followed by "Observation: Here are 50 rows" and the cycle continues.
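Concretely, the whole pattern fits in a dozen lines. Here is a minimal sketch of that loop (the llm, tools, and parse_action arguments are hypothetical stand-ins, not any particular framework's API); note that every iteration is a full model inference, and every failure path is a branch you now own:

def react_loop(llm, tools, parse_action, user_query, max_iterations=10):
    """Hypothetical ReAct-style loop: every decision below is a model inference."""
    history = [f"Question: {user_query}"]
    for _ in range(max_iterations):
        step = llm.generate("\n".join(history))    # inference: thought plus action choice
        history.append(step)
        if "Final Answer:" in step:
            return step
        tool_name, tool_args = parse_action(step)  # the model picked the tool and parameters
        try:
            observation = tools[tool_name](**tool_args)
        except Exception as exc:
            observation = f"Error: {exc}"          # retry? switch tools? give up?
        history.append(f"Observation: {observation}")
    return "Stopped: hit max_iterations"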
This works fine in demos. In production, it's a maintenance nightmare that fails in predictable ways.
The fundamental issue is architectural confusion. You've built a state machine with complex transition logic, but you're pretending it's just a "smart loop." The model decides when to call tools, which tools to call, what parameters to use, when to stop looping, and how to handle failures. That's not a loop—that's distributed control flow with an LLM as the router.
This architecture collapses under three production realities. First, every decision the model makes adds latency and cost. A simple task requiring five tool calls means six LLM invocations (initial reasoning plus five decision points). Second, error handling becomes exponentially complex. When a tool fails on iteration three of seven, what happens? Does the model retry? Switch strategies? Give up? You end up with retry logic, backoff strategies, and circuit breakers scattered across your codebase. Third, observability is nearly impossible. When something goes wrong, you're debugging a conversation transcript, not a data flow graph.
I've seen teams spend months building increasingly sophisticated agent loops, adding error handlers for error handlers, implementing custom retry strategies, building prompt templates that try to constrain model behavior, and wondering why their agents are slow, expensive, and unreliable.
The root cause: they're using control flow patterns (loops, branches, retries) for what is fundamentally a context management problem. Agents don't need to orchestrate tool execution. They need access to relevant context at generation time. The difference matters because it determines every architectural decision downstream.
Model Context Protocol forces a different mental model. Instead of "agent decides what to do, then does it," MCP pushes toward "agent generates with all relevant context already available." This sounds subtle. It's not. It changes everything about how you structure agentic systems.
The Mental Model: Context Fabric vs. Control Flow
Stop thinking of agents as sequential reasoners that execute actions in a loop. Start thinking of them as context-aware generators operating over a dynamic information graph.
Traditional agent architectures are imperative: the model executes a sequence of operations. MCP architectures are declarative: the model generates from available context, and that context is dynamically assembled from distributed sources.
The key abstraction: MCP treats context as first-class infrastructure, not as something the agent manages.
In a ReAct loop, the agent maintains conversation history in a list, decides when to fetch more information, calls tools to get that information, and appends results to the history. The agent is responsible for context management. When context gets too large, the agent must decide what to drop. When context is missing, the agent must figure out how to get it. Every context decision is a model inference.
In MCP architectures, context assembly happens outside the model. The agent declares what context types it might need (databases, file systems, APIs). The MCP layer ensures relevant context is available at generation time. The model doesn't fetch context—it generates with context.
This inversion changes failure modes completely. In loops, failures are model decisions gone wrong: it called the wrong tool, used wrong parameters, or got stuck retrying. In MCP systems, failures are infrastructure problems: context wasn't available, sources timed out, or permissions were denied. The first is debugging conversation transcripts. The second is debugging distributed systems.
The second key abstraction: MCP separates control decisions from context decisions.
ReAct agents make both simultaneously. "I need information" (context decision) and "I'll call this tool with these parameters" (control decision) happen in the same inference. This coupling is expensive and fragile.
MCP architectures decouple them. Control flow is deterministic: given a user query and agent type, the system knows what context providers to activate. Context assembly is dynamic: those providers fetch current information. The model only makes content decisions: what to say given available context.
Think of it as moving from imperative programming to dataflow programming. Instead of "first do this, then do that, check if it worked, maybe do something else," you have "here are all the relevant data sources, assemble context from them, generate response."
Architecture: From Sequential Loops to Parallel Context Assembly
Traditional ReAct Agent Architecture
This is a state machine. The agent cycles through states: thinking, acting, observing. State transitions are model decisions. Every transition costs tokens and latency.
Component responsibilities in loop architecture:
The agent loop manages state transitions, conversation history, retry logic, and termination conditions. This is hundreds of lines of orchestration code.
The model makes every decision: thought generation, tool selection, parameter extraction, result interpretation, and completion detection. Five decision types, all requiring inference.
Tools are passively invoked. They don't know they're part of an agent system. They're just functions that get called.
State lives in conversation history, growing linearly with every thought-action-observation cycle. Context window pressure builds quickly.
MCP-Based Agent Architecture
This is parallel context assembly followed by generation. No loops. No model-driven state transitions. Deterministic routing and context gathering.
Component responsibilities in MCP architecture:
The routing layer determines which context providers are relevant. This is deterministic logic, not model inference. For a code question, activate filesystem and git providers. For a data question, activate database providers.
MCP servers actively provide context. They're not passive functions—they're context sources that advertise capabilities, handle requests, and manage their own state.
The context assembly layer queries MCP servers in parallel, aggregates responses, and constructs the context the model will see. This happens once per generation, not repeatedly in a loop.
The model makes one decision: what content to generate given assembled context. Not "should I call a tool" or "which tool should I call"—just "what's the answer."
State is distributed across MCP servers. The agent itself is nearly stateless. This changes scaling properties fundamentally.
Hybrid Architecture: LangGraph with MCP Context
This combines LangGraph's explicit control flow with MCP's context assembly. You get the benefits of both: deterministic planning for complex tasks, parallel context gathering for efficiency.
When this matters:
LangGraph alone: you're building state machines with complex transitions. Good for workflows that need explicit branching, error recovery, or multi-agent coordination.
MCP alone: you're optimizing for minimal latency and cost by eliminating control loops. Good for tasks where relevant context is predictable.
LangGraph + MCP: you need workflow control but want efficient context assembly within each step. Most production systems end up here.
Implementation: Building MCP-First Agent Systems
Pattern 1: Stateless Context-Aware Agents
The simplest MCP architecture eliminates the agent loop entirely.
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
import asyncio


@dataclass
class MCPContextProvider:
    """
    MCP server wrapper that provides domain-specific context.
    """
    name: str
    server_url: str
    capabilities: List[str]

    async def get_context(
        self,
        query: str,
        context_type: str
    ) -> Dict[str, Any]:
        """
        Fetch relevant context for query.
        This happens in parallel with other providers.
        """
        # In production: actual MCP protocol calls
        # For now: interface definition
        pass


class ContextRouter:
    """
    Deterministic routing to MCP providers.
    Zero model inference required.
    """
    def __init__(self):
        self.providers: Dict[str, MCPContextProvider] = {}
        # Route rules are explicit, not learned
        self.route_rules = {
            "code": ["filesystem", "git", "documentation"],
            "data": ["database", "api", "cache"],
            "research": ["web_search", "documentation", "knowledge_base"]
        }

    def route(self, query: str, query_type: str) -> List[MCPContextProvider]:
        """
        Determine which providers to activate.
        This is rule-based, fast, and debuggable.
        """
        provider_names = self.route_rules.get(query_type, [])
        return [
            self.providers[name]
            for name in provider_names
            if name in self.providers
        ]


class MCPAgent:
    """
    Agent that generates from assembled MCP context.
    No loops. No tool orchestration. Just context + generation.
    """
    def __init__(
        self,
        llm_client,
        context_router: ContextRouter,
        max_context_tokens: int = 8000
    ):
        self.llm = llm_client
        self.router = context_router
        self.max_context_tokens = max_context_tokens

    async def generate_response(
        self,
        query: str,
        query_type: str
    ) -> Dict[str, Any]:
        """
        Single-pass generation with assembled context.
        No loops, no retries, no model-driven control flow.
        """
        # Step 1: Route to providers (deterministic, fast)
        providers = self.router.route(query, query_type)

        # Step 2: Fetch context in parallel
        context_tasks = [
            provider.get_context(query, query_type)
            for provider in providers
        ]
        contexts = await asyncio.gather(*context_tasks)

        # Step 3: Assemble and truncate context
        assembled_context = self._assemble_context(contexts)

        # Step 4: Generate (single inference)
        response = await self.llm.generate(
            query=query,
            context=assembled_context,
            max_tokens=1000
        )

        return {
            "response": response,
            "context_sources": [p.name for p in providers],
            "context_size": len(assembled_context),
            "model_calls": 1  # This is the key metric
        }

    def _assemble_context(
        self,
        contexts: List[Dict[str, Any]]
    ) -> str:
        """
        Combine contexts from multiple sources.
        Apply token budget across sources.
        """
        # Priority ordering: some sources more important than others
        prioritized = sorted(
            contexts,
            key=lambda c: c.get("priority", 0),
            reverse=True
        )

        assembled = []
        token_count = 0

        for ctx in prioritized:
            content = ctx.get("content", "")
            tokens = self._estimate_tokens(content)

            if token_count + tokens <= self.max_context_tokens:
                assembled.append(f"Source: {ctx['source']}\n{content}")
                token_count += tokens
            else:
                # Truncate last source to fit budget
                remaining = self.max_context_tokens - token_count
                truncated = self._truncate_to_tokens(content, remaining)
                assembled.append(f"Source: {ctx['source']}\n{truncated}")
                break

        return "\n\n---\n\n".join(assembled)

    def _estimate_tokens(self, text: str) -> int:
        # Rough heuristic (~4 characters per token); swap in a real tokenizer if available
        return len(text) // 4

    def _truncate_to_tokens(self, text: str, max_tokens: int) -> str:
        # Truncate using the same character-based heuristic
        return text[: max(max_tokens, 0) * 4]
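A usage sketch, assuming the classes above, an async llm_client that exposes the generate(query=..., context=..., max_tokens=...) method MCPAgent calls, and providers whose get_context is wired to real MCP servers rather than the interface stub; the URLs, provider names, and sample query are placeholders:

import asyncio

async def main(llm_client):
    router = ContextRouter()
    router.providers["filesystem"] = MCPContextProvider(
        name="filesystem",
        server_url="http://localhost:8081",   # placeholder URL
        capabilities=["read_file", "list_dir"],
    )
    router.providers["git"] = MCPContextProvider(
        name="git",
        server_url="http://localhost:8082",   # placeholder URL
        capabilities=["log", "diff"],
    )

    agent = MCPAgent(llm_client, router)
    result = await agent.generate_response(
        "Why does the import in src/auth.py fail?",
        query_type="code",
    )
    print(result["model_calls"], result["context_sources"])

# asyncio.run(main(llm_client)) once you have a client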
Production considerations:
This architecture has exactly one model call per response. That's the point. Every additional call in a traditional agent loop adds 200-2000ms of latency and proportional cost. Eliminating loops isn't optimization—it's a different architecture.
Context assembly happens in parallel. If you have three MCP providers and each takes 100ms, total context gathering is 100ms, not 300ms. This matters at scale.
Failures are infrastructure failures. If an MCP server is down, you get a partial context error, not a model that keeps retrying. This is easier to monitor and debug.
The trade-off: you can't handle complex multi-step tasks. For "analyze this dataset and send results via email," you need explicit orchestration. MCP-first agents work for single-step generation with rich context.
Pattern 2: LangGraph State Machines with MCP Context
For complex tasks requiring explicit control flow, combine LangGraph's state management with MCP's context assembly.
from langgraph.graph import StateGraph, END
from typing import TypedDict, List, Dict, Any, Optional
import asyncio
import json


class AgentState(TypedDict):
    """
    State object passed through LangGraph nodes.
    """
    query: str
    complexity: Optional[str]
    plan: Optional[List[str]]
    current_step: int
    step_results: List[Dict[str, Any]]
    final_response: Optional[str]
    error: Optional[str]


class MCPLangGraphAgent:
    """
    LangGraph state machine with MCP context at each node.
    Control flow is explicit, context assembly is MCP-based.
    """
    def __init__(
        self,
        llm_client,
        mcp_providers: Dict[str, MCPContextProvider]
    ):
        self.llm = llm_client
        self.mcp_providers = mcp_providers
        self.graph = self._build_graph()

    def _build_graph(self) -> StateGraph:
        """
        Define explicit state transitions.
        This is your control flow—make it deterministic.
        """
        workflow = StateGraph(AgentState)

        # Nodes perform specific operations
        workflow.add_node("analyze_query", self._analyze_query)
        workflow.add_node("generate_plan", self._generate_plan)
        workflow.add_node("execute_step", self._execute_step)
        workflow.add_node("aggregate_results", self._aggregate_results)
        workflow.add_node("handle_error", self._handle_error)

        # Edges define transitions
        workflow.set_entry_point("analyze_query")
        workflow.add_conditional_edges(
            "analyze_query",
            self._should_plan,
            {
                "simple": "execute_step",
                "complex": "generate_plan"
            }
        )
        workflow.add_edge("generate_plan", "execute_step")
        workflow.add_conditional_edges(
            "execute_step",
            self._check_completion,
            {
                "continue": "execute_step",
                "done": "aggregate_results",
                "error": "handle_error"
            }
        )
        workflow.add_edge("aggregate_results", END)
        workflow.add_edge("handle_error", END)

        return workflow.compile()

    async def _analyze_query(self, state: AgentState) -> AgentState:
        """
        Determine query complexity.
        This is fast heuristics, not full planning.
        """
        query = state["query"]
        # Simple rules for query classification
        complexity = "simple" if len(query.split()) < 20 else "complex"
        return {**state, "complexity": complexity}

    async def _generate_plan(self, state: AgentState) -> AgentState:
        """
        Use MCP context to generate execution plan.
        """
        query = state["query"]

        # Fetch planning context from MCP
        planning_context = await self._fetch_mcp_context(
            query,
            ["documentation", "examples"]
        )

        # Generate plan (single LLM call)
        plan_prompt = f"""Given this query: {query}

Available context:
{planning_context}

Generate a step-by-step execution plan.
Return as JSON: {{"steps": ["step 1", "step 2", ...]}}"""

        plan_response = await self.llm.generate(plan_prompt)
        plan = self._parse_plan(plan_response)

        return {**state, "plan": plan, "current_step": 0}

    async def _execute_step(self, state: AgentState) -> AgentState:
        """
        Execute current step with MCP context.
        """
        # Simple queries skip planning; treat the query itself as a one-step plan
        plan = state.get("plan") or [state["query"]]
        current_step = state["current_step"]

        if current_step >= len(plan):
            return state

        step = plan[current_step]

        # Fetch context relevant to this step
        step_context = await self._fetch_mcp_context(
            step,
            self._determine_providers_for_step(step)
        )

        # Execute step (single LLM call)
        step_result = await self.llm.generate(
            query=step,
            context=step_context
        )

        # Update state
        step_results = state.get("step_results", [])
        step_results.append({
            "step": step,
            "result": step_result,
            "context_sources": list(step_context.keys())
        })

        return {
            **state,
            "plan": plan,
            "current_step": current_step + 1,
            "step_results": step_results
        }

    async def _fetch_mcp_context(
        self,
        query: str,
        provider_names: List[str]
    ) -> Dict[str, Any]:
        """
        Parallel context fetch from specified MCP providers.
        """
        providers = [
            self.mcp_providers[name]
            for name in provider_names
            if name in self.mcp_providers
        ]

        tasks = [
            provider.get_context(query, "general")
            for provider in providers
        ]
        contexts = await asyncio.gather(*tasks, return_exceptions=True)

        # Handle failures gracefully
        result = {}
        for provider, context in zip(providers, contexts):
            if isinstance(context, Exception):
                result[provider.name] = {"error": str(context)}
            else:
                result[provider.name] = context

        return result

    # --- Minimal stand-in helpers to keep the example self-contained ---

    async def _aggregate_results(self, state: AgentState) -> AgentState:
        """Combine step results into the final response (single LLM call)."""
        summary = "\n\n".join(
            f"{r['step']}:\n{r['result']}" for r in state.get("step_results", [])
        )
        final = await self.llm.generate(
            query=state["query"],
            context=summary
        )
        return {**state, "final_response": final}

    async def _handle_error(self, state: AgentState) -> AgentState:
        """Terminal error node: surface the failure explicitly instead of retrying."""
        return {**state, "final_response": None}

    def _parse_plan(self, plan_response: str) -> List[str]:
        """Parse the JSON plan; fall back to a single-step plan on malformed output."""
        try:
            return json.loads(plan_response)["steps"]
        except (json.JSONDecodeError, KeyError, TypeError):
            return [plan_response]

    def _determine_providers_for_step(self, step: str) -> List[str]:
        """Simple keyword routing per step; replace with your own rules."""
        step_lower = step.lower()
        if any(kw in step_lower for kw in ("file", "code", "repo")):
            return ["filesystem", "git"]
        if any(kw in step_lower for kw in ("query", "table", "data")):
            return ["database", "api"]
        return ["documentation"]

    def _should_plan(self, state: AgentState) -> str:
        """Routing logic"""
        return state.get("complexity", "simple")

    def _check_completion(self, state: AgentState) -> str:
        """Check if all steps completed"""
        if state.get("error"):
            return "error"

        plan = state.get("plan") or []
        current_step = state.get("current_step", 0)

        if current_step >= len(plan):
            return "done"
        return "continue"
Production considerations:
LangGraph gives you explicit state machines. Every transition is code you wrote, not a model decision. This is debuggable.
MCP context is fetched at each node, not accumulated in conversation history. This prevents context window overflow in long workflows.
Error handling is explicit. The graph has an error node. Failures route there deterministically, not based on model interpretation.
Cost is predictable. Count the nodes in your graph. Each node makes at most one model call per attempt, so the worst case is the number of nodes times your per-node retry limit. In ReAct loops, you never know.
The trade-off: more upfront design work. You must define the state machine. For well-understood workflows, this is better. For open-ended exploration, it's constraining.
Pattern 3: Hybrid Context Graphs
The pattern that works for most production systems: deterministic routing to MCP context, with optional LangGraph orchestration for complex cases.
class HybridAgent:
    """
    Default to MCP-first single-pass generation.
    Fall back to LangGraph for complex queries.
    """
    def __init__(
        self,
        mcp_agent: MCPAgent,
        langgraph_agent: MCPLangGraphAgent,
        complexity_threshold: float = 0.7
    ):
        self.mcp_agent = mcp_agent
        self.langgraph_agent = langgraph_agent
        self.complexity_threshold = complexity_threshold

    async def generate_response(
        self,
        query: str
    ) -> Dict[str, Any]:
        """
        Route based on query complexity.
        Simple queries get fast MCP path.
        Complex queries get orchestrated LangGraph path.
        """
        # Fast complexity heuristic
        complexity = self._estimate_complexity(query)

        if complexity < self.complexity_threshold:
            # Fast path: single-pass MCP generation
            return await self.mcp_agent.generate_response(
                query,
                query_type=self._classify_query(query)
            )
        else:
            # Complex path: LangGraph orchestration
            state = await self.langgraph_agent.graph.ainvoke({
                "query": query,
                "complexity": None,
                "plan": None,
                "current_step": 0,
                "step_results": [],
                "final_response": None,
                "error": None
            })

            return {
                "response": state["final_response"],
                "steps_executed": len(state.get("step_results", [])),
                "model_calls": len(state.get("step_results", [])) + 1
            }

    def _estimate_complexity(self, query: str) -> float:
        """
        Fast heuristics for complexity.

        Simple queries:
        - Single question
        - Lookup operation
        - Direct information request

        Complex queries:
        - Multiple steps implied
        - Requires analysis + action
        - Conditional logic ("if X then Y")
        """
        indicators = {
            "multi_step": ["then", "after that", "next", "finally"],
            "conditional": ["if", "unless", "when", "depending on"],
            "analysis": ["analyze", "compare", "evaluate", "determine"]
        }

        query_lower = query.lower()
        score = 0.0

        for category, keywords in indicators.items():
            if any(kw in query_lower for kw in keywords):
                score += 0.3

        # Length is a factor
        word_count = len(query.split())
        if word_count > 50:
            score += 0.2

        return min(score, 1.0)

    def _classify_query(self, query: str) -> str:
        # Minimal stand-in that maps onto ContextRouter's route rule keys
        query_lower = query.lower()
        if any(kw in query_lower for kw in ("code", "function", "bug", "error")):
            return "code"
        if any(kw in query_lower for kw in ("table", "rows", "metric", "database")):
            return "data"
        return "research"
This is the architecture I recommend for most production systems. Start with the fast path. Reserve orchestration for when you need it.
Pitfalls & Failure Modes
Over-Reliance on Context Assembly
Teams assume MCP context will contain everything needed. It won't.
Symptom: Agents generate correct responses for common queries but fail on edge cases because relevant context wasn't fetched.
Why it happens: Context routing rules are too narrow. You specified "database" provider for data queries, but didn't anticipate queries needing both database and API context.
Detection: Monitor response quality by query type. If certain query patterns consistently fail, context assembly is incomplete.
Prevention: Broad context routing initially. Narrow based on observed access patterns. Better to fetch unused context than miss critical information.
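One way to implement that feedback loop, sketched with hypothetical names: fetch broadly, record which providers actually contributed usable content per query type, and narrow the route rules from that data rather than guesswork.

from collections import Counter, defaultdict

class RoutingUsageTracker:
    """Counts, per query type, which providers returned usable content."""
    def __init__(self):
        self.fetched = defaultdict(Counter)
        self.contributed = defaultdict(Counter)

    def record(self, query_type, provider_name, content):
        self.fetched[query_type][provider_name] += 1
        if content and not content.get("error"):
            self.contributed[query_type][provider_name] += 1

    def contribution_rate(self, query_type):
        # Providers with a consistently low rate are candidates for removal
        # from that query type's route rule
        return {
            name: self.contributed[query_type][name] / count
            for name, count in self.fetched[query_type].items()
        }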
Context Window Overflow in Orchestrated Flows
LangGraph workflows accumulate context across steps, hitting token limits.
Symptom: Workflows fail midway with context length errors. Early steps succeed, later steps fail.
Why it happens: Each step adds to accumulated context. Step five sees results from steps 1-4 plus new MCP context, exceeding limits.
Detection: Track context size per step. Alert when approaching model limits (8k, 16k, etc.).
Prevention: Context summarization between steps. Each step receives summarized results from previous steps, not full transcripts.
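A sketch of that summarization step, assuming the same hypothetical async llm client used in the patterns above; the prompt wording and the 500-token budget are arbitrary choices, not fixed values. The next step then receives this summary plus fresh MCP context instead of the full transcript.

async def summarize_step_results(llm, step_results, max_tokens=500):
    """Collapse prior step results into a short summary before the next step runs.

    Keeps the workflow's context footprint roughly constant instead of
    growing linearly with the number of steps.
    """
    if not step_results:
        return ""
    transcript = "\n\n".join(
        f"Step: {r['step']}\nResult: {r['result']}" for r in step_results
    )
    return await llm.generate(
        f"Summarize the following intermediate results in under "
        f"{max_tokens} tokens, keeping only facts needed for later steps:\n\n"
        f"{transcript}"
    )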
MCP Provider Latency Cascades
Parallel context assembly is fast until one provider is slow.
Symptom: Median latency is good (200ms) but p99 is terrible (5s+). Users experience inconsistent response times.
Why it happens: Parallel context fetch waits for slowest provider. One slow database query blocks everything.
Detection: Per-provider latency monitoring. Identify which providers have high p99.
Prevention: Aggressive timeouts per provider (500ms max). Partial context generation is better than waiting forever. Mark incomplete context in response.
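A sketch of that timeout wrapper, using the provider interface from Pattern 1 and the 500ms budget mentioned above; the caller can include the list of incomplete sources in the response so users know the context was partial.

import asyncio

async def fetch_with_timeout(provider, query, context_type, timeout_s=0.5):
    """Fetch from one provider; return a marked placeholder on timeout or error."""
    try:
        context = await asyncio.wait_for(
            provider.get_context(query, context_type), timeout=timeout_s
        )
        return {**(context or {}), "source": provider.name, "complete": True}
    except asyncio.TimeoutError:
        return {"source": provider.name, "complete": False, "error": "timeout"}
    except Exception as exc:
        return {"source": provider.name, "complete": False, "error": str(exc)}

async def gather_partial_context(providers, query, context_type, timeout_s=0.5):
    results = await asyncio.gather(
        *(fetch_with_timeout(p, query, context_type, timeout_s) for p in providers)
    )
    incomplete = [r["source"] for r in results if not r["complete"]]
    # Surface the incomplete sources so the final response can be marked as partial
    return results, incomplete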
State Machine Explosion
LangGraph graphs become unmaintainable as edges multiply.
Symptom: Adding new features requires modifying dozens of transition conditions. Engineers become afraid to change the graph logic.
Why it happens: Every new workflow pattern adds nodes and edges. Graph grows organically without refactoring.
Detection: Node count over time. Edge complexity (number of conditional edges vs. simple edges).
Prevention: Hierarchical state machines. Sub-graphs for complex workflows. Standardized error handling patterns across all graphs.
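A sketch of the sub-graph approach, reusing AgentState from Pattern 2. It relies only on the fact that a LangGraph node can be any async callable over the state, so a compiled sub-graph can be wrapped and added to the parent graph as a single node; the node functions here are trivial stubs.

from langgraph.graph import StateGraph, END

async def gather_report_context(state: AgentState) -> AgentState:
    # Placeholder: fetch the MCP context needed for the report
    return state

async def draft_report(state: AgentState) -> AgentState:
    # Placeholder: single LLM call that drafts the report
    return state

def build_report_subgraph():
    """A self-contained sub-workflow with its own nodes and error handling."""
    sub = StateGraph(AgentState)
    sub.add_node("gather", gather_report_context)
    sub.add_node("draft", draft_report)
    sub.set_entry_point("gather")
    sub.add_edge("gather", "draft")
    sub.add_edge("draft", END)
    return sub.compile()

report_subgraph = build_report_subgraph()

async def run_report_flow(state: AgentState) -> AgentState:
    # The parent graph sees a single node; the complexity lives in the sub-graph
    return await report_subgraph.ainvoke(state)

# In the parent graph: workflow.add_node("report_flow", run_report_flow)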
False Simplicity From Eliminating Loops
Teams eliminate agent loops but don't handle cases that actually need iteration.
Symptom: Some queries require multiple rounds of context gathering, but the agent is single-pass. Users get incomplete responses and must ask follow-ups.
Why it happens: Overcommitment to "no loops" principle without considering task requirements.
Detection: Track multi-turn conversations. If users consistently need 3+ turns for tasks competitors handle in one, your agent is too simple.
Prevention: Hybrid architecture. Default to single-pass, but detect when iteration is needed and route to orchestrated flow.
Summary & Next Steps
MCP fundamentally changes agent architecture by treating context as infrastructure. Instead of agents orchestrating tool calls in loops, they generate from dynamically assembled context. This isn't just optimization—it's a different architectural pattern with different trade-offs.
The core insight: separating control flow from context assembly makes systems faster, cheaper, and more debuggable. You trade flexibility for predictability. In production, that's usually the right trade.
Three patterns dominate production MCP systems. Pure MCP agents for simple, predictable tasks where single-pass generation with rich context is sufficient. LangGraph orchestration for complex workflows requiring explicit state management and error handling. Hybrid architectures that route based on query complexity, defaulting to fast paths and escalating when needed.
Start here:
This week: Build a simple MCP-first agent. Pick one use case (code questions, data queries, document search). Implement deterministic routing to 2-3 MCP providers. Measure: how many queries can you handle in a single pass? What's your median latency?
Next sprint: Add complexity routing. Implement fast heuristics to detect complex queries. Route them to a simple LangGraph workflow with 3-4 nodes. Measure: what percentage of queries need orchestration? How does cost scale?
Within month: Instrument everything. Per-provider latency, context size distributions, model call counts, quality metrics. Build dashboards showing fast path vs. orchestrated path usage. Optimize based on data, not intuition.
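A minimal starting point for that instrumentation, as a sketch; the in-process lists and counters are a stand-in for whatever metrics backend you actually run.

import time
from collections import defaultdict

class AgentMetrics:
    """In-process counters to start with; swap in your real metrics backend later."""
    def __init__(self):
        self.provider_latency_ms = defaultdict(list)
        self.context_sizes = []
        self.model_calls = {"fast_path": 0, "orchestrated": 0}

    def time_provider(self, name):
        metrics = self
        class _Timer:
            def __enter__(self):
                self.start = time.perf_counter()
            def __exit__(self, *exc):
                elapsed_ms = (time.perf_counter() - self.start) * 1000
                metrics.provider_latency_ms[name].append(elapsed_ms)
        return _Timer()

    def record_generation(self, path, context_size, model_calls):
        self.context_sizes.append(context_size)
        self.model_calls[path] += model_calls

# usage around a provider call:
# with metrics.time_provider("database"):
#     ctx = await provider.get_context(query, "data")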
The goal isn't eliminating agent loops entirely. It's understanding when loops are necessary and when they're architectural laziness. Most tasks don't need orchestration—they need better context. MCP gives you infrastructure for that.
Build for the common case: fast, single-pass generation with assembled context. Add orchestration only when measurements prove it's needed. This is how you ship agents that work in production, not just demos.