In Part 1 I opened with a cooling agent that did everything right and still killed a nine-day training job. It read the aggregate inlet temperatures, saw thermal headroom, saw an expensive energy hour, and backed off the chillers. It was correct about the headroom and correct about the price. It did not know a CRAH (computer room air handler) unit one row over was already degraded. Forty minutes later a rack hit thermal shutdown.
That failure was not a strawman. It is the precise shape of what goes wrong when you take the single most successful agentic use case in the building and deploy it one notch past the authority it has earned. Cooling is where agentic AI in data centers has its longest track record, its hardest numbers, and its clearest production lineage. Which is exactly why it is the use case people draw the wrong lesson from.
This is Part 2 of the series. Part 1 laid out the Operational Authority Gradient (OAG): the four-band model - Observe, Advise, Act-within-bounds, Closed-loop - where an agent's authority is assigned by the irreversibility and blast radius of its actions, not by how good its model is. Cooling sits at Act-within-bounds. This article is about what that band actually demands in production, and why the most-cited success story in the field is usually quoted for the wrong number.
The number everyone quotes, and the one that matters
Here is the canonical story, because you have heard half of it. In 2016, DeepMind and Google applied machine learning to cooling in Google's already heavily optimised data centres. The system pulled a snapshot from thousands of sensors every five minutes and fed it into deep neural networks that predicted how candidate cooling actions would affect future energy use. The result became the most-quoted statistic in data center AI: a 40 percent reduction in cooling energy, around 15 percent off total PUE overhead, the lowest PUE the sites had ever seen.
Then almost everyone stops reading.
The part that matters comes two years later. In 2018, DeepMind took the system off the recommendation leash and let it control cooling directly. And when they did, the savings dropped. The autonomous system delivered around 30 percent on average, not 40. That is not a regression and it is not a disappointment. It is the single most instructive design decision in the entire history of agentic data center operations, and the team said exactly why in plain language: they purposefully constrained the system's optimisation boundaries to a narrower operating regime to prioritise safety and reliability, and acknowledged outright that this was a risk/reward trade off in energy terms.
Read that again with the OAG in mind. The move from 2016 to 2018 is a move up the authority gradient - from Advise (operators vet and implement each recommendation) to Act-within-bounds (the AI implements directly, under supervision). And the price of that promotion was not paid in model quality. The model got better, not worse, with more data. The price was paid in deliberately surrendered headroom. They drew a tighter envelope around what the agent was allowed to do, and the ten points of lost savings is the literal, measurable cost of the envelope.
That ten-point gap is the most honest thing anyone has ever published about this use case. The industry quotes the 40 because it sells autonomy. The 30 is the number that tells you what autonomy actually costs when the consequences are physical. Anyone selling you a cooling agent on the strength of the 40 is quoting the recommendation system - the band where a human was still catching mistakes - and pretending it describes the autonomous one.
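The arithmetic behind that gap is easy to check. What follows is a minimal sketch that takes the round figures quoted above (40 percent in recommendation mode, roughly 30 percent autonomous) as inputs. The function name and the output keys are illustrative and do not come from DeepMind's write-up.

```python
def envelope_cost(advisory_saving: float, autonomous_saving: float) -> dict:
    """Compare two cooling-energy savings, both given as fractions of baseline."""
    points_lost = advisory_saving - autonomous_saving
    return {
        # Absolute gap, in percentage points of baseline cooling energy.
        "points_lost": points_lost * 100,
        # Share of the headline saving that was surrendered to the envelope.
        "share_of_headline": points_lost / advisory_saving,
        # Extra cooling energy used, relative to the recommendation-mode system.
        "extra_energy_vs_advisory": (1 - autonomous_saving) / (1 - advisory_saving) - 1,
    }


cost = envelope_cost(advisory_saving=0.40, autonomous_saving=0.30)
for name, value in cost.items():
    print(f"{name}: {value:.3f}")
```

Ten points out of forty is the share of the headline that the envelope costs. The last figure restates the same gap as extra energy spent, which is the form an operator would see on the bill.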
Why cooling sits at Act-within-bounds and not higher
It is worth being precise about why cooling cannot just be promoted to Closed-loop, because the reasons are physical and they are the template for reasoning about every other use case in the series.
Thermal inertia means actions are slow to undo. When the agent changes a setpoint, the building does not respond instantly. Heat is already in the air, in the water loop, in the silicon. By the time a wrong decision shows up in the telemetry, the system has been moving in the wrong direction for minutes. Unlike a bad database query you can kill, you cannot un-heat a rack. The verify step in the agentic loop is laggy, and a laggy verify step is the strongest possible argument against full closed-loop autonomy.
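To make the lag concrete, here is a rough sketch that treats inlet temperature as a first-order system relaxing toward a new steady state. Everything in it is an assumption for illustration: the first-order model itself, the ten-minute time constant, the temperatures, and the function names. Only the five-minute sampling interval echoes the snapshot cadence described earlier.

```python
import math
from typing import Optional, Tuple


def inlet_temp(t_min: float, start_c: float, steady_state_c: float, tau_min: float) -> float:
    """First-order response: temperature relaxes exponentially toward steady state."""
    return steady_state_c + (start_c - steady_state_c) * math.exp(-t_min / tau_min)


def first_detection(start_c: float, steady_state_c: float, tau_min: float,
                    alarm_c: float, sample_every_min: int,
                    horizon_min: int = 120) -> Optional[Tuple[int, float]]:
    """Return (minute, temperature) of the first telemetry sample at or above the alarm."""
    t = 0
    while t <= horizon_min:
        temp = inlet_temp(t, start_c, steady_state_c, tau_min)
        if temp >= alarm_c:
            return t, temp
        t += sample_every_min
    return None  # the alarm is never reached within the horizon


def recovery_minutes(temp_now_c: float, steady_state_c: float,
                     target_c: float, tau_min: float) -> float:
    """Minutes to cool from temp_now_c to target_c once the bad setpoint is reverted."""
    return tau_min * math.log((temp_now_c - steady_state_c) / (target_c - steady_state_c))


# A bad setpoint moves the steady state from 24 C to 34 C (hypothetical numbers).
detected_at, temp_at_detection = first_detection(
    start_c=24.0, steady_state_c=34.0, tau_min=10.0, alarm_c=27.0, sample_every_min=5)
back_to_normal = recovery_minutes(temp_at_detection, steady_state_c=24.0,
                                  target_c=25.0, tau_min=10.0)
print(f"detected after {detected_at} min at {temp_at_detection:.1f} C")
print(f"recovery takes a further {back_to_normal:.1f} min")
```

Under these assumptions, undoing the mistake takes longer than noticing it did. That asymmetry is the whole argument against letting the agent move faster than the verify step.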
The safety floor is hard and non-negotiable. Silicon has a junction temperature above which it throttles, then shuts down, then degrades permanently. There is no clever optimisation that makes crossing that line acceptable. This is what an envelope is for: not to make the agent smarter, but to make a whole region of the action space physically unreachable regardless of what the agent concludes.
The coupling is brutal, and it is getting worse. This was the cause of the Part 1 failure - the agent optimised one variable while blind to a degraded unit. And the trend is making coupling harder, not easier. AI racks have broken the thermal assumptions cooling control was built on. A rack of H100s can pull 60 to 80 kW of heat load, an order of magnitude past what air cooling was designed for. Liquid-cooled capacity equalled air-cooled capacity in 2025 and is expected to double it through 2026. Direct liquid cooling is now the default specification for new hyperscale builds, not an exotic option. Every one of those liquid loops - supply temperature, flow rate, valve actuation at the cabinet, cooling tower setpoints - is another coupled control variable the agent has to reason over without breaking a constraint somewhere else in the loop.
Put those three together and the band assignment writes itself. Slow-to-undo actions plus a hard safety floor plus dense coupling equals: the agent may act, but only inside an envelope it cannot exceed, with a fail-safe that does not trust its judgement. That is the definition of Act-within-bounds.
The envelope is the engineering. The agent is the easy part.
Here is the inversion that practitioners need and vendors avoid. The hard, valuable, defensible engineering in a cooling agent is not the model that picks setpoints. Good setpoint optimisation is a solved-enough problem - reinforcement learning has been shown to beat conventional chiller control by 9 to 13 percent in multi-chiller scenarios, and the techniques are well documented. The model is the commodity. The envelope is the moat.
An envelope is the set of hard constraints, enforced outside the agent, that bound every action the agent can take. For cooling it includes at minimum: absolute setpoint floors and ceilings the agent cannot cross; rate limits on how fast it can change anything, so it cannot swing the system faster than the verify step can catch a mistake; and - the lesson of the opening failure - a live model of equipment health, so that degraded capacity shrinks the envelope automatically.
That last point is the one almost everyone gets wrong, so it is worth stating as a rule. The degraded-equipment check must feed the envelope, not the agent. If you hand the agent the information that a CRAH is degraded and trust it to reason about it, you are back to relying on the model's judgement about its own safety - which is precisely the thing the OAG says you must never do. Instead, degraded capacity should mechanically tighten the bounds the agent operates within, in the control plane, before the agent ever proposes an action. The agent does not need to know the unit is sick. It needs to find that the part of the action space that would have exploited the missing capacity is simply not available to it.
This is epistemic restraint by design, made concrete. The framework I have been developing across this work says systems should be built to respect the edge of their own knowledge. A cooling envelope is the physical instantiation of that idea: the agent's confidence is irrelevant, because the boundary is enforced by something that does not care how sure the agent is.
```python
# The shape that matters: the envelope is computed independently of the agent,
# and it is the control plane - not the agent - that clamps the action.
def safe_cooling_action(agent_proposal: SetpointDelta,
                        telemetry: Telemetry,
                        equipment: EquipmentHealth) -> SetpointDelta:
    # 1. Envelope is derived from physics + live equipment health,
    #    NOT from anything the agent said.
    envelope = Envelope(
        floor=JUNCTION_TEMP_FLOOR,           # hard physical limit
        ceiling=EFFICIENCY_CEILING,
        max_rate=rate_limit_for(telemetry),  # cannot move faster than verify
    )
    # Degraded capacity shrinks the envelope. The agent is never told why.
    envelope = envelope.shrink_for(equipment.degraded_units)

    # 2. The agent proposed freely. The control plane clamps.
    safe = envelope.clamp(agent_proposal)

    # 3. If the clamp had to intervene hard, that is a signal worth logging -
    #    the agent is repeatedly pushing the boundary it cannot see.
    if envelope.clamp_was_binding(agent_proposal, safe):
        emit_boundary_pressure_event(agent_proposal, safe, equipment)

    return safe
```

Notice what this code does not do. It does not ask the agent to confirm the action is safe. It does not pass equipment health into the agent's context and hope. It computes the safe region from the world, lets the agent propose whatever it wants inside or outside that region, and clamps. The agent is a proposal engine. The envelope is the authority.
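The sketch above leans on types it never defines, so here is one way the envelope could be filled in and actually run. Treat it as an illustration under stated assumptions, not a reference implementation. The field names, the single supply-temperature setpoint, and the 1 C-per-degraded-unit derating are all hypothetical; a real plant would derive the derating from its own capacity model.

```python
from dataclasses import dataclass, replace
from typing import Tuple


def _clip(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


@dataclass(frozen=True)
class Envelope:
    floor_c: float     # lowest setpoint the control plane will ever apply
    ceiling_c: float   # highest setpoint the control plane will ever apply
    max_step_c: float  # largest change allowed per control interval

    def shrink_for(self, degraded_units: int, derate_c_per_unit: float = 1.0) -> "Envelope":
        """Lower the ceiling for each degraded unit (hypothetical linear derating)."""
        new_ceiling = max(self.floor_c, self.ceiling_c - degraded_units * derate_c_per_unit)
        return replace(self, ceiling_c=new_ceiling)

    def clamp(self, current_c: float, proposed_delta_c: float) -> float:
        """Return the delta that is actually applied.

        Bounds are applied first, then the rate limit. If the envelope has
        shrunk below the current setpoint, the result walks back toward the
        ceiling at the rate limit instead of jumping.
        """
        target = _clip(current_c + proposed_delta_c, self.floor_c, self.ceiling_c)
        return _clip(target - current_c, -self.max_step_c, self.max_step_c)


def safe_cooling_action(current_c: float, proposed_delta_c: float,
                        degraded_units: int, base: Envelope) -> Tuple[float, bool]:
    """Control-plane entry point: the agent's proposal is an input, never an authority."""
    envelope = base.shrink_for(degraded_units)
    applied = envelope.clamp(current_c, proposed_delta_c)
    binding = applied != proposed_delta_c  # True when the clamp had to intervene
    return applied, binding


healthy = Envelope(floor_c=18.0, ceiling_c=27.0, max_step_c=0.5)
print(safe_cooling_action(25.0, 3.0, degraded_units=0, base=healthy))
print(safe_cooling_action(25.0, 3.0, degraded_units=2, base=healthy))
```

The two calls pass the same agent proposal. The only difference is equipment health, and that difference reaches the result through the envelope without the agent seeing it.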
The 2026 frontier: multi-agent, multi-objective, and interpretable
The current research frontier is worth knowing because it sharpens, rather than softens, the envelope argument. The most interesting recent work is LC-Opt, a benchmark from Hewlett Packard Enterprise and Oak Ridge National Laboratory built on a high-fidelity digital twin of the Frontier supercomputer's liquid cooling system. It models the whole chain, from site-level cooling towers down to cabinets and individual server blade groups, and turns it into a multi-objective reinforcement learning problem: agents control supply temperature, flow rate, granular valve actuation at the cabinet, and tower setpoints, all at once, under shifting workloads.
Three things about LC-Opt matter for this article.
First, it is explicitly a digital twin. This is the connecting layer from Part 1 doing its job - the agents reason over a continuously modelled representation of the physical plant, not a keyhole view of aggregate temperatures. A twin that models per-cabinet behaviour is structurally what would have prevented the opening failure, because the degraded unit would have been in the model.
Second, it is multi-agent and multi-objective, balancing local thermal regulation against global energy efficiency. That coupling - the tension between keeping one cabinet safe and keeping the whole site efficient - is the real problem, and it is exactly the kind of tension where an unconstrained optimiser will happily sacrifice a local safety margin for a global efficiency gain. Which is to say: the more capable and more global the optimiser, the more you need the local envelope to be inviolable.
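A toy example shows how that sacrifice happens. This is not LC-Opt's method. It is a deliberately small sketch with made-up candidates, a hypothetical 32 C cabinet limit, and a hypothetical penalty weight. It compares an optimiser that treats the local limit as a soft cost with one that treats it as a hard constraint.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    site_energy_kw: float                 # global objective: lower is better
    cabinet_inlet_c: Tuple[float, ...]    # predicted inlet temperature per cabinet


def soft_cost(c: Candidate, limit_c: float, penalty_kw_per_c: float) -> float:
    """Scalarised objective: energy plus a penalty per degree over the limit."""
    excess = sum(max(0.0, t - limit_c) for t in c.cabinet_inlet_c)
    return c.site_energy_kw + penalty_kw_per_c * excess


def pick_soft(cands: Sequence[Candidate], limit_c: float, penalty_kw_per_c: float) -> Candidate:
    """Unconstrained optimiser: the local limit is just another term to trade off."""
    return min(cands, key=lambda c: soft_cost(c, limit_c, penalty_kw_per_c))


def pick_enveloped(cands: Sequence[Candidate], limit_c: float) -> Optional[Candidate]:
    """Enveloped optimiser: candidates that breach any cabinet limit are unreachable."""
    safe = [c for c in cands if max(c.cabinet_inlet_c) <= limit_c]
    return min(safe, key=lambda c: c.site_energy_kw) if safe else None


candidates = [
    Candidate("A", 900.0, (24.0, 25.0, 26.0)),
    Candidate("B", 820.0, (25.0, 26.0, 33.0)),  # cheapest, but one cabinet over the limit
    Candidate("C", 860.0, (25.0, 27.0, 30.0)),
]
print("soft penalty picks:", pick_soft(candidates, limit_c=32.0, penalty_kw_per_c=20.0).name)
print("hard envelope picks:", pick_enveloped(candidates, limit_c=32.0).name)
```

With a soft penalty, a large enough global saving can always outbid a local violation. The hard filter removes that trade from the table entirely.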
Third, and most telling, the work distills the learned policies into decision and regression trees for interpretable control, and uses LLM-based agents to explain control actions in natural language. That is not a nicety. An action you cannot interpret is an action you cannot bound with confidence, and an envelope you cannot explain to an operator is an envelope no operator will trust enough to hand over real authority. Interpretability is not adjacent to the authority question. It is load-bearing for it.
What this means if you are building one
Strip it to the decisions you actually have to make.
Do not lead with the model. The setpoint optimiser is the part most likely to already exist as a library or a paper. Your differentiated, dangerous, valuable work is the envelope: the floors, the rate limits, and the live coupling between equipment health and available action space.
Enforce the boundary outside the agent, always. The moment the only thing preventing an unsafe action is the agent's own assessment that the action is safe, you have left Act-within-bounds and you are running a closed-loop system that merely feels supervised. The clamp lives in the control plane. The agent never gets a vote on its own limits.
Treat the degraded-equipment case as the design centre, not an edge case. The opening failure is not rare. Equipment degrades constantly. A cooling agent that is only safe when every unit is healthy is a cooling agent that is unsafe in normal operation. Wire equipment health into the envelope from day one.
Instrument boundary pressure. When the clamp is repeatedly binding - when the agent keeps proposing actions the envelope has to cut back - that is one of the most valuable signals you have. It means either the envelope is too tight and is costing you real savings, or the agent is consistently trying to exploit capacity that is not safely there. Both are things you must know. Log every binding clamp.
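One simple way to turn "repeatedly binding" into a signal is a rolling ratio over recent control intervals. The sketch below assumes a fixed-size window and a threshold. The class name, window size, and threshold are hypothetical choices, to be tuned against the plant's own control cadence.

```python
from collections import deque


class BoundaryPressureMonitor:
    """Track how often the envelope had to cut back the agent's proposal."""

    def __init__(self, window: int = 12, alert_ratio: float = 0.5) -> None:
        self._events = deque(maxlen=window)  # True where the clamp was binding
        self.alert_ratio = alert_ratio

    def record(self, proposed_delta: float, applied_delta: float) -> bool:
        """Log one control interval; return True if the monitor is now alerting."""
        self._events.append(proposed_delta != applied_delta)
        return self.under_pressure

    @property
    def binding_ratio(self) -> float:
        return sum(self._events) / len(self._events) if self._events else 0.0

    @property
    def under_pressure(self) -> bool:
        # Only alert on a full window, so a single early clamp is not an alarm.
        window_full = len(self._events) == self._events.maxlen
        return window_full and self.binding_ratio >= self.alert_ratio


monitor = BoundaryPressureMonitor(window=4, alert_ratio=0.5)
for proposed, applied in [(1.0, 1.0), (2.0, 0.5), (2.0, 0.5), (2.0, 0.5)]:
    alerting = monitor.record(proposed, applied)
print(f"binding ratio {monitor.binding_ratio:.2f}, alerting: {alerting}")
```

The alert alone does not say which of the two causes is at work. Pairing it with equipment health at the time of each clamp is what separates an envelope that is too tight from an agent reaching for capacity that is not there.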
Quote the right number. If you are reporting results, report the autonomous, enveloped number, not the recommendation-mode number. DeepMind's honesty about the 30 is the standard. The 40 describes a system with a human in the loop, and presenting it as the autonomous result is, at best, quoting the wrong band.
Where this lands
Cooling earns its place at the front of this series by being the use case with the most production evidence behind it. But the lesson that evidence actually teaches is not "look how autonomous we can be." It is the opposite. The defining moment in the field's flagship example was a team voluntarily giving up a quarter of their headline savings, ten points out of forty, to draw a tighter boundary, and saying so out loud.
That is what maturity looks like on the Operational Authority Gradient. Not the model that optimises hardest, but the envelope that holds when the model is wrong - and the discipline to quote the number that includes the cost of the wall.
Part 3 takes this into the use case where the bands are most unequal and most misrepresented: agentic SRE and the marketing of self-healing infrastructure, where detection and diagnosis are genuinely production-ready and autonomous remediation mostly is not.
References
The DeepMind cooling lineage
- Rich Evans and Jim Gao, "DeepMind AI Reduces Google Data Centre Cooling Bill by 40%" - Google DeepMind (2016). Source for the 40 percent figure, the five-minute sensor snapshot, and the recommendation-system architecture.
- "Safety-first AI for autonomous data centre cooling and industrial control" - Google DeepMind (2018). Source for the move to direct control, the deliberately narrowed optimisation boundaries, and the ~30 percent autonomous figure.
- "Google just gave control over data center cooling to an AI" - MIT Technology Review (2018). Source for the recommendation-to-autonomy transition.
Cooling control and reinforcement learning
- "AI-driven data centers' cooling systems" - Araner. Source for the 9-13 percent RL chiller-optimisation figure (Luo et al., 2022).
- Avisek Naug, Antonio Guillen, et al., "LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers" - HPE and Oak Ridge National Laboratory, NeurIPS 2025. Source for the Frontier digital twin, multi-objective multi-agent RL, and policy distillation into interpretable trees with LLM explanation.
AI rack density and liquid cooling
- "Why You Need Liquid Cooling for AI Performance at Scale" - CoreWeave. Source for rack-scale density and the liquid-cooling necessity for AI.
- "AI Demand Is Forcing a Rethink of Data Center Power, Cooling" - TechRepublic / Data Center World (Apr 2026). Source for liquid cooling equalling then doubling air-cooling capacity across 2025-2026.
- "Data Center Cooling in 2026: Technology Options, Site Constraints, and the AI Advantage" - Build (Apr 2026). Source for the 60-80 kW H100 rack heat load and DLC as default hyperscale specification.